Break reports

This page lists all breaks in HIIT's IT services.

Break in shell, www and file service (fs) at 2009-11-16 04:45-07:00

Description: 

Schedule:

2009-11-16 04:45 - 07:00

Duration:

2:15 h

Affected services:

HIIT's general-purpose server (shell), www and file service.

Reason:

File server frodo (fs) will be replaced with a new one.

During the last synchronisation, shell will be rebooted to remove the possibility of data corruption by users. A kernel upgrade will also be installed.

There will be short breaks in www-services (www.hiit.fi) as well.

Update: The break was over at 6:58.

Break in wiki at 2009-11-04 22:00-22:05

Description: 

Schedule:

2009-11-04 22:00 - 22:05

Duration:

5 min

Affected services:

HIIT Wiki

Reason:

The wiki software (Confluence) will be restarted to adjust the session timeout. The timeout value will be extended from 1 hour to 4 hours.

Update: The actual schedule was 22:01 - 22:02.

Break in universe at 2009-11-01 06:40-08:16

Description: 

Schedule:

2009-11-01 06:40 - 08:16

Duration:

1:36 h

Affected services:

Adaptive's demo server universe.hiit.fi.

Reason:

Universe ran out of memory and halted at 6:40. It was rebooted at 7:49 and is currently recovering journals on its partitions. As soon as it comes up, updates will be installed and, if needed, a reboot will be performed to finish the updates.

Update 8:19: Universe was up at 8:13. At that time updates (including a new kernel) were installed. Universe was rebooted at 8:13 and was up and running again at 8:16.

Break in lose.it.hiit.fi at 2009-10-16 13:12-13:18

Description: 

Schedule:

2009-10-16 13:12 - 13:18

Duration:

6 min

Affected services:

RADIUS authentication (e.g. eduroam), LDAP authentication to Wiki.

Reason:

Unscheduled shutdown due to a mistake during the HPC update.

Break in HIIT-HPC at 2009-10-16 10:00-17:00

Description: 

Schedule:

2009-10-16 10:00 - 17:00

Duration:

7:00 h

Affected services:

HIIT-HPC (cluster nodes and file service hpc-fs)

Reason:

Kernel upgrade. Firmware upgrades.

The following servers will be rebooted: master.hpc.hiit.fi, clnNNN.hpc.hiit.fi. Additionally, firmware upgrades will be performed on the cluster nodes (clnNNN).

The break for each server is approximately 15 minutes.

Update: Break extended because the firmware update takes longer than expected.
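
Each Duration field in the reports above follows directly from its Schedule line. As a minimal sketch of that arithmetic in Python (the function name break_duration is hypothetical and not part of any HIIT tooling; it assumes the "YYYY-MM-DD HH:MM - HH:MM" layout used on this page and a break that does not cross midnight):

```python
from datetime import datetime


def break_duration(schedule: str) -> str:
    """Return the duration for a 'YYYY-MM-DD HH:MM - HH:MM' schedule line.

    Assumption: start and end fall on the same calendar day.
    """
    # Splits into: date, start time, the "-" separator, end time.
    date_part, start, _, end = schedule.split()
    fmt = "%Y-%m-%d %H:%M"
    begin = datetime.strptime(f"{date_part} {start}", fmt)
    finish = datetime.strptime(f"{date_part} {end}", fmt)
    minutes = int((finish - begin).total_seconds()) // 60
    # Mirror the page's convention: "N min" under an hour, "H:MM h" otherwise.
    if minutes < 60:
        return f"{minutes} min"
    return f"{minutes // 60}:{minutes % 60:02d} h"


print(break_duration("2009-11-16 04:45 - 07:00"))
print(break_duration("2009-11-04 22:00 - 22:05"))
```

Running the function on the five schedule lines above is a quick way to check that the stated durations match the stated times.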
