
Break in file service fs at 2011-09-26 18:00-19:22

Schedule: 
2011-09-26 18:00 to 19:22
Duration: 
1 h 22 min
Affected services: 
File services (fs), cluster file services (hpc-fs), web services, entry server shell.
Description: 

The operating system of file server frodo.it.hiit.fi will be upgraded. Pending firmware upgrades will also be installed.

The following websites will also be affected because they depend on the file service:

  • www.futureinternet.fi
  • betelgeuse.hiit.fi
  • cgi.hiit.fi
  • cosco.hiit.fi
  • packages.hiit.fi
  • pgm2010.hiit.fi
  • www.mdl-research.org

There will be a few short (< 15 min) breaks in file service.

Update at 19:24: The break ended at 19:22. There was one longer break (approx. 30 minutes) in Windows file sharing, caused by a change in the behaviour of the ldap ssl directive.

Break in meriadoc.it.hiit.fi at 2011-01-14 07:00 - 07:15

Schedule: 
2011-01-14 07:00 to 07:15
Duration: 
15 min
Affected services: 
General purpose Windows server rwin.hiit.fi.
Description: 

Monthly security updates will be installed and the system will be rebooted.

Break in meriadoc.it.hiit.fi at 2010-01-07 16:50 - 2010-01-08 13:03

Description: 

Schedule:

2010-01-07 16:50 - 2010-01-08 13:03

Duration:

20 h 13 min

Affected services:

Generic remote Windows server

Reason:

Rebooted due to unresponsiveness. After installing updates, the system became unstable.

Update: The cause was an old version of the virus shield, which has now been updated.

AICA: Adaptive Interfaces for Consumer Applications

AICA is a Tekes project in the Ubicom programme, running from 1 November 2009 to 31 March 2012. AICA is a continuation of the PUPS project. The PUPS project was originally outlined as a four-year project; building on the results so far, AICA will focus on the topics that have turned out to be the most promising and add some new aspects that have emerged and are worth further study.

Break in storage area network (SAN) at 2009-09-07 13:55-18:30

Description: 

Schedule:

2009-09-07 13:55 - 18:30

Duration:

4 h 35 min

Affected services:

At least VCS, Wiki, WWW, Windows AD. Possibly others.

Reason:

Problems with SAN.

Update: The problems were caused by a faulty SFP (fibre adapter).

It caused one uplink port of one blade enclosure to flap up and down, disrupting the SAN fabric that used that port. As a result, the availability of paths to LUNs and other resources accessed via that fabric fluctuated too. Because the port was not down all the time, the cause of the problem was hard to determine: paths sometimes worked and sometimes didn't. This led to an incorrect decision to reboot one of the SAN switches, the one in the functioning fabric, at 14:18:35, which caused some of the hosts to temporarily lose all paths to LUNs in the SAN.

Linux is especially cranky when it comes to losing its disks, even temporarily. It marks the disk read-only quite soon, even if the disk comes back within a few seconds. To regain read-write access, a reboot was required.
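
For illustration only, here is a minimal Python sketch of how one might spot file systems that have ended up read-only on a Linux host. It is not the procedure used during this break. It assumes the usual /proc/mounts layout (device, mount point, type, options, ...); the function name and the sample data are hypothetical. Note that it also reports file systems that are mounted read-only on purpose.

```python
import os


def read_only_mounts(mounts_text):
    """Return (device, mount_point) pairs whose mount options include 'ro'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        # Match the exact option 'ro', not substrings such as 'errors=remount-ro'.
        if "ro" in options.split(","):
            result.append((device, mount_point))
    return result


# Hypothetical sample in /proc/mounts format, used when the real file is absent.
SAMPLE = """\
/dev/mapper/mpath0 /srv/data ext3 ro,relatime 0 0
/dev/sda1 / ext3 rw,relatime,errors=remount-ro 0 0
proc /proc proc rw,nosuid 0 0
"""

if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            text = f.read()
    else:
        text = SAMPLE
    for device, mount_point in read_only_mounts(text):
        print(device, mount_point)
```

With the sample data, only /srv/data is reported: the root file system carries errors=remount-ro as an option but is still mounted rw.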

Later the faulty SFP started to stay down for longer periods, and so it was discovered and replaced with a working one. The breaks in individual services lasted between 9 and 44 minutes. During the breaks, kernel updates were also installed on the servers that needed them.
