Break in universe.hiit.fi at 2012-11-02 18:32 - 18:46

Schedule: 
2012-11-02 18:32 to 18:46
Duration: 
14 min
Affected services: 
Services running on the Adaptive group's test server universe.hiit.fi
Description: 

Universe's Apache process and possibly the kernel's multipath had gotten themselves into a tangled state. To resolve this, the server was rebooted. All pending updates were installed as well.

Update at 18:48: Universe was up and running at 18:46. The problem was that multipathd did not mark one path as failed during servicing of the disk array system at 2012-10-23T13:45, even though it had failed the devices behind it:

mithlond-lun-14 (3600601603c0027009a1ae2bc8a12e011) dm-2 ,
size=2.0T features='0' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- #:#:#:# -   #:#   active faulty running
| `- #:#:#:# -   #:#   active faulty running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 1:0:4:0 sda 8:0   active ready running
  `- 0:0:6:0 sdc 8:32  active ready running
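
A condition like the one in the listing above could in principle be spotted by scanning the `multipath -ll` output for path lines whose path state is "faulty". The following is only a minimal sketch under that assumption: the sample text is copied from the listing above, the function name `paths_by_state` is hypothetical, and the regular expression is fitted to this particular output layout, which may differ between multipath-tools versions.

```python
import re

# Sample text copied from the listing above. On a live host the text would
# instead come from running `multipath -ll` (assumption, not shown here).
SAMPLE = """\
mithlond-lun-14 (3600601603c0027009a1ae2bc8a12e011) dm-2 ,
size=2.0T features='0' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- #:#:#:# -   #:#   active faulty running
| `- #:#:#:# -   #:#   active faulty running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 1:0:4:0 sda 8:0   active ready running
  `- 0:0:6:0 sdc 8:32  active ready running
"""

# A path line: tree-drawing prefix, then H:C:T:L, device name, major:minor,
# and three state columns. Path group lines ("-+- policy=...") do not match
# because no whitespace follows their "|-" or "`-" marker.
PATH_RE = re.compile(
    r"^[|\s]*[|`]-\s+(?P<hctl>\S+)\s+(?P<dev>\S+)\s+(?P<majmin>\S+)\s+"
    r"(?P<dm_state>\S+)\s+(?P<path_state>\S+)\s+(?P<online>\S+)\s*$"
)


def paths_by_state(text, state):
    """Return (hctl, dev) pairs for path lines whose path state equals `state`."""
    found = []
    for line in text.splitlines():
        match = PATH_RE.match(line)
        if match and match.group("path_state") == state:
            found.append((match.group("hctl"), match.group("dev")))
    return found


if __name__ == "__main__":
    for hctl, dev in paths_by_state(SAMPLE, "faulty"):
        print(f"faulty path: hctl={hctl} dev={dev}")
```

Run against the sample, the sketch reports the two paths of the first path group, whose device fields are already blanked out as `#:#:#:#` and `-`.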

This caused disk I/O to fail, which in turn made Apache generate a high load and leave one zombie process:

top - 18:26:31 up 150 days,  6:23, 10 users,  load average: 93.99, 93.97, 93.64
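
On Linux, processes blocked in uninterruptible I/O wait count toward the load average, which is consistent with failed disk I/O driving the figure up. A header line like the one above can be checked against a limit with a few lines of Python. This is a rough sketch only: the function names and the limit of 20.0 are hypothetical and were not part of the original investigation.

```python
import re

# Header line copied from the top output above.
TOP_HEADER = (
    "top - 18:26:31 up 150 days,  6:23, 10 users,  "
    "load average: 93.99, 93.97, 93.64"
)


def load_averages(header):
    """Extract the 1, 5 and 15 minute load averages from a top header line."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", header
    )
    if match is None:
        raise ValueError("no load average found in header line")
    return tuple(float(value) for value in match.groups())


def is_overloaded(header, limit=20.0):
    """Return True if the 15 minute load average exceeds `limit` (hypothetical value)."""
    return load_averages(header)[2] > limit


if __name__ == "__main__":
    print(load_averages(TOP_HEADER))
    print(is_overloaded(TOP_HEADER))
```

The 15 minute value is used so that a short spike does not trigger the check; a sustained value near 94, as seen here, clearly would.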

 


Last updated on 5 Nov 2012 by Pekka Tonteri - Page created on 2 Nov 2012 by Pekka Tonteri