05 Mar 2009 09:21
There are weeks when nothing exciting (or fatal) happens. But there are days when so much happens that you start to wonder whether it is really a coincidence. Yesterday was one of those days, and it was not even Friday the 13th. So here is what happened:
- I stayed at home; Lukasz was at the office. All of a sudden the network went down. Later we learned that half of our city (the part that gets its internet from TP S.A.) had problems.
- Later Piotr called to say that our office server was down and could not boot. One of the disks in the RAID 10 array had failed, and for some reason GRUB could not boot. It booted later after Piotr did some magic; now we just need to replace one drive ASAP.
- At 15:30 local time I got an alert email that Wikidot.com was down. Immediately I tried to log in to the server: nothing. Ping: yes, alive. But all services were down.
- After a few minutes we knew we had to act. Piotr started reassigning the web server's IP addresses to a backup server. Failed. It looks like the router could not handle this in real time this time.
- Main server restart: nothing helped. We had a similar issue some time ago, so we started the rescue mode (the server boots from a rescue Linux image; this is largely automated by SoftLayer). The server came up. A year ago, what prevented the system from booting was a forced fsck on one of the drives, which required a key press or so (as told by the SoftLayer support team). So we started disk checks. And this took almost an hour! S#*t!
- Meanwhile a friend called me because his car had broken down just 20 meters from our parking lot and he could not move it, so I went to help him.
- The server came back up and everything was back to normal. Situation under control.
I am not afraid of fatal Fridays any more. I fear Wednesdays.