Random overwrite



  • Hi! This is a bit of an anecdote I'd like to tell. I decided to write it as a warning to other people who might end up in the same situation.

    I got a new laptop about two months ago, a Fujitsu-Siemens C1410. Quite a nice computer, although mostly Windows-friendly. Anyway, I decided to install Ubuntu 7.04 on it, leaving most settings on default. After installing all my development tools, doing a bit of work and hibernating, I noticed a slight problem. Apparently the default installer decided to create a mere 512 MB swap partition for my 2 GB of RAM, which obviously didn't work well with hibernation. No problem, I thought, and fired up GParted, moved and resized the swap partition and rebooted.

     Now, for some reason, the swap wasn't working. I could still do a swapon, but no swap was turned on automatically.

     I checked fstab, and noticed that drive references were changed from the traditional ones (/dev/hda or sda) to UUID ones. Since my swap was moved, the calculated UUID apparently changed (don't know what it's based on, but you can see it by running vol_id). No problem, I thought, changed the UUID to the new one and rebooted. Swap working, everything looks great. Keep using my computer for a couple of weeks and feel happy (new Ubuntu worked great for me BTW).
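     For anyone who wants to spot this kind of mismatch without eyeballing fstab, here is a minimal sketch of the check I did by hand. The function names and the sample fstab contents are made up for illustration; on a real system you would read /etc/fstab and paste in the UUID that vol_id reports for the swap partition.

```python
def swap_entries(fstab_text):
    """Return the device field of every swap line in fstab-style text."""
    devices = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # fstab fields: device, mount point, type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "swap":
            devices.append(fields[0])
    return devices


def stale_swap_entries(fstab_text, current_uuid):
    """Return UUID= swap entries that do not match current_uuid."""
    expected = "UUID=" + current_uuid
    return [d for d in swap_entries(fstab_text)
            if d.startswith("UUID=") and d != expected]


# Hypothetical fstab contents, not taken from the real machine.
SAMPLE_FSTAB = """\
# <file system> <mount point> <type> <options> <dump> <pass>
UUID=1111-aaaa  /     ext3  defaults  0  1
UUID=2222-bbbb  none  swap  sw        0  0
"""

# "3333-cccc" stands in for the UUID the swap partition has after the move.
print(stale_swap_entries(SAMPLE_FSTAB, "3333-cccc"))
```

     Any entry it prints is a swap line that still points at the old UUID and will not be turned on at boot.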

    Now I decide to hibernate again. Problem is that it's not working at all. It looks like the hibernation image is written to disk, but the system does a normal boot when restarted. Also, the swap is not working again. After doing a bit of googling, it turns out that the image got written to the swap (corrupting it), but it doesn't get read back again. Now I need to mkswap the swap partition again, change fstab and reboot just to get the swap going. But the hibernation still wouldn't work. I wasn't quite sure where to look; after all, I'm not very into Linux kernels, so I had no idea why the resume was failing, but I guessed it had something to do with me moving the swap. I googled the issue, and after quite a few tries and mkswaps, I gave up trying to fix hibernation. It just wouldn't cooperate.

    I leave that problem in the back of my mind, and a couple of days later I remember previously fiddling with menu.lst, and I go there to add a (previously missing) kernel resume parameter. I do an update-grub and reboot. That was a fatal mistake.

     Computer reboots, GRUB shows up, loads the kernel and starts the boot process... All of a sudden strange write errors occur, filenames that should have nothing to do with the boot process are listed, and what seems like endless pages of scrolling text pass by. I'm starting to get a bit panicky now, but I decide to wait and not do a hard reboot. It tries to do a fsck, but fsck crashes with a non-zero exit code. This does not look good. Only seconds later the system dumps me to a single-user prompt, encouraging me to do a manual fsck, since the automated one crashed. Since I was panicky, I obliged, and fsck started to report a LOT of badly referenced inodes, orphaned children and other major screw-ups. I needed to do several passes to correct all the errors. By this stage, I start losing hope of having a working computer when it's over. When fsck reports done, I start thinking about what could have caused the problem. It only takes me a second, but it makes me feel so dumb! It's obvious! The kernel loaded an old hibernate image! The system must have had some sort of cache or in-memory map of the filesystem, causing a major f**k-up when the actual filesystem did not match. Ooops.

     When I realize that the boot process, the fsck and probably any subsequent command I issue will further mess up the fs, I decide to do a hard reboot. The problem is that not even GRUB will boot. No XP, no Ubuntu, no nothing. Great.

    After sweating for a while (knowing that I have maybe 3 or 4 days' worth of work on the computer), I come to my senses and boot from a live CD, and figure out that the GRUB settings files were garbled. I reinstalled GRUB and installed it into the MBR again.

    Now, after a reboot, making sure that the resume kernel parameter was unset, the system miraculously seemed all fine! In fact, it has been working ever since (3 days). It garbled some files that had been changed between hibernate and resume, but I was lucky this time.

     So, bottom line is, be careful with Linux hibernate, since the kernel seems not to notice that the fs is outdated, and can do random overwrites! Now that I've wasted 5 mins of your life, it makes me feel better :D
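     If you do touch the resume parameter, it may be worth checking what it points at before rebooting. Below is a rough sketch, assuming the kernel command line is available as a string (on a running system it can be read from /proc/cmdline). The function names and sample values are hypothetical, and the comparison is a plain string match, so it will not notice that a /dev name and a UUID refer to the same partition. It also says nothing about whether the image sitting in that swap is stale, which is what bit me.

```python
def resume_target(cmdline):
    """Return the value of resume= on a kernel command line, or None."""
    for token in cmdline.split():
        if token.startswith("resume="):
            return token[len("resume="):]
    return None


def resume_matches_swap(cmdline, swap_devices):
    """True only if resume= is set and names one of the given swap devices.

    Plain string comparison: "/dev/sda5" and "UUID=..." never match each
    other here, even if they are the same partition.
    """
    target = resume_target(cmdline)
    return target is not None and target in swap_devices


# Hypothetical command line, not the one from the real machine.
SAMPLE_CMDLINE = "root=UUID=1111-aaaa ro quiet splash resume=UUID=2222-bbbb"

print(resume_target(SAMPLE_CMDLINE))
print(resume_matches_swap(SAMPLE_CMDLINE, ["UUID=3333-cccc"]))
```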

     PS. Sorry for changing tense all the time, noticed that afterwards. Will do better next time :D


     

