I've been there and felt that same panic. Well, not with a multi-TB server, but I know the feeling when you think you may have lost lots and lots of hard work. (The first time was in 7th grade, when the only copy of my science fair project was on a 140K Apple II floppy whose directory got corrupted. My father and I spent three hours with a sector editor rebuilding the directory so we could get at the files.)
That being said, I can't believe they had such a large server without an automated backup system in place.
I'm also surprised that they were striping across three RAIDs. Is it really that critical to keep everything on one logical volume? Would it really be that horrible to let the three RAIDs act as three separate file systems?
@accalia said:
RAID DOES NOT EQUAL A BACKUP!
It's hard to believe how many people don't understand this concept.
RAID will protect against an individual drive failure, but that's about it. There's no protection against a RAID controller or motherboard failure (as we saw here). There's no protection against multiple-drive failures (some RAID levels can survive more than one failed drive at a time, but there's always a limit). There's also no protection against malware, a software glitch or bug, a malicious user, a stupid user, or just a dumb mistake trashing your data while all the hardware is working perfectly.
It also doesn't protect against a disaster where a fire, flood, tornado, earthquake, or whatever else trashes the entire building and physically destroys your server.
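The "dumb mistake with perfectly working hardware" case is easy to see in a toy model. Here's a hypothetical Python sketch of a two-disk mirror (a big simplification of RAID 1 that ignores controllers, parity, and rebuilds) next to a separate point-in-time copy standing in for a backup. The class and method names are made up for illustration.

```python
class Raid1Mirror:
    """Toy model of a RAID 1 mirror: every write and delete goes to all live disks."""

    def __init__(self, n_disks=2):
        # Each disk is a dict of filename -> contents; None marks a dead drive.
        self.disks = [{} for _ in range(n_disks)]

    def _live(self):
        return [d for d in self.disks if d is not None]

    def write(self, name, data):
        for disk in self._live():
            disk[name] = data

    def delete(self, name):
        # The mirror faithfully replicates the mistake to every copy.
        for disk in self._live():
            disk.pop(name, None)

    def fail_disk(self, index):
        self.disks[index] = None

    def read(self, name):
        for disk in self._live():
            if name in disk:
                return disk[name]
        return None

    def snapshot(self):
        # Point-in-time copy standing in for a backup; in real life it would
        # live on separate media, ideally off-site.
        live = self._live()
        return dict(live[0]) if live else {}


if __name__ == "__main__":
    array = Raid1Mirror()
    array.write("project.doc", "lots and lots of hard work")
    backup = array.snapshot()           # taken before anything goes wrong

    array.fail_disk(0)                  # one drive dies...
    print(array.read("project.doc"))    # ...and RAID saves you: prints the contents

    array.delete("project.doc")         # a dumb mistake, hardware working perfectly
    print(array.read("project.doc"))    # None: the delete hit every surviving copy
    print(backup["project.doc"])        # the backup still has it
```

The mirror handles the dead drive fine, but it copies the delete just as diligently as it copied the data. Only the independent copy survives.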
All of these are covered by a proper backup strategy (one that includes off-site storage). If Linus had had a recent backup, they could have just replaced the failed motherboard and RAID card, reformatted the array, and restored from the backup, losing maybe a day's worth of work. Instead, they had to enlist data recovery specialists and were really lucky not to end up losing everything.
@anonymous234 said:
For example, instead of giving your computer direct access to your hard drives, you connect them to a different computer running a specifically designed trusted OS that offers them as a network service but does not allow any deletions, except with a 7 day waiting period or something.
Stuff like this can lower the odds of catastrophic data loss, but it can't eliminate the risk. That trusted server could still fail catastrophically: its RAID controller or motherboard could die. It could get infected by malware (never assume anything is invulnerable). And something could delete or overwrite infrequently accessed files, with the corruption going unnoticed until after your 7-day waiting period has expired.
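To make that last failure mode concrete, here's a minimal Python sketch of a store that never deletes right away. The class name, the whole-day counter, and the choice to treat an overwrite like a delete are my own assumptions for illustration, not how any real product works.

```python
from dataclasses import dataclass, field

# Assumption: the "7 day waiting period" from the quote, measured in whole days.
WAITING_PERIOD_DAYS = 7


@dataclass
class DeferredDeleteStore:
    """Hypothetical store that refuses immediate deletion.

    Deleted or overwritten contents go to a trash list and are only purged
    once they have sat there for WAITING_PERIOD_DAYS.
    """
    files: dict = field(default_factory=dict)
    trash: list = field(default_factory=list)  # (name, old_contents, day_removed)

    def write(self, name, data, day):
        # An overwrite destroys the old contents, so treat it like a delete.
        if name in self.files:
            self.trash.append((name, self.files[name], day))
        self.files[name] = data

    def delete(self, name, day):
        self.trash.append((name, self.files.pop(name), day))

    def purge(self, today):
        # Permanently drop anything whose waiting period has expired.
        self.trash = [t for t in self.trash if today - t[2] < WAITING_PERIOD_DAYS]

    def recover(self, name):
        # Most recent removed version still held, or None if it was purged.
        for trashed_name, contents, _ in reversed(self.trash):
            if trashed_name == name:
                return contents
        return None


if __name__ == "__main__":
    store = DeferredDeleteStore()
    store.write("archive.dat", "good data", day=0)
    store.write("archive.dat", "silently corrupted", day=1)  # nobody notices

    store.purge(today=5)
    print(store.recover("archive.dat"))  # good data: still inside the window

    store.purge(today=10)
    print(store.recover("archive.dat"))  # None: discovered too late
```

Treating overwrites as deletes matters here: a store that only guarded explicit deletions would have lost the good data on day 1 with no window at all.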
Ultimately, there is no substitute for making backups and keeping some of them offline when they're not actively in use.