Just move that file link.....

mike_james

Recently I managed to screw up my file server.

Scenario: a 2TB Debian Linux file server using LVM across a pair of 1TB drives, with a 2TB eSATA backup drive. About 75% full of media (recorded TV etc.).

I had two directory trees: one at /home/public and the other at /home/media. Inside /home/public there was a symbolic link to the /home/media directory, so users accessing /home/public via Samba could see the media files.

The server started to misbehave. The SMART disk utilities reported nothing wrong with the main drives. It looked like an access rights problem (actually prompted by a DNS foul-up: you used to need the local host name attached to 127.0.0.1 in /etc/hosts, and then a later version of the DNS server I was using gave out conflicting name resolutions because the local host name was attached to 127.0.0.1).

So I deleted the symbolic link and moved the /home/media file tree to /home/public/media. But I forgot that my incremental backup now had a complete copy of the old media file tree in yesterday's backup, and a whole new file tree in a new place in today's backup. At this point it totally filled the backup drive and backups stopped.

But I didn't notice because the server kept on going....

And then one of the main server drives started to slow down. Then the machine wouldn't boot. Then the SMART tools told me the hard drive had died.

And the last completed backup was two weeks earlier......

flabdablet

Okay. So what you need now is Mr Sweary's Swearing and Data Recovery Lesson.

Get yourself a new 1TB drive and plug it in. Unplug all other drives except the busted one. Boot the Trinity Rescue Kit. Do fdisk -l to identify which of the two available drives has a partition table on it (this will be your busted one; I'll assume /dev/sda for the rest of this) and which one is blank (the new one, assumed /dev/sdb here). If your busted original is an Advanced Format (4K physical sectors) drive, do ddrescue --direct --block-size=4096 --cluster-size=64 --force /dev/sda /dev/sdb sda.log or if not use ddrescue --direct --cluster-size=512 --force /dev/sda /dev/sdb sda.log and go practise your swearing for a few hours while it copies.

Once it's done, do halt to shut everything down, then unplug your busted drive, plug in all the other originals, and boot TRK again. You'd need to be pretty unlucky for pvscan and fsck not to be able to put your world back together at least well enough to get you bootable. Unless, of course, you accidentally put the replacement drive's device name first in the ddrescue command line, in which case you've just overwritten what remained on your busted original with factory-fresh zeroes from the replacement, and you will need to kick the swearing up another few notches.
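A rough sketch of a pre-flight check for the source/destination mix-up described above. It assumes the busted drive still carries the 0x55AA boot signature at bytes 510-511 of its first sector (MBR disks do, and GPT disks have a protective MBR with the same signature) and that the new drive's first sector is all zeroes. The function names are made up for illustration, it only looks at one sector, and a dying drive may refuse to read at all, so treat it as a sanity check rather than a guarantee.

```python
MBR_SIGNATURE = b"\x55\xaa"  # bytes 510-511 of sector 0 on a partitioned disk


def read_first_sector(device, size=512):
    """Read the first `size` bytes of a device node or image file."""
    with open(device, "rb") as f:
        return f.read(size)


def has_partition_table(sector):
    """True if the sector ends with the MBR boot signature."""
    return len(sector) >= 512 and sector[510:512] == MBR_SIGNATURE


def looks_blank(sector):
    """True if the sector is non-empty and every byte is zero."""
    return len(sector) > 0 and sector.count(0) == len(sector)


def safe_to_copy(source, destination):
    """True only if source looks partitioned and destination looks blank."""
    return (has_partition_table(read_first_sector(source))
            and looks_blank(read_first_sector(destination)))
```

Run as root, safe_to_copy("/dev/sda", "/dev/sdb") should come back True before you let ddrescue loose with /dev/sda first. If it comes back False, stop and look at fdisk -l again.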

galgorah

Give 50 hail spaghetti monsters and sacrifice a goat. Then your transgressions shall be forgiven.

russ0519

@mikedjames said:

So I deleted the symbolic link and moved the /home/media file tree to /home/public/media. But I forgot that my incremental backup now had a complete copy of the old media file tree in yesterday's backup, and a whole new file tree in a new place in today's backup. At this point it totally filled the backup drive and backups stopped.

Sounds like you need to find a backup solution that does deduplication. Something like BackupPC can do file-level dedupe, but I'm sure there are better tools out there that can do block-level dedupe.
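To show why file-level dedupe would have helped here, a minimal sketch that groups files by a hash of their content, so a tree that has only been moved costs no extra space. This is an illustration of the idea only, not a description of how BackupPC or any other tool works internally; the function names and the choice of SHA-256 are assumptions.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_by_content(root):
    """Map each content digest to the list of regular files that have it."""
    groups = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks; only real file data is counted
            groups.setdefault(file_digest(path), []).append(path)
    return groups


def bytes_needed(root):
    """Return (raw_bytes, deduplicated_bytes) for everything under root."""
    raw = 0
    dedup = 0
    for paths in group_by_content(root).values():
        size = os.path.getsize(paths[0])
        raw += size * len(paths)   # every copy stored separately
        dedup += size              # identical content stored once
    return raw, dedup
```

Pointed at a backup drive holding both yesterday's /home/media copy and today's /home/public/media copy, the second number would be roughly half the first, because every media file shows up twice with the same digest.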

morbiuswilters

RAID5, bro.

flabby's recovery advice sounds reasonable. Another reasonable bit of advice: have a cron that checks df and alerts you (email, console, whatevs) when a disk surpasses 90%; depending on your disk activity you can probably run it nightly or hourly, but even every 5 mins would be fine (df is pretty low-impact). (PROTIP: install Nagios. But that's probably too much trouble.)
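A minimal sketch of that df check, using Python's shutil.disk_usage rather than parsing df output. The 90% threshold is from the advice above; the mount points in the example are hypothetical, and the percentage can differ slightly from what df prints because df leaves root-reserved blocks out of its calculation. If cron on your box is set up to mail job output, printing the warnings is enough to get an alert, but that depends on mail actually working.

```python
import shutil

THRESHOLD = 0.90  # alert when a filesystem is more than 90% used


def check(paths, threshold=THRESHOLD):
    """Return a list of warning strings for the given mount points."""
    warnings = []
    for path in paths:
        try:
            usage = shutil.disk_usage(path)
        except OSError as exc:
            # A missing or unmounted backup drive is worth hearing about too.
            warnings.append(f"{path}: cannot read usage ({exc})")
            continue
        if usage.total == 0:
            continue  # pseudo-filesystem with no capacity to report
        fraction = usage.used / usage.total
        if fraction > threshold:
            warnings.append(f"{path}: {fraction:.0%} full")
    return warnings


if __name__ == "__main__":
    # Hypothetical mount points; replace with the real ones.
    for line in check(["/", "/mnt/backup"]):
        print("WARNING:", line)
```

Run it from cron hourly or nightly. It prints nothing when everything is under the threshold, so no news is good news.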

What type of incremental backup are you using? Just rsync or something more sophisticated?

morbiuswilters

@russ0519 said:

@mikedjames said:
So I deleted the symbolic link and moved the /home/media file tree to /home/public/media. But I forgot that my incremental backup now had a complete copy of the old media file tree in yesterday's backup, and a whole new file tree in a new place in today's backup. At this point it totally filled the backup drive and backups stopped.

Sounds like you need to find a backup solution that does deduplication. Something like BackupPC can do file-level dedupe, but I'm sure there are better tools out there that can do block-level dedupe.

Block-level deduplication would require a filesystem that supported that. I can't think of any stable, free ones for Linux that do. (What would be best would be a filesystem where snapshots actually work (no LVM) and where you could export a particular snapshot to an external volume. ZFS supports this, but the only way to run it on Linux is FUSE, which is absolute shit.)

Sir_Twist

CrashPlan

Cassidy

@morbiuswilters said:

Another reasonable bit of advice: have a cron that checks df and alerts you (email, console, whatevs) when a disk surpasses 90%; depending on your disk activity you can probably run it nightly or hourly, but even every 5 mins would be fine (df is pretty low-impact).

Alternatives:

install & configure cacti. It's already got a template for checking filesystems - you can gauge consumption from historical stats. Apparently it has some event triggering inbuilt, but I've never used it
use quotas, because they can email you once you reach your soft limit (95% or so).
use logwatch. They can give you disk stats per-day.

Just ideas, yah?

morbiuswilters

@Cassidy said:

install & configure cacti. It's already got a template for checking filesystems - you can gauge consumption from historical stats. Apparently it has some event triggering inbuilt, but I've never used it

I thought cacti was just graphing and what-not, not alerts. And there are other alternatives to Nagios, but Nagios is probably the easiest to find documentation for.

@Cassidy said:

use quotas, because they can email you once you reach your soft limit (95% or so).

Of course, you've got to make sure email is always working. You could do that with Nagios, but then you'd want to be sure Nagios was running properly so you'd need another Nagios on a separate machine to monitor the first Nagios. Yeah, this shit gets pretty complicated real quick.

@Cassidy said:

use logwatch. They can give you disk stats per-day.

I never really used logwatch much, but I thought it was for monitoring and summarizing logs. I'm not sure how that helps with disk usage.

Cassidy

@morbiuswilters said:

@Cassidy said:
use logwatch. They can give you disk stats per-day.

I never really used logwatch much, but I thought it was for monitoring and summarizing logs. I'm not sure how that helps with disk usage.

mm.. s/disk stats/usage statistics/ - wrong terminology.

Mine does a "df -h" at the bottom. And content from some logfiles also points to the cause of a disk filling up.

TL;DR: it emails you the output of "df" but only at 24-hr periods.

Anyway, they were just other suggestions to append on your list for mikedjames' benefit.

blakeyrat

hh.. wys/wyahhh /stats hahsyw/ TL:DE "df-z" ht stats mm

nexekho1

Bees

blakeyrat

@nexekho said:

Bees

My God.

El_Heffe

@blakeyrat said:

@nexekho said:
Bees

My God.

Fact: Elephants are afraid of bees.

If you see bees near your house, but no elephants, that's why.

da_Doctah

@El_Heffe said:

Fact: Elephants are afraid of bees.
If you see bees near your house, but no elephants, that's why.

If I see bees and elephants, what does that mean?

nexekho1

LSD.