Big Fat File



  • When I partitioned my 4 200+GB HDDs, I only made one partition big enough to hold this 117GB tar.bz2 file I made, containing a lot of data that I now want to retrieve. All the other partitions hover around 50GB and are mostly full; I could probably spend 3 days sorting through it all and deleting redundant data, but I'd rather not. I did manage to get three 50GB partitions cleared and mounted them in a way that would make the tar extract fairly evenly across them. Fast forward 16 hours: I used the computer and managed to crash it while it was extracting, so I have most of the data but not all of it.

    I was wondering if anyone knows a way to get the rest of the data out without having to go through the whole file; since it's compressed I can't just jump somewhere into it and extract from there, so it takes a long time. I was thinking of just deleting some of the bigger files at the front, but I'm curious as to how much data corruption I can expect. The machine with the most RAM and swap has 1GB and 2GB respectively. If worst comes to worst I'll just set up my staggered mounts and extract it in full again (hints on a black-hole mount point would be nice, something like /dev/null but that can be written to as a directory).



    Note: The data happens to be on an external drive, and I have no USB 2.0 ports.



  • bzip2 operates in blocks of about a megabyte, so it's possible to extract a hunk of the file (using dd), run bzip2recover over it, unpack all the blocks, cat them back together and untar the result. You will need to know roughly where your target data lies in the image, and it will be about four times slower than unpacking the original directly.

    The chances of this being useful are quite slim. I'd just unpack the whole thing again. 
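
    For the curious, a rough sketch of that procedure, assuming the archive is called archive.tar.bz2 and that the data you want starts somewhere around the 60GB mark (the name, offsets and sizes here are made up; adjust to taste):

        # carve a slice out of the compressed file near the region of interest
        dd if=archive.tar.bz2 of=slice.bz2 bs=1M skip=60000 count=2000

        # split the slice into whole bzip2 blocks; writes rec00001slice.bz2, rec00002slice.bz2, ...
        bzip2recover slice.bz2

        # decompress the recovered blocks in order and feed the stream to tar;
        # expect complaints until tar resynchronises on the next file header,
        # since the stream starts in the middle of some file
        bzcat rec*slice.bz2 | tar -xvf -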



  • Assuming your tar supports it:

     

         -k, --keep-old-files
              Keep files which already exist on disk; don't overwrite
              them from the archive.

         -K file, --starting-file file
              Begin at file in the archive.
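
    With GNU tar that might look something like this (the archive name here is just an example):

        # skip anything that already made it onto disk before the crash
        tar -xjkf backup.tar.bz2

        # or start at a known member; everything before it in the archive is skipped
        tar -xjf backup.tar.bz2 -K path/inside/archive/first-missing-file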

     



  • @no name said:

    Assuming your tar supports it:

     

         -k, --keep-old-files
              Keep files which already exist on disk; don't overwrite
              them from the archive.

         -K file, --starting-file file
              Begin at file in the archive.

     

    Naturally, it still has to decompress the whole thing. 



  • @asuffield said:

    @no name said:

    Assuming your tar supports it:

     

         -k, --keep-old-files
              Keep files which already exist on disk; don't overwrite
              them from the archive.

         -K file, --starting-file file
              Begin at file in the archive.

     

    Naturally, it still has to decompress the whole thing. 

     

    True, but at least it won't have to write out the files that have already been untarred.



  • @asuffield said:

    Naturally, it still has to decompress the whole thing. 

    Can't bunzip2 pipe to tar, so it never needs the disk space for the uncompressed .tar file?



  • Get multiple gmail accounts, use that gmailFS thingy to mount them. Extract the archive to that mount point.

    profit? 



  • Actually, I managed to free up enough space on a 200GB drive to make one partition that holds it all. As for the -k option, there was a bug: when used with the -v option it started outputting garbage and couldn't extract certain files.



  • @ammoQ said:

    @asuffield said:

    Naturally, it still has to decompress the whole thing. 

    Can't bunzip2 pipe to tar, so it never needs the disk space for the uncompressed .tar file?

    Yes, and it normally does, but the disk was never the bottleneck.
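
    For completeness, the pipeline looks something like this (archive name is just an example); with GNU tar the -j flag does the same thing in one step:

        # decompress on the fly; only the extracted files ever touch the disk,
        # never an intermediate .tar
        bunzip2 -c backup.tar.bz2 | tar -xvf -

        # equivalent, letting tar drive the decompressor itself
        tar -xjvf backup.tar.bz2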



  • @Lingerance said:

    Actually, I managed to free up enough space on a 200GB drive to make one partition that holds it all. As for the -k option, there was a bug: when used with the -v option it started outputting garbage and couldn't extract certain files.

    Ah, unix. 

