Extreme Logging?

zelmak

So, spelunking around the system, I found a large log file for a small, but important program we run.

% head PROGRAM.log
Tue May 3 2005 09:43:52 >>> Server listening for connection...
Tue May 3 2005 09:43:52 >>> Closing socket and disconnecting any clients.
Tue May 3 2005 09:43:52 >>> Closing socket and disconnecting any clients.
Tue May 3 2005 09:45:00 >>> Server listening for connection...
.
.
.
% ls -l PROGRAM.log

-rwxrwxrwx   1 root     group 35125862919 Jan  9 13:19 PROGRAM.log

A 33GiB log file that was created over 6-1/2 years ago ...

useful ...

% time compress PROGRAM.log
1349.0u 65.0s 24:55 94% 0+0k 0+0io 0pf+0w

% ls -l PROGRAM.log.Z
-rwxrwxrwx 1 root group 7080664469 Jan 9 13:19 PROGRAM.log.Z

33GiB to 6.6GiB? Man, text sure compresses well.

C_Octothorpe

Try rm... Now THAT'S compression.

Cassidy

.. or store it in /dev/null. I find you tend to reclaim a lot of disk space that way.

@Cassidy said:

.. or store it in /dev/null. I find you tend to reclaim a lot of disk space that way.

This should also make the file appear less "old".

Sutherlands

If it's just those lines repeated over and over again, I find it odd that it only compresses down to 20% of its original size.

Also, who in the world uses GiB...

Quietust

@Sutherlands said:

If it's just those lines repeated over and over again, I find it odd that it only compresses down to 20% of its original size.

It's probably because he used the ancient "compress" tool - had he used GZIP, it probably would've compressed a whole lot more (and taken a lot longer to do so).

boog

@Cassidy said:

.. or store it in /dev/null. I find you tend to reclaim a lot of disk space that way.

That's where I store emails from problematic users. I figure I can always fetch their emails from there later if they say anything important.

C_Octothorpe

@boog said:

@Cassidy said:
.. or store it in /dev/null. I find you tend to reclaim a lot of disk space that way.

That's where I store emails from problematic users. I figure I can always fetch their emails from there later if they say anything important.

What do you do when they ask for the originals back?

Cassidy

Quickly restore a backup of /dev/null and hope they never notice it's actually a duplicate and not the original email.

boog

@C_Octothorpe said:

@boog said:
@Cassidy said:
.. or store it in /dev/null.

That's where I store emails from problematic users.
What do you do when they ask for the originals back?

You know, it's never come up.

I wonder if it's because the emails asking for the originals get stored in /dev/null too.

I wonder...

zelmak

@Sutherlands said:

If it's just those lines repeated over and over again, I find it odd that it only compresses down to 20% of its original size.

Also, who in the world uses GiB...

There are a few more status lines with unique filenames and such that wouldn't pass my anonymization filter (my brain) so I opted to omit them. Also, the unique filenames and such would be cause for lesser compression ratios.

I really enjoy the hard-to-merge-and-order timestamp. Why is YYYY-MM-DD HH:MM:SS so frickin' hard for everyone?
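For anyone stuck merging logs like this one, here is a minimal Python sketch of rewriting the leading timestamp into the sortable form. It assumes every line looks like the sample in the first post (timestamp, then " >>> ", then the message) and that day and month names are in English. `to_sortable` and `LOG_FORMAT` are made-up names for illustration, not anything from the program itself.

```python
from datetime import datetime

# Format of the log's timestamps, e.g. "Tue May 3 2005 09:43:52".
# %a and %b depend on the locale; an English/C locale is assumed here.
LOG_FORMAT = "%a %b %d %Y %H:%M:%S"


def to_sortable(line: str) -> str:
    """Rewrite the leading timestamp of a log line as YYYY-MM-DD HH:MM:SS."""
    stamp, sep, message = line.partition(" >>> ")
    parsed = datetime.strptime(stamp, LOG_FORMAT)
    return parsed.strftime("%Y-%m-%d %H:%M:%S") + sep + message


print(to_sortable("Tue May 3 2005 09:43:52 >>> Server listening for connection..."))
```

Once the timestamps are in that shape, a plain text sort puts lines from several files in chronological order.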

As far as GiB, there are those pedantic dickweeds who would point out that a file that is 35,000,000,000 bytes long is 35GB, not 33GB (since GiB is divisible by 1024 and GB is divisible by 1000 thanks to the SI standards.) So I guess I'm damned if I do, and damned if I don't...

Sutherlands

@zelmak said:

As far as GiB, there are those pedantic dickweeds who would point out that a file that is 35,000,000,000 bytes long is 35GiB, not 33GiB (since GB is divisible by 1024 and GiB is divisible by 1000 thanks to the SI standards.) So I guess I'm damned if I do, and damned if I don't...

FTFY

superjer

@Sutherlands said:

@zelmak said:
As far as GiB, there are those pedantic dickweeds who would point out that a file that is 35,000,000,000 bytes long is 35GiB, not 33GiB (since GB is divisible by 1024 and GiB is divisible by 1000 thanks to the SI standards.) So I guess I'm damned if I do, and damned if I don't...

FTFY

Unfortunately, your FTFY is wrong. You underestimate the insanity of SI. Apparently making the prefixes officially consistent across unrelated units was more important than EVER KNOWING WHAT ANYONE MEANS EVER AGAIN.

heterodox

@boog said:

You know, it's never come up.

It's never come up in my workplace either, but that may be because I put the users in /dev/null too.

TheCPUWizard

@heterodox said:

@boog said:
You know, it's never come up.

It's never come up in my workplace either, but that may be because I put the users in /dev/null too.

There are some days I wish I could do THAT!

@Sutherlands said:

Also, who in the world uses GiB...

Not Nautilus, the file manager. It confused the hell out of me the first time I noticed the "MB"s Nautilus was reporting were different from the "MB"s I had just uploaded somewhere.

Bulb

@zelmak said:

As far as GiB, there are those pedantic dickweeds who would point out that a file that is 35,000,000,000 bytes long is 35GB, not 33GB (since GiB is divisible by 1024 and GB is divisible by 1000 thanks to the SI standards.) So I guess I'm damned if I do, and damned if I don't...

@superjer said:

@Sutherlands said:
@zelmak said:
As far as GiB, there are those pedantic dickweeds who would point out that a file that is 35,000,000,000 bytes long is 35~~GiB~~, not 33~~GiB~~ (since GB is divisible by 1024 and ~~GiB~~ is divisible by 1000 thanks to the SI standards.) So I guess I'm damned if I do, and damned if I don't...

FTFY
Unfortunately, your FTFY is wrong. You underestimate the insanity of SI. Apparently making the prefixes officially consistent across unrelated units was more important than EVER KNOWING WHAT ANYONE MEANS EVER AGAIN.

The FTFY is wrong because it's plain wrong and the original version was right. 1 GB is 10⁹ B and 1 GiB is 2³⁰ B is 1 073 741 824 B.

So now the GiB is not ambiguous (always 1024-based) and GB is ambiguous (some use it as 1024-based, some as 1000-based), but GB was already ambiguous before the new IEC standard (it does not actually seem to be an SI one), because disk and other device manufacturers started to use it with the decimal meaning long before.

The binary use of GB was introduced for memory chip sizes, because those almost always come in power-of-two sizes; it was never standardized beyond that. Nowadays the decimal meaning is used even for SD cards and flash disks. The chips are actually power-of-two-sized there too, but some of the capacity is needed for wear levelling and avoiding bad blocks, so the decimal prefixes are used to avoid calling them 1.8 GB and 7.4 GB and such.

In fact hard disks started to be labelled like that for the very same reason; modern disks also have some spare blocks and internally relocate any bad blocks that develop. That, and not higher quality, is the reason you don't need to run a low-level disk scan every now and then these days.
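To put numbers on the two readings, here is a quick Python check using the sizes from the `ls -l` output in the first post. The flash sizes are assumed to be exactly 2×10⁹ and 8×10⁹ usable bytes, purely for illustration; real parts vary.

```python
GB = 10**9    # decimal gigabyte
GIB = 2**30   # binary gibibyte, 1 073 741 824 bytes

log_bytes = 35125862919         # PROGRAM.log, from ls -l
compressed_bytes = 7080664469   # PROGRAM.log.Z, from ls -l

print(f"log:        {log_bytes / GB:.2f} GB = {log_bytes / GIB:.2f} GiB")
print(f"compressed: {compressed_bytes / GB:.2f} GB = {compressed_bytes / GIB:.2f} GiB")
print(f"ratio:      {compressed_bytes / log_bytes:.1%}")

# Assumed usable capacities of nominal "2 GB" and "8 GB" flash parts,
# expressed in 1024-based units.
for nominal in (2, 8):
    print(f"{nominal} GB nominal = {nominal * GB / GIB:.2f} GiB")
```

So the same file is about 35.13 GB or 32.71 GiB, and the compressed copy is roughly 20% of the original either way.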

dhromed

I will have replaced the HD long before the difference between GiB and GB becomes relevant.

The_Assimilator

TRWTF is that you bothered to compress the entire logfile. Why not strip out everything from before, say, 3 years ago and then compress it? I can't imagine that keeping over half a decade's worth of connect... disconnect messages is in any way necessary.
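A rough Python sketch of that idea, assuming each line starts with a timestamp in the format shown in the first post followed by " >>> ". The names `keep_recent` and `trim_and_compress`, the gzip output, the UTF-8 encoding, and the choice to drop lines whose timestamp cannot be parsed are all assumptions made for the example.

```python
import gzip
from datetime import datetime

# Timestamp layout from the sample lines, e.g. "Tue May 3 2005 09:43:52".
LOG_FORMAT = "%a %b %d %Y %H:%M:%S"


def keep_recent(lines, cutoff):
    """Yield only the lines whose leading timestamp is at or after cutoff."""
    for line in lines:
        stamp, sep, _ = line.partition(" >>> ")
        if not sep:
            continue  # no " >>> " separator: dropped in this sketch
        try:
            when = datetime.strptime(stamp, LOG_FORMAT)
        except ValueError:
            continue  # unparseable timestamp: dropped in this sketch
        if when >= cutoff:
            yield line


def trim_and_compress(src, dst, cutoff):
    """Stream src line by line and write the recent lines to a gzip file."""
    with open(src, "r", encoding="utf-8") as infile, \
            gzip.open(dst, "wt", encoding="utf-8") as outfile:
        outfile.writelines(keep_recent(infile, cutoff))


# Small in-memory demo; the second line is a made-up later entry.
sample = [
    "Tue May 3 2005 09:43:52 >>> Server listening for connection...\n",
    "Mon Jan 9 2012 13:19:00 >>> Server listening for connection...\n",
]
print(list(keep_recent(sample, datetime(2009, 1, 9))))
```

Because the file is streamed one line at a time, the whole 33GiB never has to fit in memory; something like `trim_and_compress("PROGRAM.log", "PROGRAM.log.gz", datetime(2009, 1, 9))` would be the call for a three-year cutoff.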

Cassidy

That. Logrotate is your friend.

Sutherlands

@Bulb said:

@zelmak said:
As far as GiB, there are those pedantic dickweeds who would point out that a file that is 35,000,000,000 bytes long is 35GB, not 33GB (since GiB is divisible by 1024 and GB is divisible by 1000 thanks to the SI standards.) So I guess I'm damned if I do, and damned if I don't...
@superjer said:
@Sutherlands said:
@zelmak said:
As far as GiB, there are those pedantic dickweeds who would point out that a file that is 35,000,000,000 bytes long is 35~~GiB~~, not 33~~GiB~~ (since GB is divisible by 1024 and ~~GiB~~ is divisible by 1000 thanks to the SI standards.) So I guess I'm damned if I do, and damned if I don't...

FTFY
Unfortunately, your FTFY is wrong. You underestimate the insanity of SI. Apparently making the prefixes officially consistent across unrelated units was more important than EVER KNOWING WHAT ANYONE MEANS EVER AGAIN.

The FTFY is wrong because it's plain wrong and the original version was right. 1 GB is 10⁹ B and 1 GiB is 2³⁰ B is 1 073 741 824 B.

Crap, even after I stared at Wikipedia...