Here's a RTFM WTF courtesy of me.
In May last year I started work on a daily maintenance system for a large community driven website. To remain consistent with the rest of the code base, the maintenance system was written in php and invoked daily using cron and a local loopback address provided by our web host.
The maintenance system performed a variety of operations, most of them involving data 'cleaning' and statistics recounting. One particular operation required the system to physically delete data which had been marked as deleted from the MySQL database after a three month data retention period had expired.
The system was coded, tested briefly and was live a week later. It ran daily, and as far as we knew, it was performing it's operations without fault or failure.
Now follow me to seven months later to one New Year's morning, where I woke to the sound of my phone ringing and a pounding headache from the night before. The call was from one of the other website administrators:
Him: We've been hacked!
Me: Er.. What happened?
Him: Someone has hacked in and deleted a whole bunch of stuff!
Me: Wait.. What's missing?
Him: I went in to restore an article we deleted yesterday. They're all gone.
I sat there for a few minutes wondering aloud - previous security issues has caused a major system update, so I was fairly sure we hadn't been hacked. Plus, what would a hacker gain from deleting some items which were effectively deleted anyway? It soon dawned on me that the daily maintenance system was almost certainly at fault, just mishandling dates. I could imagine the flawed logic immediately: December 31st 2007 is more than a year older than January 1st 2008, clearly outside of the three month retention period.
I dug up the old maintenance code and found the MySQL query involved:
DELETE FROM portal_items WHERE deleted = 1 AND deletedate < (NOW() - 00000300000000)
I was confused for a moment before I remembered the method I had once used before learning of the INTERVAL keyword. The idea was to subtract three months from the current date by using the MySQL recognised YYYYMMDDHHMMSS syntax.
It looks like MySQL had been converting the values and comparing integers, apparently unimpressed with my choice of syntax. The mistake seemed obvious to me now, but because we developed and deployed the daily maintenance system after the third month, it appeared to work as normal.
I made a quick fix to the code, before heading back to bed.
DELETE FROM portal_items WHERE deleted = 1 AND deletedate < (NOW() - INTERVAL 3 MONTH)