MyWTF
-
The WTF Database
As many of you know, I inherited a site for my running club. It's an event-based site, so fairly straightforward. It's written in Perl, with a "CSV" back-end. The "database" is an events.csv file that has vertical-pipe delimited file with the data in it. The site works fairly well, and doesn't have a lot of bugs/security risks, but it's probably not the best way that it could be done.
Then, last week, one of my friends points out that even though the history of events only goes back through 2006, the separate domain that hosts the pictures for events goes back to 2001. Intrigued, I decided to dig into this, as I had credentials for both sites. I find that the events are stored on the photo site in a MySQL database that seems to scrape other sites.
Here's the psuedo-schema for the table that holds the events:
EventID INT EventName VARCHAR(50) EventDate DATETIME EventDay VARCHAR(50) EventTime VARCHAR(50) Organizers LONGTEXT hotlink VARCHAR(150) TB VARCHAR(50) Description LONGTEXT UniqueID VARCHAR(50) Current TINYINT(4) Source VARCHAR(50) Title VARCHAR(50) private TINYINT(4) Password_Enabled TINYINT(1) Password_Clue VARCHAR(60) Password VARCHAR(50)
EventID
is set up correctly as a non-nullable auto-incrementing identity column.EventName
...exactly what you'd expect.EventDate
, this is the portal to fun. It holds the date. Notice how it's aDATETIME
. It only holds the date.EventDay
is the day of the week. Talk about redundant information. You can figure this out on the fly based on theEventDate
column, but hey, let's store this in the database anyways.EventTime
is a text field. The source on the original site has a "timepicker" (three drop downs with hour, minute, and meridian), but since this database has data from multiple sources, it accepts anything. I'll explain why this is interesting in abitlater post.Organizers
is the field for the people that set up the run.hotlink
is the URL, if any.
TB
is the Thomas Brothers location for the event. Yes, that's a thing, look it up. It's actually fairly controversial since it was only used by one of the local running groups, and everyone else wanted it removed. I changed the site so that it was still there, but instead of always showing the label, it would only appear if entered by the run organizers. A lot of people thanked me for "removing" it, so I assume that I did my job well.Description
is everything that doesn't belong elsewhere. The problem is that most of the data here is the location, organizers, etc...from years past. It didn't used to be as organized as it is now, so most things where just dumped here.UniqueID
is whatever "identity" column was used on the source site.Current
, I'm guess, is supposed to be abit
field, and is alwaysNULL
.Source
is a the name of the place where the data came from, and is usually populated. Usually.Title
is........just alwaysNULL
.private
is another field that should be abit
, but is always justNULL
.Password_Enabled
,Password_Clue
, andPassword
, I'd love to see the code that uses these fields, but sadly, none exists. They are always....you guessed it,NULL
.
-
Came here to see why EventTime was interesting, was disappointed...
-
-
I never said that I would in the OP.
Anyways,
EventTime
gets the time field from multiple sources. So it's well formatted coming from the main running site, but from a bunch of places, you'll get things like "2pm", which doesn't work well for Python'sstrptime()
method when it expects the minutes as well. Not only that, but we get "2", "10:69", "3ish", and "whenever you feel like it."So I try the best that I can to clean up the data, but there are some other interesting values, such as "\n2:00pm\n." I initially tried to remove the newline characters before realizing that it was a literal backslash and n couple of characters.
Then came some literal values, what I assume were mostly of the format 10 o'clock. They're now "10 o\\'clock."
So yeah, not a huge WTF, but a PITA to deal with.
-
EventTime is a text field. The source on the original site has a "timepicker" (three drop downs with hour, minute, and meridian), but since this database has data from multiple sources, it accepts anything. I'll explain why this is interesting in a bit.
But... but you didn't!
*edge of seat*
-
They're now "10 o\'clock."
I saw some site once that oversanitized like that. Every place you'd see a single quote it would have 4-8 backslashes in front of it.
Also Discourse ate one of them in the quote and I didn't feel up to putting it back, but here's an extra: \.
-
Notice how it's a DATETIME. It only holds the date.
I once supported a DB that recorded the datetime using 2 datetime columns. The one that held the date had the time set to 00:00:00.000 and the one that held the time had the date set to 01/01/1753.
Filed Under: [Confused? You will be...][1]
-
Have a sympathy like.
-
I saw some site once that oversanitized like that. Every place you'd see a single quote it would have 4-8 backslashes in front of it.
I swear there was a front page article about that. Can't find it just now...
-
I once supported a DB that recorded the datetime using 2 datetime columns. The one that held the date had the time set to 00:00:00.000 and the one that held the time had the date set to 01/01/1753.
<HR>
Filed Under: Confused? You will be...What a
Gregorianegregious mistake.