MyWTF



  • The WTF Database

    As many of you know, I inherited a site for my running club. It's an event-based site, so fairly straightforward. It's written in Perl, with a "CSV" back-end. The "database" is an events.csv file that has vertical-pipe delimited file with the data in it. The site works fairly well, and doesn't have a lot of bugs/security risks, but it's probably not the best way that it could be done.

    Then, last week, one of my friends points out that even though the history of events only goes back through 2006, the separate domain that hosts the pictures for events goes back to 2001. Intrigued, I decided to dig into this, as I had credentials for both sites. I find that the events are stored on the photo site in a MySQL database that seems to scrape other sites.

    Here's the psuedo-schema for the table that holds the events:

    EventID	INT
    EventName	VARCHAR(50)
    EventDate	DATETIME
    EventDay	VARCHAR(50)
    EventTime	VARCHAR(50)
    Organizers	LONGTEXT
    hotlink	VARCHAR(150)
    TB	VARCHAR(50)
    Description	LONGTEXT
    UniqueID	VARCHAR(50)
    Current	TINYINT(4)
    Source	VARCHAR(50)
    Title	VARCHAR(50)
    private	TINYINT(4)
    Password_Enabled	TINYINT(1)
    Password_Clue	VARCHAR(60)
    Password	VARCHAR(50)
    

    EventID is set up correctly as a non-nullable auto-incrementing identity column.

    EventName...exactly what you'd expect.

    EventDate, this is the portal to fun. It holds the date. Notice how it's a DATETIME. It only holds the date.

    EventDay is the day of the week. Talk about redundant information. You can figure this out on the fly based on the EventDate column, but hey, let's store this in the database anyways.

    EventTime is a text field. The source on the original site has a "timepicker" (three drop downs with hour, minute, and meridian), but since this database has data from multiple sources, it accepts anything. I'll explain why this is interesting in a bit later post.

    Organizers is the field for the people that set up the run. hotlink is the URL, if any.
    TB is the Thomas Brothers location for the event. Yes, that's a thing, look it up. It's actually fairly controversial since it was only used by one of the local running groups, and everyone else wanted it removed. I changed the site so that it was still there, but instead of always showing the label, it would only appear if entered by the run organizers. A lot of people thanked me for "removing" it, so I assume that I did my job well.

    Description is everything that doesn't belong elsewhere. The problem is that most of the data here is the location, organizers, etc...from years past. It didn't used to be as organized as it is now, so most things where just dumped here.

    UniqueID is whatever "identity" column was used on the source site.

    Current, I'm guess, is supposed to be a bit field, and is always NULL.

    Source is a the name of the place where the data came from, and is usually populated. Usually.

    Title is........just always NULL.

    private is another field that should be a bit, but is always just NULL.

    Password_Enabled, Password_Clue, and Password, I'd love to see the code that uses these fields, but sadly, none exists. They are always....you guessed it, NULL.



  • Came here to see why EventTime was interesting, was disappointed...



  • @Ashley_Sheridan said:

    Came here to see why EventTime was interesting, was disappointed...

    +1



  • I never said that I would in the OP. 😜


    Anyways, EventTime gets the time field from multiple sources. So it's well formatted coming from the main running site, but from a bunch of places, you'll get things like "2pm", which doesn't work well for Python's strptime() method when it expects the minutes as well. Not only that, but we get "2", "10:69", "3ish", and "whenever you feel like it."

    So I try the best that I can to clean up the data, but there are some other interesting values, such as "\n2:00pm\n." I initially tried to remove the newline characters before realizing that it was a literal backslash and n couple of characters.

    Then came some literal values, what I assume were mostly of the format 10 o'clock. They're now "10 o\\'clock."

    So yeah, not a huge WTF, but a PITA to deal with.



  • @chubertdev said:

    EventTime is a text field. The source on the original site has a "timepicker" (three drop downs with hour, minute, and meridian), but since this database has data from multiple sources, it accepts anything. I'll explain why this is interesting in a bit.

    But... but you didn't!

    *edge of seat*


  • Discourse touched me in a no-no place

    @chubertdev said:

    They're now "10 o\'clock."

    I saw some site once that oversanitized like that. Every place you'd see a single quote it would have 4-8 backslashes in front of it.

    Also Discourse ate one of them in the quote and I didn't feel up to putting it back, but here's an extra: \.



  • @chubertdev said:

    Notice how it's a DATETIME. It only holds the date.

    I once supported a DB that recorded the datetime using 2 datetime columns. The one that held the date had the time set to 00:00:00.000 and the one that held the time had the date set to 01/01/1753.


    Filed Under: [Confused? You will be...][1]


  • Have a sympathy like.


  • ♿ (Parody)

    @FrostCat said:

    I saw some site once that oversanitized like that. Every place you'd see a single quote it would have 4-8 backslashes in front of it.

    I swear there was a front page article about that. Can't find it just now...



  • @RTapeLoadingError said:

    I once supported a DB that recorded the datetime using 2 datetime columns. The one that held the date had the time set to 00:00:00.000 and the one that held the time had the date set to 01/01/1753.

    <HR>
    Filed Under: Confused? You will be...

    What a Gregorian egregious mistake.


Log in to reply