The Format



  • This is from the administrative interface for a book review application (that's about 10 years old and was created by an inexperienced college-level programmer at that time).

    format for names (see compile.pl)
    # AUTHOR/REVIEWER FORMAT:
    # 'V' marks delimeter
    #           V           V         V     V
    # first_name!middle_name$last_name*title+salutation
    # title : Jr., III, etc...
    # salutation : Dr. Mr. Ms. etc...
    # multiple authors separated by @
    # % at the end signifies these are editors, not authors
    # only first_name, last_name required, others are optional
    # unless more than one author, then @ is required between them
    #############################################################################

    (Yes, in a plain text <pre> block at the top of the page.)
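
    For anyone who wants to poke at this format, here is a rough Python sketch of a parser. It is not what compile.pl does (that script is Perl, and I'm not reproducing it here). The comment block doesn't say whether an omitted optional field keeps its delimiter, so this sketch assumes each delimiter introduces the field after it and disappears along with that field. The names `NAME_RE` and `parse_names` are made up for illustration.

```python
import re

# Assumed grammar: each delimiter introduces the field that follows it, and an
# omitted optional field drops its delimiter too. compile.pl may differ.
NAME_RE = re.compile(
    r"^(?P<first_name>[^!$*+]+)"
    r"(?:!(?P<middle_name>[^!$*+]*))?"
    r"\$(?P<last_name>[^!$*+]+)"
    r"(?:\*(?P<title>[^!$*+]*))?"
    r"(?:\+(?P<salutation>[^!$*+]*))?$"
)


def parse_names(raw):
    """Return (people, are_editors) for one author/reviewer string."""
    raw = raw.strip()
    # A trailing % marks the whole list as editors rather than authors.
    are_editors = raw.endswith("%")
    if are_editors:
        raw = raw[:-1]
    people = []
    for chunk in raw.split("@"):
        match = NAME_RE.match(chunk)
        if match is None:
            raise ValueError(f"malformed name: {chunk!r}")
        # Drop optional fields that were omitted or left empty.
        people.append({k: v for k, v in match.groupdict().items() if v})
    return people, are_editors
```

    Under those assumptions, `parse_names("John!Q$Public*Jr.@Jane$Doe%")` yields two people and flags them as editors.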

    The sad thing is that this isn't the only place this lovely format is used. Fortunately, these days, most of this sort of icky code lives in files that no one looks at that much, and the data is put into a database with some handy Perl script. (Remind me to send the guy a book on data normalization, too. Also, I've managed to condense a lovely 5-field name to just 2. We didn't have a single name with a 'salutation', so that helped...)

    The real WTF? Perl's DBI and/or DBD::mysql (depending on your preferred side of the blame game) don't really support UTF-8 connections to a MySQL database. The PostgreSQL driver apparently manages it, though. Whatever, three cheers for ugly workarounds.



  • I've seen worse. I suppose it gets around your delimiter being in your strings, but then again, now you have 6 characters you can't use (!$*+%@)

    What he called 'Title' I call 'Suffix' and while it's nice to store Salutation, if he never used it, what's the point?

    Merging middle into first and suffix into last can seem like a good idea, but if you're doing any sort of exports to other systems/vendors you might end up in a bit of a pickle. YMMV.

    He could have stuck to plain CSV. The troubles with double-quotes and commas probably wouldn't have bothered you in this format.
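
    To illustrate the CSV point, here is a small Python sketch (the application itself is Perl, so this is only a demonstration, and the sample name is invented). A standard CSV writer quotes fields that contain commas or double quotes, so none of the name parts need reserved characters.

```python
import csv
import io

# Hypothetical sample data; the column names follow the five fields discussed above.
rows = [
    ["first_name", "middle_name", "last_name", "suffix", "salutation"],
    ['Robert "Bob"', "", "O'Neil, Jr.", "", "Dr."],
]

# Write the rows to an in-memory buffer; the writer adds quoting where needed.
buf = io.StringIO()
csv.writer(buf).writerows(rows)

# Read them back to confirm that commas and quotes survive the round trip.
parsed = list(csv.reader(io.StringIO(buf.getvalue())))
```

    Multiple authors would simply be multiple rows, which also removes the need for the @ separator.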



  • @Benanov said:

    I've seen worse. I suppose it gets around your delimiter being in your strings, but then again, now you have 6 characters you can't use (!$*+%@)

    We don't have much of a problem with those characters being in people's names, fortunately.

    @Benanov said:

    Merging middle into first and suffix into last can seem like a good idea, but if you're doing any sort of exports to other systems/vendors you might end up in a bit of a pickle. YMMV.

    Naah. The worst that's going to happen to this data is a 'sort by author' kind of deal. (Vendor? What's a vendor? :D) EDIT: Oh, wait, a vendor is like those people who sold us a calendar application, and when we went back years later to get a license so it would keep running, they had been bought out and told us that they didn't sell that anymore, but we could buy something else from them that's similar for several grand... no, it won't import your old data, though...

    @Benanov said:

    He could have stuck to plain CSV. The troubles with double-quotes and commas probably wouldn't have bothered you in this format.

    He could have also put in multiple text fields for each part of the name on the web form where these things are uploaded. Well, but then what would he do for multiple authors? We didn't really have shiny JavaScript to give us a new row of input boxes, at least, not that we can rely on. We could, a) do something involving multiple page loads or b) put in, say, 4 boxes for each and hope that's enough (actually, a SQL query says we have up to 2 reviewers and 6 author/editors...)

     



  • I'm not sure I've ever seen the reporter of a WTF defend it before.



  • @merreborn said:

    I'm not sure I've ever seen the reporter of a WTF defend it before.

    That's why it's only a sidebar WTF submission. :)
     



  • The author was clearly familiar with IRC.

    (There was a time when I maintained an IRC server. This sort of creative delimiting is all over the place. The idea is that you can omit any element, and you can use glob matching against it. These days I think it's a pretty dumb approach to the problem - way up there with XML) 
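
    A rough Python sketch of the glob-matching idea, for the curious. It leans on the standard `fnmatch` module rather than any real ircd code, and `mask_matches` is a made-up name. One caveat: `fnmatch` also treats `[...]` as a character class, while IRC masks only use `*` and `?`, so nicks containing brackets would need extra care.

```python
from fnmatch import fnmatchcase


def mask_matches(mask, nick, user, host):
    """Check an IRC-style nick!user@host mask using shell-style globbing."""
    # IRC masks are case-insensitive, so fold both sides before comparing.
    target = f"{nick}!{user}@{host}".lower()
    return fnmatchcase(target, mask.lower())
```

    With this, a mask such as `*!*bork@*.aol.com` ignores the nickname entirely, which is the "omit any element" part.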



  • @fennec said:

    @Benanov said:

    I've seen worse. I suppose it gets around your delimiter being in your strings, but then again, now you have 6 characters you can't use (!$*+%@)

    We don't have much of a problem with those characters being in people's names, fortunately.


    Wait - isn't (!$*+%@) ASCII for the Artist Now Known as Prince?



  • Translate to XML:

      MODE #channel +b *|away!*bork@*.aol.com

    what would that be? Maybe something like

      <mode><ban><match key="nickname"><any />|away</match><match key="username"><any />bork</match><match key="hostname"><hostname><any /><item name="aol" /><item name="com" /></hostname></match></ban></mode>

    and how should users enter it? Maybe

      /BAN nick=*|away username=*bork host=*.aol.com

    Sorry, but I prefer the less verbose

      /BAN *|away!*bork@*.aol.com

    I can't say XML would be a good protocol change for IRC - it would noticeably increase the bandwidth requirements of IRC servers. Actually, IRC maybe should rather be some binary protocol, with a byte for the command and a 4-byte ID for the channel. Yet still XML would be a better choice than strings created by java.io.Serializable...

    The WTF in this thread, however, is unaffected by that...



  • @assuffield said:

    The author was clearly familiar with IRC.

    (There was a time when I maintained an IRC server. This sort of creative delimiting is all over the place. The idea is that you can omit any element, and you can use glob matching against it. These days I think it's a pretty dumb approach to the problem - way up there with XML)

    Actually, one of the old authors of a lot of code on the system used the (old) server in the (old) university hosting environment to run an ircd for some no-name IRC network. When I had learned enough about the system to discover it, I was more than a little surprised. Ultimately we got a new server hosted at a different university, and just never moved things...

    @OpBaI said:

    I can't say XML would be a good protocol change for IRC - it would noticeably increase the bandwidth requirements of IRC servers. Actually, IRC maybe should rather be some binary protocol, with a byte for the command and a 4-byte ID for the channel. Yet still XML would be a better choice than strings created by java.io.Serializable...

    Meh. If you're going to use XML to send messages, we have a little protocol for that, called XMPP... most people see it go by the name "Jabber". But please, ick, you can keep the binary formats to yourself. Really, it does do a whole lot of good to be able to do things in plain-text, and you can come up with all sorts of neat little bots and such. There's dozens of IRC bot frameworks out there. And tons of legacy clients. No one is about to abandon those for some no-name binary format just so you can shave 14 bytes off a message like PRIVMSG #whatever :hi there, guys, what's happening? -- it's just not worth it.

    Now, if you could tack some sort of light gzip wrapping onto this plain-text (or XML) format, then you might be in business.

    And the latest in WTF: I'm still working on what is to become the updated version of the app... I'm stuck looking for a good way to input an arbitrary number of authors... :P
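
    As for the gzip wrapping idea, something like the following Python sketch is roughly what I have in mind. It uses the standard `zlib` module with one long-lived compression stream per connection, flushed after every line so the other end can decode each message as it arrives. The function names are invented for illustration, and this is not how any existing IRC server is known to do it.

```python
import zlib


def make_stream_pair():
    """Create one compressor/decompressor pair for a single connection."""
    return zlib.compressobj(), zlib.decompressobj()


def send(compressor, line):
    """Compress one protocol line and flush so it can be decoded right away."""
    data = compressor.compress(line.encode("utf-8"))
    # Z_SYNC_FLUSH emits everything buffered so far without ending the stream.
    return data + compressor.flush(zlib.Z_SYNC_FLUSH)


def receive(decompressor, data):
    """Decode whatever the peer has flushed so far."""
    return decompressor.decompress(data).decode("utf-8")
```

    Keeping the stream open across messages matters here: short lines compressed one at a time would gain little, while repeated command names and channel names on a shared stream should compress much better.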


  • @fennec said:

    If you're going to use XML to send messages, we have a little protocol for that, called XMPP... most people see it go by the name "Jabber".

    <message type='chat' from='newfweiler' to='reply.re:theFormat.SideBarWTF.forums.theDailyWTF'>

      <thread>01</thread>

      <body>lol</body>

    </message>

     



  • @fennec said:

    [snip]

    And the latest in WTF: I'm still working on what is to become the updated version of the app... I'm stuck looking for a good way to input an arbitrary number of authors... :P

    Unfortunately you don't have many options here.  You can either add rows using JavaScript (as you previously mentioned)--which has become much more reliable as the DOM has become standardized--or you can have an Add button and reload the page with another row.  You can optimize that by having a listbox to select the number of rows, but those are generally rather dirty.

    I don't claim to have extensive experience in web development, but these are the only methods I have seen so far and don't see any other alternatives, given the limitations of the technologies you are using (i.e. if you can't do it client-side, you have to use server-side).
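
    Whichever way the rows get onto the page, the server side can stay simple if every row reuses the same field names. Here is a hedged Python sketch (the real application is Perl, and the field names `first_name` and `last_name` are my assumption) that collects repeated fields from a URL-encoded form body and skips blank rows.

```python
from urllib.parse import parse_qs


def authors_from_form(body):
    """Collect repeated first_name/last_name fields into a list of authors."""
    # keep_blank_values keeps empty inputs so the two lists stay aligned by row.
    fields = parse_qs(body, keep_blank_values=True)
    firsts = fields.get("first_name", [])
    lasts = fields.get("last_name", [])
    authors = []
    for first, last in zip(firsts, lasts):
        first, last = first.strip(), last.strip()
        if first or last:  # skip rows left entirely blank
            authors.append({"first_name": first, "last_name": last})
    return authors
```

    This works the same whether the form shipped with a fixed number of boxes or grew extra rows via JavaScript or page reloads.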
     



  • @OpBaI said:

    Translate to XML:

      MODE #channel +b *|away!*bork@*.aol.com

    what would that be? Maybe something like

     
      <mode><ban><match key="nickname"><any />|away</match><match key="username"><any />bork</match><match key="hostname"><hostname><any /><item name="aol" /><item name="com" /></hostname></match></ban></mode>

    and how should users enter it? Maybe

      /BAN nick=*|away username=*bork host=*.aol.com

    Sorry, but I prefer the less verbose

      /BAN *|away!*bork@*.aol.com


    I can't say XML would be a good protocol change for IRC - it would noticeably increase the bandwidth requirements of IRC servers. Actually, IRC maybe should rather be some binary protocol, with a byte for the command and a 4-byte ID for the channel. Yet still XML would be a better choice than strings created by java.io.Serializable...

    The WTF in this thread, however, is unaffected by that...

    Who said XML would be a better fit for IRC? He said IRC's method was about as dumb as XML, not about as dumb as doing it in XML.

    IRC isn't that bad though, no worse than most typeable network protocols. It could be much worse by not using spaces. (Which could have greatly simplified the initial wtf, as well.)

