Exemplary software design



  • I just revisited a bit of code from the horribly broken "presentation tool" my predecessor blessed us with (which actually generates customer-specific sales offers; my company is big on bizarre use of language). I'm really impressed by his approach to generating a list of "presentations"

    In the database the main ingredients are two InnoDB tables, neither of which have any indexes defined except for the primary keys, which are only there because AUTO_INCREMENT demands it. One of them (`presentations`) is merely a list of presentations with some reasonable attributes and some redundant-looking but actually vital data like the customer's phone number. The other (`form_cache`) is a huge, messy entity-attribute-value beast holding the actual data. The entity is identified by a VARCHAR field containing an SHA1 hash of the customer's phone number. Of course.

    Most data in this table is stored as base64-encoded serialized PHP objects except for some, which are base64-encoded plain data. And since the data format wasn't yet entirely finished when it went into production we have tons of code that deals with presentation data objects where certain members are missing. Also, the class that interfaces with the table made no attempt to prevent or handle duplicate entries for several months so the table is littered with detritus. Plus no error handling so every time one of the numerous bugs was hit some customer's data got a little bit more inconsistent.

    The PHP objects I mentioned are chock-full of useless data that my predecessor's "UI toolkit" requires but usually doesn't actually use. All member variables must be stdClass objects with about ten rendering options set, even the ones that are never displayed anywhere. If the options aren't set or have an unexpected value the application may or may not die when attempting to render the object.

    As if that design wasn't already brillant enough, this is how the list of presentations is retrieved:

    1. Fetch all presentations that haven't been marked as deleted or finished. For each presentation do:
      1. SELECT * FROM `form_cache` WHERE id = SHA1(<phone number>) AND object_name = "OrderCompleted"
      2. Note down whether the presentation has been partially finished by decoding each entry if present. All entries have the value "1".
    2. Fetch all presentations that belong to the current user (the tool only has one user) and haven't been marked as deleted. For each presentation do:
      1. Fetch all presentations that have the ID of this presentation and haven't been marked as deleted.
      2. Actually note down the presentation.
    3. For each presentation noted down during step 2 do:
      1. SELECT * FROM `form_cache` WHERE id = "SHA1(<phone number from presentation>)" AND object_name = "PresentationObject"
      2. Decode and deserialize each object and extract the relevant data.
      3. SELECT * FROM `form_cache` WHERE uid = <an integer obtained from the PresentationObject>
      4. Decode and discard the data we just obtained (the IP address of the person who created the presentation).

    The final result? 6 × <number of presentations> database queries, each with their own DB connection, none of which are closed until the script ends (which I later fixed because the script kept dying of "too many simultaneous connections"). With just 300 presentations that makes 1800 queries, half of which deal with `form_cache`. Combined with the badly-written UI generator this means that displaying the list of presentations takes well over a minute, during which the site gives no indication that it's actually doing anything. And no, steps 2.1 and 3.3/3.4 serve no purpose whatsoever.

    Since the `form_cache` is fragile even without me touching anything in there I decided that the best optimization strategy was to add a search field to the tool and start work on a replacement. My boss agreed.

    Of course a year later later he's citing the tool as an example of good development practice – it doesn't work particularly well but the "developer" cranked it out in just three months. Which is somehow better than writing software that doesn't require a few hundred man-hours just in emergency fixes after having been declared a legacy system shortly after the launch.

    (Oh, and another nice little tidbit: Once a person has ordered something there is nothing to link their data set back to the presentation that spawned it. Several tables do have a "hash" value that looks like a `form_cache` ID but those fields are actually generated using SELECT SHA1(NOW()) and never touched again.)



  • Why must everything suck? :(



  • @this_code_sucks said:

    Why must everything suck? :(

    It's actually all your fault. If you'd only stop referring to it (i.e., making it "this code") it wouldn't necessarily have to suck anymore.


  • Considered Harmful

    For some reason I can't get past the completely useless SHA1. Why, just why?



  • @joe.edwards said:

    For some reason I can't get past the completely useless SHA1. Why, just why?

    Phone numbers are top-secret information, maaaaaaaan.


Log in to reply