Mundane Web app WTF



  • I'm porting a horrifying custom CMS to a common platform.  Today's entertainment was preparing data migration.  I know we see this all the time around here, but here are just a few of today's fun hair-tearing moments, courtesy of my new favorite incompetent CMS developer:

     

    • separate tables for two different types of users, both queried on every page load
    • permissions stored as a comma-separated list of paths in a TEXT field in each user table
    • records with relations to user table don't use integer primary key as foreign key, but instead user's full name. 
    • Several user records have been renamed or removed without updating related records, leaving nearly half the records in question referencing nonexistant users
    • table storing leads with redundant columns: type (always "new"), status ("new" or "sent") and category (which turned out to be an integer incremented every time a lead is sent)
    • dates stored in mm/dd/yyyy format, with times as a separate column, in 12-hour format
    • EXCEPT for one table that uses ISO format so that the dev could actually sort results
    • (dates are always displayed exactly as formatted in the DB)
    • because results can't be easily filtered by date (without convolutions that the original dev clearly didn't realize were possible) a cron job runs every night to mark yesterday's records "inactive" - and results are filtered on this column
    • every SELECT is a SELECT *
    • despite how very convenient PHP makes associative arrays, every database result is integer-indexed, making it impossible to track down in which files and under what circumstances a given database column is used.

    I'm sure there's more in the remaining 90% of the project.  In the meantime, please wish me patience.


  •  Perl to the rescue! ;-)



  • I feel like killing you, just to put you out of your misery... Sorry :(



  • @sprained said:

    I'm porting a horrifying custom CMS to a common platform.  Today's entertainment was preparing data migration.  I know we see this all the time around here, but here are just a few of today's fun hair-tearing moments, courtesy of my new favorite incompetent CMS developer:

     

    • separate tables for two different types of users, both queried on every page load
    • permissions stored as a comma-separated list of paths in a TEXT field in each user table
    • records with relations to user table don't use integer primary key as foreign key, but instead user's full name. 
    • Several user records have been renamed or removed without updating related records, leaving nearly half the records in question referencing nonexistant users
    • table storing leads with redundant columns: type (always "new"), status ("new" or "sent") and category (which turned out to be an integer incremented every time a lead is sent)
    • dates stored in mm/dd/yyyy format, with times as a separate column, in 12-hour format
    • EXCEPT for one table that uses ISO format so that the dev could actually sort results
    • (dates are always displayed exactly as formatted in the DB)
    • because results can't be easily filtered by date (without convolutions that the original dev clearly didn't realize were possible) a cron job runs every night to mark yesterday's records "inactive" - and results are filtered on this column
    • every SELECT is a SELECT *
    • despite how very convenient PHP makes associative arrays, every database result is integer-indexed, making it impossible to track down in which files and under what circumstances a given database column is used.


    I'm sure there's more in the remaining 90% of the project.  In the meantime, please wish me patience.

     

     Sprained this your incompetent co-worker. Please stop posting information about our advanced CMS. You know this CMS shall overtake all horizontal markets because of its superior:

    1) Algorithmic deficiency

    2) Obsfucated source code

    3) Non-relational relational database tables

    4) Broken test suite 

     

     

     



  • @sprained said:

    • separate tables for two different types of users, both queried on every page load

    Depending upon how much variation there is between the users, this might not be a WTF.

    That having been said, I expect that the difference is something which could have been handled with a simple boolean or bi-state flag - a flag which might even possibly exist in one or both tables.  In that case, it's a big WTF.

    @sprained said:

    • permissions stored as a comma-separated list of paths in a TEXT field in each user table

    For each file or directory to which the user has access? OMG

    @sprained said:

    • records with relations to user table don't use integer primary key as foreign key, but instead user's full name.

    Depending on the database, that may have a surprisingly low performance impact.  Using usernames might be more meaningful to administrators of the system.  Assuming that usernames aren't 50 character random strings or something crazy like that, it's possible this isn't a WTF.

    Of course, if usernames aren't mnemonic or otherwise meaningful, then this may be similar to a number of systems I've complained about.

    @sprained said:

    • Several user records have been renamed or removed without updating related records, leaving nearly half the records in question referencing nonexistant users

    I've seen this happen quite often on unix systems.  The old grizzled veteran who set up the box and owned virtually every file, due to an innate misunderstanding about computer security, *finally* left the company.  His user account is deleted from /etc/passwd with glee by the new senior administrator.  /etc/shadow remains as it was.  All of the rest of the system's file ownership remains the same - meaning that a long listing of /bin1 shows that all the files are owned by user 501.  In the specific instance I'm thinking of, one could tell roughly when certain applications were originally installed by the owner being '501', '502', '503', or 'craig'.  No, I never stayed around long enough to fix that one - I merely explained the issue briefly and let Craig do as he willed.

    @sprained said:

    • EXCEPT for one table that uses ISO format so that the dev could actually sort results

    I still find it humorous that there are always exceptions.

    @sprained said:

    • (dates are always displayed exactly as formatted in the DB)

    Of course, this is except for the one place where it isn't - although you may not have found that one yet.

    @sprained said:

    • because results can't be easily filtered by date (without convolutions that the original dev clearly didn't realize were possible) a cron job runs every night to mark yesterday's records "inactive" - and results are filtered on this column

    Since the date/times are all stored as multiple text fields, instead of DATETIME fields or the local database equivalent, this is almost a given.  (Yes, it's possible to filter by date even when your date field is a text string, but I've only seen that done one time - and that was via a 'SELECT * FROM table3;' and then parsing the result, individually sending each3 string into Date::Parse2.)

    @sprained said:

    • every SELECT is a SELECT *

    For most of the databases I've seen, using 'SELECT *' with a reasonable WHERE clause is better than selecting just the columns you need from every element on the table.  Of course, if you have more than a score of columns, and/or one of the columns is a huge blob4, 'SELECT *' could easily be a disaster.

    1 Directory chosen at random for example purposes.  Choosing most other system directories would have a similar result.

    3 And, by 'each', I mean 'each'.  For a single DATETIME, it invoked Date::Parse::strptime twice.

    2 Of course it's Perl.  Since I gravitate towards Perl, most of the WTFs I encounter with enough force to know all the details are written in it.

    4 For example, a text blob which lists every file to which the user has access, as well as indicating what kind of access the user has.



  • @tgape said:

    @sprained said:
    • records with relations to user table don't use integer primary key as foreign key, but instead user's full name.

    Depending on the database, that may have a surprisingly low performance impact.  Using usernames might be more meaningful to administrators of the system.  Assuming that usernames aren't 50 character random strings or something crazy like that, it's possible this isn't a WTF.

    User's full name. What do you do when Jane Doe in Accounting gets married?


  • @tgape said:

    @sprained said:

    • separate tables for two different types of users, both queried on every page load

    Depending upon how much variation there is between the users, this might not be a WTF.

    One type of user has a full name and email address, the other doesn't.  That's it.


    @sprained said:

    • permissions stored as a comma-separated list of paths in a TEXT field in each user table

    For each file or directory to which the user has access? OMG

    Yuppers.  Of course, permissions are simply on/off per-script, not read/write or anything fancy like that.

    @sprained said:
    • records with relations to user table don't use integer primary key as foreign key, but instead user's full name.

    Depending on the database, that may have a surprisingly low performance impact.  Using usernames might be more meaningful to administrators of the system.  Assuming that usernames aren't 50 character random strings or something crazy like that, it's possible this isn't a WTF.

    Of course, if usernames aren't mnemonic or otherwise meaningful, then this may be similar to a number of systems I've complained about.

    Full name as in "First Last".  This despite the integer primary key already existing on the users table. 

    @sprained said:
    • (dates are always displayed exactly as formatted in the DB)

    Of course, this is except for the one place where it isn't - although you may not have found that one yet.

    *groan* 

    @sprained said:
    • because results can't be easily filtered by date (without convolutions that the original dev clearly didn't realize were possible) a cron job runs every night to mark yesterday's records "inactive" - and results are filtered on this column

    Since the date/times are all stored as multiple text fields, instead of DATETIME fields or the local database equivalent, this is almost a given.  (Yes, it's possible to filter by date even when your date field is a text string, but I've only seen that done one time - and that was via a 'SELECT * FROM table3;' and then parsing the result, individually sending each3 string into Date::Parse2.)

    Actually, MySQL has a utility to reformat dates, and you can sort on the calculated value.  But given that the original dev never realized there was such a thing as PHP's strtotime() or date() functions, one could hardly expect him to have discovered STR_TO_DATE(), could one?



  • @Carnildo said:

    @tgape said:
    @sprained said:

    • records with relations to user table don't use integer primary key as foreign key, but instead user's full name.

    Depending on the database, that may have a surprisingly low performance impact.  Using usernames might be more meaningful to administrators of the system.  Assuming that usernames aren't 50 character random strings or something crazy like that, it's possible this isn't a WTF.

    User's full name. What do you do when Jane Doe in Accounting gets married?
     

    Cry?

    ... never mind what happens when the company gets big enough to hire a second John Smith. 



  • @sprained said:

    • records with relations to user table don't use integer prima\ry key as foreign key, but instead user's full name. 

     

    This is not a WTF because of lack of integers, but because no full name is unique. There can be plenty of John Does around.

    I personally like to use something else but integers as primary keys. I think it shows lack of database desing, if you need to run integers on every table just to identify rows. That way you end up with views like this:

    User | SomeOtherData
    ------------------------------
       1               3
       2               8
       8               2

    What does that tell you? Nothing.



  • @Carnildo said:

    @tgape said:
    @sprained said:

    • records with relations to user table don't use integer primary key as foreign key, but instead user's full name.

    Depending on the database, that may have a surprisingly low performance impact.  Using usernames might be more meaningful to administrators of the system.  Assuming that usernames aren't 50 character random strings or something crazy like that, it's possible this isn't a WTF.

    User's full name. What do you do when Jane Doe in Accounting gets married?
     

    Then you obviously change the primary key for the current user and your well designed database updates the primary key to related tables.



  • @Airhead said:

    @sprained said:

    • records with relations to user table don't use integer prima\ry key as foreign key, but instead user's full name. 

     

    This is not a WTF because of lack of integers, but because no full name is unique. There can be plenty of John Does around.

    I personally like to use something else but integers as primary keys. I think it shows lack of database desing, if you need to run integers on every table just to identify rows. That way you end up with views like this:

    User | SomeOtherData
    ------------------------------
       1               3
       2               8
       8               2

    What does that tell you? Nothing.

    I think natural primary keys oftentimes show a lack of database design.  You end up with all sorts of problems, including performance penalties, the problem of duplicates and the difficulty of updating keys when part of the primary key of a record changes.  It might be sensible in theory, but in practice natural primary keys are lame.



  •  Natural primary keys may have their place, but if you're going to use an undeclared natural primary key in all relations, why add an unused integer as the declared primary key?



  • @morbiuswilters said:

    @Airhead said:

    @sprained said:

    • records with relations to user table don't use integer prima\ry key as foreign key, but instead user's full name. 

     

    This is not a WTF because of lack of integers, but because no full name is unique. There can be plenty of John Does around.

    I personally like to use something else but integers as primary keys. I think it shows lack of database desing, if you need to run integers on every table just to identify rows. That way you end up with views like this:

    User | SomeOtherData
    ------------------------------
       1               3
       2               8
       8               2

    What does that tell you? Nothing.

    I think natural primary keys oftentimes show a lack of database design.  You end up with all sorts of problems, including performance penalties, the problem of duplicates and the difficulty of updating keys when part of the primary key of a record changes.  It might be sensible in theory, but in practice natural primary keys are lame.

     

    I fail to see the problem with duplicates - you just need to set the "do not allow duplicates" bit on. And the updating of the primary key is done automatically by "on change".

    I've been using natural primary keys quite some time with MSSQL and I have no problem with them. I agree something like full name is a bad choise for primary key, but email, for example, cannot be duplicate and if two ppl choose to use one email address, that is their loss. But obviously, I do not use natural primary keys for stuff like orders because there is no guaranteed-to-be-unique information.



  • @Airhead said:

    I've been using natural primary keys quite some time ... email, for example, cannot be duplicate
    >_<

    I don't like e-mail as a primary key for two reasons that I encounter frustratingly often:

    "Hi, I'm no longer using Customer2245@MyISPMail.tld for my e-mail address, I'm now using EmperorOfDoom@twelvebaud.net. Can I update my account to no longer refer to my former e-mail address?" "NOPE, SORRY!"

    "Hi, I had an account with Paradise Island under Customer2245@MyISPMail.tld, and an account with Tropical City under EmperorOfDoom@twelvebaud.net. Now that you're Paradise City, I've got two accounts, both with major time and financial investment. Can these accounts be merged?" "NOPE, SORRY!"

     "... but what if I had a Paradise Island account under EmperorOfDoom@twelvebaud.net?" "Then we'd have merged them." "Can't you do that now?" "NOPE, SORRY!"



  • @Airhead said:

    I've been using natural primary keys quite some time with MSSQL and I have no problem with them. I agree something like full name is a bad choise for primary key, but email, for example, cannot be duplicate and if two ppl choose to use one email address, that is their loss. But obviously, I do not use natural primary keys for stuff like orders because there is no guaranteed-to-be-unique information.

    The way I see it, a good primary key is both 1) unique, and 2) invariant. An email address is indeed unique, but it isn't invariant: people change their email addresses all the time.



  • Unless it's a company or organization issued address that can't be changed.


  • Discourse touched me in a no-no place

    @DescentJS said:

    Unless it's a company or organization issued address that can't be changed.
    Does said company prevent their female employees from getting married?



  •  @DescentJS said:

    Unless it's a company or organization issued address that can't be changed.

    ...until the company merges/is taken over/rebrands.



  • @Carnildo said:

    The way I see it, a good primary key is both 1) unique, and 2) invariant. An email address is indeed unique, but it isn't invariant: people change their email addresses all the time.
     

    Primary keys are changeable. We do update primary keys when ever needed and that hasn't yet caused problems. Ofcourse that has to be noted in all database designs and it causes a possible point for failure - meaning if database administrator does not take care about ON UPDATE.


Log in to reply