Database index? Oh that's all worked out!



  • So the CS intern said he had finished up the project he had been assigned, which was a basically a simple PHP/Mysql address book with a lot of custom "business" extensions. Within five minutes of being asked to set it up on our intranet for testing, I knew there was a problem.

    I opened up the database, and sure enough there were more problems than I could count at a glance. I selected the most glaring one, and called the guy.

    "There are some problems with your project. To begin with, the primary index on the primary table is the company name AND the company address AND the city, and that is the table for both the company info and the services they buy. You've just decided that a company can only buy a single service from us. You should have used one table for the services, and one table for the customers, and the index of the customers should be an int, so that other tables could reference simply that int."

    "Well, sure, I know, but that was much too difficult. When a company buys a second service, all you need to do is to change the company name a little bit, you put the name of the service in parentheses after the company name . . ."

    "That's not acceptable! When a client's telephone number changes, would you have to go through all his services to change the number on every single one? Look, a client will move. Maybe there's a spelling error in the address, whatever, the address will change one day, and that means you've changed the primary index of your data, what do you do then?"

    "Oh that's all worked out! When you change one of the fields that compose the index, my PHP goes through all the other tables too and changes it everywhere!"

    Needless to say, there were bugs related to that, not to mention all the other bugs (did someone mention concurrent editing? Apostrophes?).

    That a CS intern can't code isn't a WTF. That he got so far before someone looked at what he was doing is just maybe a WTF. THE WTF is that three years later I'm still using that stopgap database because the elaborate replacement developed in-house over a year by experienced webdesigners is an AJAX-enabled monstruosity that has more bugs and is less usable than the incredibly incompetent CS intern's three-month effort.



  • Welcome to the software development industry.  Not all of us take three months to bungle a simple address book application or a year to write a replacement that ends up being a bigger mess.

     

    Some of us write software that helps fly airplanes and manages complex medical equipment.



  • @morbiuswilters said:

    Some of us write software that helps fly airplanes and manages complex medical equipment.
     

    And we are on this site reading about how the other half live...



  • @Zemm said:

    @morbiuswilters said:

    Some of us write software that helps fly airplanes and manages complex medical equipment.
     

    And we are on this site reading about how the other half live...

    I help fill people's mobile phones with spam-like "mobile operator marketing communication", but I do it right.

     

    What do you do now? 



  •  Good first post. Welcome to the forums.



  • This should be a lesson to all those that only want to work off natural keys.

    Natural keys are good for human searching, but for linking you need auto-generated, system controlled garunteed unique keys.



  • @KattMan said:

    This should be a lesson to all those that only want to work off natural keys.

    Natural keys are good for human searching, but for linking you need auto-generated, system controlled garunteed unique keys.

     

    Just because there are cases where natural keys were poorly chosen does not been that all natural keys are bad.  Thats like going to the grocery store, buying an apple, getting home and finding out it's rotten, and thereby concluding that "all apples are bad!".



  • @Zemm said:

    @morbiuswilters said:

    Some of us write software that helps fly airplanes and manages complex medical equipment.
     

    And we are on this site reading about how the other half live...

    You poor, naive soul.  You do realize that about 80% of the people who post to these forums nowadays are WTF factories, right?  And do you really think these idiots don't get jobs working on critical systems that can kill people?

     

    My friends all wonder why I'm afraid to fly.  They figure being the rational, science-minded person of the group that I should be the most likely to trust airplanes.  I always tell them, though, "I'm an engineer, I've seen how engineers work in all kinds of fields and I know how low the bar can be set.  In the best case scenario, this plane was designed to withstand very specific quantities of physical stress, taking into account a certain amount of wear over time.  The good engineers made trade-offs between the plane falling apart, cost of materials and fuel cost.  So at my most optimistic, I'm hoping that this 20 year old piece of shit I'm sitting on hasn't slowly developed weaknesses in ways never anticipated and that all maintenance has been done according to the specifications put forth by an anal-retentive engineer.  At worst, the engineer was the boss' nephew.  And I know that engineers learn more about safety from mistakes like Tacoma Narrows and the Ford Pinto than they do from pure theory."



  • @Jeff S said:

    Just because there are cases where natural keys were poorly chosen does not been that all natural keys are bad.  Thats like going to the grocery store, buying an apple, getting home and finding out it's rotten, and thereby concluding that "all apples are bad!".

    I rarely find cases where natural keys are the right way to go.  They are far too inflexible and large/slow to be practical except in very limited scenarios. 



  • The WTF is not that he can't code, it's the fact that he doesn't know jack shit about databases.

    Secondary WTF (maybe worse): PHP.

    Tertiary WTF: Experienced webdesigners. Who the hell puts several people in charge of an addressbook. Just throw the intern at it until he gets it right. 

    Quartiary WTF: Why didn't anyone check the intern's design before OK'ing its creation. Sounds to me like you've got a process problem.

    Pentiary WTF: Nothing. I just like saying (well, writing) 'Pentiary'.



  • @KattMan said:

    This should be a lesson to all those that only want to work off natural keys.

    Natural keys are good for human searching, but for linking you need auto-generated, system controlled garunteed unique keys.

     

    There are at least 37 angry dorks with underbites and flannel pants on the SQL Team forums who disagree with you.  Don't make me call in Celko for a 1200-line rant with numerous pointless examples like holiday calendars and copious references to proprietary or nonexistent RDBMS features and SQL grammars like pointer chains.

    I think the only times I've ever seen fit not to use autogenerated keys are for bootstrap data - dinky little 2-column "Type ID / Description" tables that mimic enumerations - or 1-to-1 inheritance (or if you're a SOA guy, "choice") tables that use the parent's PK as their own PK.  Even if the business rules mandate that users enter IDs manually, it's still usually better to autogenerate your own and just hide it.

    "Natural" keys don't help for searching either - there's no rule that the clustered index has to be on the primary key.



  • @Ren said:

    Secondary WTF (maybe worse): PHP.

    Care to provide any reasoning for that? 



  • @morbiuswilters said:

    @Jeff S said:

    Just because there are cases where natural keys were poorly chosen does not been that all natural keys are bad.  Thats like going to the grocery store, buying an apple, getting home and finding out it's rotten, and thereby concluding that "all apples are bad!".

    I rarely find cases where natural keys are the right way to go.  They are far too inflexible and large/slow to be practical except in very limited scenarios. 

     

    Like anything, both natural and surrogates have their place, exclusively using one or the other is not a good idea.  The thing to remember is this: if you use surrogates, and you want data integrity, you've got to at least be sure to add unique constraints on your natural keys.  Otherwise you'll be wondering why your table of States has five entires for "Massachussets", all with different StateID values ....



  • @Aaron said:

    There are at least 37 angry dorks with underbites and flannel pants on the SQL Team forums who disagree with you. 
     

    That's a vicious lie... I'm wearing khakis!



  • @morbiuswilters said:

    @Ren said:

    Secondary WTF (maybe worse): PHP.

    Care to provide any reasoning for that? 

     

     Sure. PHP is bad.



  • @Ren said:

    @morbiuswilters said:

    @Ren said:

    Secondary WTF (maybe worse): PHP.

    Care to provide any reasoning for that? 

     

     Sure. PHP is bad.

     

    When people say a particular software language/platform/framework/tool is "TRWTF", I like to ask them three questions:

    1) Would you consider yourself extremely knowledgable in xxx ?

    2) Would you consider yourself a good programmer overall?

    3) If you answer NO to 1 or 2, don't bother answering any more, your opinion isn't really that relevant, right? But if you answered YES to both:  If forced to write an application in xxx, would you be capable of writing decent code that works efficently and is relatively clean and easy to read/maintain ?

    Curious about your answers to those 3 questions in regards to PHP, Ren.



  • @Jeff S said:

    @morbiuswilters said:

    @Jeff S said:

    Just because there are cases where natural keys were poorly chosen does not been that all natural keys are bad.  Thats like going to the grocery store, buying an apple, getting home and finding out it's rotten, and thereby concluding that "all apples are bad!".

    I rarely find cases where natural keys are the right way to go.  They are far too inflexible and large/slow to be practical except in very limited scenarios. 

     

    Like anything, both natural and surrogates have their place, exclusively using one or the other is not a good idea.  The thing to remember is this: if you use surrogates, and you want data integrity, you've got to at least be sure to add unique constraints on your natural keys.  Otherwise you'll be wondering why your table of States has five entires for "Massachussets", all with different StateID values ....

    True. The OP case is a badly-chosen PK usage; even if the "company" table was made, using the company's address as part of a primary key is just wrong.

    I've recently seen a big WTF on one of our tables. The table looks something like

    service(id, account, clientid, productid, status, description)

    where they used PK(account, clientid, productid) which seemed to be a sensible choice to them, even when the id field was unique/autogenerated, and a much better candidate for being a primary key (and all related tables use that id, not the "natural key" thing.)

    Fast forward three months later; one client cancels his service for product X. Then he decides he actually wants the product X service again; by company policy, this means another INSERT into the table.

    I think you've noticed what the WTF is.

    So finally, my voice was heard and the PK was changed to only use the unique ID. So yes, poor choice on PK's will come to bite you when stuff like this happens.



  • @Ren said:

    Sure. PHP is bad.

    You make Baby Alfred North Whitehead cry. 



  • @danixdefcon5 said:

    Fast forward three months later; one client cancels his service for product X. Then he decides he actually wants the product X service again; by company policy, this means another INSERT into the table.

    I think you've noticed what the WTF is.

     

    Of course the table is poorly designed if there can/should be more than 1 combination of account/company/product.   There should be another column as part of the natural PK, such as status or effective date.  It has nothing to do with using natural keys or not.  Or maybe TRWTF is a company policy that involves INSERT statements?



  • @Jeff S said:

    1) Would you consider yourself extremely knowledgable in xxx ?

    Extremely? No. I've built some small-scale company sites and helped maintain & develop some open-source PHP games and boards (mostly features and debugging for sites), but I have to keep references open and test my code, so I don't consider myself a PHP expert. 

    @Jeff S said:

    2) Would you consider yourself a good programmer overall?

    Actually, yes. 

    @Jeff S said:


    3) If you answer NO to 1 or 2, don't bother answering any more, your opinion isn't really that relevant, right? But if you answered YES to both:  If forced to write an application in xxx, would you be capable of writing decent code that works efficently and is relatively clean and easy to read/maintain ?

    Obviously since I haven't written any PHP books (and therefore aren't 'extremely' knowledgable) my opinion doesn't matter. Mostly, I was going for the sarcasm with "X is bad", butI forgot this was the internet and anything without a smiley can't be sarcasm ;)

    Anyway, I don't have anything against PHP per se, but PHP/MySQL is quite easy to get wrong especially at the design phase if you're not careful or experienced. Of course, skipping the design phase is a WTF in itself but I digress. Take your average CS intern (I'm assuming intern means nearly-graduated here), he's maybe had two courses in DBs, which consist of basic HTML generation with PHP/SQL interfaces (and some SQL). If you're lucky he knows 'where' and if you're really lucky, 'join'. Not exactly the person I'd put solely in charge of designing a database structure. Of course, you might get really really lucky and he's a phpMyAdmin developer, but still. Hell, I wouldn't want to do it without someone checking my structures.

    I think we all agree that the WFT here isn't the intern. It's the fact that someone put him in charge of systems design, not the implementation. That he was to use PHP just adds some tripwires for the unwary since PHP is very, very simple to write.



  • Thanks for clarifying, Ren. I don't agree with Morbius much (though sometimes more than we both realize) but I think we both agree that generic statements like "technology xx is the TRWTF" are ignorant and trollish, especially when made by people who don't know that technology very well.  Always blame the programmer, not the tool, for bad code, right?

    And, of course, if you're just joking, be sure to make it more obvious!  And realize that paricular one is kinda played out ...... :)

    @Ren said:

    I think we all agree that the WFT here isn't the intern. It's the fact that someone put him in charge
    of systems design, not the implementation. That he was to use PHP just
    adds some tripwires for the unwary since PHP is very, very simple to
    write.
     

    Agreed.  It is an unwritten rule here that we generally try not to make fun of code written by students or interns.   I don't think Alex accepts too many submissions to the main page from students or those in training ... it's much more fun to pick on those "highly paid enterprise-level consultants".



  • @Jeff S said:

    @danixdefcon5 said:

    Fast forward three months later; one client cancels his service for product X. Then he decides he actually wants the product X service again; by company policy, this means another INSERT into the table.

    I think you've noticed what the WTF is.

     

    Of course the table is poorly designed if there can/should be more than 1 combination of account/company/product.   There should be another column as part of the natural PK, such as status or effective date.  It has nothing to do with using natural keys or not.  Or maybe TRWTF is a company policy that involves INSERT statements?

    Ah well, the reasoning behind this is that everything's gotta be accountable, so they want the current "service contract" there, as well as all the previous "service contracts", even if they were cancelled. I can't say much because of NDA's, but basically those rows cannot be deleted; and all payments/charges are related to that unique ID. The composite PK wasn't required, and the situation it was trying to handle (avoiding more than one active acct/client/product "service contract") can't be managed by PK's or constraints. It just took 3 months for the lead developer to find this out.


  • @Ren said:

    I don't have anything against PHP per se, but PHP/MySQL is quite easy to get wrong especially at the design phase if you're not careful or experienced.
    If you're inexperienced, isn't programming and design easy to get wrong in any language?

    Or do non-PHP languages magically make every neophyte's code as good as an experts?



  • @Ren said:

    Tertiary WTF: Experienced webdesigners. Who the hell puts several people in charge of an addressbook. Just throw the intern at it until he gets it right. 

    An addressbook with a lot of custom business logic, and the replacement is /much/ more elaborate :-) And believe me, after having seen what the guy wrote, no way I was going to throw him at any work at all, nor anywhere except the door or a convenient window.

    @Ren said:

    Quartiary WTF: Why didn't anyone check the intern's design before OK'ing its creation. Sounds to me like you've got a process problem.

     

    Well, yes. Or more like the guy who set the specs is not a coder, and therefore the specs were limited to the UI. I was first to check what was behind the UI. 



  •  @Jeff S said:

    Otherwise you'll be wondering why your table of States has five entires for "Massachussets", all with different StateID values ....

    Which could easily be resolved with a unique index/constraint on the "Description" column (or whichever column holds "Massachussets".

    Of course, if you're relying exclusively on the database to enforce uniqueness then you've got other problems.

    And if you have a table of states you probably also have a problem.  State and country tables are a classic example of a meta-problem that "pure" DBAs create with their chronic anal-retentiveness.  "But it's not normalized!  People could put any junk they want into the State field!"  Yeah, so now you're stuck maintaining these two tables, and you need to guarantee uniqueness, and you need to worry about typos, and you probably need some UI for managing those tables or an automated process to pick them up from a public repository, and you need to make sure everything is up-to-date after some crazy war or political shift that changes the names of everything, and you need to worry about address formats for military bases and P.O. boxes, and...

    Just let people type in their address.  Preferably in one big memo field or textarea (you can even use Google to parse it for you!), but at the very least make the different fields freeform.  If some idiot didn't enter the address correctly then you're going to find out about it pretty soon anyway.

    I know, I know, it's a critical business intelligence app and you need to group by the region, right?  Then put in an auto-complete text box - that saves users time and practically guarantees that they won't screw it all up.

    Maybe that was just an off-the-cuff example, but I still hate it - it's like using the Fibonacci sequence as an example of recursion.



  • @Aaron said:

     @Jeff S said:

    Otherwise you'll be wondering why your table of States has five entires for "Massachussets", all with different StateID values ....

    Which could easily be resolved with a unique index/constraint on the "Description" column (or whichever column holds "Massachussets".



    Excellent point!  But, I think I read that somewhere before.   Oh yeah -- it was the very sentence before the part you quoted!

    The thing to remember is this: if you use surrogates, and you want data integrity, you've got to at least be sure to add unique constraints on your natural keys.  Otherwise you'll be wondering why your table of States has five entires for "Massachussets", all with different StateID values ....

     ... and by the way, yes, I did spell Massachusetts wrong... 




  • @Jeff S said:

     ... and by the way, yes, I did spell Massachusetts wrong... 
    ... and you call yourself an American!



  • @TooWhiteAndNerdy said:

    @Jeff S said:

     ... and by the way, yes, I did spell Massachusetts wrong... 
    ... and you call yourself an American!

     

    It's worse than that ... look at my location below my avatar ....



  • @Jeff S said:



    Excellent point!  But, I think I read that somewhere before.   Oh yeah -- it was the very sentence before the part you quoted!

     

    I wonder how I glossed over that.  Next time I think I'll wait until after lunch before posting anything of substance.

    The rest still stands, though.



  • @Aaron said:

    The rest still stands, though.
     

    hmmm... Well, let's take a look:

    @Aaron said:


     And if you have a table of states you probably also have a problem.  State and country tables are a classic example of a meta-problem that "pure" DBAs create with their chronic anal-retentiveness.  "But it's not normalized!  People could put any junk they want into the State field!"  Yeah, so now you're stuck maintaining these two tables, and you need to guarantee uniqueness, and you need to worry about typos, and you probably need some UI for managing those tables or an automated process to pick them up from a public repository, and you need to make sure everything is up-to-date after some crazy war or political shift that changes the names of everything, and you need to worry about address formats for military bases and P.O. boxes, and...

    I really don't know how to respond to any of that to be honest.  Every schema we design is based on requirements, and you really need to analyze those requirements carefully to determine what you need.  There is never a rule that says always put states in a table, or cities, or area codes, or never don't  put them in a table -- it all depends on your requirements.  Sometimes certain data elements are just attributes, other times they are entities, and you need to model things appropriately based on your logical model.  Sometimes you might need surrogate keys, other times natural keys are fine, sometimes a natural key is a set of surrogate foreign keys, sometimes a surrogate key becomes a natural key, and so on.

    So, while your post with all of your "what ifs" is nice, it's kind of irrelevant.  Of course different scenarios require different things.  If you are trying to write that "no one should ever store the 50 US States in a lookup table, ever!" ,  then I suppose we can debate, but otherwise I am not sure what to say. 

    Your comment about not relying on the database to maintain data integrity is interesting, but again, I am not sure specifically what you are saying.  Are you saying that a poorly designed database without constraints will not effectively constrain your data?  If so, then sure!  Or are you saying that a specific relational database engine you've used is broken and for some reason even though you put a UNIQUE or CHECK or FK constraint somewhere it ignores it, due to bugs or some other oddity?   If that's the case, then, that's great as well, some examples would be nice. Otherwise,again, not sure how to respond to that comment. It sounds like you are saying "don't rely on an integer variable to only hold numbers!" in a strongly-typed programming language, but I doubt that is your intent.



  • @Jeff S said:

    There is never a rule that says always put states in a table, or cities, or area codes, or never don't  put them in a table -- it all depends on your requirements.  Sometimes certain data elements are just attributes, other times they are entities, and you need to model things appropriately based on your logical model.  Sometimes you might need surrogate keys, other times natural keys are fine, sometimes a natural key is a set of surrogate foreign keys, sometimes a surrogate key becomes a natural key, and so on.

    I find it ironic how you accuse me of being overly speculative and vague when you're doing the exact same thing in your response, and to a much larger degree.

    Let's be specific then: aside from a geographics or mapping application where you're actually collecting information about the region itself, can you give an example of when you would truly need states and countries to be entities as opposed to attributes?  Also, those "what-ifs" you dismiss as unimportant are essentially the foundation of any engineering or design discipline (assumptions, limitations, edge cases, and so on), so I would like to hear how you would answer those questions within the context of whatever particular example you choose.

    Oh, and I'll tip my proverbial hat to you if you can find a non-contrived example where an auto-generated key would be inappropriate.  Even better, if you have an example where a compound primary key would be appropriate.

    Are you saying that a poorly designed database without constraints will not effectively constrain your data?  If so, then sure!  Or are you saying that a specific relational database engine you've used is broken and for some reason even though you put a UNIQUE or CHECK or FK constraint somewhere it ignores it, due to bugs or some other oddity?   If that's the case, then, that's great as well, some examples would be nice. Otherwise,again, not sure how to respond to that comment. It sounds like you are saying "don't rely on an integer variable to only hold numbers!" in a strongly-typed programming language, but I doubt that is your intent.

    I'm not sure if you're making fun of me for originally (accidentally) taking what you said out of context several posts ago, or if you genuinely misinterpreted my words to such a major extent.  What I originally said, which you helpfully didn't include in the quotation, was:

    @Aaron said:

    if you're relying exclusively on the database to enforce uniqueness then you've got other problems.

    That is saying precisely what it says.  The operative word is exclusively.  At no point did I say that databases should not be used to enforce basic data types.  At no point did I say that any particular database engine ignores constraints.  At no point did I even say that there would be any reason not to have uniqueness constraints or other constraints.  What I said, paraphrased, was that these constraints should only be used as failsafes, as the last line of defense in defensive programming.

    If your application is designed such that, if the DBA forgot to put a UNIQUE constraint on the "StateName" column, and you end up with 5 instances of the same state, then your application is designed wrong, because it means that at some point your non-technical users would have gotten a cryptic "violation of UNIQUE constraint" message, or whatever generic failure message/page was chosen to hide those details.  The application itself should make it impossible or at least extremely difficult for any user to do something so blatantly wrong.  Of course the database should also be set up so as to ensure that the data does not get corrupted if the app designers haven't done their job properly, but that is entirely beside the point that I outlined above.

    My "specific example" is your example - the region table.  I can think of no conceivable scenario where it would be correct for the database to enforce uniqueness on a column, but not for the application itself to prevent the user from creating duplicate instances.  If you have a sub-example that contradicts this, then by all means, let's hear it.



  • @Aaron said:

    Let's be specific then: aside from a geographics or mapping application where you're actually collecting information about the region itself, can you give an example of when you would truly need states and countries to be entities as opposed to attributes

    Sure.  You gave the example already, I think.  When you need to generate reports that group or filter by state, or country. When you need to group states into regions, such as NE, NW, Midwest, etc.  When you'd like to be able to format a state as "MA","Mass", "Masschusetts", and so on consistently depending on output specs.  (i.e., one report or webpage wants the state name spelled out,  mailing labells want it MA, and so on ) When you'd like to provide users with a drop-down list because it's in the specs.  in short, any time you would like to use states as an entity in any way to store attributes or relations for states, you create a table of states. It's pretty simple, isn't it?

     @Aaron said:

    If your application is designed such that, if the DBA forgot to put a UNIQUE constraint on the "StateName" column, and you end up with 5 instances of the same state, then your application is designed wrong, because it means that at some point your non-technical users would have gotten a cryptic "violation of UNIQUE constraint" message, or whatever generic failure message/page was chosen to hide those details.

    Or, your app could just catch the exception and present it to the users in a more friendly manner.  

    You are making lots of assumptions about these simple things I wrote for some reason. 

     @Aaron said:

    The application itself should make it impossible or at least extremely difficult for any user to do something so blatantly wrong .... I can think of no conceivable scenario where it would be correct for the database to enforce uniqueness on a column, but not for the application itself to prevent the user from creating duplicate instances.  If you have a sub-example that contradicts this, then by all means, let's hear it.

    Did I write that applications should somehow be written to circumvent or ignore database constraints somewhere?  If you have your database constraints in place, you'd have to be a much more clever programmer than me to write an application thats allows users to violate them. (Short of explicitly dropping/disabling those constraints of course ... and I don't think I wrote anywhere that I recommend doing that, either.)



  • @Jeff S said:

    When people say a particular software language/platform/framework/tool is "TRWTF", I like to ask them three questions:

    1) Would you consider yourself extremely knowledgable in xxx ?

    2) Would you consider yourself a good programmer overall?

    3) If you answer NO to 1 or 2, don't bother answering any more, your opinion isn't really that relevant, right? But if you answered YES to both:  If forced to write an application in xxx, would you be capable of writing decent code that works efficently and is relatively clean and easy to read/maintain ?

    Just because someone answers 'Yes' to all three questions, does not mean that the language is bad.

    I haven't paid attention to PHP in a long time - not since PHP4 was considered quite new (although not still brand-spanking new). When I last was working with PHP, I was working with PHP3, and I was maintaining an application written in PHP3, and I had managed to get it to be relatively efficient, easy to read and maintain, and I was slowly getting it to be clean. I *could* have completed the job, as well as improved efficiency to the point where the application would have worked as well as we would have needed. In short, I feel fairly confident that, at that time, I could have correctly answered 'Yes' to all three of those questions.

    That having been said, we estimated it would have taken, at the rate we were cleaning the code, approximately two years to finish that job. It would have taken approximately a year to rewrite the whole site from scratch in PHP4, taking the time to make it clean and maintainable. We also estimated that it would've taken approximately 9 months to rewrite the whole site in perl, taking the time to make it clean and maintainable.

    As a result, our management gave us to go ahead and rewrite it in perl in 3 months, and make another real mess of things. Sigh.

    (PS: we're currently on major revision 3 with perl, and finally have decent, efficient, clean, legible code which is easy to maintain.)

    (PPS: The main problem with the old php3 code was not actually the language; the language merely contributed. The main problem was that it was written by a single individual who believed that it was better to have a 250 character line than it was to split a logical command-line across multiple lines, that it was better to put as much functionality into a single logical command-line than to use temporary variables - even if the temporary values allowed you to avoid running the same code multiple times, that sluggishness on a test box roughly 1/4 the size of the production machine with just one user on it did not bode ill for running the same software on the production machine for 10,000 users, and that a veto on a code review meant "put fixing that code on your TODO list, but it's ok to deploy it to production as-is." Not that much of his code actually went through a code review process. Because he needed a code review to deploy code to production, he'd submit a sample of the new code for review, and then deploy a much larger code update when he got any response back. Yes, I realize that if he was a perl programmer, I'd be complaining about his 3,500 character lines, rather than his 250 character lines. Except that I don't think he'd be able to get one of those to actually compile.)



  • @Jeff S said:

    Sure.  You gave the example already, I think.  When you need to generate reports that group or filter by state, or country. When you need to group states into regions, such as NE, NW, Midwest, etc.  When you'd like to be able to format a state as "MA","Mass", "Masschusetts", and so on consistently depending on output specs.  (i.e., one report or webpage wants the state name spelled out,  mailing labells want it MA, and so on ) When you'd like to provide users with a drop-down list because it's in the specs.  in short, any time you would like to use states as an entity in any way to store attributes or relations for states, you create a table of states. It's pretty simple, isn't it?

    No, it isn't simple, because you run into all of those other issues I mentioned.  No matter how you try to maintain this table, it's almost always going to be somehow wrong or incomplete.

    If all you need is sorting, filtering, and grouping, then you definitely don't need a separate table for it.  And drop-down lists of states are a blight on UI design and a curse on users; if the spec insists on one then the spec sucks.

    The other examples are, I suppose, valid, albeit contrived.  Geocoding services are probably a better solution, but if that's seriously not an option then I guess you might need a state table.  These very rare exceptions are why I used the word "probably", rather than make sweeping generalizations such as "you should never have a table of states."

    And it practically goes without saying that none of these examples would make use of a "natural" key in their optimal solutions.

    @Jeff S said:

    Or, your app could just catch the exception and present it to the users in a more friendly manner.

    Oh yes, brilliant.  So you'll need a huge exception handler that checks for all the different SQL error codes that could arise from invalid user input, and you'll be parsing the exception text to figure out which table and constraint was violated, and hopefully figure out exactly which field was in error from that.  Have you ever actually tried doing this?

    Not to mention that you now need a round-trip just to validate your form.  Pretty much rules out the possibility of real-time validation, not just because of the delay but because the only way to validate is to actually attempt an insert/update.  Reminds me of those awful 16-bit database apps that commit their changes as soon as you leave the record, without ever giving you the opportunity to cancel a batch.  We're talking about the dark ages of app design here - people expect something better today.

    @Jeff S said:

    @Aaron said:
    The application itself should make it impossible or at least extremely difficult for any user to do something so blatantly wrong .... I can think of no conceivable scenario where it would be correct for the database to enforce uniqueness on a column, but not for the application itself to prevent the user from creating duplicate instances.  If you have a sub-example that contradicts this, then by all means, let's hear it.

    Did I write that applications should somehow be written to circumvent or ignore database constraints somewhere?  If you have your database constraints in place, you'd have to be a much more clever programmer than me to write an application thats allows users to violate them. (Short of explicitly dropping/disabling those constraints of course ... and I don't think I wrote anywhere that I recommend doing that, either.)

     

    Aaaand again you've somehow managed to mount a defense against the complete opposite of what I actually said.  I'm flabbergasted.  The UI should always enforce the same rules as the database without relying on the database to do it.  If your only defense against invalid input is the database, your app design is crap, period.



  • Wow, you are really on attack mode for some reason against me.   I'd like to respond further since I usually enjoy debating database topics, but definitely not in this case, and based on your reponses I can see there's no point in doing so. 



  • @Aaron said:

    My "specific example" is your example - the region table.  I can think of no conceivable scenario where it would be correct for the database to enforce uniqueness on a column, but not for the application itself to prevent the user from creating duplicate instances

    Only the database can do that reliably. The application can try to prevent it, but race conditions can cause cases where the application fails to do so. So, if you build such logic into the application, you still have cases where the insert or update statement fails because of the unique constraint. Your application has to catch and handle that exception properly. This means: more complexity, probably an inconsistent user experience etc. In some cases it might be worth the price, but in general I would rather avoid that.



  • @ammoQ said:

    @Aaron said:

    My "specific example" is your example - the region table.  I can think of no conceivable scenario where it would be correct for the database to enforce uniqueness on a column, but not for the application itself to prevent the user from creating duplicate instances

    Only the database can do that reliably. The application can try to prevent it, but race conditions can cause cases where the application fails to do so. So, if you build such logic into the application, you still have cases where the insert or update statement fails because of the unique constraint. Your application has to catch and handle that exception properly. This means: more complexity, probably an inconsistent user experience etc. In some cases it might be worth the price, but in general I would rather avoid that.

    Agreed.  I don't enforce much of the business logic within the database, but one place I do is with unique keys.  Not only because it prevents nasty race condition errors, but also because most DBs can optimize a unique index much more efficiently than a non-unique one.  I use unique indexes a lot for system-wide atomic locks.  It is already easily network-accessible, there are database APIs for almost every programming language and the locks themselves can be shared amongst several different languages running on several different servers. 


Log in to reply