I'm no DBA....



  • We recently had an issue with a certain error not being logged in our production database.  A quick call to our vendor and a look at the database reveals why - the log table that this event was supposed to be logged to has three primary keys, the device name, the time, and the date.  This means that a PDC can't log more than one event every couple seconds (due to the way time and dates are handled in the database, they end up being ridiculous numbers, today's date, for example, is 76324, and the time is probably something like 2 million).  The result of this is that some errors that occur don't get logged, making troubleshooting more difficult.

    I'm not quite sure why the hell they did this, any professional DBA's out there care to comment?  This a good idea or a bad one?

    A recent adventure they caused is definitely not a good idea, one of their developers, while trying to get some extra logging into one of our new Palm apps, accidentally made it log too much.  In one month the table had grown from 300k records to 4.5 million records.  Our largest table is 12 million records.  Due to software inefficiencies, that table being so large made certain parts of our front end nearly unusable.

    Maybe one day we'll have a real production system.....



  • @Volcanon said:

    This a good idea or a bad one?

    It's a bad idea.  It actively prevents simultaneous (or even near-simultaneous) errors from being logged, right?  That's a major failure, no matter what other reasons they might have had for using those PKs.

     

    @Volcanon said:

    Maybe one day we'll have a real production system.....

    Bad performance?  Mysterious bugs?  Inept developers?  Sounds like you already have one.



  •  @morbiuswilters said:

    @Volcanon said:

    This a good idea or a bad one?

    It's a bad idea.  It actively prevents simultaneous (or even near-simultaneous) errors from being logged, right?  That's a major failure, no matter what other reasons they might have had for using those PKs.

     

    @Volcanon said:

    Maybe one day we'll have a real production system.....

    Bad performance?  Mysterious bugs?  Inept developers?  Sounds like you already have one.

    We were actually on a support call with my boss and several of their developers when the lady in charge mentioned the PKs.  I immediately suggested we remove the time as a PK and use another field.  She essentially ignored me and said "I added a routine to add the time+1 for this particular error so that it will get logged."


    Some other funnies regarding their support.

    They quoted us at 6 hours for our newest palm app, which is supposed to allow us to do batch labeling.  They then gave us an app that they made for another customer of theirs.  Obviously it didn't work at all.  We then go through probably 20 different itterations of software before we get a somewhat usable program.  Now it works and the final result: 65 hours.  They want want 9 grand for this program that they originally said would cost us about 800 bucks.

    I think they charged us about 20 grand this year for fixes, in addition to another 20+ grand that we pay for support.  I believe we ended up paying the 20 grand for support plus another 2 grand for fixes.  My boss refuses to pay them to fix things that break, they disagree with him on that.

    These issues are just the tip of the iceberg.  If ever get access to their sourcecode, you will see probably a week's worth of articles by me.  We're currently look at a full ERP system to replace our seperate attendance, payroll, accounting, and production systems.  I imagine there will be more hilarity to ensue.



  • @Volcanon said:

    This means that a PDC can't log more than one event every couple seconds (due to the way time and dates are handled in the database, they end up being ridiculous numbers, today's date, for example, is 76324, and the time is probably something like 2 million).  The result of this is that some errors that occur don't get logged, making troubleshooting more difficult.

    There are two ways of looking at this:

    1. The logging system was designed with the criteria/belief/assumption that errors would not occur more than every couple of seconds.
    2. The logging system was badly designed in the first place as errors will always occur faster than the logging system can handle them.

    Either way you have a mismatch between the system being monitored and the logging system. But you can't tell which system is the broken one until you consider the overall combined system. Thus the situation raises more questions like:

    • Why is the system generating errors faster than can be handled?
    • Is this a new behavior or has it always occurred?
    • If it is new behavior, did it start with a change to either system?

    So the answer to it being a good or bad ideas is "it depends" .. :D



  •  I believe the issue they were trying to resolve was spamming of the log and to do that they used key constraints.

     

    This is likely why they didn't want to remove time as a key constraint.  The table has only 6 rows and 3 of them are PKs... maybe if we make the error itself a PK as well and thus prevent duplicate errors but allow multiple errors to be thrown at once?:)



  • @Volcanon said:

      If ever get access to their sourcecode, you will see probably a week's worth of articles by me. 

     

    My company is also looking at buying the source to the only piece of software that isn't in-house developed. I think I'll be spending a lot of time posting here when we get it too...



  • @MeesterTurner said:

    @Volcanon said:

      If ever get access to their sourcecode, you will see probably a week's worth of articles by me. 

     

    My company is also looking at buying the source to the only piece of software that isn't in-house developed. I think I'll be spending a lot of time posting here when we get it too...

     

     

    You have no idea....

    In our production front end... you cannot sort by column.  The scrollbars do not work correctly and when you do use them, it slides your mouse slightly to the left.  Any screen you look at queries multiple tables and pulls in all of the data for all of the fields, whether or not you wanted to see it all or wait for it to load.  Reports randomly break and the ones that do work often randomly leave out information.

     

    The coding for these bugs, which have existed for as long as we've used the software(about 4 years no), is probably hilarious.



  • @Volcanon said:

    @MeesterTurner said:

    @Volcanon said:

      If ever get access to their sourcecode, you will see probably a week's worth of articles by me. 

     

    My company is also looking at buying the source to the only piece of software that isn't in-house developed. I think I'll be spending a lot of time posting here when we get it too...

     

     

    You have no idea....

    In our production front end... you cannot sort by column.  The scrollbars do not work correctly and when you do use them, it slides your mouse slightly to the left.  Any screen you look at queries multiple tables and pulls in all of the data for all of the fields, whether or not you wanted to see it all or wait for it to load.  Reports randomly break and the ones that do work often randomly leave out information.

     

    The coding for these bugs, which have existed for as long as we've used the software(about 4 years no), is probably hilarious.

    1 time when i was young i inherited an app with a few of these "minor issues" and was asked to fix it.

    I quit a couple of weeks later and now I just reject the job.

    When a salesguy or manager mentions this shit they are some minor annoyances, fixing them usually involves a rewrite



  • Primary keys... ok you got me started.

    Obviously these guys are idiots.  They do not understand what different types of keys are for.

    Lets start with the two types, Unique Serial ID Keys, and Natural Keys made up of one or more pieces of data.

    Anyone that says use one and not the other is an idiot.  Both types of keys have thier purpose and the point is to never confuse the two.  The people that wrote your system confused natural keys with unique serial keys.

    Unique serial keysare used by the database to assure a UNIQUE pointer to a particualr record, they are not used for searching, they have no meaning, they are simply a unique id given to a particular record.

    Natural keys  are NEVER garunteed to be unique.  They are made up of data that actually has meaning.  They are how we as people view data.  I want to look for John Smith, this would be a natural key based off of first name and last name, many records may have john smith but some of the other data may be different So the user finds all John Smith records and has them displayed, they click one and start to edit it.  The system tracks the unique ID of the record they are editing without the user having knowledge of that ID so when the user saves the system can update the correct John Smith record with the new information.

     In this case, the time and machine the error happened on all have meaning, this is a natural key and can not be considered garunteed unique as you have seen.  Where is thier unique ID field?



  • Email address is a unique natural key.



  •  @RaspenJho said:

    Email address is a unique natural key.

    Actually, I hate to tell you this, but no it is not.

    Two people in the same household could be sharing a family email account, but there would be two separate distinct people.  You only assume one person would be using a single account, this is not garunteed and you can not enforce peoples usage of thier own email accounts.  You can assume within your system that a single email address is a single client or customer and you would be wrong.  Now does being wrong in this manner actually mean anything to your business?  It might not.

    Another case would be a Jason Smith (jsmith@Company.com) at a company that you do business with that later leaves the company then let's say a year later Jane Smith replaces him but gets the same email address. Different person, same email address.  You could update all of the information in your system but then you might lose historical data that Jason is the one that approved such and such purchase two years ago and not Jane.



  • @KattMan said:

     Where is thier unique ID field?

     

     

    They don't use them, anywhere.  I have scoured the database for some kind of ID key usage, but there is none.  All keys are natural keys.  It is quite horrific.



  • You can't have two duplicate email addresses at the same time. Emails going to that address always route to the same place.

    And I have a feeling that you don't hate to tell me this at all :)



  • @RaspenJho said:

    You can't have two duplicate email addresses at the same time. Emails going to that address always route to the same place.

    And I have a feeling that you don't hate to tell me this at all :)

     

    You misunderstood him.

    Suppose you have a customer contact, John Smith, jsmith@company.com.  You have a database entry for this person.  Later that year they are replaced by Jane Smith, and they give her his old email address.  For historical purposes you'd want two seperate entries, John smith and Jane smith, both with the same email address.  If the email address was the only PK, you'd run into trouble.  This is a databse design issue, not so much an email issue.

    This is also a situation where an email address may not necessarily be a unique natural key.  Though in most cases, you would probably be safe assuming that email addresses are unique.



  • @RaspenJho said:

    You can't have two duplicate email addresses at the same time. Emails going to that address always route to the same place.

    And I have a feeling that you don't hate to tell me this at all :)

     

    You are missing the point.

    Yes the email address will always go to the same place, but that does not mean it is always the same person.  If you have need to retain historical information then you will see that email address can not be used as a unique key because it cane be assigned to a different person at some other time.

    Read my previous explanation, one year Jason has the email address, the next year Jane has it because of employee turnover at the company.  You either change all your historical data because you keyed off of email address, or you create a new unique record with the new information that happens to share an email address with another record, which is perfectly valid.  Go ahead search on email address, you will see two records now, one current and one out of date, but purchases made by Jason are still pointing to Jason because you are not using email as the unique id.

    And yes I do hate to tell you this, because you should already know that data with meaning will never be garunteed unique.



  • Just because your database is now out of date doesn't mean the email address is now non-unique.



  • @KattMan said:

    Lets start with the two types, Unique Serial ID Keys, and Natural Keys made up of one or more pieces of data.

    Anyone that says use one and not the other is an idiot.  Both types of keys have thier purpose and the point is to never confuse the two.

     

    What about "Natural Keys" that are invented to be unique and
    identifying, for example state codes. "NY" is a unique key for the
    state of New York, yes? Or country codes, or language codes.The code "th" is for the Thai language, and "en" is for English. Using such a code, there is by definition only one English. If somebody picks "en" for the report language he'll get English - not British English or Hong Kong English, just English.

    A company
    could define it's product codes as unique; in that case if ProductId1
    == ProductId2 then product #1 is by definition the same as product
    #2.What about "Employee Number"? If the corporation already has a
    methodology in place to assign a unique identifying code to every
    individual employee, why duplicate that with another level of "Unique
    Serial ID"?

     



  • @AndyCanfield said:

    A company
    could define it's product codes as unique; in that case if ProductId1
    == ProductId2 then product #1 is by definition the same as product
    #2.What about "Employee Number"? If the corporation already has a
    methodology in place to assign a unique identifying code to every
    individual employee, why duplicate that with another level of "Unique
    Serial ID"?

     

    Employee Number could be the unique key as long as they are not assigned by a person but rather generated by a the system.  A portion of this could be system generated and therefore it is unique.  This data is self contained within your company and managed by your company, just like the aformentioned email address.  Within the company they are unique.  The question here would be, can the employee numbers ever be reused like an email address can be.  If the answer is no then does the system enforce this, if so then you have a valid unique key, but only within your system.

    Someone that does business with you, such as a payroll agency, can not use this employee information as a unique id, it is part of the meta data that has meaning outside of thier system, other emplyees within thier system could have the same employee id.

    The point being is this.  If the ID is generated by the system and not a person, the ID can never be reused and the system enforces the uniqueness of that ID then it is a valid unique ID but only within that system.  The previous email example breaks this on two levels, the email address can be reused at a later date even within the same system, and they are not always purely system generated.

    You questions about the language and country codes is mostly valid, but once again context is needed.  Does the system enforce EN as being unique within the system?  Other languages may use a different code for english and still yet EN may be used to determine something else.  If you do business with companies like this and do not translate thier codes into yours you will have problems with uniqueness of those codes as primary keys.



  • @AndyCanfield said:

    What about "Natural Keys" that are invented to be unique and
    identifying, for example state codes. "NY" is a unique key for the
    state of New York, yes? Or country codes, or language codes.The code "th" is for the Thai language, and "en" is for English. Using such a code, there is by definition only one English. If somebody picks "en" for the report language he'll get English - not British English or Hong Kong English, just English.

    Just to add to this

    State codes also would rarely if ever change.  This enforces the idea that they can be defined as being systematically unique, not garunteed unique, but this is usually enough.  The difference in being systematically unique is that values like this could at some point change, like a productCode that is supposed to be systematically unique, it isn't a serial ID generated at the lowest level but within the system it is understood to be unique.  But the value can possibly change and if the value is used in relationships making this change becomes a difficult process.  

    Things like state and language codes, once defined, have such a low chance of being changed that the risk involved in using them as unique keys is very low.  The mitigation of risk here won't see a problem with treating them as unique even though they really are not by a strict definition.

    We are getting into edge cases here where the lines get blurred but I will always fall back to this:

    If your language key is managed and becomes a standard in your system similar to state codes, and the risk of change is low, then use them.

    If your product code is built from things like style and color then I would say no, you need your own separate system generated key as this information can not be garunteed unique through the lifetime of the company.  If the product code at least contains part of a unique ID such as the following:

    ID = 1234 colorcode = 23 style code = 004 then product code = 123423004

    then really you dont have a problem because you really dont need to save the product code in the system but rather the data that can be peiced together to form the product code. You could save it to make it easier to write search queries but you dont have to.



  • @RaspenJho said:

    Just because your database is now out of date doesn't mean the email address is now non-unique.
     

    The database isn't out of date.

    I have purchases made by Jason Smith who's email address was jsmith@company.com in 2007.

    I have purchases made by Jane Smith who's email address is jsmith@company.com in 2009.

    My data is not out of date, it would be bad to update all records with an ID of jsmith@company.com to say Jane because if I have to go back and prove who made that purchase my system will incorrectly state that Jane, who was not an employee in 2007 made a purchase for the company.  Instead I need a truely unique ID to link the purchases to that will still show me that in 2007 Jason made a purchase and his email was jsmith@company.com and in 2009 Jane made a purchase and her email at the time was jsmith@company.com.



  • You do not have 3 different primary keys(There can by definition only be one primary key, hence primary).

    You have one composit primary key with 3 fields. So 2 questions: Why do you have a primary key at all. Do you really need it? I know some dbas are going to hate me for saying this, but I think that logging is one of the very very few cases where you don't have to define a primary key.

    And what the hell is up with date and time beeing 2 different fields, instead of just a datetime(Or what your database calls that type). That is a true wtf which indicate that the database is really fucked up. So looking forward to more entries about that database in the future :}

     




  • @tiller said:

    You have one composit primary key with 3 fields. So 2 questions: Why do you have a primary key at all. Do you really need it? I know some dbas are going to hate me for saying this, but I think that logging is one of the very very few cases where you don't have to define a primary key.

     

     I will agree with this, if the table will not be edited, as a logging table should not be, and the records do not have to be unique, once again the logging database, then unique primary keys or any primary key is not necessary, indexes could be in order to find all entries for a particular day or for a single machine.



  •  @tiller said:



    You do not have 3 different primary keys(There can by definition only be one primary key, hence primary).

    You have one composit primary key with 3 fields. So 2 questions: Why do you have a primary key at all. Do you really need it? I know some dbas are going to hate me for saying this, but I think that logging is one of the very very few cases where you don't have to define a primary key.

    And what the hell is up with date and time beeing 2 different fields, instead of just a datetime(Or what your database calls that type). That is a true wtf which indicate that the database is really fucked up. So looking forward to more entries about that database in the future :}

     


     

     

    Believe it or not, I did suggest we just do away with PKs in that table because I agree, though it is not in the best practices book, I don't think we need a PK for that table at all.  It is not edited, simply added to.  They didn't want to do that however, because, as I said, if they did away with key constraints, the log would be spammed by duplicate errors.

    The reason why date and time are seperate fields, I am told, is because of the programming language used to build our front end.  They call it Clarion.  I have never heard of it, but wiki does have some information on it.  Apparently in Clarion your dates and times must be in a number format.  It would also likely be more complex to have a date-time field, and I think we're about at our limit for their capability of complexity;)

    Yes there are many more WTF's, and some screencaps of some of our tables might be in order, just for the lulz:)




  •  @RaspenJho said:

    You can't have two duplicate email addresses at the same time. Emails going to that address always route to the same place.

    And I have a feeling that you don't hate to tell me this at all :)

    What about an email address such as it@company.com which routes to multiple mailboxes?  Not all email addresses route to a single mailbox and a single mailbox could have multiple email addresses.

     Granted, it is highly likely that a given user has a unique email address as well, but an email address is not guaranteed to correspond to a specific user.  As a hypothetical case (as I know of no software that actually does this), suppose that an email server was written such that emails to support@company.com would be routed to only one of a group of people (selected randomly or algorithmically) who is thusly assigned to resolve the issue in the body of the mail.  In this case, an email to support@company.com would not necessarily be read by a specific individual.



  • @smbarbour said:

     @RaspenJho said:

    You can't have two duplicate email addresses at the same time. Emails going to that address always route to the same place.

    And I have a feeling that you don't hate to tell me this at all :)

    What about an email address such as it@company.com which routes to multiple mailboxes?  Not all email addresses route to a single mailbox and a single mailbox could have multiple email addresses.

     Granted, it is highly likely that a given user has a unique email address as well, but an email address is not guaranteed to correspond to a specific user.  As a hypothetical case (as I know of no software that actually does this), suppose that an email server was written such that emails to support@company.com would be routed to only one of a group of people (selected randomly or algorithmically) who is thusly assigned to resolve the issue in the body of the mail.  In this case, an email to support@company.com would not necessarily be read by a specific individual.

     

    An email address is seen as a individual entity in those cases, from the outside it doesn't matter how many people could read it.

    But for an ecommerce site, you may want to know that husband and wife have different accounts but use a common email address.  So they have a userid separate from the email.  You track thier purchases through thier userid, not through thier email address even though you send purchase confirmations to the same email address. The system itself doesnt know that the email addresses are the same, nor does it care, it just grabs the email for this userid and sends the confirmation to it.



  • The WTF in my opinion is that instead of modifying the PK to include the error code (which should be linked to a table of valid error codes with descriptions), thus preventing multiple near-instantaneous instances of an identical error from the same device, possibly including a before insert trigger to increment a new #of occurences field should the error already have existed; the decision instead was to make the log time for THAT PARTICULAR error Time+1? That would make the error appear to have occured the day AFTER it actually occured, wouldn't (depending on the way dates are handled.  most databases have a base of 1 = 1 day).  And, what happens if, the day after the error occurs, ANOTHER error occurs?  It wouldn't be logged, because of the pre-existing error that travelled forward in time!!

     



  • @Medezark said:

    The WTF in my opinion is that instead of modifying the PK to include the error code (which should be linked to a table of valid error codes with descriptions), thus preventing multiple near-instantaneous instances of an identical error from the same device, possibly including a before insert trigger to increment a new #of occurences field should the error already have existed; the decision instead was to make the log time for THAT PARTICULAR error Time+1? That would make the error appear to have occured the day AFTER it actually occured, wouldn't (depending on the way dates are handled.  most databases have a base of 1 = 1 day).  And, what happens if, the day after the error occurs, ANOTHER error occurs?  It wouldn't be logged, because of the pre-existing error that travelled forward in time!!

    I believe it's seconds, not days.



  • @KattMan said:

    Natural keys  are NEVER garunteed to be unique.

     Just to nitpick. A 'key' IS something unique.



  • I never suggested that an email address identifies a single person, or even a specific person.



  • @morbiuswilters said:

    @Medezark said:

    The WTF in my opinion is that instead of modifying the PK to include the error code (which should be linked to a table of valid error codes with descriptions), thus preventing multiple near-instantaneous instances of an identical error from the same device, possibly including a before insert trigger to increment a new #of occurences field should the error already have existed; the decision instead was to make the log time for THAT PARTICULAR error Time+1? That would make the error appear to have occured the day AFTER it actually occured, wouldn't (depending on the way dates are handled.  most databases have a base of 1 = 1 day).  And, what happens if, the day after the error occurs, ANOTHER error occurs?  It wouldn't be logged, because of the pre-existing error that travelled forward in time!!

    I believe it's seconds, not days.

    In SQL server, the datetime data type has a precision of 3 milli-seconds, and the smalldatetime data type has a precision of 1 minute.



  • @ochrist said:

    @KattMan said:

    Natural keys  are NEVER garunteed to be unique.

     Just to nitpick. A 'key' IS something unique.

     

    Normally i would agree with you, but the fields normally used in natural keys are rarely if ever unique.  Keep in mind that not even the SSN of a person is garunteed to be unique in practice.  I'm not talking about identity theft, there have been cases where someone has been issued a SSN that someone already has.  As designers we try to choose keys that will be unique and that we believe will be so, but often that uniqueness is comprimised once the system is actually put to use.  I say go ahead and use those fields for an index and searching but provide a truely unique system generated key for use.

     I'm going to remove myself from this conversation with one link.

    http://www.agiledata.org/essays/keys.html



  • @RaspenJho said:

    I never suggested that an email address identifies a single person, or even a specific person.
     

     

    If you agree with that, then I would argue that it would be unwise to use it as a natural key.  I have five different active email addresses (work, gaming, personal, spam, and deprecated spam); your system would view me as five different individuals.  While it is possible to use email as a natural key, I would posit that a system with that design would be a candidate to appear as a featured article on this website.  Just because you can do something doesn't mean you should.  Or that it would be a good idea.



  •  They don't use date or time data types.  The data type of the timelog field is an integer.  The system counts itself.  It has it's own date and time tracking mechanism.  I'm sure the code for it would fall under the category of "crappily reinventing the wheel"



  • @Jayman said:

    @RaspenJho said:

    I never suggested that an email address identifies a single person, or even a specific person.
     

     

    your system would view me as five different individuals.

    Who said I was using email address to identify people at all? All I said was that email addresses are unique.



  •  @RaspenJho said:

    @Jayman said:

    @RaspenJho said:

    I never suggested that an email address identifies a single person, or even a specific person.
     

     

    your system would view me as five different individuals.

    Who said I was using email address to identify people at all? All I said was that email addresses are unique.

    @RaspenJho said:

    Email address is a unique natural key

    In the context of a database, a key is an identifier.  Stating that an email address is a unique natural key would infer that you are stating that an email address is a unique natural identifier (which it is not).



  • Can you have two of the same email addresses at the same time?



  • @KattMan said:

    Keep in mind that not even the SSN of a person is garunteed to be unique in practice.

     

    Not only that, but the Social Security Administration has never guaranteed that SSNs are even unique in theory. They do claim that the composite key (full name, birthdate, SSN) is unique, though I would give that exactly as much credence as any other government guarantee.



  • @RaspenJho said:

    Can you have two of the same email addresses at the same time?
    That depends on the context.  If it is an email server, I would fully expect the server would have constraints to prevent duplicates, but even then it would be as a unique alias that can be assigned to one or more mailboxes.  Outside of the alias table, I would never assume that the email field of a table would have a unique contraint.

    In a server environment, depending upon the DBA's normalization fanaticism, you would have either 2 or three tables that would define email addresses and users:  A users table, and either an alias table that constrains for unique entries on two fields (userid and email) or an alias table and alias mapping table where email addresses are listed in the alias table and the mapping table constrains userid and aliasid.

     


  • Discourse touched me in a no-no place

    @RaspenJho said:

    Can you have two of the same email addresses at the same time?
    Um, like foo@example.com and FOO@example.com? Technically they're different, but most MDAs treat them as the same. In fact MDAs can do whatever they want to link disparate addresses together.

    (Anyone storing those addresses on 3rd party software however should treat them as different. Just in case.)



    Google, for example, not only ignores case, but any periods in the local part, and anything including and after a + in the local part.

    Yahoo groups, on the other hand, require the case sensitivity in the local part when moderating messages by email. A few ESPs MTAs broke this functionality (Comcast spring to mind) by lower-casing the email address on outgoing mail.



  • I'm of the school that says that all DB tables should have an auto-number primary key, even those with "obvious" keys.

    Usually tables with "obvious" keys end up getting duplication sooner or later anyway. The worst offender is systems that use Social Security Numbers as a primary key, since even the Fed. Government says those aren't necessarily unique.

    Anyway, my anal philosophy might make database a few hundred K larger than they'd be otherwise, but you wouldn't have issues like this one.



  • @RaspenJho said:

    Email address is a unique natural key.

    Geni.com thinks so, but they're wrong. My parents are two individuals with a single email address... meaning they can't both be members of Geni.com at the same time.

    Good job demonstrating my point. If you think there's any such thing as a completely unique natural key, you just haven't seen enough data yet-- you're always going to get duplication sooner or later.



  • @KattMan said:

    State codes also would rarely if ever change.
     

    Hey, let's do business in Italy.

    Wait, what's that? There's an Italian province named Matera with the postal code MT? But... but... but Montana already uses MT! What a mess!



  • @blakeyrat said:

    I'm of the school that says that all DB tables should have an auto-number primary key, even those with "obvious" keys.

    Usually tables with "obvious" keys end up getting duplication sooner or later anyway. The worst offender is systems that use Social Security Numbers as a primary key, since even the Fed. Government says those aren't necessarily unique.

    Anyway, my anal philosophy might make database a few hundred K larger than they'd be otherwise, but you wouldn't have issues like this one.

     

     I tend to go that way as well, though I break that rule in our data warehouse; specifically, fact tables in our data warehouse.  Dimension tables, or database tables in apps?  Oh, yeah.  I have a sequencer primary key.



  • @KattMan said:

     I'm going to remove myself from this conversation with one link.

    http://www.agiledata.org/essays/keys.html

     

    That article suggests using SSN as a primary key. That basically proves the author is talking out of his ass.


  • Discourse touched me in a no-no place

    @blakeyrat said:

    I'm of the school that says that all DB tables should have an auto-number
    primary key, even those with "obvious" keys.
    Ah. I don't subscribe to the ethos of introducing keys for the sake of introducing keys.



    You advocate introducing a fake key on a (two column) table to resolve a many-to-many relationship? Where the keys in the tables concerned have their own (artificial) keys?


    <fake key><fake key from table 1><fake key from table2>
    


  • @PJH said:

    @blakeyrat said:
    I'm of the school that says that all DB tables should have an auto-number
    primary key, even those with "obvious" keys.
    Ah. I don't subscribe to the ethos of introducing keys for the sake of introducing keys.



    You advocate introducing a fake key on a (two column) table to resolve a many-to-many relationship? Where the keys in the tables concerned have their own (artificial) keys?


    <fake key><fake key from table 1><fake key from table2>
    

     

    You very well know exactly what I meant, don't be purposefully dumb to zing someone on the Internet, ok?

    That said, I've actually done that before when I was younger, and I don't regret it. I realize how redundant it was now, but at the time I just put in a autonumber primary key by rote without thinking much about it.



  • @KattMan said:

    Keep in mind that not even the SSN of a person is garunteed to be unique in practice.

     

    Here in Thailand they have a guaranteed unique national ID number, much like the SSN. We never used it as a primary key. In our table of thousands of individuals, we found that we had duplicates. New data entry code would issue a warning if the new record being entered had a duplicate national ID, so the clerk could check it. Sometimes it was entered wrong, sometimes it was written down wrong.

    But sometimes the new record is correct, and the old record in the database is wrong. This is for someone we knew five years ago. There is no praactical way to contact him and ask him what his real ID number is. I suppose in theory we should change that to NULL, but I doubt if any SQL database would like NULLs in a primary key field.

     So we use a company-assigned unique ID number. They used to be assigned by hand; now they are computer generated.

    On other subjects:

    The record in the database is data about some entity in the real world. Whether a key is unique depends on what type of entity it is referring to. As has been pointed out, an e-mail address is not unique for identifying human beings. But it is unique for identifying e-mail accounts. 

    Someone said that state codes rarely change. Does anyone know of any instance where a state code changed? I don't.

    Here in Thailand we have province codes, and they never change. But sometimes the government creates a new province by splitting an old province that has grown too big. So Ubon is always and forever the code for Ubon, but half of the people who used to live in Ubon now live in Amnachalon with a new code. The codes don't change but the geographic reality changes.

    Similarly country codes don't change, but countries appear and disappear and get split up. People suddenly find themselves living in  Serbia and not Yugoslavia. The country code is permanent but as an attribute of a given person it is not permanent.

    "MT" is a unique state code for the American state of Montana. But that doesn't mean that MT is unique across all the provinces in all the countries on Earth. We had a squabble with our Indian programmers because every province (=state) in Thailand has a district (=country) named "Muang" (city).  Roiet province has a district named (Roiet) City, Khonkaen province has a district named (Khonkaen) City. We won the argument when they realized that even in India they have duplicate district names.

     

     

     


  • Discourse touched me in a no-no place

    @blakeyrat said:

    @PJH said:

    @blakeyrat said:
    I'm of the school that says that all DB tables should have an auto-number
    primary key, even those with "obvious" keys.
    Ah. I don't subscribe to the ethos of introducing keys for the sake of introducing keys.



    You advocate introducing a fake key on a (two column) table to resolve a many-to-many relationship? Where the keys in the tables concerned have their own (artificial) keys?


    <fake key><fake key from table 1><fake key from table2>
    

     

    You very well know exactly what I meant, don't be purposefully dumb to zing someone on the Internet, ok?

    That said, I've actually done that before when I was younger, and I don't regret it. I realize how redundant it was now, but at the time I just put in a autonumber primary key by rote without thinking much about it.

    a) I'm not being purposefully dumb. You said "*all* DB tables."
    2) I see the answer was yes anyway.


  • Personally, I default to using an auto-increment-type PK unless there's a good reason not to.  And sometimes there is, like for "map tables" that just consist of ID PKs from one table linked to the same from another.  (Suck it, blakeyrat!  Yeah, you like that, don't ya??)

     

    Another thing to take into account is performance: large PKs in certain databases can slow things down.   On large tables, I'd prefer to have a 32-bit int as PK rather than several text columns.

     

    Finally, in an application it's usually easier (and cleaner) to pass around a numeric ID than passing all of the pieces of a natural PK.



  • @morbiuswilters said:

    Another thing to take into account is performance: large PKs in certain databases can slow things down.
     

    That's true.  PKs are typically a clustered index.  An auto-increment PK would place the new record at the end of the index, whereas using a natural PK will incur additional performance penalties on insert where the index is being rearranged to accomodate the new entry.


Log in to reply
 

Looks like your connection to What the Daily WTF? was lost, please wait while we try to reconnect.