Primary Key WTF



  • I've come across a couple of hacks in my career that created database tables where every PK in every table is called "id" or "rid". 

    Is this a database WTF, or am I just assuming that this is bad design because the guys that did this were horrible programmers?  It made writing SQL terribly complicated, and even getting data to render on web pages was a nightmare because you never knew which table a particular "id" belonged to.



  • Depends on how you look at it.

    "StudentID" or "Student.ID"

    You will generally have to identify you which table you are referring to anyway or you SQL code will start to get really confusing

    As far as I have been told SQL along the line of

    SELECT stud.ID, cls.ID

    FROM Student AS stud, Class AS cls

    WHERE std.ClassID = cls.ID;
     

    is the proper way to write it, I might of course be wrong.



  • Why does a primary key called id, make writing SQL terribly complicated?

     

    The only problem I can se, is that you can't make natural joins.

    And I kinda hate natural joins, because they compare all fields that are named the same, so if you got table called club, and an table called player,

    and each player belong to a club, you might want to do a natural join between player and club, BUT if both the club and player table also contains a field called name(Which is normal, because

    both a club, and a player does have a name, but there is no connection between a player name, and a club name), you can't do a natural join between them. (If you care about standard sql that is).

     

    I always thought that the natural join semantic: "If it's called the

    same, then it is the same" was really fucked up. They should do as Oralce do, and do a natural join on forign key, so you just do a

    select * from club natural join player;

    It should join, on the previously defined forigen key).


     

     


     



  • @TunnelRat said:

    I've come across a couple of hacks in my career that created database tables where every PK in every table is called "id" or "rid". 

    Is this a database WTF, or am I just assuming that this is bad design because the guys that did this were horrible programmers?  It made writing SQL terribly complicated, and even getting data to render on web pages was a nightmare because you never knew which table a particular "id" belonged to.

    Either you are using an extremely crappy database system or your SQL skills are crappy.

    select student.id as "StudentID", class.id as "ClassID", student.name "StudentName", class.name "ClassName" from student inner join class on student.classID = class.id

    works just fine on Oracle. (IIRC, you have to replace "xxx" with [xxx] in SQLServer...)

     
    Of course we could discuss the pros and cons of artifical primary keys vs. natural primary keys...
     



  • I've always used `id` for primary key and don't see anything wrong about it. I don't see anything bad in using Table.TableID either, but I'm just not used to it...
    Both schemes are good as long as they are use consistently. Anyone cares about one of them specifically?

    I just find some comfort in knowing that id is id, and nothing else. It makes queries shorter too :)



  • @TunnelRat said:

    It made writing SQL terribly complicated, and even getting data to render on web pages was a nightmare because you never knew which table a particular "id" belonged to.

    This is not a problem.

     



  • I used to use ID for all the primary keys, but now I've changed to using things like ID_MEMBER, ID_TOPIC, ID_MESSAGE, etc. (like SMF does)



  • I use 'id' (always unsigned, generally int or larger) for all my primary keys. The corresponding foreign keys will be named according to what they're pointing at. if I've got a Student and Class table, there'll be an 'id' field in the class table, and however I'm linking students to classes, there'll be a 'classid' field to hold the FK.



  • @MarcB said:

    I use 'id' (always unsigned, generally int or larger) for all my primary keys. The corresponding foreign keys will be named according to what they're pointing at. if I've got a Student and Class table, there'll be an 'id' field in the class table, and however I'm linking students to classes, there'll be a 'classid' field to hold the FK.

    One might hungarian the fk naming: "class_fk", so you can see what the field contains without opening the table and inspecting the data.



  • @dhromed said:

    @MarcB said:

    I use 'id' (always unsigned, generally int or larger) for all my primary keys. The corresponding foreign keys will be named according to what they're pointing at. if I've got a Student and Class table, there'll be an 'id' field in the class table, and however I'm linking students to classes, there'll be a 'classid' field to hold the FK.

    One might hungarian the fk naming: "class_fk", so you can see what the field contains without opening the table and inspecting the data.

    That's latin naming. Hungarian would be "fkclass".



  • @asuffield said:

    That's latin naming. Hungarian would be "fkclass".

    I'm probably not the only one who would pronouce that "fuckless".



  • I can't think of a better name for a field that is designed to be the PK, the single value that is used to distinguish among individual rows. If it's always the same type within a database/scheme (and is used for the same purpose as per the last sentence) I can't imagine why it shouldn't have a single name, "id".

    I'll go as far as saying that every table should have such a row, "id". Makes FKs and other references easy to use...

    edit: I'm no DB expert and am open to suggestions. 



  • @aib said:

    I can't think of a better name for a field that is designed to be the PK, the single value that is used to distinguish among individual rows. If it's always the same type within a database/scheme (and is used for the same purpose as per the last sentence) I can't imagine why it shouldn't have a single name, "id".

    I'll go as far as saying that every table should have such a row, "id". Makes FKs and other references easy to use...

    edit: I'm no DB expert and am open to suggestions. 

    I've been doing that for the last 8 years or so and looking back, in many cases those artificial "id" (we used the "id_tablename" naming scheme Daniel15 described, not that it really matters) columns have been beneficial. There are a few exceptions, though, were it would have been better to omit the artifical key and just use the natural key. As a rule of thumb, the more your own system controls the natural key, the better it is to use it as the primary key. The more influence other systems have on the natural key, the more you should use an artificial ID column as primary key.



  • @asuffield said:

    That's latin naming. Hungarian would be "fkclass".

    Hey! I'm Hungarian, you insensitive clod! 



  • Personally I think calling it WhateverID is similar to Hungarian - redundant and potentially confusing.  It's in the context of its table so ID is a natural name for it.  FKs should be fully named of course.

    On the subject of surrogate ID PKs - There's nothing wrong with them when you have a very large PK.  Just make sure you put a unique constraint & not null on your real PK.



  • @LoztInSpace said:

    Personally I think calling it WhateverID is similar to Hungarian - redundant and potentially confusing.

    Hungarian notation is "type name", and was created in direct contrast to the existing Smalltalk convention of "name type". It's called "Hungarian" specifically because Hungarians use the eastern style of writing their surname before their given name, unlike most other western cultures.

    If you use a type suffix, you have something that isn't Hungarian notation (and is actually older).



  • I like the StudentID/ID_STUDENT naming scheme, myself. Why? Composite primary keys for implementing many-to-many relationships...



  • What is wrong with:

    SELECT Student.ID AS StudentID, Grade.ID as GradeID FROM Student, Grade WHERE Student.ID = Grade.StudentID

    ?

     

    I use ID as primary key for most my tables. TableID actual shows a relation to another table when I use it. 



  • The database I am currently working on contains around 1500 tables.

    I would personally mutilate anyone with the bright idea to name the pk of every table to "ID".
    You don´t want to spend 10 minutes checking the business logic everytime you look at the column named ID in a temp table and wonder what the hell it is referensing

    It would perhaps work in a small application that is web-oriented and has no cross-referencing... but not in a full-scale business system.

    *Selfeplaining* naming standards is a good thing... *Confusing* naming standards kills kittens...



  • The database I am currently working on contains around 1500 tables.

    I would personally mutilate anyone with the bright idea to name the pk of every table to "ID".
    You don´t want to spend 10 minutes checking the business logic everytime you look at the column named ID in a temp table and wonder what the hell it is referensing

    It would perhaps work in a small application that is web-oriented and has no cross-referencing... but not in a full-scale business system.

    *Selfexplaining* naming standards is a good thing... *Confusing* naming standards kills kittens...



  • @LoztInSpace said:

    Personally I think calling it WhateverID is similar to Hungarian - redundant and potentially confusing.

    Trust me: It doesn't fscking matter whether it is "ID" or "WhateverID". Just make sure the naming scheme is consistent.



  • @unfriendly said:

    You don´t want to spend 10 minutes checking the business logic everytime you look at the column named ID in a temp table and wonder what the hell it is referensing

    That's because you're using the name "ID" for something that is not (at least semantically) the primary key of the table. If that column in the temp table is a reference to another table, it should definitely be named something like "<target>_FK" or "<target>_ID".

    I don't see a problem naming PKs "ID" either, as long as you always prefix column references with table references, as you should.



  • @unfriendly said:

    The database I am currently working on contains around 1500 tables.

    I would personally mutilate anyone with the bright idea to name the pk of every table to "ID".
    You don´t want to spend 10 minutes checking the business logic everytime you look at the column named ID in a temp table and wonder what the hell it is referensing

    It would perhaps work in a small application that is web-oriented and has no cross-referencing... but not in a full-scale business system.

    *Selfexplaining* naming standards is a good thing... *Confusing* naming standards kills kittens...

    I'm currently working on an application that is not exactly small and not at all web-oriented. The is a lot of cross-referencing (~average of 1.4 foreign keys per table). 2/3rd of the tables have a primary key column named ID; the other tables have no such column, but compound primary keys instead. Foreign keys are named tablename_id.

    There is nothing confusing about the naming standard.



  • Well, I use pk for primary keys that exist just for the sake of having a primary key, otherwise I use name_id type keys, because thse acutually identiyfy something...
     I agree that using ID everywhere is confusing, but inventing names to primary keys that actually don't identify anything of value data structure wise and should not be used anyway is pointless...



  • @TunnelRat said:

    I've come across a couple of hacks in my career that created database tables where every PK in every table is called "id" or "rid". 

    Is this a database WTF, or am I just assuming that this is bad design because the guys that did this were horrible programmers?  It made writing SQL terribly complicated, and even getting data to render on web pages was a nightmare because you never knew which table a particular "id" belonged to.

     

    I'm not a db programmer, but I know a little bit about db design.  One good reason is that the primary key must always be unique, and an ID field was just a good way to generate a unique number. If you had string fields, such as first name, etc, there would be no way to enforce a unique relationship. The primary key was the way to enforce this. And it also made it easy to store this number in other tables to link the data correctly. Just my opinion, but I don't view it as a WTF.  You do not have to show this value when you present your query results or put them on the reports.



  • @death said:

    Well, I use pk for primary keys that exist just for the sake of having a primary key, otherwise I use name_id type keys, because thse acutually identiyfy something...

    Please give an example. You're clearly not talking about a surrogate key, so I'm having a hard time imagining a case where it makes sense to do such a thing.


  • ♿ (Parody)

    If only some group of industry experts ... perhaps someone like the ISO, or even the IEC ... had gotten together, combined their decades of experience, and developed a standard way for naming things -- i.e., developing metadata -- in a database. Or, heck, not just a database, but all data repositories. Some sort of metadata standard... a Metadata Repository Standard, if you will...

    Oh. Wait. They did. And they called it ISO/IEC 11179.

    ... and not only, according to 11179, is having a column named "ID" wrong (a data element requires more than just a representation term!) -- but chances are, so are most of your other column names.

    I'm curious... who here has heard of ISO/IEC 11179? Now that you're aware of it, will you follow it? If not, why?



  • Is it possible that the forum software is so horrible that even Alex doesn't know how to create links?

     j/k ;)
     



  • @pitchingchris said:

    @TunnelRat said:

    I've come across a couple of hacks in my career that created database tables where every PK in every table is called "id" or "rid". 

    Is this a database WTF, or am I just assuming that this is bad design because the guys that did this were horrible programmers?  It made writing SQL terribly complicated, and even getting data to render on web pages was a nightmare because you never knew which table a particular "id" belonged to.

     

    I'm not a db programmer, but I know a little bit about db design.  One good reason is that the primary key must always be unique, and an ID field was just a good way to generate a unique number. If you had string fields, such as first name, etc, there would be no way to enforce a unique relationship. The primary key was the way to enforce this. And it also made it easy to store this number in other tables to link the data correctly. Just my opinion, but I don't view it as a WTF.  You do not have to show this value when you present your query results or put them on the reports.

    Strings can be unique, too. "Unique" is a constraint, but not a relationship. 



  • @Alex Papadimoulis said:

    If only some group of industry experts ... perhaps someone like the ISO, or even the IEC ... had gotten together, combined their decades of experience, and developed a standard way for naming things -- i.e., developing metadata -- in a database. Or, heck, not just a database, but all data repositories. Some sort of metadata standard... a Metadata Repository Standard, if you will...

    Oh. Wait. They did. And they called it ISO/IEC 11179.

    ... and not only, according to 11179, is having a column named "ID" wrong (a data element requires more than just a representation term!) -- but chances are, so are most of your other column names.

    I'm curious... who here has heard of ISO/IEC 11179? Now that you're aware of it, will you follow it? If not, why?

    Never heard of it, won't read it, won't follow it unless someone actually pays me for doing so.




  • Why am I not surprised that Tunnlerat would have a problem with names within a domain.

    Using just ID as a field name for a particular purpose is not a problem, as long as certain rules are followed. 

    1. ID is used to determine only an auto-generated unique number within the current table.

    2. ID is never used for a foreign key only a local primary key.

    Now if these two rules were not followed I can understand why Tunnlerat, or anyone would have a problem.  But given the problem domain and the rules as given, the following assumptions can easily be made.

    ID is only meaningful within the table it is found in.  When a related table references this field, a more descriptive name should be used.  If views or other queries return this field, it should alias the field appropriately.  This last one is where a lot of people miss and could be the root of Tunnlerat's problem here.

    It is perfectly fine to have a customer table with a field called ID so when properly referenced you see Customer.ID.  When the Orders table links in to the customer table, you have an ID field in orders but that is the ID of the order, the foreign key linking to customer should be called Customer_ID or something similar that tells us where this field links back to.  Following these rules and assumptions, using ID is never confusing.  Breaking these rules makes ID meaningless.



  • Another poster mentioned natural keys.  There has been a lot of debate about natural and auto-numbered keys.  The problem is usually when people become exclusionary about one or the other.

    Auto numbered keys are perfect within the system they are generated in and are fully controlled by that system.  These keys are also only ment for the machine, not the user.

    Natural keys are used when referring to information shared with another system. You build this key from information within the data received.  When importing this data you always add your own information in the form of an auto-generated key. 

    Natural keys can also be used for human interaction.  Natural keys can be displayed meaningfully to the user.  A natural key can be something as simple as First name, Last name.  Natural keys by their very nature are not defined to be unique.  A human can search for someone in the system via first and last name and the natural key helps optimize this search.  The user can receive multiple matches back and view other information about each to decide which one they really want.  Selecting one causes the system to see the internal (auto-generated) unique key so it knows exactly what record the user is editing. 

    The user has no need to know that he is looking at user 829204 but rather John Doe on Some Street in Virginia.  The system uses the unique ID, the user uses the natural key.  They are not mutually exclusive to each other but rather compliment each other very well..  Used properly they make you users more efficient and your data more reliable.



  • @Alex Papadimoulis said:

    I'm curious... who here has heard of ISO/IEC 11179? Now that you're aware of it, will you follow it? If not, why?

    I will not be following it because it's irrelevant busywork. Some other standards that I will not be following include BS8494, ISO19011, and ITU-T Z.150.

    Just because some self-important bureaucrat decided to write down a bunch of rules doesn't make them a good idea, and standards are utterly worthless except in certain special contexts (primarily, a group of people who are all trying to build systems which interact with each other in the respect covered by the document; if it is not a point on which you are seeking interaction, the 'standard' is just so much wasted storage space).



  • @ammoQ said:

    @pitchingchris said:
    @TunnelRat said:

    I've come across a couple of hacks in my career that created database tables where every PK in every table is called "id" or "rid". 

    Is this a database WTF, or am I just assuming that this is bad design because the guys that did this were horrible programmers?  It made writing SQL terribly complicated, and even getting data to render on web pages was a nightmare because you never knew which table a particular "id" belonged to.

     

    I'm not a db programmer, but I know a little bit about db design.  One good reason is that the primary key must always be unique, and an ID field was just a good way to generate a unique number. If you had string fields, such as first name, etc, there would be no way to enforce a unique relationship. The primary key was the way to enforce this. And it also made it easy to store this number in other tables to link the data correctly. Just my opinion, but I don't view it as a WTF.  You do not have to show this value when you present your query results or put them on the reports.

    Strings can be unique, too. "Unique" is a constraint, but not a relationship. 

     Maybe, but we all know that comparing a number is always faster than comparing a string



  • @pitchingchris said:

    we all know that comparing a number is always faster than comparing a string

    Faster as in "you can see how much faster it is without computing the exact number of CPU cycles spent"? 



  • @ammoQ said:

    @pitchingchris said:

    we all know that comparing a number is always faster than comparing a string

    Faster as in "you can see how much faster it is without computing the exact number of CPU cycles spent"? 

    Depending on length of the string, yes.

    Remember, comparing numbers of like type (which database keys can safely be assumed to be) always takes the same amount of time no matter what end of the types limit you are at.  So how many bits does it have to compare?

    Compare that to strings.  Variable length, how many bits per character?  Do we check length first and fall out or simply pad to longest?  This can change if it is a varchar or a char.

    So yes, depending on length and type of string you might be able to see the difference.  Of course the biggest difference is if there was an index created that the query can use for each of these fields.  Having an index makes a huge difference and really negates any other comparison.



  • I like to name my PKs like this:

    uid, rid, mid, ...

    for User id, Region id, Message id, ... respectively. Using just "id" for everything isn't a WTF, just personal preference. It's only a WTF if it becomes inconsistent.



  • a column named "ID" wrong (a data element requires more than just a representation term!)

    I don't consider the typical auto-increment numerical ID PK to be data, so I think I'll pass.



  • @dhromed said:

    a column named "ID" wrong (a data element requires more than just a representation term!)

    I don't consider the typical auto-increment numerical ID PK to be data, so I think I'll pass.

    I agree, it isn't data, it is a unique identifier of data. 



  • @Pidgeot said:

    @death said:

    Well, I use pk for primary keys that exist just for the sake of having a primary key, otherwise I use name_id type keys, because thse acutually identiyfy something...

    Please give an example. You're clearly not talking about a surrogate key, so I'm having a hard time imagining a case where it makes sense to do such a thing.

    I'll bite.  When using replication in SQL Server, a replicated table must have a primary key (at least, with the type of replication that I'm familiar with). I have seen primary keys added in this case just for the sake of having a primary key.  It allowed replication to work, but did not otherwise add anything to the data model in a logical sense.



  • @KattMan said:

    @ammoQ said:
    @pitchingchris said:

    we all know that comparing a number is always faster than comparing a string

    Faster as in "you can see how much faster it is without computing the exact number of CPU cycles spent"? 

    Depending on length of the string, yes.

    Remember, comparing numbers of like type (which database keys can safely be assumed to be) always takes the same amount of time no matter what end of the types limit you are at.  So how many bits does it have to compare?

    Compare that to strings.  Variable length, how many bits per character?  Do we check length first and fall out or simply pad to longest?  This can change if it is a varchar or a char.

    So yes, depending on length and type of string you might be able to see the difference.  Of course the biggest difference is if there was an index created that the query can use for each of these fields.  Having an index makes a huge difference and really negates any other comparison.

    String columns that a sane person would consider for using as a primary key are in most cases relatively short, say in the range of 10-30 characters. I propose you fire up your favourite database system and do some tests to see how much faster a numerical key really is, compared to a string key. Regarding indexes, I don't know a database system that allows for primary keys without an index.



  • @LoztInSpace said:

    Personally I think calling it WhateverID is similar to Hungarian - redundant and potentially confusing.  It's in the context of its table so ID is a natural name for it.  FKs should be fully named of course.

    Actually, I believe it's called breadcrumb naming. Early C struct definitions contained it, I assume because namespace rules for variable names were not well defined yet. For instance, struct stat has members like st_ino, st_mode, and so on. Later, it became clear that such naming only makes the code harder to read; I am guessing there was also a clarification of namespace isolation, namely that struct members and variables can have the same name without causing conflicts. The result is that members of more recently defined structs don't use breadcrumb naming.

    I'm a big fan of making code and queries as easy to read as possible, so I tend to favor "person.id" over "person.personID", but the latter isn't nearly as visually offensive as Hungarian notation.



  • @Alex Papadimoulis said:

    ... and not only, according to 11179, is having a column named "ID" wrong (a data element requires more than just a representation term!) -- but chances are, so are most of your other column names.

     



    What a shock, the standards guys in their ivory tower want me to type users.userid, just like W3C wants me to type <button type="button"></button>.



  • @VGR said:

    Actually, I believe it's called breadcrumb naming. Early C struct definitions contained it, I assume because namespace rules for variable names were not well defined yet. For instance, struct stat has members like st_ino, st_mode, and so on. Later, it became clear that such naming only makes the code harder to read; I am guessing there was also a clarification of namespace isolation, namely that struct members and variables can have the same name without causing conflicts. The result is that members of more recently defined structs don't use breadcrumb naming.

    Except C/C++ enums, whose values are defined as being in the same scope as the enum itself, so have to be prefixed to prevent conflicts. It's one of the few things I hate about the language.

    Others are:

    More complex types needing the variable name mixed into the type. e.g arrays (int arr[10]; instead of int[10] arr;) or function pointers (int (*fp)(int) instead of int(*)(int) fp;).

    No specific "byte" type, so "char" got abused. This meant that unicode-16 (like windows uses) had to get its own type (wchar_t) instead of redefining char as 16-bit.

     



  • @Thief^ said:

    Except C/C++ enums, whose values are defined as being in the same scope as the enum itself, so have to be prefixed to prevent conflicts. It's one of the few things I hate about the language.

    Others are:

    More complex types needing the variable name mixed into the type. e.g arrays (int arr[10]; instead of int[10] arr;) or function pointers (int (fp)(int) instead of int()(int) fp;).

    No specific "byte" type, so "char" got abused. This meant that unicode-16 (like windows uses) had to get its own type (wchar_t) instead of redefining char as 16-bit.

    Well, C is little more than a platform-independend macro assembler. The Real WTF (tm) is that they put C++ on top of it. 



  • @Thief^ said:

    No specific "byte" type, so "char" got abused. This meant that unicode-16 (like windows uses) had to get its own type (wchar_t) instead of redefining char as 16-bit.

    Actually, how we're abusing char is using it for [C] strings. It is a byte type, as I'm sure you know.

     

    Definitely agree on the declaration syntax, why would I want to read the type from the variable name outwards? (Though there might be a technical explanation for that, which I dare not waste time guessing atm)



  • @aib said:

    Definitely agree on the declaration syntax, why would I want to read the type from the variable name outwards? (Though there might be a technical explanation for that, which I dare not waste time guessing atm)

    It's historical. There's no particular reason why it has to be that way around, but the previous one was, and it was copied. Offhand I can't recall which language originated this slightly bizarre syntax. 



  • For me personally, using id as a pk is primarily for those 'Why can't I change my username?' questions which arise sooner or later.



  • @asuffield said:

    @aib said:

    Definitely agree on the declaration syntax, why would I want to read the type from the variable name outwards? (Though there might be a technical explanation for that, which I dare not waste time guessing atm)

    It's historical. There's no particular reason why it has to be that way around, but the previous one was, and it was copied. Offhand I can't recall which language originated this slightly bizarre syntax. 

    I believe you're thinking of B, which was just completely wonky.



  • @ShadowWolf said:

    @asuffield said:
    @aib said:

    Definitely agree on the declaration syntax, why would I want to read the type from the variable name outwards? (Though there might be a technical explanation for that, which I dare not waste time guessing atm)

    It's historical. There's no particular reason why it has to be that way around, but the previous one was, and it was copied. Offhand I can't recall which language originated this slightly bizarre syntax. 

    I believe you're thinking of B, which was just completely wonky.

    B inherits directly from BCPL and Bon, BCPL inherits from CPL, which inherits from ALGOL. Somewhere along the line, this syntax appeared. I'm not sure where.


Log in to reply