Three questions



  • Three topics where I had some discussion recently but where I'd value your opinion.

    1) instead of potentially dozens of base tables, just one; and then a column indicating the type of data (countries, cities, etc). If you need just the countries, there is a view "V_Countries".

    2) use of GUIDs as ID in every table

    3) naming convention: if the table is "Countries" and the columns might be GID,  Name, Inhabitants, then the column names will be Country_GID, Country_Name, Country_Inhabitants.

    Any opinions?



  • @b-redeker said:

    1) instead of potentially dozens of base tables, just one; and then a column indicating the type of data (countries, cities, etc). If you need just the countries, there is a view "V_Countries".

    Defeats the purpose of a relational database. Also, the first terrible symptom of Second System Syndrome.

    If your problem is adequately specified, there should be no problem creating a database schema without making this kind of "flexible" data layout. If it's not, then specify exactly what the data should look like before continuing.

    (This advice assumes you're using a relational database-- if you're using a non-relational database, then I'm not qualified to give advice. You didn't specify.)

    @b-redeker said:

    2) use of GUIDs as ID in every table

    1) Definitely create a primary key that's independent of the data going into the table, 100% of the time. If you think you can get away with a natural key, no matter how simple the table, you're wrong. (Some really common mistakes: Social Security numbers aren't unique, and many people don't have one. Your list of states/provinces could in the future contain country abbreviations to, and some of those conflict. ZIP codes don't match 1:1 with city names, not even ZIP+4 codes. A person with the email address "hotguy78@yahoo.com" in 2010 might be an entirely different person in 2015-- make sure that second person can register an account.)

    2) GUID seems overkill. Just use a bigint. (Again: assuming a relational database. If you're using a key/value store, GUID would be better.)

    @b-redeker said:

    3) naming convention: if the table is "Countries" and the columns might be GID,  Name, Inhabitants, then the column names will be Country_GID, Country_Name, Country_Inhabitants.

    Overkill to put the name of the table in the columns. If you're afraid of getting conflicting names when joining, the dreaded "ambiguous column name" error, just refer to tables by alias. (You should be doing this anyway, it's a good practice.)

    I would be overjoyed if someone plugged in the logic to say "hey, if you're inner-joining the two tables on that column name, then it's not really ambiguous is it?" But... that's not in there now.

    (And, yet again, this is all assuming a relational database.)



  • @blakeyrat said:

    (And, yet again, this is all assuming a relational database.)

    It is, SQL-Server 2008. You make exactly the same points as I did; but I wasn't able to convince my colleague.

    Unfortunately, he is also my boss. Usually I would say: "RUN", but I'm not prepared to do that, for a number of reasons: it's actually a very cool company with very cool people, this guy is also a pretty good guy (and very sharp), but for some reason I could't get my points across. Hence I wanted to get a quick check that I wasn't missing anything obvious.

    Oh well.



  • Regarding this:

    @b-redeker said:

    1) instead of potentially dozens of base tables, just one; and then a column indicating the type of data (countries, cities, etc). If you need just the countries, there is a view "V_Countries".
     

    Why does your boss consider it preferable to do it this way rather than having separate tables?



  • @b-redeker said:

    @blakeyrat said:

    (And, yet again, this is all assuming a relational database.)

    It is, SQL-Server 2008. You make exactly the same points as I did; but I wasn't able to convince my colleague.

    Unfortunately, he is also my boss. Usually I would say: "RUN", but I'm not prepared to do that, for a number of reasons: it's actually a very cool company with very cool people, this guy is also a pretty good guy (and very sharp), but for some reason I could't get my points across. Hence I wanted to get a quick check that I wasn't missing anything obvious.

    Oh well.

    Your boss shouldn't be your boss unless he's read The Mythical Man-Month, which talks about Second System Syndrome in detail.



  • @blakeyrat said:

    @b-redeker said:

    @blakeyrat said:

    (And, yet again, this is all assuming a relational database.)

    It is, SQL-Server 2008. You make exactly the same points as I did; but I wasn't able to convince my colleague.

    Unfortunately, he is also my boss. Usually I would say: "RUN", but I'm not prepared to do that, for a number of reasons: it's actually a very cool company with very cool people, this guy is also a pretty good guy (and very sharp), but for some reason I could't get my points across. Hence I wanted to get a quick check that I wasn't missing anything obvious.

    Oh well.

    Your boss shouldn't be your boss unless he's read The Mythical Man-Month, which talks about Second System Syndrome in detail.

    To go a bit more in-depth, because I have insomnia and I can't get to sleep anyway...

    Second System Syndrome is basically over-engineering and architecting a software product to flexibly handle so many contingencies that you lose sight entirely of what the product was supposed to actually *do* in the first place. Joel actually called this "architecture astronauts," which I think pretty much describes the same phenomenon.

    So here's what your boss is doing, remembering that we're starting with a perfectly fine RDMS with features like constraints, data types (don't laugh-- SQLite lacks these), indexes, triggers, etc. The whole package.

    He's created a single master table that looks something like:

    Master
    ------
    MasterID GUID
    DataKind nvarchar(10)
    Data nvarchar(max)

    Then he's created a bunch of views that access this table, but each view only shows one specific DataKind. So far so good. Selects, Inserts, Updates all work (on the View), things are going decently fast to start out, etc.

    But wait a minute, it turns out that the product only deals with the US, and so the DataKind "zipcode" always needs to be a numeric value. Some of your interns are manually entering that data, but because of a bug in the verification, they typoed a few-- Joe Six-Pack lives in the "9021-" zipcode? Obviously the dash is supposed to be a zero... let's add something to our table to enforce the type:

    Master
    ------
    MasterID GUID
    DataKind nvarchar(10)
    DataType nvarchar(10)
    Data nvarchar(max)

    Fixed! But, oh wait, still not. Inserting into the view now isn't sufficient to enforce typing, because the RDMS isn't going flag an error, it has no way of knowing you want to enforce that type. Well, we can write a sproc maybe that takes a look at the input value, takes a look at the type that corresponds with that DataKind, and barfs if the value doesn't match the type. It works, whee, but things are running a lot slower now.

    You've done tons of work to replicate a built-in, completely "free", feature of your RDMS... except your implementation is orders of magnitude slower, and significantly more prone to failure. And that's just adding strict typing, the most basic of features-- imagine how you'd add, say, a Trigger or Constraint in this situation. It would be painful, if possible at all. Imagine how you'd have to improve search performance, once that Master table had a few million records-- an index would be useless on a table like that.

    Basically, you've taken a perfectly functional and fast database and built a slow and feature-bare database on top of it.

    The only possibly excuse for doing something like this is, "well, we don't know exactly what the data will look like when the product is finished." There's two possibilities here:
    1) Your product is insufficiently specified.
    2) Your DBA sucks ass.

    Take your pick. Your product should be specified enough that he data schema is obvious-- if it isn't, go back to the drawing board. If, on the other hand, you're a big believer in Agile or Scrum or some other crap, then your DBA should be competent enough to keep up with the code changes-- if he isn't, fire him.

    But implementing a crappy database atop a good database is never a good idea.

    (That all said, what your boss is doing would probably be fine in a key/value store, like Amazon SimpleDB or another NoSQL system. It's just stupid in a RDMS.)



  • @b-redeker said:

    1) instead of potentially dozens of base tables, just one; and then a column indicating the type of data (countries, cities, etc). If you need just the countries, there is a view "V_Countries".
     

    This is an inner-platform that worse in every conceivable way and offers no benefits.

    @b-redeker said:

    2) use of GUIDs as ID in every table

    Offers no benefits and is not guaranteed to be unique. So, it's worse.

    @b-redeker said:

    3) naming convention: if the table is "Countries" and the columns might be GID,  Name, Inhabitants, then the column names will be Country_GID, Country_Name, Country_Inhabitants.

    Offers no benefits and forces more typing. Worse.

     

    Your boss doesn't know a lot about practical data models, does he?

     

    People who do this seem to suffer from a special kind of abstractionitis. They appear to conceive of the RDB(MS) as the hard, given part of reality, from which a system must be devised. But that's a dream too deep*! You have reality, with its requirements for data storage, and there's the already devised system, the RDBMS, which one must use to store this data. It's reinventing the wheel... using wheels. It's going to be a shitty wheel.

     

    *) if that mediocre thing Inception gave us anything, it's more comfortable mental pop-vocabulary to talk about levels of abstraction.



  • Other people have already made some very cogent points, so I'll try to be brief.

    @b-redeker said:

    1) instead of potentially dozens of base tables, just one; and then a column indicating the type of data (countries, cities, etc). If you need just the countries, there is a view "V_Countries".

    Bad, [b]unless[/b] a significant part of the data to be stored is the same across all data types. In that case it's not necessarily a bad idea to have the common information in a big single table.

    3) naming convention: if the table is "Countries" and the columns might be GID,  Name, Inhabitants, then the column names will be Country_GID, Country_Name, Country_Inhabitants.

    I don't mind this much. Some of the databases I work with do this, and while it makes it tedious to type out the joins when I'm writing queries, it's a rather minor issue and not worth fighting over. And it does make it instantly clear where a column is coming from when you have to debug a query you've never seen before.


  • @blakeyrat said:

    "hey, if you're inner-joining the two tables on that column name, then it's not really ambiguous is it?"
    That's what NATURAL JOIN and USING do.



  • @dhromed said:

    It's reinventing the wheel... using wheels. It's going to be a shitty wheel.

    Show him a diagram of this thing. It's basically what he's building.



  • @Lingerance said:

    @blakeyrat said:
    "hey, if you're inner-joining the two tables on that column name, then it's not really ambiguous is it?"
    That's what NATURAL JOIN and USING do.

    Not sure if you're legitimately providing advice, or making a subtle "ha ha Microsoft sucks" dig, but SQL Server (the RDMS we're all talking about) doesn't support either of those. Its documentation says that an INNER JOIN is treated as a NATURAL JOIN if the columns you're joining on have the same name, but that doesn't prevent the ambiguous column name error when you refer to that column.



  • I can think of another reason not to do this -- That master table is going to be HUGE and have lots of wasted space as the entities which should have been other tables propagate.



  • @blakeyrat said:

    @dhromed said:
    It's reinventing the wheel... using wheels. It's going to be a shitty wheel.

    Show him a diagram of this thing. It's basically what he's building.

     

    That thing is awesome.


  • ♿ (Parody)

    Your boss is right and you are wrong. Obviously, he has more experience and is likely more skilled than you are (he is the boss, afterall), and any opinion you provide is naive because he knows more than do. Of course, mathematically speaking, your boss can't be the most knowledgable person in the industry, and there has to exist someone who has more experience and is even more skilled than he. Impossible sounding, I know, but bare with me.

    Once such person who most certainly has more experience and is more skilled than your boss is Joe Celko. Mr. Celko also happens to be one of my personal heros. Referencing him might work, unless your boss says "I know more than him; he's written several books, which clearly shows he's from academia, and I'm from the real world." No kidding... I've heard bosses say that before.

    @b-redeker said:

    1) instead of potentially dozens of base tables, just one; and then a column indicating the type of data (countries, cities, etc). If you need just the countries, there is a view "V_Countries".

    As Mr. Celko would put it, that's a "Spears, Automobiles and Squids" table. Note that that link goes to a fan site. Yes, that's how awesome Mr. Celko is. And no, that's not my fan site... though, I would totally have made one if it didn't exist already!

    @b-redeker said:

    2) use of GUIDs as ID in every table

    I'll let the master do all the talking here.

    @b-redeker said:

    3) naming convention

    If only someone had come up with a naming convention standard already... oh wait... ISO-11179. Of course, I learned about that standard via Mr. Celko, and he has some very colorful responses regarding it.



  • @Alex Papadimoulis said:

    @b-redeker said:

    2) use of GUIDs as ID in every table

    I'll let the master do all the talking here.

    Alex, I love you to death, but your idol could not be more dead wrong on this one.

    Unless you're storing miracle data from miracle-ville, the odds you'll see a suitable natural key in your entire lifetime approach zero. Or, alternatively, the only way to squeeze a natural key into a table of real-life-based data will involve so many columns, you'll lose every benefit of having a primary key in the first place.

    (And some of his complaints are... pretty goofy. "You can't insert a value into it." Well: 1) You can if you turn on identity insert, which is off by default as a safety mechanism, and 2) If you have to insert values into it, you're doing something wrong.)

    Frankly, based on that newsgroup post, I'd agree with the people who say he's arguing from an academic perspective with little experience of real-world databases.


  • ♿ (Parody)

    @blakeyrat said:

    your idol could not be more dead wrong on this one
     

    Yeah, he rails quite a bit on surrogate keys in general, but I think there was some good points about GUIDs. Or maybe not (caugt me! I skimmed it) .

    @blakeyrat said:

    Unless you're storing miracle data from miracle-ville, the odds you'll see a suitable natural key in your entire lifetime approach zero. Or, alternatively, the only way to squeeze a natural key into a table of real-life-based data will involve so many columns, you'll lose every benefit of having a primary key in the first place.

    Natural keys are definitely out there, but outright saying "never a natural key" makes them difficult to see:

    • Orders
      • Good - { Order_Number (PK), Customer_Number (FK), Order_Date, Approved_Date }
      • Bad  - { Order_Id (PK), Order_Number, Customer_Id (FK), Order_Date, Approved_Date }
    • OrderItems
      • Good { Order_Number (PK/FK), Item_Sku (PK/FK), Ordered_Qty, Filled_Qty, List_Amount }
      • Bad { OrderItem_Id (PK), Order_Id (FK), Item_Id (FK), Ordered_Qty, Filled_Qty, List_Amount }
    • OrderItemActions
      • Good { Order_Number (PK/FK), Item_Sku (PK/FK), Action_Code (PK/FK), Occurred_Date, Notes_Text }
      • Bad { OrderItemAction_Id (PK), OrderItem_Id (FK), Action_Id (FK), Occurred_Date, Notes_Text }

    It all really depends on the data you're working with. With abstract concepts that have no real-world counterpart (say, ForumPostId), a sequence/auto_id are the only way to go. But there are a lot of natural keys out there when you build systems around real-world (especially business) processes.... they just need to be researched.



  •  @blakeyrat said:

    @Alex Papadimoulis said:

    @b-redeker said:

    2) use of GUIDs as ID in every table

    I'll let the master do all the talking here.

    Alex, I love you to death, but your idol could not be more dead wrong on this one.

    Unless you're storing miracle data from miracle-ville, the odds you'll see a suitable natural key in your entire lifetime approach zero. Or, alternatively, the only way to squeeze a natural key into a table of real-life-based data will involve so many columns, you'll lose every benefit of having a primary key in the first place.

    (And some of his complaints are... pretty goofy. "You can't insert a value into it." Well: 1) You can if you turn on identity insert, which is off by default as a safety mechanism, and 2) If you have to insert values into it, you're doing something wrong.)

    Frankly, based on that newsgroup post, I'd agree with the people who say he's arguing from an academic perspective with little experience of real-world databases.

    Wouldn't something like username be an acceptable primary key? The key is guaranteed to be unique just as long as the associated data is. Same with things like email addresses. Sure, the owner of the email address might change, but how is that your problem? You're already ensuring that they're unique, right? Otherwise, a person could register multiple accounts with the same email address.

    Note: I am not a DBA, nor do I work with databases, so I may be off the mark. If so, please elucidate.



  • @pkmnfrk said:

    Wouldn't something like username be an acceptable primary key? The key is guaranteed to be unique just as long as the associated data is. Same with things like email addresses. Sure, the owner of the email address might change, but how is that your problem? You're already ensuring that they're unique, right? Otherwise, a person could register multiple accounts with the same email address.

    As a UNIQUE KEY sure, but as the PK hell no. Something with a constant length should be used as the PK. A [GU]UID is simply a 128-bit integer with undeterminable order (which is great when you have geographically seperate databases that get synced and don't want issues if it gets used as an FK). Mind you there are ways to have SERIAL-like types inserted into multiple masters without worry of collision (MySQL has offsets and skip values for this).



  • @pkmnfrk said:

    Wouldn't something like username be an acceptable primary key? The key is guaranteed to be unique just as long as the associated data is. Same with things like email addresses. Sure, the owner of the email address might change, but how is that your problem? You're already ensuring that they're unique, right? Otherwise, a person could register multiple accounts with the same email address.

    Note: I am not a DBA, nor do I work with databases, so I may be off the mark. If so, please elucidate.

    Username: if you're ok decreeing that they must be unique, then that might be appropriate. (although joining on varchars is slow.) If you merge your system with another that also uses username, though, suddenly you have duplicate keys ahoy! Rare, but not as rare as you'd think-- I'd you used an identity, you could just import the other set of username into your database with minimal effort.

    Email: my parents (two people) have one email address. And, as I've mentioned above, an email address referring to Bob today could refer to Steve tomorrow. Make sure your app can cope with both of those scenarios.


  • ♿ (Parody)

    @blakeyrat said:

    (although joining on varchars is slow.)

    No way, it's crazy fast!

    @blakeyrat said:

    If you used an identity, you could just import the other set of username into your database with minimal effort

    Not to be nitpicky here... but I'm pretty sure the effort would be an equal pain in the ass. The SEQUENCES would likely collide, meaning UserID=31 could refer to two accounts.



  • @blakeyrat said:

    @pkmnfrk said:
    Wouldn't something like username be an acceptable primary key? The key is guaranteed to be unique just as long as the associated data is. Same with things like email addresses. Sure, the owner of the email address might change, but how is that your problem? You're already ensuring that they're unique, right? Otherwise, a person could register multiple accounts with the same email address.

    Note: I am not a DBA, nor do I work with databases, so I may be off the mark. If so, please elucidate.

    Username: if you're ok decreeing that they must be unique, then that might be appropriate. (although joining on varchars is slow.) If you merge your system with another that also uses username, though, suddenly you have duplicate keys ahoy! Rare, but not as rare as you'd think-- I'd you used an identity, you could just import the other set of username into your database with minimal effort.

    Email: my parents (two people) have one email address. And, as I've mentioned above, an email address referring to Bob today could refer to Steve tomorrow. Make sure your app can cope with both of those scenarios.

    Well, if your parents (two people) share an email account, they're they're almost certainly going to share a <whatever> account as well. But, even ignoring the social issues, there are technical issues with sharing emails between accounts. What if a spammer registers 12,000 accounts with one email? What if you need to send sensitive information to one account? What if...?

    Remember that the onus is on Bob to keep his email up to date, so that Steve doesn't get his bank statements. There's no way your system can detect that lonerboi2345@hotmail.com doesn't belong to Bob any more.

    Regardless, my way of coping with these issues is pretty easy:

    That email address is already in use. If you already have an account with use, please log in. If not, please contact our technical support.



  • @Alex Papadimoulis said:

    Not to be nitpicky here... but I'm pretty sure the effort would be an equal pain in the ass. The SEQUENCES would likely collide, meaning UserID=31 could refer to two accounts.

    Well, in that scenario, you don't import the ID field, you just import the new usernames with some flag. So, you add a boolean to your table saying whether the user came from your system or the foreign system, then import all the foreign records with that boolean set. Of course, then you have to edit your login system to include a checkbox for what system they're logging in to, you would also have to make sure, when you imported all the related tables, that the userIDs were updated appropriately. but... it's still easier than trying to resolve all the username conflicts.

    @pkmnfrk said:

    Well, if your parents (two people) share an email account, they're they're almost certainly going to share a <whatever> account as well. But, even ignoring the social issues, there are technical issues with sharing emails between accounts. What if a spammer registers 12,000 accounts with one email? What if you need to send sensitive information to one account? What if...?

    All I'm saying is that there's nothing in the email rulebook that says 1 address = 1 person. Just as there's nothing in real life saying that 1 street address = 1 person. I'm talking about a system that models data from real life-- in real life, there are email addresses shared my multiple people.

    If you want to ignore that, and enforce unique emails, well, you're welcome to do that. But you'd be losing business from people like my parents, so you're artificially limiting your customer-base.

    @pkmnfrk said:

    Remember that the onus is on Bob to keep his email up to date, so that Steve doesn't get his bank statements. There's no way your system can detect that lonerboi2345@hotmail.com doesn't belong to Bob any more.

    Again, you're welcome to do whatever you like with your own system. But in real life, the same email address can change hands. My step-sister is using the email address that used to belong exclusively to her boyfriend before they broke up and she moved all the way across the country. So, again, I know for a fact that this does happen in real-world data.



  • @blakeyrat said:

    @pkmnfrk said:
    Well, if your parents (two people) share an email account, they're they're almost certainly going to share a account as well. But, even ignoring the social issues, there are technical issues with sharing emails between accounts. What if a spammer registers 12,000 accounts with one email? What if you need to send sensitive information to one account? What if...?

    All I'm saying is that there's nothing in the email rulebook that says 1 address = 1 person. Just as there's nothing in real life saying that 1 street address = 1 person. I'm talking about a system that models data from real life-- in real life, there are email addresses shared my multiple people.

    If you want to ignore that, and enforce unique emails, well, you're welcome to do that. But you'd be losing business from people like my parents, so you're artificially limiting your customer-base.

    It's hardly artificial. It's good security! These days, it's expected that everyone has a personal email account. And, it's not that unreasonable. Unlike a physical address, which you tend to inherit for the first 20 or so years of your life, and costs $500 000 to get a new one, email addresses are free.

    And, honestly, how many sites do your parents have two distinct accounts with one email address?

    @blakeyrat said:

    @pkmnfrk said:
    Remember that the onus is on Bob to keep his email up to date, so that Steve doesn't get his bank statements. There's no way your system can detect that lonerboi2345@hotmail.com doesn't belong to Bob any more.

    Again, you're welcome to do whatever you like with your own system. But in real life, the same email address can change hands. My step-sister is using the email address that used to belong exclusively to her boyfriend before they broke up and she moved all the way across the country. So, again, I know for a fact that this does happen in real-world data.

    Again, how am I supposed to build a system that knows this or cares? If [b]he[/b] gives the email account to [b]her[/b], then [b]he[/b] should update all the sites that think the email is [b]him[/b] to another email that really [i]is[/i] [b]him[/b]. If not, he's asking for trouble.



  • @pkmnfrk said:

    It's hardly artificial. It's good security! These days, it's expected that everyone has a personal email account. And, it's not that unreasonable. Unlike a physical address, which you tend to inherit for the first 20 or so years of your life, and costs $500 000 to get a new one, email addresses are free.

    Dude, re-freakin'-lax.

    I'm not telling you what to do. All I'm saying is that a database that models *REAL WORLD* data would have to include provisions for those two quirks of email addresses, being as they're things seen *IN THE REAL WORLD*.

    You don't have to care. You don't have to follow my advice. I'm not holding a gun to your head. Just shake your head, mumble "man that Blakeyrat guy is really off the rails" to yourself, and move on.

    k?



  • @blakeyrat said:

    @pkmnfrk said:
    It's hardly artificial. It's good security! These days, it's expected that everyone has a personal email account. And, it's not that unreasonable. Unlike a physical address, which you tend to inherit for the first 20 or so years of your life, and costs $500 000 to get a new one, email addresses are free.

    Dude, re-freakin'-lax.

    I'm not telling you what to do. All I'm saying is that a database that models *REAL WORLD* data would have to include provisions for those two quirks of email addresses, being as they're things seen *IN THE REAL WORLD*.

    You don't have to care. You don't have to follow my advice. I'm not holding a gun to your head. Just shake your head, mumble "man that Blakeyrat guy is really off the rails" to yourself, and move on.

    k?

    I'm quite relaxed. I'm just curious to know of [i]one[/i] *REAL WORLD* application where duplicate email addresses are permitted.



  • @pkmnfrk said:

    I'm quite relaxed. I'm just curious to know of one REAL WORLD application where duplicate email addresses are permitted.

    We have a group email that's used for support, that's the email we give to customers, we each have an account in our customer's system which points to said email address.



    Edit: Forgot to point out we don't actually access the inbox. All mails get put into an issue tracking system.



  • @Lingerance said:

    We have a group email that's used for support, that's the email we give to customers, we each have an account in our customer's system which points to said email address.



    Edit: Forgot to point out we don't actually access the inbox. All mails get put into an issue tracking system.

    That's not what I mean. I'm taking about registering multiple accounts in a system, with a single address as "my address".



  • @pkmnfrk said:

    @Lingerance said:
    We have a group email that's used for support, that's the email we give to customers, we each have an account in our customer's system which points to said email address.



    Edit: Forgot to point out we don't actually access the inbox. All mails get put into an issue tracking system.

    That's not what I mean. I'm taking about registering multiple accounts in a system, with a single address as "my address".

    That's exactly what's happening here. There are multiple accounts with the same address.


  • @Lingerance said:

    @pkmnfrk said:
    @Lingerance said:
    We have a group email that's used for support, that's the email we give to customers, we each have an account in our customer's system which points to said email address.



    Edit: Forgot to point out we don't actually access the inbox. All mails get put into an issue tracking system.

    That's not what I mean. I'm taking about registering multiple accounts in a system, with a single address as "my address".

    That's exactly what's happening here. There are multiple accounts with the same address.

    Fine. I'm a contractor who works for an advertising agency with multiple clients. One of my specialities is implementing multi-variate tests. Many of our clients use a product whose name rhymes with Badobe Harmniture Pest and Farkit. That's a horrible rhyme, but ... whatever, bear with me.

    When I'm doing work with these clients, I have to log in to the Pest and Farkit tool... their login is email+password. The problem is that:
    1) They only allow unique emails (across all clients)
    2) They don't have any way to link a single email to multiple clients

    So if I do Pest and Farkit work for 5 clients, I (a single human being, I remind you), need to have 5 different email addresses to use. I've already used up all of the domains our company uses, meaning when I sign a new Pest and Farkit client I literally have a register a new free email address with Gmail or Hotmail to service them-- which of course makes me look completely unprofessional.

    Is this a huge problem? No, it's worked-around pretty easily. But it only exists because Harmniture made two assumptions that aren't true:
    1) The only person using ClientA's account would work for ClientA
    2) A single email address can't represent multiple accounts

    There's a real-world "it happens every couple months" example of the sort of poor DB planning I'm talking about. Now you may regale me with a hundred "well I'd never design something that dumb" defenses.



  • @blakeyrat said:

    Fine. I'm a contractor who works for an advertising agency with multiple clients. One of my specialities is implementing multi-variate tests. Many of our clients use a product whose name rhymes with Badobe Harmniture Pest and Farkit. That's a horrible rhyme, but ... whatever, bear with me.

    When I'm doing work with these clients, I have to log in to the Pest and Farkit tool... their login is email+password. The problem is that:
    1) They only allow unique emails (across all clients)
    2) They don't have any way to link a single email to multiple clients

    So if I do Pest and Farkit work for 5 clients, I (a single human being, I remind you), need to have 5 different email addresses to use. I've already used up all of the domains our company uses, meaning when I sign a new Pest and Farkit client I literally have a register a new free email address with Gmail or Hotmail to service them-- which of course makes me look completely unprofessional.

    Is this a huge problem? No, it's worked-around pretty easily. But it only exists because Harmniture made two assumptions that aren't true:
    1) The only person using ClientA's account would work for ClientA
    2) A single email address can't represent multiple people

    There's a real-world "it happens every couple months" example of the sort of poor DB planning I'm talking about. Now you may regale me with a hundred "well I'd never design something that dumb" defenses.

    You're right, I would never design something that dumb, primarily because I wouldn't design anything of that nature to begin with.

    Either way, although I can't parse what product you may be referring to, I will admit that is a good example. (Although, there are plenty of proper ways to circumvent that properly as well. For example, I assume your company's email server supports more than one email address per inbox.)

    (Actually, now that I'm thinking about it, why would this be a problem in the first place? Since I don't know the specific product, I am guessing here. If Harmniture is hosted by Badobe, the bad assumption would be that you only have one client. If it's hosted by the client, then why would multiple clients be using it? If it's hosted by you, FOR your clients, again, the assumption is that you can't have multiple clients.)

    Regardless, this is all academic, because I am not in charge of designing this kind of stuff. I am simply going off of every website that requires registration that I have registered for.



  • @pkmnfrk said:

    (Although, there are plenty of proper ways to circumvent that properly as well. For example, I assume your company's email server supports more than one email address per inbox.)

    If the system were well-designed, we wouldn't have to screw around with our perfectly well-designed system to interact with it.

    @pkmnfrk said:

    (Actually, now that I'm thinking about it, why would this be a problem in the first place? Since I don't know the specific product, I am guessing here. If Harmniture is hosted by Badobe, the bad assumption would be that you only have one client. If it's hosted by the client, then why would multiple clients be using it? If it's hosted by you, FOR your clients, again, the assumption is that you can't have multiple clients.)

    It's hosted by Badobe. Clients pay Badobe to use the product. Badobe links emails to clients with a 1:1 relationship, so any given email address can only be in one client at a time. Apparently, the moron who designed this system never heard of the concept of "consulting".



  • @blakeyrat said:

    @pkmnfrk said:
    (Although, there are plenty of proper ways to circumvent that properly as well. For example, I assume your company's email server supports more than one email address per inbox.)

    If the system were well-designed, we wouldn't have to screw around with our perfectly well-designed system to interact with it.

    I agree 100%. But the issue isn't email addresses, it's more fundamental than that.

    @blakeyrat said:

    @pkmnfrk said:
    (Actually, now that I'm thinking about it, why would this be a problem in the first place? Since I don't know the specific product, I am guessing here. If Harmniture is hosted by Badobe, the bad assumption would be that you only have one client. If it's hosted by the client, then why would multiple clients be using it? If it's hosted by you, FOR your clients, again, the assumption is that you can't have multiple clients.)

    It's hosted by Badobe. Clients pay Badobe to use the product. Badobe links emails to clients with a 1:1 relationship, so any given email address can only be in one client at a time. Apparently, the moron who designed this system never heard of the concept of "consulting".

    The issue is that Badobe doesn't allow you to have more than one client. Bah, Badobe's only good product is Badobe Rotohop. :\



  • @pkmnfrk said:

    The issue is that Badobe doesn't allow you to have more than one client. Bah, Badobe's only good product is Badobe Rotohop. :</blockquote>

    Actually Pest and Farkit is a good product, other than the login issue. But that's mostly because Badobe just bought Harmniture and hasn't had time to ruin it yet. I didn't want to give the impression the entire product is awful, just the login system for it.



  • @blakeyrat said:

    @pkmnfrk said:
    (Although, there are plenty of proper ways to circumvent that properly as well. For example, I assume your company's email server supports more than one email address per inbox.)
    If the system were well-designed, we wouldn't have to screw around with *our* perfectly well-designed system to interact with it. @pkmnfrk said:
    (Actually, now that I'm thinking about it, why would this be a problem in the first place? Since I don't know the specific product, I am guessing here. If Harmniture is hosted by Badobe, the bad assumption would be that you only have one client. If it's hosted by the client, then why would multiple clients be using it? If it's hosted by you, FOR your clients, again, the assumption is that you can't have multiple clients.)
    It's hosted by Badobe. Clients pay Badobe to use the product. Badobe links emails to clients with a 1:1 relationship, so any given email address can only be in one client at a time. Apparently, the moron who designed this system never heard of the concept of "consulting".

    You can't set up aliases on your e-mail system to route multiple disparate e-mails to your inbox?

     



  • @Medezark said:

    @blakeyrat said:

    @pkmnfrk said:
    (Although, there are plenty of proper ways to circumvent that properly as well. For example, I assume your company's email server supports more than one email address per inbox.)
    If the system were well-designed, we wouldn't have to screw around with our perfectly well-designed system to interact with it. @pkmnfrk said:
    (Actually, now that I'm thinking about it, why would this be a problem in the first place? Since I don't know the specific product, I am guessing here. If Harmniture is hosted by Badobe, the bad assumption would be that you only have one client. If it's hosted by the client, then why would multiple clients be using it? If it's hosted by you, FOR your clients, again, the assumption is that you can't have multiple clients.)
    It's hosted by Badobe. Clients pay Badobe to use the product. Badobe links emails to clients with a 1:1 relationship, so any given email address can only be in one client at a time. Apparently, the moron who designed this system never heard of the concept of "consulting".

    You can't set up aliases on your e-mail system to route multiple disparate e-mails to your inbox?

     

    I can't, no. I don't work in IT, and I don't have access to the Exchange servers, and even if I could it's been a long time since I've touched Exchange and I'd have to look up now.

    But the important thing to me is: STOP DEFENDING THIS POORLY-WRITTEN SOFTWARE THAT REQUIRES STUPID WORKAROUNDS FOR ME TO USE.

    Seriously. This is why we can't have nice things. You don't have to loudly gripe about it, like I do, but at the very least don't defend their moronic design decisions-- just realize the product is broken for a large percentage of their users, and move on.



  • @Alex Papadimoulis said:

    @blakeyrat said:

    your idol could not be more dead wrong on this one
     

    Yeah, he rails quite a bit on surrogate keys in general, but I think there was some good points about GUIDs. Or maybe not (caugt me! I skimmed it) .

     

    I'd like to come back to this one for a bit, even though Blakey ranting is always entertaining ;)

    Celko has a lot of points that make little sense (or can be countered, like the clustering), but the portability issue has some merit.I googled a bit more and for instance Jeff Atwood has a slightly more interesting list of pros and cons; and there's much more. So I'll give this one a bit more thought and not discard it automatically (either way).



  • @pkmnfrk said:

    I'm quite relaxed. I'm just curious to know of one REAL WORLD
    application where duplicate email addresses are permitted.

    My company's clients are (in the main) mortgage brokers. It's not at all uncommon for a mortgage broking company to use a single email address for all its brokers, but they all need to have separate accounts in our system so that, for instance, the correct broker's name and details can be put on mortgage applications they create.



  • @b-redeker said:

    Three topics where I had some discussion recently but where I'd value your opinion.

    1) instead of potentially dozens of base tables, just one; and then a column indicating the type of data (countries, cities, etc). If you need just the countries, there is a view "V_Countries".

    2) use of GUIDs as ID in every table

    3) naming convention: if the table is "Countries" and the columns might be GID,  Name, Inhabitants, then the column names will be Country_GID, Country_Name, Country_Inhabitants.

    Any opinions?

    #1 has already been covered, this idea must have been the result of a sharp blow to the head.

    #2 -- I have switched to using GUIDs most of the time.  I write mostly real three-tier applications and waiting until the row makes it to the database to know the ID is cumbersome.  With GUIDs, I can simply create it in any layer or let the database do it if I don't care.  A lot of people have mention clustering hot-spots... are you people actually making your primary key the clustered index?  The only time this is useful is when you want to favor insert performance over select performance, which is a rare situation in my experience.  If the clusering key is a foreign key, then it will massively speed up reporting and automatically fix the hot spot and fragmentation issues.

    #3 -- Isn't this just the retardedness of hungarian naming, taken to a new level of crazy, and applied to the database.  I see the same trend in the view naming in point #1, please make it stop.



  • @Jaime said:

    #2 -- I have switched to using GUIDs most of the time.  I write mostly real three-tier applications and waiting until the row makes it to the database to know the ID is cumbersome.  With GUIDs, I can simply create it in any layer or let the database do it if I don't care.  A lot of people have mention clustering hot-spots... are you people actually making your primary key the clustered index?  The only time this is useful is when you want to favor insert performance over select performance, which is a rare situation in my experience.  If the clusering key is a foreign key, then it will massively speed up reporting and automatically fix the hot spot and fragmentation issues.

    This is all HIGHLY dependent on your table structure and situation.

    From my experience, most of the time you're joining on the primary key (since you have relations in other tables pointing to it), so you'd want an index on it. Secondly, consider the extremely common "if this key exists, update the record, otherwise insert a new record" construct-- you've slowed that down to a crawl, because now it has to do a text match for GUIDs with no index at all. Thirdly, the proper way to speed up reporting is to generate a cube from the data and query the cube... any other method will eventually slow to a crawl as the amount of data increases. And, you're opening yourself up to the rare, but not unknown, possibility of hash collisions... how do you handle two records with the same primary key if your GUID generator happens to create that?

    And seriously, cumbersome? Just put "return @@identity" at the end of your insertion sproc.

    I mean, the GUID method is fine in a few situations... especially if you're using a NoSQL-type solution with "eventual consistency". And it's probably fine for a database that's not very normalized, but in that case why are you bothering to use a relational database at all?



  • @blakeyrat said:

    Secondly, consider the extremely common "if this key exists, update the record, otherwise insert a new record" construct-- you've slowed that down to a crawl, because now it has to do a text match for GUIDs with no index at all.
    Any sane DB would let you store them in a raw binary format, which doesn't have a major impact speed-wise.



  • @blakeyrat said:

    @Jaime said:
    #2 -- I have switched to using GUIDs most of the time.  I write mostly real three-tier applications and waiting until the row makes it to the database to know the ID is cumbersome.  With GUIDs, I can simply create it in any layer or let the database do it if I don't care.  A lot of people have mention clustering hot-spots... are you people actually making your primary key the clustered index?  The only time this is useful is when you want to favor insert performance over select performance, which is a rare situation in my experience.  If the clusering key is a foreign key, then it will massively speed up reporting and automatically fix the hot spot and fragmentation issues.
    This is all HIGHLY dependent on your table structure and situation.

    From my experience, most of the time you're joining on the primary key (since you have relations in other tables pointing to it), so you'd want an index on it. Secondly, consider the extremely common "if this key exists, update the record, otherwise insert a new record" construct-- you've slowed that down to a crawl, because now it has to do a text match for GUIDs with no index at all. Thirdly, the proper way to speed up reporting is to generate a cube from the data and query the cube... any other method will eventually slow to a crawl as the amount of data increases. And, you're opening yourself up to the rare, but not unknown, possibility of hash collisions... how do you handle two records with the same primary key if your GUID generator happens to create that?

    And seriously, cumbersome? Just put "return @@identity" at the end of your insertion sproc.

    I mean, the GUID method is fine in a few situations... especially if you're using a NoSQL-type solution with "eventual consistency". And it's probably fine for a database that's not very normalized, but in that case why are you bothering to use a relational database at all?

    Where did I say "put no indexes on the primary key"?  I said to not put a clustered index on the primary key.  You should certainly put a non-clustered index on the primary key.  That invalidates most of your points.

    To handle duplicates, simply put a unique constraint (or primary key constraint) on the primary key column.  The rare insert of a duplicate will violate the unique constraint and cause the insert to fail.  The application error handling logic should kick in at this point.

    Waiting for the return of the insert sproc for an identity causes problems for a physical three tier application.  In order to maintain transactional intergity, it is necessary to make atomic calls from the first tier to the second tier.  This often means sending things like an order header with a collection of order details in one method call.  However, the order number is currently unknown, since it hasn't been inserted into the database yet.  So, the save logic has to not only update the order header instance, but also all of the order detail instances.  Then, the modified objects need to be marshalled back to the client, and the client-side instances need to be updated.  It is much easier to simply generate the GUID key on the client side.



  • @Lingerance said:

    Any sane DB would let you store them in a raw binary format, which doesn't have a major impact speed-wise.

    Have you benchmarked that on MS SQL Server, and if so could you let me know how well it works? Using a GUID as a key could actually solve a problem I've been thinking over the last couple days. I'm trying to make some web servers stateless, so we can better use AWS, but right now they rely on a lot of DB lookups.

    @Jaime said:

    That invalidates most of your points.

    Relax. You don't have to invalidate my points. The world can have shades of grey and differing opinions in it, I'm ok with that. It's a big world.

    @Jaime said:

    To handle duplicates, simply put a unique constraint (or primary key constraint) on the primary key column.  The rare insert of a duplicate will violate the unique constraint and cause the insert to fail.  The application error handling logic should kick in at this point.

    It rubs me the wrong way to have the application handle a failure the DB could easily handle. Your sproc could catch that error and fix it by assigning a new GUID, but then you're talking about a speed hit... hm.

    @Jaime said:

    Waiting for the return of the insert sproc for an identity causes problems for a physical three tier application.  In order to maintain transactional intergity, it is necessary to make atomic calls from the first tier to the second tier.  This often means sending things like an order header with a collection of order details in one method call.  However, the order number is currently unknown, since it hasn't been inserted into the database yet.  So, the save logic has to not only update the order header instance, but also all of the order detail instances.  Then, the modified objects need to be marshalled back to the client, and the client-side instances need to be updated.  It is much easier to simply generate the GUID key on the client side.

    I could see that.



  • @blakeyrat said:

    @Jaime said:
    That invalidates most of your points.
    Relax. You don't have to invalidate my points. The world can have shades of grey and differing opinions in it, I'm ok with that. It's a big world.
    I shouldn't have added that last sentence, putting in those little digs is somewhat childish and usually messes up the entire point of the comment, as it did in this case.  My original point was on indexing... It seems to me that much of the world argues for sequential keys with an assumption that the primary key will be the clustered index.  For me, the primary key is always a non-clustered index unless there is a really good reason to make it clustered.  It's always indexed.  The best choice for the clustered index is usually a foreign key.



  • @blakeyrat said:

    @Medezark said:

    @blakeyrat said:

    @pkmnfrk said:
    (Although, there are plenty of proper ways to circumvent that properly as well. For example, I assume your company's email server supports more than one email address per inbox.)
    If the system were well-designed, we wouldn't have to screw around with *our* perfectly well-designed system to interact with it. @pkmnfrk said:
    (Actually, now that I'm thinking about it, why would this be a problem in the first place? Since I don't know the specific product, I am guessing here. If Harmniture is hosted by Badobe, the bad assumption would be that you only have one client. If it's hosted by the client, then why would multiple clients be using it? If it's hosted by you, FOR your clients, again, the assumption is that you can't have multiple clients.)
    It's hosted by Badobe. Clients pay Badobe to use the product. Badobe links emails to clients with a 1:1 relationship, so any given email address can only be in one client at a time. Apparently, the moron who designed this system never heard of the concept of "consulting".

    You can't set up aliases on your e-mail system to route multiple disparate e-mails to your inbox?

     

    I can't, no. I don't work in IT, and I don't have access to the Exchange servers, and even if I could it's been a long time since I've touched Exchange and I'd have to look up now.

    But the important thing to me is: STOP DEFENDING THIS POORLY-WRITTEN SOFTWARE THAT REQUIRES STUPID WORKAROUNDS FOR ME TO USE.

    Seriously. This is why we can't have nice things. You don't have to loudly gripe about it, like I do, but at the very least don't defend their moronic design decisions-- just realize the product is broken for a large percentage of their users, and move on.

    Didn't mean to upset you.  I've just given up on "fixing stupid".  It can't be done. Unless you can go into their company and re-write their code, there's no point in trying to get them to fix their stupid app.  The only thing you CAN do is to create the best "fix" for you.

     



  • @Medezark said:

    Didn't mean to upset you.  I've just given up on "fixing stupid".  It can't be done. Unless you can go into their company and re-write their code, there's no point in trying to get them to fix their stupid app.  The only thing you CAN do is to create the best "fix" for you.

    Yeah, but that's the big problem in our industry. Once someone finds a workaround, there's no incentive to fix the original bug-- in fact, in many cases, there's negative incentive to fix the original bug because the fix could break the workaround.

    And the all-caps was uncalled for, sorry... I'm just passionate about the sorry state of our buggy-ass and unusable software, which seems to be constantly back-sliding to become more buggy and less usable every year.



  • @pkmnfrk said:

    Wouldn't something like username be an acceptable primary key? The key is guaranteed to be unique just as long as the associated data is. Same with things like email addresses
     

    Usernames and email addresses are fairly guaranteed to be unique, but they are also highly susceptible to being changed.  If Mary Smith gets married and becomes Mary Jones, which one has less of a cost on the database:  Changing the username from msmith to mjones on just the Users table (with a separate primary key) or changing the Users table AND every table that uses the username as the foreign key?

    Email addresses are even more likely to change.



  • @blakeyrat said:

    @Medezark said:
    Didn't mean to upset you.  I've just given up on "fixing stupid".  It can't be done. Unless you can go into their company and re-write their code, there's no point in trying to get them to fix their stupid app.  The only thing you CAN do is to create the best "fix" for you.
    Yeah, but that's the big problem in our industry. Once someone finds a workaround, there's no incentive to fix the original bug-- in fact, in many cases, there's negative incentive to fix the original bug because the fix could break the workaround.

    And the all-caps was uncalled for, sorry... I'm just passionate about the sorry state of our buggy-ass and unusable software, which seems to be constantly back-sliding to become more buggy and less usable every year.

    No problem with the all caps, I understand your frustration. Sorry that I'm a little slow on the uptake.

    I understand the problem now.  The initial database should have been setup with two tables a "CUSTOMER" table and a "USERS" table, in a many-to-many relationship, where-in a single CUSTOMER could have multiple USERS and a single USER should be able to access multiple CUSTOMER accounts, with a "CUSTOMER-USER' table in the middle managing the relationships. Relatively simplistic concept, much like the USER/APP/ROLE relationships in standard ASP.NET security.

    The "fix" is relatively simple in concept, but might be expensive to apply depending on how craptastic the rest of the application is.  But it's also a relatively standardized type of thing as well.


Log in to reply