Unroll a large structure b/c it's more efficient



  • I just got out of a meeting with my boss and our DBA. We need to save a new data structure. It has all sorts of info about a customer - totalling about 250 fields, each of which represents a single column in the database. Ok, it's a wide table, but doable, and there's a clear PK that's very efficient to use.

    The DBA cited efficiency concerns and suggested that instead of storing a wide table like this, that we store: [PK], name, value. I pointed out that more than 95% of the fields would always be set, and backed it up with some queries on the production database. The DBA insisted that we do it his way. We pointed out that the logic to break down the structure into separate insert statements for each field, and the query to read in the data would now be way more complicated due to the if-else chain that would be required for field-names in order to associate (and convert from string to the correct type) each row with the correct field in the structure.

    I refused to budge and the DBA backed down, but c'mon; WTF is wrong with people!

     



  • Wow, a new record. Three Snoffle-WTFs in one day! Keep them coming! :-)



  • Another guy who loves to create a system within a system. I once had to work with that kind of database, too. In that database, the "values" could have different types, depending on what the key was. Some values were even foreign keys to other rows. Those queries were ugly!



  • @TheRider said:

    Wow, a new record. Three Snoffle-WTFs in one day! Keep them coming! :-)
    I've decided to just cut out early and go home to the 'sanity' of my two teenage daughters.



  • @snoofle said:

    The DBA cited efficiency concerns and suggested that instead of storing a wide table like this, that we store: [PK], name, value. I pointed out that more than 95% of the fields would always be set...

    Good Lord, the DBA actually wanted to use the EAV anti-pattern for this. I knew your DBAs needed the clue-bat, but wow.



  • Glad you didn't back down on that one.  Wish I had that kind of clout.


  • Considered Harmful

    Is this The Inner-Platform Effect back again?



  • That's a seriously incapable DBA. He would have you store the same PK 200 to 250 times? And to store the name of the field... for every entry. I always thought that's why data structures were such a good idea. Perhaps he's a converted FORTRAN programmer that can still only think in 1D arrays?



  • @snoofle said:

    I just got out of a meeting with my boss and our DBA. We need to save a new data structure. It has all sorts of info about a customer - totalling about 250 fields, each of which represents a single column in the database. Ok, it's a wide table, but doable, and there's a clear PK that's very efficient to use.

    The DBA cited efficiency concerns and suggested that instead of storing a wide table like this, that we store: [PK], name, value. I pointed out that more than 95% of the fields would always be set, and backed it up with some queries on the production database. The DBA insisted that we do it his way. We pointed out that the logic to break down the structure into separate insert statements for each field, and the query to read in the data would now be way more complicated due to the if-else chain that would be required for field-names in order to associate (and convert from string to the correct type) each row with the correct field in the structure.

    I refused to budge and the DBA backed down, but c'mon; WTF is wrong with people!

     

     

    I believe that anti-pattern is in this book: http://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557

    It might be worth having a copy at hand for your next meeting. 

    Or, you could write your own book.

     



  • @Rick said:

    It might be worth having a copy at hand for your next meeting.
     

    I think they already know all the tricks in the book. Perhaps they even wrote it.



  • @corgimonster said:

    I knew your DBAs needed the clue-bat, but wow.
     

    Remember past snoofle posts?

    We'd established he didn't have any DBAs working at his current contract. There were untouchable fuckwits that had a job title of "DBA", but the sure as hell didn't seem to know a great deal about database administration.



  • People who make suggestions like that don't read industry press, articles, books, blogs. They don't. If they did, they wouldn't make suggestions like that.

    My point is, he doesn't know what an "antipattern" even is.



  • @snoofle said:

    I refused to budge and the DBA backed down, but c'mon; WTF is wrong with people!
     

    I don't know what the term is, but many times I've encountered a particular course of action chosen because it was easier for the [cover|developer|programmer] to do at the expense of usability - the end results gave short-term benefits for the creator at the expense of long-term detriment to the user.

    I know it's solely a selfish and inconsiderate point of view, especially when the developer doesn't consider that they'll spend a brief moment creating in comparison to the time expended by users consuming.

    I hate this skewed attitude: many's the time I've had to bang developer heads together and remind them that they're in this ultimately to serve users and the business, not to feather their own next.

    YMMV, but really it's not YOUR job, it's the organisation's - you're just fulfilling it. Currently.

    (yeah, that's aimed at self-centred developers - not at snoofle - before anyone gets the wrong impression. Heading off misunderstandings at the pass there)



  • @corgimonster said:

    @snoofle said:

    The DBA cited efficiency concerns and suggested that instead of storing a wide table like this, that we store: [PK], name, value. I pointed out that more than 95% of the fields would always be set...

    Good Lord, the DBA actually wanted to use the EAV anti-pattern for this. I knew your DBAs needed the clue-bat, but wow.

    I wouldn't call EAV an anti-pattern, some technologies are based mostly on it. But you do need to know when to use it, and this clearly isn't the place. Wikipedia explicitly states that it is to be used as a sparse matrix datastructure, and this clearly isn't a sparse matrix.



  • @TGV said:

    @Rick said:

    It might be worth having a copy at hand for your next meeting.
     

    I think they already know all the tricks in the book. Perhaps they even wrote it.

     

    The copy at the next meeting isn't for reading, silly. It's a big thick book, useful for twacking idiots in the back of the skull.

     

     



  • @blakeyrat said:

    People who make suggestions like that don't read industry press, articles, books, blogs. They don't. If they did, they wouldn't make suggestions like that.

    My point is, he doesn't know what an "antipattern" even is.

    Och, cap'n, me bonnie engines canna take it much longer! We need ta adjust the pattern/antipattern ratio, or else they'll explode!



  • @Sutherlands said:

    Glad you didn't back down on that one.  Wish I had that kind of clout.

    Clout is available.



  • Maybe that DBA read an article in the net about NoSQL... Use BSON dude!



  • It has to be said ... if you have the kind of system that justifies storing 250 distinct pieces of data on each customer, chances are next week, you'll be adding another 50.

    You really want to start altering tables with potentially millions of rows to add in yet another 50 columns ?

    Seems like the DBA has experienced this before, and is trying to find a more painless upgrade path.



  • @daveime said:

    It has to be said ... if you have the kind of system that justifies storing 250 distinct pieces of data on each customer, chances are next week, you'll be adding another 50.

    If most of them were empty, I'd agree, but that wasn't the case. I have to say that it's weird to think of 250 columns without a relation or logical grouping between them, but well, if that's the case, then 250 columns it is.

    BTW, I'm sure there are databases with a 255 or 256 limit on the number of columns, although Oracle goes to 1000. Seems that's enough.



  • @TGV said:

    I have to say that it's weird to think of 250 columns without a relation or logical grouping between them
     

    That number of columns flags warning bells to me, and smacks of poor design.

    I don't know why; I can't tell you how many columns a table is supposed to have, but that just sounds like too many to manage.



  • @Cassidy said:

    @TGV said:

    I have to say that it's weird to think of 250 columns without a relation or logical grouping between them
     

    That number of columns flags warning bells to me, and smacks of poor design.

    I don't know why; I can't tell you how many columns a table is supposed to have, but that just sounds like too many to manage.

    We use Oracle. That said, there are separate tables where there is a 1-many relationship. This is not one of those cases. The reason this is one wide table is because it was originally about 20 tables. There were > 2 billion rows of data in each table. Doing the join was expensive. This turned out to be a pure performance boost. The conversion was a royal PITA, but it was worth it.

    Sometimes, you have to make certain concessions for performance; as long as you clearly document WHY you did it (I always do, although I can't take credit for this one).

     




  • @TGV said:

    @daveime said:

    It has to be said ... if you have the kind of system that justifies storing 250 distinct pieces of data on each customer, chances are next week, you'll be adding another 50.

    If most of them were empty, I'd agree, but that wasn't the case. I have to say that it's weird to think of 250 columns without a relation or logical grouping between them, but well, if that's the case, then 250 columns it is.

    BTW, I'm sure there are databases with a 255 or 256 limit on the number of columns, although Oracle goes to 1000. Seems that's enough.

     

    In the end of the day, having your data organized is normaly more important than gaining a few hundreds of MB in your database size. It doesn't matter if most of the registers will be empty or not, if next week you'll be adding more kinds of data, you must go for the key:value (anti|whatever)pattern. There is no other sane way of doing it.

    Now, if you have something with 250 pieces of data describing it, and that logically can not be described by any other kind of data*... Then you create a table with 250 columns. Just make sure you are at this situation, because if you are not, you'll pay a high price.

    * Yeah, I've never seen that happen. Neither have heard about it happening to anybody. I also can't conceive it happening. But, hell, if this ever happens to you, just create the huge table.

     UPDATE (I started writting that before reading Snoofle's post above):

    Ok, that is one of those cases where it is more important to save the few hundreds MB (and joins) than it is to get the data organized. I agree, sometimes you have to make concessions.



  • @snoofle said:

    Sometimes, you have to make certain concessions for performance; as long as you clearly document WHY you did it (I always do, although I can't take credit for this one).
     

    Yup, I doff my cap at that one. As you say, if there's a viable case (and benefits are recognised) then it's justified.

    Still... 250 columns. Ow. Don't envy you the conversion. I've done something on a smaller scale and found that arduous and painful; it'd take sizeable wonga and complimentary sexual favours of dubious legality before I'd consider it again.



  • @snoofle said:

    The reason this is one wide table is because it was originally about 20 tables. There were > 2 billion rows of data in each table. Doing the join was expensive. This turned out to be a pure performance boost.

    ...which makes the desire to use EAV even more of a WTF than usual.



  • @TheRider said:

    Wow, a new record. Three Snoffle-WTFs in one day! Keep them coming! :-)

    Time for a doping control, Snoffle!


Log in to reply