But the tables are more efficient this way



  • Each of these tables has 3 integer and 2-10 char string columns, plus 4 auditing columns. They are all 1:1 relationships; there is always exactly 1 row in each subordinate table to match the single corresponding row in the main table.

    Why didn't the DBAs just add the columns to the main table? Because it's more efficient to store each type of data in its own table, even if will never, ever, change.

    OBTW, there are nearly a billion rows in each of these tables.

    select ...
      from MainTable m,
           Table1 t1, Table2 t2, Table3 t3, Table4 t4,
           Table5 t5, Table6 t6, Table7 t7, Table8 t8
     where  m.id = t1.id
       and t1.id = t2.id
       and t2.id = t3.id
       and t3.id = t4.id
       and t4.id = t5.id
       and t5.id = t6.id
       and t6.id = t7.id
       and t7.id = t8.id
       and ...
    


  • But the real question is, is this true?

    Also with this you won't need indices anymore!



  • It sounds as though the DBAs might have encountered a table file storage limit for those billions of rows and opted for a solution that doesn’t require additional cost.



  • How often are the subordinate tables queried? If the data isn't used very often then this might actually be more efficient* since queries against the main table won't have to deal with as much data.

    *I'm making some blind assumptions based on untested hypotheses and therefore the level of correctness for this statement may vary greatly.



  • @mott555 said:

    How often are the subordinate tables queried? If the data isn't used very often then this might actually be more efficient* since queries against the main table won't have to deal with as much data.

     

    That might be true in some cases. Perhaps if you use only one column, and draw really large chunks of data (millions of rows at a time), but otherwise this is going to be way less efficient. I'd say that even if only one in every 10 queries uses more than one tables this is inefficient.

    Something like this happens when the DBA cannot figure out how to alter a table without bringing the system to a halt...



  • TRWTF is not using ANSI joins.



  • In general, databases are more efficient when they are correctly designed, and this is really bad design. I have seen it in a lot of places though. Sometimes it's because you would be over the row size limit with a single table, and that's the only legitimate reason why you would ever do this. Tables which take up more space on the disk are not a problem - you can always create the proper indexes so that searches are still fast regardless of the physical disk size of the table. Whoever told you this situation was good is wrong. I don't say that very often, but it's warranted here - this is just plain wrong.



  • @snoofle said:

    Because it's more efficient
     

    There is a response which should immediately follow this statement: Show me the numbers.

    "Really? Show me your numbers. What? You don't have numbers to back up  your statement? Then you need to stop saying the statement, because you are lying and that is unprofessional."

     

     



  • @jasmine2501 said:

    In general, databases are more efficient when they are correctly designed, and this is really bad design. I have seen it in a lot of places though. Sometimes it's because you would be over the row size limit with a single table, and that's the only legitimate reason why you would ever do this. Tables which take up more space on the disk are not a problem - you can always create the proper indexes so that searches are still fast regardless of the physical disk size of the table. Whoever told you this situation was good is wrong. I don't say that very often, but it's warranted here - this is just plain wrong.
    A lot of database purests will argue that all nullable columns should be replaced with separate entities with one-to-one relationships.  If you go all the way to the 6th normal form, the database will look exactly like the one in the OP, but with more tables.



  • @jasmine2501 said:

    I have seen it in a lot of places though. Sometimes it's because you would be over the row size limit with a single table, and that's the only legitimate reason why you would ever do this.

    Perhaps I am misunderstanding something. But if there is a 1 to 1 relationship, won't there be a row in Table1 for every row in the Main table? If that is the case how will this keep you from going over the row size limit?


  • Discourse touched me in a no-no place

    @chrismcb said:

    [...] if there is a 1 to 1 relationship, won't there be a row in Table1 for every row in the Main table? If that is the case how will this keep you from going over the row size limit?
    I think the 'row limit' is how long the rows in a single table are, not the number of rows. So instead of, say, a single table with 64M's worth of data per row in it, you have 64 tables with 1.5M of data per row.



  • We have one table (or should I say table set) in our database that is like this.

    This is just one way to do polymorphism in the database. If you make a new subclass of MainTable you just create a new TableX for the new fields. And when you query your new table you join in only those tables that are in its class hierarchy.

    Another way would be to add nullable columns to the main table, however this might seem a waste when on most rows they are NULL but depending on the database it actually may not be that significant of a waste of space.


  • Discourse touched me in a no-no place

    @SlyEcho said:

    polymorphism in the database
    That sounds fun. Is it as fun as it sounds?



  • @SlyEcho said:

    We have one table (or should I say table set) in our database that is like this.

    This is just one way to do polymorphism in the database. If you make a new subclass of MainTable you just create a new TableX for the new fields. And when you query your new table you join in only those tables that are in its class hierarchy.

    Another way would be to add nullable columns to the main table, however this might seem a waste when on most rows they are NULL but depending on the database it actually may not be that significant of a waste of space.

    I have a similar system in one project as well. Main table with 19 columns, and three subtables with 4-24 columns each (ID columns included). Every row in the main table has a corresponding row in at most one of the subtables, so I'm saving 18-42 null fields per row. There's only 131124 rows in the main table so it may not be that huge savings, but it does make managing the tables and queries easier.

    From the OP I got the impression every table Table1..Table8 has the same amount of rows as MainTable. That's just bad design.

    I'd imagine that any RDBMS that's capable of handling tables with billions of rows would also have its storage backend implemented in a way that can avoid file size limits, whether that is using a raw disk partition or distributing the storage across many smaller files.



  • Yep.  Database == Namespace.  Table == Class.  Row == Instance.  Field == Property.  1:1 == inheritance.  1:n == contained instances.  Trigger == Collection/Property Changed Event Handler.  View == Wrapper/Composition.  Sproc == Static Method.



  • @chrismcb said:

    @jasmine2501 said:
    I have seen it in a lot of places though. Sometimes it's because you would be over the row size limit with a single table, and that's the only legitimate reason why you would ever do this.

    Perhaps I am misunderstanding something. But if there is a 1 to 1 relationship, won't there be a row in Table1 for every row in the Main table? If that is the case how will this keep you from going over the row size limit?

    In SQL Server, the data in a single row can not exceed 8060 bytes (8k minus some overhead - a full data page). So, if you define a table like this:

    create table #myTemp (
     ID INT,
     myData varchar(8000),
     myData2 varchar(60)
    )

    Then SQL Server will create the table, but it's possible to exceed the row size limit and your data may not be saved if you do that. If you actually put fill up the myData columns, then the extra 4 bytes of the INT will overflow the table row and your attempt to insert the data will be rejected. If you try to define a table like this:

    create table #myTemp (
     ID INT,
     myData char(8000),
     myData2 char(60)
    )

     Then SQL Server will say the following: Creating or altering table '#myTemp' failed because the minimum row size would be 8071, including 7 bytes of internal overhead. This exceeds the maximum allowable table row size of 8060 bytes.

     SO, while I think this is a violation of 3rd Normal Form, if you have more data which is uniquely related to the integer ID, then you are forced to violate normal form due to limitations of the RDBMS. However, if this is not the case, then you are violating normal form for no good reason.

     



  • @jasmine2501 said:

    SO, while I think this is a violation of 3rd Normal Form, if you have more data which is uniquely related to the integer ID, then you are forced to violate normal form due to limitations of the RDBMS. However, if this is not the case, then you are violating normal form for no good reason.

    How can a schema in 6th Normal Form violate the 3rd Normal Form?



  • @Jaime said:

    @jasmine2501 said:

    SO, while I think this is a violation of 3rd Normal Form, if you have more data which is uniquely related to the integer ID, then you are forced to violate normal form due to limitations of the RDBMS. However, if this is not the case, then you are violating normal form for no good reason.

    How can a schema in 6th Normal Form violate the 3rd Normal Form?

    To answer the question, if something really is in 6th normal form, then by definition it meets all the normals below that.

    However, after reading a bit to refresh my memory, I think I got that wrong. 3rd normal form is "all attributes are dependent on the key, the whole key, and nothing but the key, so help me Codd" - but what's going on here is a violation of "all data should be represented one time and one time only" and I forget how that "rule" works into the normal forms concept. The argument that all nullable entities should be a table of their own is interesting but not practical. It's much more efficient to have a null column than to have to check for the absence of a row in another table. The OP isn't looking at that situation - they are looking at attributes of an object which have been moved to another table for the purpose of making all the tables smaller in total size. With modern database systems, this is not any faster, and it leads to possible data integrity problems (since you have to access multiple tables to update or create single entities). So, it's not one of those "optimizations" that has no impact so you might as well do it. This "optimization" hurts performance any time you have to do a query like the OP posted with lots of joins - these joins are fast because they use a single-column integer key, but joins are always slower than no joins. Putting all the columns in one table would make the tables larger, but it would not increase the size of the indexes if they are made correctly, so you wouldn't have any impact on search performance. THAT is where the DBA is wrong, and considerations about correct design are just additional ammunition against this concept.



  • @jasmine2501 said:

    3rd normal form is "all attributes are dependent on the key, the whole key, and nothing but the key, so help me Codd" - but what's going on here is a violation of "all data should be represented one time and one time only" and I forget how that "rule" works into the normal forms concept.

    There is no evidence the the tables the OP was talking about had any duplication other than the key itself.  This doesn't violate normalization and is actually required for the more obscure normal forms.

    @jasmine2501 said:

    The argument that all nullable entities should be a table of their own is interesting but not practical. It's much more efficient to have a null column than to have to check for the absence of a row in another table.

    Extreme normalization is rarely practical and often inefficient (at least for reading).

    @jasmine2501 said:

    The OP isn't looking at that situation - they are looking at attributes of an object which have been moved to another table for the purpose of making all the tables smaller in total size. With modern database systems, this is not any faster, and it leads to possible data integrity problems (since you have to access multiple tables to update or create single entities). So, it's not one of those "optimizations" that has no impact so you might as well do it. This "optimization" hurts performance any time you have to do a query like the OP posted with lots of joins - these joins are fast because they use a single-column integer key, but joins are always slower than no joins. Putting all the columns in one table would make the tables larger, but it would not increase the size of the indexes if they are made correctly, so you wouldn't have any impact on search performance. THAT is where the DBA is wrong, and considerations about correct design are just additional ammunition against this concept.

    Normalization is not optimization.  Many optimizations are a form of de-normalization.  You are arguing against a poor implementation by calling it a poor design.  On the contrary, this make much more theoretical sense than it makes practical sense.



  • @Jaime said:

    Normalization is not optimization.

    Sure it is, when it reduces duplication or uses less space due to nulls: size optimization. That assumes, of course, that it actually reduces storage, but then anything that could be an optimization in any sense isn't necessarily an optimization in practice in all cases.

    It could also be a speed optimization if you have some sort of replication scheme where you end up having to send less information over the wire. Or, copying/cloning data might be quicker if you don't also have to copy/clone the normalized rows. Depending on data size, it could also make a difference just due to the physical layout, i.e., more rows per cluster or whatever.



  • @Jaime said:

    There is no evidence the the tables the OP was talking about had any duplication other than the key itself.  This doesn't violate normalization and is actually required for the more obscure normal forms.

    That's why I said I was wrong about that.

    @Jaime said:

    Normalization is not optimization.  Many optimizations are a form of de-normalization.  You are arguing against a poor implementation by calling it a poor design.  On the contrary, this make much more theoretical sense than it makes practical sense.

    No I'm saying this design doesn't make theoretical or practical sense. It goes against a rule I can't remember the name of, and in practice it provides no improvement of performance, and introduces a data integrity risk.


Log in to reply
 

Looks like your connection to What the Daily WTF? was lost, please wait while we try to reconnect.