The war of The One Table



  • So, I need database design help. We have a Central Control and Supervision System, let's call it the CCSS, for one of our humm... products, let's call it device_A, which goes with controller_A. Now we are integrating controller_B into the CCSS. Controller_B, in different configurations, controls an assortment of devices, some ours, some not. The CCSS has a table to hold information about device_A-type elements. Now, to integrate the new assortment of devices, my first instinct was to roll new tables, basically something like:

    --(This is just an example; don't look closely at my choice of types)
    create table Master_Device_Table(
    Device_id   bigserial primary key,
    Name        text,
    Type        device_type
    --more common info
    );
    
    create table device_A(
     device_id bigint references Master_Device_Table(Device_id)
     --more device_A specific info
    );
    
    create table device_B(
     device_id bigint references Master_Device_Table(Device_id)
     --more device_B specific info
    );
    
    create table device_C(
     device_id bigint references Master_Device_Table(Device_id)
     --more device_C specific info
    );
    ....
    --More devices
    

    Now, other people responsible for this say relations and joins and stuff won't be performant; instead they want to roll The One Table. Basically:

    --(This is just an example; don't look closely at my choice of types)
    create table Device_Table(
    Device_id   bigserial primary key,
    Name        text,
    Type        device_type,
    ....
    --All information about all devices ever
    );
    
    

    In about eighteen months we have a good chance of finding ourselves supporting about 10 device types, for about 1000 actual devices and a 200-300 column One Table. Moving forward... I don't know; if this goes well, lots more devices, not many more device types (hopefully), but then at some point our corporate overlords would probably take/buy this from our hands, code and all.

    So, I am no expert; I know fuckall about database stuff, much less about huge-database performance. What I want to know is:

    Am I TRWTF for wanting to roll different tables, and should I drop the issue?
    Is The One Table the root of all evil, and should I petition our tech lead for redress?
    Or none of the above? Any insight welcome.


  • Discourse touched me in a no-no place

    How many device-types are you expecting? 3? 30? 300? Without bound?

    Additionally, your second option could be broken into two tables (using EAV, since your current second option is going to have a lot of gaps):

    create table Master_Device_Table(
    Device_id   bigserial primary key,
    Name        text,
    Type        device_type
    --more common info
    );
    
    create table Device_Specific_Stuff(
     Device_id bigint references Master_Device_Table(Device_id),
     Property   text, -- or a foreign key into a property table
     Value      text
     -- one row per property per device
    );
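
    For what it's worth, getting data back out of an EAV layout means one row per property, so the reads look something like this (just a sketch; the 'voltage' and 'firmware' property names are made up):

    ```sql
    -- All properties for one device, one row per property
    select m.Name, s.Property, s.Value
    from Master_Device_Table m
    join Device_Specific_Stuff s on s.Device_id = m.Device_id
    where m.Device_id = 42;

    -- Pivot a couple of known properties back into columns
    select m.Device_id, m.Name,
           max(case when s.Property = 'voltage'  then s.Value end) as voltage,
           max(case when s.Property = 'firmware' then s.Value end) as firmware
    from Master_Device_Table m
    join Device_Specific_Stuff s on s.Device_id = m.Device_id
    group by m.Device_id, m.Name;
    ```

    The usual EAV caveats apply: everything is text, so the database can't type-check or constrain the values for you.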
    

  • Discourse touched me in a no-no place

    @Piko said:

    Now, other people responsible for this say relations and Joins and stuff won't be performant, instead they want to roll The One Table

    :wtf:



  • More than 10 device types (we have 10 already), less than 1000, probably 200 (medium-term) or so. Lots of devices will be similar. Volume of data for each device might get huge depending on what we are asked for. Like megabytes per device per day big.


  • Discourse touched me in a no-no place

    @Piko said:

    probably 200 (medium-term) or so

    Table per device is not how I'd want to go with this then.

    @Piko said:

    Volume of data for each device might get huge depending on what we are asked. Like megabytes per device per day big.

    Are you saying that that data needs to go into the database or was that just some exposition?



  • Since you say you'll have some overlapping features between devices, can you split this off based on features?

    Example:

    Create table Devices(
      Device_id   bigserial primary key,
      Name        text,
      Type        device_type
      -- more common info
    );
    
    Create table Devices_Feature_1(
      Device_id integer references Devices(Device_id),
      Param1 text
      -- more specific fields
    );
    
    Create table Devices_Feature_2(
      Device_id integer references Devices(Device_id)
      -- more specific fields for feature 2...
    );
    

    Perhaps in the long run, once you have hundreds of device types, you can still keep a manageable number of these feature side-tables.
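
    Querying such a schema is just outer joins against the main table (a sketch using the names above; the 'controller_B' type value is invented):

    ```sql
    select d.Device_id, d.Name, d.Type,
           f1.Param1   -- null for devices that don't have feature 1
    from Devices d
    left join Devices_Feature_1 f1 on f1.Device_id = d.Device_id
    where d.Type = 'controller_B';
    ```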



  • @cartman82 said:

    Since you say you'll have some overlapping features between devices, can you split this off based on features?

    That's probably a good approach. This is all still pretty green; I had thought up a per-device solution with only the devices we have now in mind.

    @PJH said:

    Are you saying that that data needs to go into the database or was that just some exposition?

    Well I just realized I should probably press more to get that clarified. I know devices must be able to serve data at the lowest granularity. Then I was hoping we would only store summaries, because that makes sense to me, but I don't really know.

    We don't have detailed requirements for this yet; we have had a meeting telling us that some version of this is getting done. We need a toy prototype soonish (2 months, kinda sorta); a couple of new device types talking to the CCSS and some colorful charts about them would probably do the trick.

    Anyway, that helped me sort out some thoughts, so thanks. Just to clarify, I'm not gonna "architect" this thing, but I wanted to be able to propose sane, less crazy ideas, as long as I get to voice an opinion on this.



  • @Piko said:

    Now, other people responsible for this say relations and Joins and stuff won't be performant, instead they want to roll The One Table.

    Premature optimization is endemic in databases. Joining is WHAT THEY DO! I mean this is like saying you should put your sports car on a flat-bed trailer towed by a Mack because sports cars aren't performant.

    I recommend the following: implement your sane idea. For the application, put all access to this data behind either a stored procedure or a code function that hides the join.

    Explain to your co-workers that you think proper normalization is best, but that you've implemented it in such a way that you can easily fall back to denormalized tables with just a couple of ad-hoc queries if you detect a performance problem.
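
    One way to do that in SQL is a view per device type over the join (a sketch built on the example tables upthread; the specific columns are elided just like in the original):

    ```sql
    -- Application code reads device_A_full and never sees the join.
    -- If the join ever does become a measured problem, the view can be
    -- re-pointed at a denormalized table without touching any callers.
    create view device_A_full as
    select m.Device_id, m.Name, m.Type
           -- plus the device_A-specific columns here
    from Master_Device_Table m
    join device_A a on a.device_id = m.Device_id;
    ```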

    Now it's not you dictating the design, it's a compromise.


  • Discourse touched me in a no-no place

    @Piko said:

    Now, other people responsible for this say relations and Joins and stuff won't be performant

    My rule of thumb is if someone says that they probably don't know what they're talking about and shouldn't have any input.


  • Discourse touched me in a no-no place

    @blakeyrat said:

    Joining is WHAT THEY DO!

    QFT. To expand on my previous post, "joins won't be performant" is like buying Oracle or MSSQL and deciding you don't trust its transactional capability, so you roll your own transaction handler, or deciding to roll your own encryption method.



  • @FrostCat said:

    To expand on my previous post, "joins won't be performant" is like buying Oracle or MSSQL and deciding you don't trust its transactional capability, so you roll your own transaction handler, or deciding to roll your own encryption method.

    Let me put it this way: the only time I've ever seen a join seriously bog Oracle down to the point where I had to abandon that query design was when the join condition was a substring match on a CLOB, and that had mostly to do with the original schema designers getting their foreign keying bass-ackwards.


  • Discourse touched me in a no-no place

    That was sort of my point: if you're buying a commercial database, and I don't mean stuff like "older versions of MySQL that didn't support transactions", and you're second-guessing the query engine or the transaction processor, you're doing it wrong.


  • Discourse touched me in a no-no place

    @FrostCat said:

    MySQL

    I forgot people pay for that.


  • Discourse touched me in a no-no place

    @loopback0 said:

    I forgot people pay for that.

    I don't know how many people do, but I needed a counterpoint to Oracle. 😄


  • ♿ (Parody)

    @tarunik said:

    Let me put it this way: the only time I've ever seen a join seriously bog Oracle down to the point where I had to abandon that query design was when the join condition was a substring match on a CLOB, and that had mostly to do with the original schema designers getting their foreign keying bass-ackwards.

    No...I've had lots of joins that sucked. When it wasn't an index that we should have had, there were usually ways to rewrite the query.

    Saying that joins won't bog a modern DB down is about as misleading as saying you shouldn't do them because they will.


  • FoxDev

    A join is only as good as the indexes it can use


  • ♿ (Parody)

    Right...and those don't necessarily scale.



  • @boomzilla said:

    Saying that joins won't bog a modern DB down is about as misleading as saying you shouldn't do them because they will.

    Most of the time, you'll bog down into the ten-seconds-to-five-minutes range; that's rather annoying from a UX perspective, but can be worked around, especially if you're just bringing back tons of data, and it's NBD for a reporting query. When I say "seriously bog down to the point of abandoning the query design", I'm talking about a query that took over five hours to run.


  • ♿ (Parody)

    @tarunik said:

    I'm talking about a query that took over five hours to run.

    IME (and I admittedly haven't worked with very large DBs) this is due to either a mistake or trying to do too much in a single query. Too much often meaning some calculation, but sure, your CLOB nightmares qualify. And so you run the poor thing out of memory or something because you messed up a join or left something else off.



  • @cartman82 said:

    Since you say you'll have some overlapping features between devices, can you split this off based on features?

    Example:

    Create table Devices(
      Device_id   bigserial primary key,
      Name        text,
      Type        device_type
      -- more common info
    );
    
    Create table Devices_Feature_1(
      Device_id integer references Devices(Device_id),
      Param1 text
      -- more specific fields
    );
    
    Create table Devices_Feature_2(
      Device_id integer references Devices(Device_id)
      -- more specific fields for feature 2...
    );

    Perhaps in the long run, once you have hundreds of device types, you can still keep a manageable number of these feature side-tables.

    From what I've read so far, I'd lean this way.

    Asking about the performance of joins is a red herring, IMHO. Their performance is perfectly fine, but that doesn't mean that's the best way to go.

    What smells about the table-per-device-type schema to me is having potentially dozens to hundreds of tables with identical shapes. That doesn't sound like what a database is for: all devices have a type, so that info should be expressed in a column on the main table, not by having a separate table for each possible type that holds the same info as every other table.

    An extra table you join into is for having different columns that you only need for certain types, so it makes more sense to define them by features and data specific to those features.

    Or alternatively, you could go for a schema-less extra-info column, in XML or JSON or whatever such type your DB supports.
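
    In Postgres, for instance (assuming a Postgres-ish DB; the column names and the 'firmware' key are invented), that might look like:

    ```sql
    create table Devices(
      Device_id bigserial primary key,
      Name      text,
      Type      text,
      Extra     jsonb  -- device-specific attributes, no migration needed for new types
    );

    -- Pull a field out of the blob; a GIN index on Extra makes this searchable
    select Name
    from Devices
    where Extra ->> 'firmware' = '2.1';
    ```

    The trade-off is that the database no longer enforces anything about what goes into Extra.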



  • @boomzilla said:

    or trying to do too much in a single query. Too much often meaning some calculation, but sure, your CLOB nightmares qualify.

    Yeah, considering that simply unpacking those CLOBs took Oracle 30 minutes...

    (BTW: the query was otherwise fine -- when I plugged in the normalized/unpacked form of the culprit table, the runtime was measured in seconds, which for an eight-way join-on-subqueries reporting query, isn't so bad.)

