What kind of datastore do I want?


  • Discourse touched me in a no-no place

    So I have a dataset. This dataset consists of a well-defined set of values and a key/value list providing additional data.

    Our current approach is to flatten the key/value store into columns named for the keys, add any new columns to our SQL table, and load the data.

    The table is now a skazillion columns wide.

    The use case is basically to do a SELECT * on the table once we've populated some bonus columns based on values in the fixed schema.

    The developers working on this tell me that dealing with this in a flat SQL table is evil and they want to do a key/value table instead.

    I'm 100 percent certain that that's just going to cause mass chaos and break everything, but what's the actual correct way to do this? Some kind of nosql thing?


  • sockdevs

    nosql (like MongoDB) would work.

    but if you want to keep things cleaner in SQL, try this:

    • TABLE1: the core table, with all common/required columns
    • EAV1: an Entity (key to a TABLE1 record) / Attribute / Value table that is exposed in your ORM as a Dictionary<string, string> (or other appropriate object structure)

    Keeps the benefits of reporting and stuff in SQL and allows arbitrary key value pairs for extra things that aren't on all objects.
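    A minimal sketch of that layout, with made-up table and column names:

    ```sql
    -- Core table: the fixed, well-defined columns everything has
    CREATE TABLE Widget (
        WidgetId  INT          NOT NULL PRIMARY KEY,
        Name      VARCHAR(100) NOT NULL
    );

    -- EAV side table: one row per extra key/value pair
    CREATE TABLE WidgetAttribute (
        WidgetId  INT           NOT NULL REFERENCES Widget (WidgetId),
        Attribute VARCHAR(100)  NOT NULL,
        Value     VARCHAR(4000) NULL,
        PRIMARY KEY (WidgetId, Attribute)
    );

    -- Pivot a pair back out when a query does need it
    SELECT w.Name, a.Value AS Color
    FROM Widget w
    LEFT JOIN WidgetAttribute a
        ON a.WidgetId = w.WidgetId
       AND a.Attribute = 'Color';
    ```

    The ORM loads the WidgetAttribute rows into the dictionary; only the queries that care about a specific key ever have to join on it.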


  • BINNED

    @accalia said:

    like MongoDB

    FWIW, the article linked here would suggest it may not scale well:

    http://what.thedailywtf.com/t/goodbye-mongodb-hello-postgresql-article/9049?u=onyx


  • sockdevs

    @Onyx said:

    FWIW, the article linked here would suggest it may not scale well:

    well yes.

    i didn't say it would work well..... :-P


  • BINNED

    I think @Weng lost enough sanity already judging by that story. Do give him a break ;-)



  • Depending on the database you have, you may already have a good solution - several DBs out there support key-value columns in formats like XML, JSON, and HSTORE. Take a look and see if yours already supports one of these.

    Since it's read-only (and it sounds like you won't be doing filtering of any kind on it) you can probably even get away with a BLOB or TEXT column if your DB of choice doesn't support any of the above.


  • BINNED

    Well, I know Postgres can index jsonb and hstore datatypes. hstore would probably suit the purpose the best.
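    For reference, that might look something like this in Postgres (table and key names invented):

    ```sql
    CREATE EXTENSION IF NOT EXISTS hstore;

    CREATE TABLE dataset (
        id    serial PRIMARY KEY,
        -- ... the fixed, well-defined columns go here ...
        extra hstore
    );

    -- GIN index so containment (@>) and key-existence (?) queries can use it
    CREATE INDEX dataset_extra_idx ON dataset USING gin (extra);

    SELECT extra -> 'color' FROM dataset;              -- extract one key
    SELECT * FROM dataset WHERE extra @> 'color=>red'; -- filter on a pair
    ```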

    Of course, we don't know the DB engine that's already used now, maybe it actually has an appropriate datatype but whoever designed the database is not aware of it.


  • Discourse touched me in a no-no place

    SQL Server 2008 R2. If I'm not mistaken, XML column support arrived in 2005, which is where this app originated.



  • @Weng said:

    The table is now a skazillion columns wide.

    And that is a problem because... why?

    @Weng said:

    The developers working on this tell me that dealing with this in a flat sql table SQL is evil

    Do they have a reason?

    @Weng said:

    and they want to do a key/value table instead.

    Are all the values the same type?

    @Weng said:

    I'm 100 percent certain that that's just going to cause mass chaos and break everything, but what's the actual correct way to do this?

    I think you already did it correct.


  • Discourse touched me in a no-no place

    @blakeyrat said:

    And that is a problem because... why?

    Because my boss+1 prefers the opinion of a line developer who's been on his team for decades over the opinion of a system architect on a team he was drafted to take over two years ago.

    @blakeyrat said:

    Do they have a reason?

    Outdated misconceptions about rows not being able to exceed 8KB and vague stupidity about the current model being unsustainable.

    @blakeyrat said:

    Are all the values the same type?

    All variable length strings.

    @blakeyrat said:

    I think you already did it correct.

    Me too.



  • We've been through several alternatives on that one, mostly _OPTIONS child tables and name=value; strings in an OPTIONS column (VARCHAR2(255), VARCHAR2(4000), occasionally CLOB). All nice and dandy as long as you think you're only using them in code anyway. And then it's been in a production release and you find out you need to do SQL on it after all.

    Go columns.



  • What you have is a political problem, not a technical problem.

    You can either tell boss + 1's line developer to show you the money (i.e. prove his assertions), or you can just cave in and say, "whatever you wanna do." Or quit I guess.

    But your question asks for a technical solution, and I don't think one exists.



  • Curious would this be the way you do it if you had 20 predefined types?

    We initially built a static survey tool. It then morphed into 20 different versions because each group wanted/needed different questions.

    Since, at the time, the fastest solution was to just add columns to the table, that is what we did.

    Then we had a shake-up at work. A new PM takes over and says we should have 20 different tables, one for each type.

    Thoughts?



  • @blakeyrat said:

    is a problem because... why?

    It's all fun and games until you end up with an object having a million properties in your code. Which needs to map exactly to the columns - you can't, for example, aggregate VALUE_JAN through VALUE_DEC into an array, or the whole homegrown DB-to-objects cancer of a mapper blows up. And then you get to define all these properties again in your business logic layer, except now instead of {get; set;} you have to do {get { return dao.PropertyName; } set { dao.PropertyName = value; } }.

    All this, of course, has zero relation to the case at hand. But I needed to vent.


  • Discourse touched me in a no-no place

    Yeah. Not applicable to our situation. I'll explain when I get to a real keyboard and can pseudocode it.



  • Load it into a hash?



  • @PleegWat said:

    Load it into a hash?

    And smoke it?


  • BINNED

    @Weng said:

    Outdated misconceptions about rows not being able to exceed 8kb

    Still applicable in SQL Server, certainly in 2008R2.

    There are built in workarounds, like VARCHAR(MAX) being a pointer to the actual string, but there is still an 8k limit on the row length after the savings are taken into account.

    Source: I once had to come up with a creative workaround when importing an Excel file with around 500 columns, blowing out the limit by a long way.


  • Discourse touched me in a no-no place

    Varchar doesn't count at all anymore. All these fields are Varchar.

    It just overflows the variable length fields into another page, which hurts performance a bit when touching those particular rows.

    Fixed-size columns must all total up to less than 8,060 bytes (roughly 8KB), though.

    Citation: The docs and experimentation.

    There are also specialized wide tables. Not sure on the specifics there, but they exist.
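    The fixed-size limit is easy to demonstrate, since SQL Server rejects the table at creation time (names made up):

    ```sql
    -- Two CHAR(4000)s fit; adding a third fixed-size column pushes the
    -- minimum row size past 8,060 bytes and CREATE TABLE fails outright.
    CREATE TABLE Wide (
        a CHAR(4000),
        b CHAR(4000),
        c CHAR(100)  -- error: minimum row size would exceed the limit
    );

    -- The VARCHAR version creates fine; oversized rows spill to
    -- ROW_OVERFLOW pages at insert time instead.
    CREATE TABLE WideVar (
        a VARCHAR(4000),
        b VARCHAR(4000),
        c VARCHAR(100)
    );
    ```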



  • I used XML columns in a SQL Server 2005 app - a bit of a pain to query, since you need to suddenly think in XPATH when you were just thinking in SQL, but nothing ridiculously painful (and not applicable at all if the use case is simply write / read not query). That said, the columns were options sets on small tables (~10K not ~100B) so I didn't have a chance to encounter other performance or maintenance issues.
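    To illustrate the XPath flavour of it, querying an XML options column goes roughly like this (schema and names invented):

    ```sql
    CREATE TABLE Survey (
        SurveyId INT NOT NULL PRIMARY KEY,
        Options  XML NULL
    );

    -- value() pulls a single value out via an XPath/XQuery expression
    SELECT SurveyId,
           Options.value('(/options/option[@name="color"])[1]',
                         'VARCHAR(100)') AS Color
    FROM Survey;

    -- exist() filters on a path without shredding the whole document
    SELECT SurveyId
    FROM Survey
    WHERE Options.exist('/options/option[@name="color"]') = 1;
    ```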

