What's the big deal if the data isn't consistent?



  • To support a new aspect of our business, our data architect (BA) designed a table to hold the names of various items to be processed by our system. The other tables would then reference an item by its id key. Basic normalization.

    When we noticed that our application was returning what appeared to be many duplicate records, we investigated and found:

    table: ItemNames
    	id	number,  -- PK
    	name	varchar2(32)
    

    select * from ItemNames order by name;
    1 Item1
    6001 Item1
    12001 Item1
    2 Item2
    6002 Item2
    12002 Item2
    ...
    6000 Item6000
    12000 Item6000
    18000 Item6000

    Upon querying our data architect as to why the same item has three separate keys in the same table, she said that the data had been loaded three times. We pointed out that having multiple PKs for the same item would cause joins to produce more rows than were desired. Also, depending on how the code was written, and in what random sequence the DB returned the rows, we might end up referencing Item1 by key 1, 6001, or 12001 in different records in the db.

    She replied: so what's the big deal if the data isn't consistent?

    *facepalm*
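    The fan-out described above is easy to reproduce with a toy in-memory database. Here is a minimal sketch using sqlite via Python; the `Orders` table and its single row are invented for illustration, but the `ItemNames` contents mirror the OP's:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ItemNames (id INTEGER PRIMARY KEY, name TEXT)")
# The same item loaded three times under three different surrogate keys:
con.executemany("INSERT INTO ItemNames VALUES (?, ?)",
                [(1, "Item1"), (6001, "Item1"), (12001, "Item1")])

# Any lookup that expects one key per name now gets three:
ids = [r[0] for r in con.execute(
    "SELECT id FROM ItemNames WHERE name = 'Item1' ORDER BY id")]
print(ids)  # [1, 6001, 12001]

# ...and any join that goes through the name fans one logical row into three:
con.execute("CREATE TABLE Orders (order_id INTEGER, item_id INTEGER)")
con.execute("INSERT INTO Orders VALUES (100, 1)")  # one real order for Item1
rows = con.execute("""
    SELECT o.order_id, dup.id
    FROM Orders o
    JOIN ItemNames n   ON n.id = o.item_id
    JOIN ItemNames dup ON dup.name = n.name
""").fetchall()
print(len(rows))  # 3 rows for what is logically one order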



  • What's going to happen to Item6001?



  • @error_NoError said:

    What's going to happen to Item6001?

    I assume you'd get something like this:

    1      Item1
    6002   Item1
    12003  Item1
    2      Item2
    6003   Item2
    12004  Item2
    ...
    6000   Item6000
    12001  Item6000
    18002  Item6000
    6001   Item6001
    12002  Item6001
    18003  Item6001
    


  • Then she should have no problem removing the PK from the id field. After all, why can't multiple items have the same id?



  • @barrabus said:

    6001 Item6001
    12002 Item6001
    18003 Item6001

    No, 6001 and 12002 are already taken by Item1 and Item2, respectively; that was stated in the OP. Item6001 would get tacked on to the end of the next import, so it would get 24001 and then 30002 and so on...



  • @morbiuswilters said:

    @barrabus said:
    6001 Item6001
    12002 Item6001
    18003 Item6001
    No, 6001 and 12002 are already taken by Item1 and Item2, respectively; that was stated in the OP. Item6001 would get tacked on to the end of the next import, so it would get 18001 and then 24001 and so on...
    FTFY



  • @Sutherlands said:

    @morbiuswilters said:

    @barrabus said:
    6001 Item6001
    12002 Item6001
    18003 Item6001
    No, 6001 and 12002 are already taken by Item1 and Item2, respectively; that was stated in the OP. Item6001 would get tacked on to the end of the next import, so it would get 18001 and then 24001 and so on...
    FTFY

    It might get 18001 if it were added before the next full import of the items (but how likely is that? Given how stupid this whole thing is, they'll probably just re-import all of the items with Item6001 at the end...). However, it won't get 24001, because assuming Item6001 == 18001, then Item0001 == 18002, so the next Item6001 == 24002.



  • @snoofle said:

    so what's the big deal if the data isn't consistent?
    I believe the correct answer to that question is "I'm sorry, are you retarded?".



  • @snoofle said:


    She replied: so what's the big deal if the data isn't consistent?

    *facepalm*

    Whose face? [Hopefully your palm, her face... whatever happened to the "full contact development methodology"?]



  • @DOA said:

    @snoofle said:

    so what's the big deal if the data isn't consistent?
    I believe the correct answer to that question is "I'm sorry, are you retarded?".

    A better answer would be to bring a meme out into the real world:  "Please show some sensitivity.  I had a son who wasn't consistent, and let me assure you it was no laughing matter."

     



  • @snoofle said:

    Our data architect (BA) (...) replied: so what's the big deal if the data isn't consistent?

    I hope your response was to alert the nearest clueful person higher up the ladder that this person is breathtakingly incompetent and completely unqualified to do the job she's placed in. If the boss also doesn't care that your data architect doesn't know the first thing about architecting data, that's the company's problem, but you tried.



  •  To me, a DBA saying this, and not adding "just kidding!" afterwards, would be grounds for firing.



  • @Medinoc said:

    To me, a DBA saying this, and not adding "just kidding!" afterwards, would be grounds for firing.

    ... yourself, as in quitting; there is a time when you fight and a time when you just run for your life. But Snoofle is very persistent. Good thing too; without him we'd all die of WTF withdrawal symptoms.



  • If you could guarantee that the names were going to be unique, I would recommend deleting all records where id>6000, and then putting a unique constraint on the name.  So when the idiot tries it again, she would get 6,000 errors (assuming it does not abort after too many consecutive failures).  By any chance is she a blond, since that would explain it too?
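    A unique constraint on the name does exactly that: each re-imported duplicate fails at insert time instead of silently landing under a new key. A minimal sketch (sqlite via Python here, though the same applies to Oracle's unique constraints):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# UNIQUE on name rejects any second row carrying the same item name.
con.execute("CREATE TABLE ItemNames (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
con.execute("INSERT INTO ItemNames VALUES (1, 'Item1')")

# The re-import now fails row by row instead of silently duplicating:
try:
    con.execute("INSERT INTO ItemNames VALUES (6001, 'Item1')")
    failed = False
except sqlite3.IntegrityError:
    failed = True
print(failed)  # True

count = con.execute("SELECT COUNT(*) FROM ItemNames").fetchone()[0]
print(count)  # 1 -- the original row is untouched
```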



  • @orange_robot said:

    @snoofle said:

    Our data architect (BA) (...) replied: so what's the big deal if the data isn't consistent?

    I hope your response was to alert the nearest clueful person higher up the ladder that this person is breathtakingly incompetent and completely unqualified to do the job she's placed in. If the boss also doesn't care that your data architect doesn't know the first thing about architecting data, that's the company's problem, but you tried.

    I noted the BA in parens following the data architect title. Does it mean Business Analyst? If so, this is about the worst combination of job roles I can imagine, and I speak as a former systems engineer now turned business analyst. One job is highly technical, the other much less so. It seems to me like being half pregnant.



  • What's the problem? Can't you just write some code to deal with it? Just subtract 6000, 12000, whatever ...



    (Yep, I'm being sarcastic, but the lead devs who've said things like that to me before sadly weren't.)



  • @Anketam said:

    By any chance is she a blond
    Anketam, by any chance are you a douchenozzle?



  • @jetcitywoman said:

    I noted the BA in parens following the data architect title.  Does it mean Business Analyst?
    Yes it does. She is as brain-dead as her boss.

    And yes, I would call them both retarded, but it would be a disservice to lump all the folks who are retarded because of something biological in with these two, who are what they are because of sheer stupidity.



  • @token_woman said:

    Can't you just write some code to deal with it?
    I could write ten lines of logic to create a map and only replace something if the new id is lower than the one we already have. This way, it wouldn't matter how much crap they added, or how many times they added it, we'd always get the lowest id any item ever had. But it's just stupid to have to load tens of thousands of records, only to discard most of them.

    I could probably do it with a subselect or something similar, but that's equally pointless, when there is an obvious and correct human-factors based procedural solution.
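    The "ten lines of logic" described above would look something like this sketch; the function name and the row shape are invented for illustration:

```python
# Keep the lowest id ever seen for each name, regardless of how many
# times the data was loaded or in what order the DB returns the rows.
def lowest_ids(rows):
    """rows: iterable of (id, name) pairs in whatever order the DB returns them."""
    best = {}
    for item_id, name in rows:
        # Only replace an entry when the new id is lower than the one we have.
        if name not in best or item_id < best[name]:
            best[name] = item_id
    return best

rows = [(6001, "Item1"), (1, "Item1"), (12001, "Item1"),
        (6002, "Item2"), (2, "Item2")]
print(lowest_ids(rows))  # {'Item1': 1, 'Item2': 2}
```

    As noted, this works but is pointless effort: it loads and then discards two-thirds of the table on every run.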



  • @snoofle said:

    @token_woman said:

    Can't you just write some code to deal with it?
    I could write ten lines of logic to create a map and only replace something if the new id is lower than the one we already have. This way, it wouldn't matter how much crap they added, or how many times they added it, we'd always get the lowest id any item ever had. But it's just stupid to have to load tens of thousands of records, only to discard most of them.

    I could probably do it with a subselect or something similar, but that's equally pointless, when there is an obvious and correct human-factors based procedural solution.

    ... group by name order by id asc? ;)



  • more like:

    select min(id), name
    from ItemNames
    group by name
    order by min(id)

    I dunno about other DB servers, but MS SQL won't let you select id when you're grouping by name unless you tell it how to resolve multiple ids for the same name (i.e. MIN(), MAX(), AVG(), whatever). Then again, IIRC Snoofle is using Oracle, in which case God fucking knows how it works.
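    For what it's worth, that query behaves as intended on at least one engine; a quick check in sqlite (table contents mirroring the OP's):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ItemNames (id INTEGER PRIMARY KEY, name TEXT)")
con.executemany("INSERT INTO ItemNames VALUES (?, ?)",
                [(1, "Item1"), (6001, "Item1"), (12001, "Item1"),
                 (2, "Item2"), (6002, "Item2")])

# MIN(id) per name collapses the triplicates back to one row per item:
rows = con.execute("""
    SELECT MIN(id), name
    FROM ItemNames
    GROUP BY name
    ORDER BY MIN(id)
""").fetchall()
print(rows)  # [(1, 'Item1'), (2, 'Item2')]
```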



  • @blakeyrat said:

    Then again, IIRC Snoofle is being forced to use Oracle in which case God fucking knows how it works.

    FTFY.

    Oracle doesn't know either. It is likely to just report "unstable set of rows".



  •  Of course this doesn't fix problems when different db tables end up using different IDs for the same item resulting in requiring extra joins with the itemids table in order to join the two tables together.  

    Of course as the lady said, who cares about consistency.  While we're at it we'll just start using whatever item ids we want even if the id in the itemids table doesn't match the item that we are really dealing with - we'll just use item id 8329 to mean item6021 (and add hard-coded rules in code to specify which items really are in the itemids table and which ones have "special" mappings).

    Besides, consistency is so overrated. There are lots of companies which get away with not having consistent data because it means they can process data that much faster (no global locks across a cluster of DBs; of course they need some sophisticated conflict management when they do push updates to the other DBs). If we just do away with consistency entirely, think of how much faster we can process the data. It doesn't matter if the applications keep pushing around out-of-date, inconsistent data, leading to more and more inconsistency. What matters is that we do it faster/cheaper, right?



  • To me, a data architect saying this, and not adding "just kidding!" afterwards, would be grounds for blacklisting from the industry.

     



  • @snoofle said:

    She replied: so what's the big deal if the data isn't consistent?

    Sadly, it is at times like these, when someone in a position of power makes a statement so obviously and patently stupid, that my brain locks up because I can't comprehend how they were able to even get their mouth to say such a thing. I then sputter and stammer and try to make them see the error of their ways, but I come off like the idiot in these exchanges because I can't coherently explain myself due to the aforementioned brain lock-up.



  •  @snoofle said:

    I could probably do it with a subselect or something similar, but that's equally pointless, when there is an obvious and correct human-factors based procedural solution.

    The obvious solution is to drop the database.

    Server.

    Onto the Data Architect's head. Repeatedly if necessary.



  • @Anketam said:

    By any chance is she a blond, since that would explain it too?

    I bet she's a girl, too. That would explain it.



  • @token_woman said:

    @Anketam said:
    By any chance is she a blond
    Anketam, by any chance are you a douchenozzle?

    HERO

    I lol'd!

    In an empty house.

    :(

    :'(



  • @snoofle said:

    But it's just stupid to have to load tens of thousands of records, only to discard most of them.

    Hey, if it's good enough for the outsourced junk we used to receive, it's good enough for you.



  • Not in any way disputing the WTFness here, but there's an aspect I don't get:

    @snoofle said:

    The other tables would then reference an item by its id key.

    [...]

    When we noticed that our application was returning what appeared to be many duplicate records

    [...]

    We pointed out that having multiple PKs for the same item would cause joins to produce more rows than was desired.

    How would it do that, if all the joins were on the PK? It sounds to me as if the application was actually joining on the name instead of the id, which is a problem with the application.

    The other issues raised of data consistency and integrity are perfectly valid. For instance, if you were getting fewer records than anticipated because one table had product ID 6001 and another had product ID 12001, that would make sense. But I don't see that you should have been getting extra records except in a "list all the products" query, which has nothing to do with joins to the product ID.
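    The "fewer records than anticipated" failure mode is the easiest one to demonstrate. A toy sketch (sqlite via Python; the `Stock` and `Prices` tables are invented for illustration), in which two tables reference the same logical item by different surrogate keys:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE ItemNames (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO ItemNames VALUES (1, 'Item1');
    INSERT INTO ItemNames VALUES (6001, 'Item1');
    CREATE TABLE Stock  (item_id INTEGER, qty INTEGER);
    CREATE TABLE Prices (item_id INTEGER, price REAL);
    -- Same logical item, recorded under two different keys:
    INSERT INTO Stock  VALUES (1, 10);
    INSERT INTO Prices VALUES (6001, 9.99);
""")

# Joining the two tables on item_id finds no match at all:
rows = con.execute("""
    SELECT s.qty, p.price
    FROM Stock s JOIN Prices p ON p.item_id = s.item_id
""").fetchall()
print(rows)  # [] -- the item silently drops out of the join
```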



  • @Scarlet Manuka said:

    How would it do that, if all the joins were on the PK?

    select Foo.* from Foo join ItemName on (Foo.ItemId = ItemName.Id) where ItemName.name = 'Bar'



  • @morbiuswilters said:

    @Scarlet Manuka said:
    How would it do that, if all the joins were on the PK?

    select Foo.* from Foo join ItemName on (Foo.ItemId = ItemName.Id) where ItemName.name = 'Bar'

    But that'll only get you one record per Foo, which is what you should be getting. If you have three Foos for the 'Bar' item, you can't tell from this query whether they're all using the same ItemId value or three different ItemId values.



  • @Scarlet Manuka said:

    @morbiuswilters said:
    @Scarlet Manuka said:
    How would it do that, if all the joins were on the PK?

    select Foo.* from Foo join ItemName on (Foo.ItemId = ItemName.Id) where ItemName.name = 'Bar'

    But that'll only get you one record per Foo, which is what you should be getting. If you have three Foos for the 'Bar' item, you can't tell from this query whether they're all using the same ItemId value or three different ItemId values.

    Ah, good point. I'm not sure how they're getting multiple rows, then..



  • @zelmak said:

    Sadly, it is these times, when someone in a position of power makes a statement so obviously and patently stupid, that my brain locks up due to the fact that I can't comprehend how they were able to even get their mouth to say such a thing.

    I believe this is the canonical reaction in these moments:


     



  • Don’t give her her next paycheck; instead give her a duplicate of the last one.



  • @jcsalomon said:

    Don’t give her her next paycheck; instead give her a duplicate of the last one.

    My name is zelmak and I approve of this message.



  • "I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."

