Let's do it in a complicated way



  • I'm working as a project manager, and one of my responsibilities is to shield the developers from the question "Why is that taking so long?!". But sometimes, I have to ask the question myself.


    We're using an open source CMS that supports users and groups; the latter are used, among other things, for fairly elaborate access rights management. Though the CMS is not maintained by our developers, we frequently take existing modules and change them on customer request into something that fits their workflow better, and for this project, groups are central to the customer's business logic and touched practically everywhere. And at each of those touch points, the developers get kind of ... reserved when it comes to proposed changes.


    Today, I fooled around in the administration interface, and I found an option to create a newsletter recipient list from a "SQL query". Awesome. One of the developers told me that it's not very flexible though, since it doesn't support JOINs or such. Not awesome.


    I was able to build the exact recipient list I wanted pretty quickly, but then I noticed that the SQL it spat in my face (WTF #1 - this interface is built, among others, for editors who just write the newsletter) was a bit unexpected; for selecting all users that belong to either of two groups, I got

    WHERE groups LIKE %3,17%
    

    with 3 and 17 being the respective group ids (I assumed). It turns out that each user record has a groups column that contains a comma-separated list of group ids that are not even sorted. This means that the above query will only find users that belong to both groups and have been added to these groups in that exact order with no other group in between. (WTF #2) However, the recipient list I got was correct - all users of either group. (WTF #3) Exercise: write a query that selects all users that are in either group.
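
    For illustration, one answer to that exercise against a comma-separated column like this - a sketch assuming MySQL and a plain users table, not the CMS's actual schema:

    SELECT * FROM users
    WHERE FIND_IN_SET('3', groups) > 0
       OR FIND_IN_SET('17', groups) > 0;

    -- or, portably, by padding the list with commas before matching:
    SELECT * FROM users
    WHERE CONCAT(',', groups, ',') LIKE '%,3,%'
       OR CONCAT(',', groups, ',') LIKE '%,17,%';

    Either way the whole users table gets scanned, which is part of why the sane join table mentioned further down is the real fix.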


    So I called our lead developer in for a cigarette conference, and he explained that this "issue" is known to the CMS's maintainers and that they're moving away from these types of queries (yay!) to replace them with MySQL's "IN SET" clauses (wat?) - but it takes some time since, somehow, performance suffers ( ... WTF #4).


    About fixing it with a sane join table - that would break backwards compatibility with a load of plugins. So - next version. Maybe.

    BLAKEYRAT WHY DIDN'T YOU WARN ME!



  • This reminds me of IPB, they have some fields like this too. Except that I think they are sorted and for some inane reason have commas at both the start and end.



  • Comma-separated values FTW!  Seriously though, who the fuck ever thought using comma-separated values when you HAVE A FUCKING RELATIONAL DATABASE right in front of you is a good idea?  Is/was there some sort of anti-pattern fad that I have never heard of, a la "two-column table to store key-value pairs of all the objects"?

    Sadly, I've seen this before, even in "Enterprise" products *cough* commerce server *cough*...



  • @C-Octothorpe said:

    Comma-separated values FTW!  Seriously though, who the fuck ever thought using comma-separated values when you HAVE A FUCKING RELATIONAL DATABASE right in front of you is a good idea?

    If you're using stored procs before MS SQL Server 2008, it's the best way to get a list of values in :(



  • @Sutherlands said:

    @C-Octothorpe said:

    Comma-separated values FTW!  Seriously though, who the fuck ever thought using comma-separated values when you HAVE A FUCKING RELATIONAL DATABASE right in front of you is a good idea?

    If you're using stored procs before MS SQL Server 2008, it's the best way to get a list of values in :(

    It pains me to say this, but this is *ouch* where *ow* XML *oh the pain* is the Correct Solution (TM).  I think I need to go to the hospital...

    EDIT: I do kind of get why you would use CSVs, but how do you join or in this particular case, use it reliably in a where clause?  Forget about the huge impact on performance, you don't even get type-safety...
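
    For what it's worth, a minimal sketch of that XML route in T-SQL (SQL Server 2005 and later), with made-up element names:

    DECLARE @ids xml;
    SET @ids = '<ids><id>3</id><id>17</id></ids>';

    -- shred the XML back into rows, then join or filter on the result
    SELECT t.c.value('.', 'int') AS GroupId
    FROM @ids.nodes('/ids/id') AS t(c);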



  • @henke37 said:

    This reminds me of IPB, they have some fields like this too. Except that I think they are sorted and for some inane reason have commas at both the start and end.

    That's easy: You can query for "%,3,%" in this way. Without the commas at the start and at the end, the query gets waaaaaay more ugly.



  • I'd say XML is the solution in the case that you need more complicated data than, say, a list of IDs (and that's how we're using it when we need to pass in a complicated structure)... but if you just need a list of IDs, I'd still stick with CSV.

    On a side note, we've upgraded to 2008, and are considering updating some procs to use the datavalue parameter.
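
    Presumably that means SQL Server 2008's table-valued parameters - a rough sketch, with hypothetical type, proc, and table names:

    CREATE TYPE dbo.IdList AS TABLE (Id int NOT NULL PRIMARY KEY);
    GO
    CREATE PROCEDURE dbo.GetUsersByGroupIds
        @GroupIds dbo.IdList READONLY
    AS
    BEGIN
        SELECT u.*
        FROM dbo.Users AS u
        JOIN dbo.UserGroups AS ug ON ug.UserId = u.UserId
        JOIN @GroupIds AS g ON g.Id = ug.GroupId;
    END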



  • @C-Octothorpe said:

    I do kind of get why you would use CSVs, but how do you join or in this particular case, use it reliably in a where clause?

    I suppose you could do this:

    WHERE groups LIKE '%,3,%' AND groups LIKE '%,17,%'



  • I just noticed this, but are the IDs hard-coded (the %3,17% I mean)?


  • @beermouse said:


    WHERE groups LIKE %3,17%



    with 3 and 17 being the respective group ids (I assumed). It turns out that each user record has a groups column that contains a comma-separated list of group ids that are not even sorted. This means that the above query will only find users that belong to both groups and have been added to these groups in that exact order with no other group in between. (WTF #2) However, the recipient list I got was correct - all users of either group. (WTF #3) Exercise: write a query that selects all users that are in either group.

    It also gets fun with users in groups 13 and 170. OK, maybe 170 is ridiculous, but 13 isn't.



  • @boomzilla said:

    It also gets fun with users in groups 13 and 170. OK, maybe 170 is ridiculous, but 13 isn't.
     

    Googling for "group 170" results in "about 353,000 results".  Not quite my idea of "ridiculous".


  • @da Doctah said:

    @boomzilla said:
    It also gets fun with users in groups 13 and 170. OK, maybe 170 is ridiculous, but 13 isn't.

    Googling for "group 170" results in "about 353,000 results".  Not quite my idea of "ridiculous".

    TRWTF is that just using google would probably be easier than querying their braindead schema.



  • @henke37 said:

    This reminds me of IPB, they have some fields like this too. Except that I think they are sorted and for some inane reason have commas at both the start and end.

    WUT? Really? I thought Invision was one of the more sane lumps of code I've encountered. Then again, I didn't prod around under the bonnet too much, I've only touched on a few of its tables.

    @C-Octothorpe said:

    Seriously though, who the fuck ever thought using comma-separated values when you HAVE A FUCKING RELATIONAL DATABASE right in front of you is a good idea?

    Beat me to it. Seriously misunderstood denormalisation?



  • @beermouse said:

    BLAKEYRAT WHY DIDN'T YOU WARN ME!

    I don't even know you.


  • @Cassidy said:

    @C-Octothorpe said:
    Seriously though, who the fuck ever thought using comma-separated values when you HAVE A FUCKING RELATIONAL DATABASE right in front of you is a good idea?

    Beat me to it. Seriously misunderstood denormalisation?

    I'd guess that the guy had heard about bit flags third hand, and thought it was cool how you could pack information into a single column.



  • @boomzilla said:

    I'd guess that the guy had heard about bit flags third hand, and thought it was cool how you could pack information into a single column.

    serializes entire database to XML, then stuffs that XML in an nvarchar(MAX) column as a "backup"



  •  @C-Octothorpe said:

    Comma-separated values FTW!  Seriously though, who the fuck ever thought using comma-separated values when you HAVE A FUCKING RELATIONAL DATABASE right in front of you is a good idea?  Is/was there some sort of anti-pattern fad that I have never heard of, a la "two-column table to store key-value pairs of all the objects"?

    (snip)

    "Stoolbox antipattern" -- Imagine someone trying to repair a bicycle. They grab a nice sized toolbox full of shiny wrenches (s/wrench/spanner/g if you prefer) and other well-suited tools.  Unaware of the treasures within the toolbox, they proced to use the toolbox as a stool to sit upon while they bash the bike with a large rock... producing a shitty result.

    I clam the concept, dhromed added the scatological imagery, cconroy coined the name. Open-source meme development while you wait, right here on the sidebar!

     



  • @boomzilla said:

    I'd guess that the guy had heard about bit flags third hand, and thought it was cool how you could pack information into a single column.

    .. and didn't hang around to understand how it gets unpacked when a specific flag is required. Little knowledge and everything...

    @RichP said:

    I clam the concept,

    I dunno... something smells fishy about it.



  • @RichP said:

    dhromed added the scatological imagery,
     

    whu?



  • @Cassidy said:

    I dunno... something smells fishy about it.
     

    Sorry *burp*

    Had the salmon for lunch.



  • @C-Octothorpe said:

    EDIT: I do kind of get why you would use CSVs, but how do you join or in this particular case, use it reliably in a where clause?  Forget about the huge impact on performance, you don't even get type-safety...


    Type safety isn't exactly something that SQL is famous for...



  • @dhromed said:

    @RichP said:
    dhromed added the scatological imagery,
     

    whu?

     

    There's no place like 127.0.0.1 - looks like you've found my stash Libraries comics!



  • @dhromed said:

    @RichP said:

    dhromed added the scatological imagery,
     

    whu? wat?

     

    OTTFY.

     



  • @dhromed said:

    @RichP said:

    dhromed added the scatological imagery,
     

    whu?

     

    Back in the "Deleting old log entries the wrong way, done badly" sidebar thread... your contribution was the shit reference. Just trying to be sure to cite everyone who contributed :-)

     

     

     



  • @pjt33 said:

    @C-Octothorpe said:

    EDIT: I do kind of get why you would use CSVs, but how do you join or in this particular case, use it reliably in a where clause?  Forget about the huge impact on performance, you don't even get type-safety...

    Type safety isn't exactly something that SQL is famous for...

    What do you mean?  I know SQL Server will implicitly cast or convert data types if the query writer doesn't know what they're doing (e.g. comparing a string representation of a datetime to a datetime type), but other than that, you'd have to be more specific...


  • @C-Octothorpe said:

    EDIT: I do kind of get why you would use CSVs, but how do you join or in this particular case, use it reliably in a where clause?  Forget about the huge impact on performance, you don't even get type-safety...

    Since you edited and I didn't see this:

    The stored proc extracts the IDs and puts them in a temp table.  If anything isn't the right type, throw an error.  Then it's a JOIN instead of a WHERE IN.
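
    Something like this, pre-2008 (a sketch; the inline split loop and names are illustrative, not the actual proc):

    DECLARE @List varchar(8000), @pos int;
    SET @List = '3,17,42';

    DECLARE @Ids TABLE (Id int NOT NULL);

    -- peel IDs off the front of the list one comma at a time;
    -- CAST throws if a value isn't numeric, which gives you the type check
    WHILE LEN(@List) > 0
    BEGIN
        SET @pos = CHARINDEX(',', @List);
        IF @pos = 0
        BEGIN
            INSERT @Ids (Id) VALUES (CAST(@List AS int));
            SET @List = '';
        END
        ELSE
        BEGIN
            INSERT @Ids (Id) VALUES (CAST(LEFT(@List, @pos - 1) AS int));
            SET @List = SUBSTRING(@List, @pos + 1, 8000);
        END
    END

    -- then it's a JOIN instead of a WHERE IN
    SELECT c.*
    FROM dbo.Customer AS c
    JOIN @Ids AS i ON i.Id = c.Id;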



  • @Sutherlands said:

    @C-Octothorpe said:

    EDIT: I do kind of get why you would use CSVs, but how do you join or in this particular case, use it reliably in a where clause?  Forget about the huge impact on performance, you don't even get type-safety...

    Since you edited and I didn't see this:

    The stored proc extracts the IDs and puts them in a temp table.  If anything isn't the right type, throw an error.  Then it's a JOIN instead of a WHERE IN.

    I've done that before, where in the database it's a normalized table, but I'll coalesce the list of IDs on a select query and split the CSVs on an insert or update.  Then you can easily do an indexed join or where.
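
    The coalescing half of that can be done with the FOR XML PATH trick (SQL Server 2005 and later) - a sketch with hypothetical table and column names:

    -- build a comma-separated groups string per user from a normalized UserGroups table
    SELECT u.UserId,
           STUFF((SELECT ',' + CAST(ug.GroupId AS varchar(10))
                  FROM dbo.UserGroups AS ug
                  WHERE ug.UserId = u.UserId
                  FOR XML PATH('')), 1, 1, '') AS GroupCsv
    FROM dbo.Users AS u;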

    BTW, did you literally mean "temp table"?  Why not create a real table?



  • @boomzilla said:

    @da Doctah said:
    @boomzilla said:
    It also gets fun with users in groups 13 and 170. OK, maybe 170 is ridiculous, but 13 isn't.
    Googling for "group 170" results in "about 353,000 results".  Not quite my idea of "ridiculous".

    TRWTF is that just using google would probably be easier than querying their braindead schema.

    I like this idea.  Have your data as pages of text and put them up on the web along with a sitemap as the "schema".  Then just do a "site:db.example.com" search.  You'd need to reversibly encrypt the pages, of course.



  • @C-Octothorpe said:

    BTW, did you literally mean "temp table"?  Why not create a real table?

    Again, I'm not a DBA.  The proc basically has "DECLARE @TableName TABLE (columns)", "INSERT @TableName(columns) SELECT values FROM fn_convertListToTable(@ListParam)"

    That's a temporary table, right (only in the sp)?  Is there a better way to do it?



  • @Sutherlands said:

    @C-Octothorpe said:

    BTW, did you literally mean "temp table"? Why not create a real table?

    Again, I'm not a DBA. The proc basically has "DECLARE @TableName TABLE (columns)", "INSERT @TableName(columns) SELECT values FROM fn_convertListToTable(@ListParam)"

    That's a temporary table, right (only in the sp)? Is there a better way to do it?

    No, temp tables start with a hash. You've created a table variable.
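
    In T-SQL terms, the two look like this (a minimal illustration):

    -- temp table: created in tempdb, name starts with a hash, visible for the session
    CREATE TABLE #Ids (Id int NOT NULL PRIMARY KEY);

    -- table variable: declared like a variable, scoped to the batch or procedure
    DECLARE @Ids TABLE (Id int NOT NULL PRIMARY KEY);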



  • @Sutherlands said:

    @C-Octothorpe said:

    BTW, did you literally mean "temp table"?  Why not create a real table?

    Again, I'm not a DBA.  The proc basically has "DECLARE @TableName TABLE (columns)", "INSERT @TableName(columns) SELECT values FROM fn_convertListToTable(@ListParam)"

    That's a temporary table, right (only in the sp)?  Is there a better way to do it?

    <pedant>DBAs don't create tables - they create and maintain databases, in which they create users (PL/SQL devs) that create tables, in the same way that sysadmins don't really create Word docs, they create an environment for users to do so.</pedant>

    In terms of the temp table bit - temporary tables are used as a way of (as the name suggests) temporarily holding data where the table is flushed out once the query/transaction is over.

    The way I would have done permissions is to have a userID as a PK in a members table, groupID as a PK of another table containing group details, then a lookup table between the two with a unique UID+GID per row to identify which groups a user belonged to. No reason to have these as temp tables, they'll be accessed quite often.
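
    A sketch of that layout (table and column names are illustrative):

    CREATE TABLE members (
        user_id   INT NOT NULL PRIMARY KEY,
        user_name VARCHAR(100) NOT NULL
    );

    CREATE TABLE group_details (
        group_id   INT NOT NULL PRIMARY KEY,
        group_name VARCHAR(100) NOT NULL
    );

    -- lookup table: one row per membership, unique UID+GID per row
    CREATE TABLE member_groups (
        user_id  INT NOT NULL REFERENCES members (user_id),
        group_id INT NOT NULL REFERENCES group_details (group_id),
        PRIMARY KEY (user_id, group_id)
    );

    -- "all users in either group" then becomes an indexed lookup instead of a LIKE scan
    SELECT DISTINCT mg.user_id
    FROM member_groups AS mg
    WHERE mg.group_id IN (3, 17);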

    Anyone know the processing overhead of splitting out a bitfield? Is this plus pulling one row likely to be quicker than pulling n rows in a join? I'd have thought the latter would be quicker (and be likely for caching to speed up things even further).



  • @Cassidy said:

    <pedant>DBAs don't create tables - they create and maintain databases, in which they create users (PL/SQL devs) that create tables, in the same way that sysadmins don't really create Word docs, they create an environment for users to do so.</pedant>

    If you're going to be a pedantic dickweed, at least be correct. Christ.


  • @blakeyrat said:

    @Cassidy said:
    <pedant>DBAs don't create tables - they create and maintain databases, in which they create users (PL/SQL devs) that create tables, in the same way that sysadmins don't really create Word docs, they create an environment for users to do so.</pedant>

    If you're going to be a pedantic dickweed, at least be correct. Christ.

    What's wrong with what he said? I mean, I guess sysadmins can create their own Word documents, but they don't do it for the users.



  • @Sutherlands said:

    @C-Octothorpe said:

    BTW, did you literally mean "temp table"?  Why not create a real table?

    Again, I'm not a DBA.  The proc basically has "DECLARE @TableName TABLE (columns)", "INSERT @TableName(columns) SELECT values FROM fn_convertListToTable(@ListParam)"

    That's a temporary table, right (only in the sp)?  Is there a better way to do it?

    Oh sorry, my bad.  My reading comprehension leaves something to be desired.  Having said that, blakey was right, it's a table variable which is thrown away after the session (stored proc in this case) completes.  I'm guessing they just load the table variable and join on the user and group table.  Somewhere around this point the developer should have said, "hmm, I keep creating this table, which intersects two other tables with some sort of relationship. If only there was a way to create a permanent table, which would hold this data, be able to index it and enforce referential integrity. [Long pause while his/her brain reboots]  Oooo, donuts!".

     



  • @Cassidy said:

    <pedant>DBAs don't create tables - they create and maintain databases, in which they create users (PL/SQL devs) that create tables, in the same way that sysadmins don't really create Word docs, they create an environment for users to do so.</pedant>

    I'm still not a DBA.

    @Cassidy said:

    In terms of the temp table bit - temporary tables are used as a way of (as the name suggests) temporarily holding data where the table is flushed out once the query/transaction is over.

    So then it seems that while this ISN'T a temporary table, it SHOULD be?

    @Cassidy said:

    The way I would have done permissions is to have a userID as a PK in a members table, groupID as a PK of another table containing group details, then a lookup table between the two with a unique UID+GID per row to identify which groups a user belonged to. No reason to have these as temp tables, they'll be accessed quite often.

    Except that the table is specifically there to hold the multi-valued data that is passed in as a parameter so that it can be used in JOINs and WHEREs. 


  • @C-Octothorpe said:

    @Sutherlands said:

    @C-Octothorpe said:

    BTW, did you literally mean "temp table"?  Why not create a real table?

    Again, I'm not a DBA.  The proc basically has "DECLARE @TableName TABLE (columns)", "INSERT @TableName(columns) SELECT values FROM fn_convertListToTable(@ListParam)"

    That's a temporary table, right (only in the sp)?  Is there a better way to do it?

    Oh sorry, my bad.  My reading comprehension leaves something to be desired.  Having said that, blakey was right, it's a table variable which is thrown away after the session (stored proc in this case) completes.  I'm guessing they just load the table variable and join on the user and group table.  Somewhere around this point the developer should have said, "hmm, I keep creating this table, which intersects two other tables with some sort of relationship. If only there was a way to create a permanent table, which would hold this data, be able to index it and enforce referential integrity. [Long pause while his/her brain reboots]  Oooo, donuts!".

    Since I have apparently taken > 5 minutes to write my previous response... how would you go about putting an ARGUMENT to a stored proc in a PERMANENT table?

  • @blakeyrat said:

    @Cassidy said:
    <pedant>DBAs don't create tables - they create and maintain databases, in which they create users (PL/SQL devs) that create tables, in the same way that sysadmins don't really create Word docs, they create an environment for users to do so.</pedant>

    If you're going to be a pedantic dickweed, at least be correct. Christ.

    What's incorrect about that? At least, that seems to be how it works in my environment. You talk to the DBAs if you need a tablespace extended or performance is suffering or something; developers create the schema, and they submit DDLs and DMLs to the DBAs to make changes (I think there's also another team that does it).


  • @Sutherlands said:

    ... how would you go about putting an ARGUMENT to a stored proc in a PERMANENT table?

    This thread is probably too vague to give a good answer for your case, but wouldn't you pass the ID or whatever that would be used to condition your query in the stored proc? Perhaps you'd need to overhaul the procedure, too, if that doesn't make sense. Once you have a sane way to do what it was doing with a table variable, you may be in a situation to find a much more reasonable and performant way to do whatever you were trying to do. That's been my experience, anyways. It's easy to end up doing lots of stupid stuff because you're focusing on the wrong part of the problem.



  • @Sutherlands said:

    @C-Octothorpe said:

    @Sutherlands said:

    @C-Octothorpe said:

    BTW, did you literally mean "temp table"?  Why not create a real table?

    Again, I'm not a DBA.  The proc basically has "DECLARE @TableName TABLE (columns)", "INSERT @TableName(columns) SELECT values FROM fn_convertListToTable(@ListParam)"

    That's a temporary table, right (only in the sp)?  Is there a better way to do it?

    Oh sorry, my bad.  My reading comprehension leaves something to be desired.  Having said that, blakey was right, it's a table variable which is thrown away after the session (stored proc in this case) completes.  I'm guessing they just load the table variable and join on the user and group table.  Somewhere around this point the developer should have said, "hmm, I keep creating this table, which intersects two other tables with some sort of relationship. If only there was a way to create a permanent table, which would hold this data, be able to index it and enforce referential integrity. [Long pause while his/her brain reboots]  Oooo, donuts!".

    Since I have apparently taken > 5 minutes to write my previous response... how would you go about putting an ARGUMENT to a stored proc in a PERMANENT table?

    I think we're confusing things here...  I was speaking more to the fact that the groupid column in the OP is a CSV, which is wrong.  What you said about splitting the CSV passed to the stored proc is correct (at least pre SQL 2008), or you can use XML, or another method I have yet to hear of.  I was just making fun of the tard mentioned in the OP who thought having CSVs in a column is better than an intersection table.



  • Gotcha... paragraph breaks would be nice to separate out the two ideas :P



  • @Sutherlands said:

    Gotcha... paragraph breaks would be nice to separate out the two ideas :P

    Christ, you're asking for good communication skills from a software developer?!  You're lucky I remembered to wear pants to work today...


  • @boomzilla said:

    @Sutherlands said:
    ... how would you go about putting an ARGUMENT to a stored proc in a PERMANENT table?
    This thread is probably too vague to give a good answer for your case, but wouldn't you pass the ID or whatever that would be used to condition your query in the stored proc? Perhaps you'd need to overhaul the procedure, too, if that doesn't make sense. Once you have a sane way to do what it was doing with a table variable, you may be in a situation to find a much more reasonable and performant way to do whatever you were trying to do. That's been my experience, anyways. It's easy to end up doing lots of stupid stuff because you're focusing on the wrong part of the problem.

    OK, stored proc named GetCustomersByIds - needs to take in a list of arguments, so a CSV is used.  Stored proc is something like:

    TempTable = SeparateCSVIntoTable(@IdParam)

    Select * From Customer c

    join TempTable t on c.Id = t.Id
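
    Fleshed out a little (a sketch - fn_convertListToTable and its output column name are assumed from earlier in the thread, and Customer is a stand-in table):

    CREATE PROCEDURE dbo.GetCustomersByIds
        @IdParam varchar(8000)
    AS
    BEGIN
        DECLARE @Ids TABLE (Id int NOT NULL);  -- the "TempTable"

        INSERT @Ids (Id)
        SELECT Value  -- column name assumed
        FROM dbo.fn_convertListToTable(@IdParam);

        SELECT c.*
        FROM dbo.Customer AS c
        JOIN @Ids AS t ON t.Id = c.Id;
    END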



  • @C-Octothorpe said:

    @Sutherlands said:

    Gotcha... paragraph breaks would be nice to separate out the two ideas :P

    Christ, you're asking for good communication skills from a software developer?!  You're lucky I remembered to wear pants to work today...

    I would say your coworkers are the lucky ones.


  • @Sutherlands said:

    Except that the table is specifically there to hold the multi-valued data that is passed in as a parameter so that it can be used in JOINs and WHEREs. 

    Temp more than permanent, yup. Although if the values were passed as a parameter to a SP, do they need to be held in a table whilst the JOIN/WHERE is parsed, or can they just be held in variables/objects within the SP?

    (disclaimer: I'm not a PL/SQL dev, so don't know too much about SP stuff. Educate-me-do.)



  • @ekolis said:

    @C-Octothorpe said:
    I do kind of get why you would use CSVs, but how do you join or in this particular case, use it reliably in a where clause?

    I suppose you could do this:

    WHERE groups LIKE '%,3,%' AND groups LIKE '%,17,%'

     

     

    ... where ','||groups||',' like '%,3,%' and ...

    full table scan dead ahead, probably not a problem with a CMS.



  • @Cassidy said:

    @Sutherlands said:

    Except that the table is specifically there to hold the multi-valued data that is passed in as a parameter so that it can be used in JOINs and WHEREs. 

    Temp more than permanent, yup. Although if the values were passed as a parameter to a SP, do they need to be held in a table whilst the JOIN/WHERE is parsed, or can they just be held in variables/objects within the SP?

    (disclaimer: I'm not a PL/SQL dev, so don't know too much about SP stuff. Educate-me-do.)

    Don't forget there is a difference [in SQL Server at least] between a "temporary table" and a "table variable"!



  • @Cassidy said:

    @C-Octothorpe said:
    Seriously though, who the fuck ever thought using comma-separated values when you HAVE A FUCKING RELATIONAL DATABASE right in front of you is a good idea?

    Beat me to it. Seriously misunderstood denormalisation?

    I recall having read an article a while back, wherein this was touted as a legitimate design alternative. In the context of optimization, and taking into account numbers of rows, maximum number of distinct values in the CSV, SQL capabilities of the DB, data access patterns, amount of server memory, understanding of legitimate reasons for denormalization, etc, etc... it's at least conceivable that such a structure might work better for the desired purpose than a proper intersection table would. Of course, such advice is easily misunderstood by the clueless who are in search of advice.

    Also, taking into account when the article was written, which may have been in the previous millennium.



  • @TheCPUWizard said:

    Don't forget there is a difference [in SQL Server at least] between a "temporary table" and a "table variable"!

    We can't forget something that we never knew.  Have you missed the comments in this thread of people saying "I'm not a <insert DB role here>"?

  • @Sutherlands said:

    @TheCPUWizard said:
    Don't forget there is a difference [in SQL Server at least] between a "temporary table" and a "table variable"!

    We can't forget something that we never knew.  Have you missed the comments in this thread of people saying "I'm not a <insert DB role here>"?

    FTFTCW



  • @Sutherlands said:

    @TheCPUWizard said:

    Don't forget there is a difference [in SQL Server at least] between a "temporary table" and a "table variable"!

    We can't forget something that we never knew.  Have you missed the comments in this thread of people saying "I'm not a <insert DB role here>"?

    NOT directed at Sutherlands... There is a world of difference between someone asking questions because "I'm not a <insert DB role here>" and the WTF of people making assertions about various DB capabilities/techniques in spite of admitting "I'm not a <insert DB role here>"...

