SQL - easiest way to keep DRY



  • I hate non-DRY code. It drives me crazy.

    This is :wtf: code that originated more than 5 years ago.

    There are about 13 stored procedures that have the same basic structure, joining the same 5 tables, etc.

    Each does a grouped count (a list of X with a count of Y) and a details listing of the Ys. Both the grouped result set and the detail result set have unions, because the date math varies with the type of X. This means there are minimally 4 queries in each sp.

    Each one varies in the conditions it requires, in various other joins, and in subqueries. Each one sets a date to use in the WHERE clause.

    I have to change all of them in the same way (design bug that has only recently been noticed). They are all relatively slow which isn't the issue (though I can't help myself when I see functions in the where clauses).

    Here is what I did so far:

    • Create a view with the basic joins in common, adding columns to replace the functions in the WHERE clauses and/or simplify the WHERE clauses so they are easier to follow
    • Turn the date calculation into a function so it is just one call
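
    A rough sketch of those two steps (SQL Server syntax; all names here are invented):

       -- 1. One view for the joins the sps share, with a precomputed
       --    date column so the WHERE clauses no longer need a function
       CREATE VIEW dbo.vw_CommonJoins
       AS
       SELECT x.XId, x.XType, y.YId, y.CreatedAt,
              CAST(y.CreatedAt AS date) AS CreatedDate
       FROM dbo.X AS x
       JOIN dbo.Y AS y ON y.XId = x.XId;
       GO

       -- 2. The shared date calculation as a single call
       CREATE FUNCTION dbo.fn_ReportCutoff (@xType int)
       RETURNS date
       AS
       BEGIN
           -- stand-in for the real per-type date math
           RETURN DATEADD(day, CASE @xType WHEN 1 THEN -30 ELSE -90 END,
                          CAST(GETDATE() AS date));
       END;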

    I did this for the first sp. Now I am looking to do the remaining ones...and I just can't copy/pasta > 40 more times. I. Just. Can't.

    I can't spend a lot of time on this...so I have to compromise.

    How do I get the highest ratio of DRY to fastest to complete?

    ETA: Heading home, won't be able to respond for an hour or so.


  • Garbage Person

    Views for common joins. Maybe functions in places, but they tend to do unfortunate things to the query planner.

    SQL isn't very good at DRY.



  • @Karla I've heard some DBMSes will ruin your DRY by, for example, being notoriously bad at planning whenever your queries involve views in any form. For many of them, a view might be an optimization fence. You'll end up with a large view being materialized before anything else even kicks in.

    I know, I know. But abstractions leak, a fact of life.

    Procedural SQL languages are bad at modern coding techniques because of their design. The only way you can work is to make your code editor-friendly and master find-replace. That's usually what I do.



  • @Karla Also, if you invent a kind of preprocessor toolchain with macros to facilitate SQL reuse, this might be cute and clever, but the next person to maintain the shit will hate you. Better master the power of your editor. :)


  • Garbage Person

    @wft said in SQL - easiest way to keep DRY:

    @Karla I've heard some DBMSes will ruin your DRY by, for example, being notoriously bad at planning whenever your queries involve views in any form. For many of them, a view might be an optimization fence. You'll end up with a large view being materialized before anything else even kicks in.

    I know, I know. But abstractions leak, a fact of life.

    Procedural SQL languages are bad at modern coding techniques because of their design. The only way you can work is to make your code editor-friendly and master find-replace. That's usually what I do.

    Another common query planning antipattern:
    If you include a function in the WHERE clause along with several other conditions, too fucking bad, it's still going to execute that function on every goddamned row, including the ones excluded by the other conditions. Oh, and the overhead for calling a function is INSAAAAAAAAAAAANE.

    Scalar valued functions are basically considered harmful anywhere but the SELECT clause.



  • @Weng said in SQL - easiest way to keep DRY:

    Views for common joins. Maybe functions in places, but they tend to do unfortunate things to the query planner.

    I understand not making the view do too much.

    Another common query planning antipattern:
    If you include a function in the WHERE clause along with several other conditions, too fucking bad, it's still going to execute that function on every goddamned row, including the ones excluded by the other conditions. Oh, and the overhead for calling a function is INSAAAAAAAAAAAANE.

    Yes, this is why I felt the need to remove them even though performance isn't the complaint. It bothers me.



  • @wft said in SQL - easiest way to keep DRY:

    @Karla I've heard some DBMSes will ruin your DRY by, for example, being notoriously bad at planning whenever your queries involve views in any form. For many of them, a view might be an optimization fence. You'll end up with a large view being materialized before anything else even kicks in.

    I know, I know. But abstractions leak, a fact of life.

    Procedural SQL languages are bad at modern coding techniques because of their design. The only way you can work is to make your code editor-friendly and master find-replace. That's usually what I do.

    Let me know if you think this is a :wtf:

    Have a master sp that does all of the common stuff. That stores data in a temp table.

    For each type of report return the subset of data that applies.

    Possibly using CTEs for necessary functions



  • @Weng said in SQL - easiest way to keep DRY:

    SQL isn't very good at DRY.

    Then my brain clearly likes to torture itself, since I often prefer working on the database side of things.



  • @Weng said in SQL - easiest way to keep DRY:

    Another common query planning antipattern:
    If you include a function in the WHERE clause along with several other conditions, too fucking bad, it's still going to execute that function on every goddamned row, including the ones excluded by the other conditions. Oh, and the overhead for calling a function is INSAAAAAAAAAAAANE.
    Scalar valued functions are basically considered harmful anywhere but the SELECT clause.

    It does depend strongly on the optimizer and the query engine. But some functions can be folded, which can make it tolerable.

    I use DB2, which has explicit types for TIMESTAMP (date plus time) and DATE (no time component). Suppose column T is TIMESTAMP and variables dtlow and dthigh are pure dates. This is horribly inefficient:

       SELECT *
       FROM MYTAB
       WHERE DATE(T) BETWEEN :dtlow AND :dthigh
    

    That's because DATE() must be applied to each row to convert it for the comparisons, which basically means no indexing on T. That has to happen because T is not constant across all rows.

    Now consider:

       SELECT *
       FROM MYTAB
       WHERE T BETWEEN TIMESTAMP(:dtlow,'00.00.00') AND TIMESTAMP(:dthigh,'24.00.00')
    

    The variables dtlow and dthigh are constant for the life of the query. Therefore, the TIMESTAMP functions can be pre-folded, which means this is equivalent to T between constant times. DB2 considers that eminently indexable, and efficiency will be high.

    (As an aside, though, guess which form is preferred by my coworkers.)



  • @CoyneTheDup said in SQL - easiest way to keep DRY:

    @Weng said in SQL - easiest way to keep DRY:

    Another common query planning antipattern:
    If you include a function in the WHERE clause along with several other conditions, too fucking bad, it's still going to execute that function on every goddamned row, including the ones excluded by the other conditions. Oh, and the overhead for calling a function is INSAAAAAAAAAAAANE.
    Scalar valued functions are basically considered harmful anywhere but the SELECT clause.

    It does depend strongly on the optimizer and the query engine. But some functions can be folded, which can make it tolerable.

    I use DB2, which has explicit types for TIMESTAMP (date plus time) and DATE (no time component). Suppose column T is TIMESTAMP and variables dtlow and dthigh are pure dates. This is horribly inefficient:

       SELECT *
       FROM MYTAB
       WHERE DATE(T) BETWEEN :dtlow AND :dthigh
    

    That's because DATE() must be applied to each row to convert it for the comparisons, which basically means no indexing on T. That has to happen because T is not constant across all rows.

    Now consider:

       SELECT *
       FROM MYTAB
       WHERE T BETWEEN TIMESTAMP(:dtlow,'00.00.00') AND TIMESTAMP(:dthigh,'24.00.00')
    

    The variables dtlow and dthigh are constant for the life of the query. Therefore, the TIMESTAMP functions can be pre-folded, which means this is equivalent to T between constant times. DB2 considers that eminently indexable, and efficiency will be high.

    (As an aside, though, guess which form is preferred by my coworkers.)

    I will keep this in mind. I think it will help as we are using a lot of date ranges.


  • I survived the hour long Uno hand

    @Weng
    So, then, what's the right way to accomplish things, if you need to basically accomplish a CASE statement, and then filter on the results of that statement? Or are you just SOL from a performance standpoint?



  • @izzion said in SQL - easiest way to keep DRY:

    @Weng
    So, then, what's the right way to accomplish things, if you need to basically accomplish a CASE statement, and then filter on the results of that statement? Or are you just SOL from a performance standpoint?

    I've used a CTE successfully in this case.

    Instead of

       SELECT A, B, C
       FROM M
       WHERE function(M.B) BETWEEN 1 AND 10

    change to:

       ;WITH cte AS (
           SELECT A, B, C, Num = function(M.B)
           FROM M
       )
       SELECT A, B, C
       FROM cte
       WHERE Num BETWEEN 1 AND 10



  • @izzion said in SQL - easiest way to keep DRY:

    @Weng
    So, then, what's the right way to accomplish things, if you need to basically accomplish a CASE statement, and then filter on the results of that statement? Or are you just SOL from a performance standpoint?

    A CASE in the WHERE clause is pretty much always going to perform badly. But sometimes you can work around that with UNION, especially if you can do UNION ALL. Consider a stupid example:

    WHERE 
        LIM < CASE
                 WHEN REVERSE='Y' THEN -AMT
                 ELSE AMT
              END
    

    Since a row is either REVERSE='Y' or not, this would be a candidate for UNION ALL:

       SELECT COL1, COL2, REVERSE, -AMT AS AMT, LIM
       FROM MYTAB 
       WHERE REVERSE = 'Y' AND LIM < -AMT
         UNION ALL
       SELECT COL1, COL2, REVERSE, AMT, LIM
       FROM MYTAB 
       WHERE REVERSE <> 'Y' AND LIM < AMT
    

    UNION ALL is efficient because there is no sort to eliminate duplicates. The individual queries should be efficient. So in theory this should be much faster than the CASE.


  • I survived the hour long Uno hand

    @Karla

    Hm, yeah, I'm kind of doing that. I guess a lot of my problem with the queries I'm working on comes from the fact that my function is basically going out to one or more different tables (some remote) to try to translate a friendly name into an actual server or vice versa. So the lack of being able to filter before applying the (multiple) functions is really fugly for performance.



  • @izzion said in SQL - easiest way to keep DRY:

    @Karla

    Hm, yeah, I'm kind of doing that. I guess a lot of my problem with the queries I'm working on comes from the fact that my function is basically going out to one or more different tables (some remote) to try to translate a friendly name into an actual server or vice versa. So the lack of being able to filter before applying the (multiple) functions is really fugly for performance.

    Well, YMMV, but if the function is foldable (which requires it to be deterministic) and the engine is "paying attention", then it still should work.

    Consider my TIMESTAMP() function above: it is deterministic. Since the input is constant for the life of the query, and the same input yields the same output always, then the function can be prefolded.

    Addendum: Deterministic means a given input yields the same output for the life of the query: it's a contract that lets the engine assume the function doesn't have to be recomputed for each row. It sounds like your functions should meet that requirement...unless you routinely flip servers in mid-query. So I would declare the functions deterministic.
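
    In DB2 the contract is spelled out right in the declaration. A sketch (names invented; the lookup body is a stand-in):

       -- DETERMINISTIC promises the engine that the same input gives the
       -- same output for the life of the query, so the call may be folded
       CREATE FUNCTION FRIENDLY_TO_SERVER (FRIENDLY VARCHAR(64))
       RETURNS VARCHAR(64)
       LANGUAGE SQL
       DETERMINISTIC
       READS SQL DATA
       NO EXTERNAL ACTION
       RETURN (SELECT SERVER_NAME
               FROM SERVER_MAP
               WHERE FRIENDLY_NAME = FRIENDLY);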


  • I survived the hour long Uno hand

    @CoyneTheDup

    Hm, yeah, I think that sounds right. Basically the functions are parsing the result of column A to pull a couple of pieces of the connection string out, then referencing a local table to see if that server name makes sense, and if not it looks at a remote table that's more complete (I probably should just pull a cached copy of the remote table, tbh, since it doesn't change that often...)

    Eh, it's functional and it's a temporary thing and it's not impacting operations, but it makes me cry a little on the inside to see 30+ second execution times with hundreds of remote server executions.


  • Garbage Person

    @izzion The WtfCorp way is to abuse #temp tables any time you need to do anything particularly tricky.

    Our database servers are absolute fucking monsters and have entire SSD arrays dedicated to TempDB, though, so YMMV.


  • kills Dumbledore

    @Karla said in SQL - easiest way to keep DRY:

    How do I get the highest ratio of DRY to fastest to complete?

    Dynamic SQL 🚎

    @Weng said in SQL - easiest way to keep DRY:

    The WtfCorp way is to abuse #temp tables any time you need to do anything particularly tricky.

    I've sped up a query from missing the 60s timeout to under a second before by doing nothing but replacing 3-4 temp tables with CTEs.
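
    The shape of that rewrite, roughly (SQL Server syntax; the tables are invented):

       -- Before: stage through a temp table
       SELECT A, B INTO #stage FROM BigTable WHERE SomeFilter = 1;

       SELECT s.A, COUNT(*) AS Cnt
       FROM #stage AS s
       JOIN Other AS o ON o.A = s.A
       GROUP BY s.A;

       DROP TABLE #stage;

       -- After: the same logic as a CTE, so the optimizer can plan it
       -- as a single query instead of two with a tempdb round trip
       WITH stage AS (
           SELECT A, B FROM BigTable WHERE SomeFilter = 1
       )
       SELECT s.A, COUNT(*) AS Cnt
       FROM stage AS s
       JOIN Other AS o ON o.A = s.A
       GROUP BY s.A;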



  • @wft said in SQL - easiest way to keep DRY:

    @Karla Also, if you invent a kind of preprocessor toolchain with macros to facilitate SQL reuse, this might be cute and clever, but the next person to maintain the shit will hate you. Better master the power of your editor.

    That was my first thought too.

    Why not set up a quick build system in the repository where you store SQL code, and use something like EJS to easily crank out many similar SPs?

    That way you get DRY, without compromising efficiency or having too many strange internal utilities inside a database.



  • @Weng @wft @izzion

    And I'll tag @boomzilla as he told me how much faster CTEs are than temp tables.

    Is this a :wtf:? Thoughts?

    Have a master sp that does all of the common stuff. That stores data in a temp table.

    For each type of report return the subset of data that applies.

    Possibly using CTEs for necessary filtering that is report specific.
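
    Roughly what I have in mind, as a sketch only (all names invented):

       CREATE PROCEDURE dbo.usp_ReportCommon
           @ReportType int
       AS
       BEGIN
           -- the common stuff once: shared joins into a temp table
           SELECT x.XId, x.XType, y.YId,
                  CAST(y.CreatedAt AS date) AS CreatedDate
           INTO #common
           FROM dbo.X AS x
           JOIN dbo.Y AS y ON y.XId = x.XId;

           -- each report type returns only the subset that applies
           IF @ReportType = 1
               SELECT XId, COUNT(YId) AS YCount
               FROM #common
               WHERE XType = 1
               GROUP BY XId;
           ELSE IF @ReportType = 2
               SELECT XId, YId, CreatedDate
               FROM #common
               WHERE XType = 2;
       END;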



  • @Jaloopa said in SQL - easiest way to keep DRY:

    @Karla said in SQL - easiest way to keep DRY:

    How do I get the highest ratio of DRY to fastest to complete?

    Dynamic SQL 🚎

    LOL Yeah, I knew that wasn't the answer here.

    I do use it on occasion, when the performance hit isn't too important and it means I don't have to change code every time there is a new A (and there will always be new As).

    @Weng said in SQL - easiest way to keep DRY:

    The WtfCorp way is to abuse #temp tables any time you need to do anything particularly tricky.

    I've sped up a query from missing the 60s timeout to under a second before by doing nothing but replacing 3-4 temp tables with CTEs.

    Yeah, I've made big improvements with CTEs. Also by breaking up monster joins so I can see what they are doing.



  • @cartman82 said in SQL - easiest way to keep DRY:

    @wft said in SQL - easiest way to keep DRY:

    @Karla Also, if you invent a kind of preprocessor toolchain with macros to facilitate SQL reuse, this might be cute and clever, but the next person to maintain the shit will hate you. Better master the power of your editor.

    That was my first thought too.

    Why not set up a quick build system in the repository where you store SQL code, and use something like EJS to easily crank out many similar SPs?

    That way you get DRY, without compromising efficiency or having too many strange internal utilities inside a database.

    ejs?

    Tell me more...this is way more interesting to me.


  • kills Dumbledore

    @Karla I have had the odd occasion where dynamic SQL was the least WTF way to do what I needed.



  • @Jaloopa said in SQL - easiest way to keep DRY:

    @Karla I have had the odd occasion where dynamic SQL was the least WTF way to do what I needed.

    Yeah, I use it in a dynamic Excel report but only very small amounts.

    I also use it to clean up our report database in TEST and DEV environments, to keep the database from becoming too big. I generate dynamic truncate statements.

    Otherwise it is add a new truncate line for every new table...which would get forgotten at times.
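
    The generation is something like this (SQL Server sketch; the Report% name filter is just an invented example):

       DECLARE @sql nvarchar(max) = N'';

       -- build one TRUNCATE statement per matching table
       SELECT @sql += N'TRUNCATE TABLE ' + QUOTENAME(s.name) + N'.'
                    + QUOTENAME(t.name) + N';' + CHAR(13)
       FROM sys.tables AS t
       JOIN sys.schemas AS s ON s.schema_id = t.schema_id
       WHERE t.name LIKE N'Report%';

       EXEC sys.sp_executesql @sql;

    New tables get picked up automatically, so there is nothing to forget.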


  • ♿ (Parody)

    @Weng said in SQL - easiest way to keep DRY:

    If you include a function in the WHERE clause along with several other conditions, too fucking bad, it's still going to execute that function on every goddamned row, including the ones excluded by the other conditions. Oh, and the overhead for calling a function is INSAAAAAAAAAAAANE.

    This is where things like CTEs come in handy, because you can do that initial filtering and then call the function in a later CTE that uses the original CTE as a source.

    I've also been bitten multiple times by views screwing over my performance. I hate the repetition, but not as much as the slow performance.
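
    The filter-first, function-later shape, as a sketch (invented names):

       WITH narrowed AS (
           -- cheap, indexable conditions first
           SELECT ConnString, RawName
           FROM dbo.Connections
           WHERE Environment = 'PROD'
       ),
       resolved AS (
           -- the expensive function applied only to the survivors
           SELECT ConnString,
                  dbo.fn_FriendlyToServer(RawName) AS ServerName
           FROM narrowed
       )
       SELECT ConnString, ServerName
       FROM resolved
       WHERE ServerName IS NOT NULL;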



  • The thing to remember is that sprocs, views, and functions can be done in a sane way that keeps things fairly DRY.

    But data isn't DRY, and you have to accept that at some point it is going to get soggy.



  • @Karla While I make liberal use of views and stored procedures, be very careful when introducing them. They create chains that the query optimizer has difficulty with and can drastically slow down your query time, and they can become a difficult chain to follow if you start going deep. Functions also tend to have awful performance in the long run.

    Consider creating stored procs that accept variables to define what type of query they run, so you can query as little as possible to keep things snappy. Minimize joins in your views whenever possible. Conditional IF statements mostly only work in procs; views tend to suck for that.



  • @Matches All of this is true.



  • @Karla said in SQL - easiest way to keep DRY:

    ejs?
    Tell me more...this is way more interesting to me.

    It can be any templating language, really; EJS is just my favourite. It's pretty straightforward and fits well with a JS-based build system.

    Like most templating languages, EJS was meant to be used with HTML. I never actually did this in practice, but I don't see why it couldn't work with SQL too.

    It would involve setting up a frontend-like build system - grunt or gulp. Except, instead of JavaScript and CSS, you would compile your source procedure.sql.ejs files into procedure_v1.sql, procedure_v2.sql, etc.
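
    A source file might look something like this (hypothetical sketch; reportName, xType, and the view name would come from your build script and schema):

       -- usp_report.sql.ejs: plain SQL, except that the <%= ... %> holes
       -- are filled in by JavaScript at build time, once per report type
       CREATE PROCEDURE dbo.usp_Report_<%= reportName %>
       AS
       BEGIN
           SELECT XId, COUNT(YId) AS YCount
           FROM dbo.SomeCommonView
           WHERE XType = <%= xType %>
           GROUP BY XId;
       END;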



  • @cartman82 Don't listen to cartman you need to use Jade ...



  • @lucas1 said in SQL - easiest way to keep DRY:

    @cartman82 Don't listen to cartman you need to use Jade ...

    No. Jade is too HTML-centric.
    I hardly know how to write HTML in that thing; SQL would be impossible.

    EJS is way more straightforward.

    Everything between <%= and %> is JavaScript. Everything outside is whatever.

    That's it. It's all you need to know.

    Easy to parse and reason about. Less easy to type, but oh well.



  • @cartman82 I was joking .. I think Jade is shit because whitespace can break it.

    EDIT: before any Pythonistas come for me, the whitespace significance isn't obviously mentioned when you are using something like Express as a framework.



  • @lucas1 said in SQL - easiest way to keep DRY:

    @cartman82 I was joking .. I think Jade is shit because whitespace can break it.
    EDIT: before any Pythonistas come for me, the whitespace significance isn't obviously mentioned when you are using something like Express as a framework.

    I did some things in Jade. It's OK for little HTML things. But in the end, whatever gains you make by not having to type as much are lost as you're googling every 5 minutes how to achieve this or that.

    Maybe if I was working in it ALL the time... But probably not worth it.



  • @cartman82 I agree. I think it looks good at first but it has a load of problems.

    I prefer something like Twig or Jinja ... I think Razor lets you put too much code into the view.



  • @Matches said in SQL - easiest way to keep DRY:

    @Karla While I make liberal use of views and stored procedures, be very careful when introducing them. They create chains that the query optimizer has difficulty with and can drastically slow down your query time, and they can become a difficult chain to follow if you start going deep. Functions also tend to have awful performance in the long run.

    Consider creating stored procs that accept variables to define what type of query they run, so you can query as little as possible to keep things snappy. Minimize joins in your views whenever possible. Conditional IF statements mostly only work in procs; views tend to suck for that.

    I understand all this.

    Performance is not my objective (other than not making it worse) ❗

    I can't stand the idea of copy/pasting more than 40 times.

    DRY > optimization



  • @cartman82 said in SQL - easiest way to keep DRY:

    @Karla said in SQL - easiest way to keep DRY:

    ejs?
    Tell me more...this is way more interesting to me.

    It can be any templating language, really, EJS is just my favourite. It's pretty straightforward and fits well with a js-based build system.

    Like most templating languages, EJS was meant to be used with html. I never actually did this in practice, but I don't see why it couldn't work with SQL too.

    It would involve setting up a frontend-like build system - grunt or gulp. Except, instead of JavaScript and CSS, you would compile your source procedure.sql.ejs files into procedure_v1.sql, procedure_v2.sql, etc.

    I know ColdFusion** (and the web part of this is written in that).
    I have generated SQL (and other CF files) with it before.

    I was hoping for a SQL solution.

    **I know, ColdFusion is TR :wtf:



  • @Karla ColdFusion and co are basically ASP.NET circa WebForms.



  • @lucas1 said in SQL - easiest way to keep DRY:

    @Karla ColdFusion and co are basically ASP.NET circa WebForms.

    We were just talking about templating, yes?



  • @Karla

    Yes we were. ASP.NET WebForms works a bit like ColdFusion.

    Usually the way to optimise stuff is to write sprocs, cache everything you can server side, and hope for the best depending on the infrastructure.



  • @lucas1 said in SQL - easiest way to keep DRY:

    @Karla

    Yes we were. ASP.NET WebForms works a bit like ColdFusion.

    Usually the way to optimise stuff is to write sprocs, cache everything you can server side, and hope for the best depending on the infrastructure.

    I'm not trying to optimize.

    I have 13 sprocs that are all basically the same with 4 queries each.

    I don't want to copy/paste the fix I have to do.



  • @Karla If the sprocs are a bit different, maybe they should be copied and pasted.

    You shouldn't change a sproc that is already in place unless you are just adding columns, because there is plenty of code that relies on column indexes for the purposes of speed.

    Also, you should only bring specific data back.

    I know it is the opposite of normal programming, but that goes for anything with SQL.



  • @lucas1 said in SQL - easiest way to keep DRY:

    @Karla If the sprocs are a bit different, maybe they should be copied and pasted.

    You shouldn't change a sproc that is already in place unless you are just adding columns, because there is plenty of code that relies on column indexes for the purposes of speed.

    Also, you should only bring specific data back.

    I know it is the opposite of normal programming, but that goes for anything with SQL.

    Um, thanks.



  • @Karla What I am trying to say is that you can't be perfect because of how SQL works. You can take that reality or leave it.



  • @lucas1 I'm not expecting perfect, I want better and at least more interesting than copy/paste.



  • @Karla said in SQL - easiest way to keep DRY:

    DRY > optimization

    All depends on the size of your data set.



  • @Karla I think people are trying to tell you here that you can't have it as DRY as you want.



  • @lucas1 you can design your processes to not be redundant as all fuck.



  • @Karla said in SQL - easiest way to keep DRY:

    Have a master sp that does all of the common stuff. That stores data in a temp table.
    @Karla said in SQL - easiest way to keep DRY:
    That stores data in a temp table.

    This. ↑↑↑↑↑

    Maybe you have measured that the previous versions of queries generated a temporary table anyway. Then just maybe it's ok.

    The general drift, though, is to not resort to creating temporary tables if at all possible. Also, there are going to be re-entrancy issues.

    Also, now you've got a routine that dutifully does all the stuff you only need 1/X of each time.



  • Also, I remembered when CTEs started to be a thing in PostgreSQL, blogs were all like, "be cautious! CTEs are an optimization fence in PostgreSQL!". They still are. PostgreSQL still executes CTEs separately from everything else. And yet I did SQL that was orders of magnitude faster with CTEs than without. Apparently, querying against a large dataset, making a small joinable table that fits in the RAM, and then running the main query against that did the trick, as opposed to doing it all at once.

    I know this is not how it should be done. The query planner, in the end, should be smart enough to deduce such stuff. I don't have to estimate that in 99% of the cases I'll only need 100 records from that billion-record table. In the end, CTEs should only be syntactic sugar for added readability (except maybe WITH RECURSIVE). But the reality is what it is, and abstractions still leak.
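
    The pattern in question, roughly (PostgreSQL; names invented):

       -- the optimization fence forces this small set to materialize first
       WITH wanted AS (
           SELECT id, label
           FROM huge_dimension
           WHERE label LIKE 'interesting%'   -- ~100 rows, not a billion
       )
       -- ...and the main query then joins against that tiny in-RAM set
       SELECT f.event_time, w.label, f.amount
       FROM fact_table AS f
       JOIN wanted AS w ON w.id = f.dimension_id;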



  • @wft said in SQL - easiest way to keep DRY:

    @Karla said in SQL - easiest way to keep DRY:

    Have a master sp that does all of the common stuff. That stores data in a temp table.
    @Karla said in SQL - easiest way to keep DRY:
    That stores data in a temp table.

    This. ↑↑↑↑↑

    Maybe you have measured that the previous versions of queries generated a temporary table anyway. Then just maybe it's ok.

    The general drift, though, is to not resort to creating temporary tables if at all possible. Also, there are going to be re-entrancy issues.

    Also, now you've got a routine that dutifully does all the stuff you only need 1/X of each time.

    Thank you.

    Performance is not my priority. I just hate seeing obvious things that are known to cause problems.

