Connect the dots



  • The query designer in SQL Server Management Studio is supposed to make writing queries easier.  You just select the tables you want data from and connect the tables by the columns they join on.  Its just as easy as ... connecting the dots! The problem I have with it, is that it if the query is anything complex it produces results like this:

    SELECT lots, of, shit

    FROM table1 RIGHT OUTER JOIN
    table2 ON table1.table1Id = table2.table1Id RIGHT OUTER JOIN
    table3 RIGHT OUTER JOIN
    table4 LEFT OUTER JOIN
    table5 ON table4.table5Id = table5.table5Id LEFT OUTER JOIN
    table6 ON table4.table4Id = table6.table4Id ON table3.table4Id = table4.table4Id LEFT OUTER JOIN
    table7 ON table4.colA = table7.colA AND table4.table4Id = table7.table4Id LEFT OUTER JOIN
    table8 RIGHT OUTER JOIN
    table9 RIGHT OUTER JOIN
    table10 AS table10_1 INNER JOIN
    table10 INNER JOIN
    table11 ON table10.table10Id = table11.colB ON table10_1.table10_1Id = table11.colB ON
    table9.table9Id = table11.colC ON table8.table8Id = table11.table8Id ON table4.table4Id = table11.table4Id LEFT OUTER JOIN
    table12 LEFT OUTER JOIN
    table13 ON table12.table13Id = table13.table13Id ON table4.table4Id = table12.table4Id LEFT OUTER JOIN
    table14 RIGHT OUTER JOIN
    table15 RIGHT OUTER JOIN
    table16 ON table15.table15Id = table16.table15Id ON table14.table14Id = table16.table14Id ON
    table4.table4Id = table16.table4Id LEFT OUTER JOIN
    table17 ON table11.table11Id = table17.table11Id ON table2.colA = table4.colA LEFT OUTER JOIN
    table18 RIGHT OUTER JOIN
    table19 RIGHT OUTER JOIN
    table20 ON table19.table19Id = table20.table19Id ON
    table18.table18Id = table20.table20Id ON
    table4.table4Id = table20.table4Id LEFT OUTER JOIN
    table21 ON table7.table21Id = table21.table21Id LEFT OUTER JOIN
    table22 ON table16.table22Id = table22.table22Id

    WHERE table4.ColD > 3

    The table names aren't all that important but look at those joins! It's almost impossible to look at this query and tell if it is correct.  In fact, the person who deisgned this query was not able to tell that it is not correct.  So I ask around the office and get told "Well it doesn't really do complex queries well, but it's really good for simple queries with just a few joins."  If its just a simple query, why bother with the fancy GUI at all?  Just write the query.

    tldr: The SQL Server Management Studio query designer is only usefull for people who should not be writing queries.



  • @Peraninth said:

    tldr: The SQL Server Management Studio query designer is only usefull for people who should not be writing queries.

    That is correct. If you want a readable query, you have to write it yourself. I does complex queries fine, but its purpose is for running ad-hoc queries, not as a query writer, so don't expect it to be readable.



  • @Peraninth said:

    tldr: The SQL Server Management Studio query designer is only usefull for people who should not be writing queries.

    Hah! Would be great for my coworkers then.


  • BINNED

    @Peraninth said:

    tldr: The SQL Server Management Studio query designer is only usefull for people who should not be writing queries.

    True, but unfortunately, people who use the query designer generally don't know that they shouldn't be writing queries.



  • I have had some complex queries that query developer tools would make ugly, but I dont blame the tools for that.  I blame the person who decided to over normalize the tables I was trying work with.  I had 3 tables that I was always joining together and they had a 1 to 1 to 1 relationship.



  • TRWTF is needing to join 22 different tables in one query in the first place... seriously, what is this, some executive uber-report?


  • ♿ (Parody)

    @ekolis said:

    TRWTF is needing to join 22 different tables in one query in the first place... seriously, what is this, some executive uber-report?

    22 doesn't seem terribly excessive to me, unless all of you queries are like that. But if the complex queries in your system are much smaller than that, then you can count yourself lucky that you have such a simple system. Unless it's just because the people who designed it thought that normalization just meant buying the same RDBMS as most other people.



  •  You should try the Entity Framework Designer. I have seen some dumb code generators in my time, but it wasn't until this software that my IDE actively tried to sabotage me. One of these days I should sit down and write a rant about all the stupid shit they've done with Visual Studio and some of the .NET frameworks. I don't know what the hell is going on in Microsoft but they really need to stop publishing half-baked crap and expect us to write production code with it.



  • @DOA said:

     You should try the Entity Framework Designer. I have seen some dumb code generators in my time, but it wasn't until this software that my IDE actively tried to sabotage me. One of these days I should sit down and write a rant about all the stupid shit they've done with Visual Studio and some of the .NET frameworks. I don't know what the hell is going on in Microsoft but they really need to stop publishing half-baked crap and expect us to write production code with it.

    Please do; I've always found the Entity Framework to be an excellent tool for my use-cases.



  • @boomzilla said:

    @ekolis said:
    TRWTF is needing to join 22 different tables in one query in the first place... seriously, what is this, some executive uber-report?

    22 doesn't seem terribly excessive to me, unless all of you queries are like that. But if the complex queries in your system are much smaller than that, then you can count yourself lucky that you have such a simple system. Unless it's just because the people who designed it thought that normalization just meant buying the same RDBMS as most other people.

    I don't think 22 joins would be unusual for some of the complex systems I've worked on, but usually it would be structured as a few joins on top of views which themselves do a few joins, and so on.. 22 joins in a single query does seem a bit odd.



  • @Peraninth said:

    It's almost impossible to look at this query and tell if it is correct.

    Really? I think it is quite easy. I see at least two outer joins so I know it is not correct.



  • Using Left Outer Joins and Right Outer Joins in the same query... that ought to put you on Yellow Alert right there!



  • @Mason Wheeler said:

    Using Left Outer Joins and Right Outer Joins in the same query... that ought to put you on Yellow Alert right there!
    That's an artifact of how the designer works. It puts the tables in the from clause in the order they are added to the designer surface. If you add the inner table before the outer table, you'll end up with a right join. It's practically impossible to add 23 tables in such a way as to eliminate right joins.



  • @boomzilla said:

    @ekolis said:
    TRWTF is needing to join 22 different tables in one query in the first place... seriously, what is this, some executive uber-report?

    22 doesn't seem terribly excessive to me, unless all of you queries are like that.

    It's excessive to me - I would have created views by joining related tables, then joined those in the final query/stored procedure.

    @boomzilla said:

    Unless it's just because the people who designed it thought that normalization just meant buying the same RDBMS as most other people.

    .. or that the people who did the normalisation have never heard of de-normalising a schema to aid daily operation.


  • ♿ (Parody)

    @Cassidy said:

    @boomzilla said:

    @ekolis said:
    TRWTF is needing to join 22 different tables in one query in the first place... seriously, what is this, some executive uber-report?

    22 doesn't seem terribly excessive to me, unless all of you queries are like that.


    It's excessive to me - I would have created views by joining related tables, then joined those in the final query/stored procedure.

    If you're going to be doing it a lot, then yes. But maybe this is building the query to create the view.

    @Cassidy said:

    @boomzilla said:
    Unless it's just because the people who designed it thought that normalization just meant buying the same RDBMS as most other people.

    .. or that the people who did the normalisation have never heard of de-normalising a schema to aid daily operation.

    Sometimes a fine line. But also a PITA for data integrity. I'd usually rather let the DB do that (because that's one of its main reasons for being) than rely on code from hither and yon to always get it right. Where necessary and feasible, I like to fall back to materialized views to get the performance boost / simplicity without having to worry about reducing integrity.



  • @boomzilla said:

    But maybe this is building the query to create the view.

    Point.

    I just presumed it was a query to be executed (not seeing any "CREATE VIEW viewname AS" in the syntax), but you're right about making assumptions.

    Are there any performance implications with building a view that uses views (that uses views, etc)...? I'd be curious to know the experiences of those more in the know.


  • ♿ (Parody)

    @Cassidy said:

    Are there any performance implications with building a view that uses views (that uses views, etc)...? I'd be curious to know the experiences of those more in the know.

    My experience, with Oracle, is that there can be. It doesn't always do a good job, depending on how you use the view. If you really need all of the columns, then it's probably fine. But if you don't, you generally end up joining tables that you didn't need. I've found, in practice, that the more places a view can be used, the crappier the performance is, because all of those places are actually looking at different subsets of columns.



  • @Cassidy said:

    Are there any performance implications with building a view that uses views (that uses views, etc)...? I'd be curious to know the experiences of those more in the know

    In SQL Server, yes. I've seen queries that took 20 hours using views and went down to under one second when broken down into direct table access.

    That gets worse every new release of SQL server



  • @JvdL said:

    @Cassidy said:
    Are there any performance implications with building a view that uses views (that uses views, etc)...? I'd be curious to know the experiences of those more in the know

    In SQL Server, yes. I've seen queries that took 20 hours using views and went down to under one second when broken down into direct table access.

    That gets worse every new release of SQL server

    Not true. SQL Server merges the submitted query with the view definition (and any view definitions that the view itself may depend on) to make a single query plan. In order for direct table access to be faster than using a view, it has to be doing something different.


  • ♿ (Parody)

    @Jaime said:

    Not true. SQL Server merges the submitted query with the view definition (and any view definitions that the view itself may depend on) to make a single query plan. In order for direct table access to be faster than using a view, it has to be doing something different.

    I could see it being different if you didn't use all of the tables in the view, which could cut out a lot of I/O. Unless not using some of those columns in the view caused it to ignore the table joins altogether. But that just sounds like a bug in SQL server, if it really does that, because you could get a different number of rows based on which columns you select.



  • @boomzilla said:

    @Jaime said:
    Not true. SQL Server merges the submitted query with the view definition (and any view definitions that the view itself may depend on) to make a single query plan. In order for direct table access to be faster than using a view, it has to be doing something different.

    I could see it being different if you didn't use all of the tables in the view, which could cut out a lot of I/O. Unless not using some of those columns in the view caused it to ignore the table joins altogether. But that just sounds like a bug in SQL server, if it really does that, because you could get a different number of rows based on which columns you select.

    But that would qualify as "doing something different". It's not the fact that it's a view that causes the performance problem, it's the fact that the programmer included functionality that they didn't want. Blaming that on view is like blaming Microsoft for security vulnerabilities in IE when Flash or Java have a bug.


  • ♿ (Parody)

    @Jaime said:

    But that would qualify as "doing something different". It's not the fact that it's a view that causes the performance problem, it's the fact that the programmer included functionality that they didn't want. Blaming that on view is like blaming Microsoft for security vulnerabilities in IE when Flash or Java have a bug.

    Yes, I don't disagree with you at all. I would say that I was pointing out a reason why views aren't as useful as they are often made out to be, in that for performance reasons, you can't reuse them as much as you'd otherwise like to. And maybe that guy was literally copying the SQL out of the view, in which case I'd be as skeptical as you. But I suspect that the underlying query was simplified. I should have made that assumption explicit in my earlier post.

    I don't know SQL Server, but my experience with Oracle is basically the same as your description of SQL Server.



  • @Jaime said:

    @JvdL said:
    @Cassidy said:
    Are there any performance implications with building a view that uses views (that uses views, etc)...? I'd be curious to know the experiences of those more in the know

    In SQL Server, yes. I've seen queries that took 20 hours using views and went down to under one second when broken down into direct table access.

    That gets worse every new release of SQL server

    Not true. SQL Server merges the submitted query with the view definition (and any view definitions that the view itself may depend on) to make a single query plan. In order for direct table access to be faster than using a view, it has to be doing something different.
     

    Bullshit. For a simple query like select * from someview union select something , replacing the select * from someview by the verbatim defintion of that view can already speed up performance enormously.Obviously, when it is a compllicated view in big database (billions of records)

    Also, if you have 10 views each referencing 10 tables out of a total of
    20 and rewrite this into a query that reference each table only once it
    can make a big difference. Believe me: the example of 20 hours to 1
    second actually happened in a large company that had to shut down one
    day every month to do a financial rollover because of this. It didn't help that whoever
    made those views was not exactly a SQL expert. It only took a few hours to rewrite this.

    All of this is because SQL Server query plan optimization
    is, well, not optimal. It can't figure out to use the right indexes when the joins become convoluted.

    SQL is not modular.



  • @Peraninth said:

    tldr: The SQL Server Management Studio query designer is only useful for people who should not be writing queries.

    My particular bane in this area is our BI reporting system (for which I provide the data; I don't generally deal with the reporting end but I can if I have to). It's addicted to the "with" statement. The previous version wasn't too bad; a typical query might look like

    WITH SAWITH0 AS 
    {SELECT something as c1, something_else as c2, another_thing as c3
    from stuff}
    WITH SAWITH1 AS
    {query selecting from SAWITH0 and renumbering the columns}
    WITH SAWITH2 AS
    {query selecting from SAWITH1, and maybe SAWITH0, and renumbering the columns again}
    SELECT stuff FROM SAWITH2; /* and renumber the columns once more*/

    If it was a complex query, there might be a little more; I think the most I saw was SAWITH4 or maybe SAWITH5, where we basically had two independent reports being glued together by the date.

    Unfortunately, the version we're on now has gone completely over the edge. The guy who does the reporting end wanted my help a couple of months ago modifying one of the BI reports - not a particularly complex one; probably around the median - with some custom logic for an ad-hoc report. He sent me the query that BI generated. Unfortunately I don't have (or at least can't find) a copy of it still, but it got up to 16 chained WITH statements. I rewrote it as a single select. I've seen other queries generated from this version of BI, and this isn't particularly unusual.

    The real problem, though, is that it makes debugging the things very hard. It's not so bad (though still bad enough) when you can say "okay, we want c8, which is coming from c6 in the second query, and... c7 in the first one, which is taken from... column Y on table X." Trying to trace it back through 15 or 20 WITH clauses is much harder.



  • @Scarlet Manuka said:

    The real problem, though, is that it makes debugging the things very hard. It's not so bad (though still bad enough) when you can say "okay, we want c8, which is coming from c6 in the second query, and... c7 in the first one, which is taken from... column Y on table X." Trying to trace it back through 15 or 20 WITH clauses is much harder.

    You know you can alias tables and refer to columns by name, right?

    I mean your query is still stupid, but there's no need to memorize the column numbers. Ever.



  • @JvdL said:

    Bullshit. For a simple query like select * from someview union select something , replacing the select * from someview by the verbatim defintion of that view can already speed up performance enormously.Obviously, when it is a compllicated view in big database (billions of records)

    Also, if you have 10 views each referencing 10 tables out of a total of
    20 and rewrite this into a query that reference each table only once it
    can make a big difference. Believe me: the example of 20 hours to 1
    second actually happened in a large company that had to shut down one
    day every month to do a financial rollover because of this. It didn't help that whoever
    made those views was not exactly a SQL expert. It only took a few hours to rewrite this.

    All of this is because SQL Server query plan optimization
    is, well, not optimal. It can't figure out to use the right indexes when the joins become convoluted.

    SQL is not modular.

    I'm calling bullshit on your "bullshit". SQL Server views are inlined. The definition code for a view in inlined into the query at the Resolve stage, before it gets to the Optimise/Compile stage. Querying a view is no more or less optimial than querying the tables the view is accessing.

    Your anecdote about 20 hours to 1 second? I want to see code and benchmarks because I don't believe the view was the issue. There was something else going on. My guess, based on situations I have seen, is that whoever rewrote it changed the order of outer joins.

    Disclaimer: I am talking about nonmaterialised views. Materialised views have an effect on performance (that's their purpose). As an aside, if you are using SQL Server 2008 then you should look at replacing materialised views with filtered indexes.

    Disclaimer: I am talking about Microsoft's RDBMS product, SQL Server. If you are talking about another vendor's product then please indicate (and stop using the term "SQL Server").



  • @blakeyrat said:

    @Scarlet Manuka said:
    The real problem, though, is that it makes debugging the things very hard. It's not so bad (though still bad enough) when you can say "okay, we want c8, which is coming from c6 in the second query, and... c7 in the first one, which is taken from... column Y on table X." Trying to trace it back through 15 or 20 WITH clauses is much harder.

    You know you can alias tables and refer to columns by name, right?

    Apparently you missed the part in my post where I said that these are the queries generated by our BI system. If there was human involvement in the process, it wouldn't have come out as a chain of 16 WITH clauses (and yes, we would have kept the existing, meaningful table and column names). Unfortunately, sometimes it's necessary to debug a report that isn't giving you the results you expect, and then you have to deal with the generated query and try to establish what the heck it's doing.



  • @havokk said:

    @JvdL said:
    Also, if you have 10 views each referencing 10 tables out of a total of
    20 and rewrite this into a query that reference each table only once it
    can make a big difference. Believe me: the example of 20 hours to 1
    second actually happened in a large company that had to shut down one
    day every month to do a financial rollover because of this.

    I'm calling bullshit on your "bullshit". SQL Server views are inlined. The definition code for a view in inlined into the query at the Resolve stage, before it gets to the Optimise/Compile stage. Querying a view is no more or less optimial than querying the tables the view is accessing.

    Your anecdote about 20 hours to 1 second? I want to see code and benchmarks because I don't believe the view was the issue. There was something else going on. My guess, based on situations I have seen, is that whoever rewrote it changed the order of outer joins.

    Read the part of his post that I've left remaining above. It makes sense. If you have 20 tables, and you have 10 views each using ten of those tables, and you join all those views together, you'll wind up with a 100-table query where you only actually need a 20-table query. If you're lucky the query optimiser might be able to identify some common bits and split them out into internal views which are only executed once, but it's very likely that you'll be pulling in most of the tables more often than necessary. If you pull in a large table six times instead of once, it'll definitely hurt your performance.

    In this case, if you rewrite the query so that it only uses each table once, you'll obviously see a performance improvement. It's not because the views exist per se, it's because the views were used in an inappropriate manner. The same performance improvement would be achieved by creating a new view using the same tables and exposing all the required information, so that the query only had to look at that one view.



  • @havokk said:

    @JvdL said:
    Bullshit. For a simple query like select * from someview union select something , replacing the select * from someview by the verbatim defintion of that view can already speed up performance enormously.Obviously, when it is a compllicated view in big database (billions of records) [...]
    I'm calling bullshit on your "bullshit". SQL Server views are inlined. The definition code for a view in inlined into the query at the Resolve stage, before it gets to the Optimise/Compile stage. Querying a view is no more or less optimial than querying the tables the view is accessing.

    From http://technet.microsoft.com/en-us/library/cc917715.aspx : "When a view is referenced in the FROM clause of another query, this metadata is retrieved from the system catalog and expanded in place of the view's reference." This is not the case for a union (which as a whole doesn't hve a FROM clause). Also, there are various largely undocumented exceptions where inlining does not occur, see section "Exceptions" in http://sqlblog.com/blogs/merrill_aldrich/archive/2010/02/11/busting-a-persistent-myth-views-are-executed-before-enclosing-queries.aspx

    @havokk said:

    Your anecdote about 20 hours to 1 second? I want to see code and benchmarks because I don't believe the view was the issue. There was something else going on. My guess, based on situations I have seen, is that whoever rewrote it changed the order of outer joins.

    The anecdote details are under NDA. A similar example is this: http://msmvps.com/blogs/greglow/archive/2006/04/02/88853.aspx . They are both examples of rewriting and answer the question "can using views have a bad impact on performance?" with "Yes, sometimes."

    Code or the of the union (slightly anomyzed):

    create view someview as
    with deltas as (
        select
            model = rtrim(f.model),
            f.resource_type,
            resource = rtrim(f.resource),
            f.quantity,
            f.priority,
            override = 1
        from deltatable f
    ), originals as (
        select
            model = rtrim(f.model),
            f.resource_type,
            resource = rtrim(f.resource),
            quantity = f.quantity * coalesce(p.quantity,1),
            priority = null,
            override = 0
        from someotherview f
        left join keytable p on f.uom = 'xx' and p.keycolumn = f.resource
        where f.quantity > 0
            and f.model_type = 'yy'
            and f.model not in (select distinct model from deltas)
    ), formulas as (
        select * from deltas union all select * from originals
    ), families as (
        select f.model
        from formulas f
        join keytable p on p.keycolumn = f.model
        join keytable r on r.keycolumn = f.resource
        where f.resource_type = 'c'
        group by f.model
        having count(*) > 1
    )
    select
        f.*,
        is_family = case when fam.model is null then 0 else 1 end
    from formulas f
    left join families fam on fam.model = f.model

    go

    select * from someview union select model,resource_type,resource,quantity,null,0,null from someotherview

    This takes 36 seconds to execute in SQL Studio. Replacing the part "someview" by the verbatim, it takes less than 1 second.

    @havokk said:

    Disclaimer: I am talking about nonmaterialised views. Materialised views have an effect on performance (that's their purpose). As an aside, if you are using SQL Server 2008 then you should look at replacing materialised views with filtered indexes.

    Disclaimer: I am talking about Microsoft's RDBMS product, SQL Server. If you are talking about another vendor's product then please indicate (and stop using the term "SQL Server").

     

    I am talking about normal views in SQL Server 2008 R2.



  • So you're saying that "Bullshit... SQL is not modular" is the same as "It works 99% of the time, but has some edge cases where it doesn't work"? Proving that there are exceptions doesn't make your bullshit not bullshit. Sometimes SQL Server chooses a non-clustered index when it should have chosen the clustered index. That doesn't mean clustered indexes are broken.



  • @Cassidy said:

    Are there any performance implications with building a view that uses views (that uses views, etc)...?

    @JvdL said:
    In SQL Server, yes. I've seen queries that [have implications]

    @Jaime said:
    Not true. SQL Server merges the submitted query with the view definition. [...]

    @JvdL said:
    Bullshit. [Generic counter example]. SQL is not modular.

    @havokk said:
    I'm calling bullshit on your "bullshit". SQL Server views are inlined. [..] I want to see code and benchmarks. [...]

    @JvdL said:
    [Code and benchmarks of specific counter example]

    @Jaime said:
    So you're saying that "Bullshit... SQL is not modular" is the same as "It works 99% of the time, but has some edge cases where it doesn't work"

    I called bullshit on the blanket statement "SQL Server merges the submitted query with the view definition" or "SQL Server views are inlined" because they are false.

    And I call bullshit on blind faith in patterns that only work 99%of the time.


     



  • @JvdL said:

    for( i = 0, i < 100, ++i ) print 100 / i, // works 99% of the time

    Huh, I'm surprised that came out in the right order.



  • Okay... jury's still out on this one, then.

    For my sins, I can't help feeling a view that performs a join between two views would run more optimally if the underlying statements were reviewed and a new view created from those statements instead.

    I'm not sure why, just a feeling that the execution plan can see the whole picture rather than getting results tainted from each individual view(point).

    I half-expected a reply along the lines of "yes it does, but the performance impact is minimal due to the way that..." but it's clear from some of the examples here that building views from views from views will introduce bottlenecks somewhere along the line.

    Some interesting reading anyway, folks - thanks for sharing your experiences and knowlege.



  • @pkmnfrk said:

    @JvdL said:
    for( i = 0, i < 100, ++i ) print 100 / i, // works 99% of the time

    Huh, I'm surprised that came out in the right order.

    Lucky. Or just use the "SINGLE LOW-9 QUOTATION MARK", which looks like a comma and which CS does not interpret as a tag split.


  • Trolleybus Mechanic

    @morbiuswilters said:

    Lucky. Or just use the "SINGLE LOW-9 QUOTATION MARK", which looks like a comma and which CS does not interpret as a tag split.

    Oh. What fun is that?


Log in to reply