MS SQL Query Performance



  • Question on query performance 

    MSSQL supports subselects. As in, [code]select id from table where fk_foo = (select id from other_table etc etc)[/code]

    Should I:

    A) avoid subqueries always, and use JOIN instead. This way I save the overhead for the subquery.
    B) avoid subselects only for larger queries, not for selecting a detail record
    C) Premature optimization! Subselects perform largely the same as JOINs in all circumstances.

    I tried to use Query Analyzer, but my DBA-fu isn't strong enough to interpet the Execution Plan diagrams. I do see that a subselect is a more complex query than a join, according to those pretty icons.

    So. Thanks for any insights that may increase my belt colour. :)



  • Not something I've ever really worried about, but I understand that if there is a difference its only noticable once the subquery returns a lot of rows.



  • [quote user="dhromed"]

    A) avoid subqueries always, and use JOIN instead. This way I save the overhead for the subquery.

    [/quote]

    The difference in using a subquery and a JOIN is that the subquery will run once for every row in the main table of the SELECT, whereas a JOIN  most likely won't. So use a JOIN wherever possible instead.

    However, you'll find times where you have to use the subselect and absorb the performance hit. 



  • Instead of "<font face="Lucida Console">select id from table where fk_foo = (select id from other_table etc etc)" </font>

    <font face="Lucida Console">I have used "Select id from table where id in (Select id from other_table where...)"</font>

    <font face="Lucida Console">The use of = there is strange.  If you are sure that the subselect will return only one value, then you could use a join.  (For the subselect case, you could use a join also).</font>

    If you can't interpret the results of query analyzer, code it both ways and see how long they actually take.



  • My experience is that you can almost always avoid a subquery (derived table).  It may take an extra join condition or some kung-fu with aggregate functions, but you can almost always do it.

    However, if you need to do it: do it.



  • In general, use subqueries where performance isn't the main concern and where they will enhance readability.  Subqueries may improve performance when used with EXISTS or NOT EXISTS, but in general will either have a negative impact or no impact on performance.  Usually with small test batches of data, there will be no performance difference, but when getting into large data set where SQL starts to do has joins, subqueries can cause performance degradation.



  • [quote user="KenW"]

    [quote user="dhromed"]

    A) avoid subqueries always, and use JOIN instead. This way I save the overhead for the subquery.

    [/quote]

    The difference in using a subquery and a JOIN is that the subquery will run once for every row in the main table of the SELECT, whereas a JOIN  most likely won't. So use a JOIN wherever possible instead.

    However, you'll find times where you have to use the subselect and absorb the performance hit. 

    [/quote]

    Run this:

    SELECT *
    FROM Employees
    WHERE
      Salary > (SELECT AVG(Salary) FROM Employees)

    In query analyzer and see how many times it runs the inner query.  I assure you that the subquery will only run once.  Of course, add a Salary column to the Employees table in Northwind first.



  • Rob Vieira's Professional SQL Server 2000 Programming (Wrox) has a good comparison on page 563. From what I can tell, you should avoid correlated subqueries if you can get away with it (they have to be executed for every row in the outer table), and use nested subqueries as opposed to joins if you're working with small amounts of data, or a small lookup table. Another suggested approach for situations where the subquery lookup value would be the same for every row is to pre-query the value and stuff it into a variable for referencing in the query.



  • Another approach would be to get an explain plan of the query in each instance and make decisions based on the cost.



  • [quote user="jsmith"][quote user="KenW"]

    [quote user="dhromed"]

    A) avoid subqueries always, and use JOIN instead. This way I save the overhead for the subquery.

    [/quote]

    The difference in using a subquery and a JOIN is that the subquery will run once for every row in the main table of the SELECT, whereas a JOIN  most likely won't. So use a JOIN wherever possible instead.

    However, you'll find times where you have to use the subselect and absorb the performance hit. 

    [/quote]

    Run this:

    SELECT *
    FROM Employees
    WHERE
      Salary > (SELECT AVG(Salary) FROM Employees)

    In query analyzer and see how many times it runs the inner query.  I assure you that the subquery will only run once.  Of course, add a Salary column to the Employees table in Northwind first.

    [/quote]

    Jeff,

    Sorry for the delay in response.

    This isn't what I was referring to - in this case, the subquery runs once because of the aggregate function. That's not the same as the OP's question at all.

    Ken
     



  • SET STATISTICS TIME ON
    SET STATISTICS IO ON

    Run both statements and compare both sets of results.

    Derived tables are generally faster though, but I have found that it is all dependant on perspective..............10ms verse 15ms for a resultset of a certain size........

    What ever works for you.



  • My experience is mostly Sybase not MS, but for what it's worth, I don't believe a subquery is run for every line in the main table necessarily. For example
    select a,b,c from MainTable where id in ( select id from OtherTable where blah )
    would only run the subquery once, then the optimiser should put the results into a worktable and 'flatten' the query into a regular join with the MainTable.
    However
    if you are using a correlated subquery, then yes it will run the subquery for each line in the main table. For example
    select a,b,c from MainTable MT where EffectiveDate= (select max(MT2.EffectiveDate) from MainTable MT2 where MT.id = MT2.id  )
    this will probably give you performance issues.
    In general, I avoid subqueries. Either use a straight join if you can, or create a temp table of interim results and join to that. So in my second example, you might do better to create a temp table of ids and max dates group by id and use that for a straight join.
    Regards.


Log in to reply
 

Looks like your connection to What the Daily WTF? was lost, please wait while we try to reconnect.