Transforming a SQL query with inner join on subquery



  • I'm really starting to hate Advantage Database. A new gotcha: the engine does not support queries with subqueries in the FROM clause.

    Which is terrible, since I wrote a query and now I need to completely redesign it.

    So, let's play a game! Here are the rules:

    1. Subqueries can be used in the WHERE and HAVING clause of SELECT statements.
    2. I can't change the database schema (so, no views, stored procs, etc)

    How can I write:

    select patient.first as "first name",
           insurance.primary_co as "primary insurance company",
           insurance.secondary_co as "secondary insurance company",
      from patient inner join ( select primary.patunique,
                                       primary.co_name as "primary_co",
                                       secondary.co_name as "secondary_co"
                                  from insured primary left outer join insured secondary 
                                    on primary.patunique = secondary.patunique
                                 where primary.id = 1
                                   and (secondary.id is null or secondary.id = 2) 
                               ) insurance
           on patient.patunique = insurance.patunique
    

    as a query that follows the rules? Even a hint would be so helpful.



  • So only return the rows where the patient has a primary insurance company, with the secondary where there is one?



  • Use a temp table or tables and do it in multiple steps?



  • So only return the rows where the patient has a primary insurance company, with the secondary where there is one?

    Yes, but every patient will have at least a primary. It's an actual join, tacking columns on to the patient data set -- not a filter (as such).

    Use a temp table or tables and do it in multiple steps?

    Good idea, I always forget about temp tables.



  • Why doesn't this work?

    select patient.first as "first name",
    primary.co_name as "primary insurance company",
    secondary.co_name as "secondary insurance company",
    from patient
    join insured primary on patient.patunique = primary.patunique
    left join insured secondary on primary.patunique = secondary.patunique
    where primary.id = 1
    and (secondary.id is null or secondary.id = 2)
    


  • It looks like it will work, but I didn't have enough imagination to think it through it under sudden pressure (which is kind of funny, since I advised @Karla to use outer joins to tack on columns just yesterday...)

    Thanks.



  • Why doesn't this work?

    select patient.first as "first name",
    primary.co_name as "primary insurance company",
    secondary.co_name as "secondary insurance company",
    from patient
    join insured primary on patient.patunique = primary.patunique
    left join insured secondary on primary.patunique = secondary.patunique
    where primary.id = 1
    and (secondary.id is null or secondary.id = 2)
    

    Edit: fuck sake - broadband goes down while posting and some how Discourse manages to post it TWICE?!



  • @Captain said:

    Use a temp table or tables and do it in multiple steps?

    Good idea, I always forget about temp tables.

    I am neck deep in some complicated data manipulation and optimizing on our reporting database. I dream in temp tables.



  • @Captain said:

    It looks like it will work, but I didn't have enough imagination to think it through it under sudden pressure (which is kind of funny, since I advised @Karla to use outer joins to tack on columns just yesterday...)

    Thanks.

    Even funnier (or not)...the previous boss+2 had a directive of no temp tables (we were told to use perm tables). I didn't start using them until after he left and I had to work directly with the architect for the first time (I already didn't like him) but he walked me through optimizing a stored procedure that was taking almost a minute because it used perm tables for processing data. We got it down to under a second. I started using them everywhere.

    Our reporting person was out on medical leave for 3 months.
    I was less than familiar with the complexity of data and business rules at the time.
    I had to fix her reports where the data was wrong. They used perm tables, dynamic SQL and were 1000s of lines long with multiple levels derived tables.

    I think temp tables make it it easier to follow what is going on.
    I ended up rewriting all of them because it was faster.

    As I didn't write them performance as I was learning the business rules. I am now optimizing them and the whole reporting database.



  • What is Advantage Database, and why aren't you on something sane like SQL Server, PostgreSQL or even Oracle?



  • Advantage Database is an embeddable SQL engine by SAP. It's fairly okay.

    It's embedded in the records system the office uses. Both the database and the records system have some crusty cruft, which taken together, make neither so nice to work with (despite Advantage being okay on its own).

    When I replace the records system, I'll use Postgres.



  • Yikes. I can't imagine using permanent staging tables for reports. Never mind optimizations in place for temp tables, I'd be worried about concurrency. I like permanent tables for data staging, but only if I can guarantee that only one process is going to be touching those at a time, like for bulk data import/export processes.



  • What prevents usage of a temp table? Anything?

    Also now that I'm looking at it, why the heck is the subquery even necessary? You can easily rewrite that without one.



  • What until you get a copy of SQL Server and find out about table variables.



  • I was too stoooopid to see that it could be rewritten easily.

    Happens to the best of us.



  • Legit usages of subqueries are SO RARE that I've actually worked with SQL people who thought the entire feature should be removed. In 99% of cases, it's being misused, and in the other 1% it's better-solved by using a table variable or temp table.



  • @blakeyrat said:

    What until you get a copy of SQL Server and find out about table variables.

    Way ahead of you. I use SQL Server (I started with SQL Server 6.5) and table variables.

    You can't put indices on table variables after the insert. So I only use them for small lookups.



  • Those people sound silly. They would like Advantage Database.

    Subqueries are great because they belong to the same category of things as "regular" queries, which means you can do things like join and meet them with queries. Getting rid of subqueries just because you can express the same idea another way is like cutting off a finger because you can still count to 10.



  • @Captain said:

    Getting rid of subqueries just because you can express the same idea another way is like cutting off a finger because you can still count to 10.

    If you can express it another way, the "other way" is almost certain to be about 40 times faster.



  • That wasn't a correlated subquery. It could have been merge or hash joined. A table is a query is a view is a temp-table -- they're just different mechanical representations of the same data. Which one you choose ought to be an implementation detail -- and I'd go as far as to say that you shouldn't have to choose. The engine should be smart enough to figure it out for you.



  • @blakeyrat said:

    If you can express it another way, the "other way" is almost certain to be about 40 times faster.

    TIL - I thought I previously read that optimizer usually makes them equivalent performance-wise.



  • @Karla said:

    I think temp tables make it it easier to follow what is going on. I ended up rewriting all of them because it was faster.

    What DB do you use? Oh, sql server. I'm on Oracle, so I have a similar relationship with CTEs.



  • @Karla said:

    @blakeyrat said:
    If you can express it another way, the "other way" is almost certain to be about 40 times faster.

    TIL - I thought I previously read that optimizer usually makes them equivalent performance-wise.

    I remember reading around there that sql server has fairly recently improved on that sort of thing.



  • @boomzilla said:

    What DB do you use? I'm on Oracle, so I have a similar relationship with CTEs.

    SQL Server. Production is 2008R2, other environments are 2012. I know WTF, because raisins.
    It has CTEs.

    I've started using them when I need a sequence but I struggle sometimes with getting it right. My brain conceptually has block and I end up try this, that, and whatever until I get the result I was looking for.

    ETA: the WTF



  • @boomzilla said:

    I remember reading around there that sql server has fairly recently improved on that sort of thing.

    I'm trying it out on something I've been optimizing and not happy with yet.



  • I'd be very worried about optimizations if I were you due to:

    @Karla said:

    SQL Server. Production is 2008R2, other environments are 2012.

    CTEs are awesome in Oracle. I think the biggest thing is that you can reuse the data there and there's no risk of additional I/O, whereas various other joins might cause some of that. Plus it makes writing a query much more modular. I can break things up into smaller pieces in a way that having a zillion complicated joins doesn't do.

    Not to mention being able to do aggregates in various ways and then combine them later. Trying to do that is a nightmare in a monolithic query.



  • @boomzilla said:

    CTEs are awesome in Oracle. I think the biggest thing is that you can reuse the data there and there's no risk of additional I/O, whereas various other joins might cause some of that. Plus it makes writing a query much more modular. I can break things up into smaller pieces in a way that having a zillion complicated joins doesn't do.

    Not to mention being able to do aggregates in various ways and then combine them later. Trying to do that is a nightmare in a monolithic query.

    Hmmm, maybe they work differently in Oracle than SQL Server.

    The accepted answer indicates there is no performance benefit over temp tables.



  • @Karla said:

    The accepted answer indicates there is no performance benefit over temp tables.

    Yeah, but that's retarded and wrong. :stuck_out_tongue: The comments point out several ways.

    But...they're very different to a temp table, which sort of exist in Oracle, but I've never used them and honestly I don't see why I would. The CTEs that I write are part of the query I'm running. They don't need to persist or be reused in other queries.

    Plus it seems like a temp table is just making lots more I/O for myself, which is often exactly what I'm trying to avoid. But...performance on different DBs can be very different so YMMV.



  • @boomzilla said:

    But...performance on different DBs can be very different so YMMV.

    But... Oracle is the only one that gets it right :fish:



  • @boomzilla said:

    But...they're very different to a temp table, which sort of exist in Oracle, but I've never used them and honestly I don't see why I would. The CTEs that I write are part of the query I'm running. They don't need to persist or be reused in other queries.

    I use temp tables for persistence because I am manipulating the table...though there are places where I just broke up the query into steps for readability...those I could convert to CTEs.

    I will try and report back my findings. Probably not until tomorrow.



  • @loopback0 said:

    But... Oracle is the only one that gets it right :fish:

    Meh, not my choice which DB to use.

    I have used Oracle back at version 8 when I was a consultant.

    ETA - Oh yeah I used back when I started in IT and doing QA on a PeopleSoft application with Oracle.



  • I think the performance of subquery is hugely enhanced since SQL2005, and I think the SQL team once say in their blog that the SQL queries using subquery should now have performance exceeding the use of temp tables or even table variables. So if you're working with MSSQL servers, subquery is the way to go.

    Btw, I think subquery works especially well with row_number() and let you add those columns handy without heavy-handed modification to the original query.



  • @boomzilla said:

    Plus it seems like a temp table is just making lots more I/O for myself, which is often exactly what I'm trying to avoid. But...performance on different DBs can be very different so YMMV.

    I replaced the temp tables that I don't need to persist with CTEs. The reports that were consistently taking ~3 minutes I got down to under 2 minutes. In test (which has close to the amount of data of production). It was a quick rough test but I will take it.

    And now I have muscle memory to write them I don't have to look up every time. Even did a several nested CTEs.

    The code is cleaner too. I always felt the need to drop the temp table when I was done with it.
    Now that is not needed.

    Thank you!


Log in to reply
 

Looks like your connection to What the Daily WTF? was lost, please wait while we try to reconnect.