I SELECT f.*, c.* sometimes.

boomzilla

As for my answer, I come from a data warehouse querying background, hence the use of the subquery to eliminate as much data as possible before joining.

Yeah, there a zillion things that could change the performance.

Frank

Yeah. I agree. @boomzilla has a solid no bullshit query.

Matches

I didn't forget the asterisk, discourse helped and italicized my query.

Frank

So it was markdown trickery...Got it.

boomzilla

I recommend using the triple backtick around code. You don't have to worry about that and you get highlighting and a monospace font.

Matches

My original reply is fine, it's his quoting of my original reply that broke it.

I used < pre>< code> like you assholes told me to when i bitched about not being able to use just < code>

Arantor

But triple backtick is even less typing than manually putting in HTML that can't possibly go wrong...

Frank

@Frank said:

SELECT q.*,(SELECT description FROM faqs_topics WHERE id=q.category) AS catDescription
FROM faqs_questions q
WHERE q.id NOT IN (SELECT questionid FROM faqs_answers)
ORDER BY q.createdate

Can someone tell me why this solution is substandard. @boomzilla didn't like it. But why?

I will change NOT IN to NOT EXISTS

Like this:

@Frank said:

WHERE not exists (SELECT 1 FROM faqs_answers WHERE questionid=faqs_questions.id)

But what else is wrong with it?

boomzilla

@Frank said:

Can someone tell me why this solution is substandard. @boomzilla didn't like it. But why?

Filed Under: Subquery abuse

Frank

Really? But all I need is the description. Explain.

Frank

Is the thinking that if you can do it without a subquery then that is the best route? It must be additional overhead on the DB?

Frank

Nevermind I found an explanation here:

Join vs. sub-query

I am an old-school MySQL user and have always preferred JOIN over sub-query. But nowadays everyone uses sub-query, and I hate it; I don't know why. I lack the theoretical knowledge to judge for

Filed under: creepy winky picture

Matches

Because like my original post stated, if you ever get more than one match from your

select q.*, (subselect dupes)

your sql query will choke and die?

If you're being serious,

The goal is to do a subquery, then a join so that tables can effectively use their indexes before joins so there's less 'swallowing the ocean'

EX:


select *
from tablea a
left join tableb b
on a.id = b.foreignid
where datetime >= somedate

vs


select *
from
(
 select * from tablea where datetime >= somedate
) a
left join
(
select * from tableb where datetime >= somedate
) b
on a.id = b.foreignid

Generally performs better.

[edit] fixed some formatting between the two queries. Fuck the query formatting though.

boomzilla

@Frank said:

Is the thinking that if you can do it without a subquery then that is the best route? It must be additional overhead on the DB?

In addition to potential performance issues, it's simply not as easy to read. YMMV, of course. When I need a subquery, I tend to pull it out into a with clause. This works well in Oracle. I have only barely touched mysql, so I can't really comment there.

boomzilla

@Matches said:

if you ever get more than one match...your sql query will choke and die?

I thought that maybe mysql would do some sort of cross join with that, since he was doing it. But, yeah, that's another potential land mine.

Frank

I see. But in seriousness. I should have mentioned its a reference table. It will never change. So my comment on f'd in the a was a joke, no one will ever change this table, if they do I will kill them with fire (I actually told them that for this project this table can never change....ever). Everyone is pretty clear about it.

Matches

I haven't used MySQL since version 4.1ish, so I can't tell you much about that environment. I generally use sql server

Matches

@Frank said:

I see. But in seriousness. I should have mentioned its a reference table. It will never change. So my comment on f'd in the a was a joke, no one will ever change this table, if they do I will kill them with fire (I actually told them that for this project this table can never change....ever). Everyone is pretty clear about it.

Filed under: create trigger FuckYou instead of insert select 1

Frank

As far as I know MySQL doesn't support with ...

but you mentioned you are Oracle specific.

Arantor

How would you use WITH? There's probably some way to do it in MySQL these days; it's come a long way since 4.x.

Frank

That's a good question. As far as I know it is not an option. but then again there are a fuckton of differences between Oracle and MySQL. Starting with identity tables. You get used to it fast but you wonder. Will they ever change it to match what SQL SERVER and MySQL does with incrementing values?

Answer is probably no...never...

Matches

with works with sql server as well

;with table as
(
select something from somewhere
)
select * from table

MySQL "WITH" clause

I'm trying to use MySQL to create a view with the "WITH" clause WITH authorRating(aname, rating) AS SELECT aname, AVG(quantity) FROM book GROUP BY aname But it doesn't seem like MySQL su...

Looks like mysql is about the only db without a with clause

Oracle's identities, apparently?

Oracle / PLSQL: Sequences (Autonumber)

But it looks like real shit, since it's lost if the db crashes, and you have to have a process to reset it. (or you could cache the whole range, but that also looks like a terrible idea)

Arantor

And MySQL won't get it since Oracle need to keep MySQL crippled to be able to sell the big iron stuff.

I wonder if MariaDB will obtain it if it hasn't already.

Matches

Honestly? I don't feel like missing the with clause is that crippling. Even on databases that have it, with generally is pretty poor performance. @sam my enter button is broken again.

Temp tables, table variables, etc. generally perform significantly better than with tables (at least on sql server)

darkmatter

@Frank said:

SELECT q.*,(SELECT description FROM faqs_topics WHERE id=q.category) AS catDescription FROM.....

@boomzilla said:

Filed Under: Subquery abuse

@Frank said:

Really? But all I need is the description. Explain.

If you need the description, then join to the table for it.
By subquerying in the select part, you're basically asking the dbms to run the subquery X number of times, where X is the number of results you got back. Also, you need to have a "distinct" or "group by" clause on you subquery to guarantee a single result or your query might just flat out fail altogether.

For 2 row example tables it might work great. For multi-million or billion row tables... well good luck with that.

Matches

Though some tools are smart enough to resolve this to a variable if it can determine it will never change. (I don't think this can be applied to any mysql editor...)

darkmatter

Possible, but not likely if the subquery is using a column from the selection to subselect data.
If you did
[code]select (select 1)
from table[/code]
or
[code]select (select a from table2 where table2.a = table1.a)
from table1 where a = 3[/code]
then sure, it will likely optimize fine, but why would you do that unless you're trying to screw with your colleagues.

Matches

Oh, no, I don't recommend it. I'm in the camp of 'Join directly', or 'join via subquery' in the from clause.

Frank

Well there is a good reason.

BTW I put the query @boomzilla designed in my production code.

Everybody taught me something new about SQL today.

darkmatter

The culmination of years and years of data warehouse database experience... I taught Frank a thing.

Filed Under: Consulting fee now please.

Frank

So I think everyone is in consensus that joins are better than subqueries and that right join should be killed with fire.

Also to make the query human readable.

Arantor

Especially if you use an application that, in its homebrew query abstraction layer (before PDO and similar were a thing) actually tests and refuses to run any query with a subquery in it.

Frank

That was some serious sub query awesomeness in your query.

I understood your thinking instantly...i don't mind subqueries actually and I think there are some cases where they can't be avoided.

At least I think that's the case.

Frank

I did like the way my query looked. I didn't consider another person might have look at it. I was not concerned about DB overhead...it's not a site like this discourse forum where transactions can get intense.

darkmatter

Subqueries can be good.
I've even done subqueries in the select part of a statement, but they are for extremely special circumstances. One instance was a query that would have required joining a 3.5billion row table to a 500million row table of a different grain¹, where the final result only ever actually gave back about 25 rows.

In that situation, selecting the 25 final rows from the 3.5billion row table and then subquerying the 500million row table greatly improved the performance.

Also note subqueries of that type don't even work on certain engines (Netezza comes to mind).

_{1) which meant having to run a subquery on the 500mil row table to group on the same key as the 3.5billion row table so the join wouldn't duplicate the data.}

Groaner

As likely being among "some people," I should probably apologize for being a tad harsh - when code snippets get posted here, the general response seems to be that people pick them apart mercilessly.

@Matches proposes the closest response to what I would choose. If the schema is fixed, go with his/her answer - the only thing I would add would be some stylistic modifications:

[code]
select fq., ft.
from faqs_questions as fq
left join faqs_topics as ft on
(fq.faqs_topic_id = ft.faqs_topic_id)
where not exists
(
select fa.question_id
from faqs_answers as fa
where fq.faqs_question_id = fa.faqs_question_id
)
order by fq.create_date
[/code]

Rationale:

The aliases resemble the table names, so that if this query grows (or each of the columns in the * are expanded), it's easier to see what's coming from where.
Starting with the base table(s) and going left to right on joins is more idiomatic
Prefer NOT EXISTS to LEFT JOIN WHERE NULL to NOT IN. You can get burned by NOT IN if either side of the expression is nullable. NOT EXISTS also makes your intent clearer to the query optimizer and can be more performant (at least in SQL Server).
Indent subqueries, derived tables and common table expressions for readability
All referenced columns are aliased. This is important because if someone adds a column with the same name to any of the tables participating in the join and you don't alias it, you'll get an ambiguous column error and the query is broke.
Primary key and foreign key columns should have the same names so the join predicate to use is obvious
Since the underlying convention seems to involve underscores, added them to column names. There's a religious war over under_scores vs. PascalCase (as there is in most languages); my personal preference is underscores.
There are religious wars over whether tables and their keys should be named singular.singular_id, plural.singular_id, plural.plural_id, etc. My personal preference is plural.singular_id because it sounds more natural when spoken (i.e. select star from orders where order_id is...). Most people I've asked seem to have little problem with any convention as long as it's CONSISTENT.

darkmatter

At least do [code]where not exists (select 1 from....[/code]
There is no need to get the value out to tell whether the result would exist.

As for naming things, the request/challenge was not to rename things but to improve the actual SQL logic, that is why hardly anyone renamed anything.

Captain

Argh, this thread reminds me I need to write like 30 queries this weekend. At least SQL is easy. The layer between my database and my app is more of a pain in the ass.

Groaner

@darkmatter said:

At least do

where not exists (select 1 from....

There is no need to get the value out to tell whether the result would exist.

Fair point, I left the column there mainly to emphasize the need for qualified names.

Frank

This is a great answer. I appreciate more explanation when it comes to 'why' certain conventions are used.

Also, when I see code answers it tells more about people and what they deal with on a day to day basis.

Yes my code was not great, but I come from a code environment that uses databases differrently...because I am solving front end requests. When I do composite keys with rollback deletes to other tables I get a little excited...people kind of get excited because I'm excited but it doesn't mean much. Also I don't get a whole lot of time during projects to do such things. Maybe two projects a year leverage tables in concerning ways.

Basically there is a cost to doing things which can be argued one way or another till the cows come home but when you are in the trenches it's up to the business-side and clients to determine what is needed and what they want to pay for.

Rule of thumb is that pain usually predicates change and do extra massaging only when told.

With that said. This thread alone has taught me at least 10 really important things I can easily do when I am working through queries which only required a shift in thinking and concerns.

Thanks again for the great answer!

cartman82

@Groaner said:

Primary key and foreign key columns should have the same names so the join predicate to use is obvious

I agree with everything but this. I prefer calling my primary keys just id (if they are of integer / auto number type). That way I can at a glance distinguish them from foreign keys, which are formatted as something_id.

I don't see great benefit from naming them both the same. If you get rid of aliases (in case of shorter table names), you can arrive at nice looking constructs such as table1.id = table2.table1_id.

ender

@Onyx said:

Would someone enlighten me on the necessity of WHERE 1 = 1 AND... here?

WHERE 1=1 makes it easier to dynamically build the query - you have something like foo = "select ... where 1=1", then you can simply append other conditions - if there are any. Makes the code a bit smaller, since you don't have to check if you already have a where clause every time you want to add a condition.

Onyx

@ender said:

WHERE 1=1 makes it easier to dynamically build the query - you have something like foo = "select ... where 1=1", then you can simply append other conditions - if there are any. Makes the code a bit smaller, since you don't have to check if you already have a where clause every time you want to add a condition.

That... makes sense. And I have a few annoying ifs I could kill that way. See, that's why I love this forum: I learn the WTFs to avoid, and learn that some WTFs are actually just a clever trick.

Frank

@ender is right. In coldfusion queries you will see where 1= 1 quite a bit.

<cf query name="pullAllMonkies" datasourse="#application.dsn#">

Select * from monkey q Where 1=1

<cfif argument eq 1>
And q.field = #argument#
<cfelseif argument eq 2>
And q.field = 2
<cfelseif argument eq 9>
And q.field = 9
</cfif>

</cfquery>

This way I can use the raw query for something else if none of those checks match.

Filed under: I should have explained that.

ben_lubar

Why are you ignoring two of the arguments if the first one is set?

Frank

Because I am typing on my phone and was trying to get my post out...and I didn't think anyone would catch or care... lol. Dammit!

Frank

You guys don't let anything slip...I'm on the laptop now...lol.

Frank

There FIFM...syntactically correct..But I am digging up a real world example so you can see how hairy this can get ....for shits and gigs.

Frank

Here is a day in the life of a coldfusion programmer approaching queries.

Disclaimer (this is only an example):

<cfquery name="getAllmonkey" datasource="#application.dsn#">
SELECT * FROM monkey LEFT JOIN monkey_categories ON monkey.category = monkey_categories.id
WHERE 1=1 
<cfif id neq "">
	AND monkey.id = <cfqueryparam cfsqltype="cf_sql_integer" value="#id#">
<cfelseif NOT archive>
	<cfif application.dstype eq "mysql">
		AND NOW() > startDate
		AND ( endDate IS NULL OR NOW() < endDate )
	<cfelse>
		AND getdate() > startDate
		AND ( endDate IS NULL OR getdate() < endDate )
	</cfif>
<cfelseif archive>
	<cfif application.dstype eq "mysql">
		AND NOW() > startDate
		AND ( endDate IS NOT NULL AND NOW() >= endDate )
	<cfelse>
		AND getdate() > startDate
		AND ( endDate IS NOT NULL AND getdate() >= endDate )
	</cfif>
</cfif>
<cfif arguments.category>
	AND (monkey.category = <cfqueryparam cfsqltype="cf_sql_integer" value="#arguments.category#">
		<cfif application.settings.var('monkey.multipleCategories')>
			OR monkey.category LIKE <cfqueryparam cfsqltype="cf_sql_varchar" value="#arguments.category#,%">
			OR monkey.category LIKE <cfqueryparam cfsqltype="cf_sql_varchar" value="%,#arguments.category#">
			OR monkey.category LIKE <cfqueryparam cfsqltype="cf_sql_varchar" value="%,#arguments.category#,%">
		</cfif>
		)
</cfif>
ORDER BY monkey.sortorder ASC, monkey.startDate DESC, monkey.headline ASC
</cfquery>

Maciejasjmj

@Frank said:

<cfif application.dstype eq "mysql">
AND NOW() > startDate
AND ( endDate IS NOT NULL AND NOW() >= endDate )
<cfelse>
AND getdate() > startDate
AND ( endDate IS NOT NULL AND getdate() >= endDate )
</cfif>

So the app is trying to be usable with both MySQL and SQL Server, with no abstraction layer.

There's no way it's gonna end badly, right?