<strong>GROUP BY ALL THE THINGS!</strong>
-
In maintaining stored procedures in the codebase I work on, I frequently encounter this (anti)pattern:
select c.ID as CategoryID, c.Name as CategoryName, p.ID as ProductID, p.Name as ProductName, p.Description as ProductDescription, pv.ID as ProductVersionID, pv.Name as ProductVersionName, avg(pv.Price) as ProductAveragePrice from dbo.tblCategories c inner join dbo.tblProducts p on (p.CategoryID = c.ID) inner join dbo.tblProductVersions pv on (pv.ProductID = p.ID) group by c.ID, c.Name, p.ID, p.Name, p.Description, pv.ID, pv.Name
This results in a pretty wide grouping key, which is going to make aggregating the results a needlessly expensive operation. Besides, when you have the unique key for a table as part of a
group by
, any other subset of columns in the same table aren't going to be any more unique. So one optimization is to reduce thegroup by
to the minimal set of columns needed for a unique tuple, and then slapping aggregates on the rest of the columns, like so:select c.ID as CategoryID, max(c.Name) as CategoryName, p.ID as ProductID, max(p.Name) as ProductName, max(p.Description) as ProductDescription, pv.ID as ProductVersionID, max(pv.Name) as ProductVersionName, avg(pv.Price) as ProductAveragePrice from dbo.tblCategories c inner join dbo.tblProducts p on (p.CategoryID = c.ID) inner join dbo.tblProductVersions pv on (pv.ProductID = p.ID) group by c.ID, p.ID, pv.ID
This is slightly better, but we can optimize further by taking advantage of the fact that
p.ID
will be constant for any single row intblProductVersions
, andc.ID
will be constant for any single row intblProducts
. This gets us to here:select max(c.ID) as CategoryID, max(c.Name) as CategoryName, max(p.ID) as ProductID, max(p.Name) as ProductName, max(p.Description) as ProductDescription, pv.ID as ProductVersionID, max(pv.Name) as ProductVersionName, avg(pv.Price) as ProductAveragePrice from dbo.tblCategories c inner join dbo.tblProducts p on (p.CategoryID = c.ID) inner join dbo.tblProductVersions pv on (pv.ProductID = p.ID) group by pv.ID
But wait... the only reason there was a group by to begin with was
avg(pv.Price)
, and the fact that we are grouping bypv.ID
means thatavg(pv.Price) = pv.Price
, and the original query optimizes to this:select c.ID as CategoryID, c.Name as CategoryName, p.ID as ProductID, p.Name as ProductName, p.Description as ProductDescription, pv.ID as ProductVersionID, pv.Name as ProductVersionName, pv.Price as ProductVersionAveragePrice from dbo.tblCategories c inner join dbo.tblProducts p on (p.CategoryID = c.ID) inner join dbo.tblProductVersions pv on (pv.ProductID = p.ID)
Cargo culting at its finest!
-
SQL standard actually demands all non-aggregate columns be mentioned in the GROUP BY clause. Which, given ambiguities otherwise, is a sound idea.
ENOWTF
-
They saw an error message, and instead of fixing the root problem, they fixed what the error message said.
If that's not what this site is for, I don't know what is.
-
This site is about lojban, microaggressions, and got bashing. What else?
-
-
-
-
What's the average age of @loopback0 right now?
-
Yeah nevermind I see it now. I read it as grouping on
p.Id
.
-
In the application I am currently maintaining, I see a lot of queries of the type
SELECT DISTINCT Id, {more columns} FROM {some table}
... and yes, Id is the primary key. Sure, I suppose the query optimizer is most likely smart enough to just ignore the "DISTINCT" keyword when you select a set of columns including one that is guaranteed to be unique by definition - it just bothers me seeing this used everywhere with the perpetrator apparently never really understanding what he was doing.
F(a)iled under: You keep using that keyword. I don't think it means what you think it means.
-
SQL standard actually demands all non-aggregate columns be mentioned in the GROUP BY clause.
TRWTF is doing this and not having a proper WHATEVER() (or even CRASHONAMBIGUITY()) aggregate. Using MAX() just looks horrible and makes no sense.
-
They'd better use window functions... Is there anything like that in the SQL server land?
-
You mean things like
OVER() PARTITION BY
? It's there, but it's not standard SQL, and judging by my codebases not that many people know either about them or how to use them properly.
-
SELECT DISTINCT
This is a perfect example of people fixing the error message and not the underlying problem (and I see it a lot).
Dev: "My query returns duplicate rows. Let's use distinct to get rid of them"
Dev's conscious: "But if your query returns duplicate rows, doesn't that mean that there is a more fundamental flaw with your query (wrong join condition, etc.)?"
Dev: "What was that? Oh, just the wind blowing. Here is your report, Mr. Manager!"
-
SELECT DISTINCT
@Groaner said:group by
You haven't until you've seen both of these in a single query. Usually with some actual aggregates but not always
-
I've just looked through the 65kLOC SQL file which contains every view, stored proc, etc. that this application uses. In 233 matches for
GROUP BY
, 110 are in aSELECT DISTINCT
, 88 in a plain SELECT, the rest in commented out code and actual comments.Some of the comments are along the lines of
--Removed pointless grouping
. Every single one of these is still aSELECT DISTINCT
. Some of theSELECT DISTINCT
s are inUNION
s, so there are three levels of removing duplicates
-
three levels of removing duplicates
And if it is anything like some of our clients' queries, there will still be duplicates due to some field values being different
-
I haven't seen any dupes survive the distinct, unionised groups, but I can tell you that there will be more levels of grouping and distincting because these views love to call each other, in a hierarchy that reaches 15 levels of nested views at some points.
-
so there are three levels of removing duplicates
SO DAWG I HEARD YOU LIKE REMOVING DUPLICATES SO I REMOVED DUPLICATES FROM YOUR DISTINCT VALUES SO YOU CAN REMOVE DUPLICATES WHILE YOU DISTINCT VALUES.
-
-
Sorry, it's only 13.
https://what.thedailywtf.com/t/from-a-single-insert-statement/8230/23?u=jaloopa
-
This site is about lojban, microaggressions, and got bashing. What else?
And redundancy. Don't forget redundancy.
Oh yeah, and redundancy, too.
-
Could you repeat that, just in case?
-
Is that related to the tautology department and were they informed in triplicate?