SQL - Getting counts of for each value of a lookup table

Karla

Right now I am basically doing this:

	select Attr1_Type1 = (select count(id) from bigtable where Attr1 = Type1)
		, Attr1_Type2 = (select count(id) from bigtable where Attr1 = Type2)
		, Attr2_Type3 = (select count(id) from bigtable where Attr2 = Type3)
		, Attr2_Type4 = (select count(id) from bigtable where Attr2 = Type4)

There's got to be a better way.

Especially, since we've adding new attributes and new options in the lookups.

masonwheeler

@karla If it was all one column, it would be easy.

Since it's not, the best way probably involves multiple queries.

select Attr1, count(*)
from bigtable
where Attr1 in (Type1, Type2)
group by Attr1

select Attr2, count(*)
from bigtable
where Attr2 in (Type3, Type4)
group by Attr2

Karla

@masonwheeler said in SQL - Getting counts of for each value of a lookup table:

@karla If it was all one column, it would be easy.

Since it's not, the best way probably involves multiple queries.
select Attr1, count(*)
from bigtable
where Attr1 in (Type1, Type2)
group by Attr1

select Attr2, count(*)
from bigtable
where Attr2 in (Type3, Type4)
group by Attr2

First I need a separate count for each type.

select Attr1_Type1, count(*)
from bigtable
where Attr1 = Type1
group by Attr1

select Attr1_Type2, count(*)
from bigtable
where Attr1 = Type2
group by Attr1

select Attr2_Type3, count(*)
from bigtable
where Attr2 = Type3
group by Attr2

select Attr2_Type4, count(*)
from bigtable
where Attr2 = Type4
group by Attr2

That's still pretty tedious if I have 25 attributes with 2 - 12 options (that change based on the year).

izzion

@Karla
I'm assuming bigtable is the data table and there's a separate type/lookup table that has what the various attributes mean?

If so, you might be able to future proof the query like:

SELECT 
    a1.Attr1
   ,COUNT(b.*)
FROM 
    ( SELECT DISTINCT Attr1 FROM lookuptable1 ) AS a1
LEFT JOIN bigtable AS b
ON a1.Attr1 = b.Attr1
UNION ALL -- if the type columns are the same data type, that way you can get one result set. 
          -- Otherwise, you'll have to go with two separate queries as per above
SELECT 
    a2.Attr2
   ,COUNT(b.*)
FROM 
    ( SELECT DISTINCT Attr2 FROM lookuptable2 ) AS a2
LEFT JOIN bigtable AS b
ON a2.Attr2 = b.Attr2

From a technical perspective, it doesn't matter if you give the subqueries separate aliases, to me it's more readable to see "oh those are different tables and the base bigtable is the same". More of a style preference there than technical requirement.

izzion

@karla @masonwheeler's suggestion would give you separate counts for each type, that's what the GROUP BY is doing in those queries (besides allowing you to output the actual type value)

Karla

@izzion said in SQL - Getting counts of for each value of a lookup table:

@Karla
I'm assuming bigtable is the data table and there's a separate type/lookup table that has what the various attributes mean?

Yes. Though I've already flattened the data and do not have to join with lookups.

If so, you might be able to future proof the query like:
SELECT 
    a1.Attr1
   ,COUNT(b.*)
FROM 
    ( SELECT DISTINCT Attr1 FROM lookuptable1 ) AS a1
LEFT JOIN bigtable AS b
ON a1.Attr1 = b.Attr1
UNION ALL -- if the type columns are the same data type, that way you can get one result set. 
          -- Otherwise, you'll have to go with two separate queries as per above
SELECT 
    a2.Attr2
   ,COUNT(b.*)
FROM 
    ( SELECT DISTINCT Attr2 FROM lookuptable2 ) AS a2
LEFT JOIN bigtable AS b
ON a2.Attr2 = b.Attr2
From a technical perspective, it doesn't matter if you give the subqueries separate aliases, to me it's more readable to see "oh those are different tables and the base bigtable is the same". More of a style preference there than technical requirement.

How it is returned isn't really a factor. I am putting the counts into a table by AttributeId, TypeId and returning that table.

blakeyrat

@karla You can create a fake identifier then count based on that possibly?

DISCLAIMER: I DO NOT HAVE SSMS OPEN AND PROBABLY MADE SYNTAX ERRORS BUT YOU GET THE IDEA

Something like:

select attr1 + "|" + attr2 as fakeId,
  count( attr1 + "|" + attr2 )
from bigtable
group by attr1 + "|" + attr2

Notes:

This assumes | doesn't appear in the attributes, use a bigger/weirder delimiter if it does
I think some DBMS's would let you do group by fakeId to save some typing

The output's not super nice, but you can use a simple lookup table for that. You could even auto generate the lookup table:

select distinct( attr1, attr2, attr1 + "|" + attr2 as fakeId )
into #lookup
from bigtable

For a final solution like:

select distinct( attr1, attr2, attr1 + "|" + attr2 as fakeId )
into #lookup
from bigtable

select attr1 + "|" + attr2 as fakeId,
  count( attr1 + "|" + attr2 ) as [count]
into #result
from bigtable
group by attr1 + "|" + attr2

select l.attr1, l.attr2, r.count
from #result r
  join #lookup l on l.fakeId = r.fakeId

Karla

@izzion said in SQL - Getting counts of for each value of a lookup table:

@karla @masonwheeler's suggestion would give you separate counts for each type, that's what the GROUP BY is doing in those queries (besides allowing you to output the actual type value)

Yeah, derp.

Still with ~25 attributes that change by year. I am not seeing an alternative to dynamic SQL.

izzion

@karla
Are the attributes being used to join to Type tables or something that indicate what the attributes mean? Otherwise, @blakeyrat's solution is functional (though I would generally recommend rewriting it with a sub-query similar to what I did for performance - temporary tables can become pretty cancerous on high usage systems)

Are the Attr1 & Attr2 (and other relevant columns) the same data type? Or are they different data types (or different lengths of the same data type, e.g. VARCHAR(10) and VARCHAR(20) or SMALLINT and BIGINT?

Karla

@blakeyrat said in SQL - Getting counts of for each value of a lookup table:

@karla You can create a fake identifier then count based on that possibly?

DISCLAIMER: I DO NOT HAVE SSMS OPEN AND PROBABLY MADE SYNTAX ERRORS BUT YOU GET THE IDEA

Something like:
select attr1 + "|" + attr2 as fakeId,
  count( attr1 + "|" + attr2 )
from bigtable
group by attr1 + "|" + attr2
Notes:

This assumes | doesn't appear in the attributes, use a bigger/weirder delimiter if it does

I think some DBMS's would let you do group by fakeId to save some typing

The output's not super nice, but you can use a simple lookup table for that. You could even auto generate the lookup table:
select distinct( attr1, attr2, attr1 + "|" + attr2 as fakeId )
into #lookup
from bigtable
For a final solution like:
select distinct( attr1, attr2, attr1 + "|" + attr2 as fakeId )
into #lookup
from bigtable

select attr1 + "|" + attr2 as fakeId,
  count( attr1 + "|" + attr2 ) as [count]
into #result
from bigtable
group by attr1 + "|" + attr2

select l.attr1, l.attr2, r.count
from #result r
  join #lookup l on l.fakeId = r.fakeId

I'm still trying to wrap my brain around this...but maybe?

NULLs are likely to be a problem.

blakeyrat

@karla said in SQL - Getting counts of for each value of a lookup table:

NULLs are likely to be a problem.

NULLs will "break" your first query. (You can't do = NULL in SQL; you need to use IS NULL.)

But yeah, it was just to give you an idea about generating a dynamic "countable" key when you need one. If you need ISNULL to make it work, or whatever, got for it. The details aren't important, just the basic idea that you can GROUP BY and COUNT stuff that's not in the table-on-disk.

Karla

@izzion said in SQL - Getting counts of for each value of a lookup table:

@karla
Are the attributes being used to join to Type tables or something that indicate what the attributes mean? Otherwise, @blakeyrat's solution is functional (though I would generally recommend rewriting it with a sub-query similar to what I did for performance - temporary tables can become pretty cancerous on high usage systems)

Are the Attr1 & Attr2 (and other relevant columns) the same data type? Or are they different data types (or different lengths of the same data type, e.g. VARCHAR(10) and VARCHAR(20) or SMALLINT and BIGINT?

For the primary purposed all columns are int and the current code hard codes them.

Using @masonwheeler 's solution I will alter:

select Attr1, count(*)
from bigtable
where Attr1 in (1, 2)
group by Attr1

select Attr2, count(*)
from bigtable
where Attr2 in (3, 4)
group by Attr2


--... snip 23 more attributes

So for this year I have new attributes and new types.

select Attr1, count(*)
from bigtable
where Attr1 in (1, 2, 8, 20)
group by Attr1

select Attr2, count(*)
from bigtable
where Attr2 in (3, 4)
group by Attr2

--... snip 23 more attributes

select Attr26, count(*)
from bigtable
where Attr26 in (34, 43,77,78)
group by Attr26

Karla

@blakeyrat said in SQL - Getting counts of for each value of a lookup table:

@karla said in SQL - Getting counts of for each value of a lookup table:

NULLs are likely to be a problem.

NULLs will "break" your first query. (You can't do = NULL in SQL; you need to use IS NULL.)

But yeah, it was just to give you an idea about generating a dynamic "countable" key when you need one. If you need ISNULL to make it work, or whatever, got for it.

In mine, NULLs just fall out. I only want to count the non-NULLs of each attribute.

blakeyrat

@karla That's why I put "breaks" in scare-quotes. Stuff like that, there's a good chance the behavior (of skipping NULLs) is on purpose, but it's impossible to tell unless you know the actual specs of the query.

Anyway good luck.

izzion

Hm, yeah, that gets fuglier... The "simplest" query I can think of would be to basically inline what @blakeyrat was doing, but with a UNIONed series of statements, that you'll have to manually modify when you add a new attribute column (but NOT when the range of possible non-NULL values in the columns change):

SELECT
    'Attribute1Name' AS AttributeName
    ,Attr1 AS AttributeValue
    ,COUNT(*)
FROM bigtable
WHERE Attr1 IS NOT NULL
GROUP BY Attr1

UNION ALL

SELECT
    'Attribute2Name' AS AttributeName
    ,Attr2 AS AttributeValue
    ,COUNT(*)
FROM bigtable
WHERE Attr2 IS NOT NULL
GROUP BY Attr2

UNION ALL

-- snip, yada yada

SELECT
    'Attribute26Name' AS AttributeName
    ,Attr26 AS AttributeValue
    ,COUNT(*)
FROM bigtable
WHERE Attr26 IS NOT NULL
GROUP BY Attr26

If you have a master table of the Attribute columns, you could dynamic SQL that together without too much hassle (and the master table could include what you want to show as the AttributeName column, as well...

I assume this is a black box wherein you can't change the schema? Because 26 attribute columns is not really that hot in the realm of relational database design...

blakeyrat

@izzion said in SQL - Getting counts of for each value of a lookup table:

Hm, yeah, that gets fuglier... The "simplest" query I can think of would be to basically inline what @blakeyrat was doing, but with a UNIONed series of statements, that you'll have to manually modify when you add a new attribute column (but NOT when the range of possible non-NULL values in the columns change):

Srsly?

How about:

select distinct( attr1, attr2, attr1 + "|" + attr2 as fakeId )
into #lookup
from bigtable

select attr1 + "|" + attr2 as fakeId,
  count( attr1 + "|" + attr2 ) as [count]
into #result
from bigtable
group by case when attr2 is null then null else attr1 + "|" + attr2

select l.attr1, l.attr2, r.count
from #result r
  join #lookup l on l.fakeId = r.fakeId

(Note the added CASE. Leave it as an exercise to the reader to handle either attr1 or attr2 being NULL.)

EDIT: actually you might not even need that. What does SQL do when you concatenate a int with a NULL? It probably already does the expected action: return NULL.

And yeah the relational design here sucks, but we've all had to deal with shitty database, so there's not much point discussing that.

izzion

@blakeyrat
Assuming the bigtable statistics don't suck¹, the UNIONed series of statements will be significantly better than selecting into a temp table on a system under any load. With a proper memory grant, the UNIONed selects won't have to hit tempdb at all, and thus won't get stuck waiting in line with all the other queries on the system that need to access tempdb. And it's still the same amount of maintenance due to adding new attribute columns versus the synthesize a lookup table tempdb method.

And using @tablevariables wouldn't help over a true #temptable (in fact, they'd likely be worse because of how SQL estimates rows from table variables versus temp tables), as they'll still wind up actually writing into tempdb under any significant amount of data load.

¹If the bigtable statistics aren't good and the estimates for the UNION series of statements are too low, then you'll still wind up with tempdb spills when SQL goes to do the GROUP BY sorts and doesn't have a large enough memory grant to handle the volume of data that actually came back. You could also run into those spills if the system is so memory constrained that it couldn't get the desired memory grant... though anything is going to suck at that point, so....

Karla

@izzion said in SQL - Getting counts of for each value of a lookup table:

Hm, yeah, that gets fuglier... The "simplest" query I can think of would be to basically inline what @blakeyrat was doing, but with a UNIONed series of statements, that you'll have to manually modify when you add a new attribute column (but NOT when the range of possible non-NULL values in the columns change):
SELECT
    'Attribute1Name' AS AttributeName
    ,Attr1 AS AttributeValue
    ,COUNT(*)
FROM bigtable
WHERE Attr1 IS NOT NULL
GROUP BY Attr1

UNION ALL

SELECT
    'Attribute2Name' AS AttributeName
    ,Attr2 AS AttributeValue
    ,COUNT(*)
FROM bigtable
WHERE Attr2 IS NOT NULL
GROUP BY Attr2

UNION ALL

-- snip, yada yada

SELECT
    'Attribute26Name' AS AttributeName
    ,Attr26 AS AttributeValue
    ,COUNT(*)
FROM bigtable
WHERE Attr26 IS NOT NULL
GROUP BY Attr26
If you have a master table of the Attribute columns, you could dynamic SQL that together without too much hassle (and the master table could include what you want to show as the AttributeName column, as well...

I assume this is a black box wherein you can't change the schema? Because 26 attribute columns is not really that hot in the realm of relational database design...

I can adjust the DB anyway I want.

It is a reporting database, gets populated overnight, and I do a bunch of pre-processing to flatten the data to make reporting faster.

The problem I am trying to solve is to provide specific counts of demographics and others of the people we serve.

Standard demographic stuff, plus education, work status, housing, income range, and sources, health insurance, plus for some of these the state when they went in and the state when they were done with the program.

This report is a governmental required report and changed for the upcoming year. I have to program for the current year report and the upcoming year (that starts 10/1).

We are adding attributes/options on the front end to start collecting as of 10/1.

blakeyrat

@izzion said in SQL - Getting counts of for each value of a lookup table:

Assuming the bigtable statistics don't suck1, the UNIONed series of statements will be significantly better than selecting into a temp table on a system under any load.

Then use a table variable. Or a subquery.

Look, I was just demonstrating the technique of creating a fake key that doesn't exist in the on-disk table, and then using it as a basis for your group by and count.

All else is just vomit to demonstrate the desired point.

And in any case, premature optimization sucks, and 26 statements all UNIONed together also sucks, so I stand by what I wrote.

Karla

@blakeyrat said in SQL - Getting counts of for each value of a lookup table:

@izzion said in SQL - Getting counts of for each value of a lookup table:

Assuming the bigtable statistics don't suck1, the UNIONed series of statements will be significantly better than selecting into a temp table on a system under any load.

Then use a table variable. Or a subquery.

Look, I was just demonstrating the technique of creating a fake key that doesn't exist in the on-disk table, and then using it as a basis for your group by and count.

All else is just vomit to demonstrate the desired point.

And in any case, premature optimization sucks, and 26 statements all UNIONed together also sucks, so I stand by what I wrote.

The hard work is done overnight. The code that populates the Excel is slower than the query.

blakeyrat

@karla said in SQL - Getting counts of for each value of a lookup table:

It is a reporting database, gets populated overnight, and I do a bunch of pre-processing to flatten the data to make reporting faster.

Please don't flatten relational data. Use OLAP instead. But third normal form ought to be considered the bare minimum. #opinions.

Karla

@blakeyrat said in SQL - Getting counts of for each value of a lookup table:

@karla said in SQL - Getting counts of for each value of a lookup table:

It is a reporting database, gets populated overnight, and I do a bunch of pre-processing to flatten the data to make reporting faster.

Please don't flatten relational data. Use OLAP instead. But third normal form ought to be considered the bare minimum. #opinions.

Yeah probably.

I don't know OLAP.

They have discussed moving into data analytics going forward.

izzion

@karla
Hm, then I would be tempted to turn this table into a Key:Value store and adjust the data load accordingly.

CREATE TABLE bigtable (
    CustomerID INT
    ,AttributeName VARCHAR(100) NOT NULL
    ,AttributeValue INT NOT NULL
)

At which point if you need to track another attribute, you just slap it into the data load process and your report automatically picks it up, since your reporting query is now:

SELECT
    AttributeName
    ,AttributeValue
    ,COUNT(*)
FROM bigtable
GROUP BY AttributeName, AttributeValue

Karla

@izzion said in SQL - Getting counts of for each value of a lookup table:

@karla
Hm, then I would be tempted to turn this table into a Key:Value store and adjust the data load accordingly.
CREATE TABLE bigtable (
    CustomerID INT
    ,AttributeName VARCHAR(100) NOT NULL
    ,AttributeValue INT NOT NULL
)
At which point if you need to track another attribute, you just slap it into the data load process and your report automatically picks it up, since your reporting query is now:
SELECT
    AttributeName
    ,AttributeValue
    ,COUNT(*)
FROM bigtable
GROUP BY AttributeName, AttributeValue

That I like and can wrap my head around.

I'll update tomorrow. It's late here.

Tsaukpaetra

This almost looks like a job for PIVOT.

An adapted example from

VanMSFT

Using PIVOT and UNPIVOT - SQL Server

Transact-SQL reference for PIVOT and UNPIVOT relational operators. Use these operators on SELECT statements to change a table-valued expression into another table.

-- Pivot table with one row and five columns  
SELECT    
[Type1], [Type2], [Type3], [Type4]
FROM  
(SELECT id, Attr1 
    FROM bigtable) AS SourceTable  
PIVOT  
(  
Count(id)  
FOR Attr1 IN ([Type1], [Type2], [Type3], [Type4])  
) AS PivotTable;

Not tested, insert standard compliance-friendly disclosure here.

Not sure about that Attr1 vs Attr2 , you might end up needing a UNION ALL anyways if that's the case?

Pretty much, if you can make a Pivot Table in Excel from the data, you should be able to do it here in the query directly.

boomzilla

@tsaukpaetra said in SQL - Getting counts of for each value of a lookup table:

Pretty much, if you can make a Pivot Table in Excel from the data, you should be able to do it here in the query directly.

Here's how I do pivot type queries (though I'd usually use DECODE because that's less verbose for simple stuff in Oracle):

select
  sum( case when attr1 = type1 then 1 else 0 ) attr1_type1,
  sum( case when attr1 = type2 then 1 else 0 ) attr1_type2,
  sum( case when attr2 = type3 then 1 else 0 ) attr2_type3,
...etc

TheCPUWizard

@karla said in SQL - Getting counts of for each value of a lookup table:

I don't know OLAP.

This is OLAF, he can help with your data

[Oh Wait...]

Apologies in advance, but I have recently been accused of being "too helpful/serious"....so.....

boomzilla

@thecpuwizard Just let it go, dude.

TheCPUWizard

@boomzilla said in SQL - Getting counts of for each value of a lookup table:

@thecpuwizard Just let it go, dude.

Let it go, let it go
Can’t hold it back anymore
Let it go, let it go
Turn my back and slam the door

blakeyrat

@boomzilla I think I might have liked him better when he was an incredible braggart.