SQL - Getting counts of for each value of a lookup table



  • Right now I am basically doing this:

    	select Attr1_Type1 = (select count(id) from bigtable where Attr1 = Type1)
    		, Attr1_Type2 = (select count(id) from bigtable where Attr1 = Type2)
    		, Attr2_Type3 = (select count(id) from bigtable where Attr2 = Type3)
    		, Attr2_Type4 = (select count(id) from bigtable where Attr2 = Type4)
    

    There's got to be a better way.

    Especially, since we've adding new attributes and new options in the lookups.


  • Impossible Mission - B

    @karla If it was all one column, it would be easy.

    Since it's not, the best way probably involves multiple queries.

    select Attr1, count(*)
    from bigtable
    where Attr1 in (Type1, Type2)
    group by Attr1
    
    select Attr2, count(*)
    from bigtable
    where Attr2 in (Type3, Type4)
    group by Attr2
    


  • @masonwheeler said in SQL - Getting counts of for each value of a lookup table:

    @karla If it was all one column, it would be easy.

    Since it's not, the best way probably involves multiple queries.

    select Attr1, count(*)
    from bigtable
    where Attr1 in (Type1, Type2)
    group by Attr1
    
    select Attr2, count(*)
    from bigtable
    where Attr2 in (Type3, Type4)
    group by Attr2
    

    First I need a separate count for each type.

    select Attr1_Type1, count(*)
    from bigtable
    where Attr1 = Type1
    group by Attr1
    
    select Attr1_Type2, count(*)
    from bigtable
    where Attr1 = Type2
    group by Attr1
    
    select Attr2_Type3, count(*)
    from bigtable
    where Attr2 = Type3
    group by Attr2
    
    select Attr2_Type4, count(*)
    from bigtable
    where Attr2 = Type4
    group by Attr2
    

    That's still pretty tedious if I have 25 attributes with 2 - 12 options (that change based on the year).


  • I survived the hour long Uno hand

    @Karla
    I'm assuming bigtable is the data table and there's a separate type/lookup table that has what the various attributes mean?

    If so, you might be able to future proof the query like:

    SELECT 
        a1.Attr1
       ,COUNT(b.*)
    FROM 
        ( SELECT DISTINCT Attr1 FROM lookuptable1 ) AS a1
    LEFT JOIN bigtable AS b
    ON a1.Attr1 = b.Attr1
    UNION ALL -- if the type columns are the same data type, that way you can get one result set. 
              -- Otherwise, you'll have to go with two separate queries as per above
    SELECT 
        a2.Attr2
       ,COUNT(b.*)
    FROM 
        ( SELECT DISTINCT Attr2 FROM lookuptable2 ) AS a2
    LEFT JOIN bigtable AS b
    ON a2.Attr2 = b.Attr2
    

    From a technical perspective, it doesn't matter if you give the subqueries separate aliases, to me it's more readable to see "oh those are different tables and the base bigtable is the same". More of a style preference there than technical requirement.


  • I survived the hour long Uno hand

    @karla @masonwheeler's suggestion would give you separate counts for each type, that's what the GROUP BY is doing in those queries (besides allowing you to output the actual type value)



  • @izzion said in SQL - Getting counts of for each value of a lookup table:

    @Karla
    I'm assuming bigtable is the data table and there's a separate type/lookup table that has what the various attributes mean?

    Yes. Though I've already flattened the data and do not have to join with lookups.

    If so, you might be able to future proof the query like:

    SELECT 
        a1.Attr1
       ,COUNT(b.*)
    FROM 
        ( SELECT DISTINCT Attr1 FROM lookuptable1 ) AS a1
    LEFT JOIN bigtable AS b
    ON a1.Attr1 = b.Attr1
    UNION ALL -- if the type columns are the same data type, that way you can get one result set. 
              -- Otherwise, you'll have to go with two separate queries as per above
    SELECT 
        a2.Attr2
       ,COUNT(b.*)
    FROM 
        ( SELECT DISTINCT Attr2 FROM lookuptable2 ) AS a2
    LEFT JOIN bigtable AS b
    ON a2.Attr2 = b.Attr2
    

    From a technical perspective, it doesn't matter if you give the subqueries separate aliases, to me it's more readable to see "oh those are different tables and the base bigtable is the same". More of a style preference there than technical requirement.

    How it is returned isn't really a factor. I am putting the counts into a table by AttributeId, TypeId and returning that table.



  • @karla You can create a fake identifier then count based on that possibly?

    DISCLAIMER: I DO NOT HAVE SSMS OPEN AND PROBABLY MADE SYNTAX ERRORS BUT YOU GET THE IDEA

    Something like:

    select attr1 + "|" + attr2 as fakeId,
      count( attr1 + "|" + attr2 )
    from bigtable
    group by attr1 + "|" + attr2
    

    Notes:

    1. This assumes | doesn't appear in the attributes, use a bigger/weirder delimiter if it does
    2. I think some DBMS's would let you do group by fakeId to save some typing

    The output's not super nice, but you can use a simple lookup table for that. You could even auto generate the lookup table:

    select distinct( attr1, attr2, attr1 + "|" + attr2 as fakeId )
    into #lookup
    from bigtable
    

    For a final solution like:

    select distinct( attr1, attr2, attr1 + "|" + attr2 as fakeId )
    into #lookup
    from bigtable
    
    select attr1 + "|" + attr2 as fakeId,
      count( attr1 + "|" + attr2 ) as [count]
    into #result
    from bigtable
    group by attr1 + "|" + attr2
    
    select l.attr1, l.attr2, r.count
    from #result r
      join #lookup l on l.fakeId = r.fakeId
    


  • @izzion said in SQL - Getting counts of for each value of a lookup table:

    @karla @masonwheeler's suggestion would give you separate counts for each type, that's what the GROUP BY is doing in those queries (besides allowing you to output the actual type value)

    Yeah, derp.

    Still with ~25 attributes that change by year. I am not seeing an alternative to dynamic SQL.


  • I survived the hour long Uno hand

    @karla
    Are the attributes being used to join to Type tables or something that indicate what the attributes mean? Otherwise, @blakeyrat's solution is functional (though I would generally recommend rewriting it with a sub-query similar to what I did for performance - temporary tables can become pretty cancerous on high usage systems)

    Are the Attr1 & Attr2 (and other relevant columns) the same data type? Or are they different data types (or different lengths of the same data type, e.g. VARCHAR(10) and VARCHAR(20) or SMALLINT and BIGINT?



  • @blakeyrat said in SQL - Getting counts of for each value of a lookup table:

    @karla You can create a fake identifier then count based on that possibly?

    DISCLAIMER: I DO NOT HAVE SSMS OPEN AND PROBABLY MADE SYNTAX ERRORS BUT YOU GET THE IDEA

    Something like:

    select attr1 + "|" + attr2 as fakeId,
      count( attr1 + "|" + attr2 )
    from bigtable
    group by attr1 + "|" + attr2
    

    Notes:

    1. This assumes | doesn't appear in the attributes, use a bigger/weirder delimiter if it does
    2. I think some DBMS's would let you do group by fakeId to save some typing

    The output's not super nice, but you can use a simple lookup table for that. You could even auto generate the lookup table:

    select distinct( attr1, attr2, attr1 + "|" + attr2 as fakeId )
    into #lookup
    from bigtable
    

    For a final solution like:

    select distinct( attr1, attr2, attr1 + "|" + attr2 as fakeId )
    into #lookup
    from bigtable
    
    select attr1 + "|" + attr2 as fakeId,
      count( attr1 + "|" + attr2 ) as [count]
    into #result
    from bigtable
    group by attr1 + "|" + attr2
    
    select l.attr1, l.attr2, r.count
    from #result r
      join #lookup l on l.fakeId = r.fakeId
    

    I'm still trying to wrap my brain around this...but maybe?

    NULLs are likely to be a problem.



  • @karla said in SQL - Getting counts of for each value of a lookup table:

    NULLs are likely to be a problem.

    NULLs will "break" your first query. (You can't do = NULL in SQL; you need to use IS NULL.)

    But yeah, it was just to give you an idea about generating a dynamic "countable" key when you need one. If you need ISNULL to make it work, or whatever, got for it. The details aren't important, just the basic idea that you can GROUP BY and COUNT stuff that's not in the table-on-disk.



  • @izzion said in SQL - Getting counts of for each value of a lookup table:

    @karla
    Are the attributes being used to join to Type tables or something that indicate what the attributes mean? Otherwise, @blakeyrat's solution is functional (though I would generally recommend rewriting it with a sub-query similar to what I did for performance - temporary tables can become pretty cancerous on high usage systems)

    Are the Attr1 & Attr2 (and other relevant columns) the same data type? Or are they different data types (or different lengths of the same data type, e.g. VARCHAR(10) and VARCHAR(20) or SMALLINT and BIGINT?

    For the primary purposed all columns are int and the current code hard codes them.

    Using @masonwheeler 's solution I will alter:

    select Attr1, count(*)
    from bigtable
    where Attr1 in (1, 2)
    group by Attr1
    
    select Attr2, count(*)
    from bigtable
    where Attr2 in (3, 4)
    group by Attr2
    
    
    --... snip 23 more attributes
    
    

    So for this year I have new attributes and new types.

    select Attr1, count(*)
    from bigtable
    where Attr1 in (1, 2, 8, 20)
    group by Attr1
    
    select Attr2, count(*)
    from bigtable
    where Attr2 in (3, 4)
    group by Attr2
    
    --... snip 23 more attributes
    
    select Attr26, count(*)
    from bigtable
    where Attr26 in (34, 43,77,78)
    group by Attr26
    
    


  • @blakeyrat said in SQL - Getting counts of for each value of a lookup table:

    @karla said in SQL - Getting counts of for each value of a lookup table:

    NULLs are likely to be a problem.

    NULLs will "break" your first query. (You can't do = NULL in SQL; you need to use IS NULL.)

    But yeah, it was just to give you an idea about generating a dynamic "countable" key when you need one. If you need ISNULL to make it work, or whatever, got for it.

    In mine, NULLs just fall out. I only want to count the non-NULLs of each attribute.



  • @karla That's why I put "breaks" in scare-quotes. Stuff like that, there's a good chance the behavior (of skipping NULLs) is on purpose, but it's impossible to tell unless you know the actual specs of the query.

    Anyway good luck.


  • I survived the hour long Uno hand

    Hm, yeah, that gets fuglier... The "simplest" query I can think of would be to basically inline what @blakeyrat was doing, but with a UNIONed series of statements, that you'll have to manually modify when you add a new attribute column (but NOT when the range of possible non-NULL values in the columns change):

    SELECT
        'Attribute1Name' AS AttributeName
        ,Attr1 AS AttributeValue
        ,COUNT(*)
    FROM bigtable
    WHERE Attr1 IS NOT NULL
    GROUP BY Attr1
    
    UNION ALL
    
    SELECT
        'Attribute2Name' AS AttributeName
        ,Attr2 AS AttributeValue
        ,COUNT(*)
    FROM bigtable
    WHERE Attr2 IS NOT NULL
    GROUP BY Attr2
    
    UNION ALL
    
    -- snip, yada yada
    
    SELECT
        'Attribute26Name' AS AttributeName
        ,Attr26 AS AttributeValue
        ,COUNT(*)
    FROM bigtable
    WHERE Attr26 IS NOT NULL
    GROUP BY Attr26
    

    If you have a master table of the Attribute columns, you could dynamic SQL that together without too much hassle (and the master table could include what you want to show as the AttributeName column, as well...

    I assume this is a black box wherein you can't change the schema? Because 26 attribute columns is not really that hot in the realm of relational database design... 🍹



  • @izzion said in SQL - Getting counts of for each value of a lookup table:

    Hm, yeah, that gets fuglier... The "simplest" query I can think of would be to basically inline what @blakeyrat was doing, but with a UNIONed series of statements, that you'll have to manually modify when you add a new attribute column (but NOT when the range of possible non-NULL values in the columns change):

    Srsly?

    How about:

    select distinct( attr1, attr2, attr1 + "|" + attr2 as fakeId )
    into #lookup
    from bigtable
    
    select attr1 + "|" + attr2 as fakeId,
      count( attr1 + "|" + attr2 ) as [count]
    into #result
    from bigtable
    group by case when attr2 is null then null else attr1 + "|" + attr2
    
    select l.attr1, l.attr2, r.count
    from #result r
      join #lookup l on l.fakeId = r.fakeId
    

    (Note the added CASE. Leave it as an exercise to the reader to handle either attr1 or attr2 being NULL.)

    EDIT: actually you might not even need that. What does SQL do when you concatenate a int with a NULL? It probably already does the expected action: return NULL.

    And yeah the relational design here sucks, but we've all had to deal with shitty database, so there's not much point discussing that.


  • I survived the hour long Uno hand

    @blakeyrat
    Assuming the bigtable statistics don't suck1, the UNIONed series of statements will be significantly better than selecting into a temp table on a system under any load. With a proper memory grant, the UNIONed selects won't have to hit tempdb at all, and thus won't get stuck waiting in line with all the other queries on the system that need to access tempdb. And it's still the same amount of maintenance due to adding new attribute columns versus the synthesize a lookup table tempdb method.

    And using @tablevariables wouldn't help over a true #temptable (in fact, they'd likely be worse because of how SQL estimates rows from table variables versus temp tables), as they'll still wind up actually writing into tempdb under any significant amount of data load.

    1If the bigtable statistics aren't good and the estimates for the UNION series of statements are too low, then you'll still wind up with tempdb spills when SQL goes to do the GROUP BY sorts and doesn't have a large enough memory grant to handle the volume of data that actually came back. You could also run into those spills if the system is so memory constrained that it couldn't get the desired memory grant... though anything is going to suck at that point, so.... 🤷♂



  • @izzion said in SQL - Getting counts of for each value of a lookup table:

    Hm, yeah, that gets fuglier... The "simplest" query I can think of would be to basically inline what @blakeyrat was doing, but with a UNIONed series of statements, that you'll have to manually modify when you add a new attribute column (but NOT when the range of possible non-NULL values in the columns change):

    SELECT
        'Attribute1Name' AS AttributeName
        ,Attr1 AS AttributeValue
        ,COUNT(*)
    FROM bigtable
    WHERE Attr1 IS NOT NULL
    GROUP BY Attr1
    
    UNION ALL
    
    SELECT
        'Attribute2Name' AS AttributeName
        ,Attr2 AS AttributeValue
        ,COUNT(*)
    FROM bigtable
    WHERE Attr2 IS NOT NULL
    GROUP BY Attr2
    
    UNION ALL
    
    -- snip, yada yada
    
    SELECT
        'Attribute26Name' AS AttributeName
        ,Attr26 AS AttributeValue
        ,COUNT(*)
    FROM bigtable
    WHERE Attr26 IS NOT NULL
    GROUP BY Attr26
    

    If you have a master table of the Attribute columns, you could dynamic SQL that together without too much hassle (and the master table could include what you want to show as the AttributeName column, as well...

    I assume this is a black box wherein you can't change the schema? Because 26 attribute columns is not really that hot in the realm of relational database design... 🍹

    I can adjust the DB anyway I want.

    It is a reporting database, gets populated overnight, and I do a bunch of pre-processing to flatten the data to make reporting faster.

    The problem I am trying to solve is to provide specific counts of demographics and others of the people we serve.

    Standard demographic stuff, plus education, work status, housing, income range, and sources, health insurance, plus for some of these the state when they went in and the state when they were done with the program.

    This report is a governmental required report and changed for the upcoming year. I have to program for the current year report and the upcoming year (that starts 10/1).

    We are adding attributes/options on the front end to start collecting as of 10/1.



  • @izzion said in SQL - Getting counts of for each value of a lookup table:

    Assuming the bigtable statistics don't suck1, the UNIONed series of statements will be significantly better than selecting into a temp table on a system under any load.

    Then use a table variable. Or a subquery.

    Look, I was just demonstrating the technique of creating a fake key that doesn't exist in the on-disk table, and then using it as a basis for your group by and count.

    All else is just vomit to demonstrate the desired point.

    And in any case, premature optimization sucks, and 26 statements all UNIONed together also sucks, so I stand by what I wrote.



  • @blakeyrat said in SQL - Getting counts of for each value of a lookup table:

    @izzion said in SQL - Getting counts of for each value of a lookup table:

    Assuming the bigtable statistics don't suck1, the UNIONed series of statements will be significantly better than selecting into a temp table on a system under any load.

    Then use a table variable. Or a subquery.

    Look, I was just demonstrating the technique of creating a fake key that doesn't exist in the on-disk table, and then using it as a basis for your group by and count.

    All else is just vomit to demonstrate the desired point.

    And in any case, premature optimization sucks, and 26 statements all UNIONed together also sucks, so I stand by what I wrote.

    The hard work is done overnight. The code that populates the Excel is slower than the query.



  • @karla said in SQL - Getting counts of for each value of a lookup table:

    It is a reporting database, gets populated overnight, and I do a bunch of pre-processing to flatten the data to make reporting faster.

    Please don't flatten relational data. Use OLAP instead. But third normal form ought to be considered the bare minimum. #opinions.



  • @blakeyrat said in SQL - Getting counts of for each value of a lookup table:

    @karla said in SQL - Getting counts of for each value of a lookup table:

    It is a reporting database, gets populated overnight, and I do a bunch of pre-processing to flatten the data to make reporting faster.

    Please don't flatten relational data. Use OLAP instead. But third normal form ought to be considered the bare minimum. #opinions.

    Yeah probably. 🤷🏻♂

    I don't know OLAP.

    They have discussed moving into data analytics going forward.


  • I survived the hour long Uno hand

    @karla
    Hm, then I would be tempted to turn this table into a Key:Value store and adjust the data load accordingly.

    CREATE TABLE bigtable (
        CustomerID INT
        ,AttributeName VARCHAR(100) NOT NULL
        ,AttributeValue INT NOT NULL
    )
    

    At which point if you need to track another attribute, you just slap it into the data load process and your report automatically picks it up, since your reporting query is now:

    SELECT
        AttributeName
        ,AttributeValue
        ,COUNT(*)
    FROM bigtable
    GROUP BY AttributeName, AttributeValue
    


  • @izzion said in SQL - Getting counts of for each value of a lookup table:

    @karla
    Hm, then I would be tempted to turn this table into a Key:Value store and adjust the data load accordingly.

    CREATE TABLE bigtable (
        CustomerID INT
        ,AttributeName VARCHAR(100) NOT NULL
        ,AttributeValue INT NOT NULL
    )
    

    At which point if you need to track another attribute, you just slap it into the data load process and your report automatically picks it up, since your reporting query is now:

    SELECT
        AttributeName
        ,AttributeValue
        ,COUNT(*)
    FROM bigtable
    GROUP BY AttributeName, AttributeValue
    

    That I like and can wrap my head around.

    I'll update tomorrow. It's late here.


  • Notification Spam Recipient

    This almost looks like a job for PIVOT.

    An adapted example from

    -- Pivot table with one row and five columns  
    SELECT    
    [Type1], [Type2], [Type3], [Type4]
    FROM  
    (SELECT id, Attr1 
        FROM bigtable) AS SourceTable  
    PIVOT  
    (  
    Count(id)  
    FOR Attr1 IN ([Type1], [Type2], [Type3], [Type4])  
    ) AS PivotTable;  
    

    Not tested, insert standard compliance-friendly disclosure here.

    Not sure about that Attr1 vs Attr2 , you might end up needing a UNION ALL anyways if that's the case?

    Pretty much, if you can make a Pivot Table in Excel from the data, you should be able to do it here in the query directly.


  • ♿ (Parody)

    @tsaukpaetra said in SQL - Getting counts of for each value of a lookup table:

    Pretty much, if you can make a Pivot Table in Excel from the data, you should be able to do it here in the query directly.

    Here's how I do pivot type queries (though I'd usually use DECODE because that's less verbose for simple stuff in Oracle):

    select
      sum( case when attr1 = type1 then 1 else 0 ) attr1_type1,
      sum( case when attr1 = type2 then 1 else 0 ) attr1_type2,
      sum( case when attr2 = type3 then 1 else 0 ) attr2_type3,
    ...etc
    


  • @karla said in SQL - Getting counts of for each value of a lookup table:

    I don't know OLAP.

    This is OLAF, he can help with your data

    [Oh Wait...]

    Apologies in advance, but I have recently been accused of being "too helpful/serious"....so.....


  • ♿ (Parody)

    @thecpuwizard Just let it go, dude.



  • @boomzilla said in SQL - Getting counts of for each value of a lookup table:

    @thecpuwizard Just let it go, dude.

    Let it go, let it go
    Can’t hold it back anymore
    Let it go, let it go
    Turn my back and slam the door



  • @boomzilla I think I might have liked him better when he was an incredible braggart.


Log in to reply