Aggregating Averages


  • Trolleybus Mechanic

    My math co-processor is offline today, so I can't quite remember the correct way of doing this.

    Issue: I have Average Number of X Per Period.  The main report period is by Month. I need to take that data and average it out by Quarter instead. Suppose this data table:

    Month    Avg_X
    1           1
    2           20
    3          6

    If I change the period to Quarter, would I end up with:

    1) SUM(Avg_X) WHERE Month in (1, 2, 3)  -- [b]Which would equal 27[/b]
    2) AVG(Avg_X) WHERE Month in (1, 2, 3) -- [b]Which would equal 27/3 = 9[/b]

    I suspect #2, but in a foggy stupor wrote #1, and have completely confused myself.

    Thanks!


  • ♿ (Parody)

    What's confusing is "Average Number of X per Period", since it implies two aggregates (COUNT and AVERAGE), yet there's only one dataset. I.e., you can't have "Average Number of Calls Received per Month", you need "Average Number of Calls Received Each Month Over 2010," or "Average Number of Calls Per Day Over Each month".

    So, assuming there are two aggregates in play, and you really meant "Average Number of X [per N] per Month," then #2 is probably as close as you need.

    It could skew slightly, especially if you mean you mean "per day" or "per week", since those are different each month, and thus don't have a common denominator

    ((x1+x2+...+xn)/n + (y1+y2+...+ym)/m)/2 != (((x1+x2+...+xn)+ (y1+y2+...+ym))/(n+m))/2

    So if you were being really pedantic, you'd need to add a column to your data table (Avg_X_by_N), and then aggregate that by the N per quarter.


  • Trolleybus Mechanic

     Yeah, it is a bit confusing. There's a black-box stored procedure that returns the quoted table above with two columns, Month and Average_Per_Month.

    I need to get the Average_Per_Quarter from that data. I don't have access to the initial records, only the already-aggregated data.  But I assume, mathematically,

    AVG(AVG(Month 1 Data), AVG(Month 2 Data), AVG(Month 3 Data))


    will give me the same results as if I had access to the data set and had done

    AVG(Month 1 Data, Month 2 Data, Month 3 Data)



  • ♿ (Parody)

    @Lorne Kates said:


    Issue: I have Average Number of X Per Period.  The main report period is by Month. I need to take that data and average it out by Quarter instead. Suppose this data table:

    Your problem isn't clear here (average X over what? as Alex asked). Is this a daily average over days? If that's so, then you'll need to multiply each monthly average by the number of days in the respective month, add them all for the quarter, and then divide by the number of days in the quarter.


  • ♿ (Parody)

    The only way that an "average of averages" will work is if there's a common denominator. Consider the following sets:

    • {4,8} => 6
    • {3,5,9,9,6,4} => 6
    • {6,12} => 9

    The Average of the Averages {6,6,9} is 7, but the average of all sets is 6.6, which is a ~6% discrepancy.

    The only way to get the correct average is to weight the averages -- but to know what the weight is, you need to know the second dimension:

    • Average [Dollars Per Sale] Per Month
    • Average [Calls Per Day] Per Month

  • Trolleybus Mechanic

     I can see where the confusion is. The stat coming out of the black box is "Average [Waittime per Client] Per Month".  I needed Average [Waittime per client] Per Quarter.  The best I could manage was Average [Average [Waittime per client] Per Month] Per Quarter. Which, as teh mathz above points out, won't work.

    Thanks for the points. I can show now that it's impossible to get the stat we want from the data I have. I got permission to crack the black box and do a bit of rewriting. Now I can do AVG(Waittime) GROUP BY Quarter, instead of relying on AVG(Waittime) GROUP BY Month.

     



  • You could still use the black box if you were able to get the count of clients in a month from something else.  Then you just multiply out your waittime average by client count for the month, sum them for the three months, and devide by the total client count for the three months.  Depending on the black boxes you are using it may be quicker to do it that way then redoing the original stuff as well.


Log in to reply