Saturday, February 18, 2006

Spot the Bug

Here is a completely made up query, but is close to one that I wrote recently to spread an overhead cost value (@MarketingExpenses) over a list of invoices. This is just one step in a complex series of calculations that helps to determine profitability of products or services sold.

WHEN COALESCE(b.sum_of_sales,0) > 0
THEN (a.sales / b.sum_of_sales) * @MarketingExpenses
END AS ShareOfMarketingExpenses
invoice a LEFT JOIN regional_sum b
ON a.region_code = b.region_code

This query assumes that there's a table called [regional_sum] that has the sum of sales for each region. The [invoice] table is joined to [regional_sum] by linking the "region_code" fields of both tables, and since a LEFT JOIN is used, it will include all rows from [invoice] and only those rows in [regional_sum] that matches (if there is not a match, then NULL values will be used for the [regional_sum] data).

Now, the point of the COALESCE function is to return the first non-NULL value in the supplied list of values (b.sum_of_sales and zero). In this case, I'm simply trying to handle NULL values by changing them into zeros.

You'll also notice that I'm using a SQL CASE statement. I do this because I need to prevent the dreaded "divide by zero" error, so I ensure that the "sum_of_sales" field is not zero before trying to calculate a percentage (otherwise, I return zero as the result of the calculation).

The percentage is then multiplied by the predetermined "@MarketingExpenses" value in order to give you a number of marketing dollars that could be attributed to that one invoice record ("ShareOfMarketingExpenses").

Let me just assure you that the query above compiles and works as written, so the bug is not a syntax error. However, there was an issue that came up as we were trying to reconcile data from the new system with old/validated numbers (the sum of all of the "ShareOfMarketingExpenses" values did not equal the specified "@MarketingExpenses" value).

I'll leave it as an exercise to the reader, but I'll also provide some sample data:


region_code sum_of_sales
----------- ------------
a0001 12345.67
a0002 432312.33
b0001 -123.23
b0004 34322.23
b0005 -12.12
c0006 234328.23

(Negative sales are probably due to internal accounting of moving inventory around from one region to another, but I'm not that close to this [fictitious] business, so I can't explain the numbers any better).


customer_id region_code sales
----------- ----------- ----------
1 a0001 12.32
1 a0001 33.23
2 b0001 5.34
3 c0006 1232.21
3 b0004 322.22
4 a0003 232.23
[the list goes on, but pretend that the
sales for each region correctly sums up
to the values listed in the above table]

It was a pretty cool "Ah ha!" moment for me when I spotted the flaw. Your mileage may vary.