Wednesday, February 01, 2006

Database Performance Increase Tip: Avoid MS-Access

A client of mine wanted to put together a tool to help them determine the profitability of their products. They didn't have a big budget, so initially (last year), they opted for an MS-Access-based tool, since they already knew (or thought that they knew) how to use Access. They were also under the assumption that this would be cheaper than designing and building a purely SQL Server-based solution (because they don't understand SQL Server).

So, my company developed the tool for them to take a bunch of overhead numbers and distribute the associated costs for a period of time over the list of invoice line items for the same period. Using Access. From a desktop client.

Ok, that's fine. The tool worked really fast with the sample data that they provided. Party on.

But, whoa! Just one month's worth of their production invoice data was multiple gigabytes in size - much too large to fit into an Access database (which has a 2GB file size limit). And, typically, the profitability was measured on a quarter's worth of data (3 months). Well, now we needed to introduce SQL Server into the equation in order to hold the large sets of data, while the business logic continued to live in Access.

Did I mention that I was NOT involved in the architecture or construction at this point?

So, picture this: They would use Access to drive the data migration by linking to the source database in their data warehouse (using ODBC) and the destination database in SQL Server (also using ODBC) and pump all of those gigabytes worth of data through the desktop client. Besides being extremely slow, this also had a nice side effect of bloating the transaction log file (LDF) to the point that it actually filled the disk.
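As an aside, the log bloat itself is avoidable. For a staging database that only ever holds reloadable warehouse extracts, something along these lines keeps the LDF in check, as long as the load is committed in reasonably sized batches (the database name here is made up for illustration):

    -- No need to keep full log history for data we can always re-pull
    -- from the warehouse; log space is reused after each checkpoint.
    ALTER DATABASE InvoiceStaging SET RECOVERY SIMPLE

    -- Quick sanity check on how full the log files actually are.
    DBCC SQLPERF(LOGSPACE)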

Then, once data for a period was in place on the SQL Server, a VBA module in Access would fire off, open about 15 recordsets against various lookup tables, loop through each row of the invoice data, search those 15 recordsets for corresponding data using .FindFirst, and then eventually update the row in the invoice table with the calculated values.
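Translated into T-SQL terms, that per-row pattern amounts to roughly the cursor loop below. The table and column names are invented for illustration, and I've shown just one lookup rather than all 15:

    DECLARE @InvoiceID int, @ProductID int, @OverheadRate decimal(18, 4)

    DECLARE invoice_cursor CURSOR FOR
        SELECT InvoiceID, ProductID FROM InvoiceLineItem

    OPEN invoice_cursor
    FETCH NEXT FROM invoice_cursor INTO @InvoiceID, @ProductID

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- One lookup per invoice row (the VBA did about 15 of these
        -- with .FindFirst against open recordsets)
        SELECT @OverheadRate = OverheadRate
        FROM ProductOverhead
        WHERE ProductID = @ProductID

        -- Write the calculated value back, one row at a time
        UPDATE InvoiceLineItem
        SET AllocatedOverhead = ExtendedCost * @OverheadRate
        WHERE InvoiceID = @InvoiceID

        FETCH NEXT FROM invoice_cursor INTO @InvoiceID, @ProductID
    END

    CLOSE invoice_cursor
    DEALLOCATE invoice_cursor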

This whole process, end-to-end, would literally take them 3+ days to run. They would start it on a spare machine, and check back days later to see if it completed, or if it errored out (in which case, they had to start over).

Ok, so the MS-Access solution is now being called a prototype, and I'm now in the picture to ensure that they get better performance out of the next iteration of the project. The obvious improvements are to use DTS to pump the data from Oracle into SQL Server 2000, to establish proper indexes (including a clustered index that includes the invoice date), and to move everything out of Access. That VBA module is being ported to a stored procedure, and instead of using cursor-based processing, I'm manually converting everything to set-based.
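For comparison, here's a rough sketch of the set-based shape, using the same invented table names as the cursor example above. The real stored procedure joins to far more lookup tables, but the idea is the same: one UPDATE per calculation instead of one round trip per row.

    -- Clustered index leading on invoice date, so a quarter's worth of
    -- rows is one contiguous range instead of scattered pages
    CREATE CLUSTERED INDEX IX_InvoiceLineItem_InvoiceDate
        ON InvoiceLineItem (InvoiceDate, InvoiceID)

    -- One set-based UPDATE replaces the entire row-by-row loop
    UPDATE ili
    SET AllocatedOverhead = ili.ExtendedCost * po.OverheadRate
    FROM InvoiceLineItem AS ili
        INNER JOIN ProductOverhead AS po ON po.ProductID = ili.ProductID
    WHERE ili.InvoiceDate >= '20051001'
      AND ili.InvoiceDate < '20060101'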

(To be fair, a lot of these changes were actually identified by the original developer of the MS-Access tool before I became involved.)

I don't have access to their production data at this point, but based on the data that I have, I'm predicting that the 3 days' worth of processing can be completed in 10 minutes. That's 432 times faster (3 days is roughly 4,320 minutes, and 4,320 / 10 = 432, which works out to something like a 43,000% improvement), chiefly due to the logic being moved to the data tier instead of pumping the data across the wire using ODBC and MS-Access.

I'm excited to see this run on the actual production system...