Thursday, February 09, 2006

Finally Saw the Light

After all of the wondering I did about the value of the new DELETE TOP(x) in SQL Server 2005 (here), I finally saw the light today as I worked on a SQL Server 2000 project.

Picture this:

  • A huge-ass table with 2.5 million records taking up about 5.5 GB of disk space.
  • Most, but not necessarily all, of this data needs to be replaced monthly with a new snapshot that will be generated via an ETL (Extract, Transform, Load) process.
  • Due to disk space, the database has a hard limit of 12 GB, and the transaction log is also size fixed.

The data is marked with a month and year, so replacing data consists of deleting existing records for the months/years that are being imported, and then copying the new records into the table. But, it's not that simple.

If you just remove all of the records in one batch using DELETE FROM table WHERE..., then you need to have enough room in the transaction log to write the "undo" information in case the batch fails (so it can roll back the entire DELETE). If the transaction log cannot accomodate the 5.5GB of data that will be stuffed into it, then the batch will fail.

So, you need to find a way to break up the deletes into smaller chunks. Each smaller DELETE will still use the Transaction log, but after the command successfully terminates, that space will be reclaimed for subsequent operations.

Under SQL 2000, you're stuck with things like using Cursors in order to iterate through a list of month/year combinations from the incoming data (assuming that it was saved to an intermediate table first), and then using those values as part of a DELETE statement within the loop.

But, with our new friend SQL Server 2005, you can use something like multiple "DELETE TOP(1000) FROM table WHERE..." in order to break up the transaction.