Friday, August 26, 2005

UPDATE TOP(x) Behavior Figured Out

Just tested UPDATE TOP(x) to see if I could better understand how it works:

The first (x) records will always get updated as stored on disk (i.e., by the clustered index). Subsequent updates, unless including a field that affects the clustered index, will continue to update the same records.

You can see this behavior by looking at the Execution Plan in Management Studio. Heap tables do a simple table scan, while tables with a Clustered Index will to a clustered index scan to find the x'th record.

Steps to reproduce:

1. Create a new table

USE [AdventureWorks]
CREATE TABLE [dbo].[test](
[id] [int] IDENTITY(1,1) NOT NULL,
[LastUpdated] [datetime] NULL

2. Insert 100 records into the table. Then, set the LastUpdated field to the same value:
UPDATE test SET LastUpdated='1/1/2000'

At this point, there's no clustered index or primary key (which would create a clustered index by default), so we're dealing with a heap table.

3. Execute the following:

update top(5) test set lastupdated='3/1/2005'

No matter how many times you execute, the first 5 rows will always be '3/1/2005'

4. Now add a clustered index (non-unique) on the LastUpdated field. This will sort the records by that field's value on the disk itself. The first five rows, which were previously updated, will now move to the bottom of the table when you SELECT * FROM test

5. Now execute:

update top(5) test set lastupdated='3/1/2005'

Because the new date is greater than the original date, these 5 records move to the end of the list as well. The next time you execute this command, then next 5 records will move to the bottom.

So, the notion of breaking up a transaction into smaller chunks using TOP does not seem to be a viable solution. Most clustered indexes will probably be set to the primary key for the table, which is unlikely to be updated by your query (so subsequent UPDATE TOP(x) will continue to update the same rows).

UPDATE: The same behavior was observed for DELETE TOP(x) as well. So, I think the MS Technical Writer needs to update the SQL Server 2005 Books Online to get rid of the comment that indicates that using TOP(x) with DML is "random". So far, it looks very predictable.