Think you know how to write UPDATE statement? Think again.

When I was a kid, my mom used to read me UPDATE queries every night before I went to sleep. I heared many stories about updating objects to database where most of them were the same old:


UPDATE [ Table ] SET Field1 = Value1, Field2 = Value2, . . . WHERE PRIMARYKEY = TheKey

So, I grew up with the same ideas on how to update objects in tables as other kids do. All the UPDATE queries involved taking all the fields and the update stored procedures used to have all the properties of the objects. If you are using some Code Generators (e.g. Code Smith) and generating data access layer codes and stored procedures for objects, you will see almost all the code generators and ORM tools generate UPDATE statments with all the fields in the SET block. Let me show you with an example how evil this idea is.

Imagine a table like this:


CREATE TABLE [ dbo ] . [ ChannelSubscribedByUser ] ( [ ID ] [ int ] IDENTITY ( 1 , 1 ) NOT NULL , [ UserId ] [ int ] NOT NULL , [ ChannelId ] [ int ] NOT NULL , [ ReadRSSItemIDs ] [ image ] NOT NULL , CONSTRAINT [ PK_ChannelSubscribedByUser ] PRIMARY KEY CLUSTERED ( [ UserId ] ASC , [ ChannelId ] ASC )

This is a table from Pageflakes database. In this table, we store all the RSS feeds user has read from a particular RSS channel. UserId is a foreign key to User table and ChannelID is a foreign key to Channel table. Pretty straight forward. We had a harmless update stored procedure generated using Code Smith using the famous .NET Tiers template.


ALTER PROCEDURE [ dbo ] . [ prcChannelSubscribedByUserUpdate ] @ID int , @ChannelId int , @ReadRSSItemIDs image , @UserId int AS UPDATE dbo. [ ChannelSubscribedByUser ] SET [ ChannelId ] = @ChannelId , [ ReadRSSItemIDs ] = @ReadRSSItemIDs , [ UserId ] = @UserId WHERE [ ID ] = @ID SELECT [ ID ] , [ UserId ] , [ ChannelId ] , [ ReadRSSItemIDs ] FROM dbo. [ ChannelSubscribedByUser ] WHERE [ ID ] = @ID

Look at the query plan and see how horrible it really is:

 

 

There are 2 Clustered Index Seeks, one Table Spool (very expensive), 2 Nested Loops, 1 Assert, 1 Clustered Index Seek. If you look at the IO Statistics, you can see how truly evil this query is:

Table 'RSSChannel'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'PageFlakesUser'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ChannelSubscribedByUser'. Scan count 1, logical reads 15, physical reads 0, read-ahead reads 0, lob logical reads 3, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

It is making SQL Server go to the tables which are refered as foreign key during a single row update!

Generally when we update a row in a table which represents an object, we rarely change the value of the fields which are foreign keys to other tables and has index on them. Most of the time, the updates are on the fields which contain properties, not relations. For example, 99% of the cases, you will update properties of an Employee object like FirstName, LastName, Age etc. 1% case you will modify the CompanyID (because s/he was fired) which is a foreign key to Company Table. But if you go to your database and see the stored procedure which updates the Employee object, you will see this:


UPDATE Employee SET FirstName = @FirstName , LastName = @LastName , CompanyID = @CompanyID WHERE EmployeeID = @EmployeeID

Don't be ashamed. I know we all have queries like this every where.

If you remove those unwanted fields which generally have Foreign Key and Index on them, you can gain significant performance improvement. When I just change the UPDATE statement to this:


UPDATE dbo. [ ChannelSubscribedByUser ] SET [ ReadRSSItemIDs ] = @ReadRSSItemIDs , WHERE [ ID ] = @ID

See the query plan:

There's just one Clustered Index Update. No Table Spool, no Clustered Index seeks, no nested loops, no asserts. The IO statistics shows how significant the improvement really is:

Table 'ChannelSubscribedByUser'. Scan count 1, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 3, lob physical reads 0, lob read-ahead reads 0.

There's just one update and nothing else. Also the number of Logical Reads is 6 compared to 15.

So, you thought you knew how to write UPDATE statements? Think again.

Published Sat, Jan 20 2007 17:47 by omar
Filed under: