Doing the maths to understand SQL optimiser choices
The second of my tips from my talk (slides and scripts available) was about doing the maths to understand why the SQL Server optimiser behaves in the way that it does - particularly in relation to choosing whether to scan an index/heap, or whether to seek a less-suitable index for some of the data and then do a lookup to fetch the rest. This applies in SQL Server 2005 just fine, by the way, as do all the scripts from my talk.
The basic gist of it is that the system will decide to use a plan which minimises the number of reads required to get all the data. So My example scripts compare a scenario of looking up employees in AdventureWorks.HumanResources.Employee by managerid. There's a Non-Clustered Index on ManagerID (which naturally then includes EmployeeID, which is the Clustered Index). But if we also want the LoginID field, then our NCIX isn't quite so handy.
The choice of how to implement this comes down to basic mathematics. First, a couple of useful facts:
- The whole Employee table (CIX in the physical representation) is on 9 pages.
- Seeks on small indexes typically take 2 reads - one to find out which page the row is on, and the other to get that page.
- Lookups are essentially the same kind of thing as seeks.
So let's consider the two options here - scanning the table v seeking the index and looking up the rest.
Scanning a table is always going to be 9 reads. Easy.
Seeking the table is going to take a couple of reads (but maybe three even) to find the EmployeeIDs for the people that report to the manager of interest. But then it's going to require another two reads for every employee in this list, to be able to get the LoginID field out of the CIX. So it's going to be 2+2n reads (where n is the number of employees).
So if our manager has 2 reports, we get 2+2x2=6 reads using the seek, and the system will use this method.
But if our manager has 4 reports, we get 2+2x4=10 reads using the seek, which would make it better to use just scan the CIX instead (only 9 reads).
Naturally, if we change our index to include the LoginID, the query then takes 2 reads, whether the manager has 2 or 4 direct reports. It could take more if those records are stored over multiple pages.
Understanding this simple bit of maths can really impact your indexing strategy, and your appreciation of the way that T-SQL works.