A CASE study in SARGability
A recent discussion on Twitter about a query that Denny Cherry was looking at led to this post by Josef Richberg: http://josef-richberg.squarespace.com/journal/2010/1/28/is-a-case-statement-considered-sargable.html, and I thought it might be worth going through a few points on the topic of SARGability. Particularly given that I wrote a related post recently myself.
If something is SARGable, it means it can be used as a Search Argument – that is, it can be used effectively in conjunction with an index.
To repeat a common analogy, trying to find “Farley” in the phonebook is a Seek (I can pretty much go straight there), whereas trying to find names which end in “arley”, is not. Those names could be just about anywhere.
The way that an Index Seek works is that the system can go to a record and tell whether the row(s) it's looking for is to the left of the current value or to the right. Looking for "F" is easy. I open the book at "M" and know that I must go left. I open the book at C and know that I must go right. It's SARGable.
As Josef writes, applying a function to a field stops it being SARGable. If we wanted to say WHERE RIGHT(Surname,5) = 'arley' we would quickly see a performance problem, as finding these records would mean starting the Aardvarks (in case there was an ‘Aardvarkarley’), and end with the ZZzzzs (in case there was a ‘ZZzzarley’). On the other hand, looking for WHERE Surname LIKE 'Farl%' is quick, because we see that evaluating surnames using a wildcard that doesn’t appear at the start doesn’t affect the SARGability.
At first glance, an ON clause such as:
ON g.GroupId = CASE
WHEN c.GroupId <> l.GroupID
AND l.GroupID <> @AccGroupID
…would seem to be nasty for SARGability. We’re not just looking for a row where GroupId = AParticularyGroupID, we’re looking for something that is the result of a CASE function, which is going to change for every combination of rows in the tables ‘c’ and ‘l’.
But actually, it’s a lot more complicated than that, and yet in some ways, a lot more simple.
Let’s think about the term SARGable again. It means that we’re able to use it as a Search Argument – but what is the ‘it’ we’re looking at? What is the thing that is SARGable? (This next bit is the key – so I’ll write it again in a moment, in bold) The thing that is SARGable is the term that is indexed – in other words, g.GroupId. It doesn’t matter how much work is required to work out that value – it’s whether or not you can find it in the index.
SARGability is about the thing that is (hopefully) indexed in the table (or set) that you’re introducing into the query.
In the phonebook, RIGHT(Surname,5) isn’t in the index. No index on Surname is going to help match RIGHT(Surname,5) (unless you want to index the result of that function). But if you wanted to say WHERE Surname = REVERSE('yelraF') then there’s no problem. Despite how nasty REVERSE is, it doesn’t affect what we’re looking up in the book. We can evaluate REVERSE, and then (using the index) find the Surname that matches that. Surname = something is SARGable.
So the CASE clause above is largely irrelevant to finding something by GroupId. The result of the CASE can be easily worked out, and then a seek done on an index on the ‘g’ table.
That was the simple aspect of it.
Consider that you have a diary, and you want to mark a week before some birthdays, so that you can go shopping. Suppose you have Jul 11, Feb 6, Dec 10. You take one of the birthdays, work out the date that is a week before it, then do an Index Seek to find that date in the diary. Then you repeat it twice more. This query might be along the lines of:
FROM people p JOIN diary d ON d.diarydate = dateadd(week, -1, p.birthday)
You'll notice that we happily did an Index Seek on the diary. The dateadd function on p.birthday had no effect on us. The SARGability applies on whichever table/set we're thinking about at the time, which in this scenario is the diary.
But consider that we wrote the query like this, which essentially means the same, looking for birthdays and diary dates that are a week apart, but applying the function to d.diarydate instead:
FROM people p JOIN diary d ON p.birthday = dateadd(week, 1, d.diarydate)
Logically, this means the same. But this is much more suited to finding the birthday once we know the diarydate, rather than the other way around. This might be fine if we had a thousands of people to consider, and we wanted to look up a mere 365 diary dates in our list of people (indexed by birthday)... but we'd still be looking 365 dates. The function still kills the SARGability of the column, it's a question of which column we want to be considered indexable.
Repeating the matching process over and over like this is known as a Nested Loop. As quick as it might be to find the record because of SARGability, we'd still be doing it 365 times. Things would be much better if we could index both sides in this scenario. Then, we might be able to utilise a different kind of join, such as a Merge Join, which involves running through two ordered sets, comparing the values to find matches in a single pass. But for this to apply, both sides must be SARGable. Consider two tables with numbers in them... one with 1, 1, 5, 10, 10, 200, and one with 3, 4, 5, 6, 7, 8, 8, 10. To do a Merge Join between them, the system will see that the first table starts with 1, while the second one starts with 3. It will skip through the 1s in the first table to the 5, and then start skipping through the second table to find matches. It's very fast, but it needs to be able to know whether the value it's looking for is to the left or right of where it's up to. This would involve an Index Scan on both sides, but it would be quicker than doing lots of Index Seeks. A single seek is faster than a single scan, but something quick done over and over can take longer than using a slow method one time.
Unfortunately, the Query Optimizer currently isn't smart enough to know that dateadd(week, 1, d.diarydate) doesn't change the order of dates in the diary, and that it can easily tell whether it should go left or right. That's the topic of my earlier blog post, and a Connect item to ask that Microsoft fix this.
So should you care about SARGability? After all, if the system is doing an Index Seek anyway, you've got SARGability in play, and it shouldn't matter.
You should try to arm the Query Optimizer with as many options as possible, so that it can use the best plan available, based on the statistics. We might be happy with looking up three diary dates in a Nested Loop, but we wouldn't be happy doing that thousands of times. Think about how YOU would solve the problem without a computer (like my diary analogy, or the phonebook). If your execution plan is similar to your paper-based solution, then you've done well. If it's not, then maybe you should look into it some more and work out if the query is okay or not.
In Josef's situation, the Index Seek was being done 1155 times. That might be okay, but it also might not be. But should the query be rewritten? That's a different question, that I couldn't really answer without knowing more. The most important thing about a query is not its performance, but its correctness. If you need to kill the SARGability to maintain correctness, then so be it. You can always handle it with an indexed computed column if you need to, or even an indexed view.