Avoiding access to base tables using Foreign Key, and two suggestions...
Just a note before going further: I am going to maintain my blog in Spanish and English:
Frequently, as a trainer and consultant I have to face in our clients the benefits that customers can get implementing DRI in their systems. I usually get different arguments such us:
- If I don't define Foreign Keys, it is easier to perform bulk operations.
- I prefer to validate the integrity in the Data tier instead of the database.
- All the access to the base table is done using a set of stored procedures that I manage.
- We prefer not implementing DRI because our data model changes very frequently, and it is complex to change the database logical design.
I could count many more cases apart of those four; from my experiencie, I think that generally the root cause is related to their ALM processes, and their best practices. Almost always, you can come up with a safer and better design solution; in fact, during the classes and the mentoring sessions, you convice them to go for the change, but days after, they come back to their daily work, and they do not have time to GO for the change.
Here is a good post from Conor Cunningham (http://blogs.msdn.com/conor_cunningham_msft/archive/2008/08/07/foreign-keys-are-our-friends.aspx) showing a hidden benefits of using DRI that you may not know: Using DRI SQL Server Optimizer, may dedice not to access base tables because SQL Server already knows -- due to the DRI -- that if a row exist child table, it must exists one in the parent table.
However, I would like to make some suggestions in this matter, that you may see useful.
Using a very similar script to Conors':
| use AdventureWorks2008; go drop table f1 drop table t1 go create table t1(id int, v char(1) default 'a' , constraint t1_pk primary key (id)) go
create table f1(f_id int identity, t_id int not null, v char(1) default 'a' , constraint f1_pk primary key (f_id) , constraint fk1 FOREIGN KEY (t_id) REFERENCES t1(id)) go create nonclustered index nci_f1_t_id on f1 (t_id) declare @i int=0 set nocount on while @i < 2000 begin insert into t1 (id) values (@i) insert into f1 (t_id) values (@i) insert into f1 (t_id) values (@i) insert into f1 (t_id) values (@i) set @i+= 1 end |
Conor pointed in his post, those two queries does not need to access the base table -- as you can see in the execution plan:
| /* |--Index Scan(OBJECT:([AdventureWorks2008].[dbo].[f1].[nci_f1_t_id])) */ select f1.* from t1 inner join f1 on t1.id = f1.t_id /* |--Index Scan(OBJECT:([AdventureWorks2008].[dbo].[f1].[nci_f1_t_id])) */ select f1.* from f1 where exists (select * from t1 where t1.id = f1.t_id) |
And I'd like to suggest to the SQL Server Team considering those two cases as well:
| /* |--Nested Loops(Inner Join) |--Clustered Index Seek(OBJECT:([AW].[dbo].[t1].[t1_pk]), SEEK:([AW].[dbo].[t1].[id]=(1)) ORDERED FORWARD) |--Nested Loops(Inner Join, OUTER REFERENCES:([AW].[dbo].[f1].[f_id])) |--Index Seek(OBJECT:([AW].[dbo].[f1].[nci_f1_t_id]), SEEK:([AW].[dbo].[f1].[t_id]=(1)) ORDERED FORWARD) |--Clustered Index Seek(OBJECT:([AW].[dbo].[f1].[f1_pk]), SEEK:([AW].[dbo].[f1].[f_id]=[AW].[dbo].[f1].[f_id]) LOOKUP ORDERED FORWARD) */ select f1.* from t1 inner join f1 on t1.id = f1.t_id where f1.t_id = 1
/* |--Merge Join(Inner Join, MERGE:([AW].[dbo].[t1].[id])=([AW].[dbo].[f1].[t_id]), RESIDUAL:([AW].[dbo].[f1].[t_id]=[AW].[dbo].[t1].[id])) |--Clustered Index Scan(OBJECT:([AW].[dbo].[t1].[t1_pk]), ORDERED FORWARD) |--Index Scan(OBJECT:([AW].[dbo].[f1].[nci_f1_t_id]), ORDERED FORWARD) */ select t1.id, f1.* from t1 inner join f1 on t1.id = f1.t_id |
The first query, implements a typical OLTP query looking for all the detail rows for a given parent. In this example, SQL Server already knows that for each row in the child table (f1), at least exists one row in the header (t1). The only difference with the first query from Connor's post is the predicate, that I think fires the behaviour. In my humble opinion, the filter is highly selective, and then sets the t1 table in best position to be seek"ed".
The second query, seems to be a bit awkward, because SQL Server already knows that t1.id = f1.t_id. Why accessing t1 table, if the matching values are already in f1.t_id?
If you think that those last "logical suggestions" to avoid accesing parent tables should be considered by SQL Server, please, go to Conect and cast your vote (https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=363429).