Pages

Monday, 25 March 2013

SQL Server : Part 8 : Explaining The Covering Index or Included Columns

In our earlier discussion about non clustered index ,we have seen that, the leaf level of a non clustered index contain only the non clustered index key column and clustered index key (if the table is a clustered table). To fetch the remaining column from the clustered index structure or heap structure, SQL server has to do a bookmark/key look up operation.Many time the bookmark or key look up operation might be costly affair. Let us see an example.

USE mydb
GO
DROP TABLE dbo.SalesOrderDetail                               
GO                               
SELECT * INTO dbo.SalesOrderDetail FROM AdventureWorks2008.Sales.SalesOrderDetail
GO
CREATE UNIQUE CLUSTERED INDEX ix_SalesOrderDetail ON dbo.SalesOrderDetail(SalesOrderDetailID)
GO
CREATE UNIQUE NONCLUSTERED INDEX ix_Productid ON dbo.SalesOrderDetail(ProductId,SalesOrderId)
GO 
SET STATISTICS IO ON
GO
SELECT SalesOrderDetailid,productid,salesorderid,orderqty,unitprice FROM SalesOrderDetail WHERE productid=707 AND 
SalesOrderID=43680

The execution plan and the out put of IO statistics of the select statement are given below.















In the execution plan, you can see that ,50 percent of the query cost is contributed by the Key Lookup operation.In the output of the IO statistics , it clearly says SQL server performed 5 IO operation to fetch the single record. 


Note that, the existing non clustered index have 229 pages and depth is 2( levels in the b tree structure).Seek operation on this index need to perform only 2 IO operation to complete the task.You can verify this using the  DBCC IND  command or refer the earlier post.


Let us assume that, this query(with different parameters ) is used very frequently from the application and you need to optimize it further.How we can do that ? The only way that we can optimize this query is by avoiding the Key lookup operation. For that we can modify our non clustered index and add the remaining two column (OrderQty and UnitPrice) which are not part of clustered index key or non clustered index key. 

DROP INDEX ix_Productid ON dbo.SalesOrderDetail
GO
CREATE UNIQUE NONCLUSTERED INDEX ix_Productid ON dbo.SalesOrderDetail(ProductId,SalesOrderId,OrderQty ,UnitPrice)
GO
SELECT SalesOrderDetailid,productid,salesorderid,orderqty,unitprice FROM SalesOrderDetail WHERE productid=707 AND SalesOrderID=43680














Now we were  able to get rid of the Key lookup operation from the execution plan and to reduce the IO from 5 to 3.But if we  look into the out put of DBCC IND of the modified non clustered index, we can see that , depth of the b tree is increased by one due to this change. As the index level is increased , the non clustered index has to to perform 3 IO to complete the operation. This will be worst, if we have more column in the select list and we added all those columns into the non clustered index key to avoid the key lookup operation.

Here comes the covering index to help us.Covering index help us to add non key column to leaf level of the non clustered index with very minimal possibility of increasing the depth of the b-tree structure. This can be achieved by adding include column in the CREATE INDEX statement.
An index that contains all information required to resolve the query is known as a Covering Index.When we create a nonclustered index to cover a query, we can include nonkey columns in the index definition to cover the columns in the query that are not used as primary search columns. Performance gains are achieved because the query optimizer can locate all the required column data within the index; the table or clustered index is not accessed.

DROP INDEX ix_Productid ON dbo.SalesOrderDetail
GO
CREATE UNIQUE NONCLUSTERED INDEX ix_Productid ON dbo.SalesOrderDetail(ProductId,SalesOrderId
include(OrderQty ,UnitPrice)
GO
SELECT SalesOrderDetailid,productid,salesorderid,orderqty,unitprice FROM SalesOrderDetail 
WHERE productid=707 AND SalesOrderID=43680













With this also, we were able to get rid of the key lookup operation and to reduce the IO operation to 2. The IO operation clearly says the the depth of the clustered index is two.
Let us see the output of the DBCC IND

SELECT index_id FROM sys.indexes WHERE name='ix_Productid' AND OBJECT_ID= OBJECT_ID('SalesOrderDetail')
GO
DBCC ind('mydb','SalesOrderDetail',2)

This returns 378 records and the root page is 7456 (Value of pagepid column of the record having max value for indexlevel column)


Let us see the root page and one leaf level page

DBCC traceon(3604)
GODBCC page ('mydb',1,7456,3)GODBCC page ('mydb',1,7328,3)


























From the output we can see that, columns mentioned in the include clause are added into the leaf level pages with out making any changes in the non leaf level pages.


Include column are useful because we can refer the column that has a data type which can not be used in the index key.More over include columns are not counted in the 900 bytes or 16 key column limitation of index keys.We can include with any data types except text,ntext and image.Included column also support the computed column.

If you liked this post, do like my page on FaceBook 

13 comments:

  1. Fantastic article ! You havemade some very astute statements and I appreciate the the effort you have put into your writing. Its clear that you know what you are writing about. I am excited to read more of your sites content.
    http://www.sqlservermasters.com/

    ReplyDelete
  2. This was a great post, you cleared up my questions in regards to the INCLUDE keyword. Very informative and detailed. Great blog, look forward to your posts!

    ReplyDelete
  3. Very informative post for the author, you help me a lot because i have a lot of problem of index. I will refer your blog to my friends to be a source of information about SQL Server thanks a lot.

    ReplyDelete
  4. Thanks for posting this article. This will come handy in my field of work. Thanks again, I cant wait to tell my friends about this blog.

    ReplyDelete
  5. Nice and clear. Thank you.

    ReplyDelete
  6. Best article I found online about included columns. keep up the good work.

    ReplyDelete
  7. Simple and quite understandable. Thank you.

    ReplyDelete
  8. I like this article, Firs time using the include clause , I did it basically as suggestion from the SQL Query Optimizer, But now I'm experiencing a higher occurrences of Index fragmentation, does using the include contribute to the Index fragmentation? should I look at something else to fix this issue? my query performance degrade form 2 minutes processing time to about 2 hours in a week, fragmentation in the index about 50%, this is a highly used table, current row count: 1,677,128

    ReplyDelete
  9. How often are you reorganizing/rebuilding your index?

    ReplyDelete
  10. I pile on the motion - this was a very well done article and you've just made all of us better!!

    ReplyDelete
  11. Awesome Explanation.. appreciates the time spent on clarifying in depth and sharing valuable knowledge.

    ReplyDelete



  12. I loved the way you discuss the topic great work thanks for the share, Let me share this, Hadoop training in pune

    ReplyDelete