Pages

Thursday 14 March 2013

SQL Server : Part 4 :Explaining the Non Clustered Index Structure

A table can have only one clustered index as the data rows are stored in the order of the clustered key, but a table can have multiple non clustered indexes.We have discussed about clustered index structure in our earlier post . In this post let us try to understand the non clustered index structure.

Logical Representation of Non Clustered index

In a simple word , a non clustered index is a subset of a table. When we define a non clustered index, SQL server store the set of non clustered key in a different pages.Let us consider a table with four columns (PersonId(PK),PersonType,FirstName,LastName) and a non clustered index on that table.The actual table is stored in the order of personid column (Cluster index key).In the below figure will give you a pictorial representation of non clustered index. The non clustered index key column are stored  apart from the actual table.If you look into the Non clustered index ,you can notice that, records are stored in the order of Firstname and lastname (Non Clustered index key) not as in the order of actual table.To understand easily , non clustered index can be considered as subset table of actual one. 

Fig 1

Now let us assume that we have a task to find out all records whose first name is 'Michael'.If you tried to find out from the Actual table , we need to go through each record from top to bottom to check whether first name matches with our search string 'Michael' as the records are not ordered based on the firstname column. It will be more tedious task if we have thousands of records in that table. The task will be much easier if we look into the the Non Clustered index side as the first name and last name are alphabetically ordered.We can easily locate the records which has firstname as 'Michael' and we do not need go beyond that as we are sure that there will not be any more records with firstname as 'Michael'.

Now we know Firstname and lastname of the record. How do we get the values for other two column ? Let us make a change in the Non clustered index part by associating the PersonId column along with the non clustered index.

Fig 2




















Now, once we locate the records , we can go back to the Actual Table using the PersonId (Cluster index key) to find the values of other columns and this operation is called bookmark lookups or RID lookup. 

Clustered index v/s Non clustered index

Non clustered indexes have the same B-tree structure as clustered indexes.The non clustered  index key will not make any change in the sort order of underlying table where as clustered index force the SQL server to store the underlying table in the order of cluster index key. The leaf level of clustered index are made up of data pages which contain the actual data of table where as the leaf level of non clustered index are made up of index pages.

Non clustered index can be defined on a heap table or clustered table.In the leaf level of nonclusterd index, each index row contain the nonclustered key value and a row locator.This locator point to a the data row in the clustered index or heap.The row locator in nonclusterd index rows are either a pointer to a row or a clustered index key for a row. If the table is a heap, which means it does not have a clustered index,the row locator is a pointer to the row.The pointer is built from the file identifier ,page number and slot number of the row on the page. The whole pointer is known as a Row ID(RID). If the table has a clustered index, the row locator is the clustered index key for the row.

Hand on

Let us create a unique non clustered index on the salesorderdetails table that we were working on the clustered index post. The table has a unique clustered index on salesorderdetailid column. Now we are defining a non clustered index using the below statement

CREATE UNIQUE INDEX Ix_ProductId ON SalesOrderDetail(ProductId,Salesorderid)

A pictorial representation of the b-tree structure is given below
Clustered Index Structure





















Now let us see how SQL server store the index. Let us see the output of DBCC IND command

DBCC IND('mydb','SalesOrderDetail',2)


The last parameter, 2 is the index id of the index Ix_ProductId. For the usage of DBCC IND refer this post.

The output returns me 229 records which include one IAM page and 228 index page.From this we can find the root page id by locating the record having highest value for  IndexLevel column. Remember that index level increase from leaf level to root. In this case, the root level page id is 4688 with index level 1. That means, the b-tree structure of this index has only root and leaf level  .There is no  intermediate level  in the b tree structure of this non clustered index. Let us see the page 4688.

DBCC traceon(3604)
GO
DBCC page('mydb',1,4688,3)

This returns the 228 records (228 leaf level index pages) .Part of the output is shown below.The output looks same as the clustered index root/intermediate page structure. The output can be read as , all the records where the combination of  productid,salesorderid is less than or equal to (707,51151) can be found in child page 2288. All the records where the combination of  productid,salesorderid is between (707,51151)  and (707,55920) can be found in the child page 2289 and so on.

Fig 3




















Let us see the page 2289

DBCC traceon(3604)
GO
DBCC page('mydb',1,2289,3)

This returns 539 records, all with product id 707.As this index use only two level, this is the leaf level of the b tree structure. Here you can notice that, there is no child page id but we have the value of the salesorderdetailid column (Clustered index key) which SQL server use do key or bookmark look up operation.

Fig 4

Let us see, how SQL server perform a select operation using this index. 

SET STATISTICS IO ON
GO
SELECT *  FROM SalesOrderDetail WHERE productid=707 AND SalesOrderid=51192

The execution plan will looks like below.You can see the Key lookup operation in the execution plan.


Fig 5















As the where condition exactly match with our non clustered index definition, SQL sever use that index to execute this query.As a first step SQL server reads the root page of b tree structure.The combination of our search criteria (707,51192) falls in to the second records (Refer Fig 3). So SQL server goes to the child page (Page id 2289).In that page, we can locate the record with the exact combination of (707,51192) with salesorderdetailid 37793. From here, SQL server do a key look up operation using the salesorderdetailid value. From the earlier post about clustered index, we know that , SQL server need to perform 3 I/O operation while searching on any clustered index key. So to complete this query, SQL server needs to do 5  I/O operation (2 in non clustered index and 3 for bookmark/key lookup operation in the clustered index)  and that is the same you can find in the output of IO statistics. 

To understand it better,let us consider non clustered index as subset table ( let us say table name Saleorderdetail_NC) of salesorderdetail table with columns productid,salesorderid and SalesorderDetailid and having combination ProductId and salesorderid  as clustered index.Then the result of the above query can  be obtained by combining the output of below two query.
SELECT *  FROM SalesOrderDetail_nc WHERE productid=707 AND SalesOrderid=51192
GO
SELECT *  FROM SalesOrderDetail WHERE SalesOrderDetailid=37793


As a last part let us see one more query

SELECT *  FROM SalesOrderDetail WHERE productid=707

This query returns 3083 records and the search condition is matching with the non clustered index first column. Still SQL server will not use the non clustered index to execute the query. Please find below the execution plan of this query.

Fig 6



The reason behind that, if it use the non clustered index, it has to do a key lookup operation for 3083 records . That itself requires  (3083X3)  9249 I/O operation . Instead of that, SQL sever can fetch the output by performing clustered index scan which need only 1501(Total number of pages required for clustered index b-tree structure) I/O operation.If we make slight changes in the query to select only Productid ,SalesOrderDetailid and SalesOrderId, SQL server use the non clustered index as it does not need to do the key lookup operation. The leaf level of the non clustered index have the values of these three column . Please find below the execution plan.

SELECT productid,salesorderdetailid,salesorderid  FROM SalesOrderDetail WHERE productid=707

FIg 7









I know this was a bit lengthy post, but still I feel it will give you the clear picture of the non clustered index structure.

If you liked this post, do like my page on FaceBook

8 comments:

  1. Nice Post.. Thanks

    ReplyDelete
  2. Wonderful explanation..

    ReplyDelete
  3. Such a cogent explanation. Thanks a ton :)

    ReplyDelete
  4. Good explanation:


    we can also visit link for clustered and nonclustered index architecture
    http://www.youtube.com/watch?v=95fnvJ2T9Vw

    ReplyDelete
  5. good learning. Just come across now on this site read 2-3 article regarding data storage on page. Nice thoroughly explanation on it. Thanks!

    ReplyDelete
  6. So u are saying that clustered indexes are more difficult to retrieve file cause there are clustered as compared to non clustered indexes where files are easy organised alphabetically

    ReplyDelete