Tuesday 16 April 2013

SQL Server : Index at a glance

In the last ten posts, we have discussed in details about the SQL server index and its storage structure. In this post I will consolidate all the ten post for easy reference.

What is index ?
In the first post, we have discussed about index in general. We tried to understand the clustered  index non clustered index and key lookup  with a sample from real life. Read More

What is heap table?
A table with out clustered index is called heap table. Data in heap table are not stored in any specific order.To understand more about the heap structure Read More

B tree structure of Clustered index

A table with clustered index is called Clustered table.Clustered index stores the actual data in the order of clustering key using a b tree structure.A table can have only one clustered index. To understand more about clustered index table storage and access pattern, read this article 

B tree structure of a non clustered index on clustered table
We can have many non clustered index on a table.To understand about the non clustered index storage structure and the way SQL server use the non clustered index, read here

B tree structure of a non clustered index on heap  table
How non clustered index works on a heap table,Read More

Design Consideration of clustered index key
What you should consider while deciding clustered index key. What will happen if you defined a clustered index on a non unique column ? Read the article Design Consideration of Clustered index.

Overhead on non clustered index on a non unique column 
You ever thought about difference in storage structure non clustered index while defining on non unique column. This post will give you more insight.

How to get rid of key lookup operation ? 
Book mark look up or Key Look up operation is very common in execution plan. How to get rid of that . Learn about covering index and included column in this post

Filtered index with its limitation 
How to define index on part of the table data ? How to define complex unique constraint on a table. How to improve performance of a query which work on small chunk of table data. Filtered index is the new way to solve the issue. Read more

Real time sample of the importance of the index key column order
The column order in the index is very important. It define the way data stored.The data access patter will play a major role in deciding the key column order. Shared some thoughts on this article 

If you liked this post, do like my page on FaceBook

SQL Server : Part 10: Importance of Key Column Position While Creating Index

Now we have discussed about different types of indexes in the last posts. In this post, let us discuss about the key column order (order of the column of indexes). The order of the key column of the index is decided based on the data access pattern and how do you want to organize the data.

The general guidelines for the order of the index key column is to keep the most selective column as the first column.It does not  meant that, a unique id column should be first column of all your index. The optimizer will decide to use the index based on the statistics available on the index. I will explain about the statistics in later post. Statistics gives the information about the density of the key column which give uniqueness of index, and histogram that stores the information about the distribution of the values within the column.

Let us consider an example of customer table which stores the information of customers from across the countries.The application running on top of this table deal with customers from a specific country based on the user permission/access location. 

CREATE customer (
   Customer_id     INT IDENTITY(1,1) NOT NULL
CountryCode     CHAR(3) NOT NULL,
FirstName       VARCHAR(100) NOT NULL,
LastName        VARCHAR(100) NOT NULL,
Phone     VARCHAR(20),
Email           VARCHAR(100)

CREATE UNIQUE CLUSTERED INDEX Ix_Customerid_Countrycode ON customer(Customer_id,Countrycode)

The clustered index is created based on the general guideline to keep the most selective column on the left side.If I need fetch a single records based on the customer_id, this index will work perfectly.So what is the drawback of this index ? In case if I need to fetch all/many customers based on the countrycode , the optimizer opt for clustered index scan.

SELECT * FROM customer WHERE Countrycode='VNH' AND customer_id=1216468

Let us try to fetch all customers with countrycode VNH. The table has around 620 thousand records and there are 3066 customers with VNH countrycode

SELECT FROM customer WHERE Countrycode='VNH' 

From the execution plan, it is clear that, optimizer has opted for clustered index scan  by scanning all 6825 pages used to store this table. We can optimize this by changing the index with countrycode as the first column.

DROP INDEX customer.Ix_CustomerId_CountryCode 
CREATE UNIQUE CLUSTERED INDEX Ix_CountryCode_CustomerId ON customer(Countrycode,Customer_id)

SELECT FROM customer WHERE Countrycode='VNH' AND customer_id=1216468

Let us try to fetch all customers with countrycode VNH. 

SELECT FROM customer WHERE Countrycode='VNH' 

From the execution plan, it is clear that , the optimizer used index seek in both the cases and IO operation has reduced drastically in the scenario while fetching all the customers with VNH countrycode.

Other problem will be, while keeping the customer_id as the first column , data will be stored in the order of customer_id column and you will have many pages (almost all pages) will have data belongs to multiple countrycode. This may lead to more blocking/deadlock issues.By defining the index with countrycode as the first column, only a few pages will have data overlapped with multiple countrycode and will help to reduce the blocking issues.The important point is, by defining the index with countrycode as the first column will cause for higher level of index fragmentation, but that can be controlled by defining proper fill factor value. I have experienced this in one of our project and experienced lot of improvement after changing the indexes with countrycode as the first column.

In short the general guidelines about the key column is a good starting point but at the same time you have to consider the data access pattern in your application.Hope this will help you to resolve some issue that you are facing.

If you liked this post, do like my page on FaceBook

Monday 8 April 2013

SQL Server : Part 9 : Filtered Index : A new way for Performance Improvemnt

Filtered index is a new feature introduced in SQL Server 2008. All the indexes that we have discussed till now were always on the entire table. In other words, Index and table will have same number of records.With filtered index, it is possible to create an index for a subset of the table.This is achieved by adding a where clause in the index creation statement.This helps to reduce the size of index in terms storage as well as depth of the index.The where condition specified in the index creation statement determine the existence of a record in the index.

This will be a great performance on larger table where a good chunk of queries work on smaller portion of the table.A regular index will be on entire table ignoring the fact that most of the queries are interested in a smaller portion of the index.This makes the index deeper and need more pages to store the index b tree structure which results in more IO. Where as filtered index will be created only for a portion of the table and hence less pages are required to store the index.

Let us consider an example of salesorder table which contain data for last five years. Major part of active queries on this table are based on the last calender year and the current calender year.A simple example of filtered index will be like this.

ON salesorder(SalesOrderId,OrderDate,Status,Customer_id,TotalDue
WHERE OrderDate>'2012-01-01'

Unique column with multiple null values: One of the significant use of filtered index is to define complex unique constraint on a column. The unique constraint will not allow you to have multiple null values in a unique column. How do you enforce uniqueness on column except for NULL values ? For example , Productcode column  in the product table can be null but should be unique if the value is defined on that column. Let us see the below example.


        Productid            INT NOT NULL PRIMARY KEY,
        ProductCode     CHAR(10) ,
       ProductName     VARCHAR(100)

CREATE UNIQUE INDEX ix_Unique_Filtered ON Product(Productcode) WHERE productcode IS NOT NULL
INSERT INTO Product VALUES(1,'AR-5381','Adjustable Race')
INSERT INTO Product VALUES(2,NULL,'Bearing Ball')
INSERT INTO Product VALUES(3,NULL,'BB Ball Bearing')
INSERT INTO Product VALUES(4,'AR-5381','Adjustable Race-Small')

We are able to insert multiple record with NULL value in the productcode column, but when we are trying to insert a record with duplicate value 'AR-5381', it is not allowing to insert duplicate value into the Productcode column. The unique filtered index defined on this table help us to enforce this kind of uniqueness.

Read operation: The above said is one of the common usage of filtered index.Another usage of filtered index is to support the queries. Let us see a sample below.

USE mydb
SELECT * INTO SalesOrderheader FROM AdventureWorks2008.Sales.SalesOrderheader
--Unique Clustered index
CREATE UNIQUE CLUSTERED INDEX ix_SalesOrderheader ON SalesOrderheader(SalesOrderid)
--Filtered Inde
CREATE INDEX ix_filtered_index ON SalesOrderheader(orderdate) WHERE orderdate>'2008-01-01'

SELECT orderdate,SalesOrderid FROM SalesOrderheader WHERE 
SELECT orderdate,SalesOrderid FROM SalesOrderheader WHERE 
SELECT orderdate,SalesOrderid FROM SalesOrderheader WHERE 

An index is defined on the column orderdate for the records which are ordered after 2008-01-01. There are three select query with different condition on the orderdate. The first one is trying to fetch a range of records which comes under the filtered criteria (sub set of filtered result). Second query try to select records which ordered on specific date which comes under the filter criteria.The third one also try to fetch the orders placed on a specific date, but that date is not come under the filtered range.

The first two queries are made use of the filtered index and the third one gone for clustered index scan. This is because the third query can not be served from filtered index as the predicate of the query is not comes under the filter criteria used in the index creation statement.In short , the records needed to perform the third select operation is not available in the filtered index.The filtered index gives lots of performance improvement for this type of queries especially when the number of records filtered out are much more than the number of records satisfying the filtering criteria.

It is not mandatory that filtering column should be part of the index but in that case predicate  of the select statement should exactly match withe filtered index predicate. Let us create an index on orderdate by filtering the records with territoryid<5

CREATE INDEX ix_TerritoryID_Filter ON SalesOrderheader (OrderDate) WHERE TerritoryID<=5
SELECT salesorderid,orderdate FROM SalesOrderheader WHERE 
SELECT salesorderid,orderdate FROM SalesOrderheader WHERE 

Let us see the execution plan of the two select statement.

The predicate of the first statement is exactly matching with the where condition of the filtered index creation statement. The second statement, is actually a subset of filtered index, but as the columns territoryid is not part of the index , SQL server will not be able to filter the records and hence execution is performed by clustered index scan.

Limitation of filtered index   The concept of filtered index is  very attractive but there are some limitation in the usage of filtered index , especially when it comes to parameterization.Let us rewrite the query 

SELECT orderdate,SalesOrderid FROM SalesOrderheader WHERE orderdate>'2008-05-01'

DECLARE @Orderdate date='2008-05-01'
SELECT orderdate,SalesOrderid FROM SalesOrderheader WHERE orderdate>@Orderdate

The execution plan is not utilizing the filtered index when we have rewritten the query with local parameter. The reason behind that is , at the compile time , the query optimizer does not know what value will be passed for the parameter @OrderDate. So optimizer  has to generate a safe plan to satisfy all the criteria. This is the same issue when we change the database property to forced parameterization  or defined the statement  as procedure. 

ALTER DATABASE mydb SET parameterization forced
SELECT orderdate,SalesOrderid FROM SalesOrderheader WHERE orderdate>'2008-05-01'

CREATE PROCEDURE GetSalesorder (@OrderDate date
SELECT orderdate,SalesOrderid FROM SalesOrderheader WHERE orderdate=@OrderDate END 

In both these cases optimizer will not use the filtered index as it does not know what value will be passed during the run time and it try to generate a safe plan with out considering the availability of filtered index.

While we changing the database property parameterization to forced, the optimizer will replace all the static predicate with local variable. For example , the statement 

SELECT orderdate,SalesOrderid FROM SalesOrderheader WHERE orderdate>'2008-05-01'

will be treated as below by the optimizer

DECLARE @Orderdate date='2008-05-01'

SELECT orderdate,SalesOrderid FROM SalesOrderheader WHERE orderdate>@Orderdate

The option to force SQL server to use the filtered index is to make  dynamic statement  as given below.

DECLARE @Orderdate date='2008-05-01'
SET @SQL=N'SELECT orderdate,SalesOrderid FROM SalesOrderheader WHERE 
orderdate>'''+CAST(@Orderdate AS CHAR(10))+''''

The execution plan is given below 

From the execution plan it is clear that ,it is making use of the filtered index.

If you liked this post, do like my page on FaceBook