Solutions with Practical SQL DBA : Index Fragmentation

Showing posts with label Index Fragmentation. Show all posts

Tuesday, 16 April 2013

SQL Server : Part 10: Importance of Key Column Position While Creating Index

Now we have discussed about different types of indexes in the last posts. In this post, let us discuss about the key column order (order of the column of indexes). The order of the key column of the index is decided based on the data access pattern and how do you want to organize the data.

The general guidelines for the order of the index key column is to keep the most selective column as the first column.It does not meant that, a unique id column should be first column of all your index. The optimizer will decide to use the index based on the statistics available on the index. I will explain about the statistics in later post. Statistics gives the information about the density of the key column which give uniqueness of index, and histogram that stores the information about the distribution of the values within the column.

Let us consider an example of customer table which stores the information of customers from across the countries.The application running on top of this table deal with customers from a specific country based on the user permission/access location.

CREATE customer (

   Customer_id     INT IDENTITY(1,1) NOT NULL
   CountryCode     CHAR(3) NOT NULL,
   FirstName       VARCHAR(100) NOT NULL,
   LastName        VARCHAR(100) NOT NULL,
   MobilePhone     VARCHAR(20),
   Email           VARCHAR(100)
   
)

GO

CREATE UNIQUE CLUSTERED INDEX Ix_Customerid_Countrycode ON customer(Customer_id,Countrycode)

The clustered index is created based on the general guideline to keep the most selective column on the left side.If I need fetch a single records based on the customer_id, this index will work perfectly.So what is the drawback of this index ? In case if I need to fetch all/many customers based on the countrycode , the optimizer opt for clustered index scan.

SET STATISTICS IO ON
go

SELECT * FROM customer WHERE Countrycode='VNH' AND customer_id=1216468

Let us try to fetch all customers with countrycode VNH. The table has around 620 thousand records and there are 3066 customers with VNH countrycode

SELECT * FROM customer WHERE Countrycode='VNH'

From the execution plan, it is clear that, optimizer has opted for clustered index scan by scanning all 6825 pages used to store this table. We can optimize this by changing the index with countrycode as the first column.

DROP INDEX customer.Ix_CustomerId_CountryCode
GO
CREATE UNIQUE CLUSTERED INDEX Ix_CountryCode_CustomerId ON customer(Countrycode,Customer_id)

SET STATISTICS IO ON
go

SELECT * FROM customer WHERE Countrycode='VNH' AND customer_id=1216468

Let us try to fetch all customers with countrycode VNH.






SELECT * FROM customer WHERE Countrycode='VNH'

From the execution plan, it is clear that , the optimizer used index seek in both the cases and IO operation has reduced drastically in the scenario while fetching all the customers with VNH countrycode.

Other problem will be, while keeping the customer_id as the first column , data will be stored in the order of customer_id column and you will have many pages (almost all pages) will have data belongs to multiple countrycode. This may lead to more blocking/deadlock issues.By defining the index with countrycode as the first column, only a few pages will have data overlapped with multiple countrycode and will help to reduce the blocking issues.The important point is, by defining the index with countrycode as the first column will cause for higher level of index fragmentation, but that can be controlled by defining proper fill factor value. I have experienced this in one of our project and experienced lot of improvement after changing the indexes with countrycode as the first column.

In short the general guidelines about the key column is a good starting point but at the same time you have to consider the data access pattern in your application.Hope this will help you to resolve some issue that you are facing.

If you liked this post, do like my page on FaceBook

Thursday, 21 March 2013

SQL Server : Part 6 : Design consideration of Clustered Index

In our earlier post, we have discussed about the structure and storage of clustered index. In this post we will discuss about the design consideration of clustered index.There are couple of points that need to be considered while selecting clustered index key.There is no hard rule in selecting the clustering key . It is best practices and guidelines and the internal knowledge will help us to select right clustering key.

Uniqueness

SQL server will allow you to create a clustered index on non unique column, but uniqueness is one of the most desirable attribute for any indexes especially for a clustered index.Even if SQL server allow to create clustered index on a non unique column , internally SQL server add 4 bytes value for all duplicate instance of clustered key and this 4 bytes variable length column is known as uniquiifier .In this case SQL server consider the clustered key as the combination of non unique column on which clustered index is defined and internally generated uniquifier column.This value will be stored where ever the clustered key to be stored. For example in the leaf level of a non clustered index defined on a clustered table.

Let us see an example.I am going to create a copy of SalesOrderDetail table and define clustered index on the productid column which has duplicate values.

Use MyDb
GO
SELECT * INTO dbo.SalesOrderDetailDupCI FROM AdventureWorks2008.Sales.SalesOrderDetail
GO
CREATE CLUSTERED INDEX ix_SalesOrderDetailDupCI ON dbo.SalesOrderDetailDupCI(ProductId)

Now let us run the DBCC IND command

DBCC IND('mydb','SalesOrderDetailDupCI',1)

This returns me 1608 pages and the root page number is 3570(The value of PagePID column of the record having highest value for indexlevel column ) . Below is the output of DBCC Page command for the root page and one intermediate page

In the output, we can see one additional column called UNIQUIFIER which we did not seen when we created a unique clustered index in earlier post. When a clustered index created on a non unique column , SQL server add 4 bytes random value for all duplicate occurrence of clustered key. It will not generate the uniquifier value for the first occurrence. That is the reason there are 0 for uniquifier column for some records. So clustered key defined on the non unique column has an overhead of generating the uniquifier value and also widen the cluster index key.In our example we defined clustered index on a 4 bytes column but actual clustered key size become 8 bytes due to uniquifier.This combination has to replicated to leaf level of all non clustered index .The magnitude of the issue will increase when there is non clustered index defined on non unique column where this clustered index value need to be stored in the non leaf level pages also.(We will discuss about non clustered index on non unique column in later post).

If your table does not have a single unique key to define clustered index, try to make it unique by adding one or two narrow column to the clustering key. That will avoid the overhead of adding the uniquifier and reduce the bookmark look up operation as the non leaf level of non clustered index has more columns (The additional column added to clustered key to make it unique)

Static
Another desired property of the clustered index key is to be a Static. when we define clustered index on a non static column , that makes the update statement(updating the clustered index key) more costly as it has to move the record into different page to make sure the data is stored in the logical order of clustered index and leaf level of all non clustered index need to be updated. Let us see a sample

set statistics io on
GO

--Unique clustered index on SalesOrderDetailIdUPDATE SalesOrderDetail_StaticIndex SET ProductID =99 WHERE ProductID=707
GO--Nonunique clustered index on ProductIdUPDATE SalesOrderDetail SET ProductID =99 WHERE ProductID=707

In the below output you can see a huge change in IO for the second update statement.

Even in small table (Customer_id,Firstname.lastname) on which clustered index defined on a non static column (Lastname) and which has one non clustered index, any change in the clustered index key need to make changes in minimum of two pages.One data page and one leaf level page of non clustered index.

Size of the clustered index key

Size of the clustered index refers to the number of bytes requires to store the clustered index key.As the size of the clustered index increases, more IO required to fetch the records.This is happening because an index page can store lesser index row if the clustered index is wider. That increase number of pages in the intermediate level and depth of the indexes (Levels in the b tree structure) .For example, a table which contain millions of records might need only 3 level in b tree structure if the clustered index in defined on integer column. If we defined clustered index on wider column (say a uniqueidentifier column which need 16 bytes), the depth of index might increase to 4 (Level of index). Any clustered index seek requires 4 IO operation compared to 3( if the clustered index is narrow).

This issue propagate to non clustered index also as the clustered index keys are stored in the leaf level of all non clustered index as a pointer to the clustered index. If the non clustered index is defined on the non unique column , the clustered key need to be stored in non leaf level pages of the non clustered index . Which again might cause for more page in intermediate level and to increase the depth of the in non clustered index which in turn increase the IO operation in non clustered index seek/scan also. As the clustered index depth is increased to 4, each bookmark look up operation also need to do 4 IO operation.

Sequential

It is a best practice to define clustered index on a ever increasing(sequential) column. That is the reason we can commonly seen clustered index defined on identity column. Clustered index defined on a non sequential column cause for fragmentation . To read more on fragmentation refer this posts . A non sequential clustered index key force the SQL server to insert record in between to maintain the logical ordering of the data.This lead to page split which cause for external fragmentation and internal fragmentation.

Conclusion

We have discussed about the desired qualities of the clustered index key and the reason behind that.The above points that we discussed are general best practices to be considered while deciding the clustered index key.Apart from this, data access pattern also influence in deciding the clustered index key.If we do not have complete understand of the data access pattern , we might need to test the performance with different strategies.

If you liked this post, do like my page on FaceBook

Thursday, 19 April 2012

SQL Server: Measuring The Index Fragmentation

In my previous post , I have explained about the different types of fragmentation and how it will affect the query performance. In this post we will discuss about measuring the index fragmentation.

Measuring Internal Fragmentation

Internal fragmentation is all about the fullness of the page and it is measured using the sys.dm_db_index_physical_stats function with DETAILED mode. The avg_page_space_used_in_percent column in the output gives the internal fragmentation of the index. Below query list all the indexes which have more than 10 pages and page fullness is less than 85 percent.


EXEC sp_configure 'show advanced options',1
GO

RECONFIGURE WITH OVERRIDE
GO

DECLARE @DefaultFillFactor INT 
DECLARE @Fillfactor TABLE(Name VARCHAR(100),Minimum INT ,Maximum INT,config_value INT ,run_value INT)

INSERT INTO @Fillfactor EXEC sp_configure 'fill factor (%)'
SELECT @DefaultFillFactor = CASE WHEN run_value=0 THEN 100 ELSE run_value END FROM @Fillfactor

SELECT
 DB_NAME() AS DBname,
 QUOTENAME(s.name) AS CchemaName,
 QUOTENAME(o.name) AS TableName,
 i.name AS IndexName,
 stats.Index_type_desc AS IndexType,
 stats.page_count AS [PageCount],
 stats.partition_number AS PartitionNumber,
 CASE WHEN i.fill_factor>0 THEN i.fill_factor ELSE @DefaultFillFactor END  AS [Fill Factor],
  stats.avg_page_space_used_in_percent,
  CASE WHEN stats.index_level =0 THEN 'Leaf Level' ELSE 'Nonleaf Level' END AS IndexLevel
 FROM 
        sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL , NULL, 'DETAILED') AS stats,
        sys.objects AS o,
        sys.schemas AS s,
        sys.indexes AS iWHERE 
        o.OBJECT_ID = stats.OBJECT_ID AND s.schema_id = o.schema_id       AND     i.OBJECT_ID = stats.OBJECT_ID AND i.index_id = stats.index_idAND     stats.avg_page_space_used_in_percent<= 85 AND stats.page_count >= 10 AND        stats.index_id > 0 ORDER BY  stats.avg_page_space_used_in_percent ASC,
stats.page_count DESC

I have used the where condition to fetch only the indexes which have more than 10 pages and page fullness is less than 85 percentage. This is based on my environment and some best practices documentations.Low values for avg_page_space_used_in_percent and higher value for PageCount together will affect the performance of the system. The value of avg_page_space_used_in_percent will be low due to various reasons

Due to Page split and deleting records: In this scenario we have to REBUILD or REORGANIZE the indexes. If the fragmentation is reported in the non leaf level , REBUILD is required to reduce the fragmentation.
Due to fill factor setting : A wrong setting of fill factor value of the index might cause the internal fragmentation.If the internal fragmentation is due the fill factor setting, we have to REBUILD the index with new fill factor value.
Due to record size : Some time size of the record might account for internal fragmentation. For example let us assume that size of one record is 3000 bytes and page can hold only two record. The third record can not be fitted into a page as the remaining free space in the page is less than 3000 bytes. In this scenario each page will have empty space of 2060 bytes. To get rid of the fragmentation due to the size of the record , we might need to redesign the table or has to do a vertical partitioning of the table.

Measuring External Fragmentation

External fragmentation also measured using the sys.dm_db_index_physical_stats function with LIMITED mode ,but we will be using the avg_fragmentation_in_percent from the result to measure the external fragmentation. With LIMITED mode it will give the fragmentation of the leaf level. To get the fragmentation of non leaf level, it should be executed with DETAILED or SAMPLE mode. A fragment is a continuous allocation of pages.For example if an index has 150 pages and pages are allocated from 1 to 50, 55 to 60 ,65 to 120 and 140 to 180.Each of these sequences are called as fragment and we can say that this index has four fragment.


EXEC sp_configure 'show advanced options',1
GO

RECONFIGURE WITH OVERRIDE
G

ODECLARE @DefaultFillFactor INT 
DECLARE @Fillfactor TABLE(Name VARCHAR(100),Minimum INT ,Maximum INT,config_value INT ,run_value INT)

INSERT INTO @Fillfactor EXEC sp_configure 'fill factor (%)'
SELECT @DefaultFillFactor = CASE WHEN run_value=0 THEN 100 ELSE run_value END FROM @Fillfactor

SELECT
 DB_NAME() AS DBname,
 QUOTENAME(s.name) AS CchemaName,
 QUOTENAME(o.name) AS TableName,
 i.name AS IndexName,
 stats.Index_type_desc AS IndexType,
 stats.page_count AS [PageCount],
 stats.partition_number AS PartitionNumber,
 CASE WHEN i.fill_factor>0 THEN i.fill_factor ELSE @DefaultFillFactor END  AS [Fill Factor],
  stats.avg_fragmentation_in_percent,stats.fragment_count,
  CASE WHEN stats.index_level =0 THEN 'Leaf Level' ELSE 'Nonleaf Level' END AS IndexLevel
 FROM 
        sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL , NULL, 'LIMITED') AS stats,
        sys.objects AS o,
        sys.schemas AS s,
        sys.indexes AS iWHERE 
        o.OBJECT_ID = stats.OBJECT_ID AND s.schema_id = o.schema_id       AND     i.OBJECT_ID = stats.OBJECT_ID AND i.index_id = stats.index_idAND     stats.avg_fragmentation_in_percent>= 20 AND stats.page_count >= 1000ORDER BY  stats.avg_fragmentation_in_percent DESC,stats.page_count DESC

In this query , 
I have used a where condition to fetch indexes which have fragmentation  greater than 20 percent and have minimum of 1000 pages.

avg_fragmentation_in_percent can have higher value due to various reasons :

SQL server storage engine allocates pages from mixed extent to a table or index till the page count reaches eight.Once the page count reaches to eight SQL server storage engine starts assigning full uniform extents to the index. So there is a possibility of having higher fragmentation for small table and rebuilding indexes might increase the fragmentation.For example, let us assume that an index has 7 pages and these pages are allocated from two mixed extent, while rebuilding the index there is possibility of allocating pages from more than 2 extents and maximum of seven extents which in turn increase the fragmentation.
Even the pages are allocated from uniform extent , there is possibility of fragmentation. When the size of index grow , it need more pages in the non leaf level also.If last page allocated to leaf level is 250 and to accommodate more row in the leaf lever index structure might need a page in index level 1, then SQL server storage engine allocate page 251 to the index level 1 which create fragment in the leaf level.
Other common reason is the page split due to the DML operations . This I have explained well in my previous post.Rebuild/Reorganize index may not be effective to fix fragmentation happened due to the fist two reason, but it can reduce the fragmentation caused by the page split or delete operation.

In our environment we follow the index maintenance as given below:

20 to 40 percentage of fragmentation is handled with reorganizing the index.
All index which has more 40 percentage fragmentation will considered for rebuild
Index which has less than 1000 pages will be ignored by the index maintenance logic.
Index which has more than 50K pages and fragmentation between 10 and 20 will also be considered for Reorganize.

If you liked this post, do like my page on FaceBook at http://www.facebook.com/practicalSqlDba

Wednesday, 4 April 2012

SQL Server - Index Fragmentation - Understanding Fragmentation

When I had a discussion with couple of my friends about index fragmentation, I realized that they have different understanding about index fragmentation. In this post I will try my level best to explain different types of fragmentation.Understanding the concept of index fragmentation is important for detecting and removing fragmentation efficiently.

What is Fragmentation

Fragmentation can be defined as any condition that cause more than optimal amount of disk I/O to be performed in accessing a table or cause the longer disk I/O. Optimal performance of SELECT queries occurs when the data pages of tables are contiguous as possible and pages are fully packed as possible.Fragmentation breaks this rule and reduce the performance of the queries. Fragmentation can happen in two level . One is file system level fragmentation which is called as Logical/Physical Disk Fragmentation and Index level fragmentation. Each of them are described in the below sections.

Logical/Physical Disk Fragmentation

Logical fragmentation is the fragmentation of database file in the file system itself like any other files.This occurs when the file system is not able allocate contiguous space for database file.As a result, disk head has to move back and forth to read from the database files. SQL server is completely unaware about this kind of fragmentation and it is not possible to measure the logical disk fragmentation using any script. Logical disk fragmentation can happen due to various reason like

Placing database file in the same disk where other files( like OS files and other application files)are kept.
Frequent growth of the database file in smaller chunks.

To remove logical fragmentation we can use the windows fragmentation tool but note that we need to stop the SQL server while running the defragmentation tools.Otherwise the defragmentation tool will skip database file as it is used by the SQL server.

The best ways to to avoid the logical fragmentation are :

Keep the database files in a separate disk isolated from other application files and log files.
While creating new database,estimate the size of database file and allocate enough space to avoid the frequent growth of data files.
Specify the database growth option to allocate larger chunks rather than small chunks frequently.

Index Level Fragmentation

Index level fragmentation comes in two flavors : Internal Fragmentation and External Fragmentation. If Index level fragmentation is high ,it may prevents optimizer from using the available indexes in optimal way.

Internal Fragmentation

Internal fragmentation is measured in average page fullness of the index(Page density). A page that is 100% full has no internal fragmentation.In other words, internal fragmentation occur when there is empty space in the index page and this can happen due to insert/update/delete DML operation.Every index page can hold a certain number of records based on the size of the index, but that does not guaranteed that the page always hold maximum number records. Internal fragmentation is normally reported as a percentage of fullness in bytes, not in records. An index page that has internal fragmentation 90% may be full in terms of record. The remaining 10% bytes of the pages may not be enough to hold one more record. In a 8KB pages, maximum of 8060 bytes can be used by data.Rest of space are used by page header and row offset array.Let us assume that we have index with fixed size of 100 bytes and the index has 800 entries. So we can can store 8060/100= 80 records per page by leaving 60 bytes empty as it is not enough to hold one more records and this index requires 10 pages to store the entire index structure.If you calculate the average fullness of this index, in ideal scenario it will come as 99.26%. Let us see how it will look like in Fig 1.

Fig 1

Let us assume that we are deleting the half of the entries randomly of this table which reduce the total number of entries in this index to 400.Now the pages will look like as given in Fig 2 with total of 40600 bytes free space across 10 pages. If you calculate the the average fullness as Total data size*100/Total page size = 4000*100/80600= 49.62% . It clearly says that, half of the spaces are empty and the index has internal fragmentation.

Fig 2

How Internal Fragmentation will affect the performance of the SQL server ?

Internal Fragmentation will increase the I/O. When you run queries that scan part or complete table/index, if you have internal fragmentation on that table/index, it causes additional page reads. In our example, the entire data can be stored in 5 pages. When the query needs to do index scan it has to read 10 pages instead of 5 pages. Which means 50% more I/O.
Internal Fragmentation reduce the efficiency of buffer cache.When indexes has internal fragmentation, it need more space to fit in the buffer.In our case this single index will use 5 additional pages to fit into the buffer which should have used to store other index pages. This will reduce the cache hit ratio. In turn it will increase the physical I/O. It also increase the logical reads.
This also increase the size of the database file. It need more disk space to store the additional pages and reduce the performance of Backup and Restore.

External Fragmentation

External Fragmentation happens when the logical order of the pages does not match the physical order of the pages. External fragmentation refers to the lack of correlation between the logical sequence of an index and its physical sequence. It is measured as the percentage of out-of-order pages in the leaf pages of an index. An out-of-order page is a page for which the next physical page allocated to the index is not the page pointed to by the next-page pointer in the current leaf page.Let us see the Fig 3 below. It is representation of index with three pages.Data is stored in sequential page. In other terms logical order and physical order are same and it store the index keys from 1 to 16 a. All pages are completely full except the Page 3

Fig 3

Let us see what will happen if we insert the value 4 to the underlying table in the Fig 4.

Fig 4

While inserting the value 4 into the table it has to place in the Page 1 between value 3 and 5 but unfortunately Page 1 does not have any free space to occupy one more record. The only option is perform a page split by dividing the Page 1 evenly by leaving half of the data in Page 1 and moving half of the data to new page (Page 4). From Fig 4 we can understand that the logical order of the Page 4 is not matching with the physical order. External Fragmentation can happen due to various reasons:

While allocating pages for new table , SQL server allocate pages from mixed extend till it reaches the 8 pages. There is possibility of having the first 8 pages from 8 different extents.
When all records are deleted from a page, the page will be de-allocated from the index(The de-allocation of pages will not happen immediately) which create gap and increase the fragmentation.
Once object reached 8 pages size, SQL server will start allocating uniform extent to the objects.While allocation uniform extent to an index, next sequential extent to the current extent might be already allocated to other objects/indexes.

How External Fragmentation will affect the performance of the SQL server ?

While reading individual rows, external fragmentation will not affect the performance as it directly go to the page and fetch the data.Unordered scans also will not affected by the external fragmentation as it use the IAM pages to find which extents need to be fetched. In the case of ordered index scan ,external fragmentation might become a degrading factor for performance. The degradation of the performance is because the disk drive's heads have to jump around on the physical disk, rather than performing just contiguous read operations.Also note that external fragmentation will not affect the performance once the pages are loaded into the buffer pool.

I will explain how to detect/measure the fragmentation in the next post.

If you liked this post, do like my page on FaceBook at http://www.facebook.com/practicalSqlDba