
Wednesday, 4 April 2012

SQL Server - Index Fragmentation - Understanding Fragmentation

While discussing index fragmentation with a couple of friends, I realized that they each understood it differently. In this post I will try my best to explain the different types of fragmentation. Understanding the concept of index fragmentation is important for detecting and removing fragmentation efficiently.


What is Fragmentation

Fragmentation can be defined as any condition that causes more than the optimal amount of disk I/O to be performed when accessing a table, or that makes the disk I/O take longer. SELECT queries perform best when the data pages of a table are as contiguous as possible and as fully packed as possible. Fragmentation breaks this rule and reduces query performance. Fragmentation can happen at two levels: file-system-level fragmentation, which is called logical/physical disk fragmentation, and index-level fragmentation. Each is described in the sections below.


Logical/Physical Disk Fragmentation

Logical fragmentation is fragmentation of the database file in the file system itself, just as for any other file. It occurs when the file system is not able to allocate contiguous space for the database file. As a result, the disk head has to move back and forth to read from the database file. SQL Server is completely unaware of this kind of fragmentation, and it cannot be measured from inside SQL Server using a script. Logical disk fragmentation can happen for various reasons, such as:
  • Placing the database file on the same disk as other files (such as OS files and other application files).
  • Frequent growth of the database file in small chunks.
To remove logical fragmentation we can use the Windows defragmentation tool, but note that SQL Server has to be stopped while the tool runs. Otherwise the tool will skip the database file, because it is in use by SQL Server.

The best ways to avoid logical fragmentation are:
  • Keep the database files on a separate disk, isolated from other application files and log files.
  • When creating a new database, estimate the size of the database file and allocate enough space to avoid frequent growth of the data files.
  • Set the database growth option to allocate larger chunks occasionally rather than small chunks frequently.
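
To get a feel for why the growth settings matter, the sketch below counts how many separate growth operations a data file goes through under different settings. Every growth operation is another chance for the file system to hand out a chunk that is not adjacent to the previous one. The sizes used here (100 MB initial size, 10 GB target, 1 MB and 512 MB increments) are made-up illustration values, not recommendations.

```python
import math


def growth_events(initial_mb: int, target_mb: int, increment_mb: int) -> int:
    """Number of file-growth operations needed to reach target_mb."""
    if target_mb <= initial_mb:
        return 0  # the file was pre-sized, so it never has to grow
    return math.ceil((target_mb - initial_mb) / increment_mb)


# Hypothetical scenarios for a data file that ends up at 10 GB.
small_chunks = growth_events(initial_mb=100, target_mb=10_240, increment_mb=1)
large_chunks = growth_events(initial_mb=100, target_mb=10_240, increment_mb=512)
presized = growth_events(initial_mb=10_240, target_mb=10_240, increment_mb=512)

print("1 MB increments:  ", small_chunks, "growth operations")
print("512 MB increments:", large_chunks, "growth operations")
print("pre-sized file:   ", presized, "growth operations")
```

Fewer growth operations do not guarantee a contiguous file, but they sharply reduce the number of places where the file can be broken up.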


Index Level Fragmentation

Index-level fragmentation comes in two flavors: internal fragmentation and external fragmentation. The optimizer does not measure fragmentation directly, but heavy internal fragmentation inflates the page count of an index, which raises its estimated cost and can make the optimizer less likely to use it.


Internal Fragmentation

Internal fragmentation is measured as the average page fullness of the index (page density). A page that is 100% full has no internal fragmentation. In other words, internal fragmentation occurs when there is empty space in an index page, and this can happen as a result of INSERT, UPDATE and DELETE operations. Every index page can hold a certain number of records depending on the size of the index record, but that does not guarantee that a page always holds the maximum number of records. Internal fragmentation is normally reported as a percentage of fullness in bytes, not in records. An index page that is 90% full in bytes may still be full in terms of records: the remaining 10% of the page may not be enough to hold one more record.

In an 8 KB page, a maximum of 8060 bytes can be used by data. The rest of the space is used by the page header and the row offset array. Let us assume that we have an index with a fixed record size of 100 bytes and that the index has 800 entries. We can store 8060/100 = 80 records per page, leaving 60 bytes empty because that is not enough to hold one more record, and the index requires 10 pages to store all of its entries. If you calculate the average fullness of this index, in the ideal scenario it comes to 8000*100/8060 = 99.26%. Fig 1 shows how this looks.

Fig 1
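
The arithmetic for the ideal case can be checked with a few lines of Python. This is only a sketch of the calculation in the text; the constant names are my own, and the 8060-byte figure is the simplified usable space per page used in this post.

```python
PAGE_DATA_BYTES = 8060   # usable bytes per 8 KB page, as assumed in the text
RECORD_BYTES = 100       # fixed size of one index record
TOTAL_RECORDS = 800      # entries in the index

# Whole records that fit on one page (the leftover bytes stay empty).
records_per_page = PAGE_DATA_BYTES // RECORD_BYTES
free_bytes_per_page = PAGE_DATA_BYTES - records_per_page * RECORD_BYTES

# Pages needed to hold every record (ceiling division).
pages_needed = -(-TOTAL_RECORDS // records_per_page)

# Average fullness in bytes, not in records.
avg_fullness = TOTAL_RECORDS * RECORD_BYTES * 100 / (pages_needed * PAGE_DATA_BYTES)

print(records_per_page, "records per page,", free_bytes_per_page, "bytes free")
print(pages_needed, "pages,", round(avg_fullness, 2), "% full on average")
```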











Let us assume that we randomly delete half of the entries in this table, which reduces the total number of entries in the index to 400. The pages will now look as shown in Fig 2, with a total of 40600 bytes of free space across the 10 pages. The average fullness is total data size*100/total page size = 40000*100/80600 = 49.63%. This clearly shows that about half of the space is empty and that the index has internal fragmentation.
Fig 2
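
The deletion scenario can be reproduced with a small simulation. It is a sketch under the same assumptions as the text (10 pages, 80 records of 100 bytes each per page, 8060 usable bytes per page). Which records get deleted is random, so the per-page counts vary with the seed, while the totals do not.

```python
import random

PAGE_DATA_BYTES = 8060
RECORD_BYTES = 100
RECORDS_PER_PAGE = 80
PAGE_COUNT = 10

rng = random.Random(42)  # arbitrary seed, only to make the run repeatable

# Record i lives on page i // RECORDS_PER_PAGE. Delete 400 of the 800 at random.
deleted = rng.sample(range(PAGE_COUNT * RECORDS_PER_PAGE), 400)

records_on_page = [RECORDS_PER_PAGE] * PAGE_COUNT
for record in deleted:
    records_on_page[record // RECORDS_PER_PAGE] -= 1

# The pages stay allocated; they are just emptier than before.
data_bytes = sum(count * RECORD_BYTES for count in records_on_page)
page_bytes = PAGE_COUNT * PAGE_DATA_BYTES
free_bytes = page_bytes - data_bytes
avg_fullness = data_bytes * 100 / page_bytes

print("records left per page:", records_on_page)
print("free bytes:", free_bytes, "average fullness:", round(avg_fullness, 2), "%")
```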







How does internal fragmentation affect the performance of SQL Server?


  1. Internal fragmentation increases I/O. When you run queries that scan all or part of a table or index, internal fragmentation on that table or index causes additional page reads. In our example, all the data could be stored in 5 pages. A query that performs an index scan has to read 10 pages instead of 5, which means twice the I/O: half of the pages read would be unnecessary if the index were fully packed.
  2. Internal fragmentation reduces the efficiency of the buffer cache. An index with internal fragmentation needs more space to fit in the buffer. In our case this single index uses 5 additional pages in the buffer, which could otherwise have been used to store other index pages. This reduces the cache hit ratio, which in turn increases physical I/O. It also increases logical reads.
  3. It also increases the size of the database file. More disk space is needed to store the additional pages, and backup and restore take longer.
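
The cost of the fragmented index from the running example can be put into numbers. The sketch below uses the figures from the text (400 remaining records, 80 records per page, 10 allocated pages); the variable names are my own.

```python
import math

PAGE_SIZE_KB = 8
records = 400
records_per_page = 80
allocated_pages = 10

# Pages the index would need if every page were fully packed.
minimum_pages = math.ceil(records / records_per_page)
extra_pages = allocated_pages - minimum_pages

# Extra reads relative to the packed index, and the share of reads that is wasted.
extra_io_percent = extra_pages * 100 / minimum_pages
wasted_read_share = extra_pages * 100 / allocated_pages

# Buffer pool memory held by pages that a packed index would not need.
wasted_buffer_kb = extra_pages * PAGE_SIZE_KB

print("full scan reads", allocated_pages, "pages instead of", minimum_pages)
print("extra I/O:", extra_io_percent, "% ; wasted share of reads:", wasted_read_share, "%")
print("buffer pool space wasted:", wasted_buffer_kb, "KB")
```

For one small index the waste is trivial, but the same ratio applied to a multi-gigabyte index is what hurts scans, cache efficiency and backups.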

External Fragmentation

External fragmentation happens when the logical order of the pages does not match their physical order. In other words, it is the lack of correlation between the logical sequence of an index and its physical sequence. It is measured as the percentage of out-of-order pages in the leaf pages of an index. An out-of-order page is a page for which the next physical page allocated to the index is not the page pointed to by the next-page pointer in the current leaf page. Fig 3 below represents an index with three pages. The data is stored in sequential pages; in other words, the logical order and the physical order are the same. The index stores keys in the range 1 to 16. All pages are completely full except Page 3.

Fig 3
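
The definition of an out-of-order page can be turned into a small function. This is a literal reading of the definition above, not necessarily the exact way SQL Server counts. The input is the list of physical page numbers in logical order (the order obtained by following the next-page pointers), and the second example chain is a hypothetical layout in which a later page sits logically between pages 1 and 2.

```python
def out_of_order_percent(logical_chain):
    """Percentage of leaf pages whose next-page pointer does not lead to the
    next physical page allocated to the index.

    logical_chain: physical page numbers, listed in logical (key) order.
    """
    if not logical_chain:
        return 0.0
    allocated = sorted(logical_chain)
    # For each page, the next physical page that belongs to this index.
    next_physical = dict(zip(allocated, allocated[1:]))

    out_of_order = 0
    # The last page in logical order has no next-page pointer, so it is skipped.
    for current, next_logical in zip(logical_chain, logical_chain[1:]):
        if next_physical.get(current) != next_logical:
            out_of_order += 1
    return out_of_order * 100 / len(logical_chain)


fig3_percent = out_of_order_percent([1, 2, 3])         # logical order = physical order
shuffled_percent = out_of_order_percent([1, 4, 2, 3])  # hypothetical fragmented chain

print(fig3_percent, shuffled_percent)
```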



















Let us see what happens when we insert the value 4 into the underlying table, as shown in Fig 4.
Fig 4




















The value 4 has to be placed in Page 1, between the values 3 and 5, but Page 1 does not have any free space to accommodate one more record. The only option is to perform a page split: Page 1 is divided evenly, leaving half of the data in Page 1 and moving the other half to a new page (Page 4). From Fig 4 we can see that the logical position of Page 4 does not match its physical position. External fragmentation can happen for various reasons:

  1. While allocating pages for a new table, SQL Server allocates pages from mixed extents until the table reaches 8 pages. The first 8 pages may therefore come from 8 different extents.
  2. When all records are deleted from a page, the page is de-allocated from the index (the de-allocation does not happen immediately), which creates a gap and increases fragmentation.
  3. Once an object reaches 8 pages in size, SQL Server starts allocating uniform extents to it. When a uniform extent is allocated to an index, the extent that physically follows the current one might already be allocated to another object or index.
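
The page split described before this list can be imitated with a toy model. The page capacity of 4 keys and the exact keys on each page are hypothetical (the figures are not reproduced here); the only details taken from the text are that there are three pages, that Page 1 is full, and that 4 belongs between 3 and 5. Real SQL Server pages are split by bytes, not by key count.

```python
import bisect

PAGE_CAPACITY = 4  # hypothetical number of keys per page


def insert_key(pages, chain, key):
    """Insert key into a toy leaf level.

    pages: dict mapping physical page number -> sorted list of keys.
    chain: physical page numbers in logical order (assumed non-empty).
    Both are modified in place.
    """
    # Find the page whose key range should receive the new key.
    for pos, page_no in enumerate(chain):
        if key <= pages[page_no][-1] or pos == len(chain) - 1:
            break

    if len(pages[page_no]) < PAGE_CAPACITY:
        bisect.insort(pages[page_no], key)
        return

    # Page split: the new page is allocated at the physical end of the index,
    # but it is linked in right after the page that was split.
    keys = pages[page_no]
    half = len(keys) // 2
    new_page_no = max(pages) + 1
    pages[page_no], pages[new_page_no] = keys[:half], keys[half:]
    chain.insert(pos + 1, new_page_no)

    target = page_no if key < pages[new_page_no][0] else new_page_no
    bisect.insort(pages[target], key)


pages = {1: [1, 2, 3, 5], 2: [6, 8, 10, 12], 3: [14, 16]}
chain = [1, 2, 3]  # logical order equals physical order before the insert

insert_key(pages, chain, 4)

print("logical order of pages:", chain)
print("page contents:", pages)
```

After the insert, the logical order of the pages is 1, 4, 2, 3 while the physical order is still 1, 2, 3, 4, which is exactly the mismatch that external fragmentation describes.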


How does external fragmentation affect the performance of SQL Server?


When reading individual rows, external fragmentation does not affect performance, because SQL Server goes directly to the page and fetches the data. Unordered scans are also unaffected by external fragmentation, as they use the IAM pages to find which extents need to be fetched. For an ordered index scan, external fragmentation can degrade performance, because the disk drive's heads have to jump around on the physical disk rather than performing contiguous read operations. Also note that external fragmentation does not affect performance once the pages are loaded into the buffer pool.
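
The difference between an ordered scan and an unordered scan can be illustrated by counting contiguous runs of pages in the order they are read. This is a deliberately simplified model: it works on single pages, ignores extents and read-ahead, and uses a hypothetical logical chain of four pages in which the last physical page sits second in key order.

```python
def contiguous_runs(read_order):
    """Count runs of physically adjacent pages in the order they are read.
    Every new run stands for one jump of the disk head."""
    if not read_order:
        return 0
    runs = 1
    for previous, current in zip(read_order, read_order[1:]):
        if current != previous + 1:
            runs += 1
    return runs


logical_chain = [1, 4, 2, 3]  # hypothetical page order by key

# An ordered scan follows the next-page pointers.
ordered_scan_runs = contiguous_runs(logical_chain)

# An unordered (allocation-order) scan reads the pages in physical order.
unordered_scan_runs = contiguous_runs(sorted(logical_chain))

print("ordered scan runs:  ", ordered_scan_runs)
print("unordered scan runs:", unordered_scan_runs)
```

The ordered scan needs three separate runs for four pages, while the unordered scan reads them in one pass, which is why only ordered scans suffer from external fragmentation.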


I will explain how to detect/measure the fragmentation in the next post.













20 comments:

  1. Respected Sir,
    This journal is very helpful for getting a full understanding of fragmentation in SQL Server. The article also gives some knowledge from the operating system's point of view.

    1. Thank you Subrat for your inspiring comments.

      Thanks
      Nelson
      www.PracticalSqlDba.com

  2. Hi Nelson,

    You said at the end, "Also note that external fragmentation will affect the performance once the pages are loaded into the buffer pool". How is that possible, given that the data is in memory and the disk drive's heads are not used?

    Best regards,
    Tiago

    1. That was a mistake that slipped through even after multiple readings before I hit the publish button. I have corrected it. Thank you for pointing it out.

  3. I must disagree with many of the definitions stated, as "internal fragmentation" is an aspect of selecting fillfactor, and for indexes that aren't either read-only or written solely in a sequential fashion, necessary for good performance.

    "external fragmentation" is a term I would assign to filesystem and storage (disk, SAN) level fragmentation, which I don't believe was covered at all.

    I also must disagree with speaking of "physical files" unless one is addressing the actual physical storage; if you're not talking about the particular volume of matter whose physical state changes at a molecular or atomic level, you're not talking about a physical file. SANs can do amazing things... some of which result in the OS seeing a contiguous file that is actually not contiguous on the actual disk platters in question.

    1. Alexander Suprun, 20 May 2012 13:21

      I strongly disagree on that point. All the terms used are very well known in the SQL Server community, and coming up with new definitions will certainly confuse most readers.

  4. Alexander Suprun, 20 May 2012 13:28

    Hi Nelson,

    Very well written article, as usual. But there is one inaccuracy I have to point out. The optimizer never considers any kind of fragmentation when it creates an execution plan. The reason is simple: it would have to scan all the tables and indexes participating in the query to find out their current fragmentation, and as you can imagine that would take an unacceptably long time during plan optimization.

    1. Thank you for pointing it out. You are 100% correct. The change in the execution plan in our environment was due to the statistics being updated while rebuilding the indexes. I have removed the statement from the post. Thank you once again.

      Thanks
      Nelson Johhn A
      www.PracticalSqlDba.com

    2. Alexander Suprun, 14 May 2013 08:55

      Hi Nelson,

      My apologies, but you might be partly correct on this one. Although the optimizer cannot scan the tables to find out the exact fragmentation levels, it still indirectly uses fragmentation to choose an execution plan. When internal fragmentation increases, the index becomes bigger, and the bigger the index, the smaller the chance that the optimizer will use it.

      Thanks,
      Alexander Suprun

  5. Really great work. Can you please let us know the URL of the next post?

    1. http://www.practicalsqldba.com/2012/04/sql-server-measuring-index.html

  6. why is that 8kb limit?
    why not 16 or 32kb :-D
    why *Exactly* 8?
    Regards

    1. :) Nice question, but that depends on the microprocessor's addressing capability, network packets and 10,000 other things... Funnily enough, a lot of it is based on rule of thumb, or more like best guess. This link explains it in detail --> http://sqlblog.com/blogs/linchi_shea/archive/2008/03/03/is-the-8kb-page-obsolete-or-aging.aspx

      It depends on the average amount of I/O SQL Server does; if the average changes substantially over time, they can increase the page size too. Also, you don't want a big page, because then the number of CPU cycles needed to get to the data buried in one big page will be high. If you have a smaller page, the number of pages increases and so do the cycles for lateral scans... So yeah, best guess to be honest :) and rule of thumb and some theories mentioned in the article above. But that's a good question :)

    2. Thanks for pointing to a nice article ...

  7. Even if I keep a fill factor of 70 for the indexes, why does that not prevent the page split in the insert described above?

  8. Great work! Really helpful!!

  9. Spectacular article, Nelson sir. Keep it up, and nice meeting you.

  10. Hi ,

    I tried to verify that fragmentation does not cause a different execution plan, but ended up with a result different from what I read here. Could you explain whether I have misunderstood any of your points or am doing something wrong?


    -- Build a small heap: 1,500 rows, each with the same 100-character value in c2
    CREATE TABLE BigTable1 (id int IDENTITY, c2 char(100) DEFAULT REPLICATE('a', 100))

    INSERT INTO BigTable1 DEFAULT VALUES
    GO 1500

    -- Now I'm creating one NCI with heavy fragmentation (pages only 2% full)
    CREATE NONCLUSTERED INDEX [idx_BigTable_SomeColumn1] ON [dbo].[BigTable1] (c2 ASC)
    WITH (FILLFACTOR = 2)

    -- The index covers the following query

    SELECT c2 FROM BigTable1 WHERE c2 LIKE 'aa%' OPTION (RECOMPILE)
    -- Based on your thoughts, the query should use the index (fragmentation should not matter),
    -- but the optimizer uses a table scan, since that is cheaper than the NCI

    -- Now I'm going to rebuild the index with fully packed pages
    ALTER INDEX [idx_BigTable_SomeColumn1] ON [dbo].[BigTable1] REBUILD
    WITH (FILLFACTOR = 100)


    -- Check the same query again
    SELECT c2 FROM BigTable1 WHERE c2 LIKE 'aa%' OPTION (RECOMPILE)
    -- Now it uses an NCI seek


  11. Good afternoon. Is it possible to defragment a database from the MS-DOS command line?

    Regards

    http://itixmih.wordpress.com/
