
Wednesday 24 July 2013

SQL Server : Understanding Bulk Change Map (BCM) Page and Bulk Logged Recovery Model

In the last post, we discussed the Differential Change Map (DCM) pages and how they are used in the differential backup process. In this post, let us try to understand the Bulk Change Map page and how the bulk-logged recovery model works.


Bulk Change Map (BCM): SQL Server uses the Bulk Change Map (BCM) page to track extents modified by bulk-logged operations since the last log backup. The BCM page is the 7th page in the data file. The BCM has a bit for every extent it tracks: if the bit is 1, the corresponding extent has been modified by a bulk-logged operation after the last log backup; if the bit is 0, it has not. A BCM page can hold information for around 64,000 extents. To be precise, BCM pages repeat after every 511,232 pages, so one BCM page can track the changes of exactly 63,904 extents, and the second BCM page will be the 511,239th page.
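As a rough illustration (this is just arithmetic based on the 511,232-page interval described above, not an official formula), you can estimate which BCM page covers a given page ID like this:

-- Illustrative only: estimate the BCM page that covers a given page ID,
-- assuming a BCM page sits at offset 7 of every 511232-page interval.
DECLARE @page_id BIGINT = 600000
SELECT (@page_id / 511232) * 511232 + 7 AS covering_bcm_page
-- For page 600000 this returns 511239, the second BCM page.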

You can see the BCM pages in all databases, but they are only relevant when the database is in the bulk-logged recovery model. In this recovery model, while taking a log backup, SQL Server scans the BCM pages and includes the extents that are marked as changed in the log backup along with the transaction log. This allows bulk-logged operations to be recoverable when the database is restored from a database backup and a sequence of transaction log backups. In the bulk-logged recovery model, bulk operations such as BCP, BULK INSERT and SELECT INTO are minimally logged in the transaction log, and the modified extents are tracked through the BCM pages. This gives SQL Server optimized performance for bulk operations.

BCM pages are not relevant in the simple and full recovery models. In those recovery models, bulk-logged operations are treated as fully logged operations. In short, in the simple and full recovery models, the BCM pages do not keep track of the changes made by bulk-logged operations.

Let us try to understand with a sample.

CREATE DATABASE Mydb
GO
ALTER DATABASE Mydb SET RECOVERY BULK_LOGGED
BACKUP DATABASE Mydb TO DISK='D:\mydb.bak'
GO
DBCC TRACEON(3604)
DBCC PAGE('MYDB',1,7,3)

From the output, it is clear that none of the extents have been changed by bulk-logged operations. Let us perform a bulk-logged operation.

SELECT * INTO mydb..SalesOrderDetail FROM AdventureWorks2008.Sales.SalesOrderDetail
DBCC PAGE('MYDB',1,7,3)

Now, in the output, you can see two entries marked as MIN_LOGGED. The first one is due to the change in the IAM page, and the second one is the actual data inserted as part of the SELECT INTO operation. When we take a log backup, SQL Server backs up the transaction log along with the extents marked as changed in the BCM pages. After the log backup, the bits in the BCM pages are flipped back to indicate that no extents have been changed by bulk-logged operations. Let us take the log backup.

BACKUP LOG Mydb TO DISK='D:\mydblog.trn'
GO
DBCC PAGE('MYDB',1,7,3)

From the output, we can see that SQL Server has flipped the bits in the BCM page and none of the extents are marked as changed by bulk-logged operations. Note that BCM pages are not cleared by taking a full backup or a differential backup. The base of a transaction log backup is the previous transaction log backup, so SQL Server must know what changed between two transaction log backups. If a differential or full backup cleared the BCM pages, the next transaction log backup could not track and recover the changes made between the previous log backup and the full/differential backup. In short, a full backup or differential backup will not break the transaction log backup chain.
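As a minimal sketch of that point (the file names here are hypothetical), an extra full backup taken between two log backups does not prevent the log backups from being restored in sequence on top of the original base:

-- Hypothetical sequence: the extra full backup does not break the log chain.
BACKUP DATABASE Mydb TO DISK='D:\mydb_full_1.bak'   -- base full backup
BACKUP LOG Mydb TO DISK='D:\mydb_log_1.trn'         -- log backup 1
BACKUP DATABASE Mydb TO DISK='D:\mydb_full_2.bak'   -- extra full backup in between
BACKUP LOG Mydb TO DISK='D:\mydb_log_2.trn'         -- log backup 2
GO
-- The original base plus both log backups still restore cleanly.
RESTORE DATABASE Mydb FROM DISK='D:\mydb_full_1.bak' WITH NORECOVERY, REPLACE
RESTORE LOG Mydb FROM DISK='D:\mydb_log_1.trn' WITH NORECOVERY
RESTORE LOG Mydb FROM DISK='D:\mydb_log_2.trn' WITH RECOVERY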


If you liked this post, do like my page on FaceBook

Tuesday 23 July 2013

SQL Server : Understanding Differential Change Map (DCM) Page and Differential Backups

In our earlier posts, we discussed different types of pages: Data Pages, GAM and SGAM, the PFS page and the IAM page. In this post, we will try to understand the Differential Change Map (DCM) pages and differential backups.

Differential Change Map (DCM): SQL Server uses the Differential Change Map (DCM) page to track extents modified after the last full backup operation. The DCM page is the 6th page in the data file. The DCM has a bit for every extent it tracks: if the bit is 1, the corresponding extent has been modified after the last full backup; if the bit is 0, it has not. A DCM page can hold information for around 64,000 extents. To be precise, DCM pages repeat after every 511,232 pages, so one DCM page can track the changes of exactly 63,904 extents, and the second DCM page will be the 511,238th page.

Differential backups read the DCM pages to determine which extents have been modified after the last full backup. This greatly reduces the number of pages that a differential backup must scan. The length of time that a differential backup runs is proportional to the number of extents modified since the last full backup and not the overall size of the database.



Let us see an example:


CREATE DATABASE Mydb

GO
BACKUP DATABASE Mydb TO DISK='D:\mydb.bak'
GO
DBCC TRACEON(3604)
DBCC PAGE('MYDB',1,6,3)


FIG 1
From the output it is clear that four extents, covering the first 32 pages of the file, have changed after the last full backup. It is not necessary that all 32 pages have changed, but we can clearly say that at least four pages have changed, one page from each extent. This is likely due to the changes the full backup command triggered in the internal tables.

Now let us try to add some data to this database and again see the DCM page

SELECT * INTO mydb..SalesOrderDetail FROM AdventureWorks2008.sales.SalesOrderDetail
DBCC PAGE('MYDB',1,6,3)


FIG 2

As part of the SELECT INTO we have created a table and inserted data into it. This triggers changes in many of the system tables, and as a result many extents are marked as modified after the last full backup. Up to page 1:176, the changes are due to internal object modifications. Pages from 1:176 to 1:1671 were modified by the data inserted into the SalesOrderDetail table. When we take a differential backup, SQL Server just reads the DCM pages and includes the extents marked as changed in the differential backup. For example, as mentioned earlier, there might be changes in only four pages of the first entry (1:0 - 1:24), but the differential backup includes all 32 pages. Let us take a differential backup.

BACKUP DATABASE Mydb TO DISK='D:\Mydbdifferential.bak' WITH DIFFERENTIAL

DBCC PAGE('MYDB',1,6,3)

FIG 3

After the differential backup, there is no change in the DCM pages except that two more extents are marked as changed. Before the differential backup, the extents 72-80 and 136-144 were marked as not changed; after the differential backup these two extents are marked as changed (compare Fig 2 and Fig 3). This happened because of the changes the differential backup triggered in the internal tables. From this it is clear that a differential backup does not clear the DCM pages. In short, a differential backup always includes all the changes made since the last full backup, not the changes since the last differential backup. There is a misconception that a differential backup contains only the changes made since the last differential backup, which is completely wrong.
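A small restore sketch makes the practical consequence clear (using the backup files from the examples above): because the differential is cumulative, a restore needs only the base full backup and the most recent differential, and any earlier differentials can be skipped.

-- Restore the base full backup, then only the latest differential.
RESTORE DATABASE Mydb FROM DISK='D:\mydb.bak' WITH NORECOVERY, REPLACE
RESTORE DATABASE Mydb FROM DISK='D:\Mydbdifferential.bak' WITH RECOVERY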

If we look at Fig 3, we can see that the size of the differential backup file is 13452 KB. Let us try to calculate it using the DCM page information. From Fig 3 we can see that the DCM page has 1616 pages marked as changed. The calculation is given below:

(32-0)+(64-48)+(88-72)+(168-112)+(1672-176)=1616

The size of a page is 8 KB, so the 1616 pages account for 12928 KB (1616 x 8), which is very close to the size of the differential backup file. The difference is the overhead of the file header and other metadata.
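The same arithmetic can be verified with a quick query:

-- Changed pages from the DCM entries and their size in KB (8 KB per page).
SELECT (32-0)+(64-48)+(88-72)+(168-112)+(1672-176)       AS ChangedPages, -- 1616
       ((32-0)+(64-48)+(88-72)+(168-112)+(1672-176)) * 8 AS SizeKB        -- 12928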

Now Let us take a full backup and see what will happen.

BACKUP DATABASE Mydb TO DISK='D:\MydbFull.bak' 

DBCC PAGE('MYDB',1,6,3)

FIG 4

Now SQL Server has cleared the DCM page. Everything is marked as not changed except the four extents 1:0 - 1:32. This backup will now act as the base for all subsequent differential backups. There is an option to take a full backup without disturbing the existing backup chain.


BACKUP DATABASE Mydb TO DISK='D:\MydbFull_Copy.bak' WITH COPY_ONLY

In this case SQL Server will not reset the DCM pages, and MydbFull.bak will still be the base full backup for subsequent differential backups. This is helpful in scenarios where you need to take a full backup but do not want to disturb the backup chain.
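If you want to confirm which backups were taken with COPY_ONLY, the backup history in msdb records it (a simple check, assuming the backups above were taken on the same server):

-- 'D' = full, 'I' = differential, 'L' = log; is_copy_only = 1 for COPY_ONLY backups.
SELECT database_name, type, is_copy_only, backup_start_date
FROM msdb.dbo.backupset
WHERE database_name = 'Mydb'
ORDER BY backup_start_date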


To summarize, SQL Server tracks changes to extents through the Differential Change Map (DCM) pages. While taking a differential backup, SQL Server backs up only the extents marked as changed in the DCM pages. This speeds up the differential backup process because SQL Server does not have to scan every page to check whether it was modified after the last full backup. A differential backup does not clear the DCM pages, and it contains all the changes made since the last full backup. When a full backup is taken, SQL Server resets the bits in the DCM pages to mark the extents as not changed.

If you liked this post, do like my page on FaceBook

Wednesday 17 July 2013

SQL Server: Performance Tuning :Understanding Set Statistics Time output

In the last post we discussed SET STATISTICS IO and how it helps us in performance tuning. In this post we will discuss SET STATISTICS TIME, which gives statistics on the time taken to execute a query.

Let us start with an example.

USE AdventureWorks2008
GO
DBCC dropcleanbuffers
DBCC freeproccache
GO
SET STATISTICS TIME ON
GO
SELECT *
FROM Sales.SalesOrderHeader SOH
INNER JOIN Sales.SalesOrderDetail SOD ON SOH.SalesOrderID = SOD.SalesOrderID
WHERE ProductID BETWEEN 700 AND 800
GO

SELECT *
FROM Sales.SalesOrderHeader SOH
INNER JOIN Sales.SalesOrderDetail SOD ON SOH.SalesOrderID = SOD.SalesOrderID
WHERE ProductID BETWEEN 700 AND 800

There are two SELECT statements in the example. The first one is executed after clearing the buffer. Let us look into the output.

SQL Server parse and compile time: When we submit a query to SQL Server for execution, it has to be parsed and compiled to check for syntax errors, and the optimizer has to produce an optimal plan for the execution. SQL Server parse and compile time refers to the time taken to complete these pre-execution steps. If you look at the output of the second execution, the CPU time and elapsed time are 0 in the SQL Server parse and compile time section. That shows that SQL Server did not spend any time parsing and compiling the query, as the execution plan was readily available in the cache. CPU time refers to the actual time spent on the CPU, and elapsed time refers to the total time taken to complete the parse and compile. The difference between the CPU time and the elapsed time might be time spent waiting in the queue for a CPU cycle or waiting for IO completion. This section does not have much significance in performance tuning, as the value will vary from execution to execution. If you consistently see non-zero values here, you are probably running the procedure with the recompile option.
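To see this behaviour, OPTION (RECOMPILE) can be added to the query used above; it forces a fresh compilation on every run, so the parse and compile time stays non-zero even for repeated executions (a small sketch, not something you would normally do in production):

SET STATISTICS TIME ON
GO
-- The plan is recompiled on every execution, so
-- "SQL Server parse and compile time" is reported each time.
SELECT *
FROM Sales.SalesOrderHeader SOH
INNER JOIN Sales.SalesOrderDetail SOD ON SOH.SalesOrderID = SOD.SalesOrderID
WHERE ProductID BETWEEN 700 AND 800
OPTION (RECOMPILE)
GO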


SQL Server Execution Time: This refers to the time taken by SQL Server to complete the execution of the compiled plan. CPU time refers to the actual time spent on the CPU, whereas the elapsed time is the total time to complete the execution, which includes signal wait time, wait time for IO completion and the time taken to transfer the output to the client. The CPU time can be used as a baseline for performance tuning: this value will not vary much from execution to execution unless you modify the query or the data, and the load on the server will not have much impact on it. Please note that the times shown are in milliseconds. The CPU time might vary between executions of the same query with the same data, but only by a few hundred milliseconds. The elapsed time depends on many factors, like the load on the server, the IO load and the network bandwidth between server and client. So always use the CPU time as the baseline while doing performance tuning.

If you liked this post, do like my page on FaceBook

Tuesday 16 July 2013

SQL Server: Performance Tuning :Understanding Set Statistics IO output

Performance tuning is one of the important tasks of a DBA. Many times people come to us with different types of performance issues, and to handle all of them we should have thorough knowledge of SQL Server. A common question people ask me is: this procedure used to complete in a reasonable time in the morning, but why does it take more time in the evening? The funny answer I used to give is that the procedure might have become tired after many executions.

The time required to complete the execution of a procedure or a statement depends completely on the workload on the server. If you have a heavy workload in the evening, your procedure might take more time as it has to wait for CPU cycles and IO completion. To get consistent response times, we need to reduce the resources required to complete the execution, and that is called performance tuning.



IO and CPU are the main resources utilized to complete an execution. The fewer resources used, the more consistent the performance. In this post let us try to understand the role of SET STATISTICS IO in performance tuning.



By default SET STATISTICS IO is off, and you can turn it on at the session level by running the statement below.

SET STATISTICS IO  ON

This statement will help us get the number of IO operations (page reads/writes) that happened while executing a statement. Let us see a sample output.

USE mydb
GO
SELECT * INTO SalesOrderDetail FROM AdventureWorks2008.Sales.SalesOrderDetail 
GO
SET STATISTICS IO ON
DBCC dropcleanbuffers
DBCC freeproccache
GO
SELECT * FROM SalesOrderDetail 
GO
SELECT * FROM SalesOrderDetail 


Fig 1

The output of SET STATISTICS IO can be found in the Messages tab of the output pane. We have executed the same statement twice: first after clearing the cache, and then without clearing the cache.

Let us try to understand the output.

Scan Count: As per the BOL, Scan Count is the number of seeks/scans started after reaching the leaf level in any direction to retrieve all the values to construct the final dataset for the output.

  • Scan count is 0 if the index used is a unique index or clustered index on a primary key and you are seeking for only one value. For example WHERE Primary_Key_Column = <value>.
  • Scan count is 1 when you are searching for one value using a non-unique clustered index which is defined on a non-primary key column. This is done to check for duplicate values for the key value that you are searching for. For example WHERE Clustered_Index_Key_Column = <value>.
  • Scan count is N when N is the number of different seek/scan started towards the left or right side at the leaf level after locating a key value using the index key.

This number tells us whether the optimizer has chosen a plan that causes this object to be read repeatedly. Many people misunderstand this as the number of times the entire table is scanned, which is completely wrong.

Let us try to understand with a sample.


CREATE TABLE ScanCount (Id INT IDENTITY(1,1),Value CHAR(1))
INSERT INTO ScanCount (Value) VALUES ('A'),('B'),('C'),('D'),('E'),('F')
CREATE UNIQUE CLUSTERED INDEX ix_ScanCount ON ScanCount(Id)

SET STATISTICS IO ON
--Unique clustered Index used to search single value
SELECT * FROM ScanCount  WHERE Id =1
--Unique clustered Index used to search multiple value
SELECT * FROM ScanCount  WHERE Id IN(1,2,3,4,5,6)
--Unique clustered Index used to search multiple value
SELECT * FROM ScanCount  WHERE Id BETWEEN 1 AND 6


Let us see the output of the above query.



In the output for the first SELECT statement, the scan count is 0. This is in line with the BOL statement "Scan count is 0 if the index used is a unique index or clustered index on a primary key and you are seeking for only one value." As it is a unique index (clustered or non-clustered), no further scan towards the left or right is required at the leaf level, because there can be only one value matching the criteria. That is the logical reason for the scan count to be 0 while searching a unique index for a single value. The scan count will be 1 if the search happens on an index (clustered or non-clustered) that is not defined as unique, as sketched below.
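Here is a quick way to see the non-unique case (the ScanCount2 table below is hypothetical, mirroring the table above but with the UNIQUE keyword dropped from the index):

-- Same table shape, but the clustered index is NOT unique.
CREATE TABLE ScanCount2 (Id INT IDENTITY(1,1),Value CHAR(1))
INSERT INTO ScanCount2 (Value) VALUES ('A'),('B'),('C'),('D'),('E'),('F')
CREATE CLUSTERED INDEX ix_ScanCount2 ON ScanCount2(Id)

SET STATISTICS IO ON
-- Non-unique clustered index, single value: scan count is reported as 1,
-- because SQL Server also has to check for duplicates of the key value.
SELECT * FROM ScanCount2 WHERE Id = 1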

For the second SELECT statement, the scan count is 6. This is because we are searching for multiple values. BOL is not very clear about this scenario, but we can interpret it as: "Scan count is N if the index used is a unique index or clustered index on a primary key and you are seeking N values."


It will be very clear if we look into the seek predicates in the execution plan.

Fig 2

Even though it is a single WHERE condition, it is split into multiple seek predicates, and each seek predicate generates one scan.

For the last SELECT statement, the scan count is 1, which matches BOL: "Scan count is N when N is the number of different seek/scan started towards the left or right side at the leaf level after locating a key value using the index key." Basically, the clustered index structure is used to reach the value 1 at the leaf level, and a single leaf-level scan is then started and continues until it finds the value 6. The seek predicate gives us a clearer idea.

Fig 3

Logical Reads: The number of pages read from the data cache. The lower the number, the better the performance. This number is very significant in performance tuning because it will not change from execution to execution unless there is a change in the data or in the query. It gives us a good reference to measure the performance improvement while tuning a query.

Physical Reads: The number of pages read from disk. This will vary from execution to execution; in most scenarios the physical reads of the second successive execution will be 0 (refer to Fig 1). If the value is not coming down on successive executions, we can assume that there is something wrong with the memory configuration of the server or that there is memory pressure due to a heavy workload, and you need to look at the server level to understand the issue. In query tuning it does not have much significance, as this value keeps changing between executions and you cannot do much at the query level to reduce it.

Read-Ahead Reads: The number of pages placed into the cache for the query. This value tells us the number of physical page reads that SQL Server performed as part of the read-ahead mechanism: SQL Server reads data pages into the cache before the query execution requests those pages, assuming it might need them later to complete the query. If you look at Fig 1, the physical reads value is 1 and the read-ahead reads value is 1303. That means the query execution requested a single page, and the read-ahead mechanism read 1303 pages into the data cache assuming the query execution might request those pages to complete the operation. Like physical reads, this value does not have much significance in query tuning.

Lob Logical Reads: The number of text, ntext, image, or large value type (varchar(max), nvarchar(max), varbinary(max)) pages read from the data cache. We should give this number the same importance as logical reads.

Lob Physical Reads: Number of text, ntext, image, or large value type (varchar(max), nvarchar(max), varbinary(max)) pages read from the disk.

Lob Read-Ahead Reads: The number of text, ntext, image, or large value type (varchar(max), nvarchar(max), varbinary(max)) pages read from the disk as part of the read-ahead mechanism.

In short, Logical Reads and Lob Logical Reads are the two important numbers that we need to look at while working on performance tuning. How to bring down these two numbers is outside the scope of this post; in general, creating an appropriate index or rewriting the query will help reduce them drastically, as the hypothetical sketch below illustrates.
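As a hedged illustration of that last point (the index name and the column list are assumptions for this sample table, not a recommendation), a narrow query served by a covering non-clustered index typically reads far fewer pages than SELECT * against the whole table:

-- Hypothetical covering index for a narrow query on the copied SalesOrderDetail table.
CREATE NONCLUSTERED INDEX ix_SalesOrderDetail_ProductID
    ON SalesOrderDetail (ProductID)
    INCLUDE (OrderQty, UnitPrice)

SET STATISTICS IO ON
-- Compare the logical reads of this query with and without the index above.
SELECT ProductID, OrderQty, UnitPrice
FROM SalesOrderDetail
WHERE ProductID = 725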

If you liked this post, do like my page on FaceBook

Wednesday 10 July 2013

SQL Server: All About SQL Server Statistics : Consolidated post

Statistics is one of the important factors that help the optimizer choose the right execution plan. At the same time, many of us (so-called DBAs) do not give enough attention to statistics while troubleshooting performance issues. It might be because of a lack of knowledge about statistics, or the lack of a mechanism to figure out that stale statistics are the root cause of the performance issue.


The query optimizer uses statistics to estimate the number of rows returned in each step. The estimated rows information in the execution plan is calculated based on the statistics available on the columns. The statistics describe the distribution of data within the column. Without statistics, the query optimizer cannot determine the efficiency of different plans. By using the information stored in the statistics, the query optimizer can make the right choice in accessing the data.

Each statistics object in SQL Server stores a density vector and a histogram.
Density
Density is the ratio of unique values within the given column or set of columns. Read More
Histogram
The histogram gives the details of the data distribution of the leading column of the statistics. Read More
There are two types of statistics: index statistics and column statistics. Index statistics are created automatically when we define an index on a table. Column statistics are always on a single column and are created automatically when the column is referenced in a query; for a column statistic to be created automatically, auto create statistics should be enabled at the database level. We have discussed in detail how statistics are created, how auto update and auto create statistics work, and how stale statistics hurt performance in our post. Read here
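The density vector and histogram of any statistics object can be inspected with DBCC SHOW_STATISTICS. The table and statistics names below are only examples (ix_ProductID is the index created later in this series):

-- Header, density vector and histogram in one go.
DBCC SHOW_STATISTICS ('SalesOrderDetail', 'ix_ProductID')
-- Or request the individual result sets.
DBCC SHOW_STATISTICS ('SalesOrderDetail', 'ix_ProductID') WITH DENSITY_VECTOR
DBCC SHOW_STATISTICS ('SalesOrderDetail', 'ix_ProductID') WITH HISTOGRAM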

SQL Server updates statistics automatically after a reasonable amount of change has happened in the table; there are threshold values for triggering the auto update. Is that default threshold good enough for optimal performance? Do we really need to bother about updating statistics manually? This post answers these questions.

In SQL Server 2005 and 2008, it is a difficult task to identify the statistics that need a manual update. This post will give some ideas to start with.

If you liked this post, do like my page on FaceBook

Monday 8 July 2013

SQL Server: Part 5 : All About SQL Server Statistics : How to Detect outdated Statistics ?

In the last post, we discussed the auto update statistics threshold and concluded that the auto update threshold is good enough to get optimal performance only for some workloads. In many cases, a manual update of statistics helps to obtain better performance. In this post let us discuss how to detect outdated statistics.


From SQL Server 2005 onwards, SQL Server uses the ColModCtr to keep track of the changes in the leading column of a statistics object. Unfortunately, that counter is not exposed through any DMV or system view in SQL Server 2005 or SQL Server 2008. From SQL Server 2008 R2 SP2 onwards, sys.dm_db_stats_properties gives the details of the statistics along with the changes in the leading column, as sketched below.
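On builds that have it (SQL Server 2008 R2 SP2 and later), a query along these lines reads the modification counter for every statistics object; treat it as a sketch rather than a finished script:

-- modification_counter = changes to the leading column since the statistics were last updated.
SELECT  OBJECT_NAME(s.object_id)  AS TableName,
        s.name                    AS StatisticsName,
        sp.last_updated,
        sp.[rows]                 AS NumberofRecords,
        sp.modification_counter   AS RecordModified
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE OBJECT_SCHEMA_NAME(s.object_id) <> 'sys'
ORDER BY sp.modification_counter DESC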

For older versions of SQL Server, we need to depend on rowmodctr, available in sys.sysindexes. As per BOL: in SQL Server 2005 and later, rowmodctr is not fully compatible with earlier versions. In earlier versions of SQL Server, the Database Engine maintained row-level modification counters; such counters are now maintained at the column level. Therefore, the rowmodctr column is calculated and produces results that are similar to the results in earlier versions, but not exact.

The below query will give an estimate of changes in the statistics.



SELECT 
     TableName=OBJECT_NAME(i.OBJECT_ID)
    ,ObjectType=o.type_desc
    ,StatisticsName=i.[name]
    ,statisticsUpdateDate = STATS_DATE(i.OBJECT_ID, i.index_id)
    ,RecordModified=si.rowmodctr
    ,NumberofRecords=si.rowcnt
FROM sys.indexes i 
JOIN sys.objects o ON    i.OBJECT_ID=o.OBJECT_ID
JOIN sys.sysindexes si ON    i.OBJECT_ID=si.id
    AND i.index_id=si.indid 
WHERE  o.TYPE <> 'S' AND STATS_DATE(i.OBJECT_ID, i.index_id) IS NOT NULL
UNION ALL
SELECT 
     TableName=OBJECT_NAME(o.OBJECT_ID)
    ,ObjectType=o.type_desc
    ,StatisticsName=s.name
    ,statisticsUpdateDate= STATS_DATE(o.OBJECT_ID, s.stats_id)
    ,RecordModified=si.rowmodctr
    ,NumberofRecords=ir.rowcnt
FROM sys.stats s INNER JOIN sys.objects o ON s.OBJECT_ID=o.OBJECT_ID
JOIN sys.sysindexes si ON s.OBJECT_ID=si.id AND s.stats_id= si.indid
INNER JOIN (SELECT id,rowcnt FROM sys.sysindexes WHERE indid IN (0,1)) IR
ON IR.id=o.OBJECT_ID  WHERE  o.TYPE <> 'S'  
AND (s.auto_created=1 OR s.user_created=1)AND STATS_DATE(o.OBJECT_ID, s.stats_id) IS NOT NULL



Keeping this as a base, and depending on the workload and the types of queries running, we can schedule statistics updates at appropriate intervals instead of blindly updating all statistics, for example as shown below.
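Once a heavily modified statistics object is identified, it can be refreshed manually; the table and statistics names below are only illustrative:

-- Refresh a single statistics object with a full scan...
UPDATE STATISTICS dbo.SalesOrderDetail ix_ProductID WITH FULLSCAN
-- ...or refresh every statistics object on the table.
UPDATE STATISTICS dbo.SalesOrderDetail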

If you liked this post, do like my page on FaceBook

Tuesday 2 July 2013

SQL Server: Part 4 : All About SQL Server Statistics : Auto Update Statistics Threshold - Importance of Manual statistics maintenance

In the last post, we discussed the auto create and auto update properties of statistics. Do we really need to do manual statistics maintenance to keep performance optimal? The answer is yes, depending on your workload. SQL Server auto-updates statistics only when the threshold limits are reached, and when a high volume of insert/update/delete operations is happening, the built-in auto update stats is not good enough to get consistent performance.

After a set of inserts, deletes and updates, the changes may not yet be reflected in the statistics. If the SQL Server query optimizer requires statistics for a particular column in a table that has undergone substantial update activity since the last time the statistics were created or updated, SQL Server automatically updates the statistics by sampling the column values (by using auto update statistics). The statistics auto update is triggered by query optimization or by execution of a compiled plan, and it involves only a subset of the columns referred to in the query. Statistics are updated before query compilation if AUTO_UPDATE_STATISTICS_ASYNC is OFF, and asynchronously if it is ON. When statistics are updated asynchronously, the query that triggered the update proceeds using the old statistics. This provides more predictable query response times for some workloads, particularly those with short running queries and very large tables.
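The synchronous versus asynchronous behaviour is a database-level setting; a quick way to check and change it (using the Mydb database from the earlier examples) is:

-- Check the current auto update settings for the database.
SELECT name, is_auto_update_stats_on, is_auto_update_stats_async_on
FROM sys.databases
WHERE name = 'Mydb'
-- Switch to asynchronous auto update if the workload benefits from it.
ALTER DATABASE Mydb SET AUTO_UPDATE_STATISTICS_ASYNC ON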

When a query is first compiled, if the optimizer needs a particular statistics object, and that statistics object exists, the statistics object is updated if it is out of date. When a query is executed and its plan is in the cache, the statistics the plan depends on are checked to see if they are out of date. If so, the plan is removed from the cache, and during recompilation of the query, the statistics are updated. The plan also is removed from the cache if any of the statistics it depends on have changed.

SQL Server 2008 determines whether to update statistics based on changes to column modification counters (colmodctrs).

A statistics object is considered out of date in the following cases:

 If the statistics is defined on a regular table, it is out of date if:


  1. The table size has gone from 0 to >0 rows (test 1).
  2. The number of rows in the table when the statistics were gathered was 500 or less, and the colmodctr of the leading column of the statistics object has changed by more than 500 since then (test 2).
  3. The table had more than 500 rows when the statistics were gathered, and the colmodctr of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered (test 3).
The above part is from the MSDN technical article, Statistics Used by the Query Optimizer in Microsoft SQL Server 2008.


The first two conditions are fairly good, but for the third one, which handles bigger tables, the threshold can be too high to invalidate the statistics in a timely manner. For example, consider a table with 1,000,000 records: only after 200,500 records are modified (updated/inserted) will the statistics be invalidated and an auto update performed.

Let us do a hands-on exercise.

Use Mydb
SELECT * INTO SalesOrderDetail FROM AdventureWorks2008.sales.SalesOrderDetail
CREATE INDEX ix_ProductID ON SalesOrderDetail(ProductID)
SELECT * FROM SalesOrderDetail WHERE ProductID=725

I have created a copy of the SalesOrderDetail table and created an index on ProductID. Let us see the execution plan of the SELECT statement.

The optimizer has selected an index seek along with a bookmark lookup operation as the optimal plan, and it was able to complete this with 377 logical reads.

The SalesOrderDetail table has 121317 records. As per the third condition mentioned above, to invalidate the statistics 20% of 121317 = 24263 + 500 = 24763 records have to be modified. Let us update just 5000 records of this table to ProductID 725 and run the same SELECT statement.

SET ROWCOUNT 5000
UPDATE SalesOrderDetail SET ProductID=725 WHERE ProductID<>725
SET ROWCOUNT 0
SELECT * FROM SalesOrderDetail WHERE ProductID=725

The estimated number of rows in the execution plan is 374. This is based on the statistics gathered prior to the update operation. The optimizer selected index seek and bookmark lookup as the optimal plan based on that statistical data, and the SELECT performed 5392 logical reads to complete the operation.

As a next step, let us update 19762 more records to ProductID 725. In effect we have then updated 24762 records (including the 5000 records updated in the previous step), which is one record fewer than the number of modifications (24763) needed to invalidate the statistics.

SET ROWCOUNT 19762
UPDATE SalesOrderDetail SET ProductID=725 WHERE ProductID<>725
SET ROWCOUNT 0
SELECT * FROM SalesOrderDetail WHERE ProductID=725

The estimated number of rows is still 374, based on the statistics gathered prior to both update statements. The optimizer selected index seek with bookmark lookup as the optimal plan for the SELECT statement based on those statistics, and 25212 logical reads were performed to complete the operation.

Now let us update one more record to invalidate the statistics.


SET ROWCOUNT 1
UPDATE SalesOrderDetail SET ProductID=725 WHERE ProductID<>725
SET ROWCOUNT 0
SELECT * FROM SalesOrderDetail WHERE ProductID=725

As we expected, the SELECT statement triggered the auto update stats, and in the plan the estimated number of rows and the actual number of rows are now very close. That helped the optimizer choose a better execution plan: it selected a table scan rather than an index seek and bookmark lookup operation. The SELECT took only 1496 logical reads to return 25137 records, which is much lower than the 25212 logical reads in the previous step. Even in the first step, when we updated 5000 records, the optimizer might have selected a table scan as the optimal plan instead of an index seek and bookmark lookup if the statistics had been updated at that moment, completing the operation with roughly 1496 logical reads instead of 5392, which is much better.

From this exercise it is clear that the auto update statistics threshold is not always good enough to get optimal performance, and it gets worse on bigger tables. A manual statistics update is needed to guarantee optimal query performance, but the frequency of the update depends on the workload.

Even though statistics become outdated after a lot of DML operations, they will not be updated automatically until a query plan tries to access them. To make it clearer, SQL Server will automatically update a statistic when:
  • A query compiles for the first time, and a statistic used in the plan is out of date
  • A query has an existing query plan, but a statistic in the plan is out of date

If you liked this post, do like my page on FaceBook