Pages

Tuesday, 2 July 2013

SQL Server: Part 4 : All About SQL Server Statistics : Auto Update Statistics Threshold - Importance of Manual statistics maintenance

In the last post, we have discussed about the auto create and auto update property of the statistics. Do we really need to do manual statistics maintenance to keep the performance optimal? The answer is yes, depends on your work load. SQL server will do auto update statistics only when it reaches the threshold limits.When high volume of Insert/Update/Delete operation are happening, the inbuilt auto update stats is not good enough to get consistent  performance.

After a set of insert,delete and update, it may not be reflected in the statistics. If the SQL Server query optimizer requires statistics for a particular column in a table that has undergone substantial update activity since the last time the statistics were created or updated, SQL Server automatically updates the statistics by sampling the column values (by using auto update statistics). The statistics auto update is triggered by query optimization or by execution of a compiled plan, and it involves only a subset of the columns referred to in the query. Statistics are updated before query compilation if AUTO_UPDATE_STATISTCS_ASYNC is OFF, and asynchronously if it is ON. When statistics are updated asynchronously, the query that triggered the update proceeds using the old statistics. This provides more predictable query response time for some work loads, particularly those with short running queries and very large tables.

When a query is first compiled, if the optimizer needs a particular statistics object, and that statistics object exists, the statistics object is updated if it is out of date. When a query is executed and its plan is in the cache, the statistics the plan depends on are checked to see if they are out of date. If so, the plan is removed from the cache, and during recompilation of the query, the statistics are updated. The plan also is removed from the cache if any of the statistics it depends on have changed.

SQL Server 2008 determines whether to update statistics based on changes to column modification counters (colmodctrs).

A statistics object is considered out of date in the following cases:

 If the statistics is defined on a regular table, it is out of date if:


  1. The table size has gone from 0 to >0 rows (test 1).
  2. The number of rows in the table when the statistics were gathered was 500 or less, and the colmodctr of the leading column of the statistics object has changed by more than 500 since then (test 2).
  3. The table had more than 500 rows when the statistics were gathered, and the colmodctr of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered (test 3).
The above part is from MSDN technical article,Statistics Used by the Query Optimizer in Microsoft SQL Server 2008


The first two condition are fairly good but when it comes third which handle the bigger table, some time the threshold will be too high to invalidate the statistics. For example consider a table with 1000000 records.Only after 200500 records modified (update/insert), the statistics will be invalidated to perform an auto update. 

Let us see do a hands on.

Use Mydb
SELECT * INTO SalesOrderDetail FROM AdventureWorks2008.sales.SalesOrderDetail
CREATE INDEX ix_ProductID ON SalesOrderDetail(ProductID)
SELECT * FROM SalesOrderDetail WHERE ProductID=725

I have created a copy of SalesOrderDetail table and created index on ProductId. Let us see the execution plan of the select statement.















Optimizer has selected  index seek along with bookmark lookup operation as optimized plan and it was able to complete this with 377 logical reads.

The salesOrderDetail table has 121317 records. As per the third condition mention above to invalidate the statistics, 20% of 121317 =24263 + 500 = 24763 records to be modified. Let us update just 5000 records of this table with productid 725 and run the same select statement.

SET ROWCOUNT 5000
UPDATE SalesOrderDetail SET ProductID=725 WHERE ProductID<>725
SET ROWCOUNT 0
SELECT * FROM SalesOrderDetail WHERE ProductID=725






















The estimated number of rows in the execution plan is 374 . This is based on the statistics gathered prior to update operation. The optimizer selected  index seek and bookmark lookup as optimal plan based on the statistical data.The select operation performed 5392 logical reads to complete the operation.

As a next step,let us update 19762 records  with productid 725. In effect we are updating 24762 (including 5000 record updated in the previous step) which is one record less than the number of records to be updated (24763) to invalidate the statistics.

SET ROWCOUNT 19762
UPDATE SalesOrderDetail SET ProductID=725 WHERE ProductID<>725
SET ROWCOUNT 0
SELECT FROM SalesOrderDetail WHERE ProductID=725

















The estimated number of rows is still 374 based on the statistics gathered prior to both updated statement. The optimizer selected index seek with bookmark lookup as optimized plan for the select statement based on the statistics. 25212 logical reads performed to complete the operation.

Now let us update one more record to invalidate the statistics.


SET ROWCOUNT 19762
UPDATE SalesOrderDetail SET ProductID=725 WHERE ProductID<>725SET ROWCOUNT 0SELECT FROM SalesOrderDetail WHERE ProductID=725





















As we expected the select statement triggered the auto update stats and in the plan the estimated number rows and actual number of rows are very close. That helped the optimizer to choose a better execution plan. Optimizer selected table scan rather than going for index seek and bookmark lookup operation. The select operation took only 1496 logical reads to select 25137 which much lower that the 25212 logical read to select 2516 records in the previous step.In the first step , when we updated the 5000 records itself , the optimizer might have selected table scan as optimized plan instead of index seek and bookmark operation if the statistics was got updated that moment. So that it can complete the operation with 1495 logical read instead of 5392 logical read, which is much better.

From this exercise, it is clear that the threshold for auto update statistics is not good enough to get optimal performance.This will be worst in bigger table. A manual statistics update is needed to guarantee optimal performance of the query but the frequency of the update depends on the workload.

Even though statistics become outdated after lot of DML operation,it will not get updated automatically till a query plan try to access that statistics .To make it more clear,SQL Server will automatically update a statistic when:
  • A query compiles for the first time, and a statistic used in the plan is out of date
  • A query has an existing query plan, but a statistic in the plan is out of date

If you liked this post, do like my page on FaceBook

No comments:

Post a Comment