Pages

Monday 1 July 2013

SQL Server: Part 3 : All About SQL Server Statistics :Who Create and Manage the Statistics ? Role of Statistics in Performance Tuning

In the last posts, we have discussed about the histogram ,density vector and how SQL server use the statistics information in cardinality estimation. In this post let us discuss about how the statistics are getting created and the importance of statistics in performance tuning.

There are two types of statistics, Index statistics and column statistics. Index statistics are getting created as part of index creation. Column statistics are automatically created by SQL server when that column is referred in the where condition or group by clause of a query.

There are database level setting which control auto create statistics and auto update statistics behavior of a database. 

















The Auto Create Statistics by default is enabled and it helps the query optimizer to create the column statistics (only on single column) when ever it is required for better query estimation. It is always suggested to keep this enabled for better performance.

The Auto Update statistics is also enabled by default and it helps the query optimizer to update the statistics when it is outdated. Statistics become outdated when there is considerable change in the data. There is a threshold limit to mark the statistics out dated.

Auto Update Statistics Asynchronously is disable by default. When the auto update statistics is enabled , the update statistics can happen in two ways. In synchronous mode (the default mode), query optimizer wait for for completion of the update statistics if it is outdated. In the Asynchronous mode, query optimizer initiate the update statistics and it will not wait for the completion to generate the plan. Some workload can benefit consistent performance by change the update statistics to asynchronous mode. SQL server will not perform full scan while auto create/ auto update statistics. It will always do sampling of data to calculate the statistics in acceptable time.

In the first post about statistics , we have seen the auto creation of statistics while referring a column in the group by/where condition. Let us see how SQL server do the estimation when the auto create statistics is turned off. Let us run the below statement and see the output.

ALTER DATABASE mydb SET AUTO_CREATE_STATISTICS OFF
GO
SELECT * INTO SalesOrderDetail_NoStats FROM SalesOrderDetail
GO
SELECT ProductID,COUNT(*) FROM dbo.SalesOrderDetail_NoStats GROUP BY ProductID\
GO
EXEC SP_HELPSTATS 'SalesOrderDetail_NoStats', 'ALL'

The first statement is turning off the auto update statistics property of MyDb database. In the second statement salesOrderDetail table is copied into a new table. Now we do a grouping on ProductId. Let us see the estimated execution plan.











In the plan , you can see a warning symbol on the table scan operator. In the details part, it is warning about statistics unavailability. In the output of aggregate operator, optimizer estimated 348.306 rows. How optimizer is able to estimate that number with out having a statistics ? It is nothing but the number which equally divides the total number of records. The table has 121317 records and if you divide 121317 / 348.306=348.306. The estimated number of rows will be same If you tried the group by with any other column on that table. Please note that Auto Create Statistics only control the auto statics creation of column statistics.It will not control the index statistics created as part of index creation. 

Let us try to understand how the statistics affect the query performance. For this, I am running two sets of query as given below.

USE mydb
GO
/* Part 1 WITH AUTO STATS UPDATE ON */

ALTER DATABASE mydb SET AUTO_CREATE_STATISTICS ON
ALTER DATABASE mydb SET AUTO_UPDATE_STATISTICS ON
SET STATISTICS IO ON
DROP TABLE SalesOrderDetail_NoStats 
SELECT * INTO SalesOrderDetail_NoStats FROM SalesOrderDetail
CREATE INDEX ix_productid  ON SalesOrderDetail_NoStats (productid)
UPDATE dbo.SalesOrderDetail_NoStats SET ProductID=775 WHERE SalesOrderDetailID<>1
SELECT * FROM dbo.SalesOrderDetail_NoStats WHERE ProductID=776
SELECT * FROM dbo.SalesOrderDetail_NoStats WHERE ProductID=775

/* Part 2  WITH AUTO STATS UPDATE Off */


ALTER DATABASE mydb SET AUTO_CREATE_STATISTICS ON

ALTER DATABASE mydb SET AUTO_UPDATE_STATISTICS ON
SET STATISTICS IO ON
DROP TABLE SalesOrderDetail_NoStats 
SELECT * INTO SalesOrderDetail_NoStats FROM SalesOrderDetail
CREATE INDEX ix_productid  ON SalesOrderDetail_NoStats (productid)
UPDATE dbo.SalesOrderDetail_NoStats SET ProductID=775 WHERE SalesOrderDetailID<>1
--Disabling the auto update stats
ALTER DATABASE mydb SET AUTO_UPDATE_STATISTICS ON
SELECT * FROM dbo.SalesOrderDetail_NoStats WHERE ProductID=776
SELECT * FROM dbo.SalesOrderDetail_NoStats WHERE ProductID=775

In both sets we are creating an index (apparently statistics also) on productid column .After that I am updating all records in the table with productid 775 by leaving only one record with different productid value.After the update, the table has only two distinct productid values 775 and 776. In the first set, I performed  the  select query with auto update statistics on. In the second set select query executed with auto update statistics off. Let us see the execution plan and output of Statistics IO 

Fig A



Fig B































Let us analyse the execution plan. In the first figure (Fig A) (where the select statements are executed with the auto update stats on), the first query with where condition ProductId=776 estimated 1.000348 records and optimizer gone for index seek. In the second query with where condition ProductId=775 estimated 121316 records and optimizer chosen table scan instead of non clustered index seek and bookmark lookup operation. For optimizer it is cost effective to scan through all pages instead of using index seek and bookmark lookup operation to fetch all records (except one record with productid 776) from the table. It performed only 1495 logical reads to complete the operation.

In the second figure (Fig B) (where the select statements are executed with the auto update stats off), the first query with where condition ProductId=776 estimated 228 records and optimizer gone for index seek. This estimation is done based on the statistics created as part of index creation and it is become obsolete after the update. In the second query with where condition ProductId=775 estimated number of records is 234 .This is completely misleading the query optimizer to use  non clustered index seek and bookmark lookup operation instead of a simple table scan to fetch all records (except one record with productid 776) from the table.The select statement performed 121690 logical IO operation which is very high compared to 1495 IO operation used in the plan with auto update statistics On. 

From the above examples, it is clear that optimizer requires updated statistics information to choose the optimal execution plan even if you have all necessary indexes. While troubleshooting the performance issues, we need to give attention to the statistics also.Considerable difference between the estimated number of records and actual number of records should be a good indicator to look into the statistics information or manual update of the statistics.

The auto update statistics featured can be turned off at different levels:

  • On the database level, using the alter database command 
    •  ALTER DATABASE mydb SET AUTO_UPDATE_STATISTICS OFF
  • At index level , using the STATISTICS_NORECOMPUTE option while creating or rebuilding the index. This is little bit confusing. By default this is off. That means the auto update property is on.
  •  At Statistics level, using the NORECOMPUTE option while creating/updating the statistics
The procedure sp_autostats helps us to see the Auto Update statistics settings of each statics associated with a table. If the auto update statistics is disabled in the database level, table level settings will be discarded. Table level Auto Update statistics settings can be changed using the sp_autostats procedure.


If you liked this post, do like my page on FaceBook

4 comments:

  1. Very nice article and understandable technical language with simple example.
    Only need to update under /* Part 2 WITH AUTO STATS UPDATE Off */
    --Disabling the auto update stats
    ALTER DATABASE mydb SET AUTO_UPDATE_STATISTICS ON <== Here use OFF

    And request from all DBA you must keep blogging and help to other DBA like me.

    Thanks
    Vasant Jagtap

    ReplyDelete




  2. I loved the way you discuss the topic great work thanks for the share, Let me share this, vmware training in pune

    ReplyDelete
  3. Hi,
    I really loved the way you explained it. This gonna help a lot of people for sure!!
    New Year Wishes Quotes Status
    New Year Quotes by Famous Personalities

    ReplyDelete
  4. This website is something that is required on the internet, someone with a bit of originality!
    Happy New Year Images with Quotes
    happy new year wishes for friends and family

    ReplyDelete