Solutions with Practical SQL DBA : SQL Server: Performance Tuning :Understanding Set Statistics IO output

Tuesday, 16 July 2013

SQL Server: Performance Tuning :Understanding Set Statistics IO output

Performance tuning is one of the important task of DBA. Many times people comes to us with different types of performance issue.To handle all these, we should have through knowledge in SQL server. The common question people used to ask me,this procedure used to complete in a reasonable time in the morning but it takes more time in the evening ? The funny answer I used to give is , the procedure might have tired after many executions.

The time required to complete the execution of procedure or a statement is completely depends on the workload on the server.If you have heavy workload in the evening, your procedure might take more time as it has to wait for CPU cycle and IO completion.To get consistent response time, we need to reduce the resource required to complete the execution and that is called performance tuning.

IO and CPU are the main resource utilized to complete the execution. Lesser the resource more consistent performance. In this post let us try to understand the role of DBCC STATISTCS IO in performance tuning.

By default SET STATISTICS IO is off and you can turn it on in session level by running the below statement.

SET STATISTICS IO ON

This statement will help us get the number IO (Page read/write) happened while executing the statement.Let us see a sample output.

USE mydb
GO
SELECT * INTO SalesOrderDetail FROM AdventureWorks2008.Sales.SalesOrderDetail
GO
SET STATISTICS IO ON
DBCC dropcleanbuffers
DBCC freeproccache
GO
SELECT * FROM SalesOrderDetail
GO
SELECT * FROM SalesOrderDetail

Fig 1

The output of Set Statistics IO can be found in the message tab of output pane. We have executed same statement twice. First one after clearing the cache and other one with out clearing the cache.

Let us try to understand the output.

Scan Count: As per the BOL, Scan Count is the number of seeks/scans started after reaching the leaf level in any direction to retrieve all the values to construct the final dataset for the output.

Scan count is 0 if the index used is a unique index or clustered index on a primary key and you are seeking for only one value. For example WHERE Primary_Key_Column = <value>.

Scant count is 1 when you are searching for one value using a non-unique clustered index which is defined on a non-primary key column. This is done to check for duplicate values for the key value that you are searching for. For example WHERE Clustered_Index_Key_Column = <value>.

Scan count is N when N is the number of different seek/scan started towards the left or right side at the leaf level after locating a key value using the index key.

This number tells us that the optimizer has choosen a plan that caused this object to be read repeatedly. Many people misunderstood this as the number of time the entire table is scanned and which is completely wrong.

Let us try to understand with a sample.

CREATE TABLE ScanCount (Id INT IDENTITY(1,1),Value CHAR(1))

INSERT INTO ScanCount (Value ) VALUES ('A') ,('B'),('C'),('D'), ('E') , ('F')
CREATE UNIQUE CLUSTERED INDEX ix_ScanCount ON ScanCount(Id)

SET STATISTICS IO ON

--Unique clustered Index used to search single value

SELECT * FROM ScanCount  WHERE Id =1
--Unique clustered Index used to search multiple value
SELECT * FROM ScanCount  WHERE Id IN(1,2,3,4,5,6)
--Unique clustered Index used to search multiple value
SELECT * FROM ScanCount  WHERE Id BETWEEN 1 AND 6

Let us see the output of the above query.

In the output for first select statement, the scan count is 0. It is inline with the BOL statement "Scan count is 0 if the index used is a unique index or clustered index on a primary key and you are seeking for only one value" As it is unique index (clustered/non clustered ), further scan towards left or right is not required in the leaf level as there will be only one value matching the criteria. That is the logical reason for scan count to be 0 while searching on a unique index for single value. Scan count will be 1 if the search is happening on index(clustered or non clustered) which is not defined as unique.

For the second select statement, the scan count is 6. This is because we are searching for multiple values. BOL is not very clear about this scenario but we need to interpret it as: "Scan count is N if the index used is a unique index or clustered index on a primary key and you are seeking for N values"

It will be very clear if we look into the seek predicates in the execution plan.

Fig 2

Even if it is single where condition, it is splitted into multiple predicates.For each seek predicate, it generate one scan

For the last select statement, the scan count is 1 and it is as per BOL "Scan count is N when N is the number of different seek/scan started towards the left or right side at the leaf level after locating a key value using the index key" Basically the clustered index structure is used to reach the value 1 in the leaf node and a leaf level scan is started towards the left till the it find the value 6. The seek predicate will give us more clear idea.

Fig 3

Logical Read: Number of pages read from data cache. Lower the number, better the performance. This number is very significant in performance tuning as it will not change execution to execution unless there is a change in the data or in the query. This value will give us a good reference to measure the performance improvement while doing the query performance tuning.

Physical Read: Number of pages read from the disk. This will vary from execution to execution.In most scenario, the second successive execution value of physical read will be 0. Refer the Fig 1. If the value is not coming down in the successive execution, we can assume that there is something wrong with memory configuration of the server or there is memory pressure due to heavy workload. You need to look into more details in server level to understand the issue. In the query tuning it does not have much significance as this value keep changing execution to execution and you can not do much to control this in the query level to reduce this number.

Read-ahead Reads: Number of pages placed into the cache for the query.This value tells us the number of physical page reads that SQL server performed as part of read ahead mechanism. SQL server reads physical data pages into cache before the query execution request for that page assuming it might need later to complete the query.If you look into the Fig 1, the physical read 1 and read ahead read is 1303. That means query execution requested for single page and read ahead read mechanism read 1303 to the data cache assuming query execution might request these page to complete the operation. Like the physical read, this value does not have much significance in the query turning.

Lob Logical Reads: Number of text, ntext, image, or large value type (varchar(max), nvarchar(max), varbinary(max)) pages read from the data cache.We should give enough importance for this number like Logical reads.

Lob Physical Reads: Number of text, ntext, image, or large value type (varchar(max), nvarchar(max), varbinary(max)) pages read from the disk.

Lob Logical Reads: Number of text, ntext, image, or large value type (varchar(max), nvarchar(max), varbinary(max)) pages read from the disk as part of read ahead mechanism.

In short Logical reads and Lob Logical Reads are two important numbers that we need to look into while working with performance tuning . How to bring down the number of these two parameters is outside the scope of this post. In general by creating appropriate index or rewriting the query will help us to reduce the number drastically.

If you liked this post, do like my page on FaceBook