Solutions with Practical SQL DBA : SQL Server: Part 2 : All About SQL Server Statistics :Histogram

Thursday, 27 June 2013

SQL Server: Part 2 : All About SQL Server Statistics :Histogram

In the Part 1 about SQL server Statistics, we have discussed about density vector information stored in the the statistics. In this post, let us discuss about the histogram. Let us create a copy of SalesOrderDetail table and two indexes on top of that as we did in our first part.

USE mydb
GO
SELECT * INTO SalesOrderDetail FROM AdventureWorks2008.Sales.SalesOrderDetail
CREATE UNIQUE CLUSTERED INDEX ix_SalesOrderDetailID ON SalesOrderDetail(SalesOrderDetailID)
CREATE NONCLUSTERED INDEX ix_productid ON SalesOrderDetail(productid)

Let us see the histogram information of the non clustered index.

DBCC SHOW_STATISTICS('dbo.SalesOrderDetail', 'ix_productid') WITH HISTOGRAM

You can see 200 records in the output. I have shown only the first 18 records. To create the histogram, SQL server split the data into different buckets (called steps) based on the value of first column of the index.Each record in the output is called as bucket or step.The maximum number of bucket is 200 based on the data distribution.Histogram is a statistical representation of your data.In other words it is the distribution of records based on the value of first column of the index. Histogram is always based only on the first column of the index even if the index is composite one.This is one of the reason why it is always suggested to have most selective column as the first column of the index, but there are exceptions.

Let us look at the output of the histogram. It tried to put the 121317 records in the table into 200 buckets (steps) based on the value of productid.

The RANGE_HI_KEY column represent the upper boundary of each bucket.The lower boundary of each bucket is the RANGE_HI_KEY+1 of the previous bucket. For the first bucket, the lower boundary is the smallest value of the column on which the histogram is generated.

The RANGE_ROWS column represent the number records in that bucket range but not equal to the value of RANGE_HI_KEY. The value 0 on the first record says, there is no record in the table whose productid value is less than 707. If you look into the 11th record with RANGE_HI_KEY value 718, we have 218 in the RANGE_ROWS column .This says ,there are 218 records with productid value is greater than 716 (previous RANGE_HI_KEY) and productid value is less than 718. The output of below query proves that:

SELECT COUNT(*) FROM SalesOrderDetail WHERE productid>716 AND productid<718

The EQ_ROWS is the number of records in the table matching with RANGE_HI_KEY. For the first records, 3083 in the EQ_ROWS says that there are 3083 records in the table with productid 707.The output of below query proves that:

SELECT COUNT(*) FROM SalesOrderDetail WHERE productid=707

The DISTINCT_RANGE_ROWS represent the number of distinct records (distinct productid ) between two RANGE_HI_KEY values. If you look into the 11th record with RANGE_HI_KEY value 718, we have value 1 in the DISTINCT_RANGE_ROWS column .This says ,there is only 1 distinct records withe productid value is greater than 716 (previous RANGE_HI_KEY) and productid value is less than 718. The output of below query proves that:


SELECT COUNT(distinct productid) FROM SalesOrderDetail WHERE productid>716 AND productid<718

The AVG_RANGE_ROWS column represent the average number of rows per distinct values.This is equivalent to RANGE_ROWS / DISTINCT_RANGE_ROWS when RANGE_ROWS value is greater than 0. Otherwise AVG_RANGE_ROWS is considered as 1.

How SQL server optimizer use the histogram for cardinality estimation ? Let us consider the execution plan of the below query.

SELECT productid FROM SalesOrderDetail WHERE productid>=716 AND productid<=718

From where the Estimated Number of rows (1513) is calculated ? Let us go to the histogram

Add the highlighted values which will match to 1513 which is the estimated number of rows in the execution plan .
1076 is the number of records with productid value 716
218 is the number of records with productid value greater than 716 and productid less than 718
219 is the number of records with productid value 718

Sum of these three values is the estimated number of rows in the above execution plan.

When there is complex where condition ,optimizer create required statistics called column statistics and use complex algorithm on top of the histogram data for cardinality estimation. We will discuss about that in the next post.

If you liked this post, do like my page on FaceBook

13 comments:

Tony Trus11 July 2013 at 11:07
Thanks for the great article. There is one syntax change for others that read this article: change the word DENSITY_VECTOR to HISTOGRAM. You could use both and separate with a comma but the density vector is not what is pictured. Otherwise its a good refresher article for a lot of us and a new command for others.

DBCC SHOW_STATISTICS('dbo.SalesOrderDetail', 'ix_productid') WITH DENSITY_VECTOR, HISTOGRAM
ReplyDelete
Replies
Sudhir Giri20 August 2013 at 08:39
Superb article!
ReplyDelete
Replies
John Willson20 October 2013 at 21:41
This will really come in handy especially if you'd want to review some stuff. Thank you very much for sharing this SQL server solution.
ReplyDelete
Replies
Lydia Wilkinson19 November 2013 at 21:20
Thanks for posting this article. This will come handy in my field of work. Thanks again, I cant wait to tell my friends about this blog.
ReplyDelete
Replies
Anonymous24 June 2014 at 21:39
I have a question and it might be silly for you. How can I get the number of values that match each distinct column value from Statistics in SQL server?
ReplyDelete
Replies
Mr. Nam19 May 2015 at 19:37
Bạn đang lo lắng phân vân khi không biết nên đặt niềm tiên ở công ty thiết kế web nào.? Bạn chưa bít giá cả và chất lượng dịch vụ ra sao.? Tôi xin giới thiệu:
Công Ty TNHH HTSolution là một công ty thiết kế website chuyên cung cấp các dịch vụ chuyên nhiệp như: Thiết kế website giá rẻ, phần mềm quản lý quán cà phê, cần nâng cấp website….vv.
Nếu bạn có nhu cầu sử dụng dịch vụ phần mềm quản lý hay cần làm web của công ty chúng tôi vui lòng liên hệ:
CÔNG TY TNHH HTSOLUTION
Địa chỉ: 78th Thăng Long - P.4 - Q.Tân Bình - TP.HCM
Điện thoại: (08) 6267 3377 (Anh Hoàng) or 0985327297 (Anh Trường)
Email: Info@htsolution.vn
ReplyDelete
Replies
vmware training in pune15 December 2016 at 22:43

I loved the way you discuss the topic great work thanks for the share, Let me share this, vmware training in pune
ReplyDelete
Replies
Unknown24 March 2017 at 04:20
Nice article and explanation is good,Thank you for sharing your experience on SQL DBA.you have clearly explained about the process thus it is very much interesting and i got more information from your blog.
For more details about please check our website.

Oracle Fusion Financial Training Institute
ReplyDelete
Replies
Unknown11 May 2018 at 03:02
thank you for sharing nice information really very nice ..i am interested java training
ReplyDelete
Replies
شركة تنظيف منازل28 December 2018 at 21:52
شركة تنظيف منازل بجدة
شركة تنظيف منازل بمكة
افضل شركة تنظيف بجدة
تتميز شركتنا شركة تنظيف خزانات بجدة
بعدة عوامل هامة تجعلنا أفضل شركة تنظيف خزانات بجدة فتقوم بجميع خدمات التنظيف ومنها تنظيف
والشوائب الخزانات بجدة وايضا شركة تنظيف خزانات بالمدينة المنورة متخصصة في مجال تنظيف الخزانات وكذلك شركة غسيل خزانات بالمدينة سواء كانت العلوية أو السفلية وتعقيمها تعقيم تام بأفضل أنواع المطهرات
ReplyDelete
Replies
adam clean12 November 2019 at 02:19
من افضل واقوى الشركات التي تقدم خدمات مكافحة حشرات المنزل البق والصراصير والنمل والعته تلك التي تستخدم مبيدات آمنة وفعالة ومضمونة مثل افضل شركات مكافحة الحشرات بجدة التي تتعامل في مكافحة الحشرات وتستخدم مبيدات آمنة ومضمونة كما هو الوضع او النظام في مثل شركة مكافحة حشرات التي تتعامل مع الحشرات في المنازل
ReplyDelete
Replies
Gokul Krish20 April 2026 at 04:59
This is a fantastic deep dive into the DCM page and differential backups. I always assumed differential backups just compared the last full backup to the current state, but knowing SQL Server uses the DCM to track which extents have changed at the page level really clarifies why differentials are so much faster than full backups. The DBCC PAGE examples to visualize the DCM bits before and after data insertion were especially helpful. For anyone looking to build a career in database administration or site reliability engineering, understanding storage internals like this is just the beginning. Mastering automation, CI/CD pipelines, and infrastructure as code is equally critical. That's why I've been exploring DevOps Training Institute in Electronic City to complement my database skills. Thanks for sharing this.
ReplyDelete
Replies

Add comment

Solutions with Practical SQL DBA

Pages

Thursday, 27 June 2013

SQL Server: Part 2 : All About SQL Server Statistics :Histogram

13 comments:

Subject