Solutions with Practical SQL DBA : Administration

Showing posts with label Administration. Show all posts

Monday, 30 September 2013

SQL Server :Part 1 : Architecture of Transaction Log

In our earlier post, we have discussed in detail about the architecture of the data file and different types of pages. In this post, we will be discussing about the architecture of log file.

Logging is a mechanism used in RDBMS to support various ACID (Atomicity,Consistency,Isolation and Durability) properties of transactions. A transaction log is a physical file in which SQL server stores the details of all transactions and data modifications performed on the database.In the event of of disaster, that causes SQL server to shutdown unexpectedly (Power failure/hardware failure), the transaction log is used to bring the database in a consistent state while restarting the server.On restarting the server, database goes through the recovery process.During this recovery process , the transaction log is used to make sure that all committed transactions are written to respective data pages (rolled forward) and revert the uncommitted transaction that were written to data pages.

Logically transaction log is a set of log records.Each records is identified by a log sequence number (LSN). The new log records is always written at the logical end of log file with a LSN which is greater than the previous one.Each LSN will be associated with a transaction id , which can be used to group the log records of a transaction. As log file store the log records in the sequential order as it happens, It is not necessary that, log records of a transaction are always available in sequence in the log file.Each log records will have the LSN of previous log as a backward pointer and that will help in rollback of transaction.

Transaction log will store separate log entries for each operation.For example, while inserting a record into a table, transaction log will store separate log entry for inserting into clustered index and other non clustered index. In the same way, if a single update statement is updating 10 records, transaction log will capture 10 separate log entries.For data modification, transaction log store either the logical operation performed or the before and after image of the record.

To understand it better, let us look into the transaction log using the sys.fn_dblog. It is an undocumented function which will help us to read the active portion of the log. we are using the below query to create two tables and insert some records into that.We will delete the records from these table to keep the table clean. This insert and delete operation is to make sure that the pages are allocated to the table and we will not get unnecessary entries in the transaction log while examining it.We have issued a manual checkpoint to force the SQL server to write the log information to data file and clear the log. Note that, one table is heap table and other one is clustered index table.

USE Mydb

GO

CREATE TABLE LoginfoHeap

(
   id INT)

GO

CREATE TABLE LoginfoCI

(
   id INT PRIMARY KEY)


INSERT INTO LoginfoHeap VALUES(1),(2)

INSERT INTO LoginfoCI VALUES(2),(4)

GO

DELETE FROM LoginfoHeap

DELETE FROM LoginfoCI

GO

CHECKPOINT

GO

SELECT
[Current LSN],
Operation ,Context ,
[Transaction ID],
[Previous LSN],AllocUnitName,[
Previous Page LSN],
[Page ID],[XACT ID],SUSER_SNAME(CONVERT(VARBINARY,[Transaction SID])) AS 'Login',
[Begin Time],[End Time]
FROM sys.fn_dblog (NULL, NULL)

From the output it is very clear that , we have only two active log entries.First one is written ,when the checkpoint started. The second one is written, when the checkpoint completed the process.

Now we will insert,update and delete records to these tables through two session

---SESSION I
   BEGIN TRAN
INSERT INTO LoginfoCI VALUES(2)

--SESSION 2
   BEGIN TRAN
INSERT INTO LoginfoHeap VALUES(1),(2)

---SESSION I
   INSERT INTO LoginfoCI VALUES(4)

--SESSION 2
   UPDATE LoginfoHeap   SET id =8 WHERE id=1

---SESSION I
   UPDATE LoginfoCI  SET id =6 WHERE id=2

--SESSION 2
   DELETE FROM LoginfoHeap    WHERE id=2

---SESSION I
   DELETE FROM LoginfoCI   WHERE id=4
   SELECT * FROM sys.dm_tran_current_transaction
   COMMIT

--SESSION 2
   SELECT * FROM sys.dm_tran_current_transaction
   COMMIT

The DMV sys.dm_tran_current_transcation returns a single row that displays the state information of the current transaction in the current session.We are interested only in the transaction_id, which will help us to filter the output of sys.fn_dblog. Let us see the output of sys.fn_dblog.

SELECT
[Current LSN],
Operation ,
Context ,
[Transaction ID],
[Previous LSN],
AllocUnitName,
[Previous Page LSN],
[Page ID],[XACT ID],
SUSER_SNAME(CONVERT(VARBINARY,[Transaction SID])) AS 'Login',
[Begin Time],
[End Time]
FROM sys.fn_dblog (NULL, NULL)
WHERE [Transaction ID] IN
(
SELECT [Transaction ID] FROM sys.fn_dblog (NULL, NULL)
WHERE [XACT ID] IN (856960,856981)
)

The values 856960 and 856981 are the transaction_id returned from sys.dm_tran_current_transaction.We have filter the output to get only the relevant rows in which we are interested.

In our script, we have opened two transaction and all our transaction log entries are grouped to
one of these transaction_id marked in red and green.Let us analyse what we did and how it is captured in the transaction log.

In the session 1, we have started the transaction and inserted a single record.The first records in the output map to the BEGIN TRAN command. This is the starting point of the transaction and created a new transaction_id.The previous LSN column value is 0 as this is the first log entry in this transaction.In the same log records,it stores the XACT_ID,login and transaction start time.The second record represent the insert into the clustered table.The transaction_id is used to group the entries associated with a transaction. The previouse LSN column, is a pointer to the previous log entry in the same transaction which help SQL server to move backwards in case of rollback.Page id column refer the the page number where this LSN made the change.Previous Page LSN column refer the last log sequence number(LSN) which modify this page.When LSN modify a page, it will also update the corresponding LSN number in the page header (m_lsn field in the header. For more detail refer this post)

In the session 2, we have opened another transaction and inserted two records through single insert statement to the heap table. You can map these operations to row number 3,4, and 5 in the transaction log output. Third row represent the Begin tran command. Even if we inserted two records in single insert statement , SQL server recorded two separate entry in the transaction log.

As a next step, in session 1 we have added 1 record to the clustered index table.We can map this operation to the 6th record in the transaction log output.

In the next statement , we have modified a record in heap table through Session 2. You can map this to the 7th record in the transaction log output.If you look into the previous LSN column , it will be current LSN column value of the last record associated with this transaction.

In the same way, as a next statement we have modified a record in the clustered table through session 1. We can map the 8th and 9th records in the transaction log output to the update operation on the clustered table. You might have noticed that, when we modified a record in the heap table, transaction log recorded operation in a single row. Where as the same operation in a clustered table has two record in the transaction log. One for delete and other one for insert. When you modify the clustered index key, SQL server internally delete the existing record and insert a new record. This is because, the record need to be stored in the new location based on the modified value(based on the order of clustered index column). The easiest way for SQL server to achieve this is , delete the existing record and insert it as new records with modified clustered column value.

In the next two statement, we are deleting one record from heap table and clustered table.This can be mapped to the 10th and 11th records in the output.Finally we have issued the commit statement in both sessions.12th and 13th record in the transaction log output can be mapped to the commit operation.The Previous LSN column refer the Current LSN column of corresponding begin tran statement. It will also capture the transaction end time in the End time column.

Understanding the VLF(Virtual Log File)

A database can have one or more log file. In general there will be only one log file as there is no performance improvement by having multiple log file. SQL server uses the transaction log in sequential manner.As the data file divided into pages,log files are divided into virtual log file(VLF).The size of the VLFs in a log file may not be in equal size. SQL server decide the size and number of VLF in a log file based on the size of the log file growth as given below.

Growth upto 64 MB = 4 VLF

From 64 MB to 1 GB = 8 VLF

Larger than 1 GB = 16 VLF

Let us create a database with 64 MB initial log size and later increase it to 1 GB. As per above calculation the log file should have 12 VLFs. 4 VLF based on initial size and 8 VLF due to changing the log size to 1 GB.

USE MASTER;
GO
CREATE DATABASE Mydb
ON
( NAME = MyDb_dat, FILENAME = 'D:\MyDb\Mydb.mdf',
SIZE = 10MB, MAXSIZE = 3072MB, FILEGROWTH = 5MB )

LOG ON ( NAME = MyDb_log,FILENAME = 'D:\MyDb\MyDB.ldf',
    SIZE = 64MB, MAXSIZE = 2048MB, FILEGROWTH = 5MB ) ;

GO

ALTER DATABASE Mydb
MODIFY FILE ( NAME = MyDb_Log,FILENAME = 'D:\MyDb\MyDB.ldf', SIZE = 1024MB)

Now Let us see how many VLF got created. To find out the number of VLF in database log file, we can make use of DBCC Loginfo.

DBCC loginfo('mydb')

The output is given below.

There are 12 records in the output each represent a VLF.Let us try to understand the result

FileId: This is the file id of the log file and will be same for all 12 records as we have only one log file.If we have multiple log file , we can multiple numbers here

FileSize: This is the size of the VLF. If you look into the first four, have same size except the fourth one. This because first 8KB of the log file is used for file header. If you add filesize value of first four records along with 8192(8KB) , you will get 64MB which is the initial size of the log file.

16711680+16711680+16711680+16965632 =67100672+8192 =67108864bytes =64MB

In the same if you add the last 8 records it will account the 960 MB (1024-64) , the growth happened due to the alter statement.

StartOffSet: This values is also in bytes, and is the sort column of the output. The first VLF alwasy start from 8192, which is the number of bytes in a page.As mentioned above, the first 8KB is used for file header and will not store any log.

FSeqNo: The file sequence number indicates the order of usage of the VLFs. The row with the highest FSeqNo value is the VLF where current log records are being written.FSeqNo values are not consistent. It will keep changing each time when VLF are getting reused. We will discuss more about this later in this post. A value of 0 in this column means that this VLF has never been used at all. That is the reason we have 0 for all records except one where it is currently logging.

Status: Status has two possible values : 0 and 2. A value of 2 means the VLF is not reusable and a value 0 means it can be reused.It will be more clear as we go further.

Parity: Parity has three possible values 0 ,64 and 128. If the VLF is not used yet, it will have a value 0 and will be set to 64 on first use.Every time a VLF is reused, the parity value is switched between 64 and 128.

CreateLSN: The value indicates when the VLF is created or to group the VLF based on the creation. A values 0 indicates, those VLFs are created as part of database creation. In our case first four records has a value 0 which indicate these VLFs are created as part of database creation with 64MB log size. The remaining 8 records has the same value. These VLF are created as part of our alter database statement to increase the size of the log file from 64 MB to 1024MB

The above output description is referred from Kalen Delaney Blog Post

Now our transaction log will looks like below

Now we have learned about the LSN and VLF. we will discuss more about transaction log in the next post.

If you liked this post, do like my page on FaceBook

Wednesday, 24 July 2013

SQL Server : Understanding Bulk Change Map (BCM) Page and Bulk Logged Recovery Model

In the last post , we have discussed about the Differential Change Map pages and how it is utilized in the differential backup process.In this post, let us try to understand the bulk change map page and how the bulk logged recovery model works.

Bulk Change Map(BCM): SQL Server uses Bulk Change Map (BCM) page to track extent modified by bulk logged operation since last backup log operation. BCM page is the 7th page in the data file.BCM has a bit for every extent it tracks. If the bit is 1, the corresponding extent is modified after the last log backup due to bulk logged operation, if the bit is 0, there is no change in the corresponding extent due to bulk logged operation after the last log backup.A BCM page can hold information of around 64000 extents. Precisely BCM pages repeat after every 511232 pages .One BCM page can track the changes of exactly 63904 extents.The second BCM page will be 511239th page.

You can see the BCM pages in all the databases, but they are only relevant when the database is in the bulk recovery model.In this recovery model, while taking log backup, SQL server scans the BCM page and include the extents that are marked as changed in the log backup along with transaction log.This helps SQL server to make bulk logged operation to be recoverable if the database is restored from database backup and a sequence of transaction log backups.In the bulk logged recovery model, bulk logged operation like BCP,Bulkinsert,Select Into are minimally logged in the transaction log and modified extents are tracked through the BCM pages.This helps SQL server to have optimized performance for bulk operations.

BCM pages are not relevant in simple recovery and full recovery model. In simple and full recovery model, the bulk logged operation are treated as fully logged operation. In short, in simple and bulk recovery model, BCM page will not keep track of the changes happened through bulk logged operations.

Let us try to understand with a sample.

CREATE DATABASE Mydb

ALTER DATABASE Mydb SET RECOVERY BULK_LOGGED

BACKUP DATABASE Mydb TO DISK='D:\mydb.bak'

DBCC TRACEON(3604)

DBCC PAGE('MYDB',1,7,3)

From the output ,it is clear that none of the extents are changed due to bulk logged operations. Let us try to perform a bulk logged operation.

SELECT * INTO mydb..SalesOrderDetail FROM AdventureWorks2008.sales.SalesOrderDetail

DBCC PAGE('MYDB',1,6,3)

Now in the output, you can see two entries marked as MIN_LOGGED.The first one is due to the change in the IAM page and second one is the actual data inserted as part of Select Into operation.Now when we take a log backup, SQL server backups the transaction log changes along with extent marked as changed in the BCM file. After the log backup, the bits in the BCM pages are flipped to mark there is no extent changed due to the bulk logged operation. Let us take the log backup.

BACKUP LOG Mydb TO DISK='D:\mydblog.trn'

DBCC PAGE('MYDB',1,7,3)

From the output, we can see that SQL server flipped the bit in BCM page and none of the extents are marked as changed due to the bulk logged operation. Note that BCM pages are not cleared by taking a full backup or differential backup.The base of the transaction log backups is the previous transaction log backup. SQL server should know what changes are happened between two transnational log backup.If a differential or full backup cleared the BCM page,next transaction log backup can not track/recover the changes between the previous log backup and full/differential backup. In short a full backup or differential backup will not break the transaction log backup chain.

If you liked this post, do like my page on FaceBook

Tuesday, 23 July 2013

SQL Server : Understanding Differential Change Map (DCM) Page and Differential Backups

In our earlier posts, we have discussed about different types of pages.Data Pages, GAM and SGAM,PFS page and IAM page. In this post, we will try to understand the Differential Change Map (DCM) pages and differential backups.

Differential Change Map(DCM): SQL Server uses Differential Changes Map (DCM) page to track extent modified after the last full backup operation. DCM page is the 6th page in the data file.DCM page track the extents modified after the full backup.DCM has a bit for every extent it tracks. If the bit is 1, the corresponding extent is modified after the last full backup, if the bit is 0, there is no change in the corresponding extent after the last full backup.A DCM page can hold information of around 64000 extents. Precisely DCM pages repeat after every 511232 pages .One DCM page can track the changes of exactly 63904 extents.The second DCM page will be 511238th page.

Differential backups read the DCM pages to determine which extents have been modified after the last full backup. This greatly reduces the number of pages that a differential backup must scan. The length of time that a differential backup runs is proportional to the number of extents modified since the last full backup and not the overall size of the database.

Let us see an example:

CREATE DATABASE Mydb
GO
BACKUP DATABASE Mydb TO DISK='D:\mydb.bak'
GO
DBCC TRACEON(3604)
DBCC PAGE('MYDB',1,6,3)

FIG 1

From the output it is clear that, four extents which starts from 0th page to 32th page have changed after the last full backup. It is not necessary to have changes in all these 32 pages. But we can clearly say that , there were changes in at least four pages , one page from each extents.This might happen due to the changes triggered by full backup command in the internal tables

Now let us try to add some data to this database and again see the DCM page

SELECT * INTO mydb..SalesOrderDetail FROM AdventureWorks2008.sales.SalesOrderDetail
DBCC PAGE('MYDB',1,6,3)

FIG 2

As part of select we have create a table and inserted data into that. It will trigger changes in many of the system tables. As a result we can see many extents are marked as modified after the last full backup.Till the page number 1:176 , the changes are due to the internal objects modification. Pages from 1:176 to 1:1671 are got modified due to data inserted into the SalesOrderDetail table.When we take differential backup, SQL server just read the DCM pages and include the pages marked as changed in the differential backup. For example as we mentioned earlier,there might be only changes to four pages in the first entry (1:0 - 1:24) but differential backup include all 32 pages in the backup. Let us take a differential backup.

BACKUP DATABASE Mydb TO DISK='D:\Mydbdifferential.bak' WITH DIFFERENTIAL

DBCC PAGE('MYDB',1,6,3)

FIG 3

After the differential backup , there is no change in the DCM pages except two more extents are marked as changed.Before the differential backups, the extents 72-80 and 136-144 were marked as not changed. After the differential backup, these two extents are marked as changed. Refer Fig 2 and Fig 3.This is happened due the changes triggered by differential backup in the internal tables.From this it is clear that, differential backup will not clear the DCM page. In Short differential backup always include all the changes happened from the last full backup. Not the changes from the last differential backup.There is a misconception that, the differential backup contain the changes happened from the last differential backup which is completely wrong.

If we look into the FIG 3, we can see that the size of the differential backup file is 13452 KB.Let us try ti calculate it using the DCM page information.From the Fig 3 , we can see that DCM page has marked 1616 pages as changed. The calculation is given below

(32-0)+(64-48)+(88-72)+(168-112)+(1672-176)=1616

The size of the page is 8 KB. So the 1616 page account to 12928 KB (1616X8). Which is very close to the size of the differential backup file. The difference is overhead of the file header etc.

Now Let us take a full backup and see what will happen.

BACKUP DATABASE Mydb TO DISK='D:\MydbFull.bak'

DBCC PAGE('MYDB',1,6,3)

FIG 4

Now SQL server cleared DCM page.Everything is marked as Not Changed except the four extents 1:0 -1:32. No this backup will work as base for all subsequent differential backups.There is an option to take full backup with out disturbing the existing backup chain.

BACKUP DATABASE Mydb TO DISK='D:\MydbFull_Copy.bak' WITH COPY_ONLY

In this scenario SQL server will not reset the DCM pages. Still MydbFull.bak will be the base full backup for subsequent differential backups. This will be helpful in some scenario where you need to take full backup but do not want disturb the backup chain.

To summaries, SQL server tracks the changes to the extents through the differential change map(DCM) page.While taking differential backup, SQL server just backup the extents marked as changed in the DCM pages. It helps SQL server to speed up the differential backup process with out scanning through all pages to check it is modified or not after the last full backup.Differential backup will not clear the DCM page and differential backup contain all the changes happened from the last full backup. While taking full backup, SQL server reset the bits in DCM page to mark it is not changed.

If you liked this post, do like my page on FaceBook

Wednesday, 10 July 2013

SQL Server: All About SQL Server Statistics : Consolidated post

Statistics is the one of the important factor which helps the optimizer to choose the right execution plan.At the same time, many of us (so called DBA) will not give enough attention to statistics while troubleshooting the performance issues.It might be because of lack of knowledge in statistics or lack of mechanism to figure out the stale statistics is the root cause of the performance issue.

The query optimizer use the statistics to determine the rows returned in each step.The estimated rows information in the execution plan is calculated based on the statistics available on the columns.The statistics gives the distribution of data with in the column.With out statistics, the query optimizer can not be determine the efficiency of different plan.By using the information stored in the statistics , the query optimizer can make right choice in accessing the data.

Each statistics in SQL server store information about density vector and Histogram.

Density

Density is the ratio of unique values with in the given column or a set of columns.Read More
Histogram

Histogram gives the details of data distribution of leading column of the statistics. Read More

There are two types of the statistics. Index statistics and column statistics. Index statistics are created automatically when we define index on a table. The column statistics are always on single column and it is created automantically when the column is refered in a query. To create the a column statistics automatically ,the auto create statistics should be enabled in the database level.We have discussed in details about how the statistics are getting created, how the autoupdate and auto create statistics works, how stale statistics hits the performance in our post. Read here

SQL server update the statistics automatically after reasonable changes happened in the table. There is a threshold values for triggering auto update statistics.Is that default threshold is good enough for optimal performance? Do we really bothered about updating the statistics manually ? This post will answer for these question

In SQL server 2005 and 2008, it is difficult task to identify the statistics that need a manual updates. This post will give some idea to start with.

If you liked this post, do like my page on FaceBook

Monday, 8 July 2013

SQL Server: Part 5 : All About SQL Server Statistics : How to Detect outdated Statistics ?

In the last post, we have discussed about the auto update statistics threshold and concluded that auto updated statistics threshold are good enough to get optimal performance in some workload. In many cases,a manual update of statistics will help to obtain better performance. In this post let us discuss, how to detect the outdated statistics.

In SQL server 2005 onwards, SQL server uses the ColModCtr to keep track of the changes in the leading column of the statistics. Unfortunatly that is not exposed through any DMV or system view in SQL server 2005 or SQL server 2008. In SQL server 2008 R2 (SP2) onwards, sys.dm_db_stats_properties will give the details of statistics along with the changes in the leading column of the statistics.

For older versions of SQL server users, we need to depends on rowmodctr available in the sys.sysindexes. As per BOL : In SQL Server 2005 and later, rowmodctr is not fully compatible with earlier versions.In earlier versions of SQL Server, the Database Engine maintained row-level modification counters. Such counters are now maintained at the column level. Therefore, the rowmodctr column is calculated and produces results that are similar to the results in earlier versions, but are not exact.

The below query will give an estimate of changes in the statistics.

SELECT
TableName=OBJECT_NAME(i.OBJECT_ID)
,ObjectType=o.type_desc
  ,StatisticsName=i.[name]
  ,statisticsUpdateDate = STATS_DATE(i.OBJECT_ID, i.index_id)
,RecordModified=si.rowmodctr
  ,NumberofRecords=si.rowcnt
FROM sys.indexes i
JOIN sys.objects o ON    i.OBJECT_ID=o.OBJECT_ID
JOIN sys.sysindexes si ON    i.OBJECT_ID=si.id
  AND i.index_id=si.indid
WHERE  o.TYPE <> 'S' AND STATS_DATE(i.OBJECT_ID, i.index_id) IS NOT NULL
UNION ALL
SELECT
TableName=OBJECT_NAME(o.OBJECT_ID)
,ObjectType=o.type_desc
  ,StatisticsName=s.name
  ,statisticsUpdateDate= STATS_DATE(o.OBJECT_ID, s.stats_id)
,RecordModified=si.rowmodctr
  ,NumberofRecords=ir.rowcnt
FROM sys.stats s INNER JOIN sys.objects o ON s.OBJECT_ID=o.OBJECT_ID
JOIN sys.sysindexes si ON s.OBJECT_ID=si.id AND s.stats_id= si.indid
INNER JOIN (SELECT id,rowcnt FROM sys.sysindexes WHERE indid IN (0,1)) IR
ON IR.id=o.OBJECT_ID  WHERE  o.TYPE <> 'S'
AND (s.auto_created=1 OR s.user_created=1)AND STATS_DATE(o.OBJECT_ID, s.stats_id) IS NOT NULL

By keeping this as a base, based on the workload/types of query running, we can schedule the statistics update at appropriate interval instead of blindly updating all statistics.

If you liked this post, do like my page on FaceBook