Wednesday, 2 December 2020

SQL Database Corruption, how to investigate root cause?


In this article, we will discuss the MS SQL Server database corruption. 

So, first, we need to understand what the cause of corruption is. Usually, in all the scenarios of SQL Server database corruption, the main corruption cause is related to the IO subsystem level, which means that the root cause is a problem with the drives, drivers, and possibly even drivers. And while the specific root causes can vary widely (simply due to the sheer complexity involved in dealing with magnetic storage). The main thing to remember about disk systems is that any person in the IT knows that all major operating systems. It ships with the equivalent of a kind of Disk-Check utility (CHKDSK) that can scan for bad sectors, bad entries, and other storage issues that can infiltrate storage environments.


If you are beginner to Microsoft SQL Server. You could do the following things to solve the database corruption. And these tricks can’t help you out:

  • Reopen SQL Server
    • It just holds up the issue and gives raise to the system to run through crash restoration on the databases. Not to mention, in most systems, you will not be able to do this right away and will hold up the issue further
  • Delete all the procedure cache
  • Separate and moving the Microsoft SQL server to a new server
    • When you do this you will feel pain because SQL Server will fail to attach on the second server and on your primary.  At this moment you have to look into "hack attach" SQL Server and I can understand it can be a very painful experience.

If you know what will be helpful to solve any problem or what can't be helpful. It requires that you have to be prepared every time for these kinds of problems.  It means that you have to create a database that is corrupt and try everything to recovery that database with the slightest data loss.

You may read this: How to Reduce the Risk of SQL Database Corruption

Root cause analysis:

Root cause analysis may be a crucial part of this method and should not be unmarked regardless of however you pass through the info. This can be a vital step in preventing the matter from occurring once more and doubtless earlier than you're thinking that. In my expertise, once corruption happens, it's absolute to happen once more if no actions area unit is taken to rectify the matter. To boot, this is often seemed to be worse the second time.

Now, I'd counsel, that though you think that you recognize the explanation for the corruption (E.G. power outage with no UPS) investigate the subsequent sources anyways. Perhaps the outage was simply helped and there have been warning signs occurring. To begin, I perpetually recommend these places to seem.

  • Memory and disk medicine to create certain there aren't any issues with the present hardware
  • SQL Server error logs
  • Windows event viewer
  • While rare, sit down with your vendors to examine if they need to have issues with the computer code you're using
  • Software errors, believe it or not, Microsoft has been known to cause corruption. See KB2969896. this is often wherever gap tickets with Microsoft also are helpful

The event viewer and SQL server error logs may be viewed along.

But, I suggest dividing these out to the system administrators as they regularly have more manpower on their team to review these. 

Helpful Tip:

In fact, even once knowing what the matter is, I forever counsel gap a price tag with Microsoft as a result of they're going to not solely provide an additional set of eyes on the problem however additionally their experience on the boot, Microsoft will and can assist you with the next steps to assist notice the foundation reason behind the matter and wherever the corruption originated from.

Corruption problems:

If the database is corrupt, it is possible to repair the database using SQL Recovery Software. This software will allow repairing the database in case of corruption.


So finally, after this article, we learn many things about database corruption and how to resolve that corrupt database. Most of the things are too common, and now you can solve this kind of common corruption. With time when will you finish this series, the goal will be that when you find out you have corruption, it is coming from your alerts, not an end-user, and you will have a procedure to let your managers know where you sit and what the next steps are. Because of this, you will get a lot of benefits, and also it allows you to work without having someone breathing down your neck frequently.

Friday, 6 November 2020

[Solved] SQL Backup Detected Corruption in the Database Log

Summary: In this article, we will discuss about the ‘SQL Backup Detected Corruption in the Database Log’ error. It will also describe the reason behind the error and manual workarounds to resolve it. The article also explains an alternative solution that can be used to restore the database and its transaction log backup – when the manual solutions fail. 

When performing transaction log backup for a SQL database, to restore the database after network maintenance or in the event of a crash, you may find the backup job failed with the following error:

Backup failed for Server xxx (Microsoft.SqlServer.SmoExtended)

System.Data.SqlClient.SqlError: BACKUP detected corruption in the database log. Check the errorlog for more information. (Microsoft.SqlServer.Smo)

The error message clearly indicates that the transaction log is damaged (corrupted). Checking the SQL errorlog for more details on the error shows:

2020-11-01 13:30:40.570 spid62 Backup detected log corruption in database TestDB. Context is Bad Middle Sector. LogFile: 2 ‘D:\Data\TestDB_log.ldf’ VLF SeqNo: x280d VLFBase: x10df10000 LogBlockOffset: x10efa1000 SectorStatus: 2 LogBlock.StartLsn.SeqNo: x280d LogBlock.StartLsn.
2020-11-01 13:30:40.650 Backup Error: 3041, Severity: 16, State: 1.
2020-11-01 13:30:40.650 Backup BACKUP failed to complete the command BACKUP DATABASE TestDB. Check the backup application log for detailed messages

However, the FULL database backup completed successfully and even running DBCC CHECKDB integrity check didn’t find any errors.

What Could Have Caused the SQL Transaction Log Backup to Fail?

A transaction log (T-log) backup allows restoring a database to a certain point-in-time, before the failure occurred. It does so by taking a backup of all the transaction logs created since the last log backup, including the corrupt portion of the T-log. This causes the backup to fail.

However, a FULL database backup only has to back up the beginning of the last active part of the T-log – at the time the backup is taken. Also, DBCC CHECKDB requires the same amount of log as the FULL database backup – at the time of the db snapshot was generated. This is why the full backup executed successfully and no errors were reported by DBCC CHECKDB.

Manual Workarounds to Backup Detected Log Corruption in SQL Database

Following are the manual workarounds you can apply to resolve the SQL backup log corruption issue:

Workaround 1: Change the SQL Recovery Model from FULL to SIMPLE

To fix the ‘SQL Server backup detected corruption in the database log’ issue, try switching the database to the SIMPLE recovery model.

Switching to SIMPLE recovery model will ignore the corrupted portion of the T-log. Subsequently, change the recovery model back to FULL and execute the backups again.

Here’s the steps you need to perform to change the recovery model:

Step 1: Make sure there are no active users by stopping all user activity in the db.

Step 2: Change the db from FULL to a SIMPLE recovery model. To do so, follow these steps:

  • Open SQL Server Management Studio (SSMS) and connect to an instance of the SQL Server database engine.
  • From Object Explorer, expand the server tree by clicking the server name.
  • Next, depending on the db you are using, select a ‘user database’ or choose a ‘system database’ by expanding System Databases.
  • Right-click the selected db, and then select Properties.
  • In the Database Properties dialog box, click Options under ‘Select a page’.

  • Choose the Simple recovery model from the ‘Recovery model’ list box, and then click OK

Step 3: Now set the db back to the FULL recovery model by following the same steps from 1 till 5 above. Then, select Full as your recovery model from the list box.

Step 4: Perform a FULL database backup again.

Step 5: Take log backups again.

Hopefully, performing these steps will help you perform the transaction log backup without any issue.

Note: This solution won’t be feasible if you’re using database mirroring for the database for which you have encountered the ‘backup detected log corruption’ error. That’s because, in order to switch to the SIMPLE recovery model you will need to break the mirror and then reconfigure the db which can take significant amount of time and effort. In this case, try the next workaround.

Workaround 2: Create Transaction Log Backup using Continue on Error Option

To complete executing the backup of T-log without any error, try running log backup of SQL database with the CONTINUE AFTER ERROR option. You can either choose to run the option directly from SSMS or by executing a T-SQL script.

Steps to run the ‘Continue on Error’ option from SSMS are as follows:

Step 1: Run SSMS as an administrator.

Step 2: From ‘Back Up Database’ window, click Options under ‘Select a page’ on the left panel. Then, select the ‘Continue on error’ checkbox under the Reliability section. 

Step 3: Click OK.

Now, run the log backup to check if starts without the backup detecting an error in SQL database.

Ending Note

The above-discussed manual solutions won’t work if the transaction log is missing or damaged, putting the database in suspect mode. In that case, you can try restoring the database from backups or run Emergency-mode repair to recover the db from suspect mode.

However, none of the above solutions might work in case of severe database corruption in SQL Server. Also, implementing the ‘Emergency-mode repair’ method involves data loss risk. But, using a specialized SQL database repair software such as Stellar Repair for MS SQL can help you repair a severely corrupted database and restore it back to its original state in just a few steps. The software helps in repairing both SQL database MDF and NDF files. Once the MDF file is repaired, you can create a transaction log file of the database and back it up without any encountering any error.

Saturday, 20 June 2020

Stairway to Microsoft Azure SQL Database Part 5 : Azure SQL serve Server Firewall and virtual Network Setting

In our last blog, we were discussing about Azure SQL server and we learned that Azure SQL server is a logical construct that acts as a central administrative point for the databases hosted under that server. One of the important aspects of the server is managing the connections. All these connection settings are found in the firewall and virtual network under the Security tab. Let us discuss various connectivity options in this blog.

Public Network Access: By default, Azure SQL server provides a public endpoint to access SQL Server through whitelisted IP addresses. You can access your database from anywhere in the world through the internet as long as your IP address is whitelisted. You can whitelist a single IP address or range of IP addresses using the option available under the page "Firewall and Virtual Network settings". As per the security best practices, you should never allow public endpoint to your database. To disable the public endpoint associated with your server, you can enable the "Deny Public network" option. Azure portal will not allow you to disable the public network option unless you have configured a private endpoint connection. This makes sense as without a private endpoint you will not able to connect the server once you disable the public network access. We will discuss private endpoints and its configuration in our next blog. You can whitelist the IP address in two-level, on the server level, and another one in the database level. When you whitelist the IP address from the portal, it will whitelist at the server level. You can find the whitelisted IP address information using the DMV sys.firewall_rule under the master database. You can also whitelist the IP address using the procedure sp_set_firewall_rule. To white list the IP in the database level, you can use the procedure sp_set_database_firewall_rule. Database firewall rule can be configured for the master database as well as for the user databases. Database level firewall setting makes more sense while using the contained user. The below diagram provide a better understanding of this setup. While establishing the connection, Azure SQL checks database level whitelisting first and if it is not available will check server-level whitelisting.
Azure Firewall Setup
Azure SQL Firewall 

Minimum TLS version: TLS is a cryptographic protocol that provides end to end communication security over the network. The minimal TLS version setting allows you to control the version of TLS used by the server. If you set the Minimum TLS version as '>1.2' and try to connect from the client with a TLS version less than 1.2, your connection will fail with an error "Login failed with invalid TLS version". If your applications are using the unencrypted connection, do not set any minimum TLS version. You can use the 'encrypt_option' column available in DMV sys.dm_exec_connections to check the connection from the applications is encrypted or not. Azure suggest to have   "Encrypt=True;TrustServerCertificate=False;" in the connection string. With this, the end to end communication between the server and client will be encrypted. By setting the TrustServerCertificate to false, it forces the client machine to verify the certificate provided by the server. That helps to avoid any kind of spoofing or man in the middle attack.

Connection Policy: Azure provides three connection policy to set. 
  1. Proxy
  2. Redirect 
  3. Default
These connection policies define the communication channel between the server and the client machine. When you try to connect Azure SQL server, the request first reach to Azure SQL gateway. Azure has a set of gateway IP addresses for each region which keeps listening on the port 1433. The number of gateway in each region depends on the workload in that region. For example, East US2 has 5 gateway IP addresses whereas Australia central region has only one gateway IP address. You can do telnet to these IP addresses through port number 1433. Once it reached the gateway, based on the connection policy setting, request forwarded to the actual SQL server node hosting your database or return the redirection information back to the client. You can do nslookup to your Azure SQL server and it provides you the detail of associated gateway. I did nslookup to one of my Azure SQL servers hosted in the US East2 region. It returned the one of the gateway IP address and its DNS name. Now I can do telnet to this IP address using the port number 1433. 

To establish the connection from your local machine or from the Virtual machine hosted in Azure, it should allow the outbound traffic to all the Azure gateway IP  address of that region through port number 1433. The gateway IP address of each region is published on their website. Let us look into each connection policy in detail.

        Proxy:  With this policy, when the client machine initiates the connection to Azure SQL server, first it will connect to one of the gateways in that region and forward the connection to the actual compute node in which your SQL database/server is hosted. In the below case (diagram), I am trying to connect to my server "". When I did the nslookup to my server name, it resolved to which is one of the gateway IP addresses in the US East 2 region. When the connection is initiated from the client to the server, it will get connected to the gateway and gateway forward the connection to the node in which my database is hosted. In this case, the gateway act as a proxy between the client machine and the server. The client machine does not have any idea to which the connection is forwarded.

Proxy Connection

            Redirect:  With this connection policy, when the client initiates the connection, it will connect to the gateway and gateway return the redirection information to connect the node in which the database is hosted to the client. Using that information, the client directly connects to the node in which the database is hosted bypassing the gateway. Azure claims that this will reduce the latency. To make this work, the client machine :
  • Should allow outbound communication to the SQL database gateway IP addresses on port 1433 (This is for the initial connection to get the redirection information)
  • Should allow outbound communication to all Azure SQL IP addresses (nodes) in the region on a port range of 11000 to 11999 (All the SQL node in azure listen to one of port between 11000 and 11999). This can be easily done using the service tag. It is not practical to add numerous IP addresses of the node manually. 

In the below example, the client machine is initiating the connection to connect the server "". The initial request will go to the gateway. On receiving the request, the gateway returns the redirection information (basically the name of the node in which the database is hosted), and the client uses that information to establish the direct communication with the node in which the database is hosted. There onwards all request in that connection goes directly to the node bypassing the gateway.

Redirection Connection 

    Default: This is the default setting when you create the Azure SQL server unless you explicitly change to Proxy or redirect. With this setting, all the connections initiated from the Azure use the redirect connection policy and use the proxy connection policy for all connections initiated from outside Azure.

Allow Azure Services and resources to access this server: When this setting is enabled, all the azure service and resources in that region can access this server. For example, if we have VM hosted in azure platform, that VM can access the SQL server without whitelisting the IP address of the VM. Keep in mind that, this is not restricting to the resource hosted in your account. Enabling this feature widely opens the communication from all the resources hosted in the Azure platform including the other customer's account also.

Connect  VNET/subnet: This is one of the options where you can attach the subnet to this sever. This allows all resources hosted in that subnet to connect to the Azure SQL server without whitelisting the IP address. Moreover, this enables communication through the Azure backbone networks. Microsoft suggests to use this feature to enable a secure and fast connection. Before attaching the subnet, note that we need to enable the service endpoint for the SQL server at the subnet level.
These are the various option available under the Firewall and Virtual network setting. In our next post, we will discuss enabling private endpoints.

Sunday, 31 May 2020

Stairway to Microsoft Azure SQL Database Part 4 : Azure SQL server , an overview

In the last few posts, we were discussing about database deployment model and purchase model. Out of the three deployment models, managed instance works almost like on-premises SQL server and hence we are not going to discuss much about the managed instance. On the other side, a single database and elastic pool works a completely different ways.

while creating a single database or elastic poor, you need to attach those into the Azure SQL server. In other words, you need to create a SQL server before creating the SQL database. While searching for services in the Azure portal, you can search for 'SQL database' and 'SQL server'. In this post, we are going to discuss about the SQL Server, which is a logical container for your single Azure databases and Elastic Pool.

As per the Azure documentation, "a SQL server is a logical construct that acts as a central administrative point for a collection of databases". At the server level,  you can administrate the following components:
  • Logins
  • Connectivity 
  • Firewall rules 
  • Auditing rules 
  • Threat detection Policy 
  • Auto Failover group
  • Transparent Data Encryption 
  • Auditing 
  • Backup Management 
When you create an empty SQL Server in Azure, it will not cost you anything until you create a single database or elastic pool under this  SQL server. To create an empty SQL server, search for "SQL Server"  in the Azure portal, and follow the instructions. While creating the SQL server, it will ask for a user name and password. This is the only SQL login that you can create from the Azure portal. This login has the highest privilege in the server and can access all databases in that server without providing explicit permission in each database. Azure SQL server does not support window authentication but supports Azure active directory authentication. For enabling the Azure active directory authentication, you need to connect your server with Azure AD. You can achieve this by setting the Active directory Admin from the Azure portal. You can find this option under the Settings section of the server. This can be an individual account or a group. This account also will have the highest privilege and can access all the databases under that server without providing explicit permission.  These are the only two logins that you can create from the Azure portal. There is no option to change the SQL login once you have created it but you can change the Active directory admin at any time. These are the only two logins will have access to all the databases created under this server. There is no option to create server wide login which will have access to all the databases under this server. Still, you can create server wide login and can explicitly create the corresponding user in each database like we used to do in the on-premise installation. 

Azure SQL database provides two types of user :

Login and user: This is the same as we used to do in on-premise installations. Create a login and create corresponding user in each database. For example, connect to master database 

create login njohn with password ='Password@123'
--Connect to DB1
create user njohn from login njohn 
--Connect to DB2 
create user njohn from login njohn 

In this case, the login information is stored at the server level and the password remains same across all databases. While copying the database DB1 and DB2 to another server, you also need to copy the login and need to map the SID properly.

Contained user: These are the users created under the database without having the logins. In this case, the user informations are stored in the database level. After the restoration of the databases, DB1 and DB1 to another server, we don't have to copy the login information separately as the user password and other information are stored in the user database. In this scenario, the same user can have different passwords to connect with each database. For example 

--Connect to DB1
create user njohn with password ='Password1' 
--Connect to DB2 
create user njohn with password ='Password2'

When you create the SQL server, by default a virtual master database will be created. This is the only system database that you can see in the Azure SQL server. If you look into this database, you can see two additional database role added with Azure SQL server :

dbmanager: Members of this database role can create and delete the databases. When a member of the dbmanager role creates the database, becomes the owner of the database, and will have full access to that database as dbo. The members of this role will not have access to the database which is not created by them (owner) unless you explicitly created users under the database for those logins like we do in the on-premise installation. This role exists only in the master database and only the users of the master database can be added into this role. The ideal method is, create a login and create a user for that login and add that user to the dbmanager role. Contained database users in the master database also can be added to the dbmanger role, but they will not have permission to create new databases.
loginmanager:  can create and delete logins in the master database. Note that, members of this role does not have the permission to create users in the master database. 

As mentioned earlier, the Azure SQL server is a logical construct and does not represent a specific VM or hardware like on-premise installation. The databases associated with an Azure SQL server, maybe running in different VMs which is not visible for us. In our on-premise installation, the SQL server is installed on  VM and all databases created in that database server will be running on the same VM. Whereas Azure SQL server is a logical server and can deploy the databases under that server in multiple VMs. The below diagram helps you to understand the concept in a better way.

We have learned in our earlier post, each Azure SQL database will have one or more standby replica for high availability depending on the service tier. The database name marked in red are standby replica. Azure SQL inherently multi-tenant and it needs to share its resource with multiple clients. You can see one extra database, DB5 which is not there in our on-premises diagram. When you connect to Azure SQL server, "", based on the database need to be connected, it will redirect the connection to data node hold the primary replica. This is the reason why we need to specify the database name while connecting through SSMS and not allowing us to change the database context using the USE statement. To change the context of the database, it needs to reroute the traffic to different data nodes. If we are not specifying the database name, it will connect to the virtual master database. If the 'datanode1' goes down for some reason, DB1, and DB2 comes online as the primary in Datanode2 and standby replica will get created in another data node. During this time, the connection to these two databases will be broken and the user will get an error message. To overcome these transient errors and to give a smooth experience to the application users, Microsoft suggests to implement retry mechanism for application connecting to Azure SQL. This is also beneficial to 
  • Handle the errors happens while azure perform the maintenance or patch update.  Azure SQL Database is a managed service and Azure performs the regular patch and version update. User does not have control over the date and time of these maintenance windows.
  • When we scale up the SQL database, there is a possibility to failover the database to another node due to a lack of resources in the current node. In that scenario, the existing connection will be terminated.

In our next post, we will discuss about connection architecture and different settings available under the "firewall and virtual network"

Monday, 25 May 2020

Ghost is taking full backup and breaking my backup chain . Be careful with Azure VM backup on VM having SQL server installation

I believe that I have strong knowledge in SQL Server backup and recovery process, but recently there was an incident which shook my confidence. I was working on a migration for a noncritical production server from SQL server 2008 to 2017. As we need to upgrade the OS also, we decided to go ahead with the side by side migration. The backup plan on the existing server was full backup on a weekly basis. So I decided to restore the previous week's full backup well in advance and during the migration window to take a differential backup and apply that on the new server and I communicated this process will take approximately 15 minutes. As I was pretty confident in this process, I didn't put any effort to do a dry run prior to the migration day and everything set for migration.

During the migration window, as a first step, I fired the command to take differential backup on the existing server. To my surprise, it throws the following error:

Msg 3035, Level 16, State 1, Line 1
Cannot perform a differential backup for database “Mydatabase”, because a current database backup does not exist. Perform a full database backup by reissuing BACKUP DATABASE

I was not able to believe this. Immediately I queried the backup set table and I saw multiple full backup records after the previous week full backup. I ran a set of queries and it revealed that some backup operation is happening without DBA's knowledge to a virtual device at a regular interval. As per the Microsoft documentation, Device_type 7 is a virtual device. 

I don't have time to dig further, I continued the migration by issuing full backup and restoring the same. This delayed the entire migration process by 3-4 hours and it was an embarrassing situation. 

After the migration, I decided to figure out the issue and went ahead and searched the SQL error log and I found the following messages along with the backup message.

  • I/O is frozen on database Mydatabase. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup
  • I/O was resumed on database Mydatabase. No user action is required.
  • Database backed up. Database: Mydatabase, creation date(time): 2012/02/02(09:42:18), pages dumped: 578, first LSN: 528:626:37, last LSN: 528:643:1, number of dump devices: 1, device information: (FILE=1, TYPE=VIRTUAL_DEVICE: {'{4618D52D-9C83-400A-A698-AB391E30369A}26'}).This is an informational message only. No user action is required.
The I/O frozen error was just before the backup and that triggered me to think about disk snapshot backups running behind the scene and contacted the infra team to understand this. On contacting them, they told me that, they have enabled the Azur VM backup on all VM on a daily basis and the Azure VM backup schedule on the server in question was matching with the above error message. With that, we were sure that the azure VM is the culprit behind this ghost SQL server backup. On reading further on azure VM backup, we learned that Azure VM backup uses the VSS (Volume Shadow copy) service to take the backup and SQL server is a VSS aware application and it takes a full backup of  SQL Server databases using SQL writer service. Before taking backup, it will freeze the disk operation to get a consistent backup. 

What is VSS backup? 
Volume Shadow Copy is a technology included in Microsoft Windows that can create backup copies or snapshots of computer files or volumes, even when they are in use. It is implemented as a Windows service called the Volume Shadow Copy service. It allows backup applications to safely back up locked and open files. 

Components of VSS:

  • VSS service: Part of windows operating system and installed as by default. This Orichistate the communication between the backup tool and VSS aware applications.
  • VSS requester: The application who initiate the backup request to the VSS service. This can be Microsoft or non-microsof backup applications such as Windows server backup utility, any third party VM backup tool, etc.
  • VSS writer   This is the component that guarantees a consistent data set to back up. For example, Microsoft provides the VSS writers for SQL server, Exchange server, windows registry, etc.  Many non-Microsoft applications also provide VSS components that need to guarantee data consistency during back up. 
Finally, we understood what was happening behind the scene and decided to stop the backup operation initiated by the Azure VM backup as DBA's are responsible for SQL server backup and recovery. We can not stop the Azure VM backup as the infra team need this for recovering the instance in case of disaster. So we found two solutions.
  • Stop and disable the SQL writer service. With this VSS will not able to take the SQL server database backup but still takes the VM backup.
  • Add below registry key    [                HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\BCDRAGENT] 
     "USEVSSCOPYBACKUP"="TRUE"  This will force the VSS to take COPY_ONLY backup which will not break the backup chain 
I hope this piece of information will help you one day. If you are more curious about VSS service, refer the below links:


Sunday, 17 May 2020

Stairway to Microsoft Azure SQL Database Part 3 : Purchase Models DTU vs vCore continue

In the last post, we have discussed the DTU purchase model. Azure also provides another purchase mode which is called vCore. vCore purchase model available with all three deployment models (Managed instance, Single database, and elastic pool). 

vCore purchase model provides :
  • Higher compute, memory, IO, and storage limit.  
  • Better control over opting for the hardware configuration.
  • Option to bring your own license using the Azure Hybrid Benefit(AHB) and to reserve the instance to reduce the cost.
  • Flexibility to choose the compute and storage independently.
  • Storage and compute charged separately. In the DTU model ,limited amount od storage bundled with the computing power.  Storage allocated above that is charged separately.

Service Tier: Like in the DTU model, the vCore model also provides multiple service tier to match with your performance requirements and budget. vCore purchase model provides the following service tiers.
  • General Purpose 
  • Hyper-Scale
  • Business Critical
General Purpose :
  • Offer Budget-oriented, balanced and scalable compute and storage option.
  • Suited for general workload.
  • Available with all three deployment model (Single, elastic and managed)
  • Option to select 2 to 80 vCore. The options are 2 ,4 ,6 ,8 ,10,12,14,16,18, 20,24,32,40 and 80 vCore
  • Storage upto 4 TB (0.12 USD per GB). 
  • Remote storage 
  • 500 to 20000 IOPS
  • 2 to 10 ms latency 
  • For high availability, it provides one replica.
  • No read replica.
  • 7 to 35 Backup retention 
Hyperscale :
  • Available only for single database deployment model
  • Provide highly scalable storage up to 100TB 
  • For rapid read scale-out deployment, provide up to 4 read replica 
  • Read replicas are charged.
  • Option to select compute power from 2 to 80 vCore (options are the same as in general-purpose) 
  • Use local SSD storage.
  • Snapshot based backup in Azure remote storage. Restore use these snapshots for fast recovery. 
  • Backup/restore does not depend on the size of the database and does not affect the IO or compute as it is performed using the snapshot. Hence backup and restorations are instantaneous.
  • For example. 50 TB database can be restored in less than 15 minutes. You can watch the video here 
  • Depending on the workloads, IOPS between 500 to 204800 with 1 to 10 ms latency.
  • Most suited for databases that will not fit into the 4TB storage limit of general-purpose and business-critical service tier.
  •  Hyperscale database engines use distributed function architecture and use the following components (we will discuss the architecture of hyper-scale in a different blog post). This architecture is different from the existing database engine architecture 
    • Compute 
    • Page Server
    • Log Service 
    • Aure storage 
Business Critical  :
  • Available with all three deployment model (Single, elastic and managed)
  • Option to select 2 to 80 vCore. The options are 2 ,4 ,6 ,8 ,10,12,14,16,18, 20,24,32,40 and 80 vCore
  • Offers business applications the highest resilience to failures by using several isolated replicas.
  • It keeps 4 replicas for better high availability 
  • To offload the read workload from the writable replica, provide read replica without additional charge.
  • Use local SSD storage for better IO performance.
  • Provides up to 4 TB storage with 1 -2 ms latency (0.25 USD per GB)
  • 5000 to  204800 IOPS
  • 7 to 35 days backup retention.

Compute Tier: Under the service tier,  azure SQL server provides the compute tier. For managed instance and elastic pool deployment models, azure provides only provisioned compute tier and this option will not be available to choose. For the single database deployment, there is one more additional compute tier called Serverless. Let us see the difference between these two compute tier:

Provisioned : 
  • compute tier is provisioned through out the lifecycle of elastic poo/ single database/managed instance and billed hourly basis.
  • There is no change in the billing based on the workload on the database.
  • Suitable for regular usage patterns with higher average compute utilization over time.
  • Compute scaling should be done manually (increasing the vCore)
Serverless :
  • Suitable for Intermittent, unpredictable usage with lower average compute utilization over time.
  • Automatically scale up the compute based on the workload and billed per second.
  • Automatically pause the database if there is no activity for a preconfigured time (Auto pause delay)  and start automatically when the wake-up events happen. During this period compute is not charged.
  • While provisioning, we need to define the minimum and maximum vCore. 
  • Based on the workload, the compute tier is scaled up to the max vCore defined. Compute is billed per second.
  • Auto pause delay is a configurable parameter that defines the required inactive period before pausing the database. This can vary from 1 hour to 7 days. For example, if we define the auto-pause delay as 2 hrs, then if there is no activity in the database for 2 hours, then the database will go to pause state and will not be charged for computing until the next wake up event. Till that point, only the storage cost will be charged.
  • The minimum and maximum vCore are configurable parameters that define the range of computing power available for the database. The maximum vCore can vary between 1 to 16 vCore. (1,2,4,6,8.1,10,12,14 and 16 are possible options) . The minimum vCore can vary from 0.5 vCPU to maximum vCore configured. Each vCpu provides 3GB of memory. If the workload requires memory more than that will be charged.
  • Cost
    • Compute cost and storage costs are charged separately. Storage cost remains the same throughout the life cycle unless you scale up the storage.
    • When the compute usage is between the min vCore and max vCore configured, compute cost is based on vCore or memory.
    • When the compute usage is below the min vCore setting, it will be still charged for min vCpu/ Memory.
    • For the duration of the paused state, Computing is not charged. Only storage will be charged.
  • Auto pause and auto-scaling
    • When the database is in a paused state, the first wake up event (example connection request)  fails with the error message "database is unavailable" with error code 40613. It will take approximately one minute to resume the operation. Applications that are working with a serverless database should have mechanisms to handle this error message and retry the connection after a minute.
    • When the load on the database increases, azure automatically scales up the compute / memory up to the max vCore configured. while scaling up the compute, at the end of the process all connections to the database are dropped. The applications which use the serverless database should have a mechanism to handle these error and try to reconnect after a specific time to provide smooth user experience to the end-users. 
  • Billing 
    • Use per second billing.
    • Each vCore in action provides 3GB memory. If the workload uses more memory than the memory bundled with the vCore, billing will be based on the memory. For example, for a specific time interval, the load on the database is using  2 vCore and 10 GB  memory. In this case, billing will be based on the memory as the memory utilization is above 6 GB (2vCore*3 GB). To derive the number vCore to be billed from the memory utilization, memory utilized will be divided by 3 (3GB for each Vcore). In our case, it will be billed for 10GB/3 = 3.33 vCore. The below table gives you a better understanding of the billing. Let us consider one database, that is provisioned with 3 vCore as minimum vCore and 16 as maximum vCore.The auto pause delay is configured as 2 hrs. Each vCore seconds is charged  at 0.000145 
Serverless Billing calculation

The total vCore seconds are 389952 and it will cost 56.54 USD (0.000145 USD * 389952 vCore seconds) for the workload mentioned in the above table which spread across 24hrs.

  • Best suited for :
    • Single database with intermittent and unpredictable usage patterns.
    • The database workload is only during specific intervals. So after the auto-pause delay, the database enters into stop state and not charged for the compute.
    • Workload is not sensitive to the response time and the application is capable fo handling the errors while resuming the service and auto-scaling
Apart from connecting to the database, there are many other events that resume the database from a paused state. You can find those events in the Azure documentation page 

Hardware Configuration Azure provides multiple hardware options to choose from under the service tier. These options differ from region to region. Commonly available hardware options are 

  • Gen4 
    • Provide 7 GB memory for each vCore.
    • Approaching the end of life by Jan 2023
  • Gen5 
    • Has new generation hardware
    • Has network acceleration and provide better IO performance on remote storage  (General purpose service tier)
    • It uses faster local SSD disk than Gen4
  • M-Series (preview)
    • New Memory-optimized hardware 
    • Provide 29GB per vCore.
    • It provides up to 128 vCore.
    • In full capacity memory can be nearly 4 TB (128 vCore X 29 GB per vCore).
  • Fsv2-series (Preview)
    • Compute-optimized hardware.
    • Low CPU latency and high clock speed for the most CPU demanding workloads.
    • Can deliver more CPU performance per vCore than Gen5.
    • The 72 vCore size can also provide more CPU performance for less cost than 80 vCores on Gen5.
    • provides less memory and tempdb size per vCore than other hardware. so workloads sensitive to those limits may want to consider Gen5 or M-series.

The below diagram provides you a visual representation of vCore purchase model options for Single database and Elastic pool.

vCore purchase model

vCore Purchase Model

Managed Instance deployment model also provides General Purpose and Business Critical service tier. Under the service tier, it provides Gen4 and Gen5 hardware. 

Each combination of these options has many other characteristics like tempdb space allocated, log space allocated, IOPS, no of concurrent session, log write throughput, etc.   All those detailed information can be found in the following Azure documentation.

I hope, this post helped you understand the vCore purchase mode in detail.  

Sunday, 3 May 2020

Stairway to Microsoft Azure SQL Database Part 2 : Purchase Models DTU vs vCore

In the last post, we have discussed about different deployment options available with Azure SQL server. I hope you got a good understanding of various deployment models. In this post let us discuss the purchase models available with each deployment model.

Azure SQL database provides multiple purchase models based on the deployment model to accommodate your performance requirements and budgets. At a high level, these purchase models are divided into two:
  • DTU or eDTU purchase model
  • vCore purchase model
Each of these purchase model provides multiple service tier for more customization of your performance requirements and budgets.
  • DTU or eDTU 
    • Basic
    • Standard
    • Premium  
  • vCore 
    • General Purpose
    • Hyperscale
    • Business Critical
DTU and eDTU purchase model 

The database transaction unit (DTU) is a specific composition of compute power, memory, storage, and IO. This purchase model provides us all the resources as a bundle which includes a specific amount of compute power, storage, and IO. It provides the simplicity and full control of your budget. This purchase model available with single database and elastic pool deployment model. As mentioned earlier, Azure provides multiple service tier to match our performance requirements and budget.
  • Basic tier: 
    • Suitable for very low CPU intensive workload. 
    • Provides 5 DTU with 2GB of storage. 
    • Each DTU provides 1-5 IOPS. 
    • 35 days backup retention. 
    • Each DTU costs 1 USD per month. 
    • Suitable for initial design and developments. 
  • Standard tier: 
    • Suitable for low and medium CPU intensive workload. 
    • It provides a DTU range between 10 and 3000 DTU with 250 GB storage. 
    • Additional storage up to 1 TB with additional cost. To go beyond 250 GB, we have to opt for 100 or more DTU. 
    • 35 days backup retention. 
    • Each DTU provides 1-5 IOPS. 
    • Each DTU costs 1.5 USD per month.  
    • The range of DTU in this tier is divided into 9 compute size: S0 10 DTU), S1 20 DTU, S250 DTU,S3 100 DTU, S4 200 DTU, S6 400 DTU, S7 800 DTU, S9 1600 DTU, and S12 3000 DTU. There is no option for us to randomly select the DTU in the range of DTU.
  • Premium Tier: 
    • Suitable for medium and high CPU and IO intensive workload. 
    • Provides DTU range between 125 and 4000 with a storage of 500 maximum to 4 TB depends on DTU.
    • Each DTU unit provides 25 IOPS. That makes it suitable for IO intensive workload. 
    • It also provides better IO latency (2ms for read and write) compared to basic and standard (5 ms for read and 10 ms for write).
    • 35 days backup retention. 
    • Support column store index and in-memory OLTP
    • DTU range divided into multiple compute sizes. p1 125 DTU,p2 250 DTU ,p4 500 DTU, p6 1000 DTU p11 1750 DTU, p15 4000 DTU 
    • Premium tier also supports read scale-out to load balance the read workload with no extra cost. As part of high availability architecture, each database in the premium service tier is automatically provisioned with multiple secondary replicas. These secondary replicas are provisioned with the same compute size of the primary replica. Enabling this feature, allow you to load balance your read-only workload using one of the read replica instead of read-write replica. This is very useful for read-intensive application which can segregate their read and write operations by passing the readintent=true property in the connection string. 
    • Premium tier also supports Zone redundancy. As part of high availability architecture, each database in the premium service tier is automatically provisioned with multiple secondary replicas in the same data center. This architecture helps to protect us from the server failure. By enabling the Zone redundant feature, the replicas are placed in different availability zones, it makes the databases resilient to the data center outage 
For more accurate details of each service tier, refer the following Azure documentation :
While using the single database deployment model, the DTU purchase model comes into the picture. In the same way, for an elastic pool deployment model, the eDTU purchase model is used. eDTU purchase model also provides different service tiers like in the DTU model but with different limits. For the details of each service tier, refer to the Azure documentation. As you know, in the elastic pool deployment model, the resource is allocated at the pool level and each database shares the resource. To avoid one database consuming all the resources, there is a couple of parameter settings in the elastic pool level.
  • Max eDTUs per database: This is the maximum number of eDTU that any database in the pool may use if available based on utilization by other databases in the pool. Max eDTU per database is not a resource guarantee for each database. if the workload on these databases are spread across time, we can over commit the max eDTU per database. For example, we have an elastic pool with 1000 eDTU and 5 databases attached to that. Out of that 2 of them are reporting data and most load comes to those two at night. The other two databases are part of the OLTP system will have varying workload but the peak will be between 10 AM and 5 PM. The fifth database is an internal database most of the load on this database will be between 9 AM and 12 PM. In this scenario, we can safely set max eDTU per database to 300 or 400 even if the total goes beyond the 1000 eDTU. 
  • Min eDTUs per database: This is the guaranteed minimum number of eDTU for each database in the pool.  This can set to 0 and that is the default value. This value multiplied by the number of databases in the pool should not exceed the eDTU allocated to the pool. To set the value of this parameter to 100 of an elastic pool which has 20 databases, we need to allocate a minimum of  2000 eDTU to the elastic pool.
  • Max storage per database: This is the maximum size of any database attached to this elastic pool. None of the databases can grow beyond this limit. Even if we set this value to a higher number which exceeds, (no.of database X  Max storage per database), the total available storage space of the pool storage limit, it will never allow all database storage to grow more than available storage space fo the pool.

Each service tiers in the elastic pool has a limit on the individual database resource utilization. The below table explains those limitations:

In case if you are migrating your workload from your on-premise installation of SQL Server to Azure SQL and you are not sure how much DTU need to be allocated and which service tier to be provisioned, DTU calculator comes handy. You need to provide values of the following performance counters :
  • Processor - % Processor Time
  • Logical Disk - Disk Reads/sec
  • Logical Disk - Disk Writes/sec
  • Database - Log Bytes Flushed/sec
I hope with this you got a good understanding of the DTU and eDTU purchase model.  In the next post, we will cover the vCore purchase model.