Solutions with Practical SQL DBA

Saturday, 20 June 2020

Stairway to Microsoft Azure SQL Database Part 5 : Azure SQL serve Server Firewall and virtual Network Setting

In our last blog, we were discussing about Azure SQL server and we learned that Azure SQL server is a logical construct that acts as a central administrative point for the databases hosted under that server. One of the important aspects of the server is managing the connections. All these connection settings are found in the firewall and virtual network under the Security tab. Let us discuss various connectivity options in this blog.

Public Network Access: By default, Azure SQL server provides a public endpoint to access SQL Server through whitelisted IP addresses. You can access your database from anywhere in the world through the internet as long as your IP address is whitelisted. You can whitelist a single IP address or range of IP addresses using the option available under the page "Firewall and Virtual Network settings". As per the security best practices, you should never allow public endpoint to your database. To disable the public endpoint associated with your server, you can enable the "Deny Public network" option. Azure portal will not allow you to disable the public network option unless you have configured a private endpoint connection. This makes sense as without a private endpoint you will not able to connect the server once you disable the public network access. We will discuss private endpoints and its configuration in our next blog. You can whitelist the IP address in two-level, on the server level, and another one in the database level. When you whitelist the IP address from the portal, it will whitelist at the server level. You can find the whitelisted IP address information using the DMV sys.firewall_rule under the master database. You can also whitelist the IP address using the procedure sp_set_firewall_rule. To white list the IP in the database level, you can use the procedure sp_set_database_firewall_rule. Database firewall rule can be configured for the master database as well as for the user databases. Database level firewall setting makes more sense while using the contained user. The below diagram provide a better understanding of this setup. While establishing the connection, Azure SQL checks database level whitelisting first and if it is not available will check server-level whitelisting.

Azure SQL Firewall

Minimum TLS version: TLS is a cryptographic protocol that provides end to end communication security over the network. The minimal TLS version setting allows you to control the version of TLS used by the server. If you set the Minimum TLS version as '>1.2' and try to connect from the client with a TLS version less than 1.2, your connection will fail with an error "Login failed with invalid TLS version". If your applications are using the unencrypted connection, do not set any minimum TLS version. You can use the 'encrypt_option' column available in DMV sys.dm_exec_connections to check the connection from the applications is encrypted or not. Azure suggest to have "Encrypt=True;TrustServerCertificate=False;" in the connection string. With this, the end to end communication between the server and client will be encrypted. By setting the TrustServerCertificate to false, it forces the client machine to verify the certificate provided by the server. That helps to avoid any kind of spoofing or man in the middle attack.

Connection Policy: Azure provides three connection policy to set.

Proxy
Redirect
Default

These connection policies define the communication channel between the server and the client machine. When you try to connect Azure SQL server, the request first reach to Azure SQL gateway. Azure has a set of gateway IP addresses for each region which keeps listening on the port 1433. The number of gateway in each region depends on the workload in that region. For example, East US2 has 5 gateway IP addresses whereas Australia central region has only one gateway IP address. You can do telnet to these IP addresses through port number 1433. Once it reached the gateway, based on the connection policy setting, request forwarded to the actual SQL server node hosting your database or return the redirection information back to the client. You can do nslookup to your Azure SQL server and it provides you the detail of associated gateway. I did nslookup to one of my Azure SQL servers hosted in the US East2 region. It returned the one of the gateway IP address and its DNS name. Now I can do telnet to this IP address using the port number 1433.

To establish the connection from your local machine or from the Virtual machine hosted in Azure, it should allow the outbound traffic to all the Azure gateway IP address of that region through port number 1433. The gateway IP address of each region is published on their website. Let us look into each connection policy in detail.

Proxy: With this policy, when the client machine initiates the connection to Azure SQL server, first it will connect to one of the gateways in that region and forward the connection to the actual compute node in which your SQL database/server is hosted. In the below case (diagram), I am trying to connect to my server "mydbserver.database.windows.net". When I did the nslookup to my server name, it resolved to 52.167.104.0 which is one of the gateway IP addresses in the US East 2 region. When the connection is initiated from the client to the server, it will get connected to the gateway and gateway forward the connection to the node in which my database is hosted. In this case, the gateway act as a proxy between the client machine and the server. The client machine does not have any idea to which the connection is forwarded.

Proxy Connection

Redirect: With this connection policy, when the client initiates the connection, it will connect to the gateway and gateway return the redirection information to connect the node in which the database is hosted to the client. Using that information, the client directly connects to the node in which the database is hosted bypassing the gateway. Azure claims that this will reduce the latency. To make this work, the client machine :

Should allow outbound communication to the SQL database gateway IP addresses on port 1433 (This is for the initial connection to get the redirection information)
Should allow outbound communication to all Azure SQL IP addresses (nodes) in the region on a port range of 11000 to 11999 (All the SQL node in azure listen to one of port between 11000 and 11999). This can be easily done using the service tag. It is not practical to add numerous IP addresses of the node manually.

In the below example, the client machine is initiating the connection to connect the server "Mydbserver.database.windows.net". The initial request will go to the gateway. On receiving the request, the gateway returns the redirection information (basically the name of the node in which the database is hosted), and the client uses that information to establish the direct communication with the node in which the database is hosted. There onwards all request in that connection goes directly to the node bypassing the gateway.

Redirection Connection

Default: This is the default setting when you create the Azure SQL server unless you explicitly change to Proxy or redirect. With this setting, all the connections initiated from the Azure use the redirect connection policy and use the proxy connection policy for all connections initiated from outside Azure.

Allow Azure Services and resources to access this server: When this setting is enabled, all the azure service and resources in that region can access this server. For example, if we have VM hosted in azure platform, that VM can access the SQL server without whitelisting the IP address of the VM. Keep in mind that, this is not restricting to the resource hosted in your account. Enabling this feature widely opens the communication from all the resources hosted in the Azure platform including the other customer's account also.

Connect VNET/subnet: This is one of the options where you can attach the subnet to this sever. This allows all resources hosted in that subnet to connect to the Azure SQL server without whitelisting the IP address. Moreover, this enables communication through the Azure backbone networks. Microsoft suggests to use this feature to enable a secure and fast connection. Before attaching the subnet, note that we need to enable the service endpoint for the SQL server at the subnet level.

These are the various option available under the Firewall and Virtual network setting. In our next post, we will discuss enabling private endpoints.

Sunday, 31 May 2020

Stairway to Microsoft Azure SQL Database Part 4 : Azure SQL server , an overview

In the last few posts, we were discussing about database deployment model and purchase model. Out of the three deployment models, managed instance works almost like on-premises SQL server and hence we are not going to discuss much about the managed instance. On the other side, a single database and elastic pool works a completely different ways.

while creating a single database or elastic poor, you need to attach those into the Azure SQL server. In other words, you need to create a SQL server before creating the SQL database. While searching for services in the Azure portal, you can search for 'SQL database' and 'SQL server'. In this post, we are going to discuss about the SQL Server, which is a logical container for your single Azure databases and Elastic Pool.

As per the Azure documentation, "a SQL server is a logical construct that acts as a central administrative point for a collection of databases". At the server level, you can administrate the following components:

Logins
Connectivity
Firewall rules
Auditing rules
Threat detection Policy
Auto Failover group
Transparent Data Encryption
Auditing
Backup Management

When you create an empty SQL Server in Azure, it will not cost you anything until you create a single database or elastic pool under this SQL server. To create an empty SQL server, search for "SQL Server" in the Azure portal, and follow the instructions. While creating the SQL server, it will ask for a user name and password. This is the only SQL login that you can create from the Azure portal. This login has the highest privilege in the server and can access all databases in that server without providing explicit permission in each database. Azure SQL server does not support window authentication but supports Azure active directory authentication. For enabling the Azure active directory authentication, you need to connect your server with Azure AD. You can achieve this by setting the Active directory Admin from the Azure portal. You can find this option under the Settings section of the server. This can be an individual account or a group. This account also will have the highest privilege and can access all the databases under that server without providing explicit permission. These are the only two logins that you can create from the Azure portal. There is no option to change the SQL login once you have created it but you can change the Active directory admin at any time. These are the only two logins will have access to all the databases created under this server. There is no option to create server wide login which will have access to all the databases under this server. Still, you can create server wide login and can explicitly create the corresponding user in each database like we used to do in the on-premise installation.

Azure SQL database provides two types of user :

Login and user: This is the same as we used to do in on-premise installations. Create a login and create corresponding user in each database. For example, connect to master database

create login njohn with password ='Password@123'

--Connect to DB1

create user njohn from login njohn

--Connect to DB2

create user njohn from login njohn

In this case, the login information is stored at the server level and the password remains same across all databases. While copying the database DB1 and DB2 to another server, you also need to copy the login and need to map the SID properly.

Contained user: These are the users created under the database without having the logins. In this case, the user informations are stored in the database level. After the restoration of the databases, DB1 and DB1 to another server, we don't have to copy the login information separately as the user password and other information are stored in the user database. In this scenario, the same user can have different passwords to connect with each database. For example

--Connect to DB1

create user njohn with password ='Password1'

--Connect to DB2

create user njohn with password ='Password2'

When you create the SQL server, by default a virtual master database will be created. This is the only system database that you can see in the Azure SQL server. If you look into this database, you can see two additional database role added with Azure SQL server :

dbmanager: Members of this database role can create and delete the databases. When a member of the dbmanager role creates the database, becomes the owner of the database, and will have full access to that database as dbo. The members of this role will not have access to the database which is not created by them (owner) unless you explicitly created users under the database for those logins like we do in the on-premise installation. This role exists only in the master database and only the users of the master database can be added into this role. The ideal method is, create a login and create a user for that login and add that user to the dbmanager role. Contained database users in the master database also can be added to the dbmanger role, but they will not have permission to create new databases.

loginmanager: can create and delete logins in the master database. Note that, members of this role does not have the permission to create users in the master database.

As mentioned earlier, the Azure SQL server is a logical construct and does not represent a specific VM or hardware like on-premise installation. The databases associated with an Azure SQL server, maybe running in different VMs which is not visible for us. In our on-premise installation, the SQL server is installed on VM and all databases created in that database server will be running on the same VM. Whereas Azure SQL server is a logical server and can deploy the databases under that server in multiple VMs. The below diagram helps you to understand the concept in a better way.

We have learned in our earlier post, each Azure SQL database will have one or more standby replica for high availability depending on the service tier. The database name marked in red are standby replica. Azure SQL inherently multi-tenant and it needs to share its resource with multiple clients. You can see one extra database, DB5 which is not there in our on-premises diagram. When you connect to Azure SQL server, "mysqlserver.database.windows.net", based on the database need to be connected, it will redirect the connection to data node hold the primary replica. This is the reason why we need to specify the database name while connecting through SSMS and not allowing us to change the database context using the USE statement. To change the context of the database, it needs to reroute the traffic to different data nodes. If we are not specifying the database name, it will connect to the virtual master database. If the 'datanode1' goes down for some reason, DB1, and DB2 comes online as the primary in Datanode2 and standby replica will get created in another data node. During this time, the connection to these two databases will be broken and the user will get an error message. To overcome these transient errors and to give a smooth experience to the application users, Microsoft suggests to implement retry mechanism for application connecting to Azure SQL. This is also beneficial to

Handle the errors happens while azure perform the maintenance or patch update. Azure SQL Database is a managed service and Azure performs the regular patch and version update. User does not have control over the date and time of these maintenance windows.
When we scale up the SQL database, there is a possibility to failover the database to another node due to a lack of resources in the current node. In that scenario, the existing connection will be terminated.

In our next post, we will discuss about connection architecture and different settings available under the "firewall and virtual network"

Monday, 25 May 2020

Ghost is taking full backup and breaking my backup chain . Be careful with Azure VM backup on VM having SQL server installation

I believe that I have strong knowledge in SQL Server backup and recovery process, but recently there was an incident which shook my confidence. I was working on a migration for a noncritical production server from SQL server 2008 to 2017. As we need to upgrade the OS also, we decided to go ahead with the side by side migration. The backup plan on the existing server was full backup on a weekly basis. So I decided to restore the previous week's full backup well in advance and during the migration window to take a differential backup and apply that on the new server and I communicated this process will take approximately 15 minutes. As I was pretty confident in this process, I didn't put any effort to do a dry run prior to the migration day and everything set for migration.

During the migration window, as a first step, I fired the command to take differential backup on the existing server. To my surprise, it throws the following error:

Msg 3035, Level 16, State 1, Line 1
Cannot perform a differential backup for database “Mydatabase”, because a current database backup does not exist. Perform a full database backup by reissuing BACKUP DATABASE

I was not able to believe this. Immediately I queried the backup set table and I saw multiple full backup records after the previous week full backup. I ran a set of queries and it revealed that some backup operation is happening without DBA's knowledge to a virtual device at a regular interval. As per the Microsoft documentation, Device_type 7 is a virtual device.

I don't have time to dig further, I continued the migration by issuing full backup and restoring the same. This delayed the entire migration process by 3-4 hours and it was an embarrassing situation.

After the migration, I decided to figure out the issue and went ahead and searched the SQL error log and I found the following messages along with the backup message.

I/O is frozen on database Mydatabase. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup
I/O was resumed on database Mydatabase. No user action is required.
Database backed up. Database: Mydatabase, creation date(time): 2012/02/02(09:42:18), pages dumped: 578, first LSN: 528:626:37, last LSN: 528:643:1, number of dump devices: 1, device information: (FILE=1, TYPE=VIRTUAL_DEVICE: {'{4618D52D-9C83-400A-A698-AB391E30369A}26'}).This is an informational message only. No user action is required.

The I/O frozen error was just before the backup and that triggered me to think about disk snapshot backups running behind the scene and contacted the infra team to understand this. On contacting them, they told me that, they have enabled the Azur VM backup on all VM on a daily basis and the Azure VM backup schedule on the server in question was matching with the above error message. With that, we were sure that the azure VM is the culprit behind this ghost SQL server backup. On reading further on azure VM backup, we learned that Azure VM backup uses the VSS (Volume Shadow copy) service to take the backup and SQL server is a VSS aware application and it takes a full backup of SQL Server databases using SQL writer service. Before taking backup, it will freeze the disk operation to get a consistent backup.

What is VSS backup?

Volume Shadow Copy is a technology included in Microsoft Windows that can create backup copies or snapshots of computer files or volumes, even when they are in use. It is implemented as a Windows service called the Volume Shadow Copy service. It allows backup applications to safely back up locked and open files.

Components of VSS:

VSS service: Part of windows operating system and installed as by default. This Orichistate the communication between the backup tool and VSS aware applications.

VSS requester: The application who initiate the backup request to the VSS service. This can be Microsoft or non-microsof backup applications such as Windows server backup utility, any third party VM backup tool, etc.

VSS writer This is the component that guarantees a consistent data set to back up. For example, Microsoft provides the VSS writers for SQL server, Exchange server, windows registry, etc. Many non-Microsoft applications also provide VSS components that need to guarantee data consistency during back up.

Finally, we understood what was happening behind the scene and decided to stop the backup operation initiated by the Azure VM backup as DBA's are responsible for SQL server backup and recovery. We can not stop the Azure VM backup as the infra team need this for recovering the instance in case of disaster. So we found two solutions.

Stop and disable the SQL writer service. With this VSS will not able to take the SQL server database backup but still takes the VM backup.
Add below registry key [ HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\BCDRAGENT]
"USEVSSCOPYBACKUP"="TRUE" This will force the VSS to take COPY_ONLY backup which will not break the backup chain

I hope this piece of information will help you one day. If you are more curious about VSS service, refer the below links:

Sunday, 17 May 2020

Stairway to Microsoft Azure SQL Database Part 3 : Purchase Models DTU vs vCore continue

In the last post, we have discussed the DTU purchase model. Azure also provides another purchase mode which is called vCore. vCore purchase model available with all three deployment models (Managed instance, Single database, and elastic pool).

vCore purchase model provides :

Higher compute, memory, IO, and storage limit.
Better control over opting for the hardware configuration.
Option to bring your own license using the Azure Hybrid Benefit(AHB) and to reserve the instance to reduce the cost.
Flexibility to choose the compute and storage independently.
Storage and compute charged separately. In the DTU model ,limited amount od storage bundled with the computing power. Storage allocated above that is charged separately.

Service Tier: Like in the DTU model, the vCore model also provides multiple service tier to match with your performance requirements and budget. vCore purchase model provides the following service tiers.

General Purpose
Hyper-Scale
Business Critical

General Purpose :

Offer Budget-oriented, balanced and scalable compute and storage option.
Suited for general workload.
Available with all three deployment model (Single, elastic and managed)
Option to select 2 to 80 vCore. The options are 2 ,4 ,6 ,8 ,10,12,14,16,18, 20,24,32,40 and 80 vCore
Storage upto 4 TB (0.12 USD per GB).
Remote storage
500 to 20000 IOPS
2 to 10 ms latency
For high availability, it provides one replica.
No read replica.
7 to 35 Backup retention

Hyperscale :

Available only for single database deployment model
Provide highly scalable storage up to 100TB
For rapid read scale-out deployment, provide up to 4 read replica
Read replicas are charged.
Option to select compute power from 2 to 80 vCore (options are the same as in general-purpose)
Use local SSD storage.
Snapshot based backup in Azure remote storage. Restore use these snapshots for fast recovery.
Backup/restore does not depend on the size of the database and does not affect the IO or compute as it is performed using the snapshot. Hence backup and restorations are instantaneous.
For example. 50 TB database can be restored in less than 15 minutes. You can watch the video here
Depending on the workloads, IOPS between 500 to 204800 with 1 to 10 ms latency.
Most suited for databases that will not fit into the 4TB storage limit of general-purpose and business-critical service tier.
Hyperscale database engines use distributed function architecture and use the following components (we will discuss the architecture of hyper-scale in a different blog post). This architecture is different from the existing database engine architecture

Compute
Page Server
Log Service
Aure storage

Business Critical :

Available with all three deployment model (Single, elastic and managed)
Option to select 2 to 80 vCore. The options are 2 ,4 ,6 ,8 ,10,12,14,16,18, 20,24,32,40 and 80 vCore
Offers business applications the highest resilience to failures by using several isolated replicas.
It keeps 4 replicas for better high availability
To offload the read workload from the writable replica, provide read replica without additional charge.
Use local SSD storage for better IO performance.
Provides up to 4 TB storage with 1 -2 ms latency (0.25 USD per GB)
5000 to 204800 IOPS
7 to 35 days backup retention.

Compute Tier: Under the service tier, azure SQL server provides the compute tier. For managed instance and elastic pool deployment models, azure provides only provisioned compute tier and this option will not be available to choose. For the single database deployment, there is one more additional compute tier called Serverless. Let us see the difference between these two compute tier:

Provisioned :

compute tier is provisioned through out the lifecycle of elastic poo/ single database/managed instance and billed hourly basis.
There is no change in the billing based on the workload on the database.
Suitable for regular usage patterns with higher average compute utilization over time.
Compute scaling should be done manually (increasing the vCore)

Serverless :

Suitable for Intermittent, unpredictable usage with lower average compute utilization over time.
Automatically scale up the compute based on the workload and billed per second.
Automatically pause the database if there is no activity for a preconfigured time (Auto pause delay) and start automatically when the wake-up events happen. During this period compute is not charged.
While provisioning, we need to define the minimum and maximum vCore.
Based on the workload, the compute tier is scaled up to the max vCore defined. Compute is billed per second.
Auto pause delay is a configurable parameter that defines the required inactive period before pausing the database. This can vary from 1 hour to 7 days. For example, if we define the auto-pause delay as 2 hrs, then if there is no activity in the database for 2 hours, then the database will go to pause state and will not be charged for computing until the next wake up event. Till that point, only the storage cost will be charged.
The minimum and maximum vCore are configurable parameters that define the range of computing power available for the database. The maximum vCore can vary between 1 to 16 vCore. (1,2,4,6,8.1,10,12,14 and 16 are possible options) . The minimum vCore can vary from 0.5 vCPU to maximum vCore configured. Each vCpu provides 3GB of memory. If the workload requires memory more than that will be charged.
Cost

Compute cost and storage costs are charged separately. Storage cost remains the same throughout the life cycle unless you scale up the storage.
When the compute usage is between the min vCore and max vCore configured, compute cost is based on vCore or memory.
When the compute usage is below the min vCore setting, it will be still charged for min vCpu/ Memory.
For the duration of the paused state, Computing is not charged. Only storage will be charged.

Auto pause and auto-scaling

When the database is in a paused state, the first wake up event (example connection request) fails with the error message "database is unavailable" with error code 40613. It will take approximately one minute to resume the operation. Applications that are working with a serverless database should have mechanisms to handle this error message and retry the connection after a minute.
When the load on the database increases, azure automatically scales up the compute / memory up to the max vCore configured. while scaling up the compute, at the end of the process all connections to the database are dropped. The applications which use the serverless database should have a mechanism to handle these error and try to reconnect after a specific time to provide smooth user experience to the end-users.

Billing

Use per second billing.
Each vCore in action provides 3GB memory. If the workload uses more memory than the memory bundled with the vCore, billing will be based on the memory. For example, for a specific time interval, the load on the database is using 2 vCore and 10 GB memory. In this case, billing will be based on the memory as the memory utilization is above 6 GB (2vCore*3 GB). To derive the number vCore to be billed from the memory utilization, memory utilized will be divided by 3 (3GB for each Vcore). In our case, it will be billed for 10GB/3 = 3.33 vCore. The below table gives you a better understanding of the billing. Let us consider one database, that is provisioned with 3 vCore as minimum vCore and 16 as maximum vCore.The auto pause delay is configured as 2 hrs. Each vCore seconds is charged at 0.000145

The total vCore seconds are 389952 and it will cost 56.54 USD (0.000145 USD * 389952 vCore seconds) for the workload mentioned in the above table which spread across 24hrs.

Best suited for :

Single database with intermittent and unpredictable usage patterns.
The database workload is only during specific intervals. So after the auto-pause delay, the database enters into stop state and not charged for the compute.
Workload is not sensitive to the response time and the application is capable fo handling the errors while resuming the service and auto-scaling

Apart from connecting to the database, there are many other events that resume the database from a paused state. You can find those events in the Azure documentation page

Hardware Configuration Azure provides multiple hardware options to choose from under the service tier. These options differ from region to region. Commonly available hardware options are

Gen4

Provide 7 GB memory for each vCore.
Approaching the end of life by Jan 2023

Gen5

Has new generation hardware
Has network acceleration and provide better IO performance on remote storage (General purpose service tier)
It uses faster local SSD disk than Gen4

M-Series (preview)

New Memory-optimized hardware
Provide 29GB per vCore.
It provides up to 128 vCore.
In full capacity memory can be nearly 4 TB (128 vCore X 29 GB per vCore).

Fsv2-series (Preview)

Compute-optimized hardware.
Low CPU latency and high clock speed for the most CPU demanding workloads.
Can deliver more CPU performance per vCore than Gen5.
The 72 vCore size can also provide more CPU performance for less cost than 80 vCores on Gen5.
provides less memory and tempdb size per vCore than other hardware. so workloads sensitive to those limits may want to consider Gen5 or M-series.

The below diagram provides you a visual representation of vCore purchase model options for Single database and Elastic pool.

vCore Purchase Model

Managed Instance deployment model also provides General Purpose and Business Critical service tier. Under the service tier, it provides Gen4 and Gen5 hardware.

Each combination of these options has many other characteristics like tempdb space allocated, log space allocated, IOPS, no of concurrent session, log write throughput, etc. All those detailed information can be found in the following Azure documentation.

I hope, this post helped you understand the vCore purchase mode in detail.