Pages

Friday, 6 November 2020

[Solved] SQL Backup Detected Corruption in the Database Log

Summary: In this article, we will discuss about the ‘SQL Backup Detected Corruption in the Database Log’ error. It will also describe the reason behind the error and manual workarounds to resolve it. The article also explains an alternative solution that can be used to restore the database and its transaction log backup – when the manual solutions fail. 

When performing transaction log backup for a SQL database, to restore the database after network maintenance or in the event of a crash, you may find the backup job failed with the following error:

Backup failed for Server xxx (Microsoft.SqlServer.SmoExtended)

System.Data.SqlClient.SqlError: BACKUP detected corruption in the database log. Check the errorlog for more information. (Microsoft.SqlServer.Smo)


The error message clearly indicates that the transaction log is damaged (corrupted). Checking the SQL errorlog for more details on the error shows:

2020-11-01 13:30:40.570 spid62 Backup detected log corruption in database TestDB. Context is Bad Middle Sector. LogFile: 2 ‘D:\Data\TestDB_log.ldf’ VLF SeqNo: x280d VLFBase: x10df10000 LogBlockOffset: x10efa1000 SectorStatus: 2 LogBlock.StartLsn.SeqNo: x280d LogBlock.StartLsn.
2020-11-01 13:30:40.650 Backup Error: 3041, Severity: 16, State: 1.
2020-11-01 13:30:40.650 Backup BACKUP failed to complete the command BACKUP DATABASE TestDB. Check the backup application log for detailed messages

However, the FULL database backup completed successfully and even running DBCC CHECKDB integrity check didn’t find any errors.

What Could Have Caused the SQL Transaction Log Backup to Fail?

A transaction log (T-log) backup allows restoring a database to a certain point-in-time, before the failure occurred. It does so by taking a backup of all the transaction logs created since the last log backup, including the corrupt portion of the T-log. This causes the backup to fail.

However, a FULL database backup only has to back up the beginning of the last active part of the T-log – at the time the backup is taken. Also, DBCC CHECKDB requires the same amount of log as the FULL database backup – at the time of the db snapshot was generated. This is why the full backup executed successfully and no errors were reported by DBCC CHECKDB.

Manual Workarounds to Backup Detected Log Corruption in SQL Database

Following are the manual workarounds you can apply to resolve the SQL backup log corruption issue:

Workaround 1: Change the SQL Recovery Model from FULL to SIMPLE

To fix the ‘SQL Server backup detected corruption in the database log’ issue, try switching the database to the SIMPLE recovery model.

Switching to SIMPLE recovery model will ignore the corrupted portion of the T-log. Subsequently, change the recovery model back to FULL and execute the backups again.

Here’s the steps you need to perform to change the recovery model:

Step 1: Make sure there are no active users by stopping all user activity in the db.

Step 2: Change the db from FULL to a SIMPLE recovery model. To do so, follow these steps:

  • Open SQL Server Management Studio (SSMS) and connect to an instance of the SQL Server database engine.
  • From Object Explorer, expand the server tree by clicking the server name.
  • Next, depending on the db you are using, select a ‘user database’ or choose a ‘system database’ by expanding System Databases.
  • Right-click the selected db, and then select Properties.
  • In the Database Properties dialog box, click Options under ‘Select a page’.

  • Choose the Simple recovery model from the ‘Recovery model’ list box, and then click OK

Step 3: Now set the db back to the FULL recovery model by following the same steps from 1 till 5 above. Then, select Full as your recovery model from the list box.

Step 4: Perform a FULL database backup again.

Step 5: Take log backups again.

Hopefully, performing these steps will help you perform the transaction log backup without any issue.

Note: This solution won’t be feasible if you’re using database mirroring for the database for which you have encountered the ‘backup detected log corruption’ error. That’s because, in order to switch to the SIMPLE recovery model you will need to break the mirror and then reconfigure the db which can take significant amount of time and effort. In this case, try the next workaround.

Workaround 2: Create Transaction Log Backup using Continue on Error Option

To complete executing the backup of T-log without any error, try running log backup of SQL database with the CONTINUE AFTER ERROR option. You can either choose to run the option directly from SSMS or by executing a T-SQL script.

Steps to run the ‘Continue on Error’ option from SSMS are as follows:

Step 1: Run SSMS as an administrator.

Step 2: From ‘Back Up Database’ window, click Options under ‘Select a page’ on the left panel. Then, select the ‘Continue on error’ checkbox under the Reliability section. 


Step 3: Click OK.

Now, run the log backup to check if starts without the backup detecting an error in SQL database.

Ending Note

The above-discussed manual solutions won’t work if the transaction log is missing or damaged, putting the database in suspect mode. In that case, you can try restoring the database from backups or run Emergency-mode repair to recover the db from suspect mode.

However, none of the above solutions might work in case of severe database corruption in SQL Server. Also, implementing the ‘Emergency-mode repair’ method involves data loss risk. But, using a specialized SQL database repair software such as Stellar Repair for MS SQL can help you repair a severely corrupted database and restore it back to its original state in just a few steps. The software helps in repairing both SQL database MDF and NDF files. Once the MDF file is repaired, you can create a transaction log file of the database and back it up without any encountering any error.

Saturday, 20 June 2020

Stairway to Microsoft Azure SQL Database Part 5 : Azure SQL serve Server Firewall and virtual Network Setting

In our last blog, we were discussing about Azure SQL server and we learned that Azure SQL server is a logical construct that acts as a central administrative point for the databases hosted under that server. One of the important aspects of the server is managing the connections. All these connection settings are found in the firewall and virtual network under the Security tab. Let us discuss various connectivity options in this blog.

Public Network Access: By default, Azure SQL server provides a public endpoint to access SQL Server through whitelisted IP addresses. You can access your database from anywhere in the world through the internet as long as your IP address is whitelisted. You can whitelist a single IP address or range of IP addresses using the option available under the page "Firewall and Virtual Network settings". As per the security best practices, you should never allow public endpoint to your database. To disable the public endpoint associated with your server, you can enable the "Deny Public network" option. Azure portal will not allow you to disable the public network option unless you have configured a private endpoint connection. This makes sense as without a private endpoint you will not able to connect the server once you disable the public network access. We will discuss private endpoints and its configuration in our next blog. You can whitelist the IP address in two-level, on the server level, and another one in the database level. When you whitelist the IP address from the portal, it will whitelist at the server level. You can find the whitelisted IP address information using the DMV sys.firewall_rule under the master database. You can also whitelist the IP address using the procedure sp_set_firewall_rule. To white list the IP in the database level, you can use the procedure sp_set_database_firewall_rule. Database firewall rule can be configured for the master database as well as for the user databases. Database level firewall setting makes more sense while using the contained user. The below diagram provide a better understanding of this setup. While establishing the connection, Azure SQL checks database level whitelisting first and if it is not available will check server-level whitelisting.
Azure Firewall Setup
Azure SQL Firewall 

Minimum TLS version: TLS is a cryptographic protocol that provides end to end communication security over the network. The minimal TLS version setting allows you to control the version of TLS used by the server. If you set the Minimum TLS version as '>1.2' and try to connect from the client with a TLS version less than 1.2, your connection will fail with an error "Login failed with invalid TLS version". If your applications are using the unencrypted connection, do not set any minimum TLS version. You can use the 'encrypt_option' column available in DMV sys.dm_exec_connections to check the connection from the applications is encrypted or not. Azure suggest to have   "Encrypt=True;TrustServerCertificate=False;" in the connection string. With this, the end to end communication between the server and client will be encrypted. By setting the TrustServerCertificate to false, it forces the client machine to verify the certificate provided by the server. That helps to avoid any kind of spoofing or man in the middle attack.

Connection Policy: Azure provides three connection policy to set. 
  1. Proxy
  2. Redirect 
  3. Default
These connection policies define the communication channel between the server and the client machine. When you try to connect Azure SQL server, the request first reach to Azure SQL gateway. Azure has a set of gateway IP addresses for each region which keeps listening on the port 1433. The number of gateway in each region depends on the workload in that region. For example, East US2 has 5 gateway IP addresses whereas Australia central region has only one gateway IP address. You can do telnet to these IP addresses through port number 1433. Once it reached the gateway, based on the connection policy setting, request forwarded to the actual SQL server node hosting your database or return the redirection information back to the client. You can do nslookup to your Azure SQL server and it provides you the detail of associated gateway. I did nslookup to one of my Azure SQL servers hosted in the US East2 region. It returned the one of the gateway IP address and its DNS name. Now I can do telnet to this IP address using the port number 1433. 



To establish the connection from your local machine or from the Virtual machine hosted in Azure, it should allow the outbound traffic to all the Azure gateway IP  address of that region through port number 1433. The gateway IP address of each region is published on their website. Let us look into each connection policy in detail.

        Proxy:  With this policy, when the client machine initiates the connection to Azure SQL server, first it will connect to one of the gateways in that region and forward the connection to the actual compute node in which your SQL database/server is hosted. In the below case (diagram), I am trying to connect to my server "mydbserver.database.windows.net". When I did the nslookup to my server name, it resolved to  52.167.104.0 which is one of the gateway IP addresses in the US East 2 region. When the connection is initiated from the client to the server, it will get connected to the gateway and gateway forward the connection to the node in which my database is hosted. In this case, the gateway act as a proxy between the client machine and the server. The client machine does not have any idea to which the connection is forwarded.

Proxy Connection



            Redirect:  With this connection policy, when the client initiates the connection, it will connect to the gateway and gateway return the redirection information to connect the node in which the database is hosted to the client. Using that information, the client directly connects to the node in which the database is hosted bypassing the gateway. Azure claims that this will reduce the latency. To make this work, the client machine :
  • Should allow outbound communication to the SQL database gateway IP addresses on port 1433 (This is for the initial connection to get the redirection information)
  • Should allow outbound communication to all Azure SQL IP addresses (nodes) in the region on a port range of 11000 to 11999 (All the SQL node in azure listen to one of port between 11000 and 11999). This can be easily done using the service tag. It is not practical to add numerous IP addresses of the node manually. 

In the below example, the client machine is initiating the connection to connect the server "Mydbserver.database.windows.net". The initial request will go to the gateway. On receiving the request, the gateway returns the redirection information (basically the name of the node in which the database is hosted), and the client uses that information to establish the direct communication with the node in which the database is hosted. There onwards all request in that connection goes directly to the node bypassing the gateway.

Redirection Connection 


    Default: This is the default setting when you create the Azure SQL server unless you explicitly change to Proxy or redirect. With this setting, all the connections initiated from the Azure use the redirect connection policy and use the proxy connection policy for all connections initiated from outside Azure.

Allow Azure Services and resources to access this server: When this setting is enabled, all the azure service and resources in that region can access this server. For example, if we have VM hosted in azure platform, that VM can access the SQL server without whitelisting the IP address of the VM. Keep in mind that, this is not restricting to the resource hosted in your account. Enabling this feature widely opens the communication from all the resources hosted in the Azure platform including the other customer's account also.

Connect  VNET/subnet: This is one of the options where you can attach the subnet to this sever. This allows all resources hosted in that subnet to connect to the Azure SQL server without whitelisting the IP address. Moreover, this enables communication through the Azure backbone networks. Microsoft suggests to use this feature to enable a secure and fast connection. Before attaching the subnet, note that we need to enable the service endpoint for the SQL server at the subnet level.
 
These are the various option available under the Firewall and Virtual network setting. In our next post, we will discuss enabling private endpoints.


Sunday, 31 May 2020

Stairway to Microsoft Azure SQL Database Part 4 : Azure SQL server , an overview

In the last few posts, we were discussing about database deployment model and purchase model. Out of the three deployment models, managed instance works almost like on-premises SQL server and hence we are not going to discuss much about the managed instance. On the other side, a single database and elastic pool works a completely different ways.

while creating a single database or elastic poor, you need to attach those into the Azure SQL server. In other words, you need to create a SQL server before creating the SQL database. While searching for services in the Azure portal, you can search for 'SQL database' and 'SQL server'. In this post, we are going to discuss about the SQL Server, which is a logical container for your single Azure databases and Elastic Pool.

As per the Azure documentation, "a SQL server is a logical construct that acts as a central administrative point for a collection of databases". At the server level,  you can administrate the following components:
  • Logins
  • Connectivity 
  • Firewall rules 
  • Auditing rules 
  • Threat detection Policy 
  • Auto Failover group
  • Transparent Data Encryption 
  • Auditing 
  • Backup Management 
When you create an empty SQL Server in Azure, it will not cost you anything until you create a single database or elastic pool under this  SQL server. To create an empty SQL server, search for "SQL Server"  in the Azure portal, and follow the instructions. While creating the SQL server, it will ask for a user name and password. This is the only SQL login that you can create from the Azure portal. This login has the highest privilege in the server and can access all databases in that server without providing explicit permission in each database. Azure SQL server does not support window authentication but supports Azure active directory authentication. For enabling the Azure active directory authentication, you need to connect your server with Azure AD. You can achieve this by setting the Active directory Admin from the Azure portal. You can find this option under the Settings section of the server. This can be an individual account or a group. This account also will have the highest privilege and can access all the databases under that server without providing explicit permission.  These are the only two logins that you can create from the Azure portal. There is no option to change the SQL login once you have created it but you can change the Active directory admin at any time. These are the only two logins will have access to all the databases created under this server. There is no option to create server wide login which will have access to all the databases under this server. Still, you can create server wide login and can explicitly create the corresponding user in each database like we used to do in the on-premise installation. 

Azure SQL database provides two types of user :

Login and user: This is the same as we used to do in on-premise installations. Create a login and create corresponding user in each database. For example, connect to master database 

create login njohn with password ='Password@123'
--Connect to DB1
create user njohn from login njohn 
--Connect to DB2 
create user njohn from login njohn 

In this case, the login information is stored at the server level and the password remains same across all databases. While copying the database DB1 and DB2 to another server, you also need to copy the login and need to map the SID properly.

Contained user: These are the users created under the database without having the logins. In this case, the user informations are stored in the database level. After the restoration of the databases, DB1 and DB1 to another server, we don't have to copy the login information separately as the user password and other information are stored in the user database. In this scenario, the same user can have different passwords to connect with each database. For example 


--Connect to DB1
create user njohn with password ='Password1' 
--Connect to DB2 
create user njohn with password ='Password2'


When you create the SQL server, by default a virtual master database will be created. This is the only system database that you can see in the Azure SQL server. If you look into this database, you can see two additional database role added with Azure SQL server :

dbmanager: Members of this database role can create and delete the databases. When a member of the dbmanager role creates the database, becomes the owner of the database, and will have full access to that database as dbo. The members of this role will not have access to the database which is not created by them (owner) unless you explicitly created users under the database for those logins like we do in the on-premise installation. This role exists only in the master database and only the users of the master database can be added into this role. The ideal method is, create a login and create a user for that login and add that user to the dbmanager role. Contained database users in the master database also can be added to the dbmanger role, but they will not have permission to create new databases.
 
loginmanager:  can create and delete logins in the master database. Note that, members of this role does not have the permission to create users in the master database. 

As mentioned earlier, the Azure SQL server is a logical construct and does not represent a specific VM or hardware like on-premise installation. The databases associated with an Azure SQL server, maybe running in different VMs which is not visible for us. In our on-premise installation, the SQL server is installed on  VM and all databases created in that database server will be running on the same VM. Whereas Azure SQL server is a logical server and can deploy the databases under that server in multiple VMs. The below diagram helps you to understand the concept in a better way.

 
We have learned in our earlier post, each Azure SQL database will have one or more standby replica for high availability depending on the service tier. The database name marked in red are standby replica. Azure SQL inherently multi-tenant and it needs to share its resource with multiple clients. You can see one extra database, DB5 which is not there in our on-premises diagram. When you connect to Azure SQL server, "mysqlserver.database.windows.net", based on the database need to be connected, it will redirect the connection to data node hold the primary replica. This is the reason why we need to specify the database name while connecting through SSMS and not allowing us to change the database context using the USE statement. To change the context of the database, it needs to reroute the traffic to different data nodes. If we are not specifying the database name, it will connect to the virtual master database. If the 'datanode1' goes down for some reason, DB1, and DB2 comes online as the primary in Datanode2 and standby replica will get created in another data node. During this time, the connection to these two databases will be broken and the user will get an error message. To overcome these transient errors and to give a smooth experience to the application users, Microsoft suggests to implement retry mechanism for application connecting to Azure SQL. This is also beneficial to 
  • Handle the errors happens while azure perform the maintenance or patch update.  Azure SQL Database is a managed service and Azure performs the regular patch and version update. User does not have control over the date and time of these maintenance windows.
  • When we scale up the SQL database, there is a possibility to failover the database to another node due to a lack of resources in the current node. In that scenario, the existing connection will be terminated.

In our next post, we will discuss about connection architecture and different settings available under the "firewall and virtual network"





Monday, 25 May 2020

Ghost is taking full backup and breaking my backup chain . Be careful with Azure VM backup on VM having SQL server installation

I believe that I have strong knowledge in SQL Server backup and recovery process, but recently there was an incident which shook my confidence. I was working on a migration for a noncritical production server from SQL server 2008 to 2017. As we need to upgrade the OS also, we decided to go ahead with the side by side migration. The backup plan on the existing server was full backup on a weekly basis. So I decided to restore the previous week's full backup well in advance and during the migration window to take a differential backup and apply that on the new server and I communicated this process will take approximately 15 minutes. As I was pretty confident in this process, I didn't put any effort to do a dry run prior to the migration day and everything set for migration.

During the migration window, as a first step, I fired the command to take differential backup on the existing server. To my surprise, it throws the following error:

Msg 3035, Level 16, State 1, Line 1
Cannot perform a differential backup for database “Mydatabase”, because a current database backup does not exist. Perform a full database backup by reissuing BACKUP DATABASE

I was not able to believe this. Immediately I queried the backup set table and I saw multiple full backup records after the previous week full backup. I ran a set of queries and it revealed that some backup operation is happening without DBA's knowledge to a virtual device at a regular interval. As per the Microsoft documentation, Device_type 7 is a virtual device. 













I don't have time to dig further, I continued the migration by issuing full backup and restoring the same. This delayed the entire migration process by 3-4 hours and it was an embarrassing situation. 

After the migration, I decided to figure out the issue and went ahead and searched the SQL error log and I found the following messages along with the backup message.

  • I/O is frozen on database Mydatabase. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup
  • I/O was resumed on database Mydatabase. No user action is required.
  • Database backed up. Database: Mydatabase, creation date(time): 2012/02/02(09:42:18), pages dumped: 578, first LSN: 528:626:37, last LSN: 528:643:1, number of dump devices: 1, device information: (FILE=1, TYPE=VIRTUAL_DEVICE: {'{4618D52D-9C83-400A-A698-AB391E30369A}26'}).This is an informational message only. No user action is required.
The I/O frozen error was just before the backup and that triggered me to think about disk snapshot backups running behind the scene and contacted the infra team to understand this. On contacting them, they told me that, they have enabled the Azur VM backup on all VM on a daily basis and the Azure VM backup schedule on the server in question was matching with the above error message. With that, we were sure that the azure VM is the culprit behind this ghost SQL server backup. On reading further on azure VM backup, we learned that Azure VM backup uses the VSS (Volume Shadow copy) service to take the backup and SQL server is a VSS aware application and it takes a full backup of  SQL Server databases using SQL writer service. Before taking backup, it will freeze the disk operation to get a consistent backup. 

What is VSS backup? 
Volume Shadow Copy is a technology included in Microsoft Windows that can create backup copies or snapshots of computer files or volumes, even when they are in use. It is implemented as a Windows service called the Volume Shadow Copy service. It allows backup applications to safely back up locked and open files. 

Components of VSS:

  • VSS service: Part of windows operating system and installed as by default. This Orichistate the communication between the backup tool and VSS aware applications.
  • VSS requester: The application who initiate the backup request to the VSS service. This can be Microsoft or non-microsof backup applications such as Windows server backup utility, any third party VM backup tool, etc.
  • VSS writer   This is the component that guarantees a consistent data set to back up. For example, Microsoft provides the VSS writers for SQL server, Exchange server, windows registry, etc.  Many non-Microsoft applications also provide VSS components that need to guarantee data consistency during back up. 
Finally, we understood what was happening behind the scene and decided to stop the backup operation initiated by the Azure VM backup as DBA's are responsible for SQL server backup and recovery. We can not stop the Azure VM backup as the infra team need this for recovering the instance in case of disaster. So we found two solutions.
  • Stop and disable the SQL writer service. With this VSS will not able to take the SQL server database backup but still takes the VM backup.
  • Add below registry key    [                HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\BCDRAGENT] 
     "USEVSSCOPYBACKUP"="TRUE"  This will force the VSS to take COPY_ONLY backup which will not break the backup chain 
 
I hope this piece of information will help you one day. If you are more curious about VSS service, refer the below links: