
Wednesday, 15 June 2022

DBCC CHECKDB REPAIR_ALLOW_DATA_LOSS with Example

DBCC, an acronym for "Database Console Commands", is a family of T-SQL commands that perform several types of tasks. These tasks are mainly of the validation and maintenance type.

Some of the DBCC commands, like the ones below, work on an internal read-only database snapshot. This means that the database engine creates a database snapshot and brings it to a transactionally consistent state. The DBCC command then performs its checks against this snapshot. When the execution is complete, the snapshot is dropped.

  • DBCC CHECKALLOC
  • DBCC CHECKCATALOG
  • DBCC CHECKDB
  • DBCC CHECKFILEGROUP
  • DBCC CHECKTABLE

The DBCC CHECKALLOC command checks the consistency of disk space allocation structures for a specified database.

The DBCC CHECKCATALOG command checks catalog consistency within the specified database.

The DBCC CHECKFILEGROUP command checks the allocation and structural integrity of all tables and indexed views in the specified filegroup of the current database.

The DBCC CHECKTABLE command checks the integrity of all the pages and structures that make up the table or indexed view.
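For reference, each of these commands is invoked directly; here is a minimal usage sketch (MyDatabase and dbo.MyTable are hypothetical names):

DBCC CHECKALLOC ('MyDatabase');
DBCC CHECKCATALOG ('MyDatabase');
DBCC CHECKFILEGROUP;  -- checks the primary filegroup of the current database by default
DBCC CHECKTABLE ('dbo.MyTable');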

We have described the function of four commands, and no, we have not missed DBCC CHECKDB. We discuss it below, because this command is essentially the "sum" of the other four.

DBCC CHECKDB: What does this command do?

DBCC CHECKDB is an important command because it checks both the logical and physical integrity of all the objects in the specified database. This command performs the following: 

  • Executes DBCC CHECKALLOC on the database.
  • Executes DBCC CHECKTABLE on each table and view in the database.
  • Executes DBCC CHECKCATALOG on the database.
  • Validates the contents of every indexed view in the database.
  • Validates link-level consistency between table metadata and file system directories and files when storing varbinary(max) data in the file system using FILESTREAM.
  • Validates the Service Broker data in the database.

Using this command is simple: just indicate the database name.
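For example, assuming a database named MyDatabase (a placeholder name):

DBCC CHECKDB ('MyDatabase');
-- Optionally, suppress informational messages and show only errors:
-- DBCC CHECKDB ('MyDatabase') WITH NO_INFOMSGS;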

When the checks proceed, a log will be produced.

Let us now examine the log report. It shows the name of the database.

The first highlighted part of the log actually refers to the last check carried out by the CHECKDB command: the validation of the Service Broker data in the database.

Service Broker is an asynchronous messaging framework with which you can implement scalable, distributed, highly available, reliable, and secure database applications based on SQL Server.

The logical checks then start with a primitive check on the data pages of critical system tables.


If any errors are found at this point, they cannot be fixed and CHECKDB terminates immediately.

This error message appears: System table pre-checks: Object ID O_ID. Loop in data chain detected at P_ID. Check statement terminated because of an irreparable error.

Then, logical checks are performed on all the other tables, both system and user tables.


Without entering too many details, logical checks that are performed include:

  • Validate each table’s storage engine metadata
  • Read and check all the data, index, and text pages, depending on the page type
  • Check all inter-page relationships
  • Check the page header counts in each page
  • Perform any necessary repairs (if a repair level was specified)

If the command is executed just to check whether the database has problems, look at the end of the log: if all is good, we will see 0 allocation errors and 0 consistency errors. Otherwise, there is a problem.
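For reference, a clean run ends with a summary line similar to the following (the database name will differ):

-- CHECKDB found 0 allocation errors and 0 consistency errors in database 'MyDatabase'.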

 


We can check the log to find out on which objects the corruption occurred; the affected tables and indexes are highlighted in red.

This command has options to repair the database, in case of errors.

By specifying one of the options in the DBCC command, we can try to fix the errors.

I suggest using the REPAIR_FAST, REPAIR_REBUILD, and REPAIR_ALLOW_DATA_LOSS options only as a last resort. Note that REPAIR_ALLOW_DATA_LOSS can cause data loss.

Let us see how this option works.

DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS

The DBCC command checks both physical and logical integrity of the database. If there are any errors, we can try to fix them.

In particular, the "most aggressive" option is REPAIR_ALLOW_DATA_LOSS which attempts to repair data even at the cost of losing it.

Let us see how this command works when this option is specified.

I have a database “Recovered_corrupted_db_data” with a table “corrupted_usertable”.

Often, we notice errors while making a SELECT on a table.

When we execute a simple query against the table, we get an error.
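As an illustration, a representative sequence looks like this (a sketch; the exact message text and page numbers vary):

SELECT * FROM corrupted_usertable;

-- Typical error raised when a corrupt page is read:
-- Msg 824, Level 24, State 2
-- SQL Server detected a logical consistency-based I/O error: incorrect checksum.
-- It occurred during a read of page (1:260) in database ID ...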

In this case, SQL Server is trying to read the table.

The data within this table is spread over multiple pages. When it reads page 260, SQL Server encounters a logical consistency error.

SQL Server computes a checksum for each page when it writes the data and verifies the correctness of this calculated value when it reads the data.

Note: We have a checksum value for each page, not for each row of data.
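This page-level verification is controlled by the database's PAGE_VERIFY option. A quick sketch to inspect and, if needed, enable it:

SELECT name, page_verify_option_desc
FROM sys.databases
WHERE name = 'Recovered_corrupted_db_data';
-- Enable page checksums if they are not already on (CHECKSUM is the default on recent versions):
ALTER DATABASE Recovered_corrupted_db_data SET PAGE_VERIFY CHECKSUM;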

Since the database returned an error, we can execute the DBCC CHECKDB command.

The CHECKDB command has also detected that there is a problem on page 260.

So, it is confirmed that we have a problem.

As said previously, use the REPAIR_ALLOW_DATA_LOSS option only as a last resort.

Note: If you have a backup, use it. You can also use specialized software that can recover data from a corrupted .mdf file. For example, I use the easy-to-use Stellar Repair for MS SQL software.

If, however, we have no alternatives and the other options, like REPAIR_FAST and REPAIR_REBUILD, do not work, we can try this option.

Remember that we must switch the database to single-user mode before executing this command.
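A minimal sketch of the full repair sequence, using our example database (adjust the name to your environment):

ALTER DATABASE Recovered_corrupted_db_data SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
GO
-- Attempt the repair; this may deallocate corrupt pages and lose data
DBCC CHECKDB ('Recovered_corrupted_db_data', REPAIR_ALLOW_DATA_LOSS);
GO
ALTER DATABASE Recovered_corrupted_db_data SET MULTI_USER;
GO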

After executing the command, the log shows that all the errors have been repaired.

Now, if we run the CHECKDB command again, we can see that there are no errors.


Great! But what happened?

The table initially contained 100 rows, with IDs from 1 to 100.

Now, we can see (in the image below) that the rows with IDs from 49 to 81 have been lost.

But this is not the only problem.

We also have a row with a completely wrong value for the column ID.


This means that the data is no longer reliable.

It is easy to understand what happened: DBCC CHECKDB with the REPAIR_ALLOW_DATA_LOSS option has deallocated the corrupt page entirely (as we said before, there is no row-level checksum, only a page-level one).

Physically, the surviving data is copied into new data pages, and the links between the pages of the table are rebuilt.


To Conclude

In this article, we discussed the DBCC CHECKDB command, which checks both the logical and physical integrity of the database and can also repair it. We also discussed how the DBCC CHECKDB command with the REPAIR_ALLOW_DATA_LOSS option works.

Monday, 7 February 2022

SQL Server – Backing up the Tail of the Log

When a database gets corrupted or a failure occurs, it is recommended to create a tail-log backup of the log records that haven't been backed up yet, before restoring the database from the backups you've created. This helps restore the database to the exact point at which it failed, preventing data loss.

Read on to learn about the other reasons when you need to back up the tail of the transaction log. Also, understand how to take a tail-log backup and restore it to get back the data you fear losing in the event of a crisis.

Why and When Should You Back Up The Tail of the Log?

A tail-log backup helps capture the tail of the log records when the database is offline, damaged, or its data files are missing.

Reasons Why You Need To Back Up the Tail of the Transaction Log

  • The database is corrupted, or the data file is corrupted or deleted.
  • The database goes offline and doesn't start, and you want to recover it as quickly as possible. Before you begin recovery, first take the tail-log backup.
  • The database is online and you plan to restore it; start by backing up the tail of the log.
  • You are migrating a database from one server to another.

Example Demonstrating the Need to Take Tail-log Backup

Let's say you run DBCC CHECKDB to check for corruption in the database. It returns consistency errors, and you decide to restore the previously taken backups: first the full backup, then the differential and all the transaction log backups. But you don't want to lose the log records that haven't been captured in any transaction log backup. So, to avoid losing those log records (i.e., the tail of the log) and keep the log chain intact, you will need to take a tail-log backup.

Let’s consider a scenario.

Assume you take a full database backup and then a transaction log backup every hour:

Time       Event
8:00 AM    Create a full database backup
9:00 AM    Take a transaction log backup
10:00 AM   Take a transaction log backup
11:00 AM   Take a transaction log backup
11:30 AM   Failure occurs


You can restore the database starting from the full backup (taken at 8 AM) and then restore all three transaction log backups (taken at 9 AM, 10 AM, and 11 AM). But there are no backups covering the period from 11:00 AM to 11:30 AM, resulting in data loss.

So, how do we recover the database without losing the data from between 11:00 and 11:30 AM?

Take a tail-log backup by executing the BACKUP LOG command with the NO_TRUNCATE option. It will create a tail-log backup file. Restore this file after the last transaction log backup (taken at 11 AM) to recover the lost data, as shown in the restore sketch below.

USE master;
GO
BACKUP LOG [Database]
TO DISK = 'C:\ProgramFiles\MSSQLServer\Data\Tail_Log1.LOG'
WITH NO_TRUNCATE;
GO
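Putting the scenario together, the full restore sequence would look roughly like this (a sketch; the backup file names are hypothetical):

-- Restore the 8:00 AM full backup, leaving the database in the restoring state
RESTORE DATABASE [Database] FROM DISK = 'C:\Backups\Full_8AM.bak' WITH NORECOVERY;
-- Restore the hourly log backups in order
RESTORE LOG [Database] FROM DISK = 'C:\Backups\Log_9AM.trn' WITH NORECOVERY;
RESTORE LOG [Database] FROM DISK = 'C:\Backups\Log_10AM.trn' WITH NORECOVERY;
RESTORE LOG [Database] FROM DISK = 'C:\Backups\Log_11AM.trn' WITH NORECOVERY;
-- Finally, restore the tail-log backup and bring the database online
RESTORE LOG [Database] FROM DISK = 'C:\ProgramFiles\MSSQLServer\Data\Tail_Log1.LOG' WITH RECOVERY;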

How to Back up and Restore Tail of the Log?

Before we discuss the process to back up the tail of the transaction log and restore it, it’s important to know the clauses you need for creating a t-log backup.

  • NORECOVERY: Using this clause leaves the database in the restoring state. This ensures that the database won't change after the tail-log backup.
  • NO_TRUNCATE: Use this clause only when the database is damaged.
  • CONTINUE_AFTER_ERROR: If the database is damaged and you cannot take a regular tail-log backup, back up the tail of the log using CONTINUE_AFTER_ERROR.

Demo

  • Create a new database

CREATE DATABASE Tail_LogDB;
GO
USE Tail_LogDB;
GO


  • Create a new table and insert some data into it.

CREATE TABLE Employee (
    EmployeeID int IDENTITY(1000,1) PRIMARY KEY NOT NULL,
    EmployeeAge int
);
GO

This T-SQL query will create a table named Employee with columns ‘EmployeeID’ and ‘EmployeeAge’. 


  • Create a stored procedure to add more records to the table.

CREATE PROCEDURE InsertEmployee
AS
DECLARE @i int = 100;
WHILE @i > 0
BEGIN
    INSERT Employee (EmployeeAge) VALUES (@i);
    SET @i -= 1;
END
GO

EXECUTE InsertEmployee;
GO

SELECT * FROM Employee;
GO

Executing this T-SQL query creates the ‘InsertEmployee’ stored procedure, which runs a loop to insert 100 records into the Employee table. Then, we select from the Employee table to verify that everything works.


  • Create a full backup of the Tail_LogDB

BACKUP DATABASE Tail_LogDB
TO DISK = 'C:\TempDB\Tail_LogDB_FULL.bak';

This command creates a full database backup containing the 100 records we added to the table earlier. The backup is saved in the ‘TempDB’ folder we have created.

  • Insert some more records into the table

EXECUTE InsertEmployee;
GO
SELECT * FROM Employee;
GO

After executing this T-SQL query, we will have 200 records in the database table. 



  • Simulate a database failure

If you're keeping your data and log files on different physical drives, it's entirely possible that a drive failure takes out the data file and leaves you with only the transaction log. We can simulate this simply by deleting the .mdf file from the hard drive. Here's how:
  • Right-click on Tail_LogDB > Tasks > Take Offline. 


  • Select the ‘Drop all active connections’ checkbox and press OK. 


  • Now refresh the database, and you can see that it is now OFFLINE. 

  • Next, go to the location where the backup file is stored (i.e., the TempDB folder), and you can see the backup we just created. 

  • Now go to the location where the .mdf and .ldf files for the Tail_LogDB database are saved. Delete the .mdf file.


Now let's head back to SSMS and understand how we can recover from this disaster.

Bring Database Back Online

  • Right-click on Tail_LogDB > Tasks > Bring Online.

  • A dialog box appears with errors; click Close.

  • Refresh the database.

As you can see, the database status has changed to Recovery Pending. Before attempting the restore operation, make sure to back up the tail of the log to capture the second batch of 100 records we added to the database.

Now, let’s take the tail of the log.

Switch to the master database and execute the BACKUP LOG statement with the CONTINUE_AFTER_ERROR option. This option ensures that the tail-log backup is performed even if errors occur.

USE master;
GO
BACKUP LOG Tail_LogDB
TO DISK = 'C:\TempDB\Tail_LogDB.log'
WITH CONTINUE_AFTER_ERROR;
GO

Restore the Tail-Log Backup

Let's initiate the restore process by restoring the full database backup with the NORECOVERY option. This option specifies that the restore should not attempt to undo or roll back any uncommitted transactions. This is important because, if a modification to the data had begun but not finished when the failure occurred, there would be a record of it in the transaction log. Typically, SQL Server attempts to roll back any such partially completed changes during a restore, and we don't want this to happen yet.

USE master;
GO
RESTORE DATABASE Tail_LogDB
FROM DISK = 'C:\TempDB\Tail_LogDB_FULL.bak'
WITH NORECOVERY;
GO

This restores the backup of the first 100 records. 


To complete restoring the entire record set, let's restore the log file as well.

RESTORE LOG Tail_LogDB
FROM DISK = 'C:\TempDB\Tail_LogDB.log';
GO


  • Verify the results

USE Tail_LogDB;
GO
SELECT * FROM Employee;
GO


So, as you can see, all 200 records are now restored.

Conclusion: Key Take-Away Points

  • A tail-log backup helps avoid losing data when a database is damaged or corrupted. However, backing up the tail of a damaged database's log may fail; in that case, execute the BACKUP LOG statement with the CONTINUE_AFTER_ERROR option.
  • You must also take a tail-log backup before restoring a database that is in the ONLINE state. If the database is in the OFFLINE state and doesn't start, back up the tail of the transaction log WITH NORECOVERY before performing the restore.
  • It is also recommended to take a tail-log backup when migrating a large database from one server to another.
  • Remember, you can take tail-log backups only if the transaction log file is accessible. This means you cannot take a tail-log backup on a database whose log file is corrupted and inaccessible.

Wednesday, 24 November 2021

3-2-1 Backup Rule for Data Protection in SQL – What Is It & Is It Relevant?

In the event of a disaster leading to database unavailability, restoring backups is the first step to ensure business continuity. However, situations may arise when attempts to restore the database fail. For instance, the tape drive used for storing backups may get damaged, rendering the backup data corrupt and unusable. In fact, tape media failure is the most common reason why restores fail. To prevent this and ensure the database can be restored with uncorrupted data, SQL users implement the 3-2-1 backup rule for data protection. This article discusses this data protection rule and whether it is still relevant.

The 3-2-1 Backup Rule for Data Protection – What Is It?

The 3-2-1 rule of data protection ensures that the database can be restored with uncorrupted data. The idea behind the rule is to have:

3 copies of your data: You must have at least three copies of your data: your production data plus two backup copies. The more copies you have, the lower the risk of losing data.

2 copies on different media: Store the two backup copies on different media types. This is important because any backup medium can fail. When you split your backups across different media and one device fails, you'll have another to fall back on.

1 copy offsite: Store one of the two backup copies offsite. Doing so ensures that if anything happens to one backup copy, it won't (hopefully) affect the other.



Is the 3-2-1 Backup Rule of Data Protection Relevant for SQL Users?

The 3-2-1 backup rule is a good starting point for devising any disaster recovery plan, particularly for SQL users who aren't backing up at all. But the rule has certain shortcomings.

Data Can Be Compromised

Maintaining three copies of data is fine, as more copies ensure that recovery is possible after any disaster. But keeping two copies on different media types has limitations. Having two copies stored on two storage media or devices means quicker access to a backup if the primary fails; however, this might not always be the case.

What happens if ransomware infects your secondary storage while the primary is already down? You may lose all the data unless you pay a ransom. And, with several organizations replacing tape backups with cloud storage, an ever-increasing number of databases is becoming vulnerable to ransomware attacks.

According to a report by Imperva, “46% of all on-premises databases are vulnerable to attack.” Imperva predicted that data breaches will continue to grow, as nearly one out of two on-premises databases is vulnerable to attack. And so, you need more comprehensive and stronger data protection strategies than ever before.

Faulty Interpretation of the 3-2-1 Backup Rule

Backing up to tape drives is more expensive than backing up data to the cloud, and as the demand for storage space grows, so does the storage cost. Tape is still used, but due to its slow recovery times and high cost, users are moving data to offsite locations, such as the cloud. That’s where the problem starts.

As cloud-based services do not necessarily store backups at a separate storage facility, points “2” and “1” of the backup rule are ignored. In other words, moving offsite data to the cloud can fulfil the purpose of point “2”: the cloud copy can be used for recovery if the first copy is affected. But this way, you’ll have only a single backup copy, which doesn’t offer the protection you need from ransomware or other cyber threats.

Air Gap Protection is Lost

Though tape-based storage can slow down your recovery due to bandwidth constraints, it provides an air gap that prevents ransomware from reaching your backup copies. However, air-gap protection is missing from the 3-2-1 rule.

An air gap is basically a way of protecting a backup copy by storing it on a network that is physically separate from the primary data.

It was easy to provide an air gap when using tape backups: you could place the tapes in a box and transport them to an off-site location, creating an air gap between your backup and the primary copy of the data. This makes it harder for hackers, as they cannot attack both the primary and the backup storage devices.

How to Overcome 3-2-1 Backup Rule Shortcomings?

Backup strategies like 3-2-1-1-0 or 4-3-2 offer additional protection against ransomware attacks. Let’s briefly discuss these two strategies:

3-2-1-1-0 Backup Rule

Like the ‘3-2-1’ backup rule, the 3-2-1-1-0 data protection strategy requires maintaining at least three copies of data, storing them on at least two different storage media, and keeping one backup copy offsite. In addition, it requires two more steps:

  • Keep one backup copy offline or air-gapped, for example by storing tape backups off-site. Alternatively, store cloud backups with immutability, preventing the data from being modified or changed.
  • Monitor the data to identify and correct any errors in the backups (the “0” stands for zero backup errors).

Essentially, the 3-2-1-1-0 backup rule ensures that you've an error-free offline backup copy you can use to recover data in case of system failure or cloud failure.

4-3-2 Backup Rule

Developed by Continuity Centers, the 4-3-2 rule states that four copies of the data are stored in three different locations. Of the three locations, two are offsite, and the third copy is stored in the cloud, while the fourth backup goes to another cloud storage service.

The 4-3-2 backup strategy ensures that duplicate copies of backups are created and stored at geographically distant locations to avoid data loss in the events of natural disasters.

Concluding Thoughts

Preventing data loss in the event of a disaster is crucial for business continuity. And so, you must regularly take backups and review them to ensure their effectiveness in restoring SQL databases. The 3-2-1 backup rule is a good starting point for data security, but you need a more extensive backup strategy to protect your data against the growing number of digital threats. Upgrading the 3-2-1 rule to a 3-2-1-1-0 or 4-3-2 backup strategy provides an additional layer of security and helps you recover databases affected by ransomware.

Thursday, 23 September 2021

How to Design Backups in SQL Server?

The Recovery Point Objective (RPO) and Recovery Time Objective (RTO) are basic concepts related to the information to be recovered and the time it will take to recover it. In a recovery design plan, it is essential to define the time required to restore the data and how much data we can afford to lose in the recovery.

The RPO defines the objective related to the point of recovery. In some cases, we can afford to lose one hour of data; in other cases, an entire day. Therefore, defining the RPO at the beginning is critical to designing the plan. The same applies to the RTO.

In this article, we will talk about both RPO and RTO, and also discuss how to design the right backup and restore strategy.

Recovery Point Objective (RPO)

Recovery Point Objective (RPO) is a measure of how often it is necessary to take backups. If a disaster happens between backups, can you afford to lose ten minutes of data updates? Ten hours? An entire day? RPO describes how fresh the recovered data will be: it indicates the amount of data (updated or created) that will be lost or will need to be re-entered after an outage.

Recovery Time Objective (RTO)

Recovery Time Objective (RTO) isn't just the length of time between the disaster and the recovery. It also accounts for the steps the IT team must take to restore the application and its data. If the IT team has invested in failover services for high-priority applications, it can safely express the RTO in seconds.

How to Design the Right Backup and Restore Strategy to Meet Business Goals?

To reduce the RTO, consider combining transaction log backups and differential backups in your strategy. If you only take full backups, the time required to restore the data may be too long. Using differential backups can help reduce the recovery time.
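A sketch of such a mixed schedule (the database name and paths are placeholders):

-- Weekly full backup
BACKUP DATABASE MyDatabase TO DISK = 'D:\Backups\MyDatabase_Full.bak';
-- Daily differential backup (only the changes since the last full backup)
BACKUP DATABASE MyDatabase TO DISK = 'D:\Backups\MyDatabase_Diff.bak' WITH DIFFERENTIAL;
-- Frequent transaction log backups, e.g., every 15 minutes
BACKUP LOG MyDatabase TO DISK = 'D:\Backups\MyDatabase_Log.trn';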

Here are some tips to consider when doing a backup:

  • Make sure to store the backup on a different server. This will prevent you from losing the backups if the server is damaged.
  • Use fast hard disks. Backup and restore operations will be faster on hardware with good performance, which helps meet the RTO.
  • Use differential backups to reduce the number of transaction log backups needed for a restore. Differential backups allow you to restore the data faster and meet the RTO.
  • Estimate how often the data changes. If your data does not change much during the day, a single daily differential backup plus a weekly full backup may be enough. On the other hand, if your database handles many transactions per minute, you will need to combine full, differential, and transaction log backups to recover all the data within the defined RPO.
  • Always test your backups, make sure the RPO and RTO are met in your tests, and verify the integrity of the restored data. It is recommended to have both a production environment and a testing environment for verifying backups.
  • Automate the backup and restore strategy using jobs. The backups should run on a schedule that depends on how the RTO and RPO are designed.
  • If your database is big, the MAXTRANSFERSIZE and BUFFERCOUNT arguments may help. They let you increase the transfer size and the number of buffers so that you can back up and restore the database faster (see the sketch after this list).
  • Use database and backup compression to increase the speed of the backup.
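As mentioned in the last two tips, a large-database backup can be tuned roughly as follows (the values are illustrative assumptions; test them in your environment):

BACKUP DATABASE MyBigDatabase
TO DISK = 'D:\Backups\MyBigDatabase_Full.bak'
WITH COMPRESSION,
     MAXTRANSFERSIZE = 4194304,  -- 4 MB per transfer unit (the maximum allowed)
     BUFFERCOUNT = 50;           -- number of I/O buffers; more buffers use more memory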

Use Stellar Repair for MS SQL for Faster Recovery

There is also another way to recover information quickly and securely. Stellar Repair for MS SQL is a well-known third-party software that helps recover data from a corrupt or damaged database. It can also repair damaged databases. If you want to know more about this software, please visit the website here.