RAID 1 is probably the most robust RAID type, but even it can fail. In this article we will look at the main causes of RAID 1 failure and at how to recover your data if the array unexpectedly malfunctions.
- How RAID 1 works
- The main drawbacks of RAID 1
- Causes of RAID 1 failure
- What if RAID 1 can’t activate the Spare Disk?
- How to replace a failed RAID 1 disk in Linux?
- How to replace a failed RAID 1 disk in Windows?
- How to recover data if a RAID 1 array fails?
Many users who are concerned about preserving important data choose RAID 1 as their primary storage because of its high reliability. For example, financiers like to use this RAID type to store the financial reports and small databases they rely on in their daily work. At the same time, despite its reputation for reliability, you should not neglect to back up important data, because even RAID 1 cannot guarantee data safety. To understand why, let us look at how RAID 1 is structured, its advantages and disadvantages, and the possible risks of losing important data.
How RAID 1 works
RAID 1 is a disk array in which every drive holds an exact copy of the same data. That is why it is also called “mirroring”. In other words, it is not a backup of your data but redundancy of the volume across several disks. When information is written, the controller writes it to all disks of the array at once (not to just one, as usual), which can noticeably reduce write speed. For example, if it takes 5 minutes to write a 10GB archive to a regular hard drive, then writing the same file to a three-drive RAID 1 array can take up to 15 minutes, because the system has to write 30GB of data in total (three times 10GB). Reading, on the other hand, can be up to three times faster, since the information can be read from all three disks simultaneously (similar to RAID 0). The way the information is written is shown in the illustration below.
Therefore, RAID 1 suits users who store valuable information and care first and foremost about its safety, and who can accept the low write speed. If data transfer speed matters to you, you should look at other RAID levels. You can read about all the advantages and disadvantages of the different RAID configurations in the article “RAID types — advantages, disadvantages and possible problems”.
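The figures in the write-speed example above follow a simple model, which the small shell function below reproduces. It is only a sketch of that simplified model, assuming the copies are written one after another (the worst case) and reads are spread evenly over all mirrors (the best case). Real controllers usually write the copies in parallel, so actual write times are normally shorter. The function name `raid1_estimate` is hypothetical.

```shell
#!/bin/sh
# raid1_estimate SIZE_GB DISK_WRITE_MINUTES MIRRORS   (hypothetical helper)
# Worst-case model used in the example:
#   data written  = file size x number of mirrors
#   write time    = single-disk write time x number of mirrors
#   read speed-up = at most the number of mirrors
raid1_estimate() {
  awk -v gb="$1" -v wmin="$2" -v n="$3" 'BEGIN {
    printf "written_gb=%g write_min=%g read_speedup_max=%gx\n", gb * n, wmin * n, n
  }'
}
```

For the article's example, `raid1_estimate 10 5 3` prints `written_gb=30 write_min=15 read_speedup_max=3x`.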
The main drawbacks of RAID 1
Despite its high level of data integrity, RAID 1 has some disadvantages. First, if you want to make a RAID 1 array even more reliable, you can use more than two disks: the more drives, the higher the reliability. However, this is also where the first disadvantage of RAID 1 lies: the price per gigabyte of storage. No matter how many disks you add to the array, its usable capacity equals that of the smallest drive (which is also why it is recommended to use drives with identical characteristics). The rest of the money goes into data safety, because the remaining drives store copies of the information and their capacity is not available to the user.
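The capacity rule above can be written as a short shell sketch. Usable space equals the smallest member, and everything else you paid for holds copies (or, with mismatched drives, simply goes unused). The helper name `raid1_capacity` and its argument order are assumptions made for illustration: the total price of all drives comes first, followed by each drive's size in GB.

```shell
#!/bin/sh
# raid1_capacity TOTAL_PRICE SIZE_GB...   (hypothetical helper)
raid1_capacity() {
  price=$1; shift
  printf '%s\n' "$@" | awk -v price="$price" '
    NR == 1 || $1 < min { min = $1 }   # usable space = smallest member
    { total += $1 }                    # raw space that was paid for
    END {
      printf "usable_gb=%g unavailable_gb=%g cost_per_usable_gb=%.2f\n",
             min, total - min, price / min
    }'
}
```

For example, `raid1_capacity 300 1000 1000 1000` prints `usable_gb=1000 unavailable_gb=2000 cost_per_usable_gb=0.30`. Three 1000GB drives costing 300 in total yield only 1000GB, so each usable gigabyte costs three times as much as on a single drive.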
The second drawback, as mentioned above, is the low write speed: the more drives you use, the slower writing becomes. In addition, the maximum write speed is limited by the slowest drive, because the next block of information is not written until the current one has been written to all drives. This is one more reason why identical drives are highly recommended.
Many software controllers do not support “hot-swapping” of a failed drive, so to replace a damaged drive you will have to power the system down. This makes RAID 1 inconvenient in servers used by many people, since turning off the power makes the data inaccessible. The best solution is a hardware controller that supports hot-swapping disks. However, hardware controllers are usually more expensive than software ones, which raises the overall cost of a RAID 1 array. This RAID level is therefore a great fit for home servers holding important information, where two disks are enough: the array stays inexpensive, and a software controller lowers the total price even further.
Causes of RAID 1 failure
There are not many reasons that can cause RAID 1 to fail, but they do exist. The first, and one of the most significant, is power fluctuations and sudden power outages. Power failures often break the controller, which is responsible for distributing the data. To restore the array, you need a controller from the same manufacturer (or, for a software array, the same operating system), because controllers are not interchangeable, so you will not be able to restore the data by simply connecting a disk as a single drive. Moreover, there is no guarantee that the data will become available again after the controller is replaced, even with the same model: the new controller will not “know” exactly where the initial block of information is located on the disk, and it may not be able to rebuild the RAID array correctly. In this situation, it is better to extract the data from the drives first, recreate the array, and then copy the recovered information back. Read about how to recover data from RAID 1 in the last section of this article.
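For arrays built with Linux software RAID (mdadm), the array layout is stored in a metadata superblock on each member disk. A surviving member can therefore often be inspected and started on its own. The dry-run sketch below assumes such an md array and uses example names: /dev/sdb1 for the member, /dev/md0 for the array and /mnt/recovery for the mount point. The `run` helper only prints each command unless DRY_RUN=0 is set. The sketch does not apply to hardware controllers, whose metadata format is vendor-specific.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# inspect_and_mount_ro MEMBER MD MOUNTPOINT   (hypothetical helper)
inspect_and_mount_ro() {
  member=$1; md=$2; mnt=$3
  run mdadm --examine "$member"                          # array UUID, RAID level, member state
  run mdadm --assemble --readonly --run "$md" "$member"  # --run: start even though degraded
  run mkdir -p "$mnt"
  run mount -o ro "$md" "$mnt"                           # read-only, so nothing on disk changes
}
```

Calling `inspect_and_mount_ro /dev/sdb1 /dev/md0 /mnt/recovery` lists the commands. With DRY_RUN=0 (as root) it mounts the surviving copy read-only so the files can be copied elsewhere.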
Sometimes a power failure takes out both drives at once. In that case data recovery becomes much more complicated, since you must first fix the physical problem with the drives by replacing the broken parts, and only then proceed to data recovery.
One more reason for data loss (and perhaps the most common one) is the human factor. System administrators are sometimes careless, and users lose important data through accidental deletion or by formatting the entire array. Because RAID 1 mirrors every change, the deletion is immediately copied to all disks, so the data cannot be recovered by standard means. You will have to use third-party data recovery software.
What if RAID 1 can’t activate the Spare Disk?
Linux software RAID supports so-called spare disks. A spare disk stays idle until one of the drives in the array fails; it is then activated automatically and all the data is copied to it. The process is invisible to the user, who only receives a message that the spare drive has been put into use and that the failed drive can be removed. As you can see, this is a very useful feature for data safety. In Linux, the mdadm tool is responsible for building RAID arrays and keeping them working correctly. However, this tool sometimes does not work as expected, and RAID 1 fails to activate the spare disk. This can happen for the following reasons:
- Read errors during synchronization – solved by reconnecting the drive or replacing it;
- Bad sectors – if the spare disk has too many bad sectors, mdadm will not add it to the array, since such a disk is likely to fail soon and there is no point in copying information to it. The user will only see a message that one of the disks in the array has failed and that the spare disk has not been used;
- Damaged connection cable or an incorrectly connected spare drive – sometimes mdadm cannot activate a spare drive because the connection cable is not fully seated or is damaged. As a result, the utility simply does not “find” the drive and cannot activate it;
- Damaged drive – sometimes the user does not even realize that the spare drive itself is broken or not working properly. For example, the controller board of a hard drive or an SSD may fail. In this situation, mdadm will also be unable to activate the spare drive when it is needed.
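When mdadm leaves a spare unused, a few standard commands can usually tell the causes listed above apart. The dry-run sketch below uses example names: /dev/md0 for the array, /dev/sdc for the spare disk and /dev/sdc1 for its RAID partition. smartctl comes from the smartmontools package, and the helper names are hypothetical. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# diagnose_spare MD SPARE_DISK   (hypothetical helper)
diagnose_spare() {
  md=$1; disk=$2
  run cat /proc/mdstat          # is the array degraded, is a recovery running?
  run mdadm --detail "$md"      # state of every member: active, faulty or spare
  run smartctl -H "$disk"       # overall SMART health verdict of the spare
  run smartctl -A "$disk"       # attributes such as reallocated and pending sectors
}

# readd_spare MD SPARE_PART: after reseating the cable or replacing the disk
readd_spare() {
  md=$1; part=$2
  run mdadm "$md" --remove "$part"   # may fail if the partition is no longer a member
  run mdadm "$md" --add "$part"      # on a degraded array this starts the rebuild
}
```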
Whatever the reason the array cannot activate the spare disk, it is important to fix it right away, or better yet, to prevent it. To do this, use only hardware that you have checked and are sure is in good working order. Otherwise, you could lose important data.
How to replace a failed RAID 1 disk in Linux?
This part of the article is especially important for novice users, who often do not know how to replace a broken disk properly and end up either re-initializing the array or rebuilding it from scratch. Either of these actions invariably leads to data loss.
First of all, familiarize yourself with the disk replacement procedure for your setup, since it differs depending on the controller type and RAID level. For example, check whether your controller supports hot-swapping, as this determines whether you will have to power down the array.
So, the procedure to replace a broken disk in RAID 1 is as follows:
Step 1: Make a backup of all important data, as users often lose information while replacing a broken disk. If your RAID 1 array is working, you can simply copy the necessary files to a different location. If your RAID 1 array has failed and won’t start, you can recover your data with professional data recovery software such as RS RAID Retrieve. The recovery process is described in detail in the last section of this article.
Step 2: If you use a software controller, mark the drive as failed and then remove it from the array. To do this, run the following commands one by one in the terminal:
# mdadm /dev/md0 -f /dev/sdb2
# mdadm /dev/md0 --remove /dev/sdb2
Note that sdb2 is the second partition on the second drive, which is usually the one connected to the second SATA port. As a reminder, Linux identifies disks as follows:
- sd – the prefix Linux uses for SATA drives (they are handled by the SCSI disk driver, hence “sd”);
- a – the drive letter, assigned in the order the drives are detected: a is drive number one, b is drive number two, c is drive number three, and so on;
- 2 – the partition number on the disk.
Thus, sda2 is the second partition on the first SATA disk.
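The naming scheme above can be decoded mechanically. The POSIX shell sketch below is only an illustration, and the helper name is hypothetical. It assumes the classic sd naming, where letters are handed out in detection order and continue as sdaa, sdab and so on after sdz. NVMe drives (such as nvme0n1p2) use a different scheme and are rejected.

```shell
#!/bin/sh
# decode_sd_name NAME   (hypothetical helper)
# Splits a name such as /dev/sdb2 into a disk number (a=1, b=2, ...,
# z=26, aa=27, ...) and a partition number ("none" for a whole disk).
decode_sd_name() {
  name=${1#/dev/}
  case $name in
    sd[a-z]*) ;;
    *) echo "not an sd device: $1" >&2; return 1 ;;
  esac
  rest=${name#sd}
  letters=$(printf '%s' "$rest" | sed 's/[0-9]*$//')   # strip trailing digits
  part=${rest#"$letters"}
  case $letters in
    *[!a-z]*) echo "unexpected name: $1" >&2; return 1 ;;
  esac
  disk=0
  while [ -n "$letters" ]; do
    ch=${letters%"${letters#?}"}      # first letter
    letters=${letters#?}              # drop it
    code=$(printf '%d' "'$ch")        # character code, a = 97
    disk=$((disk * 26 + code - 96))
  done
  echo "disk=$disk partition=${part:-none}"
}
```

For example, `decode_sd_name /dev/sdb2` prints `disk=2 partition=2`.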
Step 3: It is advisable to disconnect the power even if your controller supports hot-swapping. This lets you work on the array safely and rules out short circuits and similar accidents. The only case for keeping the power on is replacing a drive in a server used by many people, and even then it is better either to move the data to a different server (if you have one) or to warn users about the maintenance in advance.
Step 4: Install the new drive and power the system up. Then copy the partition table from the surviving disk to the new one with the sfdisk utility. To do this, run the command:
# sfdisk -d /dev/sda | sfdisk /dev/sdb
where /dev/sda is the source disk and /dev/sdb is the new disk that receives the partition table.
If your system doesn’t have sfdisk, install it. On recent Debian and Ubuntu releases it is shipped in the fdisk package (on older releases it is part of util-linux, which is always installed):
# apt install fdisk
Step 5: Now tell mdadm to add the new partition to the array, so that the data is copied onto it and it works as part of the array again. If the failed disk held several RAID partitions, repeat the command for each of them, adding every partition to the same array it belonged to before (cat /proc/mdstat shows which partitions belong to which array):
# mdadm /dev/md0 -a /dev/sdb2
This starts the rebuild of your array, and you can follow its progress with cat /proc/mdstat. Under no circumstances should you turn off the power until the rebuild has finished. Once it is complete, you can use your RAID 1 array as before.
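The steps above can be collected into one script. The sketch below is a dry run by default, because the `run` helper only prints each command unless DRY_RUN=0 is set. It assumes a two-disk mdadm RAID 1 with sd-style names and uses example values: /dev/md0 is the array, /dev/sda the surviving disk, /dev/sdb the failed (and later replaced) disk, and partition 2 the RAID member on both disks. Adjust the variables to your system.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# Example values; override them from the environment.
MD=${MD:-/dev/md0}       # the RAID 1 array
GOOD=${GOOD:-/dev/sda}   # disk that is still working
NEW=${NEW:-/dev/sdb}     # failed disk, later its replacement
PART=${PART:-2}          # partition number of the RAID member

# Before powering down: mark the old member as failed and take it out.
remove_failed() {
  run mdadm "$MD" --fail "$NEW$PART"
  run mdadm "$MD" --remove "$NEW$PART"
}

# After the new disk has been installed under the same name.
add_replacement() {
  run sh -c "sfdisk -d $GOOD | sfdisk $NEW"   # clone the partition table
  run mdadm "$MD" --add "$NEW$PART"           # starts the rebuild
  run mdadm --wait "$MD"                      # blocks until the rebuild is done
  run cat /proc/mdstat                        # a healthy two-disk mirror shows [UU]
}
```

Run `remove_failed` before shutting down and `add_replacement` once the new disk is in place. With DRY_RUN=0 both need root. If the disks use GPT, the cloned table may also copy the disk and partition GUIDs. In that case `sgdisk -G /dev/sdb` (from the gdisk package) gives the new disk fresh random GUIDs.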
How to replace a failed RAID 1 disk in Windows?
In Windows, when a disk of a mirrored volume fails, the volume gets the status “Failed Redundancy” in Disk Management.
The algorithm for disk replacement is as follows:
Step 1: Back up all important files so that you don’t lose information in case something goes wrong.
Step 2: Turn off the power and replace the damaged drive with a new one. Then power the computer back on, right-click “Start” and select “Disk Management”.
Step 3: In the Disk Management window that appears, you will be prompted to initialize the new disk. Click “OK”. Then right-click the new disk and select “Convert to Dynamic Disk”.
Step 4: A disk conversion window will open. Check the box next to your disk and click “OK”.
Step 5: Right-click your array and select “Remove Mirror”. In the new window, select the missing disk (the one we removed earlier), right-click it, and select “Eject Disk”.
Step 6: Right-click the disk that remains in the array and choose “Add Mirror”. In the window that appears, select the new disk and click “Add Mirror”.
A window will then warn you that the selected disks will be converted to dynamic disks. Click “OK”, and your new disk will be added to RAID 1.
How to recover data if a RAID 1 array fails?
Although a RAID 1 array is reliable, users sometimes lose important information. There can be many reasons for this, from accidental deletion of data or formatting of the array to data loss while replacing a broken disk. In any case, take care of data safety before taking any action. For example, if your array no longer starts, the first thing to do is to extract the data from the array disks, and only then work on the disks or the controller.
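On Linux, one way to follow this advice is to copy every member disk to image files on a separate drive before touching the array, so that any later mistake only affects the copies. The dry-run sketch below assumes GNU ddrescue is installed and uses example names: /dev/sda and /dev/sdb for the members and /mnt/backup for the backup drive, which needs at least as much free space as the disks being imaged. The helper name is hypothetical, and this step is separate from the RS RAID Retrieve procedure that follows.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# image_members DEST_DIR DISK...   (hypothetical helper)
image_members() {
  dest=$1; shift
  for disk in "$@"; do
    name=${disk##*/}   # /dev/sdb -> sdb
    # -d: direct disc access for the source; the map file lets an
    # interrupted copy resume where it stopped
    run ddrescue -d "$disk" "$dest/$name.img" "$dest/$name.map"
  done
}
```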
To recover data from RAID 1 you should:
Step 1: Download and install RS RAID Retrieve, then launch the application. The built-in “RAID constructor” will open. Click “Next”.
Step 2: Choose the method of adding a RAID array for scanning. RS RAID Retrieve offers three options to choose from:
- Automatic mode – allows you to simply specify the drives that made up the array, and the program will automatically determine their order, array type, and other parameters;
- Search by manufacturer – this option should be chosen if you know the manufacturer of your RAID controller. This option is also automatic and does not require any knowledge about the RAID array structure. Having the manufacturer’s information allows you to reduce the time to build the array, and is, therefore, faster than the previous option;
- Manual creation – this option is worth using if you know which type of RAID you have. In this case, you can specify all the parameters you know, and the program will determine the rest automatically.
After selecting the appropriate option, click “Next”.
Step 3: Select the disks that make up the RAID array and click “Next”. This starts the detection of the array configuration. When it is complete, click “Finish”.
Step 4: Once the constructor has built the array, it appears as a regular drive. Double-click it to open the File Recovery Wizard, then click “Next”.
Step 5: RS RAID Retrieve will offer to scan your array for files to recover. You can choose between a quick scan and a full analysis of the array. Select the desired option, then select the file system that was used on the array. If you do not know it, check all the available options, as shown in the screenshot. RS RAID Retrieve supports all modern file systems.
Step 6: The array scan will start. When it finishes, you will see the previous structure of files and folders. Find the files you need, right-click them and select “Recovery”.
Step 7: Specify where the recovered files should be saved: a hard drive, a ZIP archive, or an FTP server. Click “Next”.
After you click “Next”, the program begins the recovery process. When it finishes, the selected files will be in the specified location.
After all the files have been restored, recreate the RAID 1 array and then copy the files back.
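If the array was a Linux software RAID, recreating it and copying the recovered files back might look like the dry-run sketch below. Everything here is an example assumption: the member partitions /dev/sda1 and /dev/sdb1, the array /dev/md0, the ext4 file system, the folder /mnt/recovered holding the recovered files (which must be on a different drive) and the mount point /mnt/raid. The helper name is hypothetical. Keep in mind that mdadm --create and mkfs.ext4 erase whatever is on the listed devices.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# recreate_and_restore MD SOURCE_DIR MOUNTPOINT MEMBER...   (hypothetical helper)
recreate_and_restore() {
  md=$1; src=$2; mnt=$3; shift 3
  run mdadm --create "$md" --level=1 --raid-devices="$#" "$@"  # new mirror, erases members
  run mkfs.ext4 "$md"                                          # fresh file system
  run mkdir -p "$mnt"
  run mount "$md" "$mnt"
  run rsync -a "$src/" "$mnt/"   # trailing slashes: copy the folder contents, not the folder
}
```

`recreate_and_restore /dev/md0 /mnt/recovered /mnt/raid /dev/sda1 /dev/sdb1` prints the five commands. mdadm may ask for confirmation if it finds old data on the members.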
As you can see, the RAID 1 data recovery process is quite simple and doesn’t require much PC knowledge, making RS RAID Retrieve the perfect application for professionals and novice users alike.