Today everyone understands the value of information and the enormous potential of information technology. Because servers and computers can fail, the question of how to keep data safe arose naturally: the loss of important information can bankrupt entire companies, with losses running into many millions. This led to the emergence of RAID, a technology designed to prevent data loss by combining several drives into one array. However, as practice has shown, RAID arrays can fail too.
In this article we will look at the main causes of RAID failure.
- History of RAID development
- What is RAID degraded mode?
- Causes of data loss on RAID arrays
- RAID controller failure
- RAID assemble error
- Drive failure
- Server failure
- What to do when a RAID array fails or if a RAID array cannot assemble after rebooting?
History of RAID development
In the early days of computing, attention was focused on making computers as user-friendly as possible. The concept of a “personal computer” did not exist yet: computers were used mostly in the military industry (which is a separate story, since it has its own developments in information security) and in large corporations. Computers of that era had very limited functionality and were operated mostly by programmers.
Even in the 1970s, when Apple and Microsoft were founded, data security was not a priority. That changed with the advent and growth of the Internet, which reached more and more countries and let users communicate with each other. By then people had become used to personal computers and realized how much they simplify life by processing vast amounts of information. With the arrival of digital cameras and camcorders for personal use, it became clear that a personal computer would be in almost every home. The digital industry boom that followed made data security a pressing question.
Large companies, whose efficiency already depended heavily on the data stored on their servers, contributed to this as well. As a result, RAID arrays were introduced in 1987. Their main purpose was to avoid losing important information, and because the technology was effective and offered several data protection options to suit the user’s needs, it quickly became widespread.
Although RAID levels were never formally standardized, the following are accepted as de facto standards:
- RAID 0: a striped array whose main purpose is to increase read/write speed; it has no redundancy at all;
- RAID 1: a mirror array in which each disk is a complete copy of the other;
- RAID 2: a disk array that uses a Hamming code for error correction;
- RAID 3, 4: disk arrays with striping and a dedicated parity disk;
- RAID 5: a disk array with striping and parity distributed across all disks (no dedicated parity disk).
All other types of RAID arrays (such as RAID 10, RAID 50, etc.) are based on the above RAID types and use their concept in one way or another.
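The trade-off between these levels can be made concrete with a small calculation. The sketch below is illustrative only: the function name `raid_summary` is hypothetical, all disks are assumed to be the same size, RAID 10 is assumed to use two-way mirrors, and only the most common levels are covered.

```python
def raid_summary(level: str, disks: int, disk_tb: float) -> dict:
    """Return usable capacity (TB) and the number of disk failures
    the array is guaranteed to survive, for a few common RAID levels.

    Assumes identical disks; RAID 10 is assumed to be built from
    two-way mirrors.
    """
    if level == "0":
        min_disks, usable, tolerated = 2, disks * disk_tb, 0
    elif level == "1":
        # Every disk holds a full copy, so capacity equals one disk.
        min_disks, usable, tolerated = 2, disk_tb, disks - 1
    elif level == "5":
        # One disk's worth of space is spent on distributed parity.
        min_disks, usable, tolerated = 3, (disks - 1) * disk_tb, 1
    elif level == "10":
        # Half the disks are mirrors; only one failure is guaranteed
        # to be survivable (a second one may hit the same mirror pair).
        min_disks, usable, tolerated = 4, disks / 2 * disk_tb, 1
    else:
        raise ValueError(f"unsupported RAID level: {level}")

    if disks < min_disks:
        raise ValueError(f"RAID {level} needs at least {min_disks} disks")
    if level == "10" and disks % 2:
        raise ValueError("RAID 10 needs an even number of disks")

    return {"usable_tb": usable, "tolerated_failures": tolerated}


for lvl in ("0", "1", "5", "10"):
    print(lvl, raid_summary(lvl, disks=4, disk_tb=2.0))
```

With four 2 TB disks, the same hardware yields anywhere from 2 TB (RAID 1) to 8 TB (RAID 0) of usable space, which is why the choice of level depends on how much redundancy is needed.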
The use of RAID arrays has proven so effective that today almost all modern data stores (servers, NAS, etc.) use RAID arrays in one form or another.
However, despite the reliability of this solution, the probability of data loss is greatly reduced but not eliminated, because even RAID arrays sometimes fail. This can happen for many reasons, which are covered in the following sections of this article.
What is RAID degraded mode?
A RAID array, like an ordinary disk, is exposed to all kinds of failures. If one of the disks in a redundant array fails, the whole array switches to the so-called “degraded mode”. In this mode the data is still available and the array continues to work, but with a strong performance penalty. The controller (or the software RAID driver) enables this mode when a disk fails or is missing. When the array enters degraded mode, the user sees a message such as “A DegradedArray event had been detected on md device /dev/md/1” or “ARRAY IS DEGRADED – 1 disk is missing”.
You can also see the “[U_]” marker when checking the RAID status in the terminal. Each “U” stands for a member disk that is up and in sync, while an underscore marks a member that has failed, is missing, or is not synchronized.
In this case you should replace the damaged disk immediately, because on arrays that tolerate only a single failure (such as a two-disk RAID 1 or a RAID 5), the loss of one more disk means that all the data in the array is lost.
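On Linux software RAID, the status markers described above appear in `/proc/mdstat`. The following is a minimal sketch of how degraded arrays could be detected automatically by looking for an underscore in the status marker. The sample text and the function name `degraded_arrays` are made up for illustration, and real `/proc/mdstat` output may contain additional lines (for example, rebuild progress).

```python
import re

# Hypothetical /proc/mdstat content used for illustration only.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md1 : active raid1 sdb1[1] sda1[0](F)
      976630336 blocks super 1.2 [2/1] [_U]

md0 : active raid1 sdb2[1] sda2[0]
      524224 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""


def degraded_arrays(mdstat_text: str) -> dict:
    """Map each degraded md device name to its status marker, e.g. '_U'."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\S*)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status marker such as [UU] or [_U] closes the "blocks" line.
        status = re.search(r"\[([U_]+)\]\s*$", line)
        if current and status and "_" in status.group(1):
            result[current] = status.group(1)
    return result


print(degraded_arrays(SAMPLE_MDSTAT))
```

On a real system the text would come from reading `/proc/mdstat` instead of the sample string; the parsing logic stays the same.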
Causes of data loss on RAID arrays
RAID arrays store data on the same kind of drives that are used in ordinary computers, and those drives can fail. RAID technology helps prevent data loss, but recovery can be very slow: when one drive fails, the whole array often runs much slower, which is especially noticeable with terabytes of data, as on a server. Furthermore, in some cases replacing a damaged drive requires powering the system off, which is undesirable for servers. Therefore, it is best to know the main causes of RAID failure so that you can prevent trouble.
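To get a feeling for how slow recovery can be, here is a rough back-of-the-envelope estimate of the time needed to rebuild one replaced disk. The function name and the throughput figures are assumptions chosen for illustration; actual rebuild speed depends on the controller, the drives, and the load on the array.

```python
def estimated_rebuild_hours(disk_tb: float, rebuild_mb_per_s: float) -> float:
    """Estimate the hours needed to rewrite one whole disk.

    Uses decimal units (1 TB = 1,000,000 MB) and assumes a constant
    rebuild throughput, which is a simplification.
    """
    total_mb = disk_tb * 1_000_000
    return total_mb / rebuild_mb_per_s / 3600


# A 4 TB disk at an assumed 100 MB/s takes about 11.1 hours;
# the same disk on a busy array at an assumed 30 MB/s takes about 37 hours.
print(round(estimated_rebuild_hours(4, 100), 1))
print(round(estimated_rebuild_hours(4, 30), 1))
```

During all of that time the array runs without its usual redundancy, which is why a failed disk should not be left in place for long.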
The main causes are the following:
RAID controller failure
The RAID controller is one of the most important elements, since it distributes data between the drives and makes the array work as a single drive. When an array stops working, controller failure is the most common cause. Hardware controllers break a bit less often than software controllers, but they are also more expensive. In addition, hardware controllers from different manufacturers are not compatible with each other. If, for example, you bought a controller from Supermicro, you will have to buy the same model to restore the array. Otherwise you will have to recreate the array, which leads to data loss. Common reasons for controller failure include a voltage drop or a sudden power outage, and this applies to both hardware and software RAID controllers. Therefore, be sure to use an uninterruptible power supply to protect your RAID array from such problems.
RAID assemble error
The RAID array is reassembled on every reboot, and whether it keeps working depends on whether that reassembly succeeds. If a power surge or another unexpected event occurs during reassembly, the array can fail and the user may lose data.
Drive failure
The main purpose of a RAID array is to protect data if one or two drives fail, and usually the array handles this without problems. Sometimes, however, the failure of one or more drives corrupts the data on an adjacent drive. In this situation the RAID array may become completely inoperable, which in turn leads to data loss. It is therefore strongly recommended to periodically check the health of the drives used in a RAID array.
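The reason a parity-based array survives the loss of one drive, but not of two in the same stripe, can be illustrated with XOR parity, the scheme commonly used for single-parity levels such as RAID 5. The snippet below is a simplified sketch with tiny made-up blocks and a hypothetical helper `xor_blocks`; a real array works on much larger stripes and rotates the parity position between disks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, each imagined to live on a different disk.
data = [b"ABCD", b"EFGH", b"IJKL"]

# The parity block is stored on a fourth disk.
parity = xor_blocks(data)

# Simulate losing the second disk: XOR of the survivors and the parity
# gives back the missing block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]

print(rebuilt)
```

If two blocks of the same stripe are lost, a single parity block does not contain enough information to restore both, which is why a second failure before the rebuild completes is so dangerous.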
Server failure
The host computer, like any other computer, can fail or malfunction, and this in turn affects the RAID array. In 70% of these cases, the data becomes unavailable.
The failures described above are the most common causes of RAID failure. After such a failure you usually need third-party data recovery software. The next section of this article explains how to recover data from a RAID array.
What to do when a RAID array fails or if a RAID array cannot assemble after rebooting?
If your RAID array has stopped working after a crash or does not reassemble after a reboot, you should first extract the data from it, so that it is not damaged while you troubleshoot the array. To do this:
Step 1: Power off your computer, server, or NAS device and disconnect the drives that made up the RAID array.
Step 2: Connect those drives to a working computer (power it off before connecting them).
Step 3: Power on the working computer. Then download RS RAID Retrieve and install it, following the prompts of the setup wizard.
We purposely chose this program because it has extensive data recovery capabilities and an intuitive interface at the same time, which means that it is excellent for both inexperienced users and professionals.
Step 4: Launch RS RAID Retrieve by double-clicking the icon on your desktop. The built-in RAID constructor will open.
Step 5: Choose how to add the RAID array for scanning. RS RAID Retrieve offers three options:
- Automatic mode – you simply specify the drives that made up the array, and the program determines their order, the array type, and other parameters automatically;
- Search by manufacturer – choose this option if you know the manufacturer of your RAID controller. This option is also automatic and does not require any knowledge of the RAID array structure. Knowing the manufacturer shortens the time needed to build the array, so it is faster than the previous option;
- Manual mode – use this option if you know what type of RAID you are using. In this case you can specify all the parameters you know, and the program will determine the rest automatically.
After you choose the appropriate option, click “Next”.
Step 6: Select the disks that made up the RAID array and click “Next”. The process of detecting the array configuration will begin. When it is complete, click “Finish”.
Step 7: In the program window, select your array, right-click it and choose “Save Disk”, then specify where to save the disk copy and click “Save” again.
This will start copying the data to the specified location. You can also save individual files or recover lost data if needed. To do this, double-click the array and choose a scan type. RS RAID Retrieve offers two scan types: quick scan and full analysis. Choose the first if you just want to copy files to another drive, and the second if you want to recover lost data.
Also select the file system type of your array at this step. RS RAID Retrieve supports all modern file systems.
When everything is set up, click “Next”.
The array scanning process will start and when it is finished, you will see the previous structure of files and folders.
Step 8: Select the file you want to restore and double-click it. Then select the location where you want to save the recovered file. It can be a hard drive, a ZIP archive, or an FTP server. Most importantly, make sure that this location is not on any of the array drives. Then click “Recover”.
Now that the data is safe, you can proceed to restore the array itself. The first thing to do is to find the cause of the problem and fix it.
A RAID array may fail to reassemble after a reboot for the following reasons:
- An error in the mdadm.conf file (it is in the wrong place, or the file does not exist);
- An assembly error;
- A virus or malware;
- Bad sectors on the RAID disks;
- Human error;
- Other causes.
The first two causes are quite common, so they deserve special attention.
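For Linux software RAID, the first cause on the list can be checked with a few lines of code. The sketch below looks for mdadm.conf in two locations that are commonly used (`/etc/mdadm/mdadm.conf` on Debian-based systems and `/etc/mdadm.conf` on many others; your distribution may differ) and lists the arrays the file declares. The function names are hypothetical, and the parsing is deliberately simplified: it only looks at lines that start with the `ARRAY` keyword.

```python
from pathlib import Path

# Commonly used locations; this list is an assumption, adjust it as needed.
CANDIDATE_PATHS = [Path("/etc/mdadm/mdadm.conf"), Path("/etc/mdadm.conf")]


def find_config(paths=CANDIDATE_PATHS):
    """Return the first existing config path, or None if none is found."""
    for path in paths:
        if path.is_file():
            return path
    return None


def array_entries(conf_text: str) -> list:
    """Return the device names declared on ARRAY lines of mdadm.conf text."""
    entries = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        words = line.split()
        if len(words) >= 2 and words[0].upper() == "ARRAY":
            entries.append(words[1])
    return entries


def report() -> str:
    """Build a short human-readable summary of the config state."""
    config = find_config()
    if config is None:
        return "mdadm.conf not found in the expected locations"
    arrays = array_entries(config.read_text())
    if not arrays:
        return f"{config} exists but declares no arrays"
    return f"{config} declares: {', '.join(arrays)}"
```

Calling `print(report())` on the affected machine shows whether the file is missing or declares no arrays, which narrows down where to look next.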
If the cause of the failure was at the physical level, replace the failed components.
If you do not want to spend time fixing software errors, you can simply recreate the RAID array and then copy the data back from the saved copy.