What Is RAID and Which Type of RAID Should You Use

Den Broosen

5 years ago

Each year, the performance of computer hardware increases at a high rate. Processors gain more cores and threads, and graphics cards gain higher chip frequencies. Hard drives, however, seem to have reached their limit long ago: in recent years HDD specifications have grown only in capacity, not in speed. SSDs can correct this situation, but as a rule they are much more expensive and have a limited write endurance. Even before the advent of SSDs, so-called RAID arrays were invented in 1987. Below we explain what these arrays are, what kinds of arrays exist, and why a typical user might need them.

What is RAID and what is it used for?

RAID (Redundant Array of Independent Disks) is an array of several hard disks that is used to improve storage reliability, to increase read/write speeds, or both. You can create a software RAID (using operating system features) or a hardware RAID (using a compatible motherboard, controller, or NAS).

To set up an array, you need a motherboard that supports RAID technology or a hardware controller, plus at least two hard drives of the same type (identical in all parameters) connected to it.

We strongly recommend using hard drives that are identical in all parameters. If you connect two hard disks with different capacities, RAID will use only as much space on each drive as the smallest drive provides, and the rest of the larger disk will remain unused. Besides, with mismatched hard drives one of them may fail prematurely, which can lead to the loss of important data.

RAID is also often used in NAS servers, which are essentially computers with a disk array that are connected to a network (usually a local one) and support the protocols used on that network. Several such computers can be combined into one system.

It is worth noting that creating or deleting a RAID erases all information on the drives, so back up important data first.
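As a rough illustration of the point about mismatched capacities, the Python sketch below estimates how much of each drive an array can use. The function name and the drive sizes are hypothetical, and real controllers may also reserve a little space for their own metadata.

```python
def member_usage(sizes_gb):
    """Return (space used on each member, total space left unused).

    Assumption: every member of the array contributes only as much
    space as the smallest drive provides.
    """
    smallest = min(sizes_gb)
    unused = sum(size - smallest for size in sizes_gb)
    return smallest, unused


# A 1000 GB drive paired with a 2000 GB drive
used_per_drive, unused = member_usage([1000, 2000])
print(f"used on each drive: {used_per_drive} GB, left unused: {unused} GB")
```

With identical drives the second value is zero, which is one reason matching drives are recommended.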

Types of RAID controllers: software and hardware.

Disk arrays can be based on one of two architectures: software or hardware. It is impossible to say which one is better. Each variant of array organization fulfills a particular need, taking into account the budget, the number of users, and the applications used. Both architectures are based on program code. They differ in whether the code is executed by the computer’s CPU (software implementation) or by a specialized processor on a RAID controller (hardware implementation).

The name “RAID controller” reflects the primary purpose of the device: array management. An array that is created in the operating system is called software RAID.

In this case the array is organized directly by the CPU: the “controller” is a program that implements striping and mirroring of data, and all calculations are in fact performed by the CPU.

When using software RAID, the best choices are RAID 0 and RAID 1, because they require no parity calculations and therefore load the processor less than other RAID types. JBOD is also a good choice for software RAID.

If your processor is powerful enough, you can also use RAID 5 or sometimes RAID 10.

However, it is good to remember that when using combined RAID types, it is better to use hardware RAID as this reduces the CPU load and speeds up the system.

The OS provides software support for managing disks in different RAID types. It can be the cheapest solution, since expensive disk controller boards and hot-swap chassis are not required.

Software RAID works with inexpensive IDE disks as well as with SCSI disks. Given the speed of modern processors, the performance of software RAID can in some cases be better than that of hardware RAID.

It’s also worth noting that software RAID can be assembled in almost any operating system.

The performance of a software array depends on the RAID type, the performance of the processor, and its current load.

The most important software RAID features:

The rebuild process is multithreaded.
Configuration is handled in the kernel.
The array can be moved to other Linux systems without rebuilding.
Array reconstruction is performed in the background using free system resources.
Hot-swappable drives are supported.
Automatic CPU detection makes it possible to benefit from CPU-specific optimizations.

The main advantage of the software implementation is low cost. However, it also has disadvantages: lower performance and additional work for the CPU. Software usually implements those RAID levels that do not require significant calculations. Given these features, RAID systems with a software implementation are used in entry-level servers. Since standard operating systems include support for multiple RAID levels (0, 1, 5, etc.), the cost of the software architecture has been reduced to zero.

The best, but not free, solution for organizing disks on a server is the hardware one. When the disk system is under significant load and the server has to process large amounts of data, only a separate hardware RAID controller can cope. It plugs into a PCI or PCI Express slot on the motherboard and manages the hard disk array on its own. The hardware RAID controller provides speed and reliable data mirroring and performs its calculations on its own dedicated processor, without loading the CPU.

At the same time, the hardware RAID architecture is more complicated, as it requires special hardware components. The array controller, often referred to as a RAID adapter, contains its own XOR calculator, auxiliary memory, and SCSI or UDMA channels. This architecture allows significant performance gains. However, for entry-level systems where the server processor is idle most of the time, the difference between hardware and software architectures is almost imperceptible. It becomes quite noticeable under a high load on the I/O subsystem. Hardware RAID implementations are also more expensive than software ones.

Entirely autonomous systems are, in principle, separate computers that are used to organize storage. Usually, an external controller is placed in a separate rack and can have a large number of I/O channels, including host channels, which makes it possible to connect several host computers to the system and to organize cluster systems. In systems with a standalone controller, it is possible to implement “hot” standby controllers. One of the disadvantages of such systems remains their high price.

Standard RAID levels

There are several RAID levels, designed to meet different needs and to suit different PC configurations. Let’s look at some of the most popular RAID configurations built from disks of identical size.

RAID 0 (“Striping”)

RAID 0 (“Striping”) uses two or more hard disks that process information together, which improves performance. The information in this type of RAID is broken down into data blocks that are written to the disks in turn.

One data block goes to one drive, the next block to the next drive, and so on. This significantly improves performance (the gain depends on the number of drives, i.e. 4 drives will run faster than 2 drives), but data security across the entire array suffers. If any hard disk drive in such a RAID fails, almost all information is irrecoverably lost, since part of every large file may be on the failed disk.

In general, when using such a RAID array, it is strongly recommended to back up valuable information to an external drive regularly.
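The round-robin placement described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the function names are hypothetical, each "disk" is a list of blocks in memory, and the 4-byte block size is chosen for readability (real stripe units are far larger).

```python
def stripe(data: bytes, n_disks: int, block_size: int = 4):
    """Split data into blocks and deal them out to the disks in turn."""
    disks = [[] for _ in range(n_disks)]
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for index, block in enumerate(blocks):
        disks[index % n_disks].append(block)
    return disks


def unstripe(disks):
    """Read the blocks back row by row to restore the original order."""
    restored = []
    rows = max(len(disk) for disk in disks)
    for row in range(rows):
        for disk in disks:
            if row < len(disk):
                restored.append(disk[row])
    return b"".join(restored)


disks = stripe(b"ABCDEFGHIJKLMNOPQRST", n_disks=2)
print(disks[0])  # blocks stored on the first disk
print(disks[1])  # blocks stored on the second disk
print(unstripe(disks))
```

Each disk holds only every second block, so if either list is lost, no complete copy of the data remains anywhere in the array.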

The main benefits of RAID 0:

highest performance for applications that require intensive I/O requests and large data volumes;
ease of implementation;
low cost per unit of volume.

Disadvantages of RAID 0:

is not a failsafe solution;
a single drive failure will cause all array data to be lost.

RAID 1 (Mirror)

Unlike RAID 0, when using RAID 1, you “lose” the capacity of the second hard disk drive because it is used to write a full copy of the first hard disk drive to it.

The advantage of RAID 1 is its high reliability. Everything will work as long as at least one hard drive is functioning: even if one drive fails, you will not lose a single byte of information, because the second drive is a full copy of the first one and takes over when it fails. This type of RAID is often used in servers where reliability is the priority.

With this approach, write performance does not improve compared with a single drive, because every block has to be written to both drives. However, sometimes reliability is much more important than performance.

RAID 1 benefits:

ease of implementation;
easy array recovery in case of failure;
high enough performance for high-intensity applications.

RAID 1 shortcomings:

high cost per unit of volume;
only half of the total disk capacity is usable, since one disk contains a full copy of the other;
no gain in write speed compared with a single drive.

RAID 2

These arrays use a recovery algorithm based on Hamming codes (named after Richard Hamming, the American mathematician who developed them in 1950 to correct errors in computers). To implement this RAID level, two groups of disks are created: one for data storage and one for error correction codes.

The main advantage of RAID 2 is the ability to correct errors “on the fly” without reducing the data transfer speed between the disk array and the CPU.

This type of RAID is not common in home systems because of the excessive number of redundant hard disks: for example, in an array of seven hard drives, only four will be allocated to data. As the number of disks grows, the relative redundancy decreases.
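The "four data drives out of seven" figure follows from the way Hamming codes are sized. The sketch below assumes the classic single-error-correcting Hamming code, where the number of check disks r is the smallest value satisfying 2^r >= m + r + 1 for m data disks; the function name is hypothetical.

```python
def hamming_check_disks(data_disks: int) -> int:
    """Smallest r such that 2**r >= data_disks + r + 1."""
    r = 0
    while 2 ** r < data_disks + r + 1:
        r += 1
    return r


for m in (4, 10, 32):
    r = hamming_check_disks(m)
    share = r / (m + r)
    print(f"{m} data disks + {r} check disks, redundancy {share:.0%}")
```

Four data disks need three check disks (seven in total), while 32 data disks need only six, which is why the relative overhead falls as the array grows.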

The main benefits of RAID 2:

quick error correction (“on the fly”);
extremely high transfer rate for large volumes of data;
when the number of disks increases, the overhead decreases;
easy implementation.

Disadvantages of RAID 2:

high cost with few disks;
low request processing speed (not recommended for systems focused on transaction processing).

RAID 3 and RAID 4

These two types of disk arrays are very similar in the construction scheme. Both use multiple hard disks to store information, one of which is used solely for checksums.

Three hard disks are enough to create RAID 3 or RAID 4. Unlike RAID 2, these levels cannot correct errors on the fly: the information is restored over some time after the failed hard drive has been replaced.

In RAID 3, the data stream is partitioned at the byte level and written simultaneously to all drives in the array except one. That remaining disk stores the checksums calculated when data is written. The failure of any single drive in the array will not result in a loss of information.

RAID 3 is suitable for applications that work with large files and access them infrequently (mainly in multimedia environments). Because only one drive is used to store control information, the share of usable disk space is quite high (resulting in a relatively low cost). At least three hard disk drives are required to implement an array.

The difference between RAID 3 and RAID 4 is the data partitioning level. In RAID 3, the information is broken down into separate bytes, which leads to a severe slowdown when writing/reading a large number of small files. In RAID 4, data is split into different blocks that are no larger than one sector on the disk. As a result, the processing speed of small files is increased, which is critical for personal computers. For this reason, RAID 4 has become more widespread.

A significant shortcoming of these arrays is the increased load on the hard disk dedicated to storing checksums, which significantly shortens its service life.

Data loss is possible in the following cases:

accidental deletion of files;
file corruption;
operating system problems;
RAID rebuild failure;
lost or damaged parity;
controller board malfunction;
change of partition structure or re-initialization.

RAID 3 and RAID 4 benefits:

extremely high data transfer rate;
drive failure has a minimal effect on the speed of the array;
small overhead costs to realize redundancy.

RAID 3 shortcomings:

difficult implementation;
low performance with high intensity of requests for small amounts of data.

RAID 5

This is a so-called fault-tolerant array of independent drives with distributed checksum (parity) storage. On an array of n disks, the capacity of n-1 disks is available for data, and the capacity of one disk is used for parity. Let’s imagine that we need to write a file. It is divided into equal-length portions (blocks) that are written cyclically across the disks, n-1 data blocks per stripe. For each stripe, a parity block is calculated as the bitwise XOR of its data blocks. Unlike RAID 3 and RAID 4, the parity blocks are not kept on one dedicated drive: their position shifts from stripe to stripe, so parity is spread evenly over all drives.

Note that if any one disk fails, the array switches to degraded (emergency) mode, which noticeably reduces performance, because the missing parts of each file have to be reconstructed from the remaining data and parity. If two or more disks fail at the same time, the information stored on the array cannot be recovered. In general, a RAID 5 array provides relatively high access speed, parallel access to various files, and good fault tolerance.
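To make the XOR checksum more concrete, here is a minimal Python sketch of one stripe. The block contents and function names are made up for illustration, and the rotation rule in `parity_disk` is just one possible layout, not necessarily the one a given controller uses.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk(stripe_index: int, n_disks: int) -> int:
    """Illustrative rotation: parity moves one disk to the left per stripe."""
    return (n_disks - 1 - stripe_index) % n_disks


# One stripe on a four-disk array: three data blocks plus one parity block
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second data block and rebuilding it from the rest
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])

# Position of the parity block for the first five stripes
print([parity_disk(s, 4) for s in range(5)])
```

Because XOR is its own inverse, any single missing block of a stripe, data or parity, can be recomputed from the surviving blocks. Two missing blocks leave the equation with two unknowns, which is why a second failure is fatal.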

RAID 5 arrays are designed for stressful drive operation and are well suited to multi-user systems. With proper write planning, up to N/2 blocks can be processed in parallel, where N is the number of disks in a group. The minimum number of drives is three.

The main benefits of RAID 5:

read operations are very fast, while write operations are somewhat slower (because parity has to be calculated);
in the case of a disk failure, you still have access to all data, even while the failed disk is being replaced and the storage controller rebuilds the data on the new drive.

Disadvantages of RAID 5:

disk failures affect bandwidth, although it remains at an acceptable level;
it’s a sophisticated technology. If one of the drives in an array that uses 4 TB drives fails and is replaced, the rebuild may take a day or more, depending on the load on the array and the speed of the controller. If another drive fails during this time, your data will be lost with no possibility of recovery.
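A back-of-the-envelope estimate shows why rebuilding a large drive takes so long. The sketch below simply divides capacity by a sustained rebuild rate; the function name and the rates are assumptions for illustration, not measurements of any particular controller.

```python
def rebuild_hours(capacity_tb: float, rate_mb_s: float) -> float:
    """Hours needed to write a whole drive at a constant rebuild rate."""
    capacity_mb = capacity_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return capacity_mb / rate_mb_s / 3600


for rate in (150, 50, 25):  # idle array, busy array, heavily loaded array
    print(f"4 TB at {rate} MB/s: about {rebuild_hours(4, rate):.1f} hours")
```

The less bandwidth the array can spare for the rebuild while it keeps serving users, the longer it stays exposed to a second drive failure.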

RAID 6

This is an extended version of RAID 5 that provides dual parity for the stored information. The capacity of two disks is used for parity. Designed for critical applications, the RAID 6 architecture has relatively low write performance because two sets of checksums have to be calculated and written. Data is broken down at the block level (as in RAID 5), but a second, independent parity scheme is added, so the array tolerates the failure of any two drives. However, a single logical write requires six disk accesses, which significantly increases the processing time per request. The minimum number of drives is four.

RAID 6 benefits:

as with RAID 5, data read operations are fast;
if two disks fail, you will still have access to all data, even while the failed drives are being replaced. Thus, RAID 6 is more fault-tolerant than RAID 5.

RAID 6 disadvantages:

write operations are slower than in RAID 5 because of the additional parity data that needs to be calculated. Write performance may be as much as 20% lower;
drive failures affect the array performance;
it’s a sophisticated technology. It can take a long time to recover an array where one drive has failed.

RAID 7

RAID 7 (Optimized Asynchrony for High I/O Rates as well as High Data Transfer Rates) unlike other levels, is not an open industry standard – it is a registered trademark of Storage Computer Corporation. It is based on the concepts used in Levels 3 and 4. The ability to cache data has been added. RAID 7 also includes a controller with a built-in microprocessor running a real-time OS. It allows all data transfer requests to be processed asynchronously and independently.

The checksum block is integrated with the buffering block; a separate disk is used to store parity information, which can be placed on any channel. RAID 7 has high-speed data transfer and request processing, good scalability. The most significant disadvantage of this level is the cost of its implementation.

RAID 7 benefits:

very high data transfer rate and high request processing speed (1.5 to 6 times higher than other standard RAID levels);
good scalability;
significantly higher speed of reading small amounts of data (thanks to the cache);
no additional data transfer is required for parity calculation.

RAID 7 shortcomings:

the property of one company;
the complexity of implementation;
very high cost per unit of volume;
cannot be serviced by the user;
the need to use an uninterruptible power supply to prevent data loss from the cache memory;
short warranty period.

JBOD

The user can also use JBOD (“just a bunch of disks”). The term is used in two ways. In spanning mode, the hard drives are joined one after another into a single logical space. In pass-through mode, the controller works as a standard IDE or SATA controller without combining the disks into an array, and each drive is detected as a separate device in the operating system.

Combined RAID types (10, 01, 50, 60)

In addition to the basic types discussed above, various combinations of these types are widely used to compensate for some shortcomings of simple RAID. In particular, RAID 10 and RAID 0+1 schemes are widely used. In the first case, a pair of mirror arrays are combined into RAID 0, in the second case, on the contrary, two RAID 0 arrays are combined into a mirror. In both cases, the increased performance of RAID 0 is added to RAID 1 information security.

To protect crucial information even further, RAID 51 or RAID 61 schemes are often used: mirroring arrays that are already highly protected provides exceptional data security in case of any failure. However, it is not reasonable to implement such arrays at home because of the excessive redundancy.

RAID 10

RAID 10 is a nested array of independent disks built as a stripe of mirrors. The drives are first paired into RAID 1 “mirrors”, and these mirror pairs are then combined into a single array using RAID 0 striping.

One drive in each RAID 1 pair can fail without data loss. The downside is that if both drives of the same pair fail, the data cannot be restored, and until a failed drive is replaced, its pair works on the one remaining drive without any redundancy. Some RAID 10 systems have a dedicated “hot spare” drive that automatically replaces the failed drive in the array.

In most cases, RAID 10 offers better performance and lower latency than all other RAID levels except RAID 0, which is faster still. It is one of the preferred levels for applications that require high system performance.

Unfortunately, the probability of data loss still remains. Among the main reasons are the following:

software failure of the RAID controller;
failure or incorrect replacement of the controller;
incorrect configuration or lack of monitoring;
hardware malfunction of a critical number of disks;
loss of synchronization between array members, followed by the failure of the up-to-date member;
file system corruption, accidental deletion of information, formatting of disks.

The main RAID 10 benefits:

the highest read and write speeds among commercial RAID types that provide redundancy;
higher reliability than RAID 5;
if something goes wrong with one of the drives in a RAID 10 configuration, recovery is very fast, because all you need to do is copy the data from the surviving mirror to the new drive. On fast drives it can take as little as 30 minutes for 1 TB.

Disadvantages of RAID 10:

disk space efficiency 50%.

RAID 01

RAID 01 (RAID 0+1) is one type of combined RAID array. It allows you to combine the speed of RAID 0 and the reliability of RAID 1 in a single array. Importantly, it can be built on a software controller.

RAID 01 is a RAID 1 array with two RAID 0 arrays inside. The data stream is first copied and then each copy is striped and written to two (or more) disks. Hence, the minimum number of disks to implement RAID 01 is four.

Inexperienced users often confuse RAID 01 and RAID 10. The reason for this is the similarity in both name and realization. However, each of these types has its advantages. For example, RAID 01 will be faster than RAID 10. It is all about the two RAID 0 arrays on which each copy of the data is written. If you remember the principle of RAID 0 you know that speed is achieved by striping – data is divided into “stripes” and written to the drives at the same time.

(Figure: schematic representation of RAID 01.)

Thus, RAID 01 can survive the failure of one entire group of disks, which can consist of two or more drives.

It is recommended to use the same number of disks in each group. Since two identical copies of the data stream are created, the size of the whole array is limited by the size of the group with the smaller number of disks. Accordingly, adding more disks to the other group makes no sense, as they will not be used.
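The difference between RAID 01 and RAID 10 shows up when two drives fail at once. The Python sketch below enumerates every two-drive failure on a four-disk array. It is a simplified model with a hypothetical disk numbering: disks 0 and 1 form one group, and disks 2 and 3 form the other.

```python
from itertools import combinations

DISKS = range(4)
GROUPS = [{0, 1}, {2, 3}]


def raid10_survives(failed: set) -> bool:
    # Groups are mirror pairs: every pair needs at least one working drive.
    return all(group - failed for group in GROUPS)


def raid01_survives(failed: set) -> bool:
    # Groups are stripe sets: at least one set must be completely intact.
    return any(not (group & failed) for group in GROUPS)


failures = [set(pair) for pair in combinations(DISKS, 2)]
raid10_ok = sum(raid10_survives(f) for f in failures)
raid01_ok = sum(raid01_survives(f) for f in failures)
print(f"RAID 10 survives {raid10_ok} of {len(failures)} two-drive failures")
print(f"RAID 01 survives {raid01_ok} of {len(failures)} two-drive failures")
```

In this model both layouts always survive a single failure, but RAID 01 only survives a second one if it hits the group that is already broken.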

The advantages of RAID 01:

faster performance;
data remains available as long as at least one group of disks is in working order.

RAID 50 (RAID 5+0)

RAID 50 (also known as RAID 5+0) is a nested RAID that stripes data (RAID 0) across several RAID 5 arrays and offers high write and read speeds. RAID 50 is quite popular.

A RAID 50 system requires at least six drives to work. As the number of disks in the array grows, its performance grows as well, and data recovery stays fast because only the RAID 5 group that contains the failed drive has to be rebuilt.

Some of the most important RAID 50 advantages are as follows:

high average data recovery speed (much faster than RAID 5);
especially high speed of data writing;
increased failsafe performance (compared to RAID 5).

The main RAID 50 disadvantages:

high cost;
limited scalability.

A RAID 50 array survives the failure of one drive in each of its RAID 5 groups. Data is lost only if two drives in the same group fail before the first one has been rebuilt.

RAID 60 (RAID 6+0)

RAID 60 (also called RAID 6+0) combines RAID 0 striping with RAID 6 arrays and offers the user improved performance and speed of array data processing. This combination is not widespread, but it has some advantages, in particular the ability to maintain performance (parity is calculated and written within smaller RAID 6 groups) while increasing the total amount of space.

At least eight drives are required for this combination.

The combination of RAID 6 and striping (RAID 0) provides the following benefits:

high data transfer rate;
a significant increase in read speed compared to drives that are not combined in RAID array;
high fail-safe.

RAID 60 shortcomings:

low disk space usage efficiency compared to RAID 5, 6;
IOPS write speed lower than RAID 0, 10.

RAID 60 has twice the fault tolerance of RAID 50: any two drives in each RAID 6 group can fail without losing data. Thus, in an array built from two groups, up to four drives (two per group) can fail without data loss.

Which type of RAID is best to use.

When choosing RAID, it all depends on whether you need it for performance, for fault tolerance, or for both. The choice of RAID type also depends on the machine it will be installed on (PC, server, NAS, etc.), since this determines whether hardware or software RAID is best to use. Software RAID usually supports fewer levels than hardware RAID. In the case of hardware RAID, you also have to choose the controller: different controllers support different RAID levels and dictate which drives can be used in the array (SAS, SATA, or SSD).

When server performance is the priority, you can choose RAID 0, because multiple drives read and write data in parallel, which speeds up I/O operations. At least two disks are required. Both software and hardware RAID support RAID 0.

The disadvantage is that there is no fault tolerance. If one drive fails, it affects the entire array, and the data is lost or corrupted.

If fault tolerance is required and speed is not important, you can choose RAID 1, because data is written to both drives simultaneously, creating a copy or mirror. If one drive fails, the other drive will continue to work. It is the easiest way to implement fault tolerance and is relatively inexpensive. The disadvantage is that RAID 1 does not improve write performance and halves the usable capacity.

RAID 1 can be implemented through both software and hardware.

RAID 5 is the most common RAID configuration for business servers and enterprise NAS devices because it provides better performance than mirroring and good fault tolerance. With RAID 5, data and parity (additional data used for recovery) are distributed across three or more drives. If a drive fails, its data is recreated from the remaining data and parity blocks, smoothly and automatically, so the system keeps working even if one of the disks is damaged. Another advantage of RAID 5 is that you can replace the damaged drive without shutting down the server or interrupting users’ access to it. This is a great solution for fault tolerance.

The downside of RAID 5 is that it reduces performance on servers that perform multiple write operations. For example, when many employees are working on a server with RAID 5, there may be a noticeable lag.

RAID 6 is also an excellent choice for businesses. For increased reliability, it is worth using RAID 6, which devotes the capacity of two disks to parity. Such an array will continue to work even if two hard disks fail. The main disadvantage of this solution is its cost. That is why RAID 6 is more suitable for business than for home use.

RAID 10 is perfect for intensively used database servers or any server that performs many write operations. RAID 10 can be implemented in hardware or in software, but the consensus is that many of the performance benefits are lost with software RAID 10.

RAID 50, like RAID 10, is one of the most recommended RAID levels for applications where high performance has to be combined with acceptable reliability. However, RAID 50 is better suited to arrays of many large disk drives: it is more reliable than RAID 5 and more cost-effective than RAID 10. This array type is recommended for data handling applications that require high storage reliability, high request rates, high transfer speeds, and large storage capacity.

The RAID 60 array is perfect for online customer services that require high fault tolerance, because, although similar to RAID 50, it can withstand twice as many drive failures. Besides, RAID 60 is used quite often in video surveillance systems: this array has shown good results for many years, and many integrators use it for its fault tolerance. Another positive is its excellent performance in sequential access, which is typical of video streaming.

The choice between RAID 50/60 and RAID 10 will most probably depend on available budgets, server capacity and your data protection needs. Moreover, cost comes to the forefront when we talk about SSD solutions (both enterprise and consumer class).
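When comparing levels on budget, it helps to see how much usable space each one leaves. The sketch below is a simplified calculator under stated assumptions: all drives are identical, RAID 1 is a plain mirror holding one drive's worth of data, RAID 10 uses two-way mirrors, and RAID 50/60 are split into `groups` equal sub-arrays. The function and table names are hypothetical.

```python
MIN_DISKS = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4, "50": 6, "60": 8}


def usable_tb(level: str, n_disks: int, disk_tb: float, groups: int = 2) -> float:
    """Usable capacity in TB for an array of identical drives."""
    if n_disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    data_disks = {
        "0": n_disks,                # no redundancy
        "1": 1,                      # every drive holds the same copy
        "5": n_disks - 1,            # one drive's worth of parity
        "6": n_disks - 2,            # two drives' worth of parity
        "10": n_disks // 2,          # two-way mirrors
        "50": n_disks - groups,      # one parity drive per RAID 5 group
        "60": n_disks - 2 * groups,  # two parity drives per RAID 6 group
    }[level]
    return data_disks * disk_tb


for level in ("0", "5", "6", "10", "50", "60"):
    print(f"RAID {level}: {usable_tb(level, 8, 4)} TB usable from 8 x 4 TB")
```

The calculator deliberately ignores performance and rebuild behavior; it only shows how much raw capacity each level spends on redundancy.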

What to do if data is lost

Although the main purpose of RAID arrays is to improve data security, they also have their weak points. The most vulnerable part of a RAID array is the RAID controller. It distributes the data between the disks and tells the operating system how to read the data from the disks.

Among other things, the disks themselves can fail. But perhaps the weakest link is the users themselves, who do not always know how to work with the array, and either accidentally start the initialization process or do things that lead to data loss.

Regardless of the cause of the loss of data, you need to know how to recover it correctly, since in the case of a RAID array you have to assemble it first, and only then proceed to the recovery itself.

The only program of its kind that knows how to do it all correctly and restore your data is RS RAID Retrieve.

The program is easy to use, and thanks to the built-in RAID constructor it can assemble the broken array and select the necessary parameters (rotation direction, disk order, etc.). All it takes is a few clicks of your mouse.

Important: It is highly recommended to use RS RAID Retrieve immediately after a RAID problem is detected. Otherwise, you risk losing data irretrievably.

Just connect the drives to the working computer and run RS RAID Retrieve. The program will do the rest.

We would also like to mention that RS RAID Retrieve is not demanding on computer resources, which allows you to recover data from a RAID array even on low-powered computers or office laptops.
