The importance of reliable data storage is obvious to users of any level, especially now that the amount of stored data is growing at an enormous rate, whether the data is personal (photo and video collections) or corporate (financial and project documentation, scientific research results, etc.). One tool that helps solve the data storage problem, at least to some extent, is the RAID disk array. In this article we will explain how to create a software RAID in Linux.
- RAID concept
- The types of RAID arrays
- LVM or mdadm: which one to use
- Creating software RAID 1 with mdadm in Linux
- How to create a software RAID in Linux using only the mouse
- How to check your RAID array status in Linux?
- How to add, remove or replace the drive in a Linux software RAID array
- How to delete a software RAID array in Linux?
- What to do if the array suddenly becomes inactive or does not work after a reboot?
- Advantages and disadvantages of software RAID
RAID (Redundant Array of Independent Disks; a freer but arguably more accurate reading is "an array of redundant independent disks") is a hardware or software subsystem in which stored data is distributed, often with redundancy, across multiple hard disks (physical or virtual). A hardware RAID subsystem is generally the most efficient option in terms of both reliability and performance. However, software RAID can be very useful, and Linux has all the components needed to build one.
The types of RAID arrays
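With RAID 0, two or more disks are used only to increase performance: data is striped (split into blocks) across them, so reads and writes are spread between the disks. There is no redundancy at all: if one disk fails, the data on the whole array is lost.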
RAID 1 is the first level that provides redundancy. This mode is often called “mirroring” because data is duplicated on all disks in the array. Reliability is increased, but write performance is reduced because the same data is written multiple times. RAID 1 requires at least two disks.
RAID 4 stores parity information on a separate, dedicated disk. Since every write must also update that disk, it becomes the bottleneck of the array. For this reason, RAID 5 is recommended in all cases except those where RAID 4 is truly necessary and justified.
RAID 5 distributes both data and parity information across all disks, which removes the dedicated parity bottleneck of RAID 4. For this reason RAID 5 was considered the most efficient and economical level until newer developments such as RAID 5EE, RAID 6, RAID 10 and the combined RAID 1+0, RAID 5+0 and RAID 1+5 levels appeared. RAID 5 requires at least three disks.
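To make the trade-offs between these levels concrete, here is a small shell sketch that computes how much space an array leaves for data. It assumes all disks are the same size (with mixed sizes, mdadm sizes every member to the smallest one), and the RAID 6 rule (two disks' worth of parity, at least four disks) is the standard one rather than something stated in this article; usable_capacity is a name made up for this example.

```shell
#!/bin/sh
# usable_capacity LEVEL DISKS SIZE
# Prints the space left for data when DISKS equal-sized disks of SIZE
# units each (GB, TB, ...) are combined at the given RAID level.
usable_capacity() {
    level=$1 n=$2 size=$3
    case $level in
        0)   min=2; data=$n ;;           # striping only: every disk holds data
        1)   min=2; data=1 ;;            # mirroring: each disk is a full copy
        4|5) min=3; data=$((n - 1)) ;;   # one disk's worth of parity
        6)   min=4; data=$((n - 2)) ;;   # two disks' worth of parity
        *)   echo "unsupported level: $level" >&2; return 2 ;;
    esac
    if [ "$n" -lt "$min" ]; then
        echo "RAID $level needs at least $min disks" >&2
        return 1
    fi
    echo $((data * size))
}

usable_capacity 5 3 4000   # three 4000 GB disks in RAID 5
```

For example, usable_capacity 5 3 4000 prints 8000: three 4 TB disks in RAID 5 leave 8 TB for data, while the same disks in RAID 1 would leave only 4 TB.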
Software RAID support has been available in the Linux kernel since version 2.0, although that first version was hardly suitable for practical use: its capabilities were very limited and it contained a fair number of bugs. The situation improved with the 2.4 kernels, and today's Linux RAID implementation is mature enough for practical use.
LVM or mdadm: which one to use
Many users who already have Linux experience ask themselves whether it is better to create a RAID with the mdadm utility or with LVM.
LVM is a data volume management system for Linux. It allows you to create logical volumes over physical partitions (or even unpartitioned hard disks) which will appear in the system itself as ordinary partitions. The main advantages of LVM are that, first, you can create one group of logical volumes over any number of physical partitions, and second, logical volumes can be easily resized at runtime. Furthermore, LVM also supports snapshots, copying of partitions on the fly, and RAID-1-like mirroring.
In simple terms, with LVM you can take three hard disks, merge them into one pool and then create two partitions on it, while LVM distributes the data between the disks automatically. The main advantages of LVM are:
- The ability to change capacity. When using logical volumes, the size of the file systems is not limited to a single disk, as you combine disks and partitions into a single logical volume.
- The ability to change the size of the storage pool. With simple commands, you can increase or decrease the logical volume size without having to reformat or repartition disks.
- Live data movement. To build new, faster and more resilient storage subsystems, you can move data while the system is running, even while the disks are being accessed. For example, you can empty a disk you are replacing on the fly before removing it.
- Easy device naming. Logical volumes can be grouped for easy management. Groups, in turn, can be given any name you like.
- Drive striping. You can create a logical volume that stripes data across two or more disks, which can significantly improve performance.
- Volume Mirroring. Using logical volumes, you can easily set up mirroring for your data.
- Volume snapshots. Using logical volumes, you can create snapshots of your devices in order to back up or test the result of changes you make without risking actual data.
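The three-disk example above maps onto a handful of LVM commands. The sketch below only prints them by default (set DRY_RUN=0 to run them as root); the disk names, the volume group vg_data, the volume names docs and media and the 500G size are hypothetical placeholders, and lvm_pool is a helper invented for this illustration.

```shell
#!/bin/sh
# Commands are only printed unless DRY_RUN=0 is set in the environment.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

# lvm_pool VG DISK...: pool the disks into one volume group, then carve
# two logical volumes ("partitions") out of it.
lvm_pool() {
    vg=$1
    shift
    run pvcreate "$@"                        # initialise each disk for LVM
    run vgcreate "$vg" "$@"                  # merge them into one volume group
    run lvcreate -n docs -L 500G "$vg"       # first volume: fixed size of 500G
    run lvcreate -n media -l 100%FREE "$vg"  # second volume: all remaining space
}

lvm_pool vg_data /dev/sdb /dev/sdc /dev/sde
```

Which disks hold the blocks of docs and media is decided by LVM itself, which is exactly the "automatic distribution" described above.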
Among the main disadvantages are the following:
- A plain (linear) logical volume does not stripe data, so its transfer rate can be lower than that of RAID 5 (let alone RAID 0);
- Only Linux: there is no official support in most other operating systems (FreeBSD, Windows).
- Reliability is lower than with RAID: without mirroring, the failure of one disk can make every logical volume that spans it unreadable.
- Data is difficult to recover if LVM fails.
- Fine-tuning requires advanced knowledge and more commands.
mdadm (previously known as mdctl) is a utility for creating and managing software RAID arrays on Linux. MD stands for Multiple Devices.
mdadm can perform seven groups of operations (modes):
- Create – creates a RAID array of multiple disks (with a superblock on each device).
- Assemble – assembles a previously created array and activates it. The disks to assemble may be specified explicitly or found automatically; mdadm verifies whether the components form a valid array.
- Build – combining disks into an array (without superblocks). For such arrays mdadm does not distinguish between creation and subsequent build. It is also impossible to check if the necessary devices were specified in the correct order. Do not use this mode if you do not know what it is for.
- Manage – Manage the array: add new free disks (spares) and remove faulty devices.
- Follow/Monitor – Monitor one or more md-devices and react to their status. This only makes sense for RAID 1, 4, 5, 6 or multipath arrays, as only they can have different states. raid0 or linear can have no missing, spare or failed disks, so there is nothing to monitor.
- Grow – grows or shrinks the array, or otherwise reshapes it. Among other things, it can change the active size of components in RAID levels 1/4/5/6 and the number of active devices in RAID 1.
- Misc – other operations on independent disks, for example viewing and modifying array superblocks and stopping active arrays.
The advantages of using Mdadm include the following:
- The RAID creation process is simple and does not require extensive Linux knowledge. In fact, it is possible to build a RAID array using just a mouse (more about this below).
- Higher reliability.
- Higher data transfer speed.
- Data can be recovered if something happens to the array.
- Easier hard drive replacement in case of failure.
- A wide choice of array levels: if you only need transfer speed, use RAID 0; if reliability matters most, use RAID 1; if you need both speed and reliability, use RAID 5 or RAID 5+1, etc.
mdadm also has some disadvantages:
- Less flexibility than LVM when resizing volumes on the fly.
- More complicated process of re-building the array.
Everyone decides for themselves which technology to use, but we recommend RAID arrays because they give you both high speed and high reliability. For those who prefer LVM, it is also possible to run LVM on top of RAID.
Creating software RAID 1 with mdadm in Linux
If you are a novice user, you can practice creating, configuring and managing a RAID array in a virtual machine. This lets you try out RAID without buying additional hard drives and decide whether it is right for you. You can read about how to create and configure a virtual machine in this article.
For a detailed walkthrough we chose RAID 1, because it is the simplest array from an architectural point of view. In addition, it offers the most redundancy (in terms of reliability).
So, suppose we have two disks connected to the SATA 1 and SATA 2 ports on the motherboard. In our example, Linux shows the disks as follows:
- the sda drive is the disk connected to the SATA 0 port;
- the sdc drive is the disk connected to the SATA 1 port;
- the sdd drive is the disk connected to the SATA 2 port.
Note that Linux assigns sdX names in the order in which it detects the disks, so they are not guaranteed to match the port numbers; always confirm which disk is which (for example with lsblk or fdisk -l) before partitioning. To display our two drives, enter the following command in the terminal:
# fdisk -l /dev/sdc /dev/sdd
From these two disks we will create a software RAID and mount it to /mnt as file storage. If the disks are larger than 2 TB, you will need to use the parted utility and create a GPT partition table, because the older MBR partition table cannot address more than 2 TB. To do this, enter the following command (and then repeat it for /dev/sdd):
# parted -a optimal /dev/sdc
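If the disks are smaller than 2 TB, you can partition them with the fdisk utility. Create one partition of type fd (Linux raid autodetect) on each disk: inside fdisk, press n to create a new partition (the defaults use the whole disk), t to change its type to fd, and w to write the changes. Run fdisk for each disk:
# fdisk /dev/sdc
# fdisk /dev/sdd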
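If you would rather not step through fdisk or parted interactively, both cases can be handled with a single parted call in script mode. This is a sketch rather than a tested procedure: it assumes the member disks are /dev/sdc and /dev/sdd as in our example, partition_for_raid is a hypothetical helper, and because mklabel destroys the existing partition table the commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# One partition spanning the whole disk, flagged for RAID. Use label "gpt"
# for disks over 2 TB and "msdos" for smaller ones (on an msdos label,
# parted's raid flag sets the partition type to fd).
# WARNING: mklabel wipes the disk's existing partition table.
# Commands are only printed unless DRY_RUN=0 is set in the environment.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

# partition_for_raid DISK LABEL
partition_for_raid() {
    disk=$1 label=$2
    case $label in
        gpt|msdos) ;;
        *) echo "label must be gpt or msdos" >&2; return 1 ;;
    esac
    run parted --script -a optimal "$disk" \
        mklabel "$label" mkpart primary 0% 100% set 1 raid on
}

for d in /dev/sdc /dev/sdd; do
    partition_for_raid "$d" msdos
done
```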
As a result, each of the two disks now has one partition. To check this, run:
# fdisk -l /dev/sdc /dev/sdd
Now we should create the RAID array itself. To do this we will use the following command:
# mdadm --create --verbose /dev/md2 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
- --level=1 specifies the RAID level you want to create. If you want to create RAID 5, specify --level=5;
- --raid-devices=2 sets the number of disks in the array. RAID 5 needs at least three disks, so you would specify --raid-devices=3;
- /dev/sdc1 /dev/sdd1 lists the partitions to use, separated by spaces. For RAID 5 you would list three of them, for example /dev/sdc1 /dev/sdd1 /dev/sde1.
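The relationship between --level, --raid-devices and the device list can be captured in a small wrapper. This sketch (create_array is a hypothetical name) counts the listed partitions to fill in --raid-devices and refuses too few devices; the RAID 0 and RAID 6 minimums are standard figures rather than something this article states. The command is only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Commands are only printed unless DRY_RUN=0 is set in the environment.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

# create_array MD LEVEL DEVICE...
create_array() {
    md=$1 level=$2
    shift 2
    case $level in
        0|1) min=2 ;;
        4|5) min=3 ;;
        6)   min=4 ;;
        *)   echo "unsupported level: $level" >&2; return 2 ;;
    esac
    if [ "$#" -lt "$min" ]; then
        echo "RAID $level needs at least $min devices, got $#" >&2
        return 1
    fi
    # --raid-devices is simply the number of devices listed
    run mdadm --create --verbose "$md" --level="$level" --raid-devices="$#" "$@"
}

create_array /dev/md2 1 /dev/sdc1 /dev/sdd1
```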
After the array is created, the initial resynchronization starts. You don't need to do anything; just wait for it to complete.
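While you wait, the progress can be read from /proc/mdstat. The sketch below (resync_progress is a hypothetical helper) extracts the operation and percentage from the progress line; the sample line in the comment shows the usual format, although the spacing can vary between kernel versions. The file argument exists only so the function can be tried on saved output.

```shell
#!/bin/sh
# resync_progress [FILE]
# Prints "<operation> <percent>" for each array that is resyncing,
# recovering, reshaping or checking, based on progress lines such as:
#   [==>..................]  resync = 12.6% (2646400/20954112) finish=1.4min
# FILE defaults to /proc/mdstat; pass a saved copy to test the parsing.
resync_progress() {
    sed -nE 's/.*(resync|recovery|reshape|check) *= *([0-9.]+%).*/\1 \2/p' \
        "${1:-/proc/mdstat}"
}
```

To simply block until the resynchronization is over, mdadm --wait /dev/md2 does the same job without any parsing.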
After completing the synchronization process, we need to create a file system on the array device. To do this enter the following command in the terminal:
# mkfs.ext4 /dev/md2
If everything was successful, the software RAID array is now ready.
At this point it is worth noting that although our software RAID has been created, it is not yet set up to survive a reboot: the array may be assembled under a different name (often /dev/md127) or not at all, and it will not be mounted. To make everything work automatically, follow these steps:
Step 1: Run mdadm --examine --scan in the terminal. You will see something like this:
# mdadm --examine --scan
ARRAY /dev/md/0 metadata=1.2 UUID=5c8952f8:8456e312:d0b5af49:a7e38514 name=cs37907:0
ARRAY /dev/md/1 metadata=1.2 UUID=2b6d40e1:1d5515f0:5dfe78ca:868250d0 name=cs37901:1
ARRAY /dev/md/2 metadata=1.2 UUID=96fea4eb:5040d522:f83a5802:ea3b6a74 name=cs37907:2
As you can see, mdadm found three arrays; md2 (the last line) is the array we created a few minutes ago.
Step 2: Add the md2 line to /etc/mdadm/mdadm.conf (on Red Hat-based distributions the file is /etc/mdadm.conf).
Step 3: Now we need to update the initramfs. On Debian and Ubuntu this is done with the command # update-initramfs -u
Step 4: Mount the RAID to /mnt with the command # mount /dev/md2 /mnt
Step 5: Add the RAID to /etc/fstab by adding the following line to that file: /dev/md2 /mnt ext4 defaults 0 0
All done. Now the RAID will be assembled and mounted to /mnt automatically after every reboot.
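Steps 1, 2, 3 and 5 can also be scripted so that repeating them never duplicates a line. This is a sketch under stated assumptions: Debian/Ubuntu paths, and the array we want is the ARRAY line whose name= field ends in ":2", as in the sample output above. array_line, append_once and persist_array are hypothetical helpers.

```shell
#!/bin/sh
# Print the ARRAY line(s) from "mdadm --examine --scan" output (stdin)
# whose name= field ends with ":<suffix>".
array_line() {
    grep "^ARRAY .*name=[^ ]*:$1\$"
}

# append_once FILE LINE: append LINE unless an identical line already exists.
append_once() {
    grep -qxF "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Steps 1, 2, 3 and 5 for md2. Must be run as root.
persist_array() {
    entry=$(mdadm --examine --scan | array_line 2) || return 1
    append_once /etc/mdadm/mdadm.conf "$entry"
    update-initramfs -u
    append_once /etc/fstab "/dev/md2 /mnt ext4 defaults 0 0"
}
```

Run persist_array as root once the array is built. Using the UUID that blkid /dev/md2 prints in the fstab entry instead of /dev/md2 would be more robust against device renaming; the sketch keeps the article's form.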
Once it is set up, you work with a software RAID just as you would with an ordinary disk. You do not need to do anything extra: the kernel's md driver takes care of everything by itself.
How to create a software RAID in Linux using only the mouse
For those who do not like working in the terminal, there is a very useful utility called Webmin. It lets you create a software RAID through a graphical interface, which is very convenient.
Webmin is a handy control panel accessible via a web browser, and Webmin modules are the external interface to the console utilities.
You need superuser (root) privileges to install it.
You can install this utility either through the terminal or through the Software Center (if one is available on your system). The second option is much easier: just download the Webmin package from the official website and open it in the Software Center.
The Software Center will then install Webmin together with the modules it depends on, and will show its status as “Installed”.
To install Webmin from the terminal, follow these steps:
Step 1: Launch the terminal, type the command sudo apt-get install perl libnet-ssleay-perl openssl libauthen-pam-perl libpam-runtime libio-pty-perl libdigest-md5-perl and press Enter. If prompted, enter your password and press Enter again.
This command will install the packages required for correct Webmin functioning.
Step 2: Add a Webmin repository so that we can install and update Webmin using our package manager. To do this, we should add the repository to the file /etc/apt/sources.list
If there is no such file, create one.
Step 3: Open the sources.list file in any text editor and add the following lines at the end:
# Repository for Webmin
deb http://download.webmin.com/download/repository sarge contrib
Save the file and exit the editor.
Step 4: Open the terminal, type
$ sudo apt-get update
and press Enter to update the package index of our system.
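Step 5: Download the Webmin PGP key with wget and add it to your system's list of trusted keys by running the command below. (apt-key is deprecated on recent Debian and Ubuntu releases; on those, follow the key setup described on the Webmin website instead.)
$ wget -q -O- http://www.webmin.com/jcameron-key.asc | sudo apt-key add -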
Step 6: Now update the package list to add the Webmin repository, which the system now trusts. To do this, use the command again:
$ sudo apt update
Step 7: Now all that’s left is to install the Webmin package. Type the command
$ sudo apt install webmin
in the terminal and press Enter.
After these steps, the Webmin package will be installed and available for use.
To create a software RAID you should perform the following steps:
Step 1: Open your browser, go to https://localhost:10000 and press Enter. (Note that you need to type https://, not http://.)
This will open the Webmin interface.
Step 2: On the left side of the browser window, we can see the Webmin menu. Select “Hardware” and click on “Linux RAID” in the list that opens.
Step 3: Click the drop-down list, which shows “Linear (Concatenated)” by default, and select the RAID type. As an example, we will create RAID 5. Confirm by clicking the “Create RAID device of level” button.
Step 4: In this step we choose the disks that will make up the array. Select your drives in the “Partitions in the RAID” field; you can leave all other parameters as they are. Make sure the “Force initialization of RAID” option is enabled, as it is responsible for automatically assembling the array after a system reboot. Then click the “Create” button.
The RAID creation process will begin. Once it finishes, our RAID 5 array is created and ready to use.
How to check your RAID array status in Linux?
If you have any doubts about whether your array is working properly, you can easily check the status of your Linux software RAID. Open a terminal, start a consistency check with the first command below and, once the check has finished, read the mismatch counter with the second:
# echo check > /sys/block/md0/md/sync_action
# cat /sys/block/md0/md/mismatch_cnt
If the result is “0”, your array is fine. Your terminal should show approximately the following:
[root@server ~]# echo check > /sys/block/md0/md/sync_action
[root@server ~]# cat /sys/block/md0/md/mismatch_cnt
0
If you want to stop checking your RAID array, run:
# echo idle > /sys/block/md0/md/sync_action
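The mismatch counter is only meaningful once the check has finished, that is, when sync_action reads idle again. The following sketch (scrub_status is a hypothetical helper) encodes that rule; its directory argument defaults to /sys/block/md0/md and exists only so the logic can be tried against copies of those files.

```shell
#!/bin/sh
# scrub_status [DIR]
# Reads the md sysfs files in DIR (default /sys/block/md0/md) and prints
# "ok", "mismatches: N" or "busy: <action>".
# Exit status: 0 ok, 1 mismatches, 2 files unreadable, 3 still running.
scrub_status() {
    dir=${1:-/sys/block/md0/md}
    action=$(cat "$dir/sync_action") || return 2
    if [ "$action" != idle ]; then
        # a check (or resync) is still running, so mismatch_cnt is not final
        echo "busy: $action"
        return 3
    fi
    count=$(cat "$dir/mismatch_cnt") || return 2
    if [ "$count" -eq 0 ]; then
        echo "ok"
    else
        echo "mismatches: $count"
        return 1
    fi
}
```

Start the check as root with echo check > /sys/block/md0/md/sync_action, then call scrub_status periodically until it stops reporting busy.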
To check the status of all available RAID arrays use the command:
# cat /proc/mdstat
If the status of every array ends with [UU], there are no problems with the arrays. The output should look approximately like this:
Personalities : [raid1]
md0 : active raid1 vdc vdb
20954112 blocks super 1.2 [2/2] [UU]
To display more detailed information about the array you can use the command: # mdadm -D /dev/md0
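Looking for [U_] by eye works for one array; for several arrays, or for a cron job, a short awk filter can list every degraded array. This is a sketch assuming the /proc/mdstat layout shown above (the array name on one line, the member status such as [UU] or [U_] on a following line); degraded_arrays is a hypothetical name.

```shell
#!/bin/sh
# degraded_arrays [FILE]
# Prints the name of every array whose status field contains "_",
# i.e. a missing or failed member. FILE defaults to /proc/mdstat.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }          # start of a new array block
        /\[[U_]*_[U_]*\]/ && name != "" {    # status such as [U_] or [UU_]
            print name
            name = ""                        # report each array only once
        }
    ' "${1:-/proc/mdstat}"
}
```

An empty result means every array reports all members up, so a cron job could run [ -z "$(degraded_arrays)" ] || echo "degraded: $(degraded_arrays)".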
How to add, remove or replace the drive in a Linux software RAID array
Suppose you have a working RAID 1 and you decide to expand your array by adding another working drive. To do this you should:
Step 1: Connect the drive with the power off, boot the Linux system, open a terminal with superuser privileges and run the command
# mdadm /dev/md0 --add /dev/vdd
This command adds the empty disk to the array, where it will appear as a hot spare.
Step 2: To make the new disk an active member, grow the array with the command
# mdadm -G /dev/md0 --raid-devices=3
The rebuild process will start and the disk will be added to the array.
If you want to remove the drive from the array, it is just as easy. You only need to run two commands:
Step 1: Run the following command in the terminal:
# mdadm /dev/md0 --fail /dev/sda1
This command marks the drive as failed.
Step 2: In the terminal, run the command # mdadm /dev/md0 --remove /dev/sda1
This command removes the drive from the array. The data in the array remains available from the remaining drives.
If a drive has failed, it can be replaced with another one without any problems. First, determine which disk has failed by running:
# cat /proc/mdstat
A failed disk shows up as [U_] (the device itself is marked with (F)); when both disks are working, the output is [UU].
To remove the faulty drive, use the command:
# mdadm /dev/md0 --remove /dev/vdc
After removing it, connect the new drive and add it to the array by running:
# mdadm /dev/md0 --add /dev/vdd
The array rebuilding and reconfiguring will start automatically after adding a new drive.
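The replacement procedure can be wrapped in one function. In this sketch (replace_member is a hypothetical helper, and /dev/vdc and /dev/vdd are the example devices from above) the old disk is first marked as failed, because mdadm refuses to remove a member it still considers active; the commands are printed rather than executed unless DRY_RUN=0.

```shell
#!/bin/sh
# Commands are only printed unless DRY_RUN=0 is set in the environment.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

# replace_member MD OLD NEW  (NEW must already be connected)
replace_member() {
    md=$1 old=$2 new=$3
    # Mark the old disk as failed; the result is deliberately not checked,
    # because the kernel may already have failed the disk on its own.
    run mdadm "$md" --fail "$old"
    run mdadm "$md" --remove "$old" || return 1
    run mdadm "$md" --add "$new"   # adding the new disk starts the rebuild
}

replace_member /dev/md0 /dev/vdc /dev/vdd
```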
How to delete a software RAID array in Linux?
If you want to delete a RAID array permanently, follow the steps below:
Step 1: Unmount the array by entering the command # umount /backup in the terminal and pressing “Enter” (replace /backup with the directory where your array is mounted);
Step 2: Perform the command # mdadm -S /dev/md0 to stop the RAID device;
Step 3: Clear the superblocks on all the drives the array was built from:
# mdadm --zero-superblock /dev/vdb
# mdadm --zero-superblock /dev/vdc
Your RAID array is now permanently deleted, and you can use the hard drives for any other purpose.
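The three deletion steps can be combined as follows. This sketch assumes the mount point /backup and the member disks /dev/vdb and /dev/vdc from the steps above; delete_array is a hypothetical helper, and it stops at the first step that fails so that superblocks are never wiped from an array that is still mounted or running. If you added the array to mdadm.conf and fstab earlier, remove those lines as well, otherwise the system will keep looking for the array at boot.

```shell
#!/bin/sh
# Commands are only printed unless DRY_RUN=0 is set in the environment.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

# delete_array MOUNTPOINT MD MEMBER...
delete_array() {
    mnt=$1 md=$2
    shift 2
    run umount "$mnt" || return 1            # step 1: unmount the file system
    run mdadm --stop "$md" || return 1       # step 2: same as mdadm -S
    for dev in "$@"; do                      # step 3: wipe each member's superblock
        run mdadm --zero-superblock "$dev"
    done
}

delete_array /backup /dev/md0 /dev/vdb /dev/vdc
```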
What to do if the array suddenly becomes inactive or does not work after a reboot?
Sometimes a user builds a RAID array, saves some data on it, turns off the computer and goes about other business. On returning and turning the computer back on, the user finds that the array is not working or has gone into an inactive state. An array can also become inactive after a hardware or power failure. All disks are marked as inactive, yet there are no errors on them: everything is set up correctly, but the array does not work and all I/O requests to it fail. In this situation, stop the array and assemble it again:
Step 1: Stop the array by entering the command # mdadm --stop /dev/md0 in the terminal.
Step 2: Reassemble the array with the command # mdadm --assemble --scan --force
After that, the array will be fully functioning again and all data on it will be available to the user.
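An inactive array is listed in /proc/mdstat with the word inactive after its name, so the stop-and-reassemble steps can be limited to arrays in that state. This is a sketch under that assumption; inactive_arrays and reassemble_inactive are hypothetical helpers, and because --force makes mdadm accept members whose metadata looks out of date, check the file system after using it. The commands are only printed unless DRY_RUN=0.

```shell
#!/bin/sh
# Commands are only printed unless DRY_RUN=0 is set in the environment.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

# inactive_arrays [FILE]: names of arrays listed as inactive, e.g.
#   md0 : inactive vdb[0](S) vdc[1](S)
inactive_arrays() {
    awk '$1 ~ /^md[0-9]+$/ && $2 == ":" && $3 == "inactive" { print $1 }' \
        "${1:-/proc/mdstat}"
}

# reassemble_inactive [FILE]: stop every inactive array, then let mdadm
# assemble all arrays it can find, forcing past out-of-date members.
reassemble_inactive() {
    for md in $(inactive_arrays "$1"); do
        run mdadm --stop "/dev/$md"
    done
    run mdadm --assemble --scan --force
}
```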
If the array is not assembled after a reboot, automatic assembly was most likely not configured, or was configured incorrectly. How to set it up is explained in the section “Creating software RAID 1 with mdadm in Linux” above. Repeat those steps and the array will be assembled automatically after each reboot.
Advantages and disadvantages of software RAID
Software RAID is great for home setups where you want more reliability, more performance or both, but everyone has to decide for themselves whether to use a software or a hardware RAID controller. We can only point out the advantages and disadvantages.
The advantages are:
- Great flexibility (you can keep some of a disk's partitions in a RAID and leave others outside it);
- It is free of charge;
- Software RAID is supported by almost all operating systems (Windows, Linux, FreeBSD) and is part of the operating system kernel, so it requires no additional costly investment;
- Speed comparable to hardware RAID. On modern processors, software RAID on simple arrays (0, 1, 10) is at least as fast as hardware controllers, and compared with low-cost controllers it may even be slightly faster;
- Easy recovery after a failure. After a crash, software RAID needs far less recovery time than a frantic search for a compatible hardware controller: the disks can simply be moved to another server running the same software, where the array can be assembled again.
Disadvantages are the following:
- Lower performance compared to fairly expensive RAID controllers;
- Setup is more difficult;
- A higher risk of data loss on a power failure than with a hardware RAID controller, because there is no Battery Backup Unit (BBU).