The Linux operating system is quite flexible. It can be used on desktops as well as on servers. The main thing is to choose the correct file system for your needs. In this article, we will look in detail at the ZFS file system.
- What is a ZFS file system?
- History of the ZFS file system evolution
- ZFS file system structure
- The ZFS file system features
- Disadvantages of the ZFS file system
- Data security in ZFS
- ZFS and Mac OS
- The best alternatives to the ZFS file system
What is a ZFS file system?
The main task for the developers of ZFS was to create a modern file system that could work with huge amounts of data without degrading performance, while still offering all the features expected of a modern file system.
Sun Microsystems wanted to create a new type of file system for their Solaris operating system, which would be ahead of its time. That is why it included a lot of innovations such as data layout structure, pooling support, and more.
Even the concept of a ZFS file system was innovative – ZFS needed to include a logical volume manager, provide convenient volume management and adhere to lightweight file system principles. In addition to this ZFS was supposed to offer redundancy.
In 2005, the development team led by Matthew Ahrens and Jeff Bonwick introduced a file system that met all these requirements.
ZFS is fast, technologically innovative, and efficient. In addition, it is a 128-bit file system rather than the usual 64-bit one, which shows how much emphasis the developers put on keeping it relevant in the future.
ZFS pulls together many solutions that other systems provide separately. For example, it combines the functions of a volume manager (like LVM), software RAID (like Linux RAID, albeit a little modified), some features known from XFS, snapshots, and more. If you want to know more about Linux RAID, read the article “RAID – what is it and which type is better to use”.
LVM technology is described in detail in the article “LVM – what it is, advantages and disadvantages”.
It is also important that the ZFS file system uses copy-on-write technology. This means that when a file is overwritten, the existing blocks are not rewritten in place. Instead, a new block is allocated and the new data is written there. Then the metadata is updated to point to the new block. The old block is released for reuse unless a snapshot still refers to it, in which case the previous version of the data stays available. Such a solution significantly increases the data recovery capabilities, since with snapshots you can view an earlier version of a file if the current one is corrupted.
Because the on-disk state moves from one consistent version to the next, copy-on-write allows ZFS to stay consistent after a crash without a traditional journal.
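The copy-on-write idea described above can be sketched in a few lines of Python. This is only a toy model under simplifying assumptions: the `CowStore` class and its methods are hypothetical names invented for this illustration and have nothing to do with the real ZFS code, where blocks, metadata, and snapshots are far more elaborate.

```python
class CowStore:
    """Toy copy-on-write store: file name -> block id, block id -> bytes."""

    def __init__(self):
        self.blocks = {}      # block id -> data; a block is never modified in place
        self.live = {}        # current metadata: file name -> block id
        self.snapshots = {}   # snapshot name -> frozen copy of the metadata
        self._next_id = 0

    def write(self, name, data):
        # Allocate a fresh block instead of overwriting the old one.
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        old = self.live.get(name)
        # Only now switch the metadata pointer to the new block.
        self.live[name] = block_id
        self._free_if_unreferenced(old)

    def snapshot(self, snap_name):
        # A snapshot is just a frozen copy of the pointers, not of the data.
        self.snapshots[snap_name] = dict(self.live)

    def read(self, name, snap_name=None):
        table = self.snapshots[snap_name] if snap_name else self.live
        return self.blocks[table[name]]

    def _free_if_unreferenced(self, block_id):
        if block_id is None:
            return
        referenced = block_id in self.live.values() or any(
            block_id in snap.values() for snap in self.snapshots.values()
        )
        if not referenced:
            del self.blocks[block_id]


store = CowStore()
store.write("report.txt", b"version 1")
store.snapshot("monday")
store.write("report.txt", b"version 2")
print(store.read("report.txt"))            # current contents
print(store.read("report.txt", "monday"))  # contents preserved by the snapshot
```

In this sketch the old block survives the second write only because the snapshot still points to it; without the snapshot it would have been released immediately.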
All these advantages and the great potential of ZFS are the reasons why so many people are still talking about it.
History of the ZFS file system evolution
After the release of ZFS, its source code was integrated into the Solaris operating system kernel. In the same year, porting to other operating systems began, and by 2008 ZFS had been ported to FreeBSD. After that, the porting of ZFS to Linux started. However, the process was complicated by the license: ZFS is distributed under the CDDL (Common Development and Distribution License). Although the CDDL is a free software license, it is generally considered incompatible with the GPL, under which the Linux kernel is distributed. The port was finished, but ZFS cannot be included in the Linux kernel itself, so it is not available out of the box. To work around this, modern Linux distributions offer their own methods of installing ZFS. That is, the user first installs the desired Linux distribution and then uses the suggested method to add ZFS support.
The usual methods of installing ZFS on Linux are to use the FUSE module or to install the separate ZFS on Linux kernel module.
In 2010, Oracle acquired Sun Microsystems along with all the licenses and closed the ZFS code. Thus, further open development inside the company was no longer possible. This prompted many key developers to move to other companies and create the OpenZFS project, which adheres to the free development concept.
However, the license was never changed, as many ZFS developers held the copyright to it. Accordingly, it was easier to leave things as they were and use the methods proposed by the Linux distributions, rather than seek permission from each of the developers.
The original ZFS numbers its on-disk pool formats, and 37 such versions are cited here, each with new improvements. OpenZFS no longer relies on a single version number and instead tracks capabilities through feature flags.
It is also worth noting that the ZFS file system is actively evolving. Its technical potential, coupled with continual performance and feature improvements, may make ZFS the No. 1 file system for servers in the years to come.
ZFS file system structure
The structure of the ZFS file system is organized as a Merkle tree, or hash tree: every block is referenced together with its checksum, all the way up to the root. The same kind of hash tree is used in cryptocurrency blockchains (for example, in Bitcoin or Ethereum). At the same time, ZFS is a file system and a volume manager in one package.
At the lowest level, several physical disks are combined into a virtual group called a VDEV (Virtual Device). There may be a large number of such groups. Redundancy is also provided at this level: mirroring or parity is handled inside each disk group. You can choose Mirror (an analog of RAID-1) or RAID-Z (which works on the principle of RAID-5 with several modifications). RAID-Z comes in three levels that differ in how much parity they keep: RAID-Z1 stores single parity and survives the failure of one disk, RAID-Z2 stores double parity and survives two failed disks, and RAID-Z3 stores triple parity and survives three. The number of data disks is not fixed; whatever capacity is not used for parity holds user data. The RAID-Z level is chosen depending on what the user needs more: reliability, or performance and capacity.
Then, all VDEVs (groups of disks) are combined into a common pool. The pool plays the role that LVM plays elsewhere: it combines several RAID-like groups into one storage space.
On top of this structure is the file system itself with user data.
The ZFS file system structure allows new disk groups to be added dynamically, and each group can have its own configuration. At the software level, every group is treated as a separate VDEV regardless of how it is configured.
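To make the disk, VDEV, and pool layering more concrete, here is a small, simplified Python model. It is a sketch only: the `Vdev` class and the `pool_usable_tb` function are hypothetical names made up for this example, all disks in a group are assumed to be the same size, and real pools lose some additional space to metadata and padding that is ignored here.

```python
from dataclasses import dataclass

# Number of parity disks' worth of space per RAID-Z level.
PARITY = {"raidz1": 1, "raidz2": 2, "raidz3": 3}


@dataclass
class Vdev:
    kind: str       # "mirror", "raidz1", "raidz2" or "raidz3"
    disks: int      # number of disks in the group
    disk_tb: float  # size of each disk in TB (assumed identical)

    def usable_tb(self):
        # A mirror stores the same data on every disk.
        if self.kind == "mirror":
            return self.disk_tb
        # RAID-Z gives up 1, 2 or 3 disks' worth of space to parity.
        return (self.disks - PARITY[self.kind]) * self.disk_tb

    def tolerated_failures(self):
        if self.kind == "mirror":
            return self.disks - 1
        return PARITY[self.kind]


def pool_usable_tb(vdevs):
    # The pool simply adds up the capacity of its VDEVs.
    return sum(v.usable_tb() for v in vdevs)


# A pool built from two differently configured groups.
pool = [Vdev("mirror", 2, 4.0), Vdev("raidz2", 6, 4.0)]
print(pool_usable_tb(pool))
for vdev in pool:
    print(vdev.kind, "survives", vdev.tolerated_failures(), "failed disk(s)")
```

Note that redundancy is a property of each group, not of the pool: in this model, as in ZFS, losing a whole VDEV means losing the pool.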
The ZFS file system features
In this part of the article, we will take a look at the most interesting features of the ZFS file system and explain why people love it so much.
The developers have implemented a huge number of useful features in ZFS, the most notable of which are the following:
1) The maximum writable file size has been greatly increased – it is now 16 exbibytes;
2) The maximum size of a volume is 2^128 bytes (often quoted as 256 quadrillion zebibytes), and the number of such volumes is practically unlimited;
3) ZFS is a 128-bit file system – in practice, this means that its address space is 2^64 times larger than that of a 64-bit system. Thus, it is practically impossible to fill such a 128-bit pool. One of the creators of ZFS, Jeff Bonwick, illustrated this vividly: fully populating a 128-bit pool would take more energy than boiling the oceans;
4) Snapshots are used to capture the state of the system – a snapshot records the state of the file system at the moment it is taken. If a file is later deleted from the live file system, it remains available in the snapshot. When new information is written, new blocks are allocated. The main feature is that a snapshot needs no additional space when it is created; it only starts to consume space as the live data diverges from it;
5) Data integrity check and automatic data correction – every time new data is written, the file system creates a checksum for it. When data is read, the checksum is compared. If there is a discrepancy, the file system marks the error and automatically tries to fix it;
6) At least two copies of metadata – they are usually stored in different places for more security;
7) High on-the-fly compression speed – much depends on the algorithm. For example, with the LZ4 algorithm, figures of around 800 MB/sec of write speed per core and at least 4.5 GB/sec of read speed are quoted, although real results depend on the data and the hardware;
8) Atomicity – the ZFS file system is atomic thanks to its transactional design and the Merkle tree: a group of changes takes effect only when the root of the tree is updated, so the on-disk state is always consistent. This solution allows ZFS to do without a write-ahead log (WAL) for keeping the file system consistent. The price of this design is the need to know a lot of commands and utilities;
9) Pooling support – disks can be joined into VDEV groups, which in turn can be joined into pools;
10) High scalability – ZFS can work with hundreds of pools (not disks, but pools) without performance loss;
11) The ability to create a lightweight file system – in ZFS, manipulating the file system is easier than in other file systems. All manipulations are more like working with directories than with a file system;
12) Thanks to the copy-on-write method, an interrupted write cannot destroy the existing version of a file on disk;
13) Automatic striping across new disks – when you connect additional disks, the available bandwidth has to grow with them, otherwise performance will decrease. ZFS takes this into account: when you add new VDEVs, the file system automatically includes them in the pool and spreads new writes across all of them;
14) The ability to schedule data work – this feature is useful on servers, for example. When the CPU is idle, those resources can be used for background work with data. In this way, you can use the hardware resources of your machine more efficiently.
As you can see, ZFS is a very powerful file system, and we have not even mentioned all the fine features that make it so useful. Most importantly, with ZFS you can create a huge, fast, and expandable local storage system.
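The size limits in the list above are easier to grasp with a little arithmetic. The following Python lines are a rough check, assuming the commonly cited limits of 2^64 bytes for a single file and 2^128 bytes for a pool; the variable names are just labels chosen for this example.

```python
EIB = 2**60   # one exbibyte in bytes
ZIB = 2**70   # one zebibyte in bytes

max_file_bytes = 2**64    # assumed per-file limit
max_pool_bytes = 2**128   # assumed limit of a 128-bit pool

# 2^64 bytes is exactly 16 exbibytes.
print(max_file_bytes // EIB)  # 16

# 2^128 bytes is 2^58 zebibytes, i.e. 256 * 2^50 ZiB,
# which is where the "256 quadrillion zebibytes" figure comes from.
print(max_pool_bytes // ZIB == 256 * 2**50)

# A 128-bit address space is 2^64 times larger than a 64-bit one.
print(max_pool_bytes // max_file_bytes == 2**64)
```

Here "quadrillion" is used in the binary sense (2^50), which is why the decimal value of 2^58 is somewhat larger than 256 × 10^15.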
Disadvantages of the ZFS file system
In the previous section of this article, we reviewed the main advantages of the ZFS file system. Now it is time to talk about its disadvantages, of which there are also quite a few:
- Not too fast on hard disks – because of its structure, ZFS benefits from fast random access, which hard disks cannot boast of. Accordingly, performance may not scale, and may even decrease, as the number of hard disks increases. There have been cases where a home computer with a not-so-fast hard disk drive has seen its performance drop so low that it became unbearable to use. Therefore, you get the full potential only on SSD drives;
- The need to know a large number of commands and utilities – to get the maximum effect, you must be able to “communicate” with this file system in its own language;
- Limited ability to change the disk structure in a VDEV – the layout of a VDEV disk group (number of disks, redundancy level, etc.) is essentially fixed when the group is created. Changing a RAID-Z group afterwards has traditionally meant rebuilding it; RAID-Z expansion, developed in OpenZFS to lift part of this restriction, is only available in recent releases;
- The need for a large amount of RAM – although the minimum requirements specify 4 GB of RAM, in practice it is best to use 8 GB or more;
- High level of data fragmentation – a consequence of the copy-on-write design. Currently, there is no proper defragmenter;
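- Limited ability to reduce the number of VDEVs – for a long time a VDEV could not be removed from a pool at all; newer OpenZFS versions support device removal, but only in some configurations;
- Quotas need extra planning – space is limited per file system, so the common approach is to create a separate file system of the required size for each user; ZFS also provides the userquota property for limiting individual users within one file system;
- You cannot nest VDEVs – redundancy can only be organized on the lower level, within each VDEV disk group, while the pool simply stripes data across its VDEVs. A pool of several mirror VDEVs therefore behaves like RAID 10, but you cannot create a RAID 01 counterpart (a mirror of striped groups) or other nested layouts;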
- Severe degradation of performance if large amounts of data are deleted;
- Increased load on the processor when using RAID-Z – caused by the need to calculate the parity data. The same is true of software RAID 5. However, RAID 5 is usually used in smaller storage systems, while ZFS storage is often very large, so the hardware needs to be powerful.
Despite all the disadvantages, few file systems can handle large storage as effectively. Given the active development of ZFS, the reason for its popularity becomes clear.
Data security in ZFS
The ZFS file system pays a lot of attention not only to working with large amounts of data but also to keeping that data safe. It would be rather unpleasant if the data in a pool of a hundred disks suddenly disappeared because of a single drive failure.
One method of protecting the data is RAID-Z. As mentioned above, this technology is based on the principle of RAID 5. If a drive fails, RAID-Z allows you to simply pull out the broken drive and insert a new one. The main requirement is that the replacement is at least as large as the failed drive. The file system will do the rest: it rebuilds the missing data onto the new drive, a process ZFS calls resilvering. The user will only notice a slight performance degradation while the rebuild is running.
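The rebuild described above relies on parity. The following Python sketch shows the basic idea for single parity only, using plain XOR as in classic RAID 5. It is a simplified illustration, not the actual RAID-Z implementation: `xor_blocks` is a hypothetical helper written for this example, and real RAID-Z uses variable-width stripes and more involved math for double and triple parity.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three "data disks", each holding one block of the same size.
data_disks = [b"ZFS!", b"pool", b"vdev"]

# The parity block is the XOR of all data blocks.
parity = xor_blocks(data_disks)

# Suppose the second disk fails. XOR-ing the surviving data blocks
# with the parity block yields the missing block again.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```

Because the rebuilt block depends on every surviving disk, a rebuild has to read the whole group, which is why performance drops while it is running.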
The ZFS file system monitors its own state by verifying block checksums, both when data is read and during full integrity checks of the pool. If there is a problem and a redundant copy exists, ZFS will fix the error, and if that is not possible, you will see a message about the damage. Recovery is further helped by the “copy on write” method on which all of ZFS is based: when data is overwritten, the new data is written to a new block without changing the old data.
It also prevents data loss due to power failure while the file is being modified.
Checksums are one more technology that protects the data from silent corruption. ZFS can use SHA-256 for this purpose, although a faster Fletcher-based checksum is the default. The file system automatically generates a checksum when a block is written and recalculates it when the block is read. If the two values differ, ZFS immediately recognizes the error.
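The checksum-and-repair cycle can be illustrated with a short Python sketch. It is deliberately simplified and rests on several assumptions: `MirroredBlock` is a hypothetical class invented for this example, it models a single block stored on a two-way mirror, and it uses SHA-256 from Python's standard `hashlib` module for the checksum.

```python
import hashlib


def checksum(data):
    return hashlib.sha256(data).hexdigest()


class MirroredBlock:
    """One block stored on several mirror disks, with its checksum kept apart."""

    def __init__(self, data, copies=2):
        self.copies = [bytearray(data) for _ in range(copies)]
        # The checksum is stored separately from the data it protects.
        self.expected = checksum(data)

    def read(self):
        for copy in self.copies:
            if checksum(bytes(copy)) == self.expected:
                good = bytes(copy)
                # Self-healing: overwrite every copy that fails verification.
                for i, other in enumerate(self.copies):
                    if checksum(bytes(other)) != self.expected:
                        self.copies[i] = bytearray(good)
                return good
        # No copy matches the checksum: report the error instead of
        # returning corrupted data.
        raise IOError("all copies failed checksum verification")


block = MirroredBlock(b"important data")
block.copies[0][0] ^= 0xFF        # simulate silent corruption on one disk
print(block.read())               # served from the healthy copy
print(bytes(block.copies[0]))     # the damaged copy has been repaired
```

The sketch also shows the limit of this protection: if every copy is damaged, the error can be detected but not repaired.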
The use of the above technologies noticeably reduces the chance of data loss or corruption. It is one more reason why many large projects have turned their attention to the ZFS file system.
ZFS and Mac OS
Apple announced that it was porting the ZFS file system to Mac OS. Unfortunately, the porting process was not completed, and by the time the new version of Mac OS was presented in 2009, all references to ZFS had been removed. The reasons for the rejection of ZFS were not disclosed.
Windows is a different story. Since it is by far the most popular desktop system in the world, sooner or later it becomes necessary to open a ZFS drive in Windows. There can be many reasons for this, ranging from basic file transfer to more specialized tasks.
In any case, the Windows operating system does not support ZFS out of the box, since the native Windows file systems are NTFS and FAT (16, 32).
But what if you need to open a ZFS drive in Windows?
There are several ways to do that.
The first way is to use the RS Partition Recovery. The program is easy to use and allows you to work with the data on the ZFS drive immediately. You do not even need to reboot your computer.
But most importantly, you can recover data that was lost because files were deleted, the file structure of the disk was damaged, or the disk was formatted. The recovery feature makes RS Partition Recovery stand out from the competition, because Windows often does not work correctly with the ZFS file system, which can lead to the loss of important data.
It is also worth mentioning that RS Partition Recovery supports many other modern file systems, including Btrfs, Ext2/3/4, XFS, HFS, and UFS.
All of the above features of RS Partition Recovery make it a “must-have” program for every user.
The second way is to install a special driver called ZFSin.
This driver adds support for ZFS on a native level. But things are not as rosy as they look at first glance.
The fact is that quite often the ZFSin driver conflicts with operating system drivers. But the saddest thing is that this almost always leads to a Windows Blue Screen of Death or operating system crash.
In addition, if your ZFS flash drive is plugged into the system during a driver conflict, there is a high probability that the data or the logical structure of the drive will be corrupted.
Thus, instead of supporting ZFS on Windows, you are very likely to end up with a non-functional operating system. That is the reason why the first method is preferable to the second.
The best alternatives to the ZFS file system
If after analyzing all the advantages and disadvantages, you are not sure that you want to use ZFS, or you don’t want to learn a lot of commands, you can consider the best alternatives to this file system.
If we are talking about alternatives for home use, the best option is Ext4. Yes, it lacks many of the modern features of ZFS, such as snapshots, checksums, and built-in volume management.
But, at the same time, it is robust and easy to manage and can provide enough performance for most users.
When it comes to servers, the best alternative is the Btrfs file system. Like ZFS, it is still under heavy development, but it is easier to maintain and fast enough for most workloads. The disadvantages of Btrfs are its less mature ecosystem and the relatively small amount of data it handles (compared to ZFS).
The choice of a file system depends entirely on the needs and skills of the user. We have only shown you the strengths and weaknesses of ZFS and hope that we have helped you make the right choice.