A RAID (Redundant Array of Independent Disks), or RAID system, is a set of redundant, independent disks. The disks are redundant because, at most levels, they store duplicated or parity information to provide fault tolerance and improve availability. They are independent because there is no dependence between them. This means we can replace any disk in the set with a new one, and it will work perfectly with the disks already in the array.
This type of disk grouping aims to go beyond what a single disk can achieve. Depending on how the disks are combined, we can improve data safety, storage capacity or data availability. The RAID controller makes this combination of disks appear to the server as a single drive, under one drive letter.
The different ways of combining disks are called RAID levels. To work with RAID systems, we need to know the following mechanisms:
- Striping: Files are split into fragments, and the fragments are stored on different disks. This improves performance when reading a file. The disks holding the fragments are accessed at the same time, so several fragments are retrieved simultaneously, which significantly improves access speed.
- Mirroring: Each fragment of data is saved redundantly on several disks. In return, we improve data safety. If data is lost or corrupted, we can easily restore it from the copies on the other disks.
- Computing parity: This technique generates some extra data that allows the data fragments to be recovered in case of failure. It takes up less space than mirroring.
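The mechanisms list does not say which function produces the parity data. The sketch below assumes XOR, which is the common choice. It shows why parity costs less space than mirroring: three fragments need only one parity fragment, yet any single lost fragment can be rebuilt. `xor_bytes`, `parity` and `recover` are hypothetical helper names.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two fragments of equal length."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(fragments: list[bytes]) -> bytes:
    """Parity fragment: the XOR of all data fragments (XOR assumed as the parity function)."""
    return reduce(xor_bytes, fragments)


def recover(surviving: list[bytes], parity_fragment: bytes) -> bytes:
    """Rebuild the one missing fragment by XOR-ing the parity with the survivors."""
    return reduce(xor_bytes, surviving, parity_fragment)


fragments = [b"AAAA", b"BBBB", b"CCCC"]
p = parity(fragments)

# Extra space needed: mirroring stores a full copy, parity stores one fragment.
mirror_overhead = sum(len(f) for f in fragments)  # 12 bytes
parity_overhead = len(p)                          # 4 bytes

# Simulate losing the second fragment and rebuilding it from the rest.
rebuilt = recover([fragments[0], fragments[2]], p)
print(rebuilt)  # b'BBBB'
```

Plain XOR parity can only rebuild one missing fragment per group. Surviving two simultaneous losses needs a second, independent parity value.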
There are seven standard RAID levels (0 to 6). The most common are levels 0, 1 and 5.
RAID 0
Also known as a striped volume. At this level, files are simply split into fragments that are stored on different disks.
It achieves greater access speed for both writing and reading, because the disks holding the fragments are accessed in parallel. The drawback is that there is no redundancy. If any one disk fails, there is no way to recover the data.
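The sketch below illustrates how a RAID 0 controller might map logical blocks onto disks. It assumes a fixed block size and a simple round-robin layout. Real controllers use a configurable stripe size, which the text does not specify. `locate`, `stripe` and `read_back` are hypothetical names.

```python
def locate(block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, position on that disk).

    Round-robin layout: block 0 -> disk 0, block 1 -> disk 1, and so on.
    """
    return block % num_disks, block // num_disks


def stripe(data: bytes, block_size: int, num_disks: int) -> list[list[bytes]]:
    """Split data into fixed-size blocks and spread them across the disks."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for i in range(0, len(data), block_size):
        disk, _ = locate(i // block_size, num_disks)
        disks[disk].append(data[i:i + block_size])
    return disks


def read_back(disks: list[list[bytes]], num_blocks: int) -> bytes:
    """Reassemble the data by visiting the blocks in logical order."""
    parts = []
    for block in range(num_blocks):
        disk, pos = locate(block, len(disks))
        parts.append(disks[disk][pos])
    return b"".join(parts)


disks = stripe(b"ABCDEFGHIJKL", block_size=2, num_disks=3)
print(disks)                # [[b'AB', b'GH'], [b'CD', b'IJ'], [b'EF', b'KL']]
print(read_back(disks, 6))  # b'ABCDEFGHIJKL'
```

Blocks 0, 1 and 2 live on three different disks, so a controller can fetch them in parallel. If disk 1 is lost, blocks `CD` and `IJ` are gone, and nothing remains to rebuild them from.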
RAID 1
Also known as a mirrored volume. Here, we save the fragments of our files on two disks, and one disk is an identical copy of the other.
This system is very safe, since we keep a real-time copy of the data. If one disk fails, we can easily replace it and copy all the data back onto the new disk, because all the information is available on the other disk.
The drawback is that it brings no speed gain. The access speed is the same as that of a single disk, since all data is written twice, once to each disk. Also, only half of the total disk capacity is usable.
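Here is a minimal in-memory sketch of RAID 1 behaviour, assuming two disks modelled as Python lists of blocks. Every write is duplicated, a read is served by any working disk, and a replaced disk is rebuilt by copying its partner. The `Mirror` class and its methods are hypothetical.

```python
class Mirror:
    """Minimal RAID 1 sketch: two disks, each modelled as a list of blocks."""

    def __init__(self, num_blocks: int) -> None:
        self.disks = [[b""] * num_blocks for _ in range(2)]
        self.failed = [False, False]

    def write(self, block: int, data: bytes) -> None:
        # Every write is duplicated on each working disk.
        for d in range(2):
            if not self.failed[d]:
                self.disks[d][block] = data

    def read(self, block: int) -> bytes:
        # Either working disk holds a full copy of the data.
        for d in range(2):
            if not self.failed[d]:
                return self.disks[d][block]
        raise OSError("both disks have failed")

    def replace(self, d: int) -> None:
        """Swap in a new disk and copy everything back from its partner."""
        self.disks[d] = list(self.disks[1 - d])
        self.failed[d] = False


m = Mirror(num_blocks=4)
m.write(0, b"report.txt")
m.failed[0] = True   # simulate a failure of disk 0
print(m.read(0))     # b'report.txt', served by disk 1
m.replace(0)         # rebuild disk 0 from disk 1
```

Each call to `write` touches both disks. That is why write speed does not improve over a single disk.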
RAID 2
At this level, we do bit-level striping and also generate parity bits using a Hamming code.
Computing parity is used at this level to reach a compromise between safety and speed. Mirroring simply duplicates all the data, whereas computing parity produces much less extra data, so this level is faster than level 1. And since it has redundancy, level 2 is safer than level 0. The drawback is the computing effort required.
Each bit is written in turn to a different disk, and the parity bits are stored on dedicated disks. The parity information allows us to recover the data on a disk if it fails. This level is theoretical and is not used in practice.
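The text does not specify which Hamming code RAID 2 uses. This sketch assumes the textbook Hamming(7,4) code: each group of 4 data bits becomes 7 bits, one per disk, with 3 of the 7 disks holding parity. The function names `encode` and `decode` are hypothetical.

```python
def encode(data: list[int]) -> list[int]:
    """Hamming(7,4): 4 data bits -> 7 bits, one per disk (positions 1..7).

    Positions 1, 2 and 4 hold parity bits; 3, 5, 6 and 7 hold data bits.
    """
    d3, d5, d6, d7 = data
    p1 = d3 ^ d5 ^ d7  # checks positions 1, 3, 5, 7
    p2 = d3 ^ d6 ^ d7  # checks positions 2, 3, 6, 7
    p4 = d5 ^ d6 ^ d7  # checks positions 4, 5, 6, 7
    return [p1, p2, d3, p4, d5, d6, d7]  # list index 0 is position 1


def decode(word: list[int]) -> list[int]:
    """Correct up to one wrong bit and return the 4 data bits."""
    bits = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos - 1]:
            syndrome ^= pos  # XOR of the positions of all 1 bits
    if syndrome:
        bits[syndrome - 1] ^= 1  # a non-zero syndrome is the position of the bad bit
    return [bits[2], bits[4], bits[5], bits[6]]


word = encode([1, 0, 1, 1])
print(word)  # [0, 1, 1, 0, 0, 1, 1]

# Disk 6 fails and its bit is read back as 0: the code still recovers it.
damaged = list(word)
damaged[5] = 0
print(decode(damaged))  # [1, 0, 1, 1]
```

This code uses three parity disks for four data disks, which is much more overhead than the single parity disk of the later levels. That overhead is part of why RAID 2 is not used in practice.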
RAID 3
Similar to RAID 2, but striping is done at the byte level, and the parity is stored on a single dedicated disk.
Since parity is calculated at the byte level, the disks have to be perfectly synchronized, and they are all accessed simultaneously.
This level is rarely implemented.
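The sketch below shows RAID 3 byte-level striping with a dedicated parity disk. It assumes XOR parity and zero-padding of the last row, neither of which the text specifies. `raid3_write` and `rebuild` are hypothetical names.

```python
def raid3_write(data: bytes, n_data: int) -> list[bytearray]:
    """Byte-level striping over n_data disks plus one dedicated parity disk.

    Byte i goes to disk i % n_data; the last disk stores the XOR of each row.
    """
    if len(data) % n_data:
        data += bytes(n_data - len(data) % n_data)  # zero-pad the last row
    disks = [bytearray() for _ in range(n_data + 1)]
    for row in range(0, len(data), n_data):
        parity = 0
        for j, byte in enumerate(data[row:row + n_data]):
            disks[j].append(byte)
            parity ^= byte
        disks[n_data].append(parity)
    return disks


def rebuild(survivors: list[bytearray]) -> bytearray:
    """XOR the surviving disks byte by byte to regenerate the missing one."""
    out = bytearray(len(survivors[0]))
    for disk in survivors:
        for i, byte in enumerate(disk):
            out[i] ^= byte
    return out


disks = raid3_write(b"RAID-3", n_data=3)
print([bytes(d) for d in disks[:3]])  # [b'RD', b'A-', b'I3']

# Disk 1 fails: regenerate it from data disks 0 and 2 plus the parity disk.
restored = rebuild([disks[0], disks[2], disks[3]])
```

Any three consecutive bytes land on all three data disks. So even a small read keeps every disk busy at once, which is why the disks must work in lockstep.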
RAID 4
At this level, striping is done at the block level. Parity is computed for each group of fragments and stored on a dedicated disk. It is similar to level 3 but works with blocks instead of bytes. Each fragment of a group is written to a different disk and the parity to another. If one disk fails, any lost fragment can be regenerated from the parity and the remaining fragments.
Because it works at the block level, the disks can operate independently, allowing simultaneous access to fragments on different disks. For example, in the previous image, we could access fragments A1, A2 and A3 simultaneously, as long as the controller allowed it.
The limiting factor of this level is the dedicated parity disk. Every time we write data, the parity disk must also be updated, so it becomes the bottleneck.
This level is also known as IDA (Independent Disk Access) with a dedicated parity disk.
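This sketch shows why the dedicated parity disk becomes the bottleneck. It assumes XOR parity and a read-modify-write update, a common technique where new parity = old parity XOR old data XOR new data. Each block is modelled as a single integer for brevity. `Raid4` and `write_block` are hypothetical.

```python
class Raid4:
    """RAID 4 sketch: n_data data disks plus a dedicated parity disk (the last).

    Each block is modelled as a single integer to keep the example short.
    """

    def __init__(self, n_data: int, stripes: int) -> None:
        self.n = n_data
        self.disks = [[0] * stripes for _ in range(n_data + 1)]
        self.writes = [0] * (n_data + 1)  # block writes received by each disk

    def write_block(self, logical: int, value: int) -> None:
        disk, stripe = logical % self.n, logical // self.n
        old = self.disks[disk][stripe]
        # Read-modify-write: new parity = old parity XOR old data XOR new data,
        # so the other data disks in the stripe need not be read.
        self.disks[self.n][stripe] ^= old ^ value
        self.disks[disk][stripe] = value
        self.writes[disk] += 1
        self.writes[self.n] += 1  # every single write also hits the parity disk


array = Raid4(n_data=3, stripes=4)
for block in range(12):
    array.write_block(block, block + 1)
print(array.writes)  # [4, 4, 4, 12]
```

Each data disk received a third of the writes, but the parity disk received all of them.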
RAID 5
Level 5 is similar to level 4. The difference is that the parity blocks are not saved on a dedicated disk but are distributed among the different disks. For this reason, RAID 5 is also known as a distributed parity level.
Distributing the parity blocks across different disks removes the bottleneck of level 4's dedicated parity disk.
Like level 4, it allows simultaneous access to data fragments located on different disks, which improves access speed. It also allows lost data to be recovered in the event of a disk failure, from the parity information and the remaining fragments of the group.
It is one of the most commonly used levels.
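Here is a sketch of how RAID 5 might rotate parity across the disks. The rotation rule below (parity of stripe 0 on the last disk, stripe 1 on the one before it, and so on) is an assumption for illustration. Real controllers offer several layouts. `parity_disk` and `layout` are hypothetical names.

```python
def parity_disk(stripe: int, num_disks: int) -> int:
    """Assumed rotation: stripe 0 on the last disk, stripe 1 on the one before, ..."""
    return (num_disks - 1 - stripe) % num_disks


def layout(num_stripes: int, num_disks: int) -> list[list[str]]:
    """Label every cell: D<n> for data block n, P<s> for the parity of stripe s."""
    rows, block = [], 0
    for s in range(num_stripes):
        p = parity_disk(s, num_disks)
        row = []
        for d in range(num_disks):
            if d == p:
                row.append(f"P{s}")
            else:
                row.append(f"D{block}")
                block += 1
        rows.append(row)
    return rows


for row in layout(num_stripes=4, num_disks=4):
    print(" ".join(f"{cell:>3}" for cell in row))
#  D0  D1  D2  P0
#  D3  D4  P1  D5
#  D6  P2  D7  D8
#  P3  D9 D10 D11
```

Each column (disk) holds exactly one parity block in every four stripes. Parity updates are therefore spread evenly instead of all landing on one disk.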
RAID 6
Level 6 is very similar to level 5, except that it computes two parity blocks for each group of fragments and stores them on different disks.
This gives level 6 greater safety than level 5. With two parity blocks, the system can recover from the simultaneous failure of two disks. However, performance suffers and storage capacity is reduced.
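The text does not say how the second parity block is computed. A widely used scheme is the one described in H. Peter Anvin's "The mathematics of RAID-6" and used by Linux md. It keeps P as plain XOR and adds Q, a weighted sum over the finite field GF(2^8). The byte-wise sketch below follows that idea under those assumptions. It is not a copy of any real implementation, and all function names are hypothetical.

```python
from typing import Optional


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    return gf_pow(a, 254)  # a^255 == 1 for a != 0, so a^254 is the inverse


def pq(data: list[int]) -> tuple[int, int]:
    """P is plain XOR parity; Q weights data byte i by g^i with generator g = 2."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q


def recover_two(data: list[Optional[int]], p: int, q: int) -> list[int]:
    """Rebuild two lost data bytes (marked None) from P and Q."""
    x, y = [i for i, d in enumerate(data) if d is None]
    # Remove the surviving bytes from P and Q, leaving:
    #   Dx ^ Dy == pxy   and   g^x*Dx ^ g^y*Dy == qxy
    pxy, qxy = p, q
    for i, d in enumerate(data):
        if d is not None:
            pxy ^= d
            qxy ^= gf_mul(gf_pow(2, i), d)
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    dx = gf_mul(qxy ^ gf_mul(gy, pxy), gf_inv(gx ^ gy))
    dy = pxy ^ dx
    out = list(data)
    out[x], out[y] = dx, dy
    return out


data = [0x11, 0x22, 0x33, 0x44]  # one byte from each of four data disks
p, q = pq(data)
print(recover_two([0x11, None, 0x33, None], p, q) == data)  # True
```

The extra field arithmetic on every write is one source of the performance cost mentioned above.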
Combination of RAID levels
In practice, instead of working with a single level, we often combine several of them to improve the characteristics of the resulting array.
Some of the most common combinations are presented below.
RAID 0+1
Two RAID 0 sets (striping) are controlled by a RAID 1 (mirroring). A level 1 controller sends the same data to each level 0 controller, so the information is saved twice. Each level 0 controller stripes the data and distributes it across its disks.
This combination gives us the improved access speed of level 0 together with the fault tolerance provided by the redundancy of level 1. The cost is that we use twice as many disks as a level 0 array of the same capacity.
RAID 1+0
Very similar to the previous one, except that now a level 0 controller stripes the data into fragments and passes them to two level 1 controllers that do the mirroring. That is, we first fragment the data and then duplicate it.
The advantage of this combination is that it can survive simultaneous failures of more than one disk, as long as each failed disk belongs to a different RAID 1 set.
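To make the difference between the two combinations concrete, this sketch counts how many two-disk failures each one survives. It assumes a hypothetical disk numbering. For RAID 0+1, the first half of the disks forms one stripe set and the second half the other. For RAID 1+0, disks (0, 1), (2, 3) and so on form the mirror pairs. The function names are also hypothetical.

```python
from itertools import combinations


def raid01_survives(failed: set[int], disks: int) -> bool:
    """RAID 0+1: a mirror of two stripe sets (first half and second half of the disks)."""
    half = disks // 2
    first_set_ok = all(d >= half for d in failed)
    second_set_ok = all(d < half for d in failed)
    return first_set_ok or second_set_ok  # one intact stripe set is enough


def raid10_survives(failed: set[int], disks: int) -> bool:
    """RAID 1+0: a stripe over mirror pairs (0, 1), (2, 3), ..."""
    return not any({2 * i, 2 * i + 1} <= failed for i in range(disks // 2))


def survivable_two_disk_failures(survives, disks: int) -> int:
    pairs = combinations(range(disks), 2)
    return sum(survives(set(pair), disks) for pair in pairs)


for n in (4, 6):
    total = n * (n - 1) // 2
    print(n, survivable_two_disk_failures(raid01_survives, n),
          survivable_two_disk_failures(raid10_survives, n), total)
# 4 2 4 6
# 6 6 12 15
```

With 6 disks, RAID 1+0 survives 12 of the 15 possible two-disk failures, while RAID 0+1 survives only 6. In RAID 0+1, one failure already takes a whole stripe set offline, so a second failure in the other set is fatal.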
RAID 5+0
Sometimes called RAID 50. First, a level 0 controller fragments the data. Then the level 5 controllers distribute it across the disks in their volumes, generating the parity blocks.
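This is a small sketch of the capacity and fault tolerance of a RAID 5+0 array. It assumes identical disks and a hypothetical numbering where consecutive disks form each RAID 5 group. `raid50_capacity` and `raid50_survives` are hypothetical names.

```python
def raid50_capacity(groups: int, disks_per_group: int, disk_size_tb: float) -> float:
    """Usable space: each RAID 5 group gives up one disk's worth to parity."""
    return groups * (disks_per_group - 1) * disk_size_tb


def raid50_survives(failed: set[int], groups: int, disks_per_group: int) -> bool:
    """Disks 0..k-1 form group 0, k..2k-1 group 1, and so on (assumed numbering).

    The stripe over the groups survives while no RAID 5 group has lost more than one disk.
    """
    lost_per_group = [0] * groups
    for d in failed:
        lost_per_group[d // disks_per_group] += 1
    return max(lost_per_group) <= 1


# Two RAID 5 groups of four 4 TB disks each (8 disks, 32 TB raw).
print(raid50_capacity(2, 4, 4.0))     # 24.0
print(raid50_survives({0, 5}, 2, 4))  # True: one failure per group
print(raid50_survives({0, 1}, 2, 4))  # False: group 0 lost two disks
```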
Other RAID combinations
There are more combinations, although the ones above are probably the most common. We can also find combinations such as 0+3, 3+0, 0+5, 1+5 and 5+1.
This post is part of the collection “Data Access and Storage Systems”. You can see the index of this collection here.