Good overview of RAID storage & backup for musicians

While music is my avocation, I make my daily bread in the data storage industry. As such, I was happy to see this article shared by em magazine.

The article gives a good overview of the importance of a regular backup regimen. After that broad overview, it specifically addresses the benefits of employing RAID as a way around the human limitation of staying disciplined about backups. RAID forms an automated backup of sorts; however, it does have its limitations.

Limitations of RAID

Perhaps the major limitation of RAID is that it will happily (and immediately) propagate any human error. As an example, have you ever mistakenly deleted a file? Or overwritten a file you needed with a blank or incorrect one? Studies over the years have consistently shown that human errors such as these are behind 75% – 85% of all data loss. A RAID will happily overwrite the good data with bad data on every redundant copy it maintains.

The author of the original article does address this issue at the end in his discussion of progressive backups. I just wanted to stress that this point should not be an afterthought.

Why RAID works

As the author mentions, RAID is an acronym for Redundant Array of Independent Disks. The key to RAID's ability to guard against data loss from hardware failure is in the word Redundant. Everything written to a RAID is stored redundantly across multiple disks, either as full copies (mirroring) or as parity information. (I speak here of all levels but RAID 0, which is not truly RAID, as there is no data redundancy in RAID 0.) If a disk breaks, the data from that disk can be algorithmically recreated from the redundant data on the other disks.
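To make that idea concrete, here is a toy Python sketch of parity-based redundancy, the mechanism behind RAID 5 style levels. The "disks" are just byte strings and the block contents are made up for illustration; a real controller does this per stripe, in hardware or firmware.

```python
# Toy illustration of parity-based redundancy (the idea behind RAID 5).
# Three "data disks" hold one block each; a fourth "parity disk" holds
# the XOR of the data blocks. Contents are invented for this example.

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

disk0 = b"take-01 "
disk1 = b"take-02 "
disk2 = b"take-03 "

# The parity block is the running XOR of all data blocks.
parity = xor_blocks(xor_blocks(disk0, disk1), disk2)

# Suppose disk1 fails. Its contents can be recreated from the survivors:
rebuilt_disk1 = xor_blocks(xor_blocks(disk0, disk2), parity)
assert rebuilt_disk1 == disk1
print(rebuilt_disk1)   # b'take-02 '
```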

What happens when a disk breaks?

It is important, in the case of a disk failure, to replace the failed disk before a second drive fails. Once the new disk is added, the missing data is reconstituted and written to it. This process is termed rebuilding a degraded array. If a second drive in a degraded array fails before the rebuild can complete, data is lost.

It is important to note that a degraded array (one with a failed disk) operates much more slowly than a healthy array. Users can therefore expect file operations to take longer, and the rebuild time can sometimes be measured in days.
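To give a feel for that window, here is a rough back-of-the-envelope estimate. The drive capacity and rebuild rate below are assumptions chosen purely for illustration; real rebuild rates vary widely with the controller, the drives, and how busy the array is while rebuilding.

```python
# Back-of-the-envelope rebuild-time estimate for a degraded array.
# Both figures are illustrative assumptions, not measured values.

drive_capacity_bytes = 2 * 1024**4        # a 2 TiB replacement drive
rebuild_rate_bytes_per_s = 10 * 1024**2   # assume ~10 MiB/s while still serving users

rebuild_seconds = drive_capacity_bytes / rebuild_rate_bytes_per_s
print(f"Estimated rebuild time: {rebuild_seconds / 86400:.1f} days")
# Estimated rebuild time: 2.4 days
```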

RAID 6 guards against dual failure

This extended rebuild window has given rise to yet another RAID architecture, typically termed RAID 6. It is similar to RAID 5; however, instead of keeping one block of redundant data in each ‘stripe group’, it keeps two algorithmically independent blocks. A ‘stripe group’ is the set of disks that are managed as a single disk visible to the host computer; most affordable RAID systems are limited to a single stripe group. The benefit of RAID 6 over RAID 5 is that you remain protected against data loss if a second disk goes down before a degraded array can be rebuilt.

Capacity lost to overhead

While the author claims that RAID 5 can consist of 3-5 disks in each array, it can actually contain any number of disks, three or more. The capacity penalty for RAID 5, as a fraction of the total, depends on the number of disks in the array: the overhead is only negligibly larger than one disk’s worth of capacity, because each stripe group carries a single disk’s worth of redundant data. In RAID 6, the overhead is two disks’ worth of capacity, and a RAID 6 stripe group can contain any number of disks, four or more.
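As a quick sanity check on those overheads, here is a small Python helper. The 1 TB per-disk figure is an assumed example value, and filesystem or metadata overhead is ignored.

```python
# Usable capacity under RAID 5 (overhead: one disk's worth) and
# RAID 6 (overhead: two disks' worth). Per-disk size is an example value.

def usable_tb(num_disks: int, per_disk_tb: float, parity_disks: int) -> float:
    if num_disks < parity_disks + 2:
        raise ValueError("too few disks for this RAID level")
    return (num_disks - parity_disks) * per_disk_tb

per_disk_tb = 1.0  # assumed drive size, for illustration only
for n in (4, 5, 8, 12):
    print(f"{n} disks: RAID 5 usable = {usable_tb(n, per_disk_tb, 1):.0f} TB, "
          f"RAID 6 usable = {usable_tb(n, per_disk_tb, 2):.0f} TB")
```

Notice that the fractional overhead shrinks as disks are added, which is why larger stripe groups are more space-efficient.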

Part of a good backup plan

To return to an earlier point, RAID cannot guard against human error. If you overwrite a file, RAID does nothing to solve the problem. Similarly, if your house is hit by an asteroid (or someone steals your RAID unit), you will lose data. Accordingly, it is crucial not to mistake the use of RAID for a complete, well-thought-out backup regimen. It would serve the user well to regularly back up the RAID system (perhaps to an external hard drive) and store that backup offsite.
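For what it is worth, even a very simple dated copy goes a long way. Here is a minimal Python sketch; both paths are hypothetical placeholders you would adjust for your own machine, and this makes a fresh full copy each run rather than anything incremental.

```python
# Minimal sketch of a dated copy from a RAID volume to an external drive.
# Both paths below are hypothetical placeholders, not real mount points.
import shutil
from datetime import date
from pathlib import Path

source = Path("/Volumes/RAID/Projects")        # assumed RAID mount point
backup_root = Path("/Volumes/ExternalBackup")  # assumed external drive
destination = backup_root / f"Projects-{date.today():%Y-%m-%d}"

shutil.copytree(source, destination)
print(f"Copied {source} -> {destination}")
```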

Planned obsolescence of some RAID subsystems?

Last, if you are in the market for an external RAID subsystem, there is one more point to be aware of. We are currently at a natural inflection point in the capacity of the largest drives available. Because 32-bit processors have dominated system architectures for many years (we are only now making the transition to 64-bit), and because almost all drives currently use a 512-byte sector size, a 2 TiB drive is the largest some systems will ever be able to employ: a 32-bit Logical Block Address can reach 2^32 = 4Gi sectors, and 4Gi sectors * 512 bytes = 2 TiB. Some systems already employ a 64-bit path for Logical Block Addresses, but others do not. This is somewhat analogous to the Y2K scare of some years back (or the Unix time epoch issue due to hit in 2038). The upshot of all this is that it would be wise to ensure any RAID subsystem you buy is already compatible with drives larger than 2 TiB.
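For the curious, that ceiling is easy to verify with the same arithmetic:

```python
# The 32-bit LBA ceiling: 2**32 addressable sectors of 512 bytes each.
sector_size_bytes = 512            # legacy sector size
max_sectors = 2 ** 32              # what a 32-bit Logical Block Address can reach

max_bytes = max_sectors * sector_size_bytes
print(max_bytes == 2 * 1024 ** 4)           # True: exactly 2 TiB
print(f"{max_bytes / 1024 ** 4:.0f} TiB")   # 2 TiB
```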