
Previously, I wrote an article for BigAdmin about why I chose the ZFS file system to ensure my data was safe: “How I Used Solaris OS and ZFS to Solve My Mac OS X Storage Problem.”*
One of the reasons I chose the ZFS file system over Apple HFS+, Linux ext3/ext4, or Microsoft Windows NTFS is that ZFS checksums all the data written to it and verifies those checksums whenever the data is read back. This might seem unnecessary, a little obsessive, or even CPU-hungry, but it is essential for long-term data storage and for detecting data rot.
On Windows Server 2012, you can choose ReFS, which offers some of the functionality of ZFS, such as checksumming and copy-on-write; however, it doesn’t currently offer deduplication or compression as ZFS does.
So what is data rot, why should I fear it, and most importantly, what can I do about it?
Quite simply, data rot is the result of tiny changes in the magnetic particles that make up the media in hard disks; it may also be caused by faulty memory cells in SSDs. The effect this has on your data is random but predictable: data loss. It might be the contents of a file that get corrupted, the file header that describes those contents, or, worse, the file allocation table that records where the file is stored and how it is linked. The file might be a system file or a data file; either way, it’s eventually going to be bad news.
According to a 2008 study, Analyzing the Effects of Disk-Pointer Corruption (pdf), 0.66% of SATA disks and 0.06% of Fibre Channel disks developed corruption within 17 months of use. The same paper shows that some kinds of corruption are far more damaging than others and that most modern file systems cannot deal with them effectively (the ZFS file system excepted, of course!).**
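To put those percentages in perspective for a home server, here is a rough calculation of the chance that at least one disk in a small pool develops corruption over the same 17-month window. It is only a sketch: it assumes, as a simplification, that each disk is affected independently and at exactly the quoted rate, and the five-disk pool is a hypothetical example.

```python
def chance_any_disk_corrupted(per_disk_rate: float, n_disks: int) -> float:
    """Probability that at least one of n_disks is affected, assuming each
    disk is affected independently with probability per_disk_rate."""
    # P(at least one) = 1 - P(none) = 1 - (1 - p)^n
    return 1.0 - (1.0 - per_disk_rate) ** n_disks


SATA_RATE = 0.0066  # 0.66% of SATA disks over 17 months (figure quoted above)
FC_RATE = 0.0006    # 0.06% of Fibre Channel disks over 17 months

if __name__ == "__main__":
    pool_size = 5  # hypothetical home server with five disks
    print(f"SATA: {chance_any_disk_corrupted(SATA_RATE, pool_size):.1%}")
    print(f"FC:   {chance_any_disk_corrupted(FC_RATE, pool_size):.1%}")
```

For five SATA disks this comes to roughly 3.3%, versus about 0.3% for five Fibre Channel disks, and the odds grow with every disk you add.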
So you’re probably thinking, “Doesn’t chkdsk detect and correct this kind of problem (or fsck on Linux, or Disk Utility on Mac OS X)?” Well, maybe, maybe not, depending on where the corruption occurs. If it occurs in the file system structure, how well these tools cope is questionable; see the References*** listed below. If it occurs in the file content, the answer is “probably not”: these tools check the file system’s structure, not the data inside your files.
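The difference between checking structure and checking content can be illustrated with a deliberately simplified model. Here structure_ok stands in for the kind of consistency check a structure checker performs (block pointers in range, sizes that add up); it is a hypothetical illustration, not how chkdsk or fsck actually work. content_ok needs a checksum recorded when the file was written, which is exactly what a file system without data checksums does not have. FileRecord and both functions are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class FileRecord:
    """Hypothetical, greatly simplified per-file metadata."""
    size: int
    block_addrs: List[int]
    content_sha256: bytes  # recorded at write time; many file systems keep no such data checksum


def structure_ok(rec: FileRecord, disk: List[bytes]) -> bool:
    # Structural check: block pointers are in range and the sizes add up.
    if any(not 0 <= addr < len(disk) for addr in rec.block_addrs):
        return False
    return sum(len(disk[addr]) for addr in rec.block_addrs) == rec.size


def content_ok(rec: FileRecord, disk: List[bytes]) -> bool:
    # Content check: re-hash the bytes and compare with the stored checksum.
    data = b"".join(disk[addr] for addr in rec.block_addrs)
    return hashlib.sha256(data).digest() == rec.content_sha256


if __name__ == "__main__":
    disk = [b"photo-part-1", b"photo-part-2"]
    rec = FileRecord(24, [0, 1], hashlib.sha256(b"".join(disk)).digest())
    disk[1] = b"photo-pArt-2"  # one byte rots; the length is unchanged
    print("structure ok:", structure_ok(rec, disk))
    print("content ok:  ", content_ok(rec, disk))
```

After the simulated rot, the structural check still passes while the content check fails: nothing about the metadata has changed, only the bytes.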
We’ve established what data rot is and how existing tools are not suited to detecting, correcting, or preventing it. Now, on to why you should care about this…
How important is your data? I mean, really? Think about it. I personally have the following data stored on my computer: photos and videos of my daughter since birth, software downloads I’ve purchased (including Adobe Photoshop and Adobe Dreamweaver, which weren’t cheap), my iTunes library (on which I must have spent several hundred, if not thousands of, dollars), and various work projects.
I’m not prepared to let anything happen to this data. So I’ve taken steps to avoid obvious problems:
• The file server is a dedicated box.
• My data is separated out to avoid accidental deletion.
I back up my data regularly (on the Mac with Time Machine, and on FreeBSD with ZFS snapshots, which I send to an off-site duplicate with the zfs send and zfs receive commands). I’ve also taken steps to design my storage solution correctly: I use several disks in a RAID configuration (RAID-Z with a hot spare) so that a single disk failure can’t cause data loss.
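The snapshot-and-replicate routine could be scripted along the following lines. This is a sketch under assumptions: the dataset tank/data, the host backup.example.com, the target backup/data, the snapshot names, and the use of ssh as the transport are all hypothetical, since the setup is not described in that detail here. The zfs snapshot, zfs send (with -i for an incremental stream), and zfs receive (with -F, which rolls the target back to its most recent snapshot before receiving) commands are the standard ZFS ones.

```python
import subprocess
from typing import List, Optional


def snapshot_cmd(dataset: str, snap: str) -> List[str]:
    return ["zfs", "snapshot", f"{dataset}@{snap}"]


def send_cmd(dataset: str, new_snap: str, prev_snap: Optional[str] = None) -> List[str]:
    """Full stream for the first copy; incremental (-i) stream afterwards."""
    if prev_snap is None:
        return ["zfs", "send", f"{dataset}@{new_snap}"]
    return ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]


def receive_cmd(host: str, target: str) -> List[str]:
    # Run zfs receive on the off-site machine; -F rolls the target back to
    # its most recent snapshot so the incremental stream applies cleanly.
    return ["ssh", host, "zfs", "receive", "-F", target]


def replicate(dataset: str, new_snap: str, prev_snap: Optional[str],
              host: str, target: str) -> None:
    """Take a snapshot, then pipe zfs send into zfs receive on the remote host."""
    subprocess.run(snapshot_cmd(dataset, new_snap), check=True)
    sender = subprocess.Popen(send_cmd(dataset, new_snap, prev_snap),
                              stdout=subprocess.PIPE)
    receiver = subprocess.Popen(receive_cmd(host, target), stdin=sender.stdout)
    sender.stdout.close()  # lets zfs send see a broken pipe if receive dies
    recv_rc = receiver.wait()
    send_rc = sender.wait()
    if recv_rc != 0 or send_rc != 0:
        raise RuntimeError(f"replication failed (send={send_rc}, receive={recv_rc})")


if __name__ == "__main__":
    # Hypothetical names; prints the pipeline instead of running it.
    print(" ".join(send_cmd("tank/data", "2016-08-07", "2016-07-31")),
          "|", " ".join(receive_cmd("backup.example.com", "backup/data")))
```

Keeping the previous snapshot on both machines is what makes the incremental send possible: the stream only carries the blocks that changed between the two snapshots.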
Finally, I chose the ZFS file system because it checksums every block it writes and verifies that checksum on every read, ensuring that my data is exactly as it was when it was written to disk.
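A key detail of ZFS’s design is that a block’s checksum is not stored next to the block; it lives in the parent block pointer, so a silently corrupted block cannot vouch for itself. The toy model below sketches checksum-on-write and verify-on-read under that arrangement. It uses SHA-256, one of the checksum algorithms ZFS can be configured to use (ZFS’s default is a faster, non-cryptographic checksum), and ToyDisk, BlockPointer, and ChecksumError are hypothetical names, not ZFS interfaces.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


class ChecksumError(Exception):
    """Raised when a block's contents no longer match the checksum in its pointer."""


@dataclass(frozen=True)
class BlockPointer:
    address: int     # where the block lives on the toy disk
    checksum: bytes  # checksum of the block, kept by the parent, not the block


class ToyDisk:
    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}
        self._next_addr = 0

    def write(self, data: bytes) -> BlockPointer:
        # Copy-on-write style: every write goes to a fresh address.
        addr = self._next_addr
        self._next_addr += 1
        self.blocks[addr] = data
        return BlockPointer(addr, hashlib.sha256(data).digest())

    def read(self, ptr: BlockPointer) -> bytes:
        data = self.blocks[ptr.address]
        # Verify on every read; never hand back data that fails the check.
        if hashlib.sha256(data).digest() != ptr.checksum:
            raise ChecksumError(f"checksum mismatch at block {ptr.address}")
        return data


if __name__ == "__main__":
    disk = ToyDisk()
    ptr = disk.write(b"daughter-first-birthday.jpg contents")
    corrupted = bytearray(disk.blocks[ptr.address])
    corrupted[0] ^= 0x01  # flip a single bit to simulate data rot
    disk.blocks[ptr.address] = bytes(corrupted)
    try:
        disk.read(ptr)
    except ChecksumError as err:
        print("detected:", err)
```

On a file system without data checksums, the same read would simply return the damaged bytes.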
I run a “scrub” of the ZFS file system every week to check that no data has been corrupted by data rot, and this week it detected over 20 instances of it. Thankfully, ZFS effortlessly repaired the corrupted data by reconstructing it from the redundant data on the other disks (thanks to RAID-Z), without any loss whatsoever.
Conclusion: To detect and repair data rot, choose the ZFS file system.
Although I didn’t lose any data, the experience drove me to write this article, because I wanted to make people aware of this issue. I’ve been using ZFS successfully since its first release on Solaris in 2005; that’s 11 years of data protection.
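A scrub walks every allocated block, verifies its checksum, and repairs any bad block from redundancy. The sketch below models that idea with a single XOR parity block, roughly the protection RAID-Z1 provides. It is a simplification, not ZFS’s implementation: real RAID-Z uses variable-width stripes, checksums whole logical blocks in their parent block pointers, and also offers double and triple parity. ToyStripe and xor_blocks are hypothetical names.

```python
import hashlib
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class ToyStripe:
    """One stripe across several disks: n data blocks plus one parity block.

    Each data block's checksum is kept separately, standing in for the
    checksum ZFS stores in the parent block pointer.
    """

    def __init__(self, data_blocks: List[bytes]):
        if len({len(b) for b in data_blocks}) != 1:
            raise ValueError("toy model requires equal-length data blocks")
        self.disks = list(data_blocks) + [xor_blocks(data_blocks)]
        self.checksums = [hashlib.sha256(b).digest() for b in data_blocks]

    def scrub(self) -> int:
        """Verify every data block; rebuild bad ones from parity. Returns repair count."""
        repaired = 0
        for i, expected in enumerate(self.checksums):
            if hashlib.sha256(self.disks[i]).digest() == expected:
                continue  # block is intact
            # XOR of all the other blocks (including parity) recreates block i.
            others = [blk for j, blk in enumerate(self.disks) if j != i]
            rebuilt = xor_blocks(others)
            if hashlib.sha256(rebuilt).digest() != expected:
                # More damage than single parity can fix.
                raise RuntimeError(f"block {i} is unrecoverable")
            self.disks[i] = rebuilt  # self-heal: write the good copy back
            repaired += 1
        return repaired


if __name__ == "__main__":
    stripe = ToyStripe([b"abcd", b"efgh", b"ijkl"])
    stripe.disks[1] = b"efgX"  # simulate data rot on the second disk
    print(stripe.scrub(), stripe.disks[1])
```

The checksum is what tells the scrub which block is bad; parity alone can only say that something in the stripe is inconsistent. For brevity, this toy does not verify the parity block itself.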
*http://web.archive.org/web/20090130012930/http://www.sun.com/bigadmin/content/submitted/zfs_mac_os.x.jsp
**http://web.archive.org/web/20140131190051/http://www.cs.wisc.edu/wind/Publications/pointer-dsn08.pdf
***http://web.archive.org/web/20090228135946/http:/www.sun.com/bigadmin/content/submitted/data_rot.jsp#References
References
• An Analysis of Data Corruption in the Storage Stack (pdf); L. N. Bairavasundaram, G. R. Goodson, B. Schroeder, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. In FAST ’08, 2008. http://web.archive.org/web/20080706010925/http://www.usenix.org/events/fast08/tech/full_papers/bairavasundaram/bairavasundaram.pdf
• Analyzing the Effects of Disk-Pointer Corruption (pdf); L. N. Bairavasundaram, M. Rungta, N. Agrawal, A. C. Arpaci-Dusseau, R. H. Arpaci-Dusseau, and M. M. Swift. In DSN ’08, 2008. http://web.archive.org/web/20140131190051/http://www.cs.wisc.edu/wind/Publications/pointer-dsn08.pdf
About the Author:
Kevin McAleer is the director of Advice Factory, offering advice and IT consultancy services to businesses in the UK. He is an Apple Mac fan and also an evangelist for Oracle’s ZFS technology.
This article originally appeared in BSD Mag Vol. 10 No. 08.