JS Ext

Tuesday, August 19, 2014

The Importance of Backups

I have a decent amount of data, all of it stored in a ZFS RAID-Z array.  RAID is not a replacement for backups, however.  I recently had a data corruption issue that, luckily, didn't hurt me, because I had an appropriate backup.

First, some background.  One of the hard disks in my ZFS array is in an external chassis.  I didn't realize this chassis was plugged into the wrong outlet on my UPS: the "Surge Only" outlet, which has no battery backup.  I had a power flicker and my ZFS pool dropped to a degraded state.  Normally, I would have gone down there to take care of the problem immediately, but children change your response time.

When it rains, it pours, however.  When I restarted the enclosure (after plugging it into the right outlet), ZFS ran a scan.  The scan discovered 5 files that had bitrot.  Luckily, I had a backup of all 5 files, and I was able to restore them.

Here are some tips on effective backups:

1) Categorize your data by importance.  My really important files (tax/legal documents and family pictures/video) are stored in the cloud (my personal Owncloud instance).  I keep these files offsite in case my house is completely destroyed.  I store less important files on an external USB3 hard disk.  This disk is not normally connected to my server; I have to plug it in to perform a backup or restore.  I don't want a lightning bolt to destroy both my data and my backup, but I am less concerned about losing this data if my entire house is destroyed.

2) I highly recommend ZFS.  Although it is not impervious to bitrot, it is far better than other solutions.  It also identifies which files have been impacted: "zpool status -v" gives you a list of files that have issues.  It is much easier to restore 5 files than to restore everything.

3) Schedule a ZFS scrub.  A scrub goes through your data and verifies that every block matches its checksum.  While ZFS can let you know that something bad happened when you read a file, it is good to pre-emptively verify all of your data.  I perform a scrub once a week, which is how I know the data corruption issue I had occurred within a one-week window.
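
As a rough sketch of tips 2 and 3, the Python below starts a scrub and pulls the list of damaged files out of "zpool status -v" output.  The function names, the pool name "tank", and the sample text are all made up for illustration, and the parsing assumes the file list follows the "errors: Permanent errors have been detected in the following files:" line and that only one pool is queried.  Treat it as a starting point, not a tested tool.

```python
import subprocess

# Header line that (by assumption) precedes the list of damaged files
# in `zpool status -v` output.
ERRORS_HEADER = "errors: Permanent errors have been detected in the following files:"

# Made-up sample output for a hypothetical pool named "tank".
SAMPLE_STATUS = """\
  pool: tank
 state: ONLINE
errors: Permanent errors have been detected in the following files:

        /tank/pictures/img_0001.jpg
        /tank/documents/taxes.pdf
"""


def damaged_files(status_text):
    """Return the entries listed after the permanent-errors header."""
    files = []
    in_list = False
    for line in status_text.splitlines():
        if line.strip() == ERRORS_HEADER:
            in_list = True
            continue
        if in_list and line.strip():
            files.append(line.strip())
    return files


def read_pool_status(pool):
    """Run `zpool status -v <pool>` and return its text output."""
    result = subprocess.run(
        ["zpool", "status", "-v", pool],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def start_scrub(pool):
    """Start a scrub with `zpool scrub <pool>`; the scrub runs in the background."""
    subprocess.run(["zpool", "scrub", pool], check=True)
```

On a real server, something like damaged_files(read_pool_status("tank")) would give the list of files to restore from backup, and start_scrub("tank") could be called from a weekly scheduled job.  Both need enough privileges to run the zpool command.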
