Developing a comprehensive, rigorous backup solution.

Issues, criteria

  1. Availability is what it's all about — having access to my files (and applications) in the face of risks: hardware failure, software bugs, human error… (Most terminology here is from the security domain where risks are dealt with rigorously… So, need to expound analysis, terminology…)
  2. Tiers (Ramifications from analysis, but needs prominence…)
    1. Mirroring: an up-to-date copy (soft realtime) to protect against hardware failure; i.e., it must not be on the same medium. A remote mirror increases latency, and traffic is expensive; typically done with RAID. (But, reconsider…)
    2. Versioning: read-only snapshots to protect against "bad" changes (mistaken deletion, broken upgrades, etc).
    3. Off-site: to protect against entire site failures (typically, catastrophes: theft, fire, etc).
  3. Encryption: confidentiality plays a special role. Redundancy is the most cost-effective approach to reliability, so we want multiple, distributed backups, especially remote (off-site) copies; those must be encrypted, obviously, and should preferably not require privileged access (root login) on the remote host.
    1. Encrypt both storage and communications. (GnuPG, KGpg, OpenSSH.)
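A minimal sketch of the storage-encryption leg, assuming a modern GnuPG is installed. Paths under /tmp/bkdemo are illustrative; symmetric passphrase encryption is used here only for brevity, whereas a real off-site setup would encrypt to a public key (--encrypt --recipient KEYID) so the remote host never holds any secret:

```shell
#!/bin/sh
set -e
rm -rf /tmp/bkdemo
mkdir -p /tmp/bkdemo/src /tmp/bkdemo/restore
echo "important data" > /tmp/bkdemo/src/notes.txt

# Archive and encrypt in one pipeline; nothing is stored unencrypted
# past this point, and no root is needed on either end.
tar -czf - -C /tmp/bkdemo src \
  | gpg --batch --yes --pinentry-mode loopback --passphrase demo \
        --symmetric --cipher-algo AES256 \
        -o /tmp/bkdemo/backup.tar.gz.gpg

# Restore path (doubles as a verification test):
gpg --batch --yes --pinentry-mode loopback --passphrase demo \
    --decrypt /tmp/bkdemo/backup.tar.gz.gpg \
  | tar -xzf - -C /tmp/bkdemo/restore
cmp /tmp/bkdemo/src/notes.txt /tmp/bkdemo/restore/src/notes.txt
```

The same pipeline can feed ssh instead of a local file, covering the communications side as well.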
  4. Operating system backup
    1. Mirroring?
    2. Versioning: take snapshots. (How to clone current OS configuration?)
  5. Coverage: ensure everything is backed up.
    1. Exclusions: use a negative approach — include everything except what's explicitly excluded. Deals with careless omissions.
    2. (Canonical tree, FS boundaries, symlinks, offline, mtime?)
    3. Metadata (ownership, permissions, user/group identities, extended attributes…)
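The negative approach can be sketched with GNU tar's exclusion list (the /tmp/covdemo tree is made up). Everything is included unless it matches an explicit pattern, so a forgotten directory is backed up by default; -p and --xattrs would extend the same archive to permissions and extended attributes:

```shell
#!/bin/sh
set -e
rm -rf /tmp/covdemo
mkdir -p /tmp/covdemo/data/cache
echo keep > /tmp/covdemo/data/file.txt
echo junk > /tmp/covdemo/data/cache/scratch.bin
printf 'data/cache\n' > /tmp/covdemo/exclude.txt

# -X reads the exclusion patterns; everything not listed is archived.
tar -czf /tmp/covdemo/backup.tar.gz \
    -X /tmp/covdemo/exclude.txt -C /tmp/covdemo data
tar -tzf /tmp/covdemo/backup.tar.gz
```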
  6. History/versioning (rsnapshot, FUSE…?)
  7. Automation: Cron (nice (run in background), when, frequency…)
    1. Automated SSH login
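A sketch of the cron plus unattended-SSH pieces; the host name offsite, the script path /usr/local/bin/run-backup.sh, and the schedule are all placeholders:

```shell
# One-time setup: a passphrase-less key for unattended login.
# Restrict it in the remote authorized_keys (command="...",no-pty)
# so a stolen key can only run the backup, nothing else.
ssh-keygen -t rsa -N '' -f ~/.ssh/backup_key
ssh-copy-id -i ~/.ssh/backup_key user@offsite

# crontab entry: nightly at 03:15, niced so it stays in the background.
15 3 * * *  nice -n 19 /usr/local/bin/run-backup.sh >>~/backup.log 2>&1
```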
  8. Dependencies
    1. Format: something that conforms with other criteria, is reliable, convenient… (Duplicity, Box Backup or EncFS/SSHFS? Rsync, rdiff-backup…?)
  9. Restoring (recovery, testing)
    1. Verification (If it's not tested, assume broken…)
    2. Selective restoration: browsing archives…
    3. OS snapshots (How to revert OS changes?)
  10. Detection and monitoring (How to detect "bit rot"? Tripwire, SMART, CRCs/parity?)
    1. RAID mirroring
    2. Monitoring automated backups
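Bit rot in at-rest archives can be caught with a checksum manifest that is re-verified on a schedule; a minimal sketch (paths under /tmp/rotdemo are illustrative):

```shell
#!/bin/sh
set -e
rm -rf /tmp/rotdemo
mkdir -p /tmp/rotdemo
echo "archived payload" > /tmp/rotdemo/backup.dat

# At backup time: record a checksum next to the archive.
( cd /tmp/rotdemo && sha256sum backup.dat > MANIFEST.sha256 )

# At monitoring time (e.g. from cron): re-verify. A nonzero exit or a
# "FAILED" line means the copy decayed and a mirror should be used.
( cd /tmp/rotdemo && sha256sum -c MANIFEST.sha256 )
```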
  11. Performance
    1. Efficiency: incremental (space)/differential (time).
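The hard-link trick behind rsnapshot-style versioning also illustrates incremental space efficiency: unchanged files are shared between snapshots, so each snapshot costs only the changed data plus directory entries. A sketch (the /tmp/snapdemo tree is invented):

```shell
#!/bin/sh
set -e
rm -rf /tmp/snapdemo
mkdir -p /tmp/snapdemo/live
echo v1 > /tmp/snapdemo/live/doc.txt

# Snapshot 0: hard-link copy; no file data is duplicated.
cp -al /tmp/snapdemo/live /tmp/snapdemo/snap.0

# Update the live tree with a NEW inode (as rsync does on transfer),
# so the snapshot's hard link keeps pointing at the old contents.
printf 'v2\n' > /tmp/snapdemo/live/doc.txt.tmp
mv /tmp/snapdemo/live/doc.txt.tmp /tmp/snapdemo/live/doc.txt

cat /tmp/snapdemo/snap.0/doc.txt   # old version preserved
cat /tmp/snapdemo/live/doc.txt     # current version
```

The catch, as noted below under tools: in-place edits would write through the shared inode, so this only works with tools that replace files wholesale.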

Choice of tools

  1. Rsync
    1. Simple to use, but… really falls short on most requirements:
      1. Mirroring: Rsync isn't continuous, and is expensive to execute (it rescans the whole tree each run), even if efficient in bandwidth; useless for mirroring.
      2. Versioning: there are hard-link-based hacks to do it, but… they feel like hacks.
      3. Off-site: no storage encryption, so no. There's Rsyncrypto, which I don't trust; or one could hack something given privileged access to the remote host, e.g., run an encrypted file system there, but that defeats the no-privilege requirement…
  2. Duplicity (Done; explain…)
  3. Dar? Others? Why not?
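For the record, a Duplicity usage that would satisfy the criteria above looks roughly like this; the host, paths, and key ID are placeholders, and the commands are untested as written:

```shell
# Full backup, GnuPG-encrypted to a public key, over SFTP; no root
# needed on either end:
duplicity full --encrypt-key BACKUP_KEY_ID /home/user sftp://user@offsite/backups

# Subsequent runs are incremental automatically:
duplicity --encrypt-key BACKUP_KEY_ID /home/user sftp://user@offsite/backups

# Selective restore of one file as it was three days ago:
duplicity restore -t 3D --file-to-restore docs/notes.txt \
    sftp://user@offsite/backups /tmp/notes.txt
```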


  1. Merge with Secure remote backup.


Last modified 2009-09-29 10:34:58 +0000