Digital Antiquity Backup Utility
...
This document describes Digital Antiquity’s archival and retrieval procedures for its assets, including the tDAR resource filestore, the tDAR PostgreSQL metadata database, and the Digital Antiquity websites.
Note: This document is not an installation guide or a tutorial. Installation, usage, and configuration instructions are available on Digital Antiquity's Bitbucket site.
- Usage and project information: https://bitbucket.org/tdar/backup
- Installation and configuration: https://bitbucket.org/tdar/backup/src/b7409a3a8f0aa0a29125a87cf4fe34b4a30a3441/installation-notes.md
Process Summary
Digital Antiquity will perform a routine full backup of assets and transfer these assets to the Amazon Glacier service (by way of Amazon’s S3-to-Glacier utility). Digital Antiquity will augment these full backups with smaller, differential backups that run on a more frequent schedule. Digital Antiquity will compress and encrypt all data prior to sending it to Glacier.
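The differential step can be pictured as a comparison between the manifest of the last full backup and the current state of the filestore. The sketch below is a minimal illustration only. It assumes a manifest is a mapping from relative file path to a SHA-256 digest (the manifest format is not specified here), and `sha256_file`, `build_manifest`, and `diff_manifests` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """SHA-256 hex digest of one file, read in 1 MiB chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str) -> dict[str, str]:
    """Map every file under `root` (as a relative POSIX path) to its digest.

    Assumption: a path -> digest mapping stands in for the real manifest format.
    """
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): sha256_file(p)
        for p in sorted(base.rglob("*"))
        if p.is_file()
    }


def diff_manifests(snapshot: dict[str, str], current: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (modified, deleted) paths relative to the last full snapshot.

    A differential backup compares against the full snapshot (not the previous
    differential), so a restore needs only the snapshot plus the latest diff.
    "modified" includes newly added files; "deleted" lists files now missing.
    """
    modified = sorted(p for p, d in current.items() if snapshot.get(p) != d)
    deleted = sorted(p for p in snapshot if p not in current)
    return modified, deleted
```

For example, comparing a snapshot of `{"a.txt": "111", "b.txt": "222"}` with a current state of `{"b.txt": "999", "c.txt": "333"}` gives modified `["b.txt", "c.txt"]` and deleted `["a.txt"]`; those two lists correspond to the `.modified.txt` and `.deleted.txt` objects in the example layout.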
...
We plan to use Amazon’s automated S3-to-Glacier transfer functionality. More information can be found here: https://aws.amazon.com/blogs/aws/archive-s3-to-glacier/.
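S3’s automated transfer to Glacier is driven by a bucket lifecycle rule. The sketch below only builds such a rule as a Python dictionary in the shape accepted by boto3’s `put_bucket_lifecycle_configuration`; the function name, the rule IDs, the prefixes, and the one-day delay are illustrative assumptions, not settled policy.

```python
def glacier_lifecycle(prefixes: list[str], days: int = 1) -> dict:
    """Build an S3 lifecycle configuration that transitions objects to Glacier.

    One rule is created per key prefix (e.g. "snapshot/" and "diffs/").
    `days` is how long objects stay in standard S3 storage before the
    transition; the default of 1 is an illustrative choice.
    """
    rules = []
    for prefix in prefixes:
        rules.append({
            # An empty prefix matches every object in the bucket.
            "ID": f"to-glacier-{prefix.strip('/') or 'all'}",
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
        })
    return {"Rules": rules}
```

With boto3 installed and credentials configured, the result could be applied with `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="example-bucket", LifecycleConfiguration=glacier_lifecycle(["snapshot/", "diffs/"]))`, where `example-bucket` is a placeholder name.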
Amazon Glacier is Amazon’s data archival service. Glacier provides low-cost, durable storage that is tailored for data archival and backup services.
...
Amazon S3 is an online storage service with near-real-time access. While it can serve as a backup destination, it is tailored more toward low-latency, high-availability file access, and its pricing reflects this. S3 has been in service longer than Glacier and benefits from a good selection of mature third-party file transfer utilities.
Amazon Primers
Amazon S3 Filesystem Layout
Buckets
Top-level container
Container for objects
Non-hierarchical (no buckets in buckets)
Objects
Essentially files
Have a name (key) and permissions.
Folders
Hierarchical
Don’t actually exist; they serve as a convenience construct when browsing and downloading.
Internally, just a prefix prepended to the object name (key).
Limits
Number of buckets: limited by a per-account quota (can be raised on request)
Unlimited number of objects per bucket
Max object size: 5 TB
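Because S3 keys are flat, a “folder” listing is just a prefix match plus grouping on a delimiter; S3’s ListObjectsV2 call does this with its `Prefix` and `Delimiter` parameters, returning the grouped names as `CommonPrefixes`. The pure-Python sketch below imitates that behavior on an in-memory list of keys so the idea can be seen without an AWS account; `list_folder` is a hypothetical name and no AWS call is made.

```python
def list_folder(keys: list[str], prefix: str = "", delimiter: str = "/") -> tuple[list[str], list[str]]:
    """Emulate an S3 "folder" listing over a flat list of object keys.

    Returns (objects, common_prefixes): keys that sit directly under `prefix`,
    and the "sub-folder" prefixes one level below it.
    """
    objects, common = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue  # outside the requested "folder"
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            objects.append(key)  # no further delimiter: a direct child object
        else:
            # Roll everything below the next delimiter up into one "sub-folder".
            common.add(prefix + rest[:cut + len(delimiter)])
    return sorted(objects), sorted(common)
```

For the keys `["snapshot/a.tar.gz", "diffs/2015/x.tar.gz", "diffs/y.txt", "readme.txt"]`, `list_folder(keys)` returns `(["readme.txt"], ["diffs/", "snapshot/"])`, and `list_folder(keys, "diffs/")` returns `(["diffs/y.txt"], ["diffs/2015/"])`: the “folders” appear only as a by-product of the key names.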
...
Vault
Top-level container, akin to an S3 bucket
Archive
Roughly akin to an S3 object.
Limits: effectively none for a filesystem of our size (or one 1,000 times our size)
How Manifests Work
How S3-to-Glacier Works
How to Copy to S3
Suggested S3 File Layout
The Basics
One bucket per “app” (e.g., “tdar filesystem”, “postgres”, “jira”, etc.)
Full snapshot stored in a “snapshot” subfolder
Differential backups stored in a “diffs” subfolder
Each folder contains:
One object containing the manifest file(s)
One object containing the winterized backup plus its manifest file(s)
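A winterized object of this kind could be produced by writing the backed-up files and their manifest into a single gzip-compressed tar archive. The sketch below uses only Python’s standard `tarfile` module; the `winterize` name, the `manifest.txt` member name, and its tab-separated format are assumptions. Encryption, which the process summary requires before upload, is not shown and would be a separate step.

```python
import io
import tarfile
from pathlib import Path
from typing import BinaryIO


def winterize(root: str, paths: list[str], manifest: dict[str, str], out: BinaryIO) -> None:
    """Write `paths` (relative to `root`) plus a manifest into a .tar.gz stream.

    Assumption: the manifest is stored as a "path<TAB>digest" text member
    named manifest.txt. Compression only; encryption is NOT performed here.
    """
    base = Path(root)
    with tarfile.open(fileobj=out, mode="w:gz") as tar:
        for rel in paths:
            tar.add(str(base / rel), arcname=rel)
        body = "".join(f"{p}\t{d}\n" for p, d in sorted(manifest.items())).encode("utf-8")
        info = tarfile.TarInfo(name="manifest.txt")
        info.size = len(body)
        tar.addfile(info, io.BytesIO(body))
```

In practice `out` would be a file opened with `open(..., "wb")`; accepting any binary stream keeps the sketch easy to test in memory and lets an encryption layer wrap the stream later.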
Example Layout
snapshot/filestore-2015q1.manifest
snapshot/tdar-filestore-2015q1.tar.gz
diffs/filestore-2015-jan-01.deleted.txt
diffs/filestore-2015-jan-01.modified.txt
diffs/filestore-2015-jan-01.tar.gz
diffs/filestore-2015-jan-14.deleted.txt
diffs/filestore-2015-jan-14.modified.txt
diffs/filestore-2015-jan-14.tar.gz
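The object names in the example follow a date-based pattern: a quarterly label for snapshots and a trio of objects per differential run. The sketch below reproduces that pattern; `quarter_label` and `diff_names` are hypothetical names, and the lowercase month abbreviations are hard-coded because `strftime("%b")` depends on the system locale.

```python
from datetime import date

# Hard-coded so names do not depend on the system locale (unlike strftime("%b")).
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def quarter_label(d: date) -> str:
    """Snapshot period label, e.g. "2015q1" for any date from January to March 2015."""
    return f"{d.year}q{(d.month - 1) // 3 + 1}"


def diff_names(app: str, d: date) -> list[str]:
    """The three object names for one differential run on date `d`."""
    stem = f"{app}-{d.year}-{MONTHS[d.month - 1]}-{d.day:02d}"
    return [f"{stem}.deleted.txt", f"{stem}.modified.txt", f"{stem}.tar.gz"]
```

For example, `diff_names("filestore", date(2015, 1, 14))` yields the three `filestore-2015-jan-14.*` names listed above, and `f"filestore-{quarter_label(date(2015, 1, 14))}.manifest"` yields `filestore-2015q1.manifest`.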
...