Digital Antiquity Backup Utility


Overview

This document describes Digital Antiquity’s archival and retrieval procedures for its assets, including the tDAR resource filestore, the tDAR PostgreSQL metadata database, and Digital Antiquity websites.


Note: This document is not an installation guide or a tutorial.  Installation, usage, and configuration instructions are available on Digital Antiquity's Bitbucket site.

 

Process Summary


Digital Antiquity will perform routine full backups of its assets and transfer them to the Amazon Glacier service by way of Amazon’s S3-to-Glacier utility. Digital Antiquity will augment these full backups with smaller differential backups, which capture changes since the most recent full backup and run on a more frequent schedule. Digital Antiquity will compress and encrypt all data before sending it to Glacier.


Priorities


Backup Procedures

This section outlines the steps in the backup process, which is implemented as a set of Unix scripts and utilities.

Full Backups (“snapshots”)

  1. Manifest file generation

    1. A listing of every file contained in the backup

    2. Elements:

      1. full path and filename

      2. hash signature (xxhash)

      3. (undecided) owner+group

      4. (undecided) permissions

      5. (undecided) create+modify date

  2. Backup to scratch location

  3. File “Winterization”

    1. Archive (tar)

    2. Compression

    3. Encryption  

  4. Transfer to endpoints.

    1. Endpoint 1: Transfer backup file(s) to Glacier

      1. Use s3cmd to transfer the backup to an Amazon S3 bucket

      2. After one month, Amazon automatically migrates the backup files to Amazon Glacier (via an S3 lifecycle rule)

    2. Endpoint 2: Transfer backup files to external hard drive
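The manifest and winterization steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the utility's actual scripts: it uses the standard library's BLAKE2b in place of xxhash (the third-party `xxhash` package exposes the same `update`/`hexdigest` interface through `xxhash.xxh64()`), the tab-separated `hash<TAB>path` manifest format and all function names are hypothetical, only the path and hash elements are recorded, and encryption is not shown.

```python
import hashlib
import os
import tarfile
from pathlib import Path

CHUNK_SIZE = 1 << 20  # read files in 1 MiB chunks so large assets don't fill memory


def file_hash(path: Path) -> str:
    """Hash one file. BLAKE2b is a stdlib stand-in for xxhash (assumption)."""
    h = hashlib.blake2b(digest_size=16)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, '/'-separated) to its hash."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            manifest[full.relative_to(root).as_posix()] = file_hash(full)
    return manifest


def write_manifest(manifest: dict[str, str], out: Path) -> None:
    """Write one hypothetical 'hash<TAB>path' line per file, sorted by path."""
    with out.open("w", encoding="utf-8") as f:
        for rel in sorted(manifest):
            f.write(f"{manifest[rel]}\t{rel}\n")


def winterize(root: Path, paths: list[str], archive: Path) -> None:
    """Archive (tar) and compress (gzip) the listed files; encryption not shown."""
    with tarfile.open(archive, "w:gz") as tar:
        for rel in paths:
            tar.add(root / rel, arcname=rel)
```

Passing the manifest's path list to `winterize` keeps each archive in lockstep with its manifest, so the same function could package a differential backup by passing only the new and modified paths.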

Differential Backups

The differential backup process is very similar; the main difference lies in how the manifest file is generated.

  1. Manifest File Generation

    1. Obtain the manifest of the most recent full backup

    2. Generate a new manifest of the full current file set

    3. Using the old and new manifests, derive the list of file actions:

      1. Deleted files

      2. New and modified files; this list also serves as the manifest for the contents of the differential backup.

  2. Backup to scratch location

  3. File winterization

  4. Transfer to endpoints
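The manifest comparison in step 1 can be sketched as follows, assuming the same hypothetical `hash<TAB>path` manifest format; the function names are illustrative. A path counts as modified when its hash differs between the two manifests. Because each differential is computed against the full backup's manifest rather than the previous differential, a restore needs only the full backup plus the latest differential.

```python
from pathlib import Path


def read_manifest(path: Path) -> dict[str, str]:
    """Parse hypothetical 'hash<TAB>path' lines into {path: hash}."""
    manifest = {}
    with path.open(encoding="utf-8") as f:
        for line in f:
            digest, rel = line.rstrip("\n").split("\t", 1)
            manifest[rel] = digest
    return manifest


def diff_manifests(old: dict[str, str],
                   new: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (deleted, added_or_modified), each sorted by path.

    deleted: paths in the full-backup manifest that no longer exist.
    added_or_modified: new paths plus paths whose hash changed; this list
    doubles as the manifest for the differential archive's contents.
    """
    deleted = sorted(set(old) - set(new))
    changed = sorted(p for p, h in new.items() if old.get(p) != h)
    return deleted, changed
```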


Restoration Procedures

Restoring a Full Backup

  1. Obtain full backup manifest

  2. Obtain the full backup file (i.e., transfer it to a scratch location)

  3. Unpack full backup

    1. Decrypt

    2. Decompress

    3. Untar

    4. Optional: consult the manifest and verify hashes.

  4. Move backup files from scratch to target
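Steps 3 and 4 above can be sketched with Python's `tarfile` and `shutil` modules. This is an illustration under assumptions: the archive is a gzip-compressed tar that has already been decrypted, the manifest is already loaded as a `{path: hash}` dictionary, the hash is the same BLAKE2b stand-in for xxhash, and the function names are hypothetical.

```python
import hashlib
import shutil
import tarfile
from pathlib import Path


def file_hash(path: Path) -> str:
    """Stand-in hash (BLAKE2b in place of xxhash); must match the backup side."""
    h = hashlib.blake2b(digest_size=16)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def unpack(archive: Path, scratch: Path) -> None:
    """Decompress and untar a .tar.gz into scratch (decryption not shown)."""
    scratch.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # The 'data' filter rejects members that would land outside scratch.
            tar.extractall(scratch, filter="data")
        else:
            tar.extractall(scratch)


def verify(scratch: Path, manifest: dict[str, str]) -> list[str]:
    """Return manifest paths that are missing or whose hash does not match."""
    bad = []
    for rel, expected in sorted(manifest.items()):
        path = scratch / rel
        if not path.is_file() or file_hash(path) != expected:
            bad.append(rel)
    return bad


def move_to_target(scratch: Path, target: Path, manifest: dict[str, str]) -> None:
    """Move each manifest file from scratch to the same relative path in target."""
    for rel in manifest:
        dest = target / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(scratch / rel), str(dest))
```

Running `verify` before `move_to_target` means a corrupted or incomplete archive is caught while the files are still in scratch, before anything in the target is touched.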

Restoring a Differential Backup

  1. Obtain differential backup manifest

  2. Unpack differential backup file

  3. Process deletions and additions

    1. Process deleted files: the differential manifest specifies which files to remove from the target filesystem.

    2. Move backup files from scratch to target
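Step 3 can be sketched as follows, assuming the full backup that the differential was derived from has already been restored to the target, the differential archive has been unpacked into scratch, and the deleted and new/modified path lists come from the differential manifest. The function name is hypothetical.

```python
import shutil
from pathlib import Path


def apply_differential(scratch: Path, target: Path,
                       deleted: list[str], changed: list[str]) -> None:
    """Apply one unpacked differential backup on top of a restored full backup."""
    # 1. Remove files that the differential manifest lists as deleted.
    for rel in deleted:
        (target / rel).unlink(missing_ok=True)
    # 2. Move new and modified files from scratch, replacing any older copies.
    for rel in changed:
        dest = target / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(scratch / rel), str(dest))
```

Deletions are applied before the moves; directories left empty by deletions are not pruned in this sketch.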


Glacier Backup Details


Procedure for Backing up to Glacier

We plan to use Amazon’s automated S3-to-Glacier transfer functionality. More information is available at https://aws.amazon.com/blogs/aws/archive-s3-to-glacier/.
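The automated transfer is configured as an S3 lifecycle rule on the bucket. The sketch below builds one such rule as a Python dictionary, expressing the one-month delay from the backup procedure as 30 days; the `backups/` prefix and the rule ID are hypothetical. The same structure can be passed as the `LifecycleConfiguration` argument of boto3's `put_bucket_lifecycle_configuration`, or saved as JSON for the AWS CLI's `s3api put-bucket-lifecycle-configuration` command.

```python
import json


def glacier_lifecycle(prefix: str, days: int = 30) -> dict:
    """Build an S3 lifecycle configuration that transitions objects under
    `prefix` to the GLACIER storage class `days` days after creation."""
    return {
        "Rules": [
            {
                "ID": "backups-to-glacier",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical prefix; print JSON suitable for saving to a file for the CLI.
    print(json.dumps(glacier_lifecycle("backups/"), indent=2))
```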


Amazon Glacier is Amazon’s data archival service. It provides low-cost, durable storage tailored for data archival and backup.


Amazon S3 is a near-real-time online storage service. While it can serve as a backup destination, it is tailored for low-latency, high-availability file access, and its storage pricing is correspondingly higher than Glacier's. S3 has been in service longer than Glacier and benefits from a good selection of mature third-party file transfer utilities.



Amazon Primers

Amazon S3 Filesystem Layout

Glacier Filesystem Layout



How Manifests Work

How S3-to-Glacier Works

How to copy to S3


Suggested S3 File Layout

Example Layout