Comment: added section re: fixity checks

...

  1. XML representations of each metadata record are stored at the base of each record directory. They are dated and time-stamped to allow for multiple versions. These are automatically generated when a user saves a file; they provide a backup in case of a database error and a version history that shows changes over time.
  2. Sub-folders are created for each file a user has associated with that record, using the file id as the folder name.
  3. Within each folder associated with a “file” is another folder for each successive version of that file; thus, if a file is replaced, the replacement is assigned a new version number, e.g. v1.
  4. Within each “version” folder is the original uploaded version of the file along with the MD5 checksum for that file. This MD5 is also stored in tDAR's metadata database in order to perform routine integrity checks on the file.
  5. Finally, a derivatives “deriv” folder maintains additional supporting files. Each derivative could theoretically be generated or re-generated as needed, but we decided that improved performance is worth the cost of storing derivatives. The derivatives include:
    1. Three separate thumbnails (small, medium, large) for each document, image, or other resource for which a thumbnail would be useful, for use in various displays in tDAR.
    2. Extracted metadata from the document header, for indexing and other purposes.
    3. Translated versions of data sets using coding sheets and ontologies.
    4. Extracted text from documents, data sets, or other files for full-text indexing. Storing this text speeds up reindexing, which occurs when a record is saved and at other points.
    5. Other files as needed
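The folder layout described above can be sketched as a small path-building helper. This is only an illustration, not tDAR's actual code: the function name `filestore_layout`, the `.MD5` suffix for the checksum file, and the placement of the “deriv” folder inside the version folder are all assumptions made for the example.

```python
from pathlib import PurePosixPath


def filestore_layout(store_root, record_id, file_id, version, filename):
    """Build the paths for one uploaded file (hypothetical helper).

    Assumed layout: <root>/<record id>/<file id>/v<version>/...
    """
    record_dir = PurePosixPath(store_root) / str(record_id)
    file_dir = record_dir / str(file_id)          # folder named after the file id
    version_dir = file_dir / f"v{version}"        # one folder per version, e.g. v1
    return {
        "record_dir": record_dir,                 # XML record snapshots live here
        "version_dir": version_dir,
        "original": version_dir / filename,       # the original uploaded file
        # Assumption: the checksum sits next to the file with an .MD5 suffix.
        "checksum": version_dir / (filename + ".MD5"),
        # Assumption: derivatives are kept in a "deriv" folder per version.
        "derivatives": version_dir / "deriv",
    }


paths = filestore_layout("/filestore", 4231, 17, 1, "report.pdf")
print(paths["original"])
```

Replacing the file would leave the `v1` folder untouched and place the new upload under the next version folder.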

...

As tDAR's functionality expands and the number of files and formats increases, these workflows will need to become more flexible and more capable.

Fixity Checks and File Integrity

In addition to the workflows associated with the ingest of files, tDAR performs recurring integrity checks on these files to ensure that they have not been corrupted or otherwise altered from their original form at the time of ingest. At the time of ingest, the system records the MD5 checksum of the file in tDAR's metadata database. tDAR then routinely confirms the fixity of these files by comparing each file's current MD5 checksum against the MD5 value recorded in the tDAR metadata database. Any discrepancies are recorded and reported to Digital Antiquity staff.
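The comparison at the heart of a fixity check can be sketched in a few lines. This is a minimal illustration rather than tDAR's implementation: the function names `md5_of` and `check_fixity` are hypothetical, and a plain dictionary stands in for the checksums recorded in the metadata database.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(recorded: Dict[Path, str]) -> List[Tuple[Path, str, str]]:
    """Compare each file's current MD5 with its recorded value.

    `recorded` stands in for the metadata database (an assumption).
    Returns (path, recorded_md5, current_md5) for every discrepancy;
    a file that no longer exists is reported with "MISSING".
    """
    discrepancies = []
    for path, recorded_md5 in recorded.items():
        current = md5_of(path) if path.is_file() else "MISSING"
        if current != recorded_md5.lower():
            discrepancies.append((path, recorded_md5, current))
    return discrepancies
```

A scheduled job could call `check_fixity` over all recorded files and forward any returned discrepancies to staff. Reading in chunks keeps memory use flat for large files.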

Web Layer – Struts2, Freemarker, JavaScript, CSS, and HTML

...

  • TDAR.common: Common functions and utilities that are utilized on most pages in tDAR, and low-level functionality utilized by the other TDAR components
  • TDAR.advancedSearch: functionality related to tDAR's "Advanced Search" page.
  • TDAR.autocomplete: provides the functionality for "autocomplete" form fields.
  • TDAR.contexthelp: enables context-sensitive help pop-ups on various tDAR forms.
  • TDAR.datatable: extends the jQuery DataTable plugin and allows it to be more easily used in conjunction with tDAR-specific data.
  • TDAR.fileupload: extends the jQuery File Upload plugin and enables validation rules on the types of files and file names that users may upload to tDAR.
  • TDAR.integration: support for tDAR's dataset integration UI.
  • TDAR.maps: enables Google Maps support and provides a UI that allows users to designate map boundaries for tDAR resources.
  • TDAR.pricing: support for tDAR's pricing page UI.
  • TDAR.repeatrow: enables support for multi-valued data-entry in tDAR forms.

...


Concerns & Potential Pitfalls

...