Linking Data Sets and Resources in tDAR

Background:

The creation of explicit links between records in tDAR and a dataset (i.e. an image database) stems from a number of projects currently in the works at tDAR.  These include:

  1. The Mimbres Ceramic Database
  2. The Chesapeake Project
  3. The University of Arizona Tree-Ring Lab

The work on the Mimbres project was the earliest and provided access to the majority of the materials, so it is being used as the model.  There are a number of different ways in which this process could be implemented.  These include:

  1. Creating a custom field structure that hangs off of a project or collection, so that anything in that collection can use it. This would be simple and straightforward, with some nice benefits in terms of clarity and management, but it has a few issues:
    1. duplication of data (with the dataset) and thus the need for synchronization
    2. complexity of management (adding an additional place to manage fields)
  2. Not creating tDAR records for each of the images or media files, and instead producing some sort of mapping. This would be simpler, but the main problem is how to search, link, or cite an individual record.
    1. In this model, everything would live in the database, which would be fine, but the items would no longer be visible to the tDAR record search. Users could not browse or view them easily; they would have to open the tDAR record for the dataset and then find the item within it.
  3. Creating an explicit link between a dataset and a set of resources in tDAR (perhaps via a project or a collection).  In this case:
    1. The dataset maintains itself as an entity
    2. The data table columns can be used for field configuration and mapping if needed
    3. The resource is still a resource in tDAR
    4. When the dataset is updated, a simple reindex is all that's needed (and can be automated)
    5. References should be able to be made bidirectionally
    6. Matching can be done via a link to a DataTableColumn plus a value to look for, both stored on the resource
    7. Data can be indexed on the resource via a transient hashMap of DataTableColumns->Values on the InformationResource
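
Items 6 and 7 above can be illustrated with a minimal sketch in Java. It assumes the data table is already in memory as a list of column names plus rows of strings. The names MappedRowLookup and lookupRow, and the sample column names in the usage note, are hypothetical and are not tDAR's actual classes or data. The returned map stands in for the transient DataTableColumns->Values map that an indexer could read from the resource.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in; not tDAR's actual code.
final class MappedRowLookup {

    /**
     * Finds the first dataset row whose mapping column holds the value stored
     * on the resource, and returns that row as an ordered column -> value map.
     * Returns an empty map when the column is unknown or no row matches.
     */
    static Map<String, String> lookupRow(List<String> columnNames,
                                         List<List<String>> rows,
                                         String mappingColumn,
                                         String mappedValue) {
        Map<String, String> values = new LinkedHashMap<>();
        int keyIndex = columnNames.indexOf(mappingColumn);
        if (keyIndex < 0 || mappedValue == null) {
            return values;
        }
        for (List<String> row : rows) {
            if (mappedValue.equals(row.get(keyIndex))) {
                // Copy every column of the matching row, keeping column order.
                for (int i = 0; i < columnNames.size(); i++) {
                    values.put(columnNames.get(i), row.get(i));
                }
                return values;
            }
        }
        return values;
    }
}
```

For example, with made-up columns "filename", "site", and "style", calling lookupRow(columns, rows, "filename", "bowl-002.jpg") would return the full row for that file. Because the map is rebuilt from the dataset on each lookup, nothing is duplicated on the resource, which is what makes a reindex sufficient after a dataset update.
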

Option 3 is the most extensible and functional at this point. Implementing it would require code changes as follows:
  • enhancement of the DataTableColumnMapping section of tDAR to add new options:
    • Mapping Column
    • Allow alteration of Display Name
    • in the future adding "hidden" and "searchable" options on columns
  • enhancing the indexing service to do the data table lookup to preserve values
  • Mapping can be done via the filename being placed into a column of the database and then matched with a resource in tDAR with that file attached. Two additional configuration options should logically be available:
    • whether the file extension is included
    • whether there is a delimiter for multiple files, e.g. filea;fileb; etc.
  • With the Mimbres Database there is one additional configuration consideration: the mapping and combination of the black-and-white filenames with the color ones.
  • enhancing the record view to display the data
  • creating the lookup and matching logic in the DatasetService
  • Creating a View option for the dataset in tDAR and, ideally, wiring the images and links to display in that browse view
  • add tests and ensure that replacement works as intended
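
The filename-based mapping and its two configuration options (whether the extension is included, and an optional delimiter for multiple files) could look roughly like the following Java sketch. FilenameCell and its methods are hypothetical names used only for illustration. Case-insensitive comparison is an assumption here, not something the design above specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; not part of tDAR's DatasetService.
final class FilenameCell {

    /** Splits a cell into filenames; a null or empty delimiter means one filename per cell. */
    static List<String> parse(String cell, String delimiter) {
        List<String> names = new ArrayList<>();
        if (cell == null) {
            return names;
        }
        String[] parts = (delimiter == null || delimiter.isEmpty())
                ? new String[] { cell }
                : cell.split(Pattern.quote(delimiter)); // quote so ";" or "|" are literal
        for (String part : parts) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name); // skips the empty piece after a trailing delimiter
            }
        }
        return names;
    }

    /** Removes the final ".ext" suffix, if any. */
    static String stripExtension(String filename) {
        int dot = filename.lastIndexOf('.');
        return dot > 0 ? filename.substring(0, dot) : filename;
    }

    /**
     * True if the cell names the file attached to a resource.
     * extensionIncluded says whether the cell values carry the file extension.
     */
    static boolean matches(String cell, String delimiter,
                           boolean extensionIncluded, String attachedFilename) {
        String target = extensionIncluded ? attachedFilename : stripExtension(attachedFilename);
        for (String name : parse(cell, delimiter)) {
            if (name.equalsIgnoreCase(target)) { // assumption: case is not significant
                return true;
            }
        }
        return false;
    }
}
```

With a cell value of filea;fileb; and a ";" delimiter, parse yields the two names filea and fileb, and matches(cell, ";", false, "fileb.tif") is true because the extension is stripped from the attached file before comparison. Pairing the Mimbres black-and-white filenames with the color ones is not covered by this sketch.
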