Faunal working group meeting notes

Metadata

  • What metadata do we need for analysis?
  • What metadata do we need for preservation?
  • And what is the minimal set?

Confidentiality / access (RPA?)

Project view screen

  • add coding sheets, ontologies, information resource lists (minimum metadata)

Column metadata registration

  • extract column metadata to a first-class, viewable object. A description field, for instance, would be useful.

Ontology issues

Vagueness / interpretation problems: Small/medium/large mammal issue

Presents a standardization issue for integration

  • Keith: create size classes (rodent-sized, cat-sized, dog-sized, deer-sized, etc.)
  • Karen: bovid classes from Africa / weight of mammal to standardize
  • map to Indeterminate Mammal class

Fuzzy variables / nasty variables

At each level, have opt-out indeterminate class or bovid classes

Break categories down to general types of animal / weight.

Kate: where an ontology class doesn't fit you should be able to add to it and create your own ontology on the fly

Give size classes weight ranges (from all over the world).

Regional faunal ontologies (fauna for Turkey, Africa, etc.)

Sounds like integrating / standardizing by weight may be the best way to move forward.
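A minimal sketch of what weight-standardized size classes could look like, with an opt-out indeterminate class as discussed above. All class names and weight ranges here are invented placeholders, not agreed-upon values:

```python
# Sketch: standardize size classes by body-weight ranges.
# Names and bounds are illustrative placeholders only.
SIZE_CLASSES = {
    "rodent-sized": (0.0, 1.0),     # kg, hypothetical bounds
    "cat-sized":    (1.0, 10.0),
    "dog-sized":    (10.0, 45.0),
    "deer-sized":   (45.0, 300.0),
}

def classify_by_weight(weight_kg):
    """Map an estimated body weight to a size class, falling back to
    the opt-out 'indeterminate mammal' class when nothing matches."""
    for name, (lo, hi) in SIZE_CLASSES.items():
        if lo <= weight_kg < hi:
            return name
    return "indeterminate mammal"
```

The fallback return is the "opt-out indeterminate class" mentioned above: anything outside the defined ranges maps to Indeterminate Mammal rather than being forced into a wrong bin.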

Kate: faunmap

Karen: integrated taxonomic system

Kate: need to map multiple ontologies to the same dataset

Show more ontologies, search by name

Given North American fauna, need to be able to search other parts of the global / master ontology.

worldwide ontology that has regional subontologies - see if we can represent this in OWL and then filter out via RDF...
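Before committing to OWL/RDF, the worldwide-ontology-with-regional-subontologies idea can be sketched with a plain parent map plus region tags. Taxon names and region assignments below are hypothetical examples, not the real ontology:

```python
# Sketch: a master ontology spine with region-tagged leaves.
# (In practice this would live in OWL; names here are illustrative.)
PARENT = {
    "Bos taurus": "Bovidae",
    "Bovidae": "Mammalia",
    "Castor canadensis": "Rodentia",
    "Rodentia": "Mammalia",
    "Mammalia": None,
}
# Untagged nodes belong to the shared worldwide spine.
REGION = {
    "Bos taurus": "Turkey",
    "Castor canadensis": "North America",
}

def regional_view(region):
    """Filter the master ontology down to one regional subontology:
    keep the shared spine plus that region's taxa."""
    return {n for n in PARENT if REGION.get(n) in (None, region)}
```

This is the "filter out" step: every regional view shares the same upper-level classes, so data mapped in Turkey and data mapped in North America still meet at the spine.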

problem with whole buried rodent skeletons?

When aggregating data, there is a problem with inappropriate data (e.g., a cow bone that was found in the dig but should not be included within an aggregated set).

contaminated / disturbed context

modification variables:

  • multiple variables for one analyst are folded into a single variable for another analyst
    • Keith: what responsibility is on the system, and what is on the analyst?
    • Is an appropriate solution just to include the context and have the user recode the data themselves?
    • just don't provide the wrong answer

integrity (mixed/non-mixed), confidence, reliability measures

Karen: a notes field can contain information that affects the integrability of the data (e.g., these forty rows are all for one rodent).

Guidance / guides to good practices: how to structure your database so that tDAR can interpret it in the best way. Should tDAR treat datasets that don't have a "problematic" column differently?

reputation of information resources (datasets, coding sheets, etc.)

Old World issues: language issues, i18n / localization, but standardization is OK.

Multilingual OWL file - deal with it the same way we handle synonyms? Instead we'll probably need a <dc:language> annotation on an OWL node and then an equivalence relation between those OWL nodes.
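The language-annotation-plus-equivalence idea can be sketched without OWL machinery: each node carries language-tagged labels, and an equivalence table links nodes that denote the same concept. Node identifiers and the French node are invented for illustration:

```python
# Sketch: language-tagged labels (the <dc:language> annotation idea)
# plus an equivalence relation between nodes. Identifiers are made up.
LABELS = {
    "ont:Sheep": {"en": "sheep", "tr": "koyun"},
}
EQUIVALENT = {"ont:Mouton": "ont:Sheep"}  # hypothetical French-origin node

def resolve(node):
    """Follow equivalence links to a canonical node."""
    while node in EQUIVALENT:
        node = EQUIVALENT[node]
    return node

def label(node, lang="en"):
    """Return the label for a node in the requested language, if any."""
    return LABELS[resolve(node)].get(lang)
```

Treating cross-language nodes as equivalences (rather than one node with many labels) keeps room for cases where regional terms don't line up one-to-one.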

Improvements to the software

Comprehensibility index for your dataset, how usable is your dataset (is it coded, does it have a document / paper describing it, is it translated, has it been mapped to the ontology, do all coding sheet codes map to actual values, etc.)
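A crude sketch of what the comprehensibility index might compute: count how many usability checks a dataset passes. The check names mirror the list above; the equal weighting is a guess, not a design decision:

```python
# Sketch: comprehensibility index as the fraction of checks passed.
# Check names follow the list in the notes; weighting is hypothetical.
CHECKS = [
    "has_coding_sheet",
    "has_describing_document",
    "is_translated",
    "mapped_to_ontology",
    "all_codes_map_to_values",
]

def comprehensibility(dataset):
    """Return a 0.0-1.0 score: fraction of usability checks passed."""
    passed = sum(1 for check in CHECKS if dataset.get(check))
    return passed / len(CHECKS)
```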

Audit / review / turbotax like wizard driven interface where the steps to data entry are clear / explicit.

Look at the value distribution to detect whether a column is arbitrary or coded.
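One simple heuristic for that detection: a coded column reuses a small set of values, while an arbitrary measurement column is mostly distinct. The 0.1 threshold below is a placeholder, not a tuned value:

```python
# Sketch: guess "coded" vs. "arbitrary" from the distinct-value ratio.
# The threshold is an illustrative placeholder.
def looks_coded(values, distinct_ratio=0.1):
    """True if the column's distinct values are a small fraction of
    its rows, suggesting a coded column rather than raw measurements."""
    distinct = len(set(values))
    return distinct / max(len(values), 1) <= distinct_ratio
```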

Suggestion: inverting the column metadata registration might be easier, where you have the measurement types (arbitrary integer/real/string, coded integer/real/string) and you map each value to a list of column names. E.g., coded integer -> Species, Burning, Foo.
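The inverted registration amounts to a map from measurement type to column names, queried in reverse when a column's type is needed. The "coded integer" entry uses the example columns from the suggestion above; the "arbitrary real" entry is an invented extra:

```python
# Sketch: inverted column metadata registration.
# "coded integer" columns come from the example above; "arbitrary real"
# is a hypothetical addition.
REGISTRATION = {
    "coded integer":  ["Species", "Burning", "Foo"],
    "arbitrary real": ["Weight"],
}

def measurement_type(column):
    """Reverse lookup: which measurement type was this column filed under?"""
    for mtype, columns in REGISTRATION.items():
        if column in columns:
            return mtype
    return None
```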

Software-provided suggestions, make inferences given certain column names.

Taphonomic variables

  • Weathering (slight <-> heavy), modifications (burning, gnawing, etc.), butchering, worked, tools, completeness (increments by quarters?), condition
  • Presence / absence
  • if present, it is possible to map to an ontology of burning if desired, but one could also just map to a higher-level concept
  • will we have an ontology for every faunal variable then?
  • searches based on more specific concepts don't display data mapped to higher-level concepts...
  • bone concepts: proximal / distal / side / vertebrae / etc
  • proximal fused/unfused/fusing or distal fused/unfused/fusing issue

THE GOAL

Capture context by mapping the distinct data values within a column to a node in an ontology and then using that ontological context to integrate the databases.
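The goal above can be sketched end to end: each dataset declares a mapping from its raw column values to ontology nodes, and rows from different databases are then pooled on the shared node. The value spellings and node names are invented examples:

```python
# Sketch of the integration goal: per-dataset value -> ontology-node
# maps, then pooling on the shared node. All names are illustrative.
DATASET_A_MAP = {"OVIS": "ont:Sheep", "BOS": "ont:Cattle"}
DATASET_B_MAP = {"sheep": "ont:Sheep", "cow": "ont:Cattle"}

def integrate(values_a, values_b):
    """Pool raw values from two datasets under their ontology nodes."""
    pooled = {}
    for value in values_a:
        pooled.setdefault(DATASET_A_MAP[value], []).append(value)
    for value in values_b:
        pooled.setdefault(DATASET_B_MAP[value], []).append(value)
    return pooled
```

The point of the sketch: neither dataset has to rename its values; the ontology node is the only shared vocabulary.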

Instead of having an ontology that captures everything, capture broad categories and give users the ability to extend that ontology (or replace it), etc.

Taphonomic issues

Example search: if dataset has > 50% weathered artifacts, not interested
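That example search reduces to a threshold on the weathered fraction of a dataset. The field name `weathered` and the record shape are assumptions for illustration:

```python
# Sketch: skip datasets where more than half the specimens are
# weathered. Field name 'weathered' is hypothetical.
def of_interest(specimens, max_weathered=0.5):
    """True if the dataset's weathered fraction is within the limit."""
    weathered = sum(1 for s in specimens if s.get("weathered"))
    return weathered / max(len(specimens), 1) <= max_weathered
```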

Data issues

Who has the permissions to improve / modify / annotate datasets?

Future directions / todo

  • index coding sheets / datasets with similarity index
  • store search queries that people issue
  • persistent queries (saved filters)
  • IMPORTANT: instead of having the system resolve split/aggregated variables, users can download dataset, transform, re-upload (how to preserve provenience in a formal way? could do so in the ResourceRelationship table)
  • peer review datasets
  • dataset reputation
  • measurements from different sites (id'ed by specimen) e.g., http://toothwear.asu.edu
    • Sarah: 3-8 measurement types / dimensions, key/value pairs with measurement type as key
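Sarah's 3-8 measurement types per specimen suggest a simple key/value store keyed by specimen, with the measurement type as the inner key. The measurement names and units below are illustrative, not the toothwear.asu.edu schema:

```python
# Sketch: per-specimen measurements as key/value pairs, measurement
# type as the key. Names and units are illustrative placeholders.
measurements = {
    "specimen-001": {
        "crown height": 4.2,  # hypothetical value, mm
        "wear stage": 3,
    },
}

def record(specimen_id, measurement_type, value):
    """Attach one typed measurement to a specimen."""
    measurements.setdefault(specimen_id, {})[measurement_type] = value
```

Because the set of measurement types varies per specimen, key/value pairs avoid a wide sparse table with one column per type.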