Over the past few years we've seen a lot of data sets go into tDAR. In that time, we've learned a great deal about what makes a data set ready for archiving, and conversely, what some of the common problem-spots are. Within tDAR, we try to catch many of these errors and provide users with warnings. Below are a few tips to avoid the most common issues we see when archiving a CSV, TAB or Excel data set.
Add Column Names
Verify that each data table in your data set has column names. Column names are critical to proper archiving. Without column names, it becomes difficult to understand what data or type of data is in each column. Make sure the column name is descriptive and clear.
Example: Data set without any column names
Example: Data set with column names
Add Column Descriptions
Add column descriptions to make the data easier to understand. Even better, once you've uploaded a data set to tDAR, add descriptions to your column identifying counts, how the data was collected, or any notes or issues.
Verify Column Formatting
If you format columns as numbers or currency, double check that every row matches that format. To check the formatting, highlight the cell, column, or cells and choose "Format > Cells" and check:
Example: Numeric column with textual data
Check for Errors in Calculations or Functions
Review your data workbooks prior to uploading to check for errors in calculations or functions. tDAR will also check your data for errors where possible and try and warn you.
Example: Workbook with Excel errors
Upload and Apply Coding Sheets to Unlock Coded Data
Upload and apply coding sheets in tDAR to unlock coded data. When collecting data, we often use coding sheets to simplify and speed up data entry. Once collected, however, it's important to capture the coded-values and coding sheets within tDAR as well, otherwise, it can become difficult to understand what the data represents. For example, in the table below, what is Material "B"? It could be "Bead" or "Bone" or something entirely different.
Example: Workbook with coded data
Check for Stray Data
Verify that you don't have stray data (check columns or comments) in your columns.Often when working with a data set, users will add a check columnto total up a row to check a value or add a comment. Make sure that these columns have been removed, or clearly labeled with a column name.
Example: Workbook unlabeled comments
Example: Workbook check-column
Verify Data is Properly Loaded
Check that your data has been loaded into Excel properly. Often, Excel is not the "starting point" for data. Instead data has been imported from another source; this may be a "tab" or character-delimited file, or a plain text file. Sometimes, Excel may not load the data properly and split data into multiple columns, or combine it into a single column. It's important to check every row to make sure that the data has been imported properly.
Example: Workbook with data-conversion errors (note level data)
Common Issues / Problems
Merged Column Headers
Avoid merged column headers. Make sure that each column has its own unique header column. For example: the material column should be two columns.
Example: Data set with column names and merged column headers (incorrect)
Example: Data set with column names and separate column headers (corrected)
|Locus||Quantity||Material (detailed)||Material (General)|
Multiple Data Tables
Don't include multiple data tables within the same excel worksheet. tDAR treats each workbook as its own data table. Adding multiple within the same workbook prevents the system from properly archiving or documenting the data within the columns.
Example 5: Workbook with two tables (incorrect)
Data as Images or Graphics
Don't include data as images or graphics in Excel Workbooks. tDAR cannot read the image contents. The row data will not be a "valid" data set. In most cases, data like this should be made into a PDF and uploaded as a document.
Example: Workbook with image data
Example: Workbook with graphics and textual comments descriptions