Dataset Requirements
This page provides an overview of the minimum metadata required for publication on ESS-DIVE. These requirements are used by the ESS-DIVE team to review datasets before approval for publication.
ESS-DIVE’s dataset metadata requirements allow you to fully describe your dataset so that others can more easily find and use relevant data from dataset searches. Metadata for each dataset submitted should meet the guidelines in the description of each field listed below, and metadata completeness will be assessed during the dataset publication process using both automated and manual review workflows. Ensuring that your dataset has complete metadata before requesting publication will expedite the publication process.
Dataset metadata fields marked with a red asterisk (*) are required to submit your dataset. All dataset metadata fields are reviewed using the requirements outlined in the Description of each field.
JSON-LD Fields
Datasets that are created or edited using the Dataset API must format metadata using the JSON-LD schema. The JSON-LD Field row for each metadata field shows how that field is represented in the JSON-LD schema. These rows are only relevant if you are using or plan to use the Dataset API to submit datasets to ESS-DIVE.
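As a rough illustration of what JSON-LD metadata looks like, the sketch below builds a small metadata record as a Python dictionary and serializes it. The property names (`name`, `description`, `keywords`, `datePublished`, `license`) are standard schema.org `Dataset` properties; that the Dataset API expects exactly these names for the Title, Abstract, Keywords, Publication Date, and Usage Rights fields is an assumption here, so confirm the mapping against the Dataset API documentation before submitting.

```python
import json

# Hypothetical record: property names are schema.org Dataset terms, and their
# mapping to ESS-DIVE fields is an assumption, not confirmed by this page.
metadata = {
    "@context": "http://schema.org/",
    "@type": "Dataset",
    # Title (example taken from the Title field on this page)
    "name": "Raw sapflow and soil moisture data from January 2016-April 2016 in Manaus, Brazil",
    # Abstract (shortened here; a real abstract needs at least 100 words)
    "description": "This dataset contains raw output from a data logger ...",
    # Keywords (example taken from the Keywords field on this page)
    "keywords": ["Earth Science", "Land Surface", "Soils"],
    # Publication date in YYYY or YYYY-MM-DD format
    "datePublished": "2019-04-19",
    # Usage rights, given as the URL of the selected license
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

# Serialize to a JSON-LD string that could be sent in an API request body.
json_ld = json.dumps(metadata, indent=2)
print(json_ld)
```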
Automated Checks
Datasets are reviewed based on a set of automated checks that are performed whenever a dataset is submitted. The results of these checks are compiled into Assessment Reports, which are also used by ESS-DIVE reviewers to assess the quality of the dataset before publication. Assessment reports can be viewed by the dataset submitter prior to requesting publication and will become visible to users of ESS-DIVE once the dataset is published. Failed automated checks or warnings that appear on the Assessment Report should be addressed by the dataset submitter before requesting publication. For additional details on assessment reports, please review the Assessment Report documentation.
Please note that assessment reports can take anywhere from a few minutes to 24 hours to generate.
Working Offline? ESS-DIVE's Offline Metadata Template can be used to prepare your dataset metadata prior to submission. We recommend using the template to collaborate with your team members in Google Docs, then copying and pasting the completed fields into the ESS-DIVE Dataset Submission form when you are ready to create your dataset.
Overview
Title*
Format | Free Text |
Description | Include a title of 7 to 20 words that contains information such as the topic, geographic location, dates, and scale of data. If the data are associated with a journal publication, the title may include the journal name. Avoid unexplained acronyms or project-specific vocabulary. If there is an existing DOI for the data, use the same title. |
Example | Raw sapflow and soil moisture data from January 2016-April 2016 in Manaus, Brazil |
JSON-LD Field |
|
Automated Check | Required: Dataset title is between 7 and 40 words in length |
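The word-count check above can be approximated locally before submission. This is a minimal sketch, assuming that words are whitespace-separated tokens; the function name `title_word_count_ok` is hypothetical and the actual ESS-DIVE check may count words differently.

```python
def title_word_count_ok(title: str, min_words: int = 7, max_words: int = 40) -> bool:
    """Return True if the title has between min_words and max_words words, inclusive."""
    # Assumption: a "word" is any run of characters separated by whitespace.
    return min_words <= len(title.split()) <= max_words


example_title = (
    "Raw sapflow and soil moisture data from January 2016-April 2016 in Manaus, Brazil"
)
print(title_word_count_ok(example_title))
```

The same approach, with a minimum of 100 words and no maximum, could be used to pre-check an abstract.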
Existing DOI & Alternate Identifier
Format | Free Text |
Description | If this dataset has been previously published elsewhere, enter the DOI or alternate identifier. Identifiers are used to locate the dataset within your project's data management system and can provide pertinent contextual information for users. Ensure the identifier correctly leads to the dataset that you are submitting. |
Example | http://dx.doi.org/XXXX |
JSON-LD Field |
|
Abstract*
Format | Free Text |
Description | The abstract should be at least 100 words in length, written in full sentences, and understandable to anyone who has not seen related manuscripts. Describe the content of the dataset and provide all necessary scientific context. Avoid unexplained acronyms or project-specific terms, and include specific details that promote the reproducibility of your data. This may include source data for synthesis work, software necessary to view the related files, ecosystem type involved, or measurement types. Include a statement about why these data were generated and the research question they are intended to answer. |
Example | This dataset contains raw output from a data logger connected to 9 sapflow and 5 soil moisture sensors in Manaus, Brazil. The file xxx.dat contains raw data and the metadata file (BR-Ma2_E-fieldlog_20160501.xls) has information on locations where the sensors were installed and other sensor maintenance details. No data processing or QA/QC was done on the raw datasets. Processed data will be uploaded as separate datasets on ESS-DIVE. This research was performed as a part of the NGEE Tropics project, which aims to advance model predictions of tropical forest carbon cycle responses to a changing climate over the 21st Century. |
JSON-LD Field |
|
Automated Check | Required: Abstract is at least 100 words in length |
Keywords*
Format | GCMD Keywords OR Free Text |
Description | Add a minimum of three total keywords or data variables. As you begin typing in the web form field, GCMD controlled vocabulary terms will appear in a dropdown list. Selecting from the GCMD controlled keywords where possible is encouraged but not required. You can also enter your own keywords. Ensure that keyword terms differ from words in the title to increase the findability of your dataset in searches. |
Example | Earth Science, Land Surface, Soils |
JSON-LD Field |
|
Automated Check | Required: At least three keywords; Optional: Keywords differ from terms in dataset title |
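Both keyword checks can be sketched in a few lines. The example below is an approximation under stated assumptions: the helper names are hypothetical, and a keyword is treated as repeating the title only when every one of its words appears in the title (case-insensitive). The real automated check may use a different comparison.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def check_keywords(title: str, keywords: list[str], minimum: int = 3) -> tuple[bool, list[str]]:
    """Return (enough_keywords, keywords_repeating_title_words)."""
    title_tokens = _tokens(title)
    repeated = []
    for keyword in keywords:
        keyword_tokens = _tokens(keyword)
        # Flag a keyword only if all of its words already occur in the title.
        if keyword_tokens and keyword_tokens <= title_tokens:
            repeated.append(keyword)
    return len(keywords) >= minimum, repeated


title = "Raw sapflow and soil moisture data from January 2016-April 2016 in Manaus, Brazil"
print(check_keywords(title, ["Earth Science", "Land Surface", "Soils"]))
```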
Data Variables
Format | GCMD Keywords or Free Text |
Description | Add variables to increase the findability of your dataset in searches. As with the Keywords field, selecting variable terms from the GCMD controlled vocabulary where possible is encouraged but not required. |
Example | Soil Moisture |
JSON-LD Field |
|
Publication Date
Format | YYYY or YYYY-MM-DD |
Description | Specify a custom date or year when this dataset can be made publicly available. If this is not specified, it will default to the current date. |
Example | 2019 or 2019-04-19 |
JSON-LD Field |
|
Automated Check | Required: Publication date is present |
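The two accepted date formats can be validated before submission. This is a minimal sketch that accepts only a four-digit year or a zero-padded `YYYY-MM-DD` calendar date; the function name is hypothetical and the strictness (for example, rejecting `2019-4-19`) is an assumption rather than documented ESS-DIVE behavior.

```python
import re
from datetime import date


def is_valid_publication_date(value: str) -> bool:
    """Return True for 'YYYY' or for a real calendar date written as 'YYYY-MM-DD'."""
    if re.fullmatch(r"\d{4}", value):
        return True
    match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", value)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        # date() raises ValueError for impossible dates such as February 30.
        date(year, month, day)
    except ValueError:
        return False
    return True


for candidate in ("2019", "2019-04-19", "2019-02-30", "04/19/2019"):
    print(candidate, is_valid_publication_date(candidate))
```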
Usage Rights*
Format | Select choice |
Description | Choose how you wish your data to be shared and reused. Creative Commons Attribution (CC BY 4.0) requires that the dataset be cited by anyone using the data. Creative Commons Public Domain (CC0 1.0) dedicates the data to the public domain without restriction. When using the API, enter the URL for the selected license. |
Example | Select Creative Commons Attribution (CC BY 4.0) or Creative Commons Public Domain (CC0 1.0) |
JSON-LD Field |
|
Automated Check | Optional: Usage rights is set to Creative Commons CC-BY license |
Project*
Online Form only
Format | Controlled List |
Description | Select the DOE project name from the dropdown list, which appears when you start typing the project name or Principal Investigator (PI) name. If multiple projects were involved, enter the project that made the largest contribution to this dataset. |
Example | Next-Generation Ecosystem Experiments (NGEE) Tropics [PI: Jeffrey Chambers] |
JSON-LD Field |
|
Automated Check | Required: Project name from controlled list |
API Only
Format | Value |
Description | Enter the project ID into the JSON-LD field. Written project names will not be accepted. Look up your project ID using ESS-DIVE's Project List. If multiple projects were involved, enter the project that had the largest contribution to this dataset. |
Example | |
JSON-LD Field |
Funding Organization*
Format | Controlled List or Free Text |
Description | List the organizations that funded the work. When using the web form, you can choose from the dropdown list that appears as you begin to enter the funding organization. |
Example | [Example from dropdown list]: U.S. DOE > Office of Science > Biological and Environmental Research (BER) |
JSON-LD Field |
|
Automated Check | Optional: Funding organization "U.S. DOE > Office of Science > Biological and Environmental Research (BER)" is present |
DOE Contracts
Format | Controlled List or Free Text |
Description | List the number of each DOE contract that funded the data in the package. Enter "NONE" if no DOE funding applies. If the dataset is the result of a joint effort between two or more DOE Site/Facility Management Contractors, additional DOE contract numbers may be entered. |
Example | AC0205CH11231 |
JSON-LD Field |
|
Related References
Format | Free Text |
Description | Include the full citations and DOIs of datasets or publications associated with your dataset. These related materials allow users to learn more about the dataset, processing methods, or how the data were used. |
Example | Somebody J. (2018), Sapflow and soil moisture coupling in the Amazon, Journal. doi: xx.xxxx |
JSON-LD Field |
|
People
Contact*
Format | Free Text |
Description | List the person whom users should contact for further information about the data. Only one contact is allowed. Including the ORCID of this individual is strongly encouraged. |
Example | First name, Last name, Organization, Email, ORCID (strongly encouraged) |
JSON-LD Field |
|
Automated Check | Required: Contact is present; Required: Contact ORCID is provided |
Creators
Format | Free Text |
Description | Include the main researchers involved in producing the data, such as authors, owners, originators, and principal investigators. List creators in the order they should appear in the dataset citation. At least one creator is required, and including email addresses is highly encouraged. |
Example | First name, Last name, Organization, Email, ORCID (not required for creators) |
JSON-LD Field |
|
Automated Check | Required: At least one creator is present |
Contributors
Format | Free Text |
Description | List any additional contributors involved in producing the data. These may include people who assisted in creating the dataset but are not considered authors. Contributors will not appear in the data citation. Including email addresses is highly encouraged. |
Example | First name, Last name, Organization, Email, ORCID (not required for contributors) |
JSON-LD Field |
|
Dates
Start Date
Format | YYYY-MM-DD |
Description | Earliest date of data collection included in the dataset. |
Example | 2017-04-16 |
JSON-LD Field |
|
Automated Check | Required: Start date is present |
End Date
Format | YYYY-MM-DD |
Description | Last date of data collection included in the dataset. This field can be left blank if your dataset is open ended. |
Example | 2019-07-13 |
JSON-LD Field |
|
Automated Check | Required: End date is present |
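A simple consistency check for the two date fields is sketched below. It assumes both values are ISO `YYYY-MM-DD` strings and, following the End Date description, treats a missing end date as acceptable for open-ended datasets; note that the automated check listed above marks the end date as required. The function name is hypothetical.

```python
from datetime import date
from typing import Optional


def date_range_ok(start: str, end: Optional[str] = None) -> bool:
    """Return True if start is a valid date and does not fall after end (when given)."""
    # date.fromisoformat raises ValueError if a value is not a valid ISO date.
    start_date = date.fromisoformat(start)
    if end is None:
        # Assumption: open-ended datasets may omit the end date.
        return True
    return start_date <= date.fromisoformat(end)


print(date_range_ok("2017-04-16", "2019-07-13"))
```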
Locations
Geographic Description
Format | Free Text |
Description | A short description of the location(s) where data were collected. This may include the location name, known identifiers if associated with a specific project (e.g. AmeriFlux site name), and ecosystem type involved. Multiple geographic descriptions can be added if necessary. A complete geographic description will increase the findability of your dataset, as all terms entered are searchable through the data portal. |
Example | Br-Ma2, Manaus, Brazil: ZF2 K34 Tower. Eddy covariance site established in 1999 on kilometer 34 of the ZF2 highway. It was later expanded into an atmospheric and soil sampling hub. It is a 1.5 m x 2.5 m section aluminum tower, 50 m tall, on a medium-sized plateau (Araujo et al., 2002). |
JSON-LD Field |
|
Bounding Box Coordinates
Format | Latitude and Longitude in WGS 84 decimal degrees |
Description | Latitude and longitude of the location(s) these data represent, in WGS 84 decimal degrees. Enter a single coordinate pair for a point location and bounding box coordinates for non-point locations. Ensure coordinate accuracy before submitting your dataset. If the data location is better represented by a shape, you may also include a KML file in the file uploads. |
Example | Northwest Coordinates [Lat Long]/Southeast Coordinates [Lat Long] |
JSON-LD Field |
|
Automated Check | Optional: Coordinates describing the point location or geographic area of the dataset are present |
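Coordinate accuracy can be partly checked in code before submission. The sketch below validates a northwest/southeast pair in decimal degrees; the function name and sample coordinates are hypothetical, and it does not handle boxes that cross the antimeridian, where the western longitude is numerically larger than the eastern one.

```python
def bounding_box_ok(northwest: tuple[float, float], southeast: tuple[float, float]) -> bool:
    """Validate (lat, lon) corner pairs given in WGS 84 decimal degrees."""
    nw_lat, nw_lon = northwest
    se_lat, se_lon = southeast
    # Latitudes must lie in [-90, 90] and longitudes in [-180, 180].
    if not all(-90 <= lat <= 90 for lat in (nw_lat, se_lat)):
        return False
    if not all(-180 <= lon <= 180 for lon in (nw_lon, se_lon)):
        return False
    # The northwest corner must not be south or east of the southeast corner.
    # A single point is expressed by giving the same pair for both corners.
    return nw_lat >= se_lat and nw_lon <= se_lon


# Illustrative values only, not a real site location.
print(bounding_box_ok((10.0, -20.0), (5.0, -15.0)))
```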
Methods
Methods
Format | Free Text |
Description | Methods for a dataset should cover all aspects of dataset production and should be thorough enough for your work to be reproduced. Include descriptions of the experimental design, laboratory and/or field collection methods (e.g. observations and/or devices used), source data for synthesis studies, data processing and QA/QC procedures, and known issues or limitations of the data where applicable. A complete methods section will improve the findability of your data, as all text entered into methods is also searchable through the data portal filters. You may provide a citation for any previously published methods, but methods related to data production must still be included. |
Example | An example of a complete methods section can be viewed at: Conlisk E ; Castanha C ; Germino M J ; Veblen T T ; Smith J M ; Kueppers L M (2017): Data from: "Declines in low-elevation subalpine tree populations outpace growth in high-elevation populations with warming". Subalpine and Alpine Species Range Shifts with Climate Change: Temperature and Soil Moisture Manipulations to Test Species and Population Responses. doi:10.15485/1730950 |
JSON-LD Field |
|
Automated Check | Required: Methods description is more than 7 words in length |
Additional Automated Checks
The checks below are run on each dataset upon submission as part of the ESS-DIVE automated check suite. Informational checks appear on the assessment reports and are not pass/fail.
Criteria | Required/Optional | FAIR Category |
---|---|---|
URLs in metadata resolve correctly | Required | Findable |
Data file formats are non-proprietary | Optional | Reusable |
Informational: Number of creators with email addresses provided | Informational | Findable |
Informational: Number of contacts with email addresses provided | Informational | Findable |
Informational: Count of data entities present | Informational | Interoperable |