Dataset Requirements

This page provides an overview of the minimum metadata required for publication on ESS-DIVE. These requirements are used by the ESS-DIVE team to review datasets before approval for publication.

ESS-DIVE’s dataset metadata requirements allow you to fully describe your dataset so that others can more easily find and use relevant data from dataset searches. Metadata for each dataset submitted should meet the guidelines in the description of each field listed below, and metadata completeness will be assessed during the dataset publication process using both automated and manual review workflows. Ensuring that your dataset has complete metadata before requesting publication will expedite the publication process.

Dataset metadata fields marked with a red asterisk (*) are required to submit your dataset. All dataset metadata fields are reviewed using the requirements outlined in the Description of each field.

JSON-LD Fields

Datasets that are created or edited using the Dataset API must format metadata using the JSON-LD schema. The JSON-LD Field rows show how each metadata field is named in the JSON-LD schema. These rows are only relevant if you are using or plan to use the Dataset API to submit datasets to ESS-DIVE.
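As a rough illustration of how the JSON-LD field names listed on this page fit together, the Python sketch below assembles a few of the simpler fields into one document. The `@context` and `@type` values follow general schema.org conventions and are assumptions here, as is the hypothetical helper name `build_metadata`. Confirm the exact payload against the Dataset API documentation before relying on it.

```python
import json


def build_metadata(title, abstract, keywords, license_url):
    """Assemble a minimal JSON-LD style dictionary (illustrative only)."""
    return {
        # Assumed schema.org framing; confirm against the Dataset API docs.
        "@context": "http://schema.org/",
        "@type": "Dataset",
        "name": title,               # Title
        "description": abstract,     # Abstract
        "keywords": list(keywords),  # Keywords
        "license": license_url,      # Usage Rights (URL of the chosen license)
    }


metadata = build_metadata(
    title="Raw sapflow and soil moisture data from January 2016-April 2016 in Manaus, Brazil",
    abstract="This dataset contains raw output from a data logger ...",  # placeholder text
    keywords=["Earth Science", "Land Surface", "Soils"],
    license_url="http://creativecommons.org/licenses/by/4.0/",
)
print(json.dumps(metadata, indent=2))
```

Fields with nested structure, such as the project, people, dates, and locations, are left out of this sketch because their exact shape is not described in this overview.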

Automated Checks

Datasets are reviewed based on a set of automated checks that are performed whenever a dataset is submitted. The results of these checks are compiled into Assessment Reports, which are also used by ESS-DIVE reviewers to assess the quality of the dataset before publication. Assessment reports can be viewed by the dataset submitter prior to requesting publication and will become visible to users of ESS-DIVE once the dataset is published. Failed automated checks or warnings that appear on the Assessment Report should be addressed by the dataset submitter before requesting publication. For additional details on assessment reports, please review the Assessment Report documentation.

Please note that assessment reports can take anywhere from a few minutes to 24 hours to generate.

Working Offline? ESS-DIVE's Offline Metadata Template can be used to prepare your dataset metadata prior to submission. We recommend using the template to collaborate with your team members in Google Docs, then copying and pasting the completed fields into the ESS-DIVE Dataset Submission form when you are ready to create your dataset.

Overview

Title*

Format

Free Text

Description

Include a title between 7 and 20 words long that contains information such as the topic, geographic location, dates, and scale of the data. If the data are associated with a journal publication, the title may include the journal name.

Avoid unexplained acronyms or project-specific vocabulary. If there is an existing DOI for the data, use the same title.

Example

Raw sapflow and soil moisture data from January 2016-April 2016 in Manaus, Brazil

JSON-LD Field

name

Automated Check

Required: Dataset title is between 7 and 40 words in length
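The title length check can be approximated locally before submission. The sketch below is a minimal illustration that counts whitespace-separated words. How the ESS-DIVE check tokenizes a title is not documented here, so the counting rule and the hypothetical function name `check_title` are assumptions.

```python
def check_title(title, min_words=7, max_words=40, recommended_max=20):
    """Return (passes, note) for a title, counting whitespace-separated words."""
    count = len(title.split())
    if count < min_words or count > max_words:
        return False, f"{count} words: outside the {min_words}-{max_words} word range"
    if count > recommended_max:
        return True, f"{count} words: passes, but longer than the recommended {recommended_max}"
    return True, f"{count} words: ok"


example = "Raw sapflow and soil moisture data from January 2016-April 2016 in Manaus, Brazil"
print(check_title(example))
```

The separate `recommended_max` threshold reflects that the guidance above suggests up to 20 words while the automated check allows up to 40.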

Existing DOI & Alternate Identifier

Format

Free Text

Description

If this dataset has been previously published elsewhere, enter the DOI or alternate identifier. Identifiers are used to locate the dataset within your project's data management system and can provide pertinent contextual information for users. Ensure the identifier correctly leads to the dataset that you are submitting.

Example

http://dx.doi.org/XXXX

JSON-LD Field

alternateName

Abstract*

Format

Free Text

Description

The abstract should be at least 100 words in length, written in full sentences, and understandable to anyone who has not seen related manuscripts. Describe the content of the dataset and provide all necessary scientific context. Avoid unexplained acronyms or project-specific terms, and include specific details that promote the reproducibility of your data. These may include source data for synthesis work, software necessary to view the related files, the ecosystem type involved, or measurement types. Include a statement about why these data were generated and the research question they are intended to answer.

Example

This dataset contains raw output from a data logger connected to 9 sapflow and 5 soil moisture sensors in Manaus, Brazil. The file xxx.dat contains raw data and the metadata file (BR-Ma2_E-fieldlog_20160501.xls) has information on locations where the sensors were installed and other sensor maintenance details. No data processing or QA/QC was done on the raw datasets. Processed data will be uploaded as separate datasets on ESS-DIVE. This research was performed as a part of the NGEE Tropics project, which aims to advance model predictions of tropical forest carbon cycle responses to a changing climate over the 21st Century.

JSON-LD Field

description

Automated Check

Required: Abstract is at least 100 words in length

Keywords*

Format

GCMD Keywords OR Free Text

Description

Add a minimum of three total keywords or data variables. As you begin typing in the web form field, GCMD controlled vocabulary terms will appear in a dropdown list. Selecting from the GCMD controlled keywords where possible is encouraged but not required. You can also enter your own keywords. Ensure that keyword terms differ from words in the title to increase the findability of your dataset in searches.

Example

Earth Science, Land Surface, Soils

JSON-LD Field

keywords

Automated Check

Required: At least three keywords

Optional: Keywords differ from terms in dataset title
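Both keyword checks can be sketched in a few lines of Python. This is only an approximation under stated assumptions: the rule used here (a keyword overlaps when all of its words already appear in the title) and the hypothetical names `_words` and `check_keywords` are illustrative, not the logic ESS-DIVE actually runs.

```python
import re


def _words(text):
    """Return the set of lowercase alphanumeric tokens in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def check_keywords(keywords, title):
    """Return (has_enough, overlapping) for a keyword list and a title."""
    cleaned = [k.strip() for k in keywords if k.strip()]
    has_enough = len(cleaned) >= 3
    title_words = _words(title)
    # Assumed rule: a keyword overlaps when every one of its words is in the title.
    overlapping = [k for k in cleaned if _words(k) and _words(k) <= title_words]
    return has_enough, overlapping


title = "Raw sapflow and soil moisture data from January 2016-April 2016 in Manaus, Brazil"
print(check_keywords(["Earth Science", "Land Surface", "Soils"], title))
print(check_keywords(["Soil Moisture", "Brazil"], title))
```

Keywords reported in the second element of the result add little to findability, since the same terms are already searchable through the title.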

Data Variables

Format

GCMD Keywords or Free Text

Description

Add variables to increase the findability of your dataset in searches. As with the Keywords field, selecting variable terms from the GCMD controlled vocabulary where possible is encouraged but not required.

Example

Soil Moisture

JSON-LD Field

variableMeasured

Publication Date

Format

YYYY or YYYY-MM-DD

Description

Specify a custom date or year when this dataset can be made publicly available. If this is not specified, it will default to the current date.

Example

2019 or 2019-04-19

JSON-LD Field

datePublished

Automated Check

Required: Publication date is present
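When preparing metadata outside the web form, it can help to confirm that a publication date matches one of the two accepted formats. The following sketch uses only the Python standard library. The function name `is_valid_publication_date` is hypothetical, and the strictness shown (zero-padded month and day) is an assumption about how the format is interpreted.

```python
import re
from datetime import datetime


def is_valid_publication_date(value):
    """Return True for a YYYY year or a real YYYY-MM-DD calendar date."""
    if re.fullmatch(r"\d{4}", value):
        fmt = "%Y"
    elif re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        fmt = "%Y-%m-%d"
    else:
        return False
    try:
        # strptime rejects impossible dates such as 2019-02-30.
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


for candidate in ("2019", "2019-04-19", "2019-02-30", "04/19/2019"):
    print(candidate, is_valid_publication_date(candidate))
```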

Usage Rights*

Format

Select choice

Description

Choose how you wish your data to be shared and reused. Creative Commons Attribution (CC BY 4.0) requires that the dataset be cited by anyone using the data. Creative Commons Public Domain (CC0 1.0) dedicates the data to the public domain without restriction. When using the API, enter the URL for the selected license.

Example

Select Creative Commons Attribution (CC BY 4.0) or Creative Commons Public Domain (CC0 1.0)

JSON-LD Field

license

Automated Check

Optional: Usage rights is set to Creative Commons CC-BY license

Project*

Online Form only

Format

Controlled List

Description

Select the DOE project name from the drop-down list, which appears when you start typing the project name or Principal Investigator (PI) name. If multiple projects were involved, enter the project that made the largest contribution to this dataset.

Example

Next-Generation Ecosystem Experiments (NGEE) Tropics [PI: Jeffrey Chambers]

JSON-LD Field

provider

Automated Check

Required: Project name from controlled list

API Only

Format

Value

Description

Enter the project ID into the JSON-LD field. Written project names will not be accepted. Look up your project ID using ESS-DIVE's Project List. If multiple projects were involved, enter the project that had the largest contribution to this dataset.

Example

1e6d50d3-9532-43fb-a63f-bdcb4350bf0c

JSON-LD Field

provider = {
   "identifier": {
      "@type": "PropertyValue",
      "propertyID": "ess-dive",
      "value": "<Project ID>"
   }
}

Funding Organization*

Format

Controlled List or Free Text

Description

List the organizations that funded the work. When using the web form, you can choose from the drop-down list as you begin to enter the funding organization.

Example

[Example from dropdown list]: U.S. DOE > Office of Science > Biological and Environmental Research (BER)

JSON-LD Field

funder

Automated Check

Optional: Funding organization "U.S. DOE > Office of Science > Biological and Environmental Research (BER)" is present

DOE Contracts

Format

Controlled List or Free Text

Description

List the number of each DOE contract under which the data in the package were funded. Enter "NONE" if no DOE funding applies. If the dataset is the result of a joint effort between two or more DOE Site/Facility Management Contractors, additional DOE contract numbers may be entered.

Example

AC0205CH11231

JSON-LD Field

award

Related References

Format

Free Text

Description

Include the full citations and DOIs of datasets or publications associated with your dataset. These related materials allow users to learn more about the dataset, processing methods, or how the data were used.

Example

Somebody J. (2018), Sapflow and soil moisture coupling in the Amazon, Journal. doi: xx.xxxx

JSON-LD Field

citation

People

Contact*

Format

Free Text

Description

List the person who should be contacted by users seeking further information for the data. Only one contact is allowed. Including the ORCID of this individual is strongly encouraged.

Example

First name, Last name, Organization, Email, ORCID (strongly encouraged)

JSON-LD Field

editor

Automated Check

Required: Contact is present

Required: Contact ORCID is provided

Creators

Format

Free Text

Description

Include the main researchers involved in producing the data such as authors, owners, originators, and principal investigators. List creators in the order they should appear in the dataset citation. One or more creators is required and including email addresses is highly encouraged.

Example

First name, Last name, Organization, Email, ORCID (not required for creators)

JSON-LD Field

creator

Automated Check

Required: At least one creator is present

Contributors

Format

Free Text

Description

List any additional contributors involved in producing the data. These may include people who assisted in creating the dataset but are not considered authors. Contributors will not appear in the data citation. Including email addresses is highly encouraged.

Example

First name, Last name, Organization, Email, ORCID (not required for contributors)

JSON-LD Field

contributor

Dates

Start Date

Format

YYYY-MM-DD

Description

Earliest date of data collection included in the dataset.

Example

2017-04-16

JSON-LD Field

temporalCoverage

Automated Check

Required: Start date is present

End Date

Format

YYYY-MM-DD

Description

Last date of data collection included in the dataset. This field can be left blank if your dataset is open ended.

Example

2019-07-13

JSON-LD Field

temporalCoverage

Automated Check

Required: End date is present
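Because the start and end dates both map to `temporalCoverage`, it can be useful to validate them together. The sketch below checks that each value is a real date and that the end date does not precede the start date. The `startDate` and `endDate` key names and the hypothetical function `temporal_coverage` are assumptions, since this page does not show the nested structure. Confirm it against the Dataset API documentation.

```python
from datetime import date


def temporal_coverage(start, end=None):
    """Validate a start/end pair and return a dictionary for temporalCoverage."""
    start_date = date.fromisoformat(start)  # raises ValueError on a bad date
    coverage = {"startDate": start_date.isoformat()}  # assumed key name
    if end is not None:
        end_date = date.fromisoformat(end)
        if end_date < start_date:
            raise ValueError("End date is earlier than start date")
        coverage["endDate"] = end_date.isoformat()  # assumed key name
    return coverage


print(temporal_coverage("2017-04-16", "2019-07-13"))
```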

Locations

Geographic Description

Format

Free Text

Description

A short description of the location(s) where data were collected. This may include the location name, known identifiers if associated with a specific project (e.g. AmeriFlux site name), and the ecosystem type involved. Multiple geographic descriptions can be added if necessary. A complete geographic description will increase the findability of your dataset, as all terms entered are searchable through the data portal.

Example

BR-Ma2, Manaus, Brazil: ZF2 K34 Tower. Eddy covariance site established in 1999 on kilometer 34 of the ZF2 highway. It was later expanded into an atmospheric and soil sampling hub. It is a 1.5 m x 2.5 m section aluminum tower, 50 m tall, on a medium-sized plateau (Araujo et al., 2002).

JSON-LD Field

spatialCoverage/description

Bounding Box Coordinates

Format

Latitude and Longitude in WGS 84 decimal degrees

Description

Latitude and longitude of the location(s) these data represent, in WGS 84 decimal degrees. Enter only one coordinate pair for a single point, and bounding box coordinates for non-point locations. Ensure coordinate accuracy before submitting your dataset. If the data location is better represented by a shape, you may also include a KML file in the file uploads.

Example

Northwest Coordinates [Lat Long]/Southeast Coordinates [Lat Long]

JSON-LD Field

spatialCoverage

Automated Check

Optional: Coordinates describing the point location or geographic area of the dataset are present
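A quick sanity check on coordinates can catch common mistakes, such as swapped latitude and longitude, before submission. The following is a minimal sketch. The function name `check_bounding_box` and the sample values are hypothetical and do not describe any real site.

```python
def check_bounding_box(nw_lat, nw_lon, se_lat, se_lon):
    """Return a list of problems with a northwest/southeast coordinate pair."""
    problems = []
    for name, lat in (("northwest", nw_lat), ("southeast", se_lat)):
        if not -90 <= lat <= 90:
            problems.append(f"{name} latitude {lat} is outside -90..90")
    for name, lon in (("northwest", nw_lon), ("southeast", se_lon)):
        if not -180 <= lon <= 180:
            problems.append(f"{name} longitude {lon} is outside -180..180")
    if nw_lat < se_lat:
        problems.append("northwest latitude is south of southeast latitude")
    # Longitude order is not checked: a box crossing the antimeridian
    # legitimately has a northwest longitude greater than the southeast one.
    return problems


# Illustrative values only.
print(check_bounding_box(40.0, -106.0, 39.0, -105.0))  # a valid box
print(check_bounding_box(-106.0, 40.0, -105.0, 39.0))  # latitude and longitude swapped
```

For a single point, pass the same latitude and longitude for both corners; the check then returns an empty list as long as the values are in range.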

Methods

Methods

Format

Free Text

Description

Methods for a dataset should cover all aspects of dataset production and should be thorough enough for your work to be reproduced. Include descriptions of the experimental design, laboratory and/or field collection methods (e.g. observations and/or devices used), source data for synthesis studies, data processing and QA/QC procedures, and known issues or limitations of the data where applicable. A complete methods section will improve the findability of your data, as all text entered into methods is also searchable through the data portal filters.

You may provide a citation for any methods used that were published previously, but methods related to data production must still be included.

Example

An example of a complete methods section can be viewed at: Conlisk E ; Castanha C ; Germino M J ; Veblen T T ; Smith J M ; Kueppers L M (2017): Data from: "Declines in low-elevation subalpine tree populations outpace growth in high-elevation populations with warming". Subalpine and Alpine Species Range Shifts with Climate Change: Temperature and Soil Moisture Manipulations to Test Species and Population Responses. doi:10.15485/1730950

JSON-LD Field

measurementTechnique

Automated Check

Required: Methods description is more than 7 words in length

Additional Automated Checks

The checks below are run on each dataset upon submission as part of the ESS-DIVE automated check suite. Informational checks appear on the assessment reports and are not pass/fail.

| Criteria | Required/Optional | FAIR Category |
| --- | --- | --- |
| URLs in metadata resolve correctly | Required | Findable |
| Data file formats are non-proprietary | Optional | Reusable |
| Informational: Number of creators with email addresses provided | Informational | Findable |
| Informational: Number of contacts with email addresses provided | Informational | Findable |
| Informational: Count of data entities present | Informational | Interoperable |
