Reporting Format Requirements

This page provides an overview of the review process for datasets that use reporting formats.

ESS-DIVE’s Reporting Formats are designed to make data and metadata published on ESS-DIVE more FAIR (Findable, Accessible, Interoperable, Reusable). Consistent formatting of data and metadata enables both machines and humans to better understand and reuse valuable data.

We use reporting formats to enable advanced search within data files. Specifically, the Fusion Database (Fusion DB) validates, extracts and indexes data within standardized files.

The contents of public data and metadata files that the Fusion DB parses successfully are made searchable through the Deep Dive API, which is separate from the ESS-DIVE main search and the Dataset API. Parsing currently requires the File Level Metadata (FLMD) and Comma Separated Values (CSV) Guidelines Reporting Formats. These reporting formats are widely applicable to data types stored on ESS-DIVE and ensure that data files are machine-readable and described through standardized metadata fields. If any requirement is not met, the Fusion DB provides feedback to the ESS-DIVE Publication Review Team. The requirements are outlined below. For more detailed documentation of all Reporting Formats, please visit the ESS-DIVE Workspace GitHub.

We plan to expand the Fusion DB to incorporate data-type-specific reporting formats and their associated automated validations.

Reporting Format Checks

A series of checks is performed during the publication review process for datasets using reporting formats. Checks listed as Required are necessary for machine readability and parsing, whereas Strongly Recommended and Optional checks are suggested enhancements to metadata.

Example datasets that have passed all reporting format checks are available below.
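The naming rules in the table below (File Name, Column or Row Name, and the File Level Metadata and Data Dictionary suffixes) can be approximated before submission with a few regular expressions. The following is a minimal sketch, not the Fusion DB's actual validation logic. The function names and patterns are hypothetical and are derived only from the wording of the checks. It assumes the File Name rule applies to the name without its extension.

```python
import re
from pathlib import PurePath

# Assumed patterns, derived from the wording of the checks (not from Fusion DB code).
# File name: letters, numbers, underscores; must not start with an underscore.
FILE_STEM = re.compile(r"[A-Za-z0-9][A-Za-z0-9_]*")
# Column or row name: letters, numbers, hyphens, underscores; must start with a letter.
COLUMN_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")


def file_name_ok(file_name: str) -> bool:
    """Check the file name, extension excluded, against the File Name rule."""
    return FILE_STEM.fullmatch(PurePath(file_name).stem) is not None


def column_name_ok(name: str) -> bool:
    """Check a single column or row name against the Column or Row Name rule."""
    return COLUMN_NAME.fullmatch(name) is not None


def has_metadata_files(file_names) -> bool:
    """Check that the dataset has both a *_flmd.csv and a *_dd.csv file."""
    has_flmd = any(name.endswith("_flmd.csv") for name in file_names)
    has_dd = any(name.endswith("_dd.csv") for name in file_names)
    return has_flmd and has_dd
```

For example, `file_name_ok("soil temp.csv")` fails because of the space, and `column_name_ok("2021_flow")` fails because the name starts with a number.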

| Check Name | Requirement Level | Description |
| --- | --- | --- |
| File Name | Required | File name uses only letters, numbers, and underscores. Do not include spaces and do not start with an underscore or hyphen. |
| File Description | Required | A brief description (minimum 10 characters) is provided. |
| Column or Row Name | Required | Column or row names use only letters, numbers, hyphens, and underscores. Do not include spaces, and do not start with an underscore, hyphen, or number. |
| Unit | Required | Unit is present. |
| Definition | Required | Description is present. |
| Character Set | Required | All characters are within the US-ASCII character set (without extensions) or UTF-8. |
| Delimiter | Required | The file uses a comma as its delimiter and is saved as a CSV file. |
| Data Matrix | Required | Contents of the data portion of the file are organized in a logical and readable matrix format. |
| Column or Row Name Orientation | Required | Orientation of the file is either horizontal or vertical. |
| Consistent Values | Required | Text and numeric data are not mixed within the same column or row. |
| Missing Value Codes | Required | All cells in the data matrix have a value, and missing data are represented with Missing Value Codes. |
| Temporal Data | Required | Dates follow the ISO 8601 standard (YYYY-MM-DD, to known precision) and times follow Coordinated Universal Time (UTC) (YYYY-MM-DD hh:mm:ss, to known precision). |
| Spatial Data | Required | Geographic coordinates are provided in WGS84 decimal format. |
| File naming conventions for File Level Metadata and Data Dictionary files | Required | The dataset contains files with the suffixes *_flmd.csv and *_dd.csv. |
| Reporting Format Keywords | Required | ESS-DIVE reporting format keywords are used. The File Level Metadata reporting format keyword is required for the Fusion DB to identify, validate, and parse your dataset. |
| Standard | Strongly Recommended | ESS-DIVE Standard field terms for reporting formats are used. |
| Data Orientation | Optional | “horizontal” or “vertical” is provided within the File Level Metadata file. |
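Some of the content checks above (Temporal Data, Spatial Data, and Missing Value Codes) can be tested locally before submission. The sketch below is illustrative only and does not reproduce the Fusion DB's validation. All function names are hypothetical. It assumes that "to known precision" permits truncated forms such as YYYY or YYYY-MM, and that the CSV is horizontally oriented with a single header row. Whether a time is actually in UTC cannot be determined from the string alone, so that part of the check is not covered.

```python
import csv
import io
import re
from datetime import datetime

# Assumed reading of "to known precision": any leading portion of
# YYYY-MM-DD hh:mm:ss, matched strictly with zero-padded fields.
TEMPORAL_SHAPE = re.compile(r"\d{4}(-\d{2}(-\d{2}( \d{2}(:\d{2}(:\d{2})?)?)?)?)?")
FORMATS_BY_LENGTH = {
    4: "%Y", 7: "%Y-%m", 10: "%Y-%m-%d",
    13: "%Y-%m-%d %H", 16: "%Y-%m-%d %H:%M", 19: "%Y-%m-%d %H:%M:%S",
}


def temporal_ok(value: str) -> bool:
    """Check shape with a regex, then calendar validity with strptime."""
    if TEMPORAL_SHAPE.fullmatch(value) is None:
        return False
    try:
        datetime.strptime(value, FORMATS_BY_LENGTH[len(value)])
    except ValueError:
        return False
    return True


def coordinate_ok(latitude, longitude) -> bool:
    """Check that both values parse as decimal numbers within valid ranges."""
    try:
        lat, lon = float(latitude), float(longitude)
    except (TypeError, ValueError):
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180


def empty_cells(csv_text: str):
    """Return 1-based (row, column) positions of empty data cells.

    Row 1 is assumed to be the header, so data rows start at row 2.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    return [
        (r, c)
        for r, row in enumerate(rows[1:], start=2)
        for c, cell in enumerate(row, start=1)
        if cell.strip() == ""
    ]
```

Any position returned by `empty_cells` marks a cell that would need a missing value code. A date such as `2019-10-4` is rejected by `temporal_ok` because the day is not zero-padded.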

Example Datasets Using Reporting Formats and Successfully Parsed By Fusion Database

Roley et al. (2023) Data and scripts associated with "Coupled primary production and respiration in a large river contrasts with smaller rivers and streams." doi:10.15485/1985922
Jastrow et al. (2022) Spatially Averaged Ice Contents of Ice-Wedge Polygon Cross-Sections to 3-m Depth, July 2013, Utqiagvik, Alaska doi:10.15485/1876898
Kaufman et al. (2023) Spatial Study 2022: Water Column, Sediment, and Total Ecosystem Respiration Rates across the Yakima River Basin, Washington, USA doi:10.15485/1987520
Gooseff et al. (2023) Riverbed and Near-Surface Water Quality Data, Hanford Reach, Columbia River, February 2021 - April 2022 doi:10.15485/2204421
Hassett et al. (2023) Carbon flux measurements from chambers collected between July to October 2022 at Old Woman Creek, Huron, Ohio doi:10.15485/2229438
Stolze et al. (2024) Aerobic respiration controls on shale weathering, Geochimica et Cosmochimica Acta, 2023: Dataset doi:10.15485/1987859
Wang et al. (2024) Continuous soil temperature measurements from 2019-10-4 to 2020-10-4, Teller road Mile 27, Seward Peninsula, Alaska doi:10.15485/2301692
Sala et al. (2024) Plot and Tree Characteristics from the 2022-2023 field experiment at Game Ridge, Missoula County, Montana, USA doi:10.15485/2371850
Williams et al. (2024) Anion Data for the East River Watershed, Colorado (2014-2023) doi:10.15485/1668054

