This page provides coding examples for retrieving a list of submitted datasets and submitting dataset metadata for validation using Python.
While learning the expected metadata schema, use https://api-sandbox.ess-dive.lbl.gov, the domain shown in the examples in this documentation. Once you are familiar with ESS-DIVE's metadata and dataset JSON-LD schema, use the production domain https://api.ess-dive.lbl.gov/ to submit datasets to ESS-DIVE for publishing and review.
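The examples on this page use base, endpoint, and header_authorization without defining them. A minimal setup sketch follows. The bearer-token header format and the "packages" endpoint path are assumptions to confirm against the API documentation, and token is a placeholder name for your own ESS-DIVE token.

```python
import json
import os

# Assumed setup for the examples on this page; confirm details in the API documentation.
token = "<your ESS-DIVE API token>"  # placeholder; paste your own token here

# Sandbox domain named in the text above; switch to the production domain when ready.
base = "https://api-sandbox.ess-dive.lbl.gov/"

# Assumption: datasets are exposed under a "packages" path.
endpoint = "packages"

# Assumption: the API expects a bearer token in the Authorization header.
header_authorization = "bearer {}".format(token)
```

The request examples also need the third-party requests library (import requests).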
Set up the JSON for the "provider", which includes details about the project. Update the "value" to the desired project identifier; you can look up project identifiers in ESS-DIVE's project list: https://data.ess-dive.lbl.gov/projects. The project will be listed as the publisher in the citation.
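The provider object itself is not shown on this page, although the JSON-LD example refers to it as provider_spruce. One plausible shape is sketched below. The identifier/PropertyValue layout and the "ess-dive" property ID are assumptions to check against the schema documentation, and the value is a placeholder, not a real project identifier.

```python
# Hypothetical shape of the provider object; verify the field names against the
# ESS-DIVE JSON-LD schema before relying on it.
provider_spruce = {
    "identifier": {
        "@type": "PropertyValue",
        "propertyID": "ess-dive",  # assumed property ID
        "value": "<project identifier from the ESS-DIVE project list>",  # placeholder
    }
}
```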
Prepare the dataset authors in the order that you would like them to appear in the citation. Please add the ORCID for all authors, especially the first author, if possible.
creators = [
    {
        "@id": "http://orcid.org/0000-0001-7293-3561",
        "givenName": "Paul J",
        "familyName": "Hanson",
        "affiliation": "Oak Ridge National Laboratory",
        "email": "hansonpj@ornl.gov"
    },
    {
        "givenName": "Jeffrey",
        "familyName": "Riggs",
        "affiliation": "Oak Ridge National Laboratory"
    },
    {
        "givenName": "C",
        "familyName": "Nettles",
        "affiliation": "Oak Ridge National Laboratory"
    },
    {
        "givenName": "William",
        "familyName": "Dorrance",
        "affiliation": "Oak Ridge National Laboratory"
    },
    {
        "givenName": "Les",
        "familyName": "Hook",
        "affiliation": "Oak Ridge National Laboratory"
    }
]
Create the rest of the JSON-LD object
json_ld = {
    "@context": "http://schema.org/",
    "@type": "Dataset",
    "@id": "http://dx.doi.org/10.3334/CDIAC/spruce.001",
    "name": "SPRUCE S1 Bog Environmental Monitoring Data: 2010-2016",
    "description": [
        "This data set reports selected ambient environmental monitoring data from the S1 bog in Minnesota for the period June 2010 through December 2016. Measurements of the environmental conditions at these stations will serve as a pre-treatment baseline for experimental treatments and provide driver data for future modeling activities.",
        "The site is the S1 bog, a Picea mariana [black spruce] - Sphagnum spp. bog forest in northern Minnesota, 40 km north of Grand Rapids, in the USDA Forest Service Marcell Experimental Forest (MEF). There are/were three monitoring sites located in the bog: Stations 1 and 2 are co-located at the southern end of the bog and Station 3 is located north central and adjacent to an existing U.S. Forest Service monitoring well.",
        "There are eight data files with selected results of ambient environmental monitoring in the S1 bog for the period June 2010 through December 2016. One file has the ",
        "other seven have the available data for a given calendar year. Not all measurements started in June 2010 and EM3 measurements ended in May 2014.",
        "Further details about the data package are in the attached pdf file (SPRUCE_EM_DATA_2010_2016_20170620)."
    ],
    "creator": creators,
    "datePublished": "2015",
    "keywords": [
        "EARTH SCIENCE > BIOSPHERE > VEGETATION",
        "Climate Change"
    ],
    "variableMeasured": [
        "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC TEMPERATURE > SURFACE TEMPERATURE > AIR TEMPERATURE",
        "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC WATER VAPOR > WATER VAPOR INDICATORS > HUMIDITY > RELATIVE HUMIDITY",
        "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC PRESSURE > SEA LEVEL PRESSURE",
        "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC TEMPERATURE > SURFACE TEMPERATURE > DEW POINT TEMPERATURE > DEWPOINT DEPRESSION",
        "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC WINDS > SURFACE WINDS > WIND SPEED",
        "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC WINDS > SURFACE WINDS > WIND DIRECTION",
        "EARTH SCIENCE > BIOSPHERE > VEGETATION > PHOTOSYNTHETICALLY ACTIVE RADIATION",
        "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC RADIATION > NET RADIATION",
        "EARTH SCIENCE > LAND SURFACE > SURFACE RADIATIVE PROPERTIES > ALBEDO",
        "EARTH SCIENCE > LAND SURFACE > SOILS > SOIL TEMPERATURE",
        "Precipitation (Total)",
        "Irradiance",
        "Groundwater Temperature",
        "Groundwater Level",
        "Volumetric Water Content",
        "surface_albedo"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "spatialCoverage": [
        {
            "description": (
                "Site ID: S1 Bog Site name: S1 Bog, Marcell Experimental Forest Description: The site is the 8.1-ha S1 bog, a Picea mariana [black spruce] - Sphagnum spp. ombrotrophic bog forest in northern Minnesota, 40 km north of Grand Rapids, in the USDA Forest Service Marcell Experimental Forest (MEF). "
                "The S1 bog was harvested in successive strip cuts in 1969 and 1974 and the cut areas were allowed to naturally regenerate. Stations 1 and 2 are located in a 1974 strip that is characterized by a medium density of 3-5 meter black spruce and larch trees with an open canopy. The area was suitable for siting a monitoring station for representative meteorological conditions on the S1 bog. Station 3 is located in a 1969 harvest strip that is characterized by a higher density of 3-5 meter black spruce and larch trees with a generally closed canopy. Measurements at this station represent conditions in the surrounding stand. Site Photographs are in the attached document"
            ),
            "geo": [
                {"name": "Northwest", "latitude": 47.50285, "longitude": -93.48283},
                {"name": "Southeast", "latitude": 47.50285, "longitude": -93.48283}
            ]
        }
    ],
    "funder": {
        "@id": "http://dx.doi.org/10.13039/100006206",
        "name": "U.S. DOE > Office of Science > Biological and Environmental Research (BER)"
    },
    "temporalCoverage": {
        "startDate": "2010-07-16",
        "endDate": "2016-12-31"
    },
    "editor": {
        "@id": "http://orcid.org/0000-0001-7293-3561",
        "givenName": "Paul J",
        "familyName": "Hanson",
        "email": "hansonpj@ornl.gov"
    },
    "provider": provider_spruce,
    "measurementTechnique": [
        "The stations are equipped with standard sensors for measuring meteorological parameters, solar radiation, soil temperature and moisture, and groundwater temperature and elevation. Note that some sensor locations are relative to nearby vegetation and bog microtopographic features (i.e., hollows and hummocks). See Table 1 in the attached pdf (SPRUCE_EM_DATA_2010_2016_20170620) for a list of measurements and further details. Sensors and data loggers were initially installed and became operational in June, July, and August of 2010. Additional sensors were added in September 2011. Station 3 was removed from service on May 12, 2014.",
        "These data are considered at Quality Level 1. Level 1 indicates an internally consistent data product that has been subjected to quality checks and data management procedures. Established calibration procedures were followed."
    ]
}
Please refer to the API documentation to understand the schema and navigate through any errors: https://api.ess-dive.lbl.gov
Metadata Only
Submit the JSON-LD object to the Dataset API
post_packages_url = "{}{}".format(base, endpoint)
post_package_response = requests.post(post_packages_url,
                                      headers={"Authorization": header_authorization},
                                      json=json_ld)

if post_package_response.status_code == 201:
    # Success
    response = post_package_response.json()
    print(f"View URL:{response['viewUrl']}")
    print(f"Name:{response['dataset']['name']}")
else:
    # There was an error
    print(post_package_response.text)
Metadata and Data
To submit the JSON-LD object along with data files, create a folder named files and place the file you want to upload inside it.
files_tuples_array = []
upload_file = "path/to/your_file"

# The JSON-LD goes in as a string; the data file is opened in binary mode
files_tuples_array.append(("json-ld", json.dumps(json_ld)))
files_tuples_array.append(("data", open(upload_file, 'rb')))

post_packages_url = "{}{}".format(base, endpoint)
post_package_response = requests.post(post_packages_url,
                                      headers={"Authorization": header_authorization},
                                      files=files_tuples_array)

if post_package_response.status_code == 201:
    # Success
    response = post_package_response.json()
    print(f"View URL:{response['viewUrl']}")
    print(f"Name:{response['dataset']['name']}")
else:
    # There was an error
    print(post_package_response.text)
Remember to change the directory and file names to your actual names. The directory can be left blank if your API script is already located in the same directory as your file.
Metadata and Many Data Files
In case you have many files to be uploaded, you can place them all inside the files directory and use the following code:
files_tuples_array = []
files_upload_directory = "your_upload_directory/"
files = os.listdir(files_upload_directory)

files_tuples_array.append(("json-ld", json.dumps(json_ld)))
for filename in files:
    file_directory = files_upload_directory + filename
    files_tuples_array.append(("data", open(file_directory, 'rb')))

post_packages_url = "{}{}".format(base, endpoint)
post_package_response = requests.post(post_packages_url,
                                      headers={"Authorization": header_authorization},
                                      files=files_tuples_array)

if post_package_response.status_code == 201:
    # Success
    response = post_package_response.json()
    print(f"View URL:{response['viewUrl']}")
    print(f"Name:{response['dataset']['name']}")
else:
    # There was an error
    print(post_package_response.text)
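The loop above opens every file but never closes it. One way to handle that, sketched here with the standard library's contextlib.ExitStack, is to keep the handles open only for the duration of the request. The helper name build_files_payload is illustrative and not part of the ESS-DIVE API.

```python
import json
import os
from contextlib import ExitStack


def build_files_payload(stack, json_ld, upload_directory):
    """Return multipart tuples: the JSON-LD first, then one entry per file."""
    payload = [("json-ld", json.dumps(json_ld))]
    for filename in sorted(os.listdir(upload_directory)):
        path = os.path.join(upload_directory, filename)
        if os.path.isfile(path):  # skip subdirectories
            # enter_context registers the handle so the stack closes it later
            payload.append(("data", stack.enter_context(open(path, "rb"))))
    return payload


# Usage (assumes post_packages_url, header_authorization, and json_ld from the
# example above, plus "import requests"):
# with ExitStack() as stack:
#     files_tuples_array = build_files_payload(stack, json_ld, "your_upload_directory/")
#     post_package_response = requests.post(post_packages_url,
#                                           headers={"Authorization": header_authorization},
#                                           files=files_tuples_array)
```

Every file handle is closed when the with block ends, whether or not the request succeeds.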
Search for Datasets
Anyone can search for public datasets on ESS-DIVE using the Dataset API. If you are registered to submit data, you can also search for your private datasets. Narrow your search by defining query parameters.
Limited dataset metadata are returned in the response of this call. Additionally, this call cannot be used to download data files. To look up all dataset metadata and download data files, use the API to Download Dataset Metadata.
The following lines of code will return the most recent 25 records. If the results contain more than 25 packages, use the row_start and page_size query parameters to page through the results.
For data users: Must pass isPublic=true to search for public datasets. This call searches for private datasets by default.
For data contributors: Must pass isPublic=false (or omit parameter) to search for private datasets.
get_packages_url = "{}{}".format(base, endpoint)
get_packages_response = requests.get(get_packages_url,
                                     headers={"Authorization": header_authorization})

if get_packages_response.status_code == 200:
    # Success
    print(get_packages_response.json())
else:
    # There was an error
    print(get_packages_response.text)
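The query parameters named in this section can be attached to the search URL as a query string. The sketch below builds such a URL with the standard library. The parameter names isPublic, row_start, and page_size are taken from the text above and should be checked against the API documentation; the helper name build_search_url, the "packages" path, and the starting row of 1 are illustrative assumptions.

```python
from urllib.parse import urlencode


def build_search_url(base, endpoint, is_public=True, row_start=1, page_size=25):
    """Return the search URL with visibility and paging parameters appended."""
    params = {
        "isPublic": "true" if is_public else "false",
        "row_start": row_start,  # assumed to be the index of the first record to return
        "page_size": page_size,
    }
    return "{}{}?{}".format(base, endpoint, urlencode(params))


url = build_search_url("https://api-sandbox.ess-dive.lbl.gov/", "packages")
print(url)
```

The resulting URL can be passed to requests.get in place of get_packages_url; increasing row_start by page_size on each call pages through the results.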
Download Dataset Metadata
Anyone can search for individual public datasets on ESS-DIVE using the Dataset API. If you are registered to submit data, you can also download your private dataset metadata.
The response for this call will return all dataset metadata and attached data files. Metadata and data files can then be downloaded. If you'd like to look up the dataset upload date, last modified date, or dataset access status, use the API to Search for Datasets.
# ESS-DIVE Identifiers are in the format of: ess-dive-0f0348396e46261-20181022T131245032205
dataset_id = "<Enter an ESS-DIVE Identifier here>"
get_package_url = "{}{}/{}?isPublic=true".format(base, endpoint, dataset_id)
get_package_response = requests.get(get_package_url,
                                    headers={"Authorization": header_authorization})

if get_package_response.status_code == 200:
    # Success
    print(get_package_response.json())
else:
    # There was an error
    print(get_package_response.text)
Update a Dataset
It is possible to update both the metadata and the data of an existing dataset. The following update scenarios are possible:
update metadata only
replace/add files only
both update metadata and replace/add files
These examples will demonstrate both updating metadata and adding new files to the dataset created in previous sections.
Metadata Only
Use the PUT function to update the metadata of a dataset. This example updates the name of a dataset.
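The request itself might look like the sketch below. It assumes the dataset is addressed as base + endpoint + "/" + dataset_id, mirroring the download URL earlier on this page, and that sending the full JSON-LD with the changed field is accepted. build_name_update and send_update are illustrative helper names, while requests.put is the standard PUT call in the requests library.

```python
import copy


def build_name_update(base, endpoint, dataset_id, json_ld, new_name):
    """Return (url, payload) for a metadata-only update that changes the name."""
    payload = copy.deepcopy(json_ld)  # leave the caller's metadata untouched
    payload["name"] = new_name
    url = "{}{}/{}".format(base, endpoint, dataset_id)
    return url, payload


def send_update(url, payload, header_authorization):
    """Send the update with HTTP PUT and return the response object."""
    import requests  # third-party; imported here so the helper above has no dependency on it
    return requests.put(url, headers={"Authorization": header_authorization}, json=payload)


# Usage (dataset_id is the ESS-DIVE identifier of the dataset to change):
# put_package_url, updated_json_ld = build_name_update(
#     base, endpoint, dataset_id, json_ld, "Updated Dataset Name")
# put_package_response = send_update(put_package_url, updated_json_ld, header_authorization)
```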
Check the results for the changed metadata attribute
# Check for errors
if put_package_response.status_code == 200:
    # Success
    response = put_package_response.json()
    print(f"View URL:{response['viewUrl']}")
    print(f"Name:{response['dataset']['name']}")
else:
    # There was an error
    print(put_package_response.text)
Metadata plus new data file
Use the PUT function to update a dataset. This example changes the dataset's publication date to 2019 and adds a new data file.
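A sketch of such a request follows, reusing the multipart layout from the submission examples ("json-ld" for the metadata, "data" for the file). The URL pattern, the choice to send the full JSON-LD, and the helper names build_update_parts and send_update_with_file are assumptions for illustration rather than documented ESS-DIVE behavior.

```python
import copy
import json


def build_update_parts(json_ld, date_published, file_handle):
    """Return multipart tuples: the updated JSON-LD plus one open data file."""
    payload = copy.deepcopy(json_ld)  # leave the caller's metadata untouched
    payload["datePublished"] = date_published
    return [("json-ld", json.dumps(payload)), ("data", file_handle)]


def send_update_with_file(url, parts, header_authorization):
    """Send the multipart update with HTTP PUT and return the response object."""
    import requests  # third-party; imported here so the helper above has no dependency on it
    return requests.put(url, headers={"Authorization": header_authorization}, files=parts)


# Usage (dataset_id is the ESS-DIVE identifier of the dataset to change):
# put_package_url = "{}{}/{}".format(base, endpoint, dataset_id)
# with open("path/to/new_file", "rb") as new_file:
#     parts = build_update_parts(json_ld, "2019", new_file)
#     put_package_response = send_update_with_file(put_package_url, parts, header_authorization)
```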
Check the results for the changed metadata attribute and newly uploaded file
# Check for errors
if put_package_response.status_code == 200:
    # Success
    response = put_package_response.json()
    print(f"View URL:{response['viewUrl']}")
    print(f"Date Published:{response['dataset']['datePublished']}")
    print(f"Files In Dataset:{response['dataset']['distribution']}")
else:
    # There was an error
    print(put_package_response.text)
get_packages_url = "{}{}".format(base, endpoint)
get_packages_response = requests.get(get_packages_url,
                                     headers={"Authorization": header_authorization})

if get_packages_response.status_code == 200:
    # Success
    print(get_packages_response.json())
else:
    # There was an error
    print(get_packages_response.text)
Troubleshooting
{"detail":"You do not have authorized access"}
This error message indicates your token is either incorrect or expired. Please follow the instructions on the ESS-DIVE Dataset API page to retrieve a new token.
{"detail":"One or more fields raised validation errors.","errors":["provider/member 'familyName' is a required property"]}
This error message indicates a required field is missing from your JSON. In this case it is the "familyName". Revise your JSON to include the mandatory fields.
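A quick local check can catch this kind of omission before submitting. The sketch below looks for familyName in any list of person objects, such as the creators list. The helper name is illustrative, and the single-field check is an assumption based on the error above, not the full set of fields the API requires.

```python
def find_missing_family_names(people):
    """Return the indexes of person entries that lack a non-empty familyName."""
    return [i for i, person in enumerate(people) if not person.get("familyName")]


people = [
    {"givenName": "Paul J", "familyName": "Hanson"},
    {"givenName": "Jeffrey"},  # familyName left out on purpose
]
print(find_missing_family_names(people))  # prints [1]
```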
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-28-7ae5b841a5b0> in <module>
      5
      6 files_tuples_array.append((("json-ld", json.dumps(json_ld))))
----> 7 files_tuples_array.append(("data", open(file_directory ,'rb')))
      8
      9 post_packages_url ="{}{}".format(base,endpoint)

FileNotFoundError: [Errno 2] No such file or directory: 'trials.csv'
This error message indicates that the file you specified was not found. This could be because the path points to the wrong directory or because the file name is misspelled.
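Checking the paths before opening them makes this failure easier to diagnose. A minimal sketch using only the standard library follows; missing_files is an illustrative helper name, and 'trials.csv' is the file name from the traceback above.

```python
import os


def missing_files(paths):
    """Return the paths that do not point to an existing regular file."""
    return [p for p in paths if not os.path.isfile(p)]


problems = missing_files(["trials.csv"])
if problems:
    # Relative paths are resolved against the current working directory
    print("Not found (relative to {}): {}".format(os.getcwd(), problems))
```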