Python Example

This page provides coding examples for retrieving a list of submitted datasets and submitting dataset metadata for validation using Python.

While learning the expected metadata schema, use https://api-sandbox.ess-dive.lbl.gov, as this is the domain shown in the examples throughout this documentation. Once you've familiarized yourself with ESS-DIVE's metadata and dataset JSON-LD schema, use our production domain https://api.ess-dive.lbl.gov/ to submit datasets to ESS-DIVE for review and publishing.

For additional information about the API, review the documentation at https://api-sandbox.ess-dive.lbl.gov.

ESS-DIVE Test API URL: https://api-sandbox.ess-dive.lbl.gov
ESS-DIVE Production API URL: https://api.ess-dive.lbl.gov/
Help Desk: ess-dive-support@lbl.gov

Setup

Install the following Python module into your Python environment:

$ pip install requests

In a Python console or script, add the following lines to set up your script:

import requests
import os
import json

token = "<Enter your authorization token here>"
base = "https://api-sandbox.ess-dive.lbl.gov/"
header_authorization = "bearer {}".format(token)
endpoint = "packages"

Check that your token is up to date; it expires after a few days.
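
If you want to verify the token before going further, a quick authenticated request works as a sanity check. This is a minimal sketch; it assumes an expired or incorrect token produces a non-200 response carrying the "You do not have authorized access" detail shown in the Troubleshooting section.

# Optional: confirm the token is accepted before submitting anything
check_url = "{}{}".format(base, endpoint)
check_response = requests.get(check_url,
    headers={"Authorization": header_authorization})

if check_response.status_code == 200:
    print("Token accepted.")
else:
    # Likely an expired or incorrect token; see Troubleshooting
    print(check_response.text)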

Submit a Dataset

After creating metadata, you have the option to submit a dataset with only metadata or to submit a dataset with both metadata and data files.

Create Metadata

The following lines of code build the JSON-LD metadata for a single dataset. The example is taken from the ESS-DIVE sandbox site (see https://data-sandbox.ess-dive.lbl.gov/#view/doi:10.3334/CDIAC/spruce.001).

Set up the JSON for the "provider", which includes details about the project. Update the "value" field with the desired project identifier; you can look up project identifiers in ESS-DIVE's project list: https://data.ess-dive.lbl.gov/projects. The project will be listed as the publisher in the citation.

provider_spruce = {
    "identifier": {
        "@type": "PropertyValue",
        "propertyID": "ess-dive",
        "value": "1e6d50d3-9532-43fb-a63f-bdcb4350bf0c"
    }
}

Prepare the dataset authors in the order that you would like them to appear in the citation. Please add the ORCID for all authors, especially the first author, if possible.

creators = [
   {
     "@id": "http://orcid.org/0000-0001-7293-3561",
     "givenName": "Paul J",
     "familyName": "Hanson",
     "affiliation": "Oak Ridge National Laboratory",
     "email": "hansonpj@ornl.gov"
   },
   {
     "givenName": "Jeffrey",
     "familyName": "Riggs",
     "affiliation": "Oak Ridge National Laboratory"
   },
   {
     "givenName": "C",
     "familyName": "Nettles",
     "affiliation": "Oak Ridge National Laboratory"
   },
   {
     "givenName": "William",
     "familyName": "Dorrance",
     "affiliation": "Oak Ridge National Laboratory"
   },
   {
     "givenName": "Les",
     "familyName": "Hook",
     "affiliation": "Oak Ridge National Laboratory"
   }
 ]

Create the rest of the JSON-LD object

json_ld = {
 "@context": "http://schema.org/",
 "@type": "Dataset",
 "@id": "http://dx.doi.org/10.3334/CDIAC/spruce.001",
 "name": "SPRUCE S1 Bog Environmental Monitoring Data: 2010-2016",
 "description": [
   "This data set reports selected ambient environmental monitoring data from the S1 bog in Minnesota for the period June 2010 through December 2016. Measurements of the environmental conditions at these stations will serve as a pre-treatment baseline for experimental treatments and provide driver data for future modeling activities.",
   "The site is the S1 bog, a Picea mariana [black spruce] - Sphagnum spp. bog forest in northern Minnesota, 40 km north of Grand Rapids, in the USDA Forest Service Marcell Experimental Forest (MEF). There are/were three monitoring sites located in the bog: Stations 1 and 2 are co-located at the southern end of the bog and Station 3 is located north central and adjacent to an existing U.S. Forest Service monitoring well.",
   "There are eight data files with selected results of ambient environmental monitoring in the S1 bog for the period June 2010 through December 2016. One file has the ",
   "other seven have the available data for a given calendar year. Not all measurements started in June 2010 and EM3 measurements ended in May 2014.",
   "Further details about the data package are in the attached pdf file (SPRUCE_EM_DATA_2010_2016_20170620)."
 ],
 "creator": creators,
 "datePublished": "2015",
 "keywords": [
   "EARTH SCIENCE > BIOSPHERE > VEGETATION",
   "Climate Change"
 ],
 "variableMeasured": [
   "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC TEMPERATURE > SURFACE TEMPERATURE > AIR TEMPERATURE",
   "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC WATER VAPOR > WATER VAPOR INDICATORS > HUMIDITY > RELATIVE HUMIDITY",
   "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC PRESSURE > SEA LEVEL PRESSURE",
   "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC TEMPERATURE > SURFACE TEMPERATURE > DEW POINT TEMPERATURE > DEWPOINT DEPRESSION",
   "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC WINDS > SURFACE WINDS > WIND SPEED",
   "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC WINDS > SURFACE WINDS > WIND DIRECTION",
   "EARTH SCIENCE > BIOSPHERE > VEGETATION > PHOTOSYNTHETICALLY ACTIVE RADIATION",
   "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC RADIATION > NET RADIATION",
   "EARTH SCIENCE > LAND SURFACE > SURFACE RADIATIVE PROPERTIES > ALBEDO",
   "EARTH SCIENCE > LAND SURFACE > SOILS > SOIL TEMPERATURE",
   "Precipitation (Total)",
   "Irradiance",
   "Groundwater Temperature",
   "Groundwater Level",
   "Volumetric Water Content",
   "surface_albedo"
 ],
 "license": "http://creativecommons.org/licenses/by/4.0/",
 "spatialCoverage": [
   {
     "description": "Site ID: S1 Bog Site name: S1 Bog, Marcell Experimental Forest Description: The site is the 8.1-ha S1 bog, a Picea mariana [black spruce] - Sphagnum spp. ombrotrophic bog forest in northern Minnesota, 40 km north of Grand Rapids, in the USDA Forest Service Marcell Experimental Forest (MEF). The S1 bog was harvested in successive strip cuts in 1969 and 1974 and the cut areas were allowed to naturally regenerate. Stations 1 and 2 are located in a 1974 strip that is characterized by a medium density of 3-5 meter black spruce and larch trees with an open canopy. The area was suitable for siting a monitoring station for representative meteorological conditions on the S1 bog. Station 3 is located in a 1969 harvest strip that is characterized by a higher density of 3-5 meter black spruce and larch trees with a generally closed canopy. Measurements at this station represent conditions in the surrounding stand. Site Photographs are in the attached document",
     "geo": [
       {
         "name": "Northwest",
         "latitude": 47.50285,
         "longitude": -93.48283
       },
       {
         "name": "Southeast",
         "latitude": 47.50285,
         "longitude": -93.48283
       }
     ]
   }
 ],
 "funder": {
   "@id": "http://dx.doi.org/10.13039/100006206",
   "name": "U.S. DOE > Office of Science > Biological and Environmental Research (BER)"
 },
 "temporalCoverage": {
   "startDate": "2010-07-16",
   "endDate": "2016-12-31"
 },
 "editor": {
   "@id": "http://orcid.org/0000-0001-7293-3561",
   "givenName": "Paul J",
   "familyName": "Hanson",
   "email": "hansonpj@ornl.gov"
 },
 "provider": provider_spruce,
 "measurementTechnique": [
   "The stations are equipped with standard sensors for measuring meteorological parameters, solar radiation, soil temperature and moisture, and groundwater temperature and elevation. Note that some sensor locations are relative to nearby vegetation and bog microtopographic features (i.e., hollows and hummocks). See Table 1 in the attached pdf (SPRUCE_EM_DATA_2010_2016_20170620) for a list of measurements and further details. Sensors and data loggers were initially installed and became operational in June, July, and August of 2010. Additional sensors were added in September 2011. Station 3 was removed from service on May 12, 2014.",
   "These data are considered at Quality Level 1. Level 1 indicates an internally consistent data product that has been subjected to quality checks and data management procedures. Established calibration procedures were followed."
 ]
}

Please refer to the API documentation to understand the schema and navigate through any errors: https://api.ess-dive.lbl.gov

Metadata Only

Submit the JSON-LD object to the Dataset API

post_packages_url = "{}{}".format(base,endpoint)
post_package_response = requests.post(post_packages_url,
                                      headers={"Authorization":header_authorization},
                                      json=json_ld)

if post_package_response.status_code == 201:
    # Success
    response=post_package_response.json()
    print(f"View URL:{response['viewUrl']}")
    print(f"Name:{response['dataset']['name']}")
else:
    # There was an error
    print(post_package_response.text)

Metadata and Data

To submit the JSON-LD object along with a data file, build a multipart request that includes both the metadata and the file you want to upload:

files_tuples_array = []
upload_file = "path/to/your_file"

files_tuples_array.append(("json-ld", json.dumps(json_ld)))
files_tuples_array.append(("data", open(upload_file, 'rb')))

post_packages_url = "{}{}".format(base,endpoint)
post_package_response = requests.post(post_packages_url,
                                    headers={"Authorization":header_authorization},
                                    files= files_tuples_array)

if post_package_response.status_code == 201:
    # Success
    response=post_package_response.json()
    print(f"View URL:{response['viewUrl']}")
    print(f"Name:{response['dataset']['name']}")
else:
    # There was an error
    print(post_package_response.text)

Remember to change the file paths and file names to your actual values. The directory portion can be omitted if your script runs in the same directory as your file.

Metadata and Many Data Files

If you have many files to upload, place them all inside a single directory and use the following code:

files_tuples_array = []
files_upload_directory = "your_upload_directory/"
files = os.listdir(files_upload_directory)

files_tuples_array.append(("json-ld", json.dumps(json_ld)))

for filename in files:
    # Build each file's full path and attach it to the multipart request
    file_directory = os.path.join(files_upload_directory, filename)
    files_tuples_array.append(("data", open(file_directory, 'rb')))

post_packages_url = "{}{}".format(base,endpoint)
post_package_response = requests.post(post_packages_url,
                                    headers={"Authorization":header_authorization},
                                    files= files_tuples_array)

if post_package_response.status_code == 201:
   # Success
   response=post_package_response.json()
   print(f"View URL:{response['viewUrl']}")
   print(f"Name:{response['dataset']['name']}")
else:
   # There was an error
   print(post_package_response.text)

Search for Datasets

Anyone can search for public datasets on ESS-DIVE using the Dataset API. If you are registered to submit data, you can also search for your private datasets. Narrow your searches by passing query parameters.

This call returns limited dataset metadata and cannot be used to download data files. To look up all dataset metadata and download data files, use the Download Dataset Metadata call described below.

The following lines of code return the most recent 25 records. If the results contain more than 25 packages, use the row_start and page_size query parameters to page through them, as shown in the paging sketch after this example.

For data users: you must pass isPublic=true to search for public datasets; this call searches for private datasets by default.

For data contributors: pass isPublic=false (or omit the parameter) to search for your private datasets.

project = "Next-Generation Ecosystem Experiments (NGEE) Arctic"

get_packages_url = "{}{}?providerName=\"{}\"&isPublic=true".format(base,endpoint,project)
get_packages_response = requests.get(get_packages_url, 
    headers={"Authorization":header_authorization})

if get_packages_response.status_code == 200:
   #Success
   print(get_packages_response.json())
else:
   # There was an error
   print(get_packages_response.text)
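
To move beyond the first 25 records, add the row_start and page_size query parameters mentioned above. This is a minimal paging sketch; treating row_start as 1-based is an assumption here, so consult the API documentation for the exact semantics.

page_size = 25
row_start = 26  # assumed 1-based, i.e. this fetches the second page

get_packages_url = "{}{}?providerName=\"{}\"&isPublic=true&row_start={}&page_size={}".format(
    base, endpoint, project, row_start, page_size)
get_packages_response = requests.get(get_packages_url,
    headers={"Authorization":header_authorization})

if get_packages_response.status_code == 200:
   # Success
   print(get_packages_response.json())
else:
   # There was an error
   print(get_packages_response.text)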

Download Dataset Metadata

Anyone can search for individual public datasets on ESS-DIVE using the Dataset API. If you are registered to submit data, you can also download your private dataset metadata.

The response for this call includes the full dataset metadata along with its attached data files, which can then be downloaded. If you'd like to look up the dataset upload date, last modified date, or dataset access status, use the Search for Datasets call described above.

# ESS-DIVE Identifiers are in the format of: ess-dive-0f0348396e46261-20181022T131245032205
dataset_id = "<Enter an ESS-DIVE Identifier here>"

get_package_url = "{}{}/{}?isPublic=true".format(base,endpoint, dataset_id)
get_package_response = requests.get(get_package_url, 
    headers={"Authorization":header_authorization})

if get_package_response.status_code == 200:
   #Success
   print(get_package_response.json())
else:
   # There was an error
   print(get_package_response.text)
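
Once the metadata is retrieved, the attached data files can be downloaded from the dataset's distribution entries. The sketch below assumes each entry follows schema.org's DataDownload shape with "name" and "contentUrl" fields; inspect the actual response for your dataset before relying on these names.

response = get_package_response.json()

# Download each attached file (assumes schema.org DataDownload entries
# with "name" and "contentUrl" fields; verify against your response)
for distribution in response["dataset"].get("distribution", []):
    file_url = distribution.get("contentUrl")
    file_name = distribution.get("name", "downloaded_file")
    if not file_url:
        continue
    file_response = requests.get(file_url,
        headers={"Authorization":header_authorization})
    if file_response.status_code == 200:
        with open(file_name, "wb") as f:
            f.write(file_response.content)
        print(f"Downloaded: {file_name}")
    else:
        print(file_response.text)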

Update a Dataset

It is possible to update both the metadata and the data of an existing dataset. The following update scenarios are possible:

  • update metadata only

  • replace/add files only

  • update metadata and replace/add files

These examples will demonstrate both updating metadata and adding new files to the dataset created in previous sections.

Metadata Only

Use a PUT request to update the metadata of a dataset. This example updates the name of a dataset.

dataset_id = "<Enter an ESS-DIVE Identifier here>"

put_package_url = "{}{}/{}".format(base,endpoint, dataset_id)

metadata_update_dict = {"name": "Updated Dataset Name"}

put_package_response = requests.put(put_package_url,
                                    headers={"Authorization":header_authorization},
                                    json=metadata_update_dict)

Check the results for the changed metadata attribute

# Check for errors
if put_package_response.status_code == 200:
   # Success
   response=put_package_response.json()
   print(f"View URL:{response['viewUrl']}")
   print(f"Name:{response['dataset']['name']}")
else:
   # There was an error
   print(put_package_response.text)

Metadata and New Data File

Use a PUT request to update a dataset. This example updates the dataset's publication date to 2019 and adds a new data file.

dataset_id = "<Enter an ESS-DIVE Identifier here>"

# Metadata fields to update (the publication date, per the example above)
metadata_update_dict = {"datePublished": "2019"}

files_tuples_array = []
upload_file = "path/to/your_file"
files_tuples_array.append(("json-ld", json.dumps(metadata_update_dict)))
files_tuples_array.append(("data", open(upload_file, 'rb')))

put_package_url = "{}{}/{}".format(base,endpoint, dataset_id)

put_package_response = requests.put(put_package_url,
                                   headers={"Authorization":header_authorization},
                                   files= files_tuples_array)

Check the results for the changed metadata attribute and newly uploaded file

# Check for errors
if put_package_response.status_code == 200:
    # Success
    response=put_package_response.json()
    print(f"View URL:{response['viewUrl']}")
    print(f"Date Published:{response['dataset']['datePublished']}")
    print(f"Files In Dataset:{response['dataset']['distribution']}")
else:
   # There was an error
   print(put_package_response.text)

Troubleshooting

{"detail":"You do not have authorized access"}

This error message indicates your token is either incorrect or expired. Please follow the instructions on the ESS-DIVE Dataset API page to retrieve a new token.

{"detail":"One or more fields raised validation errors.","errors":["provider/member 'familyName' is a required property"]}

This error message indicates that a required field is missing from your JSON; in this case, it is "familyName". Revise your JSON to include the mandatory fields.
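
Because the error body is JSON with this shape, you can list each validation error on its own line instead of reading the raw text. A small sketch, assuming the response parses as JSON like the example above:

if post_package_response.status_code != 201:
    error_body = post_package_response.json()
    print(error_body.get("detail", "Unknown error"))
    # Print each reported validation error on its own line
    for error in error_body.get("errors", []):
        print(f" - {error}")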

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-28-7ae5b841a5b0> in <module>
      5 
      6 files_tuples_array.append((("json-ld", json.dumps(json_ld))))
----> 7 files_tuples_array.append(("data", open(file_directory ,'rb')))
      8 
      9 post_packages_url = "{}{}".format(base,endpoint)

FileNotFoundError: [Errno 2] No such file or directory: 'trials.csv'

This error message indicates that the file could not be found. This could be because you are looking in the wrong directory or because you mistyped the file name.
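
A defensive check before opening each path avoids failing mid-upload. This sketch reuses the variables from the "Metadata and Many Data Files" example and skips anything that is not a regular file:

for filename in files:
    file_directory = os.path.join(files_upload_directory, filename)
    # Skip subdirectories and warn about anything that is not a regular file
    if not os.path.isfile(file_directory):
        print(f"Skipping (not a file): {file_directory}")
        continue
    files_tuples_array.append(("data", open(file_directory, 'rb')))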
