How to Query Data
A workflow for data discovery that walks through the intended use of the Deep Dive API search endpoints.
The interactive API documentation includes technical details, expected schemas, error information, and editable query parameters for the endpoints available through the Deep Dive API. We recommend starting with this interactive documentation to familiarize yourself with query parameters and outputs.
While programming experience can be helpful, it is not required.
Let’s first expand the Query-Data endpoint details.
This section shows the available search parameters, parameter descriptions, expected format, and editable value boxes. Select the “Try it out” button highlighted in orange to edit the values. From here, we can easily enter parameters to test out search queries.
fieldName refers to the column or row name (also called the field name or variable name) in the data file, which comes from the dataset's Data Dictionary. The Data Dictionary is a component of the File Level Metadata Reporting Format. Search terms entered in the fieldName parameter do not have to be an exact match.
The fieldDefinition parameter provides the definition of each field name included in the data files.
Let’s execute the following search query (Table 1) to look for data within this public dataset, which uses reporting formats and is available in the Deep Dive API.
Roley S ; Hall Jr. R O H J; Garayburu-Caruso V A ; Perkins W A ; Stegen J C (2023): Data and scripts associated with "Coupled primary production and respiration in a large river contrasts with smaller rivers and streams.". River Corridor and Watershed Biogeochemistry SFA, ESS-DIVE repository. Dataset. doi:10.15485/1985922 accessed via https://data.ess-dive.lbl.gov/datasets/doi:10.15485/1985922 on 2024-06-24
Copy the values in Table 1 and enter the query parameters as follows:
| Parameter | Value | Description |
| --- | --- | --- |
| doi | doi:10.15485/1985922 | This is the DOI of the specific dataset that may have relevant data. It must be entered in the format of doi:{DOI number}. |
| fieldName | temp | I want to look for temperature data, but I’m not sure what the exact field name in the file is. |
| recordCountMin | 20 | I need at least 20 data points for the data to be useful for my research. |
Table 1: Example Query-Data parameters
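For readers who prefer to script the same query, the following is a minimal sketch of how the Table 1 parameters could be turned into a request URL. The base URL is a placeholder and `build_query_url` is a hypothetical helper, not part of the Deep Dive API; copy the real Query-Data URL from the interactive documentation.

```python
from urllib.parse import urlencode

# Placeholder (assumption): replace with the Query-Data endpoint URL shown
# in the interactive Deep Dive API documentation.
QUERY_DATA_URL = "https://example.org/deepdive/query-data"


def build_query_url(base_url: str, **params) -> str:
    """Return base_url with all non-None parameters appended as a query string."""
    cleaned = {key: value for key, value in params.items() if value is not None}
    return f"{base_url}?{urlencode(cleaned)}"


url = build_query_url(
    QUERY_DATA_URL,
    doi="doi:10.15485/1985922",  # must use the doi:{DOI number} format
    fieldName="temp",            # partial search term; exact name not required
    recordCountMin=20,           # only fields with at least 20 data points
)
print(url)
```

`urlencode` percent-encodes the colon and slash in the DOI, so the value can be passed exactly as it appears in Table 1.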
Our example parameters will look like this once completed (Fig 3). Select the "Execute" button to query the Deep Dive API.
Find all datasets using reporting formats in ESS-DIVE's Reporting Format data portal.
The responses section (Fig 4) will show you the search results, which are referred to here as the Server Response (see reference material).
In the response body, the pageCount value tells us that 4 fields (e.g., column/row headers) were found by this query. The results were found in at least one file.
next and previous show us which page of the results is currently displayed in the response body. A page is simply defined by the pageSize limit. In our example, one page will show up to 25 results. Given that we received 4 search results, the current response contains all results and we do not have to page through them.
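When a query returns more results than fit on one page, the paging described above could be handled with a loop like the sketch below. It assumes that next holds the URL of the following page (or is empty on the last page) and that each page keeps its items under a "results" key; both are assumptions, so check the response schema in the interactive documentation. The `fetch` argument is a stand-in for whatever HTTP call you use.

```python
from typing import Callable, Iterator, Optional


def iter_pages(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every page of results by following `next` until it is empty."""
    url: Optional[str] = first_url
    while url:
        page = fetch(url)
        yield page
        # Assumption: `next` is the URL of the following page, or None/absent.
        url = page.get("next")


# Offline stand-in for real HTTP calls; the "results" key, the page URLs,
# and the field names other than temp_C are made up for illustration.
fake_pages = {
    "page-1": {"results": ["temp_C", "field_2"], "next": "page-2", "previous": None},
    "page-2": {"results": ["field_3"], "next": None, "previous": "page-1"},
}

field_names = [
    name
    for page in iter_pages("page-1", fake_pages.__getitem__)
    for name in page["results"]
]
print(field_names)
```

Passing the fetch function in as an argument keeps the loop independent of any particular HTTP library and makes it easy to try offline.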
Let’s take a look at the results, starting with the first field name: temp_C.
fieldName refers to the column or row name (also called the field name or variable name) in the data file, which comes from the dataset's Data Dictionary.
Detailed information about this column/row is provided, including the unit, definition, and data_type. There are almost 80,000 data points available under this one column/row (total_record_count), and the data values range from a minimum of 1.02 ℃ to a maximum of 23.28 ℃ (values_summary). Additionally, there are no missing values in this column/row (missing_values_count).
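The same reading of a result can be done in code. The sketch below condenses one record into a single line; the key names follow the ones mentioned in this walkthrough, but the inner structure of values_summary (here "min" and "max"), the unit string, and the record count are illustrative assumptions rather than actual API output.

```python
def describe_field(record: dict) -> str:
    """Summarize one Query-Data result record in a single line."""
    summary = record.get("values_summary") or {}
    return (
        f"{record['fieldName']} [{record.get('unit', 'unit unknown')}]: "
        f"{record.get('total_record_count', 0)} records, "
        f"{record.get('missing_values_count', 0)} missing, "
        f"range {summary.get('min')} to {summary.get('max')}"
    )


# Illustrative record only; it is not copied from a real server response.
sample_record = {
    "fieldName": "temp_C",
    "unit": "degree_Celsius",       # assumed unit string
    "total_record_count": 79000,    # placeholder for "almost 80,000"
    "missing_values_count": 0,
    "values_summary": {"min": 1.02, "max": 23.28},  # assumed key names
}
print(describe_field(sample_record))
```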
The version number (also called an ESS-DIVE Identifier) is a unique ID given to all datasets on ESS-DIVE. This number points to the exact version of the dataset associated with this data at the time the file was published, so you can always find the data should changes be made. The provided doi number is a persistent identifier to the latest version of the dataset. Additionally, we can see the CSV file name (data_file) and the file_path where the column/row can be found.
To find this file in your browser, you can enter the DOI in the search bar (doi.org/10.15485/1985922) or you can look up the version number in ESS-DIVE using the standard URL format: data.ess-dive.lbl.gov/view/<ESS-DIVE Identifier>.
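Both browser URLs can also be assembled from the values in a result record. This is a small sketch using the two URL patterns given above; the helper names are hypothetical, the https scheme is assumed, and the identifier passed in should be the version value taken from your own search result.

```python
def doi_landing_url(doi: str) -> str:
    """Turn 'doi:10.15485/1985922' (or a bare DOI) into a doi.org URL."""
    bare = doi[len("doi:"):] if doi.startswith("doi:") else doi
    return f"https://doi.org/{bare}"


def version_landing_url(ess_dive_identifier: str) -> str:
    """Build the ESS-DIVE page URL for one exact dataset version."""
    return f"https://data.ess-dive.lbl.gov/view/{ess_dive_identifier}"


print(doi_landing_url("doi:10.15485/1985922"))
```

The DOI link always resolves to the latest version of the dataset, while the version link stays pinned to the version that contained the file when it was published.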
To find this file programmatically, we can use the data_file_url. Code example coming soon.
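Until the official example is available, here is a minimal, unofficial sketch of one way to save a file from its data_file_url using only the Python standard library. It assumes the URL is directly downloadable without authentication, which should hold for public datasets but is not confirmed here; `download_file` is a hypothetical helper.

```python
import shutil
from pathlib import Path
from urllib.request import urlopen


def download_file(data_file_url: str, destination: Path) -> int:
    """Stream the file at data_file_url to destination; return its size in bytes."""
    with urlopen(data_file_url) as response, open(destination, "wb") as out:
        # Copy in chunks so large files are never held in memory at once.
        shutil.copyfileobj(response, out)
    return destination.stat().st_size


# Usage (the URL comes from the data_file_url value of a search result):
# size = download_file(record["data_file_url"], Path("DO_temp_sensor_data.csv"))
```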
Here we demonstrate how to use the Deep Dive API to locate relevant files that appeared in our search results.
As we did in step 1, first expand the Get-Dataset-File endpoint details and select the "Try it out" button to edit the parameter values (Fig 5).
Let’s execute the following search query (Table 2) using the DOI and file path associated with the temp_C field from our Query-Data results. Enter the parameters as follows:
| Parameter | Value | Description |
| --- | --- | --- |
| doi | doi:10.15485/1985922 | This is the DOI of the file that I am interested in looking up. It must be entered in the format of doi:{DOI number}. |
| file_path | Roley_CR_Metabolism_Data_Package.zip/DO_temp_sensor_data.csv | I found this file in the results of my Query-Data search. I want to see a summary of all the data in this file (by header/row name), find the size of the file, and/or get the URL to directly download this file. |
Table 2: Example Get-Dataset-File parameters
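The Table 2 values come straight from a Query-Data result, so the hand-off can be sketched as a small function. It assumes each result record exposes doi and file_path keys, as described in this walkthrough; `file_lookup_params` is a hypothetical helper, and how the parameters are sent to Get-Dataset-File should be taken from the interactive documentation.

```python
def file_lookup_params(record: dict) -> dict:
    """Extract the Get-Dataset-File parameters from one Query-Data record."""
    doi = record["doi"]
    # The endpoint expects the doi:{DOI number} format.
    if not doi.startswith("doi:"):
        doi = f"doi:{doi}"
    return {"doi": doi, "file_path": record["file_path"]}


# Record trimmed to the two keys this sketch needs.
temp_c_record = {
    "doi": "doi:10.15485/1985922",
    "file_path": "Roley_CR_Metabolism_Data_Package.zip/DO_temp_sensor_data.csv",
}
print(file_lookup_params(temp_c_record))
```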
Our example parameters will look like this once completed (Fig 5). Select the "Execute" button to query the Deep Dive API.
The Get-Dataset-File responses section (Fig 4) will show you the search results, which are referred to here as the Server Response (see reference material).
You'll notice that the Get-Dataset-File response contains all of the information returned by the Query-Data endpoint, with the addition of a few new values (Fig 6). It contains the dataset DOI, file name and the ESS-DIVE identifier that corresponds to the latest version of the dataset.
Then, under fields, you'll find a list of all the column/row headers in the specified file. It also lists the same detailed information about the data within the column/row header that Query-Data provides.
Another key difference of this endpoint is the data_download return, which provides the size of the file in bytes (contentSize), the file format (encoding_format), the file version (identifier), and a URL that points directly to the file (contentURL).
This information can help you determine whether this data is relevant for your research.
The remaining four values in the response can then be used to locate the exact file and dataset version where the temp_C column/row can be found for download.
The power of the Get-Dataset-File endpoint is that it allows you to find all column/row information for an entire file at once. This provides more context about the data, allowing you to determine whether the file could be relevant to your research.
Because the response also provides file specifications, you can make an informed decision on whether the file size and format are within your expected constraints for data use.
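The size and format check described above could be automated along these lines. This sketch assumes data_download is a dictionary with the contentSize and encoding_format keys named in this walkthrough; the sample values, including the "text/csv" format string, are illustrative and not taken from a real response.

```python
def file_is_usable(data_download: dict, max_bytes: int, allowed_formats: set) -> bool:
    """Return True if the file is small enough and in an accepted format."""
    size = int(data_download["contentSize"])  # file size in bytes
    file_format = str(data_download["encoding_format"]).lower()
    return size <= max_bytes and file_format in allowed_formats


# Illustrative values only.
sample_download = {"contentSize": 5_000_000, "encoding_format": "text/csv"}

print(file_is_usable(sample_download, max_bytes=10_000_000, allowed_formats={"text/csv"}))
```

Running this check before downloading lets you skip files that are too large or in a format your tools cannot read.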