How to Query Data

A workflow for data discovery that walks through the intended use of the Deep Dive API search endpoints.

See it in action: Check out our Jupyter Notebook tutorial with ready-to-run code examples using the Deep Dive API. (Tip: launch the tutorial directly in your browser with the Google Colab button.)

Deep Dive API Interactive Documentation

The documentation at fusion.ess-dive.lbl.gov includes technical details, expected schema, error information, and interactive query parameters for the endpoints available through the Deep Dive API. We recommend starting out with this interactive documentation to help familiarize yourself with query parameters and outputs.

While programmatic experience can be helpful, it is not required.

1. Expand Query-Data endpoint

Let’s first expand the Query-Data endpoint details.

This section shows the available search parameters, parameter descriptions, expected format, and editable value boxes. Select the “Try it out” button highlighted in orange to edit the values. From here, we can easily enter parameters to test out search queries.

fieldName refers to the column or row name (also called the field name or variable name) in the data file, which comes from the dataset's Data Dictionary. The Data Dictionary is a component of the File Level Metadata Reporting Format. Search terms entered in the fieldName parameter do not have to be an exact match.

The fieldDefinition parameter provides the definitions of each field name included in the data files.

2. Enter example Query-Data search

Let’s execute the following search query (Table 1) to look for data within this public dataset, which uses reporting formats and is available in the Deep Dive API.

Roley S ; Hall Jr. R O H J; Garayburu-Caruso V A ; Perkins W A ; Stegen J C (2023): Data and scripts associated with "Coupled primary production and respiration in a large river contrasts with smaller rivers and streams.". River Corridor and Watershed Biogeochemistry SFA, ESS-DIVE repository. Dataset. doi:10.15485/1985922 accessed via https://data.ess-dive.lbl.gov/datasets/doi:10.15485/1985922 on 2024-06-24

Copy the values in Table 1 and enter the query parameters as follows:

| Parameter Name | Value | Logic |
|----------------|-------|-------|
| doi | doi:10.15485/1985922 | This is the DOI of the specific dataset that may have relevant data. It must be entered in the format of doi:{DOI number}. |
| fieldName | temp | I want to look for temperature data, but I’m not sure what the exact field name in the file is. |
| recordCountMin | 20 | I need at least 20 data points for the data to be useful for my research. |

Table 1: Example Query-Data parameters

Our example parameters will look like this once completed (Fig 3). Select the "Execute" button to query the Deep Dive API.
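The same search can also be prepared outside the interactive documentation. The sketch below builds the request URL from the Table 1 parameters using only the Python standard library. The base URL and the `build_query_url` helper are assumptions made for illustration; confirm the actual endpoint address in the interactive documentation before relying on it.

```python
from urllib.parse import urlencode

# Assumed base URL for the Query-Data endpoint; confirm it in the
# interactive documentation before use.
DEEP_DIVE_URL = "https://fusion.ess-dive.lbl.gov/api/v1/deepdive"


def build_query_url(params, base_url=DEEP_DIVE_URL):
    """Return a Query-Data URL, skipping parameters that were left empty."""
    filled = {
        name: value
        for name, value in params.items()
        if value not in (None, "")
    }
    return f"{base_url}?{urlencode(filled)}"


# Parameters from Table 1.
table_1 = {
    "doi": "doi:10.15485/1985922",
    "fieldName": "temp",
    "recordCountMin": 20,
}

query_url = build_query_url(table_1)
print(query_url)
```

The resulting URL can be passed to any HTTP client. Note that `urlencode` percent-encodes the colon and slash in the DOI, which is the standard treatment for query-string values.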

💡 Try out different searches on other Reporting Format datasets

Find all datasets using reporting formats in ESS-DIVE's Reporting Format data portal.

2. Interpret the Query-Data response

The responses section (Fig 4) will show you the search results, which are referred to here as the Server Response (see reference material).

In the response body, the pageCount value tells us that 4 fields (e.g. column/row headers) were found by this query. The results were found in at least one file.

next and previous show us which page of the results is currently displayed in the response body. A page is simply defined by the pageSize limit. In our example, one page will show up to 25 results. Given that we received 4 search results, the current response contains all results and we do not have to page through them.
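For searches that return more results than fit on one page, paging can be automated. The following is a minimal sketch, assuming that `next` holds the URL of the following page (or is empty on the last page) and that each page lists its matches under a `results` key. Both are assumptions about the response layout, and `collect_all_results` is a hypothetical helper, so check them against the schema in the interactive documentation.

```python
def collect_all_results(first_page, fetch_page):
    """Follow `next` links until none remain and return every result.

    `fetch_page` is any callable that takes a URL and returns the decoded
    JSON response as a dict. The `results` and `next` keys are assumed.
    """
    results = list(first_page.get("results", []))
    next_url = first_page.get("next")
    while next_url:
        page = fetch_page(next_url)
        results.extend(page.get("results", []))
        next_url = page.get("next")
    return results


# Stand-in pages so the sketch runs without network access.
fake_pages = {
    "page-2": {"results": ["c"], "next": None, "previous": "page-1"},
}
first = {"results": ["a", "b"], "next": "page-2", "previous": None}

all_results = collect_all_results(first, fake_pages.__getitem__)
print(all_results)
```

In our example query, all 4 results fit on the first page, so the loop body would never run and the first page's results would be returned unchanged.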

Let’s take a look at the results starting with first field name: temp_C.

fieldName refers to the column or row name (also called the field name or variable name) in the data file, which comes from the dataset's Data Dictionary.

Detailed information about this column/row is provided, including the unit, definition, and data_type. There are almost 80,000 data points available under this one column/row (total_record_count), and the data values range from a minimum of 1.02℃ to a maximum of 23.28℃ (values_summary). Additionally, there are no missing values in this column/row (missing_values_count).

💡 This information can help you determine whether this data is relevant for your research.
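This kind of relevance check can be written down as a small filter. The sketch below uses the total_record_count and missing_values_count values described above; the `is_useful` helper, its thresholds, and the example record are illustrative assumptions, and the record's values are placeholders rather than real API output.

```python
def is_useful(field, min_records=20, max_missing=0):
    """Return True if a field record has enough data points and few gaps."""
    return (
        field.get("total_record_count", 0) >= min_records
        and field.get("missing_values_count", 0) <= max_missing
    )


# Illustrative record: key names follow the response described above,
# values are placeholders rather than real API output.
example_field = {
    "fieldName": "temp_C",
    "total_record_count": 1000,
    "missing_values_count": 0,
}

print(is_useful(example_field))
print(is_useful(example_field, min_records=5000))
```

Adjust the thresholds to match your own requirements, for example by allowing a small number of missing values.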

💡 The remaining four values in the response can be used to locate the exact file and dataset version where the temp_C column/row can be found for download.

The version number (also called an ESS-DIVE Identifier) is a unique ID given to all datasets on ESS-DIVE. This number points to the exact version of the dataset associated with this data at the time the file was published, so you can always find the data should changes be made. The provided DOI is a persistent identifier that points to the latest version of the dataset. Additionally, we can see the CSV file name (data_file) and the file_path where the column/row can be found.

To find this file in your browser, you can enter the DOI in the search bar (doi.org/10.15485/1985922) or you can look up the version number in ESS-DIVE using the standard URL format: data.ess-dive.lbl.gov/view/<ESS-DIVE Identifier>.
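The two browser lookups described above follow fixed URL patterns, so they are easy to generate from the values in a response. This is a small sketch; the helper names are hypothetical, and the `https://` scheme is added on the assumption that both sites are served over HTTPS.

```python
def doi_landing_url(doi):
    """Turn 'doi:10.15485/1985922' into a doi.org URL."""
    bare = doi[len("doi:"):] if doi.startswith("doi:") else doi
    return "https://doi.org/" + bare


def version_url(ess_dive_identifier):
    """Build the ESS-DIVE page URL for one exact dataset version."""
    return "https://data.ess-dive.lbl.gov/view/" + ess_dive_identifier


print(doi_landing_url("doi:10.15485/1985922"))
# Replace the placeholder with the version value from your response.
print(version_url("<ESS-DIVE Identifier>"))
```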

To find this file programmatically, we can use the data_file_url. Example code is available in our Jupyter Notebook Tutorial.

3. Enter example Get-Dataset-File request

Here we demonstrate how to use the Deep Dive API to locate relevant files that appeared in our search results.

As we did in step 1, first expand the Get-Dataset-File endpoint details and select the "Try it out" button to edit the parameter values (Fig 5).

Let’s execute the following search query (Table 2) using the DOI and file path associated with the temp_C field from our Query-Data results. Enter the parameters as follows:

| Parameter Name | Value | Logic |
|----------------|-------|-------|
| doi | doi:10.15485/1985922 | This is the DOI of the dataset that contains the file I am interested in looking up. It must be entered in the format of doi:{DOI number}. |
| file_path | Roley_CR_Metabolism_Data_Package.zip/DO_temp_sensor_data.csv | I found this file in the results of my Query-Data search. I want to see a summary of all the data in this file (by column/row name), find the size of the file, and/or get the URL to directly download this file. |

Table 2: Example Get-Dataset-File parameters

Our example parameters will look like this once completed (Fig 5). Select the "Execute" button to query the Deep Dive API.
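As with the Query-Data search, this request can be prepared in code. The sketch below assumes the Get-Dataset-File route has the shape `<base URL>/<doi>:<file_path>`; both the base URL and the route shape are assumptions, and `build_file_url` is a hypothetical helper, so verify the exact path in the interactive documentation.

```python
from urllib.parse import quote

# Assumed base URL; confirm it in the interactive documentation.
DEEP_DIVE_URL = "https://fusion.ess-dive.lbl.gov/api/v1/deepdive"


def build_file_url(doi, file_path, base_url=DEEP_DIVE_URL):
    """Build a Get-Dataset-File URL (assumed shape: <base>/<doi>:<file_path>)."""
    # Keep ':' and '/' readable in the DOI and '/' in the path, while
    # percent-encoding characters such as spaces.
    safe_doi = quote(doi, safe=":/")
    safe_path = quote(file_path, safe="/")
    return f"{base_url}/{safe_doi}:{safe_path}"


# Parameters from Table 2.
file_url = build_file_url(
    "doi:10.15485/1985922",
    "Roley_CR_Metabolism_Data_Package.zip/DO_temp_sensor_data.csv",
)
print(file_url)
```

Taking the file_path directly from a Query-Data result, rather than typing it by hand, avoids mismatches in the archive and file names.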

4. Interpret the Get-Dataset-File response

The Get-Dataset-File responses section (Fig 4) will show you the search results, which are referred to here as the Server Response (see reference material).

You'll notice that the Get-Dataset-File response contains all of the information returned by the Query-Data endpoint, with the addition of a few new values (Fig 6). It contains the dataset DOI, file name and the ESS-DIVE identifier that corresponds to the latest version of the dataset.

Then, under fields, you'll find a list of all the column/row headers in the specified file. It also lists the same detailed information about the data within the column/row header that Query-Data provides.

💡 The power of the Get-Dataset-File endpoint is that it allows you to find all column/row information for an entire file at once. This provides more context about the data, allowing you to determine whether the file could be relevant to your research.

Another key difference is the data_download value in this endpoint's response, which provides the size of the file in bytes (contentSize), the file format (encoding_format), the file version (identifier), and a URL that points directly to the file (contentURL).

💡 With these file specifications, you can make an informed decision on whether the file size and format are within your expected constraints for data use.
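A check like this can be applied before downloading anything. The sketch below reads the contentSize and encoding_format values described above; the `fits_constraints` helper and the example record are hypothetical, the values are placeholders rather than real API output, and the exact form of the encoding_format value (shown here as a MIME type) is an assumption.

```python
def fits_constraints(data_download, max_bytes, allowed_formats):
    """Return True if the file is small enough and in an accepted format."""
    size = int(data_download["contentSize"])
    file_format = data_download["encoding_format"]
    return size <= max_bytes and file_format in allowed_formats


# Illustrative record: key names follow the response described above,
# values are placeholders rather than real API output.
example_download = {
    "contentSize": 2048,
    "encoding_format": "text/csv",
    "identifier": "<file identifier>",
    "contentURL": "<direct download URL>",
}

ok_to_download = fits_constraints(
    example_download,
    max_bytes=10 * 1024 * 1024,  # accept files up to 10 MiB
    allowed_formats={"text/csv"},
)
print(ok_to_download)
```

If the check passes, the contentURL value is the link to hand to your download tool of choice.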

Where to find definitions for Get-Dataset-File responses

The server response for this query follows the DatasetFile schema. All definitions for the fields in this response are listed in the "Schema" > "DatasetFile" section of the Deep Dive API documentation.

Click the response value you are interested in to expand its definition and expected formatting.
