# How to Query Data

{% hint style="success" %}
**See it in action:** Check out our [Jupyter Notebook tutorial](https://github.com/ess-dive/essdive-tutorials/blob/main/search_data/Using_Data_with_Dataset_DeepDiveAPI_Python.ipynb) with ready-to-run code examples using the Deep Dive API. *(Tip: launch the tutorial directly in your browser with the* [*Google Colab button*](https://github.com/ess-dive/essdive-tutorials/tree/main)*)*&#x20;
{% endhint %}

## Deep Dive API Interactive Documentation

The documentation at [fusion.ess-dive.lbl.gov](https://fusion.ess-dive.lbl.gov/) includes technical details, expected schema, error information, and interactive query parameters for the endpoints available through the Deep Dive API. **We recommend starting out with this interactive documentation to help familiarize yourself with query parameters and outputs.**

While programming experience can be helpful, it is not required.

<details>

<summary>Reference Material: What's under each endpoint dropdown?</summary>

When you expand an endpoint, each one has **four major sections**. Here we provide a brief explanation. The sections in <mark style="color:green;">green</mark> are most relevant to this demonstration.

* **Header:** This displays the expected format for the base URL (e.g. `/api/v1/deepdive`) and the first sentence in the dropdown describes what the endpoint does.\
  &#x20;
* <mark style="color:green;">**Parameters:**</mark> This section provides a list of the available search parameters that can be input into the endpoint. \
  If a parameter must be filled out, it will say "<mark style="color:red;">\* required</mark>". Each parameter has a **definition** that explains what the parameter does. For your convenience, the screenshots on this page can be used to reference all available parameters and their definitions. <br>
* <mark style="color:green;">**Server Responses (interactive):**</mark> By default, this section is not visible until you select "Try it out" and execute an example search (steps 1-2). \
  The interactive response section lists a copy of the search request you made (e.g., the Curl command and Request URL) and the **server response** (i.e., the search result). **Definitions of the results in the server response can be found in the Schemas** section of the interactive documentation (visible in Fig 1; see step 4 for a demonstration). <br>
* **Responses (general)**: By default, this section is always visible in the Responses.\
  This lists the *possible* response codes and response messages that could be returned. This is useful reference material for debugging searches when writing your own code. For this demonstration, we will only be looking at successful responses (status code 200).&#x20;

![](https://3166205607-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LgU7GrufgIOpY1Ufd5R%2Fuploads%2Fonb8SvnksKGOZAYVKE6C%2FEndpoint-Structure-Description-Parameters.png?alt=media\&token=d00fb3f7-7de5-4ee5-825a-b3c5d62b09e0)

*Header and Parameters*

![](https://3166205607-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LgU7GrufgIOpY1Ufd5R%2Fuploads%2F1X0uB2wsppuqX54DanRl%2FEndpoint-Structure-Responses-Interactive.png?alt=media\&token=e1c21f6a-372d-4f35-b0ad-b9805292b5d6)![](https://3166205607-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LgU7GrufgIOpY1Ufd5R%2Fuploads%2F3KjPVYIjH7CTZ9TXM5ki%2FEndpoint-Structure-Responses-General.png?alt=media\&token=625e51f8-0623-4320-a464-4feec909fc8c)

*Interactive Server Responses                                     General Details about Responses*

</details>

<figure><img src="https://3166205607-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LgU7GrufgIOpY1Ufd5R%2Fuploads%2FjirdSngl91rY5nHbpGwD%2FDeep-Dive-Docs.png?alt=media&#x26;token=21be6c7d-0ffc-486e-bc8d-a78d68d3126c" alt=""><figcaption><p>Figure 1: Documentation at fusion.ess-dive.lbl.gov</p></figcaption></figure>

## 1. Expand Query-Data endpoint &#x20;

Let’s first expand the Query-Data endpoint details.

This section shows the available search parameters, parameter descriptions, expected format, and editable value boxes. Select the “Try it out” button highlighted in orange to edit the values. From here, we can easily enter parameters to test out search queries.&#x20;

<figure><img src="https://3166205607-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LgU7GrufgIOpY1Ufd5R%2Fuploads%2FvKSEVNQfCJxzcsc6yOlY%2FDeep-Dive-Expand-Endpoint.png?alt=media&#x26;token=2206f879-162c-43e5-a645-2bdbe669fdf1" alt=""><figcaption><p>Figure 2: Select the Query-Data endpoint to expand interactive query parameters</p></figcaption></figure>

{% hint style="info" %}
`fieldName` refers to the column or row name (also called the field name or variable name) in the data file, which comes from the dataset's [Data Dictionary](https://ess-dive.gitbook.io/file-level-metadata-reporting-format/csv_dd). The Data Dictionary is a component of the File Level Metadata Reporting Format. Search terms entered in the `fieldName` parameter do not have to be an exact match.&#x20;

The `fieldDefinition` parameter provides definitions of each field name included in the data files.&#x20;
{% endhint %}

## 2. Enter example Query-Data search

Let’s execute the following search query (Table 1) to look for data within this public dataset, which uses reporting formats and is available in the Deep Dive API.&#x20;

> Roley S ; Hall Jr. R O H J; Garayburu-Caruso V A ; Perkins W A ; Stegen J C (2023): Data and scripts associated with "Coupled primary production and respiration in a large river contrasts with smaller rivers and streams.". River Corridor and Watershed Biogeochemistry SFA, ESS-DIVE repository. Dataset. doi:10.15485/1985922 accessed via [https://data.ess-dive.lbl.gov/datasets/doi:10.15485/1985922 on 2024-06-24](https://data.ess-dive.lbl.gov/datasets/doi:10.15485/1985922)

Copy the values in Table 1 and enter the query parameters as follows:

<table><thead><tr><th width="192">Parameter Name</th><th width="127">Value</th><th>Logic</th></tr></thead><tbody><tr><td><code>doi</code></td><td>doi:10.15485/1985922</td><td>This is the DOI of the specific dataset that may have relevant data. It must be entered in the format of <code>doi:{DOI number}</code>.</td></tr><tr><td><code>fieldName</code></td><td>temp</td><td>I want to look for temperature data, but I’m not sure what the exact field name in the file is.</td></tr><tr><td><code>recordCountMin</code></td><td>20</td><td>I need at least 20 data points for the data to be useful for my research.</td></tr></tbody></table>

*Table 1: Example Query-Data parameters*

Our example parameters will look like this once completed (Fig 3). Select the "Execute" button to query the Deep Dive API.

<figure><img src="https://3166205607-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LgU7GrufgIOpY1Ufd5R%2Fuploads%2Fv5VSTK1Pw92QDCBwfZ04%2FDeep-Dive-Query-Data-Search.png?alt=media&#x26;token=801a7170-92eb-4701-afe2-7e2dedea50b3" alt=""><figcaption><p>Figure 3: Search parameters with the example query</p></figcaption></figure>
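The same search can also be expressed as a request URL. The sketch below is a minimal illustration of how the Table 1 parameters could be encoded. It assumes the endpoint lives at the documentation host combined with the example base path `/api/v1/deepdive`, and `build_query_url` is a hypothetical helper name; confirm the actual address against the Request URL shown in the interactive documentation.

```python
from urllib.parse import urlencode

# Assumption: documentation host plus the example base path from the docs.
BASE_URL = "https://fusion.ess-dive.lbl.gov/api/v1/deepdive"


def build_query_url(doi, field_name, record_count_min, base_url=BASE_URL):
    """Return a Query-Data request URL for the given search parameters."""
    params = {
        "doi": doi,                        # must use the doi:{DOI number} format
        "fieldName": field_name,           # does not have to be an exact match
        "recordCountMin": record_count_min,
    }
    # urlencode percent-encodes the ":" and "/" characters in the DOI.
    return f"{base_url}?{urlencode(params)}"


url = build_query_url("doi:10.15485/1985922", "temp", 20)
print(url)
```

After you select "Execute", compare the printed URL with the Request URL listed in the interactive response section to check that the parameters match.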

{% hint style="info" %}

#### :bulb: Try out different searches on other Reporting Format datasets

Find all datasets using reporting formats in [ESS-DIVE's Reporting Format data portal](https://data.ess-dive.lbl.gov/portals/reporting-formats).
{% endhint %}

## 2. Interpret the Query-Data response

The responses section (Fig 4) shows the search results, referred to here as the Server Response (see [reference material](#reference-material-whats-under-each-endpoint-dropdown)).&#x20;

In the response body, the `pageCount` value tells us that 4 fields (i.e., column/row headers) were found by this query. The results were found in at least one file.

`next` and `previous` show us which page of the results is currently displayed in the response body. A page is defined by the `pageSize` limit; in our example, one page shows up to 25 results. Because we received only 4 search results, the current response contains all of them and we do not have to page through the results.&#x20;
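When a search returns more results than fit on one page, the paging values can be followed in code. The sketch below is a minimal illustration, assuming that `next` holds the URL of the following page (and is empty on the last page) and that matches are listed under `results`. Check both assumptions against the Schemas section before relying on them. `collect_all_results` and `fetch_json` are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Download one page and parse its JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def collect_all_results(first_url, fetch_page=fetch_json):
    """Gather `results` from every page by following `next` until it is empty."""
    results = []
    url = first_url
    while url:
        page = fetch_page(url)
        results.extend(page.get("results", []))
        url = page.get("next")  # assumed: next page URL, or None on the last page
    return results


# Offline demonstration with two made-up pages instead of real API calls.
fake_pages = {
    "page-1": {"results": ["a", "b"], "next": "page-2"},
    "page-2": {"results": ["c"], "next": None},
}
print(collect_all_results("page-1", fake_pages.__getitem__))
```

Passing the page-fetching function as an argument keeps the loop easy to try without network access; with a live query you would call `collect_all_results` with the first request URL only.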

Let’s take a look at the `results`, starting with the first field name: `temp_C`.

<figure><img src="https://3166205607-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LgU7GrufgIOpY1Ufd5R%2Fuploads%2FbhxC7Q4wZBw4q0dO6rkm%2Fdeepdive-query-data-response.png?alt=media&#x26;token=d9749868-bfef-4ebf-a5fe-c0bdd6756bce" alt=""><figcaption><p>Figure 4: Successful response from the example query</p></figcaption></figure>

{% hint style="info" %}
`fieldName` refers to the column or row name (also called the field name or variable name) in the data file, which comes from the dataset's [Data Dictionary](https://ess-dive.gitbook.io/file-level-metadata-reporting-format/csv_dd).
{% endhint %}

Detailed information about this column/row is provided, including the `unit`, `definition`, and `data_type`. There are almost 80,000 data points available under this one column/row (`total_record_count`), and the data values range from a minimum of 1.02℃ to a maximum of 23.28℃ (`values_summary`). Additionally, there are no missing values in this column/row (`missing_values_count`).&#x20;

:bulb: This information can help you determine whether this data is relevant for your research.
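This kind of screening can also be done in code. The sketch below is a minimal illustration that keeps only fields with enough records and no missing values. The keys `total_record_count` and `missing_values_count` come from the response described above; the `field_name` key, the helper name `is_useful`, and the sample records are assumptions made for illustration.

```python
def is_useful(field, min_records=20, max_missing=0):
    """Return True when a field has enough records and few enough gaps."""
    total = int(field.get("total_record_count", 0))
    missing = int(field.get("missing_values_count", 0))
    return total >= min_records and missing <= max_missing


# Made-up records for illustration only; they are not copied from the API.
sample_results = [
    {"field_name": "water_temp", "unit": "degree_Celsius",
     "total_record_count": 5000, "missing_values_count": 0},
    {"field_name": "air_temp", "unit": "degree_Celsius",
     "total_record_count": 12, "missing_values_count": 3},
]

useful = [f["field_name"] for f in sample_results if is_useful(f)]
print(useful)
```

Adjust `min_records` and `max_missing` to match the needs of your own analysis.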

:bulb: The remaining four values in the response can be used to locate the exact file and dataset version where this column/row `temp_C` can be found for download.

The `version` number (also called an ESS-DIVE Identifier) is a unique ID assigned to every dataset on ESS-DIVE. This number points to the exact version of the dataset associated with this data at the time the file was published, so you can always find the data even if the dataset changes later. The provided `doi` number is a persistent identifier that points to the *latest* version of the dataset. Additionally, we can see the CSV file name (`data_file`) and the `file_path` where the column/row can be found.&#x20;

**To find this file in your browser**, you can enter the DOI in the search bar (doi.org/10.15485/1985922) or you can look up the version number in ESS-DIVE using the standard URL format: `data.ess-dive.lbl.gov/view/<ESS-DIVE Identifier>`.
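Both browser addresses can be assembled from the values in the response. The sketch below is a minimal illustration of the two URL formats described above; the function names are hypothetical, the `https://` scheme is an assumption, and the identifier in the usage line is a placeholder rather than a real ESS-DIVE Identifier.

```python
def doi_landing_url(doi):
    """Build a doi.org address, dropping an optional leading 'doi:' prefix."""
    number = doi[4:] if doi.lower().startswith("doi:") else doi
    return f"https://doi.org/{number}"


def essdive_view_url(identifier):
    """Build the ESS-DIVE address for one specific dataset version."""
    return f"https://data.ess-dive.lbl.gov/view/{identifier}"


print(doi_landing_url("doi:10.15485/1985922"))
print(essdive_view_url("<ESS-DIVE Identifier>"))  # replace with the `version` value
```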

**To find this file programmatically**, we can use the `data_file_url`. <mark style="color:green;">**Example code is available in our**</mark> [<mark style="color:green;">**Jupyter Notebook Tutorial**</mark>](https://github.com/ess-dive/essdive-tutorials/blob/main/search_data/Using_Data_with_Dataset_DeepDiveAPI_Python.ipynb)<mark style="color:green;">**.**</mark>

## 3. Enter example Get-Dataset-File request

Here we demonstrate how to use the Deep Dive API to locate relevant files that appeared in our search results.&#x20;

As we did in step 1, first expand the Get-Dataset-File endpoint details and select the "Try it out" button to edit the parameter values (Fig 5).

Let’s execute the following search query (Table 2) using the DOI and file path associated with the `temp_C` field from our Query-Data results. Enter the parameters as follows:

<table><thead><tr><th width="180">Parameter Name</th><th width="261">Value</th><th>Logic</th></tr></thead><tbody><tr><td><code>doi</code></td><td>doi:10.15485/1985922</td><td>This is the DOI of the dataset that contains the file I am interested in looking up. It must be entered in the format of <code>doi:{DOI number}</code>.</td></tr><tr><td><code>file_path</code></td><td>Roley_CR_Metabolism_Data_Package.zip/DO_temp_sensor_data.csv</td><td>I found this file in the results of my Query-Data search. I want to see a summary of all the data in this file (by header/row name), find the size of the file, and/or get the URL to directly download this file.</td></tr></tbody></table>

*Table 2: Example Get-Dataset-File parameters*

Our example parameters will look like this once completed (Fig 5). Select the "Execute" button to query the Deep Dive API.

<figure><img src="https://3166205607-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LgU7GrufgIOpY1Ufd5R%2Fuploads%2FLidsQxwMVIDKKN3IoTsY%2Fdeepdive-get-data-file-search.png?alt=media&#x26;token=3c9e8cf0-33f1-4f2c-82f6-32f9d9ac4d6e" alt=""><figcaption><p>Figure 5: Search parameters with the example query</p></figcaption></figure>
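When working in code, the two Get-Dataset-File parameters can be taken straight from a Query-Data result. The sketch below is a minimal illustration, assuming each result carries the `doi` and `file_path` values described in the Query-Data response; `file_lookup_params` is a hypothetical helper name, and the sample record reuses the values from Table 2.

```python
def file_lookup_params(result):
    """Extract the Get-Dataset-File parameters from one Query-Data result."""
    missing = [key for key in ("doi", "file_path") if not result.get(key)]
    if missing:
        raise KeyError(f"result is missing: {', '.join(missing)}")
    doi = result["doi"]
    # The endpoint expects the doi:{DOI number} format.
    if not doi.lower().startswith("doi:"):
        doi = f"doi:{doi}"
    return {"doi": doi, "file_path": result["file_path"]}


result = {
    "doi": "doi:10.15485/1985922",
    "file_path": "Roley_CR_Metabolism_Data_Package.zip/DO_temp_sensor_data.csv",
}
params = file_lookup_params(result)
print(params)
```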

## 4. Interpret the Get-Dataset-File response

The Get-Dataset-File responses section (Fig 6) shows the search results, referred to here as the Server Response (see [reference material](#reference-material-whats-under-each-endpoint-dropdown)).&#x20;

You'll notice that the Get-Dataset-File response contains all of the information returned by the Query-Data endpoint, with the addition of a few new values (Fig 6). It contains the dataset DOI, file name and the ESS-DIVE identifier that corresponds to the latest version of the dataset.

<figure><img src="https://3166205607-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LgU7GrufgIOpY1Ufd5R%2Fuploads%2FLl5nL1MGYPCjIxWTQcE7%2Fdeepdive-get-data-file-response.png?alt=media&#x26;token=420e0026-76f2-4036-a044-399c0ad41860" alt=""><figcaption><p>Figure 6: Successful response from the data file lookup</p></figcaption></figure>

Then, under `fields`, you'll find a list of *all the column/row headers in the specified file*. It also lists the same detailed information about the data within the column/row header that Query-Data provides.&#x20;

:bulb: The power of the Get-Dataset-File endpoint is that it allows you to find all column/row information for an entire file at once. This provides more context about the data, allowing you to determine whether the file could be relevant to your research.&#x20;
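One way to use this file-wide view is to check whether a single file holds every variable you need. The sketch below is a minimal illustration that looks for partial, case-insensitive matches among the entries under `fields`. The `field_name` key, the helper name `find_fields`, and the sample file contents are assumptions made for illustration.

```python
def find_fields(dataset_file, terms):
    """Map each search term to the field names in the file that contain it."""
    names = [f.get("field_name", "") for f in dataset_file.get("fields", [])]
    return {
        term: [name for name in names if term.lower() in name.lower()]
        for term in terms
    }


# Made-up file summary for illustration only.
sample_file = {
    "fields": [
        {"field_name": "timestamp"},
        {"field_name": "dissolved_oxygen"},
        {"field_name": "temp_C"},
    ]
}

matches = find_fields(sample_file, ["temp", "oxygen", "depth"])
print(matches)
```

A term that maps to an empty list is not present in the file, which suggests looking at other files in the dataset.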

Another key difference is that this endpoint returns `data_download`, which provides the size of the file in bytes (`contentSize`), the file format (`encoding_format`), the file version (`identifier`), and a URL that points directly to the file (`contentURL`).&#x20;

:bulb: These file specifications let you make an informed decision on whether the file size and format are within your expected constraints for data use.&#x20;
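A size check like this can be automated before downloading. The sketch below is a minimal illustration, assuming `data_download` is available as a dictionary with the `contentSize` and `contentURL` values described above and that the URL can be read without further authentication. The helper names, the 50 MB limit, and the example record are hypothetical.

```python
import shutil
from urllib.request import urlopen


def within_size_limit(data_download, max_bytes):
    """Return True when the reported file size does not exceed max_bytes."""
    return int(data_download["contentSize"]) <= max_bytes


def download(url, destination):
    """Stream the file at url to a local path and return that path."""
    with urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination


# Hypothetical values; take the real ones from the Get-Dataset-File response.
example = {"contentSize": 1048576, "contentURL": "https://example.org/data.csv"}

if within_size_limit(example, 50 * 1024 * 1024):
    print("File is small enough to download")
    # download(example["contentURL"], "data.csv")  # uncomment with a real URL
```

Because the example file path sits inside a `.zip` package, check what the URL actually returns before parsing it as a CSV.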

<details>

<summary>Where to find definitions for Get-Dataset-File responses</summary>

The [server response](#reference-material-whats-under-each-endpoint-dropdown) for this query follows the `DatasetFile` schema. All definitions for the fields in this response are listed in the "Schemas" > "DatasetFile" section of the [Deep Dive API documentation](https://fusion.ess-dive.lbl.gov/).&#x20;

Click the response value you are interested in to expand its definition and expected formatting.&#x20;

![](https://3166205607-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LgU7GrufgIOpY1Ufd5R%2Fuploads%2FOpRBtwias2IN8enaGC11%2Fdeep-dive-api-schemas-dataset-fields.png?alt=media\&token=d0630a34-642a-4bd3-8d9b-61f05a7b9387)

</details>
