Link to External Data Sources
Connect data files and metadata stored in other repositories to your ESS-DIVE dataset. Read more to learn how External Linking works and when to use it.
External Linking allows data contributors to connect data files and metadata stored in other repositories to ESS-DIVE datasets. It enables data to be stored where it makes the most sense scientifically and practically, while still following the ESS Data Management and Sharing Policy requirement to store searchable metadata on ESS-DIVE.
An external link is easily accessible from an ESS-DIVE dataset landing page and has a clearly defined relationship to the dataset. With this feature, your ESS-DIVE dataset can be linked to data files and metadata that have already been published on another established data repository or to data files that cannot be uploaded to ESS-DIVE.
Not all external data or metadata is suitable for external linking on ESS-DIVE. All externally linked datasets will be reviewed by the ESS-DIVE Team before publication, and your external links may not be accepted. In this documentation, we will review the types of external data and metadata that are acceptable to link to, as well as how to add external links to your dataset.
Datasets with external links will have an external link table underneath the existing file table. The external link table follows a specific format and each link will have the following three components:
A brief but meaningful description for this web page,
The relationship type and its definition, and
The URL where the data/metadata resides.
ESS-DIVE uses shared vocabulary from Schema.org to define external linking relationship types. This vocabulary is controlled, machine readable, and can be understood by most major search engines. ESS-DIVE has approved three Schema.org vocabularies for use with external linking. For more information, see the Relationship Types section.
There are numerous reasons why linking out to external data sources may be advantageous for certain dataset publications. In this section, we will review the reasons why external linking may be needed and the common examples in which it is used.
Store your data where it makes the most sense scientifically or for practical reasons (e.g. project data archive), while also complying with ESS data policy to at least store metadata describing your data on ESS-DIVE.
Generally enhance the discoverability of your dataset by making it searchable on ESS-DIVE, along with other ESS project data stored on ESS-DIVE.
Your data file storage volume is greater than 500GB and is too large to upload directly to ESS-DIVE. See more details under the Large Data Files examples.
Your published data product is complex, or involves analysis tools used on another platform, and can be more easily accessed from the external source (e.g. model code stored on GitHub). See more details under the Data Analysis Platforms examples.
If your data product is not represented in this section, please refer to the Relationship Types section to see if one of the available relationships suits your data product.
External data stored in valid data repositories that provide long-term storage and stewardship does not need to be uploaded to ESS-DIVE. However, if your project is funded by the DOE ESS program, you must at least store metadata describing the data on ESS-DIVE, with links to the published dataset. Some data repositories or data systems that are approved for external linking are:
Environmental Data Initiative (EDI)
National Microbiome Data Collaborative (NMDC)
USGS ScienceBase
If your data is stored in a repository not listed here, please reach out to the ESS-DIVE Support Team at ess-dive-support@lbl.gov to find out if that repository can be linked to your ESS-DIVE datasets.
Some ESS projects have archives where they upload and manage their data products. If these data are publicly accessible, you may take advantage of our external linking feature to store metadata on ESS-DIVE, with links to the original project data archive.
Some examples of project archives are:
Spruce and Peatland Responses Under Changing Environments (SPRUCE)
Next-Generation Ecosystem Experiment (NGEE) Arctic
Next-Generation Ecosystem Experiment (NGEE) Tropics
In cases where the project archive does not issue DOIs or is not a long-term data preservation repository, you will need to upload your data files to your ESS-DIVE datasets.
Some open-access data sources distribute data in a manner that provides useful scientific context to datasets. In some cases, it may be necessary to use External Linking to associate a dataset to the original data analysis platform. Some examples of such data sources are:
If you are linking to data on a platform that does not issue DOIs, you must upload a copy of your data files to ESS-DIVE.
For certain datasets, the associated data files are too large to upload to and/or download from ESS-DIVE. In these cases, you have two options: (1) upload your data to an established long-term data storage service that can accept large data, or (2) work with the ESS-DIVE Team to upload the data to an external data source managed by the ESS-DIVE Team, which we call Large Data Storage.
Data files greater than 500GB cannot be uploaded to or downloaded from ESS-DIVE. Please contact the ESS-DIVE Team at ess-dive-support@lbl.gov if you have a dataset with more than 500GB of data, or if you have a large number of data files.
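As a quick local check before contacting the team, the total size of a dataset folder can be compared against the 500GB threshold. The sketch below is illustrative only: it assumes that 500GB means 500 × 10^9 bytes and that the limit applies to the total volume of the folder, and the function names are hypothetical rather than part of any ESS-DIVE tool.

```python
from pathlib import Path

# ASSUMPTION: "500GB" is read as 500 * 10**9 bytes (decimal gigabytes).
SIZE_LIMIT_BYTES = 500 * 10**9


def total_size_bytes(root):
    """Sum the sizes of all regular files under root, including subfolders."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def needs_large_data_support(root, limit_bytes=SIZE_LIMIT_BYTES):
    """Return True when the folder's total size is above the limit."""
    return total_size_bytes(root) > limit_bytes
```

Calling `needs_large_data_support("path/to/dataset")` on your own data folder returns `True` when the folder is over the assumed threshold, which is a signal to email ess-dive-support@lbl.gov before attempting an upload.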
ESS-DIVE currently supports three types of relationships between ESS-DIVE datasets and external data sources. In this section we list the approved relationships for your reference. You can read through the following descriptions in Table 1 to see if your external data can be linked to your ESS-DIVE dataset.
If you think your data product falls into one of these three categories, email the ESS-DIVE Team at ess-dive-support@lbl.gov and we will help associate your data with the appropriate type.
Relationship | Description |
---|---|
same as | Original publication of this dataset (where the data and metadata can be found). The DOI URL of this dataset, starting with 'https://doi.org/', redirects to the original source. |
archived at | A complete copy of the data files in the dataset resides in an external source. |
has part | One or more files that are part of the dataset are held outside of the repository. This could be a link to an individual file or a directory. |
Table 1: Lists ESS-DIVE's current external linking relationship types and their descriptions
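To make the structure concrete, one row of the external link table (description, relationship type, URL) can be modeled as a small record that only accepts the three relationships in Table 1. This is a minimal sketch, not ESS-DIVE code: the class name, field names, and validation rules are hypothetical, and only the mapping from each relationship label to its Schema.org property name (sameAs, archivedAt, hasPart) comes from this documentation.

```python
from dataclasses import dataclass

# Table 1 relationship labels mapped to their Schema.org property names.
RELATIONSHIP_PROPERTIES = {
    "same as": "sameAs",
    "archived at": "archivedAt",
    "has part": "hasPart",
}


@dataclass(frozen=True)
class ExternalLink:
    """Hypothetical model of one row in the external link table."""

    description: str   # brief but meaningful description of the web page
    relationship: str  # one of the labels in Table 1
    url: str           # where the data/metadata resides

    def __post_init__(self):
        if not self.description.strip():
            raise ValueError("description must not be empty")
        if self.relationship not in RELATIONSHIP_PROPERTIES:
            raise ValueError(f"unsupported relationship: {self.relationship!r}")
        if not self.url.startswith(("http://", "https://")):
            raise ValueError("url must be an http(s) URL")

    @property
    def schema_property(self):
        """Schema.org property name that corresponds to this relationship."""
        return RELATIONSHIP_PROPERTIES[self.relationship]


# Placeholder URL for illustration only.
link = ExternalLink(
    description="Complete copy of the data files in the project archive",
    relationship="archived at",
    url="https://example.org/archive/dataset-123",
)
```

Rejecting any label outside Table 1 mirrors the controlled vocabulary described above; a relationship that is not listed is a reason to contact the ESS-DIVE Team rather than to invent a new label.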
If none of these relationship types fits your data product, we have the infrastructure to accommodate your needs. Please email the ESS-DIVE Support Team (ess-dive-support@lbl.gov) and share your use case with us. We can discuss expanding the list of acceptable external data relationships to accommodate your use case.
External Linking is managed by the ESS-DIVE Team. To link your dataset to an external data product, please contact ess-dive-support@lbl.gov to start the process. This section outlines the steps involved in making and completing the request.
Email ess-dive-support@lbl.gov to start a request, with details on why you need external linking and any existing DOI(s), if you have them.
When the request is initiated, the ESS-DIVE Team will ask that you provide a brief but meaningful description for each web page, as described in the External Link Table section.
During this stage, the ESS-DIVE Team will also evaluate whether your external data/metadata is suitable to be externally linked. Not all requests are approved.
If there is an existing DOI, the ESS-DIVE Team uses a programmatic tool to transfer the metadata from the original repository (e.g. EDI) to one or more new ESS-DIVE dataset(s).
If there is NOT an existing DOI, you will create a new dataset for your data and send the associated ESS-DIVE identifier to the ESS-DIVE Team (e.g. ess-dive-ea25aaddf4b47a3-20211103T152543118).
The ESS-DIVE Team will populate the External Links table in your dataset, including the appropriate relationship type(s).
The ESS-DIVE Team will select an appropriate relationship from Table 1 and apply it based on your dataset publication needs.
The ESS-DIVE Team will send you the dataset URL and give you ownership.
Review the dataset and complete your metadata. Our automated assessment report can help in your review. If you had an existing DOI and transferred your metadata, note that programmatic metadata transfers are not comprehensive and there will be missing information in your dataset.
Request to publish the dataset.
ESS-DIVE's Package Service API can be used to programmatically and autonomously add external links to your datasets. The process for adding external links to datasets via the API only differs slightly from the standard tutorial for creating or updating datasets with the API. In these instructions, we will reference and expand on our existing tutorials. Detailed instructions on the use of the Package Service API can be found in the Package Service Tutorials.
Head to our Package Service Tutorial documentation page to learn how to get started with the API for the first time
Follow the Setup instructions
Create new or copy your existing dataset JSON-LD (i.e. your dataset metadata)
If you are creating a dataset for the first time, skip to Create Metadata and follow the instructions for creating JSON-LD for your dataset metadata
If you are updating a dataset, you can use the Get a Single Dataset code example to copy the JSON-LD for your existing dataset
Once you have your JSON-LD, you can now append the external linking schema onto it. Read through the available relationship types (Table 1) and decide which suits your dataset.
To learn how to format your selected external linking relationship into your JSON-LD, navigate to ESS-DIVE's technical API documentation and locate your selected relationship in the dataset schema (Figure 2): hasPart, archivedAt, and/or sameAs.
An example of each relationship's format is provided in code snippets at the end of this section (all code snippets are in Python)
Once you have added your external links, submit your dataset using the API!
If you are submitting a new dataset without data files, skip to Create a Dataset > Metadata Only and copy the code examples
If you are submitting a new dataset with data files, skip to Create a Dataset > Metadata and Data Files or Metadata and Many Data Files and copy the code example
If you are updating an existing dataset metadata (without data files), skip to Update a Dataset > Metadata Only section and copy the code example
Head to ESS-DIVE (https://data.ess-dive.lbl.gov/) and request to publish your dataset
During the publication process, your external links will be reviewed by the ESS-DIVE Team for suitability. At this stage, we will help refine your external links as needed or we may determine that your external data/metadata are not suitable for external linking. Not all external links will be approved.
An example of the has part relationship JSON-LD schema.
An example of the archived at relationship JSON-LD schema.
An example of the same as relationship JSON-LD schema.
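The three relationships captioned above could be appended to a dataset's JSON-LD roughly as follows. Treat this as a hedged sketch rather than the official format: the property names hasPart, archivedAt, and sameAs come from this documentation, but the shape of each value is an assumption based on the types Schema.org lists for those properties (CreativeWork for hasPart, WebPage or URL for archivedAt, URL for sameAs). The helper name and the example.org URLs are hypothetical. Confirm the exact structure against the dataset schema in ESS-DIVE's technical API documentation before submitting.

```python
import copy


def add_external_links(json_ld, has_part=None, archived_at=None, same_as=None):
    """Return a copy of the dataset JSON-LD with external link properties added.

    has_part and archived_at are lists of (url, description) pairs;
    same_as is a single URL string. The input dictionary is not modified.
    ASSUMPTION: value shapes follow Schema.org's expected types and may
    differ from what the ESS-DIVE dataset schema requires.
    """
    updated = copy.deepcopy(json_ld)
    if has_part:
        updated["hasPart"] = [
            {"@type": "CreativeWork", "url": url, "description": text}
            for url, text in has_part
        ]
    if archived_at:
        updated["archivedAt"] = [
            {"@type": "WebPage", "url": url, "description": text}
            for url, text in archived_at
        ]
    if same_as:
        updated["sameAs"] = same_as
    return updated


# Placeholder metadata; a real dataset JSON-LD has many more fields.
dataset = {
    "@context": "http://schema.org/",
    "@type": "Dataset",
    "name": "Example dataset title",
}

linked = add_external_links(
    dataset,
    has_part=[("https://example.org/archive/model-output/",
               "Model output files held in the project archive")],
    archived_at=[("https://example.org/archive/dataset-123",
                  "Complete copy of the data files in the dataset")],
    same_as="https://doi.org/10.3334/CDIAC/SPRUCE.042",
)
```

All three arguments are passed here only to show each relationship in one place; in practice you would supply just the relationship or relationships that apply to your dataset, then submit the resulting JSON-LD with the tutorial code referenced in the steps above.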
All externally linked datasets must have complete metadata on ESS-DIVE.
Externally linked datasets may or may not have data files attached, depending on your use case and needs.
For example, datasets that link to data services or repositories providing long-term storage and stewardship of data are not required to include data files on ESS-DIVE.
When data is originally published elsewhere with a DOI, the DOI URL will resolve to the original external data source.
By default, when a dataset is originally published on another repository, the standard DOI URL (e.g. https://doi.org/10.3334/CDIAC/SPRUCE.042) will not direct users to ESS-DIVE. Please contact the ESS-DIVE Support Team (ess-dive-support@lbl.gov) if you would like your DOI to resolve to your ESS-DIVE dataset.