Large Data Support

Learn about ESS-DIVE's large data support tools and how they're used to upload and publish large data.

ESS-DIVE now has a Tier 2 data storage service to support publishing very large, hierarchical datasets that can be directly accessed from our repository. ESS-DIVE uses Globus, a data transfer service, to make it easier to upload large data to ESS-DIVE. The Tier 2 and Globus services are set up offline with close assistance from the ESS-DIVE Team.

How to contact ESS-DIVE about publishing large data:

For large data support, please email us at [email protected] with the following information about your data:

  1. What's the total file volume of your dataset?

  2. Approximately how many files are in your dataset and what's the range of file sizes?

  3. Is the data structure hierarchical? If yes:

    1. Can you easily flatten your data structure (i.e. move data out of folders)? Or

    2. Can you compress the folders into ZIP files or will it be necessary to browse the folder hierarchies?

  4. Where is your data currently stored (e.g., local desktop, cloud-based server, Google Drive)?
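
To gather answers to the first three questions, a short script can walk the dataset folder and report the totals. The following is a minimal sketch in Python using only the standard library. The function names `summarize_dataset` and `human_size` are illustrative and are not part of any ESS-DIVE tool, and "hierarchical" is assumed here to mean that at least one file sits below the top-level folder.

```python
import os
from pathlib import Path


def summarize_dataset(root):
    """Walk `root` and collect total volume, file count, and size range.

    Hypothetical helper for illustration; not an ESS-DIVE API.
    """
    root = Path(root)
    sizes = []
    hierarchical = False
    for dirpath, _dirnames, filenames in os.walk(root):
        # Assumption: a dataset counts as hierarchical if any file
        # lives in a subfolder rather than directly under `root`.
        if filenames and Path(dirpath) != root:
            hierarchical = True
        for name in filenames:
            sizes.append((Path(dirpath) / name).stat().st_size)
    return {
        "total_bytes": sum(sizes),
        "file_count": len(sizes),
        "min_bytes": min(sizes) if sizes else 0,
        "max_bytes": max(sizes) if sizes else 0,
        "hierarchical": hierarchical,
    }


def human_size(n_bytes):
    """Format a byte count with decimal units (1 GB = 10**9 bytes)."""
    size = float(n_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1000 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1000
```

For example, `summarize_dataset("path/to/dataset")` returns a dictionary whose `total_bytes` value can be passed to `human_size` before you include it in your email. The path shown is a placeholder for your own dataset folder.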

Globus: Upload Large Data

Globus (https://www.globus.org/) is a free, cloud-based data transfer service designed to move large amounts of data. ESS-DIVE uses this service to move data from your local desktop or an existing Globus endpoint to ESS-DIVE's storage services. Globus can be used to resolve common upload errors, and it is the default upload method for data greater than 500 GB.

Learn more about Globus, including how to use it to publish data on ESS-DIVE or to resolve upload issues, on our Globus documentation page.

Figure 1: The Globus file manager (pictured) is accessible via browser and is used as the primary interface for transferring data with Globus.

Tier 2: Storage for Large Data

Tier 2 (Figure 2) is ESS-DIVE's extended storage resource for very large, hierarchical datasets. It is an alternative to storing the data directly on ESS-DIVE's dataset landing pages, known as Tier 1 (Figure 3). Data greater than 500 GB in volume will be archived on Tier 2 by default. Tier 2 also lets you browse hierarchical folders in your browser before downloading.

Data stored on Tier 2 resources can be accessed and downloaded from the Tier 2 landing page (Figure 2). This is separate from ESS-DIVE's dataset landing page (Figure 3). You can choose to publish some or all of your dataset files on Tier 2.

Generally, data should be stored on Tier 1 whenever possible. ESS-DIVE is constantly expanding and improving features on Tier 1 that may not be supported on Tier 2. However, data smaller than 500 GB can be published on Tier 2 if necessary.

Any data contributor can use the Tier 2 service, even for data smaller than 500 GB. Please contact ESS-DIVE at [email protected] to discuss whether your data is suitable for Tier 2 storage.

Figure 2: Tier 2 landing page for large file exploration and download. Access to dataset metadata on Tier 1 is provided via link.
Figure 3: Tier 1 dataset landing page where metadata and data can be discovered and downloaded. Access to files on Tier 2 is provided via external link.

Data contributors must use the Globus transfer service to upload their data to Tier 2. Once the data is uploaded through Globus, ESS-DIVE will organize it and add further file metadata on the Tier 2 landing page. The data contributor will review and approve the data on Tier 2 prior to publication. At the time of publication, the data will be publicly accessible on the Globus "ESS-DIVE Public Share" collection as well as on the Tier 2 website. Additionally, external links to both Tier 2 and Globus will be added to the dataset metadata landing page for access and download (demonstrated in Figure 3).

Management and Preservation of Tier 2 Data

ESS-DIVE stores redundant copies of data published on Tier 2 resources to preserve and provide long-term access to Environmental Systems Science (ESS) research data.

Please be aware that, at this time, the following limitations apply to Tier 2 data:

  1. It will not be linked to the DataONE federation,

  2. It cannot be private, and

  3. Its downloads and views will not be factored into data package statistics.

How to Download Tier 2 Data

Download Data
