Use this page to help decide how you will upload your dataset to ESS-DIVE by considering file upload limits and dataset organization.
A condensed summary of our Data Submission Guidelines is available on our website. You can find more details throughout this Guide to Using ESS-DIVE.
Use Table 1 to decide which submission tool is best suited for creating your dataset. The next section summarizes the tools in more detail.
ESS-DIVE has three tools for uploading data, and each tool limits how much data it can upload at one time. If the total size of the data files in your dataset exceeds the upload limit listed in Table 1, consider uploading the files in batches.
| Submission Tool | Total Upload Size Limit | Why Use This Tool? |
|---|---|---|
| Submission Form | < 10 GB | Self-managed process using the ESS-DIVE data portal. Easiest for managing small numbers of files and datasets |
| Dataset API | < 500 GB | Self-managed process and most time efficient for uploading many data files at once (to one or multiple datasets) using programmatic tools |
| Globus | | User friendly web service for automated high performance transfers, including support for hierarchical folders and very large datasets |

Table 1: ESS-DIVE's submission tools and their upload limits.
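The choice described in Table 1, and the advice to upload in batches, can be sketched in a few lines of Python. This is only an illustration and not an ESS-DIVE tool: the function names are hypothetical, the sketch assumes 1 GB = 10^9 bytes, and it assumes Globus is the fallback for anything at or above 500 GB, which Table 1 does not state explicitly.

```python
from pathlib import Path

GB = 10**9  # assumption: decimal gigabytes

# Limits from Table 1.
WEB_FORM_LIMIT = 10 * GB
API_LIMIT = 500 * GB


def file_sizes(folder):
    """Return (path, size in bytes) for every file under a folder."""
    return [
        (str(p), p.stat().st_size)
        for p in sorted(Path(folder).rglob("*"))
        if p.is_file()
    ]


def suggest_tool(total_bytes):
    """Suggest a submission tool for a given total upload size."""
    if total_bytes < WEB_FORM_LIMIT:
        return "Submission Form"
    if total_bytes < API_LIMIT:
        return "Dataset API"
    # Assumption: larger uploads go through Globus with ESS-DIVE's help.
    return "Globus"


def make_batches(sizes, limit):
    """Greedily group (name, size) pairs so each batch stays below limit."""
    batches, current, current_size = [], [], 0
    for name, size in sizes:
        if size >= limit:
            raise ValueError(f"{name} alone reaches the upload limit")
        # Start a new batch when adding this file would reach the limit.
        if current and current_size + size >= limit:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

For example, `make_batches(file_sizes("my_dataset"), WEB_FORM_LIMIT)` would split the files in a local folder (here a hypothetical `my_dataset`) into groups that each stay under the 10 GB limit. The greedy grouping keeps files in their sorted order and does not try to minimize the number of batches.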
ESS-DIVE can store datasets with data volumes of more than a terabyte.
If your project needs to upload more than a terabyte of data, contact [email protected] in advance so that ESS-DIVE can prepare to accommodate it.
This section gives a brief overview of the submission tools available for creating and editing datasets on ESS-DIVE.
The ESS-DIVE data submission web form is the easiest way to submit small datasets. For step-by-step instructions on how to create a dataset, refer to the how-to guide linked below and ESS-DIVE's Tutorial Videos. When completing the required metadata fields, use our Dataset Requirements guide, which was created using the community-developed NCEAS FAIR data standards to ensure that our repository contains high-quality, useful data.
ESS-DIVE's Dataset API allows you to programmatically submit many datasets at once. Detailed tutorials and example code for dataset submissions with the API are provided both in the Dataset API guide (linked below) and ESS-DIVE's API Examples GitHub repository. Example code is available in Python, Java, and R. Examples of the expected metadata schema are available at https://api.ess-dive.lbl.gov/.
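As a rough illustration of what a programmatic submission involves, the sketch below builds, but does not send, an HTTP request carrying dataset metadata. Only the base URL comes from the text above. The `packages` endpoint path, the bearer-token header, and the JSON-LD field names are assumptions made for this example, so check them against the Dataset API guide and the published metadata schema before relying on them.

```python
import json
import urllib.request

BASE_URL = "https://api.ess-dive.lbl.gov/"  # from the guide text
ENDPOINT = "packages"  # assumption: verify in the Dataset API guide


def build_metadata(name, description, creators):
    """Build a minimal schema.org-style metadata record (illustrative fields)."""
    return {
        "@context": "http://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "creator": [
            {"@type": "Person", "givenName": given, "familyName": family}
            for given, family in creators
        ],
    }


def build_request(token, metadata):
    """Prepare a POST request for the metadata without sending it."""
    body = json.dumps(metadata).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=body,
        method="POST",
        headers={
            # Assumption: the API expects a bearer token in this header.
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Looping over a list of metadata records and calling `build_request` for each is one way to prepare many datasets at once. Sending a request (for example with `urllib.request.urlopen`) requires a valid token, and the real schema has required fields that this minimal record omits.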
Globus is a cloud-based data transfer service designed to move significant amounts of data and does not require writing code. Globus can help you upload data to ESS-DIVE, but it cannot be used to curate metadata. You'll need to create your dataset and curate metadata via the Submission Form or the Dataset API. When using Globus, it is necessary to work with the ESS-DIVE Team to complete data file uploads.
If you have immediate questions about uploading data using Globus, or you are encountering upload issues with data volumes of less than 500 GB, contact the ESS-DIVE Support Team ([email protected]) to discuss your upload options, including whether Globus is appropriate.
Detailed instructions for getting set up with and using Globus will be available soon.
Datasets on ESS-DIVE contain related data and metadata files. Each dataset should contain all the relevant data and metadata necessary for a general user to be able to understand and reuse the data.
All data generated in the scientific process may be worth preserving, including raw data, processed data that has gone through extensive QA/QC and transformations, and results of analyses. DataONE has a good summary of best practices for deciding what data to preserve (https://www.dataone.org/best-practices/decide-what-data-preserve).
Our general recommendation is to publish data that has the greatest potential to be scientifically useful to others, and to deep archive the rest of the data for reproducibility of the results. Consider that your data may be reused in other studies, and include enough descriptive information so that others could understand your data in the future. The Digital Curation Centre (DCC) has a list of potential future purposes for data (http://www.dcc.ac.uk/resources/how-guides/five-steps-decide-what-data-keep#4), which also may be helpful in determining what data to publish.
Additionally, we ask that data contributors adhere to existing data standards or data reporting formats when applicable in order to make the data stored in ESS-DIVE as useful as possible. To learn more about reporting formats, visit the following page in our guide.
The ESS-DIVE publication process begins with a user gathering files to be included in a dataset and uploading them via the Data Submission Form or Dataset API. The user specifies metadata associated with the dataset, including author and citation information, as well as related references. The user then submits the dataset, which saves the metadata and accompanying data files to ESS-DIVE as a private dataset. The dataset can be revised as often as needed. We recommend that you use the Automated Quality Reports to verify that your package will pass the ESS-DIVE criteria for publication.
When all the revisions are complete, you can publish the dataset. Published data are made available through the ESS-DIVE and DataONE search catalogs, can be downloaded by the public, and can be modified after publication. To request publication, select the "Publish" button on the dataset landing page. After a publication request is received, the ESS-DIVE team reviews the dataset and assigns a unique Digital Object Identifier (DOI) if an existing DOI is not provided. The ESS-DIVE team will email you about any changes that need to be made to the dataset before publication. Publication takes from a few days to a few weeks, depending on the complexity and quality of the dataset. Incomplete submissions and slow responses to review requests will delay publication.