
NIH Policy for Data Management and Sharing: Selecting a Data Repository

Provides information on the NIH policy updates for data management plans and sharing, effective January 25, 2023. Lack of compliance can lead to a delay in receipt of funds.

Announcement

Update Notice: Changes to UNC Dataverse Terms of Use and User Policy

Effective February 1, 2025, ECU users will no longer be able to add or edit dataverses or datasets in UNC Dataverse. However, any dataverses and datasets already published in UNC Dataverse will remain preserved.

ECU users will still be able to search, discover, and download openly available data in UNC Dataverse.

Data Repository Types

Data repositories can be categorized based on their purpose, scope, and the types of data they store. Below are three common types of data repositories:

  • Domain-Specific Repositories: Accept specific data types or works from a particular discipline or field of study, often following community standards for metadata and formats.

You can find NIH-supported domain-specific repositories here.

  • Generalist Repositories: Accept data regardless of data type, format, content, or disciplinary focus, offering a broad platform for sharing and preserving research data. If you are unable to find a repository specific to your discipline or the type of data you generate, a generalist repository can serve as a valuable alternative.

    The following seven generalist repositories are part of the NIH Generalist Repository Ecosystem Initiative (GREI):

 Download this Generalist Repository Comparison Chart for the key features of the seven generalist repositories. 

  • Institutional Repositories: Hosted by universities, research institutions, or libraries to archive and provide access to their community's research outputs, including datasets, articles, and theses.

The ScholarShip, ECU’s current institutional repository, is a digital archive for the scholarly output of the ECU community and accommodates smaller datasets.

NIH Desirable Characteristics for All Data Repositories

When selecting a repository to manage and share data, the NIH recommends researchers look for the following desirable characteristics:

  • Unique Persistent Identifiers (PIDs): Assigns PIDs, such as DOIs, to datasets to ensure consistent and reliable access.
  • Long-Term Sustainability: Maintains data for the long term with appropriate measures to prevent data loss or degradation.
  • Metadata: Supports rich metadata and documentation standards to make data understandable, searchable, and reusable.
  • Curation and Quality Assurance: Includes processes to review and curate submitted data, ensuring quality and completeness.
  • Free and Easy Access: Enables free and easy access to datasets without imposing paywalls or restrictive access policies.
  • Broad and Measured Reuse: Promotes broad reuse by assigning clear reuse terms and enabling metrics for attribution and citation.
  • Clear Use Guidance: Provides detailed documentation about terms of use, including licensing and requirements for accessing sensitive data.
  • Security and Integrity: Implements measures to protect datasets from unauthorized access or alterations.
  • Confidentiality: Provides controlled access mechanisms for sensitive data, such as human data, to ensure compliance with ethical and legal standards.
  • Common Format: Supports widely used, non-proprietary formats for downloading and exporting data.
  • Provenance: Supports versioning to track origin, ownership, and modifications of datasets to maintain transparency and accountability.
  • Retention Policy: Outlines policies for how long data will be retained and managed within the repository.
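
The characteristics above can be treated as a simple checklist when comparing candidate repositories. The following is a minimal sketch, not an NIH tool: the function name `missing_characteristics` and the idea of recording a repository's features as a list of strings are assumptions made for illustration only.

```python
# The twelve characteristic names are copied from the list above.
NIH_DESIRABLE_CHARACTERISTICS = [
    "Unique Persistent Identifiers",
    "Long-Term Sustainability",
    "Metadata",
    "Curation and Quality Assurance",
    "Free and Easy Access",
    "Broad and Measured Reuse",
    "Clear Use Guidance",
    "Security and Integrity",
    "Confidentiality",
    "Common Format",
    "Provenance",
    "Retention Policy",
]


def missing_characteristics(supported):
    """Return the characteristics (in list order) that a repository lacks.

    `supported` is a hypothetical, self-assessed list of characteristic
    names that you have confirmed from the repository's documentation.
    """
    supported_set = set(supported)
    return [c for c in NIH_DESIRABLE_CHARACTERISTICS if c not in supported_set]


# Example: a made-up repository that documents everything except
# provenance tracking and a retention policy.
candidate = NIH_DESIRABLE_CHARACTERISTICS[:10]
print(missing_characteristics(candidate))
```

A repository with an empty result meets every item on the list; a non-empty result shows which points to ask the repository about before depositing.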

Workflow for Choosing a Repository Under the NIH DMS Policy

The NIH strongly encourages researchers to use established repositories to preserve and share scientific data. Their repository selection page provides a workflow to help researchers choose an appropriate repository:

1. Check for NIH-Designated Repositories

  • Start by reviewing NIH and/or Institute, Center, or Office (ICO) policies and funding opportunities to identify any specific repositories recommended or required for preserving and sharing data. Researchers should use these designated repositories when specified.

2. Explore Discipline-Specific Repositories

  • If no repository is specified by the NIH, researchers should prioritize selecting discipline-specific repositories tailored to their research field or data type. These repositories enhance discoverability and usability within specific scientific communities.

3. Consider Other Data Sharing Options When Needed

If no suitable discipline-specific repository is available, researchers should explore alternative options:

  • Small Datasets: For datasets up to 2 GB, include them as supplementary material with articles submitted to PubMed Central (follow their submission instructions).
  • Generalist or Institutional Repositories: Use repositories that enable sharing with the larger research community, institutions, or the public.
  • Large Datasets: Use cloud-based repositories for datasets requiring substantial storage capacity.

 

    Source: https://sharing.nih.gov/data-management-and-sharing-policy/sharing-scientific-data/selecting-a-data-repository
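
The three-step workflow above can be read as a short decision procedure. The sketch below is only an illustration of that ordering, not an official NIH algorithm: the `Dataset` fields and the `suggest_repository` function are hypothetical names, and whether a dataset "requires substantial storage" is left as a judgment you supply, because the workflow does not state a size threshold for large datasets.

```python
from dataclasses import dataclass
from typing import Optional

# "Datasets up to 2 GB" is the only size limit stated in the workflow.
PMC_SUPPLEMENT_LIMIT_GB = 2.0


@dataclass
class Dataset:
    # Repository named by NIH/ICO policy or the funding opportunity, if any.
    designated_repository: Optional[str] = None
    # A suitable discipline-specific repository you have identified, if any.
    discipline_repository: Optional[str] = None
    size_gb: float = 0.0
    # Your own judgment; the workflow gives no numeric cutoff.
    needs_substantial_storage: bool = False


def suggest_repository(ds: Dataset) -> str:
    """Walk the workflow in order and return a suggestion as text."""
    # Step 1: a designated repository takes priority when specified.
    if ds.designated_repository:
        return f"Use designated repository: {ds.designated_repository}"
    # Step 2: otherwise prefer a discipline-specific repository.
    if ds.discipline_repository:
        return f"Use discipline-specific repository: {ds.discipline_repository}"
    # Step 3: other data sharing options.
    if ds.needs_substantial_storage:
        return "Consider a cloud-based repository"
    if ds.size_gb <= PMC_SUPPLEMENT_LIMIT_GB:
        return ("Consider PubMed Central supplementary material, "
                "or a generalist or institutional repository")
    return "Consider a generalist or institutional repository"


print(suggest_repository(Dataset(size_gb=0.5)))
```

The order of the checks mirrors the numbered steps: the later options are reached only when the earlier ones do not apply.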

Best Practices for Selecting a Repository

We encourage researchers to use established data repositories to store and share their data in alignment with NIH DMS policy. Best practices include:

1. Follow the NIH Repository Selection Workflow

Use the NIH workflow to identify and select the most appropriate repository for your data.

2. Centralize Project Data

Deposit all data generated from a single project in one repository if possible, or ensure your data is integrated by linking publications to the associated datasets.

3. Prioritize Open and Accessible Repositories

Avoid depositing data in publisher-hosted repositories, as they often impose paywalls that restrict data access. Instead, choose repositories that offer free and easy access to enhance data discoverability and reusability.

4. Special Considerations for Human Data

Pay particular attention to handling human data, including de-identified data, and ensure proper use of repositories that support restricted access or usage when required. Follow NIH guidelines on when to share scientific data through controlled-access repositories. You can find NIH-supported controlled-access repositories by clicking here or by keyword searching this list.

For commonly used repositories for storing genomic data, click here.

 

  NNLM Data Repository Finder

The Data Repository Finder is a tool provided by the Network of the National Library of Medicine (NNLM) that helps researchers locate appropriate repositories for sharing and preserving health and biomedical data. By answering a few questions, researchers can use this tool to find NIH-supported repositories that align with their data-sharing needs.