
NIH Policy for Data Management and Sharing: Implementing DMS

Provides information on the NIH policy updates for data management plans and sharing, effective January 25, 2023. Lack of compliance can lead to a delay in receipt of funds.

Overview

For researchers, implementing the 2023 NIH Data Management and Sharing (DMS) Policy requires thoughtful planning and consistent adherence to its requirements throughout the research data lifecycle. This involves integrating robust data management practices at every stage to ensure compliance, enhance data quality, and facilitate effective data sharing.

Below is a biomedical data lifecycle diagram created by the Harvard Longwood Medical Area Research Data Management Working Group. The diagram outlines the key stages of research, including data collection, use, and storage. At its center is "Store & Manage," the activity on which successful project data management depends.

While the lifecycle is typically structured as a linear progression from "Plan & Design" to "Publish & Reuse," researchers may find themselves revisiting earlier stages as their project evolves. For instance, DMS Plans are initially created during the planning phase but play an active role throughout the research process. The plans provide a roadmap for how data will be handled, including collection, creation, and sharing. They also address essential considerations such as documentation, security, and long-term reuse. By thoughtfully managing data at every stage, researchers can ensure their work aligns with the DMS Policy while improving data quality and promoting collaboration and reproducibility in research.

 

                             

 

Cioffi, M., Goldman, J., & Marchese, S. (2023). Harvard Biomedical Research Data Lifecycle (Version 5). Zenodo. https://doi.org/10.5281/zenodo.8076168

Best Practices for Data Management at Each Stage of the Data Lifecycle

Goals: Prepare for compliance by developing a robust DMS Plan and budget for data management and sharing costs.

Actions:

  • Develop a comprehensive DMS Plan that describes how data will be managed and shared in alignment with NIH requirements:
    • Define data types to be generated, shared, and preserved.
    • Identify metadata standards, repositories, and sharing mechanisms.
    • Specify storage solutions and data security needs.
    • Address ethical/legal concerns for human participant data (e.g., consent, de-identification).
  • Budget for Data Management:
    • Allocate costs for personnel, software, repository fees, and long-term storage.
  • Determine Responsibilities:
    • Assign roles (e.g., data manager or PI) for overseeing data management and sharing activities.

Goals: Generate high-quality, well-documented, and organized data.

Actions:

  • Standardize Data Formats:
    • Use consistent file formats and naming conventions.
  • Document Data:
    • Create detailed metadata, including information about collection methods, instruments, and conditions.
  • Ensure Data Security:
    • Use secure systems to store sensitive data, following institutional and NIH guidelines.
  • Ethical Compliance:
    • Obtain informed consent for data use and follow approved protocols for data collection.
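
The naming and documentation actions above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed NIH format: the `project_description_YYYYMMDD_vNN.ext` pattern, the function names, the metadata field names, and the sample values are all assumptions made for the example.

```python
import json
import re
from datetime import date


def slug(text):
    """Lowercase the text and collapse runs of other characters into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project, description, collected, version, extension):
    """Compose project_description_YYYYMMDD_vNN.ext (a hypothetical convention)."""
    return (
        f"{slug(project)}_{slug(description)}_"
        f"{collected:%Y%m%d}_v{version:02d}.{extension.lstrip('.')}"
    )


def build_metadata(filename, method, instrument, conditions):
    """Return a sidecar record describing how the file was collected."""
    return {
        "file": filename,
        "collection_method": method,
        "instrument": instrument,
        "conditions": conditions,
    }


# Placeholder values for illustration only.
name = build_filename("ProjX", "Mouse Weights", date(2023, 1, 25), 1, ".csv")
record = build_metadata(name, "manual weighing", "bench scale", "room temperature")
sidecar_json = json.dumps(record, indent=2)  # text to save next to the data file
print(name)
print(sidecar_json)
```

Saving the JSON text alongside each data file keeps the collection details attached to the data they describe. A community metadata standard for your discipline, where one exists, should take precedence over ad hoc field names like these.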

Goals: Organize, process, and analyze data while maintaining integrity and reproducibility.

Actions:

  • Data Cleaning and Quality Control:
    • Standardize datasets, remove duplicates, and address missing data.
  • Version Control:
    • Track changes to data files and scripts, and keep processed data in intermediate storage with clear version labels.
  • Document Analytical Methods:
    • Record processing and analysis steps, including scripts and software tools, to allow reproducibility. Use a code repository such as GitHub for version control of analysis scripts.
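
As one way to make the cleaning step itself reproducible, the sketch below standardizes column names and values, drops exact duplicate rows, and marks missing values explicitly. It uses only the Python standard library. The function name, the `NA` marker, and the sample records are assumptions for illustration, and whether missing values should be marked, imputed, or excluded depends on the study.

```python
def clean_records(records, missing_marker="NA"):
    """Standardize keys and values, drop exact duplicates, and mark missing values.

    Returns the cleaned rows and the number of missing values that were marked.
    """
    cleaned = []
    seen = set()
    missing_total = 0
    for record in records:
        row = {}
        missing_in_row = 0
        for key, value in record.items():
            # Standardize column names to lowercase snake_case.
            name = key.strip().lower().replace(" ", "_")
            text = "" if value is None else str(value).strip()
            if text == "":
                text = missing_marker
                missing_in_row += 1
            row[name] = text
        # Two rows are duplicates when every standardized field matches.
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(row)
        missing_total += missing_in_row
    return cleaned, missing_total


# Placeholder records for illustration only.
raw = [
    {"Subject ID": " s01 ", "Weight": "21.5"},
    {"Subject ID": "s01", "Weight": "21.5"},
    {"Subject ID": "s02", "Weight": ""},
]
rows, missing = clean_records(raw)
print(rows)
print(missing)
```

Keeping a script like this under version control next to the raw data means the processed dataset can be regenerated and audited rather than edited by hand.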

Goals: Ensure long-term accessibility and preservation of data.

Actions:

  • Select Repositories:
    • Deposit data into FAIR-compliant repositories for preservation and sharing. Depending on your data type, NIH may designate specific repositories for sharing; consult NIH's guidance on selecting an appropriate repository.
    • Confirm that your chosen repository supports long-term data access.
  • Preserve Data:
    • Store raw, processed, and analyzed data securely.
    • Use open and widely accepted file formats (e.g., CSV for tabular data, TIFF for images) to enhance sharing, reuse, and interoperability. Refer to the Library of Congress's list of recommended file formats for guidance.
    • Retain sufficient metadata and documentation to allow future users to understand and reuse the data.
  • De-identify Sensitive Data:

Goals: Share data with the research community, adhering to FAIR principles.

Actions:

  • Share data in repositories specified in the DMS Plan:
    • Submit datasets with accompanying metadata and documentation.
  • Provide Persistent Identifiers:
    • Ensure datasets have DOIs or other identifiers to facilitate citation.
  • Data Access:
    • Ensure shared data complies with privacy laws (e.g., HIPAA), including de-identification where needed.
    • Specify any restrictions or access conditions in repository settings.
    • Select a license that aligns with your goals for data sharing and reuse. Creative Commons licenses (e.g., CC BY or CC0) and open data licenses (e.g., Open Data Commons) are widely recognized and encourage reuse; attribution licenses such as CC BY also require that reusers credit you.

If you are not sure which license to choose, the Creative Commons License Chooser can help you decide in just a few simple steps.

Goals: Enable data to be reused for new research, validation, or education.

Actions:

  • Encourage Reuse:
    • Ensure shared data is complete, well-documented, and easy to interpret.
    • Promote shared datasets by linking them to publications, conference presentations, or institutional profiles.
    • Respond to inquiries from other researchers to facilitate collaboration.
  • Comply with Licenses:
    • Use open licenses (e.g., Creative Commons) to define reuse terms and specify what others are allowed to do with your data. For example, a CC BY license permits others to share, adapt, and use your data as long as they credit you. A CC0 dedication waives your copyright, which removes restrictions on reuse and makes your data accessible to a broader audience.
    • If you need to impose limits, such as prohibiting commercial use or requiring derivatives to be shared alike, choose a license that reflects these conditions (e.g., CC BY-NC or CC BY-SA).
    • Display the license prominently in your metadata, documentation, or alongside the dataset itself to remove ambiguity about usage rights.
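
One way to display the license unambiguously is to record it as a machine-readable field in the dataset's metadata. The sketch below is illustrative only: the record layout and function name are assumptions, and the 4.0 versions of the Creative Commons licenses are assumed because the text above does not name a version. The identifiers follow the SPDX license list, and the URLs are the Creative Commons deed pages.

```python
import json

# SPDX identifiers mapped to Creative Commons deed URLs (4.0 versions assumed).
LICENSES = {
    "CC-BY-4.0": "https://creativecommons.org/licenses/by/4.0/",
    "CC0-1.0": "https://creativecommons.org/publicdomain/zero/1.0/",
    "CC-BY-NC-4.0": "https://creativecommons.org/licenses/by-nc/4.0/",
    "CC-BY-SA-4.0": "https://creativecommons.org/licenses/by-sa/4.0/",
}


def describe_dataset(title, license_id):
    """Return a metadata record that states the license by identifier and URL."""
    if license_id not in LICENSES:
        raise ValueError(f"Unrecognized license identifier: {license_id}")
    return {
        "title": title,
        "license": {"id": license_id, "url": LICENSES[license_id]},
    }


# Placeholder title for illustration only.
dataset_record = describe_dataset("Example dataset", "CC-BY-4.0")
print(json.dumps(dataset_record, indent=2))
```

Many repositories collect the license through their own deposit form; in that case, the same identifier can be repeated in a README so the terms travel with downloaded copies.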

Report and Monitor DMS


Reference:

Harvard Longwood Medical Area Research Data Management Working Group. "Research Data Management Lifecycle Checklist." Retrieved January 16, 2025. https://osf.io/d2pum