The NIH released its Final Policy for Data Management and Sharing on October 29, 2020, and the policy will take effect in January 2023. More information on the new policy is available in another tab of this guide. For specific questions and support related to the new NIH policy, please contact either Kerry Sewell, Research Librarian for the Health Sciences (BROWDERK@ECU.EDU, 252-744-0477), or the Scholarly Communication Department (email@example.com, 252-328-2261).
Data management consists of a series of considered, systematic approaches to storing, organizing, documenting, preserving, and sharing data collected during a research project. As such, it spans the research life cycle. More specifically, data management includes (but is not limited to!) basic practices such as using consistent, documented file and folder naming conventions, documenting data cleaning and analysis procedures, and ensuring that final file formats allow for long-term access and use.
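As one illustration of a consistent, documented naming convention, the short Python sketch below builds and checks file names. The pattern used here (project, description, ISO date, and two-digit version, separated by underscores) and the function names are assumptions made for this example, not a standard required by any funder.

```python
import re
from datetime import date

# Assumed convention for this sketch: project_description_YYYY-MM-DD_vNN.ext
FILENAME_PATTERN = re.compile(
    r"^[a-z0-9-]+_[a-z0-9-]+_\d{4}-\d{2}-\d{2}_v\d{2}\.[a-z0-9]+$"
)


def slug(text: str) -> str:
    """Lowercase the text and replace runs of other characters with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project: str, description: str, when: date,
                   version: int, ext: str) -> str:
    """Assemble a file name that follows the assumed convention."""
    extension = ext.lstrip(".").lower()
    return (f"{slug(project)}_{slug(description)}_"
            f"{when.isoformat()}_v{version:02d}.{extension}")


def follows_convention(filename: str) -> bool:
    """Report whether an existing file name matches the assumed convention."""
    return FILENAME_PATTERN.match(filename) is not None


if __name__ == "__main__":
    name = build_filename("Sleep Study", "Interview Transcripts",
                          date(2023, 1, 25), 2, ".CSV")
    print(name)
    print(follows_convention(name))
    print(follows_convention("Final data (2).xlsx"))
```

Whatever convention a research team chooses, writing it down (for example, in a README file stored with the data) matters as much as following it.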
In considering data management, it's important to think of data in broad terms. While definitions of what constitutes data vary slightly between funders and other research stakeholders, the U.S. Office of Management and Budget broadly defines research data as "the recorded factual material commonly accepted in the scientific community as necessary to validate research findings" (OMB Circular A-110, last revised in 1993 and amended in 1999). There are two key aspects to this broad definition of data:
- Because research approaches and subjects vary so widely, data includes far more than just spreadsheet-formatted data. Data created or captured during research includes qualitative data in audio and transcribed formats, images from various imaging studies such as MRIs, GIS data, video files, paper forms, and many other digital or physical items. Some research groups also include lab notebooks and physical specimens in their definitions of data, while others exclude them.
- Data files by themselves (e.g., a solitary audio, video, or spreadsheet-formatted file) are typically not enough to allow someone to validate research findings. Consider the difficulty of understanding spreadsheet data without a document explaining the variable names and units of measurement! A secondary user needs accompanying documentation to understand what is in a file and how it was collected or created. Data must therefore include or be accompanied by documentation of file contents, interview guides for qualitative research, software and hardware information, and other key documentation that gives the collected or created data meaning and context.
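To make the point about documentation concrete, here is a minimal Python sketch of a data dictionary (sometimes called a codebook) for a spreadsheet-formatted file. The variable names, descriptions, and units are hypothetical examples invented for this illustration, and the helper functions are only one possible approach.

```python
import csv
import io

FIELDS = ["variable", "description", "type", "units", "allowed_values"]

# Hypothetical entries; a real data dictionary describes every column in the data file.
DATA_DICTIONARY = [
    {"variable": "participant_id", "description": "De-identified participant code",
     "type": "string", "units": "", "allowed_values": ""},
    {"variable": "sbp_mmhg", "description": "Systolic blood pressure at baseline",
     "type": "integer", "units": "mmHg", "allowed_values": "60-260"},
    {"variable": "smoker", "description": "Self-reported current smoking status",
     "type": "integer", "units": "", "allowed_values": "0 = no; 1 = yes"},
]


def undocumented_columns(data_header, dictionary):
    """Return the column names in a data file that the dictionary does not describe."""
    documented = {entry["variable"] for entry in dictionary}
    return [column for column in data_header if column not in documented]


def dictionary_to_csv(dictionary):
    """Serialize the data dictionary as CSV text so it can be saved beside the data."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(dictionary)
    return buffer.getvalue()


if __name__ == "__main__":
    header = ["participant_id", "sbp_mmhg", "visit_date"]
    print(undocumented_columns(header, DATA_DICTIONARY))
    print(dictionary_to_csv(DATA_DICTIONARY))
```

A check like `undocumented_columns` can flag variables that were added to a data file without being described, which is exactly the situation that leaves a secondary user guessing.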
What's the big deal?
- First and foremost, data management benefits the researcher:
  - Good research data management makes it easier to find files and folders during and after a research project
  - The likelihood of losing data and data files is lower
  - Having good data documentation makes it easier to onboard new research team members in labs or collaborative study projects
  - Writing the methods section of a manuscript is far easier when you have clear documentation of the entire data collection, cleaning, and analysis workflow
  - If you submit a manuscript and peer reviewers ask for new or tweaked analyses of your data, it's much easier to go back to earlier data versions and respond quickly
- When well-managed data is shared, it benefits the public and science more broadly:
  - Data with high-quality documentation increases the reproducibility of published research
  - Openly providing documentation of all study instruments and procedures also allows other researchers to replicate the study more easily, either to confirm findings in a similar population or setting or to test them in a new population
  - Shared data promotes public trust in your research
  - Publications with shared data tend to be cited more often
  - New analyses of existing data reduce research waste by allowing new knowledge to be produced without the time- and resource-intensive costs of collecting new data. This reduces the burden on scientists as well as on human participants or animals
  - Data sets can also be used to train students on how to analyze data