Research Reproducibility: Definitions, Essential Components, Tools: Reproducibility Problems

This guide is intended as a primer on research reproducibility.

Poor reporting and sharing

Irreproducibility is generally the result of inadequate information. Without sufficient documentation and availability of the data, metadata, analysis methods, computational code, and models, it is impossible to reproduce the numbers and figures published in journal articles and other reports.

  • Data: data must first be made available before results can be reproduced. Ethical considerations (for example, participant confidentiality) may prevent public release, but some means of providing the data on request can usually still be arranged.
    • Merely making data available is often not sufficient. To be meaningfully usable, the data must also be accessible, interoperable, and reusable. In many cases, insufficient metadata (for example, missing data dictionaries and codebooks), or data provided in a format that requires proprietary software or software built by you or your lab, leaves the data functionally inaccessible. The principles of making data Findable, Accessible, Interoperable, and Reusable are known as the FAIR principles (FAIR Principles, n.d.).
    • It is also critically important to make all meaningful versions of a study's data sets available. Data may undergo many changes over the course of a research project, for example when it is cleaned or when some data points are recoded. Failing to provide these interim data sets can make it impossible to reproduce the path from raw data to the reported and published results.
  • Methods:
    • The word limits, writing conventions, and text-only format of methods sections can leave out information needed to describe the exact methods used in a research project.
      • For example, data cleaning decisions must be fully described so that others can reproduce the steps taken to get from the raw data to the final data set.
      • The actual statistical or mathematical operations used to analyze data must also be made available. Merely describing the type of statistical or mathematical steps used may not allow another person to undertake the exact same steps. Often, the steps involve assumptions and frameworks that are not delineated within the methods section of a manuscript.
      • Computational results are produced by scripts written for a particular program. As Donoho (2010) noted, it is often unclear what has actually been done, and examining the scripts often reveals that the written description does not match the processes that were actually run.
  • Tools and operating system details:
    • The exact software and hardware used (including software versions and operating systems) must be reported for another person to reproduce analyses and results. For example, rerunning computational scripts requires knowing which software, and which version of it, was used to create and run them.
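The Data and Methods points above can be made concrete with a small scripted workflow. The Python sketch below is illustrative only: the data set, variable names, cleaning steps, and file names (`RAW_ROWS`, `recode_missing_age`, `run_pipeline`, and so on) are hypothetical. It applies each cleaning decision as a named, documented function, saves the data set produced after every step as an interim file, and writes a cleaning log and a data dictionary in open, plain-text formats (CSV and JSON) that do not require proprietary software.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical raw data; a real project would read this from the original data file.
RAW_ROWS = [
    {"id": "1", "age": "34", "smoker": "Y"},
    {"id": "2", "age": "-1", "smoker": "N"},  # -1 is this (hypothetical) study's missing-value code
    {"id": "3", "age": "51", "smoker": "y"},
]

# Hypothetical data dictionary describing each variable in the final data set.
DATA_DICTIONARY = {
    "id": "Participant identifier (text)",
    "age": "Age in years at enrollment; empty if missing (raw code -1)",
    "smoker": "Current smoker: 1 = yes, 0 = no (recoded from Y/N)",
}


def recode_missing_age(rows):
    """Convert age to an integer and replace the missing-value code -1 with None."""
    return [{**r, "age": None if r["age"] == "-1" else int(r["age"])} for r in rows]


def recode_smoker(rows):
    """Recode smoker from 'Y'/'N' (any letter case) to 1/0."""
    return [{**r, "smoker": 1 if r["smoker"].upper() == "Y" else 0} for r in rows]


# The order of cleaning decisions is itself part of the method, so keep it explicit.
CLEANING_STEPS = [recode_missing_age, recode_smoker]


def write_csv(rows, path):
    """Write a list of dicts to CSV (None values are written as empty cells)."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def run_pipeline(rows, out_dir):
    """Apply each cleaning step in order, saving every interim data set and a log."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    write_csv(rows, out / "00_raw.csv")  # keep an untouched copy of the raw data
    log = []
    for i, step in enumerate(CLEANING_STEPS, start=1):
        rows = step(rows)
        filename = f"{i:02d}_{step.__name__}.csv"
        write_csv(rows, out / filename)  # interim data set after this step
        log.append({"step": i, "function": step.__name__,
                    "description": step.__doc__, "output": filename})
    (out / "cleaning_log.json").write_text(json.dumps(log, indent=2))
    (out / "data_dictionary.json").write_text(json.dumps(DATA_DICTIONARY, indent=2))
    return rows, log


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        final_rows, log = run_pipeline(RAW_ROWS, tmp)
        for entry in log:
            print(entry["step"], entry["function"], "->", entry["output"])
```

Because each interim file is named after the step that produced it, a reader can compare any two consecutive files to see exactly what a given cleaning decision changed.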
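The software and hardware details can be captured by the analysis itself rather than written down by hand afterwards. This minimal sketch uses only Python's standard library (`platform` and `importlib.metadata`); the function name `capture_environment` and the package list are hypothetical, and the exact fields worth recording will vary by project.

```python
import json
import platform
from datetime import datetime, timezone
from importlib import metadata


def capture_environment(packages):
    """Return a dict describing the interpreter, OS, hardware, and package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "python_implementation": platform.python_implementation(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "processor": platform.processor(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Hypothetical list of packages that an analysis script depends on.
    env = capture_environment(["numpy", "pandas"])
    with open("environment.json", "w") as f:
        json.dump(env, f, indent=2)
    print(json.dumps(env, indent=2))
```

Saving the resulting `environment.json` alongside the results records the Python version, operating system, and package versions used for that run. Package-manager commands such as `pip freeze` list every installed package and can complement this record.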

References

Donoho, D. L. (2010). An invitation to reproducible computational research. Biostatistics, 11(3), 385–388. https://doi.org/10.1093/biostatistics/kxq028

FAIR Principles. (n.d.). GO FAIR. Retrieved March 30, 2021, from https://www.go-fair.org/fair-principles/