Reproducible Research for Scientific Computing: Tools and Strategies for Changing the Culture
peter.suber's bookmarks 2021-07-20
Summary:
"Without concerted effort and broad agreement on goals and procedures, both individual scientists and scientific institutions face considerable challenges and disincentives for implementing reproducible research. Nevertheless, we call upon all computational scientists to practice reproducibility, even if only privately and for the benefit of your current and future research efforts: use version control, write a narrative, automate your process, track your provenance, and test your code. Keep in mind during this process that reproducibility is not an all-or-nothing affair, but rather a social construct with a spectrum of meanings that supports a gradual learning curve. Furthermore, from private reproducibility it’s only a small effort to achieve public reproducibility if circumstances warrant: simply release the code and data under a suitable license.
We also call upon all interested computational scientists to tackle institutional and community challenges. This effort can take a variety of forms—for example, train your students and postdocs in reproducibility, publish examples of reproducible research in your field, request code and data when reviewing, submit to and review for journals that support reproducible research, critically review and audit data management plans in grant proposals, and consider reproducibility wherever possible in hiring, promotion, and reference letters. Such efforts convince our representatives at funding agencies, journal editorial boards, universities, and scientific societies that reproducibility is a worthwhile goal, and provide ammunition to bring these efforts to the attention of broader and higher audiences.
Last, we call upon all stakeholders to consider code a vital part of the digitization of science. A focus on data policies alone not only misses the unique features of code and its importance to reproducibility but fails to see that code is integral to all stages of data use. Digital datasets are not only analyzed by code, they’re also deposited, made available, collated, filtered, and sometimes even created by code. An exclusive emphasis on open data is a missed opportunity to resolve the current credibility crisis facing computational science and engineering. If we seek to elevate computation into a third pillar of the scientific method alongside theory and experiment, we must overcome relaxed attitudes toward reproducibility. Changing a culture isn’t a simple task, but it can be accomplished through individual and small group efforts."