OpenDORS: A dataset of openly referenced open research software
peter.suber's bookmarks 2025-12-03
Summary:
Abstract: In many academic disciplines, software is created during the research process or for a research purpose. The crucial role of software for research is increasingly acknowledged. The application of software engineering to research software has been formalized as research software engineering, to create better software that enables better research. Despite this, large-scale studies of research software and its development are still lacking. To enable such studies, we present a dataset of 134,352 unique open research software projects and 134,154 source code repositories referenced in open access literature. Each dataset record identifies the referencing publication and lists source code repositories of the software project. For 122,425 source code repositories, the dataset provides metadata on latest versions, license information, programming languages and descriptive metadata files. We summarize the distributions of these features in the dataset and describe additional software metadata that extends the dataset in future work. Finally, we suggest examples of research that could use the dataset to develop a better understanding of research software practice in RSE research.