Statistical code for clinical research papers in a high-impact specialist medical journal - PMC
peter.suber's bookmarks 2022-11-28
Abstract:

Background: It is widely accepted that statistical analyses should be implemented by writing good quality code in a professional statistical package such as R, SAS or Stata. Good code ensures reproducibility, reduces error, and provides auditable documentation of the analyses underpinning research results. There have been several recent efforts to encourage archiving of code corresponding to published papers1–5, on the grounds that doing so improves transparency. Such efforts have focused on areas such as neuroscience or bioinformatics, which are highly dependent on computationally intensive analyses.

Objective: To examine how often authors used statistical code for clinical research papers published in a high-impact specialty journal, and to determine the quality of this code.

Methods and Findings: In mid-2016, we added to the online submission system for European Urology a question asking whether authors had used statistical code and, if so, whether they would be willing to submit it were their paper to be accepted. In August 2017, we reviewed 314 papers subsequently accepted to the journal. Authors of 40 papers reported that they used statistical code. Authors archived the code with the journal for 18 of these papers; the remaining 22 declined to do so. We randomly selected and reviewed 50 papers whose authors had reported no code. Of these 50, 35 presented no statistics (e.g. a narrative review of the literature) or only trivial analyses (e.g. a single survival curve). The remaining 15 included substantive analyses, such as large numbers of regression models, graphs or time-to-event statistics. We contacted the corresponding authors of these 15 papers; 8 told us that they did not use code, but 7 responded that they had indeed used code and that their initial response was erroneous. In 6 of these 7 cases, the authors declined to submit their code to the journal.
We then examined all code sets received, excluding code associated with 3 papers submitted by authors trained in our group. Most of the code had little or no annotation and extensive repetition. For half of the papers, the reviewed code included no formatting for presentation (Table 1).

Discussion: No statistical code was used for more than a third of papers published in a high-impact specialist medical journal that included non-trial statistical analyses. Not a single set of code managed to score even moderately on three basic and widely accepted software criteria. This is not a superficial problem. For instance, failure to include code that formats numerical output increases the risk of transcription errors, and repeated code can lead to inconsistent analyses.
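To illustrate the two failure modes named above, here is a minimal hypothetical sketch (not taken from the paper, which concerns R, SAS and Stata code; the function names and data are invented for illustration). It shows code that formats a result string directly, so numbers are never retyped into a manuscript, and that wraps a repeated analysis in a single reusable function rather than copy-pasted blocks:

```python
import math

def format_estimate(est, lo, hi):
    # Format a point estimate with its 95% CI in code, so the numbers
    # are pasted into the paper as-is and never transcribed by hand.
    return f"{est:.2f} (95% CI {lo:.2f} to {hi:.2f})"

def summarize(name, values):
    # One function reused for every variable, instead of repeated
    # copy-pasted blocks that can silently drift out of sync.
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    se = math.sqrt(var / n)
    return f"{name}: {format_estimate(mean, mean - 1.96 * se, mean + 1.96 * se)}"

print(summarize("Marker level", [4.1, 5.3, 6.2, 3.9, 5.0]))
# prints "Marker level: 4.90 (95% CI 4.08 to 5.72)"
```

If a data point later changes, rerunning the script regenerates every formatted result consistently, which is exactly what hand-transcribed output and duplicated code cannot guarantee.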