Importance of publishing data and code
Language Log 2013-04-22
J.W. writes:
In connection with some of your prior statements on the Log about the importance of publishing underlying data, you might be interested in Thomas Herndon, Michael Ash, and Robert Pollin, "Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff", PERI 4/15/2013 (explanation in lay language at "Shocking Paper Claims That Microsoft Excel Coding Error Is Behind The Reinhart-Rogoff Study On Debt", Business Insider 4/16/2013). In sum, a look at the data spreadsheet underlying a really influential 2010 economics paper reveals that its results were driven by selective data exclusions, idiosyncratic weighting, and an Excel coding error [!].
Indeed.
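For readers who haven't followed the story, the spreadsheet problem that Herndon, Ash, and Pollin describe is of a depressingly ordinary kind: a summary formula whose range doesn't cover all of the rows it was meant to. Here's a toy sketch in Python of how such an error hides, and how easily it can be caught once the data and code are on the table. The country labels and growth rates below are invented, and this is emphatically not the actual Reinhart-Rogoff spreadsheet:

    # Toy sketch only (not the actual spreadsheet); the country labels and
    # growth rates are invented for illustration.
    growth_by_country = {
        "A": 2.2, "B": 1.9, "C": 2.5, "D": -0.1, "E": 0.8,   # hypothetical percent growth
    }
    values = list(growth_by_country.values())

    # The spreadsheet-style mistake: the averaging range stops short of the last rows.
    truncated = values[:3]
    buggy_mean = sum(truncated) / len(truncated)

    # The check that published data and code make easy: recompute over every row.
    full_mean = sum(values) / len(values)

    print(f"mean over truncated range: {buggy_mean:.2f}")   # 2.20
    print(f"mean over all countries:   {full_mean:.2f}")    # 1.46
    if len(truncated) != len(values):
        print(f"warning: only {len(truncated)} of {len(values)} rows were averaged")

The point of the sketch is simply that the second computation, and the warning, are only possible for a reader who has the underlying data rather than just the published averages.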
Paul Krugman notes another recent case of "death by Excel". And there have been some equally shocking and damaging cases in several other fields, including translational cancer research. From the abstract of Keith Baggerly's presentation in the symposium on "Reproducible Science" at AAAS 2011:
In this talk, we examine several related papers using array-based signatures of drug sensitivity derived from cell lines to predict patient response. Patients in clinical trials were allocated to treatment arms based on these results. However, we show in several case studies that the reported results incorporate several simple errors that could put patients at risk. One theme that emerges is that the most common errors are simple (e.g., row or column offsets); conversely, it is our experience that the most simple errors are common. We briefly discuss steps we are taking to avoid such errors in our own investigations.
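The "row or column offsets" that Baggerly mentions are easier to picture with a concrete example. Here's a made-up sketch in Python (the sample IDs, scores, and labels are invented, and the analyses Baggerly and Coombes actually audited were done in other tools): two tables that are silently assumed to share row order, where one stray header row shifts every label by one position:

    # Toy illustration of a "row offset" error: two files assumed to share row
    # order, but one contains an extra header row. All IDs and labels are made up.
    expression_rows = [                      # (sample_id, signature_score)
        ("S1", 0.91), ("S2", 0.12), ("S3", 0.78), ("S4", 0.05),
    ]
    label_rows = [                           # (sample_id, sensitive_to_drug)
        ("sample_id", "sensitive"),          # stray header row nobody stripped
        ("S1", True), ("S2", False), ("S3", True), ("S4", False),
    ]

    # Error-prone merge: pair rows by position, so every label is shifted by one.
    by_position = list(zip((s for s, _ in expression_rows),
                           (lab for _, lab in label_rows)))
    print(by_position)   # [('S1', 'sensitive'), ('S2', True), ('S3', False), ('S4', True)]

    # Safer merge: key on the shared identifier and fail loudly on any mismatch.
    labels = dict(label_rows)
    for sample_id, score in expression_rows:
        if sample_id not in labels:
            raise ValueError(f"no label for {sample_id}; check row alignment")
        print(sample_id, score, labels[sample_id])

Merging on an explicit identifier, and failing loudly when the identifiers don't line up, is the sort of unglamorous discipline that published code lets a reader verify.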
In the case that Keith explores most fully, the program involved was R rather than Excel. For more detail, see Keith Baggerly and Kevin Coombes, "What Information Should Be Required to Support Clinical 'Omics' Publications?", Clinical Chemistry 2011:
A major goal of “omics” is personalizing therapy—the use of “signatures” derived from biological assays to determine who gets what treatment. Recently, Potti et al. (1) introduced a method that uses microarray profiles to better predict the cytotoxic agents to which a patient would respond. The method was extended to include other drugs, as well as combination chemotherapy (2, 3). We were asked if we could implement this approach to guide treatment at our institution; however, when we tried to reproduce the published results, we found that poor documentation hid many simple errors that undermined the approach (4). These signatures were nonetheless used to guide patient therapy in clinical trials initiated at Duke University in 2007, which we learned about in mid-2009. We then published a report that detailed numerous problems with the data (5). As chronicled in The Cancer Letter, trials were suspended (October 2, 9, and 23, 2009), restarted (January 29, 2010), resuspended (July 23, 2010), and finally terminated (November 19, 2010). The underlying reports have now been retracted; further investigations at Duke are under way. We spent approximately 1500 person-hours on this issue, mostly because we could not tell what data were used or how they were processed. Transparently available data and code would have made checking results and their validity far easier. Because transparency was absent, an understanding of the problems was delayed, trials were started on the basis of faulty data and conclusions, and patients were endangered. Such situations need to be avoided.
In my opinion, problems of this general kind are endemic in linguistics, psychology, and other fields as well — our errors are not likely to damage world economies or cancer patients, but truth and science do suffer.
And of course there's a spectrum of error, delusion, and fraud, from simple coding errors to hidden choices about data exclusion, data weighting, modeling choices, hypothesis shopping, data dredging, and so forth. The current system of peer review, although it often delays publication for two years or more, does a very bad job of detecting problems on this spectrum. A more streamlined reviewing system, with insistence on publication of all relevant data and code, and provisions for post-publication peer commentary, would be much better.
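To put a number on the "hypothesis shopping" end of that spectrum, here's a small simulation in Python, with made-up parameters: if an analyst tries twenty independent analyses on pure noise and reports only the most favorable one, the chance of a spurious "significant" result is not 5% but roughly 64%. Nothing in the resulting paper would reveal that the other nineteen analyses were ever run; only the full data and code would.

    # A minimal sketch of why hypothesis shopping inflates false positives:
    # under a true null hypothesis a p-value is uniform on [0, 1], so screening
    # many independent null tests and reporting only the best one crosses the
    # 0.05 threshold far more often than 5% of the time. The numbers here are
    # a simulation, not data from any of the studies discussed above.
    import random

    random.seed(0)
    n_experiments = 10_000
    n_tests_per_paper = 20       # hypothetical number of analyses quietly tried

    false_discoveries = 0
    for _ in range(n_experiments):
        p_values = [random.random() for _ in range(n_tests_per_paper)]   # all nulls true
        if min(p_values) < 0.05:                # report only the best-looking analysis
            false_discoveries += 1

    print(f"papers reporting a spurious p < 0.05: {false_discoveries / n_experiments:.0%}")
    # ~64%, close to the analytic value 1 - 0.95**20 ≈ 0.64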