In decreasing order of importance: (1) The error, (2) How the error persisted, (3) The research misconduct, (4) Who did it.
Statistical Modeling, Causal Inference, and Social Science 2025-01-29
James Heathers wrote this screed (I mean that in a good way!) about the report by Harvard University on a business school professor who was involved in fraudulent research.
For some background, here's a link to a news story on the Harvard investigation, here's some more background, and here are some of my own reanalyses of those data.
From various discussions of this case, four points are clear:
1. There is no dispute that the published work in question failed to demonstrate the effects it purported to show. There were four published papers that were scientific failures—not in the good sense of reporting some failed experiment and trying to figure out what aspect of theory, experiment, measurement, or analysis went wrong, but in the bad sense of making a confident claim and saying it was borne out by experiment, even though it wasn’t.
To give an analogy to news reports, imagine three possible news stories: (a) An investigation was performed into something suspicious, nothing came of the investigation, and the failure was reported. (b) An investigation was performed into something suspicious, nothing came of the investigation, and no report was released. (c) An investigation was performed into something suspicious, nothing came of the investigation, but it was reported as if the investigation was successful. Option (c) is the worst! If a news organization is going to do option (c), they’d be better off not pretending to do reporting at all. It would be better for them to just run stories about fires and funny animals and other traditional local-news staples.
2. The erroneous published work had a wide audience. These papers and related work were published in top journals, supported careers at leading research institutions, were promoted by celebrities, and were featured in major media outlets.
3. There is no dispute that there was fraud in the data processing and analysis of these studies. The four published papers in question were not just a blight on science, they were an intentional blight on science. I don’t care so much about this—recall Clarke’s Law—but, sure, it’s part of the story.
4. There is some residual dispute about who was directly involved in the fraud. As Heathers puts it, there are various “unconvincing” but “technically possible” alternative explanations for how exactly the data were manipulated. In the words of the Harvard report, the research misconduct was committed “intentionally, knowingly, or recklessly.” I guess that, of those three, “recklessly” is the most positive option.
In decreasing order of importance
Heathers also writes about a separate investigation by Duke University of another business-school professor involved in this research:
The dataset was massively and provably falsified. . . . There is not enough evidence to determine [who did it].
Ultimately, the who-did-it question is less important than the what-happened question and the how-did-this-stuff-get-promoted-on-NPR-etc. question. Here’s Heathers:
The question of “can the work be trusted?” is not a personnel matter. It is an “everyone in the scientific community” matter. It is an “are we ruining the legacy of the Enlightenment” matter. . . . But the real game is in the actual harms that come from putting out untrustworthy research. These start with the broken degrees or careers of junior collaborators whose efforts have been wasted. They continue through the expenditure of further thousands of hours, and millions of dollars, on pointless follow-up research. And then, sometimes, but more often than you’d hope, they kill people. . . .
As I said before, it’s rational for an institution to want to know [who was responsible for the research misconduct] . . . And, yes, I think it’s justified to keep this bit private if you want, because it IS a personnel matter.
But the work itself is an everyone-else-as-well matter. And sewing the two together just lets you hide the mechanics behind the HR, keep it all from public view, and avoid scrutiny.
As usual. For shame.
Here’s my restatement:
In decreasing order of importance:
(1) The error, (2) How the error persisted, (3) The research misconduct, (4) Who did it.
Unfortunately, as Heathers says, the emphasis is often in the reverse order.