Do research articles have to be so one-sided?

Statistical Modeling, Causal Inference, and Social Science 2024-04-17

It’s standard practice in research articles as well as editorials in scholarly journals to present just one side of an issue. That’s how it’s done! A typical research article looks like this:

“We found X. Yes, we really found X. Here are some alternative explanations for our findings that don’t work. So, yeah, it’s really X, it can’t reasonably be anything else. Also, here’s why all the thickheaded previous researchers didn’t already find X. They were wrong, though, we’re right. It’s X. Indeed, it had to be X all along. X is the only possibility that makes sense. But it’s a discovery, it’s absolutely new. As was said of the music of Beethoven, each note is prospectively unexpected but retrospectively absolutely right. In conclusion: X.”

There also are methods articles, which go like this:

“Method X works. Here’s a real problem where method X works better than anything else out there. Other methods are less accurate or more expensive than X, or both. There are good theoretical reasons why X is better. It might even be optimal under some not-too-unreasonable conditions. Also, here’s why nobody tried X before. They missed it! X is, in retrospect, obviously the right thing to do. Also, though, X is super-clever: it had to be discovered. Here are some more examples where X wins. In conclusion: X.”

Or the template for a review article:

“Here’s a super-important problem which has been studied in many different ways. The way we have studied it is the best. In this article, we also discuss some other approaches which are worse. Our approach looks even better in this contrast. In short, our correct approach both flows naturally from and is a bold departure from everything that came before.”

OK, sometimes we try to do better. We give tentative conclusions, we accept uncertainty, we compare our approach to others on a level playing field, we write a review that doesn’t center on our own work. It happens. But, unless you’re Bob Carpenter, such an even-handed approach doesn’t come naturally, and, as with any such adjustment, there’s always the concern of going too far (“bending over backward”) in the other direction. Recall my criticism of the popular but I think bogus concept of “steelmanning.”

So, yes, we should try to be more balanced, especially when presenting our own results. But the incentives don’t go in that direction, especially when your contributions are out there fighting with lots of ideas that other people are promoting unreservedly. Realistically, often the best we can do is to include Limitations sections in otherwise-positive papers.

One might think that a New England Journal of Medicine editorial could do better, but editorials have the same problem as review articles, which is that the authors will still have an agenda.

Dale Lehman writes in, discussing such an example:

A recent article in the New England Journal of Medicine caught my interest. The authors, a Harvard economist and a McKinsey consultant (who properly disclose their ties), present a variety of ways that AI can contribute to health care delivery. I can hardly argue with the potential benefits, and some areas of application are certainly ripe for improvements from AI. However, the review article seems unduly one-sided. Almost all of the impediments to application that they discuss lay the “blame” on health care providers and organizations. No mention is made of the potential errors made by AI algorithms applied in health care. I found this particularly striking since they repeatedly appeal to AI use in business generally as a point of comparison for the relatively slow adoption of AI in health care. When I think of business applications, a common error might be a product recommendation or promotion that is not relevant to a consumer. The costs of such a mistake are generally small: wasted resources, unhappy customers, etc. A mistake made by an AI recommendation system in medicine strikes me as quite a bit more serious (lost customers are not the same thing as lost patients).

To that point, the article cites several AI applications to sepsis prediction (references 24-27). That is an area where several AI sepsis-detection algorithms have been developed, tested, and reported on. But the references strike me as cherry-picked. A recent controversy has concerned the Epic model (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8218233/?report=classic), where the company’s reported results were much better than those of the attempted replication. There was also a major international challenge (PhysioNet: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6964870/) in which data came from three hospital systems: two provided the training data for the competition and the third served as the test set. Notably, the algorithms performed much better on the systems that supplied the training data than on the held-out test system.

My question really concerns the role of the NEJM here. Presumably this article was peer reviewed, or at least reviewed by the editors. Shouldn’t the NEJM be demanding more balanced and comprehensive review articles? It isn’t that the authors of this article say anything that is wrong, but the piece seems deficient in its coverage of the issues. It would not have been hard to acknowledge that these algorithms may not be ready for use (admittedly, they may outperform existing human models, but that is an active area of research, which should be noted in the article). Nor would it be difficult to point out that algorithmic errors and biases in health care may be a more serious matter than in other sectors of the economy.

Interesting. I’m guessing that the authors of the article were coming from the opposite direction, feeling that there’s too much conservatism regarding health-care innovation and wanting to push back against that. (Full disclosure: I’m currently working with a cardiologist to evaluate a machine-learning approach for ECG diagnosis.)

In any case, yes, this is part of a general problem. One thing I like about blogging, as opposed to scholarly writing or journalism, is that in a blog post there’s no expectation or demand or requirement that we come to a strong conclusion. We can let our uncertainty hang out, without some need to try to make “the best possible case” for some point. We may be expected to entertain, but that’s not so horrible!