Addressing legitimate counterarguments in a scientific review: The challenge of being an insider

Statistical Modeling, Causal Inference, and Social Science 2024-12-24

Review articles can be written by outsiders or insiders.

From the outside, it’s easier. You assess the evidence and draw your conclusions. For example, this is what I did when summarizing the research on ballot-order effects and addressing the question of whether Donald Trump won the 2016 election because his name came first on the ballot in key states; see pages 240-242 of Active Statistics for the full story. Or when my colleagues and I wrote about “nudge” interventions. We could be right, we could be wrong, but in any case the job is clear enough.

Writing a review is more difficult from the inside. For example, consider the meta-analysis of nudge interventions that was written by some nudge researchers. It had big problems: garbage in, garbage out. It's hard to step back and examine the evidence if you're part of the story. For another example, we recently discussed a review of the controversial Implicit Association Test that presented only part of the story, without even acknowledging that there was a controversy.

What to do if you’re an insider and you want to write a review that includes some of your own work?

I don't think you should just give up. As an insider, you have a special perspective, and it makes sense that you'll want to review the evidence. At the same time, you have to resist the natural inclination to present too coherent a story. In real life, the evidence doesn't always all go in the same direction!

My recommendation, for the insider writing a review article, is not to try to debate or shoot down legitimate counterarguments but rather to acknowledge the disagreement and fit it into your larger story.

Here's an example where we did exactly that. Our article is called "Reconciling Evaluations of the Millennium Villages Project," and it begins:

The Millennium Villages Project was an integrated rural development program carried out for a decade in 10 clusters of villages in sub-Saharan Africa starting in 2005, and in a few other sites for shorter durations. An evaluation of the 10 main sites compared to retrospectively chosen control sites estimated positive effects on a range of economic, social, and health outcomes (Mitchell et al. 2018). More recently, an outside group performed a prospective controlled (but also nonrandomized) evaluation of one of the shorter-duration sites and reported smaller or null results (Masset et al. 2020). Although these two conclusions seem contradictory, the differences can be explained by the fact that Mitchell et al. studied 10 sites where the project was implemented for 10 years, and Masset et al. studied one site with a program lasting less than 5 years, as well as differences in inference and framing. Insights from both evaluations should be valuable in considering future development efforts of this sort. Both studies are consistent with a larger picture of positive average impacts (compared to untreated villages) across a broad range of outcomes, but with effects varying across sites or requiring an adequate duration for impacts to be manifested.

"Mitchell et al. 2018" was us! I worked with the Millennium Villages Project to help conduct a retrospective evaluation, which yielded positive estimated effects. My take was that the project worked well. That's what I believe, but I'm an insider. Masset et al. 2020 was an outside team that did a different analysis on different data and reported smaller or null results. The purpose of our review article was to understand how these two studies of the same program came to such different conclusions. In writing this paper we had to walk a fine line: we were trying our best to assess the past work objectively, but one of the papers was ours. The key here is that we presented the disagreement: we did not pretend the dissenting article did not exist, nor did we dismiss it; rather, we incorporated it into a larger understanding. I'm not saying that this new paper of ours was perfect, nor that it was influential (indeed, according to Google Scholar it has exactly zero citations, really a low payoff given all the work we put into it), but I still think of it as a model for how to present and assess conflicting evidence from the inside.
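To make that reconciliation concrete, here is a minimal simulation sketch. It is not from either paper, and all of the numbers in it are invented for illustration. The point is just the mechanism named in the abstract above: when true effects have a positive mean but vary across sites and take time to manifest, a 10-site, full-duration average can be clearly positive while a single short-duration site yields a small or null estimate.

```python
import numpy as np

rng = np.random.default_rng(2024)

# All numbers below are made-up assumptions, purely for illustration.
n_sites = 10
mu_effect = 0.30    # assumed average long-run effect across sites
sigma_site = 0.25   # assumed site-to-site variation in true effects
site_effects = rng.normal(mu_effect, sigma_site, size=n_sites)

# A 10-site evaluation at full duration averages over the site variation:
print(f"10-site, 10-year average effect:      {site_effects.mean():.2f}")

# A single-site evaluation at less than 5 years sees one draw from the
# distribution, only partially realized, plus its own sampling noise:
short_fraction = 0.4              # assumed fraction of effect manifested so far
one_site = rng.choice(site_effects)
noise = rng.normal(0.0, 0.15)     # assumed sampling noise for a single site
print(f"single-site, short-duration estimate: {one_site * short_fraction + noise:.2f}")
```

Run this a few times with different seeds: the single-site estimate bounces around zero while the multi-site average stays reliably positive, which is essentially the pattern described in the abstract above.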