Specification curve analysis and the multiverse
Statistical Modeling, Causal Inference, and Social Science 2024-11-12
I just learned about this paper from 2020, Specification curve analysis, by Uri Simonsohn, Joseph Simmons, and Leif Nelson:
Empirical results hinge on analytical decisions that are defensible, arbitrary and motivated. These decisions probably introduce bias (towards the narrative put forward by the authors), and they certainly involve variability not reflected by standard errors. To address this source of noise and bias, we introduce specification curve analysis, which consists of three steps: (1) identifying the set of theoretically justified, statistically valid and non-redundant specifications; (2) displaying the results graphically, allowing readers to identify consequential specification decisions; and (3) conducting joint inference across all specifications. We illustrate the use of this technique by applying it to three findings from two different papers, one investigating discrimination based on distinctively Black names, the other investigating the effect of assigning female versus male names to hurricanes. Specification curve analysis reveals that one finding is robust, one is weak and one is not robust at all.
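To make the three steps concrete, here is a minimal sketch in Python — illustrative only, not the authors' code. It simulates data with a known effect, enumerates specifications as combinations of two arbitrary-but-defensible analytic choices (a hypothetical covariate-inclusion choice and a hypothetical outlier-trimming choice), sorts the resulting estimates into the "curve," and sketches step 3 as a permutation test on the median estimate:

```python
# Sketch of a specification curve analysis on simulated data.
# The specifications (include_z, trim_outliers) are hypothetical
# examples of "defensible, arbitrary" analytic choices.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)                      # predictor of interest
z = rng.normal(size=n)                      # optional covariate
y = 0.3 * x + 0.5 * z + rng.normal(size=n)  # outcome; true effect of x is 0.3

def estimate(y, x, z, include_z, trim_outliers):
    """One specification: OLS of y on x under a pair of analytic choices."""
    keep = np.abs(y) < 2.5 if trim_outliers else np.ones(n, bool)
    cols = [np.ones(keep.sum()), x[keep]] + ([z[keep]] if include_z else [])
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y[keep], rcond=None)
    return beta[1]                          # coefficient on x

# Step 1: enumerate all non-redundant specifications.
specs = list(itertools.product([False, True], [False, True]))
# Step 2: the sorted estimates form the specification curve (normally plotted).
curve = sorted(estimate(y, x, z, a, b) for a, b in specs)
print("specification curve:", np.round(curve, 3))

# Step 3: joint inference, sketched as a permutation test -- shuffle x to
# break any x-y link and recompute the median estimate across all specs.
observed = np.median(curve)
perm_medians = []
for _ in range(500):
    xp = rng.permutation(x)
    perm_medians.append(np.median([estimate(y, xp, z, a, b) for a, b in specs]))
p = np.mean(np.abs(perm_medians) >= abs(observed))
print("permutation p-value for median estimate:", p)
```

In a real application the curve would be plotted with the specifications that produced each estimate marked underneath, which is what lets readers see which decisions are consequential.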
This came up in a conversation about the multiverse, a concept described in this paper from 2016, Increasing transparency through a multiverse analysis, by Sara Steegen, Francis Tuerlinckx, Wolf Vanpaemel, and me:
Empirical research inevitably includes constructing a data set by processing raw data into a form ready for statistical analysis. Data processing often involves choices among several reasonable options for excluding, transforming, and coding data. We suggest that instead of performing only one analysis, researchers could perform a multiverse analysis, which involves performing all analyses across the whole set of alternatively processed data sets corresponding to a large set of reasonable scenarios. Using an example focusing on the effect of fertility on religiosity and political attitudes, we show that analyzing a single data set can be misleading and propose a multiverse analysis as an alternative practice. A multiverse analysis offers an idea of how much the conclusions change because of arbitrary choices in data construction and gives pointers as to which choices are most consequential in the fragility of the result.
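The complementary idea can be sketched the same way — again illustrative, not the paper's actual fertility analysis. Here the "multiverse" is the set of data sets reachable through reasonable processing choices (a hypothetical age-exclusion rule and a hypothetical choice of whether to dichotomize the outcome), with one and the same analysis run on each resulting data set:

```python
# Sketch of a multiverse analysis: vary the data processing, not the model.
# The variable names and processing choices are hypothetical illustrations.
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 300
group = rng.integers(0, 2, n)                    # binary predictor
raw = 2.0 + 0.2 * group + rng.normal(size=n)     # raw outcome measure
age = rng.integers(18, 70, n)

def process(exclude_young, dichotomize):
    """One universe: construct the data set under a pair of processing choices."""
    keep = age >= 25 if exclude_young else np.ones(n, bool)
    y = (raw[keep] > raw[keep].mean()).astype(float) if dichotomize else raw[keep]
    return y, group[keep]

# Run the same analysis (difference in group means) in every universe.
results = {}
for choices in itertools.product([False, True], repeat=2):
    y, g = process(*choices)
    results[choices] = y[g == 1].mean() - y[g == 0].mean()
for choices, diff in sorted(results.items()):
    print(choices, round(diff, 3))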
The ideas of the two papers are very similar but come from different perspectives: we view multiverse analysis in a more exploratory way, whereas they use specification curves as a way to do hypothesis testing.
What’s interesting is how the same idea came up completely separately. When writing our multiverse paper, we had not been aware of any work being done by Simonsohn et al. on this topic (beyond their classic 2011 paper on researcher degrees of freedom, which definitely informed our ideas), and it seems that, when writing their specification curve paper, they were not aware of our work on the multiverse. It makes sense that this idea was in the air, and two research groups came to it independently; it’s just interesting to be part of the story.
I did some searching and it seems that both papers have been influential, and I guess they serve different needs—some researchers are more into exploration, others want hypothesis tests—so it’s maybe for the best that we and they worked on the topic completely separately, thus giving people two different paths into this approach.
Also it’s funny that one of the examples in the Simonsohn et al. (2020) paper is the notorious himmicanes study, which various people, including me, have criticized. It’s a small world.