Routine, bread-and-butter, run-of-the-mill junk science from 2015
Statistical Modeling, Causal Inference, and Social Science 2025-03-25
Just a reminder of what the world was like, in simpler days:
Someone writes in:
In the most recent absurd embodiment paper on wobbly stools leading to wobbly relationship beliefs, Psych Sci asked for a second study with a large N. The authors performed it, found no effect, and then performed a mediation analysis to recover the effect. It's a good example of the garden of forking paths, given that the mediation analysis was decided on post hoc and there are a number of ways to approach the problem.
No kidding! The article is called, “Turbulent Times, Rocky Relationships: Relational Consequences of Experiencing Physical Instability.” Almost a parody of a Psych Science tabloid-style paper. From the abstract:
Drawing on the embodiment literature, we propose that experiencing physical instability can undermine perceptions of relationship stability. Participants who experienced physical instability by sitting at a wobbly workstation rather than a stable workstation (Study 1), standing on one foot rather than two (Study 2), or sitting on an inflatable seat cushion rather than a rigid one (Study 3) perceived their romantic relationships to be less likely to last. . . . These findings indicate that benign physical experiences can influence perceptions of relationship stability, exerting downstream effects on consequential relationship processes.
This is no joke. Here’s how the paper begins:
The earthquake that struck Sichuan, China, in 2008 made headlines not only because of the tremendous loss of life it caused, but also because after the quake, Sichuan came to lead the country in number of divorces (Zhiling, 2010). Experts and popular media outlets made causal claims (e.g., “Earthquake Boosts Divorce Rate,” 2010). If the earthquake truly caused changes in Sichuan’s divorce rate, why might this be? Emotional distress, financial hardship, and mortality salience may well contribute. Sociologist Guang Wei speculated that Sichuan residents “decided to live each day to the fullest . . . if they do not get along with their spouses, they decide to part ways” (Zhiling, 2010, paras. 9–10). We examine a different feature of earthquakes that may affect relationships: physical instability.
Don’t get them wrong, though. They very graciously admit to not having the whole story:
Certainly, the shaking ground was not solely responsible for the change in the divorce rate in Sichuan.
That’s a relief! And good that they can feel “certain” about something. Certainty . . . that’s what science is all about, no?
Getting to my correspondent’s criticisms, yes, lots of forking paths in preparing the dataset:
Data for 54 participants were collected. Because our main measure was perceived stability of a person’s relationship with a particular partner, only data from participants in exclusive romantic relationships were included in the analyses (36 exclusively dating, 8 cohabiting, 2 engaged, and 1 married) . . . Data from 3 participants—1 who stood instead of sitting and 2 who communicated with friends while completing the questionnaire—were omitted from analyses. . . . Participants responded to six items (α = .96) regarding the stability of their current romantic relationship. These included the four items from Study 1 as well as similar items querying confidence in still being together in 10 and 20 years. . . . Data from 4 participants—1 who reported not having followed the posture instructions and 3 who correctly guessed the study hypothesis—were omitted from analyses. . . .
And forking paths in the analysis:
We averaged participants’ ratings of their beliefs that they would remain with their partners over each of the four time periods in the items on relationship stability. . . . Mediation analysis using PROCESS Model 4 revealed a significant indirect effect of condition on reports of relationship quality via perceived relationship stability . . .
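For readers who haven't run one: PROCESS "Model 4" is simple mediation, condition X affecting outcome Y through mediator M, where the "indirect effect" is the product a*b of two regression coefficients, usually with a bootstrap interval. Here's a minimal stdlib-only Python sketch of that arithmetic on simulated data; all the variable names and effect sizes are made up for illustration, not taken from the paper.

```python
# Hypothetical sketch of what a "Model 4" simple-mediation analysis computes:
# a = effect of X on mediator M, b = effect of M on Y controlling for X,
# indirect effect = a*b, with a percentile bootstrap interval.
import random

random.seed(0)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def indirect_effect(x, m, y):
    """a*b from the two mediation regressions (variables centered, so no intercepts)."""
    xc, mc, yc = center(x), center(m), center(y)
    a = dot(xc, mc) / dot(xc, xc)                      # M ~ X: coefficient a
    # Y ~ X + M: solve the 2x2 normal equations for b, the coefficient on M
    sxx, smm, sxm = dot(xc, xc), dot(mc, mc), dot(xc, mc)
    sxy, smy = dot(xc, yc), dot(mc, yc)
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b

# Made-up data: binary condition x, mediator m depends on x, outcome y on m.
n = 200
x = [float(random.randint(0, 1)) for _ in range(n)]
m = [0.5 * xi + random.gauss(0.0, 1.0) for xi in x]
y = [0.7 * mi + random.gauss(0.0, 1.0) for mi in m]

point = indirect_effect(x, m, y)

# Percentile bootstrap interval for the indirect effect
boots = []
for _ in range(1000):
    s = [random.randrange(n) for _ in range(n)]
    boots.append(indirect_effect([x[i] for i in s],
                                 [m[i] for i in s],
                                 [y[i] for i in s]))
boots.sort()
lo, hi = boots[25], boots[974]
print(f"indirect effect a*b = {point:.3f}, 95% bootstrap CI [{lo:.3f}, {hi:.3f}]")
```

The relevant point is how many free choices sit inside this procedure: which variable plays mediator, which plays outcome, which covariates to include, which ordering to report.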
Lots and lots of mediation analyses. But what happened to the main effect in the replications?
The physical-stability manipulation used in this study did not produce significant condition differences in negative affect. . . . Contrary to prediction, a one-way ANOVA yielded no evidence of a direct effect of condition on perceived relationship stability.
And our old friend, the difference between significant and non-significant:
Participants’ experience of negative affect did not differ between the two conditions, F < 1, which suggests that mood is not a viable alternative explanation for the observed condition differences. . . . A model in which the order of perceived relationship stability and relationship quality was reversed did not yield a significant indirect effect . . .
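The fallacy is easy to verify with hypothetical numbers: an estimate of 25 with standard error 10 is "significant," an estimate of 10 with standard error 10 is not, yet the difference between the two is nowhere near significant.

```python
# Hypothetical numbers illustrating "the difference between 'significant' and
# 'not significant' is not itself statistically significant."
import math

def two_sided_p(z):
    """Two-sided normal p-value via the error function."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

est1, se1 = 25.0, 10.0   # z = 2.5 -> "significant"
est2, se2 = 10.0, 10.0   # z = 1.0 -> "not significant"
diff = est1 - est2
se_diff = math.sqrt(se1 ** 2 + se2 ** 2)   # standard error of the difference
print(f"p1 = {two_sided_p(est1 / se1):.3f}, p2 = {two_sided_p(est2 / se2):.3f}")
print(f"difference = {diff:.0f} (se {se_diff:.1f}), "
      f"p = {two_sided_p(diff / se_diff):.3f}")
```

So "effect A was significant and effect B was not" tells you essentially nothing about whether A and B differ.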
And good old “marginally significant”:
Although we observed only a marginally significant effect of posture condition on perceived relationship stability, it is widely accepted that indirect (i.e., mediated) effects can be examined even in the absence of any direct link between a predictor and outcome.
Summary
The research team found a newsworthy result which did not appear in the replications. But that didn’t stop them from doing some mediation analyses and finding some statistical significance and some non-significance in various places along their forking paths. They wove this together and wrote it up as if they’d discovered something important.
Let’s check the score. Again, from the abstract:
Participants who experienced physical instability by sitting at a wobbly workstation rather than a stable workstation (Study 1), standing on one foot rather than two (Study 2), or sitting on an inflatable seat cushion rather than a rigid one (Study 3) perceived their romantic relationships to be less likely to last.
Is this true? For study 1, yes, after all their choices in data construction and data analysis, they achieved "p less than .05" (p=.034, to be precise). For study 2, despite their flexibility in excluding people and defining the outcome, they were only able to get p down to .069. For study 3, nothing at all, "F less than 1," as they put it. And they really did have lots of chances to win—they bought lots of tickets in the "p less than .05" lottery. For example:
For our behavioral measure, participants were asked to select and send a “thinking of you” electronic greeting card (e-card) to their romantic partners. Each participant chose an e-card design from six choices that had been prerated and selected to vary in intimacy (for details, see Supplemental Material). The intimacy of the card design selected was one outcome of interest. However, we observed no direct or indirect effects of stability condition on card design intimacy, so we do not discuss it further.
The authors get full credit for reporting this—but no credit for realizing what this sort of thing does to their analysis! They consistently report their successes in detail and downplay the null findings. That’s called capitalizing on chance.
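How much does buying lots of lottery tickets help? A quick stdlib-only simulation (hypothetical setup: k independent outcome measures, every true effect exactly zero) shows how fast the chance of at least one "p less than .05" grows:

```python
# Monte Carlo sketch: with k independent chances at "p < .05" under the null,
# the family-wise hit rate is 1 - 0.95^k.  Uses a z-test on standard-normal
# draws, so the null p-values are exact.
import math
import random

random.seed(1)

def one_null_pvalue(n=100):
    """Two-sided z-test p-value for the mean of n N(0,1) draws (null is true)."""
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    z = (sum(xs) / n) * math.sqrt(n)   # known sigma = 1
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

def any_winner(k):
    """Did at least one of k independent null tests reach p < .05?"""
    return any(one_null_pvalue() < 0.05 for _ in range(k))

trials = 2000
rates = {}
for k in (1, 5, 10):
    rates[k] = sum(any_winner(k) for _ in range(trials)) / trials
    print(f"k={k:2d} comparisons: P(at least one 'significant') = {rates[k]:.2f} "
          f"(theory: {1 - 0.95 ** k:.2f})")
```

With ten tickets, the chance of at least one "significant" result under pure noise is about 40 percent; and the forking paths in data exclusions and outcome definitions buy many more tickets than ten.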
Published in Psychological Science: if we reward this sort of research behavior, I see no reason we won't get lots more of it. I have no reason to think the authors and journal editor are trying to mislead anyone; rather, I'm guessing they're true believers. They did their own replication and it failed. But they did not take the next step and subject their theory and methods to criticism. Too bad.
Update, 9 years later
The above post appeared in Dec 2015. I looked up the article in question on Google Scholar, and it’s only been cited 23 times. OK, that’s about 23 more than it should be, but at least we’re not approaching that notorious elderly-walking paper, which has now been cited a stunning 6683 times in the nearly thirty years since its publication, including completely uncritically in an article in Lancet (of course), published many years after the non-replications and general debunking of that paper.
So, yeah, 23 citations . . . that’s not too much damage.
Still, I feel that academic social psychology has not fully come to terms with the terrible papers it was publishing in its flagship journal during those bad years of the mid-2010s. At worst, leaders in the field continue actively promoting this stuff; at best, they just act as if it never happened.
And, as always, let me emphasize that I'm not saying that the authors of these papers are bad people, or even that they were less than honest in their work. Unfortunately, honesty and transparency are not enough. Science is about truth, not good intentions, and real scientific discovery comes from some combination of theory and measurement, not from the ability to follow the protocols to get a paper published in a leading journal.