Noise-mining as standard practice in social science

Statistical Modeling, Causal Inference, and Social Science 2020-04-03

The following example is interesting, not because it is particularly noteworthy but rather because it represents business as usual in much of social science: researchers trying their best, but hopelessly foiled by their use of crude psychological theories and cruder statistics, along with patterns of publication and publicity that motivate the selection and interpretation of patterns in noise.

Elio Campitelli writes:

The silliest study this week?

I realise that it’s a hard competition, but this has to be the silliest study I’ve read this week. Each group of participants read the same exact text with only one word changed and the researchers are “startled” to see that such a minuscule change did not alter the readers’ understanding of the story. From the Guardian article (the paper is yet to be published as I’m sending you this email):

Two years ago, Washington and Lee University professors Chris Gavaler and Dan Johnson published a paper in which they revealed that when readers were given a sci-fi story peopled by aliens and androids and set on a space ship, as opposed to a similar one set in reality, “the science fiction setting triggered poorer overall reading” and appeared to “predispose readers to a less effortful and comprehending mode of reading – or what we might term non-literary reading”.

But after critics suggested that merely changing elements of a mainstream story into sci-fi tropes did not make for a quality story, Gavaler and Johnson decided to revisit the research. This time, 204 participants were given one of two stories to read: both were called “Ada” and were identical apart from one word, to provide the strictest possible control. The “literary” version begins: “My daughter is standing behind the bar, polishing a wine glass against a white cloth.” The science-fiction variant begins: “My robot is standing behind the bar, polishing a wine glass against a white cloth.”

In what Gavaler and Johnson call “a significant departure” from their previous study, readers of both texts scored the same in comprehension, “both accumulatively and when divided into the comprehension subcategories of mind, world, and plot”.

The presence of the word “robot” did not reduce merit evaluation, effort reporting, or objective comprehension scores, they write; in their previous study, these had been reduced by the sci-fi setting. “This difference between studies is presumably a result of differences between our two science-fiction texts,” they say.

Gavaler said he was “pretty startled” by the result.

I mean, I wouldn’t dismiss out of hand the possibility of a one-word change having dramatic consequences (change “Republican” to “Democrat” in a paragraph describing a proposed policy, for example). But in this case it seems to me that the authors surfed the noise generated by the previous study into expecting a big change from just swapping “daughter” for “robot” and nothing else.

I agree. Two things seem to be going on:

1. The researchers seem to have completely internalized the biases arising from the statistical significance filter, which leads published estimates to be too high (as discussed in section 2.1 of this article), so they came into this new experiment expecting to see a huge and statistically significant effect (recall the 80% power lie; see the simulation sketch below).

2. Then they do the experiment and are gobsmacked to find nothing (like the 50 shades of gray story, but without the self-awareness).

The funny thing is that items 1 and 2 kinda cancel, and the researchers still end up with positive press!
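Here’s a quick simulation sketch of the significance filter in action. To be clear, this is my own illustration with made-up numbers, not anything from the paper or the earlier study: a small true effect, a noisy measurement, and a two-sided test at the 0.05 level. The point is what happens to the estimates that survive the filter.

```python
import numpy as np

# Hypothetical numbers for illustration only: a small true effect measured
# with a lot of noise, tested two-sided at the 0.05 level.
rng = np.random.default_rng(2020)

true_effect = 2.0      # assumed true effect (arbitrary units)
se = 8.0               # standard error of each study's estimate
n_sims = 100_000       # number of simulated replications

estimates = rng.normal(true_effect, se, size=n_sims)
significant = np.abs(estimates / se) > 1.96   # the "p < 0.05" filter

power = significant.mean()
exaggeration = np.abs(estimates[significant]).mean() / true_effect

print(f"power: {power:.2f}")
print(f"average exaggeration among significant estimates: {exaggeration:.1f}x")
```

With these made-up numbers the power is only about 6%, yet the estimates that clear the significance bar overstate the true effect by roughly a factor of nine. If your sense of effect sizes is calibrated to published (that is, filtered) estimates, a clean null in the follow-up will feel startling.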

P.S. I looked up Chris Gavaler and he has a lot of interesting thoughts. Check out his blog! I feel bad that he got trapped in the vortex of bad statistics, and I don’t want this discussion of statistical fallacies to reflect negatively on his qualitative work.