Ptolemaic inference

Statistical Modeling, Causal Inference, and Social Science 2016-10-26

[Image: epicycles]

OK, we’ve been seeing this a lot recently. A psychology study gets published, with a key idea that at first seems wacky but, upon closer reflection, could very well be true!

Examples:

– That “dentist named Dennis” paper suggesting that people pick where they live and what job to take based on their names.

– Power pose: at first it sounds ridiculous that you could boost your hormones and have success just by holding your body differently. But, sure, think about it some more and it could be possible.

– Ovulation and voting: do your political preferences change this much based on the time of the month? OK, this one seems ridiculous even upon reflection, but that’s just because I’ve seen a lot of polling data. To an outsider, sure, it seems possible: everybody knows voters are irrational.

– Embodied cognition: as Daniel Kahneman memorably put it, “When I describe priming studies to audiences, the reaction is often disbelief . . . The idea you should focus on, however, is that disbelief is not an option.”

– And lots more: himmicanes, air rage, beauty and sex ratio, football games and elections, subliminal smiley faces and attitudes toward immigration, etc. Each of these seems at first to be a bit of a stretch, but, upon reflection, could be real. Maybe people really do react differently to hurricanes with boy or girl names! And so on.

These examples all have the following features:

1. The claimed phenomenon is some sort of bank shot, an indirect effect without a clear mechanism.

2. Still, the effect seems like it could be true; the indirect mechanism seems vaguely plausible.

3. The exact opposite effect is also plausible. One could easily imagine people avoiding careers that sound like their names, or voting in the opposite way during that time of the month, or responding to elderly-themed words by running faster, or reacting with more alacrity to female-named hurricanes, and so on.

Item 3 is not always mentioned, but it’s a natural consequence of items 1 and 2. The very vagueness of the mechanisms, which is what allows plausibility in the first place, also allows plausibility for virtually any interaction and for effects of virtually any sign. Which is why sociologist Jeremy Freese so memorably described these theories as “vampirical rather than empirical,” that is, “unable to be killed by mere evidence.”

Think about it: If A is plausible, and not-A is plausible, and if the garden of forking paths and researcher degrees of freedom allow you to get statistical significance from just about any dataset, you can’t lose.
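To make the "you can't lose" point concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration, not taken from any of the studies mentioned: a made-up study of 100 people, a randomly assigned treatment, an outcome that is pure noise, and a handful of hypothetical yes/no moderators. The researcher is allowed to look at the main effect and at the effect within each level of each moderator, and to report whichever comparison has the smallest p-value, in either direction.

```python
import math
import random


def z_test(x, y):
    """Two-sided p-value (normal approximation) and z for a difference in means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((a - mx) ** 2 for a in x) / (len(x) - 1)
    vy = sum((b - my) ** 2 for b in y) / (len(y) - 1)
    z = (mx - my) / math.sqrt(vx / len(x) + vy / len(y))
    return math.erfc(abs(z) / math.sqrt(2)), z


def best_finding(rng, n=100, n_moderators=8):
    """One pure-noise study; return the (p, z) pair with the smallest p-value."""
    treated = [rng.random() < 0.5 for _ in range(n)]
    outcome = [rng.gauss(0.0, 1.0) for _ in range(n)]  # unrelated to everything
    subsets = [range(n)]  # path 0: the main effect in the full sample
    for _ in range(n_moderators):
        flag = [rng.random() < 0.5 for _ in range(n)]  # hypothetical moderator
        subsets.append([i for i in range(n) if flag[i]])
        subsets.append([i for i in range(n) if not flag[i]])
    tests = []
    for idx in subsets:
        x = [outcome[i] for i in idx if treated[i]]
        y = [outcome[i] for i in idx if not treated[i]]
        if len(x) > 1 and len(y) > 1:  # need at least two per arm for a variance
            tests.append(z_test(x, y))
    return min(tests)  # tuples compare on p first


def simulate(n_studies=1000, n_moderators=8, seed=0):
    """Share of null studies with some p < .05, and share of those that are positive."""
    rng = random.Random(seed)
    results = (best_finding(rng, n_moderators=n_moderators) for _ in range(n_studies))
    hits = [z for p, z in results if p < 0.05]
    share_found = len(hits) / n_studies
    share_positive = sum(z > 0 for z in hits) / len(hits) if hits else float("nan")
    return share_found, share_positive


if __name__ == "__main__":
    print(simulate(n_moderators=0))  # one prespecified comparison
    print(simulate(n_moderators=8))  # seventeen available comparisons
```

Under these assumptions, the single prespecified comparison comes out "significant" at about the nominal rate, while the study with many available paths produces a publishable comparison far more often, and the "effect" is about as likely to be negative as positive. With a flexible enough theory, either sign can be written up as a confirmation.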

Enter Ptolemy

But we’ve discussed all that before, many times. What I want to talk about today is how many of these stories proceed. It goes like this:

– Original paper gets published and publicized. A stunning counterintuitive finding.

– A literature develops. Conceptual replications galore. Each new study finds a new interaction. The flexible nature of scientific discovery, along with the requirement by journals of (a) originality and (b) p less than .05, in practice requires that each new study is a bit different from everything that came before. From the standpoint of scientific replication, this is a minus, but from the standpoint of producing a scientific literature, it’s a plus. A paper that is nothing but noise mining can get thousands of citations:

[Image: screenshot]

– The literature is criticized on methodological grounds, and some attempted replications fail.

Now here’s where Ptolemy comes in. There is, sometimes, an attempt to square the circle, to resolve the apparent contradiction between the original seemingly successful study, the literature of seemingly successful conceptual replications, and the new, discouraging, failed replications.

I say “apparent contradiction” because this pattern of results is typically consistent with a story in which the true effect is zero, or is so highly variable and situation-dependent as to be undetectable, and in which the original study and the literature of apparent successes are merely testimony to the effectiveness of p-hacking or the garden of forking paths to uncover statistically significant comparisons in the presence of researcher degrees of freedom.

But there is this other story that often gets told, which is that effects are contextually dependent in a particular way, a way which preserves the validity of all the published results while explaining away the failed replications as just being done wrong, as not true replications.

This was what the power pose authors said about the unsuccessful replication performed by Ranehill et al., and this is what Gilbert et al. said about the entire replication project. (See also Nosek et al.’s criticism of Gilbert et al.’s argument, which I find compelling.)

I call this reasoning Ptolemaic because it’s an attempt to explain an entire pattern of data with an elaborate system of invisible mechanisms. On days when you’re more fertile you’re more likely to wear red. Unless it’s a cold day, then it doesn’t happen. Or maybe it’s not the most fertile days, maybe it’s the days that precede maximum fertility. Or, when you’re ovulating you’re more likely to vote for Barack Obama. Unless you’re married, then ovulation makes you more likely to support Mitt Romney. Or, in the words of explainer-in-chief John Bargh, “Both articles found the effect but with moderation by a second factor: Hull et al. 2002 showed the effect mainly for individuals high in self consciousness, and Cesario et al. 2006 showed the effect mainly for individuals who like (versus dislike) the elderly.”

It’s all possible, but this sort of interpretation of the data is a slalom that weaves back and forth in order to be consistent with every published claim. Which would be fine if the published claims were deterministic truths, but in fact they’re noisy and selected data summaries. It’s classic overfitting.
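Here is a small sketch of the overfitting point, under deliberately stark assumptions that are mine and not a description of any particular literature: the true effect is exactly zero in every context, each published estimate is a noisy draw that happened to clear the significance threshold, and a replication is a fresh noisy draw in the same context. The "Ptolemaic" story (a hypothetical label in the code) takes each published estimate at face value as that context's true effect; the boring story says the effect is zero everywhere.

```python
import random


def published_estimate(rng, se=1.0):
    """Redraw a noisy estimate of a zero effect until it reaches |z| > 1.96."""
    while True:
        est = rng.gauss(0.0, se)
        if abs(est) > 1.96 * se:
            return est


def compare_stories(n_contexts=1000, se=1.0, seed=1):
    """Mean squared prediction error on fresh replications for the two stories."""
    rng = random.Random(seed)
    sq_err_ptolemaic = 0.0
    sq_err_null = 0.0
    for _ in range(n_contexts):
        claimed = published_estimate(rng, se)  # the selected, published number
        replication = rng.gauss(0.0, se)       # new data, true effect still zero
        sq_err_ptolemaic += (replication - claimed) ** 2  # trust every claim
        sq_err_null += (replication - 0.0) ** 2           # predict zero
    return sq_err_ptolemaic / n_contexts, sq_err_null / n_contexts


if __name__ == "__main__":
    ptolemaic, null = compare_stories()
    print(f"Ptolemaic story MSE: {ptolemaic:.2f}")
    print(f"Zero-effect story MSE: {null:.2f}")
```

In this setup the story that fits every published result perfectly predicts new data much worse than the story that ignores them all, because the published numbers are mostly selection plus noise. Real literatures need not be this extreme, but the direction of the problem is the same.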

Look. I’m not some sort of Occam fundamentalist. Maybe these effects are real. But in any case you should take account of all these sources of random and systematic error and recognize that, once you open that door, you have to allow for the possibility that these effects are real but go in the direction opposite to what was claimed. You have to allow for the very real possibility that power pose hurts people, that Cornell students have negative ESP, that hurricanes with boys’ names cause more damage, and so forth. Own your model.

Remember: the ultimate goal is to describe reality, not to explain away a bunch of published papers in a way that will cause the least offense to their authors and their supporters in the academy and the news media.
