“A Vast Graveyard of Undead Theories: Publication Bias and Psychological Science’s Aversion to the Null”
Statistical Modeling, Causal Inference, and Social Science 2013-04-26
Erin Jonaitis points us to this article by Christopher Ferguson and Moritz Heene, who write:
Publication bias remains a controversial issue in psychological science. . . . that the field often constructs arguments to block the publication and interpretation of null results and that null results may be further extinguished through questionable researcher practices. Given that science is dependent on the process of falsification, we argue that these problems reduce psychological science’s capability to have a proper mechanism for theory falsification, thus resulting in the promulgation of numerous “undead” theories that are ideologically popular but have little basis in fact.
They mention the infamous Daryl Bem article. It is pretty much only because Bem’s claims are (presumably) false that they got published in a major research journal. Had the claims been true (that is, had Bem run identical experiments, analyzed his data more carefully and objectively, and reported that the results were consistent with the null hypothesis), the result would have been entirely unpublishable. After all, you can’t publish an article in a top journal demonstrating that a study is consistent with there being no ESP. Everybody knows that ESP, to the extent it exists, has such small effects as to be essentially undetectable in any direct study. So here you have the extreme case of a field in which errors are the only thing that gets published.
It’s science as Slate magazine is reputed to be: if it’s true, it’s obvious so no need to publish. If it’s counterintuitive, go for it. (Just to be clear, I’m not saying the actual Slate magazine is like that; this is just its reputation.)
This is indeed disturbing and I applaud yet another publication on the topic. The authors go beyond previous research by Gregory Francis and Uri Simonsohn by focusing specifically on difficulties with meta-analyses that unsuccessfully try to overcome problems of publication bias.
There’s something called the fail-safe number (FSN) of Rosenthal (1979) and Rosenthal and Rubin (1978), “an early and still widely used attempt to estimate the number of unpublished studies, averaging null results, that are required to bring the meta-analytic mean Z value of effect sizes down to an insignificant level,” but,
The FSN treats the file drawer of unpublished studies as unbiased by assuming that their average Z value is zero. This wrong assumption appears mostly not to be recognized by researchers who use the FSN to demonstrate the stability of their results. . . . Without making this computational error, the FSN turns out to be a gross overestimate of the number of unpublished studies required to bring the mean Z value of published studies to an insignificant level. The FSN thus gives the meta-analytic researcher a false sense of security.
The false sense of security persists:
Although this fundamental flaw had been spotted early, the number of applications of the FSN has grown exponentially since its publication. Ironically, getting critiques of the FSN published was far from an easy task . . .
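To make the fail-safe number concrete, here is a minimal sketch in Python. It assumes the usual Stouffer-style combination (combined Z equals the sum of the study Z values divided by the square root of the number of studies) and a one-tailed 5% cutoff of 1.645. The published Z values are made up, and the function names are mine. The "biased drawer" calculation is only one way to model the problem, under the assumption that the file drawer holds studies of a truly null effect that fell short of significance. It is not necessarily the exact computation Ferguson and Heene have in mind.

```python
from math import sqrt
from statistics import NormalDist

Z_ALPHA = 1.645  # one-tailed 5% critical value (assumed cutoff)


def rosenthal_fsn(z_values, z_alpha=Z_ALPHA):
    """Fail-safe number when file-drawer studies are assumed to average Z = 0."""
    k = len(z_values)
    return sum(z_values) ** 2 / z_alpha ** 2 - k


def fsn_with_drawer_mean(z_values, drawer_mean, z_alpha=Z_ALPHA, max_n=1_000_000):
    """Smallest number of unpublished studies, averaging drawer_mean,
    that pulls the combined Z down to z_alpha or below."""
    k = len(z_values)
    total = sum(z_values)
    for n in range(max_n + 1):
        if (total + n * drawer_mean) / sqrt(k + n) <= z_alpha:
            return n
    raise ValueError("combined Z never drops to z_alpha within max_n studies")


def truncated_null_mean(z_alpha=Z_ALPHA):
    """Mean of a standard normal Z given Z < z_alpha, i.e. the average Z of
    null-effect studies that failed to reach significance (an assumption)."""
    std = NormalDist()
    return -std.pdf(z_alpha) / std.cdf(z_alpha)


# Hypothetical published studies, all "significant" (illustrative numbers only).
published_z = [2.0, 2.5, 1.8, 2.2, 3.0]

print("Rosenthal FSN (drawer mean 0):", rosenthal_fsn(published_z))
print("Drawer mean under truncation:", truncated_null_mean())
print("FSN with biased drawer:",
      fsn_with_drawer_mean(published_z, truncated_null_mean()))
```

Under these assumptions the file drawer averages a slightly negative Z rather than zero, so fewer unpublished studies are needed to wipe out the result than Rosenthal's formula suggests.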
Problems with meta-analysis
Ferguson and Heene continue:
Meta-analyses should be more objective arbiters of review for a field than are narrative reviews, but we argue that this is not the case in practice. . . . The selection and interpretation of effect sizes from individual studies requires decisions that may be susceptible to researcher biases.
It is thus not surprising that we have seldom seen a meta-analysis resolve a controversial debate in a field. Typically, the antagonists simply decry the meta-analysis as fundamentally flawed or produce a competing meta-analysis of their own . . . meta-analyses may be used in such debates to essentially confound the process of replication and falsification.
Thus:
The average effect size may be largely meaningless and spurious due to the avoidance of null findings in the published literature. This aversion to the null is arguably one of the most pernicious and unscientific aspects of modern social science.
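A toy simulation shows how the published average can become meaningless. This is a sketch under strong assumptions of my own: every study estimates a true effect of exactly zero, each study's Z value is a single standard-normal draw, and only results passing a one-tailed 1.645 cutoff get published. The function name and the numbers are illustrative, not taken from the article.

```python
import random
from statistics import mean


def simulate_literature(true_effect=0.0, n_studies=2000, z_crit=1.645, seed=0):
    """Simulate study Z values and keep only the 'significant' ones as published."""
    rng = random.Random(seed)
    all_z = [rng.gauss(true_effect, 1.0) for _ in range(n_studies)]
    published = [z for z in all_z if z > z_crit]  # the rest go in the file drawer
    return mean(all_z), mean(published), len(published)


all_mean, published_mean, n_published = simulate_literature()
print("mean Z over all studies:      ", round(all_mean, 3))
print("mean Z over published studies:", round(published_mean, 3))
print("number published:             ", n_published)
```

Averaged over all studies, Z sits near zero, but the published subset necessarily averages above the cutoff, so a meta-analysis of the published studies alone would report a healthy effect where none exists.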
Let me interject here that, although I am in general agreement with Ferguson and Heene on these issues, I have a bit of “aversion to the null” myself. I think it’s important to separate the statistical from the scientific null hypothesis.
- The statistical null hypothesis is typically that a particular comparison is exactly zero in the population.
- The scientific null hypothesis is typically that a certain effect is nonexistent or, more generally, that the effect depends so much on situation as to be unreplicable in general.
I might well believe in the scientific null but not in the statistical null.
Virtually unkillable
Ferguson and Heene continue:
The aversion to the null and the persistence of publication bias and denial of the same, renders a situation in which psychological theories are virtually unkillable. Instead of rigid adherence to an objective process of replication and falsification, debates within psychology too easily degenerate into ideological snowball fights, the end result of which is to allow poor quality theories to survive indefinitely. Proponents of a theory may, in effect, reverse the burden of proof, insisting that their theory is true unless skeptics can prove it false (a fruitless invitation, as any falsifying data would certainly be rejected as flawed were it even able to pass through the null-aversive peer review process described above).
Indeed. We see this reversal of the burden of proof all the time. For example, after a data alignment error was uncovered in their research, Neil Anderson and Deniz Ones notoriously wrote: “When any call is made for the retraction of two peer-reviewed and published articles, the onus of proof is on the claimant and the duty of scientific care and caution is manifestly high. . . . Goldberg et al. do not and cannot provide irrefutable proof of the alleged clerical errors. . . . We continue to stand by the findings and conclusions reported in our previous publications.” Ugh! This bothered me so much when I saw it that it made me want to barf. At the time, I wrote that it’s unscientific behavior not to admit error. Unfortunately, for reasons discussed by Ferguson and Heene, much of the scientific enterprise seems to be set up to avoid admission of error. These are serious issues, and it’s interesting to me that we as a field haven’t been thinking much about them until recently.