The Perils of Hypothesis Testing … Again
Normal Deviate 2013-04-28
A few months ago I posted about John Ioannidis’ article called “Why Most Published Research Findings Are False.”
Ioannidis is once again making news by publishing a similar article aimed at neuroscientists. This paper is called “Power failure: why small sample size undermines the reliability of neuroscience.” The paper is written by Button, Ioannidis, Mokrysz, Nosek, Flint, Robinson and Munafo.
When I discussed the first article, I said that his points were correct but hardly surprising. I thought it was fairly obvious that $P(A|B) \neq P(B|A)$, where $A$ is the event that a result is declared significant and $B$ is the event that the null hypothesis is true. But the fact that the paper had such a big impact made me realize that perhaps I was too optimistic. Apparently, this fact does need to be pointed out.
The new paper has basically the same message, although the emphasis is on the dangers of low power. Let us assume that for a fraction $\pi$ of studies, the null is actually false. That is, $P(B) = 1 - \pi$. Let $\gamma$ be the power. Then the probability of a false discovery, assuming we reject when the p-value is less than $\alpha$, is
$$P(B|A) = \frac{P(A|B)P(B)}{P(A|B)P(B) + P(A|B^c)P(B^c)} = \frac{\alpha(1-\pi)}{\alpha(1-\pi) + \gamma\pi}.$$
Let us suppose, for the sake of illustration, that $\pi$ is small (most nulls are true). Then the probability of a false discovery (using $\alpha = 0.05$) looks like this as a function of power:
[Figure: probability of a false discovery as a function of power]
So indeed, if the power is low, the chance of a false discovery is high. (And things are worse if we include the effects of bias.)
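To make the dependence on power concrete, here is a minimal Python sketch. It assumes that Bayes' theorem gives the probability of a false discovery as alpha(1 - pi) / (alpha(1 - pi) + power * pi), where pi is the fraction of studies in which the null is actually false. The function name and the value pi = 0.1 are assumptions chosen purely for illustration; alpha = 0.05 is the threshold used above.

```python
def false_discovery_probability(power, pi=0.1, alpha=0.05):
    """P(null is true | result declared significant), by Bayes' theorem.

    power : probability of rejecting when the null is false
    pi    : fraction of studies whose null is false (0.1 is an assumed value)
    alpha : p-value threshold for declaring significance
    """
    false_positives = alpha * (1 - pi)  # true nulls that get rejected
    true_positives = power * pi         # false nulls that get rejected
    return false_positives / (false_positives + true_positives)


# Tabulate the probability over a few illustrative power levels.
for power in (0.1, 0.2, 0.5, 0.8):
    prob = false_discovery_probability(power)
    print(f"power = {power:.1f}  ->  P(false discovery) = {prob:.3f}")
```

Under these assumed inputs, a power of 0.8 gives a false discovery probability of 0.36, while a power of 0.1 gives about 0.82.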
The authors go on to estimate the typical power of neuroscience studies. They conclude that the typical power is between .08 and .31. I applaud them for trying to come up with some estimate of the typical power, but I doubt that the estimate is very reliable.
The paper concludes with a number of sensible recommendations such as: performing power calculations before doing a study, disclosing methods transparently and so on. I wish they had included one more recommendation: focus less on testing and more on estimation.
So, as with the first paper, I am left with the feeling that this message, too, is correct but not surprising. But I guess that these points are not so obvious to many users of statistics. In that case, papers like these serve an important function.