The “80% power” lie

OK, so this is nothing new. Greg Francis said it, Uri Simonsohn said it, Ulrich Schimmack said it; lots of people have said it. But it’s worth saying again.

To get NIH funding, you need to demonstrate (that is, convincingly claim) that your study has 80% power.

I hate the term “power” as it’s all tied into the idea of the goal of a study being statistical significance. But let’s set that aside for now, and just do the math, which is that with a normal distribution, if you want an 80% probability of your 95% interval excluding zero, then the true effect size has to be at least 2.8 standard errors from zero.
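
Quick arithmetic on that 2.8: it’s just the 1.96 cutoff for the 95% interval plus the 0.84 you need so that 80% of the sampling distribution lands beyond that cutoff:

> qnorm(0.975) + qnorm(0.80)
[1] 2.8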

All right, then. Suppose we really were running studies with 80% power. In that case, the expected z-score is 2.8, and 95% of the time we’d see z-scores between 0.8 and 4.8.

Let’s open up the R:

> 2*pnorm(-0.8)
[1] 0.42
> 2*pnorm(-4.8)
[1] 1.6e-06

So we should expect to routinely see p-values ranging from 0.42 to . . . ummmm, 0.0000016. And those would be clean, pre-registered p-values, no funny business, no researcher degrees of freedom, no forking paths.

Let’s explore further . . . the 75th percentile of the standard normal distribution is 0.67, so if we’re really running studies with 80% power, then one-quarter of the time we’d see z-scores above 2.8 + 0.67 = 3.47.

> 2*pnorm(-3.47)
[1] 0.00052

Dayum. We’d expect to see clean, un-hacked p-values less than 0.0005, at least a quarter of the time, if we were running studies with minimum 80% power, as we routinely claim we’re doing, if we ever want any of that sweet, sweet NIH funding.

And, yes, that’s 0.0005, not 0.005. There’s a bunch of zeroes there.
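
Filling in the other quartiles the same way (the 25th and 50th percentiles of the standard normal are -0.67 and 0), here’s the full quartile spread of clean p-values we’d expect under 80% power:

> 2*pnorm(-(2.8 + qnorm(0.25)))
[1] 0.034
> 2*pnorm(-(2.8 + qnorm(0.50)))
[1] 0.0051
> 2*pnorm(-(2.8 + qnorm(0.75)))
[1] 0.00051

So the median clean p-value would be around 0.005, with only a quarter of results coming in above 0.034.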

And, no, this ain’t happening. We don’t have 80% power. Heck, we’re lucky if we have 6% power.

Remember that wonderful passage from the Nosek, Spies, and Motyl “50 shades of gray” paper:

We conducted a direct replication while we prepared the manuscript. We ran 1,300 participants, giving us .995 power to detect an effect of the original effect size at alpha = .05.

Followed by:

The effect vanished (p = .59).

None of this should be a surprise

When I say, “None of this should be a surprise,” I don’t just mean that, in response to the replication crisis and the work of Ioannidis, Button et al., etc., we should realize that statistically-based science is not what it’s claimed to be. And I don’t just mean that, given the real world of type M errors and the statistical significance filter, we should expect claims of statistical power (which are based on optimistic interpretations of a biased literature) to be wildly inflated.

What I mean is that, even knowing nothing about any replication crisis, without any real-world experience or cynicism or sociology or documentation or whatever you want to call it . . . it just comes down to the math. With 80% power, we’d expect to see tons and tons of p-values like 0.0005, 0.0001, 0.00005, etc. This would just be happening all the time. But it doesn’t.
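
To make that concrete, here’s a quick simulation sketch of exactly this setup: draw z-scores from a normal distribution centered at 2.8 with standard deviation 1, and look at the clean two-sided p-values that come out:

> z <- rnorm(1e6, mean = 2.8, sd = 1)  # z-scores from a long run of honest 80%-power studies
> p <- 2*pnorm(-abs(z))                # clean two-sided p-values
> mean(p < 0.05)                       # about 0.80, the nominal power
> mean(p < 0.0005)                     # about 0.25: a quarter of the p-values below 0.0005
> median(p)                            # about 0.005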

I should’ve realized this the first time I was asked to demonstrate 80% power for a grant proposal. And certainly I should’ve realized this when writing the section on sample size and power analysis in my book with Jennifer, over ten years ago, well before I’d thought about all the problems in statistical practice of which we are now so painfully aware. All the math in that section is correct—but the implications of the math reveal the absurdity of the assumptions.
