Misunderstanding the p-value

Statistical Modeling, Causal Inference, and Social Science 2013-03-15


The New York Times has a feature in its Tuesday science section, Take a Number, to which I occasionally contribute (see here and here).

Today’s column, by Nicholas Bakalar, is in error. The column begins:

When medical researchers report their findings, they need to know whether their result is a real effect of what they are testing, or just a random occurrence. To figure this out, they most commonly use the p-value.

This is wrong on two counts. First, whatever researchers might feel, this is something they’ll never know. Second, results are a combination of real effects and chance; it’s not either/or.

Perhaps the above is a forgivable simplification, but I don’t think so; I think it’s a simplification that destroys the reason for writing the article in the first place. But in any case I think there’s no excuse for this, later on:

By convention, a p-value higher than 0.05 usually indicates that the results of the study, however good or bad, were probably due only to chance.

This is the old, old error of confusing p(A|B) with p(B|A). I’m too rushed right now to explain this one, but it’s in just about every introductory statistics textbook ever written. For more on the topic, I recommend my recent paper, P Values and Statistical Practice, which begins:

The casual view of the P value as posterior probability of the truth of the null hypothesis is false and not even close to valid under any reasonable model, yet this misunderstanding persists even in high-stakes settings (as discussed, for example, by Greenland in 2011). The formal view of the P value as a probability conditional on the null is mathematically correct but typically irrelevant to research goals (hence, the popularity of alternative—if wrong—interpretations). . . .
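To make the p(A|B)-versus-p(B|A) distinction concrete, here is a minimal simulation sketch (my addition, not from the column or the paper). The base rate of real effects, the effect size, the sample size, and the use of a simple normal test are all assumed for illustration; the point is only that P(p < 0.05 | null is true) and P(null is true | p < 0.05) are different quantities, and the second depends on the base rate and the power.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Assumed-for-illustration parameters (not from the post):
n_studies = 100_000   # hypothetical studies
base_rate = 0.10      # fraction of studies with a real effect
effect = 0.3          # assumed true mean when an effect exists
n = 50                # observations per study, sd = 1

real = rng.random(n_studies) < base_rate
true_means = np.where(real, effect, 0.0)

# Each study reports a sample mean of n draws from N(true_mean, 1),
# tested against the null hypothesis of mean zero.
xbar = rng.normal(true_means, 1 / np.sqrt(n))
z = xbar * np.sqrt(n)
p = 2 * stats.norm.sf(np.abs(z))
sig = p < 0.05

# P(p < 0.05 | null true): held near 0.05 by construction of the test.
print("P(p<0.05 | null true) approx.", sig[~real].mean())

# P(null true | p < 0.05): a different number, driven by the assumed
# base rate and power; it is not 0.05.
print("P(null true | p<0.05) approx.", (~real[sig]).mean())
```

With these made-up numbers the second probability comes out far from 0.05, which is exactly why reading the p-value as “the probability the result is due only to chance” goes wrong.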

I can’t get too annoyed at science writer Bakalar for garbling the point—it confuses lots and lots of people—but, still, I hate to see this error in the newspaper.

On the plus side, if a newspaper column runs 20 times, I guess it’s ok for it to be wrong once—we still have 95% confidence in it, right?

P.S. Various commenters remark that it’s not so easy to define p-values accurately. I agree, and I think it’s for reasons described in my quote immediately above: the formal view of the p-value is mathematically correct but typically irrelevant to research goals.

P.P.S. Phil nails it:

The p-value does not tell you if the result was due to chance. It tells you whether the results are consistent with being due to chance. That is not the same thing at all.
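A small sketch of Phil’s point (again my addition, with an assumed effect size and sample size): every simulated study below has a real effect, yet most of them return p > 0.05. A large p-value means the data are consistent with chance, not that chance produced them.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Assumed-for-illustration setup: a modest real effect, a small study.
effect, n, reps = 0.2, 30, 100_000

# Sample means from studies where the effect genuinely exists.
xbar = rng.normal(effect, 1 / np.sqrt(n), size=reps)
p = 2 * stats.norm.sf(np.abs(xbar) * np.sqrt(n))

# Most p-values land above 0.05 even though the null is false in
# every single replication.
print("share of p-values above 0.05:", (p > 0.05).mean())
```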