Double Misunderstandings About p-values

Normal Deviate 2013-03-18

It’s been said a million times and in a million places that a p-value is not the probability of ${H_0}$ given the data.

But there is a different type of confusion about p-values. This issue arose in a discussion on Andrew’s blog.

Andrew criticizes the New York Times for giving a poor description of the meaning of p-values. Of course, I agree with him that being precise about these things is important. But, in reading the comments on Andrew’s blog, it occurred to me that there is often a double misunderstanding.

First, let me say that I am neither defending nor criticizing p-values in this post. I am just going to point out that there are really two misunderstandings floating around.

Two Misunderstandings

(1) The p-value is not the probability of ${H_0}$ given the data.

(2) But neither is the p-value the probability of something conditional on ${H_0}$.

Deborah Mayo pointed this fact out in the discussion on Andrew’s blog (as did a few other people).

When we use p-values we are in frequentist-land. ${H_0}$ (the null hypothesis) is not a random variable. It makes no sense to talk about the posterior probability of ${H_0}$. But it also makes no sense to talk about conditioning on ${H_0}$. You can only condition on things that were random in the first place.

Let me get more specific. Let ${Z}$ be a test statistic and let ${z}$ be the realized value of ${Z}$. The p-value (in a two-sided test) is

$\displaystyle p = P_0(|Z| > |z|)$

where ${P_0}$ is the null distribution. It is not equal to ${P\bigl(|Z|> |z| \, \bigm| \,H_0\bigr)}$. That expression makes no sense, because ${H_0}$ is not a random variable. In case the null consists of a set ${{\cal P}_0}$ of distributions, the p-value is

$\displaystyle p = \sup_{P\in {\cal P}_0}P(|Z| > |z|).$

You could accuse me of being pedantic here or of being obsessed with notation. But given the amount of confusion about p-values, I think it is important to get it right.
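To make the two definitions concrete, here is a minimal sketch in Python. It rests on assumptions that are mine and not part of the discussion above: ${Z}$ is taken to be Normal with mean ${\theta}$ and variance 1, the simple null is ${\theta = 0}$, and the composite null is represented by a finite list of means. The function names are hypothetical. Notice that nothing is conditioned on: each probability is computed under one fixed distribution, and the composite case takes the largest over the set.

```python
import math


def normal_cdf(x):
    """Standard Normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tail_prob(z, theta=0.0):
    """P_theta(|Z| > |z|) when Z ~ Normal(theta, 1) (an assumed model)."""
    c = abs(z)
    return (1.0 - normal_cdf(c - theta)) + normal_cdf(-c - theta)


def p_value_simple(z):
    """p = P_0(|Z| > |z|), with P_0 assumed to be Normal(0, 1)."""
    return tail_prob(z, 0.0)


def p_value_composite(z, null_thetas):
    """p = sup over the null set of P(|Z| > |z|).

    The null set is approximated by a finite collection of means,
    so the supremum becomes a maximum over that collection.
    """
    return max(tail_prob(z, theta) for theta in null_thetas)


if __name__ == "__main__":
    z = 1.96
    print(p_value_simple(z))                        # about 0.05
    print(p_value_composite(z, [-0.5, 0.0, 0.5]))   # at least as large
```

The composite p-value can never be smaller than the p-value for any single member of the null set, which is what the supremum is there to guarantee.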

More Misunderstandings

The same problem occurs when people write ${p(x|\theta)}$. When I teach Bayes, I do write the model as ${p(x|\theta)}$. When I teach frequentist statistics, I write this either as ${p(x;\theta)}$ or ${p_\theta(x)}$. There is no conditioning going on. To condition on ${\theta}$ would require a joint distribution for ${(x,\theta)}$. There is no such joint distribution in frequentist-land.
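To spell out what conditioning would require, suppose there were a density on ${\theta}$, which I will call ${\pi(\theta)}$ (this symbol is my own addition and appears nowhere else in the post). Then the conditional density would be defined through the joint density ${p(x,\theta)}$ as

$\displaystyle p(x|\theta) = \frac{p(x,\theta)}{\pi(\theta)}, \qquad \pi(\theta) = \int p(x,\theta)\, dx.$

Without a distribution on ${\theta}$ there is no joint density and no denominator, so ${p(x;\theta)}$ is simply a family of densities indexed by ${\theta}$.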

The coverage of a confidence interval ${C(X_1,\ldots, X_n)}$ is not the probability that ${C(X_1,\ldots, X_n)}$ traps ${\theta}$ conditional on ${\theta}$. The frequentist coverage is

$\displaystyle {\rm Coverage} = \inf_{\theta} P_\theta\Bigl(\theta\in C(X_1,\ldots, X_n)\Bigr).$

Again, there is no conditioning going on.
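The coverage formula can be approximated by simulation. The sketch below is only an illustration under assumptions of my own: the data are Normal with mean ${\theta}$ and known variance 1, the interval is the sample mean plus or minus ${1.96/\sqrt{n}}$, and the infimum over ${\theta}$ is replaced by a minimum over a small grid. All function names are hypothetical. For each fixed ${\theta}$ we draw data from ${P_\theta}$ and count how often the interval traps that same ${\theta}$.

```python
import math
import random
import statistics


def interval(xs, z_crit=1.96):
    """Assumed interval: sample mean +/- z_crit / sqrt(n), unit variance known."""
    half_width = z_crit / math.sqrt(len(xs))
    center = statistics.fmean(xs)
    return center - half_width, center + half_width


def coverage_at(theta, n, sims, rng):
    """Monte Carlo estimate of P_theta(theta in C(X_1, ..., X_n))."""
    hits = 0
    for _ in range(sims):
        xs = [rng.gauss(theta, 1.0) for _ in range(n)]
        lo, hi = interval(xs)
        if lo <= theta <= hi:
            hits += 1
    return hits / sims


def coverage(thetas, n=20, sims=2000, seed=0):
    """Minimum of the estimated coverage over a finite grid of theta values.

    The grid stands in for the infimum over all theta.
    """
    rng = random.Random(seed)
    return min(coverage_at(theta, n, sims, rng) for theta in thetas)


if __name__ == "__main__":
    print(coverage([-2.0, 0.0, 3.0]))  # should be close to 0.95
```

In this particular model the coverage is the same for every ${\theta}$, so the minimum is not doing much work. In other models coverage can vary with ${\theta}$, and the infimum is what makes the guarantee hold uniformly.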

Conclusion

I understand that people often say “conditional on ${\theta}$” to mean “treating ${\theta}$ as fixed.” But if we want to eradicate misunderstandings about statistics, I think it would help if we were more careful about how we choose our words.