Double Misunderstandings About p-values

Normal Deviate 2013-03-18

It’s been said a million times and in a million places that a p-value is not the probability of {H_0} given the data.

But there is a different type of confusion about p-values. This issue arose in a discussion on Andrew’s blog.

Andrew criticizes the New York times for giving a poor description of the meaning of p-values. Of course, I agree with him that being precise about these things is important. But, in reading the comments on Andrew’s blog, it occurred to me that there is often a double misunderstanding.

First, let me way that I am neither defending nor criticizing p-values in this post. I am just going to point out that there are really two misunderstandings floating around.

Two Misunderstandings

(1) The p-value is not the probability of {H_0} given the data.

(2) But neither is the p-value the probability of something conditional on {H_0}.

Deborah Mayo pointed this fact out in the discussion on Andrew’s blog (as did a few other people).

When we use p-values we are in frequentist-land. {H_0} (the null hypothesis) is not a random variable. It makes no sense to talk about the posterior probability of {H_0}. But it also makes no sense to talk about conditioning on {H_0}. You can only condition on things that were random in the first place.

Let me get more specific. Let {Z} be a test statistic and let {z} be the realized valued of {Z}. The p-value (in a two-sided test) is

\displaystyle  p = P_0(|Z| > |z|)

where {P_0} is the null distribution. It is not equal to {P\bigl(|Z|> |z| \, \bigm| \,H_0\bigr)}. This makes no sense. {H_0} is not a random variable. In case the null consists of a set {{\cal P}_0} of distributions, the p-value is

\displaystyle  p = \sup_{P\in {\cal P}_0}P(|Z| > |z|).

You could accuse me of being pedantic here or of being obsessed with notation. But given the amount of confusion about p-values, I think it is important to get it right.

More Misunderstandings

The same problem occurs when people write {p(x|\theta)}. When I teach Bayes, I do write the model as {p(x|\theta)}. When I teach frequentist statistics, I write this either as {p(x;\theta)} or {p_\theta(x)}. There is no conditioning going on. To condition on {\theta} would require a joint distribution for {(x,\theta)}. There is no such joint distribution in frequentist-land.

The coverage of a confidence interval {C(X_1,\ldots, X_n)} is not the probability that {C(X_1,\ldots, X_n)} traps {\theta} conditional on {\theta}. The frequentist coverage is

\displaystyle  {\rm Coverage} = \inf_{\theta} P_\theta\Bigl(\theta\in C(X_1,\ldots, X_n)\Bigr).

Again, there is no conditioning going on.

Conclusion

I understand that people often say “conditional on {\theta}” to mean “treating {\theta} as fixed.” But if we want to eradicate misunderstandings about statistics, I think it would help if we were more careful about how we choose our words.