A Bayesian probability worksheet
What's new 2022-10-08
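This is a spinoff from the previous post. In that post, we remarked that whenever one receives a new piece of information $E$, the prior odds ratio $\mathbf{P}(H_1)/\mathbf{P}(H_0)$ between an alternative hypothesis $H_1$ and a null hypothesis $H_0$ is updated to a posterior odds ratio $\mathbf{P}(H_1|E)/\mathbf{P}(H_0|E)$, which can be computed via Bayes’ theorem by the formula

$$\frac{\mathbf{P}(H_1|E)}{\mathbf{P}(H_0|E)} = \frac{\mathbf{P}(H_1)}{\mathbf{P}(H_0)} \times \frac{\mathbf{P}(E|H_1)}{\mathbf{P}(E|H_0)}.$$
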
A PDF version of the worksheet and instructions can be found here. One can fill in this worksheet in the following order:
- In Box 1, one enters in the precise statement of the null hypothesis $H_0$.
- In Box 2, one enters in the precise statement of the alternative hypothesis $H_1$. (This step is very important! As discussed in the previous post, Bayesian calculations can become extremely inaccurate if the alternative hypothesis is vague.)
- In Box 3, one enters in the prior probability $\mathbf{P}(H_0)$ (or the best estimate thereof) of the null hypothesis $H_0$.
- In Box 4, one enters in the prior probability $\mathbf{P}(H_1)$ (or the best estimate thereof) of the alternative hypothesis $H_1$. If only two hypotheses are being considered, we of course have $\mathbf{P}(H_1) = 1 - \mathbf{P}(H_0)$.
- In Box 5, one enters in the ratio $\mathbf{P}(H_1)/\mathbf{P}(H_0)$ between Box 4 and Box 3.
- In Box 6, one enters in the precise new information $E$ that one has acquired since the prior state. (As discussed in the previous post, it is important that all relevant information $E$, both supporting and invalidating the alternative hypothesis, is reported accurately. If one cannot be certain that key information has not been withheld from one, then Bayesian calculations become highly unreliable.)
- In Box 7, one enters in the likelihood $\mathbf{P}(E|H_0)$ (or the best estimate thereof) of the new information $E$ under the null hypothesis $H_0$.
- In Box 8, one enters in the likelihood $\mathbf{P}(E|H_1)$ (or the best estimate thereof) of the new information $E$ under the alternative hypothesis $H_1$. (This can be difficult to compute, particularly if $H_1$ is not specified precisely.)
- In Box 9, one enters in the ratio $\mathbf{P}(E|H_1)/\mathbf{P}(E|H_0)$ between Box 8 and Box 7.
- In Box 10, one enters in the product of Box 5 and Box 9.
- (Assuming there are no other hypotheses than $H_0$ and $H_1$) In Box 11, enter in $1$ divided by $1$ plus Box 10.
- (Assuming there are no other hypotheses than $H_0$ and $H_1$) In Box 12, enter in Box 10 divided by $1$ plus Box 10. (Alternatively, one can enter in $1$ minus Box 11.)
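The numerical boxes of the worksheet can be sketched as a short Python function. This is only an illustration under stated assumptions: the function name `fill_worksheet` and the returned dictionary are hypothetical, exactly two hypotheses are assumed (so that Box 4 is one minus Box 3), and the prior of the null hypothesis and the likelihood in Box 7 are assumed to be nonzero so that the ratios make sense.

```python
from fractions import Fraction


def fill_worksheet(prior_h0, likelihood_e_h0, likelihood_e_h1):
    """Compute the numerical boxes of the worksheet (hypothetical helper).

    Assumes H0 and H1 are the only two hypotheses, prior_h0 != 0,
    and likelihood_e_h0 != 0.
    """
    box3 = prior_h0                 # P(H0)
    box4 = 1 - prior_h0             # P(H1), valid only with two hypotheses
    box5 = box4 / box3              # prior odds P(H1)/P(H0)
    box7 = likelihood_e_h0          # P(E|H0)
    box8 = likelihood_e_h1          # P(E|H1)
    box9 = box8 / box7              # likelihood ratio P(E|H1)/P(E|H0)
    box10 = box5 * box9             # posterior odds P(H1|E)/P(H0|E)
    box11 = 1 / (1 + box10)         # posterior probability P(H0|E)
    box12 = box10 / (1 + box10)     # posterior probability P(H1|E)
    return {3: box3, 4: box4, 5: box5, 7: box7, 8: box8,
            9: box9, 10: box10, 11: box11, 12: box12}


# Made-up inputs, chosen only so that the arithmetic is easy to follow.
boxes = fill_worksheet(Fraction(3, 4), Fraction(1, 5), Fraction(3, 5))
for number, value in boxes.items():
    print(f"Box {number}: {value}")
```

Exact fractions are used here so that Box 11 and Box 12 visibly sum to one; ordinary floats would work just as well for estimates.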
To illustrate this procedure, let us consider a standard Bayesian update problem. Suppose that at a given point in time, of the population is infected with COVID-19. In response to this, a company mandates COVID-19 testing of its workforce, using a cheap COVID-19 test. This test has a chance of a false negative (testing negative when one has COVID) and a chance of a false positive (testing positive when one does not have COVID). An employee takes the mandatory test, which turns out to be positive. What is the probability that the employee actually has COVID?
We can fill out the entries in the worksheet one at a time:
- Box 1: The null hypothesis $H_0$ is that the employee does not have COVID.
- Box 2: The alternative hypothesis $H_1$ is that the employee does have COVID.
- Box 3: In the absence of any better information, the prior probability $\mathbf{P}(H_0)$ of the null hypothesis is , or .
- Box 4: Similarly, the prior probability $\mathbf{P}(H_1)$ of the alternative hypothesis is , or .
- Box 5: The prior odds $\mathbf{P}(H_1)/\mathbf{P}(H_0)$ are .
- Box 6: The new information $E$ is that the employee has tested positive for COVID.
- Box 7: The likelihood $\mathbf{P}(E|H_0)$ of $E$ under the null hypothesis is , or (the false positive rate).
- Box 8: The likelihood $\mathbf{P}(E|H_1)$ of $E$ under the alternative is , or (one minus the false negative rate).
- Box 9: The likelihood ratio $\mathbf{P}(E|H_1)/\mathbf{P}(E|H_0)$ is .
- Box 10: The product of Box 5 and Box 9 is approximately .
- Box 11: The posterior probability $\mathbf{P}(H_0|E)$ is approximately .
- Box 12: The posterior probability $\mathbf{P}(H_1|E)$ is approximately .
The filled worksheet looks like this:
Perhaps surprisingly, despite the positive COVID test, the employee only has a
chance of actually having COVID! This is due to the relatively large false positive rate of this cheap test, and is an illustration of the base rate fallacy in statistics.
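The base rate effect can be reproduced with a few lines of Python. The sketch below is a hedged illustration only: the function name is hypothetical, and the prevalence and error rates are made-up values chosen for easy arithmetic, not the figures used in the worked example above.

```python
from fractions import Fraction


def probability_infected_given_positive(prevalence,
                                        false_negative_rate,
                                        false_positive_rate):
    """Posterior probability of infection after one positive test.

    Hypothetical helper; assumes 0 < prevalence < 1 and a nonzero
    false positive rate.
    """
    prior_odds = prevalence / (1 - prevalence)                          # Box 5
    likelihood_ratio = (1 - false_negative_rate) / false_positive_rate  # Box 9
    posterior_odds = prior_odds * likelihood_ratio                      # Box 10
    return posterior_odds / (1 + posterior_odds)                        # Box 12


# ASSUMED, purely illustrative numbers (not taken from the text):
# 1% prevalence, 10% false negatives, 10% false positives.
posterior = probability_infected_given_positive(
    Fraction(1, 100), Fraction(1, 10), Fraction(1, 10))
print(posterior, float(posterior))
```

With these assumed inputs the posterior is exactly 1/12, so even a test that is right nine times out of ten leaves the infection unlikely when the prior is small.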
We remark that if we switch the roles of the null hypothesis and alternative hypothesis, then some of the odds in the worksheet change, but the ultimate conclusions remain unchanged:
So the question of which hypothesis to designate as the null hypothesis and which one to designate as the alternative hypothesis is largely a matter of convention.
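As a quick check of this symmetry, here is a small Python sketch that runs the worksheet twice with the labels exchanged. The helper name `posterior_pair` and the numbers are assumptions made for illustration (1% prevalence, 10% false negative rate, 10% false positive rate), not values from the text.

```python
from fractions import Fraction


def posterior_pair(prior_null, likelihood_null, likelihood_alt):
    """Return (P(null|E), P(alt|E)) for two exhaustive hypotheses.

    Hypothetical helper; assumes prior_null and likelihood_null are nonzero.
    """
    odds = (1 - prior_null) / prior_null * (likelihood_alt / likelihood_null)
    return 1 / (1 + odds), odds / (1 + odds)


# ASSUMED illustrative numbers, not taken from the text.
p_healthy, p_infected = Fraction(99, 100), Fraction(1, 100)
p_positive_if_healthy = Fraction(1, 10)    # false positive rate
p_positive_if_infected = Fraction(9, 10)   # one minus false negative rate

# Null = "healthy", alternative = "infected".
healthy_a, infected_a = posterior_pair(
    p_healthy, p_positive_if_healthy, p_positive_if_infected)

# Roles exchanged: null = "infected", alternative = "healthy".
infected_b, healthy_b = posterior_pair(
    p_infected, p_positive_if_infected, p_positive_if_healthy)

print(infected_a, infected_b)  # same posterior under either labelling
```

The odds in Boxes 5, 9 and 10 are inverted when the roles are exchanged, but the posterior probability attached to each concrete statement is the same.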
Now let us take a superficially similar situation in which a mother observes her daughter exhibiting COVID-like symptoms, to the point where she estimates the probability of her daughter having COVID at . She then administers the same cheap COVID-19 test as before, which returns positive. What is the posterior probability of her daughter having COVID?
One can fill out the worksheet much as before, but now with the prior probability of the alternative hypothesis raised from to
(and the prior probability of the null hypothesis dropping from
to
). One now gets that the probability that the daughter has COVID has increased all the way to
:
Thus we see that prior probabilities can make a significant impact on the posterior probabilities.
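The dependence on the prior can be made concrete by holding the test fixed and sweeping the prior. The following Python sketch uses a hypothetical helper and an assumed likelihood ratio of 9 (for instance a 10% false negative rate and a 10% false positive rate); neither the helper nor the numbers come from the text.

```python
from fractions import Fraction


def posterior_probability(prior, likelihood_ratio):
    """Posterior P(H1|E) from the prior P(H1) and the ratio P(E|H1)/P(E|H0).

    Hypothetical helper; assumes prior < 1 and exactly two hypotheses.
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


ASSUMED_LIKELIHOOD_RATIO = Fraction(9)  # illustrative value only

for prior in (Fraction(1, 100), Fraction(1, 10), Fraction(1, 2)):
    posterior = posterior_probability(prior, ASSUMED_LIKELIHOOD_RATIO)
    print(f"prior {float(prior):.2f} -> posterior {float(posterior):.3f}")
```

The same positive test moves a small prior only modestly, while a prior of one half is pushed most of the way to certainty.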
Now we use the worksheet to analyze an infamous probability puzzle, the Monty Hall problem. Let us use the formulation given in that Wikipedia page:
Problem 1 Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?
For this problem, the precise formulations of the null hypothesis and the alternative hypothesis become rather important. Suppose we take the following two hypotheses:
- Null hypothesis $H_0$: The car is behind door number 1, and no matter what door you pick, the host will randomly reveal another door that contains a goat.
- Alternative hypothesis $H_1$: The car is behind door number 2 or 3, and no matter what door you pick, the host will randomly reveal another door that contains a goat.
Assuming the car is equally likely to be behind each door, we have $\mathbf{P}(H_0) = 1/3$ and $\mathbf{P}(H_1) = 2/3$. The new information $E$ is that, after you pick door 1, the host opens door 3 and reveals a goat. Under $H_0$ the host chooses at random between doors 2 and 3, so $\mathbf{P}(E|H_0) = 1/2$. Under $H_1$ the car is equally likely to be behind door 2 (in which case the host must open door 3) or behind door 3 (in which case the host cannot open door 3), so $\mathbf{P}(E|H_1) = 1/2$ as well. The likelihood ratio is therefore $1$, the new information does not alter the odds, and the probability $\mathbf{P}(H_1|E)$ that the car is not behind door 1 remains $2/3$, so it is advantageous to switch.
However, consider the following different set of hypotheses:
- Null hypothesis $H_0$: The car is behind door number 1, and if you pick the door with the car, the host will reveal another door to entice you to switch. Otherwise, the host will not reveal a door.
- Alternative hypothesis $H_1$: The car is behind door number 2 or 3, and if you pick the door with the car, the host will reveal another door to entice you to switch. Otherwise, the host will not reveal a door.
Here we still have $\mathbf{P}(H_0) = 1/3$ and $\mathbf{P}(H_1) = 2/3$, but while $\mathbf{P}(E|H_0)$ remains equal to $1/2$, $\mathbf{P}(E|H_1)$ has dropped to zero (since if the car is not behind door 1, the host will not reveal a door). So now $\mathbf{P}(H_0|E)$ has increased all the way to $1$, and it is not advantageous to switch! This dramatically illustrates the importance of specifying the hypotheses precisely. The worksheet is now filled out as follows:
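Both formulations can be run through the same arithmetic in Python. This is a sketch under the assumptions stated in the comments: you pick door 1, the information $E$ is that the host opens door 3 to show a goat, the car is a priori equally likely to be behind each door, and a host with a free choice picks uniformly at random. The helper name is hypothetical.

```python
from fractions import Fraction


def posterior_not_door_1(prior_h0, likelihood_e_h0, likelihood_e_h1):
    """P(H1|E): probability the car is NOT behind door 1 (hypothetical helper)."""
    prior_odds = (1 - prior_h0) / prior_h0
    posterior_odds = prior_odds * likelihood_e_h1 / likelihood_e_h0
    return posterior_odds / (1 + posterior_odds)


half = Fraction(1, 2)
prior_h0 = Fraction(1, 3)   # car behind door 1

# Host always reveals a goat behind another door.
# Under H0 the host picks door 2 or 3 at random: P(E|H0) = 1/2.
# Under H1 the car is behind door 2 (host must open 3) or door 3
# (host cannot open 3), each with conditional probability 1/2.
always_reveals = posterior_not_door_1(prior_h0, half, half * 1 + half * 0)

# Host reveals a door only if you picked the car.
# P(E|H0) is still 1/2, but under H1 no door is ever revealed: P(E|H1) = 0.
reveals_only_if_correct = posterior_not_door_1(prior_h0, half, Fraction(0))

print(always_reveals)           # probability that switching wins
print(reveals_only_if_correct)  # probability that switching wins
```

The two runs differ only in the value placed in Box 8, which is exactly where the host's behaviour enters the calculation.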
Finally, we consider another famous probability puzzle, the Sleeping Beauty problem. Again we quote the problem as formulated on the Wikipedia page:
Problem 2 Sleeping Beauty volunteers to undergo the following experiment and is told all of the following details: On Sunday she will be put to sleep. Once or twice, during the experiment, Sleeping Beauty will be awakened, interviewed, and put back to sleep with an amnesia-inducing drug that makes her forget that awakening. A fair coin will be tossed to determine which experimental procedure to undertake:
- If the coin comes up heads, Sleeping Beauty will be awakened and interviewed on Monday only.
- If the coin comes up tails, she will be awakened and interviewed on Monday and Tuesday.
- In either case, she will be awakened on Wednesday without interview and the experiment ends.
Any time Sleeping Beauty is awakened and interviewed she will not be able to tell which day it is or whether she has been awakened before. During the interview Sleeping Beauty is asked: “What is your credence now for the proposition that the coin landed heads?”
Here the situation can be confusing because there are key portions of this experiment in which the observer is unconscious, but nevertheless Bayesian probability continues to operate regardless of whether the observer is conscious. To make this issue more precise, let us assume that the awakenings mentioned in the problem always occur at 8am, so in particular at 7am, Sleeping Beauty will always be unconscious.
Here, the null and alternative hypotheses are easy to state precisely:
- Null hypothesis $H_0$: The coin landed tails.
- Alternative hypothesis $H_1$: The coin landed heads.
The subtle thing here is to work out what the correct prior state is (in most other applications of Bayesian probability, this state is obvious from the problem). It turns out that the most reasonable choice of prior state is “unconscious at 7am, on either Monday or Tuesday, with an equal chance of each”. (Note that whatever the outcome of the coin flip is, Sleeping Beauty will be unconscious at 7am Monday and unconscious again at 7am Tuesday, so it makes sense to give each of these two states an equal probability.) The new information is then
- New information $E$: One hour after the prior state, Sleeping Beauty is awakened.
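With this formulation, we see that $\mathbf{P}(H_0) = \mathbf{P}(H_1) = 1/2$, $\mathbf{P}(E|H_0) = 1$, and $\mathbf{P}(E|H_1) = 1/2$, so on working through the worksheet one eventually arrives at $\mathbf{P}(H_1|E) = 1/3$, so that Sleeping Beauty should only assign a probability of $1/3$ to the event that the coin landed as heads.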
There are arguments advanced in the literature to adopt the position that $\mathbf{P}(H_1|E)$ should instead be equal to $1/2$, but I do not see a way to interpret them in this Bayesian framework without either substantially altering the notion of the prior state or not presenting the new information $E$ properly.
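The calculation for this choice of prior state can be spelled out in a short Python sketch. The names are hypothetical, and the modelling assumptions are the ones described above: the prior state is 7am on Monday or Tuesday with equal chance, the coin is fair, and the likelihood of being awakened an hour later is the fraction of those two days on which an awakening is scheduled.

```python
from fractions import Fraction

DAYS = ("Monday", "Tuesday")  # equally likely days in the prior state

# Days on which Sleeping Beauty is awakened and interviewed, per coin outcome.
AWAKENING_DAYS = {
    "heads": {"Monday"},
    "tails": {"Monday", "Tuesday"},
}


def likelihood_of_awakening(coin):
    """P(E | coin): chance that an awakening follows the 7am prior state."""
    return Fraction(len(AWAKENING_DAYS[coin]), len(DAYS))


prior_tails = Fraction(1, 2)   # P(H0)
prior_heads = Fraction(1, 2)   # P(H1)

prior_odds = prior_heads / prior_tails
likelihood_ratio = (likelihood_of_awakening("heads")
                    / likelihood_of_awakening("tails"))
posterior_odds = prior_odds * likelihood_ratio
posterior_heads = posterior_odds / (1 + posterior_odds)

print(posterior_heads)
```

Changing the answer requires changing either the `DAYS` prior state or the `AWAKENING_DAYS` likelihoods, which is where any competing analysis would have to differ.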
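If one has multiple pieces of information $E_1, E_2$ that one wishes to use to update one’s priors, one can do so by filling out one copy of the worksheet for each new piece of information, or by using a multi-row version of the worksheet using such identities as

$$\frac{\mathbf{P}(H_1|E_1,E_2)}{\mathbf{P}(H_0|E_1,E_2)} = \frac{\mathbf{P}(H_1)}{\mathbf{P}(H_0)} \times \frac{\mathbf{P}(E_1|H_1)}{\mathbf{P}(E_1|H_0)} \times \frac{\mathbf{P}(E_2|H_1,E_1)}{\mathbf{P}(E_2|H_0,E_1)}.$$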
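A multi-row update amounts to multiplying the prior odds by one likelihood ratio per piece of information, as in the following Python sketch. The function names are hypothetical and the numbers are made up (prior odds of 1/99 and two positive tests, each with likelihood ratio 9). Note the assumption hidden in reusing the same ratio twice: each ratio must be conditioned on the information already received, so this is only valid if the two test results are conditionally independent given each hypothesis.

```python
from fractions import Fraction


def update_odds(prior_odds, likelihood_ratios):
    """Multiply prior odds by each likelihood ratio in turn (hypothetical helper).

    Each ratio is assumed to be P(E_k | H1, earlier E's) / P(E_k | H0, earlier E's).
    """
    odds = prior_odds
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds


def odds_to_probability(odds):
    """Convert odds P(H1)/P(H0) into P(H1), assuming only two hypotheses."""
    return odds / (1 + odds)


# ASSUMED illustrative values, not taken from the text.
prior_odds = Fraction(1, 99)
ratios = [Fraction(9), Fraction(9)]

posterior_odds = update_odds(prior_odds, ratios)
print(posterior_odds, odds_to_probability(posterior_odds))
```

Filling out one worksheet per piece of information gives the same result, since the posterior odds of one sheet become the prior odds of the next.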