On the Uncertainty of the Bayesian Estimator
Three-Toed Sloth 2016-02-23
Summary:
Attention conservation notice: A failed attempt at a dialog, combining the philosophical sophistication and easy approachability of statistical theory with the mathematical precision and practical application of epistemology, dragged out for 2500+ words (and equations). You have better things to do than read me vent about manuscripts I volunteered to referee.
Scene: We ascend, by a dubious staircase, to the garret loft space of Confectioner-Stevedore Hall, at Robberbaron-Bloodmoney University, where we find two fragments of the author's consciousness, temporarily incarnated as perpetual post-docs from the Department of Statistical Data Science, sharing an unheated office.
Q: Are you unhappy with the manuscript you're reviewing?
A: Yes, but I don't see why you care.
Q: The stabbing motions of your pen are both ostentatious and distracting. If I listen to you rant about it, will you go back to working without the semaphore?
A: I think that just means you find it easier to ignore my words than anything else, but I'm willing to try.
Q: So, what is getting you worked up about the manuscript?
A: They take a perfectly reasonable (though not obviously appropriate-to-the-problem) regularized estimator, and then go through immense effort to Bayesify it. They end up with about seven levels of hierarchical priors. Simple Metropolis-Hastings Monte Carlo would move as slowly as a continental plate, so they put vast efforts into speeding it up, and in a real technical triumph they get something which moves like a glacier.
Q: Isn't that rather fast these days?
A: If they try to scale up, my back-of-the-envelope calculation suggests they really will enter the regime where each data set will take an entire Ph.D. thesis to analyze.
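(For the record, the kind of sampler at issue: a minimal random-walk Metropolis-Hastings sketch in Python. The log-posterior, step size, and iteration count below are stand-ins of my own invention for illustration, not the manuscript's actual seven-level hierarchy.)

```python
import numpy as np

rng = np.random.default_rng(0)

def log_posterior(theta):
    # Stand-in target: log-density of a standard Gaussian.
    # The manuscript's seven levels of hierarchical priors would go here.
    return -0.5 * np.sum(theta ** 2)

def metropolis_hastings(theta0, n_steps=10_000, step_size=0.5):
    theta = np.asarray(theta0, dtype=float)
    log_p = log_posterior(theta)
    samples = np.empty((n_steps, theta.size))
    for t in range(n_steps):
        # Symmetric Gaussian random-walk proposal.
        proposal = theta + step_size * rng.standard_normal(theta.size)
        log_p_prop = log_posterior(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        if np.log(rng.uniform()) < log_p_prop - log_p:
            theta, log_p = proposal, log_p_prop
        samples[t] = theta
    return samples

draws = metropolis_hastings(np.zeros(3))
```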
Q: So do you think that they're just masochists who're into frequentist pursuit, or do they have some reason for doing all these things that annoy you?
A: Their fondness for tables over figures does give me pause, but no, they claim to have a point. If they do all this work, they say, they can use their posterior distributions to quantify uncertainty in their estimates.
Q: That sounds like something statisticians should want to do. Haven't you been very pious about just that, about how handling uncertainty is what really sets statistics apart from other traditions of data analysis? Haven't I heard you say to students that they don't know anything until they know where the error bars go?
A: I suppose I have, though I don't recall that exact phrase. It's not the goal I object to, it's the way quantification of uncertainty is supposed to follow automatically from using Bayesian updating.
Q: You have to admit, the whole "posterior probability distribution over parameter values" thing certainly looks like a way of expressing uncertainty in quantitative form. In fact, last time we went around about this, didn't you admit that Bayesian agents are uncertain about parameters, though not about the probabilities of observable events?
A: I did, and they are, though that's very different from agreeing that they quantify uncertainty in any useful way, i.e., that they handle uncertainty well.
Q: Fine, I'll play the straight man and offer a concrete proposal for you to poke holes in. Shall we keep it simple and just consider parametric inference?
A: By all means.
Q: Alright, then, I start with some prior probability distribution over a finite-dimensional vector-valued parameter \( \theta \), say with density \( \pi(\theta) \). I observe \( x \) and have a model which gives me the likelihood \( L(\theta) = p(x;\theta) \), and then my posterior distribution is fixed by \[ \pi(\theta|X=x) \propto L(\theta) \pi(\theta) \] This is my measure-valued estimate. If I want a set-valued estimate of \( \theta \), I can fix a level \( \alpha \) and choose a region \( C_{\alpha} \) with \[ \int_{C_{\alpha}}{\pi(\theta|X=x) \, d\theta} = \alpha \] Perhaps I even preferentially grow \( C_{\alpha} \) around the posterior mode, or something like that, so it looks pretty. How is \( C_{\alpha} \) not a reasonable way of quantifying my uncertainty about \( \theta \)?
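(Q's recipe, sketched in code for a one-dimensional \( \theta \) on a grid; the prior and likelihood here are arbitrary placeholders, and \( C_{\alpha} \) is grown from the posterior mode outward by taking grid points in decreasing order of posterior density until the mass reaches \( \alpha \).)

```python
import numpy as np
from scipy import stats

alpha = 0.95
grid = np.linspace(-5, 5, 2001)           # grid over theta
d_theta = grid[1] - grid[0]

# Placeholder prior and likelihood; any model fits the same recipe.
prior = stats.norm(0, 2).pdf(grid)         # pi(theta)
x = 1.3                                    # the observation
likelihood = stats.norm(grid, 1).pdf(x)    # L(theta) = p(x; theta)

posterior = prior * likelihood
posterior /= posterior.sum() * d_theta     # normalize to integrate to 1

# Grow C_alpha around the posterior mode: take grid points in
# decreasing order of posterior density until total mass reaches alpha.
order = np.argsort(posterior)[::-1]
mass = np.cumsum(posterior[order]) * d_theta
in_region = order[: np.searchsorted(mass, alpha) + 1]
C_alpha = (grid[in_region].min(), grid[in_region].max())
print(C_alpha)
```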
A: To begin with, I don't know the probability that the true \( \theta \in C_{\alpha} \).
Q: How is it not \( \alpha \), like it says right there on the label?
A: Again, I don't understand what that means.
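(To make A's complaint concrete: a small simulation, with a deliberately misspecified prior of my own devising, checking how often the nominal 95% credible interval actually covers the true \( \theta \). The conjugate normal-normal model and all the numbers are arbitrary choices for illustration.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
theta_true = 3.0                    # nature's parameter, unknown to the agent
prior_mean, prior_var = 0.0, 1.0    # a deliberately poor prior
noise_var, n, alpha = 1.0, 5, 0.95

covered, n_reps = 0, 10_000
for _ in range(n_reps):
    x = rng.normal(theta_true, np.sqrt(noise_var), size=n)
    # Conjugate normal-normal posterior (known noise variance).
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + x.sum() / noise_var)
    lo, hi = stats.norm(post_mean, np.sqrt(post_var)).interval(alpha)
    covered += (lo <= theta_true <= hi)

print(covered / n_reps)   # typically well below 0.95 with this prior
```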
Q: Are you attacking subjective probability? Is that where this is going? OK: sometimes, when a Bayesian agent and a bookmaker love each other very much, the bookie will offer the Bayesian bets on whether \( \theta \in C_{\alpha} \).