More on Bayesian model selection in high-dimensional settings

Statistical Modeling, Causal Inference, and Social Science 2013-04-22

David Rossell writes:

A friend pointed out that you were having an interesting philosophical discussion on my paper with Val Johnson [on Bayesian model selection in high-dimensional settings].

I agree with the view that in almost all practical situations the true model is not in the set under consideration. Still, asking a model choice procedure to pick out the correct model when it is in the set under consideration seems a minimal requirement (though perhaps not a sufficient one). In other words, if a procedure is unable to pick the data-generating model even when it is one of the models under consideration, I don't have high hopes for it working well in more realistic scenarios either.

Most results in the history of statistics seem to have been obtained under an assumed model; e.g., why even do MLE or penalized likelihood if we don't trust the model? While unrealistic, these results were useful in helping us understand important basic principles. In our case, Val and I are defending the principle of model separation, i.e., specifying priors that guarantee that the models under consideration do not overlap probabilistically with each other. We believe that these priors are more intuitively appealing for testing than the usual normal or Cauchy priors. For instance, under the null mu = 0. Under the alternative, the usual priors place the mode at mu = 0 (or nearby), but mu = 0 is not even a possible value under the alternative. If you ask a clinician who wants to run a clinical trial about his prior beliefs about mu, those beliefs are certainly not peaked around 0; otherwise he wouldn't consider doing the trial.
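To make the separation point concrete, here is a minimal sketch (my own illustration, not code from the Johnson-Rossell paper; the scale tau = 1 is an assumption) contrasting a standard Cauchy prior, whose mode sits at the null value mu = 0, with a first-order moment prior, a non-local prior that is exactly zero at mu = 0:

```python
# Sketch: a "local" prior (Cauchy) piles mass at the null value mu = 0,
# while a "non-local" first-order moment prior vanishes at mu = 0, so the
# null and alternative models do not overlap probabilistically.
import numpy as np
from scipy import stats

tau = 1.0                                  # prior scale (assumed for illustration)
mu = np.linspace(-4, 4, 9)

local = stats.cauchy(0, tau).pdf(mu)       # usual Cauchy prior: mode at mu = 0
# First-order moment prior: p(mu) = (mu^2 / tau) * N(mu; 0, tau),
# which integrates to 1 and is zero exactly at the null value mu = 0.
moment = (mu**2 / tau) * stats.norm(0, np.sqrt(tau)).pdf(mu)

for m, l, nl in zip(mu, local, moment):
    print(f"mu={m:+.1f}  local={l:.3f}  non-local={nl:.3f}")
```

The printout shows the Cauchy density peaking at mu = 0 while the moment prior vanishes there: under the alternative, no prior weight goes to the value the null already covers.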

A more pragmatic note on whether posterior model probabilities can be useful at all, inspired by your nice discussion. To me the posterior probability of a model is a proxy for the probability that it is a better approximation to the underlying truth than any other model in the set under consideration. While this is a purely informal statement, the expected log Bayes factor always favors the model closest (in Kullback-Leibler divergence) to the true model, even when the true model is not in the set under consideration: per observation, it grows by the difference between the two models' KL divergences from the truth. It would be interesting to pursue this more formally…
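That informal claim is easy to check in a small simulation. Below is a sketch under an assumed toy setup (my own, not from the paper): the data come from N(0.3, 1), which is not in the candidate set {M1: N(0, 1), M2: N(1, 1)}; with point models the Bayes factor reduces to a likelihood ratio, and E[log BF_12] = n * [KL(p*||M2) - KL(p*||M1)] > 0, so the KL-closest candidate M1 is favored on average:

```python
# Sketch: even when the true model is outside the candidate set, the
# expected log Bayes factor favors the candidate closest in KL divergence.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_mu, n, reps = 0.3, 50, 2000           # truth N(0.3, 1); neither candidate is true

log_bf = np.empty(reps)
for r in range(reps):
    y = rng.normal(true_mu, 1, size=n)     # data from the "true" model
    log_bf[r] = (stats.norm(0, 1).logpdf(y).sum()    # log likelihood under M1
                 - stats.norm(1, 1).logpdf(y).sum()) # ... under M2

# KL(N(a,1) || N(b,1)) = (a - b)^2 / 2, so the theoretical gap per observation is
kl_gap = ((true_mu - 1)**2 - (true_mu - 0)**2) / 2
print("mean log BF_12:", log_bf.mean())    # positive => M1 favored
print("n * KL gap    :", n * kl_gap)       # matches the simulation average
```

The simulated average log BF_12 sits near n times the KL gap (here 50 * 0.2 = 10), favoring M1, the better approximation, even though neither candidate generated the data.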