Probabilistic numerics and the folk theorem of statistical computing

Statistical Modeling, Causal Inference, and Social Science 2024-11-04

U.S. election day is tomorrow. So let’s talk about something else:

1. Encoding prior information using non-generative modeling

I was talking with Hong Ge about the uses of non-generative models in probabilistic programming. One example I gave was the use of prior information on derived quantities. For instance, suppose you have a logistic regression with a weak prior on the coefficients:

data {
  int<lower=0> N;
  array[N] int<lower=0, upper=1> y;
  vector[N] x;
}
parameters {
  real a;
  real b;
}
model {
  a ~ normal(0, 5);
  b ~ normal(0, 5);
  y ~ bernoulli_logit(a + b*x);
}

and you feed it some data x_i, y_i, i=1,…,N. Then you want to add further information about the predicted values. Perhaps, for example, you think the predicted probabilities for observations 1, 2, and 3 are likely to be close to 50%. In that case you could add something like this to the model block:

  vector[N] prob;
  prob = inv_logit(a + b*x);
  prob[1:3] ~ normal(0.5, 0.2);

I’m purposely making this code kinda clunky and purposely using a model that’s not completely well specified (prob is constrained to lie between 0 and 1, but the normal prior has no such constraint), just to show how direct this process can be. Also, I haven’t actually programmed the model and checked it, so my code could have a bug! Anyway, the point is that we can throw in this prior information wherever we want, not just on the official “parameters” of the model.

As noted above, the resulting Stan program does not correspond to a generative model! The code produces a target function that is a constant plus a log posterior density, log p(a,b|y), implicitly conditional on the unmodeled data x and N, without separately defining a prior distribution p(a,b) or a data distribution p(y|a,b). There’s no direct way to sample from the prior. The way I like to think of it is that the prior information on those predicted probabilities represents additional data.
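One way to see this: a sampling statement on a derived quantity is just shorthand for adding another term to the target log density. In Stan, that extra line is equivalent, up to a constant, to incrementing the target directly:

  // Same as "prob[1:3] ~ normal(0.5, 0.2);" up to an additive constant:
  // it just adds another term to the log target density.
  target += normal_lpdf(prob[1:3] | 0.5, 0.2);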

2. Replacing hard constraints by soft constraints

This sort of prior on a derived quantity can be useful in many statistical settings. For example, when poststratifying a survey or causal estimate, it’s often the case that we have only partial information on the full poststratification table but exact information (for example, from a census) on certain margins. So we might have the marginal totals for age × ethnicity, county, and ethnicity × sex, but not the full joint table. In that case, fitting can be difficult: it can be awkward to parameterize the model so that the known margins are satisfied exactly, and computation can be slow, as all these hard constraints can lead to difficult geometry.

We can often fix this problem by replacing the hard constraints by soft constraints. As the saying goes, uncertainty greases the wheels of commerce. This is not (yet) automatic: there’s a tradeoff in that if you make the constraints too soft, you’re throwing away information, but if you make them too tight, the computational problems can return. I think further research is needed on developing automatic or semi-automatic methods for implementing soft constraints in such problems.
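To give the flavor, here’s a minimal sketch of a soft constraint on a known margin in Stan. The names (theta, margin_weights, margin_total, margin_sigma) are made up for illustration, and the priors and survey likelihood are omitted:

data {
  int<lower=1> J;                       // number of cells in the poststratification table
  vector[J] margin_weights;             // 0/1 indicators for the cells in this margin
  real<lower=0, upper=1> margin_total;  // "known" margin, e.g. from the census
  real<lower=0> margin_sigma;           // how soft the constraint is
}
parameters {
  simplex[J] theta;                     // cell proportions
}
model {
  // ... priors and survey likelihood for theta go here ...

  // Soft constraint: rather than forcing the implied margin to equal the
  // census number exactly, treat the census number as a noisy measurement.
  margin_total ~ normal(dot_product(margin_weights, theta), margin_sigma);
}

Shrinking margin_sigma toward zero recovers the hard constraint; making it too large throws away the information in the margin, which is the tradeoff mentioned above.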

3. The folk theorem

Regular readers will know the Folk Theorem of Statistical Computing: when you have computational problems, often there’s a problem with your model (for more on the topic, see this post by Bob). The above is an example of that folk theorem, in that, in the real world, purportedly hard constraints actually are soft! For example, census numbers are typically only estimates, not exact values. Indeed, the first place this idea came up in my own work was with Frédéric Bois and Jiming Jiang when fitting our differential equation model in toxicology: our algorithm had poor mixing, and we realized that the problem was that certain biological quantities we’d treated as known were in fact only measured, with error; we added measurement error terms and the model fit better.
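In Stan terms, the fix is the same move as above: take a quantity that had been hard-coded as data and treat it as a noisy measurement of an underlying parameter. A minimal sketch with made-up names (this is not our actual toxicology model):

data {
  real c_meas;         // measured value of a biological constant
  real<lower=0> c_se;  // its measurement standard error
}
parameters {
  real c_true;         // the underlying quantity, now a parameter
}
model {
  // Treat the "known" constant as a noisy measurement of c_true, and use
  // c_true in the rest of the model wherever c_meas had been plugged in.
  c_meas ~ normal(c_true, c_se);
  // ... rest of the model ...
}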

4. Probabilistic numerics

Hong pointed out a connection between the above ideas and probabilistic numerics, a field of numerical analysis that I’d never heard of. Here’s a reference that Hong recommends.