Prior distributions on derived quantities rather than on parameters themselves
Statistical Modeling, Causal Inference, and Social Science 2013-07-19
Following up on our discussion of the other day, Nick Firoozye writes:
One thing I meant by my initial query (but really didn't manage to get across) was this: I have no idea what my prior would be on many, many models, but just as Utility Theory expects ALL consumers to attach a utility to any and all consumption goods (even those I haven't seen or heard of), Bayesian statistics (almost) expects the same for priors. (Of course it's not a religious edict in the way it is in Utility Theory, since there is no theory of a "modeler" in the Bayesian paradigm; nonetheless there is still an expectation that we should have priors over all sorts of parameters which mean almost nothing to us.)
For most models with sufficient complexity, I also have no idea what my informative priors are actually doing, and the only way to know anything is through something I can see and experience: data, not parameters or state variables.
My question was more along these lines: let's use the prior to come up with something that can be inspected and manipulated, and then use that to restrict or identify the prior.
For instance, we could use the Prior Predictive Density, or the Prior Conditional Forecast.
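To make the first of these concrete, here is a minimal sketch of a prior predictive check in Python, using a made-up AR(1)-style model with hypothetical priors on its persistence and innovation scale. The point is only that the priors get judged by the behaviour of the simulated series, not by the parameter values themselves.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, T = 200, 100
sim_ranges = []

for _ in range(n_sims):
    # Hypothetical priors on an AR(1) coefficient and innovation scale
    phi = rng.normal(0.5, 0.3)          # persistence
    sigma = abs(rng.normal(0.0, 1.0))   # innovation sd (half-normal)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = phi * y[t - 1] + rng.normal(0.0, sigma)
    sim_ranges.append(y.max() - y.min())

# If the simulated series routinely swing far outside anything we would
# consider plausible, the priors (or the model) need to be changed.
print(np.percentile(sim_ranges, [5, 50, 95]))
```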
Just as an aside on conditional forecasts: we used to do this back when I was at Deutsche Bank using VARs or cointegrations (just restricted VARs), simple enough because of the Gaussian error terms. Put yield curve and economic variables into a decent enough VAR or cointegration framework, then condition on a long-term scenario of inflation going up by 1%, or GDP going down by 1%, etc. We can use a simple conditioning rule with huge matrices to find the conditional densities of the other variables, e.g., the yield curve variables. Then we would ask: are they reasonable? Do we think that, conditional on CPI and GDP in two years' time, yield curves could be shaped like the conditional mean +/- one standard deviation? In the case of looking at prior conditional forecasts, if they do not seem (subjectively) reasonable, the priors need to be changed. And if you can't get anything that seems reasonable or plausible, probably the model needs to be changed!
Not knowing the literature on conditional forecasts, I used to call these correlated Brownian bridges, but it really is just a high-dimensional normal random variable where you fix some data elements (the past), condition on some others (the future) or ascribe a density to them, and then find the conditional distributions of all the rest. Easy enough, and it works wonders in these macro-financial models, where the long-term unconditional economic forecasts are very bad but the long-term yield curve forecasts conditioned on economic data are generally quite reasonable. This is also a very reasonable way to generate scenarios, preferably conditioning on as little information as possible.
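The conditioning step he describes is just the standard formula for a partitioned multivariate normal: given a joint Gaussian over scenario variables and everything else, the rest has conditional mean mu_f + S_fc S_cc^{-1}(x_c - mu_c) and covariance S_ff - S_fc S_cc^{-1} S_cf. A toy sketch with made-up numbers (the variable names and covariance values are illustrative, not from the DB models):

```python
import numpy as np

def conditional_gaussian(mu, Sigma, cond_idx, cond_values):
    """Condition a joint Gaussian N(mu, Sigma) on x[cond_idx] = cond_values.

    Returns the mean and covariance of the remaining ("free") coordinates.
    """
    idx = np.arange(len(mu))
    free_idx = np.setdiff1d(idx, cond_idx)
    mu_f, mu_c = mu[free_idx], mu[cond_idx]
    S_ff = Sigma[np.ix_(free_idx, free_idx)]
    S_fc = Sigma[np.ix_(free_idx, cond_idx)]
    S_cc = Sigma[np.ix_(cond_idx, cond_idx)]
    K = S_fc @ np.linalg.inv(S_cc)
    cond_mean = mu_f + K @ (cond_values - mu_c)
    cond_cov = S_ff - K @ S_fc.T
    return cond_mean, cond_cov

# Toy joint prior over (CPI_2y, GDP_2y, 2y yield, 10y yield), conditioning on
# a +1% inflation scenario (CPI moves from its mean of 2% to 3%).
mu = np.array([2.0, 2.5, 1.0, 2.5])
Sigma = np.array([[1.0, 0.3, 0.4, 0.5],
                  [0.3, 1.0, 0.2, 0.3],
                  [0.4, 0.2, 0.8, 0.6],
                  [0.5, 0.3, 0.6, 1.2]])
m, C = conditional_gaussian(mu, Sigma,
                            cond_idx=np.array([0]),
                            cond_values=np.array([3.0]))
print(m)                     # conditional means of GDP and yields given the scenario
print(np.sqrt(np.diag(C)))   # conditional standard deviations
```

One can then eyeball whether the scenario-conditional yield curve paths (mean +/- a standard deviation) look subjectively plausible, and adjust the prior if they do not.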
Back at DB, for instance, we could "show" that the most common yield curve movements (bull steepening and bear flattening, where, say, the 2y and 10y rates go down, which is bullish, and the 2s-10s slope steepens; or the 2y and 10y go up, which is bearish, and the 2s-10s slope flattens; yields and prices move inversely!) were largely related to demand shocks in the economy, where growth and inflation move together (typically the only shocks that the Fed really knows how to deal with), but that the atypical motions, bear steepening and bull flattening, seemed to coincide with supply shocks. Nowadays QE and financial instability would make for a more complex set of conditions, of course.
BTW, we liked these far more than the standard impulse-response approach to VARs that econometricians usually use: no covariance data, unreasonable impulses, arbitrary ordering. There is usually no means of disentangling effects when using impulse responses. At least conditional forecasts give you something you might be able to see in reality.
I have no idea whether one can easily put enough constraints on the priors to make them fully determined. If for some reason we cannot, we probably still have to make them unique. This is more like having some information for an informative prior but perhaps not enough to make it unique (e.g., I have a Gaussian prior and only know that F(mean, variance) = c, but nothing more, which is not enough to determine each uniquely).
My only real foray into 'objective' Bayesian methods was to suggest that some objective criteria could be used to decide between the many competing means and variances, at least as a starting point. Say MaxEnt subject to the subjective constraints, or, as in reference priors, minimize the cross-entropy between the prior and the posterior subject to my subjective constraints, etc. I'm afraid I don't know how to "Jeffreys-ize" these subjective priors! I think Jeffreys-izing is an all-or-nothing method, unlike minimizing cross-entropy, maximizing entropy, etc. I suppose we could take our constraints and find the unique identification via some symmetry argument much like Jeffreys' method, but this is not so obvious.
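As a toy version of this suggestion: suppose the only subjective constraint is on a derived quantity, say the prior second moment E[X^2] = mean^2 + variance = c, which leaves the Gaussian (mean, sd) pair underdetermined. A MaxEnt tie-break then picks the member of the constrained family with maximum entropy (for a Gaussian, 0.5 log(2*pi*e*sd^2)). A sketch with an assumed value of c:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical subjective constraint: we only know E[X^2] = mean^2 + var = c,
# which does not pin down (mean, sd) uniquely.  Among Gaussians satisfying it,
# pick the one with maximum entropy.
c = 4.0

def neg_entropy(params):
    mean, log_sd = params
    # Gaussian entropy: 0.5 * log(2 * pi * e * sd^2); minimize its negative
    return -0.5 * np.log(2 * np.pi * np.e * np.exp(2 * log_sd))

def constraint(params):
    mean, log_sd = params
    return mean**2 + np.exp(2 * log_sd) - c

res = minimize(neg_entropy, x0=[1.0, 0.0],
               constraints=[{"type": "eq", "fun": constraint}])
mean_hat, sd_hat = res.x[0], np.exp(res.x[1])
print(mean_hat, sd_hat)  # should land near (0, 2), the classic MaxEnt answer
```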
Irrespective, the goal is not to have priors on parameters exactly, since I think this is damn near impossible. I think nobody knows what the correlation between the state variables at time t and time t+1 should be to make the model all that reasonable (well, hopefully they are uncorrelated, but who knows?), and why should state space models all have the same prior? There are so many questions that can easily come up.
The goal is to use the "black box" of the prior predictive density and the prior conditional density (the conditional in particular, since you can look at model behaviour in a dynamic, scenario-based setting) to inform us about how the informative priors should be constrained.
My actual contention here is this: people do not have priors on parameters. They have priors on model behaviour. Parameters are hidden and we never ever observe them. But relationships in the data, forecasts, conditional forecasts: all these are observable or involve observable quantities. And these we can have opinions about. If this identifies a prior, then great, job done. If it does not, we need further restrictions to help, which is where objective Bayes methods seem appropriate!
Please do let me know your thoughts. Again, I would tend to agree that there is no truly objective method. In reality there are many competing ones, which all have their own merits (MaxEnt, minimum cross-entropy, Jeffreys', etc.). You still must subjectively choose one over another! But using these methods in this subjective prior identification problem seems not completely loony.
I don't have much to add here. In some settings I think it can make sense to put a prior distribution on parameters; in other settings it can make more sense to encode prior information in terms of predictive quantities. In my paper many years ago with Frederic Bois, we constructed priors on our model parameters that made sense to us on a transformed scale. In Stan, by the way, you can put priors on anything that can be computed: parameters, functions of parameters, predictions, whatever. As we've been discussing a lot on this blog recently, strong priors can make sense, especially in settings with sparse data where we want to avoid being jerked around by patterns in the noise.
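As a rough illustration of that last point (in plain NumPy/SciPy rather than Stan, with a made-up regression example): the trick is simply to add a prior term on a derived quantity, here the prediction at a hypothetical new input x_new, to the log density, in the spirit of Stan's target += statement. For a nonlinear function of the parameters one would also need the appropriate Jacobian adjustment.

```python
import numpy as np
from scipy.stats import norm

# Toy data and a hypothetical point where we have a substantive opinion
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 0.9, 2.1, 2.9])
x_new = 10.0

def log_posterior(theta):
    a, b, log_sigma = theta
    sigma = np.exp(log_sigma)
    log_lik = norm.logpdf(y, loc=a + b * x, scale=sigma).sum()
    # weak priors on the regression parameters themselves
    log_prior = norm.logpdf(a, 0, 10) + norm.logpdf(b, 0, 10)
    # informative prior on a *derived* quantity: the prediction at x_new
    y_new = a + b * x_new
    log_prior += norm.logpdf(y_new, 10.0, 1.0)
    return log_lik + log_prior

print(log_posterior(np.array([0.0, 1.0, 0.0])))
```

This log_posterior could then be handed to any generic optimizer or MCMC sampler; the prior on the prediction pulls the fitted line toward behaviour we find plausible without requiring us to have direct opinions about the parameters.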