Hierarchical array priors for ANOVA decompositions

Statistical Modeling, Causal Inference, and Social Science 2013-04-03

Alexander Volfovsky and Peter Hoff write:

ANOVA decompositions are a standard method for describing and estimating heterogeneity among the means of a response variable across levels of multiple categorical factors. In such a decomposition, the complete set of main effects and interaction terms can be viewed as a collection of vectors, matrices and arrays that share various index sets defined by the factor levels. For many types of categorical factors, it is plausible that an ANOVA decomposition exhibits some consistency across orders of effects, in that the levels of a factor that have similar main-effect coefficients may also have similar coefficients in higher-order interaction terms. In such a case, estimation of the higher-order interactions should be improved by borrowing information from the main effects and lower-order interactions. To take advantage of such patterns, this article introduces a class of hierarchical prior distributions for collections of interaction arrays that can adapt to the presence of such interactions. These prior distributions are based on a type of array-variate normal distribution, for which a covariance matrix for each factor is estimated. This prior is able to adapt to potential similarities among the levels of a factor, and incorporate any such information into the estimation of the effects in which the factor appears. In the presence of such similarities, this prior is able to borrow information from well-estimated main effects and lower-order interactions to assist in the estimation of higher-order terms for which data information is limited.
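
To make the abstract's central idea concrete, here is a minimal sketch in Python of a separable (Kronecker-structured) normal prior for a two-factor layout. It is an illustration under simplifying assumptions, not the authors' model: the factor covariance matrices `cov_a` and `cov_b` are fixed by hand here, whereas the paper estimates them, and the function name `sample_effects` is hypothetical. The point is only that when the same per-factor covariance appears in both the main effects and the interaction, levels with similar main effects also get similar interaction rows.

```python
import numpy as np

def sample_effects(cov_a, cov_b, rng):
    """Draw main effects and a two-way interaction that share per-factor covariances.

    Assumption for illustration: cov_a and cov_b are known, positive definite
    matrices describing similarity among the levels of factors A and B.
    """
    chol_a = np.linalg.cholesky(cov_a)  # lower-triangular factor of cov_a
    chol_b = np.linalg.cholesky(cov_b)  # lower-triangular factor of cov_b
    m_a, m_b = cov_a.shape[0], cov_b.shape[0]

    # Main effects: ordinary multivariate normal draws, one per factor.
    main_a = chol_a @ rng.standard_normal(m_a)
    main_b = chol_b @ rng.standard_normal(m_b)

    # Interaction: matrix normal draw, so that
    # Cov(interaction[i, j], interaction[k, l]) = cov_a[i, k] * cov_b[j, l].
    z = rng.standard_normal((m_a, m_b))
    interaction = chol_a @ z @ chol_b.T
    return main_a, main_b, interaction

# Levels 0 and 1 of factor A are similar (correlation 0.9); level 2 is unrelated.
cov_a = np.array([[1.0, 0.9, 0.0],
                  [0.9, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
cov_b = np.eye(2)

rng = np.random.default_rng(0)
main_a, main_b, interaction = sample_effects(cov_a, cov_b, rng)
```

Under this prior, rows 0 and 1 of `interaction` are strongly correlated across draws while row 2 is independent of both, which is the sense in which a higher-order term can borrow information from the similarity structure of a factor's levels.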

I’ll have to look at the model in detail, but at first glance this looks like exactly what I want for partial pooling of deep interactions, going beyond the exchangeable ANOVA models I’ve written about before. Another bit of good news is that there seems to be lots of room for computational improvement. Volfovsky and Hoff report that they fit their model by iterating the Gibbs sampler 200,000 times. I’m hoping that Stan will do it all automatically in a few hundred iterations. And, of course, once we start fitting these models in examples, we’ll probably have thoughts on how to modify them.

As further motivation, let me close with the opening paragraphs of my 2005 article:

What is the analysis of variance? Econometricians see it as an uninteresting special case of linear regression. Bayesians see it as an inflexible classical method. Theoretical statisticians have supplied many mathematical definitions. Instructors see it as one of the hardest topics in classical statistics to teach, especially in its more elaborate forms such as split-plot analysis. We believe, however, that the ideas of ANOVA are useful in many applications of statistics. For the purpose of this paper, we identify ANOVA with the structuring of parameters into batches—that is, with variance components models. There are more general mathematical formulations of the analysis of variance, but this is the aspect that we believe is most relevant in applied statistics, especially for regression modeling.

We shall demonstrate how many of the difficulties in understanding and computing ANOVAs can be resolved using a hierarchical Bayesian framework. Conversely, we illustrate how thinking in terms of variance components can be useful in understanding and displaying hierarchical regressions. With hierarchical (multilevel) models becoming used more and more widely, we view ANOVA as more important than ever in statistical applications.

I’ve been stuck for a while on how to move forward on nonexchangeable models, so this new paper by Volfovsky and Hoff is really exciting to me.