Using output from a fitted machine learning algorithm as a predictor in a statistical model

Statistical Modeling, Causal Inference, and Social Science 2017-11-24

Fred Gruber writes:

I attended your talk at Harvard where, regarding the question of how to deal with complex models (trees, neural networks, etc.), you mentioned the idea of taking the output of these models and fitting a multilevel regression model. Is there a paper you could refer me to where I can read about this idea in more detail? At work I deal with ensembles of Bayesian networks in a high-dimensional setting, and I’m always looking for ways to improve the understanding of the final models.

I replied that I know of no papers on this; it would be a good thing for someone to write up. In the two examples I was thinking of (from two different fields), machine learning models were used to predict a binary outcome; they gave predictions on the 0-1 scale. We took the logits of these predictions to get continuous scores; call these “z”. Then we ran logistic regressions on the data, using z and some other things as predictors. For example:

Pr(y_i = 1) = invlogit(a_j[i] + b*z_i) [varying-intercept model]
Pr(y_i = 1) = invlogit(a_j[i] + b_j[i]*z_i) [varying intercepts and slopes]
Pr(y_i = 1) = invlogit(a_j[i] + b_j[i]*z_i + X_i*gamma) [adding some new predictors]

You’d expect the coefficients b to be close to 1 in these models, but adding the varying intercepts, slopes, and other structure can help pick up patterns that were missed in the machine learning model, and can be helpful in expanding the predictions and generalizing to new settings.
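To make this concrete, here is a minimal sketch in Python of the varying-intercept version. Everything here is illustrative: the data are simulated, and the multilevel structure is approximated with one dummy intercept per group using statsmodels; a real analysis would fit the partially pooled model in Stan or PyMC instead.

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated stand-in data (hypothetical): n observations in J groups.
n, J = 2000, 8
g = rng.integers(0, J, size=n)             # group membership j[i]
p_hat = rng.uniform(0.05, 0.95, size=n)    # ML model's predicted Pr(y_i = 1)
z = np.log(p_hat / (1 - p_hat))            # logit of the ML prediction
a = rng.normal(0.0, 0.5, size=J)           # true group intercepts
y = rng.binomial(1, 1 / (1 + np.exp(-(a[g] + 1.0 * z))))

# Design matrix: one dummy per group (the intercepts a_j) plus the score z.
X = pd.get_dummies(pd.Series(g, name="g"), prefix="g").astype(float)
X["z"] = z

fit = sm.Logit(y, X).fit(disp=0)
print(fit.params["z"])  # should be near 1 if the ML model is well calibrated

The fitted coefficient on z tells you how much to trust the machine-learning score, and the group intercepts pick up group-level miscalibration that the original model missed.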

Gruber followed up:

It is an interesting approach. My initial thought was different. I have seen some approaches that bring interpretability to complex models by training a simpler model to reproduce the predictions of the complex model (a sketch follows the references below), as in:

Buciluǎ, Cristian, Rich Caruana, and Alexandru Niculescu-Mizil. “Model Compression.” In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 535–41. ACM, 2006. http://dl.acm.org/citation.cfm?id=1150464.

Ba, Lei Jimmy, and Rich Caruana. “Do Deep Nets Really Need to Be Deep?” CoRR abs/1312.6184 (2013). http://arxiv.org/abs/1312.6184.

And more recently: Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–44. ACM, 2016. doi:10.1145/2939672.2939778.
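To make the model-compression idea concrete, here is a minimal sketch, assuming scikit-learn; the teacher/student setup and all settings are illustrative, not taken from the papers above. A shallow tree is trained on the ensemble’s predicted probabilities rather than on the raw labels, giving an interpretable approximation of the complex model.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeRegressor

X, y = make_classification(n_samples=5000, n_features=10, random_state=0)

# The "teacher": a complex, hard-to-interpret model.
teacher = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
p_teacher = teacher.predict_proba(X)[:, 1]   # soft targets in (0, 1)

# The "student": a depth-3 tree trained to mimic the teacher's probabilities.
student = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, p_teacher)
print(np.corrcoef(student.predict(X), p_teacher)[0, 1])  # fidelity to teacher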

That’s all fine; it’s good to understand a model. But I was thinking of a different question: taking predictions from a model and trying to do more with them by taking advantage of other information that had not been used in the original fit.
