Generalized linear neural network models

Statistical Modeling, Causal Inference, and Social Science 2025-02-04

This is Bob.

Are neural nets the future of regression?

Andrew was visiting Flatiron last Friday (really last Friday, not six months ago), and I was asking the question that’s been on my mind lately: will neural networks put regression modelers out of work?

Andrew hired me and Matt Hoffman in 2010 to work out how to specify and fit hierarchical regression models with interactions. He wanted to create a system that automatically added interactions, non-linearities, etc., guided by a vaguely conceived “topology of models.” This is a combinatorial nightmare, even with a handful of covariates and non-linearities, and not even considering continuous variation in things like priors.

Black-box non-linear function approximation

Fast forward 15 years, and neural network regressions are ubiquitous. Rather than specifying interactions, non-linearities, etc., we just let a highly overparameterized deep neural net sort it out. This idea of black-box, non-linear function approximation is not new. I first saw it with random forests (the Bayesian analogue of which is Bayesian additive regression trees) and more recently with gradient-boosted decision trees (the go-to method in Kaggle competitions).

Do we have enough data?

The only thing holding us back from using neural networks everywhere is limited data. As our data sets get bigger, it’s clear that neural network regression works very well (see, e.g., LLMs, image recognition, and image generation systems, all of which are built by fitting largely black-box deep neural network models).

Uncertainty quantification

As we were talking about this, Andrew kept returning to uncertainty quantification. I somehow couldn’t convince him that we can do exactly the same uncertainty quantification we do now. There’s no fundamental difference between using a neural network and using bespoke, hand-tooled covariate combinations: they’re just different functions mapping the covariates to expected values.
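To put the point in symbols (my notation here, not necessarily the writeup’s): a GLM with link function $g$ models the expected outcome as

$$\mathbb{E}[y_n \mid x_n] = g^{-1}(x_n \beta),$$

and the neural network version keeps everything else the same, just replacing the linear function $x_n \beta$ with the output of a network,

$$\mathbb{E}[y_n \mid x_n] = g^{-1}(f(x_n; \theta)),$$

where $f$ is, say, a multilayer perceptron with weights and biases $\theta$. The likelihood, the priors on the parameters, and posterior predictive uncertainty all go through exactly as in the linear case.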

Here’s a document explaining the connection

I didn’t have time to explain this to Andrew at the whiteboard, so I wrote it up as a document. It goes over how you can take a GLM, swap out the linear component for a neural network, and then proceed as usual. It contains an example of a two-hidden-layer perceptron model coded in Stan.
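To give a flavor, here’s a minimal sketch of that kind of model in Stan. This is my illustration, not the code from the document, and it assumes a binary outcome with a logit link, tanh activations, and standard normal priors on all the weights and biases:

data {
  int<lower=0> N;                      // number of observations
  int<lower=1> K;                      // number of covariates
  matrix[N, K] x;                      // covariate matrix
  array[N] int<lower=0, upper=1> y;    // binary outcomes
  int<lower=1> H;                      // hidden layer width
}
parameters {
  matrix[H, K] W1;  // first hidden layer weights
  vector[H] b1;     // first hidden layer biases
  matrix[H, H] W2;  // second hidden layer weights
  vector[H] b2;     // second hidden layer biases
  vector[H] w3;     // output layer weights
  real b3;          // output layer bias
}
model {
  // standard normal priors on all weights and biases
  to_vector(W1) ~ std_normal();
  b1 ~ std_normal();
  to_vector(W2) ~ std_normal();
  b2 ~ std_normal();
  w3 ~ std_normal();
  b3 ~ std_normal();

  // same Bernoulli likelihood and logit link as logistic regression,
  // but with the linear predictor x[n] * beta replaced by a
  // two-hidden-layer perceptron
  for (n in 1:N) {
    vector[H] h1 = tanh(W1 * x[n]' + b1);   // first hidden layer
    vector[H] h2 = tanh(W2 * h1 + b2);      // second hidden layer
    y[n] ~ bernoulli_logit(w3' * h2 + b3);  // output layer plus link
  }
}

The only change from ordinary Bayesian logistic regression is inside the loop, where the perceptron stands in for the linear predictor. One caveat: a posterior like this is highly multimodal (for instance, permuting the hidden units leaves the likelihood unchanged), so getting MCMC to mix is its own challenge, but none of the GLM machinery changes.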

I’m always happy to get comments or suggestions. Keep in mind that the purpose here is not publication, but just to explain how a neural network can be swapped in for the linear function in a generalized linear model.