Should you always include a varying slope for the lower-level variable involved in a cross-level interaction?

Statistical Modeling, Causal Inference, and Social Science 2024-09-04

Sociology student Josh Aleksanyan writes:

I’ve also been working with a dataset on addiction treatment admissions and I have a question that I can’t really figure out. I figured I should go multilevel since I am interested in how broader social policy contexts influence the relationship between a criminal justice referral and treatment outcomes (i.e., length of treatment stay). When building a multilevel model that explicitly includes an interaction between two levels (in my case, a criminal justice referral and rescaled state-level probation and incarceration rates), is it recommended we always allow the individual-level predictor to vary? I know your recommendation is to build many models and always let coefficients vary (up to a point) but I can’t figure out if this is in fact a “rule” if we have a cross-level interaction. This article suggests so. In any case, in my case, adding a varying slope for the level 1 predictor reduces residuals outside of the error bounds (so I guess I partly answered my own question) but I’m wondering (going forward in life) if this is something we have to do because of the cross-level interaction.

Also, since I have your attention, if I expand my data beyond a single year, would you explicitly model year as its own varying intercept (as opposed to adding year dummies)?

The linked article, by Jan Paul Heisig and Merlin Schaeffer, is helpfully titled, “Why You Should Always Include a Random Slope for the Lower-Level Variable Involved in a Cross-Level Interaction.”
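To fix ideas, here is one way to write the kind of model under discussion. The notation is an assumption for illustration and is not taken from the email or from the linked article: $y_i$ is length of treatment stay for admission $i$, $x_i$ indicates a criminal justice referral, $u_j$ is the rescaled state-level probation or incarceration rate, and $j[i]$ is the state for admission $i$.

```latex
\begin{aligned}
y_i      &= \alpha_{j[i]} + \beta_{j[i]}\, x_i + \epsilon_i,
            & \epsilon_i &\sim \mathrm{N}(0, \sigma_y^2), \\
\alpha_j &= \gamma_0^{\alpha} + \gamma_1^{\alpha} u_j + \eta_j^{\alpha},
            & \eta_j^{\alpha} &\sim \mathrm{N}(0, \sigma_\alpha^2), \\
\beta_j  &= \gamma_0^{\beta} + \gamma_1^{\beta} u_j + \eta_j^{\beta},
            & \eta_j^{\beta} &\sim \mathrm{N}(0, \sigma_\beta^2).
\end{aligned}
```

Substituting the third line into the first gives the term $\gamma_1^{\beta} u_j x_i$, which is the cross-level interaction, plus the term $\eta_j^{\beta} x_i$, which is the varying slope. In this notation, fitting the interaction without the varying slope amounts to assuming $\sigma_\beta = 0$, so that the state-level predictor explains all of the between-state variation in the referral coefficient. (For simplicity the sketch treats $\eta_j^{\alpha}$ and $\eta_j^{\beta}$ as independent; in practice one would typically allow them to be correlated.)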

What do I think about this recommendation?

It’s complicated. As Aleksanyan notes, my general advice is to include everything. But in real life we can’t include everything, even if the data are available. As we say in “parent talk,” you have to pick your battles. A quick answer is that you are always including these interactions: even when you’re not including them, you’re including them with a coefficient of zero. That seems silly, but the larger point is that it’s not just a question of whether you’re including an effect or interaction, or even how you’re modeling it, so much as how you’re fitting it. If you set a coefficient to zero, that’s complete pooling, or we could say complete regularization. Interactions can be noisy to estimate, so least squares, or even a hierarchical model with flat priors on the hyperparameters, won’t necessarily do the job.
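The point about complete pooling can be made concrete with a small sketch. This is an illustration under simplifying assumptions, not a recommended fitting procedure: it takes each state's unpooled slope estimate and its standard error as given, treats the slope predicted from the state-level predictor as known, and plugs in an assumed value for the standard deviation of the unexplained slope variation. The function name `partially_pooled_slopes` and all the numbers are hypothetical.

```python
import numpy as np


def partially_pooled_slopes(beta_hat, se, line, sigma_beta):
    """Shrink per-group slope estimates toward the group-level prediction.

    beta_hat   : unpooled slope estimate for each group
    se         : standard error of each unpooled estimate (must be > 0)
    line       : slope predicted from the group-level predictor alone
    sigma_beta : assumed sd of the slope variation left unexplained
    """
    beta_hat, se, line = (np.asarray(a, dtype=float) for a in (beta_hat, se, line))
    # Weight on the group's own data: 0 is complete pooling, 1 is no pooling.
    weight = sigma_beta**2 / (sigma_beta**2 + se**2)
    return line + weight * (beta_hat - line)


# Hypothetical numbers for three states.
beta_hat = np.array([0.9, 0.2, 0.5])   # noisy state-by-state slope estimates
se = np.array([0.3, 0.3, 0.1])         # their standard errors
line = np.array([0.5, 0.5, 0.5])       # slopes implied by the cross-level interaction

complete = partially_pooled_slopes(beta_hat, se, line, sigma_beta=0.0)  # no varying slope
partial = partially_pooled_slopes(beta_hat, se, line, sigma_beta=0.3)   # some regularization
no_pool = partially_pooled_slopes(beta_hat, se, line, sigma_beta=1e6)   # effectively flat

print(complete, partial, no_pool)
```

Setting `sigma_beta` to zero reproduces the model with the interaction but no varying slope; a very large value returns the noisy state-by-state estimates essentially unchanged. In a real analysis `sigma_beta` would be estimated along with everything else, and how it is regularized is exactly the "how you're fitting it" question.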

Anyway, my general advice is to include these varying slopes, at least conceptually. So even if you fit the model without that component of variation, you should recognize that it’s there.