The interactions paradox in statistics

Statistical Modeling, Causal Inference, and Social Science 2024-10-11

A colleague was telling me that people at her company are running lots of A/B tests, and people are interested in interactions. If they grab whatever estimates are large and conventionally “statistically significant,” they see all sorts of random things. But when she fits a multilevel model, everything goes away. Also, the interaction models she fits look worse under cross-validation than simple additive, no-interaction models. What to do?

My quick guess is that what’s going on is that the standard errors on these interactions are large and so any reasonable estimate will do tons of partial pooling. Here’s a simple example.
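Here’s a rough simulation sketch of that guess (my own illustration, with made-up effect sizes, a hand-picked prior sd, and a balanced 2x2 design assumed). The point is just that the interaction contrast has twice the standard error of a main effect, so a prior centered at zero pulls the noisy interaction estimates almost all the way back:

```python
# Minimal sketch: in a balanced 2x2 A/B test with n observations per cell,
# the interaction contrast has SE = 2*sigma/sqrt(n), twice a main effect's SE,
# so partial pooling toward zero is heavy. All numbers here are made up.
import numpy as np

rng = np.random.default_rng(0)
n_per_cell = 200          # observations per cell of the 2x2 design
sigma = 1.0               # residual sd of the outcome
true_main = 0.20          # a modest main effect
true_inter = 0.05         # a small true interaction
n_tests = 50              # number of A/B tests being run

est_inter, se_inter = [], []
for _ in range(n_tests):
    # cell means: baseline, +A, +B, +A +B +interaction
    mu = np.array([0.0, true_main, true_main, 2 * true_main + true_inter])
    ybar = mu + rng.normal(0, sigma / np.sqrt(n_per_cell), size=4)
    # interaction contrast: y11 - y10 - y01 + y00
    est_inter.append(ybar[3] - ybar[2] - ybar[1] + ybar[0])
    se_inter.append(2 * sigma / np.sqrt(n_per_cell))   # twice a main effect's SE
est_inter, se_inter = np.array(est_inter), np.array(se_inter)

# Partial pooling toward zero with prior sd tau on the true interactions:
# posterior mean = (tau^2 / (tau^2 + se^2)) * raw estimate.
tau = 0.05                # assumed prior sd for interactions
shrink = tau**2 / (tau**2 + se_inter**2)
pooled = shrink * est_inter

print("typical SE of an interaction:", se_inter[0].round(3))
print("shrinkage factor:", shrink[0].round(2))
print("raw estimates exceeding 2 SE:", int(np.sum(np.abs(est_inter) > 2 * se_inter)))
print("pooled estimates exceeding 2 SE:", int(np.sum(np.abs(pooled) > 2 * se_inter)))
```

With these numbers the shrinkage factor is around 0.1, so even the raw estimates that clear the conventional threshold get pulled to roughly nothing, which is consistent with my colleague’s experience of everything “going away” in the multilevel model.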

Regarding what to do . . . I think you need some theory or at least some prior expectations. If the client has some idea ahead of time of what might work, even something as simple as pre-scoring each interaction from -5 (expect a large negative interaction), through 0 (no expectation of any effect), to +5 (expect something large and positive), you could use this as a predictor in the multilevel model, and if something’s there, you might learn something.
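Here’s a sketch of what that could look like, again my own code rather than anything from my colleague: raw estimate_j ~ normal(theta_j, se_j^2), theta_j ~ normal(g0 + g1*score_j, tau^2), fit by empirical Bayes (maximizing the marginal likelihood). The function name, the data, and the pre-scores are all made up; a full Bayesian fit in Stan would be the more careful version.

```python
# Sketch: fold the -5..+5 pre-scores into the model as a group-level predictor.
# Assumes the raw interaction estimates and their standard errors are in hand.
import numpy as np
from scipy.optimize import minimize

def fit_scored_interactions(raw, se, score):
    raw, se, score = map(np.asarray, (raw, se, score))

    def neg_marginal_loglik(params):
        g0, g1, log_tau = params
        var = se**2 + np.exp(log_tau) ** 2          # marginal variance of raw_j
        resid = raw - (g0 + g1 * score)
        return 0.5 * np.sum(np.log(var) + resid**2 / var)

    opt = minimize(neg_marginal_loglik, x0=np.array([0.0, 0.0, np.log(0.1)]))
    g0, g1, tau = opt.x[0], opt.x[1], np.exp(opt.x[2])

    # Posterior means: pull each raw estimate toward its score-based prediction,
    # pulling harder when the standard error is large relative to tau.
    prior_mean = g0 + g1 * score
    weight = tau**2 / (tau**2 + se**2)
    theta_hat = weight * raw + (1 - weight) * prior_mean
    return theta_hat, (g0, g1, tau)

# Made-up example: 8 interaction estimates, their standard errors, and pre-scores.
raw = [0.30, -0.05, 0.02, 0.25, -0.20, 0.01, 0.15, -0.02]
se = [0.14] * 8
score = [4, 0, -1, 3, -4, 0, 2, 1]
theta_hat, (g0, g1, tau) = fit_scored_interactions(raw, se, score)
print("estimated slope on the pre-score:", round(g1, 3))
print("partially pooled estimates:", np.round(theta_hat, 3))
```

If the pre-scores predict anything, the estimated slope g1 will show it, and the interactions that were expected to be large get shrunk less aggressively than the rest.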

This sort of thing comes up all the time in applications. On one hand, interactions are important; on the other hand, their estimates are noisy. The solution has to be some mix of (a) prior information (as in my above example) and (b) acceptance of uncertainty (not expecting or demanding near-certainty in order to make a decision).

A quick Google search turned up this 2016 article by Kelvyn Jones, Ron Johnston, and David Manley, who try to do this sort of thing. I haven’t read the article in detail, but it could be worth looking at.