Survey Statistics: dCV for MRP ?
Statistical Modeling, Causal Inference, and Social Science 2026-04-21
Three weeks ago we learned about design-based cross-validation (dCV), shown in Figure 1(d) of Iparragirre et al. (2023):

Each dot is a PSU (primary sampling unit), which can be an individual but is often a group or cluster of individuals. Each color is a stratum. dCV is the usual K-fold CV, except that we:
1. keep PSUs together within a fold
2. reject a split if a whole stratum falls into one fold
3. modify the weights so that each subsample replicates the original sample
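A minimal Python sketch of the three steps, assuming PSUs are nested within strata and each stratum has at least two PSUs. The function names `dcv_folds` and `training_weights` are hypothetical, and the weight adjustment is an assumption in the spirit of stratified jackknife replicate weights: training weights in stratum h are multiplied by n_h / n_h_train, the stratum's PSU count over the number of its PSUs left in training. Iparragirre et al. (2023) may adjust the weights differently.

```python
import numpy as np


def dcv_folds(psu, stratum, K, rng, max_tries=1000):
    """Assign whole PSUs to K folds; redraw if any stratum lands in one fold."""
    psu, stratum = np.asarray(psu), np.asarray(stratum)
    # One stratum label per PSU (assumes PSUs are nested within strata).
    psu_ids, first = np.unique(psu, return_index=True)
    psu_stratum = stratum[first]
    for _ in range(max_tries):
        # 1. Draw a fold per PSU, so all units of a PSU share a fold.
        psu_fold = rng.integers(0, K, size=len(psu_ids))
        # 2. Reject the split if some stratum has all its PSUs in one fold
        #    (also reject splits that leave a fold empty).
        spread = all(len(np.unique(psu_fold[psu_stratum == h])) > 1
                     for h in np.unique(psu_stratum))
        if spread and len(np.unique(psu_fold)) == K:
            # Map each unit's PSU back to that PSU's fold.
            return psu_fold[np.searchsorted(psu_ids, psu)]
    raise RuntimeError("no acceptable split; does every stratum have >= 2 PSUs?")


def training_weights(w, psu, stratum, fold, k):
    """3. Training weights when fold k is held out (held-out units get 0).

    Assumed adjustment: within stratum h, multiply by n_h / n_h_train, where
    n_h counts the stratum's PSUs and n_h_train those still in training.
    """
    w = np.asarray(w, dtype=float)
    psu, stratum, fold = np.asarray(psu), np.asarray(stratum), np.asarray(fold)
    train = fold != k
    w_train = np.zeros_like(w)
    for h in np.unique(stratum):
        in_h = stratum == h
        n_h = len(np.unique(psu[in_h]))
        n_h_train = len(np.unique(psu[in_h & train]))  # >= 1 thanks to step 2
        w_train[in_h & train] = w[in_h & train] * n_h / n_h_train
    return w_train


rng = np.random.default_rng(0)
stratum = np.repeat([0, 1, 2], 40)  # 3 strata of 40 units
psu = np.repeat(np.arange(24), 5)   # 24 PSUs of 5 units, 8 PSUs per stratum
w = np.ones(120)
fold = dcv_folds(psu, stratum, K=4, rng=rng)
w_train = training_weights(w, psu, stratum, fold, k=0)
```

With equal-sized, equally weighted PSUs, as in this toy example, the rescaled training weights in each stratum sum to the stratum's original weight total, so each training subsample stands in for the full sample.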

Let’s return to the problem of using CV to assess Multilevel Regression and Poststratification (MRP) models. Earlier we saw that an individual-level loss, Loss(y_i, yhat_i), might not be a good way to assess MRP models, even when weighted, and that CV noise can swamp the differences between models.
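To make the contrast concrete: the CV score is an average of individual-level losses over held-out respondents, while the quantity MRP reports is a poststratified average over population cells. A minimal sketch, assuming a binary outcome scored by log loss; `weighted_log_loss` and `poststratify` are hypothetical helper names, not functions from any MRP package.

```python
import numpy as np


def weighted_log_loss(y, p, w):
    """Weighted mean of individual-level log loss over held-out respondents."""
    y, p, w = (np.asarray(a, dtype=float) for a in (y, p, w))
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return np.sum(w * loss) / np.sum(w)


def poststratify(cell_pred, cell_count):
    """MRP-style estimate: cell predictions averaged with population counts N_c."""
    cell_pred = np.asarray(cell_pred, dtype=float)
    cell_count = np.asarray(cell_count, dtype=float)
    return np.sum(cell_count * cell_pred) / np.sum(cell_count)


# Two held-out respondents with predicted probability 0.5, weights 1 and 3.
print(weighted_log_loss([1, 0], [0.5, 0.5], [1, 3]))  # log(2), about 0.693
# Two population cells of sizes 300 and 100.
print(poststratify([0.2, 0.6], [300, 100]))
```

Nothing forces the two numbers to rank models the same way: individual-level errors can cancel within cells, and cells are weighted by population counts N_c rather than by sample weights.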
The dCV method from Iparragirre et al. (2023) is designed for probability samples. MRP is usually used with nonprobability samples (e.g., online surveys). But maybe there’s still something to learn here.
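Chapter 7 of Bayesian Data Analysis, on evaluating predictive accuracy, says (p. 169) “we can imagine replicating new data in existing groups …or new data in new groups”. New data in existing groups is strata-like: strata are fixed in advance and every stratum appears in the sample. New data in new groups is cluster-like: clusters are sampled, so a new sample would largely contain clusters we haven’t seen.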
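Assuming a simple normal hierarchical model, y_ij ~ N(α_j, σ²) with group effects α_j ~ N(μ, τ²), the two kinds of replication correspond to two posterior predictive distributions (a sketch in the spirit of BDA, with notation chosen here):

```latex
\begin{aligned}
\text{existing group } j:&\quad p(\tilde y_j \mid y) = \int \mathrm{N}(\tilde y_j \mid \alpha_j, \sigma^2)\, p(\alpha_j, \sigma \mid y)\, d\alpha_j\, d\sigma,\\
\text{new group}:&\quad p(\tilde y_{\text{new}} \mid y) = \int \mathrm{N}(\tilde y_{\text{new}} \mid \tilde\alpha, \sigma^2)\, \mathrm{N}(\tilde\alpha \mid \mu, \tau^2)\, p(\mu, \tau, \sigma \mid y)\, d\tilde\alpha\, d\mu\, d\tau\, d\sigma.
\end{aligned}
```

Scoring a held-out member of an existing stratum uses the first line; scoring a member of a held-out cluster uses the second, whose extra uncertainty comes from drawing a new group effect from N(μ, τ²).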
Iparragirre et al. (2023) say that splitting clusters between training and test folds (i.e., not doing #1 above) will underestimate prediction error: the training data then include members of the test clusters, so we fit models with more information than we should. This “overfits” to the data, so the usual CV chooses unnecessarily complex models. (See Chapter 7 of The Elements of Statistical Learning (ESL), on model assessment and selection.)
What about the reverse, for strata instead of clusters ? Suppose we don’t do #2 above, and a whole stratum ends up in one fold. When that fold is held out, the training data contain no one from that stratum, so we fit models with less information than we should. Will this “underfit”, choosing overly simple models ?
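A toy simulation can illustrate both directions. It assumes groups with normal effects (τ = 2) plus unit-level noise (σ = 1) and compares a "grand mean" model with a "group mean" model under two splits: rows assigned to folds at random, which spreads each group across folds, and whole groups assigned to folds. `cv_mse` and all parameter values are hypothetical choices for illustration, not anything from Iparragirre et al. (2023).

```python
import numpy as np


def cv_mse(y, group, fold, use_group_means):
    """K-fold CV mean squared error for a grand-mean or group-mean predictor."""
    errs = []
    for k in np.unique(fold):
        train, test = fold != k, fold == k
        grand = y[train].mean()
        if use_group_means:
            means = {g: y[train & (group == g)].mean()
                     for g in np.unique(group[train])}
            # A group absent from training falls back to the grand mean.
            pred = np.array([means.get(g, grand) for g in group[test]])
        else:
            pred = np.full(test.sum(), grand)
        errs.append((y[test] - pred) ** 2)
    return np.concatenate(errs).mean()


rng = np.random.default_rng(1)
G, n_per, K = 50, 20, 5
group = np.repeat(np.arange(G), n_per)
# Group effects with sd tau = 2, plus unit noise with sd sigma = 1.
y = rng.normal(0, 2, G)[group] + rng.normal(0, 1, G * n_per)

row_fold = rng.integers(0, K, G * n_per)   # splits each group across folds
group_fold = rng.integers(0, K, G)[group]  # keeps each group in one fold
for name, fold in [("row split", row_fold), ("whole-group split", group_fold)]:
    print(name,
          round(cv_mse(y, group, fold, use_group_means=False), 2),
          round(cv_mse(y, group, fold, use_group_means=True), 2))
```

If the groups are clusters, the whole-group split mirrors new data in new groups, and the row split flatters the group-mean model (the overfitting concern behind #1). If the groups are strata, the row split is the relevant one, while the whole-group split makes the two models score identically, since every held-out group is predicted by the grand mean; CV then has no reason to keep the group effects, which is the underfitting worry.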
In MRP, does it help to run CV rejecting any split in which a whole stratum falls into one fold (#2 above) ? For example, if all members of an age or education group fall within one fold, should we redo the CV split ? Any references where folks do this ?
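One way to try #2 in an MRP setting, sketched in Python under assumptions: respondents are split individually (no PSUs and no weight adjustment, as is common with nonprobability samples), and a split is redrawn whenever some level of a poststratification variable (here age or education) has all its respondents in one fold, so every level appears in every training set. `mrp_folds` and the toy data are hypothetical.

```python
import numpy as np


def mrp_folds(variables, K, rng, max_tries=1000):
    """Random, balanced K-fold split of respondents, redrawn until every level
    of every poststratification variable spans at least two folds."""
    variables = {name: np.asarray(v) for name, v in variables.items()}
    n = len(next(iter(variables.values())))
    for _ in range(max_tries):
        fold = rng.permutation(np.arange(n) % K)  # fold sizes differ by at most 1
        # A level spanning >= 2 folds stays in training whichever fold is held out.
        ok = all(len(np.unique(fold[v == level])) > 1
                 for v in variables.values()
                 for level in np.unique(v))
        if ok:
            return fold
    raise RuntimeError("no acceptable split; some level may have one respondent")


rng = np.random.default_rng(2)
age = np.tile(["18-29", "30-44", "45-64", "65+"], 50)
educ = np.repeat(["no HS", "HS", "some college", "college+"], [4, 76, 60, 60])
fold = mrp_folds({"age": age, "educ": educ}, K=5, rng=rng)
```

The check is on each variable's margins. Requiring every age × education cell to appear in every training set would be stricter and, with sparse cells, often impossible.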