Survey Statistics: design-based cross validation (dCV)
Statistical Modeling, Causal Inference, and Social Science 2026-03-31
Last week we saw how cross-validation noise can swamp important model differences (Wang & Gelman 2014). The comments raised another challenge: how do we split structured data into train and test sets?

Aki explains options here:

Thomas Lumley’s blog post and a coauthored paper, Iparragirre et al. (2023), explore CV using “replicate weights” ideas:

Replicate weight methods split the sample into partially independent subsamples, and modify the weights so that each subsample replicates the original sample. They are usually used for variance estimation, but Iparragirre et al. (2023) consider them for out-of-sample error estimation.
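To make the replicate-weights idea concrete, here is a sketch of the standard stratified delete-one-PSU jackknife (one of the classic replicate-weight constructions, not the dCV weights themselves; the function name and interface are mine):

```python
import numpy as np

def jackknife_replicate_weights(weights, strata, psus):
    """Stratified delete-one-PSU jackknife replicate weights.

    For each PSU j in stratum h: zero out that PSU's weights and
    scale the remaining PSUs in stratum h by n_h / (n_h - 1), so the
    replicate still "represents" the full sample. Weights in the
    other strata are left unchanged. Assumes >= 2 PSUs per stratum.
    """
    weights = np.asarray(weights, dtype=float)
    strata = np.asarray(strata)
    psus = np.asarray(psus)
    replicates = []
    for h in np.unique(strata):
        in_h = strata == h
        psus_h = np.unique(psus[in_h])
        n_h = len(psus_h)
        for j in psus_h:
            w = weights.copy()
            w[in_h] *= n_h / (n_h - 1)   # scale up the kept PSUs
            w[in_h & (psus == j)] = 0.0  # drop PSU j
            replicates.append(w)
    return np.vstack(replicates)
```

Note that every replicate preserves the original total weight, which is the sense in which each subsample "replicates the original sample."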
Although I was rooting for BRR (Balanced Repeated Replication) because I work at Blue Rose Research, a better method seems to be design-based cross validation (dCV), depicted in their Figure 1(d):

Each dot is a PSU (primary sampling unit), which can be an individual but is often a group/cluster of individuals. Each color is a stratum. dCV is the usual K-fold CV, but:
- split PSUs, not individuals
- reject a split if a whole stratum falls into one fold
- modify the weights so that each subsample replicates the original sample
The first mirrors Aki’s LOGO (leave-one-group-out) CV above. The third is an idea from replicate weights.
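Here is a rough sketch of those three steps: PSU-level fold assignment by rejection sampling, plus a simple per-stratum reweighting of the training set so each stratum's training weight total matches its full-sample total. This is my reading of the recipe, not the paper's code, and the exact reweighting in Iparragirre et al. (2023) may differ:

```python
import numpy as np

def dcv_folds(strata, psus, K, rng, max_tries=1000):
    """Assign PSUs (not individuals) to K folds at random, rejecting
    any assignment in which all of a stratum's PSUs land in a single
    fold. Assumes PSUs are nested within strata."""
    strata = np.asarray(strata)
    psus = np.asarray(psus)
    psu_ids = np.unique(psus)
    psu_stratum = {p: strata[psus == p][0] for p in psu_ids}
    for _ in range(max_tries):
        fold_of_psu = {p: int(rng.integers(K)) for p in psu_ids}
        ok = True
        for h in np.unique(strata):
            folds_h = {fold_of_psu[p] for p in psu_ids if psu_stratum[p] == h}
            if len(folds_h) < 2:  # whole stratum in one fold -> reject
                ok = False
                break
        if ok:
            return np.array([fold_of_psu[p] for p in psus])
    raise RuntimeError("no valid split found")

def dcv_train_weights(weights, strata, folds, k):
    """Zero out fold k (the test fold) and rescale each stratum's
    training weights so its total matches the full-sample total."""
    weights = np.asarray(weights, dtype=float)
    strata = np.asarray(strata)
    w = np.where(folds != k, weights, 0.0)
    for h in np.unique(strata):
        in_h = strata == h
        w[in_h] *= weights[in_h].sum() / w[in_h].sum()
    return w
```

The rejection step guarantees every stratum spans at least two folds, which in turn guarantees each stratum has a nonempty training set, so the reweighting step never divides by zero.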
The first two seem useful for nonprobability samples as well? Suppose there is structure in the data and our predictive task is to predict for new schools (PSU-like) but existing states (strata-like). Is there a good reference for this?