Survey Statistics: more adventures in mismeasured X

Statistical Modeling, Causal Inference, and Social Science 2025-12-31

Last week we considered this simple example of measurement error in auxiliary data X:

Y = current 2025 support
X = true 2024 vote choice
X* = response for 2024 vote choice

All are binary: 1 for Democrats and 0 for Republicans. This cartoon example is from politics (not meant to be particularly realistic), but measurement error occurs in almost every survey. When does measurement error in an auxiliary adjustment variable negate the gains from adjusting for it to reduce nonresponse bias?

Suppose we want E(Y), the current 2025 support in the population. If the true X were enough to handle nonresponse bias, meaning E(Y | X, sample) = E(Y | X), then we could estimate this via poststratification:

P(X=1) E(Y | X = 1, sample) + P(X = 0) E(Y | X = 0, sample)

where we have P(X=1) and P(X=0) from the 2024 election results. But we can’t directly estimate E(Y | X = 1, sample) because we only have X* in the sample.
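
As a minimal sketch of the poststratification formula above, suppose for a moment that we did observe the true X in the sample. The function name `poststratify` and the two cell means are hypothetical, made up for illustration; only P(X=1) = 0.49 is taken from the code later in this post.

```r
# Poststratification: weight each cell's sample mean by its known population share.
poststratify <- function(pop_share, cell_mean) {
  stopifnot(length(pop_share) == length(cell_mean),
            abs(sum(pop_share) - 1) < 1e-8)
  sum(pop_share * cell_mean)
}

pop_share <- c(X1 = 0.49, X0 = 0.51)  # P(X=1), P(X=0) from the election result
cell_mean <- c(X1 = 0.77, X0 = 0.31)  # hypothetical E(Y | X, sample), not real data

est <- poststratify(pop_share, cell_mean)
print(est)
```

The same function covers the mismeasured case: pass E(Y | X*, sample) as `cell_mean` while keeping the population shares of the true X.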

We considered two choices:

Adjust with mismeasured X*: P(X=1) E(Y | X* = 1, sample) + P(X = 0) E(Y | X* = 0, sample)
No adjustment: E(Y | sample)

Questions:

A) Which is closer to the truth, E(Y)?

B) Which is closer to the previous election result, E(X)?

C) Which is higher for Democrats?

The answers depend on the joint distribution of Y, X, and X* in the population and in the sample. For question A, I’d guess that adjusting even with a mismeasured X* usually gets us closer to the truth, but as we’ll see below, it doesn’t always. For question B, one might think adjusting for a past election always brings us closer to that past election’s results, but as we’ll see below, it doesn’t always. For question C, let’s rewrite the no adjustment estimator:

E(Y | sample) = P(X* = 1 | sample) E(Y | X* = 1, sample) + P(X* = 0 | sample) E(Y | X* = 0, sample)

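So the two estimators use the same E(Y | X*, sample) and differ only in the weights: P(X=1) versus P(X* = 1 | sample). Adjusting for X* could increase support for Democrats if P(X=1) > P(X* = 1 | sample), as long as respondents who recall voting Democratic are more supportive now than those who recall voting Republican, E(Y | X* = 1, sample) > E(Y | X* = 0, sample). In other words, the Democratic share of the actual 2024 vote is larger than the share of respondents who say they voted Democratic. This sounds like winner’s bias, but it’s also comparing apples (population) to oranges (sample), so not quite.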
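
To make the comparison concrete: because the two estimators share the same cell means, their difference factors as (P(X=1) − P(X* = 1 | sample)) × (E(Y | X* = 1, sample) − E(Y | X* = 0, sample)). Here is a small check of that algebra, using the cell values from the first example scenario in the R code at the end of this post. The variable names are my own for this sketch.

```r
# Cell values copied from the post's first example scenario.
p <- c(p11 = 0.82, p01 = 0.68, p10 = 0.42, p00 = 0.25)  # P(Y=1 | X=i, X*=j)
s <- c(s11 = 0.48, s01 = 0.06, s10 = 0.07, s00 = 0.39)  # P(X=i, X*=j | sample)
PX1 <- 0.49                                             # true election result

PXs1   <- s[["s11"]] + s[["s01"]]                       # P(X*=1 | sample)
EY_Xs1 <- (s[["s11"]] * p[["p11"]] + s[["s01"]] * p[["p01"]]) / PXs1        # E(Y | X*=1, sample)
EY_Xs0 <- (s[["s10"]] * p[["p10"]] + s[["s00"]] * p[["p00"]]) / (1 - PXs1)  # E(Y | X*=0, sample)

no_adjust    <- PXs1 * EY_Xs1 + (1 - PXs1) * EY_Xs0  # sample weights
xstar_adjust <- PX1  * EY_Xs1 + (1 - PX1)  * EY_Xs0  # population weights

gap_direct  <- xstar_adjust - no_adjust
gap_product <- (PX1 - PXs1) * (EY_Xs1 - EY_Xs0)
print(c(gap_direct = gap_direct, gap_product = gap_product))
```

In this scenario P(X=1) = 0.49 is below P(X* = 1 | sample) = 0.54, so adjusting for X* lowers the estimate of Democratic support.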

So the answers to these three questions really do depend!

Here is some R code to simulate your own worlds. I made four examples so far. Do you think they’re realistic?

# Y = 2025 support
# X = 2024 vote
# X* = 2024 recalled vote
# p_ij = P(Y=1 | X=i, X*=j) is 2025 Democrat support by X and X*
# s_ij = P(X=i, X*=j | sample) is distribution of X and X* in sample
# s01 could come from consistency bias
# s10 could come from winner's bias
# P(X=1) = 0.49 is the true election result

EY_calc <- function(p11,p01,p10,p00, s11,s01,s10,s00, PX1=0.49){
  PX0 <- 1 - PX1

  PY1_X1 <- (s11/(s11+s10))*p11 + (s10/(s11+s10))*p10 # P(Y=1 | X=1, sample)
  PY1_X0 <- (s01/(s01+s00))*p01 + (s00/(s01+s00))*p00 # P(Y=1 | X=0, sample)
  Truth  <- PX1*PY1_X1 + PX0*PY1_X0

  PY1_Xs1 <- (s11/(s11+s01))*p11 + (s01/(s11+s01))*p01 # P(Y=1 | X*=1, sample)
  PY1_Xs0 <- (s10/(s10+s00))*p10 + (s00/(s10+s00))*p00 # P(Y=1 | X*=0, sample)
  Xstar_adjust <- PX1*PY1_Xs1 + PX0*PY1_Xs0

  no_adjust <- s11*p11 + s01*p01 + s10*p10 + s00*p00 # P(Y=1 | sample)

  # closeness to Truth E(Y)
  closer_truth <- if (abs(Xstar_adjust-Truth) < abs(no_adjust-Truth)) "Xstar_adjust" else "no_adjust"

  # closeness to E(X) (last election result)
  closer_EX <- if (abs(Xstar_adjust-PX1) < abs(no_adjust-PX1)) "Xstar_adjust" else "no_adjust"

  # higher for Democrats
  higher_for_Democrats <- if (Xstar_adjust > no_adjust) "Xstar_adjust" else "no_adjust"

  est <- c(Truth=Truth, Xstar_adjust=Xstar_adjust, no_adjust=no_adjust)

  list(estimates=signif(est, 3),
       closer_truth=closer_truth,
       closer_EX=closer_EX,
       higher_for_Democrats=higher_for_Democrats)
}

# 1) Xstar_adjust is closer to Truth E(Y)
EY_calc(
  p11=0.82, p01=0.68, p10=0.42, p00=0.25,
  s11=0.48, s01=0.06, s10=0.07, s00=0.39,
  PX1=0.49)

# 2) no_adjust is closer to Truth E(Y)
EY_calc(
  p11=0.78, p01=0.66, p10=0.46, p00=0.34,
  s11=0.44, s01=0.10, s10=0.05, s00=0.41,
  PX1=0.49)

# 3) no_adjust is closer to last election E(X)
EY_calc(
  p11=0.781, p01=0.648, p10=0.550, p00=0.297,
  s11=0.476, s01=0.010, s10=0.095, s00=0.419,
  PX1=0.49)

# 4) Winner's bias only: s01 = 0 no consistency bias
EY_calc(
  p11=0.86, p01=0.74, p10=0.40, p00=0.28,
  s11=0.50, s01=0.00, s10=0.10, s00=0.40,
  PX1=0.49)
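
The function above works directly with cell probabilities. As a rough sketch of how the same two estimators could be computed from respondent-level data, the code below simulates a sample from the first scenario and then uses only Y and X*, as an analyst would. The sample size, seed, and all variable names are my own assumptions for illustration.

```r
set.seed(1)
n <- 100000  # hypothetical sample size, large so sampling noise is small

# Cells in the order (X, X*) = (1,1), (0,1), (1,0), (0,0), values from scenario 1.
s <- c(0.48, 0.06, 0.07, 0.39)  # P(X=i, X*=j | sample)
p <- c(0.82, 0.68, 0.42, 0.25)  # P(Y=1 | X=i, X*=j)
X_cell     <- c(1, 0, 1, 0)
Xstar_cell <- c(1, 1, 0, 0)

# Draw each respondent's cell, then their 2025 support given that cell.
cell <- sample(1:4, n, replace = TRUE, prob = s)
dat <- data.frame(
  X     = X_cell[cell],      # true 2024 vote (unobserved in a real survey)
  Xstar = Xstar_cell[cell],  # recalled 2024 vote
  Y     = rbinom(n, 1, p[cell])
)

PX1 <- 0.49  # known election result

# The two estimators, using only what the survey records (Y and X*).
no_adjust    <- mean(dat$Y)
xstar_adjust <- PX1 * mean(dat$Y[dat$Xstar == 1]) +
  (1 - PX1) * mean(dat$Y[dat$Xstar == 0])

print(round(c(xstar_adjust = xstar_adjust, no_adjust = no_adjust), 3))
```

With a sample this large, both estimates should land close to the values EY_calc reports for scenario 1, up to simulation noise.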