Interpreting recent Iowa election poll using a rough Bayesian partition of error

Statistical Modeling, Causal Inference, and Social Science 2024-11-03

A political science colleague wrote in:

We are all abuzz about the Harris +3 in that Iowa Poll with its great track record. When I check the write-up of this poll I see a reasonably detailed description of their procedure but I do not see even a mention of post-stratification. Not that there is a lot to stratify in that state, but this worries me.

I responded that this is just one poll and can be off for various reasons.

My colleague followed up:

But (a) this poll does have a great track record over many salient elections. Could it be the one ordinary poll that is the luckiest? (b) unlike almost any other poll description, this one says nothing about post-stratification. The poll has no history of D bias; in fact, it was very pro-Trump this summer.

I responded: If the Iowa poll doesn’t poststratify by party ID, then they should look at the party ID breakdown of their survey, compared to earlier polls they’ve done in their state. Or maybe some of the difference is due to responses about turnout. The Bayesian thing is to attribute part of the unexpected result to sampling error, part to nonsampling error, part to Iowa-specific swing, part to regional/demographic swing, and part to national swing. How much to each, I don’t know, but a starting point would be 1/5 to each.

OK, let’s do the math. The Economist as of yesterday estimated Harris with 46% of the two-party vote share in Iowa. This new Iowa poll has Harris with 51.5% of the two-party vote share. That’s a 5.5-percentage-point shift; call it 5 points for simplicity. Then the crude estimate is: 1 point of sampling error, 1 point of nonsampling error, 1 point of Iowa-specific swing, 1 point of regional/demographic swing (roughly speaking, older white midwesterners), and 1 point of national swing.
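To make the arithmetic concrete, here is that equal-split partition as a tiny calculation. This is just a sketch of the back-of-the-envelope reasoning above; the 46% and 51.5% figures come from the text, and the 1/5-to-each allocation is a crude starting point, not a fitted model:

```python
# Crude partition of the Iowa surprise into five error components,
# using the equal-split starting point described above.
forecast_harris = 0.460   # Economist estimate of Harris's two-party share in Iowa
poll_harris     = 0.515   # Iowa Poll estimate of Harris's two-party share

residual = poll_harris - forecast_harris   # about 0.055, i.e. a 5.5-point surprise
components = [
    "sampling error",
    "nonsampling error",
    "Iowa-specific swing",
    "regional/demographic swing",
    "national swing",
]
share_each = residual / len(components)    # 1/5 of the surprise to each piece

for name in components:
    print(f"{name}: {100 * share_each:.1f} points")
```

Rounding the 5.5-point residual down to 5 points gives the 1-point-per-component numbers in the paragraph above.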

So, yeah, that’s not nothing. That’s pretty much the adjustment you would get if you were to take the Iowa poll at the highest level of seriousness.

I sent the above to Elliott Morris at Fivethirtyeight, who added:

Not much to add, except

It’s also random digit dialing (RDD), which has had a lot of variance in the last 2 elections, much more than would be predicted by sample size and the typical Goel/Rothschild/Gelman inflator (2x), and closer to 4x (17-point WI poll in 2020 from ABC; 12-pt swings from CNN in 2020; Trump +10 poll from ABC last October, etc.). So the combo of subsample biases and sampling error is probably acute in this case. I buy that it’s just weird, and also that our priors may need to adjust a bit.

That all being said, Selzer should publish the party ID (or registration) crosstabs. I suspect there’s a reason she didn’t . . .

And why not move to a registration-based survey (a poll off the voter file), at least so you can track response rates by group? The idea that you can do your weird magic circle and get a good RDD sample is pretty farfetched with a 0.5% response rate. The survey only weights by age and sex. I’m supposed to believe that’s enough, given the result? Call me skeptical . . .

Or maybe she’s right. I have no idea.
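To unpack the variance-inflation point Elliott is making: if the nominal sampling error gets inflated by 2x, or more like 4x for these RDD polls, the effective uncertainty is much wider than the stated margin of error. Here is a rough sketch in Python, where the sample size of 800 is an assumed round number for illustration, not the Iowa Poll's reported n:

```python
# Rough sketch: what a 2x or 4x inflation of the nominal sampling error
# does to a poll's 95% interval.  The sample size is an assumption for
# illustration, not the Iowa Poll's reported n.
import math

n = 800                                   # assumed number of respondents (illustrative)
p = 0.5                                   # worst case for a proportion's variance
se_nominal = math.sqrt(p * (1 - p) / n)   # about 0.018, i.e. 1.8 points

for inflator in (1, 2, 4):                # 1 = textbook, 2 = typical total-error, 4 = RDD guess
    half_width = 1.96 * inflator * se_nominal
    print(f"{inflator}x inflator: +/- {100 * half_width:.1f} points on one candidate's share")
```

Under the 4x story, a 5-point surprise on the two-party share sits comfortably inside the interval; under the textbook 1x calculation it would look like a clear outlier.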

Elliott concludes:

I don’t know if she’s right or who’s going to win, and we can all just find out on Tuesday!

Indeed. This new poll is just one piece of information. My crude Bayesian breakdown is there to say that, if you do want to take that poll at face value, there are still several steps to take to get you from the poll result to inference about the national election.

To put it another way: It’s natural to say something like, “If this Iowa poll is correct, then Harris will win the national election overwhelmingly (or, conversely, if she doesn’t, then the poll is wrong).” But that’s not right. Even if the poll is just fine, you still have to decompose the residual into these different pieces, and only two of them (the regional/demographic swing and the national swing) strongly relate to the national outcome.
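Continuing the back-of-the-envelope from above, and keeping the same crude equal-split assumption, only two of the five pieces plausibly carry over to the national picture, which shrinks the implied update quite a bit:

```python
# Of the five equal pieces of the Iowa surprise, only two (the
# regional/demographic swing and the national swing) say much about the
# national race; the rest is noise or Iowa-specific.
residual = 0.515 - 0.460          # Iowa poll minus forecast, two-party share
share_each = residual / 5         # equal-split heuristic from above

carries_over = 2 * share_each     # regional/demographic swing + national swing
print(f"Iowa surprise:       {100 * residual:.1f} points")
print(f"Nationally relevant: {100 * carries_over:.1f} points, give or take")
```

So even taking the poll completely at face value, the nationally relevant part of the surprise is on the order of a couple of points, not five.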

P.S. You could do the same calculation for every new poll that comes in, and if you’re careful, you’ll end up with something like our Economist election forecast (or like the Fivethirtyeight forecast, or like just about any other method that combines historical information, state polls, and national polls). It can just be helpful to work things out in an example.