Nonsampling error and the anthropic principle in statistics

Statistical Modeling, Causal Inference, and Social Science 2024-09-23

We’ve talked before about the anthropic principle, which, in physics, is roughly the idea that things are what they are because otherwise we wouldn’t be around to see them. Related are various equilibrium principles which state that things are what they are because, if they weren’t, behavior would change until equilibrium is reached.

An example is the idea that price elasticity of demand should be close to -1. If it's steeper than -1, then the seller has a motivation to lower the price so as to get more total money; if it's shallower than -1, then the seller has a motivation to raise the price so as to get more total money; equilibrium is only at -1. The price elasticity of demand is not, in general, -1, because there are lots of other costs, benefits, and constraints in any system; the -1 thing is just a baseline. Still, it can be a useful baseline.
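
To make that concrete, here's a toy numerical check (my sketch, not from the post), assuming a constant-elasticity demand curve q = p^e, so that revenue is p^(1+e):

```python
# Toy illustration of the elasticity-of-demand baseline (illustrative only).
# Assume constant-elasticity demand q = p**e, so revenue = p * q = p**(1 + e).
def revenue(price, elasticity):
    quantity = price ** elasticity
    return price * quantity

for e in (-1.5, -1.0, -0.5):
    r_low, r_mid, r_high = (revenue(p, e) for p in (0.9, 1.0, 1.1))
    print(f"elasticity {e:+.1f}: revenue at price 0.9, 1.0, 1.1 = "
          f"{r_low:.3f}, {r_mid:.3f}, {r_high:.3f}")

# With elasticity -1.5 (steeper than -1), cutting the price raises revenue;
# with -0.5 (shallower than -1), raising the price raises revenue;
# at exactly -1, revenue is flat, which is the equilibrium described above.
```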

Another example is the median voter theorem, which we’ve discussed many times on this blog (see the many links here): to the extent that parties take positions that are not close to the median of the voters, the parties should be able to gain votes by moving toward the median. Again, this does not generally happen because of many complicating factors; the median voter theorem can still be helpful as a baseline.

Another example is effect sizes in statistics, a topic that can also be studied empirically.

Today I want to talk about polling error, in particular, this finding from an article with Houshmand Shirani-Mehr, David Rothschild, and Sharad Goel:

Reported margins of error typically only capture sampling variability, and in particular, generally ignore nonsampling errors in defining the target population (e.g., errors due to uncertainty in who will vote). Here, we empirically analyze 4221 polls for 608 state-level presidential, senatorial, and gubernatorial elections between 1998 and 2014, all of which were conducted during the final three weeks of the campaigns. Comparing to the actual election outcomes, we find that average survey error as measured by root mean square error is approximately 3.5 percentage points, about twice as large as that implied by most reported margins of error.

Roughly speaking, nonsampling error is about the same size as sampling error. I want to argue that this fits an anthropic or equilibrium storyline. It goes like this: if you conduct a survey with a huge sampling error, then there will be a clear benefit from increasing your sample size and bringing that sampling error down. From the other direction, it would not make sense to run a state poll with a sample size in the tens of thousands: that would bring down the sampling error but it would not help with nonsampling error.

With independent error components, sd of total error = sqrt((sd of sampling error)^2 + (sd of nonsampling error)^2), and the way the math works is that reducing the smaller of these terms gives diminishing returns.
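
To see the diminishing returns numerically, here's a minimal sketch with illustrative numbers of my own choosing (a fixed nonsampling sd of 3 percentage points and the usual sqrt(0.25/n) sampling sd for a proportion near 0.5; these are not figures from the paper):

```python
import math

# Sketch of how total survey error behaves as sample size grows (illustrative numbers,
# not from the paper). Sampling sd for a proportion near 0.5, in percentage points,
# is 100 * sqrt(0.25 / n); assume a fixed nonsampling sd of 3 percentage points.
NONSAMPLING_SD = 3.0

def total_error(n, nonsampling_sd=NONSAMPLING_SD):
    sampling_sd = 100 * math.sqrt(0.25 / n)
    return math.sqrt(sampling_sd**2 + nonsampling_sd**2)

for n in (400, 800, 1500, 5000, 50000):
    sampling_sd = 100 * math.sqrt(0.25 / n)
    print(f"n = {n:6d}: sampling sd = {sampling_sd:4.2f}, total sd = {total_error(n):4.2f}")

# Going from n = 800 to n = 50000 cuts the sampling sd from about 1.8 to 0.2 points,
# but the total sd only falls from about 3.5 to 3.0: the nonsampling term dominates.
```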

Again, this reasoning is only approximate. For one thing, if the sd of total survey error is about twice the sd of sampling error, then the root-mean-square arithmetic implies that sampling error is less than nonsampling error. I guess that kinda makes sense, given that polls are used for information other than the headline number; also, polls are analyzed for trends, not just levels. The idea is that you wouldn’t expect nonsampling error to be much less than sampling error or much more than sampling error.
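
For what it's worth, here's the quick back-of-the-envelope version of that implication, plugging in the 3.5-point RMSE and the factor of 2 from the abstract (my arithmetic, treating the two error components as independent):

```python
import math

# Back out the implied nonsampling error from the numbers in the abstract (my arithmetic).
total_sd = 3.5               # total RMSE in percentage points
sampling_sd = total_sd / 2   # "about twice as large as that implied by most reported margins of error"

nonsampling_sd = math.sqrt(total_sd**2 - sampling_sd**2)
print(f"sampling sd    = {sampling_sd:.2f}")     # about 1.75
print(f"nonsampling sd = {nonsampling_sd:.2f}")  # about 3.0, i.e. roughly sqrt(3) times the sampling sd
```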

The point of this anthropic reasoning is not to give an exact answer but rather to give some intuition about where we are. It’s related to the general principle that you’d expect variance and squared bias to be comparable to each other, as discussed in Section 4.3 of Regression and Other Stories.