Two election forecasting models, negative correlations, and model assumptions
Numbers Rule Your World 2020-10-29
This post by Andrew Gelman addresses an issue that has always bothered me when I look at stochastic simulations.
It's very trendy to run simulations. It's conceptually simple. Imagine forecasting the future - there are some uncertain factors, leading to numerous possible scenarios, you flip a whole bunch of coins to "simulate" the future, and then analyze the distribution of simulated outcomes.
Beneath this simplicity lies a tricky issue. You put probability distributions on the variables in your model. Typically, you specify a mean and a standard deviation (which controls how much the variable fluctuates) - although some more advanced distributions require more parameters. You do this for each variable. Then, you press run.
When you press run, you've just made a very strong assumption: that every variable is independent of every other variable. If you have 10 variables in your model, there are 10*9/2 = 45 pairs of variables, on all of which you've imposed independence. That is unrealistic in almost every real-world application!
The simulation setup becomes a lot more complex if you give up independence. In the 10-variable model, you'd have to specify all 45 pairwise correlations. You typically don't have enough data to estimate all of these numbers reliably. So you end up straddling the fence - you impose some structure on the pairwise correlations, e.g. declaring a subset of variables independent of each other, which reduces the number of required estimates.
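The contrast can be sketched in a few lines of code. This is a minimal illustration, not anyone's actual forecasting model: the means, standard deviations, and the single shared correlation are all made-up numbers, chosen only to show the difference between pressing run with and without a correlation structure.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, n_vars = 10_000, 10

# Independent draws: pressing "run" with no correlation structure.
# Means and standard deviations here are hypothetical.
means = np.zeros(n_vars)
sds = np.ones(n_vars)
indep = rng.normal(means, sds, size=(n_sims, n_vars))

# Correlated draws: rather than estimate all 45 pairwise correlations,
# impose a simple structure - one common correlation of 0.5.
rho = 0.5
cov = np.full((n_vars, n_vars), rho) + (1 - rho) * np.eye(n_vars)
corr = rng.multivariate_normal(means, cov, size=n_sims)

# Any pair of columns is roughly uncorrelated in the first case
# and correlated at roughly rho in the second.
print(np.corrcoef(indep[:, 0], indep[:, 1])[0, 1])
print(np.corrcoef(corr[:, 0], corr[:, 1])[0, 1])
```

The compound-symmetry covariance (every pair shares one correlation) is exactly the kind of fence-straddling structure described above: one parameter stands in for 45.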
***
This situation is well illustrated by Andrew's post mentioned above. It is the latest in a series of related posts (here, here and here) that address differences between the election forecasting models built by FiveThirtyEight (Nate Silver) and the Economist (Gelman and associates).
You might find it worthwhile to read my 2016 post about election forecasting (link) before proceeding. I agree with Andrew's final point that given the low frequency of U.S. presidential elections, it's impossible to say who has the best model... scientifically. In the court of public opinion, the FiveThirtyEight model has built a robust reputation. (Okay, so it's only his final point, ignoring all the epilogues and postscripts!)
In that post, Andrew spends time looking at the correlations of Trump vote shares between states, as projected by FiveThirtyEight, pointing out several unexpected outcomes. These cases are unexpected because the Economist model outputs very different correlations. He then documents his team's thought process in deriving the between-state correlations while speculating on what the FiveThirtyEight crew are doing. (The two sides had not spoken directly at the time of the post.)
***
The discussion paints a very true picture of data science modeling. Everyone makes assumptions, lots of them. Each side has justifications for its assumptions - first for themselves, then for others if requested. Observers have to decide whose rationale they find more believable. One way to judge a simulation model is to look at the projected outcomes and see if they make sense. That's the first motivation of Andrew's post.
Before proceeding, you may want to look at my tour of the Oxford epi model, which gives some background on modeling in general, particularly the idea that these models use data but also impose structural assumptions, as well as assumptions on parameters. Start here.
I'm skipping right to the key point of Andrew's post, concerning the correlations between Trump vote shares in Mississippi, a reliably red (Republican) state and in Washington, a reliably blue (Democratic) state. This scatter plot brings out a number of features of the FiveThirtyEight model that are absent from the Economist model:
Of greatest interest is the negative posture of the cloud of dots on the right side, each dot representing one of 40,000 simulated Election Days. (In technical terms, a negative correlation between Trump's vote shares in Mississippi and Washington.) If you draw a line through the cloud, it runs from 11 o'clock to 5 o'clock. The FiveThirtyEight model infers that the better Trump performs in blue Washington, the worse he does in red Mississippi - and if Trump outperforms in Mississippi, he underperforms in Washington.
For example, a central outcome in both models has Trump winning 60% of Mississippi and 40% of Washington. According to the FiveThirtyEight model, in the scenario where Trump wins 55% of Washington - a real upset - Trump is projected to lose Mississippi. Andrew finds that incredible. I also have difficulty understanding how this can be the case. By contrast, in the Economist model (left side), the scatter plot has a positive posture, meaning the state vote shares are positively correlated: the better Trump does in one state, the better he performs in the other.
Some responses to Andrew argue that the FiveThirtyEight scenario, while remote, is reasonable because it represents a radical party realignment. The scatter plot shows many such outliers, though: in most scenarios in which Trump beats Biden in Washington, he loses Mississippi to Biden.
Negative between-state correlations are part of the FiveThirtyEight model but not the Economist model. Adam Pearce helpfully put together this dataviz that allows you to look at every pairwise correlation. The chart above came from his tool.
This gets at the heart of the tricky simulation problem. What's a reasonable assumption for an extremely remote possibility, like Trump winning Washington or Biden winning Mississippi? Even very smart people will disagree!
***
We can explore the discrepancy further.
Let's break down the projected vote share of a given state (say Washington) in a stylized way: (TVS = Trump vote share)
The most naive and not-very-accurate model estimates TVS for every state to be equal to the estimated national Trump vote share.
TVS (Washington) = a + b*TVS (national) + unexplained (hereafter, U)
To this, we can add state-specific effects. So, the national TVS is modified up or down based on historical data. This model can differentiate between red, blue and tossup states.
TVS (Washington) = a + b*TVS (national) + c*State_Effect(Washington) + U
To this, we can add two-state interaction effects:
TVS (Washington) = a + b*TVS (national) + c*State_Effect(Washington) + d(X)*State_Effect(X) + e(WX)*Paired_Effect(Washington, State_X) + U
As mentioned at the start, to manage the complexity of this model, we'd rather not treat every state's effect as correlated with Washington's. We pick a subset of states (X) that have expected correlations and include only those terms. Also, think of the paired effects as modifications of the individual state effects, which is why I add the series of state effects for the subset of X states as well.
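The decomposition above can be sketched in a quick simulation. Everything here is hypothetical - the coefficients, the spreads, and the simplified functional form (a = 0, b = 1) are made up for illustration, not taken from either model - but it shows how a shared national term by itself induces a positive between-state correlation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sims = 40_000

# Hypothetical components of TVS, following the stylized decomposition:
# a shared national share plus independent state effects and noise.
national = rng.normal(0.47, 0.03, n_sims)    # national Trump vote share
state_wa = rng.normal(-0.09, 0.02, n_sims)   # blue-state effect (Washington)
state_ms = rng.normal(0.11, 0.02, n_sims)    # red-state effect (Mississippi)

tvs_wa = national + state_wa + rng.normal(0, 0.01, n_sims)
tvs_ms = national + state_ms + rng.normal(0, 0.01, n_sims)

# Because both vote shares load on the same national term, the
# simulated between-state correlation comes out positive.
print(np.corrcoef(tvs_wa, tvs_ms)[0, 1])
```

The state effects are independent here, so all of the co-movement flows through the national term - a positive-correlation structure of the sort the Economist chart displays.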
Andrew's argument is that if Trump does surprisingly well in Washington, he must have done well nationally. This suggests that the national effect in the Economist model is quite important.
The other argument requires a negative correlation between Washington and Mississippi, so the action is in the paired effect and state effects.
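To see how a paired effect can flip the sign, here is the same hypothetical sketch with an opposite-signed swing term added. Again, all numbers are invented for illustration; the point is only that when the paired effect's variance dominates the national term's, the between-state correlation turns negative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_sims = 40_000

# Hypothetical: a small-variance national term plus a paired effect
# that enters the two states with opposite signs (a realignment-style
# swing that helps Trump in one state while hurting him in the other).
national = rng.normal(0.47, 0.01, n_sims)
paired = rng.normal(0, 0.04, n_sims)

tvs_wa = national - 0.09 + paired + rng.normal(0, 0.01, n_sims)
tvs_ms = national + 0.11 - paired + rng.normal(0, 0.01, n_sims)

# The opposite-signed swing dominates the shared national term,
# so the simulated between-state correlation is negative.
print(np.corrcoef(tvs_wa, tvs_ms)[0, 1])
```

Comparing the two sketches makes the modeling dispute concrete: the sign of the Mississippi-Washington cloud depends entirely on how much variance the modeler assigns to shared versus opposing components.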
After I clean up some code, I'll put up another post that pursues this graphically.