Movements in the prediction markets, and going beyond a black-box view of markets and prediction models

Statistical Modeling, Causal Inference, and Social Science 2024-09-13

My Columbia econ colleague Rajiv Sethi writes:

The first (and possibly last) debate between the two major party nominees for president of the United States is in the books. . . . movements in prediction markets give us a glimpse of what might be on the horizon.

The figure below shows prices for the Harris contract on PredictIt and Polymarket over a twenty-four hour period that encompasses the debate, adjusted to allow for their interpretation as probabilities and to facilitate comparison with statistical models.

The two markets responded in very similar fashion to the debate—they moved in the same direction to roughly the same degree. One hour into the debate, the likelihood of a Harris victory had risen from 50 to 54 on PredictIt and from 47 to 50 on Polymarket. Prices fluctuated around these higher levels thereafter.

Statistical models such as those published by FiveThirtyEight, Silver Bulletin, and the Economist cannot respond to such events instantaneously—it will take several days for the effect of the debate (if any) to make itself felt in horse-race polls, and the models will respond when the polls do.

This relates to something we’ve discussed before, which is how a forecast such as ours at the Economist magazine can make use of available information that’s not in the fundamentals-based model and also hasn’t yet made its way into the polls. Such information includes debate performance, political endorsements, and other recent news items as well as potential ticking time bombs such as unpopular positions that are held by a candidate but of which the public is not yet fully aware.

Pointing to the above graph that shows the different prices in the different markets, Sethi continues:

While the markets responded to the debate in similar fashion, the disagreement between them regarding the election outcome has not narrowed. This raises the question of how such disagreement can be sustained in the face of financial incentives. Couldn't traders bet against Trump on Polymarket and against Harris on PredictIt, locking in a certain gain of about four percent over two months, or more than twenty-six percent at an annualized rate? And wouldn't the pursuit of such arbitrage opportunities bring prices across markets into alignment?

There are several obstacles to executing such a strategy. PredictIt is restricted to verified residents of the US who fund accounts with cash, while trading on Polymarket is crypto-based and the exchange does not accept cash deposits from US residents. This leads to market segmentation and limits cross-market arbitrage. In addition, PredictIt has a limit of $850 on position size in any given contract, as well as a punishing fee structure.
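To make the arithmetic in Sethi's arbitrage question concrete, here's a quick back-of-the-envelope sketch. The 0.54 and 0.50 prices are round numbers from the discussion above, and the calculation ignores the fees, deposit restrictions, and position limits that Sethi describes:

```python
# Round numbers from the discussion above: Harris priced at ~0.54 on
# PredictIt and ~0.50 on Polymarket (interpreted as probabilities).
cost = (1 - 0.54) + 0.50          # buy Trump on PredictIt, Harris on Polymarket
payout = 1.00                     # exactly one of the two legs pays off
gain = payout / cost - 1          # ~4.2% over roughly two months
annualized = (1 + gain) ** 6 - 1  # ~28% if repeated six times a year

print(f"raw gain: {gain:.1%}, annualized: {annualized:.0%}")
```

That's consistent with Sethi's "about four percent over two months, or more than twenty-six percent at an annualized rate," before the frictions he lists eat into it.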

This is all super interesting. So much of the discussion I've seen of prediction markets is flavored by pro- or anti-market ideology, and it's refreshing to see these thoughts from Sethi, an economist who studies prediction markets and sees both good and bad in them, without blindly promoting or opposing them.

Sethi also discusses public forecasts that use the fundamentals and the polls:

While arbitrage places limits on the extent to which markets can disagree, there is no such constraint on statistical models. Here the disagreement is substantially greater—the probability of a Trump victory ranges from 45 percent on FiveThirtyEight to 49 percent on the Economist and 62 percent on Silver Bulletin.

Why the striking difference across models that use basically the same ingredients? One reason is a questionable “convention bounce adjustment” in the Silver Bulletin model, without which its disagreement with FiveThirtyEight would be negligible.

But there also seem to be some deep differences in the underlying correlation structure in these models that I find extremely puzzling. For example, according to the Silver Bulletin model, Trump is more likely to win New Hampshire (30 percent) than Harris is to win Arizona (23 percent). The other two models rank these two states very differently, with a Harris victory in Arizona being significantly more likely than a Trump victory in New Hampshire. Convention bounce adjustments aside, the correlation structure across states in the Silver Bulletin model just doesn't seem plausible to me.

I have a few thoughts here:

1. A rule of thumb that I calculated a few years ago in my post, Is it meaningful to talk about a probability of “65.7%” that Obama will win the election?, is that a 10 percentage point change in win probability corresponds roughly to a swing of four-tenths of a percentage point in expected vote share. So the 5 percentage point swings in those markets correspond to something like a two-tenths of a percentage point swing in opinion, which can crudely be read as an implicit model in which the ultimate effect of the debate is somewhere between zero and half a percentage point.
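Here's a stylized sketch of where such a rule of thumb can come from. If the forecast's national vote share is roughly normal with a standard deviation of about 1.6 percentage points (an assumed value chosen to reproduce the rule of thumb, not a number taken from the original post), then near a tied race the slope of the win probability with respect to expected vote share gives the conversion:

```python
from scipy.stats import norm

# Assumed sd of the forecast's national two-party vote share: ~1.6 points.
# This is an assumption chosen to reproduce the rule of thumb.
sigma = 0.016

def win_prob(mu):
    """P(vote share > 50%) under a simple normal approximation,
    ignoring the electoral college."""
    return norm.cdf((mu - 0.50) / sigma)

# A 0.4-point swing in expected vote share moves the win probability ~10 points:
print(win_prob(0.504) - win_prob(0.500))  # ~0.099

# So a 5-point move in win probability implies roughly a 0.2-point vote swing:
slope = norm.pdf(0) / sigma               # derivative at a tied race, ~25
print(0.05 / slope)                       # ~0.002, i.e., two-tenths of a point
```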

2. The rule of thumb gives us a way to roughly calibrate the differences between forecasts. A difference between a Trump win probability of 50% in one forecast and 62% in another corresponds to a difference of about half a percentage point in predicted national vote share. It doesn't seem unreasonable for different forecasts to differ by half a percentage point in the vote, given all the judgment calls involved: which polls to include, how to adjust for different polling organizations, how to combine state and national polls, and how to set up the prior or fundamentals-based model.
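Inverting the same stylized normal approximation (again with the assumed 1.6-point standard deviation) maps the 50%-versus-62% disagreement back to roughly half a percentage point of vote share:

```python
from scipy.stats import norm

sigma = 0.016  # same assumed standard deviation as in the sketch above

# Expected vote shares implied by 50% and 62% Trump win probabilities:
mu_50 = 0.50 + sigma * norm.ppf(0.50)
mu_62 = 0.50 + sigma * norm.ppf(0.62)
print(mu_62 - mu_50)  # ~0.005, i.e., about half a percentage point
```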

3. Regarding correlations: I think that Nate Silver's approach has both the strengths and weaknesses of a highly empirical, non-model-based approach. I've never seen a document that describes what he's done (fair enough, we don't have such a document for the Economist model either!); my impression based on what I've read is that he starts with poll aggregation, then applies some sort of weighting, then has an uncertainty model based on uncertainty in state forecasts and uncertain demographic swings. I think that some of the counterintuitive behavior in the tails comes from the demographically-driven uncertainties and also because, at least when he was working under the FiveThirtyEight banner, he wanted wide uncertainties in the national electoral college forecast, and with the method he was using, the most direct way to get them was to give huge uncertainties to the individual states.

The result was weird stuff like the prediction that, if Trump were to win New Jersey, his probability of winning Alaska would go down. This makes no sense: if Trump had won New Jersey, that would've represented a total collapse of the Democratic ticket, and it's hard to see how that would've played out as a better chance for Biden in Alaska.

The point here is not that Nate made a judgment call about New Jersey and Alaska; rather, it's that a 50-state prediction model is a complicated thing. You build your model and fit it to available data, then you have to check its predictions every which way, and when you come across results that don't make sense, you need to do some mix of calibrating your intuitions (maybe it is reasonable to suppose that Trump winning New Jersey would be paired with Biden winning Alaska?) and figuring out what went wrong with the model (I suspect some high-variance additive error terms that were not causing problems with the headline national forecast but had undesirable properties in the tail). You can figure some of this out by following up and looking at other aspects of the forecast, as I did in the linked post.
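To illustrate how the split between shared and state-level error terms drives these tail correlations, here's a toy two-state simulation. The vote-share means and error scales are made up for illustration and aren't taken from any of the forecasts discussed; producing an actual negative correlation, as in the New Jersey/Alaska example, would require additional structure such as demographic terms entering the two states with opposite signs.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

# Made-up Republican two-party vote-share means: NJ safely blue, AK safely red.
mu = {"NJ": 0.42, "AK": 0.55}

def p_ak_given_nj(sd_national, sd_state):
    """P(Trump wins AK | Trump wins NJ) under national + state error terms."""
    national = rng.normal(0, sd_national, n)  # swing shared by all states
    nj = mu["NJ"] + national + rng.normal(0, sd_state, n)
    ak = mu["AK"] + national + rng.normal(0, sd_state, n)
    return (ak > 0.5)[nj > 0.5].mean()

# Mostly shared error: an NJ win implies a landslide, so AK is near-certain.
print(p_ak_given_nj(sd_national=0.03, sd_state=0.01))
# Mostly independent state error: an NJ win barely moves the AK forecast.
print(p_ak_given_nj(sd_national=0.01, sd_state=0.06))
```

When the shared national term dominates, a Trump win in New Jersey implies a landslide and Alaska becomes near-certain; when independent state-level noise dominates, the New Jersey win carries almost no information about Alaska. Inflating the state-level terms to widen the national forecast is exactly the kind of move that can quietly break the conditional predictions in the tails.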

So, yeah, I wouldn’t take the correlations of Nate’s forecast that seriously. That said, I wouldn’t take the correlations of our Economist forecast too seriously either! We tried our best, but, again, many moving parts and lots of ways to go wrong. One thing I like about Rajiv’s post is that he’s willing to do the same critical work on the market-based forecasts, not just treating them as a black box.