Perfectly stable climates and other statistical myths

Statistical Modeling, Causal Inference, and Social Science 2025-02-20

This is Jessica. As part of teaching prep this week, I was re-reading this paper by Gneiting et al. on the usefulness of maximizing sharpness subject to calibration (rather than prioritizing calibration alone). It got me thinking about how certain statistical myths can induce a kind of dismay, even as you acknowledge their practical value.

The paper applies methods for checking calibration to various hypothetical forecaster types, using weather as an example domain. One of the types of forecasters they consider is the climatological forecaster, who predicts based on an understanding that the climate is stable. Assume that for each of t rounds (which might be points in time, space, or different subjects), nature draws some random number mu_t from the standard normal distribution N(0,1), then defines the outcome-generating distribution as a normal distribution centered on mu_t with variance 1. Meanwhile the forecaster provides a predictive CDF as their forecast. The climatological forecaster always predicts the unconditional distribution of the outcome, a normal centered on 0 with variance 2 (the outcome is mu_t plus independent standard normal noise, so the variances add). For this example, caring about marginal calibration, which asserts that in the limit the average probabilities assigned by the forecasts agree with the empirical distribution of the outcomes nature generates, is like proposing the existence of a stable climate.
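To make the setup concrete, here's a minimal simulation sketch in Python (my own, not code from the paper) comparing the climatological forecaster to the ideal forecaster who gets to see mu_t. Both pass the calibration checks, so the only thing separating them is sharpness:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
T = 100_000

# Nature: draw mu_t ~ N(0, 1), then the outcome y_t ~ N(mu_t, 1).
mu = rng.normal(0.0, 1.0, size=T)
y = rng.normal(mu, 1.0)

# The ideal forecaster sees mu_t and issues N(mu_t, 1); the
# climatological forecaster always issues the marginal N(0, 2)
# (variance 2 because Var(mu_t) + Var(noise) = 1 + 1).
pit_ideal = stats.norm.cdf(y, loc=mu, scale=1.0)
pit_clim = stats.norm.cdf(y, loc=0.0, scale=np.sqrt(2.0))

# Probabilistic calibration: both PIT samples come out roughly uniform
# (mean ~0.5, variance ~1/12), so both forecasters look calibrated.
print("ideal PIT mean/var:", pit_ideal.mean(), pit_ideal.var())
print("clim. PIT mean/var:", pit_clim.mean(), pit_clim.var())

# Marginal calibration: the constant forecast CDF N(0, 2) matches the
# empirical distribution of the outcomes at any threshold q.
for q in (-2.0, 0.0, 2.0):
    print(f"q={q}: empirical {(y <= q).mean():.3f}, "
          f"forecast {stats.norm.cdf(q, scale=np.sqrt(2.0)):.3f}")

# Sharpness is where they differ: central 90% interval widths.
z = stats.norm.ppf(0.95)
print("ideal 90% width:", 2 * z * 1.0)           # ~3.29
print("clim. 90% width:", 2 * z * np.sqrt(2.0))  # ~4.65
```

The climatological forecaster passes these checks essentially by construction, but its intervals come out roughly 40% wider than the ideal forecaster's, which is exactly the kind of gap the sharpness criterion is meant to expose.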

This analogy asserts nothing new about calibration. But the stable climate interpretation helps emphasize how certain statistical properties or procedures can seem to deny basic truths about the world. When I first read it, I thought, wait, is there really anywhere on earth that truly has a stable climate? Framed this way, marginal calibration lands in the same class as other strange myths that we sometimes act like we truly believe, like the null hypothesis that some intervention could feasibly have zero effect. Which is more unrealistic, believing in processes in the world that never change, or believing that two different situations could be perfectly equal? Both are sort of unsettling if you imagine really taking them seriously.

This reminds me of how, as much as I respect the kind of basic research that sets out to explore what can be proven once we assume certain properties, it’s not always easy to like certain results. I’m tired of blogging about calibration (and probably some of you are tired of my posts on it!) but this is part of why I get preoccupied with such topics. It’s like there’s something in it that I’m morally or psychologically predisposed to resist.

That kneejerk reactions to certain concepts can go beyond the purely intellectual or practical realm is interesting to me partly because it’s in tension with the pragmatic attitude that I tend to gravitate to (which Andrew has espoused many times on this blog). I’m wary of getting too caught up in philosophical arguments about statistics, or rejecting certain useful methods or conceptualizations because they cannot be perfectly realized. I would not refuse to ever use a hypothesis test, or declare certain methods useless (polling, for example), though sometimes people do. Obviously we don’t have to fully embrace certain myths to get some value out of acting as if they are true. This is part of the beauty of statistics. 

But at the same time, it seems important to acknowledge that there often is some role of personal predilections about what’s valid, or what is attractive as an object of study, even if we try to pretend there’s not. Why, when pressed, do I want to argue that Bayesian inference is more honest? Why do seemingly simple concepts like exchangeability sometimes appear engimatic? Maybe it’s the irrational predilections that draw some of this toward statistics in the first place. Maybe we should just get used to the idea that statistics is also the territory of the heart.