“Trivia question for you. I kept temperature records for 100 days one year in Boston, starting August 15th (day “0”). What would you guess is the correlation between day# and temp? r=???”
Statistical Modeling, Causal Inference, and Social Science 2024-11-01
Shane Frederick writes:
Trivia question for you. I kept temperature records for 100 days one year in Boston, starting August 15th (day “0”).
What would you guess is the correlation between day# and temp?
r=???
Shane sends me this kind of thing from time to time, for example:
Boris and Natasha in America: How often is the wife taller than the husband?
Also this from 2016 (nearly ten years ago!): When are people gonna realize their studies are dead on arrival?
When he sends me these little probability puzzles, I can figure them out—no surprise, as they’re devised to work for the general population and probability is my special area of expertise! The flip side is that, when a new one comes, I feel some pressure not to mess it up.
So here was my reply to Shane:
uh, that’s a tough one . . . I don’t have such a great intuition about correlation. ok, let me try to guess . . . I guess you’re talking about the average temperature on each day . . . On 15 Aug the avg temperature might be 80 degrees. 100 days later, that’s approx 25 Nov, the average temp might be 45 degrees (jeez, this is cringeworthy, I’m afraid that without looking it up I will get it embarrassingly wrong). A range of 35 degrees, so that would be an explained sd of 35/sqrt(12) (using the fact that the uniform distribution has a sd of 1/sqrt(12)), so approx 10. As for the unexplained sd . . . suppose there’s a day where the expected avg temp is 60. This time of year it could easily be 50 or 70, so maybe 10 for the unexplained sd? Then the explained and unexplained variance are equal so R-squared is 0.5, so r=0.7? I kinda feel like I must have done something stupid here . . .
It turns out that I got it right (except for forgetting to specify that the correlation is negative, not positive, as temperature is steadily declining from August through November). That’s a relief.
According to Shane, his MIT colleagues guessed correlations between 0.15 and 0.2. The day-to-day variability of Boston weather was just too salient to them. He writes:
The narrative of the book Noise is that people underappreciate noise. I don’t disagree. But there are some contexts where they clearly “overappreciate” it.
I’m guessing that people would give more accurate estimates if you asked them to specify R-squared rather than correlation. R-squared of 50% sounds like a safe guess, no? Conversely, a correlation of 0.15 or 0.2 corresponds to an R-squared in the 2%-4% range, which sounds kinda low.
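The conversion between correlation and R-squared is just squaring, which is why the same guess looks so different on the two scales. A minimal sketch (the helper names are illustrative): correlations of 0.15 and 0.2 map to roughly 2% and 4%, while an R-squared of 50% maps back to |r| of about 0.71.

```python
import math


def r_to_r_squared(r):
    # Fraction of variance in temperature "explained" by day number.
    return r * r


def r_squared_to_r(r_squared):
    # Magnitude only; the sign (negative here, as temperatures fall) is separate.
    return math.sqrt(r_squared)


for r in (0.15, 0.2, 0.7):
    print(f"r = {r:.2f}  ->  R-squared = {r_to_r_squared(r):.1%}")
print(f"R-squared = 50%  ->  |r| = {r_squared_to_r(0.5):.2f}")
```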
Regarding Shane’s point about the narrative, this reminds me of the slow-to-update thing that Josh Miller and I have been thinking about. The usual lesson from those base-rate-fallacy problems is that people overweight local data and underweight base rates. But there are examples (such as Leicester City) where people seem to move too slowly away from their baseline positions, not reacting fast enough to evidence.
As is often the case, there is no safe haven.