Visual cues affect how data are perceived
Junk Charts 2023-01-24
Here's a recent NYT graphic showing California's water situation at different time scales (link to article).
It's a small multiples display, showing the spatial distribution of the precipitation amounts in California. The two panels show, respectively, the short-term view (past month) and the longer-term view (3 years). Precipitation is measured in relative terms, so what is plotted is the relative ratio of precipitation in the reference period, with 100 being the 30-year average.
Green is much wetter than average while brown is much drier than average.
The key to making this chart work is a common color scheme across the two panels.
Also, the placement of major cities provides anchor points for our eyes to move back and forth between the two panels.
***
The NYT graphic is technically well executed. I'm a bit unhappy with the headline: "Recent rains haven't erased California's long-term drought".
At the surface, the conclusion seems sensible. Look, there is a lot of green, even deep green, on the left panel, which means the state got lots more rain than usual in the past month. Now, on the right panel, we find patches of brown, and very little green.
But pay attention to the scale. The light brown color, which covers the largest area, has value 70 to 90, thus, these regions have gotten 10-30% less precipitation than average in the past three years relative to the 30-year average.
Here's the question: what does it mean by "erasing California's long-term drought"? Does the 3-year average have to equal or exceed the 30-year average? Why should that be the case?
If we took all 3-year windows within those 30 years, we're definitely not going to find that each such 3-year average falls at or above the 30-year average. To illustrate this, I pulled annual rainfall data for San Francisco. Here is a histogram of 3-year averages for the 30-year period 1991-2020.
For example, the first value is the average rainfall for years 1989, 1990 and 1991, the next value is the average of 1990, 1991, and 1992, and so on. Each value is a relative value relative to the overall average in the 30-year window. There are two more values beyond 2020 that is not shown in the histogram. These are 57%, and 61%, so against the 30-year average, those two 3-year averages were drier than usual.
The above shows the underlying variability of the 3-year averages inside the reference time window. We have to first define "normal", and that might be a value between 70% and 130%.
In the same way, we can establish the "normal" range for the entire state of California. If it's also 70% to 130%, then the last 3 years as shown in the map above should be considered normal.