Oldie but goodie

Junk Charts 2014-02-03

Back in 2007, the New York Times graphics team produced a fabulous chart explaining the rise in prices at the pump (link).

Let's start with the tab labeled "Regional Price" which contains a well-executed map of the average gas prices by county:

Nyt_2007gas

The color scale is wonderful. It's just one color and yet the gradations are easily discerned. The general spatial pattern jumps out at you, with prices higher on the Pacific coast, and lower from New England all the way down south. The Lakes region also has higher prices, as do New Mexico, Colorado and Hawaii.

***

The legend is just superb. Take a closer look:

Nyt_2007gas_legend

What sets this legend apart is the varying lengths of the segments. In particular, the darkest blue corresponds to a wide range of prices (3.45-3.94). One can also easily find the lowest and highest prices in the nation; the designers pinpointed the exact counties in which those prices were recorded, which is another nice touch.

To determine the breakpoints on the legend, one can use a statistical methodology: a standardized scale anchored on both sides of the national average price (from the other chart, the average price was $3.22). Each color then maps to a band one standard deviation of prices wide, in either direction. This puts counties into standardized groups: for example, all counties whose prices were within one standard deviation above the average are given one tint, while those that were one to two standard deviations above the average are given a darker blue, and so on. In effect, we would have created a contour map.
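The standardized scale described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Times built its legend: the function names `sd_breakpoints` and `sd_bin` are my own, the county prices are made up, and I assume the population standard deviation of county prices is the right measure of spread. The only number taken from the chart is the $3.22 national average.

```python
import bisect
import statistics

def sd_breakpoints(prices, center=None, max_sd=2):
    """Breakpoints at center + k * sd for k = -max_sd .. max_sd.

    If no center is given, the plain mean of the prices is used.
    """
    if center is None:
        center = statistics.mean(prices)
    sd = statistics.pstdev(prices)  # assumption: population SD of county prices
    return [center + k * sd for k in range(-max_sd, max_sd + 1)]

def sd_bin(price, breakpoints):
    """Index of the color band a price falls into (0 = lowest band)."""
    return bisect.bisect_right(breakpoints, price)

# Hypothetical county prices, for illustration only
prices = [2.95, 3.05, 3.10, 3.20, 3.22, 3.30, 3.40, 3.55, 3.80]

# Anchor the scale on the national average quoted in the post
breaks = sd_breakpoints(prices, center=3.22)
bands = [sd_bin(p, breaks) for p in prices]

for p, b in zip(prices, bands):
    print(f"${p:.2f} -> band {b}")
```

With `max_sd=2` there are five breakpoints and six bands: bands 2 and 3 sit within one standard deviation below and above the average, bands 1 and 4 cover one to two standard deviations, and bands 0 and 5 catch everything further out. Every interior segment of such a legend has the same length, unlike the segments of the Times legend.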

***

I see the designers' intention in clearly labeling the areas where they do not have data, with the diagonal stripes on white. My own preference is to put those areas in a mild gray, in effect blending them into the surroundings. In this way, the missing data do not distract the average reader, while the fastidious reader can still figure out where the data holes are.

This is a key lesson for most research scientists. We tend to train our eyes on the outliers and the data holes because they are like imperfections in diamonds. As a result, we end up highlighting the least important message up front. It's a bad habit.

***

In the following, I put the county and state level views side by side. The NYT graphic allows users to switch between the two views via a tab.

Nyt_gasprices_spread

Much like the recent post on the age of buildings in Brooklyn, the state aggregates tell a simpler story but still capture almost all of the spatial pattern. The average prices per state are now printed directly on the chart. The question the designer should ask is what readers want to learn from such a chart, and which view better serves that need. It's possible the Times is catering to two types of readers. Perhaps one can strike a middle ground, which is to break out certain states like Texas into contiguous "regions".