Infovis, infographics, and data visualization: My thoughts 12 years later

Statistical Modeling, Causal Inference, and Social Science 2024-04-18

I came across this post from 2011, “Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go,” and it seemed to make sense to reassess where we are now, 12 years later.

From 2011:

I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines.

In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other students were not making a lot of graphs. I discovered and absorbed the principles of Cleveland’s The Elements of Graphing Data.

In grad school and beyond, I continued to use graphs in my research. But I noticed a disconnect in how statisticians thought about graphics. There seemed to be three perspectives:

1. The proponents of exploratory data analysis liked to graph raw data and never think about models. I used their tools but was uncomfortable with the gap between the graphs and the models, between exploration and analysis.

2. From the other direction, mainstream statisticians–Bayesian and otherwise–did a lot of math and fit a lot of models (or, as my ascetic Berkeley colleagues would say, applied a lot of procedures to data) but rarely made a graph. They never seemed to care much about the fit of their models to data.

3. Finally, textbooks and software manuals featured various conventional graphs such as stem-and-leaf plots, residual plots, scatterplot matrices, and q-q plots, all of which seemed appealing in the abstract but never did much for me in the particular applications I was working on.

In my article with Meng and Stern, and in Bayesian Data Analysis, and then in my articles from 2003 and 2004, I have attempted to bring these statistical perspectives together by framing exploratory graphics as model checking: a statistical graph can reveal the unexpected, and “the unexpected” is defined relative to “the expected”–that is, a model. This fits into my larger philosophy that puts model checking at the center of the statistical enterprise.

Meanwhile, my graphs have been slowly improving. I realized a while ago that I didn’t need tables of numbers at all. And here and there I’ve learned of other ideas, for example Howard Wainer’s practice of giving every graph a title.

I continued with some scattered thoughts about graphics and communication:

A statistical graph does not stand alone. It needs some words to go along with it to explain it. . . . I realized that our plots, graphically strong though they were, did not stand on their own. . . . This experience has led me to want to put more effort into explaining every graph, not merely what the points and lines are indicating (although that is important and can be hard to figure out in many published graphs) but also what is the message the graph is sending.

Most graphs are nonlinear and don’t have a natural ordering. A graph is not a linear story or a movie you watch from beginning to end; rather, it’s a cluttered house which you can enter from any room. The perspective you pick up if you start from the upstairs bathroom is much different than what you get by going through the living room–or, in graphical terms, you can look at clusters of points and lines, you can look at outliers, you can make lots of different comparisons. That’s fine but if a graph is part of a scientific or journalistic argument it can help to guide the reader a bit–just as is done automatically in the structuring of words in an article. . . .

While all this was happening, I also was learning more about decision analysis. In particular, Dave Krantz convinced me that the central unit of decision analysis is not the utility function or even the decision tree but rather the goal.

Applying this idea to the present discussion: what is the goal of a graph? There can be several, and there’s no reason to suppose that the graph that is best for achieving one of these goals will be optimal, or even good, for another. . . .

I’m a statistician who loves graphs and uses them all the time, I’m continually working on improving my graphical presentation of data and of inferences, but I’m probably stuck (without realizing it) in a bit of a rut of dotplots and lineplots. I’m aware of an infographics community . . .

Here’s an example of where I’m coming from: a blog post entitled, “Is the internet causing half the rapes in Norway? I wanna see the scatterplot.” To me, visualization is not an adornment or a way of promoting social science. Visualization is a central tool in social science research. (I’m not saying visualization is strictly necessary–I’m sure you can do a lot of good work with no visual sense at all–but I think it’s a powerful approach, and I worry about people who believe social science claims that they can’t visualize. I worry about researchers who believe their own claims without understanding them well enough to visualize the relation of these claims to the data from which they are derived.)

The rest of my post from 2011 discusses my struggles in communicating with the information visualization community–these are people who produce graphs for communication with general audiences, which motivates different goals and tools than those used by statisticians to communicate as part of the research process. Antony Unwin and I wrote a paper about these differences, which was ultimately published with discussion in 2013 (and here is our rejoinder to the discussions).

Looking at all this a decade later, I’m not so interested in non-statistical information visualization anymore. I don’t mean this in a disparaging way! I think infovis is great. Sometimes the very aspects of an infographic that make it difficult to read and deficient from a purely statistical perspective are a benefit for communication in that they can push the reader into thinking in new ways; here’s an example we discussed from a few years ago.

I continue to favor what we call the click-through solution: Start with the infographic, click to get more focused statistical graphics, click again to get the data and sources. But, in any case, the whole stat graphics vs. infographics thing has gone away, I guess because it’s clear that they can coexist; I don’t really see them as competing.

Whassup now?

Perhaps surprisingly, my graphical practices have remained essentially unchanged since 2011. I say “perhaps surprisingly,” because other aspects of my statistical workflow have changed a lot during this period. My lack of graphical progress is probably a bad thing!

A big reason for my stasis in this regard, I think, is that I’ve worked on relatively few large applied projects during the past fifteen years.

From 2004 through 2008, my collaborators and I were working every day on Red State Blue State. We produced hundreds of graphs and the equivalent of something like 10 or 20 research articles. In addition to our statistical goals of understanding our data and how they related to public opinion and voting, we knew from the start that we wanted to communicate both to political scientists and to the general public, so we were on the lookout for new ways to display our data and inferences. Indeed, we had the idea for the superplot before we ever made the actual graph.

Since 2008, I’ve done lots of small applied analyses for books and various research projects, but no big project requiring a rethinking of how to make graphs. The closest thing would be Stan, and here we have made some new displays–at least, new to me–but that work was done by collaborators such as Jonah Gabry, who did ShinyStan, and this hasn’t directly affected the sorts of graphs that I make.

I continue to think about graphs in new ways (for example, causal quartets and the ladder of abstraction), but, as can be seen in those new papers, the looks of my graphs haven’t really changed since 2011.