Looking at your watch a third time waiting in the station for the bus

Statistical Modeling, Causal Inference, and Social Science 2025-03-03

A couple years ago we reported on some iffy statistics used by the Maryland Department of Transportation in support of its push to add lanes to the Beltway:

Don’t go back to Rockville: Possible scientific fraud in the traffic model for a highway project?

I know it might sound strange but I believe you’ll be coming back before too long

Ben Ross, the political activist who alerted us to this issue, points to a recent article he wrote with Joe Cortright. Ross writes:

We show the prevalence of this kind of scientific fraud in traffic modeling nationally. It includes the Maryland toll lanes – for which we now know, thanks to a public information law request that I had to battle over, that the state agency published not two, but three, sets of numbers all attributed to the same model run.

I haven’t looked into these claims myself so I won’t get into the specifics about who did what.

More interesting to me are the processes by which these numbers are created and criticized. It’s the sort of mix of politics, communication, and quantitative analysis which has been so problematic in science.

From the article by Ross and Cortright:

Highway construction is a very big business. Nationally, the United States spends nearly $150 billion per year on road and highway construction, an amount that has increased by almost 50 percent in the past five years. The highway-building bureaucracy has created a powerful and well-organized political machine that mobilizes construction companies, engineering firms, truckers, and local business boosters. Politicians are always keen to take credit at ribbon-cuttings. Highway departments routinely shortchange maintenance to cobble together funding for massive empire-building highway and bridge projects.

In pursuit of these goals, highway agencies depend on traffic models. These models are bewilderingly complex, their results are offered with false certainty . . . The models thus serve as powerful technocratic weapons in securing funding . . . Highway builders take advantage of this complexity, presenting models to the public as black boxes that only experts understand. . . .

Sounds familiar from our experience with the NPR/PNAS/Ted/Freakonomics/Harvard/Nudge complex! A bunch of comfortable people patting each other on the back. For example, see here and here for a couple of media-friendly academic podcasters promoting the unsubstantiated claims of another celebrity professor.

So, sure, if it happens in relatively low-stakes areas of academic science, it makes sense that things could be even worse where big money is involved.

Ross and Cortright continue:

Until recently, lack of transparency shielded the inner workings of the modeling process from public view. But two recent investigations, one by each author of this article, managed to get behind the curtain. Both revealed blatant falsification of model results. When forecasters were disappointed by the computer outputs, the forecasters simply changed them by hand, passing off the doctored numbers as genuine results of the model.

P-hacking!

The Armstrong Principle raises its ugly helmet

Ross and Cortright speculate on what went wrong:

If even malignant economic interests such as cigarette and asbestos manufacturers rarely resorted to flat-out falsification of results, why is it so common in traffic modeling? Part of the answer lies in the environmental legislation that requires highway agencies to come up with traffic forecasts. It’s not enough for them to suppress bad results; they must manufacture good ones.

This reminds me of the Armstrong principle: if you require people to promise more than they can deliver, you are motivating them to cheat.

For readers of this blog, a familiar example of the Armstrong principle is the implicit requirement for statistically-significant results in pilot studies. The purpose of a pilot study is to demonstrate feasibility, not to estimate the effect size! It’s the nature of a pilot study to be noisy—you can’t trust the estimates, they will have big standard errors, and the only way to reliably get statistically significant results is to cheat. So people do—they cheat because they think they’re supposed to, that it’s just a silly paperwork requirement, like answering No to the question, Have you ever taken illegal drugs?, in a job interview.
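To put a rough number on that claim, here’s a quick simulation (my own toy example, not from the post): a two-arm pilot with 15 participants per arm and a modest true effect of 0.3 standard deviations. The sample sizes and effect size are assumptions chosen to be typical of a pilot study; the significance test is a crude z-test, which is close enough for illustration.

```python
import math
import random
import statistics

def pilot_significant(n=15, true_effect=0.3, crit=1.96, rng=random):
    """Simulate one small two-arm pilot study; return True if it
    reaches 'statistical significance' at roughly the 5% level."""
    treat = [rng.gauss(true_effect, 1) for _ in range(n)]
    ctrl = [rng.gauss(0, 1) for _ in range(n)]
    diff = statistics.mean(treat) - statistics.mean(ctrl)
    se = math.sqrt(statistics.variance(treat) / n
                   + statistics.variance(ctrl) / n)
    return abs(diff / se) > crit  # crude z-test, fine for illustration

random.seed(1)
sims = 5000
power = sum(pilot_significant() for _ in range(sims)) / sims
print(f"Share of pilots reaching 'significance': {power:.2f}")
```

Under these assumptions only a small fraction of honest pilot studies clear the significance bar, even though the true effect is real. If significance is nonetheless demanded, the Armstrong principle kicks in.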

Ross and Cortright continue:

Models tend to depart from reality even when used with the best intentions. When they fail even to reproduce current traffic conditions, as often happens, modelers introduce fudge factors to create a match, which in turn makes them less sensitive to future changes. Algorithms pushed far outside their realm of applicability spew out nonsense. Modelers replace the nonsense with their own best guesses and call what they’ve done post-processing. From there it’s a short step to altering results to please the boss. . . .

By contrast, the last thing the highway agencies want to consider is the one proven way to reduce traffic congestion: charging tolls on existing highways. . . . charging a toll high enough to pay for a new bridge will often reduce traffic so much that there’s no reason to build the bridge at all—a fact that explains highway agencies’ widespread resistance to tolling for congestion relief.