The statistical controversy over “White Rural Rage: the Threat to American Democracy” (and a comment about post-publication review)
Statistical Modeling, Causal Inference, and Social Science 2024-08-29
Here’s an interesting example showing how technical choices in a regression model can make a big difference in the result. And I’m not talking about “statistical significance,” I’m talking about substantive interpretation.
David “should have his own weekly column in the NYT” Weakliem has the story:
The Atlantic recently published a critical review of the new book by Tom Schaller and Paul Waldman, White Rural Rage: the Threat to American Democracy. The review, by Tyler Austin Harper, concluded by saying that they were not just wrong, but had it backwards—the threat is from the cities and suburbs. . . .
The report that Harper links to says: “the more rural a county, the lower its rate of sending insurrectionists, a finding which is significant with a p-value <.01%.” A just-published paper by Robert A. Pape, Kyle D. Larson, and Keven G. Ruby in PS: Political Science and Politics gives a more detailed analysis. The results are from a negative binomial regression in which the dependent variable is the number of people from a county who were charged with crimes related to the January 6 attack on the Capitol. The number is estimated to be 2.88 times as large in urban than in rural counties, controlling [actually, adjusting — ed.] for other factors.
So far, so good. But there’s a problem. Weakliem explains:
A negative binomial regression predicts the logarithm of the dependent variable and their control is population (in 100,000s). The estimated coefficient for population is .148, meaning that the natural log of the predicted number of insurrectionists goes up by .148 for every 100,000 increase in county population.
Whaaaa? That’s nuts! You definitely want to put county population on the log scale in such a model.
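To see why, here's a quick back-of-the-envelope sketch in Python. It takes the .148 coefficient quoted above at face value and asks what ratio of expected counts it implies between two counties, compared with what you'd get if log population entered with a coefficient of 1 (expected count proportional to population). The function names and the example populations are made up for illustration, and the coefficient of 1 is an assumption, not an estimate from either analysis.

```python
import math

# Coefficient on population (in 100,000s) quoted in the passage above.
COEF_PER_100K = 0.148


def ratio_linear_population(pop_a, pop_b, coef=COEF_PER_100K):
    """Ratio of expected counts (county b over county a) when raw population,
    in units of 100,000, enters the log-link linear predictor."""
    return math.exp(coef * (pop_b - pop_a) / 100_000)


def ratio_log_population(pop_a, pop_b, elasticity=1.0):
    """Same ratio when log(population) is the predictor.
    elasticity=1.0 is an illustrative assumption: expected count
    proportional to population."""
    return (pop_b / pop_a) ** elasticity


if __name__ == "__main__":
    # Hypothetical county sizes, chosen only to show the contrast.
    small, large = 100_000, 10_000_000
    print("linear population:", ratio_linear_population(small, large))
    print("log population:   ", ratio_log_population(small, large))
```

With these hypothetical sizes, the linear-population term implies a ratio in the millions between the large and small county, all else equal, while the log-population version with a coefficient of 1 gives a ratio of 100, which is just the ratio of the populations. In a fitted model the other predictors would move things around, but that's the shape of the problem.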
Weakliem follows up summarizing how his analysis differs from that of Pape et al.:
1. Control variables: my [Weakliem’s] main change was to use the logarithm of population rather than population as a predictor variable . . . I also created a variable for people living within driving distance, which I defined as 700 kilometers (which includes Boston, Cincinnati, and Detroit) and an interaction between distance and that variable. My idea was that (a) if you were in driving distance you could make the trip without spending much money and (b) with driving, the cost in time and money is strongly related to the distance . . .
2. Points in common: the number of insurrectionists increased with the percent of the county that was non-Hispanic white; decline in manufacturing employment didn’t make any clear difference; number of insurrectionists was higher in urban areas (although the estimated effect was much smaller in my analysis).
3. Points of divergence: a decline in the white population led to more insurrectionists in their analysis but had no effect in mine; the percent who voted for Trump led to fewer insurrectionists in their analysis but more in mine. . . . I ran a model including both Romney support in 2012 and the difference, and found that they both had similar positive estimates. I think this is important—it suggests that the insurrectionists were drawn both from new Trump followers and traditional Republicans. . . .
Overall, they conclude that participation in the insurrection was largely a response to perceived ethnic threat, and that the sources of “violent populism” are very different from those of “electoral populism.” My conclusion is that the sources are similar: after you control for population and distance, the places where Trump got votes were also the places where he got supporters on January 6.
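For readers who want to see what the driving-distance terms in Weakliem's item 1 might look like as predictors, here is a minimal sketch. The 700-kilometer cutoff is from his description; everything else is an assumption made for illustration: the function and key names are hypothetical, the cutoff is treated as inclusive, and distance is entered untransformed. His posts would be the place to check the actual coding.

```python
def distance_terms(distance_km, cutoff_km=700.0):
    """Build illustrative distance predictors for one county.

    distance_km: distance from the county to Washington, D.C.
    cutoff_km:   "driving distance" threshold (700 km per the quoted post).
    Treating the cutoff as inclusive and leaving distance on its raw
    scale are assumptions, not details taken from the source.
    """
    within = 1.0 if distance_km <= cutoff_km else 0.0
    return {
        "distance_km": distance_km,
        "within_driving": within,                   # indicator variable
        "distance_x_within": distance_km * within,  # interaction term
    }


if __name__ == "__main__":
    # Two hypothetical counties, one inside and one outside the cutoff.
    print(distance_terms(500.0))
    print(distance_terms(1500.0))
```

The interaction lets the slope on distance differ for counties within driving range, which matches the idea that cost in time and money scales with distance mainly when people are driving.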
Too bad that the flawed paper is published in an official scholarly journal and Weakliem’s reanalysis and discussion are on a blog, where you’d expect they’d get less attention and respect.
I guess Weakliem could write up his posts as a short article and submit it to a journal, but that’s a bit of work, and, based on some of my experiences, I’m guessing it would end up as a Kafkaesque mess, with the most likely outcome being that his letter would be rejected by the journal outright, the next most likely outcome being an exhausting series of revisions, and the best possible outcome being publication along with a defensive and obfuscating response by the authors of the article being criticized.
This is one reason I like the idea of independent post-publication review, to avoid all that.