Problems with a CDC report: Challenges of comparing estimates from different surveys. Also a problem with rounding error.
Statistical Modeling, Causal Inference, and Social Science 2023-03-16
A few months ago we reported on an article from the Columbia Journalism Review that made a mistake by comparing numbers from two different sources.
The CJR article said, “Before the 2016 election, most Americans trusted the traditional media and the trend was positive, according to the Edelman Trust Barometer. . . . Today, the US media has the lowest credibility—26 percent—among forty-six nations, according to a 2022 study by the Reuters Institute for the Study of Journalism.” That sentence makes it look like there was a drop of at least 25 percentage points (from “most Americans” to “26 percent”) in trust in the media over a six-year period. Actually, though, as noticed by sociologist David Weakliem, the “most Americans” number from 2016 came from one survey and the “26%” from 2022 came from a different survey asking an entirely different question. When comparing comparable surveys, the drop in trust was about 5 percentage points.
This comes up a lot: when you compare data from different sources and you’re not careful, you can get really wrong answers. Indeed, this can even arise if you compare data from what seem to be the same source—consider these widely differing World Bank estimates of Russia’s GDP per capita.
It happened to the CDC
Another example came up recently, this time from the Centers for Disease Control and Prevention. The story is well told in this news article by Glenn Kessler. It started out with a news release from the CDC stating, “More than 1 in 10 [teenage girls] (14%) had ever been forced to have sex — up 27% since 2019 and the first increase since the CDC began monitoring this measure.” But, Kessler continues:
A CDC spokesman acknowledged that the rate of growth highlighted in the news release — 27 percent — was the result of rounding . . . The CDC’s public presentation reported that in 2019, 11 percent of teenage girls said that sometime in their life, they had been forced into sex. By 2021, the number had grown to 14 percent. . . . the more precise figures were 11.4 percent in 2019 and 13.5 percent in 2021. That represents an 18.4 percent increase — lower than the initial figure, 27 percent.
Rounding can be tricky. It seems reasonable to round 11.4% to 11% and 13.5% to 14%—indeed, that’s how I would report the numbers myself, as in a survey you’d never realistically have the precision to estimate a percentage to an accuracy of less than a percentage point. Even if the sample is huge (which it isn’t in this case), the underlying variability of the personal-recall measurement is such that reporting fractional percentage points would be inappropriate precision.
But, yeah, if you’re gonna compare the two numbers, you should compute the ratio based on the unrounded numbers, then round at the end.
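To make the order-of-operations point concrete, here's a minimal Python sketch using only the figures reported in Kessler's article (11.4% in 2019, 13.5% in 2021), showing how rounding before comparing inflates the apparent increase:

```python
# Percent change computed from rounded vs. unrounded survey estimates.
# Figures are those reported in the Kessler article: 11.4% (2019), 13.5% (2021).

def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100 * (new - old) / old

p_2019, p_2021 = 11.4, 13.5

# Round first, then compare: roughly what the CDC news release did.
change_rounded = percent_change(round(p_2019), round(p_2021))  # (14 - 11) / 11

# Compare first, round only the final answer.
change_unrounded = percent_change(p_2019, p_2021)              # (13.5 - 11.4) / 11.4

print(f"rounded first:  {change_rounded:.1f}% increase")   # about 27%
print(f"rounded at end: {change_unrounded:.1f}% increase")  # about 18%
```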
This then logically brings us to the next step, which is that this “18.4% increase” can’t be taken so seriously either. It’s not that an 18.4% increase is correct and that a 27% increase is wrong: both are consistent with the data, along with lots of other possibilities.
The survey data as reported do show an increase (although there are questions about that too; see below), but the estimates from these surveys are just that—estimates. The proportion in 2019 could be a bit different than 11.4% and the proportion in 2021 could be a bit different than 13.5%. Even just considering sampling error alone, these data might be consistent with an increase of 5% from one year to the next, or 40%. (I didn’t do any formal calculations to get those numbers; this is just a rough sense of the range you might get, and I’m assuming the difference from one year to the other is “statistically significant,” so that the confidence interval for the change between the two surveys would exclude zero.)
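Here's a rough simulation sketch of that point. The sample sizes below are made up for illustration (I don't know the relevant YRBS sample sizes offhand), and this only captures simple random-sampling error, but it gives a sense of how much the estimated percent increase can bounce around from sample to sample:

```python
import numpy as np

rng = np.random.default_rng(0)

# Treat the reported point estimates as the "true" proportions for the simulation.
p_2019, p_2021 = 0.114, 0.135

# Hypothetical sample sizes, for illustration only; the real YRBS samples differ
# and have a complex design that adds further variability.
n_2019, n_2021 = 3000, 3000

sims = 10_000
phat_2019 = rng.binomial(n_2019, p_2019, sims) / n_2019
phat_2021 = rng.binomial(n_2021, p_2021, sims) / n_2021

pct_increase = 100 * (phat_2021 - phat_2019) / phat_2019

# Under these assumptions, the estimated increase varies a lot across replications.
print(np.percentile(pct_increase, [2.5, 50, 97.5]))
```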
There’s also nonsampling error, which gets back to the point that these are two different surveys: yes, they were conducted by the same organization, but there will still be differences in nonresponse. Kessler discusses this too, linking to a blog post by David Stein, who looked into the issue. Given that the surveys are only two years apart, it seems likely that any large apparent increase in the rate could be explained by sampling and data-collection issues rather than representing a large underlying change. But I have not looked into all this in detail.
Show the time series, please!
The above sort of difficulty happens all the time when looking at changes in surveys. In general I recommend plotting the time series of estimates rather than just picking two years and making big claims from that. From the CDC page, “YRBSS Overview”:
What is the Youth Risk Behavior Surveillance System (YRBSS)?
The YRBSS was developed in 1990 to monitor health behaviors that contribute markedly to the leading causes of death, disability, and social problems among youth and adults in the United States. These behaviors, often established during childhood and early adolescence, include
– Behaviors that contribute to unintentional injuries and violence.
– Sexual behaviors related to unintended pregnancy and sexually transmitted infections, including HIV infection.
– Alcohol and other drug use.
– Tobacco use.
– Unhealthy dietary behaviors.
– Inadequate physical activity.
In addition, the YRBSS monitors the prevalence of obesity and asthma and other health-related behaviors plus sexual identity and sex of sexual contacts.
From 1991 through 2019, the YRBSS has collected data from more than 4.9 million high school students in more than 2,100 separate surveys.
So, setting aside everything else discussed above, I’d recommend showing time series plots from 1991 to the present and discussing recent changes in that context, rather than presenting a ratio of two numbers, whether that be 18% or 27% or whatever.
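For what it's worth, here's the kind of display I have in mind, as a minimal sketch. The numbers below are placeholders, not the actual YRBS estimates; the point is the form of the plot: the full series with some indication of uncertainty, rather than a ratio of two endpoints.

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder estimates and standard errors, in percent. These are NOT the real
# YRBS numbers, just stand-ins to show the shape of the display.
years = np.arange(1991, 2023, 2)  # biennial surveys, 1991-2021
est = np.array([12.1, 11.8, 11.5, 12.0, 11.7, 11.3, 11.0, 11.2,
                10.8, 10.5, 10.6, 10.3, 10.2, 11.1, 11.4, 13.5])
se = np.full_like(est, 0.7)

fig, ax = plt.subplots(figsize=(7, 3.5))
ax.errorbar(years, est, yerr=2 * se, fmt="o-", capsize=3)  # +/- 2 standard errors
ax.set_xlabel("Survey year")
ax.set_ylabel("Percent reporting")
ax.set_title("Estimate over time with uncertainty (placeholder data)")
ax.set_ylim(0, 20)
plt.tight_layout()
plt.show()
```

With the whole series in view, readers can judge whether a recent two-year jump looks like a real change or just noise around a longer trend.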
Plotting the time series doesn’t remove any concerns about data quality; it’s just an appropriate general way to look at the data that gets us less tangled in statistical significance and noisy comparisons.