“How a simple math error sparked a panic about black plastic kitchen utensils”: Does it matter when an estimate is off by a factor of 10?
Statistical Modeling, Causal Inference, and Social Science 2024-12-13
tl;dr. Researcher finds out they were off by a factor of 10, responds that “this does not impact our results and our recommendations remain the same.”
Dean Eckles sent me an email, subject line, “order-of-magnitude errors that don’t affect the conclusions,” pointing to this news article from Joseph Brean:
Plastics rarely make news like this . . . the media uptake was enthusiastic on a paper published in October in the peer-reviewed journal Chemosphere.
“Your cool black kitchenware could be slowly poisoning you, study says. Here’s what to do,” said the LA Times. “Yes, throw out your black spatula,” said the San Francisco Chronicle. Salon was most blunt: “Your favorite spatula could kill you,” it said.
The study, by researchers at the advocacy group Toxic-Free Future, sought to determine whether black plastic household products sold in the U.S. contain brominated flame retardants, fire-resistant chemicals that are added to plastics for use in electronics, such as televisions, to prevent accidental fires. . . .
The study estimated that using contaminated kitchenware could cause a median intake of 34,700 nanograms per day of Decabromodiphenyl ether, known as BDE-209. That is far more than the bodily intake previously estimated from other modes, such as ingesting dust.
OK, so far, so good. But then:
The trouble is that, in the study’s section on “Health and Exposure Concerns,” the researchers said this number, 34,700, “would approach” the reference dose given by the United States Environmental Protection Agency. . . .
The paper correctly gives the reference dose for BDE-209 as 7,000 nanograms per kilogram of body weight per day, but calculates this into a limit for a 60-kilogram adult of 42,000 nanograms per day. So, as the paper claims, the estimated actual exposure from kitchen utensils of 34,700 nanograms per day is more than 80 per cent of the EPA limit of 42,000.
Did you catch that? Look carefully:
That sounds bad. But 60 times 7,000 is not 42,000. It is 420,000. This is what [McGill University’s] Joe Schwarcz noticed. The estimated exposure is not even a tenth of the reference dose. That does not sound as bad.
Indeed.
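The check here is a one-liner. Here's a quick sketch in Python, using only the numbers quoted in the article, of the sanity check that would have caught the error before publication:

```python
# Numbers as quoted in the article (units in the variable names).
reference_dose_ng_per_kg_day = 7_000   # EPA reference dose for BDE-209
body_weight_kg = 60                    # adult body weight used in the paper
estimated_intake_ng_per_day = 34_700   # study's estimated median daily intake

# The paper reported this limit as 42,000; the arithmetic says otherwise.
daily_limit_ng = reference_dose_ng_per_kg_day * body_weight_kg
print(daily_limit_ng)  # 420000

# Fraction of the reference dose: about 8%, not the claimed ~80%.
print(estimated_intake_ng_per_day / daily_limit_ng)
```

Carrying units in the variable names, as above, is a cheap way to notice when a kilogram factor has gone missing.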
We all make mistakes, and some of them make their way into journals. I can see how the reviewers of the article could have missed this particular error, with all those zeros floating around in the numbers. It's less excusable for the authors to have missed it; I guess part of the problem is that the incorrect number fit their story so well. When you come up with a result that doesn't accord with the story you want to tell, you're inclined to check it. If the result is a perfect fit, you might not even give it a second look.
How to avoid this in the future?
Schwarcz, who directs McGill University’s Office for Science and Society, offers some helpful insights for avoiding this sort of error:
Schwarcz does not generally like measurements of risk expressed in percentages. Absolute numbers tend to be more useful, as in this study. He gives the example of a lottery ticket. If you have one lottery ticket, your chances of winning are, say, one in a million. If you buy another, your chances of winning have increased by 100 per cent, which sounds like a lot until you realize they are still just two in a million.
“Risk analysis is a sketchy business in the first place, very difficult to do, especially if you don’t express units properly,” Schwarcz said. “You can make things sound worse.”
There was also no need to use nanograms as the unit of measurement in this study, Schwarcz said, which gave unit amounts in the tens of thousands. The more common micrograms would have given units in the tens.
“It’s a common thing in scientific literature, especially in ones that try to call attention to some kind of toxin,” Schwarcz said.
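Schwarcz's lottery-ticket point can be made concrete with a few lines of arithmetic. A sketch, using his one-in-a-million example:

```python
# Relative vs. absolute risk: buying a second lottery ticket.
p_one_ticket = 1 / 1_000_000
p_two_tickets = 2 / 1_000_000

# The relative increase sounds dramatic...
relative_increase = (p_two_tickets - p_one_ticket) / p_one_ticket
print(f"relative increase: {relative_increase:.0%}")  # 100%

# ...but the absolute chance is still just two in a million.
print(f"absolute chance: {p_two_tickets:.6f}")
```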
Scaling is really important, and often people seem to be going out of their way to use numbers that are hard to interpret. Or, they just use whatever default scaling comes to them, without reflecting on how they could do better. A few years ago we discussed some graphs of annual death rates that were given in units such as “1,000 deaths per 100,000.” It’s hard to get intuition on a number like that. It would’ve been so easy to just do everything per 100, and then that number would be a much more interpretable 1%. (About 1% of Americans die each year, which makes sense given demographics.)
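The rescaling itself is trivial, which is part of the point. A minimal sketch of converting the hard-to-read rate into the interpretable one:

```python
# "1,000 deaths per 100,000" rescaled to deaths per 100 (i.e., a percentage).
deaths_per_100k = 1_000
rate_per_100 = deaths_per_100k / 100_000 * 100
print(f"{rate_per_100:.0f}%")  # 1%
```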
Did the factor-of-10 error matter?
From the news article:
Lead author Megan Liu, science and policy manager at Toxic-Free Future, described the mistake as a “typo” and said her co-authors have submitted a correction to the journal. The error remains in the online version but Liu said she anticipates it will be updated soon.
“However, it is important to note that this does not impact our results,” Liu told National Post. “The levels of flame retardants that we found in black plastic household items are still of high concern, and our recommendations remain the same.”
Hmmm, maybe. The news article also states, “it appears the study’s hypothesis is correct, that black plastic recycled out of electronic devices, mostly in Asia, is getting back into the American supply chain for household kitchen items, including spatulas. So if you’re keen on eliminating these chemicals in any amount, chucking the black plastic kitchenware is a start, even if not as effective as the erroneous calculation suggests.”
It still seems wrong to say, “Our recommendations remain the same,” if their estimate of risk is off by a factor of 10.
Look at it another way. Suppose someone else had done a study and found that the level of exposure was “8% of the reference dose, thus, a potential concern,” but they’d done the calculation wrong, and the level was really 80% of the reference dose. Then I assume that the folks at Toxic-Free Future wouldn’t say that the recommendations remain the same, right? They’d say the exposure had been underestimated by a factor of 10 and that’s a big deal!
To put it another way, comparisons are symmetric. If you say that an exposure of 80% of the recommended dose is 10 times as bad as an exposure of 8% of the recommended dose, then the reverse should be true as well, that an exposure of 8% is 1/10 as bad as an exposure of 80%.
Does this change the recommendation? Yes, I’d say it does, in part. The individual recommendation—throw away your black plastic spatula—might not change, but the policy recommendation would change, because policy recommendations are not just directional, they also include a sense of urgency or priority, which depends on magnitude.
I’ll return to this issue in a future post. It’s an important issue that arises in many examples.