99.4% art, 0.6% data

Numbers Rule Your World 2021-12-29

The media could not conceive how the CDC could revise its estimate of the Omicron variant's proportion so drastically, from a heart-stopping 73% to a blood-curdling 59%, in a matter of two weeks (for example, Bloomberg; scratch that, since you can't even read one article on Bloomberg. Here's MSN, so 90s.)

The media is surprised, stunned, shocked, dismayed because it didn't do its homework when it excitedly reported the 73% number.

I know because I hopped on the CDC page that contained this number. There, you immediately learn that 73% is a "Nowcast", which is described as "a model that estimates more recent proportions of circulating variants and enables timely public health action". In plain English, it is a forecast, not actual real-world data.

My first instinct when I see a model (because I build models for a living) is to click the very helpful button that toggles between "Nowcast on" and "Nowcast off". You can't understand any model without first looking at the actual real-world data sitting beneath it.

I was indeed surprised, stunned, shocked, dismayed. Because this was what I found (these screenshots were taken before the latest revision):

[Screenshot: CDC variant proportions chart, Nowcast off]

The orange section is the Delta variant. The tiny sliver of purple at the bottom of the very last column is the Omicron variant. In the table, you see that the actual proportion of Omicron in the week ending Dec 4, 2021 was 0.7%.

The next screenshot was taken when Nowcast was turned on:

[Screenshot: CDC variant proportions chart, Nowcast on]

The last column showed 73% Omicron, which was all over the news when this came up. Notice that the date axis changed. There are two additional weeks shown: ending Dec 11 and Dec 18. The 73% apparently concerned the week ending Dec 18.

It appears that "Nowcast" is not really a forecast but a missing data imputation procedure because this information was released right after Dec 18. This CNet news article was dated Dec 20. Presumably, the flow of data did not support real-time reporting, and so they had to resort to a model.

***

What is this Nowcast model that can aggressively turn 0.7% into 73%? Unfortunately, your guess is as good as mine. The link behind the word Nowcast on the CDC page leads to the chart itself. There is nothing on the chart that explains what kind of model Nowcast is. I found nothing on the page that explains how they turned 0.7% into 73%.

But we can measure how horribly this Nowcast model has performed. The media got this wrong too. It's not 73% versus 59%. Look at the current view of the chart with Nowcast on:

[Screenshot: CDC variant proportions chart, Nowcast on, after the revision]

The 59% estimate is for the week ending Dec 25 while the 73% estimate is for the week ending Dec 18. The correct comparison is 73% versus 23% (the purple section of the second column from the right). Measured against the 0.7% observed for the week ending Dec 4, they "projected" a 100-fold increase but now they say it was roughly a 30-fold increase. No wonder they didn't want to tell us what this model is!
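For readers who want to check the arithmetic, here is a minimal sketch using only the figures quoted in this post. It assumes the 0.7% observed for the week ending Dec 4 is still the right baseline, which may not hold if the CDC also revised that week. The function and variable names are mine, not the CDC's.

```python
def fold_change(baseline_pct: float, later_pct: float) -> float:
    """Return how many times larger later_pct is than baseline_pct."""
    if baseline_pct <= 0:
        raise ValueError("baseline_pct must be positive")
    return later_pct / baseline_pct


# Figures quoted in this post (percent of sequenced cases that are Omicron).
baseline = 0.7           # observed, week ending Dec 4 (assumed unrevised)
nowcast_original = 73.0  # original Nowcast for the week ending Dec 18
nowcast_revised = 23.0   # revised estimate for the same week

original_fold = fold_change(baseline, nowcast_original)
revised_fold = fold_change(baseline, nowcast_revised)
overstatement = nowcast_original / nowcast_revised

print(f"Original Nowcast implied a {original_fold:.0f}-fold increase")
print(f"Revised estimate implies a {revised_fold:.0f}-fold increase")
print(f"The original estimate was {overstatement:.1f} times the revised one")
```

The first two numbers compare each estimate with the Dec 4 baseline; the last compares the two estimates for the same week, which is the like-for-like comparison the media missed.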

***

To take a Bayesian perspective, the model estimate is a kind of weighted average between past data and "prior" knowledge. In this case, the prior knowledge is "art" reflecting someone's subjective belief. We don't know much about the model but we know that this prior belief exceeds a 100-fold increase because it cancelled out the past data (0.7% of cases) and more.
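To make the weighted-average intuition concrete, here is a toy sketch. It is emphatically not the CDC's model, which is undocumented. It assumes the estimate is a simple linear blend, estimate = w × data + (1 − w) × prior, on the percentage scale, and backs out the prior that would be needed to turn 0.7% into 73% for a few hypothetical data weights w.

```python
def implied_prior(estimate: float, data: float, data_weight: float) -> float:
    """Solve estimate = w * data + (1 - w) * prior for the prior.

    Toy linear blend on the percentage scale; an illustrative assumption,
    not a description of the actual Nowcast model.
    """
    if not 0 <= data_weight < 1:
        raise ValueError("data_weight must be in [0, 1)")
    return (estimate - data_weight * data) / (1 - data_weight)


observed = 0.7   # percent Omicron, week ending Dec 4 (Nowcast off)
nowcast = 73.0   # percent Omicron, original Nowcast for week ending Dec 18

# Hypothetical weights on the observed data, chosen only for illustration.
weights = [0.0, 0.1, 0.25]
priors = [implied_prior(nowcast, observed, w) for w in weights]

for w, p in zip(weights, priors):
    print(f"data weight {w:.2f} -> implied prior {p:.1f}% Omicron")
```

Under this toy blend, the implied prior is never below 73%, and the more weight the data gets, the more extreme the prior has to be to compensate. That is the sense in which the prior "cancelled out the past data and more".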

Science in the pandemic age is just like this. Scientists running away from other scientists who are capable of evaluating the science.