Excess deaths, reporting delays, statistical adjustments, misleading headline

Numbers Rule Your World 2020-08-17

The New York Times published a very nice presentation on "excess deaths" in the United States from March through July. (link)

The following chart shows how public health policies utterly failed in many Northeastern states in April and May.

[Chart: NYT excess deaths, Northeastern states]

In Connecticut, for example, in the middle of April, the number of deaths exceeded expected deaths by 2.6 times. In Massachusetts, around the same time, actual deaths exceeded expectation by 2.3 times.

A lot of work went into generating these charts - starting with meticulous data collection from each state, and as much as possible, reconciling how metrics are defined differently across states. Then, there is a bit of statistical modeling.

Modeling is required because "excess deaths" is - like most statistical quantities - not directly observable. Excess deaths is the difference between actual deaths (from all causes) in the presence of the Covid-19 pandemic and expected deaths (also from all causes) in its absence. The former is directly measurable but the latter is not.

To establish how many people could have died during a specific week in Connecticut if the novel coronavirus did not exist, statisticians look at deaths during the same week in that state in past years. Over a number of years, one should find some statistical consistency in the death rate. This illustrates the power of averaging.
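The averaging idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the weekly counts are made up (they are not Connecticut's data), the function names are hypothetical, and a plain average stands in for whatever model the NYT or the CDC actually used, which may also adjust for trend and seasonality.

```python
from statistics import mean

def expected_deaths(historical_counts):
    """Baseline for one calendar week: the average of that week in past years."""
    return mean(historical_counts)

def excess_deaths(actual, historical_counts):
    """Actual deaths (all causes) minus the historical baseline."""
    return actual - expected_deaths(historical_counts)

# Hypothetical deaths in the same calendar week over five prior years
past_years = [580, 610, 595, 605, 610]
# Hypothetical deaths in that week of 2020
actual_2020 = 1560

baseline = expected_deaths(past_years)
excess = excess_deaths(actual_2020, past_years)
ratio = actual_2020 / baseline

print(f"baseline={baseline:.0f}, excess={excess:.0f}, ratio={ratio:.1f}x")
```

With these invented numbers, the baseline is 600 and the actual count is 2.6 times the baseline. The point is that the baseline is an estimate built from history, not something anyone observed in 2020.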

If 2020 were a normal year, the chart should look something like this:

[Chart: NYT excess deaths, Hawaii]

The above line is for Hawaii, which, because of its remote location, has not been affected much thus far by Covid-19. What you see is the actual weekly deaths hovering around the baseline (corresponding to the historical average), sometimes rising above, and sometimes dipping below.

The line for Connecticut, though, looks nothing like that for Hawaii. The most important feature is that the entire line sits above the baseline, meaning that there have been excess deaths for every week since March.

***

Ready for even more modeling?

For most states, the time line reaches the last week of July, which may surprise you. The first part of excess deaths - the observable part - counts all causes, obviating the need to verify coronavirus test results. The measure relies on death certificates. But it may take weeks to months to receive data on death certificates, so how does the NYT have estimates of excess deaths up to the week prior?

Does the NYT have a crystal ball? In a sense, yes. It uses a statistical model to "top up" the incomplete deaths data in the most recent weeks. It turns out that actual deaths are not always completely observable. There is a reporting delay, which means that the data for more recent weeks are incomplete. The data age like fine wine - the older the data, the more complete.

The NYT used a strategy similar to the one behind the model for projecting expected deaths - assuming historical patterns hold. The CDC keeps track of the amount of reporting delay. It might know that typically, 20 percent of the data showed up with one week's delay, 50 percent with two weeks' delay, and so on.
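A minimal sketch of such a top-up, assuming the percentages are cumulative shares of the eventual total received by a given delay. The 20 and 50 percent figures come from the example above; the 75 and 90 percent values, the reported counts, and the function name `top_up` are all hypothetical. The actual adjustment used by the NYT or the CDC may well be more elaborate.

```python
def top_up(reported, completeness):
    """Scale an incomplete count up by the share of reports typically received."""
    if not 0 < completeness <= 1:
        raise ValueError("completeness must be in (0, 1]")
    return reported / completeness

# Assumed cumulative completeness, keyed by weeks elapsed since the week of death
completeness_by_lag = {1: 0.20, 2: 0.50, 3: 0.75, 4: 0.90}

# Hypothetical deaths reported so far, keyed by the same weeks elapsed
reported_by_lag = {1: 130, 2: 320, 3: 480, 4: 585}

estimates = {
    lag: top_up(reported_by_lag[lag], completeness_by_lag[lag])
    for lag in reported_by_lag
}

for lag, estimate in estimates.items():
    print(f"{lag} week(s) old: reported {reported_by_lag[lag]}, topped up to about {estimate:.0f}")
```

Notice how heavily the most recent week leans on the model: with only 20 percent of reports in, four out of every five deaths in the estimate are projected rather than counted.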

There is a crucial difference between these two statistical models, though. The first model projects expected deaths in the absence of the novel coronavirus. The second model tops up the actual deaths reported at the time of analysis, essentially projecting the number of deaths that have already occurred but have not yet been reported. These actual deaths are in the presence of the coronavirus. This second model therefore carries a further assumption - that reporting delays have not been affected by the pandemic.

The NYT team disclosed this assumption clearly in the text. This is what they mean by the following sentences:

Even with this adjustment, it's possible there could be an underestimate of the complete death toll if increased mortality is causing states to lag more than they have in the past or if states have changed their reporting systems.

It's quite likely that the volume of deaths, the budget pressure, and political interference have changed the pattern of the reporting delay.

The reporter then argued that the adjusted estimates are still better than the unadjusted ones. One can't refute that argument. Of course, this discussion assumes that the chart must show a time line leading to the present. The designer can choose to show only those weeks with virtually complete data.

Or, we can bring in a third model. This one looks at the early weeks of the pandemic, for which we believe almost all of the death-certificate data have been recorded. We can then estimate the error of the second model that serves to top up the actual deaths.
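One way to sketch this check is a simple backtest: take early pandemic weeks, apply the top-up to what had been reported at the time, and compare against the counts that eventually came in. Everything here is hypothetical, including the counts, the 50 percent completeness assumption, and the function name; it is one possible form of such a third model, not a description of anyone's actual method.

```python
def top_up_errors(early_counts, final_counts, completeness):
    """Relative error of topped-up estimates versus near-complete counts."""
    errors = []
    for early, final in zip(early_counts, final_counts):
        estimate = early / completeness
        errors.append((estimate - final) / final)
    return errors

# Hypothetical counts as first seen two weeks after each week of death
seen_at_two_weeks = [300, 400, 450]
# Hypothetical near-complete counts for the same weeks, seen months later
final_counts = [750, 1000, 1000]

errors = top_up_errors(seen_at_two_weeks, final_counts, completeness=0.50)
average_error = sum(errors) / len(errors)

print([f"{e:+.0%}" for e in errors])
print(f"average error: {average_error:+.1%}")
```

A consistently negative error, as in this invented example, would say that the top-up has been undershooting, which is what we would expect if reporting delays got longer during the pandemic.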

***

NYT's excess-death analysis has found over 200,000 more deaths (from all causes) in the U.S. between March and July than expected based on historical patterns. Given the U.S. counted about 140,000 deaths due to Covid-19 during this period, there have been about 60,000 unexplained deaths so far.

I find the NYT headline, "The true coronavirus toll in the US has already surpassed 200,000," alarmist. The analysis does not support this conclusion - unless one makes the further assumption that the only factor causing excess deaths during those months is Covid-19. That is plausible but not fact-based. A more accurate headline would be "Deaths due to Covid-19 may have been undercounted by up to 40 percent." That's also shocking and sobering.
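The arithmetic behind these figures is short enough to spell out. The inputs are the rounded numbers quoted above; the variable names are mine, and reading "up to 40 percent" as unexplained deaths relative to the reported Covid-19 count is my interpretation.

```python
excess_all_causes = 200_000   # excess deaths, all causes, March through July
reported_covid = 140_000      # deaths officially attributed to Covid-19

unexplained = excess_all_causes - reported_covid

# Unexplained deaths relative to the reported Covid-19 count
undercount_vs_reported = unexplained / reported_covid
# Unexplained deaths as a share of all excess deaths
share_of_excess = unexplained / excess_all_causes

print(f"unexplained deaths: {unexplained:,}")
print(f"relative to reported Covid-19 deaths: {undercount_vs_reported:.0%}")
print(f"as a share of excess deaths: {share_of_excess:.0%}")
```

The first ratio comes to roughly 43 percent and the second to 30 percent, so the "up to" matters: the upper bound is reached only if every unexplained death is attributed to Covid-19.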