Thirty percent unvaccinated in healthcare: less than meets the eye

Numbers Rule Your World 2021-11-24

When I saw this "news" on my Twitter feed, I knew I had to write a blog post about the validity of research results. There is an ongoing plague in which science reporters broadcast top-line results without ever checking the underlying papers - which disclose how limited most of these results really are.

[Tweet screenshot: 30 percent of U.S. hospital-based healthcare workers still unvaccinated]

The most important words in the above excerpt are not that 30 percent of U.S. hospital-based healthcare workers are still unvaccinated - they are "for whom data on vaccination status was available".

The published paper is here. I'll summarize what you'll learn from it.

***

The report covers only 40% of U.S. hospital-based healthcare workers. The source is a database in which hospitals (sometimes through the state health departments) voluntarily enter aggregate data.

About half of the hospitals in the U.S. contribute data to this database. Another 10% of hospitals were dropped because of exclusions.

"Psychiatric, rehabilitation, and religious non-medical facilities were excluded." I really don't know why.

Also, hospitals for which the "total HCP" (healthcare personnel) or "HCP vaccination" fields are missing or zero are excluded. The confounding of missing and zero is a common database problem: does zero mean zero, or missing? By using this exclusion, the analysts assumed that zero means missing, not a true zero.

Yet another exclusion applies when unvaccinated personnel and fully vaccinated personnel summed to less than 75% of the total number of HCP. This means they excluded any hospital in which a significant number of HCPs have taken only one shot, or have taken two shots but not yet passed the 14-day mark. I really have no idea why this exclusion is needed.
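To see how these two exclusions interact, here is a minimal sketch in Python. The field names and numbers are my invention, not the actual database schema - the point is only to show that both the "zero means missing" rule and the 75% rule silently drop hospitals:

```python
# Hypothetical hospital-level records (field names and values invented).
hospitals = [
    {"name": "A", "total_hcp": 100, "unvacc": 30, "fully_vacc": 60},
    {"name": "B", "total_hcp": 0,   "unvacc": 0,  "fully_vacc": 0},   # true zero, or missing?
    {"name": "C", "total_hcp": 200, "unvacc": 50, "fully_vacc": 140},
    {"name": "D", "total_hcp": 150, "unvacc": 20, "fully_vacc": 80},  # many partially vaccinated
]

def include(h):
    # Exclusion 1: drop if total HCP is missing or zero --
    # this treats every zero as if it were missing data.
    if not h["total_hcp"]:
        return False
    # Exclusion 2: unvaccinated + fully vaccinated must cover
    # at least 75% of total HCP.
    return (h["unvacc"] + h["fully_vacc"]) / h["total_hcp"] >= 0.75

kept = [h["name"] for h in hospitals if include(h)]
print(kept)  # ['A', 'C'] -- hospitals B and D are silently dropped
```

Hospital D is excluded only because a third of its staff are partially vaccinated - exactly the case the headline's "unvaccinated" framing papers over.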

P.S. On second thought, I think this provision is associated with the desire to put "unvaccinated" into the headline instead of "not fully vaccinated". However, once such an exclusion is applied, the first part of the sentence should really say that 30% of hospital-based healthcare personnel, excluding those who are partially vaccinated, are unvaccinated. So, no, I don't like this exclusion, and the headline is inaccurate.

***

What is the implication of exclusions? They make the study findings less generalizable. They also confuse readers. When people read "30% of hospital-based healthcare personnel are unvaccinated", they think 30% of all such personnel. When they do that, they make the implicit assumption of no non-response bias, i.e. that the hospitals excluded from the study are just like those included.

Ex ante, we know that assumption to be false. If that assumption held, then there would be no need to exclude hospitals in the first place.

If excluded hospitals (including those that chose not to participate) are different from included ones, then the study finding can only apply to included hospitals, which represent less than half of total hospitals. In statistics, we say exclusions reduce the external "validity" of results.

***

Many statisticians, including myself, prefer to make educated guesses to close the gap. We'd like our overall finding to generalize to the whole country. We'd therefore need to predict what answer we would see from the excluded hospitals should we have their data. This prediction can use data from the included hospitals with appropriate adjustments.

A key to such a prediction is to find differences between the excluded and included hospitals. According to the paper, excluded hospitals are less likely to be critical access hospitals, more likely to be non-rural, and more likely to be larger in size. (The implication of this last factor is that the 40% of hospitals represent less than 40% of all hospital-based HCPs.) The authors stated that the first two factors are associated with higher vaccination rates, and so the excluded hospitals may in fact have higher rates than those included.
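One common way to make such a prediction is to re-weight the included hospitals' stratum-level rates by the stratum mix of the full population - a post-stratification sketch. Everything below (the strata, the rates, the HCP counts) is hypothetical; the 0.65/0.71 rates merely echo the rural/metro gap quoted later from the paper:

```python
# Rates observed among included hospitals, by stratum (hypothetical).
included_rate = {"rural": 0.65, "metro": 0.71}

# Hypothetical HCP counts: excluded hospitals skew metro and larger,
# consistent with the paper's description of the excluded group.
included_hcp = {"rural": 200_000, "metro": 300_000}
excluded_hcp = {"rural": 50_000, "metro": 450_000}

def weighted_rate(rates, counts):
    # Average the stratum rates, weighted by HCP headcount.
    total = sum(counts.values())
    return sum(rates[s] * counts[s] for s in counts) / total

included_est = weighted_rate(included_rate, included_hcp)
# Predict the excluded hospitals using the included strata rates --
# a strong assumption that rates differ only through the stratum mix.
excluded_pred = weighted_rate(included_rate, excluded_hcp)
overall = weighted_rate(
    included_rate,
    {s: included_hcp[s] + excluded_hcp[s] for s in included_hcp},
)
print(round(included_est, 3), round(excluded_pred, 3), round(overall, 3))
```

Under these made-up numbers, the excluded hospitals are predicted to have a higher vaccination rate than the included ones, pulling the national estimate up - the direction the authors' own description of the excluded group suggests.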

Here is where you find heated debates between statisticians. Some argue that any such prediction injects subjectivity into the analysis. I take the other side of this argument. Not injecting subjectivity is a mirage. When you don't make adjustments like these, your results assume that no adjustments are necessary - which is to assume that no bias exists - and allow readers to implicitly extend biased results to the entire population. The adjustments may fail, entirely or partially, but I don't think the adjusted results are any worse than the unadjusted ones.

***

The adverse impact of hidden biases in the dataset is easily missed.

Consider the following sentence from the paper:

Coverage was higher among HCP working in facilities located in metropolitan counties (71.0%; 95% CI: 70.9, 71.0), followed by HCP working in facilities located in rural counties (65.1%; 95% CI: 64.8, 65.3), and HCP working in nonmetropolitan urban counties (63.3%; 95% CI: 63.1, 63.5).

We know that rural hospitals have a higher response rate than urban hospitals. Responders are likely to be different from non-responders. So we don't know whether vaccination rate by degree of urbanicity would be in that precise order were we to have complete data.
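A toy calculation shows how differential response rates can even reverse the observed ordering. The responder rates below echo the metro/rural figures quoted from the paper; the response rates and non-responder rates are pure invention, since the latter are unknowable from the study's data:

```python
# Hypothetical: rural hospitals respond more often than metro ones.
response_rate = {"metro": 0.40, "rural": 0.60}
# Coverage among responders (echoing the quoted figures).
responder_rate = {"metro": 0.71, "rural": 0.65}
# Coverage among non-responders -- unknown in reality, invented here.
nonresponder_rate = {"metro": 0.55, "rural": 0.64}

def true_rate(county):
    # Blend responders and non-responders by the response rate.
    p = response_rate[county]
    return p * responder_rate[county] + (1 - p) * nonresponder_rate[county]

print(round(true_rate("metro"), 3), round(true_rate("rural"), 3))
```

With these made-up non-responder rates, the true rural rate exceeds the true metro rate even though the observed data show the opposite - the observed ranking is not guaranteed to survive complete data.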

The same problem applies to every conclusion of the study, e.g. children's hospitals have higher vaccination rates. In some cases, the connections are not obvious. (Children's hospitals are not likely to be equally distributed in urban and rural areas.)