Group Testing For The Coronavirus

Gödel’s Lost Letter and P=NP 2020-03-13

Plus other mathematical ideas that may be helping

History of Econ. Thought src

Robert Dorfman was a professor of political economy at Harvard University who helped create the notion of group testing.

Today Ken and I discuss this notion and its possible application to the current epidemic crisis. We also discuss other mathematical aspects that have clear and immediate value.

The novel coronavirus, and the disease COVID-19 it causes, is the nasty issue we all face today. I live in New York City and can see the reduced foot traffic every day. The issue is discussed every minute on the news. I hope you are all safe and well. But we are all worried about this.

The point of group testing is that it can reduce the number of tests needed to find out who has the virus. I assume that we are not using Dorfman’s idea because it does not apply to today’s testing. But it seems like it could fit. One issue is the need for more individual testing kits. As we write this, the US is still well short of the needed supply of kits. Are there situations where group testing can still help economize?

Group Testing

Dorfman created the notion of group testing in 1943. You could say he was driven by the need to test light bulbs.

As the Wikipedia group-testing article explains:

[Suppose] one is searching for a broken bulb among six light bulbs. Here, the first three are connected to a power supply, and they light up (A). This indicates that the broken bulb must be one of the last three (B). If instead the bulbs did not light up, one could be sure that the broken bulb was among the first three. Continuing this procedure can locate the broken bulb in no more than three tests, compared to a maximum of six tests if the bulbs are checked individually.
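
To make the halving procedure concrete, here is a minimal Python sketch (our illustration, not Dorfman's own scheme), assuming a black-box pooled test `test(group)` that reports whether a group contains at least one defective item:

    # Minimal sketch of the halving idea in the quote above.  The pooled
    # test is a black box `test(group)` returning True when the group
    # contains at least one defective item (a hypothetical interface).
    def find_one_defective(items, test):
        """Locate a single defective item by repeatedly testing one half."""
        candidates = list(items)
        tests_used = 0
        while len(candidates) > 1:
            half = candidates[:len(candidates) // 2]
            tests_used += 1
            if test(half):                      # defective is in the tested half
                candidates = half
            else:                               # defective is in the other half
                candidates = candidates[len(half):]
        return candidates[0], tests_used

    # Example: six "light bulbs", with bulb 4 broken.
    bulbs = list(range(6))
    print(find_one_defective(bulbs, lambda group: 4 in group))   # (4, 3) -- three tests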

Okay, Dorfman’s motivation was not light bulbs. It was testing soldiers during WWII for a certain disease that will go unnamed. This type of testing reduced the number of tests needed on the soldiers’ blood samples. What was analogous to connecting groups of light bulbs into one circuit was combining portions of individual blood samples into one pooled sample. If it tested negative, then that entire group could be dismissed without further testing.
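
A back-of-the-envelope calculation (our arithmetic, not from Dorfman’s paper) shows why the pooling pays off. Pool {k} samples and retest each member individually only when the pooled test is positive; if a fraction {p} of the population is infected independently, the expected number of tests per person is {1/k + 1 - (1-p)^k}, which for small {p} is well below one:

    # Back-of-the-envelope for Dorfman's two-stage scheme (our sketch):
    # pool k samples; if the pooled test is positive, retest each sample
    # individually.  With independent prevalence p, the expected number
    # of tests per person is 1/k + 1 - (1-p)**k.
    def dorfman_tests_per_person(p, k):
        return 1.0 / k + 1.0 - (1.0 - p) ** k

    # Example: at p = 1% the best pool size is around k = 11,
    # using roughly 0.2 tests per person instead of 1.
    p = 0.01
    best_k = min(range(2, 50), key=lambda k: dorfman_tests_per_person(p, k))
    print(f"best pool size k = {best_k}, "
          f"about {dorfman_tests_per_person(p, best_k):.2f} tests per person")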

Fewer Kits

The point of using group testing is made stark when you think about recent issues with cruise ships. On one ship there were around {3500} passengers who were eventually found to be almost all okay. I believe only {2} were infected. This could have been checked by group testing with many fewer than {3500} tests.
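
As a rough illustration of the potential savings (our arithmetic, not a claim about how the ship was actually handled): adaptive splitting of the kind sketched above locates one infected passenger among {n} with about {\lceil \log_2 n \rceil} pooled tests, so finding {2} infected among {3500} passengers would need on the order of

{ 2 \lceil \log_2 3500 \rceil = 2 \cdot 12 = 24 }

tests rather than {3500}, assuming the samples can be pooled at all.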

According to the current outbreak maps, known cases in the US are still fairly sparse. Those in New York are mainly in the city, Westchester, and along the Hudson River to Albany. Let’s say we want assurance that all non-virus-related admits to a hospital are free of the virus. Can group testing apply?

The original version of group testing would apply if it were deemed mandatory that all new admits give a blood sample. Some kinds of coronavirus test kits use blood samples. Taking blood samples might, however, be as costly and intrusive as doing the individual tests to begin with. There are other kinds of tests that take swabs instead, but it is not apparent whether those samples can be combined at scale as readily as blood samples can.

At least the idea of group testing has interesting connections to other parts of theory. Here is a 2000 survey by Ken’s former Buffalo colleague Hung Ngo with his PhD advisor, Ding-Zhu Du, where the application is to large-scale screening of DNA samples. Hung and Ken’s colleague Atri Rudra taught a course in Buffalo on group testing in relation to compressed sensing. More recent is a 2015 paper that addresses counting the number of positive cases, not just determining whether any exist.

Other Mathematical Effects

As we write, New York Governor Andrew Cuomo has just declared a cap of 500 attendees for any assembly. This is forcing the suspension of Broadway shows among many other activities. Yesterday the entire SUNY system joined numerous other institutions in moving to distance-only learning after this week.

The cap is based on the likelihood of {N} persons including someone who is already infected. That likelihood in turn is based on the density of known cases and on what is known about the proportion of unknown to known cases. The virus has a long incubation time (about two weeks) during which it can be contagious but not yet symptomatic.
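
Presumably the calculation behind such a cap is the simple one (our reading, with {p} standing for the effective prevalence of contagious people, known and unknown): the chance that a gathering of {N} people includes at least one contagious person is

{ 1 - (1 - p)^N, }

which climbs rapidly toward {1} as {N} grows, even when {p} is quite small.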

Here is a graph from a PSA item posted just today by Alex Tabarrok for the Marginal Revolution blog (note that the calculations are by Joshua Weitz of Georgia Tech):

The {y}-axis is the number of cases (in proportion to the US on the whole) and the {x}-axis is the group size. Which axis is more important—has more effect on the danger? Tabarrok notes:

Now here is the most important point. It’s the size of the group, not the number of carriers that most drives the result.

This involves comparing two partial derivatives. The item gives a brief worked-out example without using calculus.
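
To get a feel for the numbers, here is a small Python sketch of the kind of calculation behind such charts (our reconstruction, not Weitz’s code), assuming {I} carriers spread independently through a population of {P}, so that a group of {N} includes a carrier with probability {1 - (1 - I/P)^N}; the carrier counts below are illustrative, not his figures:

    # Sketch of the risk calculation behind such charts.  Assumption: I
    # carriers are spread independently through a population of size P,
    # so a group of N contains at least one with probability 1-(1-I/P)**N.
    # The carrier counts are illustrative placeholders, not Weitz's data.
    P = 330_000_000                      # rough US population

    def risk(group_size, carriers, population=P):
        """Chance that a group of the given size includes at least one carrier."""
        return 1.0 - (1.0 - carriers / population) ** group_size

    sizes = (10, 100, 1_000, 10_000)
    print("group sizes:", sizes)
    for carriers in (20_000, 100_000, 500_000):
        row = "  ".join(f"{risk(n, carriers):6.1%}" for n in sizes)
        print(f"{carriers:>7,} carriers: {row}")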

I, Ken, have been connected this week to another example. I was statistically monitoring the World Senior Team Chess Championship, which began last Friday in Prague. Almost 500 players, in teams of 4 or 5, took part. Initially the players were roughly evenly divided between two halls. Effective Tuesday, a cap of 100 was declared for the Czech Republic, so more playing rooms were found and spectators were banned. Today, however, after an update on the density of cases in the Czech Republic, the cap was lowered to 30 effective tomorrow. Thus the tournament was forced to finish today, two rounds earlier than planned. Even though chess events have a smaller footprint than the spectator sports whose seasons have been suspended in recent days, the growth of the outbreak is making cancellation the only reasonable policy for all but the smallest events.

The main purpose of these and other social isolation measures is to flatten out the growth of cases. The target is not just to contain the outbreak but also to stay below the number of serious cases that our treatment systems can bear at once. Here is one of numerous versions of a graphic that is being widely circulated:

The graphic is not necessarily talking about reducing the total number of cases. The area under both curves is the same. A sentence in the accompanying article—

If individuals and communities take steps to slow the virus’s spread, that means the number of cases of COVID-19 will stretch out across a longer period of time.

—seems to imply that “the number of cases” is the same in both scenarios, but stretched out over time in the latter. The point is how the stretching keeps the {y}-value bounded.

We certainly hope that isolation can reduce that number—i.e., that containment data out of Southeast Asia in particular holds true, as opposed to fears being voiced in the West that the virus will spread to a large percentage of the population over time. See charts here, especially chart 7.

What governs the spreading process? This is being understood via simple mathematical models of contagion, such as those coming from percolation theory and its associated {R_0} factor. Almost a month ago, the Washington Post made an interactive feature showing how epidemics spread according to the parameters in these models. How and whether the COVID-19 pandemic follows these models remains to be seen. Of course we hope it stays on the better side of the equations.
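
As one concrete illustration of such models (a toy simulation of our own, not the Post’s interactive): in a basic SIR model the factor {R_0} is the ratio of the contact rate {\beta} to the recovery rate {\gamma}, and lowering {\beta} both lowers and delays the peak number of simultaneous infections, which is the flattening discussed above.

    # Toy discrete-time SIR model (our illustration, not the Post's code).
    # R0 = beta / gamma; lowering beta flattens and delays the infection peak.
    def sir_peak(beta, gamma=0.1, population=1_000_000,
                 initial_infected=10, days=400):
        """Return (peak number infected at once, day of the peak)."""
        s, i, r = population - initial_infected, initial_infected, 0.0
        peak_i, peak_day = i, 0
        for day in range(1, days + 1):
            new_infections = beta * s * i / population
            new_recoveries = gamma * i
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
            if i > peak_i:
                peak_i, peak_day = i, day
        return round(peak_i), peak_day

    for beta in (0.25, 0.15):            # R0 = 2.5 versus R0 = 1.5
        peak, day = sir_peak(beta)
        print(f"R0 = {beta / 0.1:.1f}: peak of about {peak:,} infected, around day {day}")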

Open Problems

Is group testing a practical mechanism for mapping and constraining the epidemic? How can we promote the understanding of the mechanisms, equations, and models, not only for those shaping policy but for those of us who must abide by it and want to know why?

[added note about UB and others going to online learning, added small-event qualifier about chess]