Diffensive Privacy

Info/Law 2016-05-20

A Response to the Criticisms of Fool’s Gold: An Illustrated Critique of Differential Privacy

By Jane Bambauer and Krish Muralidhar

Two years ago, we coauthored an article that challenged the popular enthusiasm for Differential Privacy. Differential Privacy is a technique that permits researchers to query personal data without risking the privacy of the data subjects. It gained popularity in the computer science and public policy spheres by offering an alternative to the statistical disclosure control and anonymization techniques that had served as the primary mechanism for managing the tension between research utility and privacy.

The reputation of anonymization and “statistical disclosure control” methods is in a bedraggled state at the moment. Even though there is little evidence that reidentification attacks actually occur at any frequency in real life, demonstration attacks have captured the imagination of the press and of regulators. The founders of Differential Privacy helped erode confidence in SDC and anonymization so that Differential Privacy could shine by comparison. Differential Privacy was fundamentally different from what had come before because its standard guaranteed a certain level of privacy no matter how much special knowledge a data intruder had.

The problem is, Differential Privacy provides no assurance about the quality of the research results. As we showed in our paper, it destroys most of the research value of data. In order to salvage data utility, researchers in Differential Privacy have had to introduce relaxations to the privacy promises. But these relaxations have made Differential Privacy less “cryptographic” and more context-dependent, just like the methods of anonymization that the founders of Differential Privacy had rejected. In other words, Differential Privacy in its pure form is special but not useful, and in its modified form is useful but not special.

Our article concludes with a dilemma. On one hand, we praise some recent efforts to take what is good about differential privacy and modify what is unworkable until a more nuanced and messy—but ultimately more useful—system of privacy practices is produced. On the other hand, once we deviate in important respects from the edicts of differential privacy, we end up with the same disclosure risk principles that the founders of differential privacy had insisted needed to be scrapped. In the end, differential privacy is a revolution that has brought us more or less back where we started.

Our article clearly hit a nerve. Cynthia Dwork refused to talk to me at a conference, and a few other computer scientists have written hostile critiques that aim primarily to impugn our intelligence and honesty rather than engage with our arguments on the merits. Anand Sarwate calls our article “an exercise in careful misreading,” and Frank McSherry writes:

The authors could take a fucking stats class and stop intentionally misleading their readers.

The not-so-subtle subtext is “don’t listen to these idiots. They are bad people.”

Given this reaction, you would think that the critics have uncovered flaws in our applications and illustrations of Differential Privacy. They have not. Sarwate even admits that we “manage to explain the statistics fairly reasonably in the middle of the paper” and primarily takes issue with our tone and style.

I have little doubt that the condescension and character attacks are a symptom of something good: a necessary adjustment in the public policy debates. Indeed, although our piece has received the occasional angry tweet or blog review, the private reaction has been positive. Emails and personal conversations have quietly confirmed that data managers understand the significant limitations of pure Differential Privacy and have had to stick with other forms of statistical disclosure control that have fallen out of vogue.

We respond here to the criticisms, which come in four general types: (1) Differential Privacy should destroy utility—it’s working as planned; (2) We exaggerate the damage that DP does to utility; (3) We overlook the evolution in Differential Privacy that has relaxed the privacy standard to permit more data utility; and (4) There are methods other than adding Laplace noise that satisfy pure Differential Privacy. In brief, our responses are: (1) This is a disagreement about policy rather than a technical discrepancy; (2) Not correct, and when we take the suggestions offered by our critics, the noise gets worse; (3) Not correct; we spent an entire Part of our paper on deviations from pure Differential Privacy; and (4) Don’t hold your breath.

(1) Differential Privacy should destroy utility—that’s what ensures privacy.

Our Fool’s Gold paper begins with a hypothetical in which an internist at a hospital queries a public health database to see if her city is suffering from an outbreak of a rare disease. Because the system uses Differential Privacy, the results she receives are absurd and useless.

Frank McSherry begins his critique of our paper by acknowledging that our introductory illustration is descriptively correct; in his view, we are simply wrong to object to the results.

While it is intended to be unsettling, the example shows differential privacy working exactly as promised: protecting unauthorized users from gaining overly specific information about small populations.

This is a curious way to start a critique that promises to show how badly we understand the technique of differential privacy. McSherry’s remarks show that the disagreement between us is not technical at all, but philosophical. McSherry characterizes users of research databases as “unauthorized users” who should not be able to use data for anything other than the most general, census-style descriptive statistics. McSherry, in other words, is highly skeptical of citizen science and of open access to research data. We are not. McSherry therefore seems to be in agreement with us that if we want open access to research data that yields useful insights, Differential Privacy is not an option.

Moreover, as we point out in our paper, we borrowed this hypothetical from a Microsoft publication called Differential Privacy for Everyone, which imagined a public health researcher investigating a rare disease across eight different towns. The Microsoft report claims that if the true number of patients with a rare disease is 1,

the DP guard will introduce a random but small level of inaccuracy, or distortion, into the results it serves to the researcher. Thus, instead of reporting one case for Smallville, the guard may report any number close to one. It could be zero, or ½ (yes, this would be a valid noisy response when using DP), or even -1. The researcher will see this and, knowing the privacy guard is doing its job, she will interpret the result as “Smallville has a very small number of cases, possibly even zero.”

Our paper showed that the Microsoft paper is making false promises. Differential Privacy will not report “any number close to one.” It will report numbers like -2,820.6 or 2,913.9. Thus, it is McSherry’s former colleagues at Microsoft who are guilty of misleading the public. Not us.
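To make the mechanics concrete, here is a minimal sketch of the Laplace mechanism applied to a counting query like the one in the Microsoft example. The function, the random seed, and the ε values are our illustrative choices, not anything prescribed by Microsoft or by our paper; how far the reported numbers stray from the true count of one is governed entirely by the query’s sensitivity and the ε the curator is willing to spend.

```python
import numpy as np

def laplace_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Answer a counting query under pure differential privacy by adding
    Laplace noise with scale sensitivity / epsilon."""
    rng = rng or np.random.default_rng()
    return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# True number of cases in Smallville is 1.
rng = np.random.default_rng(0)
for epsilon in (1.0, 0.1, 0.01):
    answers = [laplace_count(1, epsilon, rng=rng) for _ in range(5)]
    print(epsilon, [round(a, 1) for a in answers])
```

Nothing constrains these draws to be “close to one”; the smaller the ε (or the more queries a fixed budget must cover), the wider the spread.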

(2) We exaggerate the damage that Differential Privacy does to utility

McSherry also claims that in two of our illustrations (mean and correlation), we exaggerated the noise that would have to be added to query results. This is not so.

For the mean, McSherry objects to the mere idea that anyone would be interested in the mean of a skewed variable. He insists that users who are less idiotic than we are would anticipate that the data will be skewed and query for the median rather than the mean.

The premise of McSherry’s objection is quite problematic. Many existing open access tools encourage their users to analyze averages, and users often won’t know which variables are skewed and which aren’t. Indeed, even income might not be skewed if the researcher is studying a particular subset.

In any case, if we had taken McSherry’s suggestion and analyzed the median instead of the mean, the noise would have been worse. It would have been 59 times worse, to be specific.

Recall that the noise added to meet Differential Privacy is based on “global sensitivity.” For the mean, the global sensitivity is the difference between the minimum and maximum income of anyone in the universe of possible databases, divided by the number of individuals in the subset, which was (1 billion/59) in our example. For the median, the global sensitivity is simply the difference between the minimum and maximum income of anyone in the universe of possible databases. Thus, global sensitivity would have been 1 billion in our example instead of (1 billion/59).
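The arithmetic behind the 59× figure can be written out directly. This is a sketch using the numbers from our example (incomes bounded by $1 billion, a subset of 59 individuals); the ε of 0.1 is an illustrative choice, and the ratio between the two noise scales is 59 no matter which ε is used.

```python
# Laplace noise scale under pure differential privacy is
# global_sensitivity / epsilon.
income_range = 1e9   # assumed bound on any individual's income
n = 59               # number of individuals in the queried subset
epsilon = 0.1        # illustrative privacy budget

sens_mean = income_range / n    # worst-case change in the mean
sens_median = income_range      # worst-case change in the median

scale_mean = sens_mean / epsilon
scale_median = sens_median / epsilon

print(scale_mean)                 # roughly 169 million
print(scale_median)               # 10 billion
print(scale_median / scale_mean)  # 59.0
```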

You do not have to take our word for it: Here is the quote from a paper co-authored by Professor Kobi Nissim, the researcher who won the test of time award for his work on differential privacy:

[Screenshot of the quoted passage from Nissim’s co-authored paper on the global sensitivity of the median]

(Nissim’s comment about using “local sensitivity” instead of global sensitivity is a deviation that would not conform to pure Differential Privacy. We address this in the next section of our response.)

So, consider us corrected. Differential Privacy will actually perform worse than we say it will when non-idiots submit a proper query.

McSherry also objects to our correlation example, but his objection is brief and perplexing. He suggests that we should have added noise to intermediate steps of the correlation calculation rather than adding one lump of noise at the end. But adding noise to each intermediate step of a correlation calculation would introduce four or six different noise terms (depending on how it is done) and would lead to results at least as hopeless as the ones we reported.
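For concreteness, here is one way an intermediate-noise approach could look: perturb each sufficient statistic of the Pearson correlation separately and recombine. The bounds, the even budget split, and the sensitivities below are our illustrative assumptions, not a procedure McSherry has endorsed; the point is simply that several independent noise draws end up inside the final figure.

```python
import numpy as np

def noisy_correlation(x, y, epsilon, x_bound, y_bound, rng=None):
    """Pearson correlation computed from Laplace-perturbed sufficient
    statistics. The total budget epsilon is split evenly across six
    statistics; sensitivities assume |x| <= x_bound and |y| <= y_bound
    for every possible record."""
    rng = rng or np.random.default_rng()
    x, y = np.asarray(x, float), np.asarray(y, float)
    eps = epsilon / 6.0                      # naive even budget split
    stats = {                                # (true value, global sensitivity)
        "n":   (len(x),        1.0),
        "sx":  (x.sum(),       x_bound),
        "sy":  (y.sum(),       y_bound),
        "sxx": ((x * x).sum(), x_bound ** 2),
        "syy": ((y * y).sum(), y_bound ** 2),
        "sxy": ((x * y).sum(), x_bound * y_bound),
    }
    noisy = {k: v + rng.laplace(0.0, s / eps) for k, (v, s) in stats.items()}
    n, sx, sy = noisy["n"], noisy["sx"], noisy["sy"]
    cov = noisy["sxy"] / n - (sx / n) * (sy / n)
    var_x = noisy["sxx"] / n - (sx / n) ** 2
    var_y = noisy["syy"] / n - (sy / n) ** 2
    return cov / np.sqrt(var_x * var_y)      # nonsense when the noise dominates
```

Whether anything useful survives depends entirely on the assumed bounds and the budget split; with bounds as wide as the ones in the income example above, the noise terms swamp the signal.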

McSherry does not provide step-by-step instructions, nor does he state the implications of his proposed alternative. In an exchange one of us had with him on Twitter, he presented a graph of the noise that should be added to a correlation query result. But he used a bait-and-switch move:

[Screenshot of the Twitter exchange: McSherry’s graph of the noise added to a correlation query, along with the remark quoted below]

That last statement by McSherry is critical. It gives us a strong clue as to why McSherry’s procedure cannot satisfy differential privacy. Consider the statement “If we use your data set, the error is much larger, which makes sense as only one participant contributes to the correlation.” By the same logic, if the error is much smaller, as in McSherry’s illustration, then an intruder will know something valuable about the data set. He will know that it is not one in which a single participant could have contributed to the correlation. He might know, for example, that Bill Gates is not in it. McSherry’s technique leaks information about the data set, and Differential Privacy explicitly prohibits this type of leakage. It is practically impossible to quantify the privacy harm from this sort of leakage, so it breaks Differential Privacy’s cryptographic, context-independent promises. This is precisely why Dwork (2006) defines global sensitivity

as a property of the function alone, and independent of the database.

McSherry’s procedure violates this requirement. The only way to prevent such leakage of information is to implement a procedure similar to the one that we described in our paper.

So McSherry must be using a relaxed form of Differential Privacy to produce these graphs. (We suspect he is using something like Nissim’s relaxed “local sensitivity” approach mentioned above.) Relaxation of the Differential Privacy standard isn’t necessarily a bad thing; it is surely a good thing from a utility perspective, as McSherry’s own example illustrates. But, as we made clear, our paper illustrates the problems with pure Differential Privacy. Given that McSherry lectures us on how pure Differential Privacy is supposed to destroy utility by design, we are surprised that he faults us for failing to cheat on its requirements. We think this shows something very important: even the most dedicated Differential Privacy advocates cannot really tolerate its consequences.

(3) We overlook the evolution in Differential Privacy that has permitted more data utility.

Anand Sarwate accuses us of setting up a straw man by showing the limits of only the pure form of Differential Privacy rather than the variants that have been developed more recently.

In their effort to discredit differential privacy, the authors ignore both the way in which scientific and academic research works as well as contemporary work that seeks to address the very problems they raise: context-awareness via propose-test-release, methods for setting ε in practical scenarios, and dealing with multiple disclosures via stronger composition rules.

“Ignore” is an odd word choice given that we devote our conclusion to a discussion about relaxations of the Differential Privacy standard.

We don’t think Sarwate understands the importance of these relaxations in the context of the policy debates. These relaxations compromise the very thing that gave DP superiority over other statistical disclosure control methods. The new forms of Differential Privacy are no longer context-independent, no longer purely mathematical and cryptographic. This is a big deal because Differential Privacy was sold as an alternative to the context-dependent, pragmatic decision-making required by statistical disclosure control methods.

Here is what Cynthia Dwork herself said about the matter:

On the other hand, in (ε, δ)-differential privacy there is no bound on the expected privacy loss, since with probability δ all bets are off and the loss can be infinite.

The bait-and-switch from pure differential privacy to a more relaxed definition that McSherry used in the correlation example is very common (see, for example, Adam Smith’s comment here). It is increasingly rare to see the original definition of differential privacy at all. Even Dwork & Roth, in their book The Algorithmic Foundations of Differential Privacy, start by defining differential privacy in its relaxed version (see Definition 2.4, page 21), in spite of the fact that this quantification of privacy is uninterpretable and the privacy “loss can be infinite.”
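For reference, the relaxed definition in question is the standard (ε, δ) formulation; the δ term is the “all bets are off” probability Dwork describes above. In its usual form (our transcription, not a quotation from the book):

```latex
% (epsilon, delta)-differential privacy: for all neighboring databases
% D and D', and for every set of outputs S,
\[
  \Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta .
\]
% Pure differential privacy is the special case delta = 0; the additive
% delta is the probability with which the epsilon bound can fail entirely.
```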

In the course of promoting the cryptographic promises of pure Differential Privacy, the architects of Differential Privacy did a good deal of damage to the reputation of the statistical disclosure control methods they sought to replace. The distrust of data anonymization that the Differential Privacy community helped engender will hurt the modified forms of Differential Privacy, too, now that it’s clear that we are all muddling along, trying to strike a practical balance between privacy and utility.

Our paper expressed no moral criticism of the relaxations of Differential Privacy as part of the toolset for Statistical Disclosure Control. But it is hypocritical and disingenuous for members of the Differential Privacy community to sell DP based on its strong promises only to relax the standards later. If the public policy community does accept relaxed forms of Differential Privacy to enable open access to research data, it should accept other forms of disclosure control, too.


(4) There are methods other than adding Laplace noise that satisfy pure Differential Privacy.

Finally, we have occasionally received feedback that the Differential Privacy method we illustrate in our paper—adding Laplace noise—is not the only method that satisfies the pure Differential Privacy standard.

The Differential Privacy standard was chosen with the Laplace distribution in mind, but that does not necessarily rule out the possibility that other data protection methods could satisfy the standard. Yet these other methods would have to add at least as much uncertainty as Laplace noise does, since the pure Differential Privacy guarantee is calibrated exactly to the Laplace distribution.

Sarwate cites to randomized response methods as an example of Differential Privacy that doesn’t use Laplace noise and is already used in practice. (See also the first comment here.) Randomized response protects respondents by preventing even the data collector from knowing any individual’s true response. For example, if I want to know something sensitive about a population, I can ask them: “Have you had a one-night stand? Flip this coin and don’t show anybody. If it turns up heads, tell the truth. If it comes up tails, then flip the coin again and respond ‘Yes’ for heads and ‘No’ for tails.”
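For readers who have not seen it before, here is a minimal sketch of the coin-flip protocol just described, together with the standard debiasing step an analyst would use to recover the population rate; the function names and the simulated rate are ours.

```python
import random

def randomized_response(truth, rng=random):
    """One respondent's answer under the coin-flip protocol: on heads,
    answer truthfully; on tails, flip again and answer Yes for heads,
    No for tails."""
    if rng.random() < 0.5:       # first flip came up heads
        return truth
    return rng.random() < 0.5    # second flip decides the answer

def estimate_rate(answers):
    """Recover the population rate: P(yes) = 0.5 * p + 0.25, so
    p = 2 * (observed yes-rate - 0.25)."""
    observed = sum(answers) / len(answers)
    return 2 * (observed - 0.25)

# Example: 10,000 respondents, 30% of whom would truthfully answer Yes.
random.seed(0)
truths = [random.random() < 0.30 for _ in range(10_000)]
answers = [randomized_response(t) for t in truths]
print(round(estimate_rate(answers), 3))   # close to 0.30
```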

It is rather ironic that randomized response has been held up as evidence that cutting-edge Differential Privacy principles can work in practice, because randomized response techniques have been used by statisticians and epidemiologists for decades. (My father taught me this trick when I was in grade school.)

Randomized response takes an inherently different approach compared to Differential Privacy. For instance, when discussing randomized response, Dwork (2011) says:

Randomized response was devised for the setting in which the individuals do not trust the curator, so we can think of the randomized responses as simply being published.

This approach (masking the input and publishing the responses) is very different from the Differential Privacy approach, which involves computing the correct response, adding noise to it, and releasing the result (masking the output). Dwork again:

[Screenshot of Dwork’s formal definition of ε-differential privacy]

Randomized response will not satisfy this definition. When comparing database z with n people in it to database z’ with n-1 people, both with yes/no randomized responses, P[Count of Yes = n | z’] = 0 while P[Count of Yes = n | z] is non-zero. The definitional equation for differential privacy cannot be met with randomized response.
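Spelling that last step out, under the coin-flip protocol above and treating the count of published “Yes” responses as the output:

```latex
% Each published response is "Yes" with probability at least 1/4, so for
% the n-person database z:
\[
  \Pr[\text{count of Yes} = n \mid z] \;\ge\; (1/4)^{n} \;>\; 0,
\]
% while for the (n-1)-person database z' the event is impossible:
\[
  \Pr[\text{count of Yes} = n \mid z'] \;=\; 0 .
\]
% No finite epsilon can bound the ratio of these two probabilities, so the
% pure definition cannot be satisfied.
```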

It is also hypocritical for Differential Privacy proponents to suddenly embrace the notion of randomized response since the entire motivation of Differential Privacy was to do better than randomized response. Again we quote from Dwork (2011):

[Screenshot of the quoted passage from Dwork (2011) on the motivation to improve on randomized response]

By co-opting the success of randomized response as their own, Differential Privacy proponents are acknowledging that they have yet to deliver something new and practical.

In summary, our original paper has stood up quite well to criticism. “Differential Privacy” has promised to radically change how we protect privacy in public-use research data, but after ten years, we are still waiting for a practical solution for queries as basic as averages.