Reasoning about information: an example

Tech @ FTC 2013-03-15

One of the reasons it’s hard to think carefully about privacy is that privacy is fundamentally about information, and our (uneducated) intuition about information is often unreliable.

As a teacher, I have tried different approaches to helping students get over this barrier. It’s not too hard to teach the theory, so that students learn how to manipulate logical formulas to answer contrived story problems about information and inference. What is more difficult is augmenting the formal theory with a more accurate intuition that is useful outside the classroom.

One trick I find useful for building privacy intuition is to abstract away from the formality of logic and the complexities of human relationships, and consider how information behaves in another setting: simple algebra.

[To math-phobic readers: hang on--this won't hurt a bit. I'll use the simplest possible examples, and I promise I won't ask you to solve any equations.]

Suppose we’re interested in knowing the value of X. We start out with no knowledge about X, so X could have any value, large or small, positive or negative. Now we learn a fact:

X - Y = 2

We have learned a fact about X, but that fact doesn’t help us narrow down what the value of X might be: X could still take on absolutely any value.

Some time later, we learn another fact:

X + 2Y - Z = 5

That’s another fact about X, but we still can’t narrow down what the value of X might be: X could still take on absolutely any value. Your algebra teacher would say that we can’t find a unique solution because we have fewer equations (two) than unknowns (three: X, Y, and Z).
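To see concretely why the first two facts leave X wide open, here is a minimal Python sketch. The helper name `complete` is made up for illustration. It takes any candidate value of X and finds values of Y and Z that make both facts true, which shows that no value of X has been ruled out.

```python
def complete(x):
    """Given any candidate X, return (Y, Z) satisfying the first two facts."""
    y = x - 2            # rearranged from X - Y = 2
    z = x + 2 * y - 5    # rearranged from X + 2Y - Z = 5
    return y, z

# Try a few very different candidates; every one of them is consistent.
for x in (-1000, 0, 4, 1000000):
    y, z = complete(x)
    assert x - y == 2 and x + 2 * y - z == 5
    print(f"X={x} is consistent with Y={y}, Z={z}")
```

Whatever X we pick, the two facts can be satisfied, so they tell us nothing about X on their own.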

So at this point, we know nothing about X, right? Or is it better to say that we know two things about X, even though our uncertainty about the value of X has not been reduced at all? Information is odd that way.

The next day, we learn yet another fact:

Z - Y = 1

This new fact is obviously not about X. It doesn’t mention X at all; it’s just a fact about the relationship between Y and Z. How could that possibly tell us anything about the value of X?

But as it turns out, this last fact is the key to unlocking the value of X. Given the three facts we now know, we can dust off our algebra skills and solve the three equations in three unknowns, to learn that X=4, Y=2, and Z=3.
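For readers who would rather let a computer do the algebra, the following is a small sketch that solves the three facts by Gauss-Jordan elimination. The function name `solve` and the row layout (coefficients of X, Y, Z, then the right-hand side) are choices made for this illustration, and the code assumes the system has exactly one solution.

```python
from fractions import Fraction

def solve(rows):
    """Gauss-Jordan elimination on an augmented matrix.

    Assumes a square system with exactly one solution.
    """
    m = [[Fraction(v) for v in row] for row in rows]
    n = len(m)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry here.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so the pivot entry becomes 1.
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [row[-1] for row in m]

facts = [
    [1, -1, 0, 2],   # X - Y = 2
    [1, 2, -1, 5],   # X + 2Y - Z = 5
    [0, -1, 1, 1],   # Z - Y = 1
]
x, y, z = solve(facts)
print(x, y, z)  # 4 2 3
```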

The key to unlocking the value of X, as it turned out, was a fact (Z - Y = 1) that wasn’t even about X. Or maybe it was a fact about X, despite not mentioning X at all. Information is odd that way.

This example also helps to illustrate how easy it is to make mistakes when reasoning about information. For example, suppose we create a concept of X-identifying information (XII), and we say that a fact is XII if and only if that fact allows someone who learns it to determine the value of X. So the fact “X = 6” is XII, but the fact “U + V = 7” is not XII.

Now we might try to use XII to reason about our example. We could look at each of the three facts in isolation, and argue that they are all non-XII, because each of them in isolation does not reveal the value of X. We might then try to argue that in revealing the three facts, we never revealed any XII, and therefore there is no reason to worry that the value of X might have been revealed.

Of course, such an argument would be incorrect, because the three facts, taken together, did in fact reveal the value of X. To put it another way, if somebody tells us that “no XII was revealed,” that statement by itself does not imply anything about whether X was revealed.
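The gap between "each fact is non-XII" and "X was not revealed" can be checked mechanically. The sketch below is one way to do it, under stated assumptions: facts are consistent linear equations in X, Y, and Z, and each fact is written as its row of coefficients. A set of such facts pins down X exactly when the row (1, 0, 0), meaning "X by itself", can be built from the rows of the facts, which is the same as saying that appending it does not raise the rank. The names `rank` and `reveals_x` are invented for this example.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix, by forward elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            f = m[r][col] / m[rk][col]
            m[r] = [a - f * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk

def reveals_x(facts):
    """True if these (consistent, linear) facts determine the first variable."""
    x_only = [1] + [0] * (len(facts[0]) - 1)
    return rank(facts) == rank(facts + [x_only])

# Coefficients of (X, Y, Z) for each fact; right-hand sides are not needed.
fact1 = [1, -1, 0]   # X - Y = 2
fact2 = [1, 2, -1]   # X + 2Y - Z = 5
fact3 = [0, -1, 1]   # Z - Y = 1

print([reveals_x([f]) for f in (fact1, fact2, fact3)])  # [False, False, False]
print(reveals_x([fact1, fact2, fact3]))                 # True
```

Each fact passes the "non-XII" test in isolation, and so does every pair, yet the full set reveals X.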

Information is odd that way.

[Extra-credit homework assignment: Devise an "XII removal" method that can take any fact that is XII, and transform it into an equivalent set of facts that (considered individually) are non-XII.]