The Virtue of Closed Problems

Gödel’s Lost Letter and P=NP 2020-03-09

A new condition in property testing


Maryam Aliakbarpour and Sandeep Silwal are PhD students at MIT. They have a joint paper titled, “Testing Properties of Multiple Distributions with Few Samples.” Aliakbarpour is advised by Ronitt Rubinfeld. Silwal has a different advisor, Piotr Indyk. Could we test which advisor is better by sampling? Haha, not a chance…

Today we will discuss this paper as an example of the power of computer science theory.

Aliakbarpour and Silwal (AS) prove results on property testing, which is of course a long-studied area in statistics, in mathematics, and in complexity theory. Many problems of property testing are open; many are very hard problems, and some are really impossible, such as distinguishing slight differences between distributions. With all due respect, the brilliance of their paper is not in solving open problems. Rather it is how they modify property testing in a novel manner.

Their proofs are clever and technically strong, but what we feel is more important is how they change the problems. That is, they show how to add a new constraint to property testing, where:

  1. The constraint is reasonable and appears to be natural;

  2. The constraint is powerful and allows efficient algorithms where previously none were possible.

Let’s look at this more closely.

Opening a Closed Problem

We all know about open problems and closed problems. There are however two kinds of closed problems: ones that have been settled and ones that have been proved impossible to settle. We also know that one can often take an open problem and define a meaningful subcase of it that can be solved. What may be less obvious is how one can do the same with the latter kind of closed problem.

The work of AS is a perfect example of this phenomenon. They take a problem that is unsolvable and change it to one that is solvable. The details of their results are less important than their change to the underlying problem. One of the key insights throughout theory is that we often have made major progress by changing the ground rules. The AS work is a particularly clear case of this.

The New Condition

AS call it the structural condition. They postulate a consistency type of condition on property testing. Suppose that one is interested in testing whether or not a collection of slot machines is fair. That is, are they set to cheat players or not? This is known to be hard to do with few tests. But after adding their new constraint it becomes quite efficient.

Among some examples to convey their definition, they postulate a row of slot machines in a casino that are supposed to give the same distribution of prizes {j} with known fair probabilities {q(j)}. As usual with property testing we want to distinguish between two hypotheses that are some distance apart: one that the machines conform to the fair distribution {q} and an alternative hypothesis that each is far from fair. There are two relevant factors:

  1. We do not get to make many tests on one machine of our choosing. The best we can do is watch people try the machines. They may try different machines and so give only a few sample results per machine.

  2. There are cases of situation 1 where, although each machine is far from fair, the joint distribution of a few samples cannot be told apart from fair. The authors mention a trivial case where {q} is uniform and we get just one result from each machine, with the results all different from each other. Such outcomes cannot be told apart from fair ones. But there are less-trivial cases.
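To make point 2 concrete, here is a small Python sketch of the trivial impossible case (our own illustration, not from the paper): with {q} uniform and just one sample per machine, machines that each always pay one fixed distinct prize are maximally far from fair, yet the all-distinct draws they produce are also a perfectly possible outcome of fair machines.

```python
import random

n = 6          # domain size; the fair distribution q is uniform on {0, ..., n-1}
m = 6          # number of machines

# Rigged case: machine i always pays prize i.  Each p^i is a point mass,
# at total-variation distance 1 - 1/n from uniform, i.e. far from fair.
rigged_samples = [i for i in range(m)]   # the one sample we see from each machine

# Fair case: each machine draws once from the uniform distribution q.
fair_samples = [random.randrange(n) for _ in range(m)]

# The rigged draws are m distinct values.  Under the fair hypothesis that
# exact outcome also occurs with probability m!/n^m > 0, so no tester can
# distinguish the two cases from these samples alone.
tv_each = 1 - 1 / n   # TV distance of each point mass from uniform
print(tv_each, sorted(rigged_samples))
```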

Such situations are common with real-world data. The authors mention cases of medical data where one gets just one or a few blood readings from patients with disparate situational factors that can bias their samples. The joint effect of the biases may completely mask the systematic population tendency one is trying to test for. This is the setting in which the problem of distinguishing has been regarded as impossible, case closed. In the slot-machine example, Aliakbarpour and Silwal state their structural condition informally as follows:

Our goal is to test whether all the machines were fair … or they are far from being fair. In this case, we can naturally assume that if the machines are unfair, the house will assign a lower probability to the expensive prizes, and higher probability to cheap ones.

Of course, the house could cheat players without cheating in this uniform manner. More subtle rigging of the machines in different ways could mask the lowered expectation. But what AS are saying is: provided the house cheats in this consistent manner, it can be discovered quickly.

Formal Definition and Proof Idea

The formal definition is that each of the true source distributions is biased the same way in the same components as the expected (“fair”) distribution:

Definition 1 A sequence {p^1,\dots, p^m} of distributions on a discrete domain {[n]} obeys the structural condition relative to the distribution {q} if there is a subset {A \subset [n]} such that for all {i \in [m]} and {j \in [n]}:

\displaystyle  \begin{cases} p^i(j) \geq q(j) & \text{if } j \in A\\ p^i(j) \leq q(j) & \text{otherwise}. \end{cases}

The term “structural condition” is vanilla and perhaps saying the {p^i} are “conformally biased” relative to {q} would be more descriptive. A major point which the authors emphasize right afterward is that the set {A} is unknown. There are exponentially many possibilities for what {A} could be—so we cannot try to guess it. Rather, proofs using this condition need to exploit how all the {p^{i}} distributions conform to the same {A}. The slot-machine example arguably does not reflect this point in full—because we can know in advance what the major and minor prizes of a slot machine are (including “no prize” among the latter). The medical-data examples and others certainly do, however.
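To illustrate the definition, here is a hypothetical Python sketch (the function name and example distributions are ours, not from the paper) of what the condition demands. Note the caveat: checking it is easy when one is handed full descriptions of the {p^i}, which the tester never has (it only gets samples). Even so, the sketch shows why the exponentially many candidate sets {A} need not be enumerated: each index {j} is simply forced into or out of {A} by the data.

```python
def satisfies_structural_condition(ps, q):
    """Check whether distributions ps = [p^1, ..., p^m] obey the structural
    condition relative to q: there is some set A with p^i(j) >= q(j) for
    j in A and p^i(j) <= q(j) otherwise, the SAME A for every i.

    Feasibility is decided coordinate-by-coordinate: index j is forced
    into A if any p^i(j) exceeds q(j), and forced out of A if any p^i(j)
    falls below q(j).  The condition fails exactly when some j is forced
    both ways, so no enumeration over the 2^n candidate sets is needed.
    """
    for j in range(len(q)):
        forced_in = any(p[j] > q[j] for p in ps)
        forced_out = any(p[j] < q[j] for p in ps)
        if forced_in and forced_out:
            return False
    return True

q = [0.25, 0.25, 0.25, 0.25]
conformal = [[0.4, 0.3, 0.2, 0.1], [0.3, 0.4, 0.2, 0.1]]      # A = {0, 1} works
clashing = [[0.4, 0.1, 0.25, 0.25], [0.1, 0.4, 0.25, 0.25]]   # biases clash on 0 and 1
print(satisfies_structural_condition(conformal, q))   # True
print(satisfies_structural_condition(clashing, q))    # False
```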

We can convey briefly and intuitively how this condition is used for the case where {q} is the uniform distribution on {[n]}, which is quite general as testing for many distributions can be efficiently mapped into this case. A key property of the uniform distribution is that among all distributions on {[n]}, it minimizes the probability of getting collisions from repeated samples. This idea has yielded the most effective test statistics for problems when there is only one distribution {p}. When there are multiple {p^i} the effectiveness can be blunted in instances like the one in the previous section: the instances that are impossible.
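The collision idea can be sketched in a few lines of Python (our own illustration, with made-up parameters): the expected fraction of colliding sample pairs equals {\sum_j p(j)^2}, which the uniform distribution minimizes at {1/n}, so an elevated collision rate witnesses distance from uniform.

```python
from collections import Counter
import random

def collision_statistic(samples):
    """Fraction of sample pairs that collide.  Its expectation is
    sum_j p(j)^2, minimized (at value 1/n) by the uniform distribution."""
    s = len(samples)
    counts = Counter(samples)
    collisions = sum(c * (c - 1) // 2 for c in counts.values())
    return collisions / (s * (s - 1) // 2)

random.seed(0)
n, s = 50, 5000
uniform_draws = [random.randrange(n) for _ in range(s)]
# A skewed distribution: half the mass on a single element, far from uniform.
skewed_draws = [0 if random.random() < 0.5 else random.randrange(n)
                for _ in range(s)]
print(collision_statistic(uniform_draws))   # close to 1/n = 0.02
print(collision_statistic(skewed_draws))    # much larger, near 0.26
```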

However, AS show that their structural condition is just what’s needed to bridge the gap that keeps the impossible cases from being distinguished. Realizing that the compounded complications of having multiple {p^i} can be resolved by this stroke is their act of brilliance. Their paper applies it successfully to two other problem settings—identity testing and closeness testing—and also shows meaningful instances that obey and are resolved by their definition but that escape an older condition involving product distributions. As usual, we say to see the paper for the details.

Consequences in Crypto and Inference?

It remains for us to ask: in which other cases might the AS condition arise, and what uses might it have? I can think first of one implication for adversarial strategies in cryptography. It goes like this:

We can view the AS theorems as saying that anyone who wishes to cheat others should not be consistent. Being consistent according to some utility criterion—even an unknown one—makes it easier for others to detect that they are cheating.

Following their slot-machine example suggests this: arrange the machines in total to cheat players, but make it so that they are not consistent. The house can set some prizes higher in probability and some lower, and importantly, do this differently on different machines. It could even make some machines pay out more than the legal expectation, while still giving a small profit to the casino so it does not become too obvious to customers that those machines are “lucky.” There is safety in non-conformity alone.
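A tiny numerical sketch of such non-conformal rigging (our own example, not from the paper): two machines that inflate and deflate opposite prizes are each far from fair, yet their clashing biases leave no single set {A} that could witness the structural condition.

```python
# Fair distribution q: uniform over four prizes.
q = [0.25, 0.25, 0.25, 0.25]

# Machine 1 inflates prize 0 and deflates prize 1; machine 2 does the
# opposite.  Each machine is at total-variation distance 0.15 from fair,
# but prizes 0 and 1 are biased upward on one machine and downward on
# the other, so no single set A satisfies the structural condition.
p1 = [0.40, 0.10, 0.25, 0.25]
p2 = [0.10, 0.40, 0.25, 0.25]

tv = lambda p: sum(abs(pj - qj) for pj, qj in zip(p, q)) / 2
for j in range(len(q)):
    clash = (any(p[j] > q[j] for p in (p1, p2)) and
             any(p[j] < q[j] for p in (p1, p2)))
    if clash:
        print("prize", j, "is biased both ways; no valid set A")
print(tv(p1), tv(p2))
```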

\displaystyle  \S

Ken notices a similarity to the chess-modeling situation he began describing in this recent post. Instead of prizes of known value, in chess we have distributions over possible moves whose values are not so readily apparent to the player—that’s what makes chess challenging to play.

In his case, given any chess position {t} and a player {P} having that position, he postulates a true distribution {p^t} of moves {P} would make and wants to compare that with the distribution {q^t} projected by his model. Now there can be many positions {t} with different characteristics, but Ken can treat the distributions {q^t} coming from his model as one fixed point of reference {q}, and he can give different positions the same ranks of legal moves: {1} (best), then {2,\dots, n}. If positions have different numbers of legal moves he can pad them out by treating illegal moves the same as completely losing blunders.
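The padding step might be sketched as follows (a hypothetical helper of our own devising, not Ken's actual code): each position's move distribution, listed by rank with the best move first, is extended to a fixed domain of {n} ranks by giving the nonexistent moves probability zero, just as completely losing blunders would get.

```python
def pad_to_rank_space(move_probs, n):
    """Pad a position's move distribution (probabilities listed by rank,
    best move first) out to a fixed domain of n ranks, treating moves
    beyond the legal ones like completely losing blunders: probability 0.
    Hypothetical helper illustrating the padding idea in the text."""
    assert len(move_probs) <= n, "domain must be at least the number of legal moves"
    return move_probs + [0.0] * (n - len(move_probs))

# A position with only 3 legal moves, mapped into a 5-rank domain.
print(pad_to_rank_space([0.6, 0.3, 0.1], 5))   # [0.6, 0.3, 0.1, 0.0, 0.0]
```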

So he has a situation like this: different {p^t} and one {q} over a bunch of positions. Ken knows his model not only varies in accuracy but is biased in ways explained in that post. It appears that this bias is highly conformal. The model is trained to make the projection {q(1)} of the probability of the first move accurate on average. Then the tendency is that the projections of the second-best and third-best moves are off one way, and the projections of all the other moves are off the other way. This is like having {A = \{2,3\}} or its complement most of the time. The new statistical tests that Ken is crafting also live under the specter of impossibility. A co-worker has demonstrated that they fail in cases randomly generated from normal distributions. But so far they appear to succeed in Ken's more-structured situations from his real-world chess data. On this we will stay tuned.

Open Problems

Does the AS paper say something about cryptography? How widely does their condition apply?