Discrepancy Games and Sensitivity

Gödel’s Lost Letter and P=NP 2019-08-20

Can we connect the talks that closed this month’s Random Structures and Algorithms conference?

Cropped from NYU homepage

Joel Spencer gave the closing talk of last week’s Random Structures and Algorithms conference at ETH Zurich.

Today we discuss his talk and the one that preceded it, which was by Hao Huang on his proof this month of the Boolean sensitivity conjecture.

Spencer’s talk was titled “Four Discrepancies” and based on a joint paper with Nikhil Bansal. The main new result in the talk was a case where a bound of {O(\sqrt{n\log n})} arising from reasoning about the normal distribution can, surprisingly, be improved to a sharp {O(\sqrt{n})}.

We will talk about this first, but then progress to the talk that preceded Spencer’s. It was by Hao Huang, who received an invitation right after he announced his proof of the Boolean Sensitivity Conjecture earlier this month. Our ulterior purpose is to ask whether any concrete connections can be found besides both talks addressing problems on the {\{-1,+1\}^n} hypercube.

Discrepancy Games

First suppose you get a stream of single bits {v_t}, each {-1} or {+1}. For each bit, you can say “Keep it” or “Flip it.” In the latter case, your bit {w_t} is {-v_t}. Your goal is to keep the sum {\sum_{k=1}^t w_k} within a bounded range. OK, this is easy: you can make the sum always be {+1} when {t} is odd and {0} when {t} is even by flipping when needed.

Now suppose the bits come in pairs {(v_{t,1},v_{t,2})} and your goal is to keep the sum in each coordinate bounded. Again there is a simple strategy: There are four kinds of vectors. For each kind, every time you see it for the second time, flip it. That keeps the contribution of each kind within {-1 \ldots +1} in each coordinate. Thus simple reasoning says each coordinate stays within {-4 \ldots +4}.
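
To make the bookkeeping concrete, here is a minimal sketch (our own illustration, not from the talk) of the “flip every second occurrence of each kind” policy; the assertion simply re-checks the {-4 \ldots +4} reasoning on a random stream of pairs.

```python
import random

def play_pairs(vectors):
    """Flip every second occurrence of each of the four kinds of pairs, so
    matched occurrences cancel and at most one unmatched vector of each kind
    contributes +/-1 to each coordinate sum."""
    seen = {}                 # kind -> number of occurrences so far
    sums = [0, 0]
    for v in vectors:
        seen[v] = seen.get(v, 0) + 1
        sign = -1 if seen[v] % 2 == 0 else +1     # "flip" on even occurrences
        sums[0] += sign * v[0]
        sums[1] += sign * v[1]
        assert all(abs(s) <= 4 for s in sums)     # the simple-minded bound
    return sums

random.seed(0)
stream = [(random.choice([-1, 1]), random.choice([-1, 1])) for _ in range(10000)]
print(play_pairs(stream))
```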

The trouble with extending this idea to vectors of length {n} is that the number of kinds is {2^n} and our simple-minded bounds, while constant in terms of the number {T} of vectors given, are exponential in {n}. Well, we can certainly do better. Going back to the {n = 2} case, we can maintain a policy of never letting both coordinates become {+2} or both become {-2}. This keeps both sums within {-2 \ldots +2} regardless of the sequence of the vectors. But for larger vector lengths {n}, how large can the discrepancy among the {n} coordinate sums—relative to zero or to each other—become?

It also helps if we know that the number {T} of vectors in any sequence we might be given is bounded. In fact, Spencer’s paper with Bansal first considers the case {T = n}. There are two main questions about randomness:

  1. Does it matter whether the sequence of vectors given is random or worst-case?

  2. Can you do better than choosing “keep” or “flip” randomly?

Long ago, Spencer proved that if you can see the whole sequence of {n} vectors in advance, then you can always keep the sums within {-5.32 \sqrt{n} \ldots +5.32\sqrt{n}}, whatever the sequence. If not—if you must decide “keep” or “flip” for each vector before seeing the next—then Spencer had also proved:

Theorem 1 If an adversary can choose the next vector based on your current sums, then the best bound {B} such that you can keep all your sums within absolute value {B} grows as {\Theta(\sqrt{n\log n})}. Moreover, up to the constant in the “{\Theta},” with high probability you cannot do any better than choosing the sign for each vector randomly.

See also his famous book, The Probabilistic Method, with Noga Alon. The new theorem is:

Theorem 2 If the adversary presents vectors uniformly at random, then you can achieve {B = O(\sqrt{n})} with high probability—and not by choosing the signs randomly.

The algorithm defines a kind of potential function and has you play “keep” or “flip” according to which has the lesser growth in potential. The analysis is quite complicated. As usual we refer to the paper for details. Instead we ask: are there insights to mine here for further advances on the Boolean sensitivity problem?
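
Before moving on, and purely to convey the flavor of a potential-based rule, here is a toy sketch of our own that tracks an exponential potential {\sum_i \cosh(\lambda s_i)} and plays whichever of “keep” or “flip” increases it less. The choice of {\lambda} below is ours, and nothing here reproduces the {O(\sqrt{n})} analysis.

```python
import math, random

def greedy_sign(vectors, n, lam=None):
    """Toy online signer: pick the sign whose update increases the potential
    sum_i cosh(lam * s_i) the least.  (Illustrative only; this is not the
    Bansal-Spencer algorithm or its analysis.)"""
    if lam is None:
        lam = 1.0 / math.sqrt(n)
    sums = [0.0] * n
    for v in vectors:
        def potential(sign):
            return sum(math.cosh(lam * (s + sign * x)) for s, x in zip(sums, v))
        sign = +1 if potential(+1) <= potential(-1) else -1
        sums = [s + sign * x for s, x in zip(sums, v)]
    return max(abs(s) for s in sums)

random.seed(1)
n = 64
vecs = [[random.choice([-1, 1]) for _ in range(n)] for _ in range(n)]
print(greedy_sign(vecs, n), math.sqrt(n))   # compare the final max |sum| to sqrt(n)
```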

Boolean Sensitivity

We did not go into the details of Boolean sensitivity in our previous post on Huang’s proof. Our purpose now is to convey how many concepts are related, hence how they might connect to discrepancy on the hypercube.

To get the idea of sensitivity, first consider the parity function {f_{\oplus}}. If you change one bit of any argument {x} you change the value of {f_{\oplus}(x)}. Define {x^i} to mean {x} with bit {i} flipped. Then for any {x} and {i}, {f_{\oplus}(x^i) \neq f_{\oplus}(x)}. This means parity is extremely sensitive.

The OR function {f_{\vee}} is intuitively less sensitive, but it too has an argument {x} such that {f_{\vee}(x^i) \neq f_{\vee}(x)} for all {i}, namely {x = 0^n}. For any Boolean function {f: \{0,1\}^n \rightarrow \{0,1\}} define its sensitivity (at {n}) by

\displaystyle  s(f) = s_n(f) = \max_{x \in \{0,1\}^n} |\{i: f(x^i) \neq f(x)\}|.

The OR-of-AND function {f_2} is less sensitive. Say it is an OR of {m} blocks, each an AND of {k} variables, and the blocks use disjoint variables so {n = km}. If {f_2(x) = 1}, then some block is all {1}, so flipping any bit outside that block makes no difference, and the most sensitive we can get is {k}. If {f_2(x) = 0} then the most sensitive case is when all blocks have exactly one 0. So {s(f_2) = \max\{k,m\}}. When {k = m = \sqrt{n}}, {s_n(f_2) = \sqrt{n}}.
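
The {\max\{k,m\}} claim is easy to check by brute force on small cases. The helper below is our own illustration and computes {s(f)} straight from the definition:

```python
from itertools import product

def sensitivity(f, n):
    """max over x of the number of bit positions whose flip changes f(x)."""
    best = 0
    for x in product((0, 1), repeat=n):
        x = list(x)
        count = sum(f(x[:i] + [1 - x[i]] + x[i + 1:]) != f(x) for i in range(n))
        best = max(best, count)
    return best

def or_of_and(k, m):
    """OR of m disjoint blocks, each an AND of k variables (n = k*m)."""
    def f(x):
        return int(any(all(x[j * k + i] for i in range(k)) for j in range(m)))
    return f

for k, m in [(2, 2), (3, 3), (2, 4)]:
    print(k, m, sensitivity(or_of_and(k, m), k * m), max(k, m))
```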

If we make each block of {f_2} return true if exactly one bit is {1}, then we again get sensitivity {n} on the all-{0} assignment. Now, however, consider the related function {f'_2} (with {k} even, {k = 2\ell}) that is still an OR over blocks, but each block is true when some aligned consecutive pair {x_{2i-1},x_{2i}} within it are both {1} with all other bits of the block being {0}. Then we can’t do the same trick with the all-{0} assignment changing just one bit. So when {n = 4\ell^2} and {m = k = 2\ell} we again get sensitivity (no more than) {\sqrt{n}}.

We can, however, consider {f'_2} to be more sensitive if we can flip more than one bit at a time. Partition {[n]} into blocks {B} of two consecutive bit-places each, and given any {x}, define {x^B} to be the result of flipping the bits in {B}. We get {n/2} blocks, and the all-{0} assignment becomes a true case of {f'_2} if any block is flipped. Generally define the block sensitivity by considering any partition {\mathcal{B} = \{B_j\}} of {[n]} into disjoint subsets and writing

\displaystyle  bs(f) = \max_{x,\mathcal{B}} |\{j: f(x^{B_j}) \neq f(x)\}|.

Note that not every member of the partition has to flip the function—we can discard the ones that don’t flip and count only the disjoint subsets that do flip the value. So back to our example, we have

\displaystyle  bs(f'_2) = \frac{n}{2} = \frac{1}{2}s(f'_2)^2.

Andris Ambainis and Xiaoming Sun improved the constant from {\frac{1}{2}} asymptotically to {\frac{2}{3}}, but their relation is still quadratic.
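
Again this is small enough to verify directly. The sketch below (ours) brute-forces {s(f'_2)} for {\ell = 1, 2} and exhibits the {n/2} disjoint aligned pairs that all flip the value at the all-{0} input, so {bs(f'_2) \ge n/2}:

```python
from itertools import product

def f2_prime(ell):
    """f'_2: OR over 2*ell blocks of 2*ell bits; a block is true when one
    aligned consecutive pair in it is 1-1 and the rest of the block is 0."""
    k = m = 2 * ell
    n = k * m
    def block_true(bits):
        return sum(bits) == 2 and any(
            bits[2 * i] == bits[2 * i + 1] == 1 for i in range(ell))
    def f(x):                       # x is a tuple of n bits
        return int(any(block_true(x[j * k:(j + 1) * k]) for j in range(m)))
    return f, n

def sensitivity(f, n):
    table = {x: f(x) for x in product((0, 1), repeat=n)}
    flip = lambda x, i: x[:i] + (1 - x[i],) + x[i + 1:]
    return max(sum(table[flip(x, i)] != table[x] for i in range(n))
               for x in table)

for ell in (1, 2):
    f, n = f2_prime(ell)
    zero = (0,) * n
    flipping_pairs = sum(
        f(tuple(1 if j in (2 * i, 2 * i + 1) else 0 for j in range(n))) != f(zero)
        for i in range(n // 2))
    print(n, sensitivity(f, n), flipping_pairs)   # expect sqrt(n) and n/2
```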

Some Boolean Complexity Connections

This example of quadratic separation is still the best known lower bound on {bs(f)} in terms of {s(f)}. But no one had proved anything better than an exponential upper bound until Huang’s result, from which it follows that:

Theorem 3 For all Boolean functions {f}, {bs(f) \leq 2s(f)^4}.

This bound is concrete, not just asymptotic. It still leaves a gap between quadratic and quartic. It is, however, the combination of two quadratic upper bounds. One was shown by Nisan and Szegedy in their paper:

\displaystyle  bs(f) \leq 2\deg(f)^2,

where {\deg(f)} means the degree of the unique multi-linear real polynomial that agrees with {f} on the cube {\{-1,1\}^n} with {-1} for true. The other is the conjecture

\displaystyle  \deg(f) \leq s(f)^2

in a 1992 paper by Craig Gotsman and Nati Linial. Huang proved this by exploiting a connection to graphs that was also shown by Gotsman and Linial. Consider any red-or-white coloring of the {2^n} nodes of the hypercube, let {G} be the graph induced by the red nodes, and define {g(x) = 1} if node {x} is red, {g(x) = 0} otherwise. Now if the maximum degree {d(G)} of a node in {G} is small then every red node has many white neighbors, so the Boolean function {g} is very sensitive. However, going to each neighbor in the hypercube flips the parity. Hence the function

\displaystyle  g'(x) = g(x) \oplus f_{\oplus}(x)

is not very sensitive. Moreover, it has the same sensitivity as the function {h'(x) = h(x) \oplus f_{\oplus}(x)} where {h(x)} is true on the white nodes. This nice duality between {G} and the graph {H} induced by the white nodes enables us to fix “{G}” to mean whichever of the two has more nodes in the following theorem statement:

Theorem 4 Provided {m > 2^{n-1}}, every graph {G} induced by {m} nodes of the {n}-cube has {d(G) \geq \sqrt{n}} if and only if every Boolean function {f} has {\deg(f) \leq s(f)^2}.

The proof uses some Fourier analysis with {f = g'} as above. It, too, takes only one page of a really short paper.
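
As a small sanity check of our own on the parity twist above, one can verify on a random coloring that the sensitivity of {g'} at each {x} is exactly the number of neighbors of {x} with the same color as {x}, i.e., the degree of {x} in {G} or in {H}:

```python
import random
from itertools import product

n = 4
cube = list(product((0, 1), repeat=n))
random.seed(2)
g = {x: random.randint(0, 1) for x in cube}     # 1 = red node, 0 = white node

def flip(x, i):
    return x[:i] + (1 - x[i],) + x[i + 1:]

g_prime = {x: g[x] ^ (sum(x) % 2) for x in cube}   # g'(x) = g(x) XOR parity(x)

for x in cube:
    sens = sum(g_prime[flip(x, i)] != g_prime[x] for i in range(n))
    same_color_degree = sum(g[flip(x, i)] == g[x] for i in range(n))
    assert sens == same_color_degree
print("s(g', x) equals the same-color degree of x for every x")
```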

The main open question now is whether the 4th-power upper bound in Huang’s theorem 3 can be improved to quadratic. It is possible that a deeper application of Fourier analysis may show that the quadratic separations from block sensitivity to degree and from degree to sensitivity cannot “amplify” into anything more than quadratic overall. This is where there might be some commonality with discrepancy.

There is a much wider suite of Boolean complexity measures besides {s(f)}, {bs(f)}, and {\deg(f)} discussed here. For example, consider how many bits of {x} you need to fix in order to preserve the value {f(x)}. That is, define {C(f)} to be the maximum over {x} of the minimum size of a set {I \subseteq [n]} such that whenever {x'} agrees with {x} on {I}, {f(x') = f(x)}. At an input {x} achieving the block sensitivity, {I} clearly needs to include at least one bit from each sensitive block {B_j}, since otherwise flipping that block would give an {x'} agreeing with {x} on {I} but with {f(x') \neq f(x)}. This proves {C(f) \geq bs(f)}. There are many other relations that might be improved. We end by considering ways of broadening Huang’s techniques.
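
As a quick concrete aside before we do, here is a small brute-force script of our own that evaluates {C(f)} and {bs(f)} straight from the definitions on the tiny OR-of-AND example with {k = m = 2}, confirming {C(f) \ge bs(f)}:

```python
from itertools import product, combinations

n = 4
def f(x):                                   # (x0 AND x1) OR (x2 AND x3)
    return int((x[0] and x[1]) or (x[2] and x[3]))

cube = list(product((0, 1), repeat=n))

def flip_set(x, S):
    return tuple(1 - b if i in S else b for i, b in enumerate(x))

def certificate_size(x):
    """Smallest |I| such that fixing x on I forces the value of f."""
    for size in range(n + 1):
        for I in combinations(range(n), size):
            if all(f(y) == f(x) for y in cube
                   if all(y[i] == x[i] for i in I)):
                return size

def block_sensitivity_at(x):
    """Max number of pairwise disjoint blocks that each flip f at x."""
    sensitive = [set(S) for size in range(1, n + 1)
                 for S in combinations(range(n), size)
                 if f(flip_set(x, S)) != f(x)]
    best = 0
    def grow(used, count, rest):
        nonlocal best
        best = max(best, count)
        for idx, S in enumerate(rest):
            if not (S & used):
                grow(used | S, count + 1, rest[idx + 1:])
    grow(set(), 0, sensitive)
    return best

C = max(certificate_size(x) for x in cube)
bs = max(block_sensitivity_at(x) for x in cube)
print(C, bs, C >= bs)                        # expect 2 2 True
```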

Weak Adjacency Matrices

We—that is Ken and I—have been thinking about how to build on Huang’s proof. We take a somewhat more general approach than the paper and Ken’s post.

Definition 5 Let {G} be a graph on {n} vertices. The {n} by {n} matrix {A} is a weak adjacency matrix of {G} provided

  1. Every entry {A_{x,y}} is bounded by {1} in absolute value;

  2. For all {x,y}: if {\{x,y\}} is not an edge in {G}, then {A_{x,y}=0}.

As usual the maximum degree of {G} is denoted by {\deg(G)}. The following is almost the same proof as the classic result on adjacency matrices.

Lemma 6 (H-Lemma) Suppose that {A} is a weak adjacency matrix of {G}. Then if {\lambda} is an eigenvalue of {A},

\displaystyle  | \lambda | \le \deg(G).

Proof: By the definition of eigenvalue, there is some non-zero vector {v} so that {Av = \lambda v}. Let {k} be an index so that {|v_{k}|} is maximum. Then

\displaystyle  |\lambda v_{k}| =|(A v)_{k}|= \left|\sum_{y=1}^{n} A_{k,y}v_{y} \right| \leq \sum_{y=1}^{n} |A_{k,y}| \cdot |v_{k}|.

But then the last sum is upper bounded by

\displaystyle  D|v_{k}|,

where {D} is the number of {y} so that {A_{k,y}} is not zero. This uses property (1) of the definition of a weak adjacency matrix. Property (2) implies that {D} is at most the degree of vertex {k} in {G}. Thus

\displaystyle  |\lambda v_{k}| \leq D |v_{k}| \leq \deg(G) |v_{k}|.

Dividing by {|v_{k}|>0} yields the lemma. \Box
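
A quick numerical illustration of the lemma, entirely our own: fill the allowed entries of a random small graph with arbitrary values in {[-1,1]} and compare the spectral radius to the maximum degree.

```python
import random
import numpy as np

random.seed(3)
n = 8
# a random graph on n vertices
edges = {(i, j) for i in range(n) for j in range(i + 1, n) if random.random() < 0.4}
max_deg = max(sum((min(i, j), max(i, j)) in edges for j in range(n) if j != i)
              for i in range(n))

# a symmetric weak adjacency matrix: zero off the edges, entries in [-1, 1]
A = np.zeros((n, n))
for (i, j) in edges:
    A[i, j] = A[j, i] = random.uniform(-1, 1)

spectral_radius = max(abs(np.linalg.eigvalsh(A)))
print(spectral_radius, "<=", max_deg)        # the H-Lemma bound
```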

Let {G-k} be the graph that results after we delete the vertex {k} from {G}. Let {A-k} be the matrix that results after we delete the column and row {k} from {A}.

Lemma 7 Suppose that {A} is a weak adjacency matrix of {G}. Then for any vertex {k}, {A-k} is a weak adjacency matrix of {G-k}.

Proof: The bound on the entries of {A-k} is immediate. Whenever {G-k} has no edge from {x} to {y}, then {(A-k)_{x,y}} must be zero. \Box

Lemma 8 Suppose that {G}, a graph on {n} vertices, has a real symmetric weak adjacency matrix {A} whose {m} largest eigenvalues are all at least {B \ge 0}. Then every induced subgraph {H} of {G} with at least {n-m+1} vertices has maximum degree at least {B}.

Proof: By Cauchy interlacing, the principal submatrix of {A} on the vertices of {H} has largest eigenvalue at least the {m}-th largest eigenvalue of {A}, which is at least {B}. By Lemma 7 this submatrix is a weak adjacency matrix of {H}, so Lemma 6 gives {\deg(H) \ge B}. \Box

Definition 9 The hypercube {C_{n}} is the graph on the {2^n} bit vectors of length {n} so that two vertices are adjacent if they differ in exactly one bit position.

Theorem 10 The hypercube {C_{n}} has a real symmetric weak adjacency matrix with all its eigenvalues {\pm \sqrt{n}}.

Proof: Let {A_{1}} be

\displaystyle  \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},

and {A_{n}} be

\displaystyle  \begin{bmatrix} A_{n-1} & I \\ I & -A_{n-1} \end{bmatrix}.

Induction shows that {A_{n}^{2} = nI}: the off-diagonal blocks of {A_{n}^{2}} are {A_{n-1} - A_{n-1} = 0}, and the diagonal blocks are {A_{n-1}^{2} + I = (n-1)I + I = nI}. It follows that

\displaystyle  A_{n}x = \lambda x,

implies that

\displaystyle  nx = \lambda A_{n}x.

Thus

\displaystyle  nx = \lambda^{2}x.

So the eigenvalues are {\pm \sqrt{n}}, and since the trace of {A_{n}} is zero, there are equally many of each.

The upper-left and lower-right blocks of {A_{n}} correspond to the two {n-1} dimensional subcubes of {C_{n}}, and the two identity blocks correspond to the perfect matching connecting these two subcubes. So {A_{n}} is a weak adjacency matrix for {C_{n}}. \Box

Corollary 11 Every induced subgraph {H} of {C_{n}} with at least {2^{n-1}+1} vertices has maximum degree at least {\sqrt{n}}.
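
To close the loop, here is a small script of our own that builds {A_n} by the recursion in the proof, confirms {A_n^2 = nI} and the eigenvalues {\pm\sqrt{n}}, checks that it is a weak adjacency matrix of {C_n}, and, for {n = 3}, verifies the corollary by exhausting all induced subgraphs on at least {2^{n-1}+1} vertices.

```python
import numpy as np
from itertools import combinations

def huang_matrix(n):
    """The signed matrix A_n from the recursion in the proof of Theorem 10."""
    A = np.array([[0.0, 1.0], [1.0, 0.0]])
    for _ in range(n - 1):
        m = A.shape[0]
        A = np.block([[A, np.eye(m)], [np.eye(m), -A]])
    return A

def is_weak_adjacency_of_cube(A, n):
    """Entries bounded by 1, and zero unless x and y differ in exactly one bit."""
    N = 1 << n
    return all(abs(A[x, y]) <= 1 + 1e-9 and
               (bin(x ^ y).count("1") == 1 or abs(A[x, y]) < 1e-9)
               for x in range(N) for y in range(N))

n = 3
N = 1 << n
A = huang_matrix(n)
assert np.allclose(A @ A, n * np.eye(N))                       # A_n^2 = n I
assert np.allclose(np.abs(np.linalg.eigvalsh(A)), np.sqrt(n))  # eigenvalues +-sqrt(n)
assert is_weak_adjacency_of_cube(A, n)

def max_degree(vertices):
    """Maximum degree of the subgraph of the cube induced by the given vertices."""
    vs = set(vertices)
    return max(sum(bin(x ^ y).count("1") == 1 for y in vs) for x in vs)

ok = all(max_degree(S) >= np.sqrt(n) - 1e-9
         for size in range(N // 2 + 1, N + 1)
         for S in combinations(range(N), size))
print("A_3 checks pass; Corollary 11 verified for n = 3:", ok)
```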

Open Problems

Can the last two talks at the conference be further connected?