Twisted convolution and the sensitivity conjecture

What's new 2019-08-20

Earlier this month, Hao Huang (who, incidentally, was a graduate student here at UCLA) gave a remarkably short proof of a long-standing problem in theoretical computer science known as the sensitivity conjecture. See for instance this blog post of Gil Kalai for further discussion and links to many other online discussions of this result. One formulation of the theorem proved is as follows. Define the {n}-dimensional hypercube graph {Q_n} to be the graph with vertex set {({\bf Z}/2{\bf Z})^n}, and with every vertex {v \in ({\bf Z}/2{\bf Z})^n} joined to the {n} vertices {v + e_1,\dots,v+e_n}, where {e_1,\dots,e_n} is the standard basis of {({\bf Z}/2{\bf Z})^n}.

Theorem 1 (Lower bound on maximum degree of induced subgraphs of hypercube) Let {E} be a set of at least {2^{n-1}+1} vertices in {Q_n}. Then there is a vertex in {E} that is adjacent (in {Q_n}) to at least {\sqrt{n}} other vertices in {E}.

The bound {\sqrt{n}} (or more precisely, {\lceil \sqrt{n} \rceil}) is completely sharp, as shown by Chung, Furedi, Graham, and Seymour; we describe this example below the fold. When combined with earlier reductions of Gotsman-Linial and Nisan-Szegedy, this theorem implies the sensitivity conjecture; we give these reductions below the fold also.

Let {A = (a_{vw})_{v,w \in ({\bf Z}/2{\bf Z})^n}} be the adjacency matrix of {Q_n} (where we index the rows and columns directly by the vertices in {({\bf Z}/2{\bf Z})^n}, rather than selecting some enumeration {1,\dots,2^n}), thus {a_{vw}=1} when {w = v+e_i} for some {i=1,\dots,n}, and {a_{vw}=0} otherwise. The above theorem then asserts that if {E} is a set of at least {2^{n-1}+1} vertices, then the {E \times E} minor {(a_{vw})_{v,w \in E}} of {A} has a row (or column) that contains at least {\sqrt{n}} non-zero entries.

The key step to prove this theorem is the construction of a rather curious variant {\tilde A} of the adjacency matrix {A}:

Proposition 2 There exists a {({\bf Z}/2{\bf Z})^n \times ({\bf Z}/2{\bf Z})^n} matrix {\tilde A = (\tilde a_{vw})_{v,w \in ({\bf Z}/2{\bf Z})^n}} which is entrywise dominated by {A} in the sense that

\displaystyle  |\tilde a_{vw}| \leq a_{vw} \hbox{ for all } v,w \in ({\bf Z}/2{\bf Z})^n \ \ \ \ \ (1)

and such that {\tilde A} has {\sqrt{n}} as an eigenvalue with multiplicity {2^{n-1}}.

Assuming this proposition, the proof of Theorem 1 can now be quickly concluded. If we view {\tilde A} as a linear operator on the {2^n}-dimensional space {\ell^2(({\bf Z}/2{\bf Z})^n)} of functions of {({\bf Z}/2{\bf Z})^n}, then by hypothesis this space has a {2^{n-1}}-dimensional subspace {V} on which {\tilde A} acts by multiplication by {\sqrt{n}}. If {E} is a set of at least {2^{n-1}+1} vertices in {Q_n}, then the space {\ell^2(E)} of functions on {E} has codimension at most {2^{n-1}-1} in {\ell^2(({\bf Z}/2{\bf Z})^n)}, and hence intersects {V} non-trivially. Thus the {E \times E} minor {\tilde A_E} of {\tilde A} also has {\sqrt{n}} as an eigenvalue (this can also be derived from the Cauchy interlacing inequalities), and in particular this minor has operator norm at least {\sqrt{n}}. By Schur’s test, this implies that one of the rows or columns of this matrix has absolute values summing to at least {\sqrt{n}}, giving the claim.

Remark 3 The argument actually gives a strengthening of Theorem 1: there exists a vertex {v_0} of {E} with the property that for every natural number {k}, there are at least {n^{k/2}} paths of length {k} in the restriction {Q_n|_E} of {Q_n} to {E} that start from {v_0}. Indeed, if we let {(u_v)_{v \in E}} be an eigenfunction of {\tilde A_E} on {\ell^2(E)} with eigenvalue {\sqrt{n}}, and let {v_0} be a vertex in {E} that maximises the value of {|u_{v_0}|}, then for any {k} we have that the {v_0} component of {\tilde A_E^k (u_v)_{v \in E}} has magnitude {n^{k/2} |u_{v_0}|}; on the other hand, by the triangle inequality, this component is at most {|u_{v_0}|} times the number of length {k} paths in {Q_n|_E} starting from {v_0}, giving the claim.

This argument can be viewed as an instance of a more general “interlacing method” to try to control the behaviour of a graph {G} on all large subsets {E} by first generating a matrix {\tilde A} on {G} with very good spectral properties, which are then partially inherited by the {E \times E} minor of {\tilde A} by interlacing inequalities. In previous literature using this method (see e.g., this survey of Haemers, or this paper of Wilson), either the original adjacency matrix {A}, or some non-negatively weighted version of that matrix, was used as the controlling matrix {\tilde A}; the novelty here is the use of signed controlling matrices. It will be interesting to see what further variants and applications of this method emerge in the near future. (Thanks to Anurag Bishoi in the comments for these references.)

The “magic” step in the above argument is constructing {\tilde A}. In Huang’s paper, {\tilde A} is constructed recursively in the dimension {n} in a rather simple but mysterious fashion. Very recently, Roman Karasev gave an interpretation of this matrix in terms of the exterior algebra on {{\bf R}^n}. In this post I would like to give an alternate interpretation in terms of the operation of twisted convolution, which originated in the theory of the Heisenberg group in quantum mechanics.
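Huang's recursive construction is easy to test numerically. The following Python sketch (all names are my own) builds the signed matrix via the recursion from his paper, {A_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}} and {A_n = \begin{pmatrix} A_{n-1} & I \\ I & -A_{n-1} \end{pmatrix}}, and checks the spectral conclusion of Proposition 2:

```python
import numpy as np

def huang(n):
    # Huang's recursive signed adjacency matrix:
    # A_1 = [[0,1],[1,0]],  A_n = [[A_{n-1}, I], [I, -A_{n-1}]]
    if n == 1:
        return np.array([[0.0, 1.0], [1.0, 0.0]])
    A = huang(n - 1)
    I = np.eye(2 ** (n - 1))
    return np.block([[A, I], [I, -A]])

n = 4
A = huang(n)
assert np.allclose(A @ A, n * np.eye(2 ** n))   # A^2 = n I, so eigenvalues are +-sqrt(n)
eigs = np.sort(np.linalg.eigvalsh(A))
assert np.allclose(eigs[: 2 ** (n - 1)], -np.sqrt(n))  # multiplicity 2^{n-1} each
assert np.allclose(eigs[2 ** (n - 1):], np.sqrt(n))
```

One can check by induction that {A_n^2 = n I}, which together with the zero trace forces the {\pm \sqrt{n}} eigenvalues to have equal multiplicity.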

Firstly note that the original adjacency matrix {A}, when viewed as a linear operator on {\ell^2(({\bf Z}/2{\bf Z})^n)}, is a convolution operator

\displaystyle  A f = f * \mu

where

\displaystyle \mu(x) := \sum_{i=1}^n 1_{x=e_i}

is the counting measure on the standard basis {e_1,\dots,e_n}, and {*} denotes the ordinary convolution operation

\displaystyle  f * g(x) := \sum_{y \in ({\bf Z}/2{\bf Z})^n} f(y) g(x-y) = \sum_{y_1+y_2 = x} f(y_1) g(y_2).

As is well known, this operation is commutative and associative. Thus for instance the square {A^2} of the adjacency operator {A} is also a convolution operator

\displaystyle  A^2 f = f * (\mu * \mu)

where the convolution kernel {\mu * \mu} is moderately complicated:

\displaystyle  \mu*\mu(x) = n \times 1_{x=0} + \sum_{1 \leq i < j \leq n} 2 \times 1_{x = e_i + e_j}.

The factor {2} in this expansion comes from combining the two terms {1_{x=e_i} * 1_{x=e_j}} and {1_{x=e_j} * 1_{x=e_i}}, which both evaluate to {1_{x=e_i+e_j}}.

More generally, given any bilinear form {B: ({\bf Z}/2{\bf Z})^n \times ({\bf Z}/2{\bf Z})^n \rightarrow {\bf Z}/2{\bf Z}}, one can define the twisted convolution

\displaystyle  f *_B g(x) := \sum_{y \in ({\bf Z}/2{\bf Z})^n} (-1)^{B(y,x-y)} f(y) g(x-y)

\displaystyle  = \sum_{y_1+y_2=x} (-1)^{B(y_1,y_2)} f(y_1) g(y_2)

of two functions {f,g \in \ell^2(({\bf Z}/2{\bf Z})^n)}. This operation is no longer commutative (unless {B} is symmetric). However, it remains associative; indeed, one can easily compute that

\displaystyle  (f *_B g) *_B h(x) = f *_B (g *_B h)(x)

\displaystyle = \sum_{y_1+y_2+y_3=x} (-1)^{B(y_1,y_2)+B(y_1,y_3)+B(y_2,y_3)} f(y_1) g(y_2) h(y_3).
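This associativity (which holds for an arbitrary bilinear form {B}) is easy to confirm numerically; here is a quick Python sketch, with helper names of my own choosing:

```python
import itertools
import numpy as np

n = 3
cube = list(itertools.product((0, 1), repeat=n))  # vertices of (Z/2Z)^n

rng = np.random.default_rng(0)
M = rng.integers(0, 2, size=(n, n))  # a random bilinear form B(x,y) = x^T M y mod 2

def B(x, y):
    return int(np.array(x) @ M @ np.array(y)) % 2

def twisted_conv(f, g):
    # f *_B g (x) = sum_{y1 + y2 = x} (-1)^{B(y1, y2)} f(y1) g(y2)
    out = {v: 0.0 for v in cube}
    for y1 in cube:
        for y2 in cube:
            x = tuple((a + b) % 2 for a, b in zip(y1, y2))
            out[x] += (-1) ** B(y1, y2) * f[y1] * g[y2]
    return out

f = {v: rng.standard_normal() for v in cube}
g = {v: rng.standard_normal() for v in cube}
h = {v: rng.standard_normal() for v in cube}
lhs = twisted_conv(twisted_conv(f, g), h)
rhs = twisted_conv(f, twisted_conv(g, h))
assert all(abs(lhs[v] - rhs[v]) < 1e-9 for v in cube)  # associativity
```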

In particular, if we define the twisted convolution operator

\displaystyle  A_B f(x) := f *_B \mu(x)

then the square {A_B^2} is also a twisted convolution operator

\displaystyle  A_B^2 f = f *_B (\mu *_B \mu)

and the twisted convolution kernel {\mu *_B \mu} can be computed as

\displaystyle  \mu *_B \mu(x) = (\sum_{i=1}^n (-1)^{B(e_i,e_i)}) 1_{x=0}

\displaystyle + \sum_{1 \leq i < j \leq n} ((-1)^{B(e_i,e_j)} + (-1)^{B(e_j,e_i)}) 1_{x=e_i+e_j}.

For general bilinear forms {B}, this twisted convolution is just as messy as {\mu * \mu} is. But if we take the specific bilinear form

\displaystyle  B(x,y) := \sum_{1 \leq i < j \leq n} x_i y_j \ \ \ \ \ (2)

then {B(e_i,e_i)=0} for {1 \leq i \leq n} and {B(e_i,e_j)=1, B(e_j,e_i)=0} for {1 \leq i < j \leq n}, and the above twisted convolution simplifies to

\displaystyle  \mu *_B \mu(x) = n 1_{x=0}

and now {A_B^2} is very simple:

\displaystyle  A_B^2 f = n f.

Thus the only eigenvalues of {A_B} are {+\sqrt{n}} and {-\sqrt{n}}. The matrix {A_B} is entrywise dominated by {A} in the sense of (1), and in particular has trace zero; thus the {+\sqrt{n}} and {-\sqrt{n}} eigenvalues must occur with equal multiplicity, so in particular the {+\sqrt{n}} eigenvalue occurs with multiplicity {2^{n-1}} since the matrix has dimensions {2^n \times 2^n}. This establishes Proposition 2.
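As a sanity check, one can assemble {A_B} directly from the form (2) and verify both Proposition 2 and the row-sum consequence used in the proof of Theorem 1. A Python sketch (names mine; note that the {(x, x+e_i)} entry of {A_B} is {(-1)^{B(x+e_i,e_i)}}, by unwinding the definition of {f *_B \mu}):

```python
import itertools
import numpy as np

n = 4
cube = list(itertools.product((0, 1), repeat=n))
idx = {v: i for i, v in enumerate(cube)}
N = 2 ** n

def B(x, y):
    # the specific bilinear form (2): sum_{i<j} x_i y_j mod 2
    return sum(x[i] * y[j] for i in range(n) for j in range(i + 1, n)) % 2

A = np.zeros((N, N))    # ordinary adjacency matrix of Q_n
A_B = np.zeros((N, N))  # twisted version
for x in cube:
    for i in range(n):
        e_i = tuple(int(m == i) for m in range(n))
        y = tuple((a + b) % 2 for a, b in zip(x, e_i))
        A[idx[x], idx[y]] = 1
        A_B[idx[x], idx[y]] = (-1) ** B(y, e_i)

assert np.array_equal(np.abs(A_B), A)          # entrywise domination (1), with equality
assert np.allclose(A_B @ A_B, n * np.eye(N))   # A_B^2 = n, so eigenvalues are +-sqrt(n)
assert np.trace(A_B) == 0                      # trace zero: equal multiplicities

# Theorem 1 mechanism: any (2^{n-1}+1)-point minor has a row with abs-sum >= sqrt(n)
rng = np.random.default_rng(0)
E = rng.choice(N, size=N // 2 + 1, replace=False)
minor = A_B[np.ix_(E, E)]
assert np.max(np.abs(minor).sum(axis=1)) >= np.sqrt(n) - 1e-9
```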

Remark 4 Twisted convolution {*_B} is actually just a component of ordinary convolution, but not on the original group {({\bf Z}/2{\bf Z})^n}; instead it relates to convolution on a Heisenberg group extension of this group. More specifically, define the Heisenberg group {H} to be the set of pairs {(x, t) \in ({\bf Z}/2{\bf Z})^n \times ({\bf Z}/2{\bf Z})} with group law

\displaystyle  (x,t) \cdot (y,s) := (x+y, t+s+B(x,y))

and inverse operation

\displaystyle  (x,t)^{-1} = (-x, -t+B(x,x))

(one can dispense with the negative signs here if desired, since we are in characteristic two). Convolution on {H} is defined in the usual manner: one has

\displaystyle  F*G( (x,t) ) := \sum_{(y,s) \in H} F(y,s) G( (y,s)^{-1} (x,t) )

for any {F,G \in \ell^2(H)}. Now if {f \in \ell^2(({\bf Z}/2{\bf Z})^n)} is a function on the original group {({\bf Z}/2{\bf Z})^n}, we can define the lift {\tilde f \in \ell^2(H)} by the formula

\displaystyle  \tilde f(x,t) := (-1)^t f(x)

and then by chasing all the definitions one soon verifies that

\displaystyle  \tilde f * \tilde g = 2 \widetilde{f *_B g}

for any {f,g \in \ell^2(({\bf Z}/2{\bf Z})^n)}, thus relating twisted convolution {*_B} to Heisenberg group convolution {*}.
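The identity {\tilde f * \tilde g = 2 \widetilde{f *_B g}} can be confirmed by direct computation (the factor {2} coming from the sum over the central variable {s}); here is a small numerical check in Python, with ad hoc helper names:

```python
import itertools
import numpy as np

n = 2
cube = list(itertools.product((0, 1), repeat=n))
H = [(x, t) for x in cube for t in (0, 1)]  # the Heisenberg group H

def B(x, y):
    # the bilinear form (2)
    return sum(x[i] * y[j] for i in range(n) for j in range(i + 1, n)) % 2

def mul(a, b):  # group law (x,t).(y,s) = (x+y, t+s+B(x,y))
    (x, t), (y, s) = a, b
    return (tuple((u + v) % 2 for u, v in zip(x, y)), (t + s + B(x, y)) % 2)

def inv(a):  # inverse (x,t)^{-1} = (x, t+B(x,x)) in characteristic two
    (x, t) = a
    return (x, (t + B(x, x)) % 2)

def conv_H(F, G):  # convolution on H: F*G(a) = sum_b F(b) G(b^{-1} a)
    return {a: sum(F[b] * G[mul(inv(b), a)] for b in H) for a in H}

def twisted_conv(f, g):
    out = {v: 0.0 for v in cube}
    for y1 in cube:
        for y2 in cube:
            x = tuple((u + v) % 2 for u, v in zip(y1, y2))
            out[x] += (-1) ** B(y1, y2) * f[y1] * g[y2]
    return out

def lift(f):  # f~(x,t) = (-1)^t f(x)
    return {(x, t): (-1) ** t * f[x] for x in cube for t in (0, 1)}

rng = np.random.default_rng(1)
f = {v: rng.standard_normal() for v in cube}
g = {v: rng.standard_normal() for v in cube}
lhs = conv_H(lift(f), lift(g))
rhs = {a: 2 * v for a, v in lift(twisted_conv(f, g)).items()}
assert all(abs(lhs[a] - rhs[a]) < 1e-9 for a in H)
```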

Remark 5 With the twisting by the specific bilinear form {B} given by (2), convolution by {1_{x=e_i}} and {1_{x=e_j}} now anticommute rather than commute. This makes the twisted convolution algebra {(\ell^2(({\bf Z}/2{\bf Z})^n), *_B)} isomorphic to a Clifford algebra {Cl({\bf R}^n,I_n)} (the real or complex algebra generated by formal generators {v_1,\dots,v_n} subject to the relations {(v_iv_j+v_jv_i)/2 = 1_{i=j}} for {i,j=1,\dots,n}) rather than the commutative algebra more familiar to abelian Fourier analysis. This connection to Clifford algebra (also observed independently by Tom Mrowka and by Daniel Matthews) may be linked to the exterior algebra interpretation of the argument in the recent preprint of Karasev mentioned above.

Remark 6 One could replace the form (2) in this argument by any other bilinear form {B'} that obeys the relations {B'(e_i,e_i)=0} and {B'(e_i,e_j) + B'(e_j,e_i)=1} for {i \neq j}. However, this additional level of generality does not add much; any such {B'} will differ from {B} by an antisymmetric form {C} (so that {C(x,x) = 0} for all {x}, which in characteristic two implies that {C(x,y) = C(y,x)} for all {x,y}), and such forms can always be decomposed as {C(x,y) = C'(x,y) + C'(y,x)}, where {C'(x,y) := \sum_{i<j} C(e_i,e_j) x_i y_j}. As such, the matrices {A_B} and {A_{B'}} are conjugate, with the conjugation operator being the diagonal matrix with entries {(-1)^{C'(x,x)}} at each vertex {x}.

Remark 7 (Added later) This remark combines the two previous remarks. One can view any of the matrices {A_{B'}} in Remark 6 as components of a single canonical matrix {A_{Cl}} that is still of dimensions {({\bf Z}/2{\bf Z})^n \times ({\bf Z}/2{\bf Z})^n}, but takes values in the Clifford algebra {Cl({\bf R}^n,I_n)} from Remark 5; with this “universal algebra” perspective, one no longer needs to make any arbitrary choices of form {B}. More precisely, let {\ell^2( ({\bf Z}/2{\bf Z})^n \rightarrow Cl({\bf R}^n,I_n))} denote the vector space of functions {f: ({\bf Z}/2{\bf Z})^n \rightarrow Cl({\bf R}^n,I_n)} from the hypercube to the Clifford algebra; as a real vector space, this is a {2^{2n}} dimensional space, isomorphic to the direct sum of {2^n} copies of {\ell^2(({\bf Z}/2{\bf Z})^n)}, as the Clifford algebra is itself {2^n} dimensional. One can then define a canonical Clifford adjacency operator {A_{Cl}} on this space by

\displaystyle  A_{Cl} f(x) := \sum_{i=1}^n f(x+e_i) v_i

where {v_1,\dots,v_n} are the generators of {Cl({\bf R}^n,I_n)}. This operator can either be identified with a Clifford-valued {2^n \times 2^n} matrix or as a real-valued {2^{2n} \times 2^{2n}} matrix. In either case one still has the key algebraic relations {A_{Cl}^2 = n} and {\mathrm{tr} A_{Cl} = 0}, ensuring that when viewed as a real {2^{2n} \times 2^{2n}} matrix, half of the eigenvalues are equal to {+\sqrt{n}} and half equal to {-\sqrt{n}}. One can then use this matrix in place of any of the {A_{B'}} to establish Theorem 1 (noting that Schur’s test continues to work for Clifford-valued matrices because of the norm structure on {Cl({\bf R}^n,I_n)}).

To relate {A_{Cl}} to the real {2^n \times 2^n} matrices {A_{B'}}, first observe that each point {x} in the hypercube {({\bf Z}/2{\bf Z})^n} can be associated with a one-dimensional real subspace {\ell_x} (i.e., a line) in the Clifford algebra {Cl({\bf R}^n,I_n)} by the formula

\displaystyle  \ell_{e_{i_1} + \dots + e_{i_k}} := \mathrm{span}_{\bf R}( v_{i_1} \dots v_{i_k} )

for any {i_1,\dots,i_k \in \{1,\dots,n\}} (note that this definition is well-defined even if the {i_1,\dots,i_k} are out of order or contain repetitions). This can be viewed as a discrete line bundle over the hypercube. Since {\ell_{x+e_i} = \ell_x v_i} for any {i}, we see that the {2^n}-dimensional real linear subspace {V} of {\ell^2( ({\bf Z}/2{\bf Z})^n \rightarrow Cl({\bf R}^n,I_n))} of sections of this bundle, that is to say the space of functions {f: ({\bf Z}/2{\bf Z})^n \rightarrow Cl({\bf R}^n,I_n)} such that {f(x) \in \ell_x} for all {x \in ({\bf Z}/2{\bf Z})^n}, is an invariant subspace of {A_{Cl}}. (Indeed, using the left-action of the Clifford algebra on {\ell^2( ({\bf Z}/2{\bf Z})^n \rightarrow Cl({\bf R}^n,I_n))}, which commutes with {A_{Cl}}, one can naturally identify {\ell^2( ({\bf Z}/2{\bf Z})^n \rightarrow Cl({\bf R}^n,I_n))} with {Cl({\bf R}^n,I_n) \otimes V}, with the left action of {Cl({\bf R}^n,I_n)} acting purely on the first factor and {A_{Cl}} acting purely on the second factor.) Any trivialisation of this line bundle lets us interpret the restriction {A_{Cl}|_V} of {A_{Cl}} to {V} as a real {2^n \times 2^n} matrix. In particular, given one of the bilinear forms {B'} from Remark 6, we can identify {V} with {\ell^2(({\bf Z}/2{\bf Z})^n)} by identifying any real function {f \in \ell^2( ({\bf Z}/2{\bf Z})^n)} with the lift {\tilde f \in V} defined by

\displaystyle  \tilde f(e_{i_1} + \dots + e_{i_k}) := (-1)^{\sum_{1 \leq j < j' \leq k} B'(e_{i_j}, e_{i_{j'}})}

\displaystyle f(e_{i_1} + \dots + e_{i_k}) v_{i_1} \dots v_{i_k}

whenever {1 \leq i_1 < \dots < i_k \leq n}. A somewhat tedious computation using the properties of {B'} then eventually gives the intertwining identity

\displaystyle  A_{Cl} \tilde f = \widetilde{A_{B'} f}

and so {A_{B'}} is conjugate to {A_{Cl}|_V}.

— 1. The Chung, Furedi, Graham, and Seymour example —

The paper of Chung, Furedi, Graham, and Seymour gives, for any {n \geq 1}, an example of a subset {E} of {Q_n} of cardinality {2^{n-1}+1} for which the maximum degree of {Q_n} restricted to {E} is at most {\lceil \sqrt{n} \rceil}, thus showing that Theorem 1 cannot be improved (beyond the trivial improvement of upgrading {\sqrt{n}} to {\lceil \sqrt{n} \rceil}, because the maximum degree is obviously a natural number).

Define the “Möbius function” {\mu: ({\bf Z}/2{\bf Z})^n \rightarrow \{-1,1\}} to be the function

\displaystyle  \mu(x) := (-1)^{x_1+\dots+x_n}

for {x = (x_1,\dots,x_n) \in ({\bf Z}/2{\bf Z})^n}. This function is extremely balanced on coordinate spaces. Indeed, from the binomial theorem (which uses the convention {0^0=1}) we have

\displaystyle  \sum_{x \in ({\bf Z}/2{\bf Z})^n} \mu(x) = (1-1)^n = 1_{n=0}.

More generally, given any index set {I \subset \{1,\dots,n\}} of cardinality {|I|}, we have

\displaystyle  \sum_{x \in ({\bf Z}/2{\bf Z})^n: x_i=0 \forall i \in I} \mu(x) = (1-1)^{n-|I|} = 1_{n=|I|}.

Now let {I_1,\dots,I_k} be a partition of {\{1,\dots,n\}} into disjoint non-empty sets. For each {j = 1,\dots,k}, let {V_j} be the subspace of {({\bf Z}/2{\bf Z})^n} consisting of those {x = (x_1,\dots,x_n)} such that {x_i = 0} for all {i \in I_j}. Then for any {J \subset \{1,\dots,k\}}, we have

\displaystyle  \sum_{x \in \bigcap_{j \in J} V_j} \mu(x) = (1-1)^{n - \sum_{j \in J} |I_j|} = 1_{n = \sum_{j \in J} |I_j|},

and the right-hand side vanishes if {J \neq \{1,\dots,k\}} and equals {1} when {J = \{1,\dots,k\}}. Applying the inclusion-exclusion principle, we conclude that

\displaystyle  \sum_{x \in ({\bf Z}/2{\bf Z})^n \backslash \bigcup_{j=1}^k V_j} \mu(x) = (-1)^k

and thus also (assuming {n \geq 1})

\displaystyle  \sum_{x \in \bigcup_{j=1}^k V_j} \mu(x) = (-1)^{k+1}

so that

\displaystyle  \sum_{x \in ({\bf Z}/2{\bf Z})^n \backslash \bigcup_{j=1}^k V_j} (-1)^k \mu(x) + \sum_{x \in \bigcup_{j=1}^k V_j} (-1)^{k+1} \mu(x) = 2.

Thus, if {E} denotes the set of those {x \in ({\bf Z}/2{\bf Z})^n \backslash \bigcup_{j=1}^k V_j} with {(-1)^k \mu(x) = 1}, together with those {x \in \bigcup_{j=1}^k V_j} with {(-1)^{k+1} \mu(x) = 1}, then {E} has to have two more elements than its complement {({\bf Z}/2{\bf Z})^n \backslash E}, and hence {E} has cardinality {2^{n-1}+1}.

Now observe that, if {x \in \bigcup_{j=1}^k V_j} with {(-1)^{k+1} \mu(x)=1} and {i = 1,\dots,n}, then {(-1)^k \mu(x+e_i)=1}, and if {x \in V_j} then {x+e_i \in V_j} unless {i \in I_j}. Thus in this case the total number of {i} for which {x+e_i \in E} is at most {\max_j |I_j|}. Conversely, if {x \not \in \bigcup_{j=1}^k V_j} with {(-1)^k \mu(x)=1} and {i=1,\dots,n}, then {(-1)^{k+1} \mu(x+e_i)=1}, and for each {j} there is at most one {i} that will make {x+e_i} lie in {V_j}. Hence in this case the total number of {i} for which {x+e_i \in E} is at most {k}. Thus the maximum degree of the subgraph of {Q_n} induced by {E} is at most {\max( \max_j |I_j|, k )}. By choosing the {I_j} to be a partition of {\{1,\dots,n\}} into {k=\lceil \sqrt{n} \rceil} pieces, each of cardinality at most {\lceil \sqrt{n} \rceil}, we obtain the claim.
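The counting in the last two paragraphs can be verified by brute force for, say, {n=9} and {k=3}; a Python sketch (names mine):

```python
import itertools
import math

n = 9  # a perfect square, so k = ceil(sqrt(n)) = 3
k = math.isqrt(n)
blocks = [list(range(j * k, (j + 1) * k)) for j in range(k)]  # the partition I_1,...,I_k

cube = list(itertools.product((0, 1), repeat=n))
mu = lambda x: (-1) ** sum(x)                          # the "Mobius function"
in_Vj = lambda x, j: all(x[i] == 0 for i in blocks[j])
in_union = lambda x: any(in_Vj(x, j) for j in range(k))

E = {x for x in cube
     if (not in_union(x) and (-1) ** k * mu(x) == 1)
     or (in_union(x) and (-1) ** (k + 1) * mu(x) == 1)}
assert len(E) == 2 ** (n - 1) + 1  # two more elements than the complement

def degree(x):  # number of neighbours of x inside E
    return sum(tuple((x[m] + (m == i)) % 2 for m in range(n)) in E
               for i in range(n))

# at most max(max_j |I_j|, k) = 3 by the argument above, and at least 3 by Theorem 1
assert max(degree(x) for x in E) == math.ceil(math.sqrt(n))
```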

Remark 8 Suppose that {n = k^2} is a perfect square, then the lower bound here exactly matches the upper bound in Theorem 1. In particular, the {E \times E} minor of the matrix {\tilde A} must have an eigenvector of eigenvalue {\sqrt{n}}. Such an eigenvector can be explicitly constructed as follows. Let {f \in \ell^2(E)} be the vector defined by setting

\displaystyle  f(x) := (-1)^{\sum_{1 \leq j < j' \leq k} B( e_{i_j}, e_{i_{j'}} )}

whenever {x} is of the form

\displaystyle  x = \sum_{j=1}^k e_{i_j} \ \ \ \ \ (3)

for some {i_j \in I_j}, {j =1,\dots,k}, and

\displaystyle  f(x) := (-1)^{\sum_{1 \leq j < j' \leq k: j,j' \neq j_0} B( e_{i_j}, e_{i_{j'}} ) + k - j_0}

whenever {x} is of the form

\displaystyle  x = \sum_{1 \leq j \leq k: j \neq j_0} e_{i_j} \ \ \ \ \ (4)

for some {i_j \in I_j}, {j =1,\dots,k}, {j \neq j_0}, and {f(x)=0} for all other {x \in E} (one easily verifies that the previous types of {x} lie in {E}). We claim that

\displaystyle  A_B f(x) = k f(x)

for all {x \in E}. Expanding out the left-hand side, we wish to show that

\displaystyle  \sum_{1 \leq i \leq n: x+e_i \in E} (-1)^{B(x,e_i)} f(x+e_i) = k f(x) \ \ \ \ \ (5)

for all {x \in E}.

First suppose that {x} is of the form (3). One checks that {x+e_i} lies in {E} precisely when {i = i_{j_0}} for one of the {j_0=1,\dots,k}, in which case

\displaystyle  f(x+e_{i_{j_0}}) = f(x) (-1)^{\sum_{1 \leq j < j_0} B( e_{i_j}, e_{i_{j_0}}) + \sum_{j_0 < j \leq k} B(e_{i_{j_0}}, e_{i_j}) + k-j_0}.

Since {B(e_{i_{j_0}}, e_{i_j}) = B(e_{i_j}, e_{i_{j_0}}) + 1}, this simplifies using (3) as

\displaystyle  (-1)^{B(x,e_{i_{j_0}})} f(x + e_{i_{j_0}}) = f(x)

giving (5) in this case. Similarly, if {x} is of the form (4), then {x+e_i} lies in {E} precisely when {i \in I_{j_0}}, in which case one can argue as before to show that

\displaystyle  (-1)^{B(x,e_{i})} f(x + e_{i}) = f(x)

and (5) again follows. Finally, if {x \in E} is not of either of the two forms (3), (4), one can check that {x+e_i} is never of these forms either, and so both sides of (5) vanish.

The same analysis works for any of the other bilinear forms {B'} in Remark 6. Using the Clifford-valued operator {A_{Cl}} from Remark 7, the eigenfunction {f \in \ell^2( E \rightarrow Cl({\bf R}^n,I_n) )} is cleaner; it is defined by

\displaystyle  f(x) := v_{i_1} \dots v_{i_k}

when {x} is of the form (3), and

\displaystyle  f(x) := (-1)^{k-j_0} v_{i_1} \dots v_{i_{j_0-1}} v_{i_{j_0+1}} \dots v_{i_k}

when {x} is of the form (4), with {f(x)=0} otherwise.

— 2. From induced subgraph bounds to the sensitivity conjecture —

On the hypercube {({\bf Z}/2{\bf Z})^n}, let {X_1,\dots,X_n: ({\bf Z}/2{\bf Z})^n \rightarrow \{-1,1\}} denote the functions

\displaystyle  X_i(x) := (-1)^{x_i}.

The monomials {\prod_{i \in I} X_i} in the {X_i} are then the characters of {({\bf Z}/2{\bf Z})^n}, so by Fourier expansion every function {f \in \ell^2(({\bf Z}/2{\bf Z})^n)} can be viewed as a polynomial in the {X_i} (with each monomial containing at most one copy of each {X_i}; higher powers of each {X_i} are unnecessary since {X_i^2=1}). In particular, one can meaningfully talk about the degree of a function {f \in \ell^2(({\bf Z}/2{\bf Z})^n)}. Observe also that the Möbius function {\mu} from the preceding section is just the monomial {X_1 \dots X_n}.

Define the sensitivity {s(f)} of a Boolean function {f: ({\bf Z}/2{\bf Z})^n \rightarrow \{-1,1\}} to be the largest number for which there is an {x} such that there are at least {s(f)} values of {i = 1,\dots,n} with {f(x+e_i) \neq f(x)}. Using an argument of Gotsman and Linial, we can now relate the sensitivity of a function to its degree:

Corollary 9 (Lower bound on sensitivity) For any Boolean function {f: ({\bf Z}/2{\bf Z})^n \rightarrow \{-1,1\}}, one has {s(f) \geq \sqrt{\mathrm{deg}(f)}}.

Proof: Write {m = \mathrm{deg}(f)}. By permuting the indices, we may assume that {f} contains a non-trivial multiple of the monomial {X_1 \dots X_m}. By restricting to the subspace {({\bf Z}/2{\bf Z})^m \times \{0\}^{n-m} \equiv ({\bf Z}/2{\bf Z})^m} (which cannot increase the sensitivity), we may then assume without loss of generality that {\mathrm{deg}(f)=m=n}. The Fourier coefficient of {X_1 \dots X_n} is just the mean value

\displaystyle  \frac{1}{2^n} \sum_{x \in ({\bf Z}/2{\bf Z})^n} f(x) X_1(x) \dots X_n(x) = \frac{1}{2^n} \sum_{x \in ({\bf Z}/2{\bf Z})^n} f(x) \mu(x)

of {f} times the Möbius function {\mu}, so this mean value is non-zero. This means that one of the sets {\{ \mu(x) f(x) = +1\}} or {\{\mu(x) f(x)=-1\}} has cardinality at least {2^{n-1}+1}. Let {E} denote the larger of these two sets. By Theorem 1, there is an {x \in E} such that {x + e_i \in E} for at least {\sqrt{n}} values of {i}; since {\mu(x+e_i) = -\mu(x)}, this implies that {f(x+e_i) \neq f(x)} for at least {\sqrt{n}} values of {i}, giving the claim. \Box
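Corollary 9 is easy to test by brute force for small {n}, computing the degree from the Fourier coefficients and the sensitivity directly; a Python sketch with my own helper names:

```python
import itertools
import math
import random

n = 4
cube = list(itertools.product((0, 1), repeat=n))

def degree(f):
    # largest |S| with non-zero Fourier coefficient sum_x f(x) (-1)^{sum_{i in S} x_i}
    d = 0
    for S in itertools.chain.from_iterable(
            itertools.combinations(range(n), r) for r in range(n + 1)):
        if sum(f[x] * (-1) ** sum(x[i] for i in S) for x in cube) != 0:
            d = max(d, len(S))
    return d

def sensitivity(f):
    def flips(x):  # number of coordinates i with f(x+e_i) != f(x)
        return sum(f[tuple((x[m] + (m == i)) % 2 for m in range(n))] != f[x]
                   for i in range(n))
    return max(flips(x) for x in cube)

random.seed(0)
for _ in range(20):
    f = {x: random.choice((-1, 1)) for x in cube}
    assert sensitivity(f) >= math.sqrt(degree(f))  # Corollary 9
```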

The construction of Chung, Furedi, Graham, and Seymour from the previous section can be easily adapted to show that this lower bound is tight (other than the trivial improvement of replacing {\sqrt{\mathrm{deg}(f)}} by {\lceil \sqrt{\mathrm{deg}(f)} \rceil}).

Now we need to digress on some bounds involving polynomials of one variable. We begin with an inequality of Bernstein concerning trigonometric polynomials:

Lemma 10 (Bernstein inequality) Let {P(\theta)} be a trigonometric polynomial of degree at most {n}, that is to say a complex linear combination of {\sin(k \theta), \cos(k \theta)} for {k=0,\dots,n}. Then

\displaystyle  \sup_{\theta \in {\bf R}/2\pi {\bf Z}} |P'(\theta)| \leq n \sup_{\theta \in {\bf R}/2\pi {\bf Z}} |P(\theta)|.

Observe that equality holds when {P(\theta)=\sin(n\theta)} or {P(\theta) = \cos(n\theta)}. Specialising to linear combinations of {e^{ik\theta} = \cos(k\theta) + i \sin(k \theta)}, we obtain the classical Bernstein inequality

\displaystyle  \sup_{|z|=1} |P'(z)| \leq n \sup_{|z|=1} |P(z)|

for complex polynomials of degree at most {n}.

Proof: If one is willing to lose a constant factor in this estimate, this bound can be easily established from modern Littlewood-Paley theory (see e.g., Exercise 52 of these lecture notes). Here we use an interlacing argument due to Boas. We first restrict to the case when {P} has real coefficients. We may normalise {\sup_{\theta \in {\bf R}/2\pi {\bf Z}} |P(\theta)| =1}. Let {\lambda} be a real parameter in {(-1,1)}. The trigonometric polynomial {\cos(n\theta)} alternately takes the values {+1} and {-1} at the {2n} values {\frac{j\pi}{n}, j=0,\dots,2n-1}. Thus the trigonometric polynomial {\cos(n\theta) + \lambda P(\theta)} alternates in sign at these values, and thus by the intermediate value theorem has a zero on each of the intervals {[\frac{j\pi}{n}, \frac{(j+1)\pi}{n}]}. On the other hand, a trigonometric polynomial of degree at most {n} can be expressed by de Moivre’s theorem as {e^{-in\theta}} times a complex polynomial in {e^{i\theta}} of degree at most {2n}, and thus has at most {2n} zeroes. Thus we see that {\cos(n\theta) + \lambda P(\theta)} has exactly one zero in each {[\frac{j\pi}{n}, \frac{(j+1)\pi}{n}]}. Furthermore, at this zero, the derivative {-n \sin(n \theta) + \lambda P'(\theta)} of this function must be positive if {\cos(n\theta)} is increasing on this interval, and negative if {\cos(n\theta)} is decreasing on this interval. In summary, we have shown that if {\lambda \in (-1,1)} and {\theta} is such that {\cos(n\theta) + \lambda P(\theta)=0}, then {-n \sin(n \theta) + \lambda P'(\theta)} has the same sign as {-n\sin(n\theta)}. By translating the function {\cos(n\theta)}, we also conclude that if {\lambda \in (-1,1)} and {\cos(n(\theta+\alpha)) + \lambda P(\theta)=0} for some {\theta} and some {\alpha \in {\bf R}}, then {-n \sin(n(\theta+\alpha)) + \lambda P'(\theta)} has the same sign as {-n\sin(n(\theta+\alpha))}.

If {\lambda \in (-1,1)}, then we can find {\alpha} such that {\cos(n(\theta+\alpha)) + \lambda P(\theta)=0} and {n\sin(n(\theta+\alpha))} is positive, and we conclude that {\lambda P'(\theta) < n \sin(n(\theta+\alpha)) \leq n}; thus we have the upper bound

\displaystyle  P'(\theta) \leq n \sup_{\theta \in {\bf R}/2\pi {\bf Z}} |P(\theta)|.

A similar argument (with {n \sin(n(\theta+\alpha))} now chosen to be negative) bounds {-P'(\theta)}. This gives the claim for real-valued trigonometric polynomials {P}. (Indeed, this argument even gives the slightly sharper bound {\sup_{\theta} ( \frac{1}{n^2} |P'(\theta)|^2 + |P(\theta)|^2 ) \leq \sup_{\theta} |P(\theta)|^2}.)

To amplify this to complex valued polynomials, we take advantage of phase rotation invariance. If {P} is a complex trigonometric polynomial, then by applying Bernstein’s inequality to the real part {\mathrm{Re} P} we have

\displaystyle  \sup_{\theta \in {\bf R}/2\pi {\bf Z}} |\mathrm{Re} P'(\theta)| \leq n \sup_{\theta \in {\bf R}/2\pi {\bf Z}} |P(\theta)|.

But then we can multiply {P} by any complex phase {e^{i\alpha}} and conclude that

\displaystyle  \sup_{\theta \in {\bf R}/2\pi {\bf Z}} |\mathrm{Re} e^{i\alpha} P'(\theta)| \leq n \sup_{\theta \in {\bf R}/2\pi {\bf Z}} |P(\theta)|.

Taking suprema in {\alpha}, one obtains the claim for complex polynomials {P}. \Box

The analogue of Bernstein’s inequality for the unit interval is known as Markov’s inequality for polynomials:

Lemma 11 (Markov’s inequality for polynomials) Let {P: {\bf R} \rightarrow {\bf R}} be a polynomial of degree {n}. Then

\displaystyle  \sup_{x \in [-1,1]} |P'(x)| \leq n^2 \sup_{x \in [-1,1]} |P(x)|.

This bound is sharp, as is seen by inspecting the Chebyshev polynomial {T_n}, defined as the unique polynomial giving the trigonometric identity

\displaystyle  T_n(\cos \theta) := \cos(n \theta). \ \ \ \ \ (6)

Differentiating (6) using the chain rule, we see that

\displaystyle  T'_n(\cos \theta) = n \frac{\sin(n \theta)}{\sin(\theta)}; \ \ \ \ \ (7)

the right-hand side approaches {n^2} as {\theta \rightarrow 0}, demonstrating that the {n^2} factor here is sharp.

Proof: We again use an argument of Boas. We may normalise so that

\displaystyle  \sup_{x \in [-1,1]} |P(x)| = 1. \ \ \ \ \ (8)

The function {P(\cos \theta)} is a trigonometric polynomial of degree at most {n}, so by Bernstein’s inequality and the chain rule we have

\displaystyle  |\sin \theta| |P'(\cos \theta)| \leq n \ \ \ \ \ (9)

for all {\theta}, and thus

\displaystyle  |P'(x)| \leq \frac{n}{\sqrt{1-x^2}}

for all {x \in [-1,1]}. This already gives Markov’s inequality except in the edge regions {|x| \geq \sqrt{1 - \frac{1}{n^2}} \geq \cos \frac{\pi}{2n}} (since {\sin \frac{\pi}{2n} \geq \frac{1}{n} \sin \frac{\pi}{2} = \frac{1}{n}}). By reflection symmetry, it then suffices to verify Markov’s inequality in the region {\cos \frac{\pi}{2n} \leq x \leq 1}.

From (6), the Chebyshev polynomial {T_n} attains the values {\pm 1} alternately at the {n+1} different points {\cos(\frac{j\pi}{n}), j=0,\dots,n}. Thus, if {\lambda \in (-1,1)}, the polynomial {T_n(x)+\lambda P(x)} changes sign at least {n} times on {[-1,1]}, and thus must have all {n} zeroes inside this interval by the intermediate value theorem; furthermore, {n-1} of these zeroes will lie to the left of {\cos(\frac{\pi}{n})}. By Rolle’s theorem, the derivative {T'_n(x) + \lambda P'(x)} then has all {n-1} zeroes in the interval {[-1,1]}, and at least {n-2} of these will lie to the left of {\cos( \frac{\pi}{n} )}. In particular, the derivative {T'_n(x) + \lambda P'(x)} can have at most one zero to the right of {\cos(\frac{\pi}{2n} ) \geq \cos(\frac{\pi}{n})}.

Since {T_n(1)=1}, {T_n(x)+\lambda P(x)} is positive at {x=+1}, and hence positive as {x \rightarrow \infty} since there are no zeroes outside of {[-1,1]}. Thus the leading coefficient of {T_n(x) + \lambda P(x)} is positive, which implies the same for its derivative {T'_n(x) + \lambda P'(x)}. Thus {T'_n(x) + \lambda P'(x)} is positive when {x \rightarrow \infty}.

From (9) one has {|P'(\cos(\frac{\pi}{2n}))| \leq \frac{n}{\sin(\pi/2n)}}, hence by (7) we see that {T'_n(x) + \lambda P'(x)} is also positive at {x = \cos(\frac{\pi}{2n})}. Thus {T'_n(x) + \lambda P'(x)} cannot become negative for {x > \cos(\frac{\pi}{2n})}, as this would create at least two zeroes to the right of {\cos(\frac{\pi}{2n})}. We conclude that in this region we have

\displaystyle  |P'(x)| \leq T'_n(x).

From (7) we have {|T'_n(x)| \leq n^2}, and the claim follows. \Box
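The sharpness of Markov's inequality at the Chebyshev polynomials can also be seen numerically; here is a short sketch using NumPy's Chebyshev class:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

xs = np.linspace(-1.0, 1.0, 20001)  # dense grid on [-1,1], including the endpoints
for n in (3, 5, 8):
    T = C.Chebyshev.basis(n)                # the Chebyshev polynomial T_n
    sup_P = np.max(np.abs(T(xs)))
    sup_dP = np.max(np.abs(T.deriv()(xs)))
    assert abs(sup_P - 1.0) < 1e-9          # sup |T_n| = 1 on [-1,1]
    assert abs(sup_dP - n ** 2) < 1e-6      # sup |T'_n| = n^2, attained at x = +-1
```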

Remark 12 The following slightly shorter argument gives the slightly weaker bound {\sup_{x \in [-1,1]} |P'(x)| \leq n \mathrm{cosec}(1/n) \sup_{x \in [-1,1]} |P(x)|}. We again use the normalisation (8). By two applications of Bernstein’s inequality, the function {\theta \mapsto P(\cos(\theta))} has first derivative bounded in magnitude by {n}, and second derivative bounded in magnitude by {n^2}. As this function also has vanishing first derivative at {0}, we conclude the bounds

\displaystyle  |\frac{d}{d\theta} P(\cos \theta)| \leq \min( n^2 |\theta|, n )

and thus by the chain rule

\displaystyle  |P'(\cos \theta)| \leq \min( n^2 |\theta|, n ) |\mathrm{cosec}(\theta)|.

For {|\theta| \leq \pi}, one easily checks that the right-hand side is at most {n \mathrm{cosec}(1/n)}, giving the claim.

This implies a result of Ehlich-Zeller and of Rivlin-Cheney:

Corollary 13 (Discretised Markov inequality) Let {P: {\bf R} \rightarrow {\bf R}} be a polynomial of degree {d}. If

\displaystyle  \sup_{x \in [0,n]} |P'(x)| > A \sup_{j =0,\dots,n} |P(j)| \ \ \ \ \ (10)

then we have {d > \sqrt{\frac{An}{A+2}}}.

Proof: We use an argument of Nisan and Szegedy. Assume for sake of contradiction that {d \leq \sqrt{\frac{An}{A+2}}}, so in particular {d < \sqrt{n}}. From the fundamental theorem of calculus and the triangle inequality one has

\displaystyle  \sup_{x \in [0,n]} |P(x)| \leq \sup_{j =0,\dots,n} |P(j)| + \frac{1}{2} \sup_{x \in [0,n]} |P'(x)|.

By a rescaled and translated version of Markov’s inequality we have

\displaystyle  \sup_{x \in [0,n]} |P'(x)| \leq \frac{2d^2}{n} \sup_{x \in [0,n]} |P(x)| \ \ \ \ \ (11)

which when inserted into the preceding inequality gives after some rearranging

\displaystyle  \sup_{x \in [0,n]} |P(x)| \leq (1 - \frac{d^2}{n})^{-1} \sup_{j =0,\dots,n} |P(j)|

and then after a second application of (11) gives

\displaystyle  \sup_{x \in [0,n]} |P'(x)| \leq \frac{2d^2}{n} (1 - \frac{d^2}{n})^{-1} \sup_{j =0,\dots,n} |P(j)|.

Comparing with (10), we conclude that

\displaystyle  A < \frac{2d^2}{n} (1 - \frac{d^2}{n})^{-1}

and the claim follows after some rearranging. \Box
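Corollary 13 can be sanity-checked numerically: for any concrete polynomial of degree {d}, taking {A} to be the ratio {\sup_{[0,n]} |P'| / \sup_j |P(j)|}, the corollary (read in contrapositive, letting the constant in (10) tend to this ratio) forces {d^2 \geq An/(A+2)}. A sketch for the degree-{d} Chebyshev polynomial rescaled to {[0,n]}, with the suprema approximated on a fine grid:

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

def check(d, n):
    # P(x) = T_d(2x/n - 1), the Chebyshev polynomial rescaled to [0, n]
    x = np.linspace(0.0, n, 50001)
    Td = Chebyshev.basis(d)
    P = lambda t: Td(2.0 * t / n - 1.0)
    dP = lambda t: Td.deriv()(2.0 * t / n - 1.0) * 2.0 / n  # chain rule
    A = np.abs(dP(x)).max() / np.abs(P(np.arange(n + 1))).max()
    # Corollary 13 in contrapositive form: d^2 >= A n / (A + 2)
    assert d * d >= A * n / (A + 2) - 1e-9

for n in (10, 20, 50):
    for d in range(1, 8):
        check(d, n)
```

(Underestimating the supremum of {|P'|} on the grid only lowers the right-hand side, so the grid approximation cannot produce a false failure.)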

Nisan and Szegedy observed that this one-dimensional degree bound can be lifted to the hypercube by a symmetrisation argument:

Corollary 14 (Multidimensional Markov inequality bound) Let {f: ({\bf Z}/2{\bf Z})^n \rightarrow [-1,1]} be such that {f(0)=1} and {f(e_i)=-1} for {i=1,\dots,n}. Then

\displaystyle  \mathrm{deg}(f) \geq \sqrt{n/2}.

Proof: By averaging {f} over all permutations of the indices (which can decrease the degree of {f}, but not increase it), we may assume that {f} is a symmetric function of the inputs {x_1,\dots,x_n}. Using the Newton identities, we can then write

\displaystyle  f(x) = P(|x|)

for some real polynomial {P: {\bf R} \rightarrow {\bf R}} of degree at most {\mathrm{deg}(f)}, where

\displaystyle  |x| = \# \{ i =1,\dots,n: x_i = 1 \}

is the Hamming length of {x}. By hypothesis, {\sup_{j = 0,\dots,n} |P(j)| \leq 1}, {P(0)=1}, and {P(1)=-1}, hence by the mean-value theorem {\sup_{x \in [0,n]} |P'(x)| \geq 2}. Applying Corollary 13 with {A=2}, we obtain the claim. \Box
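For small {n} one can confirm Corollary 14 exhaustively (a brute-force sketch, not from the post): enumerate every {f} on the {3}-cube with {f(0)=1} and {f(e_i)=-1}, read off {\mathrm{deg}(f)} from the standard Walsh-Fourier expansion, and check {\mathrm{deg}(f) \geq \sqrt{n/2}}.

```python
from itertools import product

n = 3
N = 1 << n
basis = [1 << i for i in range(n)]  # e_1, ..., e_n as bitmasks

def degree(f):
    # deg(f) = largest |S| with nonzero Fourier coefficient
    # \hat f(S) \propto sum_x f(x) (-1)^{|x AND S|}
    deg = 0
    for S in range(N):
        coeff = sum(f[x] * (-1) ** bin(x & S).count("1") for x in range(N))
        if coeff != 0:
            deg = max(deg, bin(S).count("1"))
    return deg

count = 0
for f in product([-1, 1], repeat=N):
    if f[0] == 1 and all(f[e] == -1 for e in basis):
        assert degree(f) ** 2 >= n / 2  # Corollary 14
        count += 1
assert count == 1 << (N - n - 1)  # 16 admissible functions on the 3-cube
```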

Define the block sensitivity {bs(f)} of a Boolean function {f: ({\bf Z}/2{\bf Z})^n \rightarrow \{-1,1\}} to be the largest number for which there is an {x} such that there are at least {bs(f)} disjoint subsets {I_1,\dots,I_{bs(f)}} of {\{1,\dots,n\}} with {f(x+\sum_{i \in I_j} e_i) \neq f(x)} for {j=1,\dots,bs(f)}. We have

Theorem 15 (Sensitivity conjecture) One has

\displaystyle  s(f) \leq bs(f) \leq 2 s(f)^4.

More precisely, the sensitivity conjecture of Nisan and Szegedy asserted the bound {bs(f) = O( s(f)^{O(1)})}; Huang’s result thus gives explicit values for the exponents. It is still open whether the exponent {4} in this theorem can be improved; it is known that one cannot improve it to below {2}, by analysing a variant of the Chung-Furedi-Graham-Seymour example (see these notes of Regan for details).

Proof: The lower bound for {bs(f)} is immediate from the definitions, since the sensitivity arises by restricting the {I_j} in the definition of block sensitivity to singleton sets. To prove the upper bound, it suffices from Proposition 9 to establish the bound

\displaystyle  \mathrm{deg}(f) \geq \sqrt{bs(f)/2}.

Let {m = bs(f)}. By hypothesis, there are {x \in ({\bf Z}/2{\bf Z})^n} and {m} disjoint subsets {I_1,\dots,I_m} of {\{1,\dots,n\}} such that {f(x+\sum_{i \in I_j} e_i) \neq f(x)} for {j=1,\dots,m}. We may normalise {x=0}, {f(0)=1} and {f(\sum_{i\in I_j}e_i)=-1}. If we then define the pullback boolean function {\tilde f: ({\bf Z}/2{\bf Z})^m \rightarrow \{-1,1\}} by the formula

\displaystyle  \tilde f( y_1,\dots,y_m) := f( \sum_{j=1}^m y_j \sum_{i \in I_j} e_i )

then it is easy to see that {\mathrm{deg}(\tilde f) \leq \mathrm{deg}(f)}, {\tilde f(0)=1}, and {\tilde f(e_i)=-1} for {i=1,\dots,m}. The claim now follows from Corollary 14. \Box
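The inequalities of Theorem 15 can also be verified exhaustively for small {n} (a brute-force sketch, not from the post; block sensitivity is computed by searching over families of pairwise disjoint flipping blocks):

```python
from itertools import product, combinations

n = 3
N = 1 << n
masks = range(1, N)  # nonempty subsets of {1,...,n} as bitmasks

def sensitivity(f):
    return max(sum(f[x] != f[x ^ (1 << i)] for i in range(n))
               for x in range(N))

def block_sensitivity(f):
    best = 0
    for x in range(N):
        flips = [m for m in masks if f[x ^ m] != f[x]]
        for k in range(len(flips), best, -1):
            # is there a family of k pairwise-disjoint flipping blocks?
            if any(all(a & b == 0 for a, b in combinations(fam, 2))
                   for fam in combinations(flips, k)):
                best = k
                break
    return best

for f in product([-1, 1], repeat=N):
    s, bs = sensitivity(f), block_sensitivity(f)
    assert s <= bs <= 2 * s ** 4  # Theorem 15
```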

Remark 16 The following slightly shorter variant of the argument lets one remove the factor {2}. Let {m, I_1,\dots,I_m} be as above. We again may normalise {f(0)=1} and {f(\sum_{i \in I_j} e_i)=-1}. For any {\theta}, let {\xi_{\theta,1},\dots,\xi_{\theta,m} \in \{0,1\}} be iid Bernoulli random variables that equal {1} with probability {\sin^2 \theta} and {0} with probability {\cos^2 \theta}. The quantity

\displaystyle  F(\theta) := \mathop{\bf E} f( \sum_{j=1}^m \xi_{\theta,j} \sum_{i \in I_j} e_i )

is a trigonometric polynomial of degree at most {2\mathrm{deg}(f)} that is bounded in magnitude by {1}, so by two applications of Bernstein’s inequality

\displaystyle  |F''(0)| \leq 4 \mathrm{deg}(f)^2.

On the other hand, for small {\theta}, the random variable {\sum_{j=1}^m \xi_{\theta,j} \sum_{i \in I_j} e_i} is equal to zero with probability {1 - m \theta^2 + O(\theta^3)} and equal to each {\sum_{i \in I_j} e_i} with probability {\theta^2 + O(\theta^3)}, hence

\displaystyle  F(\theta) = 1 - 2m \theta^2 + O(\theta^3)

and hence {F''(0) = - 4m}. Combining these estimates we obtain {\mathrm{deg}(f) \geq \sqrt{bs(f)}} and hence {bs(f) \leq s(f)^4}.
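The Taylor expansion in this remark can be checked on a concrete example (my own illustration, not from the post): take {m=2}, {I_1 = \{1\}}, {I_2 = \{2\}}, and {f} the parity function on two bits, so that {F(\theta)} works out to {\cos^2(2\theta)} and {F''(0) = -8 = -4m}. A sketch computing {F} as an explicit expectation and differencing numerically:

```python
import math
from itertools import product

m = 2  # blocks I_1 = {1}, I_2 = {2}; f = parity, f(x) = (-1)^{x_1 + x_2}

def F(theta):
    p = math.sin(theta) ** 2  # P(xi_j = 1)
    return sum(
        (p if a else 1 - p) * (p if b else 1 - p) * (-1) ** (a + b)
        for a, b in product((0, 1), repeat=2)
    )

# central second difference approximates F''(0)
h = 1e-3
second = (F(h) - 2 * F(0.0) + F(-h)) / h ** 2
assert abs(second - (-4 * m)) < 1e-3                   # F''(0) = -4m
assert abs(F(0.3) - math.cos(2 * 0.3) ** 2) < 1e-12    # F = cos^2(2 theta)
```

Here {\mathrm{deg}(f) = 2}, so the Bernstein bound {|F''(0)| \leq 4\,\mathrm{deg}(f)^2 = 16} comfortably accommodates {|F''(0)| = 8}.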

Remark 17 The sensitivity {s(f)} of a Boolean function {f: ({\bf Z}/2{\bf Z})^n \rightarrow \{-1,1\}} can be split as {s(f) = \max( s_{+1}(f), s_{-1}(f))}, where {s_c(f)} is the largest number for which there is an {x} such that there are at least {s_c(f)} values of {i = 1,\dots,n} with {f(x+e_i) \neq f(x) = c}. It is not difficult to use the {k=2} case of Remark 3 to improve Corollary 9 slightly to {s_{+1}(f) s_{-1}(f) \geq \mathrm{deg}(f)}. Combining this with the previous remark, we can thus improve the upper bound in Theorem 15 slightly to

\displaystyle  bs(f) \leq s_{+1}(f)^2 s_{-1}(f)^2.
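This refined bound can likewise be checked by brute force on the {3}-cube (a sketch, again my own illustration, using the obvious exhaustive computations of {s_{\pm 1}(f)} and {bs(f)}):

```python
from itertools import product, combinations

n = 3
N = 1 << n

def signed_sensitivity(f, c):
    # s_c(f): max over x with f(x) = c of #{i : f(x + e_i) != c}
    return max((sum(f[x] != f[x ^ (1 << i)] for i in range(n))
                for x in range(N) if f[x] == c), default=0)

def block_sensitivity(f):
    best = 0
    for x in range(N):
        flips = [m for m in range(1, N) if f[x ^ m] != f[x]]
        for k in range(len(flips), best, -1):
            if any(all(a & b == 0 for a, b in combinations(fam, 2))
                   for fam in combinations(flips, k)):
                best = k
                break
    return best

for f in product([-1, 1], repeat=N):
    sp, sm = signed_sensitivity(f, 1), signed_sensitivity(f, -1)
    assert block_sensitivity(f) <= sp ** 2 * sm ** 2  # Remark 17
```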