Myth creation: The switching lemma

Thoughts 2022-09-14

The history of science is littered with anecdotes about misplaced credit. Because it does not matter if it was A or B who did it; it only matters if it was I or not I. In this spirit I am starting a series of posts about such misplaced credit, which I hesitated before calling more colorfully “myth creation.” Before starting, I want to make absolutely clear that I am in no way criticizing the works themselves or their authors. In fact, many are among my favorites. Moreover, at least in the examples I have in mind right now, the authors do place their work in the appropriate context with the use of citations etc. My only point is the credit that the work has received within and without our community (typically due to inertia and snowball effects rather than anything else).

Of course, at some level this doesn’t matter. You can call Chebichev’s polynomials rainbow sprinkles and the math doesn’t change. And yet at some other level maybe it does matter a little, for science isn’t yet a purely robotic activity. With these posts I will advertise unpopular points of views that might be useful, for example to researchers who are junior or from different communities.

The switching lemma

I must admit I had a good run — Johan Hastad (privately to the blogger)

Random restrictions have been used in complexity theory since at least the 60’s [Sub61]. The first dramatic use in the context of AC0 is due to [FSS84, Ajt83]. These works proved a switching lemma the amazing fact that a DNF gets simplified by a random restriction to the point that it can be written as a CNF, so you can collapse layers and induct. (An exposition is given below.) Using it, they proved super-polynomial lower bounds for AC0. The proof in [FSS84] is very nice, and if I want to get a quick intuition of why switching is at all possible, I often go back to it. [Ajt83] is also a brilliant paper, and long, unavailable online for free, filled with a logical notation which makes some people twitch. The first symbol of the title says it all, and may be the most obscene ever chosen:

$\begin{aligned} \Sigma _{1}^{1}. \end{aligned}$

Subsequently, [Yao85] proved exponential lower bounds of the form $2^{n^{c}}$ , with a refined analysis of the switching lemma. The bounds are tight, except for the constant $c$ which depends on the depth of the circuit. Finally, the star of this post [Has86, Has87] obtained $c=1/(depth-1)$ .

Yao’s paper doesn’t quite state that a DNF can be written exactly as a CNF, but it states that it can be approximated. Hastad’s work is the first to prove that a DNF can be written as a CNF, and in this sense his statement is cleaner than Yao’s. However, Yao’s paper states explicitly that a small circuit, after being hit by a restriction, can be set to constant by fixing few more bits.

The modern formulation of the switching lemma says that a DNF can be written as a shallow decision tree (and hence a small CNF). This formulation in terms of decision trees is actually not explicit in Hastad’s work. Beame, in his primer [Bea94], credits Cai with this idea and mentions several researchers noted Hastad’s proof works in this way.

Another switching lemma trivia is that the proof in Hastad’s thesis is actually due to Boppana; Hastad’s original argument — of which apparently no written record exists — was closer to Razborov’s later proof.

So, let’s recap. Random restrictions are already in [Sub61]. The idea of switching is already in [FSS84, Ajt83]. You already had three analyses of these ideas, two giving superpolynomial lower bounds and one [Yao85] giving exponential. The formulation in terms of decision trees isn’t in [Has87], and the proof that appears in [Has87] is due to Boppana.

Still, I would guess [Has87] is more well known than all the other works above combined. [Yao85] did have a following at the time — I think it appeared in the pop news. But hey — have you ever heard of Yao’s switching lemma?

The current citation counts offer mixed support for my thesis:

FSS: 1351

Y: 732

H – paper “Almost optimal…:” 867

H – thesis: 582

But it is very hard to use citation information. The two H citations overlap, and papers are cited for various reasons. For example FSS got a ton of citations for the connection to oracles (which has nothing to do with switching lemmas).

Instead it’s instructive to note the type of citations that you can find in the literature:

Hastad’s switching lemma is a cornerstone of circuit complexity [No mention of FSS, A, Y]

Hastad‘s Switching Lemma is one of the gems of computational complexity [Notes below in passing it builds on FSS, A, Y]

The wikipedia entry is also telling:

In computational complexity theory, Hastad’s switching lemma is a key tool for proving lower bounds on the size of constant-depth Boolean circuits. Using the switching lemma, Johan Hastad (1987) showed that... [No mention of FSS,A,Y]

I think that 99% of the contribution of this line of research is the amazing idea that random restrictions simplify a DNF so that you can write it as a CNF and collapse. 90% of the rest is analyzing this to get superpolynomial lower bounds. And 90% of whatever is left is analyzing this to get exponential lower bounds.

Going back to something I mentioned at the beginning, I want to emphasize that Hastad during talks makes a point of reminding the audience that the idea of random restrictions is due to Sipser, and of Boppana’s contribution. And I also would like to thank him for his help with this post.

OK — so maybe this is so, but it must then be the case that [Has87] is the final word on this stuff, like the ultimate tightest analysis that kills the problem. Actually, it is not tight in some regimes of interest, and several cool works of past and recent times address that. In the end, I can only think of one reason why [Has87] entered the mythology in ways that other works did not, the reason that I carefully sidestepped while composing this post: å.

Perhaps one reason behind the aura of the switching lemma is that it’s hard to find examples. It would be nice to read: If you have this extreme DNF here’s what happens, on the other hand for this other extreme DNF here’s what happens, and in general this always works and here’s the switching lemma. Examples are forever – Erdos. Instead the switching lemma is typically presented as blam!: an example-free encoding argument which feels deus ex machina, as in this crisp presentation by Thapen. For a little more discussion, I liked Bogdanov’s lecture notes. Next I give a slightly different exposition of the encoding argument.

The simplest case: Or of $n$ bits.

Here the circuit $C$ is simply the Or of $n$ bits $x_{1},x_{2},\ldots ,x_{n}$ . This and the next case can be analyzed in more familiar ways, but the benefit of the encoding argument presented next is that it will extend to the general case more easily… arguably. Anyway, it’s also just fun to learn a different argument.

So, let’s take a random restriction $\rho$ with exactly $s$ stars. Some of the bits may become $0$ , others $1$ , and others yet may remain unfixed, i.e., assigned to stars. Those that become $0$ you can ignore, while if some become $1$ then the whole circuit becomes $1$ .

We will show that the number of restrictions for which the restricted circuit $C|_{\rho }$ requires decision trees of depth $\ge d$ is small. To accomplish this, we are going to encode/map such restrictions using/to a restriction… with no stars (that is, just a 0/1 assignment to the variables). The gain is clear: just think of a restriction with zero stars versus a restriction with one star. The latter are more by a factor about the number $n$ of variables.

A critical observation is that we only want to encode restrictions for which $C|_{\rho }$ requires large depth. So $\rho$ does not map any variable to $1$ , for else the Or is $1$ which has decision trees of depth $0$ .

The way we are going to encode $\rho$ is this: Simply replace the stars with ones. To go back, replace the ones with stars. We are using the ones in the encoding to “signal” where the stars are.

Hence, the number of bad restrictions is at most $2^{n}$ , which is tiny compared to the number $\binom {n}{s}2^{n-s}$ of restrictions with $s$ stars.

The medium case: Or of functions on disjoint inputs.

Instead of working with DNFs, I will consider a circuit $C$ which is the Or of arbitrary functions $f_{i}$ each on $w$ bits. You can immediately get this formulation from the usual one for DNFs, but I still find it a little useful since otherwise you might think there is something special about DNFs. What is special is that you take the Or of the functions, and we will exploit this again shortly.

In this warm-up case, we start with functions on disjoint inputs. So, again, let’s take a random restriction $\rho$ with exactly $s$ stars. Some of the functions may become $0$ , others $1$ , and others yet may remain unfixed. Those that become $0$ you can ignore, while if some become $1$ then the whole circuit becomes $1$ .

As before, we will show that the number of restrictions for which the restricted circuit $C|_{\rho }$ requires decision trees of depth $\ge d$ is small. To accomplish this, we are going to encode/map such restrictions using/to a restriction with just $s-d$ stars, plus a little more information. As we saw already, the gain in reducing the number of stars is clear. In particular, standard calculations show that saving $d$ stars reduces the number of restrictions by a factor $O(s/n)^{d}$ . The auxiliary information will give us a factor of $w^{d}$ , leading to the familiar bound $O(ws/n)^{d}$ .

As before, recall that we only want to encode restrictions for which $C|_{\rho }$ requires large depth. So no function in $C|_{\rho }$ is $1$ , for else the circuit is $1$ and has decision trees of depth $0$ . Also, you have $d$ stars among inputs to functions that are unfixed (i.e., not even fixed to $0$ ), for else again you can compute the function reading less than $d$ bits. Because the functions are unfixed, there is a setting for those $d$ stars (and possibly a few more stars – that would only help the argument) that make the corresponding functions $1$ . We are going to pick precisely that setting in our restriction $\rho '$ with $s-d$ stars. This allows us to “signal” which functions had inputs with the stars we are saving (namely, those that are the constant $1$ ). To completely recover $\rho$ , we simply add extra information to indicate where the stars were. The saving here is that we only have to say where the stars are among $w$ symbols, not $n$ .

The general case: Or of functions on any subset of $w$ bits.

First, the number of functions does not play a role, so you can think you have functions on any possible subset of $w$ bits, where some functions may be constant. The idea is the same, except we have to be slightly more careful because when we set values for the stars in one function we may also affect other functions. The idea is simply to fix one function at the time. Specifically, starting with $\rho$ , consider the first function $f$ that’s not made constant by $\rho$ . So the inputs to $f$ have some stars. As before, let us replace the stars with constants that make the function $f$ equal to the constant 1, and append the extra information that allows us to recover where these stars were in $\rho$ .

We’d like to repeat the argument. Note however we only have guarantees about $C|_{\rho }$ , not $C|_{\rho }$ with some stars replaced with constants that make $f$ equal to $1$ . We also can’t just jump to the 2nd function that’s not constant in $C|_{\rho }$ , since the “signal” fixing for that might clash with the fixing for the first – this is where the overlap in inputs makes things slightly more involved. Instead, because $C|_{\rho }$ required decision tree depth at least $d$ , we note there have to be some assignments to the $m$ stars in the input to $f$ so that the resulting, further restricted circuit still requires decision tree depth $\ge d-m$ (else $C|_{\rho }$ has decision trees of depth $<d$ ). We append this assignment to the auxiliary information and we continue the argument using the further restricted circuit.

References

[Ajt83] Mikl�s Ajtai. $\Sigma \sp {1}\sb {1}$ -formulae on finite structures. Annals of Pure and Applied Logic, 24(1):1–48, 1983.

[Bea94] Paul Beame. A switching lemma primer. Technical Report UW-CSE-95-07-01, Department of Computer Science and Engineering, University of Washington, November 1994. Available from http://www.cs.washington.edu/homes/beame/.

[FSS84] Merrick L. Furst, James B. Saxe, and Michael Sipser. Parity, circuits, and the polynomial-time hierarchy. Mathematical Systems Theory, 17(1):13–27, 1984.

[Has86] Johan H�stad. Almost optimal lower bounds for small depth circuits. In Juris Hartmanis, editor, Proceedings of the 18th Annual ACM Symposium on Theory of Computing, May 28-30, 1986, Berkeley, California, USA, pages 6–20. ACM, 1986.

[H�s87] Johan H�stad. Computational limitations of small-depth circuits. MIT Press, 1987.

[Sub61] B. A. Subbotovskaya. Realizations of linear functions by formulas using +, *, -. Soviet Mathematics-Doklady, 2:110–112, 1961.

[Yao85] Andrew Yao. Separating the polynomial-time hierarchy by oracles. In 26th IEEE Symp. on Foundations of Computer Science (FOCS), pages 1–10, 1985.