Progress on the Frontier

Gödel’s Lost Letter and P=NP 2018-03-12

An almost exponential improvement in bounds against ACC

Source from previous paper

Cody Murray is a PhD student of Ryan Williams at MIT. He and Ryan have a new paper that greatly improves Ryan’s separation of nonuniform {\mathsf{ACC}} circuits from uniform nondeterministic time classes. The previous best separation was from {\mathsf{NEXP}}, that is, nondeterministic time {2^{n^{O(1)}}}. The new one is from {\mathsf{NQP}}, which is nondeterministic time {2^{(\log n)^{O(1)}}}. The ‘Q’ here means “quasi-polynomial,” not “quantum.”

Today we discuss the new ideas that led to this breakthrough on a previous breakthrough.

Since I live in Buffalo and my hometown Buffalo Bills are named for the frontiersman and showman William Cody, I can’t help making Wild West associations. The complexity frontier has now existed for over 60 years, a fourth of United States history, longer than the time from the Civil War to Arizona becoming the last contiguous state in 1912. So much is still unknown about it that any new avenue of progress commands attention.

The main idea calls to mind the classic movie The Searchers. In the movie the two principals are searching for an abducted small child. Here we are searching for a witness string {y} to an {\mathsf{NP}}-type predicate {V(x,y)} that is small in the following sense: it is the truth table of a small circuit {C_y}. In the first applications using {\mathsf{NEXP}}, the length {m} of {y} was exponential in the length {n} of the input {x}, while the size {\ell} of {C_y} was polynomial in {n} and hence (poly-)logarithmic in {m}. In the new paper, all of the auxiliary size functions are related by operations under which polynomials are closed.

The small nature of the witness depends on strong hypotheses that the proof is ultimately trying to contradict. Ryan’s previous results used the powerful hypothesis {\mathsf{NEXP \subset ACC}}. The present result starts by supposing {\mathsf{NQP \subset ACC}}. What we wish to emphasize first are the clever alterations to previous techniques that enable sweeping over the vast expanses of strings in a new way to reveal a new unconditional lower bound.

The “Almost Almost Everywhere” Hardness Condition

Let {s(n)} be a complexity function. We will think of {s(n)} as polynomial or quasi-polynomial but the reasoning works clear up to {s(n) < 2^n/(2n)}. With {s(n)} in mind, say that a language {L} is “easy at length {n}” if there exists an {n}-input circuit {C} of size {s(n)} such that for all {x \in \{0,1\}^n}, {C(x) = L(x)}. Say {L} is “hard at {n}” otherwise. Consider the following conditions:

  1. There are infinitely many {n} such that {L} is hard at {n}.

  2. For some polynomial {p(m)} and all {m}, there is an {n} such that {m \leq n \leq p(m)} and {L} is hard at {n}.

  3. For all but finitely many {n}, {L} is hard at {n}.

When {L} encodes a natural problem like {\mathsf{SAT}}, we might expect its hardness not to fluctuate with {n}, so that these conditions have equal force. When {L} (like {\mathsf{SAT}}) is downward self-reducible, easiness at lengths just below {n} implies easiness at {n}. The easiness does slip to {C\cdot s(n)} in place of {s(n)}, where {C} is the number of queries in the self-reduction, but this still supports the intuition that easiness and hardness should be evenly spread.
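To make that slippage concrete, here is a minimal Python sketch of downward self-reduction for CNF-SAT, with {C = 2} oracle queries per level; the brute-force solver below merely stands in for the hypothesized size-{s(n)} circuits at the smaller lengths.

```python
from itertools import product

# A CNF instance is a list of clauses; a clause is a list of nonzero ints,
# where literal v asserts variable v and -v asserts its negation.

def restrict(cnf, var, val):
    """Plug var := val into cnf; return the smaller instance, or None when
    some clause becomes empty (that branch is unsatisfiable)."""
    out = []
    for clause in cnf:
        if (var if val else -var) in clause:
            continue                      # clause already satisfied: drop it
        reduced = [lit for lit in clause if abs(lit) != var]
        if not reduced:
            return None                   # empty clause: dead branch
        out.append(reduced)
    return out

def brute_force_sat(cnf, n_vars):
    """Stand-in oracle for 'SAT is easy below length n'."""
    if cnf is None:
        return False
    return any(all(any((lit > 0) == bool(asg[abs(lit) - 1]) for lit in clause)
                   for clause in cnf)
               for asg in product([0, 1], repeat=n_vars))

def sat_by_self_reduction(cnf, n_vars, smaller_oracle):
    """Decide SAT on n_vars variables with C = 2 queries about strictly
    smaller instances; this is the source of the C * s(n) slippage."""
    if n_vars == 0:
        return cnf == []
    return any(smaller_oracle(restrict(cnf, n_vars, val), n_vars - 1)
               for val in (0, 1))

# (x1 or x2) and (not x1 or x2) is satisfiable, via x2 = 1:
print(sat_by_self_reduction([[1, 2], [-1, 2]], 2, brute_force_sat))  # True
```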

To get a language {L_d \in \mathsf{DSPACE}[s(n)^2]} that meets the third criterion, we can loop over Boolean functions {f} with {u_n = \lfloor 2\log_2(s(n))\rfloor} inputs until we find one, {f_n}, that has no circuit of size {s(n)}. Then define:

\displaystyle  L_d = \{1^{n - 1 - u_n}0v: |v| = u_n \wedge f_n(v) = 1\}.
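At toy scale the loop can be carried out verbatim, as in this sketch: brute-force every straight-line circuit with at most {s} AND/OR/NOT gates, collect the truth tables they compute, and return the first table outside that set. (The real construction streams over truth tables in {\mathsf{DSPACE}[s(n)^2]} instead of holding sets in memory.)

```python
from itertools import product

def small_circuit_tables(u, s):
    """All truth tables computed by straight-line circuits with u inputs and
    at most s gates over {AND, OR, NOT}. Exponential brute force: toy u, s only."""
    points = list(product([0, 1], repeat=u))
    input_wires = [tuple(p[i] for p in points) for i in range(u)]
    tables = set(input_wires)

    def extend(wires, gates_left):
        tables.add(wires[-1])
        if gates_left == 0:
            return
        for a in range(len(wires)):
            extend(wires + [tuple(1 - b for b in wires[a])], gates_left - 1)       # NOT
            for b in range(len(wires)):
                x, y = wires[a], wires[b]
                extend(wires + [tuple(p & q for p, q in zip(x, y))], gates_left - 1)  # AND
                extend(wires + [tuple(p | q for p, q in zip(x, y))], gates_left - 1)  # OR

    extend(input_wires, s)
    return tables

def first_hard_function(u, s):
    """Lexicographically first truth table on u inputs with no size-s circuit,
    mirroring the loop that defines f_n (None if every function is easy)."""
    easy = small_circuit_tables(u, s)
    return next((tt for tt in product([0, 1], repeat=2 ** u) if tt not in easy),
                None)

# One gate cannot produce a constant from the inputs alone, so the constant-0
# function is the first 'hard' table for u = 2, s = 1:
print(first_hard_function(2, 1))  # (0, 0, 0, 0)
```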

Almost-everywhere hardness may fail technically if, say, the encoding of {\mathsf{SAT}} as {L} uses no odd-length strings, so that {L} is trivially easy at odd {n}. We could fill in the gap by using {L' = L \cup \{x1: x \in L\}} as the encoding, but this is ad hoc and might not work for other kinds of gaps. We could instead craft notions of {L} being hard on a polynomially dense set {D}, strengthening condition 2 by making {D} easy to decide and denser. Against this backdrop the new “a.a.e.” condition used in the paper is short and neat:

Definition 1 {L} is almost-almost-everywhere hard if there is a polynomial {q} such that for all {n}, either {L} is hard at {n} or {L} is hard at {q(n)}.

We may relax {q} to be a more general function, such as a polynomial composed with other bounds. We assume all bounds in question are increasing functions that are time-constructible or space-constructible as required.

Besides the diagonal language {L_d}, the paper uses a special {\mathsf{PSPACE}}-complete set {L_0} credited to Rahul Santhanam drawing on work by Luca Trevisan and Salil Vadhan. {L_0} is downward self-reducible, paddable in that {1^a 0x \in L_0 \iff x \in L_0} for all {a}, and autoreducible in the following sense: there is a probabilistic polynomial-time oracle TM {M_0} that on inputs {x} makes only queries of length {|x|}, always accepts {x} when {x \in L_0} and {L_0} is the oracle, and when {x \notin L_0} rejects with probability at least 2/3 given any oracle. (The last clause prevents simply having {M_0(x)} query {x}.)

The First Lower Bound

The new lower bound really comes from a new kind of upper bound using “Merlin-Arthur with advice.” A predicate {R(x,y,z)} related to a language {L} has the Merlin-Arthur property if (informally speaking):

  • If {x \in L}, then there exists a {y} such that for most {z}, {R(x,y,z)} holds.

  • If {x \notin L}, then for every {y}, {R(x,y,z)} fails for most {z}.

One reason this double-barreled quantification enters the scene is that we still don’t know how to separate {\mathsf{NEXP}} from {\mathsf{P/poly}}, but we do know that the exponential-time Merlin-Arthur class is not in {\mathsf{P/poly}}. The new tweak involves adding a quantifier but will allow dropping the time. It comes from exempting a small piece {\alpha_n} of {y} from the second (“soundness”) condition, where {\alpha_n} depends only on the length {n} of {x}:

Definition 2 {L} belongs to {\mathsf{MA}[t(n)]/a(n)} if there is a predicate {R(x,y,z)} decidable in time {t(|x|)} such that for all {n} there is a string {\alpha_n} of length {a(n)} such that for all {x \in \{0,1\}^n}:

\displaystyle  \begin{array}{rcl}  x \in L &\implies& (\exists y)\Pr_z[R(x,\alpha_n y,z)] > 2/3;\\ x \notin L &\implies& (\forall y) \Pr_z[R(x,\alpha_n y,z)] < 1/3. \end{array}
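Concretely, one can sanity-check both conditions of Definition 2 by sampling, as in this small Python harness; the predicate below is a toy stand-in, not anything from the paper.

```python
import random

def ma_accepts(R, x, alpha, y, coin_len, trials=2000):
    """Estimate Pr_z[R(x, alpha y, z)] over uniform coin strings z and compare
    it to the midpoint of Definition 2's 1/3-vs-2/3 gap."""
    hits = sum(bool(R(x, alpha + y, ''.join(random.choice('01')
                                            for _ in range(coin_len))))
               for _ in range(trials))
    return hits / trials > 1 / 2

# Toy predicate: accepts outright when x has odd parity; otherwise accepts
# only when the first two coins are '11', a rate of 1/4 < 1/3.
def R_demo(x, ay, z):
    return x.count('1') % 2 == 1 or z[:2] == '11'

print(ma_accepts(R_demo, '110', alpha='0', y='1', coin_len=8))  # False (rate ~1/4)
print(ma_accepts(R_demo, '100', alpha='0', y='1', coin_len=8))  # True  (rate 1)
```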

The point is that the condition for {x \notin L} is allowed to fail for other strings {\alpha'_n}. The string {\alpha_n} will in fact either be {q(n)} from the a.a.e. definition or will give the maximum length {\ell = \ell(n)} at which the language {L_0} mentioned above has circuits of some size {r = r(n)} depending only on {n} (so {\ell} exists). Now we can state the lower bound, importing what we’ve said about {s(n)} and {q(n)} previously and {r(n)} just now:

Theorem 3 There are constants {b,c,d \geq 1} such that for all {s(n)} as above, and any auxiliary functions {q(n),r(n)} such that {q(n) > s(n)^{2c}}, {r(n) \geq s(q(n))}, and {r(n) \geq s(n)^{2b+1}}, we can construct a language {L \in \mathsf{MA}[r(n)^2 \cdot q(n)^d]/2\log(q(n))} that is a.a.e.-hard.

The proof blends the constructions of {L_d} and {L_0} mentioned in the previous section. In particular, it uses the manner in which {L_d} reduces to {L_0} under appropriate padding. The reduction {f} maps any string {x} of length {n} to a padded string {x'} of length {n' = s(n)^{2c}} in {O(n')} time. Note that the oracle TM {M_0} that comes with {L_0} obeys a kind of {\mathsf{MA}} condition. We have all the pieces we need to carry out the diagonalization needed for a.a.e.-hardness while obtaining a Merlin-Arthur protocol with advice that works as follows on any input {x}, {n = |x|}:

  1. The advice is {\alpha_n = \ell(n)}, unless {q(n)} is smaller, in which case {\alpha_n = q(n)}.

  2. Merlin guesses a circuit {C} of size {r(n)} with {\alpha_n} inputs.

  3. Arthur runs {M_0^C(w)} for a selected string {w} (formed as in the sketch following this list):

    • In the case {\alpha_n = q(n)}, let {x' = f(x)}, {n' = |x'|}, and take {w = 1^{q(n)-1-n'} 0 x'}. Given how {L_d} is defined above and since {n' = s(n)^{2c} < q(n)}, this slice has no {s(n)}-sized circuits.
    • In the case {\alpha_n = \ell(n) < q(n)} we parse {x} as {1^a 0 x'} and put {n' = |x'|} as before. If {\alpha_n < n' + 1} then reject, else we can take {w = 1^{\alpha_n - 1 - n'}0x'}.
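As promised, a minimal Python sketch of the padding bookkeeping in step 3; Merlin’s circuit {C}, the actual runs of {M_0}, and the acceptance statistics are all elided, and the reduction {f} is a black-box argument.

```python
def arthur_query(x, alpha, q_n, f):
    """Form the string w fed to M_0^C in the two cases of the protocol.
    alpha is the advice, min(ell(n), q(n)), supplied from outside;
    f is the padding reduction from L_d to L_0. Returns None for 'reject'."""
    if alpha == q_n:                        # case alpha_n = q(n)
        xp = f(x)                           # x' = f(x) with |x'| = s(n)^{2c} < q(n)
        return '1' * (q_n - 1 - len(xp)) + '0' + xp
    a = x.find('0')                         # parse x as 1^a 0 x'
    if a == -1:
        return None                         # no separator: reject
    xp = x[a + 1:]
    if alpha < len(xp) + 1:                 # padding will not fit: reject
        return None
    return '1' * (alpha - 1 - len(xp)) + '0' + xp   # magnitude of alpha, not length

# Hypothetical stand-in reduction, just to exercise the first case:
pad_reduce = lambda x: x + '0' * len(x)
print(arthur_query('101', alpha=16, q_n=16, f=pad_reduce))  # 1111111110101000
```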

First note that we save up to {2^{q(n)}} time by not having to compute {\ell(n)}: the advice supplies it. The latter case uses the magnitude, not the length, of {\alpha_n} in the padding. The proof analyzes the two cases.

In the former case, because of how {L_0} is defined with padding and {q(n) \leq \ell}, there is a circuit {C} of size {r(n)} that decides {L_0} at length {q(n)}, so Merlin can guess it. In the latter case, Merlin guesses {C} for the length-{\ell(n)} slice of {L_0} directly. The property of {M_0} ensures that for all {x} and the appropriate {\alpha_n}, either there is a {C} leading Arthur to accept always or all {C} make Arthur reject with probability at least 2/3, so the set of {x} giving the former case defines a language in {\mathsf{MA}} with the stated time and advice bounds.

The a.a.e. hardness follows from how the protocol either yields the length-{n} slice of {L_d} or implicitly maps the length-{q(n)} slice of it under the reduction to {L_0}. In fact, the argument is more delicate and starts by negating both sides of the “a.a.e.” hardness definition for sake of contradiction. For that we refer to the paper.

The Easy-Witness Theorem

Let {\mathsf{EW}_{t(n)}[w(n)]} denote the “easy-witness” class of languages {L} such that for all witness predicates {V(x,y)} for {L} that are decidable in time {t(|x|)}, and all {x \in L}, there is a circuit {C} of size {w(|x|)} whose graph (that is, truth table) is a string {y} such that {V(x,y)} holds. It doesn’t much matter whether we restrict witnesses {y} to have length a power of 2 or let the graph be {1^a 0 y} for some {a}. Let {\mathsf{SIZE}[s(n)]} denote the class of languages with (nonuniform) circuits of size {s(n)}.
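In code, the easy-witness condition is just a search over small circuits, taking each one’s graph as the candidate witness. A minimal sketch, with the enumeration of size-{w(|x|)} circuits left to the caller as plain Python callables:

```python
from itertools import product

def truth_table(circuit, n_inputs):
    """The 2^n_inputs-bit string that the circuit's graph spells out,
    indexing inputs in lexicographic order."""
    return ''.join(str(int(bool(circuit(bits))))
                   for bits in product([0, 1], repeat=n_inputs))

def find_easy_witness(V, x, candidate_circuits, n_inputs):
    """Return a witness y = graph(C) with V(x, y), or None if no candidate
    circuit yields one."""
    for C in candidate_circuits:
        y = truth_table(C, n_inputs)
        if V(x, y):
            return y
    return None

# Toy verifier: the witness must match x in parity of 1s.
V_demo = lambda x, y: y.count('1') % 2 == x.count('1') % 2
print(find_easy_witness(V_demo, '1', [lambda b: b[0] & b[1]], 2))  # '0001'
```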

Theorem 4 There are universal constants {e,g \geq 1} and {h > 0} such that for every {s(n) < \frac{1}{n}2^{n/e}} and time function {t(n) \geq q(q(q(n)))^h}, where {q(n) = s(en)^e}:

\displaystyle  \mathsf{NTIME}[t(n)^e] \subset \mathsf{SIZE}[s(n)] \implies \mathsf{NTIME}[t(n)] \subseteq \mathsf{EW}_{t(n)}[q(q(q(n)))^{2g}].

The two triple compositions of {q(n)} (which is called {s_2(n)} in the paper) foreshadow the proof being a three-day ride. The proof again works by contradiction, and it helps that the negation of the “easy-witness” condition is concrete and helpful: it gives a verifier {V} and an {x \in L} such that there are {y} of length at most {t(|x|)} giving {V(x,y)} but none with small circuit complexity. In fact, we get this for infinitely many {x \in L}. The proof finally applies a theorem of Chris Umans that constructs a polynomial-time pseudorandom generator {G} such that whenever {y} has circuit complexity not less than {s^g} and {n' = \lceil g\log|y|\rceil}, all circuits {C} of size {s} give:

\displaystyle  \left|\Pr_{v \in \{0,1\}^{n'}}[C(G(y,v)) = 1] - \Pr_{u \in \{0,1\}^m}[C(u) = 1]\right| < \frac{1}{s},

where {m \leq s} is the output length of {G} in terms of {n'} and the length of {y}. This is where the constant {g} comes in, while {h} can be taken as {2} divided by the exponent in the running time of {G}. The generator is used to de-randomize the {\mathsf{MA}} protocol. This yields a nondeterministic TM whose running time violates an application of the nondeterministic time hierarchy theorem, producing the desired contradiction.
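The derandomization can be pictured as sweeping Arthur’s coins over all seeds of {G}, as in the sketch below; the generator interface G(y, v) is hypothetical, and by Umans’s theorem a hard {y} keeps the empirical rate within {1/s} of the true probability, so the {2/3}-vs-{1/3} gap survives.

```python
from itertools import product

def derandomized_ma(R, x, y_hard, G, seed_len):
    """Deterministically decide the MA condition by cycling over every seed v
    of the generator G in place of random coins z."""
    seeds = [''.join(v) for v in product('01', repeat=seed_len)]
    hits = sum(bool(R(x, y_hard, G(y_hard, v))) for v in seeds)
    return hits / len(seeds) > 1 / 2

# Stand-ins only, to exercise the interface (G_demo is NOT a real generator):
G_demo = lambda y, v: (v * 4)[:8]
R_demo = lambda x, y, z: z.count('1') >= 2
print(derandomized_ma(R_demo, '1', '0110', G_demo, seed_len=3))  # True (7/8 of seeds)
```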

The Roundup

The horses driving the final results come from various families {{\cal C}} of circuits, which we may suppose obey some simple closure properties. The nub is the speed with which a nondeterministic TM {N} can distinguish {n}-input circuits {H \in {\cal C}} in the following sense:

  • If {H} computes the all-zero function then some paths of {N(H)} say {H} is “white” while all others say “another color.”

  • If {H} gives output {1} for at least {\frac{1}{4}2^n} arguments then some paths say “black” while all others say “another color.”

  • {N(H)} never has some paths that say “white” and others that say “black.”

Note that distinguishing the all-zero and quarter-dense cases is easy to do randomly in basically {O(n + |H|)} time, which converts to deterministic time under certain de-randomizing hypotheses. We only need to achieve this nondeterministically with a little advantage over the brute-force time (which cycles through {2^n} assignments).
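As an illustrative sketch, the easy randomized test is one-sided sampling: an all-zero {H} is always answered “white,” while a quarter-dense {H} slips past with probability at most {(3/4)^{64}}.

```python
import random

def random_distinguish(H, n, trials=64):
    """Sample inputs and look for a 1: O(trials) evaluations of H. One-sided:
    'white' is always correct for the all-zero case, and an H of density
    >= 1/4 goes undetected with probability at most (3/4)**trials."""
    for _ in range(trials):
        if H(tuple(random.randint(0, 1) for _ in range(n))):
            return "black"        # saw a 1: certainly not all-zero
    return "white"

print(random_distinguish(lambda x: 0, 8))            # all-zero: "white"
print(random_distinguish(lambda x: x[0] & x[1], 8))  # density 1/4: "black" w.h.p.
```

The main theorem works for any {\epsilon} such that {0 < \epsilon < 1}: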

Theorem 5

  • If {n}-input {{\cal C}}-circuits of size {2^{\epsilon n}} can be nondeterministically distinguished in {O(2^{n - \epsilon n})} time, then there is a {c \geq 1} such that for all {k}, {\mathsf{NTIME}[n^{ck^4/\epsilon}]} does not have size-{n^k} circuits in {{\cal C}}.

  • If {n}-input {{\cal C}}-circuits of size {2^{n^\epsilon}} can be nondeterministically distinguished in {O(2^{n - n^\epsilon})} time, then for all {k} there is a {c \geq 1} such that {\mathsf{NTIME}[2^{(\log n)^{ck^4/\epsilon}}]} does not have size-{2^{(\log n)^k}} circuits in {{\cal C}}.

The proof applies the easy-witness theorem to a particular verifier constructed by Eli Ben-Sasson and Emanuele Viola, and its easy witnesses lead the charge. By distinguishing the adversaries’ horse colors they lift their cover of darkness and drive them to a final contradiction shootout in the gulch of the nondeterministic time hierarchy theorem. In terms of {h = 2^{\epsilon n}} the first statement’s target time happens to be {O(h^{1/\epsilon})}, which is polynomial in {h}, while the second statement’s time in terms of {h = 2^{n^{\epsilon}}} is {O(h^{(\log h)^{(1-\epsilon)/\epsilon}})}, which is quasi-polynomial in {h}.
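As a quick sanity check on these conversions (all logs base 2): with {h = 2^{\epsilon n}},

\displaystyle  2^{n - \epsilon n} = \left(2^{\epsilon n}\right)^{(1-\epsilon)/\epsilon} = h^{(1-\epsilon)/\epsilon} \leq h^{1/\epsilon},

while with {h = 2^{n^\epsilon}} we have {n = (\log h)^{1/\epsilon}}, so

\displaystyle  2^{n - n^\epsilon} = \frac{2^{(\log h)^{1/\epsilon}}}{h} = h^{(\log h)^{(1-\epsilon)/\epsilon} - 1} = O\left(h^{(\log h)^{(1-\epsilon)/\epsilon}}\right).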

Note that the second statement pulls a fast one: the order of quantifying {c} and {k} is switched. The gun it draws, however, was already in plain sight from earlier work by Ryan. Let {\mathsf{ACC}^+} denote {\mathsf{ACC}} circuits plus one layer of threshold gates at the inputs:

Theorem 6 For all {m,d} there is an {\epsilon > 0} such that given {\mathsf{ACC}^+} circuits of modulus {m}, depth {d}, and size {2^{n^\epsilon}}, whether they compute the all-zero function can be decided deterministically in time {O(2^{n - n^\epsilon})}.

The sheriff holding this gun rides all the {\mathsf{ACC}^+} circuits out of the highlands of {\mathsf{NQP}}. And if a gunslinger NTM {N} can be found to enforce the first clause in Theorem 5, a trail may open up for showing {\mathsf{NP} \not\subset \mathsf{ACC}}.

Open Problems

What other consequences follow from this march into new lower-bound territory? Already these movies are serials.
