The relativized BQP vs. PH problem (1993-2018)

Shtetl-Optimized 2018-08-02

Update (June 4): OK, I think the blog formatting issues are fixed now—thanks so much to Jesse Kipp for his help!

True story. A couple nights ago, I was sitting in the Knesset, Israel’s parliament building, watching Gilles Brassard and Charles Bennett receive the Wolf Prize in Physics for their foundational contributions to quantum computing and information. (The other laureates included, among others, Beilinson and Drinfeld in mathematics; the American honeybee researcher Gene Robinson; and Sir Paul McCartney, who did not show up for the ceremony.)

Along with the BB84 quantum cryptography scheme, the discovery of quantum teleportation, and much else, Bennett and Brassard’s seminal work included some of the first quantum oracle results, such as the BBBV Theorem (Bennett, Bernstein, Brassard, Vazirani), which proved the optimality of Grover’s search algorithm, and thus the inability of quantum computers to solve NP-complete problems in polynomial time in the black-box setting. It thereby set the stage for much of my own career. Of course, the early giants were nice enough to bequeath to us a few problems they weren’t able to solve, such as: is there an oracle relative to which quantum computers can solve some problem outside the entire polynomial hierarchy (PH)? That particular problem, in fact, had been open from 1993 all the way to the present, resisting sporadic attacks by me and others.

As I sat through the Wolf Prize ceremony — the speeches in Hebrew that I only 20% understood (though with these sorts of speeches, you can sort of fill in the inspirational sayings for yourself); the applause as one laureate after another announced that they were donating their winnings to charity; the ironic spectacle of far-right, ultranationalist Israeli politicians having to sit through a beautiful (and uncensored) choral rendition of John Lennon’s “Imagine” — I got an email from my friend and colleague Avishay Tal. Avishay wrote that he and Ran Raz had just posted a paper online giving an oracle separation between BQP and PH, thereby putting to rest that quarter-century-old problem. So I was faced with a dilemma: do I look up, at the distinguished people from the US, Canada, Japan, and elsewhere winning medals in Israel, or down at my phone, at the bombshell paper by two Israelis now living in the US?

For those tuning in from home, BQP, or Bounded-Error Quantum Polynomial Time, is the class of decision problems efficiently solvable by a quantum computer. PH, or the Polynomial Hierarchy, is a generalization of NP to allow multiple quantifiers (e.g., does there exist a setting of these variables such that for every setting of those variables, this Boolean formula is satisfied?). These are two of the most fundamental complexity classes, which is all the motivation one should need for wondering whether the former is contained in the latter. If additional motivation is needed, though, we’re effectively asking: could quantum computers still solve problems that were classically hard, even in a hypothetical world where P=NP (and hence P=PH also)? If so, the problems in question could not be any of the famous ones like factoring or discrete logarithms; they’d need to be stranger problems, for which a classical computer couldn’t even recognize a solution efficiently, let alone finding it.

And just so we’re on the same page: if BQP ⊆ PH, then one could hope for a straight-up proof of the containment, but if BQP ⊄ PH, then there’s no way to prove such a thing unconditionally, without also proving (at a minimum) that P ≠ PSPACE. In the latter case, the best we can hope is to provide evidence for a non-containment—for example, by showing that BQP ⊄ PH relative to a suitable oracle. What’s noteworthy here is that even the latter, limited goal remained elusive for decades.

In 1993, Bernstein and Vazirani defined an oracle problem called Recursive Fourier Sampling (RFS), and proved it was in BQP but not in BPP (Bounded-Error Probabilistic Polynomial-Time). One can also show without too much trouble that RFS is not in NP or MA, though one gets stuck trying to put it outside AM. Bernstein and Vazirani conjectured—at least verbally, I don’t think in writing—that RFS wasn’t even in the polynomial hierarchy. In 2003, I did some work on Recursive Fourier Sampling, but was unable to find a version that I could prove was outside PH.

Maybe this is a good place to explain that, by a fundamental connection made in the 1980s, proving that oracle problems are outside the polynomial hierarchy is equivalent to proving lower bounds on the sizes of AC⁰ circuits—or more precisely, constant-depth Boolean circuits with unbounded fan-in and a quasipolynomial number of AND, OR, and NOT gates. And proving lower bounds on the sizes of AC⁰ circuits is (just) within complexity theory’s existing abilities—that’s how, for example, Furst-Saxe-Sipser, Ajtai, and Yao managed to show that PH ≠ PSPACE relative to a suitable oracle (indeed, even a random oracle with probability 1). Alas, from a lower bounds standpoint, Recursive Fourier Sampling is a horrendously complicated problem, and none of the existing techniques seemed to work for it. And that wasn’t even the only problem: even if one somehow succeeded, the separation that one could hope for from RFS was only quasipolynomial (n versus n^{log n}), rather than exponential.

Ten years ago, as I floated in a swimming pool in Cambridge, MA, it occurred to me that RFS was probably the wrong way to go. If you just wanted an oracle separation between BQP and PH, you should focus on a different kind of problem—something like what I’d later call Forrelation. The Forrelation problem asks: given black-box access to two Boolean functions f,g:{0,1}ⁿ→{0,1}, are f and g random and independent, or are they random individually but with each one close to the Boolean Fourier transform of the other one? It’s easy to give a quantum algorithm to solve Forrelation, even with only 1 query. But the quantum algorithm really seems to require querying all the f- and g-inputs in superposition, to produce an amplitude that’s a global sum of f(x)g(y) terms with massive cancellations in it. It’s not clear how we’d reproduce this behavior even with the full power of the polynomial hierarchy. To be clear: to answer the question, it would suffice to show that no AC⁰ circuit with exp(poly(n)) gates could distinguish a “Forrelated” distribution over (f,g) pairs from the uniform distribution.

Using a related problem, I managed to show that, relative to a suitable oracle—in fact, even a random oracle—the relational version of BQP (that is, the version where we allow problems with many valid outputs) is not contained in the relational version of PH. I also showed that a lower bound for Forrelation itself, and hence an oracle separation between the “original,” decision versions of BQP and PH, would follow from something that I called the “Generalized Linial-Nisan Conjecture.” This conjecture talked about the inability of AC⁰ circuits to distinguish the uniform distribution from distributions that “looked close to uniform locally.” My banging the drum about this, I’m happy to say, initiated a sequence of events that culminated in Mark Braverman’s breakthrough proof of the original Linial-Nisan Conjecture. But alas, I later discovered that my generalized version is false. This meant that different circuit lower bound techniques, ones more tailored to problems like Forrelation, would be needed to go the distance.

I never reached the promised land. But my consolation prize is that Avishay and Ran have now done so, by taking Forrelation as their jumping-off point but then going in directions that I’d never considered.

As a first step, Avishay and Ran modify the Forrelation problem so that, in the “yes” case, the correlation between f and the Fourier transform of g is much weaker (though still detectable using a quantum algorithm that makes n^O(1) queries to f and g). This seems like an inconsequential change—sure, you can do that, but what does it buy you?—but it turns out to be crucial for their analysis. Ultimately, this change lets them show that, when we write down a polynomial that expresses an AC⁰ circuit’s bias in detecting the forrelation between f and g, all the “higher-order contributions”—those involving a product of k terms of the form f(x) or g(y), for some k>2—get exponentially damped as a function of k, so that only the k=2 contributions still matter.

There are a few additional ideas that Raz and Tal need to finish the job. First, they relax the Boolean functions f and g to real-valued, Gaussian-distributed functions—very similar to what Andris Ambainis and I did when we proved a nearly-tight randomized lower bound for Forrelation, except that they also need to truncate f and g so they take values in [-1,1]; they then prove that a multilinear polynomial has no way to distinguish their real-valued functions from the original Boolean ones. Second, they exploit recent results of Tal about the Fourier spectra of AC⁰ functions. Third, they exploit recent work of Chattopadhyay et al. on pseudorandom generators from random walks (Chattopadhyay, incidentally, recently finished his PhD at UT Austin). A crucial idea turns out to be to think of the values of f(x) and g(y), in a real-valued Forrelation instance, as sums of huge numbers of independent random contributions. Formally, this changes nothing: you end up with exactly the same Gaussian distributions that you had before. Conceptually, though, you can look at how each tiny contribution changes the distinguishing bias, conditioned on the sum of all the previous contributions; and this leads to the suppression of higher-order terms that we talked about before, with the higher-order terms going to zero as the step size does.

Stepping back from the details, though, let me talk about a central conceptual barrier—one that I know from an email exchange with Avishay was on his and Ran’s minds, even though they never discuss it explicitly in their paper. In my 2009 paper, I identified what I argued was the main reason why no existing technique was able to prove an oracle separation between BQP and PH. The reason was this: the existing techniques, based on the Switching Lemma and so forth, involved arguing (often implicitly) that

any AC⁰ circuit can be approximated by a low-degree real polynomial, but
the function that we’re trying to compute can’t be approximated by a low-degree real polynomial.

Linial, Mansour, and Nisan made this fully explicit in the context of their learning algorithm for AC⁰. And this is all well and good if, for example, we’re trying to prove the n-bit PARITY function is not in AC⁰, since PARITY is famously inapproximable by any polynomial of sublinear degree. But what if we’re trying to separate BQP from PH? In that case, we need to deal with the fundamental observation of Beals et al. 1998: that any function with a fast quantum algorithm, by virtue of having a fast quantum algorithm, is approximable by a low-degree real polynomial! Approximability by low-degree polynomials giveth with the one hand and taketh away with the other.

To be sure, I pointed out that this barrier wasn’t necessarily insuperable. For the precise meaning of “approximable by low-degree polynomials” that follows from a function’s being in BQP, might be different from the meaning that’s used to put the function outside of PH. As one illustration, Razborov and Smolensky’s AC⁰ lower bound method relates having a small constant-depth circuit to being approximable by low-degree polynomials over finite fields, which is different from being approximable by low-degree polynomials over the reals. But this didn’t mean I knew an actual way around the barrier: I had no idea how to prove that Forrelation wasn’t approximable by low-degree polynomials over finite fields either.

So then how do Raz and Tal get around the barrier? Apparently, by exploiting the fact that Tal’s recent results imply much more than just that AC⁰ functions are approximable by low-degree real polynomials. Rather, they imply approximability by low-degree real polynomials with bounded L₁ norms (i.e., sums of absolute values) of their coefficients. And crucially, these norm bounds even apply to the degree-2 part of a polynomial—showing that, even all the way down there, the polynomial can’t be “spread around,” with equal weight on all its coefficients. But being “spread around” is exactly how the true polynomial for Forrelation—the one that you derive from the quantum algorithm—works. The polynomial looks like this:

$$ p(f,g) = \frac{1}{2^{3n/2}} \sum_{x,y \in \left\{0,1\right\}^n} (-1)^{x \cdot y} f(x) g(y). $$

This still isn’t enough for Raz and Tal to conclude that Forrelation itself is not in AC⁰: after all, the higher-degree terms in the polynomial might somehow compensate for the failures of the lower-degree terms. But this difference between the two different kinds of low-degree polynomial—the “thin” kind that you get from AC⁰ circuits, and the “thick” kind that you get from quantum algorithms—gives them an opening that they’re able to combine with the other ideas mentioned above, at least for their noisier version of the Forrelation problem.

This difference between “thin” and “thick” polynomials is closely related to, though not identical with, a second difference, which is that any AC⁰ circuit needs to compute some total Boolean function, whereas a quantum algorithm is allowed to be indecisive on many inputs, accepting them with a probability that’s close neither to 0 nor to 1. Tal used the fact that an AC⁰ circuit computes a total Boolean function, in his argument showing that it gives rise to a “thin” low-degree polynomial. His argument also implies that no low-degree polynomial that’s “thick,” like the above quantum-algorithm-derived polynomial for Forrelation, can possibly represent a total Boolean function: it must be indecisive on many inputs.

The boundedness of the L₁ norm of the coefficients is related to a different condition on low-degree polynomials, which I called the “low-fat condition” in my Counterexample to the Generalized Linial-Nisan Conjecture paper. However, the whole point of that paper was that the low-fat condition turns out not to work, in the sense that there exist depth-three AC⁰ circuits that are not approximable by any low-degree polynomials satisfying the condition. Raz and Tal’s L₁ boundedness condition, besides being simpler, also has the considerable advantage that it works.

As Lance Fortnow writes, in his blog post about this achievment, an obvious next step would be to give an oracle relative to which P=NP but P≠BQP. I expect that this can be done. Another task is to show that my original Forrelation problem is not in PH—or more generally, to broaden the class of problems that can be handled using Raz and Tal’s methods. And then there’s one of my personal favorite problems, which seems closely related to BQP vs. PH even though it’s formally incomparable: give an oracle relative to which a quantum computer can’t always prove its answer to a completely classical skeptic via an interactive protocol.

Since (despite my journalist moratorium) a journalist already emailed to ask me about the practical implications of the BQP vs. PH breakthrough—for example, for the ~70-qubit quantum computers that Google and others hope to build in the near future—let me take the opportunity to say that, as far as I can see, there aren’t any. This is partly because Forrelation is an oracle problem, one that we don’t really know how to instantiate explicitly (in the sense, for example, that factoring and discrete logarithm instantiate Shor’s period-finding algorithm). And it’s partly because, even if you did want to run the quantum algorithm for Forrelation (or for Raz and Tal’s noisy Forrelation) on a near-term quantum computer, you could easily do that sans the knowledge that the problem sits outside the polynomial hierarchy.

Still, as Avi Wigderson never tires of reminding people, theoretical computer science is richly interconnected, and things can turn up in surprising places. To take a relevant example: Forrelation, which I introduced for the purely theoretical purpose of separating BQP from PH (and which Andris Ambainis and I later used for another purely theoretical purpose, to prove a maximal separation between randomized and quantum query complexities), now furnishes one of the main separating examples in the field of quantum machine learning algorithms. So it’s early to say what implications Avishay and Ran’s achievement might ultimately have. In any case, huge congratulations to them.