Special Topics in Complexity Theory, Lecture 10
Thoughts 2018-03-12
Added Dec 27 2017: An updated version of these notes exists on the class page.
Special Topics in Complexity Theory, Fall 2017. Instructor: Emanuele Viola
1 Lecture 10, Guest lecture by Justin Thaler, Scribe: Biswaroop Maiti
This is a guest lecture by Justin Thaler regarding lower bounds on approximate degree [BKT17, BT15, BT17]. Thanks to Justin for giving this lecture and for his help with the write-up. We will sketch some details of the lower bound on the approximate degree of $\mathrm{SURJ}$, and some intuition about the techniques used. Recall the definition of $\mathrm{SURJ}$ from the previous lecture:
Definition 1. The surjectivity function $\mathrm{SURJ} \colon (\{0,1\}^{\log R})^n \to \{0,1\}$ takes input $x = (x_1, \dots, x_n)$, where each $x_i \in \{0,1\}^{\log R}$ is interpreted as an element of $[R]$. $\mathrm{SURJ}(x)$ has value $1$ if and only if every element of $[R]$ appears among the $x_i$, i.e., $\forall j \in [R] \; \exists i \in [n] \colon x_i = j$.
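To make the definition concrete, here is a minimal sketch in Python (not from the notes; it uses the encoding of the input as a list of $n$ integers from $[R]$ defined above):

```python
def surj(xs, R):
    """SURJ: 1 iff every range item 1..R appears among the inputs xs."""
    return 1 if set(range(1, R + 1)) <= set(xs) else 0

# Example: n = 4 inputs, range size R = 3.
assert surj([1, 2, 3, 3], R=3) == 1  # every item 1, 2, 3 appears
assert surj([1, 1, 3, 3], R=3) == 0  # item 2 never appears
```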
Recall from the last lecture that $\mathrm{SURJ}$ is computed by the block-wise composition of the $\mathrm{AND}$ function on $R$ bits with the $\mathrm{OR}$ function on $n$ bits (applied to indicator variables, as recalled below). In general, we will denote the block-wise composition of two functions $f$ and $g$, where $f$ is defined on $M$ bits and $g$ is defined on $N$ bits, by $f \circ g$. Here, the outputs of $M$ copies of $g$ are fed into $f$ (with the inputs to each copy of $g$ being pairwise disjoint). The total number of inputs to $f \circ g$ is $M \cdot N$.
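In code, block-wise composition reads as follows (a sketch, not from the notes; $f$ takes $M$ bits, $g$ takes $N$ bits, and $f \circ g$ takes $M \cdot N$ bits):

```python
def block_compose(f, g, M, N):
    """Return h = f o g on M*N bits: split the input into M disjoint
    blocks of N bits, apply g to each block, feed the M outputs into f."""
    def h(bits):
        assert len(bits) == M * N
        return f([g(bits[i * N:(i + 1) * N]) for i in range(M)])
    return h

AND = lambda bits: int(all(bits))
OR = lambda bits: int(any(bits))

# AND_2 o OR_3 on 6 bits: 1 iff each of the two 3-bit blocks is nonzero.
h = block_compose(AND, OR, M=2, N=3)
assert h([0, 1, 0, 0, 0, 1]) == 1
assert h([0, 0, 0, 1, 1, 1]) == 0
```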
1.1 Lower Bound of $\widetilde{\deg}(\mathrm{SURJ})$ via lower bound of AND-OR

Claim 2. $\widetilde{\deg}(\mathrm{SURJ}) = \widetilde{\Theta}(n^{3/4})$.
We will look at only the lower bound in the claim. We interpret the input as a list of $n$ numbers from $[R]$. As presented in [BKT17], the proof for the lower bound proceeds in the following steps.

Step 1: Show that to approximate $\mathrm{SURJ}$, it is necessary to approximate the block-composition $\mathrm{AND}_R \circ \mathrm{OR}_n$ on inputs of Hamming weight at most $n$. I.e., show that $\widetilde{\deg}(\mathrm{SURJ}) \ge \widetilde{\deg}_{\le n}(\mathrm{AND}_R \circ \mathrm{OR}_n)$.
Step 1 was covered in the previous lecture, but we briefly recall some intuition for why the claim in this step is reasonable. The intuition comes from the fact that the converse of the claim is easy to establish: it is easy to show that in order to approximate $\mathrm{SURJ}$, it is sufficient to approximate $\mathrm{AND}_R \circ \mathrm{OR}_n$ on inputs of Hamming weight exactly $n$. This is because $\mathrm{SURJ}(x)$ can be expressed as an $\mathrm{AND}$ (over all range items $j \in [R]$) of the $\mathrm{OR}$ (over all inputs $i \in [n]$) of "Is input $x_i$ equal to $j$?" Each predicate of the form in quotes is computed exactly by a polynomial of degree $\log R$, since it depends on only $\log R$ of the input bits, and exactly $n$ of the predicates (one for each $i \in [n]$) evaluate to TRUE.
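The AND-of-ORs representation is easy to check exhaustively for small parameters (a sketch, not from the notes; `surj` restates the direct definition so the snippet is self-contained):

```python
from itertools import product

def surj(xs, R):
    """Direct definition: 1 iff every range item 1..R appears in xs."""
    return 1 if set(range(1, R + 1)) <= set(xs) else 0

def surj_via_and_or(xs, R):
    # AND over range items j of (OR over inputs i of "is x_i == j?").
    return int(all(any(x == j for x in xs) for j in range(1, R + 1)))

# Exhaustive check that the two definitions agree for n = 4, R = 3.
n, R = 4, 3
for xs in product(range(1, R + 1), repeat=n):
    assert surj(xs, R) == surj_via_and_or(xs, R)
```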
Step 1 of the lower bound proof for $\mathrm{SURJ}$ in [BKT17] shows a converse, namely that the only way to approximate $\mathrm{SURJ}$ is to approximate $\mathrm{AND}_R \circ \mathrm{OR}_n$ on inputs of Hamming weight at most $n$.

Step 2: Show that $\widetilde{\deg}_{\le n}(\mathrm{AND}_R \circ \mathrm{OR}_n) = \widetilde{\Omega}(n^{3/4})$, i.e., the degree required to approximate $\mathrm{AND}_R \circ \mathrm{OR}_n$ on inputs of Hamming weight at most $n$ is at least $\widetilde{\Omega}(n^{3/4})$.
In the previous lecture we also sketched this Step 2. In this lecture we give additional details of this step. As in the papers, we use the concept of a "dual witness." The latter can be shown to be equivalent to bounded indistinguishability.

Step 2 itself proceeds via two substeps:

Substep 2a: Give a dual witness $\psi$ for $\mathrm{AND}_R \circ \mathrm{OR}_n$ that places little mass (namely, vanishingly small total mass) on inputs of Hamming weight more than $n$.

Substep 2b: By modifying $\psi$, give a dual witness $\psi'$ for $\mathrm{AND}_R \circ \mathrm{OR}_n$ that places zero mass on inputs of Hamming weight more than $n$.
In [BKT17], both Substeps 2a and 2b proceed entirely in the dual world (i.e., they explicitly manipulate the dual witnesses $\psi$ and $\psi'$). The main goal of this section of the lecture notes is to explain how to replace Step 2b of the argument of [BKT17] with a wholly "primal" argument.
The intuition of the primal version of Step 2b that we'll cover is as follows. First, we will show that a polynomial of degree $d$ that is bounded on the low Hamming weight inputs cannot be too big on the high Hamming weight inputs. In particular, we will prove the following claim.

Claim 3. If $p$ is a degree $d$ polynomial that satisfies $|p(x)| \le 1$ on all inputs of $\{0,1\}^N$ of Hamming weight at most $d$, then $|p(x)| \le (2N)^d$ for all inputs $x \in \{0,1\}^N$.
Second, we will explain that the dual witness constructed in Step 2a has the following "primal" implication:

Claim 4. For $d = \widetilde{\Theta}(n^{3/4})$, any polynomial $p$ of degree $d$ satisfying $|p(x) - (\mathrm{AND}_R \circ \mathrm{OR}_n)(x)| \le 1/3$ for all inputs $x$ of Hamming weight at most $n$ must satisfy $|p(x)| > 2 \cdot (2N)^d$ for some input $x \in \{0,1\}^N$.
Combining Claims 3 and 4, we conclude that no polynomial of degree $d = \widetilde{\Theta}(n^{3/4})$ can satisfy

$|p(x) - (\mathrm{AND}_R \circ \mathrm{OR}_n)(x)| \le 1/3$ for all inputs $x$ of Hamming weight at most $n$, (1)

which is exactly the desired conclusion of Step 2. This is because any polynomial satisfying Equation (1) also satisfies $|p(x)| \le 4/3$ for all $x$ of Hamming weight at most $n$ (in particular, at most $d$), and hence Claim 3 implies that

$|p(x)| \le (4/3) \cdot (2N)^d$ for all inputs $x \in \{0,1\}^N$. (2)

But Claim 4 states that any polynomial satisfying both Equations (1) and (2) requires degree strictly larger than $d$.
In the remainder of this section, we prove Claims 3 and 4.
1.2 Proof of Claim 3
Proof of Claim 3. For notational simplicity, let us prove this claim for polynomials on domain $\{0,1\}^N$, rather than $\{-1,1\}^N$.

Proof in the case that $p$ is symmetric. Let us assume first that $p$ is symmetric, i.e., that $p(x)$ is only a function of the Hamming weight $|x|$ of its input $x$. Then $p(x) = q(|x|)$ for some degree $d$ univariate polynomial $q$ (this is a direct consequence of Minsky-Papert symmetrization, which we have seen in the lectures before). We can express $q$, in the spirit of Lagrange interpolation through the points $0, 1, \dots, d$, as

$q(t) = \sum_{i=0}^{d} q(i) \prod_{j \ne i} \frac{t - j}{i - j}$.

Here, the first term, $q(i)$, is bounded in magnitude by $1$ (by the hypothesis of the claim), and $\left| \prod_{j \ne i} \frac{t - j}{i - j} \right| \le \frac{N^d}{i! \, (d-i)!}$ for every $t \in \{0, 1, \dots, N\}$. Therefore, we get the final bound:

$|q(t)| \le \sum_{i=0}^{d} \frac{N^d}{i! \, (d-i)!} = \frac{2^d N^d}{d!} \le (2N)^d$.
Proof for general $p$. Let us now consider the case of general (not necessarily symmetric) polynomials $p$. Fix any input $x^* \in \{0,1\}^N$. The goal is to show that $|p(x^*)| \le (2N)^d$.

Let us consider the polynomial $p'$, of degree at most $d$, obtained from $p$ by restricting each input $x_i$ such that $x^*_i = 0$ to have the value 0. For example, if $N = 4$ and $x^* = (1, 0, 1, 0)$, then $p'(x_1, x_3) = p(x_1, 0, x_3, 0)$. We will exploit three properties of $p'$:

- $\deg(p') \le \deg(p) \le d$.

- Since $|p(x)| \le 1$ for all inputs $x$ with $|x| \le d$, $p'$ satisfies the analogous property: $|p'(x)| \le 1$ for all inputs $x$ with $|x| \le d$.

- If $\mathbf{1}$ denotes the all-1s vector of length $|x^*|$, then $p'(\mathbf{1}) = p(x^*)$.
Property 3 means that our goal is to show that $|p'(\mathbf{1})| \le (2N)^d$.

Let $p'_{\mathrm{sym}}$ denote the symmetrized version of $p'$, i.e., $p'_{\mathrm{sym}}(x) = \mathbb{E}_{\sigma}[p'(\sigma(x))]$, where the expectation is over a random permutation $\sigma$ of $\{1, \dots, |x^*|\}$, and $\sigma(x) = (x_{\sigma(1)}, \dots, x_{\sigma(|x^*|)})$. Since $\sigma(\mathbf{1}) = \mathbf{1}$ for all permutations $\sigma$, $p'_{\mathrm{sym}}(\mathbf{1}) = p'(\mathbf{1})$. But $p'_{\mathrm{sym}}$ is symmetric, so Properties 1 and 2 together mean that the analysis from the first part of the proof implies that $|p'_{\mathrm{sym}}(x)| \le (2N)^d$ for all inputs $x$. In particular, letting $x = \mathbf{1}$, we conclude that $|p(x^*)| = |p'(\mathbf{1})| = |p'_{\mathrm{sym}}(\mathbf{1})| \le (2N)^d$, as desired.
Discussion. One may try to simplify the analysis of the general case in the proof of Claim 3 by considering the polynomial $p_{\mathrm{sym}}$ defined via $p_{\mathrm{sym}}(x) = \mathbb{E}_{\sigma}[p(\sigma(x))]$, where the expectation is over permutations $\sigma$ of $\{1, \dots, N\}$. $p_{\mathrm{sym}}$ is a symmetric polynomial, so the analysis for symmetric polynomials immediately implies that $|p_{\mathrm{sym}}(x)| \le (2N)^d$ for all inputs $x$. Unfortunately, this does not mean that the same bound holds for $p$ itself.

This is because the symmetrized polynomial averages the values of $p$ over all inputs of a given Hamming weight. So, a bound on this averaging polynomial does not preclude the case where $p$ is massively positive on some inputs of a given Hamming weight, and massively negative on other inputs of the same Hamming weight, and these values cancel out to yield a small average value. That is, it is not enough to conclude that on average over inputs of any given Hamming weight, the magnitude of $p$ is not too big.

Thus, we needed to make sure that when we symmetrize $p'$ to $p'_{\mathrm{sym}}$, such large cancellations don't happen, and a bound on the average value of $p'$ on a given Hamming weight really gives us a bound on $p$ at the input $x^*$ itself. We defined $p'$ so that $p'(\mathbf{1}) = p(x^*)$. Since there is only one input in $\{0,1\}^{|x^*|}$ of Hamming weight $|x^*|$, namely $\mathbf{1}$, $p'_{\mathrm{sym}}(\mathbf{1})$ does not average $p'$'s values over many inputs, meaning we don't need to worry about massive cancellations.
A note on the history of Claim 3. Claim 3 was implicit in [RS10], which explicitly showed a similar bound for symmetric polynomials using a primal view, and (implicitly) gave a different (dual) proof of the case of general polynomials.
1.3 Proof of Claim 4
1.3.1 Interlude Part 1: Method of Dual Polynomials [BT17]
A dual polynomial is a dual solution to a certain linear program that captures the approximate degree of any given function $f$. These polynomials act as certificates of the high approximate degree of $f$. Strong LP duality implies that the technique is lossless, in comparison to the symmetrization techniques we saw before. For any function $f$ and any $\varepsilon > 0$, there is always some dual polynomial $\psi$ that witnesses a tight $\varepsilon$-approximate degree lower bound for $f$. A dual polynomial that witnesses the fact that $\widetilde{\deg}_{\varepsilon}(f) \ge d$ is a function $\psi \colon \{0,1\}^N \to \mathbb{R}$ satisfying three properties:

- Correlation: $\sum_{x \in \{0,1\}^N} \psi(x) \cdot (-1)^{f(x)} > \varepsilon$. If $\psi$ satisfies this condition, it is said to be well-correlated with $f$.

- Pure high degree: for all polynomials $q$ of degree less than $d$, we have $\sum_{x \in \{0,1\}^N} \psi(x) \cdot q(x) = 0$. If $\psi$ satisfies this condition, it is said to have pure high degree at least $d$.

- $\ell_1$-norm: $\sum_{x \in \{0,1\}^N} |\psi(x)| = 1$.
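These three conditions can be verified exactly for a standard example (a sketch, not from the notes): the canonical dual witness for PARITY on $N$ bits, $\psi(x) = (-1)^{|x|}/2^N$, which has $\ell_1$-norm 1, perfect correlation with PARITY, and pure high degree $N$.

```python
from itertools import combinations, product

N = 4
inputs = list(product([0, 1], repeat=N))

# Canonical dual witness for PARITY_N: psi(x) = (-1)^{|x|} / 2^N.
psi = {x: (-1)**sum(x) / 2**N for x in inputs}
parity = {x: sum(x) % 2 for x in inputs}

# l1-norm: exactly 1.
assert abs(sum(abs(v) for v in psi.values()) - 1) < 1e-12

# Correlation with (-1)^{PARITY(x)}: exactly 1, so > epsilon for any epsilon < 1.
corr = sum(psi[x] * (-1)**parity[x] for x in inputs)
assert abs(corr - 1) < 1e-12

# Pure high degree N: psi is orthogonal to every monomial of degree < N.
for deg in range(N):
    for S in combinations(range(N), deg):
        inner = sum(psi[x] for x in inputs if all(x[i] == 1 for i in S))
        assert abs(inner) < 1e-12
```

This matches the fact that the approximate degree of PARITY on $N$ bits is exactly $N$.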
1.3.2 Interlude Part 2: Applying The Method of Dual Polynomials To Block-Composed Functions
For any function $f$, we can write an LP capturing the approximate degree of $f$. We can prove lower bounds on the approximate degree of $f$ by proving lower bounds on the value of every feasible solution of this LP. One way to do this is by writing down the dual of the LP and exhibiting a feasible solution to it: by weak LP duality, the value of any feasible dual solution is a lower bound on the optimal value of the primal LP. Therefore, exhibiting such a feasible solution, which we call a dual witness, suffices to prove an approximate degree lower bound for $f$.
However, for any given dual witness, some work will be required to verify that the witness indeed meets the criteria imposed by the Dual constraints.
When the function is a block-wise composition of two functions, say $f$ and $g$, we can try to construct a good dual witness for $f \circ g$ by looking at dual witnesses for each of $f$ and $g$, and combining them carefully, to get a dual witness for $f \circ g$.

The dual witness $\psi$ constructed in Step 2a for $\mathrm{AND}_R \circ \mathrm{OR}_n$ is expressed below in terms of the dual witness of the inner $\mathrm{OR}_n$ function, viz. $\psi_{\mathrm{in}}$, and the dual witness of the outer $\mathrm{AND}_R$, viz. $\psi_{\mathrm{out}}$:

$\psi(x_1, \dots, x_R) = 2^R \cdot \psi_{\mathrm{out}}\left( \mathrm{sgn}(\psi_{\mathrm{in}}(x_1)), \dots, \mathrm{sgn}(\psi_{\mathrm{in}}(x_R)) \right) \cdot \prod_{i=1}^{R} |\psi_{\mathrm{in}}(x_i)|$. (3)

This method of combining the dual witness $\psi_{\mathrm{out}}$ for the "outer" function and $\psi_{\mathrm{in}}$ for the "inner" function is referred to in [BKT17, BT17] as dual block composition.
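Equation (3) can be implemented directly. As a sanity check (a sketch, not from the notes, using the toy parity dual witnesses from before over $\{-1,1\}$-valued variables), dual block composition preserves the $\ell_1$-norm and multiplies pure high degrees:

```python
from itertools import combinations, product
from math import prod

def sgn(v):
    return 1 if v >= 0 else -1

def dual_block_compose(psi_out, psi_in, R):
    """Equation (3): psi(x_1..x_R) = 2^R * psi_out(sgn psi_in(x_1), ...,
    sgn psi_in(x_R)) * prod_i |psi_in(x_i)|."""
    def psi(x):  # x = tuple of R blocks, each a tuple of {-1,1} values
        signs = tuple(sgn(psi_in(b)) for b in x)
        mag = prod(abs(psi_in(b)) for b in x)
        return 2**R * psi_out(signs) * mag
    return psi

# Toy inner/outer witnesses: parity dual witnesses (pure high degree = arity).
R, m = 2, 2
psi_in = lambda b: b[0] * b[1] / 2**m
psi_out = lambda z: z[0] * z[1] / 2**R
psi = dual_block_compose(psi_out, psi_in, R)

domain = list(product(product([-1, 1], repeat=m), repeat=R))

# The l1-norm of the composed witness is still 1.
assert abs(sum(abs(psi(x)) for x in domain) - 1) < 1e-12

# Pure high degree multiplies: the composition is orthogonal to every
# monomial of degree < R*m = 4 over the R*m underlying variables.
flatten = lambda x: [v for b in x for v in b]
for d in range(R * m):
    for S in combinations(range(R * m), d):
        tot = sum(psi(x) * prod(flatten(x)[i] for i in S) for x in domain)
        assert abs(tot) < 1e-12
```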
1.3.3 Interlude Part 3: Hamming Weight Decay Conditions
Step 2a of the proof of the lower bound from [BKT17] gave a dual witness $\psi$ for $\mathrm{AND}_R \circ \mathrm{OR}_n$ (with $R = \Theta(n)$) that had pure high degree $\widetilde{\Omega}(n^{3/4})$, and also satisfies Equations (4) and (5) below (here $d = \widetilde{\Theta}(n^{3/4})$ denotes the target degree and $N = nR$ the total number of variables):

$\sum_{x \colon |x| > n} |\psi(x)| \le (2N)^{-2d}$, (4)

$\sum_{x \colon |x| = t} |\psi(x)| \le \delta(t) := (2N)^{-2d}$ for every $t > n$. (5)

Equation (4) is a very strong "Hamming weight decay" condition: it shows that the total mass that $\psi$ places on inputs of high Hamming weight is very small. Hamming weight decay conditions play an essential role in the lower bound analysis for $\mathrm{SURJ}$ from [BKT17]. In addition to Equations (4) and (5) themselves being Hamming weight decay conditions, [BKT17]'s proof that $\psi$ satisfies Equations (4) and (5) exploits the fact that the dual witness $\psi_{\mathrm{in}}$ for $\mathrm{OR}_n$ can be chosen to simultaneously have pure high degree $\Omega(n^{1/4})$, and to satisfy the following weaker Hamming weight decay condition:

Claim 5. There exist constants $c_1, c_2 > 0$ such that for all $t > c_1 n^{1/4}$,

$\sum_{x \colon |x| = t} |\psi_{\mathrm{in}}(x)| \le 2^{-c_2 t}$. (6)

(We will not prove Claim 5 in these notes; we simply state it to highlight the importance of dual decay to the analysis of $\mathrm{SURJ}$.)
Dual witnesses satisfying various notions of Hamming weight decay have a natural primal interpretation: they witness approximate degree lower bounds for the target function ($\mathrm{AND}_R \circ \mathrm{OR}_n$ in the case of Equation (4), and $\mathrm{OR}_n$ in the case of Equation (6)) even when the approximation is allowed to be exponentially large on inputs of high Hamming weight. This primal interpretation of dual decay is formalized in the following claim.
Claim 6. Let $f$ be any function mapping $\{0,1\}^N$ to $\{-1, 1\}$. Suppose $\psi$ is a dual witness for $f$ satisfying the following properties:

- (Correlation): $\sum_{x \in \{0,1\}^N} \psi(x) \cdot f(x) \ge 1/2$.

- (Pure high degree): $\psi$ has pure high degree $d$.

- (Dual decay): $\sum_{x \colon |x| = t} |\psi(x)| \le \delta(t)$ for all $t > n$, where $\sum_{t > n} \delta(t) \le 1/24$.

Then there is no degree $d$ polynomial $p$ such that

$|p(x) - f(x)| \le 1/3$ for all $x$ with $|x| \le n$, and $|p(x)| \le \frac{1}{16 N \delta(|x|)}$ for all $x$ with $|x| > n$. (7)
Proof. Let $p$ be any degree $d$ polynomial. Since $\psi$ has pure high degree $d$, $\sum_x \psi(x) \cdot p(x) = 0$.

We will now show that if $p$ satisfies Equation (7), then the other two properties satisfied by $\psi$ (correlation and dual decay) together imply that $\sum_x \psi(x) \cdot p(x) > 0$, a contradiction:

$\sum_x \psi(x) p(x) = \sum_x \psi(x) f(x) + \sum_x \psi(x) (p(x) - f(x))$
$\ge \frac{1}{2} - \sum_{|x| \le n} |\psi(x)| \cdot |p(x) - f(x)| - \sum_{|x| > n} |\psi(x)| \cdot (|p(x)| + 1)$
$\ge \frac{1}{2} - \frac{1}{3} - \sum_{|x| > n} |\psi(x)| \cdot |p(x)| - \sum_{|x| > n} |\psi(x)|$
$\ge \frac{1}{6} - \frac{1}{16} - \frac{1}{24} > 0$.

Here, Line 2 exploited that $\psi$ has correlation at least $1/2$ with $f$ (together with $|f(x)| = 1$), Line 3 exploited the assumption that $p$ satisfies Equation (7) (together with the fact that $\sum_{|x| \le n} |\psi(x)| \le 1$), and Line 4 exploited the dual decay condition that $\psi$ is assumed to satisfy.
1.3.4 Proof of Claim 4
Proof. Claim 4 follows from Equations (4) and (5), combined with Claim 6. Specifically, apply Claim 6 with $f = \mathrm{AND}_R \circ \mathrm{OR}_n$ (in $\pm 1$ notation), $d = \widetilde{\Theta}(n^{3/4})$, and $\delta(t) = (2N)^{-2d}$ as in Equation (5); note that then $\frac{1}{16 N \delta(|x|)} \ge 2 \cdot (2N)^d$, which matches the bound in Claim 4.
2 Generalizing the analysis for $\mathrm{SURJ}$ to prove a nearly linear approximate degree lower bound for $\mathrm{AC}^0$
Now we take a look at how to extend this kind of analysis for $\mathrm{SURJ}$ to obtain even stronger approximate degree lower bounds for other functions in $\mathrm{AC}^0$. Recall that $\mathrm{SURJ}(x)$ can be expressed as an $\mathrm{AND}$ (over all range items $j \in [R]$) of the $\mathrm{OR}$ (over all inputs $i \in [n]$) of "Is input $x_i$ equal to $j$?" That is, $\mathrm{SURJ}(x)$ simply evaluates $\mathrm{AND}_R \circ \mathrm{OR}_n$ on the inputs $(y_{ij})$, where $y_{ij}$ indicates whether or not input $x_i$ is equal to range item $j$.
Our analysis for $\mathrm{SURJ}$ can be viewed as follows: it is a way to turn the $\mathrm{AND}_R$ function on $R$ bits (which has approximate degree $\Theta(\sqrt{R})$) into a function on close to $R$ bits, with polynomially larger approximate degree (i.e., $\mathrm{SURJ}$ is defined on $n \log R$ bits where, say, the value of $R$ is $\Theta(n)$, i.e., it is a function on $O(n \log n)$ bits). So, this function is on not much more than $R$ bits, but has approximate degree $\widetilde{\Omega}(n^{3/4})$, polynomially larger than the approximate degree of $\mathrm{AND}_R$.
Hence, the lower bound for $\mathrm{SURJ}$ can be seen as a hardness amplification result: we turn the $\mathrm{AND}$ function on $R$ bits into a function on slightly more bits, but the approximate degree of the new function is significantly larger.
From this perspective, the lower bound proof for $\mathrm{SURJ}$ showed that in order to approximate $\mathrm{SURJ}$, we need to not only approximate the $\mathrm{AND}$ function but, additionally, instead of feeding the inputs directly to the $\mathrm{AND}$ gate itself, we are further driving up the degree by feeding the input through $\mathrm{OR}$ gates. The intuition is that we cannot do much better than approximating the $\mathrm{AND}$ function and then approximating the block-composed $\mathrm{OR}$ gates. This additional approximation of the $\mathrm{OR}$ gates gives us the extra exponent in the approximate degree expression.

We will see two issues that come in the way of naive attempts at generalizing our hardness amplification technique from $\mathrm{AND}$ to more general functions.
2.1 Interlude: Grover’s Algorithm
Grover’s algorithm [Gro96] is a quantum algorithm that finds with high probability the unique input to a black box function that produces a given output, using queries on the function, where
is the size of the the domain of the function. It is originally devised as a database search algorithm that searches an unsorted database of size
and determines whether or not there is a record in the database that satisfies a given property in
queries. This is strictly better compared to deterministic and randomized query algorithms because they will take
queries in the worst case and in expectation respectively. Grover’s algorithm is optimal up to a constant factor, for the quantum world.
2.2 Issues: Why a dummy range item is necessary
In general, let us consider the problem of taking any function $F$ that does not have maximal approximate degree (say, with approximate degree $O(n^{1-\delta})$ for some constant $\delta > 0$), and turning it into a function on roughly the same number of bits, but with polynomially larger approximate degree.

In analogy with how $\mathrm{SURJ}$ equals $\mathrm{AND}_R \circ \mathrm{OR}_n$ evaluated on the inputs $(y_{ij})$, where $y_{ij}$ indicates whether or not $x_i = j$, we can consider the block composition $F \circ \mathrm{OR}_n$ evaluated on $(y_{ij})$, and hope that this function has polynomially larger approximate degree than $F$ itself.

Unfortunately, this does not work. Consider for example the case $F = \mathrm{OR}_R$. The function $\mathrm{OR}_R \circ \mathrm{OR}_n$ evaluates to 1 on all possible vectors $(y_{ij})$, since all such vectors have Hamming weight exactly $n$ (each input $x_i$ equals exactly one range item).
One way to try to address this is to introduce a dummy range item, all occurrences of which are simply ignored by the function. That is, we can define the (hopefully harder) function to interpret its input as a list of $n$ numbers from the range $\{0, 1, \dots, R\}$, rather than the range $\{1, \dots, R\}$, and define it as $F \circ \mathrm{OR}_n$ evaluated on the variables $(y_{ij})_{i \in [n], j \in [R]}$ (note that the variables $y_{i0}$, which indicate whether or not each input $x_i$ equals range item $0$, are simply ignored).
In fact, in the previous lecture we already used this technique of introducing a "dummy" range item, to ease the lower bound analysis for $\mathrm{SURJ}$ itself. Last lecture we covered Step 1 of the lower bound proof for $\mathrm{SURJ}$, and we let $n_0$ denote the frequency of the dummy range item, $0$. The introduction of this dummy range item let us replace the condition $\sum_{j=1}^{R} n_j = n$ (i.e., the sum of the frequencies of all the range items is exactly $n$) by the condition $\sum_{j=1}^{R} n_j \le n$ (i.e., the sum of the frequencies of all the range items is at most $n$).
2.3 A dummy range item is not sufficient on its own
Unfortunately, introducing a dummy range item is not sufficient on its own. That is, even when the range is $\{0, 1, \dots, R\}$ rather than $\{1, \dots, R\}$, the function $F \circ \mathrm{OR}_n$ may have approximate degree that is not polynomially larger than that of $F$ itself. An example of this is (once again) $F = \mathrm{OR}_R$. With a dummy range item, $\mathrm{OR}_R \circ \mathrm{OR}_n$ evaluates to TRUE if and only if at least one of the $n$ inputs is not equal to the dummy range item $0$. This problem has approximate degree $O(\sqrt{n})$ (it can be solved using Grover search).
2.4 The approach that works
The approach that succeeds is to consider the block composition $F \circ \mathrm{AND}_{\Theta(\log n)} \circ \mathrm{OR}_n$ (i.e., apply the naive approach with a dummy range item not to $F$ itself, but to $F \circ \mathrm{AND}_{\Theta(\log n)}$). As pointed out in Section 2.3, the $\mathrm{AND}$ gates are crucial here for the analysis to go through.

It is instructive to look at where exactly the lower bound proof for $\mathrm{SURJ}$ breaks down if we try to adapt it to the function $F \circ \mathrm{OR}_n$ (rather than the function $\mathrm{AND}_R \circ \mathrm{OR}_n$ which we analyzed to prove the lower bound for $\mathrm{SURJ}$). Then we can see why the introduction of the $\mathrm{AND}$ gates fixes the issue.

When analyzing the more naively defined function (with a dummy range item), Step 1 of the lower bound analysis for $\mathrm{SURJ}$ does work unmodified to imply that in order to approximate the function, it is necessary to approximate the block composition $F \circ \mathrm{OR}_n$ on inputs of Hamming weight at most $n$. But Step 2 of the analysis breaks down: one can approximate $\mathrm{OR}_R \circ \mathrm{OR}_n$ on inputs of Hamming weight at most $n$ using degree just $O(\sqrt{n})$.
Why does the Step 2 analysis break down for $F = \mathrm{OR}_R$? If one tries to construct a dual witness $\psi$ for $\mathrm{OR}_R \circ \mathrm{OR}_n$ by applying dual block composition (cf. Equation (3), but with the dual witness $\psi_{\mathrm{out}}$ for $\mathrm{AND}_R$ replaced by a dual witness for $\mathrm{OR}_R$), $\psi$ will not be well-correlated with $\mathrm{OR}_R \circ \mathrm{OR}_n$.

Roughly speaking, the correlation analysis thinks of each copy of the inner dual witness as consisting of a sign, $\mathrm{sgn}(\psi_{\mathrm{in}}(x_i))$, and a magnitude, $|\psi_{\mathrm{in}}(x_i)|$, and the inner dual witness "makes an error" on $x_i$ if it outputs the wrong sign, i.e., if $\mathrm{sgn}(\psi_{\mathrm{in}}(x_i))$ disagrees with $\mathrm{OR}_n(x_i)$. The correlation analysis winds up performing a union bound over the probability (under the product distribution $\prod_i |\psi_{\mathrm{in}}(x_i)|$) that any of the $R$ copies of the inner dual witness makes an error. Unfortunately, each copy of the inner dual witness makes an error with constant probability under the distribution $|\psi_{\mathrm{in}}|$. So at least one of them makes an error under the product distribution with probability very close to 1. This means that the correlation of the dual-block-composed dual witness $\psi$ with $\mathrm{OR}_R \circ \mathrm{OR}_n$ is poor.
But if we look at $F \circ \mathrm{AND}_{\Theta(\log n)} \circ \mathrm{OR}_n$, the correlation analysis does go through. That is, we can give a dual witness $\psi_{\mathrm{in}}$ for $\mathrm{AND}_{\Theta(\log n)} \circ \mathrm{OR}_n$ and a dual witness $\psi_{\mathrm{out}}$ for $F$ such that the dual-block-composition of $\psi_{\mathrm{out}}$ and $\psi_{\mathrm{in}}$ is well-correlated with $F \circ \mathrm{AND}_{\Theta(\log n)} \circ \mathrm{OR}_n$.

This is because [BT15] showed that for $t = \Theta(\log n)$, $\widetilde{\deg}_{\varepsilon}(\mathrm{AND}_t \circ \mathrm{OR}_n) = \Omega(\sqrt{t n})$ even for $\varepsilon = 1 - 2^{-t}$. This means that $\mathrm{AND}_t \circ \mathrm{OR}_n$ has a dual witness $\psi_{\mathrm{in}}$ that "makes an error" with probability just $2^{-t} = 1/\mathrm{poly}(n)$. This probability of making an error is so low that a union bound over all $\approx R/\log n$ copies of $\psi_{\mathrm{in}}$ appearing in the dual-block-composition of $\psi_{\mathrm{out}}$ and $\psi_{\mathrm{in}}$ implies that with high probability, none of the copies of $\psi_{\mathrm{in}}$ makes an error.
In summary, the key difference between $\mathrm{OR}_n$ and $\mathrm{AND}_{\Theta(\log n)} \circ \mathrm{OR}_n$ that allows the lower bound analysis to go through for the latter but not the former is that the latter has $\varepsilon$-approximate degree $\Omega(\sqrt{n \log n})$ even for $\varepsilon = 1 - 1/\mathrm{poly}(n)$, while the former only has $\varepsilon$-approximate degree $\Omega(\sqrt{n})$ if $\varepsilon$ is a constant bounded away from 1.
To summarize, the $\mathrm{SURJ}$ lower bound can be seen as a way to turn the function $\mathrm{AND}_R$ into a harder function, $\mathrm{SURJ}$, meaning that $\mathrm{SURJ}$ has polynomially larger approximate degree than $\mathrm{AND}_R$. The right approach to generalize the technique for arbitrary $F$ is to (a) introduce a dummy range item, all occurrences of which are effectively ignored by the harder function $G$, and (b) rather than considering the "inner" function $\mathrm{OR}_n$ alone, consider the inner function $\mathrm{AND}_{\Theta(\log n)} \circ \mathrm{OR}_n$, i.e., let $G = F \circ \mathrm{AND}_{\Theta(\log n)} \circ \mathrm{OR}_n$, evaluated on the indicator variables $(y_{ij})$. The $\mathrm{AND}$ gates are essential to make sure that the error in the correlation of the inner dual witness is very small, and hence the correlation analysis for the dual-block-composed dual witness goes through. Note that $G$ can be interpreted as follows: it breaks the range $[R]$ up into $\approx R/\log n$ blocks, each of length $\Theta(\log n)$ (the dummy range item is excluded from all of the blocks), and for each block it computes a bit indicating whether or not every range item in the block has frequency at least 1. It then feeds these bits into $F$.

By recursively applying this construction, starting with $F = \mathrm{AND}$, we get a function in $\mathrm{AC}^0$ with approximate degree $\Omega(n^{1-\delta})$ for any desired constant $\delta > 0$.
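The construction of $G$ above can be sketched concretely for small parameters (a sketch, not from the notes; the block length `b` plays the role of $\Theta(\log n)$, and the parameter choices are hypothetical):

```python
def harder_function(F, xs, R, b):
    """G: interpret xs as a list of numbers from {0,...,R} (0 is the dummy
    item, ignored). Break the range {1,...,R} into R//b blocks of length b;
    for each block, compute a bit saying whether every range item in the
    block appears at least once; feed those bits into F."""
    assert R % b == 0
    appears = {j: any(x == j for x in xs) for j in range(1, R + 1)}
    block_bits = [
        int(all(appears[j] for j in range(k * b + 1, (k + 1) * b + 1)))
        for k in range(R // b)
    ]
    return F(block_bits)

OR = lambda bits: int(any(bits))

# R = 4, block length b = 2: blocks {1,2} and {3,4}.
# G = OR o AND_2 o OR: is there a block all of whose items appear?
assert harder_function(OR, [1, 2, 0, 0], R=4, b=2) == 1  # block {1,2} is full
assert harder_function(OR, [1, 3, 0, 0], R=4, b=2) == 0  # no block is full
```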
2.5 $k$-distinctness

The very same issue mentioned above also arises in [BKT17]'s proof of a lower bound on the approximate degree of the $k$-distinctness function. The analog of Step 1 of the lower bound analysis for $\mathrm{SURJ}$ reduces analyzing $k$-distinctness to analyzing $\mathrm{OR}_R \circ \mathrm{THR}^k_n$ (restricted to inputs of Hamming weight at most $n$), where $\mathrm{THR}^k_n$ is the function that evaluates to TRUE if and only if its input has Hamming weight at least $k$. The lower bound proved in [BKT17] for $k$-distinctness is $\widetilde{\Omega}(n^{3/4 - 1/(2k)})$.
$\mathrm{THR}^1_n$ is the $\mathrm{OR}_n$ function. So, $\mathrm{THR}^k_n$ is "close" to $\mathrm{OR}_n$. And we've seen that the correlation analysis of the dual witness obtained via dual-block-composition breaks down for $\mathrm{OR}_R \circ \mathrm{OR}_n$.
To overcome this issue, we have to show that $\mathrm{OR}_R \circ \mathrm{THR}^k_n$ is harder to approximate than $\mathrm{THR}^k_n$ itself, but we have to give up some small factor in the process. We will lose some quantity compared to the $\widetilde{\Omega}(n^{3/4})$ lower bound for $\mathrm{SURJ}$. It may seem that this loss factor is just a technical issue and not intrinsic, but this is not so. In fact, this bound is almost tight: there is an upper bound from a complicated quantum algorithm [BL11, Bel12] for $k$-distinctness that makes $O(n^{3/4 - 1/(2^{k+2} - 4)})$ queries, which we won't elaborate on here.
References
[Bel12] Aleksandrs Belovs. Learning-graph-based quantum algorithm for k-distinctness. In Foundations of Computer Science (FOCS), 2012 IEEE 53rd Annual Symposium on, pages 207–216. IEEE, 2012.
[BKT17] Mark Bun, Robin Kothari, and Justin Thaler. The polynomial method strikes back: Tight quantum query bounds via dual polynomials. arXiv preprint arXiv:1710.09079, 2017.
[BL11] Aleksandrs Belovs and Troy Lee. Quantum algorithm for k-distinctness with prior knowledge on the input. arXiv preprint arXiv:1108.3022, 2011.
[BT15] Mark Bun and Justin Thaler. Hardness amplification and the approximate degree of constant-depth circuits. In International Colloquium on Automata, Languages, and Programming, pages 268–280. Springer, 2015.
[BT17] Mark Bun and Justin Thaler. A nearly optimal lower bound on the approximate degree of AC$^0$. arXiv preprint arXiv:1703.05784, 2017.
[Gro96] Lov K Grover. A fast quantum mechanical algorithm for database search. In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, pages 212–219. ACM, 1996.
[RS10] Alexander A. Razborov and Alexander A. Sherstov. The sign-rank of AC$^0$. SIAM Journal on Computing, 39(5):1833-1855, 2010.