Special Topics in Complexity Theory, Lecture 18
Thoughts 2018-03-12
Special Topics in Complexity Theory, Fall 2017. Instructor: Emanuele Viola
1 Lecture 18, Scribe: Giorgos Zirdelis
In this lecture we study lower bounds on data structures. First, we define the setting. We have $n$ bits of data, stored in $m$ bits of memory (the data structure), and want to answer $q$ queries about the data. Each query is answered with $p$ probes. There are two types of probes:
- bit-probe, which returns one bit from the memory, and
- cell-probe, in which the memory is divided into cells of $\log n$ bits, and each probe returns one cell.
The queries can be adaptive or non-adaptive. In the adaptive case, the data structure probes locations which may depend on the answers to previous probes. For bit-probes it means that we answer a query with depth-$p$ decision trees.
Finally, there are two types of data structure problems:
- The static case, in which we map the data to the memory arbitrarily and afterwards the memory remains unchanged.
- The dynamic case, in which we have update queries that change the memory and also run in bounded time.
In this lecture we focus on the non-adaptive, bit-probe, and static setting. Some trivial extremes for this setting are the following. Any problem (i.e., collection of queries) admits data structures with the following parameters:
- $m = q$ and $p = 1$, i.e. you write down all the answers, and
- $m = n$ and $p = n$, i.e. you can always answer a query about the data if you read the entire data.
Next, we review the best current lower bound, a bound proved in the 80’s by Siegel [Sie04] and rediscovered later. We state and prove the lower bound in a different way. The lower bound is for the problem of $k$-wise independence.
Problem 1. The data is a seed of size $n = k \log q$ for a $k$-wise independent distribution over $\{0,1\}^q$. A query $i \in [q]$ is defined to be the $i$-th bit of the sample.
The question is: if we allow a little more space than seed length, can we compute such distributions fast?
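For concreteness, the following is a minimal sketch (not from the notes) of the standard polynomial-based construction behind Problem 1, with the toy choice $q = 256$: the seed is $k$ elements of $\mathrm{GF}(2^8)$, i.e. $n = k\log q$ bits, and query $i$ returns one bit of the evaluation of a random polynomial of degree less than $k$ at $i$. Note that answering a query this way reads the entire seed, i.e. it uses $p = n$ bit-probes, which is exactly what the question above asks to improve.

```python
# A minimal sketch (not from the notes) of the standard construction behind
# Problem 1, with toy parameters: a k-wise independent distribution over
# {0,1}^q generated from a seed of n = k*log(q) bits.  Here q = 256, the field
# is GF(2^8) with the AES modulus x^8 + x^4 + x^3 + x + 1, and query i returns
# the low bit of f(i) for a random polynomial f of degree < k.
import random

Q = 256          # number of queries; also the field size
K = 4            # independence parameter k
LOG_Q = 8        # log2(Q): bits per field element, so the seed has K*LOG_Q bits

def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
    return r

def sample_seed() -> list:
    """The data: k random field elements, i.e. n = k * log q seed bits."""
    return [random.randrange(Q) for _ in range(K)]

def query(seed: list, i: int) -> int:
    """Query i: the low bit of f(i) = sum_j seed[j] * i^j over GF(2^8).
    Evaluating f reads every seed element, i.e. p = n bit-probes."""
    acc, power = 0, 1
    for coeff in seed:
        acc ^= gf_mul(coeff, power)
        power = gf_mul(power, i)
    return acc & 1

seed = sample_seed()
sample = [query(seed, i) for i in range(Q)]
print("seed bits:", K * LOG_Q, " sample length:", len(sample))
```

This particular construction is only meant to fix the parameters of Problem 1; any generator with seed length close to $k \log q$ would do.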
Theorem 2. For the above problem with $k = q^{\Omega(1)}$ it holds that $p \ge \Omega\!\left(\frac{\log q}{\log(m/n)}\right)$.

It follows that if $m = O(n)$ then $p$ is $\Omega(\log q)$. But if $m = n^{1+\Omega(1)}$ then nothing is known.
Proof. Let $C := m/n$. We have the memory of $m = Cn$ bits and we are going to subsample it. Specifically, we will select each bit of the memory with probability $1/(10C)$, independently.
The intuition is that we will shrink the memory but still answer a lot of queries, and derive a contradiction because of the seed length required to sample $k$-wise independence.
For the “shrinking” part we have the following. We expect to keep $m/(10C) = n/10$ memory bits. By a Chernoff bound, it follows that we keep at most $n/5$ bits except with probability $2^{-\Omega(n)}$.
For the “answer a lot of queries” part, recall that each query probes $p$ bits from the memory. We keep one of the $q$ queries if it so happens that we keep all the $p$ bits that it probed in the memory. For a fixed query, the probability that we keep all its $p$ probes is $(10C)^{-p}$.
We claim that with probability at least $(10C)^{-p}/2$, we keep at least $q(10C)^{-p}/2$ queries. This follows by Markov’s inequality. We expect to not keep $q\left(1-(10C)^{-p}\right)$ queries on average. We now apply Markov’s inequality to get that the probability that we don’t keep at least $q(10C)^{-p}/2$ queries is at most $\frac{1-(10C)^{-p}}{1-(10C)^{-p}/2} \le 1-(10C)^{-p}/2$.
Thus, if $2^{-\Omega(n)} < (10C)^{-p}/2$, then there exists a fixed choice of memory bits that we keep, to achieve both the “shrinking” part and the “answer a lot of queries” part as above. This inequality is true because we may assume $p \le \log q$ and $C \le q$ (otherwise the bound is trivial), and so $(10C)^{-p}/2 \ge 2^{-O(\log^2 q)}$, while $2^{-\Omega(n)} = 2^{-\Omega(k\log q)}$ is much smaller since $k = q^{\Omega(1)}$. But now we have at most $n/5$ bits of memory while still answering as many as $q' := q(10C)^{-p}/2$ queries.
The minimum seed length to answer that many queries while maintaining $k$-wise independence is $\Omega(k \log(q'/k))$. Therefore the memory has to be at least as big as the seed. This yields $n/5 \ge \Omega\!\left(k \log\frac{q(10C)^{-p}}{2k}\right)$, from which the result follows, using $n = k\log q$.
This lower bound holds even if the memory bits are filled arbitrarily (rather than having entropy at most $n$). It can also be extended to adaptive cell probes.
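To make the counting in the proof concrete, here is a toy simulation (my own illustration, with arbitrary small parameters, not part of the notes): it fixes a random non-adaptive probe pattern, subsamples the memory with probability $1/(10C)$, and compares the number of fully surviving queries with the predicted expectation $q(10C)^{-p}$.

```python
# A toy simulation (not from the notes) of the subsampling step in the proof
# of Theorem 2.  Each of q queries probes p memory positions; we keep each
# memory bit independently with probability 1/(10C) and count the queries
# whose probes all survive, comparing against the expectation q * (10C)^(-p).
import random

n, C, p, q = 100, 2, 2, 20000     # toy parameters; m = C*n memory bits
m = C * n
keep_prob = 1.0 / (10 * C)

# a fixed random non-adaptive probe pattern: query j probes positions probes[j]
probes = [random.sample(range(m), p) for _ in range(q)]

trials, total_kept = 200, 0
for _ in range(trials):
    kept_bits = {i for i in range(m) if random.random() < keep_prob}
    total_kept += sum(1 for pr in probes if all(b in kept_bits for b in pr))

print("predicted E[kept queries] =", q * keep_prob ** p)
print("observed  average         =", total_kept / trials)
```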
We will now show a conceptually simple data structure which nearly matches the lower bound. Pick a random bipartite graph with $m$ nodes on the left (the memory bits) and $q$ nodes on the right (the queries). Every node on the right side has degree $p$. The memory is filled with uniform bits and we answer each query with an XOR of its neighbor bits. By the Vazirani XOR lemma, it suffices to show that the XOR of the answers to any subset $S$ of at most $k$ queries (itself an XOR of memory bits) is unbiased. Hence it suffices that every subset $S$ with $|S| \le k$ has a unique neighbor, i.e. a memory bit adjacent to exactly one query in $S$. For that, in turn, it suffices that $S$ has a neighborhood of size greater than $p|S|/2$ (because if every element in the neighborhood of $S$ has two neighbors in $S$ then $S$ has a neighborhood of size at most $p|S|/2$). We pick the graph at random and show by standard calculations that it has this property with non-zero probability; a toy sketch of the construction appears after the parameter settings below.
It suffices to have $m \ge c\,pk\,q^{2/p}$ for a suitable constant $c$, so that the probability is strictly less than 1, because by a union bound the probability that some set $S$ of size $s \le k$ has a small neighborhood is at most $\sum_{s \le k} \binom{q}{s}\binom{m}{ps/2}\left(\frac{ps}{2m}\right)^{ps} \le \sum_{s \ge 1} \left(q\,(O(pk)/m)^{p/2}\right)^{s} < 1$. We can match the lower bound in two settings:
- if $m \ge k\,q^{\varepsilon}$ for some constant $\varepsilon > 0$, then $p = O(1/\varepsilon)$ suffices,
- $m = O(k\log q) = O(n)$ and $p = O(\log q)$ suffices.
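Here is a toy sketch (my own, with arbitrary small parameters, not from the notes) of this construction. For parameters this small we can check the sufficient condition from the text by brute force instead of by the union bound: every nonempty set $S$ of at most $k$ queries has a unique neighbor.

```python
# A toy sketch (not from the notes) of the data structure just described:
# the memory is m uniform bits and each of q queries is answered by the XOR
# of p randomly chosen memory bits.  For tiny parameters we verify by brute
# force the sufficient condition from the text: every nonempty set S of at
# most k queries has a memory bit probed by exactly one query in S ("unique
# neighbor"), which makes the XOR of the answers over S unbiased.
import random
from itertools import combinations

m, q, p, k = 32, 64, 4, 3          # toy parameters

def build_graph():
    """Each query node picks p distinct memory bits as its neighbors."""
    return [frozenset(random.sample(range(m), p)) for _ in range(q)]

def every_small_set_has_unique_neighbor(graph):
    for size in range(1, k + 1):
        for S in combinations(graph, size):
            counts = {}
            for neighborhood in S:
                for bit in neighborhood:
                    counts[bit] = counts.get(bit, 0) + 1
            if 1 not in counts.values():
                return False
    return True

# the property holds with non-zero probability, so retry until it does
graph = build_graph()
while not every_small_set_has_unique_neighbor(graph):
    graph = build_graph()

memory = [random.randrange(2) for _ in range(m)]                # uniform memory
answers = [sum(memory[b] for b in nbrs) % 2 for nbrs in graph]  # p probes each
print("first query answers:", answers[:16])
```

With these toy numbers the brute-force check replaces the probabilistic calculation above; for realistic parameters one would rely on that calculation instead.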
Remark 3. It is enough if the memory is $pk$-wise independent as opposed to completely uniform, so one can have a seed of length $O(pk\log m)$. An open question is if you can improve the seed length to optimal.
As remarked earlier, the lower bound does not give anything when $m$ is much larger than $n$. In particular it is not clear if it rules out $p = 2$. Next we show a lower bound which applies to this case.
Problem 4. Take $n$ bits to be a seed for an $\varepsilon$-biased distribution over $\{0,1\}^q$. The queries, like before, are the bits of that distribution. Recall that $n = O(\log(q/\varepsilon))$.
Theorem 5. For $p = 2$ you need $m = \Omega(q)$.
Proof. Every query is answered by looking at $p = 2$ bits. But $\Omega(q)$ queries are answered by the same 2-bit function $f$ of their probes (because there is only a constant number of functions on 2 bits). There are two cases for $f$:
- $f$ is linear (or affine). Suppose for the sake of contradiction that more than $m+1$ queries are answered this way. Then you have a linear dependence, because the space of affine functions on the $m$ memory bits has dimension $m+1$. This implies that if you XOR the queries in the dependence, you always get 0 (or always 1). This in turn contradicts the assumption that the distribution has small bias.
- $f$ is AND (up to negating the input variables or the output). In this case, we keep collecting queries as long as they probe at least one new memory bit. If more than $m$ queries are answered this way, then when we stop we have a query left such that both of its probes query bits that have already been queried. This means that there exist two queries $Q_1$ and $Q_2$ whose probes cover the probes of a third query $Q_3$. This in turn implies that the queries are not close to uniform. That is because there exist answers to $Q_1$ and $Q_2$ that fix the bits probed by them (for an AND, answering 1 fixes both probed bits), and so also fix the bits probed by $Q_3$; these answers occur with probability bounded away from 0, and conditioned on them the answer to $Q_3$ is constant. But this contradicts the small bias of the distribution. Since some $f$ accounts for $\Omega(q)$ of the queries, in either case we conclude $m = \Omega(q)$.
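To illustrate the linear case concretely, here is a small self-contained check (my own illustration, not from the notes): with more than $m+1$ parity queries over $m$ memory bits, Gaussian elimination over GF(2) finds a subset of queries whose XOR is the same for every memory content, which is exactly the kind of dependence a small-bias distribution cannot have.

```python
# A small illustration (not from the notes) of the linear case in the proof of
# Theorem 5: given more than m+1 queries that are parities (affine functions)
# of two memory bits each, some subset of them XORs to a fixed constant no
# matter what the memory contains.  We find such a subset by Gaussian
# elimination over GF(2) and verify it on random memories.
import random

m = 16
num_queries = m + 2                     # more queries than the dimension m + 1

# query j returns mem[a] XOR mem[b] XOR neg for random distinct a, b
queries = []
for _ in range(num_queries):
    a, b = random.sample(range(m), 2)
    queries.append((a, b, random.randrange(2)))

def answer(mem, qr):
    a, b, neg = qr
    return mem[a] ^ mem[b] ^ neg

# each query as a GF(2) vector of length m+1 (last coordinate = the constant)
rows, combos = [], []
for a, b, neg in queries:
    v = [0] * (m + 1)
    v[a], v[b], v[m] = 1, 1, neg
    rows.append(v)
    combos.append(None)                 # filled in when the row becomes a pivot

pivots = {}                             # pivot column -> row index
dependence = None
for j in range(num_queries):
    v, c = rows[j][:], {j}
    for col in range(m + 1):
        if v[col] and col in pivots:
            i = pivots[col]
            v = [x ^ y for x, y in zip(v, rows[i])]
            c ^= combos[i]
    if any(v):
        pivots[next(i for i, x in enumerate(v) if x)] = j
        rows[j], combos[j] = v, c
    else:
        dependence = c                  # these queries XOR to a constant
        break

assert dependence is not None           # guaranteed: m+2 vectors in dimension m+1
values = {sum(answer([random.randrange(2) for _ in range(m)], queries[j])
              for j in dependence) % 2 for _ in range(100)}
print("XOR over queries", sorted(dependence), "is always", values)
```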