Mathematics of the impossible: Computational Complexity, Chapter 3: The grand challenge

Thoughts 2023-02-02

All posts in this series. A PDF version of this post will be published with a delay, but if you’d like to have it soon let me know.



As mentioned in Chapter ??, our ability to prove impossibility results related to efficient computation appears very limited. We can now express this situation more precisely with the models we’ve introduced since then.

It is consistent with our knowledge that any problem in a standard algorithm textbook can be solved

  1. in Time cn^{2} on a TM, and
  2. in Time cn on a 2-TM, and
  3. by circuits of size cn.

Note that 2. implies 1. by Theorem ??.

In this chapter we begin to present several impossibility results, covering a variety of techniques which will be used later as well. As hinted above, they appear somewhat weak. However, jumping ahead, there is a flip side to all of this:

  1. At times, contrary to our intuition, stronger impossibility results are actually false. One example appears in Chapter ??. A list will be given later.
  2. Many times, the impossibility results that we can prove turn out to be, surprisingly, just “short” of proving major results. Here by “major result” I mean a result that would be phenomenal and that was in focus long before the connection was established. We will see several examples of this (section §??, section §??).
  3. Yet other times, one can identify broad classes of proof techniques, and argue that impossibility results can’t be proved with them (section §??).

Given this situation, I don’t subscribe to the general belief that stronger impossibility results are true and we just can’t prove them.

1.1 Information bottleneck: Palindromes requires quadratic time on TMs

Intuitively, the weakness of TMs is the bottleneck of passing information from one end of the tape to the other. We now formalize this and use it to show that deciding if a string is a palindrome requires quadratic time on TMs, which is tight and likely matches the time in Exercise ??. The same bound can be shown for other functions; palindromes just happen to be convenient for obtaining matching bounds.

Theorem 1.1. \text {Palindromes}\not \in \text {TM-Time}(t(n)) for any t(n)=o(n^{2}).

More precisely, for every n and s, an s-state TM that decides if an n-bit input is a palindrome requires time \ge cn^{2}/\log s.

The main concept that allows us to formalize the information bottleneck mentioned above is the following.

Definition 1.1. A crossing sequence of a TM M on input x and boundary i, abbreviated i-CS, is the sequence of states that M is transitioning to when crossing cell boundary i (i.e., going from Cell i to i+1 or vice versa) during the computation on x.
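To make the definition concrete, here is a minimal Python sketch that simulates a one-tape machine and records the crossing sequence at a given boundary. The encoding of the transition function as a dictionary mapping (state, symbol) to (state', symbol', move) is an assumption made only for this illustration.

```python
# Toy simulator recording the crossing sequence at boundary i, i.e. the
# boundary between Cell i and Cell i+1 (cells numbered from 1). We record
# the state being transitioned to each time the head crosses it.

def crossing_sequence(delta, x, i, start="S", halt="H", max_steps=10_000):
    tape = list(x) + ["_"] * max_steps   # blanks to the right of the input
    state, head, cs = start, 0, []       # head is 0-indexed: Cell j = tape[j-1]
    for _ in range(max_steps):
        if state == halt:
            return cs
        state, tape[head], move = delta[(state, tape[head])]
        prev, head = head, head + (1 if move == "R" else -1)
        if {prev, head} == {i - 1, i}:   # head just crossed boundary i
            cs.append(state)
    return cs

# Example: a machine that sweeps right over its input and halts on blank.
delta = {("S", "0"): ("S", "0", "R"),
         ("S", "1"): ("S", "1", "R"),
         ("S", "_"): ("H", "_", "R")}
print(crossing_sequence(delta, "0110", 2))  # ['S']: boundary 2 crossed once
```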

The idea in the proof is very interesting. If M accepts inputs x and y and those two inputs have the same i-CS for some i, then we can “stitch together” the computation of M on x and y at boundary i to create a new input z that is still accepted by M. The input z is formed by picking bits from x to the left of cell boundary i and bits from y to the right of i:

\begin{aligned} z:=x_{1}x_{2}\cdots x_{i}y_{i+1}y_{i+2}\cdots y_{n}. \end{aligned}

The proof that z is still accepted is left as an exercise.

Now, for many problems, input z should not be accepted by M, and this gives a contradiction. In particular this will be the case for palindromes. We are going to find two palindromes x and y that have the same i-CS for some i, but where the corresponding z is not a palindrome, yet it is still accepted by M. We can find these two palindromes if M takes too little time. The basic idea is that if M runs in time t then, because i-CSs for different i correspond to different steps of the computation, for every input there is a value of i such that the i-CS is short, namely has length at most t(|x|)/n. If t(n) is much less than n^{2}, the length of this CS is much less than n. From this we can conclude that the number of CSs is much less than the number of inputs, and so we can find two inputs with the same CS.

Proof. Let n be divisible by four, without loss of generality, and consider palindromes of the form

\begin{aligned} p(x):=x0^{n/2}x^{R} \end{aligned}

where x\in \{0,1\} ^{n/4} and x^{R} is the reverse of x.

Assume there are x\ne y in \{0,1\} ^{n/4} and i in the middle part, i.e., n/4\le i\le 3n/4-1, so that the i-CS of M on p(x) and p(y) is the same. Then we can define z:=x0^{n/2}y^{R} which is not a palindrome but is still accepted by M, concluding the proof.

It remains to prove that the assumption of Theorem 1.1 implies the assumption in the previous paragraph. Suppose M runs in time t. Since crossing sequences at different boundaries correspond to different steps of the computation, for every x\in \{0,1\} ^{n/4} there is a value of i in the middle part such that the i-CS of M on p(x) has length \le ct/n. This implies that there is an i in the middle s.t. there are \ge c2^{n/4}/n inputs x for which the i-CS of M on p(x) has length \le ct/n.

For fixed i, the number of i-CS of length \le \ell is \le (s+1)^{\ell }.

Hence there are x\ne y for which p(x) and p(y) have the same i-CS whenever c2^{n/4}/n\ge (s+1)^{ct/n}. Taking logs, this holds whenever ct\log (s)/n\le cn, i.e., whenever t\le cn^{2}/\log s. QED
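In more detail, taking logs of the condition c2^{n/4}/n\ge (s+1)^{ct/n} gives

\begin{aligned} \frac {n}{4}-\log n-c\ge \frac {ct}{n}\log (s+1), \end{aligned}

and the left-hand side is \ge cn for large n, so the condition indeed holds whenever t\le cn^{2}/\log s.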

Exercise 1.1. For every s and n describe an s-state TM deciding palindromes in time cn^{2}/\log s (matching Theorem 1.1).

Exercise 1.2. Let L:=\{xx:x\in \{0,1\} ^{*}\}. Show L\in \text {TM-Time}(cn^{2}), and prove this is tight up to constants.

One may be tempted to think that it is not hard to prove stronger bounds for similar functions. In fact, as mentioned above, this has resisted all attempts!

1.2 Counting: impossibility results for non-explicit functions

Proving the existence of hard functions is simple: Just count. If there are more functions than efficient machines, some function is not efficiently computable. This is applicable to any model; next we state it for TMs for concreteness. Later we will state it for circuits.

Theorem 1.2. There exists a function f:\{0,1\} ^{n}\to \{0,1\} that cannot be computed by a TM with s states unless cs\log s\ge 2^{n}, regardless of time.

Proof. The number of TMs with s states is \le s^{cs}, and each TM computes at most one function (it may compute none, if it does not stop). The number of functions on n bits is 2^{2^{n}}. Hence if 2^{n}>cs\log s some function cannot be computed. QED

Note this bound is not far from that in Exercise ??.
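To get a feel for the threshold, here is a minimal numeric sketch in Python, taking the unspecified constant c to be 1 (an assumption made only for this illustration); it doubles s until s\log s reaches 2^{n}.

```python
import math

# Counting sketch for Theorem 1.2 with c = 1: some function on n bits
# escapes all s-state TMs as long as s * log2(s) < 2**n.

def enough_states(n: int) -> int:
    """Smallest power of two s with s * log2(s) >= 2**n."""
    target, s = 2 ** n, 2
    while s * math.log2(s) < target:
        s *= 2
    return s

for n in [4, 8, 12, 16]:
    print(n, enough_states(n))  # grows roughly like 2**n / n
```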

It is instructive to present this basic result as an application of the probabilistic method:

Proof. Let us pick f uniformly at random. We want to show that

\begin{aligned} \mathbb {P}_{f}[\exists \text { an }s\text {-state TM }M\text { such that }M(x)=f(x)\text { for every }x\in \{0,1\} ^{n}]<1. \end{aligned}

Indeed, if the probability is less than 1 then there exists some function that cannot be computed. By a union bound we can say that this probability is

\begin{aligned} \le \sum _{M}\mathbb {P}_{f}[M(x)=f(x)\text { for every }x\in \{0,1\} ^{n}], \end{aligned}

where the sum is over all s-state machines. Each probability in the sum is (1/2)^{2^{n}}, since M is fixed. The number of s-state machines is \le s^{cs}. So the sum is \le s^{cs}(1/2)^{2^{n}}, and we can conclude as before by taking logs. QED
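Explicitly, the last step is

\begin{aligned} s^{cs}(1/2)^{2^{n}}=2^{cs\log s-2^{n}}<1\quad \text {whenever }cs\log s<2^{n}. \end{aligned}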

1.3 Diagonalization: Enumerating machines

Can you compute more if you have more time? For example, can you write a program that runs in time n^{2} and computes something that cannot be computed in time n^{1.5}? The answer is yes for trivial reasons if we allow for non-boolean functions.

Exercise 1.3. Give a function f:\{0,1\}^* \to \{0,1\}^* in \text {Time}(n^{2})\setminus \text {Time}(n^{1.5}).

The answer is more interesting if the functions are boolean. Such results are known as time hierarchies, and a generic technique for proving them is diagonalization, applicable to any model.

We first illustrate the result in the simpler case of partial functions, which contains the main ideas. Later we discuss total functions.

Theorem 1.3. There is a partial function in \text {TM-Time}(t(n)) such that any TM M computing it runs in time \ge c_{M}t(n), for any t(n)=\omega (1).

In other words, \text {TM-Time}(t(n))\supsetneq \text {TM-Time}(o(t(n))).

Proof. Consider the TM H that on input x=(M,1^{n-|M|}) of length n runs M on x until it stops and then complements the answer. (We can use a simple encoding of these pairs, for example every even-position bit of the description of M is a 0.)

Now define X to be the subset of pairs s.t. M runs in time \le t(n)/|M|^{c} on inputs of length n, and |M|^{c}\le t(n)/2.

On these inputs, H runs in time |M|^{c}+|M|^{c}\cdot t(n)/|M|^{c}\le t(n), as desired. To accomplish this, H can begin by making a copy of M in time |M|^{c}\le t(n)/2. Then every step of the computation of M can be simulated by H with |M|^{c} steps, always keeping the description of M to the left of the head.

Now suppose N computes the same function as H in time t(n)/|N|^{c}. Note that x:=(N,1^{n-|N|}) falls in the domain X of the function, for n sufficiently large, using that t(n)=\omega (1). Now consider running N on x. We have N(x)=H(x) by supposition, but H(x) is the complement of N(x), contradiction. QED
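The diagonal flip at the heart of this proof is easy to see in code. Below is a toy Python analogue, with plain functions standing in for TMs (an illustration only): H differs from the i-th listed function on the i-th input, hence from every function in the list.

```python
# Toy diagonalization: H differs from machines[i] on input i, so no
# function in the enumeration computes H.

machines = [
    lambda x: 0,      # always reject
    lambda x: 1,      # always accept
    lambda x: x % 2,  # parity of the input
]

def H(i: int) -> int:
    return 1 - machines[i](i)  # run machine i on input i, flip the answer

for i in range(len(machines)):
    assert H(i) != machines[i](i)
print("H escapes every machine in the list")
```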

This proof is somewhat unsatisfactory; in particular we have no control on the running time of H on inputs not in X. It is desirable to have a version of this fundamental result for total functions. Such a version is stated next. It requires additional assumptions on t and a larger gap between the running times. Perhaps surprisingly, as we shall discuss, both requirements are essential.

Theorem 1.4. Let t(n)\ge n be a function. Suppose that f(x):=t(|x|) is in \text {TM-Time}(t(n)/\log ^{c}n).

There is a total function in \text {TM-Time}(ct(n)\log t(n)) that cannot be computed by any TM M in time c_{M}t(n).

The assumption about t is satisfied by all standard functions, including all those in this book. (For example, take t(n):=n^{2}. The corresponding f is then |x|^{2}. To compute f on input x of n bits we can first compute |x|=n in time cn\log n (Exercise ??). This is a number of b:=\log n bits. We can then square this number in time b^{c}. Note that the time to compute f(x) is dominated by the cn\log n term coming from computing |x|, which does not depend on t and is much less than the n^{2}/\log ^{c}n in the assumption.) The assumption cannot be removed altogether because there exist pathological functions t for which the result is false.

The proof is similar to that of Theorem 1.3. However, to make the function total we need to deal with arbitrary machines, which may not run in the desired time or even stop at all. The solution is to clock H in a manner similar to the proof of the universal machine, Lemma ??.

Also, we define a slightly different language to give a stronger result – a unary language – and to avoid some minor technical details (the possibility that the computation of f erases the input).

We define a TM H that on input 1^{n} obtains a description of a TM M, computes t(n), and then simulates M on input 1^{n} for t(n) steps in a way similar to Lemma ??; if M stops then H outputs the complement of the output of M, and if M does not stop then H stops and outputs anything. Now H computes a function in time about t(n). We argue that this function cannot be computed in much less time as follows. Suppose some TM M computes the function in time somewhat less than t(n). Then we can pick a 1^{n} for which H obtains the description of this M, and the simulation always stops. Hence, for that 1^{n} we would obtain M(1^{n})=H(1^{n})=1-M(1^{n}), which is a contradiction.

However, there are interesting differences with the simulation in Lemma ??. In that lemma the universal machine U was clocking the steps of the machine M being simulated. Now instead we need to clock the steps of U itself, even while U is parsing the description of M to compute its transition function. This is necessary to guarantee that H does not waste time on big TM descriptions.

Whereas in Lemma ?? the tape was arranged as

\begin{aligned} (x,M,\underline {i},t',y), \end{aligned}

it will now be arranged as

\begin{aligned} (x,M',\underline {i},t',M'',y) \end{aligned}

which is parsed as follows. The description of M is M'M'', M is in state \underline {i}, the tape of M contains xy, and the head is on the left-most symbol of y. The integer t' is the counter, decreased at every step.

Proof. Define TM H that on input 1^{n}:

  1. Compute (n,t(n),1^{n}).
  2. Compute (M_{n},t(n),1^{n}). Here M_{n} is obtained from n by removing all left-most zeroes up to and including the first 1. I.e., if n=0^{j}1x then M_{n}=x. This is similar to the fact that a program does not change if you add, say, empty lines at the bottom.
  3. Simulate M_{n} on 1^{n}, reducing the counter t(n) at every step, including those parsing M_{n}, as explained before.
  4. If M_{n} stops before the counter reaches 0, output the complement of its output. If the counter reaches 0 stop and output anything.

Running time of H.

  1. Computing n is similar to Exercise ??. By assumption t(n) is computable in time t(n)/\log ^{c}n. Our definition of computation allows for erasing the input, but we can keep n around spending at most another \log ^{c}n factor. Thus we can compute (n,t(n)) in time t(n). Finally, in case it was erased, we can re-compute 1^{n} in time cn\log n by keeping a counter (initialized to a copy of n).
  2. This takes time c(n+t(n)): simply scan the input and remove zeroes.
  3. Decreasing the counter takes c|t(n)| steps. Hence this simulation will take ct(n)\log t(n) time.

Overall the running time is ct(n)\log t(n).

Proof that the function computed by H requires much time. Suppose some TM M computes the same function as H. Consider inputs 1^{n} where n=0^{j}1M. Parsing the description of M to compute its transition function takes time c_{M}, a value that depends on M only and not on j. Hence H will simulate \lfloor t(n)/c_{M}\rfloor steps of M. If M stops within that time (which requires t(n) to be larger than b_{M}, and so n and j sufficiently large compared to M) then the simulation terminates and we reach a contradiction as explained before. QED
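The role of the counter can be seen in a minimal runnable Python sketch, under the assumption (made only for this illustration) that one simulated step is a function reporting its cost in budget units, a cost that also covers the time spent parsing the machine's description.

```python
# Clocked simulation sketch: every unit of interpreter work, including
# the c_M units spent re-reading the machine's description, is charged
# to one global budget, so the total time is bounded by the budget no
# matter how large the description is.

def clocked_run(step, state, budget):
    while budget > 0:
        state, cost, done = step(state)
        budget -= cost
        if done:
            return 1 - state  # diagonal flip of the simulated output
    return 0                  # budget exhausted: output anything

# Example "machine": counts down from 5; each step costs 3 budget units
# (1 for the move, 2 standing in for parsing the description).
def countdown(state):
    return state - 1, 3, state - 1 == 0

print(clocked_run(countdown, 5, 100))  # halts within budget: prints 1
print(clocked_run(countdown, 5, 2))    # budget runs out: prints 0
```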

The extra \log t(n) factor cannot be reduced because of the surprising result presented in Theorem 1.5 showing that, on TMs, time o(n\log n) equals time n for computing total functions.

However, tighter time hierarchies hold for more powerful models, like RAMs. Also, a time hierarchy for total functions for \text {BPTime} is… an open problem! But a hierarchy is known for partial functions.

Exercise 1.4. (1) State and prove a tighter time hierarchy for Time (which recall corresponds to RAMs) for total functions. You don’t need to address simulation details, but you need to explain why a sharper separation is possible.

(2) Explain the difficulty in extending (1) to \text {BPTime}.

(3) State and prove a time hierarchy for \text {BPTime} for partial functions.

1.3.1 \text {TM-Time}(o(n\log n))=\text {TM-Time}(n)

In this subsection we prove the result in the title, which we also mentioned earlier. First let us state the result formally.

Theorem 1.5. [2] Let f:\{0,1\}^* \to \{0,1\} be in \text {TM-Time}(t(n)) for a t(n)=o(n\log n). Then f\in \text {TM-Time}(n).

Note that time n is barely enough to scan the input. Indeed, the corresponding machines in Theorem 1.5 will only move the head in one direction.

The rest of this section is devoted to proving the above theorem. Let M be a machine for f witnessing the assumption of the theorem. We can assume that M stops on every input (even though our definition of time only applies to large enough inputs), possibly by adding \le n to the time, which does not change the assumption on t(n). The theorem now follows from the combination of the next two lemmas.

Lemma 1.1. Let M be a TM running in time t(n)\le o(n\log n). Then on every input x\in \{0,1\}^* every i-CS with i\le |x| has length \le c_{M}.

Proof. For an integer b let x(b) be a shortest input such that there exists j\in \{0,1,\ldots ,n(b)\} for which the j-CS in the computation of M on x(b) has length \ge b, where n(b):=|x(b)|.

Exercise 1.5. Prove n(b)\to \infty for b\to \infty .

There are n(b)+1\ge n(b) tape boundaries within or bordering x(b). If we pick a boundary uniformly at random, the average length of a CS on x(b) is \le t(n(b))/n(b). Hence there are \ge n(b)/2 choices for i s.t. the i-CS on x(b) has length \le 2t(n(b))/n(b).

The number of such crossing sequences is

\begin{aligned} \le (s+1)^{2t(n(b))/n(b)}=(s+1)^{o(\log n(b))}=n(b)^{o(\log s)}. \end{aligned}

Hence, the same crossing sequence occurs at \ge (n(b)/2)/n(b)^{o(\log s)}\ge 4 positions i, using that n(b) is large enough.

Of these four, one could be the CS of length \ge b from the definition of x(b). Of the other three, two are on the same side of j. We can remove the corresponding interval of the input without removing the CS of length \ge b. Hence we obtained a shorter input with a CS of length \ge b. Contradiction. QED

Lemma 1.2. Suppose f:\{0,1\}^* \to \{0,1\} is computable by a TM such that on every input x, every i-CS with i\le |x| has length \le b. Then f is computable in time n by a TM with c_{b} states that only moves the head in one direction.

Exercise 1.6. Prove this.

1.4 Circuits

The situation for circuits is similar to that of 2-TMs, and by Theorem ?? we know that proving \omega (n\log n) bounds on circuits is harder than for 2-TMs. Even a bound of cn is unknown. The following is a circuit version of Theorem 1.2. Again, the bound is close to what we saw in Theorem ??.

Theorem 1.6. [3] There are functions f:\{0,1\} ^{n}\to \{0,1\} that require circuits of size \ge (1-o(1))2^{n}/n, for every n.

One can prove a hierarchy for circuit size, by combining Theorem 1.6 and Theorem ??. As stated, the results give that circuits of size cs compute more functions than those of size s. In fact, the “o(1)” in the theorems is small, so one can prove a sharper result. But a stronger and more enjoyable argument exists.

Theorem 1.7. [Hierarchy for circuit size] For every n and s\le c2^{n}/n there is a function f:\{0,1\} ^{n}\to \{0,1\} that can be computed by circuits of size s+cn but not by circuits of size s.

Proof. Consider a path from the all-zero function to a function which requires circuits of size \ge s, guaranteed to exist by Theorem 1.6, changing the output of the function on one input at each step of the path. Let h be the first function that requires size >s, and let h' be the function right before that in the path. Note that h' has circuits of size \le s, and h can be computed from h' by changing the value on a single input. The latter can be implemented by circuits of size cn. QED
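The last step can be sketched in Python, with circuits modeled as boolean functions (an illustration only): h agrees with h' everywhere except on the single input a, and the extra work is an equality test, an AND of n literals, i.e., about cn gates.

```python
# h(x) = h'(x) XOR [x == a]: changing a function on one input costs an
# equality test against a, an AND of n literals, so about cn extra gates.

def flip_on_one_input(h_prime, a):
    def h(x):  # x, a: tuples of n bits
        eq = all(xi == ai for xi, ai in zip(x, a))  # the AND of n literals
        return h_prime(x) ^ int(eq)
    return h

h_prime = lambda x: 0                  # the all-zero function
h = flip_on_one_input(h_prime, (1, 0, 1))
print(h((1, 0, 1)), h((0, 0, 0)))      # 1 0: differs from h' only at a
```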

Exercise 1.7. Prove a stronger hierarchy result for alternating circuits, where the “cn” in Theorem 1.7 is replaced with “c.”

In fact, this improvement is possible even for (non-alternating) circuits, see Problem 1.2.

1.4.1 The circuit won’t fit in the universe: Non-asymptotic, cosmological results

Most of the results in this book are asymptotic, i.e., we do not explicitly work out the constants because they become irrelevant for larger and larger input lengths. As stated, these results don’t say anything for inputs of a fixed length. For example, any function on 10^{100} bits is in Time(c).

However, it is important to note that all the proofs are constructive and one can work out the constants, and produce non-asymptotic results. We state next one representative example where this was done. It is about a problem in logic, an area which often produces very hard problems.

On an alphabet of size 63, the language used to write formulas includes first-order variables that range over \mathbb {N}, second-order variables that range over finite subsets of \mathbb {N}, the predicates “y=x+1” and “x\in S” where x and y denote first-order variables and S denotes a set variable, and standard quantifiers, connectives, constants, binary relation symbols on integers, and set equality. For example one can write things like “every finite set has a maximum:” \forall S\exists x\in S\forall y\in S,y\le x.

Theorem 1.8. [4] To decide the truth of logical formulas of length at most 610 in this language requires a circuit containing at least 10^{125} gates. So even if each gate were the size of a proton, the circuit would not fit in the known universe.

Their result applies even to randomized circuits with error 1/3, if 610 is replaced with 614. (We can define randomized circuits analogously to BPTime.)

1.5 Problems

Problem 1.1. Hierarchy Theorem 1.4 only gives a function f that cannot be computed fast on all large enough input lengths: it is consistent with the theorem that f can be computed fast on infinitely many input lengths.

Give a function f:\{0,1\}^* \to \{0,1\}^* mapping x to \{0,1\} ^{\log \log \log |x|} that is computable in time n^{c} but such that for any TM M running in time n^{2} the following holds: for every n\ge c_{M} and every x\in \{0,1\} ^{n} we have M(x)\ne f(x).

Hint: Note the range of f. Can this result hold as stated with range \{0,1\} ?

Problem 1.2. Replace “cn” in Theorem 1.7 with “c.”

Problem 1.3. Prove that \{0^{i}1^{i}:i\ge 0\}\in \text {TM-Time}(cn\log n)\setminus \text {TM-Time}(n).

This shows Theorem 1.5 is tight, and gives an explicit problem witnessing the time-hierarchy separation in Theorem 1.4, for a specific time bound. For the negative result, don’t use pumping lemmas or other characterization results not covered in this book.

Problem 1.4. The following argument contradicts Theorem 1.4; what is wrong with it?

“By Theorem 1.5, \text {TM-Time}(n\log ^{0.9}n)=\text {TM-Time}(n). By padding (Theorem 1.5), \text {TM-Time}(n\log ^{1.1}n)=\text {TM-Time}(n\log ^{0.9}n). Hence \text {TM-Time}(n\log ^{1.1}n)=\text {TM-Time}(n).”

References

[1]   F. C. Hennie. One-tape, off-line Turing machine computations. Information and Control, 8(6):553–578, 1965.

[2]   Kojiro Kobayashi. On the structure of one-tape nondeterministic Turing machine time hierarchy. Theor. Comput. Sci., 40:175–193, 1985.

[3]   Claude E. Shannon. The synthesis of two-terminal switching circuits. Bell System Tech. J., 28:59–98, 1949.

[4]   Larry Stockmeyer and Albert R. Meyer. Cosmological lower bound on the circuit complexity of a small problem in logic. J. ACM, 49(6):753–784, 2002.