Mathematics of the impossible, Chapter 7, Space
Thoughts 2023-03-24
As mentioned in Chapter 2, Time is only one of several resources we with to study. Another important one is space, which is another name for the memory of the machine. Computing under space constraints may be less familiar to us than computing under time constraints, and many surprises lay ahead in this chapter that challenge our intuition of efficient computation.
If only space is under consideration, and one is OK with a constant-factor slackness, then TMs and RAMs are equivalent; much like is invariant under power changes in time. In a sense, changing the space by a constant factor is like changing the time by a power; from this point of view the equivalence is not surprising.
We shall consider both space bounds bigger than the input length and smaller. For the latter, we have to consider the input separately. The machine should be able to read the input, but not write on its cells. One way to formalize this is to consider 2TMs, where one tape holds the input and is read-only. The other is a standard read-write tape.
We also want to compute functions whose output is more than bit. One option is to equip the machine with yet another tape, which is write-only. We prefer to stick to two tapes and instead require that given the output bit of is computable efficiently.
Definition 7.1. A function is computable in if there is a 2TM which on input on the first tape, where and , outputs the bit of , and the machine never writes on the first tape and never uses more than cells on the second.
We define:
We investigate next some basic relationship between space and time.
Proof. 1. A TM running in time can only move its heads at most tape cells. We can write all these contents in one tape. To simulate one step of the we do a pass on all the contents and update them accordingly.
2. A RAM running in time can only access memory cells, each containing at most bits; the factor is to take into account that the machine starts with word length . We simulate this machine and for each Write operation we add a record on the tape with the memory cell index and the content, similar to Theorem ??. When the RAM reads from memory, we scan the records and read off the memory values from one of them. If the record isn’t found, then the simulation returns the value corresponding to the initial configuration.
We also allocate bits for the registers of the RAM. It can be shown that the operations among them can be performed in the desired space, since they only involve a logarithmic number of bits. A stronger result is proved later in Theorem 7.5.
3. On an input with , a machine can be in at most configurations. The first factor is for the head position on the input. The second factor is for the contents of the second tape. Since the machine ultimately stops, configurations cannot repeat, hence the same machine (with no modification) will stop in the desired time. QED
A non-trivial saving is given by the following theorem.
Theorem 7.2. [12] , for every function .
This result is not specific to MTMs but can be proved for other models such as RAMs.
From Theorem 7.1 and the next exercise we have
Just like for Time, for space one has universal machines and a hierarchy theorem. The latter implies . Hence, analogously to the situation for Time and NTime (section º5.1), we know that at least one of the inclusions above between L and PSpace is strict. Most people seem to think that all are, but nobody can prove that any specific one is.
7.1 Branching programs
Branching programs are the non-uniform counterpart of , just like circuits are the non-uniform counterpart of .
Definition 7.2. A (branching) program is a directed acyclic graph. A node can be labeled with an input variable, in which case it has two outgoing edges labeled and . Alternatively a node can be labeled with or , in which case it has no outgoing edges. One special node is the start node.
The space of the program with nodes is . A program computes a function by following the path from the starting node, following edge labels corresponding to the input, and outputting as soon as it reaches a node labeled .
Theorem 7.3. Suppose a 2TM computes in space . Then has branching programs of space . In particular, any has branching programs of size .
Proof. Each node in the program corresponds to a configuration. QED
Definition 7.3. The branching program given by Theorem 7.3 is called the configuration graph of .
The strongest available impossibility result for branching programs is the following.
Theorem 7.4. [16] There are explicit functions (in ) that require branching programs of size on inputs of length .
7.2 The power of L
Computing with severe space bounds, as in L, seems quite difficult. Also, it might be somewhat less familiar than, say, computing within a time bound. It turns out that L is a powerful class capable of amazing computational feats that challenge our intuition of efficient computation. Moreover, these computational feats hinge on deep mathematical techniques of wide applicability. We hinted at this in Chapter 1. We now give further examples. At the same time we develop our intuition of what is computable with little space.
To set the stage, we begin with a composition result. In the previous sections we used several times the simple result that the composition of two maps in P is also in P. This is useful as it allows us to break a complicated algorithm in small steps to be analyzed separately – which is a version of the divide et impera paradigm. A similar composition result holds and is useful for space, but the argument is somewhat less obvious.
Lemma 7.1. Let be in and satisfy for a function . Suppose is in .
Then the composed function defined as is computable in space .
In particular, if and are in L then is in L, as long as for a constant .
7.2.1 Arithmetic
A first example of the power of L is given by its ability to perform basic arithmetic. Grade school algorithms use a lot of space, for example they employ space to multiply two -bit integers.
Theorem 7.5. The following problems are in L:
- Addition of two input integers.
- Iterated addition: Addition of any number of input integers.
- Multiplication of two input integers.
- Iterated multiplication: Multiplication of any number of input integers.
- Division of two integers.
Iterated multiplication is of particular interest because it can be used to compute “pseudorandom functions.” Such objects shed light on our ability to prove impossibility results via the “Natural Proofs” connection which we will see in Chapter ??.
Proof of 1. in Theorem 7.5. We are given as input and an index and need to compute bit of . Starting from the least significant bits, we add the bits of and , storing the carry of 1 bit in memory. Output bits are discarded until we reach bit , which is output. QED
Exercise 7.3. Prove 2. and 3. in Theorem 7.5.
Proving 4. and 5. is more involved and requires some of those deep mathematical tools of wide applicability we alluded to before. Division can be computed once we can compute iterated multiplication. In a nutshell, the idea is to use the expansion
We omit details about bounding the error. Instead, we point out that this requires computing powers which is an example of iterated multiplication (and in fact is no easier).
So for the rest of this section we focus on iterated multiplication. Our main tool for this the Chinese-remaindering representation of integers, abbreviated CRR.
Definition 7.4. We denote by the integers modulo equipped with addition and multiplication (modulo ).
Theorem 7.6. Let be distinct primes and . Then is isomorphic to .
The forward direction of the isomorphism is given by the map
For the converse direction, there exist integers , depending only on the such that the converse direction is given by the map
Each integer is mod for and is mod .
Example 7.1. is isomorphic to . The equation corresponds to . The equation corresponds to . Note how addition and multiplication in CRR are performed in each coordinate separately; how convenient.
To compute iterated multiplication the idea is to move to CRR, perform the multiplications there, and then move back to standard representation. A critical point is that each coordinate in the CRR has a representation of only bits, which makes it easy to perform iterated multiplication one multiplication at the time, since we can afford to write down intermediate products.
The algorithm is as follows:
- Let and compute the first prime numbers .
- Convert the input into CRR: Compute .
- Compute the multiplications in CRR: .
- Convert back to standard representation.
Now we explain how steps 1, 2, and 3 can be implemented in L. Step 4 can be implemented in L too, but showing this is somewhat technical due to the computation of the numbers in Theorem 7.6. However these numbers only depend on the input length, and so we will be able to give a self-contained proof that iterated multiplication has branching programs of size .
Step 1
By Theorem ??, the primes have magnitude and so can be represented with bits. We can enumerate over integers with bits in L. For each integer we can test if it’s prime by again enumerating over all integers and with bits and checking if , say using the space-efficient algorithm for multiplication in Theorem 7.5. (The space required for this step would in fact be .)
Step 2
We explain how given we can compute in L. If is bit of we have that
Step 3
This is a smaller version of the original problem: for each , we want to compute from . However, as mentioned earlier, each is at most in magnitude and thus has a representation of bits. Hence we can just perform one multiplication at the time in L.
Step 4
By Theorem 7.6, to convert back to standard representation from CRR we have to compute the map
Assuming we can compute the , this is just multiplication and iterated addition, which are in L by Theorem 7.5.
Putting the steps together
Combining the steps together we can compute iterated multiplication in L as long as we are given the integers in Theorem 7.6.
Theorem 7.7. Given integers , and given the integers as in Theorem 7.6, where , we can compute in L.
In particular, because the only depend on the input length, but not on the they can be hardwired in a branching program.
Exercise 7.5. Show that given integers and one can decide if
in L. You cannot use the fact that iterated multiplication is in L, a result which we stated but not completely proved.
Exercise 7.6. Show that the iterated multiplication of matrices over the integers has branching programs of size .
7.2.2 Graphs
We now give another example of the power of L.
Definition 7.5. The undirected reachability problem: Given an undirected graph and two nodes and in determine if there is a path from to .
Standard time-efficient algorithms to solve this problem mark nodes in the graph. In logarithmic space we can keep track of a constant number of nodes, but it is not clear how we can avoid repeating old steps forever.
Theorem 7.8. [20] Undirected reachability is in L.
The idea behind this result is that a random walk on the graph will visit every node, and can be computed using small space, since we just need to keep track of the current node. Then, one can derandomize the random walk and obtain a deterministic walk, again computable in L.
7.2.3 Linear algebra
Our final example comes from linear algebra. Familiar methods for solving a linear system
can be done requires a lot of space. For example using elimination we need to rewrite the matrix . Similarly, we cannot easily compute determinants using small space. However, a different method exists.
Theorem 7.9. [9] Solving a linear system is computable in .
7.3 Checkpoints
The checkpoint technique is a fundamental tool in the study of space-bounded computation. Let us illustrate it at a high level. Let us consider a graph , and let us write if there is a path of length from to . The technique allows us to trade the length of the path with quantifiers. Specifically, for any parameter , we can break down paths from to in smaller paths that go through checkpoints. The length of the smaller paths needs be only (assuming that divides ). We can guess the breakpoints and verify each smaller path separately, at the price of introducing quantifiers but with the gain that the path length got reduced from to . The checkpoint technique is thus an instantiation of the general paradigm of guessing computation and verifying it locally, introduced in Chapter 5. One difference is that now we are only going to guess parts of the computation.
An important aspect of this technique is that it can be applied recursively: We can apply it again to the problems . We need to introduce more quantifiers, but we can reduce the path length to , and so on. We will see several instantiations of this technique, for various settings of parameters, ranging from to .
We now utilize the checkpoint technique to show a simulation of small-space computation by small-depth alternating circuits.
Theorem 7.10. A function computable by a branching program with nodes is also computable by an alternating circuit of depth and size , for any .
To illustrate the parameters, suppose , and let us pick where . Then we have circuits of depth and size . In other words, we can have depth and size , for every . This in particular holds for every function in . We will later give explicit functions (also in ) which cannot be computed by circuits of depth and size , “just short” of ruling out . This state of affairs is worth emphasis:
Proof. We apply the checkpoint technique to the branching program, recursively, with parameter . For simplicity we first assume that is a power of . Each application of the technique reduces the path length by a factor . Hence with applications we can reduce the path length to .
In one application, we have an quantifier over nodes, corresponding to an Or gate with fan-in , and then a quantifier over smaller paths, corresponding to an And gate with fan-in . This gives a tree with leaves. Iterating, the number of leaves will be
Each leaf can be connected to the input bit on which it depends. The size of the circuit is at most times the number of leaves.
If is not a power of we can view the branching program as having nodes where is a power of . QED
The following is a uniform version of Theorem 7.10, and the proof is similar. It shows that we can speed up space-bounded computation with alternations.
Theorem 7.11. [17]Any is also in , for any .
Proof. Let be the configuration graph of . Note this graph has nodes. We need to decide if the start configuration reaches the accept configuration in this graph within steps.
We apply to this graph the checkpoint technique recursively, with parameter . Each application of the technique reduces the path length by a factor . Hence with applications we can reduce the path length to
Each quantifier ranges over bits for large enough .
There remains to check a path of length , i.e., an edge. The endpoints of this edge are two configurations and which depend on the quantified bits. The machine can compute the two endpoints in time where is the total number of quantified bits, using rapid access. Once it has and it can check if leads to in one step by reading one bit from the input. Note , so . QED
We note that by the generic simulation of alternating time by small-depth circuits in Lemma 6.2, the above theorem also gives a result similar to Theorem 7.10.
7.4 Reductions: L vs. P
Again, we can use reductions to related the space complexity of problems. In particular we can identify the problems in P which have space-efficient algorithms iff every problem in P does.
Proof. Circuit value is in P since we can evaluate one gate at the time. Now let . We can reduce computing on input to a circuit value instance, as in the simulation of TMs by circuits in Theorem ??. The important point is that this reduction is computable in L. Indeed, given an index to a gate in the circuit, we can compute the type of the gate and index to its children via simple arithmetic, which is in L by Theorem 7.5, and some computation which only depends on the description of the P-time machine for . QED
Definition 7.8. The monotone circuit value problem: Given a circuit with no negations and an input , compute .
Recall from section 7.2.3 that finding solutions to linear systems
has space-efficient algorithms. Interestingly, if we generalize equalities to inequalities the problem becomes P complete. This set of results illustrates a sense in which “linear algebra” is easier than “optimization.”
Definition 7.9. The linear inequalities problem: Given a matrix of integers and a -dimensional vector, determine if the system has a solution over the reals.
Proof. The ellipsoid algorithm shows that Linear inequalities is in P, but we will not discuss this classic result.
Instead, we focus on showing how given a circuit and an input we can construct a set of inequalities that are satisfiable iff .
We shall have as many variables as gates in the circuit, counting input gates as well.
For an input gate add equation .
For a Not gate add equation .
For an And gate add equations .
The case of Or is similar, or can be dispensed by writing an Or using Not and And.
Finally, if is the output gate add equation .
We claim that in every solution to the value of is the value in of gate on input . This can be proved by induction. For input and Not gates this is immediate. For an And gate, note that if then as well because of the equations and . The same holds if . If both and are 1 then is as well because of the equations and . QED
7.4.1 Nondeterministic space
Because of the insight we gained from considering non-deterministic time-bounded computation in section º5.1, we are naturally interested in non-deterministic space-bounded computation. In fact, perhaps we will gain even more insight, because this notion will really challenge our understanding of computation.
For starters, let us define non-deterministic space-bounded computation. A naive approach is to define it using the quantifiers from section º6.4, leading to the class . This is an ill-fated choice:
Instead, non-deterministic space is defined in terms of non-deterministic TMs.
Definition 7.10. A function is computable in if there is a two-tape TM which on input never writes on the first tape and never uses more than cells on the second, and moreover:
1. The machine is equipped with a special “Guess” state. Upon entering this state, a guess bit is written on the work tape under the head.
2. iff there exists a choice for the guess bits that causes the machine to output .
We define
How can we exploit this non-determinism? Recall from section 7.2.2 that reachability in undirected graphs is in L. It is unknown if the same holds for directed graphs. However, we can solve it in NL.
Definition 7.11. The directed reachability problem: Given a directed graph and two nodes and , decide if there is a path from to .
Proof. The proof simply amounts to guessing a path in the graph. The algorithm is as follows:
“On input , let .
For to :
If , accept.
Guess a neighbor of . Let .
If you haven’t accepted, reject.”
The space needed is . QED
We can define NL completeness similarly to NP and P completeness, and have the following result.
7.4.2 Nspace vs. time
Recall by definition . We showed in Theorem 7.1. We can strengthen the inclusion to show that it holds even for non-deterministic space.
Proof. On input x, we compute the configuration graph of on input . The nodes are the configurations, and there is an edge from to if the machine can go from to in one step. Then we solve reachability on this graph in power time, using say breadth-first-search. QED
The next theorem shows that non-deterministic space is not much more powerful than deterministic space: it buys at most a square. Contrast this with the P vs. NP question! The best determistic simulation of NP that we know is the trivial . Thus the situation for space is entirely different.
How can this be possible? The high-level idea, which was used already in Lemma 7.1, can be cast as follows:
Theorem 7.17. [23] , for every . In particular, .
Proof. We use the checkpoint technique with parameter , and re-use the space to verify the smaller paths. Let be a non-deterministic TM computing a function in . We aim to construct a deterministic TM that on input returns
where decides if is reachable from in steps in the configuration graph of on input , and is the start configuration, is the accept configuration, and is the number of configurations of .
The key point is how to implement Reach.
For all “middle” configurations
Reject
Let denote the space needed for computing . We have
This is because we can re-use the space for two calls to Reach. Therefore, the space for is
QED
Recall that we do not know if is closed under complement. It is generally believed not to be the case, and we showed that if it is then the PH collapses Exercise 6.3.
What about space? Is closed under complement? Theorem 7.17 shows . So if then the complement function is in (and in particular in ). Hence, up to a quadratic loss in space, non-deterministic space is closed under complement.
Can we avoid squaring the space?
Yes! This is weird!
Proof. We want a non-deterministic 2TM that given accepts if there is no path from to in .
For starters, suppose the machine has computed the number of nodes reachable from . The key idea is that there is no path from to iff there are nodes different from reachable from . Thus, knowing we can solve the problem as follows
If Accept, else Reject.
There remains to compute .
Let be the nodes at distance from , and let . Note . We seek to compute .
To compute from , enumerate nodes (candidate in ). For each , enumerate over all nodes in , and check if is an edge. If so, increase by .
The enumeration over is done guessing nodes and paths from . If we don’t find nodes, we reject. QED
7.5 TiSp
So far in this chapter we have focused on bounding the space usage. For this, the TM model was sufficient, as remarked at the beginning. It is natural to consider algorithms that operate in little time and space. For this, of course, whether we use TMs or RAMs makes a difference.
Definition 7.12. Let be the functions computable on a RAM that on every input runs in time and only uses memory cells to .
An alternative definition of TiSp would allow the RAM to access cells anywhere in memory. One can maintain a data structure to show that this alternative definition is equivalent to Definition 7.12.
To illustrate the relationship between TiSp, Time, and Space, consider undirected reachability. It is solvable in Time by bradth-first search, and in logarithmic space by Theorem 7.8. But it isn’t known if it is in for some constant .
Exercise 7.11. Prove that Theorem 7.11 holds even for any function in for any and .
The following is a non-uniform version of TiSp.
Definition 7.13. A branching program of length and width is a branching program where the nodes are partitioned in layers where nodes in only lead to nodes in , and for every .
Thus represents the time of the computation, and the space.
Recall that Theorem 7.10 gives bounds of the form on the size of branching program (without distinguishing between length and width). For branching programd og length and width this bound gives . But it gives nothing for power width like . The state-of-the-art for power width is [2, 7] (in fact the bound holds even for subexponential) .
With these definitions in hand we can refine the connection between branching programs and small-depth circuits in Theorem 7.10 for circuits of depth 3.
Theorem 7.19. Let be computable by a branching program with width and time . Then is computable by an alternating depth-3 circuit with wires.
We will later show explicit functions that require depth-3 circuits of size . Theorem 7.19 shows that improving this would also improve results for small-width branching programs, a refinement of the message emphasized before after Theorem 7.10.
7.6 Notes
For a discussion of the complexity of division, see [3]. For a compendium of P-complete problems see [11].
References
[1] Mikl≤s Ajtai. -formulae on finite structures. Annals of Pure and Applied Logic, 24(1):1–48, 1983.
[2] Mikl≤s Ajtai. A non-linear time lower bound for boolean branching programs. Theory of Computing, 1(1):149–176, 2005.
[3] Eric Allender. The division breakthroughs. Bulletin of the EATCS, 74:61–77, 2001.
[4] Noga Alon, Oded Goldreich, Johan Hσstad, and RenΘ Peralta. Simple constructions of almost -wise independent random variables. Random Structures & Algorithms, 3(3):289–304, 1992.
[5] Kenneth E. Batcher. Sorting networks and their applications. In AFIPS Spring Joint Computing Conference, volume 32, pages 307–314, 1968.
[6] Paul Beame, Stephen A. Cook, and H. James Hoover. Log depth circuits for division and related problems. SIAM J. Comput., 15(4):994–1003, 1986.
[7] Paul Beame, Michael Saks, Xiaodong Sun, and Erik Vee. Time-space trade-off lower bounds for randomized computation of decision problems. J. of the ACM, 50(2):154–195, 2003.
[8] Stephen A. Cook. A hierarchy for nondeterministic time complexity. J. of Computer and System Sciences, 7(4):343–353, 1973.
[9] L. Csanky. Fast parallel matrix inversion algorithms. SIAM J. Comput., 5(4):618–623, 1976.
[10] Aviezri S. Fraenkel and David Lichtenstein. Computing a perfect strategy for n x n chess requires time exponential in n. J. Comb. Theory, Ser. A, 31(2):199–214, 1981.
[11] Raymond Greenlaw, H. James Hoover, and Walter Ruzzo. Limits to Parallel Computation: P-Completeness Theory. 02 2001.
[12] John E. Hopcroft, Wolfgang J. Paul, and Leslie G. Valiant. On time versus space. J. ACM, 24(2):332–337, 1977.
[13] Richard M. Karp and Richard J. Lipton. Turing machines that take advice. L’Enseignement MathΘmatique. Revue Internationale. IIe SΘrie, 28(3-4):191–209, 1982.
[14] Leonid A. Levin. Universal sequential search problems. Problemy Peredachi Informatsii, 9(3):115–116, 1973.
[15] Joseph Naor and Moni Naor. Small-bias probability spaces: efficient constructions and applications. SIAM J. on Computing, 22(4):838–856, 1993.
[16] E. I. Nechiporuk. A boolean function. Soviet Mathematics-Doklady, 169(4):765–766, 1966.
[17] Valery A. Nepomnjaščiĭ. Rudimentary predicates and Turing calculations. Soviet Mathematics-Doklady, 11(6):1462–1465, 1970.
[18] NEU. From RAM to SAT. Available at http://www.ccs.neu.edu/home/viola/, 2012.
[19] Wolfgang J. Paul, Nicholas Pippenger, Endre SzemerΘdi, and William T. Trotter. On determinism versus non-determinism and related problems (preliminary version). In IEEE Symp. on Foundations of Computer Science (FOCS), pages 429–438, 1983.
[20] Omer Reingold. Undirected connectivity in log-space. J. of the ACM, 55(4), 2008.
[21] J. M. Robson. N by N checkers is exptime complete. SIAM J. Comput., 13(2):252–267, 1984.
[22] Rahul Santhanam. On separators, segregators and time versus space. In IEEE Conf. on Computational Complexity (CCC), pages 286–294, 2001.
[23] Walter J. Savitch. Relationships between nondeterministic and deterministic tape complexities. Journal of Computer and System Sciences, 4(2):177–192, 1970.
[24] Michael Sipser. A complexity theoretic approach to randomness. In ACM Symp. on the Theory of Computing (STOC), pages 330–335, 1983.
[25] Seinosuke Toda. PP is as hard as the polynomial-time hierarchy. SIAM J. on Computing, 20(5):865–877, 1991.
[26] Leslie G. Valiant and Vijay V. Vazirani. NP is as easy as detecting unique solutions. Theor. Comput. Sci., 47(3):85–93, 1986.
[27] N. V. Vinodchandran. A note on the circuit complexity of PP. Theor. Comput. Sci., 347(1-2):415–418, 2005.
[28] Emanuele Viola. On approximate majority and probabilistic time. Computational Complexity, 18(3):337–375, 2009.