Teaching circuits as the first computational model

Windows On Theory 2022-06-21

This fall, I am once again teaching Harvard’s “Introduction to Theoretical Computer Science” course (CS 121). Like many “intro to TCS / intro to theory of computation” courses, Harvard’s course used to be taught with Sipser’s classic textbook. Sipser’s book is indeed, for better or worse, a classic. It is extremely well-written and students like it very much. It has clear explanations, plenty of examples and solved exercises, and a wealth of material on the web accumulated through decades of it being used in many courses. On the other hand, CS in general and theoretical CS in particular has changed a lot in the 25+ years since the book was written. In fact, the basic approach of starting with finite automata as the initial model of computation dates to the 1969 book of Hopcroft and Ullman.

One of my main goals in revising the theoretical CS course is to give students both rigorous foundations as well as a taste of modern topics. Some of these modern topics:

Cryptography: a topic that combines mathematical beauty, practical importance, and a demonstration that sometimes computational hardness can be a resource rather than a hindrance.
Quantum computing: a topic that shows the interaction between TCS and physics, the fundamental nature of the “Church Turing hypothesis”, and how we can (as in crypto) take a “lemon” (inability of classical computers to simulate certain quantum processes) and use it to make “lemonade” (a computer with stronger power than classical computers).
Randomized computation and derandomization: Randomization is now essential to so many areas of CS, and so it is important to both demonstrate its power, and also how we might use complexity to remove it.
Machine learning and average-case complexity: Traditionally in an intro TCS course the focus is purely on worst-case complexity. This leads to a disconnect with modern applications of CS, and in particular machine learning.

So, I ended up writing my own text – Introduction to Theoretical Computer Science. While at some point I hope to make it into a printed book, it will always be available freely online on https://introtcs.org/. The markdown source for it is available on the repository https://github.com/boazbk/tcs . I’ve benefitted greatly from feedback from both students and readers around the globe: at the time of writing, the project has 330 issues and 385 pull requests.

A central difference between the approach I take and the one of previous courses is that I start from Boolean circuits as the first model of computation. Boolean circuits are crucial to teach the topics above:

Cryptography is much more natural with circuits rather than Turing machines as the model of computation. Statements such as “128 bits of security” make no sense in the asymptotic Turing machine formalism, but can be made precise with circuits.
The standard model for quantum computing is quantum circuits.
Derandomization is best described using the circuit model, and of course many results such as BPP in the Polynomial-Hierarchy are best shown using circuits and the class P/poly as an intermediate concept.
Circuits are a very natural fit for machine learning, and in particular Neural Networks are just a special type of circuit.

Finally, while circuits are often considered an “advanced” topic, they have some advantages over automata as the initial model of computation:

Finite is always easier than infinite: Starting with circuits enables us to start the course talking about arguably the simplest object: finite functions. Writing down the truth table of a finite function, and showing that there is more than one circuit to compute the same function, also helps clarify the difference between specification of a function and its implementation by some algorithm, which is distinction that many students grapple with.
Circuits are connected to actual hardware. An intro to TCS course is not a pure math course – we want to convey to students that are models are motivated by actual computing. Circuits make this connection much closer, and less artificial than automata or even Turing machines.
Can show cool theorems early. If we start with automata, then the first theorems we show can often seem not well motivated to students. It takes some time to build the machinery to show the main theorem – equivalence of automata and regular expressions – and the proof of that theorem is rather technical. In contrast, with circuits we can show three important theorems rather quickly: (1) every finite function $f:\{0,1\}^n \rightarrow \{0,1\}$ can be computed by some circuit of at most $\tilde{O}(2^n)$ size, (2) every circuit of size $s$ can be represented as a labeled graph and hence (using adjacency list) by a string of size $\tilde{O}(s)$ , and (3) using (2) and the fact that there are $2^{2^n}$ functions mapping $\{0,1\}^n$ to $\{0,1\}$ , there exist some function $f:\{0,1\}^n \rightarrow \{0,1\}$ that requires a circuit of $\tilde{\Omega}(2^n)$ gates.

While the course is a theory course, and not about programming, one of my goals in the book and course was to connect it to programming. This is not just to motivate students and make them feel that the material is “practical” but also to better understand the theory itself. Notions such as NP-completeness reductions can be often confusing to students (which is why they get the direction wrong half the time). Implementing a reduction from 3SAT to independent set and seeing the end result make it much more concrete.

3SAT to independent set reduction from the chapter on NP reductions.

One way in which I wanted to use programming is to demonstrate to students how we can take a piece of Python code such as the code for adding two numbers given in their binary representation:

def add(A,B):       """Add two binary numbers, given as lists of bits"""      Y = []      carry = zero(A[0]) # initialize carry to 0      for i in range(len(A)): # compute i-th digit of output            y = xor(A[i],B[i],carry) # xor function            carry = maj(A[i],B[i],carry) # majority function            Y.append(y)      Y.append(carry)      return Y

And obtain the corresponding circuit:

The code also uses the following one-linear helper functions

def maj(a,b,c): return (a & b) | (b&c) | (a&c)def zero(a): return a & ~adef xor2(a,b): return (a & ~b) | (~a & b)def xor(*L): return xor2(*L) if len(L)==2 else xor2(xor(*L[:-1]),L[-1])

If you think about it, the task corresponds to extracting the computational graph of a piece of Python code. This is precisely the same task that auto-differentiation packages such as Pytorch need to do. Hence it can be solved in a similar way. Thus, inspired by Karpathy’s micro-grad package (see my back-propagation tutorial) and using the awesome SchemDraw package, I wrote a short colab notebook that does precisely that.

Specifically, the notebook defines a Bit class that (as its name suggests) stores a single bit. The class defines the logical AND, OR and NOT operations ( &, | , ~ in Python). If a and b are two bits then c = a & b not just contains the value which is the AND of the values of a and b, but also pointers to a and b and remembers how it was computed from them. This allows us to obtain from c a formula/circuit expressing it in terms of a and b.

class Bit:  counter  = 0  def __init__(self,val=0, label="-"):     self.label = label    self.data = val    self.children = []    def op(self,f, label, *others):    inputs = [self.data] + [o.data for o in others]    out = Bit(f(*inputs),label)    out.children = [self] + list(others)    return out  def __and__(self,other): return self.op(lambda a,b: a & b, "\\wedge", other)  def __or__(self,other): return self.op(lambda a,b: a | b, "\\vee", other)  def __invert__(self): return self.op(lambda a: ~a, "\\neg")

Now we can write a simple recursive function formula (see the notebook) to give out the latex of the formula corresponding to how a particular bit was computed. So if we write

from IPython.display import Markdown, display, MathY = xor(Bit(0,"X_0"), Bit(1,"X_1"))Math(formula(Y))

Then we will get

The code of a recursive function that transforms this into the circuit

is only slightly more complicated. The code to draw the addition circuit above is the following:

A = [Bit(0,f"A_{i}") for i in range(2)]B = [Bit(0,f"B_{i}") for i in range(2)]draw_circ(*add(A,B))

See the colab notebook for more.