Intro TCS course post-mortem

Windows On Theory 2018-03-12

This fall I taught CS 121 – “Introduction to Theoretical Computer Science” – at Harvard. This is analogous to courses known at other universities as “Introduction to the Theory of Computation”, “Automata and Computability”, or “Great Ideas in Theoretical Computer Science”, and are often taught using Sipser’s excellent book. However, I decided to significantly revise it and write my own text (which is still work in progress but available online with the source on a GitHub repository).

CS 121 is a required course for Computer Science concentrators, and so we had about 160 students. There was a great variability in students preparation and background, many of which have not taken a proof-based course before. That, combined with the inevitable first-time kinks, made the first several weeks challenging for both the students and the teaching team. That said, I am overall very pleased with the students’ performance. In a course that contained fairly advanced material, students overall did quite well in the problem sets, midterm and final exams. I was also very pleased with my team of teaching fellows (headed by an amazing undergraduate student – Juan Perdomo) that had to deal with teaching a new iteration of the course, including many concepts that they themselves weren’t so familiar with.

Perhaps the most significant change I made from the standard presentation is to make non uniform computation (specifically straightline programs / circuits with NAND gates) the initial computational model, rather than automata. I am quite happy with this choice and intend to keep it for the following reasons:

Boolean gates such as NAND have much tighter correspondence to actual hardware and convey the notion that this is not an arbitrary abstract model but intends to capture computation as is physically realizable.
Starting with a model for finite functions allows us to avoid dealing with infinity in the first few lectures.
Much of the conceptual lessons of the course – that we can model computation mathematically, that we can encode an algorithm as a string and give it as input to other algorithms, that there is a universal algorithm, and that some functions are harder that others – can already be explained in the finite non uniform setting.
The non uniform model is crucial for talking about cryptography (e.g., explaining notions such as “128 bits of security” or giving a model where bitcoin proofs of work make sense), pseudorandomness and quantum computing. Cryptography and pseudorandomness are the most compelling examples of “mining hardness” or “making lemonade out of the computational difficulty lemon” which is a core take away concept. Further I believe that it is crucial to talk about quantum computing in a course that aims to model computation as it exists in the world we live in.
A more minor point is that the non uniform model and the notion of “unrolling the loop” to simulate a uniform computation by a non uniform one, makes certain proofs such as Cook-Levin Theorem, Godel’s Incompleteness Theorem, and BPP in P/poly, technically much easier.

So how did the students like it? The overall satisfaction with the course was 3.6 (in a 1-5 scale) which gives me one more reason to be thankful that I’m a tenured professor and not an Uber driver. On the positive side, 60% of the students rated the course as “very good” or “excellent” and 83% rated it as “good”,”very good” or “excellent”.

Here are the student answers to the question “What did you learn from this course? How did it change you?”. As expected, they are a mixed bag. One answer was “I learned about the theory of computation. This course made me realize I do not want to study the theory of computation. ” which I guess means that the course helped this student fulfill the ancient goal of knowing thyself.

In terms of difficulty and workload, 44% of the students found it “difficult” and 23% found it “very difficult” which is a little (but not significantly) more than the average difficulty level for CS classes at Harvard. While the mean amount of hours (outside lectures) spent on this course per week was 11.6 (par for the course in CS classes), you don’t need the sum of squares algorithm to see that the distribution is a mixture model:

cs121workload

I imagine that the students with less math preparation needed to work much more (but perhaps also gained more in terms of their math skills).

Lessons learned for next time:

There is more work to be done on the text, especially to make it more accessible to students not used to reading mathematical definitions and notations. I plan to add more Sipser-style “proof ideas” to the theorems in the text, and add more plain English exposition, especially at the earlier chapters.
Many students got hung up on the details of the computational models, in particular my “Turing machine analog” which was the NAND++ programming language. I need to find a way to strike the right balance between making sure there is a precise and well-defined model, and being able to properly prove theorems about it, and getting the broader point across that the particular details of the model don’t matter.
I find the idea of incorporating programming-languages based models in this course appealing, and have made some use of Jupyter notebooks in this course. I need to spend more thought on how to use these tools in the pedagogically most useful way.