Course retrospective: SMT Solving and Solver-Aided Systems

composition.al 2020-09-21

Last fall, at the start of my second year at UC Santa Cruz, I taught a graduate seminar on SMT solving and solver-aided systems!

I was lucky I had the chance to offer this course at all. I had originally been scheduled to teach something more typical and boring, but in spring 2019, as my first year at UCSC was ending, my department chair asked me if I wanted to instead teach a special topics grad seminar in the fall, on anything I wanted. Hell yeah! Sign me up! I couldn’t believe my good fortune, since I’d already gotten to teach one such course in my first year. The topic I’d chosen that first time, Languages and Abstractions for Distributed Programming, had been really fun, but pretty squarely in my comfort zone as a researcher. For this one, I decided I would venture outside of my comfort zone and learn something new.

Choosing a topic

It wasn’t too hard to decide what I wanted to learn. For years, ever since my time working with the Reluplex team at Stanford while I was at Intel Labs, I’d wanted to know more about how the guts of SMT solvers worked. In my field, programming languages (PL), lots of people use solvers as black-box tools without being too concerned about their implementation. But to build things like, for instance, Reluplex, which is a domain-specific SMT solver for verifying safety properties of neural networks, deeper knowledge of solver internals is required.

I was excited by the idea of domain-specific SMT solvers. Just as with domain-specific compilers, a domain-specific solver could exploit domain knowledge for efficiency (and elegance!). But, while I had some kind of a clue about how to implement a compiler, I had very little of a clue about how to implement an SMT solver. This was rather embarrassing, considering I had just published a paper arguing that people ought to be building domain-specific SMT solvers. Well, time to learn. I would force myself to learn about solver internals by signing myself up to teach about them!

I decided that the first three weeks of my course would have a lecture format and focus on solver internals. After that, it would switch to a seminar format in which students would read and present papers on solver-aided languages (like Dafny, Liquid Haskell, and Rosette), solver-aided systems (like KLEE), and domain-specific solvers (like Reluplex). The papers I chose came from a wide variety of venues — verification, testing, security, operating systems, and several distinct flavors of PL — reflecting my own expanding interests, or at least reflecting my attempt to keep my own PL myopia in check.

Six students signed up to take my class for credit (a number that, to my amazement, grew to eight over the first few weeks), and a couple of others audited. Most people in the room were grad students, but a couple of them were undergrads, and there were even a couple of professors who sat in from time to time (!).

The first three weeks: SMT solver innards

Once I knew that I was going to spend three weeks lecturing about solvers, I went to look for courses at other institutions that seemed to have a good-sized overlap with what I wanted to teach. Two that stood out were Işıl Dillig’s Automated Logical Reasoning at UT Austin, and Emina Torlak’s Computer-Aided Reasoning for Software at UW. Both were influential in how I designed this part of my course. (If you like slides, I strongly recommend both of their lecture slides!)

I saw that some folks I trusted had recommended Kroening and Strichman’s Decision Procedures: An Algorithmic Point of View as a textbook, so I began reading it and making lecture notes based on the first few chapters. I also decided to go with Marijn Heule and Oliver Kullmann’s very nice 2017 CACM article “The Science of Brute Force” as the first reading assignment in the course, and I made the first lecture entirely based on it.

I’ve been meaning to clean up these lecture notes before sharing them — make them less embarrassingly loquacious, maybe add some proper diagrams instead of the ASCII art — but let’s face it, that’s never gonna happen. So, here they are, in 17-point font because the use case is me-squinting-at-my-laptop-from-two-meters-away-while-writing-furiously-on-the-board. The PDFs are generated from Markdown, using Pandoc; if you want the original Markdown documents (for instance, if you want to regenerate the PDFs with a smaller font size), here you go.

Here’s what the board looked like at the end of the last day of class!

tag urself I'm the "what the spec allows"/"what the compiler does" diagram pic.twitter.com/OPIp72j4fx

— Lindsey Kuper (@lindsey) October 21, 2019

If I had wanted to keep lecturing after three weeks, we could’ve continued marching through Kroening and Strichman and talked about a bunch more theories and decision procedures for them, but at this point, we knew enough to be able to bring a basic understanding of how solvers worked to the next part of the course. I had planned to do one more lecture, about the Nelson-Oppen method of combining decision procedures, but I ended up dropping it because it wasn’t actually essential background for the papers we were reading, and because two additional students joining the class meant I had to rearrange the schedule to make room for one more day of student project presentations at the end.

The rest of the course: solver-aided systems

Since the whole point of teaching a seminar is to sit down, shut up, and give students a chance to talk instead (and then offer feedback on what they say), the rest of the course consisted of paper presentations by students, mostly on solver-aided systems and languages. We read 18 papers, 16 of which were presented by students (with each student presenting two papers) and two of which were presented by guest speakers.

Each student had to turn in a short response for each of the readings they didn’t present. Unlike with my previous seminar, I didn’t make the reading responses public on GitHub. Instead, we used a non-public forum that was part of our course management system, set up so that students had to turn in a response themselves before being able to see other people’s. I tried to leave comments in response to everyone’s responses, every time, but this was a lot of work, and it didn’t do as much to foster discussion as I had hoped it might. In the future, I might just leave one general top-level response-to-the-responses for each reading.

Here’s what we read:

  • We started with a bunch of work on symbolic execution for finding bugs: the EXE bug-finding tool and its co-designed solver, STP, then HAMPI, an STP-based solver for string constraints, and then EXE’s famous successor, KLEE. I had been aware of some of these tools, but hadn’t actually read the papers about them until this course. I’m really glad I did! It was an intentional choice to put these four papers first, because I wanted to give people a break from the more theoretical material from the lectures and show them that solver-aided systems were immediately useful for fuzzing the software that they used every day (like the GNU coreutils!).
  • We read the Rosette paper, and then we had two guest speakers to talk about their work that used or built on Rosette: Mangpo Phothilimthana visited to talk about her Rosette-based work on synthesizing efficient GPU kernels, and James Bornholt visited to talk about his symbolic profiler for Rosette. Rosette is a domain-specific language embedded in Racket, intended for building solver-aided languages.1 We also read about Smten, which could be thought of as Rosette’s Haskell-based cousin. I think Smten is really cool, although it hasn’t taken off as a platform for solver-aided programming in the way that Rosette has (though G2Q is related follow-on work).
  • There are other ways of augmenting a programming language with an SMT solver, aside from “use the solver to implement a run-time symbolic execution strategy”! We read about liquid types, an SMT-powered decidable refinement type system, and its realization in Liquid Haskell, which required reckoning with laziness; these tools use the solver entirely at compile time, in contrast to Rosette or Smten (there’s a small sketch of what such a compile-time solver query can look like just after this list). We read about Dafny, a verification language that compiles both to executable code and (by way of Boogie) to verification conditions intended to be used as input to a solver. And we read about Kaplan, an extension to Scala that enables SMT-powered constraint programming and that can make use of the solver both at run time and at compile time.
  • We read two papers on custom solver theories: the aforementioned Reluplex paper, which I’ve written about before, and “SMT solving for the theory of ordering constraints” by Cunjing Ge et al. In this second paper, the authors observed that Z3’s integer difference logic, or IDL, was a more expressive theory than they needed for their domain of SMT-powered data race detection. A less expressive theory not only did what they needed, but also had an asymptotically more efficient decision procedure. I like this paper because, like Reluplex, it’s about a situation where a custom solver theory was called for, but unlike Reluplex, it’s not about adding stuff to an existing theory, but rather, taking stuff away.
  • Finally, we read about solver-powered verification tools for a variety of domains: web page layout with Cassius, OS system calls with Hyperkernel, and my personal favorite, distributed data consistency, with Indigo and Quelea. (The Quelea paper was actually the only paper in the intersection of this course and my previous special topics course. I might really like the Quelea paper, y’all.)
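As a concrete (if tiny) illustration of the “solver entirely at compile time” idea from the liquid types bullet above, here’s a rough Python sketch, using the Z3 bindings, of the kind of verification condition a refinement type checker might discharge. To be clear, this isn’t Liquid Haskell’s actual encoding — it’s just the general shape of the query: to check that `x - 1` is nonnegative whenever `x > 0`, ask the solver whether the negation of that implication is satisfiable.

```python
# A minimal sketch (not Liquid Haskell's actual encoding) of the kind of
# verification condition a refinement type checker discharges at compile time.
# Suppose we want to check that `x - 1` has type {v : Int | v >= 0} under the
# assumption that x has type {v : Int | v > 0}. That obligation is the
# implication (x > 0) ==> (x - 1 >= 0), which is valid iff its negation is
# unsatisfiable. Requires the Z3 Python bindings (the `z3-solver` package).
from z3 import Int, Implies, Not, Solver, unsat

x = Int("x")
obligation = Implies(x > 0, x - 1 >= 0)

s = Solver()
s.add(Not(obligation))          # ask the solver for a counterexample
if s.check() == unsat:
    print("verified: no counterexample exists")
else:
    print("refinement type error; counterexample:", s.model())
```

If the negated obligation is unsatisfiable, the implication holds for every integer x and the “type check” succeeds; otherwise, the model the solver returns is a counterexample to report to the programmer.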

Course projects

I asked every student to do an independent project as part of the course. This was different from my previous grad seminar, where, in lieu of traditional course projects, I had asked students to contribute to a public course blog. The blog had been a success, but largely because I poured an immense amount of effort into editing it, and I didn’t think I would be able to handle that level of effort again, especially not now that I was teaching a topic that was further from my area of expertise. So we stuck with the typical course project approach. I’ve seen many an over-ambitious course project crash and burn, so I told students to aim low and spend no more than 40 hours on their project over the course of the quarter.

At the halfway point of the course, I asked students to turn in a two-page project proposal, including an overview, background and related work, a summary of the approach, the expected outcomes and deliverables of the project, and a timeline. None of this was legally binding, and some projects changed drastically between this point and the end of the course, which is to be expected.

For the final project report, I asked for 8-12 pages, including an overview, background and related work, approach, and outcomes. For projects that involved developing some sort of software artifact, which was most of them, I asked the students to turn that in, too. Finally, each student gave a short in-class presentation about their project during the last week of class.

The students carried out a nice variety of projects, reflecting the diversity of their interests:

  • A liquid type checker implemented using Rosette.
  • A solver for the classic task scheduling problem of “which courses should I take to get a degree, and when?”, also implemented using Rosette.
  • A large-scale graph data analysis, carried out with the help of the Z3 Python API.
  • Spelunking the source code of an open-source-but-utterly-undocumented solver-aided system, Indigo, getting it running, and commenting it extensively.
  • A comparison of two solver-aided tools for checking the equivalence of SQL queries, Cosette and EQUITAS, and an experimental evaluation of Cosette’s capabilities.
  • A survey of nonlinear arithmetic solvers and their applications to hybrid systems.
  • A survey of applications of SAT and SMT solving to software security. (Two students independently decided to do this.)

Some students struggled with carrying out an independent project, while others were almost too independent: their projects ended up being only loosely connected to the material we had actually studied. I felt that the most successful projects were those that took something we had looked at in the course as a jumping-off point and then explored something new. Personally, I thought the Rosette-based projects stood out in this regard, which I think says something about the power of Rosette as a platform for creating solver-aided tools!

Overall reflections

I was happy with my choice of Kroening and Strichman as a textbook. That said, when I got to conflict-driven clause learning (CDCL), the algorithm at the guts of modern SAT (and hence SMT) solvers, I went looking around for alternative reading material. Chapter 4 of the Handbook of Satisfiability is a standard reference that I found useful, and the aforementioned slides from Işıl and Emina’s courses were helpful, too.

In the end, though, I decided to teach CDCL in a slightly different way from the sources I’d seen. I first presented a solver that only does Boolean constraint propagation (BCP), and hence can only solve an “easy” subset of SAT problems. In particular, it cannot solve problems that require making guesses, and then potentially backtracking if those guesses are wrong. Then I extended that solver to a CDCL solver in the next lecture, with support for guesses and backtracking. We talked about how the solver did clause learning, but we glossed over the details of conflict analysis. It helped that the Heule and Kullmann article that we started the course with had already discussed clause learning at a high level, too, so we weren’t going into it cold.
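If it helps to see roughly what I mean, here’s a small Python sketch of a BCP-only solver — my own illustration, not the presentation from the lecture notes. It represents CNF formulas DIMACS-style, as lists of clauses whose literals are nonzero integers, and repeatedly propagates unit clauses until it either satisfies everything, derives an empty clause, or gets stuck and would have to guess.

```python
# A minimal sketch of a BCP-only "solver": formulas are in CNF, written
# DIMACS-style as lists of clauses, each clause a list of nonzero ints
# (3 means x3, -3 means "not x3"). It only handles formulas that unit
# propagation alone can decide; anything requiring a guess is reported as such.
def bcp_only(clauses):
    assignment = {}                           # var -> True/False
    clauses = [list(c) for c in clauses]
    while any(len(c) == 1 for c in clauses):
        lit = next(c[0] for c in clauses if len(c) == 1)   # pick a unit clause
        assignment[abs(lit)] = (lit > 0)
        new_clauses = []
        for c in clauses:
            if lit in c:
                continue                      # clause satisfied; drop it
            c = [l for l in c if l != -lit]   # falsified literal; remove it
            if not c:
                return "UNSAT", None          # empty clause: conflict
            new_clauses.append(c)
        clauses = new_clauses
    if not clauses:
        return "SAT", assignment
    return "STUCK (would need to guess)", assignment

# (x1) and (not x1 or x2) and (not x2 or x3): pure propagation solves it.
print(bcp_only([[1], [-1, 2], [-2, 3]]))
```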

I only briefly mentioned CDCL’s predecessor, DPLL, and I never showed a complete DPLL solver. DPLL is simpler than CDCL: it also uses BCP, but it doesn’t do clause learning (that is, adding “conflict clauses” to the formula you’re trying to solve). Clause learning is conceptually not so hard (learn from your past bad decisions, so you don’t make them again!), but the details are tricky. So, in case students thought it would be helpful to learn about DPLL before moving on to CDCL, I also pointed them to this “SAT Solver Primer” by David Mitchell, recommended to me by Chung-chieh Shan, that takes that approach. In fact, Mitchell starts with something that’s even simpler than DPLL: a brute-force algorithm that doesn’t do any BCP at all. Instead, it checks every possible assignment to see if it is a satisfying assignment. This approach can solve all SAT problems if you’re willing to wait long enough, but is intractably slow. He then moves on to DPLL, then CDCL.
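For comparison, here’s a sketch of that brute-force starting point — again my own illustration of the idea, not Mitchell’s code — using the same CNF representation as the BCP sketch above: enumerate every assignment and check each one. It’s obviously correct and obviously exponential, which is exactly what makes it a nice baseline before introducing BCP, DPLL, and CDCL.

```python
from itertools import product

# A sketch of the brute-force approach: enumerate every assignment to the
# variables and check each one against the CNF formula (same DIMACS-style
# representation as above). Correct for every formula, but exponential in
# the number of variables.
def brute_force(clauses):
    variables = sorted({abs(l) for c in clauses for l in c})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        # a literal l is true iff its variable's value matches l's sign
        if all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses):
            return "SAT", assignment
    return "UNSAT", None

# (x1 or x2) and (not x1 or x2) and (x1 or not x2) and (not x1 or not x2)
# rules out all four assignments, so this prints ("UNSAT", None).
print(brute_force([[1, 2], [-1, 2], [1, -2], [-1, -2]]))
```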

What I did in my lectures was kind of the opposite. I asked students to first consider an algorithm that only does BCP. But that algorithm — which I called the “elementary school” solver — only works on formulas that can be solved without guessing, which to me are the ones that are reminiscent of elementary-school logic puzzles. Starting with BCP made sense to me, because BCP seems more like what humans do intuitively when we’re, say, solving sudoku puzzles. I later gave a talk called “Reasoning Under Uncertainty in SMT Solving, Research, and Life” at PLMW @ POPL 2020 that takes this same pedagogical tack.

As for the rest of the course, the reading list was kind of all over the map, and we only had time for a superficial discussion of many of the readings. My hope is that, even if students didn’t take in all the gory details of the individual papers, they began to map out the connections between them. I can say that for me personally, the most exciting new connection I made was identifying two ways of thinking about symbolic execution, which I’ll call “the KLEE way” and “the Rosette way”. KLEE is “an operating system for symbolic processes” and a symbolic interpreter for LLVM instructions, and Rosette is a “symbolic virtual machine”. Those things sure sound like they ought to be related — and they are! But KLEE uses symbolic execution as a tool for finding bugs in programs that are normally executed concretely, whereas in Rosette-land, symbolic execution is the normal, default execution strategy for programs. I think there’s a shift in mindset that takes place when one starts writing programs that are actually supposed to be executed symbolically all the time, not just for debugging. Once you make that shift, it necessitates all kinds of helpful programming tools, like James’s symbolic profiler, or a symbolic type system! I’m sure there’s a lot more interesting work to do in this vein.

Symbolic execution, by the way, is an old idea that long predates the rise of SMT solvers. This was also true of some of the other applications for SMT solving that we explored in the course — refinement types, for instance. SMT solving is merely an implementation strategy for these ideas, and not necessarily the only implementation strategy. But once you are using SMT as an implementation strategy, it seems to open up really interesting new possibilities, and I think that makes SMT worth studying purely for its own sake, too!

  1. Once, years ago, I asked Ras Bodik if one should think of Rosette as a solver-aided DSL for building solver-aided DSLs. He said, “Yes, but we don’t describe it that way, because people’s heads would explode.”