How to freeze after writing
composition.al 2014-06-07
Last week at POPL, I presented “Freeze After Writing: Quasi-Deterministic Parallel Programming with LVars”. In this paper, my co-authors Aaron Turon, Neel Krishnaswami, Ryan Newton, and I added three new features to the LVars programming model: handlers, quiescence, and freezing. I’ve posted the slides from the talk, and sooner or later there should be a video, too! Meanwhile, here’s some of the same material in written form.
Our paper extends the original LVars programming model of least-upper-bound writes and threshold reads. Although the original LVars model generalized IVars while retaining determinism, it was still quite limited. In particular, there are a lot of things that are hard to do with a threshold read. Suppose we have an LVar representing a container — say, a set. What if we want to iterate over its elements? What if we want to find out if some element is not in the set? And what if we want to be able to react to the appearance of elements we aren't expecting? It's hard to threshold on something unless we know it's going to be written eventually.
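To make threshold reads concrete, here's a minimal sketch in the style of the LVish library's `Data.LVar.PureSet` API (`newEmptySet`, `insert`, `waitElem`); the module layout and exact signatures here are my assumptions and may differ across lvish versions:

```haskell
import Control.LVish       -- Par, Det, runPar, fork (assumed exports)
import Data.LVar.PureSet   -- ISet, newEmptySet, insert, waitElem

-- A threshold read: waitElem blocks until 3 is in the set, so the
-- result is the same no matter how the two inserts are scheduled.
p :: Par Det s Bool
p = do
  s <- newEmptySet
  fork $ insert (3 :: Int) s
  fork $ insert 4 s
  waitElem 3 s   -- returns once the set's state is at or above {3}
  return True

main :: IO ()
main = print (runPar p)
```

Note that `waitElem` can only wait for an element to *appear*; nothing in this interface lets us observe that an element is absent, which is exactly the limitation described above.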
Handlers, quiescence, and freezing extend the LVars model to make it possible to do all those things. By adding them, we push our programming model into the realm of what we call quasi-determinism — a weaker guarantee than determinism. Happily, though, there’s a way to make our way back to determinism from quasi-determinism, and I’ll explain how that works, too.
A challenge: parallel graph traversal
Let’s consider a problem that’s particularly hard to solve using just threshold reads. Suppose we want to do a parallel traversal of a directed graph, starting from a given node and finding all the nodes reachable from it.
Ordinarily, the algorithm for doing this would go something like the following: we mark the start node as seen. Then we look at all its neighbors, perhaps launching a thread for each one, and for each neighbor, we check to see if it’s marked as having been seen. If so, then that thread is done; if not, then we mark it as seen, and then launch a thread for each of its neighbors. This continues until all the reachable nodes have been seen.
As we traverse the graph, the set of nodes that we’ve seen grows monotonically, and we continue adding the neighbors of seen nodes to that set until the set reaches a fixed point. Because this algorithm involves a monotonically growing data structure, it would seem at first glance to be a great match for the LVars model — we could use an LVar to represent the set of seen nodes, with set union as least upper bound. Unfortunately, we can’t express this algorithm using only threshold reads, because there’s no way to check if a node has been seen or not. And, without that test, we won’t know when to stop looking for new nodes.
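For contrast, here's what the sequential version of the traversal looks like in ordinary Haskell, using `Data.Graph` and `Data.Set` (an illustrative sketch, not code from the paper). The `S.member` check is precisely the "has this node been seen?" test that a threshold read can't express, because it observes the *absence* of an element:

```haskell
import Data.Array ((!))
import qualified Data.Graph as G
import qualified Data.Set as S

-- Sequential reachability: stop exploring a node once we observe
-- that it's already in the seen set.
reachable :: G.Graph -> G.Vertex -> S.Set G.Vertex
reachable g start = go S.empty [start]
  where
    go seen []     = seen
    go seen (v:vs)
      | v `S.member` seen = go seen vs            -- no-op: already seen
      | otherwise         = go (S.insert v seen) ((g ! v) ++ vs)
```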
In a graph that contains a cycle, for instance, adding node 11 will lead to an attempt to add its neighbor, node 10, which will lead to an attempt to add node 9, and so on — infinitely, since the chain of neighbors eventually loops back around. But those writes aren't doing anything; they're no-op writes, because the nodes they're trying to add to the set are already there. So, the LVar is done changing, all meaningful writes are over with, and it would be safe to terminate if only we had a way to know that the meaningful writes are over with.
Events and event handlers
The way we accomplish this in our model is with what we call events and event handlers. An event is a write that actually moves an LVar’s state upward in the lattice, as opposed to a no-op write, and an event handler is associated with an LVar and listens for state-changing updates to that LVar — such as a new node arriving in the set of seen nodes. The event handler can run a callback in reaction to an event (and a callback can do writes of its own, which can trigger further events, recursively).
Let’s try writing our graph traversal function using event handlers:
```haskell
traverse :: G.Graph -> Int -> Par QuasiDet s (ISet Frzn Int)
traverse g startNode = do
  seen <- newEmptySet
  h <- newHandler seen
         (\node -> do
             mapM (\v -> insert v seen)
                  (neighbors g node)
             return ())
  insert startNode seen -- Kick things off
  quiesce h
  freeze seen

main = do
  v <- runParIO $ traverse myGraph (0 :: G.Vertex)
  putStr $ show (fromISet v)
```
To traverse a graph `g` starting from a particular node `startNode` and find all the reachable nodes, we first create a new LVar of a set type, called `seen`. Then we attach an event handler to the `seen` set. The `newHandler` function, defined elsewhere, takes the LVar, `seen`, as its first argument. As its second argument, `newHandler` takes the callback that we want to run every time an event arrives — that is, every time a new node is added to the set. The callback takes that newly arrived node `node` as its argument, looks up `node`'s neighbors in the graph by calling a function `neighbors` (also defined elsewhere), and then maps the `insert` function over the list of neighbor nodes that `neighbors` returns.
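A quick aside: since `newHandler` is defined elsewhere, here's one plausible way it could be built on top of lvish's handler pools, assuming `newPool` and `forEachHP` operations with the shapes shown (the actual helper in the examples repo may differ):

```haskell
-- An assumed definition of newHandler, not necessarily the one used
-- in lvar-examples: create a fresh handler pool, register the
-- callback to run on each element that arrives in the set, and
-- return the pool so that callers can later quiesce on it.
newHandler :: Ord a => ISet s a -> (a -> Par d s ()) -> Par d s HandlerPool
newHandler lv callback = do
  hp <- newPool
  forEachHP (Just hp) lv callback
  return hp
```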
Then, we add the starting node to the `seen` set with `insert startNode seen` — and we're off! The event handler does the rest.
How do we know when we're done handling events? The `quiesce` operation blocks until all of the callbacks running in a particular handler pool, in this case `h`, are done running. So, we call `quiesce` on `h`, and that call will block until we've handled all the meaningful writes.
Freezing
Once we’ve quiesced, we still have one problem: how do we look at the exact contents of the LVar that just quiesced? This is where freezing comes in. The freeze
operation is a non-blocking read that lets us find out the exact contents of an LVar. We call freeze seen
on the last line of our traverse
function, and it returns the LVar’s contents.
Importantly, once an LVar has been frozen, its state can't change anymore, and any further writes that would change its state instead raise an exception! It's a good thing, then, that we quiesced before we froze: we know that no more writes to the LVar are going to occur, so it's safe to look at the LVar's exact contents.
But wait! Aren’t explicit synchronization barriers bad?
Since we had to make sure to quiesce before freezing, aren’t we just relying on a synchronization barrier being in the right place? Aren’t we right back to nondeterminism if we forget to quiesce?
Actually, it’s not that bad, because even if we forget to quiesce, or even if we do it in the wrong place, the worst thing that can happen by freezing too early is that we raise a write-after-freeze exception. For a program that performs freezes, we can guarantee that there are only two possible outcomes: either the deterministic result of all the effects, or a write-after-freeze exception. Put another way, all executions that produce a final value produce the same final value, but some executions might raise an exception.
Moreover, if we do hit the write-after-freeze exception case, it always means that there’s a synchronization bug in our program, and in principle the error message can even tell us exactly which write happened after which freeze, so we can more easily debug the race!
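Here's a minimal sketch of such a bug, reusing the operations from the listing above: the forked `insert` races with the `freeze`, so depending on the schedule this computation either returns the frozen set {3, 4} or raises a write-after-freeze exception, but it can never silently return a wrong set:

```haskell
-- Forgetting to quiesce: the forked insert is unordered with respect
-- to the freeze. Quasi-determinism guarantees only two outcomes:
-- the full set {3,4}, or a write-after-freeze exception.
racy :: Par QuasiDet s (ISet Frzn Int)
racy = do
  s <- newEmptySet
  fork $ insert 4 s   -- may run before or after the freeze
  insert 3 s
  freeze s            -- oops: no quiesce first
```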
Making our way back to determinism
We call this property quasi-determinism, and we’ve proved that the LVars model with the addition of handlers, quiescence, and freezing is quasi-deterministic; for more on the quasi-determinism proof, see our paper or the accompanying tech report. But what does quasi-determinism really mean for us as programmers?
The good news is that the particular graph traversal we saw above is deterministic. The bad news is that, in general, freezing introduces quasi-determinism to the programming model, because we might freeze before we’ve quiesced, or because our thread that’s freezing and quiescing might be racing with some other thread to write into a shared LVar.
But, if we could somehow ensure that freezes only ever happen after all writes have completed, then our computations should be deterministic, because then we wouldn't have any write-after-freeze races. As it turns out, there's a way to enforce this at the implementation level. The trick is to let the implementation take care of quiescing and freezing for us at the end of a computation. In our LVish library, the mechanism we provide for doing this is called `runParThenFreeze`.
Freeze after writing
LVar operations can only run inside `Par` computations, and the only way to execute a `Par` computation is with one of the "run" functions that the LVish library provides. For instance, the version of `traverse` above uses `runParIO`. But another one of these "run" functions, `runParThenFreeze`, executes a `Par` computation and then returns a frozen LVar at the end of it, after an implicit global barrier. So, if we have a `Par` computation that side-effects an LVar and we run it with `runParThenFreeze`, then we don't have to manually `freeze` the LVar ourselves, which means we don't have to manually `quiesce`, either! Here's what `traverse` looks like when it's written with `runParThenFreeze`:
```haskell
traverse :: G.Graph -> Int -> Par Det s (ISet s Int)
traverse g startNode = do
  seen <- newEmptySet
  h <- newHandler seen
         (\node -> do
             mapM (\v -> insert v seen)
                  (neighbors g node)
             return ())
  insert startNode seen -- Kick things off
  return seen

main = do
  putStr $ show $ S.toList $ fromISet $
    runParThenFreeze $ traverse myGraph (0 :: G.Vertex)
```
Notice that this second version of `traverse` returns a computation of type `Par Det`, for "deterministic". The "`Det`" is an effect level, meaning that the computation can only perform deterministic operations — in particular, it's not allowed to do a `freeze`, and if it tried to, then that would be a static type error. On the other hand, in our first version of `traverse`, where we call `quiesce` and `freeze` explicitly, `traverse` returns a computation of type `Par QuasiDet`, for "quasi-deterministic". Hence the type of a `Par` computation reflects its determinism-by-construction or quasi-determinism-by-construction guarantee.
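To make that concrete, here's a sketch of the kind of definition the `Det` effect level rules out; it should be rejected by the type checker (the exact error message depends on your GHC and lvish versions):

```haskell
-- Ill-typed on purpose: freeze is a quasi-deterministic operation,
-- so it can't appear in a computation whose type promises Det.
bad :: Par Det s (ISet Frzn Int)
bad = do
  s <- newEmptySet
  insert (3 :: Int) s
  freeze s   -- static type error under the Det effect level
```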
How can the second version of `traverse` be deterministic by construction? Because the `Par` computation never has to do any LVar reads at all, let alone any freezing LVar reads! It merely starts off the computation and then returns the `seen` LVar. The LVar may well still be growing at the time `traverse` returns — but, since `runParThenFreeze` cannot return until all threads launched inside the `Par` computation are done, there's no threat to determinism, and we'll always get the same set of nodes as our answer. More generally, the fact that it's possible to safely return an LVar while it is still being written to by a handler makes it possible to set up dataflow-style, pipeline-parallel networks of LVars, letting us combine task parallelism and pipeline parallelism in the same deterministic program. I'll write more about this style of programming with LVars in a future post.
Try it yourself
In the lvar-examples repository, I've written up a couple of self-contained examples so that you can try this code out yourself. Here's the version with explicit freezing, and here's the one that uses `runParThenFreeze`. There are also many more examples of handlers and freezing in the repo. Enjoy — and please file bugs, since there are certain to be some!