The continuation monad: a mathematical introduction

Wildon's Weblog 2025-08-18

In the Haskell category, objects are types, and morphisms are Haskell functions. The continuation monad for a fixed return type r is the functor m sending a type a to the type m a of functions (a -> r) -> r. The purpose of this post is to use vector space duality to motivate the definitions of the monadic operations return :: a -> m a and join :: m m a -> m a. We show that return, the unit in the monad, is the analogue of the canonical inclusion of a vector space in its double dual, and that monad multiplication is the analogue of the canonical surjection V^{\star\star\star\star} \twoheadrightarrow V^{\star\star} induced by applying the unit on V^\star to get a canonical inclusion V^\star \hookrightarrow V^{\star\star\star}. Having constructed the continuation monad, and proved that it is a monad, we then use the Haskell identity

 ma >>= f = join (fmap f ma)

to define the monadic bind operation. (Here fmap :: (a -> c) -> m a -> m c is what the continuation monad does on morphisms; when applied to f :: a -> m b we get a function fmap f :: m a -> m m b, and we compose with join :: m m b -> m b to remove the outermost monad layer.) By unpacking this definition into one that is self-contained, and then staring hard, we interpret monadic bind, denoted >>=, as a way to sequence computations that result, in the end, in a value of type r. Thus the order in this post is the opposite of most (all?) of the many good introductions to the continuation monad that can be found on the web: I hope this will make this post of interest to mathematicians, and also perhaps to expert computer scientists as an illustration of how a non-expert mathematician came (at last) to some understanding of the ideas. We end with a few examples of Haskell programs written in the continuation monad, or equivalently, in continuation passing style.

A Haskell module with all the Haskell below and some extras is available from my website.

The double dual monad

Fix a field \mathbb{F} and let Vect be the category of \mathbb{F}-vector spaces. There is a canonical inclusion V \hookrightarrow V^{\star\star} defined by ‘v maps to evaluate at v‘, or in symbols

v \stackrel{\eta_V}{\longmapsto} [\theta \mapsto \theta(v)]\qquad(\star)

for v \in V and \theta \in V^\star. Thus (\star) defines a natural transformation \eta from the identity functor on Vect to the double dual functor T. This will turn out to be the unit in the double dual monad.

Remark. In general V^{\star\star} is a vastly bigger vector space than V: for instance if V is the vector space of all \mathbb{F}-valued sequences with finite support, then V^{\star} is the vector space of all \mathbb{F}-valued sequences. Thus V has a canonical countable basis, whereas V^\star has dimension equal to the continuum \mathfrak{c}. Moreover V^{\star\star} has dimension 2^\mathfrak{c}. (Even the existence of bases for V^{\star} and V^{\star\star} depends on set theoretic foundations: when \mathbb{F} = \mathbb{F}_2 it is implied by the Boolean Prime Ideal Theorem, a weak form of the Axiom of Choice.) For this reason we have to reason formally, mainly by manipulating symbols, rather than use matrices or bases. But this is a good thing: for instance it means our formulae for the monad unit and monad multiplication translate entirely routinely to define the continuation monad in the category Hask.

The unit tells us how to multiply

To make the double dual functor a monad, we require multiplication, that is, a natural transformation \mu : T^2 \rightarrow T. Thus for each vector space V, we require a ‘naturally defined’ linear map \mu_V : V^{\star\star\star\star} \rightarrow V^{\star\star}. As already said, we obtain this as the dual map to \eta_{V^\star} : V^\star \rightarrow V^{\star\star\star}. That is

\mu_V = \eta_{V^\star}^\star.

This needs some unpacking. Explicitly, \eta_{V^\star} is the linear map

\theta \stackrel{\eta_{V^\star}}{\longmapsto} [\phi \mapsto \phi(\theta)]

for \theta \in V^{\star} and \phi \in V^{\star\star}. Generally, given a linear map f : U \rightarrow W, the dual map is given by `restriction along f‘: that is, we regard a dual vector \theta \in W^\star as an element of U^\star by mapping U into W by f, and then applying \theta; in symbols f^\star (\theta) = \theta \circ f. Therefore the dual map to \eta_{V^\star}, which is the V component of the natural transformation \mu, is

\begin{aligned} w &\stackrel{\mu_V}{\longmapsto} [\theta \mapsto w(\eta_{V^\star}(\theta))] \\ &= [\theta \mapsto w([\phi \mapsto \phi(\theta)])]  \quad (\dagger) \end{aligned}

If you have an intuitive feeling for what this means, well, great. I hope in a later post to show that \mu has an intuitive interpretation in the double power set monad interpreted using propositional logic. (Subsets of a set X are in bijection with functions X \rightarrow \{0,1\}, so this is the case \mathbb{F} = \mathbb{F}_2.) Here I will note that the definition of \mu_V in (\dagger) does at least typecheck: \phi \in V^{\star\star} so the map \phi \mapsto \phi(\theta) is an element of V^{\star\star\star} and therefore in the domain of w : V^{\star\star\star} \rightarrow \mathbb{F}. Naturality of \mu follows very neatly from the naturality of \eta by duality: see the commutative diagrams below.

But why is it a monad?

What we haven’t shown is that the triple (T, \eta, \mu) defines a monad. For this we need to check the monad laws shown in the commutative diagrams below.

The first is the identity law that \mu_V \circ \eta_{V^{\star\star}} = \mathrm{id}_{V^{\star\star}}. This has a short proof, which, as expected from the remark about formalism above, comes down to juggling the order of symbols until everything works out. Recall that \eta_{U} is the canonical embedding of U into U^{\star\star}. Thus if we take \phi \in V^{\star\star} then \eta_{V^{\star\star}}(\phi) is the element of V^{\star\star\star\star} defined by

\Psi \mapsto \mathrm{ev}_\phi(\Psi) = \Psi(\phi).

Moreover, \mu_V is defined by regarding elements of V^{\star\star\star\star} as elements of V^{\star\star} by restriction along the canonical inclusion \eta_{V^\star} : V^{\star} \rightarrow V^{\star\star\star}. This map sends \theta \in V^{\star} to the linear map [\phi \mapsto \phi(\theta)] in V^{\star\star\star}. Therefore

\begin{aligned} (\mu_V \circ \eta_{V^{\star\star}})(\phi) &= \mu_V([\Psi \mapsto \Psi(\phi)]) \\&= [\theta \mapsto (\eta_{V^\star}(\theta))(\phi)] \\ &= [\theta \mapsto \phi(\theta)] \\&= \phi \end{aligned}

and so \mu_V \circ \eta_{V^{\star\star}} is indeed the identity of V^{\star\star}.

A proof in this style of the second, associativity, law seems distinctly fiddly, not least because there is no ‘super case’ Greek alphabet to represent elements of V^{\star\star\star\star\star}. I plan a sequel to this post in which we will show that T(V) = V^{\star\star} is the monad induced by the adjunction between the contravariant dual and itself; the identity monad law then becomes the triangle law (D\epsilon) \circ (\eta D) = \mathrm{id}_D for the right-hand functor D : \mathbf{Vect} \rightarrow \mathbf{Vect}^\mathrm{op} in the adjunction, and the associativity law becomes an immediate application of the interchange law. If time permits I’ll include the proof by string diagrams: to my surprise I can’t immediately find this on the web, except in this nice video from the Catsters: see the blackboard at the 9 minute mark.

But how could it not be a monad?

As a final remark, for either law one could argue ‘how else could it possibly work?’ Thus in the first, we know that (\mu_{V} \circ \eta_{V^{\star\star}})(\phi) is some kind of linear map V^{\star} \rightarrow \mathbb{F}, and since all we have to hand is \phi \in V^{\star\star}, maybe it should be obvious from naturality that the map is \phi itself? It would be intriguing if this could be made into a rigorous argument either working in Vect or directly in Hask, perhaps using ideas from Wadler’s paper Theorems for free.

The continuation monad in Haskell

We now translate the definition of the unit and multiplication from the double dual monad into Haskell. This is completely routine. For the unit, we translate \eta_V :V \rightarrow V^{\star\star}, as defined in (\star) by \eta_V(v) = \mathrm{ev}_v to

return x = \k -> k x -- return :: a -> ((a -> r) -> r)

It might be more idiomatic to write this as return = flip ($). Either way we get return :: a -> ((a -> r) -> r).
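As a quick sanity check that the two spellings agree (ret and ret' are my own names for the unwrapped unit, to avoid clashing with the Prelude's return):

```haskell
-- unwrapped continuation-monad unit: 'x maps to evaluate at x'
ret :: a -> (a -> r) -> r
ret x = \k -> k x

-- the pointfree spelling mentioned above
ret' :: a -> (a -> r) -> r
ret' = flip ($)

main :: IO ()
main = do
  print (ret  (3 :: Int) show)  -- "3"
  print (ret' (3 :: Int) show)  -- "3"
```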

You can join if you can return

For multiplication, we translate \mu_V : V^{\star\star\star\star} \rightarrow V^{\star\star}, as defined in (\dagger) by \mu_V(w) = [\theta \mapsto w(\eta_{V^\star}(\theta))] to

join :: ((((a -> r) -> r) -> r) -> r) -> ((a -> r) -> r)
join w = \k -> w (return k)

Note that this uses return, capturing the sense seen above in which the unit for the double dual monad tells us how to define multiplication. Unpacked to be self-contained join becomes

join w = \k -> w (\φ -> φ k)

which thanks to Haskell’s support for Unicode is still valid code. It is a useful exercise to prove the identity monad law by equation-style reasoning from these definitions.
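As a small start on that exercise, here is a machine check of the identity law on a sample value (a sketch; ret and joinC are my names for the unwrapped return and join):

```haskell
type M r a = (a -> r) -> r

ret :: a -> M r a
ret x = \k -> k x

joinC :: M r (M r a) -> M r a
joinC w = \k -> w (\phi -> phi k)

-- Identity law: joinC (ret m) k reduces to m k for every continuation k.
main :: IO ()
main = do
  let m = ret (5 :: Int) :: M Int Int
  print (joinC (ret m) (+ 1))  -- 6
  print (m (+ 1))              -- 6
```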

You can bind if you can join

We now use the Haskell identity ma >>= f = join (fmap f ma) mentioned in the introduction to define bind. First of course we must define fmap. For the double dual functor with f : V \rightarrow W, we have f^\star (\theta) = \theta \circ f and so f^{\star\star} (\phi) = \phi \circ f^\star or unpacked fully,

f^{\star\star}\phi = [\theta \mapsto \phi(f^\star (\theta))] = [\theta \mapsto \phi(\theta \circ f)].

We therefore define

fmap :: (a -> b) -> ((a -> r) -> r) -> (b -> r) -> r
fmap f run_a = \k_b -> run_a (k_b . f)

and the compiler will now accept

bind run_a a_to_run_b = join $ fmap a_to_run_b run_a

(For why run_a is a reasonable name for values of the monadic type (a -> r) -> r see the final part of this post.) It is now a mechanical exercise to substitute the definitions fmap f run_a = \k_b -> run_a (k_b . f) and then join w = \k -> w (\φ -> φ k) to get a self-contained definition, which we tidy up in three further stages.

bind1 run_a a_to_run_b = join $ \k_b -> run_a (k_b . a_to_run_b)
  where join w = \k -> w (\φ -> φ k)
bind2 run_a a_to_run_b = \k -> (\k_b -> run_a (k_b . a_to_run_b)) (\φ -> φ k)
bind3 run_a a_to_run_b = \k -> run_a ((\φ -> φ k) . a_to_run_b)
bind4 run_a a_to_run_b = \k -> run_a (\x -> (a_to_run_b x) k)

In the final line, the parentheses around a_to_run_b x could be dropped. Incidentally, with the DeriveFunctor extension, ghc can derive the declaration for fmap just seen.
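Before wrapping things in a data type, we can spot-check bind4 on the kind of sequenced computation it is meant to support (a sketch; ret is my name for the unwrapped return, and the pipeline computes (3 + 1) * 2):

```haskell
ret :: a -> (a -> r) -> r
ret x = \k -> k x

bind4 :: ((a -> r) -> r) -> (a -> (b -> r) -> r) -> (b -> r) -> r
bind4 run_a a_to_run_b = \k -> run_a (\x -> a_to_run_b x k)

-- 'add one' then 'times two', sequenced with bind4
pipeline :: (Int -> r) -> r
pipeline = bind4 (bind4 (ret 3) (\x -> ret (x + 1))) (\y -> ret (y * 2))

main :: IO ()
main = print (pipeline id)  -- (3 + 1) * 2 = 8
```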

But don’t monads need an algebraic data type?

Answer: yes if we want to take advantage of Haskell’s do notation, and be consistent with the types Haskell expects for return, fmap, join and >>=. Below we define an algebraic data type RunCont with inner parameter r for the return type and outer parameter a, so that the ‘curried type’ RunCont r has kind * -> *, as a monad should. The constructor is called RC (it is also common to reuse the type name as the constructor) and the ‘wrapped value’ can be extracted using runCont :: RunCont r a -> (a -> r) -> r.

newtype RunCont r a = RC {runCont :: (a -> r) -> r} 

For instance if a_to_run_b :: a -> RunCont r b then runCont (a_to_run_b x) has type (b -> r) -> r so is the correct replacement for a_to_run_b in bind4. Wrapping return is even more routine, and we end up with

instance Monad (RunCont r) where
  return x = RC (\k -> k x)
  RC run_a >>= a_to_run_b = RC (\k -> run_a (\x -> runCont (a_to_run_b x) k))

It is a nice fact that bind determines both fmap and join, so, after the unpacking above to avoid using fmap in the definition of bind, everything we need is in the declaration above. That said, versions of GHC after 7.10 require separate instance declarations for Functor, which to illustrate this point we define by

instance Functor (RunCont r) where
    fmap f (RC run_a) = (RC run_a) >>= (return . f)

and for Applicative, which can also be derived from bind; but since I find the result completely unintuitive, we’ll instead use the usual boilerplate

instance Applicative (RunCont r) where
    pure  = return
    (<*>) = ap      -- ap :: Monad m => m (a -> b) -> m a -> m b

recovering these from the Monad declaration. Incidentally, the functor declaration also has a ‘boilerplate’ definition: fmap = liftM is ubiquitous. I can’t easily find on the web why the ghc authors made this (deliberately code-breaking) change (the page on the Haskell wiki has a dead link for a 2011 proposal along these lines that was rejected), but surely the rise of programming in the applicative style has something to do with it.

Why newtype?

I used newtype rather than data above to make RC strict in its value. Counterintuitively (to me at least) this means that overall the RunCont monad is lazier, because the compiler does not need to inspect the argument to RC to check it has type (a -> r) -> r. Ever since I implemented the Random state monad with data rather than newtype, and many weeks later found a puzzling instance of what seemed like Haskell being over-eager to evaluate, I have been able to remember this distinction. An alternative to using newtype, which emphasises this point, is data RunCont r a = RC {runCont :: !((a -> r) -> r)}. Note the explicit strictness declaration. It may be that newtype has a further benefit over data, since the strictness and unique constructor together mean that the compiler can implement (or erase?) the type with zero computational overhead.

Continuation passing style is the computational model of the continuation monad

Not before time we mention that the type a -> r is that of continuations: functions that take as input a value of type a and return a value of r, the desired final output type of the entire computation.

Callbacks

Such functions arise naturally as callbacks: ‘function holes’ left in anticipation that another author or library will be able to fill in the gap. Consider for instance the following program intended to check the Fibonacci identity \sum_{k=0}^n F_k = F_{n+2} - 1.

fibBothSides n = (sum [fib k | k <- [0..n]], fib (n+2) - 1)

Of course this won’t compile, because there is no fib function to hand. Rather than take time off to write one, we instead press on, by bringing fib into scope in the simplest possible way

run_fibBothSides n fib = (sum [fib k | k <- [0..n]], fib (n+2) - 1)
-- run_fibBothSides :: Int -> (Int -> Integer) -> (Integer, Integer)

Here I’ve supposed fib has type Int -> Integer; this is reasonable because the answer might be more than 2^{64} but surely the input won’t be, though the true reason is to make the types a (here Int) and r (here Integer) different. Thus run_fibBothSides is a way of running the continuation fib to give the answer we want, and \n -> RC (run_fibBothSides n) is (pair result type aside) a value of type Int -> RunCont Integer Int, so suitable for use in our monad, most obviously as the second argument to bind >>=. For a minimal example, with exactly the same structure, take

run_fib n fib = fib n            -- run_fib   :: Int -> (Int -> Integer) -> Integer
run_fib' n = \fib -> fib n       -- run_fib'  :: Int -> (Int -> Integer) -> Integer
run_fib'' n = RC (\fib -> fib n) -- run_fib'' :: Int -> RunCont Integer Int

where the final function could have been written simply as return n; as we have seen many times, the unit in the continuation monad is ‘x maps to evaluate at x‘. In this form we see clearly that run_fib doesn’t really care that the intended continuation is the Fibonacci function; it could be any value of type Int -> Integer. I hope this explains why I’ve used RunCont as the name for the monadic type: its wrapped values are continuation runners of type (a -> r) -> r, and not continuations, of type a->r. To my mind, using Cont is as bad as conflating elements of V^{\star} (continuations) with elements of V^{\star\star} (continuation runners).
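To see a continuation runner in action, here is a self-contained sketch (naiveFib is my own stand-in for fib, which the post leaves unwritten):

```haskell
newtype RunCont r a = RC { runCont :: (a -> r) -> r }

run_fib'' :: Int -> RunCont Integer Int
run_fib'' n = RC (\fib -> fib n)

-- a stand-in Fibonacci, with F_0 = 0 and F_1 = 1
naiveFib :: Int -> Integer
naiveFib n = fibs !! n
  where fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

main :: IO ()
main = do
  print (runCont (run_fib'' 10) naiveFib)                -- F_10 = 55
  print (runCont (run_fib'' 10) (fromIntegral . (* 2)))  -- 20: any Int -> Integer will do
```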

Continuation passing style

Consider the following very simple Haskell functions addOne x = x+1 and timesTwo y = y * 2 of intended type Int -> Int. We can compose them to get timesTwoAfterAddOne = timesTwo . addOne. This is too easy. Instead we imagine that addOne has to control the entire future evaluation, and to help it out, we’ll give a suitably extended version of it the continuation timesTwo as the second argument:

addOneCPS x timesTwo = timesTwo (x + 1)

and so timesTwoAfterAddOne x = addOneCPS x timesTwo. Of course this generalizes immediately to any continuation of type Int -> r; below I’ve renamed timesTwo to k (as in the derivation of bind above), but the function and type are the same:

addOneCPS :: Int -> (Int -> r) -> r
addOneCPS x k = k (x+1)

Similarly we can write timesTwo in this style,

timesTwoCPS :: Int -> (Int -> r) -> r
timesTwoCPS y k = k (y * 2)

and now a composed function, still in CPS style so having a continuation that might perhaps be as simple as id :: Int -> Int (when r is Int) or show :: Int -> String (when r is String), is

timesTwoAfterAddOneCPS x k = addOneCPS x (\y -> timesTwoCPS y k)

Even allowing for the extra argument, this is far less attractive than the simple composition timesTwo . addOne we had earlier. I’m sure you already see where this is leading. Above our final version of monadic bind for the continuation monad (written without an algebraic data type) was, changing the dummy variable from x to y to avoid an imminent clash,

bind4 run_a a_to_run_b = \k -> run_a (\y -> (a_to_run_b y) k)

Substitute addOneCPS x for run_a and timesTwoCPS for a_to_run_b and get

bind4 (addOneCPS x) timesTwoCPS = \k -> (addOneCPS x) (\y -> (timesTwoCPS y) k)

which is the \eta-expansion of timesTwoAfterAddOneCPS x. Therefore, when wrapped in the appropriate constructors, we can write timesTwoAfterAddOneCPS x = (addOneCPS x) >>= timesTwoCPS, and even use do notation:

timesTwoAfterAddOneCPS x =
    do y <- addOneCPS x   -- addOneCPS x   = RC (\k -> k (x+1))
       timesTwoCPS y      -- timesTwoCPS y = RC (\k -> k (y*2))
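Putting everything together, here is an end-to-end sketch of this pipeline as a self-contained module (the RunCont boilerplate from above is repeated so the example compiles on its own):

```haskell
import Control.Monad (ap)

newtype RunCont r a = RC { runCont :: (a -> r) -> r }

instance Functor (RunCont r) where
  fmap f m = m >>= (return . f)

instance Applicative (RunCont r) where
  pure x = RC (\k -> k x)
  (<*>)  = ap

instance Monad (RunCont r) where
  RC run_a >>= a_to_run_b = RC (\k -> run_a (\x -> runCont (a_to_run_b x) k))

addOneCPS :: Int -> RunCont r Int
addOneCPS x = RC (\k -> k (x + 1))

timesTwoCPS :: Int -> RunCont r Int
timesTwoCPS y = RC (\k -> k (y * 2))

-- 'add one' then 'times two', sequenced with do notation
pipeline :: Int -> RunCont r Int
pipeline x = do
  y <- addOneCPS x
  timesTwoCPS y

main :: IO ()
main = print (runCont (pipeline 3) id)  -- (3 + 1) * 2 = 8
```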

Recursion in continuation passing style

This is a stub I’ll fill in later with a CPS-style sum function. Of course all it will really show is that continuation passing style (which is just a complicated right fold with a final continuation) is the wrong way to write sum. The correct way is to use a strict left fold, or far better, just use the built-in sum, written by experts who know all this perfectly.
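Pending that, here is one possible version of such a CPS sum (my own sketch, so not necessarily what the later post will use); note how the continuation grows at each recursive call, exactly the ‘complicated right fold’ just described:

```haskell
-- sum in continuation passing style: the recursive call receives a
-- continuation that adds x to the eventual sum of the tail
sumCPS :: Num a => [a] -> (a -> r) -> r
sumCPS []       k = k 0
sumCPS (x : xs) k = sumCPS xs (\s -> k (x + s))

main :: IO ()
main = print (sumCPS [1 .. 10 :: Int] id)  -- 55
```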

Final thought on continuations

Finally, as far as I can see, continuation passing style isn’t really very important in Haskell, and some comments I’ve read suggest it can even be harmful: the linked answer mentions that for GHC,

‘The translation into CPS encodes a particular order of evaluation, whereas direct style does not. That dramatically inhibits code-motion transformations’.

On the other hand, continuations are excellent for stretching the mind, and illuminating how programming languages really work. Hence, no doubt, their ubiquity in computer science courses, and the many tutorials on the web. In this spirit, let me mention that, as in the Yoneda Lemma, if we get to pick the return type r in RunCont r a then there is a very simple hack showing that any continuation runner really is a hidden value of type a (this is a common metaphor in web tutorials), which we can extract by

counit :: RunCont a a -> a
counit (RC run_a) = run_a id

This might suggest an alarmingly simple way to implement the continuation monad: just choose the output type r to be a, extract the value, and do the obvious. But of course the code below only type checks if we (as a text substitution) replace r with a, and so, as I’ve seen many times, somehow the syntactical type checker can guide us away from what is semantically incorrect.

instance Monad (RunCont a) where
    return x = RC (\k -> k x)
    (RC run_a) >>= g = let x = run_a id in g x
    -- • Couldn't match expected type ‘a1’ with actual type ‘a’