Write an interpreter: variables

composition.al 2013-06-29

This post is the second in a series. Last time, we started coding up a toy interpreter for a language of arithmetic expressions. This time, we’ll extend the language to include variables and update our interpreter to handle that extended language.

A slightly less tiny grammar of expressions

Our interpreter only knows how to deal with expressions: things of type Expr. To be able to interpret a bigger language, we’ll need to expand the interpreter’s notion of what an Expr is.

Let’s add a new variant to our Expr type:

data Expr = Number Int
    | Var String
    | Plus Expr Expr
    | Minus Expr Expr
    | Times Expr Expr

This is what we had before, but with one new variant added: Var. Var is a value constructor for variables, just as Number is a value constructor for integers. A convenient way to represent a variable is as a string, such as "x" or "totalMonkeys", so Var will take a String argument.

With Var added to the definition of Expr, we can write down some Exprs with variables in them:

*Interp> :t Var "x"
Var "x" :: Expr
*Interp> :t Plus (Var "x") (Number 5)
Plus (Var "x") (Number 5) :: Expr
*Interp> :t Plus (Var "x") (Var "y")
Plus (Var "x") (Var "y") :: Expr

GHCi reports that these expressions are indeed Exprs. Now we just have to teach our interpreter how to deal with them.

Environments

When given an expression, the job of an interpreter is to return that expression’s value. This is easy enough for the interpreter to accomplish when we’re working with expressions like 9 or (3 + 4) or ((203 + (6 * (34 - 1))) + 7000). But what is the value of an expression containing a variable, like (x + 5)?

To interpret expressions that contain variables, merely having the expressions isn’t enough. But if we can get our hands on some extra information that would tell us that, say, the value of x is 42, then we would know how to interpret (x + 5), or any other expression containing x. This extra information is called an environment.

A simple way to represent an environment might be a list of pairs. For instance, [("x", 42), ("y", 5)] could represent the environment in which x’s value is 42 and y’s value is 5. Instead of using a list of pairs, we’ll use Haskell’s Map data structure from the Data.Map library, which comes with many convenient operations — but it doesn’t really matter what data structure we use, as long as it lets us associate variables with values.

Using Map, let’s define our own Env type for environments. Since we want to map variables to values, or in our case, Strings to Ints, we’ll instantiate the Map type with String and Int.

1
2
3

import qualified Data.Map as Map

type Env = Map.Map String Int

With our Env type defined, we can define a function lookupEnv that will let us look things up in an environment. lookupEnv is just a thin wrapper around the lookup function that Data.Map already provides.

lookupEnv :: Env -> String -> Int
lookupEnv env x = case Map.lookup x env of
  Just n -> n
  Nothing -> error "Unbound variable!"

The lookupEnv operation takes two arguments: env, the environment, and x, the variable we’re trying to look up. If the lookup succeeds, lookupEnv returns the value it found; otherwise, it raises an exception.

We haven’t yet written anything that will let us add new bindings to an environment. For now, we can roll our own environments by calling the convenient fromList, which takes an argument like [("x", 42), ("y", 5)] and produces the corresponding Map.

*Interp> lookupEnv (Map.fromList [("x", 42), ("y", 5)]) "x"
42
*Interp> lookupEnv (Map.fromList [("x", 42), ("y", 5)]) "y"
5
*Interp> lookupEnv (Map.fromList [("x", 42), ("y", 5)]) "z"
*** Exception: Unbound variable!

The `interp` function

Now that we have environments and a way to look things up in them, we’re ready to add support for variables to interp.

First, we’ll tweak the type of interp. Before, it simply took an Expr and returned an Int. Now that we have to be able to interpret variables, we’ll need to have an environment handy to look up their values in. We can pass one in as a second argument to interp:

interp :: Expr -> Env -> Int

Next, in the definition of interp, we’ll add a new case to handle Var expressions. When we encounter a Var, we have to determine its value. Fortunately, this is easy: we just call the lookupEnv function we defined, passing in the environment and the variable as its arguments.

None of the other cases of interp will need to make use of the environment at all. But since interp now takes an environment argument, each of the recursive calls in the Plus, Minus, and Times cases will have to pass the environment along.

interp e env = case e of
  Var x -> lookupEnv env x
  Number n -> n
  Plus e1 e2 -> (interp e1 env) + (interp e2 env)
  Minus e1 e2 -> (interp e1 env) - (interp e2 env)
  Times e1 e2 -> (interp e1 env) * (interp e2 env)

We now have what’s known as an environment-passing interpreter.

We’re ready to interpret some expressions with variables. We’ll have to remember to give interp two arguments: first, the Expr to be interpreted, and second, the environment to interpret them in. How will we get an environment to pass to interp? We can do something like Map.fromList [("x", 42), ("y", 5)], or, if we just want an empty environment, we can call Map.empty. Let’s define a quick shorthand for empty environments:

1
2

emptyEnv :: Env
emptyEnv = Map.empty

Now we’re ready to call interp. What happens if we try to interpret an Expr containing variables, but we only give it emptyEnv?

1
2

*Interp> interp (Plus (Var "x") (Number 5)) emptyEnv
*** Exception: Unbound variable!

The interpreter correctly reports that emptyEnv isn’t enough information to interpret Var "x". Nevertheless, emptyEnv is handy for when we just want to interpret expressions without variables.

1
2

*Interp> interp (Times (Number 2) (Number 0)) emptyEnv
0

But, when we do have variables to interpret, we need to give interp an environment that binds them to values.

1
2

interp (Plus (Var "x") (Number 5)) (Map.fromList [("x", 42)])
47

Hooray! Our language now supports variables. Here’s the complete interpreter:

import qualified Data.Map as Map

data Expr = Number Int
    | Var String
    | Plus Expr Expr
    | Minus Expr Expr
    | Times Expr Expr

type Env = Map.Map String Int

emptyEnv :: Env
emptyEnv = Map.empty

lookupEnv :: Env -> String -> Int
lookupEnv env x = case Map.lookup x env of
  Just n -> n
  Nothing -> error "Unbound variable!"

interp :: Expr -> Env -> Int
interp e env = case e of
  Var x -> lookupEnv env x
  Number n -> n
  Plus e1 e2 -> (interp e1 env) + (interp e2 env)
  Minus e1 e2 -> (interp e1 env) - (interp e2 env)
  Times e1 e2 -> (interp e1 env) * (interp e2 env)

Parsing

Since we can now interpret expressions containing variables, it would be convenient to be able to lex and parse them as well, so that we can write, say, "(10 + (x + y))" instead of Plus (Number 10) (Plus (Var "x") (Var "y")).

Last time, we generated a parser for our language from a Happy grammar file. Adding support for variables requires only a couple of tweaks to that file. Happily, we can deal with lexing and parsing variables in a way analogous to how we’re already dealing with integers. The resulting Happy grammar file is a little longer, but no more complicated.

Next time: functions

Now that we have support for variables in our language, we’re on our way to being able to write functions that take arguments. In the next post in this series, we’ll extend our interpreter to be able to handle functions and function calls.

All of the code from this post, plus a little additional scaffolding, is on GitHub. Here’s a diff of everything that changed from last time.