Write an interpreter: variables
composition.al 2013-06-29
This post is the second in a series. Last time, we started coding up a toy interpreter for a language of arithmetic expressions. This time, we’ll extend the language to include variables and update our interpreter to handle that extended language.
A slightly less tiny grammar of expressions
Our interpreter only knows how to deal with expressions: things of type Expr
. To be able to interpret a bigger language, we’ll need to expand the interpreter’s notion of what an Expr
is.
Let’s add a new variant to our Expr
type:
1 2 3 4 5
data Expr = Number Int
| Var String
| Plus Expr Expr
| Minus Expr Expr
| Times Expr Expr
This is what we had before, but with one new variant added: Var
. Var
is a value constructor for variables, just as Number
is a value constructor for integers. A convenient way to represent a variable is as a string, such as "x"
or "totalMonkeys"
, so Var
will take a String
argument.
With Var
added to the definition of Expr
, we can write down some Expr
s with variables in them:
1 2 3 4 5 6
*Interp> :t Var "x"
Var "x" :: Expr
*Interp> :t Plus (Var "x") (Number 5)
Plus (Var "x") (Number 5) :: Expr
*Interp> :t Plus (Var "x") (Var "y")
Plus (Var "x") (Var "y") :: Expr
GHCi reports that these expressions are indeed Expr
s. Now we just have to teach our interpreter how to deal with them.
Environments
When given an expression, the job of an interpreter is to return that expression’s value. This is easy enough for the interpreter to accomplish when we’re working with expressions like 9
or (3 + 4)
or ((203 + (6 * (34 - 1))) + 7000)
. But what is the value of an expression containing a variable, like (x + 5)
?
To interpret expressions that contain variables, merely having the expressions isn’t enough. But if we can get our hands on some extra information that would tell us that, say, the value of x
is 42
, then we would know how to interpret (x + 5)
, or any other expression containing x
. This extra information is called an environment.
A simple way to represent an environment might be a list of pairs. For instance, [("x", 42), ("y", 5)]
could represent the environment in which x
’s value is 42
and y
’s value is 5
. Instead of using a list of pairs, we’ll use Haskell’s Map
data structure from the Data.Map
library, which comes with many convenient operations — but it doesn’t really matter what data structure we use, as long as it lets us associate variables with values.
Using Map
, let’s define our own Env
type for environments. Since we want to map variables to values, or in our case, String
s to Int
s, we’ll instantiate the Map
type with String
and Int
.
1 2 3
import qualified Data.Map as Map
type Env = Map.Map String Int
With our Env
type defined, we can define a function lookupEnv
that will let us look things up in an environment. lookupEnv
is just a thin wrapper around the lookup
function that Data.Map
already provides.
1 2 3 4
lookupEnv :: Env -> String -> Int
lookupEnv env x = case Map.lookup x env of
Just n -> n
Nothing -> error "Unbound variable!"
The lookupEnv
operation takes two arguments: env
, the environment, and x
, the variable we’re trying to look up. If the lookup succeeds, lookupEnv
returns the value it found; otherwise, it raises an exception.
We haven’t yet written anything that will let us add new bindings to an environment. For now, we can roll our own environments by calling the convenient fromList
, which takes an argument like [("x", 42), ("y", 5)]
and produces the corresponding Map
.
1 2 3 4 5 6
*Interp> lookupEnv (Map.fromList [("x", 42), ("y", 5)]) "x"
42
*Interp> lookupEnv (Map.fromList [("x", 42), ("y", 5)]) "y"
5
*Interp> lookupEnv (Map.fromList [("x", 42), ("y", 5)]) "z"
*** Exception: Unbound variable!
The interp
function
Now that we have environments and a way to look things up in them, we’re ready to add support for variables to interp
.
First, we’ll tweak the type of interp
. Before, it simply took an Expr
and returned an Int
. Now that we have to be able to interpret variables, we’ll need to have an environment handy to look up their values in. We can pass one in as a second argument to interp
:
1
interp :: Expr -> Env -> Int
Next, in the definition of interp
, we’ll add a new case to handle Var
expressions. When we encounter a Var
, we have to determine its value. Fortunately, this is easy: we just call the lookupEnv
function we defined, passing in the environment and the variable as its arguments.
None of the other cases of interp
will need to make use of the environment at all. But since interp
now takes an environment argument, each of the recursive calls in the Plus
, Minus
, and Times
cases will have to pass the environment along.
1 2 3 4 5 6
interp e env = case e of
Var x -> lookupEnv env x
Number n -> n
Plus e1 e2 -> (interp e1 env) + (interp e2 env)
Minus e1 e2 -> (interp e1 env) - (interp e2 env)
Times e1 e2 -> (interp e1 env) * (interp e2 env)
We now have what’s known as an environment-passing interpreter.
We’re ready to interpret some expressions with variables. We’ll have to remember to give interp
two arguments: first, the Expr
to be interpreted, and second, the environment to interpret them in. How will we get an environment to pass to interp
? We can do something like Map.fromList [("x", 42), ("y", 5)]
, or, if we just want an empty environment, we can call Map.empty
. Let’s define a quick shorthand for empty environments:
1 2
emptyEnv :: Env
emptyEnv = Map.empty
Now we’re ready to call interp
. What happens if we try to interpret an Expr
containing variables, but we only give it emptyEnv
?
1 2
*Interp> interp (Plus (Var "x") (Number 5)) emptyEnv
*** Exception: Unbound variable!
The interpreter correctly reports that emptyEnv
isn’t enough information to interpret Var "x"
. Nevertheless, emptyEnv
is handy for when we just want to interpret expressions without variables.
1 2
*Interp> interp (Times (Number 2) (Number 0)) emptyEnv
0
But, when we do have variables to interpret, we need to give interp
an environment that binds them to values.
1 2
interp (Plus (Var "x") (Number 5)) (Map.fromList [("x", 42)])
47
Hooray! Our language now supports variables. Here’s the complete interpreter:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
import qualified Data.Map as Map
data Expr = Number Int
| Var String
| Plus Expr Expr
| Minus Expr Expr
| Times Expr Expr
type Env = Map.Map String Int
emptyEnv :: Env
emptyEnv = Map.empty
lookupEnv :: Env -> String -> Int
lookupEnv env x = case Map.lookup x env of
Just n -> n
Nothing -> error "Unbound variable!"
interp :: Expr -> Env -> Int
interp e env = case e of
Var x -> lookupEnv env x
Number n -> n
Plus e1 e2 -> (interp e1 env) + (interp e2 env)
Minus e1 e2 -> (interp e1 env) - (interp e2 env)
Times e1 e2 -> (interp e1 env) * (interp e2 env)
Parsing
Since we can now interpret expressions containing variables, it would be convenient to be able to lex and parse them as well, so that we can write, say, "(10 + (x + y))"
instead of Plus (Number 10) (Plus (Var "x") (Var "y"))
.
Last time, we generated a parser for our language from a Happy grammar file. Adding support for variables requires only a couple of tweaks to that file. Happily, we can deal with lexing and parsing variables in a way analogous to how we’re already dealing with integers. The resulting Happy grammar file is a little longer, but no more complicated.
Next time: functions
Now that we have support for variables in our language, we’re on our way to being able to write functions that take arguments. In the next post in this series, we’ll extend our interpreter to be able to handle functions and function calls.
All of the code from this post, plus a little additional scaffolding, is on GitHub. Here’s a diff of everything that changed from last time.