 ### Omer Iqbal

geek . lisper . full-stack dev . emacs padawan . coffee addict

# Writing a Compiler in transit

code for this project available here

Winter vacations usually always involve a 10hr + transit at Bangkok Airport. My initial plan of action was to finish the second season of “The Sopranos” (which is an awesome series btw). However due to a set of circumstances, involving a weariness of watching mobsters eradicate each other, I decided to revisit haskell.

So I got another airport wifi token, and quickly started saving pages from Learn You a Haskell and the Write yourself a Scheme in 48 hrs. What better way to learn a language than write a small compiler, I reasoned.

## A small step into Monad-istan

I had a very primitive understanding of parser combinators. Ironically I had understood them better when I used kern, a clojure parser combinator library with wonderful documentation.

What took me a while to understand were (and where I still get confused at times) were Monads, the do .. notation and the puzzling bind (»=) operator

I almost exclaimed “Eureka” when I discovered that (»=) is equivalent to Scala’s flatMap. ie for a list, its equivalent to applying map and flatten.

The best and most comprehensive resource I found on the matter was the Monad section in What I Wish I Knew When Learning Haskell by Stephen Diehl

So I could do things like

``````*Main> [1,2,3] >>= (\ e -> [e + 1])
[2,3,4]

-- equivalent to

*Main> do { e <- [1,2,3]; return (e + 1)}
[2,3,4]
``````

## AST the first

Before I could write the parser I had to define a represetation for the AST that the parser would produce. By virtue of being a Lisp, pretty much everything was either a value, or a list of values. So I initially came up with :

``````data LispVal = LSymbol String
| LList [LispVal]
| LInteger Integer
| LFloat Double
| LString String
| LBool Bool
``````

Unlike the tutorial I wanted to support Floats and not just Integers.

## Enter Parsec

The parser was pretty standard. I just needed to mostly follow the tutorial. I desugared some of the code from do notation to using raw (»=), just to clarify myself. Towards the end I came up with:

``````parseExpr :: Parser LispVal
parseExpr = parseSymbol <|>
parseString <|>
parseQuoted <|>
try (parseInteger) <|>
parseFloat <|>
do char '('
x <- parseList
char ')'
return x

parseList :: Parser LispVal
parseList = liftM LList \$ sepBy parseExpr spaces

parseString :: Parser LispVal
parseString = do char '"'
x <- many (noneOf "\"")
char '"'
return \$ LString x

parseSymbol :: Parser LispVal
parseSymbol = do first <- letter <|> specialChar
rest <- many (letter <|> specialChar <|> digit)
let symbol= first:rest
return \$ case symbol of
"true" -> LBool True
"false" -> LBool False
_ -> LSymbol symbol

parseInteger :: Parser LispVal
parseInteger = do lookAhead \$ many1 digit >> notFollowedBy (oneOf ".")
ds <- many1 digit
return \$ (LInteger . read) ds

parseFloat :: Parser LispVal
parseFloat = liftM (LFloat . read) \$ do d1 <- (many digit)
c <- (char '.')
d2 <- (many digit)
return \$ d1++[c]++d2
``````

## The Evaluation

This was my favorite part of the entire endeavour. It was also the point where I stopped following the tutorial.

The first task was thinking of the signature for eval. If I didn’t have any environment to worry about, eval would simply be LispVal -> LispVal.

``````type EnvVal = (Environment, LispVal)
``````

However, because I wanted to worry about Environments from the getgo, I decided that eval should be Environment -> LispVal -> LispVal. That is if no functions could modify the existing environment. Which doesn’t really work. I needed to return eval to return a new environment. Thus I settled on Environment -> LispVal -> (Environment, LispVal)

First order of business was setting up the primitives data types that eval to themselves

``````type EnvVal = (Environment, LispVal)

eval :: Environment -> LispVal -> EnvVal
eval e v@(LString _) = (e,v)
eval e v@(LInteger _) = (e,v)
eval e v@(LBool _) = (e,v)
eval e v@(LFloat _) = (e,v)
eval e v@(LError _) = (e,v)
eval e (LList [LSymbol "quote", v]) = (e,v)
eval e v@(LList []) = (e,v)
``````

And adding symbol insertion and lookup

``````eval env (LSymbol name) = (env, lookupSymbol env name [])

eval env (LList [LSymbol "def", LSymbol name, v]) =
((M.insert name (snd (eval env v)) env), LSymbol name)
``````

## Function Application

However, at this point I decided I needed to define some primitive functions which were always available in the Environment. To do so, I would first need to extend LispVal to include a type for Primive Functions. I also added a constructor for errors.

``````data LispVal = LSymbol String
| LList [LispVal]
| LInteger Integer
| LFloat Double
| LString String
| LBool Bool
| LPrimitive ([LispVal]->LispVal)
| LError String
``````

I also needed a little bit of type inference as I wanted support for both floats and integers, and both had several common primitive functions. So behold, a very very poor man’s polymorphism:

``````eval env (LList (LSymbol func : args)) =
(env, apply env func \$ map (snd . (eval env)) args)

apply :: Environment -> String -> [LispVal] -> LispVal
apply env func args =
case M.lookup func \$ M.union listPrimitives env of
Just (LPrimitive f) -> (f args)
Nothing -> inferTypeAndApply func args

inferTypeAndApply :: String -> [LispVal] -> LispVal
inferTypeAndApply func args
| isIntList args = lookupSymbol intPrimitives func args
| isFloatList args = lookupSymbol floatPrimitives func args
| isBoolList args = lookupSymbol boolPrimitives func args
| otherwise = LError "symbol not defined"
``````

Btw, this approach is probably quite brittle. No autopromotion of numeric types either.

## A poor man’s prelude

Next up, I had to define a set of primitive functions. This was probably the easiest part of the entire endeavour. This was also around the time I started munching some delicious Phad Thai.

``````intPrimitives :: M.Map String LispVal
intPrimitives = M.fromList [("+", LPrimitive \$ intBinaryOp (+)),
("-", LPrimitive \$ intBinaryOp (-)),
("*", LPrimitive \$ intBinaryOp (*)),
("/", LPrimitive \$ intBinaryOp (quot)),
("mod", LPrimitive \$ intBinaryOp (rem)),
("pow", LPrimitive \$ intBinaryOp (^))]

intBinaryOp :: (Integer -> Integer -> Integer)->([LispVal]->LispVal)
intBinaryOp f = (\ args ->
case args of
(LInteger x):(LInteger y):[] -> LInteger (f x y)
_ -> LError "arity error")

floatPrimitives :: Environment
floatPrimitives = M.fromList [("+", LPrimitive \$ floatBinaryOp (+)),
("-", LPrimitive \$ floatBinaryOp (-)),
("*", LPrimitive \$ floatBinaryOp (*)),
("/", LPrimitive \$ floatBinaryOp (/)),
("pow", LPrimitive \$ floatBinaryOp (**))]

floatBinaryOp :: (Double -> Double -> Double)->([LispVal]->LispVal)
floatBinaryOp f = (\ args ->
case args of
(LFloat x):(LFloat y):[] -> LFloat (f x y)
_ -> LError "arity error")

boolPrimitives :: Environment
boolPrimitives = M.fromList [("and", LPrimitive \$ boolBinaryOp (B.&&)),
("or", LPrimitive \$ boolBinaryOp (B.||)),
--("xor", LPrimitive \$ boolBinaryOp (B.xor)),
("not", LPrimitive \$ (\args ->
case args of
(LBool x):[] -> LBool \$ B.not x
_ -> LError "arity error"))]

boolBinaryOp :: (Bool -> Bool -> Bool) -> ([LispVal]->LispVal)
boolBinaryOp f = (\args ->
case args of
(LBool x):(LBool y):[] -> LBool (f x y)
_ -> LError "arity error")
``````

So hurray. A few hours down and I have a lispy calculator almost ready.

Getting impatient to see some action, I decided to implement the repl itself.

``````showVal :: LispVal -> String
showVal (LString c) = "\"" ++ c ++ "\""
showVal (LSymbol name) = name
showVal (LInteger i) = show i
showVal (LFloat f) = show f
showVal (LBool True) = "true"
showVal (LBool False) = "false"
showVal (LList c) = "(" ++ unwordsList c ++ ")"
showVal (LError e) = "Error : " ++ e

instance Show LispVal where show = showVal

readAndEval :: Environment -> String -> EnvVal

printEval :: EnvVal -> IO ()
printEval p = putStrLn (show (snd p))

flushStr :: String -> IO ()
flushStr str = putStr str >> hFlush stdout

readPrompt :: String -> IO String
readPrompt prompt = flushStr prompt >> getLine

repl :: Environment -> IO ()
repl env = (readPrompt "Hisp>>> ") >>=
(\ inp ->
if inp == "quit" then
return ()
else
let pair = readAndEval env inp
in
printEval pair >>=
(\_ -> repl (fst pair)))

main = repl M.empty
``````

I had to revisit Monad-istan while writing this. I decided to avoid the do notation altogether when writing the repl function, leading to ugly but understable (atleast for me) code.

``````Hisp>>> (+ 2 3)
5
Hisp>>> (+ 2.0 5.5)
7.5
Hisp>>> (and (or true false) true)
true
Hisp>>> (/ 2 3)
0
Hisp>>> (/ 2.0 3.0)
0.6666666666666666
Hisp>>> (pow 2 4)
16
Hisp>>> (pow 4.0 0.5)
2.0
``````

I also decided to add some primitives for list operations. This was a lisp after all. For lists I went with the classical approach for representing them as cons cells.

``````toLList :: [LispVal] -> LispVal
toLList = foldr (\ l acc -> LList [LSymbol "cons", l, acc]) (LList [])

cons :: [LispVal] -> LispVal
cons args =
case args of
x:y:[] ->   LList [LSymbol "cons",x,y]
_ -> LError "arity error"

first :: [LispVal] -> LispVal
first ((LList [LSymbol "cons",f,r]):[]) = f
first  _ = LError "illegal arguments"

rest :: [LispVal] -> LispVal
rest ((LList [LSymbol "cons",f,r]):[]) = r
rest  _ = LError "illegal arguments"

listPrimitives :: Environment
listPrimitives = M.fromList [("list", LPrimitive (toLList)),
("cons", LPrimitive (cons)),
("first", LPrimitive (first)),
("rest", LPrimitive (rest))
]

Hisp>>> (list 1 2 3)
(cons 1 (cons 2 (cons 3 ())))
Hisp>>> (def l (list 5 6 7 8 true false))
l
Hisp>>> l
(cons 5 (cons 6 (cons 7 (cons 8 (cons true (cons false ()))))))
Hisp>>> (first l)
5
Hisp>>> (rest l)
(cons 6 (cons 7 (cons 8 (cons true (cons false ())))))
``````

So far so good!

## Of Closures

I decided I needed a new Data Constructor for Lambdas, as lambdas needed to keep a copy of the environment where they are declared, as well as their bindings and body. To enable recursion, lambda’s should keep their names if defined.

Therefore I changed LispVal by adding LLambda:

``````data LispVal = LSymbol String
| LList [LispVal]
| LInteger Integer
| LFloat Double
| LString String
| LBool Bool
| LPrimitive ([LispVal]->LispVal)
| LLambda {name :: String,
env :: Environment,
bindings :: [LispVal],
body :: LispVal}
| LError String
``````

I also needed to modify eval to give me Lambdas when seeing form like “(fn …)” or “(def .. (fn..”

``````-- for named functions

eval e (LList [LSymbol "def", LSymbol name, LList [LSymbol "fn", LList bindings, body@(LList _)]]) =
let     newLambda = LLambda name e bindings body
newEnv = M.insert name newLambda e
in
(newEnv, newLambda)

-- for anonymous functions

eval e (LList [LSymbol "fn", LList bindings, body@(LList _)]) = (e, LLambda "" e bindings body)
``````

For applying the Lambdas I implemented a helper function applyLambda. Lambdas could either be invoked via their symbols, or directly such as ((fn (bindings) body) args)

``````-- direct application

eval e (LList ((LList [LSymbol "fn", LList bindings, body]):args)) =
let lambda = LLambda "" e bindings body
in
applyLambda lambda args

-- added another case to apply for symbol application

apply :: Environment -> String -> [LispVal] -> LispVal
apply env func args =
case M.lookup func \$ M.union listPrimitives env of
Just (LPrimitive f) -> (f args)
Just l@(LLambda _ _ _ _) -> snd \$ applyLambda l args
Nothing -> inferTypeAndApply func args

applyLambda :: LispVal -> [LispVal] -> EnvVal
applyLambda l@(LLambda name lenv bindings body) args
| length args == length bindings =
let closure = M.union (M.fromList [(name, l)]) (M.union lenv (M.fromList (zip (map show bindings) args)))
in
eval closure body
| otherwise = (lenv, LError "arity error")
``````

There’s a hack here. My original ambitious plan was to allow clojure style destructuring. That’s why the bindings are of type [LispVal]. However around dawn as both sleep and my flight approached, I just hacked around by mapping show on the bindings.

Another thing to note is that I added the name of the lambda to the environment when applying it. This is to support recursion.

So, finally trying it out

``````Hisp>>> (def sum (fn (a b) (+ a b)))
<lambda : sum-> body:(+ a b)>
Hisp>>> (sum 1 2)
3

Hisp>>> ((fn (a) (+ a 1)) 1)
2
``````

By this time, my flight was almost ready.

## Conditionals and Equality

After landing in SG, I realised in order to make factorial (aka hello world) I needed conditionals. and equality. So first I implemented a very simple if:

``````eval e (LList [LSymbol "if", cond, e1, e2]) =
let evalCond = (eval e cond)
in
case snd evalCond of
LBool True -> eval (fst evalCond) e1
LBool False -> eval (fst evalCond) e2
_ -> (e, LError "type error")
``````

For equality I simply decided to add an Eq instance to the LispVal type.

``````instance Eq LispVal where
(LPrimitive _) == _ = False
(LLambda _ _ _ _) == _ = False
(LError _) == _ = False

(LInteger i1) == (LInteger i2) = (i1 == i2)
(LInteger i1) == _ = False

(LFloat f1) == (LFloat f2) = (f1 == f2)
(LFloat f1) == _ = False

(LString s1) == (LString s2) = (s1 == s2)
(LString s1) == _ = False

(LBool b1) == (LBool b2) = (b1 == b2)
(LBool b1) == _ = False

-- adding eval rule for =

eval e (LList [LSymbol "=", e1, e2]) =
let evalE1 = snd \$ eval e e1
evalE2 = snd \$ eval e e2
in
(e,LBool (evalE1 == evalE2))
``````

Finally testing it..

``````Hisp>>> (def factorial
(fn (a)
(if (= a 0)
1
(* a (factorial (- a 1))))))
<lambda : factorial-> body:(if (= a 0) 1 (* a (factorial (- a 1))))>
Hisp>>> (factorial 5)
120
``````

Eureka!

## Afterthoughts

Well foremostly, this project helped me gain a very healthy appreciation for Haskell. Furthermore it helped de-mystefy some of the magic behind compilers and how programming languages are implemented.

Hisp is very very far from a complete compiler. There are several probelms with the way I designed the implementation:

• Doing IO currently isn’t really possible. Probably using IORef to represent my Environment instead of a pure Data.Map would help here.
• Several core functions like comparison (>,<, etc) aren’t available
• Being a noob, the code is repitive is several areas and could probably be immensely refactored.
• TESTS, TESTS, TESTS