Writing a Compiler in transit
code for this project available here
Winter vacations usually always involve a 10hr + transit at Bangkok Airport. My initial plan of action was to finish the second season of “The Sopranos” (which is an awesome series btw). However due to a set of circumstances, involving a weariness of watching mobsters eradicate each other, I decided to revisit haskell.
So I got another airport wifi token, and quickly started saving pages from Learn You a Haskell and the Write yourself a Scheme in 48 hrs. What better way to learn a language than write a small compiler, I reasoned.
A small step into Monad-istan
I had a very primitive understanding of parser combinators. Ironically I had understood them better when I used kern, a clojure parser combinator library with wonderful documentation.
What took me a while to understand were (and where I still get confused at times) were Monads, the do .. notation and the puzzling bind (»=) operator
I almost exclaimed “Eureka” when I discovered that (»=) is equivalent to Scala’s flatMap. ie for a list, its equivalent to applying map and flatten.
The best and most comprehensive resource I found on the matter was the Monad section in What I Wish I Knew When Learning Haskell by Stephen Diehl
So I could do things like
*Main> [1,2,3] >>= (\ e -> [e + 1])
[2,3,4]
-- equivalent to
*Main> do { e <- [1,2,3]; return (e + 1)}
[2,3,4]
AST the first
Before I could write the parser I had to define a represetation for the AST that the parser would produce. By virtue of being a Lisp, pretty much everything was either a value, or a list of values. So I initially came up with :
data LispVal = LSymbol String
| LList [LispVal]
| LInteger Integer
| LFloat Double
| LString String
| LBool Bool
Unlike the tutorial I wanted to support Floats and not just Integers.
Enter Parsec
The parser was pretty standard. I just needed to mostly follow the tutorial. I desugared some of the code from do notation to using raw (»=), just to clarify myself. Towards the end I came up with:
parseExpr :: Parser LispVal
parseExpr = parseSymbol <|>
parseString <|>
parseQuoted <|>
try (parseInteger) <|>
parseFloat <|>
do char '('
x <- parseList
char ')'
return x
parseList :: Parser LispVal
parseList = liftM LList $ sepBy parseExpr spaces
parseString :: Parser LispVal
parseString = do char '"'
x <- many (noneOf "\"")
char '"'
return $ LString x
parseSymbol :: Parser LispVal
parseSymbol = do first <- letter <|> specialChar
rest <- many (letter <|> specialChar <|> digit)
let symbol= first:rest
return $ case symbol of
"true" -> LBool True
"false" -> LBool False
_ -> LSymbol symbol
parseInteger :: Parser LispVal
parseInteger = do lookAhead $ many1 digit >> notFollowedBy (oneOf ".")
ds <- many1 digit
return $ (LInteger . read) ds
parseFloat :: Parser LispVal
parseFloat = liftM (LFloat . read) $ do d1 <- (many digit)
c <- (char '.')
d2 <- (many digit)
return $ d1++[c]++d2
The Evaluation
This was my favorite part of the entire endeavour. It was also the point where I stopped following the tutorial.
The first task was thinking of the signature for eval. If I didn’t have any environment to worry about, eval would simply be LispVal -> LispVal.
type EnvVal = (Environment, LispVal)
However, because I wanted to worry about Environments from the getgo, I decided that eval should be Environment -> LispVal -> LispVal. That is if no functions could modify the existing environment. Which doesn’t really work. I needed to return eval to return a new environment. Thus I settled on Environment -> LispVal -> (Environment, LispVal)
First order of business was setting up the primitives data types that eval to themselves
type EnvVal = (Environment, LispVal)
eval :: Environment -> LispVal -> EnvVal
eval e v@(LString _) = (e,v)
eval e v@(LInteger _) = (e,v)
eval e v@(LBool _) = (e,v)
eval e v@(LFloat _) = (e,v)
eval e v@(LError _) = (e,v)
eval e (LList [LSymbol "quote", v]) = (e,v)
eval e v@(LList []) = (e,v)
And adding symbol insertion and lookup
eval env (LSymbol name) = (env, lookupSymbol env name [])
eval env (LList [LSymbol "def", LSymbol name, v]) =
((M.insert name (snd (eval env v)) env), LSymbol name)
Function Application
However, at this point I decided I needed to define some primitive functions which were always available in the Environment. To do so, I would first need to extend LispVal to include a type for Primive Functions. I also added a constructor for errors.
data LispVal = LSymbol String
| LList [LispVal]
| LInteger Integer
| LFloat Double
| LString String
| LBool Bool
| LPrimitive ([LispVal]->LispVal)
| LError String
I also needed a little bit of type inference as I wanted support for both floats and integers, and both had several common primitive functions. So behold, a very very poor man’s polymorphism:
eval env (LList (LSymbol func : args)) =
(env, apply env func $ map (snd . (eval env)) args)
apply :: Environment -> String -> [LispVal] -> LispVal
apply env func args =
case M.lookup func $ M.union listPrimitives env of
Just (LPrimitive f) -> (f args)
Nothing -> inferTypeAndApply func args
inferTypeAndApply :: String -> [LispVal] -> LispVal
inferTypeAndApply func args
| isIntList args = lookupSymbol intPrimitives func args
| isFloatList args = lookupSymbol floatPrimitives func args
| isBoolList args = lookupSymbol boolPrimitives func args
| otherwise = LError "symbol not defined"
Btw, this approach is probably quite brittle. No autopromotion of numeric types either.
A poor man’s prelude
Next up, I had to define a set of primitive functions. This was probably the easiest part of the entire endeavour. This was also around the time I started munching some delicious Phad Thai.
intPrimitives :: M.Map String LispVal
intPrimitives = M.fromList [("+", LPrimitive $ intBinaryOp (+)),
("-", LPrimitive $ intBinaryOp (-)),
("*", LPrimitive $ intBinaryOp (*)),
("/", LPrimitive $ intBinaryOp (quot)),
("mod", LPrimitive $ intBinaryOp (rem)),
("pow", LPrimitive $ intBinaryOp (^))]
intBinaryOp :: (Integer -> Integer -> Integer)->([LispVal]->LispVal)
intBinaryOp f = (\ args ->
case args of
(LInteger x):(LInteger y):[] -> LInteger (f x y)
_ -> LError "arity error")
floatPrimitives :: Environment
floatPrimitives = M.fromList [("+", LPrimitive $ floatBinaryOp (+)),
("-", LPrimitive $ floatBinaryOp (-)),
("*", LPrimitive $ floatBinaryOp (*)),
("/", LPrimitive $ floatBinaryOp (/)),
("pow", LPrimitive $ floatBinaryOp (**))]
floatBinaryOp :: (Double -> Double -> Double)->([LispVal]->LispVal)
floatBinaryOp f = (\ args ->
case args of
(LFloat x):(LFloat y):[] -> LFloat (f x y)
_ -> LError "arity error")
boolPrimitives :: Environment
boolPrimitives = M.fromList [("and", LPrimitive $ boolBinaryOp (B.&&)),
("or", LPrimitive $ boolBinaryOp (B.||)),
--("xor", LPrimitive $ boolBinaryOp (B.xor)),
("not", LPrimitive $ (\args ->
case args of
(LBool x):[] -> LBool $ B.not x
_ -> LError "arity error"))]
boolBinaryOp :: (Bool -> Bool -> Bool) -> ([LispVal]->LispVal)
boolBinaryOp f = (\args ->
case args of
(LBool x):(LBool y):[] -> LBool (f x y)
_ -> LError "arity error")
So hurray. A few hours down and I have a lispy calculator almost ready.
Getting impatient to see some action, I decided to implement the repl itself.
showVal :: LispVal -> String
showVal (LString c) = "\"" ++ c ++ "\""
showVal (LSymbol name) = name
showVal (LInteger i) = show i
showVal (LFloat f) = show f
showVal (LBool True) = "true"
showVal (LBool False) = "false"
showVal (LList c) = "(" ++ unwordsList c ++ ")"
showVal (LError e) = "Error : " ++ e
instance Show LispVal where show = showVal
readAndEval :: Environment -> String -> EnvVal
readAndEval env input = eval env $ readExpr input
printEval :: EnvVal -> IO ()
printEval p = putStrLn (show (snd p))
flushStr :: String -> IO ()
flushStr str = putStr str >> hFlush stdout
readPrompt :: String -> IO String
readPrompt prompt = flushStr prompt >> getLine
repl :: Environment -> IO ()
repl env = (readPrompt "Hisp>>> ") >>=
(\ inp ->
if inp == "quit" then
return ()
else
let pair = readAndEval env inp
in
printEval pair >>=
(\_ -> repl (fst pair)))
main = repl M.empty
I had to revisit Monad-istan while writing this. I decided to avoid the do notation altogether when writing the repl function, leading to ugly but understable (atleast for me) code.
Hisp>>> (+ 2 3)
5
Hisp>>> (+ 2.0 5.5)
7.5
Hisp>>> (and (or true false) true)
true
Hisp>>> (/ 2 3)
0
Hisp>>> (/ 2.0 3.0)
0.6666666666666666
Hisp>>> (pow 2 4)
16
Hisp>>> (pow 4.0 0.5)
2.0
I also decided to add some primitives for list operations. This was a lisp after all. For lists I went with the classical approach for representing them as cons cells.
toLList :: [LispVal] -> LispVal
toLList = foldr (\ l acc -> LList [LSymbol "cons", l, acc]) (LList [])
cons :: [LispVal] -> LispVal
cons args =
case args of
x:y:[] -> LList [LSymbol "cons",x,y]
_ -> LError "arity error"
first :: [LispVal] -> LispVal
first ((LList [LSymbol "cons",f,r]):[]) = f
first _ = LError "illegal arguments"
rest :: [LispVal] -> LispVal
rest ((LList [LSymbol "cons",f,r]):[]) = r
rest _ = LError "illegal arguments"
listPrimitives :: Environment
listPrimitives = M.fromList [("list", LPrimitive (toLList)),
("cons", LPrimitive (cons)),
("first", LPrimitive (first)),
("rest", LPrimitive (rest))
]
Hisp>>> (list 1 2 3)
(cons 1 (cons 2 (cons 3 ())))
Hisp>>> (def l (list 5 6 7 8 true false))
l
Hisp>>> l
(cons 5 (cons 6 (cons 7 (cons 8 (cons true (cons false ()))))))
Hisp>>> (first l)
5
Hisp>>> (rest l)
(cons 6 (cons 7 (cons 8 (cons true (cons false ())))))
So far so good!
Of Closures
I decided I needed a new Data Constructor for Lambdas, as lambdas needed to keep a copy of the environment where they are declared, as well as their bindings and body. To enable recursion, lambda’s should keep their names if defined.
Therefore I changed LispVal by adding LLambda:
data LispVal = LSymbol String
| LList [LispVal]
| LInteger Integer
| LFloat Double
| LString String
| LBool Bool
| LPrimitive ([LispVal]->LispVal)
| LLambda {name :: String,
env :: Environment,
bindings :: [LispVal],
body :: LispVal}
| LError String
I also needed to modify eval to give me Lambdas when seeing form like “(fn …)” or “(def .. (fn..”
-- for named functions
eval e (LList [LSymbol "def", LSymbol name, LList [LSymbol "fn", LList bindings, body@(LList _)]]) =
let newLambda = LLambda name e bindings body
newEnv = M.insert name newLambda e
in
(newEnv, newLambda)
-- for anonymous functions
eval e (LList [LSymbol "fn", LList bindings, body@(LList _)]) = (e, LLambda "" e bindings body)
For applying the Lambdas I implemented a helper function applyLambda. Lambdas could either be invoked via their symbols, or directly such as ((fn (bindings) body) args)
-- direct application
eval e (LList ((LList [LSymbol "fn", LList bindings, body]):args)) =
let lambda = LLambda "" e bindings body
in
applyLambda lambda args
-- added another case to apply for symbol application
apply :: Environment -> String -> [LispVal] -> LispVal
apply env func args =
case M.lookup func $ M.union listPrimitives env of
Just (LPrimitive f) -> (f args)
Just l@(LLambda _ _ _ _) -> snd $ applyLambda l args
Nothing -> inferTypeAndApply func args
applyLambda :: LispVal -> [LispVal] -> EnvVal
applyLambda l@(LLambda name lenv bindings body) args
| length args == length bindings =
let closure = M.union (M.fromList [(name, l)]) (M.union lenv (M.fromList (zip (map show bindings) args)))
in
eval closure body
| otherwise = (lenv, LError "arity error")
There’s a hack here. My original ambitious plan was to allow clojure style destructuring. That’s why the bindings are of type [LispVal]. However around dawn as both sleep and my flight approached, I just hacked around by mapping show on the bindings.
Another thing to note is that I added the name of the lambda to the environment when applying it. This is to support recursion.
So, finally trying it out
Hisp>>> (def sum (fn (a b) (+ a b)))
<lambda : sum-> body:(+ a b)>
Hisp>>> (sum 1 2)
3
Hisp>>> ((fn (a) (+ a 1)) 1)
2
By this time, my flight was almost ready.
Conditionals and Equality
After landing in SG, I realised in order to make factorial (aka hello world) I needed conditionals. and equality. So first I implemented a very simple if:
eval e (LList [LSymbol "if", cond, e1, e2]) =
let evalCond = (eval e cond)
in
case snd evalCond of
LBool True -> eval (fst evalCond) e1
LBool False -> eval (fst evalCond) e2
_ -> (e, LError "type error")
For equality I simply decided to add an Eq instance to the LispVal type.
instance Eq LispVal where
(LPrimitive _) == _ = False
(LLambda _ _ _ _) == _ = False
(LError _) == _ = False
(LInteger i1) == (LInteger i2) = (i1 == i2)
(LInteger i1) == _ = False
(LFloat f1) == (LFloat f2) = (f1 == f2)
(LFloat f1) == _ = False
(LString s1) == (LString s2) = (s1 == s2)
(LString s1) == _ = False
(LBool b1) == (LBool b2) = (b1 == b2)
(LBool b1) == _ = False
-- adding eval rule for =
eval e (LList [LSymbol "=", e1, e2]) =
let evalE1 = snd $ eval e e1
evalE2 = snd $ eval e e2
in
(e,LBool (evalE1 == evalE2))
Finally testing it..
Hisp>>> (def factorial
(fn (a)
(if (= a 0)
1
(* a (factorial (- a 1))))))
<lambda : factorial-> body:(if (= a 0) 1 (* a (factorial (- a 1))))>
Hisp>>> (factorial 5)
120
Eureka!
Afterthoughts
Well foremostly, this project helped me gain a very healthy appreciation for Haskell. Furthermore it helped de-mystefy some of the magic behind compilers and how programming languages are implemented.
Hisp is very very far from a complete compiler. There are several probelms with the way I designed the implementation:
- Doing IO currently isn’t really possible. Probably using IORef to represent my Environment instead of a pure Data.Map would help here.
- Several core functions like comparison (>,<, etc) aren’t available
- Being a noob, the code is repitive is several areas and could probably be immensely refactored.
- TESTS, TESTS, TESTS