Omer Iqbal bio photo

Omer Iqbal

geek . lisper . full-stack dev . emacs padawan . coffee addict

Twitter Facebook Github

Writing a Compiler in transit

code for this project available here


Winter vacations usually always involve a 10hr + transit at Bangkok Airport. My initial plan of action was to finish the second season of “The Sopranos” (which is an awesome series btw). However due to a set of circumstances, involving a weariness of watching mobsters eradicate each other, I decided to revisit haskell.

So I got another airport wifi token, and quickly started saving pages from Learn You a Haskell and the Write yourself a Scheme in 48 hrs. What better way to learn a language than write a small compiler, I reasoned.

A small step into Monad-istan

I had a very primitive understanding of parser combinators. Ironically I had understood them better when I used kern, a clojure parser combinator library with wonderful documentation.

What took me a while to understand were (and where I still get confused at times) were Monads, the do .. notation and the puzzling bind (»=) operator

I almost exclaimed “Eureka” when I discovered that (»=) is equivalent to Scala’s flatMap. ie for a list, its equivalent to applying map and flatten.

The best and most comprehensive resource I found on the matter was the Monad section in What I Wish I Knew When Learning Haskell by Stephen Diehl

So I could do things like

*Main> [1,2,3] >>= (\ e -> [e + 1])
[2,3,4]

-- equivalent to

*Main> do { e <- [1,2,3]; return (e + 1)}
[2,3,4]

AST the first

Before I could write the parser I had to define a represetation for the AST that the parser would produce. By virtue of being a Lisp, pretty much everything was either a value, or a list of values. So I initially came up with :

data LispVal = LSymbol String
             | LList [LispVal]
             | LInteger Integer
             | LFloat Double
             | LString String
             | LBool Bool

Unlike the tutorial I wanted to support Floats and not just Integers.

Enter Parsec

The parser was pretty standard. I just needed to mostly follow the tutorial. I desugared some of the code from do notation to using raw (»=), just to clarify myself. Towards the end I came up with:

parseExpr :: Parser LispVal
parseExpr = parseSymbol <|>
            parseString <|>
            parseQuoted <|>
            try (parseInteger) <|>
            parseFloat <|>
            do char '('
               x <- parseList
               char ')'
               return x

parseList :: Parser LispVal
parseList = liftM LList $ sepBy parseExpr spaces

parseString :: Parser LispVal
parseString = do char '"'
                 x <- many (noneOf "\"")
                 char '"'
                 return $ LString x


parseSymbol :: Parser LispVal
parseSymbol = do first <- letter <|> specialChar
                 rest <- many (letter <|> specialChar <|> digit)
                 let symbol= first:rest
                 return $ case symbol of
                   "true" -> LBool True
                   "false" -> LBool False
                   _ -> LSymbol symbol



parseInteger :: Parser LispVal
parseInteger = do lookAhead $ many1 digit >> notFollowedBy (oneOf ".")
                  ds <- many1 digit
                  return $ (LInteger . read) ds



parseFloat :: Parser LispVal
parseFloat = liftM (LFloat . read) $ do d1 <- (many digit)
                                        c <- (char '.')
                                        d2 <- (many digit)
                                        return $ d1++[c]++d2

The Evaluation

This was my favorite part of the entire endeavour. It was also the point where I stopped following the tutorial.

The first task was thinking of the signature for eval. If I didn’t have any environment to worry about, eval would simply be LispVal -> LispVal.

type EnvVal = (Environment, LispVal)

However, because I wanted to worry about Environments from the getgo, I decided that eval should be Environment -> LispVal -> LispVal. That is if no functions could modify the existing environment. Which doesn’t really work. I needed to return eval to return a new environment. Thus I settled on Environment -> LispVal -> (Environment, LispVal)

First order of business was setting up the primitives data types that eval to themselves

type EnvVal = (Environment, LispVal)

eval :: Environment -> LispVal -> EnvVal
eval e v@(LString _) = (e,v)
eval e v@(LInteger _) = (e,v)
eval e v@(LBool _) = (e,v)
eval e v@(LFloat _) = (e,v)
eval e v@(LError _) = (e,v)
eval e (LList [LSymbol "quote", v]) = (e,v)
eval e v@(LList []) = (e,v)

And adding symbol insertion and lookup

eval env (LSymbol name) = (env, lookupSymbol env name [])

eval env (LList [LSymbol "def", LSymbol name, v]) = 
    ((M.insert name (snd (eval env v)) env), LSymbol name)

Function Application

However, at this point I decided I needed to define some primitive functions which were always available in the Environment. To do so, I would first need to extend LispVal to include a type for Primive Functions. I also added a constructor for errors.

data LispVal = LSymbol String
             | LList [LispVal]
             | LInteger Integer
             | LFloat Double
             | LString String
             | LBool Bool
             | LPrimitive ([LispVal]->LispVal)
             | LError String

I also needed a little bit of type inference as I wanted support for both floats and integers, and both had several common primitive functions. So behold, a very very poor man’s polymorphism:

eval env (LList (LSymbol func : args)) = 
   (env, apply env func $ map (snd . (eval env)) args)

apply :: Environment -> String -> [LispVal] -> LispVal
apply env func args =
  case M.lookup func $ M.union listPrimitives env of
    Just (LPrimitive f) -> (f args)
    Nothing -> inferTypeAndApply func args

inferTypeAndApply :: String -> [LispVal] -> LispVal
inferTypeAndApply func args
  | isIntList args = lookupSymbol intPrimitives func args
  | isFloatList args = lookupSymbol floatPrimitives func args
  | isBoolList args = lookupSymbol boolPrimitives func args
  | otherwise = LError "symbol not defined"

Btw, this approach is probably quite brittle. No autopromotion of numeric types either.

A poor man’s prelude

Next up, I had to define a set of primitive functions. This was probably the easiest part of the entire endeavour. This was also around the time I started munching some delicious Phad Thai.

intPrimitives :: M.Map String LispVal
intPrimitives = M.fromList [("+", LPrimitive $ intBinaryOp (+)),
                            ("-", LPrimitive $ intBinaryOp (-)),
                            ("*", LPrimitive $ intBinaryOp (*)),
                            ("/", LPrimitive $ intBinaryOp (quot)),
                            ("mod", LPrimitive $ intBinaryOp (rem)),
                            ("pow", LPrimitive $ intBinaryOp (^))]


intBinaryOp :: (Integer -> Integer -> Integer)->([LispVal]->LispVal)
intBinaryOp f = (\ args ->
                  case args of
                    (LInteger x):(LInteger y):[] -> LInteger (f x y)
                    _ -> LError "arity error")


floatPrimitives :: Environment
floatPrimitives = M.fromList [("+", LPrimitive $ floatBinaryOp (+)),
                              ("-", LPrimitive $ floatBinaryOp (-)),
                              ("*", LPrimitive $ floatBinaryOp (*)),
                              ("/", LPrimitive $ floatBinaryOp (/)),
                              ("pow", LPrimitive $ floatBinaryOp (**))]

floatBinaryOp :: (Double -> Double -> Double)->([LispVal]->LispVal)
floatBinaryOp f = (\ args ->
                    case args of
                      (LFloat x):(LFloat y):[] -> LFloat (f x y)
                      _ -> LError "arity error")


boolPrimitives :: Environment
boolPrimitives = M.fromList [("and", LPrimitive $ boolBinaryOp (B.&&)),
                             ("or", LPrimitive $ boolBinaryOp (B.||)),
                             --("xor", LPrimitive $ boolBinaryOp (B.xor)),
                             ("not", LPrimitive $ (\args ->
                                                    case args of
                                                      (LBool x):[] -> LBool $ B.not x
                                                      _ -> LError "arity error"))]


boolBinaryOp :: (Bool -> Bool -> Bool) -> ([LispVal]->LispVal)
boolBinaryOp f = (\args ->
                   case args of
                     (LBool x):(LBool y):[] -> LBool (f x y)
                     _ -> LError "arity error")

So hurray. A few hours down and I have a lispy calculator almost ready.

Getting impatient to see some action, I decided to implement the repl itself.

showVal :: LispVal -> String
showVal (LString c) = "\"" ++ c ++ "\""
showVal (LSymbol name) = name
showVal (LInteger i) = show i
showVal (LFloat f) = show f
showVal (LBool True) = "true"
showVal (LBool False) = "false"
showVal (LList c) = "(" ++ unwordsList c ++ ")"
showVal (LError e) = "Error : " ++ e

instance Show LispVal where show = showVal

readAndEval :: Environment -> String -> EnvVal
readAndEval env input = eval env $ readExpr input

printEval :: EnvVal -> IO ()
printEval p = putStrLn (show (snd p))

flushStr :: String -> IO ()
flushStr str = putStr str >> hFlush stdout

readPrompt :: String -> IO String
readPrompt prompt = flushStr prompt >> getLine

repl :: Environment -> IO ()
repl env = (readPrompt "Hisp>>> ") >>=
           (\ inp ->
             if inp == "quit" then
               return ()
             else
               let pair = readAndEval env inp
               in
                printEval pair >>=
                (\_ -> repl (fst pair)))
                
main = repl M.empty

I had to revisit Monad-istan while writing this. I decided to avoid the do notation altogether when writing the repl function, leading to ugly but understable (atleast for me) code.

Hisp>>> (+ 2 3)
5
Hisp>>> (+ 2.0 5.5)
7.5
Hisp>>> (and (or true false) true)
true
Hisp>>> (/ 2 3)
0
Hisp>>> (/ 2.0 3.0)
0.6666666666666666
Hisp>>> (pow 2 4)
16
Hisp>>> (pow 4.0 0.5)
2.0

I also decided to add some primitives for list operations. This was a lisp after all. For lists I went with the classical approach for representing them as cons cells.

toLList :: [LispVal] -> LispVal
toLList = foldr (\ l acc -> LList [LSymbol "cons", l, acc]) (LList [])

cons :: [LispVal] -> LispVal
cons args =
  case args of
    x:y:[] ->   LList [LSymbol "cons",x,y]
    _ -> LError "arity error"

first :: [LispVal] -> LispVal
first ((LList [LSymbol "cons",f,r]):[]) = f
first  _ = LError "illegal arguments"

rest :: [LispVal] -> LispVal
rest ((LList [LSymbol "cons",f,r]):[]) = r
rest  _ = LError "illegal arguments"

listPrimitives :: Environment
listPrimitives = M.fromList [("list", LPrimitive (toLList)),
                             ("cons", LPrimitive (cons)),
                             ("first", LPrimitive (first)),
                             ("rest", LPrimitive (rest))
                            ]


Hisp>>> (list 1 2 3)
(cons 1 (cons 2 (cons 3 ())))
Hisp>>> (def l (list 5 6 7 8 true false))
l
Hisp>>> l
(cons 5 (cons 6 (cons 7 (cons 8 (cons true (cons false ()))))))
Hisp>>> (first l)
5
Hisp>>> (rest l)
(cons 6 (cons 7 (cons 8 (cons true (cons false ())))))

So far so good!

Of Closures

I decided I needed a new Data Constructor for Lambdas, as lambdas needed to keep a copy of the environment where they are declared, as well as their bindings and body. To enable recursion, lambda’s should keep their names if defined.

Therefore I changed LispVal by adding LLambda:

data LispVal = LSymbol String
             | LList [LispVal]
             | LInteger Integer
             | LFloat Double
             | LString String
             | LBool Bool
             | LPrimitive ([LispVal]->LispVal)
             | LLambda {name :: String,
                        env :: Environment,
                        bindings :: [LispVal],
                        body :: LispVal}
             | LError String

I also needed to modify eval to give me Lambdas when seeing form like “(fn …)” or “(def .. (fn..”

-- for named functions
 
eval e (LList [LSymbol "def", LSymbol name, LList [LSymbol "fn", LList bindings, body@(LList _)]]) =
  let     newLambda = LLambda name e bindings body
          newEnv = M.insert name newLambda e
  in
  (newEnv, newLambda)

-- for anonymous functions

eval e (LList [LSymbol "fn", LList bindings, body@(LList _)]) = (e, LLambda "" e bindings body)

For applying the Lambdas I implemented a helper function applyLambda. Lambdas could either be invoked via their symbols, or directly such as ((fn (bindings) body) args)

-- direct application


eval e (LList ((LList [LSymbol "fn", LList bindings, body]):args)) =
     let lambda = LLambda "" e bindings body
     in
        applyLambda lambda args

-- added another case to apply for symbol application

apply :: Environment -> String -> [LispVal] -> LispVal
apply env func args =
  case M.lookup func $ M.union listPrimitives env of
    Just (LPrimitive f) -> (f args)
    Just l@(LLambda _ _ _ _) -> snd $ applyLambda l args
    Nothing -> inferTypeAndApply func args



applyLambda :: LispVal -> [LispVal] -> EnvVal
applyLambda l@(LLambda name lenv bindings body) args
  | length args == length bindings =
           let closure = M.union (M.fromList [(name, l)]) (M.union lenv (M.fromList (zip (map show bindings) args)))
           in
             eval closure body
  | otherwise = (lenv, LError "arity error")

There’s a hack here. My original ambitious plan was to allow clojure style destructuring. That’s why the bindings are of type [LispVal]. However around dawn as both sleep and my flight approached, I just hacked around by mapping show on the bindings.

Another thing to note is that I added the name of the lambda to the environment when applying it. This is to support recursion.

So, finally trying it out

Hisp>>> (def sum (fn (a b) (+ a b)))
<lambda : sum-> body:(+ a b)>
Hisp>>> (sum 1 2)
3

Hisp>>> ((fn (a) (+ a 1)) 1)
2

By this time, my flight was almost ready.

Conditionals and Equality

After landing in SG, I realised in order to make factorial (aka hello world) I needed conditionals. and equality. So first I implemented a very simple if:

eval e (LList [LSymbol "if", cond, e1, e2]) =
     let evalCond = (eval e cond)
     in
     case snd evalCond of
         LBool True -> eval (fst evalCond) e1
         LBool False -> eval (fst evalCond) e2
         _ -> (e, LError "type error")

For equality I simply decided to add an Eq instance to the LispVal type.

instance Eq LispVal where
         (LPrimitive _) == _ = False
         (LLambda _ _ _ _) == _ = False
         (LError _) == _ = False

         (LInteger i1) == (LInteger i2) = (i1 == i2)
         (LInteger i1) == _ = False

         (LFloat f1) == (LFloat f2) = (f1 == f2)
         (LFloat f1) == _ = False

         (LString s1) == (LString s2) = (s1 == s2)
         (LString s1) == _ = False

         (LBool b1) == (LBool b2) = (b1 == b2)
         (LBool b1) == _ = False


-- adding eval rule for =

eval e (LList [LSymbol "=", e1, e2]) =
     let evalE1 = snd $ eval e e1
         evalE2 = snd $ eval e e2
     in
        (e,LBool (evalE1 == evalE2))

Finally testing it..

Hisp>>> (def factorial 
             (fn (a) 
                 (if (= a 0) 
                     1 
                     (* a (factorial (- a 1))))))
<lambda : factorial-> body:(if (= a 0) 1 (* a (factorial (- a 1))))>
Hisp>>> (factorial 5)
120

Eureka!

Afterthoughts

Well foremostly, this project helped me gain a very healthy appreciation for Haskell. Furthermore it helped de-mystefy some of the magic behind compilers and how programming languages are implemented.

Hisp is very very far from a complete compiler. There are several probelms with the way I designed the implementation:

  • Doing IO currently isn’t really possible. Probably using IORef to represent my Environment instead of a pure Data.Map would help here.
  • Several core functions like comparison (>,<, etc) aren’t available
  • Being a noob, the code is repitive is several areas and could probably be immensely refactored.
  • TESTS, TESTS, TESTS