 ### Omer Iqbal

geek . lisper . full-stack dev . emacs padawan . coffee addict

# Monadic Parsers: Implementing a micro Parsec

So I read somewhere that one could truly understand the beauty of monads by implementing monadic parsers. Given holidays and a lazy weekend, I decided to put that idea to the test.

A large section of the remainder of this post is derived and inspired from this amazing paper by Graham Hutton and Eric Meijer. I recommend giving it a read. It gives this matter far better treatment than a rookie’s blog post.

This post does not assume any background knowledge on Monads. Knowing about parsers might be helpful. Some basic Haskell knowledge would help too.

Code for this post can be found here.

## Enter a Parser

Lets start off by defining what is a parser.

``````data Parser a = Parser (String -> [(a, String)])
``````

So a Parser is a function that takes a String and returns a list of pairs of results and remaining Strings. It consumes a String, and produces a result of type a, along with the rest of the String that was left unparsed. To those not used to Type Constructors, just view a as a generic type in Java, or C++.

Therefore an empty list would denote the failure of a parser, where as a non-empty list would imply success.

Returning a list allows this definition to handle ambigious grammars that return multiple results. This would imply a string can be parsed in multiple ways.

If we only wanted to worry about getting a single result, we could have defined the parser with an option type.

``````data Parser a = Parser (String -> Maybe (a, String))
``````

In this scenario None would imply failure, while getting Just (a,String) would imply success with a single result pair.

However, we are going to stick with the first definition, as grammars can be ambigious.

## Our first Parser!

So lets now implement the simplest Parser possible:

``````item :: Parser Char
item = Parser (\s -> case s of
"" -> []
(c:cs) -> [(c,cs)])
``````

item is a parser that consumes the first character in a string, and returns that character as a result. It fails by returning an empty List, if the String is empty.

## Enter bind

So far you’re probably wondering that the hell Monads have to do with anything yet. We’re going to approach Monads in a bottom up way.

Firstly lets define a function to apply a Parser

``````parse (Parser p) = p
``````

I.e. extract the function from the Parser, and apply it.

So now with the interesting bit.

Lets say we want the ability to “bind” two parsers together. That is, chain the parsers such that result of one parser is fed into a function that takes that result and produces a new parser.

Another way to think about this is that you are composing two parsers together. This will be more obvious later on.

So lets implement bind

``````bind :: Parser a -> (a -> Parser b) -> Parser b
bind p f = Parser (\s -> concat \$
map (\ (a, s') -> parse (f a) s')
\$ parse p s)
``````

So if you read it backwards, first we apply the Parser p on the String s, which would return a list of tuples. Next we map over this list, and apply the function f to the result a. And then we parse the remaning string s’ with the new Parser we got from (f a). In the end we simply concat to flatten the list.

Perhaps this bit of refactoring would make this more readable:

``````bind :: Parser a -> (a -> Parser b) -> Parser b
bind p f = Parser (\s -> parse p s |>
map (\ (a, s') -> parse (f a) s') |>
concat)
``````

Sidenote. I personally prefer the OCaml/F# style of using the pipe operator `|>` for left to right application, which for me is more readable.

``````(|>) :: b -> (b -> c) -> c
(|>) = flip (\$)
``````

## Enter unit

The next building block we are going to work on is the function `unit`. `unit` takes any value and produces a Parser from it which consumes nothing and returns that value.

``````unit :: a -> Parser a
unit a = Parser (\s -> [(a,s)])
``````

This may not seem very useful, but bear with me.

## Why bind and unit?

So why did we implemenent `bind` and `unit`. Well they act as a building blocks for making parser combinators. Parser combinators, to those who haven’t heard of them before, are compositions of parsers.

A good example here would be a combinator called `satisfies`. This takes a predicate and returns a Parser that consumes a character if that predicate is satisfied, or fails otherwise.

Using `bind` and `unit`, we can implement `satisfies`:

``````satisfies :: (Char -> Bool) -> Parser Char
satisfies p = item `bind` \c ->
if p c then unit c else (Parser (\cs -> []))
``````

`item` from before is a parser that consumes the first character and returns it. This character is passed to a lambda that takes applies the predicat `p` on that character and returns Parser constructed from ‘unit’ denoting success, or a Parser returning an empty list, denoting failure.

Monads are defined as the type class :

``````class Monad m where
return :: a -> m a
(>>=) :: m a -> (a -> m b) -> m b
``````

If you look hard at the type signatures of `return` and `>>=` they are exactly the same as our `unit` and `bind` respectively, if you replace the m type variable with `Parser`.

As it happens, this pattern is quite common in many other areas, which is why the Monads are so ubiquitous in haskell.

So we could refactor our implementations of `bind` and `unit` by making them instances of the Monad typeclass:

``````instance Monad Parser where
return a = Parser (\s -> [(a,s)])
p >>= f = Parser (\s -> parse p s |>
map (\ (a, s') -> parse (f a) s') |>
concat)
``````

Et Voila! We have implemented a Parser Monad.

For a better treatment of Monads, I strongly recommend this tutorial

## do Notation

Typically parsers build using `>>=` would look like this:

``````p1 >>= \a1 ->
p2 >>= \a2 ->
...
pn >>= \an ->
f a1 a2 ... an
``````

Such a parser reads: Apply p1, and call its result a1. Then apply p2 and call it’s result a2. And so on. In the end perform some semantic action with f using the parser results. This could mean returning an AST node, or whatever you’re trying to parse.

``````do a1 <- p1
a2 <- p2
...
an <- pn
f a1 a2 ... an
``````

Essentially both representations mean the same thing.

The do notation can be written in a single line:

``````do {a1 <- p1; a2 <- p2; ...; an <- pn; f a1 a2 ... an}
``````

Just to clarify, I’ll implement a parser that consumes three characters, throws the second character away, and returns a tuple of the other two.

``````p :: Parser (Char,Char)
p = do {c <- item; item; d <- item; return (c,d)}
``````

The thing to note here is how I don’t have a `<-` subexpression for the second `item`. That’s because I don’t care about it’s result.

## More combinators

Now it’s time to implement some common combinators found in most Parser Combinator libraries.

``````class Monad m => MonadPlus m where
mzero :: m a
mplus :: m a -> m a -> m a

-- mplus for the Parser is like an choice operator.
mzero = Parser (\cs -> [])
mplus p q = Parser (\s -> (parse p) s ++ (parse q) s)

option :: Parser a -> Parser a -> Parser a
option  p q = Parser (\s -> case parse (mplus p q) s of
[] -> []
(x:xs) -> [x])
``````

The MonadPlus type class is another common pattern, which applies to our Parser implementation. The `mzero` method basically denotes failure, while `mplus` which in it’s implementation concatenates the results from two Parsers, acts as a sort of choice operator. This is because the failure is `[]`, so if the first parser fails, `[]` is simply concatenated with the result of the second parser.

Because in most cases we only care about the first result, we have another combinator `option` which just applies `mplus` and returns the first result on success, or the `[]` on failure.

We also introduce two new combinators, `char` and `string`. `char` takes a character and consumes and returns it if present, otherwise fails . `string` takes a string and consumes that string if found in the input.

``````char :: Char -> Parser Char
char c = satisfies (c ==)

string :: String -> Parser String
string "" = return ""
string (c:cs) = do { char c; string cs; return (c:cs)}
``````

As the implementation of `string` shows, parsers can be recursive.

``````many :: Parser a -> Parser [a]
many p = many1 p `option` return []

many1 :: Parser a -> Parser [a]
many1 p = do { a <- p; as <- many p; return (a:as)}

sepBy :: Parser a -> Parser b -> Parser [a]
p `sepBy` sep = (p `sepBy1` sep) `option` return []

sepBy1 :: Parser a -> Parser b -> Parser [a]
p `sepBy1` sep = do a <- p
as <- many (do {sep; p})
return (a:as)
``````

`many` parses repeated (0 or more) applications of Parser `p`. `many1` parsers 1 or more repeated applications of `p`.

`sepBy` parses repeated applications (0 or more) of `p` separated by applications of Parser `sep` whose values are thrown away. `sepBy1` imples 1 or more application as before.

## Lexical Parsing

Parsing is usually preceded by a lexical phase which tokenises the input. With parser combinators we can define parses to perform tokenising for us.

``````space :: Parser String
space = many (satisfies isSpace)
where isSpace ' ' = True
isSpace '\n' = True
isSpace '\t' = True
isSpace _ = False

token :: Parser a -> Parser a
token p = do { a <- p; space ; return a}

symb :: String -> Parser String
symb s = token (string s)

digit :: Parser Char
digit = satisfies C.isDigit

number :: Parser Int
number = do cs <- many1 digit
``````

The `space` combinator consumes any whitespace. The `token` combinator returns a token ignoring any trailing space. The `digit` combinator parses a single digit, while the `number` combinator parses integers.

## Example Grammars!

Armed with combinators, we can now implement some grammars, putting this little library to action.

The first example is of a simple math expression language with infix operators.

``````expr :: Parser Int
addop :: Parser (Int -> Int -> Int)
mulop :: Parser (Int -> Int -> Int)

term = factor `chainl1` mulop
factor = number `option` do { symb "("; n <- expr; symb ")"; return n}

addop = do {symb "+"; return (+)} `option` do {symb "-"; return (-)}
mulop = do {symb "*"; return (*)} `option` do {symb "/"; return (div)}

run :: String -> Int
run s = case parse expr s of
[(num, _)] -> num
``````

Trying this out:

``````λ> run "1"
1
λ> run "3+4"
7
λ> run "3+5*3"
18
λ> run "10/5+15"
17
``````

It works!

The only new combinator here is the `chainl1` combinator which parses repated applications of a parser `p` separated by applications of a parser `op` whose result value is used to combine the result from the `p` parsers.

``````chainl :: Parser a -> Parser (a -> a -> a) -> a -> Parser a
chainl p op a = (p `chainl1` op) `option` return a

chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a
p `chainl1` op = do {a <- p; rest a}
where rest a = (do f <- op
b <- p
rest (f a b))
`option` return a
``````

## Parsing JSON

Here is a simple JSON parser, I implemented in a few minutes using this little library.

``````module Json where
import Parser

data JSONVal = JBool Bool |
JString String |
JNumber Int |
JNull |
JArray [JSONVal] |
JPair (String, JSONVal) |
JObject [(String, JSONVal)]
deriving (Show)

parseBool :: Parser JSONVal
parseBool = do { symb "true"; return \$ JBool True} `option`
do { symb "false"; return \$ JBool False }

parseString :: Parser JSONVal
parseString = do { s <- quotedString; return \$ JString s}

parseNumber :: Parser JSONVal
parseNumber = do { n <- number; return \$ JNumber n}

parseNull :: Parser JSONVal
parseNull = do { symb "null"; return JNull}

parseArray :: Parser JSONVal
parseArray = do { symb "["; l <- parseJson `sepBy` (symb ","); symb "]"; return \$ JArray l}

parsePair :: Parser JSONVal
parsePair = do { k <- quotedString; symb ":"; v <- parseJson; return \$ JPair (k,v)}

parseObject :: Parser JSONVal
parseObject = do { symb "{"; ps <- many parsePair ; symb "}";
return \$ JObject \$ map (\ (JPair p) -> p) ps}

parseJson :: Parser JSONVal
parseJson = parseBool `option` parseString `option` parseNumber `option`
parseNull `option` parseArray `option` parseObject
``````

## Conclusion

After this, one should be able to recognize the beauty of Monads. Personally I was completely mind blown after realising that implementing Parser Combinators is this easy.

Having said that, I wouldn’t recommend using this little library for anything serious. Parsec exists, and does this and far more, including proper error handling which I don’t care about for now.