Parsing¶
Idris 2 comes with a lexing and a parsing library built into the contrib package.
For this cookbook, we will write a very simple lexer and parser for a lambda calculus that will accept the following language:
let name = world in (\x.hello x) name
Once we write a lambda calculus parser, we will also see how we can take advantage of a powerful built-in expression parser in Idris 2 to write a small calculator that is smart enough to parse the following expression:
1 + 2 - 3 * 4 / 5
Lexer¶
The main lexer module is Text.Lexer. This module contains toTokenMap, a function of type List (Lexer, k) -> TokenMap (Token k), where k is a token kind. It can be used for simple lexer-to-token mappings. The module also includes higher-level lexers for specifying quantities and for common programming primitives such as alphas, intLit, lineComment and blockComment.
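For instance, here is a minimal sketch of a toTokenMap call, assuming a hypothetical kind type MyKind, that maps each lexer directly to a token kind:

data MyKind = MKWord | MKNumber

myTokenMap : TokenMap (Token MyKind)
myTokenMap = toTokenMap [(alphas, MKWord), (intLit, MKNumber)]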
The Text.Lexer module also reexports Text.Lexer.Core, Text.Quantity and Text.Token.
Text.Lexer.Core provides the building blocks of the lexer, including a type called Recognise, which is the underlying data type of the lexer. The other important function this module provides is lex, which takes a token map and an input string and returns the recognised tokens.
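As a minimal sketch, reusing the hypothetical myTokenMap from above: lex returns the recognised tokens along with the final line, column and any remaining input, so an empty remainder means the whole string was lexed.

lexMyLang : String -> Maybe (List (WithBounds (Token MyKind)))
lexMyLang str =
  case lex myTokenMap str of
    (tokens, _, _, "") => Just tokens
    _ => Nothing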
Text.Quantity provides a data type Quantity which can be used with certain lexers to specify how many times something is expected to appear.
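For example, assuming the count combinator from Text.Lexer and exactly from Text.Quantity, a sketch of a lexer that accepts exactly four digits could look like:

fourDigits : Lexer
fourDigits = count (exactly 4) digit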
Text.Token provides a data type Token that represents a parsed token, its kind and its text. This module also provides an important interface called TokenKind which tells the lexer how to map token kinds to Idris 2 types and how to convert each kind from a string to a value.
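Continuing the hypothetical MyKind example, a TokenKind instance gives each kind a value type via TokType and a way to build that value from the matched text via tokValue:

TokenKind MyKind where
  TokType MKWord = String
  TokType MKNumber = Integer

  tokValue MKWord s = s
  tokValue MKNumber s = cast s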
Parser¶
The main parser module is Text.Parser. This module contains a variety of grammar combinators, the main one being match, which takes a token kind and returns its value as defined by the TokenKind interface. There are other combinators as well, but for our example we will only be using match.
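As a small sketch, using the LambdaToken kinds defined later in this recipe: matching an identifier token yields its String value, because that is the type the TokenKind instance assigns to LTIdentifier.

name : Grammar state LambdaToken True String
name = match LTIdentifier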
The Text.Parser module reexports Text.Parser.Core, Text.Quantity and Text.Token.
Text.Parser.Core provides the building blocks of the parser, including a type called Grammar, which is the underlying data type of the parser. The other important function this module provides is parse, which takes a Grammar and a list of tokens and returns either the parsing errors or the parsed value together with any unconsumed tokens.
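A minimal sketch of running the hypothetical name grammar from above: parse either fails with a list of errors or succeeds with the parsed value plus any tokens it did not consume.

runName : List (WithBounds LambdaToken) -> Either String String
runName toks =
  case parse name toks of
    Right (n, _) => Right n
    Left errs => Left (show errs)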
We covered Text.Quantity and Text.Token in the Lexer section, so we will not repeat what they do here.
Lambda Calculus Lexer & Parser¶
import Data.List
import Data.List1
import Text.Lexer
import Text.Parser

%default total

-- AST for the untyped lambda calculus, extended with let bindings
data Expr = App Expr Expr | Abs String Expr | Var String | Let String Expr Expr

Show Expr where
  showPrec d (App e1 e2) = showParens (d == App) (showPrec (User 0) e1 ++ " " ++ showPrec App e2)
  showPrec d (Abs v e) = showParens (d > Open) ("\\" ++ v ++ "." ++ show e)
  showPrec d (Var v) = v
  showPrec d (Let v e1 e2) = showParens (d > Open) ("let " ++ v ++ " = " ++ show e1 ++ " in " ++ show e2)

-- Kinds of tokens produced by the lexer
data LambdaTokenKind
  = LTLambda
  | LTIdentifier
  | LTDot
  | LTOParen
  | LTCParen
  | LTIgnore
  | LTLet
  | LTEqual
  | LTIn

Eq LambdaTokenKind where
  (==) LTLambda LTLambda = True
  (==) LTDot LTDot = True
  (==) LTIdentifier LTIdentifier = True
  (==) LTOParen LTOParen = True
  (==) LTCParen LTCParen = True
  (==) LTLet LTLet = True
  (==) LTEqual LTEqual = True
  (==) LTIn LTIn = True
  (==) _ _ = False

Show LambdaTokenKind where
  show LTLambda = "LTLambda"
  show LTDot = "LTDot"
  show LTIdentifier = "LTIdentifier"
  show LTOParen = "LTOParen"
  show LTCParen = "LTCParen"
  show LTIgnore = "LTIgnore"
  show LTLet = "LTLet"
  show LTEqual = "LTEqual"
  show LTIn = "LTIn"

LambdaToken : Type
LambdaToken = Token LambdaTokenKind

Show LambdaToken where
  show (Tok kind text) = "Tok kind: " ++ show kind ++ " text: " ++ text

-- Only identifiers carry a value (their name); every other kind maps to ()
TokenKind LambdaTokenKind where
  TokType LTIdentifier = String
  TokType _ = ()

  tokValue LTLambda _ = ()
  tokValue LTIdentifier s = s
  tokValue LTDot _ = ()
  tokValue LTOParen _ = ()
  tokValue LTCParen _ = ()
  tokValue LTIgnore _ = ()
  tokValue LTLet _ = ()
  tokValue LTEqual _ = ()
  tokValue LTIn _ = ()

-- Whitespace tokens are filtered out before parsing
ignored : WithBounds LambdaToken -> Bool
ignored (MkBounded (Tok LTIgnore _) _ _) = True
ignored _ = False

identifier : Lexer
identifier = alpha <+> many alphaNum

-- Keywords are lexed as identifiers first and then reclassified
keywords : List (String, LambdaTokenKind)
keywords = [
  ("let", LTLet),
  ("in", LTIn)
]

lambdaTokenMap : TokenMap LambdaToken
lambdaTokenMap = toTokenMap [(spaces, LTIgnore)] ++
  [(identifier, \s =>
      case lookup s keywords of
        (Just kind) => Tok kind s
        Nothing => Tok LTIdentifier s
   )
  ] ++ toTokenMap [
    (exact "\\", LTLambda),
    (exact ".", LTDot),
    (exact "(", LTOParen),
    (exact ")", LTCParen),
    (exact "=", LTEqual)
  ]

-- Lexing succeeds only if the whole input is consumed
lexLambda : String -> Maybe (List (WithBounds LambdaToken))
lexLambda str =
  case lex lambdaTokenMap str of
    (tokens, _, _, "") => Just tokens
    _ => Nothing

mutual
  expr : Grammar state LambdaToken True Expr
  expr = do
    t <- term
    app t <|> pure t

  term : Grammar state LambdaToken True Expr
  term = abs
     <|> var
     <|> paren
     <|> letE

  -- Left-associative application: e1 e2 e3 parses as App (App e1 e2) e3
  app : Expr -> Grammar state LambdaToken True Expr
  app e1 = do
    e2 <- term
    app1 $ App e1 e2

  app1 : Expr -> Grammar state LambdaToken False Expr
  app1 e = app e <|> pure e

  abs : Grammar state LambdaToken True Expr
  abs = do
    match LTLambda
    commit
    argument <- match LTIdentifier
    match LTDot
    e <- expr
    pure $ Abs argument e

  var : Grammar state LambdaToken True Expr
  var = map Var $ match LTIdentifier

  paren : Grammar state LambdaToken True Expr
  paren = do
    match LTOParen
    e <- expr
    match LTCParen
    pure e

  letE : Grammar state LambdaToken True Expr
  letE = do
    match LTLet
    commit
    argument <- match LTIdentifier
    match LTEqual
    e1 <- expr
    match LTIn
    e2 <- expr
    pure $ Let argument e1 e2

parseLambda : List (WithBounds LambdaToken) -> Either String Expr
parseLambda toks =
  case parse expr $ filter (not . ignored) toks of
    Right (l, []) => Right l
    Right e => Left "contains tokens that were not consumed"
    Left e => Left (show e)

parse : String -> Either String Expr
parse x =
  case lexLambda x of
    Just toks => parseLambda toks
    Nothing => Left "Failed to lex."
Testing out our parser gives us back the following output:
$ idris2 -p contrib LambdaCalculus.idr
Main> :exec printLn $ parse "let name = world in (\\x.hello x) name"
Right (let name = world in (\x.hello x) name)
Expression Parser¶
Idris 2 also comes with a very convenient expression parser in Text.Parser.Expression that is aware of precedence and associativity. The main function, buildExpressionParser, takes an OperatorTable and a Grammar for the terms, and returns a grammar for whole expressions. The magic comes from the OperatorTable, since this table defines all the operators, the grammars for those operators, their precedence, and their associativity.
An OperatorTable is a list of lists of the Op type. The Op type allows you to specify Prefix, Postfix, and Infix operators along with their grammars. Infix also takes the associativity, Assoc, which can be left-associative (AssocLeft), right-associative (AssocRight) or non-associative (AssocNone).
An example of an operator table we’ll be using for the calculator is:
[
  [ Infix (match CTMultiply >> pure (*)) AssocLeft
  , Infix (match CTDivide >> pure (/)) AssocLeft
  ],
  [ Infix (match CTPlus >> pure (+)) AssocLeft
  , Infix (match CTMinus >> pure (-)) AssocLeft
  ]
]
This table defines four operators: multiplication, division, addition and subtraction. Multiplication and division appear in the first row because they have higher precedence than addition and subtraction, which appear in the second row. All four are defined as infix operators, each with its own grammar, and all are left-associative via AssocLeft.
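The table above only uses Infix. As a hedged sketch, unary negation could be added as a Prefix operator in its own, highest-precedence row of the same table:

[
  [ Prefix (match CTMinus >> pure negate) ],
  [ Infix (match CTMultiply >> pure (*)) AssocLeft
  , Infix (match CTDivide >> pure (/)) AssocLeft
  ],
  [ Infix (match CTPlus >> pure (+)) AssocLeft
  , Infix (match CTMinus >> pure (-)) AssocLeft
  ]
]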
Building a Calculator¶
import Data.List1
import Text.Lexer
import Text.Parser
import Text.Parser.Expression

%default total

-- Kinds of tokens produced by the lexer
data CalculatorTokenKind
  = CTNum
  | CTPlus
  | CTMinus
  | CTMultiply
  | CTDivide
  | CTOParen
  | CTCParen
  | CTIgnore

Eq CalculatorTokenKind where
  (==) CTNum CTNum = True
  (==) CTPlus CTPlus = True
  (==) CTMinus CTMinus = True
  (==) CTMultiply CTMultiply = True
  (==) CTDivide CTDivide = True
  (==) CTOParen CTOParen = True
  (==) CTCParen CTCParen = True
  (==) _ _ = False

Show CalculatorTokenKind where
  show CTNum = "CTNum"
  show CTPlus = "CTPlus"
  show CTMinus = "CTMinus"
  show CTMultiply = "CTMultiply"
  show CTDivide = "CTDivide"
  show CTOParen = "CTOParen"
  show CTCParen = "CTCParen"
  show CTIgnore = "CTIgnore"

CalculatorToken : Type
CalculatorToken = Token CalculatorTokenKind

Show CalculatorToken where
  show (Tok kind text) = "Tok kind: " ++ show kind ++ " text: " ++ text

-- Number tokens carry their value as a Double; everything else maps to ()
TokenKind CalculatorTokenKind where
  TokType CTNum = Double
  TokType _ = ()

  tokValue CTNum s = cast s
  tokValue CTPlus _ = ()
  tokValue CTMinus _ = ()
  tokValue CTMultiply _ = ()
  tokValue CTDivide _ = ()
  tokValue CTOParen _ = ()
  tokValue CTCParen _ = ()
  tokValue CTIgnore _ = ()

-- Whitespace tokens are filtered out before parsing
ignored : WithBounds CalculatorToken -> Bool
ignored (MkBounded (Tok CTIgnore _) _ _) = True
ignored _ = False

number : Lexer
number = digits

calculatorTokenMap : TokenMap CalculatorToken
calculatorTokenMap = toTokenMap [
  (spaces, CTIgnore),
  (digits, CTNum),
  (exact "+", CTPlus),
  (exact "-", CTMinus),
  (exact "*", CTMultiply),
  (exact "/", CTDivide)
]

-- Lexing succeeds only if the whole input is consumed
lexCalculator : String -> Maybe (List (WithBounds CalculatorToken))
lexCalculator str =
  case lex calculatorTokenMap str of
    (tokens, _, _, "") => Just tokens
    _ => Nothing

mutual
  term : Grammar state CalculatorToken True Double
  term = do
    num <- match CTNum
    pure num

  -- The operator table: each inner list is one precedence level,
  -- from the highest (* and /) to the lowest (+ and -)
  expr : Grammar state CalculatorToken True Double
  expr = buildExpressionParser [
    [ Infix ((*) <$ match CTMultiply) AssocLeft
    , Infix ((/) <$ match CTDivide) AssocLeft
    ],
    [ Infix ((+) <$ match CTPlus) AssocLeft
    , Infix ((-) <$ match CTMinus) AssocLeft
    ]
  ] term

parseCalculator : List (WithBounds CalculatorToken) -> Either String Double
parseCalculator toks =
  case parse expr $ filter (not . ignored) toks of
    Right (l, []) => Right l
    Right e => Left "contains tokens that were not consumed"
    Left e => Left (show e)

parse1 : String -> Either String Double
parse1 x =
  case lexCalculator x of
    Just toks => parseCalculator toks
    Nothing => Left "Failed to lex."
Testing out our calculator gives us back the following output:
$ idris2 -p contrib Calculator.idr
Main> :exec printLn $ parse1 "1 + 2 - 3 * 4 / 5"
Right 0.6000000000000001