37 Predictive Parsing

Uploaded by

CHIRAG BARIYA
Predictive Parsing

How to Construct Recursive-Descent Parsers

Tuesday, December 7, 2010

CS235 Languages and Automata

Department of Computer Science


Wellesley College

Goals of This Lecture


o Introduce predictive parsers, efficient parsers for certain
grammars in which reading the first token (or first few tokens)
of input is sufficient for determining which production to apply.

o Show how predictive parsers can be implemented by a
recursive descent parser in your favorite programming language.

o Show how to construct a predictive parsing table, which determines
whether a grammar is amenable to predictive parsing.

o Study some techniques for transforming nonpredictive grammars
into predictive ones, including removing ambiguity and
left-recursion removal.

o Learn how to use SML’s sum-of-product datatypes to represent
tokens and parse trees.

o Learn the distinction between concrete and abstract syntax.


Predictive Parsing 37-2

Main Example: Intexp
As our main example, we’ll use a simple integer expression
language that we’ll call Intexp.

The abstract syntax, or logical structure, of Intexp is described
by these SML datatypes:

datatype pgm = Pgm of exp (* a program is an expression *)
and exp = Int of int (* an expression is either an integer *)
        | BinApp of exp * binop * exp (* or a binary operator application *)
and binop = Add | Sub | Mul | Div (* there are four binary operators *)

We’ll explore several versions of Intexp’s concrete syntax, i.e.,
how programs, expressions, and binary operators are written down.
We’ll also consider several extensions to Intexp.
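
To make the difference between abstract and concrete syntax tangible, here is a small sketch that builds an AST value by hand and writes it back out in a fully parenthesized concrete syntax. The names sample, binopToString, expToString, and pgmToString are illustrative only and are not part of the lecture code.

```sml
datatype pgm = Pgm of exp
and exp = Int of int
        | BinApp of exp * binop * exp
and binop = Add | Sub | Mul | Div

(* The AST for the program ((3+4)*(42-17)), built by hand. *)
val sample =
  Pgm (BinApp (BinApp (Int 3, Add, Int 4), Mul, BinApp (Int 42, Sub, Int 17)))

fun binopToString Add = "+"
  | binopToString Sub = "-"
  | binopToString Mul = "*"
  | binopToString Div = "/"

(* Write an expression in one possible concrete syntax:
   every operator application is wrapped in parentheses. *)
fun expToString (Int i) = Int.toString i
  | expToString (BinApp (e1, b, e2)) =
      "(" ^ expToString e1 ^ binopToString b ^ expToString e2 ^ ")"

fun pgmToString (Pgm e) = expToString e
```

The same AST could be printed in prefix, postfix, or infix form; only the printing function would change, which is the sense in which the datatypes capture logical structure rather than notation.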


A Token Data Type for Intexp


Token data type definition:

datatype binop = Add | Mul | Sub | Div

datatype token = EOF (* special “end of input” marker *)
               | INT of int (* INT(3) is a token, while Int(3) is an expression *)
               | OP of binop
               | LPAREN | RPAREN

Sample “program”
((3+4) * (42-17))

SML token list for sample program

- Scanner.stringToTokens "((3+4)*(42-17))";
val it = [LPAREN, LPAREN, INT 3, OP Add, INT 4, RPAREN, OP Mul, LPAREN,
INT 42, OP Sub, INT 17, RPAREN, RPAREN] : Token.token list
(* Note: EOF does *not* appear explicitly in the token list, but is implicitly at the end *)
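
The lecture treats the scanner as a black box. As a rough idea of what a function like Scanner.stringToTokens might do for this token type, here is a minimal sketch. It is an assumption for illustration, not the course's Scanner module, and the helper names scan and digits are made up.

```sml
datatype binop = Add | Sub | Mul | Div
datatype token = EOF | INT of int | OP of binop | LPAREN | RPAREN

(* Hypothetical scanner: turn a list of characters into a list of tokens. *)
fun scan [] = []
  | scan (#"(" :: cs) = LPAREN :: scan cs
  | scan (#")" :: cs) = RPAREN :: scan cs
  | scan (#"+" :: cs) = OP Add :: scan cs
  | scan (#"-" :: cs) = OP Sub :: scan cs
  | scan (#"*" :: cs) = OP Mul :: scan cs
  | scan (#"/" :: cs) = OP Div :: scan cs
  | scan (c :: cs) =
      if Char.isSpace c then scan cs
      else if Char.isDigit c then
        let (* Accumulate consecutive digits into one integer. *)
            fun digits (n, d :: ds) =
                  if Char.isDigit d
                  then digits (10 * n + (Char.ord d - Char.ord #"0"), ds)
                  else (n, d :: ds)
              | digits (n, []) = (n, [])
            val (n, rest) = digits (0, c :: cs)
        in INT n :: scan rest
        end
      else raise Fail ("Unexpected character: " ^ String.str c)

(* As in the lecture, EOF is left implicit at the end of the list. *)
fun stringToTokens s = scan (String.explode s)
```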
Our First Concrete Syntax for Intexp:
Explicitly Parenthesized Operations
Productions for Concrete Grammar:

  P → E EOF
  E → INT(int) | ( E B E )
  B → + | - | * | /

SML Data Types for Abstract Grammar:

  datatype pgm = Pgm of exp
  and exp = Int of int
          | BinApp of exp * binop * exp
  and binop = Add | Sub | Mul | Div


An Example Intexp Program


Chars: ((3+4) * (42 - 17))

  (the lexer, or scanner, turns characters into tokens)

Tokens: ( ( INT(3) + INT(4) ) * ( INT(42) - INT(17) ) )

  (the parser turns tokens into a parse tree)

Parse Tree (Concrete Syntax Tree):

  P
    E
      (
      E: ( E:INT(3)  B:+  E:INT(4) )
      B: *
      E: ( E:INT(42)  B:-  E:INT(17) )
      )
    EOF

Abstract Syntax Tree (AST):

- ParserParens.stringToPgm "((3+4)*(42-17))";
val it = Pgm (BinApp (BinApp (Int 3,Add,Int 4),
                      Mul,BinApp (Int 42,Sub,Int 17))) : AST.pgm

Predictive Parsing For Intexp
P → E EOF
E → INT(int) | ( E B E )
B → + | - | * | /

Observe that:
• expressions (E) must begin with INT(int) or (.
• programs (P) must begin with an expression (E) and so must
begin with INT(int) or (.


Predictive Parsing Table for Intexp


We can summarize the observations on the previous slide with a
predictive parsing table of variables x tokens in which
at most one production is valid per entry.
Empty slots in the table indicate parsing errors.

     INT(i)         (               OP(b)       )    EOF
P    P → E EOF      P → E EOF
E    E → INT(num)   E → ( E B E )
B                                   B → OP(b)

Recursive Descent Parsing
From a predictive parsing table, it is possible to construct
a recursive descent parser that parses tokens according
to productions in the table.

Such a parser can “eat” (consume) or “peek” (look at without
consuming) the next token.

For each variable X in the grammar, the parser has a function,
eatX, that is responsible for consuming tokens matched by
the RHS of a production for X and returning an abstract syntax
tree for the consumed tokens. Since the RHS of a production
may contain other variables, the eat… functions can call each
other recursively.

We will now study the SML code for a recursive descent parser
for Intexp.


Intexp Parser: Scanner Functions


(* We assume the existence of the following token functions,
   whose implementation details we will *not* study. *)

val initScanner : string -> unit
(* Initialize scanner from a string, creating implicit token stream *)

val nextToken : unit -> token
(* Remove and return next token from implicit token stream *)

val peekToken : unit -> token
(* Return next token from implicit token stream without removing it *)
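
The lecture leaves these functions unimplemented on purpose. For readers who want something runnable, one possible sketch keeps the pending tokens in a mutable reference cell. The names tokens and initFromTokens are hypothetical: initFromTokens stands in for initScanner and takes an already-scanned token list instead of a string.

```sml
datatype binop = Add | Sub | Mul | Div
datatype token = EOF | INT of int | OP of binop | LPAREN | RPAREN

(* Hypothetical scanner state: the tokens that have not been consumed yet. *)
val tokens : token list ref = ref []

(* Stand-in for initScanner that starts from a token list. *)
fun initFromTokens (ts : token list) : unit = tokens := ts

(* Look at the next token without consuming it.
   An exhausted stream behaves as if EOF were at the end. *)
fun peekToken () : token =
  case !tokens of
      [] => EOF
    | t :: _ => t

(* Consume and return the next token. *)
fun nextToken () : token =
  case !tokens of
      [] => EOF
    | t :: rest => (tokens := rest; t)
```

Returning EOF once the list is empty matches the earlier note that EOF is implicit at the end of the token list.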

Intexp Parsing Functions
(* Collection of mutually recursive functions for recursive descent parsing *)
fun eatPgm () = … (* : unit -> pgm. Consume all program tokens and return pgm *)
and eatExp () = … (* : unit -> exp. Consume expression tokens and return exp *)
and eatBinop () = … (* : unit -> binop. Consume binop token and return binop *)
and eat token = (* : token -> unit. Consume next token and succeed without
                   complaint if it’s equal to the given token.
                   Otherwise complain w/error. *)
  let val token' = nextToken()
  in if token = token' then ()
     else raise Fail ("Unexpected token: wanted " ^ (Token.toString token)
                      ^ " but got " ^ (Token.toString token'))
  end

fun stringToExp str = (initScanner(str); eatExp()) (* Parse string into exp *)

fun stringToPgm str = (initScanner(str); eatPgm()) (* Parse string into pgm *)


Intexp: Parsing Programs, Expressions, and Binops


fun eatPgm () =
  let val body = eatExp()
      val _ = eat EOF
  in Pgm(body)
  end
and eatExp () =
  let val token = nextToken()
  in case token of
       INT(i) => Int(i)
     | LPAREN => let val exp1 = eatExp()
                     val bin = eatBinop()
                     val exp2 = eatExp()
                     val _ = eat RPAREN
                 in BinApp(exp1,bin,exp2)
                 end
     | _ => raise Fail ("Unexpected token begins exp: " ^ (Token.toString token))
  end
and eatBinop () =
  let val token = nextToken()
  in case token of
       OP(binop) => binop
     | _ => raise Fail ("Expect a binop token but got: " ^ (Token.toString token))
  end
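
The eat functions above rely on a scanner with hidden mutable state. As a point of comparison, the same grammar can be parsed in a purely functional style, where each function takes the remaining tokens and returns its result together with the tokens left over. This is an alternative design sketched for illustration, not the lecture's implementation; the names parseExp, parseBinop, and parsePgm are made up.

```sml
datatype binop = Add | Sub | Mul | Div
datatype token = EOF | INT of int | OP of binop | LPAREN | RPAREN
datatype pgm = Pgm of exp
and exp = Int of int | BinApp of exp * binop * exp

(* E -> INT(int) | ( E B E ): the first token selects the production. *)
fun parseExp (INT i :: ts) = (Int i, ts)
  | parseExp (LPAREN :: ts) =
      let val (e1, ts1) = parseExp ts
          val (b, ts2) = parseBinop ts1
          val (e2, ts3) = parseExp ts2
      in case ts3 of
             RPAREN :: ts4 => (BinApp (e1, b, e2), ts4)
           | _ => raise Fail "Expected )"
      end
  | parseExp _ = raise Fail "Unexpected token begins exp"

(* B -> + | - | * | / *)
and parseBinop (OP b :: ts) = (b, ts)
  | parseBinop _ = raise Fail "Expected a binop token"

(* P -> E EOF, where EOF may be explicit or implicit at the end of the list. *)
fun parsePgm ts =
  case parseExp ts of
      (e, []) => Pgm e
    | (e, [EOF]) => Pgm e
    | _ => raise Fail "Extra tokens after program"
```

The structure is the same as in the eat functions: one function per variable, with one case per table entry. Only the way the token stream is threaded through the calls differs.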

An Extended Language: SLiP--
SLiP-- is a subset of Appel’s straight-line programming language (SLiP).
Productions for Concrete Grammar:

  P → S EOF
  S → ID(str) := E | print E | begin SL end
  SL → % | S ; SL
  E → ID(str) | INT(int) | ( E B E )
  B → + | - | * | /

SML Data Types for Abstract Grammar:

  datatype pgm = Pgm of stm
  and stm = Assign of string * exp
          | Print of exp
          | Seq of stm list
  and exp = Id of string
          | Int of int
          | BinApp of exp * binop * exp
  and binop = Add | Sub | Mul | Div

An Example SLiP-- Program


Chars: begin x := (3+4); print ((x-1)*(x+2)); end

Tokens: begin ID(“x”) := ( INT(3) + INT(4) ) ;
        print ( ( ID(“x”) - INT(1) ) * ( ID(“x”) + INT(2) ) ) ; end EOF

Parse Tree (see full tree on next slide):

  P
    S
      begin SL end
    EOF

Abstract Syntax Tree (AST):

  Pgm(Seq [Assign(“x”, BinApp(Int(3),Add,Int(4))),
           Print(BinApp(BinApp(Id(“x”),Sub,Int(1)),
                        Mul,
                        BinApp(Id(“x”),Add,Int(2))))])
An Example SLiP-- Program
Parse Tree:

  P
    S
      begin
      SL
        S
          ID(“x”)
          :=
          E: ( E:INT(3)  B:+  E:INT(4) )
        ;
        SL
          S
            print
            E: ( E:( E:ID(“x”)  B:-  E:INT(1) )  B:*  E:( E:ID(“x”)  B:+  E:INT(2) ) )
          ;
          SL
            %
      end
    EOF



Predictive Parsing For SLiP--


P → S EOF
S → ID(str) := E | print E | begin SL end
SL → % | S ; SL
E → ID(str) | INT(int) | ( E B E )
B → + | - | * | /

Observe that:
• expressions (E) must begin with ID(str), INT(int), or (.
• statements (S) must begin with ID(str), print, or begin.
• nonempty statement lists (SL) must begin with a statement (S) and so
must begin with ID(str), print, or begin. Every statement list must be
followed by end (a token that is not part of the SL tree but one
immediately following it).
• programs (P) must begin with a statement (S) and so must begin with
ID(str), print, or begin.
Predictive Parsing Table for SLiP--
We can summarize the observations on the previous slide with a
predictive parsing table of variables x tokens in which
at most one production is valid per entry.
Empty slots in the table indicate parsing errors.

     ID(s)             INT(i)        (              OP(b)      print        begin             end
P    P → S EOF                                                 P → S EOF    P → S EOF
S    S → ID(str) := E                                          S → print E  S → begin SL end
SL   SL → S ; SL                                               SL → S ; SL  SL → S ; SL       SL → %
E    E → ID(str)       E → INT(num)  E → ( E B E )
B                                                   B → OP(b)


NULLABLE, FIRST, and FOLLOW


Predictive parsing tables like that for Slip-- are constructed
using the following notions:

Let t range over terminals,
V and W range over variables,
α range over terminals ∪ variables,
and γ range over sequences of terminals ∪ variables.

• NULLABLE(γ) is true iff γ can derive the empty string (%).

• FIRST(γ) is the set of terminals that can begin strings
derived from γ.

• FOLLOW(V) is the set of terminals that can immediately
follow V in some derivation.

Computing NULLABLE For Variables
A variable V is NULLABLE iff
1. There is a production V → %
OR
2. There is a production V → V1…Vn
and each of V1, … , Vn is NULLABLE
(Case 1 is really a special case of 2 with n = 0.)

In general, it is necessary to compute an iterative fixed point to
determine nullability of a variable. We’ve seen this already in the
algorithm for converting a CFG to Chomsky Normal Form.

Example (from Appel 3.2):     Another example:

X → a | Y                     S’ → S EOF
Y → % | c                     S → T | 0S1
Z → d | X Y Z                 T → % | 10T
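
The fixed point computation can be sketched in SML as follows, using the example grammar from Appel 3.2. The grammar representation (the symbol datatype, a production as a pair of a variable name and a list of symbols) and the names member, step, and fixpoint are assumptions made for this sketch, not part of the lecture code.

```sml
datatype symbol = T of string | V of string
type production = string * symbol list   (* (V, gamma) stands for V -> gamma *)

(* Example grammar from Appel 3.2; an empty list is the empty string %. *)
val grammar : production list =
  [("X", [T "a"]), ("X", [V "Y"]),
   ("Y", []),      ("Y", [T "c"]),
   ("Z", [T "d"]), ("Z", [V "X", V "Y", V "Z"])]

fun member (x : string, xs) = List.exists (fn y => y = x) xs

(* One pass over the productions: a variable becomes nullable when some
   right-hand side consists only of variables already known to be nullable.
   An empty right-hand side satisfies this trivially (case n = 0). *)
fun step known =
  List.foldl
    (fn ((v, rhs), acc) =>
        if not (member (v, acc))
           andalso List.all (fn V w => member (w, acc) | T _ => false) rhs
        then v :: acc
        else acc)
    known grammar

(* Repeat passes until a pass adds nothing new. *)
fun fixpoint known =
  let val known' = step known
  in if length known' = length known then known else fixpoint known'
  end

val nullableVars = fixpoint []
```

For this grammar the computation finds that X and Y are nullable and Z is not. Y is discovered in the first pass (from Y → %), and X only in the second (from X → Y), which is why a single pass is not enough in general.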

Computing FIRST
FIRST_0(V) = {} for every variable V
For all i > 0:
• FIRST_i(t) = {t}
• FIRST_i(V) = ∪ { FIRST_{i-1}(γ) | V → γ is a production for V }
• FIRST_i(α_1 … α_j … α_n) =
    ∪_{1 ≤ j ≤ n} { FIRST_{i-1}(α_j) | α_1, …, α_{j-1} are all nullable }

Again, this is determined by an iterative fixed point computation:
calculate FIRST_i for increasing i until reaching a k such that
FIRST_k = FIRST_{k-1}.

X → a | Y                     S’ → S EOF
Y → % | c                     S → T | 0S1
Z → d | X Y Z                 T → % | 10T
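
The FIRST iteration can be sketched in the same style for the Appel 3.2 grammar. The representation and all helper names (lookup, firstOfSymbol, firstOfSeq, step, fixpoint) are assumptions made for this sketch, and the set of nullable variables is taken as already known rather than recomputed.

```sml
datatype symbol = T of string | V of string
type production = string * symbol list   (* (V, gamma) stands for V -> gamma *)

val grammar : production list =
  [("X", [T "a"]), ("X", [V "Y"]),
   ("Y", []),      ("Y", [T "c"]),
   ("Z", [T "d"]), ("Z", [V "X", V "Y", V "Z"])]

(* Assumed to be known already from the NULLABLE computation. *)
val nullableVars = ["X", "Y"]

fun member (x : string, xs) = List.exists (fn y => y = x) xs
fun union (xs, ys) =
  List.foldl (fn (x, acc) => if member (x, acc) then acc else x :: acc) ys xs

fun isNullable (T _) = false
  | isNullable (V v) = member (v, nullableVars)

(* A table maps each variable to its current approximation of FIRST. *)
fun lookup (table : (string * string list) list, v) =
  case List.find (fn (w, _) => w = v) table of
      SOME (_, set) => set
    | NONE => []

fun firstOfSymbol _ (T t) = [t]
  | firstOfSymbol table (V v) = lookup (table, v)

(* FIRST of a sequence: keep going while the symbols so far are nullable. *)
fun firstOfSeq _ [] = []
  | firstOfSeq table (s :: rest) =
      if isNullable s
      then union (firstOfSymbol table s, firstOfSeq table rest)
      else firstOfSymbol table s

(* Compute the next approximation for every variable from the previous one. *)
fun step table =
  map (fn (v, _) =>
          (v, List.foldl
                (fn ((w, rhs), acc) =>
                    if w = v then union (firstOfSeq table rhs, acc) else acc)
                [] grammar))
      table

fun size table = List.foldl (fn ((_, set), n) => n + length set) 0 table

(* Iterate from FIRST_0 (all empty) until the sets stop growing. *)
fun fixpoint table =
  let val table' = step table
  in if size table' = size table then table else fixpoint table'
  end

val firstTable = fixpoint [("X", []), ("Y", []), ("Z", [])]
```

For this grammar the result is FIRST(X) = {a, c}, FIRST(Y) = {c}, and FIRST(Z) = {a, c, d}. Comparing total sizes is enough to detect the fixed point here because each approximation contains the previous one.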

Computing FOLLOW
FOLLOW_0(V) = {} for every variable V
For all i > 0:
FOLLOW_i(V) =
  ∪ { FIRST(α_j) | W → γ V α_1 … α_j … α_n is a production in the grammar
                   and α_1, …, α_{j-1} are all nullable variables }
  ∪ ∪ { FOLLOW_{i-1}(W) | W → γ V α_1 … α_n is a production in the grammar
                          and α_1, …, α_n are all nullable variables }

Again, this is determined by an iterative fixed point computation:
calculate FOLLOW_i for increasing i until reaching a k such that
FOLLOW_k = FOLLOW_{k-1}.

X → a | Y                     S’ → S EOF
Y → % | c                     S → T | 0S1
Z → d | X Y Z                 T → % | 10T
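
A matching sketch for FOLLOW on the Appel 3.2 grammar is given below. To keep it short, the nullable variables and the FIRST sets are written in as assumed inputs (they are the values obtained for this grammar by the NULLABLE and FIRST definitions). The helper names contribution, step, and fixpoint are illustrative.

```sml
datatype symbol = T of string | V of string
type production = string * symbol list   (* (W, gamma) stands for W -> gamma *)

val grammar : production list =
  [("X", [T "a"]), ("X", [V "Y"]),
   ("Y", []),      ("Y", [T "c"]),
   ("Z", [T "d"]), ("Z", [V "X", V "Y", V "Z"])]

(* Assumed inputs: results of the NULLABLE and FIRST computations. *)
val nullableVars = ["X", "Y"]
val firstTable = [("X", ["a", "c"]), ("Y", ["c"]), ("Z", ["a", "c", "d"])]

fun member (x : string, xs) = List.exists (fn y => y = x) xs
fun union (xs, ys) =
  List.foldl (fn (x, acc) => if member (x, acc) then acc else x :: acc) ys xs

fun lookup (table : (string * string list) list, v) =
  case List.find (fn (w, _) => w = v) table of
      SOME (_, set) => set
    | NONE => []

fun isNullable (T _) = false
  | isNullable (V v) = member (v, nullableVars)

(* FIRST of a sequence of symbols, using the final FIRST sets. *)
fun firstOfSeq [] = []
  | firstOfSeq (T t :: _) = [t]
  | firstOfSeq (V v :: rest) =
      if member (v, nullableVars)
      then union (lookup (firstTable, v), firstOfSeq rest)
      else lookup (firstTable, v)

(* Terminals that the production w -> rhs contributes to FOLLOW(target):
   for each occurrence of target, FIRST of what comes after it, plus
   FOLLOW(w) when everything after it is nullable. *)
fun contribution follow target (w, rhs) =
  let fun scan [] = []
        | scan (s :: rest) =
            let val here =
                  if s = V target
                  then union (firstOfSeq rest,
                              if List.all isNullable rest
                              then lookup (follow, w)
                              else [])
                  else []
            in union (here, scan rest)
            end
  in scan rhs
  end

fun step follow =
  map (fn (v, _) =>
          (v, List.foldl
                (fn (p, acc) => union (contribution follow v p, acc))
                [] grammar))
      follow

fun size table = List.foldl (fn ((_, set), n) => n + length set) 0 table

fun fixpoint table =
  let val table' = step table
  in if size table' = size table then table else fixpoint table'
  end

val followTable = fixpoint [("X", []), ("Y", []), ("Z", [])]
```

For this grammar the result is FOLLOW(X) = FOLLOW(Y) = {a, c, d} and FOLLOW(Z) = {}. FOLLOW(Z) is empty because this grammar has no end-of-input marker and nothing ever appears after Z.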


Example: Slip--
Calculate NULLABLE, FIRST, and FOLLOW for the
variables in the Slip-- grammar.

P → S EOF
S → ID(str) := E | print E | begin SL end
SL → % | S ; SL
E → ID(str) | INT(int) | ( E B E )
B → + | - | * | /

Constructing Predictive Parsing Tables
A predictive parsing table has rows labeled by variables and
columns labeled by terminals.
To construct a predictive parsing table for a given grammar,
do the following for each production V → γ:
• For each t in FIRST(γ), enter V → γ in row V, column t.
• If NULLABLE(γ), for each t in FOLLOW(V), enter V → γ in row V, column t.

ID(s) INT(i) ( OP(b) print begin end


P

SL
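
As a small illustration of the two rules, the sketch below applies them to the SL row of the Slip-- table only. The FIRST set of each right-hand side and FOLLOW(SL) are entered by hand, based on the earlier observations that statements begin with ID(str), print, or begin and that a statement list is followed by end. The record type and the names slProds, followSL, entries, and hasConflict are assumptions made for this sketch.

```sml
(* A production together with facts about its right-hand side. *)
type production =
  {lhs : string, rhs : string, first : string list, nullable : bool}

(* The two SL productions; "%" is the empty string. *)
val slProds : production list =
  [{lhs = "SL", rhs = "%", first = [], nullable = true},
   {lhs = "SL", rhs = "S ; SL", first = ["ID", "print", "begin"],
    nullable = false}]

val followSL = ["end"]

(* Table entries (row, column, production) contributed by one production:
   one per terminal in FIRST of the right-hand side, plus one per terminal
   in FOLLOW of the left-hand side when the right-hand side is nullable. *)
fun entries follow ({lhs, rhs, first, nullable} : production) =
  let val columns = if nullable then first @ follow else first
  in map (fn t => (lhs, t, lhs ^ " -> " ^ rhs)) columns
  end

val slRow = List.concat (map (entries followSL) slProds)

(* Within a single row, a conflict is a column that appears more than once. *)
fun hasConflict [] = false
  | hasConflict ((_, t, _) :: rest) =
      List.exists (fn (_, t', _) => t' = t) rest orelse hasConflict rest
```

Here slRow contains four entries, one each for the columns end, ID, print, and begin, and hasConflict slRow is false, so this row is predictive with one token of lookahead.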


Slip--: Recursive Descent Parser


(* Collection of mutually recursive functions for recursive descent parsing *)
fun eatPgm () = … (* : unit -> pgm. Consume all program tokens and return pgm *)
and eatStm () = … (* : unit -> stm. Consume all statement tokens and return stm *)
and eatStmList () = … (* : unit -> stm list. Consume all tokens for a sequence of
                         statements and return stm list *)
and eatExp () = … (* : unit -> exp. Consume expression tokens and return exp *)
and eatBinop () = … (* : unit -> binop. Consume binop token and return binop *)
and eat token = (* : token -> unit. Consume next token and succeed without complaint
                   if it’s equal to the given token. Otherwise complain w/error. *)
  let val token' = nextToken()
  in if token = token' then ()
     else raise Fail ("Unexpected token: wanted " ^ (Token.toString token)
                      ^ " but got " ^ (Token.toString token'))
  end

fun stringToExp str = (initScanner(str); eatExp()) (* Parse string into exp *)

fun stringToStm str = (initScanner(str); eatStm()) (* Parse string into stm *)
fun stringToPgm str = (initScanner(str); eatPgm()) (* Parse string into pgm *)

Slip-- Parsing: Top-Level Function Examples
- stringToExp "((1+2)*(3-4))";
val it = BinApp (BinApp (Int 1,Add,Int 2),Mul,BinApp (Int 3,Sub,Int 4)) : exp

- stringToStm "x := (1+2);";
val it = Assign ("x",BinApp (Int 1,Add,Int 2)) : stm

- stringToPgm "begin x := (3+4); print ((x-1)*(x+2)); end";
val it =
  Pgm
    (Seq
       [Assign ("x",BinApp (Int 3,Add,Int 4)),
        Print
          (BinApp (BinApp (Id "x",Sub,Int 1),Mul,BinApp (Id "x",Add,Int 2)))])
  : pgm


Slip-- Parsing: Programs and Statements


fun eatPgm () =
  let val body = eatStm()
      val _ = eat EOF
  in Pgm(body)
  end

and eatStm () =
  let val token = nextToken()
  in case token of
       ID(str) => let val _ = eat GETS
                      val rhs = eatExp()
                  in Assign(str,rhs)
                  end
     | PRINT => let val arg = eatExp()
                in Print(arg)
                end
     | BEGIN => let val stms = eatStmList()
                    val _ = eat END
                in Seq(stms)
                end
     | _ => raise Fail ("Unexpected token begins stm: " ^ (Token.toString token))
  end
Slip-- Parsing: Statement Lists

and eatStmList () =
  let val token = peekToken() (* Must peek rather than eat
                                 (the essence of FOLLOW!) *)
  in case token of
       END => []
     | _ => let val stm = eatStm()
                val _ = eat SEMI
                val stms = eatStmList()
            in stm::stms
            end
  end


Slip-- Parsing: Expressions


and eatExp () =
  let val token = nextToken()
  in case token of
       ID(str) => Id(str)
     | INT(i) => Int(i)
     | LPAREN => let val exp1 = eatExp()
                     val bin = eatBinop()
                     val exp2 = eatExp()
                     val _ = eat RPAREN
                 in BinApp(exp1,bin,exp2)
                 end
     | _ => raise Fail ("Unexpected token begins exp: " ^ (Token.toString token))
  end

and eatBinop () =
  let val token = nextToken()
  in case token of
       OP(binop) => binop
     | _ => raise Fail ("Expect a binop token but got: " ^ (Token.toString token))
  end

More Practice with NULLABLE, FIRST, & FOLLOW
S’ → S EOF
S → T | 0S1
T → % | 10T

       NULLABLE   FIRST   FOLLOW
S’
S
T

       0    1    EOF
S’
S
T

Parsing is not predictive since some table slots now have
multiple entries!


Adding Extra Lookahead


S’ → S EOF
S → T | 0S1
T → % | 10T

Sometimes predictivity can be re-established by adding
extra lookahead:

       0     10     11     1 EOF     EOF
S’

LL(k) Grammars
An LL(k) grammar is one that has a predictive parsing
table with k symbols of lookahead.

• The SLiP-- grammar is LL(1).

• The S’/S/T grammar is LL(2) but not LL(1).

In LL,

• the first L means the tokens are consumed left-to-right.

• the second L means that the parse tree is constructed
in the manner of a leftmost derivation.


Expressions with Prefix Syntax


Suppose we change Intexp/Slip-- expressions to use prefix syntax:

E → ID(str) | INT(int) | B E E

E.g., * - x 1 + y 2

Parsing is still predictive:

     ID(s)         INT(i)         OP(b)        print   begin   end
E    E → ID(str)   E → INT(num)   E → B E E
B                                 B → OP(b)

Postfix Syntax for Expressions
Suppose we change Intexp/Slip-- expressions to use postfix syntax:

E → ID(str) | INT(int) | E E B

E.g., x 1 - y 2 + *

Parsing is no longer predictive since some table slots now have
multiple entries:

     ID(s)         INT(i)         OP(b)        print   begin   end
E    E → ID(str)   E → INT(num)
     E → E E B     E → E E B
B                                 B → OP(b)

Postfix expressions are fundamentally not predictive (not LL(k) for
any k), so there’s nothing we can do to parse them predictively.

But we’ll see later that we can parse them with a shift/reduce parser.
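
As a preview of the idea, postfix expressions can be handled with an explicit stack: operands are pushed, and an operator combines the two most recent entries. The sketch below is a hand-written special case for postfix only, not the general shift/reduce algorithm, and the names parsePostfix and loop are made up for illustration.

```sml
datatype binop = Add | Sub | Mul | Div
datatype token = ID of string | INT of int | OP of binop
datatype exp = Id of string | Int of int | BinApp of exp * binop * exp

(* The stack holds completed subexpressions, most recent first. *)
fun parsePostfix tokens =
  let fun loop ([], [e]) = e
        | loop ([], _) = raise Fail "Malformed postfix expression"
        | loop (ID s :: ts, stack) = loop (ts, Id s :: stack)
        | loop (INT i :: ts, stack) = loop (ts, Int i :: stack)
        | loop (OP b :: ts, e2 :: e1 :: stack) =
            (* Combine the two most recent subexpressions. *)
            loop (ts, BinApp (e1, b, e2) :: stack)
        | loop (OP _ :: _, _) = raise Fail "Operator is missing an operand"
  in loop (tokens, [])
  end
```

On the tokens for x 1 - y 2 + * this produces BinApp (BinApp (Id "x", Sub, Int 1), Mul, BinApp (Id "y", Add, Int 2)). The parser never has to predict a production from the first token; it decides what to build only after the operands have been read.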

Infix Syntax for Expressions


Suppose we change Slip-- expressions to use infix syntax
without required parens (but with optional ones):

E → ID(str) | INT(int) | E B E | ( E )

E.g. x - 1 * y + 2

Parsing is no longer predictive:

     ID(s)         INT(i)         OP(b)       (            print   begin   end
E    E → ID(str)   E → INT(num)               E → ( E )
     E → E B E     E → E B E                  E → E B E
B                                 B → OP(b)

This is not surprising: this grammar is ambiguous, and
no ambiguous grammar can be uniquely parsed with
any deterministic parsing algorithm.

Digression: Ambiguity (Lec #24 Review)
A CFG is ambiguous if there is more than one parse tree for
a string that it generates.

S → %
S → SS
S → aSb
S → bSa

This is an example of an ambiguous grammar.
The string abba has an infinite number of parse trees!
Here are a few of them:

[Figure: three parse trees for abba]


Ambiguity Can Affect Meaning


Ambiguity can affect the meaning of a phrase in both natural
languages and programming languages.

Here are some natural language examples:


High school principal
Fruit flies like bananas.
A woman without her man is nothing.

A classic example in programming languages is arithmetic expressions:

E → ID(str) | INT(int) | E B E | ( E )
B → + | - | * | /

Arithmetic Expressions: Precedence
E → ID(str) | INT(int) | E B E | ( E )
B → + | - | * | /

What does 2 * 3 + 4 mean?

Parse tree 1 (multiplication at the root):

  E
    E: Int(2)
    B: *
    E
      E: Int(3)
      B: +
      E: Int(4)

Parse tree 2 (addition at the root):

  E
    E
      E: Int(2)
      B: *
      E: Int(3)
    B: +
    E: Int(4)


Arithmetic Expressions: Associativity

E → ID(str) | INT(int) | E B E | ( E )
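B → + | - | * | /

What does 2 - 3 - 4 mean?

Parse tree 1 (right subtraction grouped first):

  E
    E: Int(2)
    B: -
    E
      E: Int(3)
      B: -
      E: Int(4)

Parse tree 2 (left subtraction grouped first):

  E
    E
      E: Int(2)
      B: -
      E: Int(3)
    B: -
    E: Int(4)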

Precedence Levels
We can transform the grammar to express precedence levels:

E → T | E + E | E - E                 Expressions
T → F | T * T | T / T                 Terms
F → ID(str) | INT(int) | ( E )        Factors

Now there is only one parse tree for 2 * 3 + 4. Why? What is it?


Specifying Left Associativity


We can further transform the grammar to express left associativity.
E → T | E + T | E - T                 Expressions
T → F | T * F | T / F                 Terms
F → ID(str) | INT(int) | ( E )        Factors
Now there is only one parse tree for 2 - 3 - 4. Why? What is it?

How would we specify right associativity?

Another Classic Example: Dangling Else
Stm → if Exp then Stm else Stm
Stm → if Exp then Stm
Stm → … other productions for statements …

There are two parse trees for the following statement.
What are they?

if Exp1 then if Exp2 then Stm1 else Stm2


Fixing the Dangling Else


Stm → MaybeElseAfter | NoElseAfter
MaybeElseAfter → if Exp then MaybeElseAfter else MaybeElseAfter
MaybeElseAfter → … other productions for statements…
NoElseAfter → if Exp then Stm
NoElseAfter → if Exp then MaybeElseAfter else NoElseAfter

Now there is only one parse tree for the following statement.
What is it?

if Exp1 then if Exp2 then Stm1 else Stm2

Back to Predictive Parsing:
Removing Ambiguity May not Help
Suppose we use an unambiguous infix grammar for arithmetic:
E → T | E + T | E - T                 Expressions
T → F | T * F | T / F                 Terms
F → ID(str) | INT(int) | ( E )        Factors

Parsing is still not predictive due to left recursion in E and T:

     ID(s)         INT(i)         OP(b)   (             print   begin   end
E    E → T         E → T                  E → T
     E → E + T     E → E + T              E → E + T
     E → E - T     E → E - T              E → E - T
T    T → F         T → F                  T → F
     T → T * F     T → T * F              T → T * F
     T → T / F     T → T / F              T → T / F
F    F → ID(str)   F → INT(num)           F → ( E )


Left Recursion Removal


Sometimes we can transform a grammar to remove left recursion
(parse trees are transformed correspondingly).

E → T | E + T | E - T
T → F | T * F | T / F
F → ID(str) | INT(int) | ( E )

becomes:

E → T E’
E’ → % | + T E’ | - T E’
T → F T’
T’ → % | * F T’ | / F T’
F → ID(str) | INT(int) | ( E )

See Appel 3.2 for a general description of this transformation.

You will use this transformation in PS10.
The Transformed Grammar is Predictive!
E → T E’
E’ → % | + T E’ | - T E’
T → F T’
T’ → % | * F T’ | / F T’
F → ID(str) | INT(int) | ( E )

     ID(s)         INT(i)         +             *             (           )        ;        EOF
E    E → T E’      E → T E’                                   E → T E’
E’                                E’ → + T E’                             E’ → %   E’ → %   E’ → %
T    T → F T’      T → F T’                                   T → F T’
T’                                T’ → %        T’ → * F T’               T’ → %   T’ → %   T’ → %
F    F → ID(str)   F → INT(num)                               F → ( E )

Transforming Parse Trees


The parse tree from the transformed grammar can be transformed
back to the untransformed grammar. (It’s hard to parse a linear
sequence of tokens into trees, but it’s easy to transform trees!)
E.g. 2 * 3 + 4

Parse tree in the transformed grammar:

  E
    T
      F: INT(2)
      T’
        OP(*)
        F: INT(3)
        T’: %
    E’
      OP(+)
      T
        F: INT(4)
        T’: %
      E’: %

Corresponding parse tree in the untransformed grammar:

  E
    E
      T
        T
          F: INT(2)
        OP(*)
        F: INT(3)
    OP(+)
    T
      F: INT(4)
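
One way to combine both steps is to parse with the transformed grammar but build the left-associative AST directly, by passing the expression parsed so far into the functions for E’ and T’. The sketch below does this in a purely functional style over a token list. It is an illustration under assumptions, not the course's solution: the token type is simplified, and the function names (parseE, parseE', moreE, and so on) are made up.

```sml
datatype binop = Add | Sub | Mul | Div
datatype token = ID of string | INT of int | OP of binop | LPAREN | RPAREN
datatype exp = Id of string | Int of int | BinApp of exp * binop * exp

(* E -> T E' *)
fun parseE ts =
  let val (t, ts1) = parseT ts
  in parseE' (t, ts1)
  end

(* E' -> % | + T E' | - T E'
   left is the AST for everything parsed so far at this level. *)
and parseE' (left, OP Add :: ts) = moreE (left, Add, ts)
  | parseE' (left, OP Sub :: ts) = moreE (left, Sub, ts)
  | parseE' (left, ts) = (left, ts)

and moreE (left, b, ts) =
  let val (t, ts1) = parseT ts
  in parseE' (BinApp (left, b, t), ts1)
  end

(* T -> F T' *)
and parseT ts =
  let val (f, ts1) = parseF ts
  in parseT' (f, ts1)
  end

(* T' -> % | * F T' | / F T' *)
and parseT' (left, OP Mul :: ts) = moreT (left, Mul, ts)
  | parseT' (left, OP Div :: ts) = moreT (left, Div, ts)
  | parseT' (left, ts) = (left, ts)

and moreT (left, b, ts) =
  let val (f, ts1) = parseF ts
  in parseT' (BinApp (left, b, f), ts1)
  end

(* F -> ID(str) | INT(int) | ( E ) *)
and parseF (ID s :: ts) = (Id s, ts)
  | parseF (INT i :: ts) = (Int i, ts)
  | parseF (LPAREN :: ts) =
      (case parseE ts of
           (e, RPAREN :: ts1) => (e, ts1)
         | _ => raise Fail "Expected )")
  | parseF _ = raise Fail "Unexpected token begins factor"
```

On the tokens for 2 * 3 + 4 this returns BinApp (BinApp (Int 2, Mul, Int 3), Add, Int 4), and on 2 - 3 - 4 it returns BinApp (BinApp (Int 2, Sub, Int 3), Sub, Int 4), matching the precedence and left associativity of the untransformed grammar.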