37 Predictive Parsing
Main Example: Intexp
As our main example, we’ll use a simple integer expression
language that we’ll call Intexp.
Sample “program”
((3+4) * (42-17))
- Scanner.stringToTokens "((3+4)*(42-17))";
val it = [LPAREN, LPAREN, INT 3, OP Add, INT 4, RPAREN, OP Mul, LPAREN,
INT 42, OP Sub, INT 17, RPAREN, RPAREN] : Token.token list
(* Note: EOF does *not* appear explicitly in the token list, but is implicitly at the end *)
Our First Concrete Syntax for Intexp:
Explicitly Parenthesized Operations
Productions for the concrete grammar:
P → E EOF
E → INT(int) | ( E B E )
B → + | - | * | /
[Figure: parse tree for the sample program, with the shape ( E B E ) * ( E B E ).]
Predictive Parsing For Intexp
P → E EOF
E → INT(int) | ( E B E )
B → + | - | * | /
Observe that:
• expressions (E) must begin with INT(int) or (.
• programs (P) must begin with an expression (E) and so must
begin with INT(int) or (.
Recursive Descent Parsing
From a predictive parsing table, it is possible to construct
a recursive descent parser that parses tokens according
to productions in the table.
We will now study the SML code for a recursive descent parser
for Intexp.
Intexp Parsing Functions
(* Collection of mutually recursive functions for recursive descent parsing *)
fun eatPgm () = … (* : unit -> pgm. Consume all program tokens and return pgm *)
and eatExp () = … (* : unit -> exp. Consume expression tokens and return exp*)
and eatBinop () = … (* : unit -> binop. Consume binop token and return binop *)
and eat token = (* : token -> unit.
                   Consume next token and succeed without
                   complaint if it’s equal to the given token.
                   Otherwise complain w/error. *)
  let val token' = nextToken()
  in if token = token' then ()
     else raise Fail ("Unexpected token: wanted " ^ (Token.toString token)
                      ^ " but got " ^ (Token.toString token'))
  end
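The elided bodies above could be filled in along the following lines. This is only a minimal, self-contained sketch, not the course's actual code: the token and AST datatypes are hypothetical stand-ins for the Token module and AST used in lecture, the token stream is modeled as a list held in a ref cell, and eatPgm returns an exp directly because the pgm type is not shown here.

```sml
(* Hypothetical stand-ins for the course's Token and AST definitions. *)
datatype binop = Add | Sub | Mul | Div
datatype token = INT of int | OP of binop | LPAREN | RPAREN | EOF
datatype exp = Int of int | BinApp of exp * binop * exp

(* Remaining input. EOF is implicit once the list is exhausted. *)
val tokens : token list ref = ref []

(* Consume and return the next token. *)
fun nextToken () =
  case !tokens of
      [] => EOF
    | t :: ts => (tokens := ts; t)

(* Consume the next token and complain unless it is the expected one. *)
fun eat token =
  if nextToken () = token then ()
  else raise Fail "Unexpected token"

(* E -> INT(int) | ( E B E ): the first token selects the production. *)
fun eatExp () =
  case nextToken () of
      INT i => Int i
    | LPAREN =>
        let val e1 = eatExp ()
            val b = eatBinop ()
            val e2 = eatExp ()
            val _ = eat RPAREN
        in BinApp (e1, b, e2)
        end
    | _ => raise Fail "Unexpected token begins exp"

(* B -> + | - | * | / *)
and eatBinop () =
  case nextToken () of
      OP b => b
    | _ => raise Fail "Expected a binop token"

(* P -> E EOF (simplified here to return the exp itself) *)
fun eatPgm () =
  let val e = eatExp ()
      val _ = eat EOF
  in e
  end

(* Hypothetical driver: load the token list, then parse a whole program. *)
fun parse ts = (tokens := ts; eatPgm ())

(* The sample program ((3+4)*(42-17)) from the start of the lecture. *)
val sample =
  parse [LPAREN, LPAREN, INT 3, OP Add, INT 4, RPAREN, OP Mul,
         LPAREN, INT 42, OP Sub, INT 17, RPAREN, RPAREN]
(* sample = BinApp (BinApp (Int 3,Add,Int 4),Mul,BinApp (Int 42,Sub,Int 17)) *)
```

Each function inspects exactly one token before committing to a production, which is what makes the parser predictive.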
An Extended Language: SLiP--
SLiP-- is a subset of Appel’s straight-line programming language (SLiP).
Productions for the concrete grammar:
P → S EOF
S → ID(str) := E | print E | begin SL end
SL → % | S ; SL
E → ID(str) | INT(int) | ( E B E )
B → + | - | * | /
[Figure: partial parse tree with P → S EOF and S → begin SL end, for a program containing the tokens print ( ( ID(“x”) - INT(1) ) * ( ID(“x”) + …; see the full tree on the next slide.]
An Example SLiP-- Program
Parse tree (children are indented under their parent):
P
  S
    begin
    SL
      S
        ID(“x”)
        :=
        E = ( E B E ), i.e. ( INT(3) + INT(4) )
      ;
      SL
        S
          print
          E = ( E B E ), of the form ( E B E ) * ( E B E )
        ;
        SL
          %
    end
  EOF
Observe that:
• expressions (E) must begin with ID(str), INT(int), or (.
• statements (S) must begin with ID(str), print, or begin.
• statement lists (SL) are either empty or begin with a statement (S), and so
  begin with ID(str), print, or begin. They must be followed by end (a token
  that is not part of the SL tree but immediately follows it).
• programs (P) must begin with a statement (S) and so must begin with
  ID(str), print, or begin.
Predictive Parsing Table for SLiP--
We can summarize the observations on the previous slide with a
predictive parsing table of variables × tokens in which
at most one production is valid per entry.
Empty slots in the table indicate parsing errors.
      ID(str)            INT(int)       OP(b)       (               print         begin              end
P     P → S EOF                                                     P → S EOF     P → S EOF
S     S → ID(str) := E                                              S → print E   S → begin SL end
SL    SL → S ; SL                                                   SL → S ; SL   SL → S ; SL        SL → %
E     E → ID(str)        E → INT(int)               E → ( E B E )
B                                       B → OP(b)
Computing NULLABLE For Variables
A variable V is NULLABLE iff
1. There is a production V → %
OR
2. There is a production V → V1…Vn
and each of V1, … , Vn is NULLABLE
(Case 1 is really a special case of 2 with n = 0.)
Example grammars:
X → a | Y          S’ → S EOF
Y → % | c          S → T | 0 S 1
Z → d | X Y Z      T → % | 1 0 T
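The definition of NULLABLE can be read as a fixed-point computation: start with no nullable variables and keep adding any variable that has a production whose right-hand side consists entirely of variables already known to be nullable. The sketch below does this for the grammar over X, Y, and Z. The symbol datatype, the list-based sets, and every function name are assumptions made for illustration only.

```sml
(* Hypothetical grammar representation: T = terminal, V = variable. *)
datatype symbol = T of string | V of string

(* X -> a | Y ;  Y -> % | c ;  Z -> d | X Y Z   (% is the empty list) *)
val grammar =
  [("X", [T "a"]), ("X", [V "Y"]),
   ("Y", []), ("Y", [T "c"]),
   ("Z", [T "d"]), ("Z", [V "X", V "Y", V "Z"])]

fun member (x : string, xs) = List.exists (fn y => y = x) xs

(* A terminal is never nullable; a variable is nullable if already known. *)
fun symNullable known (T _) = false
  | symNullable known (V v) = member (v, known)

(* One pass over the productions: add every variable that has a
   right-hand side made up only of known-nullable variables.
   An empty right-hand side qualifies trivially (case 1 of the definition). *)
fun step known =
  List.foldl
    (fn ((v, rhs), acc) =>
        if not (member (v, acc)) andalso List.all (symNullable acc) rhs
        then v :: acc
        else acc)
    known grammar

(* Repeat until a pass adds nothing new. *)
fun fixpoint known =
  let val known' = step known
  in if length known' = length known then known else fixpoint known'
  end

val nullable = fixpoint []
(* For this grammar, X and Y are nullable and Z is not. *)
```

Y is found first (from Y → %), and X follows on the next pass through X → Y. Z never qualifies because both of its productions contain a symbol that is not nullable.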
Computing FIRST
FIRST_0(V) = {} for every variable V
For all i > 0:
• FIRST_i(t) = {t} for every terminal t
• FIRST_i(V) = ∪ { FIRST_{i-1}(α) | V → α is a production for V }
• FIRST_i(α_1 … α_j … α_n) = ∪_{1 ≤ j ≤ n} { FIRST_{i-1}(α_j) | α_1, …, α_{j-1} are all nullable }
Example grammars:
X → a | Y          S’ → S EOF
Y → % | c          S → T | 0 S 1
Z → d | X Y Z      T → % | 1 0 T
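The iteration FIRST_0, FIRST_1, … can be sketched as a loop that recomputes every variable's set from the previous round until nothing changes. The code below does this for the grammar over S’, S, and T. It is an illustrative sketch under stated assumptions: the grammar representation and all helper names are invented here, sets are plain string lists, and the nullable variables are entered by hand rather than computed.

```sml
(* Hypothetical grammar representation: T = terminal, V = variable. *)
datatype symbol = T of string | V of string

(* S' -> S EOF ;  S -> T | 0 S 1 ;  T -> % | 1 0 T *)
val grammar =
  [("S'", [V "S", T "EOF"]),
   ("S", [V "T"]), ("S", [T "0", V "S", T "1"]),
   ("T", []), ("T", [T "1", T "0", V "T"])]
val variables = ["S'", "S", "T"]

(* NULLABLE for this grammar, worked out by hand from the definition. *)
val nullableVars = ["S", "T"]

fun member (x : string, xs) = List.exists (fn y => y = x) xs
fun union (xs, ys) =
  List.foldl (fn (x, acc) => if member (x, acc) then acc else acc @ [x]) ys xs
fun lookup table v =
  case List.find (fn (w, _) => w = v) table of
      SOME (_, set) => set
    | NONE => []

fun isNullable (T _) = false
  | isNullable (V v) = member (v, nullableVars)

(* FIRST of one symbol, reading variables from the previous round's table. *)
fun firstSym table (T t) = [t]
  | firstSym table (V v) = lookup table v

(* FIRST of a sequence: keep going only while the prefix is nullable. *)
fun firstSeq table [] = []
  | firstSeq table (s :: rest) =
      if isNullable s
      then union (firstSym table s, firstSeq table rest)
      else firstSym table s

(* One round: FIRST_i for every variable, computed from FIRST_{i-1}. *)
fun step table =
  List.map
    (fn v =>
        (v, List.foldl
              (fn ((w, rhs), acc) =>
                  if w = v then union (firstSeq table rhs, acc) else acc)
              [] grammar))
    variables

(* The sets only grow, so an unchanged total size means a fixed point. *)
fun total table = List.foldl (fn ((_, set), n) => n + length set) 0 table
fun fixpoint table =
  let val table' = step table
  in if total table' = total table then table else fixpoint table'
  end

val first = fixpoint (List.map (fn v => (v, [])) variables)
(* As sets: FIRST(S') = {0, 1, EOF}, FIRST(S) = {0, 1}, FIRST(T) = {1} *)
```

EOF ends up in FIRST(S’) because S is nullable, so the scan of S EOF continues past S.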
Computing FOLLOW
FOLLOW_0(V) = {} for every variable V
For all i > 0:
FOLLOW_i(V) =
  ∪ { FIRST(α_j) | W → β V α_1 … α_j … α_n is a production in the grammar
                   and α_1, …, α_{j-1} are all nullable variables }
  ∪ ∪ { FOLLOW_{i-1}(W) | W → β V α_1 … α_n is a production in the grammar
                          and α_1, …, α_n are all nullable variables }
(Here β is whatever precedes V in the production; it may be empty.)
Example grammars:
X → a | Y          S’ → S EOF
Y → % | c          S → T | 0 S 1
Z → d | X Y Z      T → % | 1 0 T
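FOLLOW can be computed with the same kind of iteration. For each occurrence of a variable V on a right-hand side, collect FIRST of what comes after it, and if everything after it is nullable, also take FOLLOW of the left-hand side from the previous round. The sketch below applies this to the grammar over X, Y, and Z. All names and the representation are assumptions for illustration, and NULLABLE and FIRST are entered by hand to keep the example short.

```sml
(* Hypothetical grammar representation: T = terminal, V = variable. *)
datatype symbol = T of string | V of string

(* X -> a | Y ;  Y -> % | c ;  Z -> d | X Y Z *)
val grammar =
  [("X", [T "a"]), ("X", [V "Y"]),
   ("Y", []), ("Y", [T "c"]),
   ("Z", [T "d"]), ("Z", [V "X", V "Y", V "Z"])]
val variables = ["X", "Y", "Z"]

(* NULLABLE and FIRST for this grammar, worked out by hand. *)
val nullableVars = ["X", "Y"]
val firstTable = [("X", ["a", "c"]), ("Y", ["c"]), ("Z", ["a", "c", "d"])]

fun member (x : string, xs) = List.exists (fn y => y = x) xs
fun union (xs, ys) =
  List.foldl (fn (x, acc) => if member (x, acc) then acc else acc @ [x]) ys xs
fun lookup table v =
  case List.find (fn (w, _) => w = v) table of
      SOME (_, set) => set
    | NONE => []

fun isNullable (T _) = false
  | isNullable (V v) = member (v, nullableVars)
fun firstSym (T t) = [t]
  | firstSym (V v) = lookup firstTable v

(* For the symbols after an occurrence of V: the terminals that can come
   next, and whether every one of those symbols is nullable. *)
fun afterContext [] = ([], true)
  | afterContext (s :: rest) =
      if isNullable s
      then let val (set, allNull) = afterContext rest
           in (union (firstSym s, set), allNull)
           end
      else (firstSym s, false)

(* Contributions to FOLLOW_i(v) from one production w -> rhs. *)
fun fromProduction follow v (w, rhs) =
  let fun scan [] = []
        | scan (s :: rest) =
            let val here =
                  if s = V v
                  then let val (set, allNull) = afterContext rest
                       in if allNull then union (lookup follow w, set) else set
                       end
                  else []
            in union (here, scan rest)
            end
  in scan rhs
  end

(* One round: FOLLOW_i for every variable, computed from FOLLOW_{i-1}. *)
fun step follow =
  List.map
    (fn v =>
        (v, List.foldl
              (fn (p, acc) => union (fromProduction follow v p, acc))
              [] grammar))
    variables

fun total table = List.foldl (fn ((_, set), n) => n + length set) 0 table
fun fixpoint table =
  let val table' = step table
  in if total table' = total table then table else fixpoint table'
  end

val follow = fixpoint (List.map (fn v => (v, [])) variables)
(* As sets: FOLLOW(X) = {a, c, d}, FOLLOW(Y) = {a, c, d}, FOLLOW(Z) = {} *)
```

FOLLOW(Z) stays empty here because this grammar has no end-of-input marker and Z occurs only at the very end of its own production.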
Example: SLiP--
Calculate NULLABLE, FIRST, and FOLLOW for the
variables in the SLiP-- grammar.
P → S EOF
S → ID(str) := E | print E | begin SL end
SL → % | S ; SL
E → ID(str) | INT(int) | ( E B E )
B → + | - | * | /
Constructing Predictive Parsing Tables
A predictive parsing table has rows labeled by variables and
columns labeled by terminals.
To construct a predictive parsing table for a given grammar,
do the following for each production V → α:
• For each t in FIRST(α), enter V → α in row V, column t.
• If NULLABLE(α), for each t in FOLLOW(V), enter V → α in row V, column t.
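The two table-construction rules translate almost directly into code. The sketch below builds the table entries for the small grammar S’ → S EOF, S → T | 0 S 1, T → % | 1 0 T. Everything in it is an assumption made for illustration: the representation, the helper names, and the NULLABLE, FIRST, and FOLLOW values, which are entered by hand.

```sml
(* Hypothetical grammar representation: T = terminal, V = variable. *)
datatype symbol = T of string | V of string

(* S' -> S EOF ;  S -> T | 0 S 1 ;  T -> % | 1 0 T *)
val grammar =
  [("S'", [V "S", T "EOF"]),
   ("S", [V "T"]), ("S", [T "0", V "S", T "1"]),
   ("T", []), ("T", [T "1", T "0", V "T"])]

(* NULLABLE, FIRST and FOLLOW for this grammar, worked out by hand. *)
val nullableVars = ["S", "T"]
val firstTable = [("S'", ["0", "1", "EOF"]), ("S", ["0", "1"]), ("T", ["1"])]
val followTable = [("S'", []), ("S", ["1", "EOF"]), ("T", ["1", "EOF"])]

fun member (x : string, xs) = List.exists (fn y => y = x) xs
fun union (xs, ys) =
  List.foldl (fn (x, acc) => if member (x, acc) then acc else acc @ [x]) ys xs
fun lookup table v =
  case List.find (fn (w, _) => w = v) table of
      SOME (_, set) => set
    | NONE => []

fun isNullable (T _) = false
  | isNullable (V v) = member (v, nullableVars)
fun firstSym (T t) = [t]
  | firstSym (V v) = lookup firstTable v

(* FIRST and NULLABLE lifted from single symbols to right-hand sides. *)
fun firstSeq [] = []
  | firstSeq (s :: rest) =
      if isNullable s then union (firstSym s, firstSeq rest) else firstSym s
fun seqNullable rhs = List.all isNullable rhs

(* Apply both rules to one production V -> alpha, giving the list of
   ((row, column), production) entries it contributes. *)
fun entriesFor (v, rhs) =
  let val columns =
        if seqNullable rhs
        then union (firstSeq rhs, lookup followTable v)
        else firstSeq rhs
  in List.map (fn t => ((v, t), (v, rhs))) columns
  end

val table = List.concat (List.map entriesFor grammar)

(* All productions entered in one cell of the table. *)
fun cell (v, t) =
  List.filter (fn ((w, u), _) => w = v andalso u = t) table

val conflict = cell ("T", "1")
(* conflict holds two productions: T -> % and T -> 1 0 T *)
```

With one token of lookahead, row T, column 1 receives two productions, so this particular grammar has no conflict-free table at that lookahead. A grammar is predictive with one token exactly when every cell holds at most one production.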
SLiP-- Parsing: Top-Level Function Examples
- stringToExp "((1+2)*(3-4))";
val it = BinApp (BinApp (Int 1,Add,Int 2),Mul,BinApp (Int 3,Sub,Int 4)) : exp
and eatStm () =
  let val token = nextToken()
  in case token of
       (* S → ID(str) := E *)
       ID(str) => let val _ = eat GETS
                      val rhs = eatExp()
                  in Assign(str,rhs)
                  end
       (* S → print E *)
     | PRINT => let val arg = eatExp()
                in Print(arg)
                end
       (* S → begin SL end *)
     | BEGIN => let val stms = eatStmList()
                    val _ = eat END
                in Seq(stms)
                end
     | _ => raise Fail ("Unexpected token begins stm: " ^ (Token.toString token))
  end
SLiP-- Parsing: Statement Lists
and eatStmList () =
  let val token = peekToken() (* Must peek rather than eat
                                 (the essence of FOLLOW!) *)
  in case token of
       END => []   (* SL → %: leave END for eatStm to consume *)
     | _ => let val stm = eatStm()
                val _ = eat SEMI
                val stms = eatStmList()
            in stm::stms
            end
  end
and eatBinop () =
  let val token = nextToken()
  in case token of
       OP(binop) => binop
     | _ => raise Fail ("Expect a binop token but got: " ^ (Token.toString token))
  end
More Practice with NULLABLE, FIRST, & FOLLOW
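Grammar:
S’ → S EOF
S → T | 0 S 1
T → % | 1 0 T
Fill in NULLABLE, FIRST, and FOLLOW for each variable:
      NULLABLE    FIRST    FOLLOW
S’
S
T
Then fill in the predictive parsing table:
      0    1    EOF
S’
S
T
[Table: a second blank parsing table whose column labels read 0, 10, 11, 1 EOF, EOF.]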
LL(k) Grammars
An LL(k) grammar is one that has a predictive parsing
table with k symbols of lookahead.
• The SLiP-- grammar is LL(1).
In LL,
• the first L means that the tokens are read from left to right;
• the second L means that the parse tree is constructed
  in the manner of a leftmost derivation.
E → ID(str) | INT(int) | B E E
E.g., * - x 1 + y 2
Postfix Syntax for Expressions
Suppose we change Intexp/SLiP-- expressions to use postfix syntax:
E → ID(str) | INT(int) | E E B
E.g., x 1 - y 2 + *
This grammar is left-recursive, so it cannot be parsed predictively.
But we’ll see later that we can parse such expressions with a shift/reduce parser.
E → ID(str) | INT(int) | E B E | ( E )
E.g. x - 1 * y + 2
Digression: Ambiguity (Lec #24 Review)
A CFG is ambiguous if there is more than one parse tree for
a string that it generates.
[Figure: several different parse trees for one string, built from nodes of the forms a S b, b S a, S S, and %.]
E → ID(str) | INT(int) | E B E | ( E )
B → + | - | * | /
Arithmetic Expressions: Precedence
E → ID(str) | INT(int) | E B E | ( E )
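B → + | - | * | /
[Figure: two parse trees for the same expression. In one the root is Int(2) * E, with E expanding to E B E; in the other the root is E + Int(4), with E expanding to E B E.]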
E → ID(str) | INT(int) | E B E | ( E )
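B → + | - | * | /
[Figure: two parse trees for the same expression. In one the root is Int(2) - E, with E expanding to E B E; in the other the root is E - Int(4), with E expanding to E B E.]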
Precedence Levels
We can transform the grammar to express precedence levels:
E → T | E + E | E - E              Expressions
T → F | T * T | T / T              Terms
F → ID(str) | INT(int) | ( E )     Factors
Now there is only one parse tree for 2 * 3 + 4. Why? What is it?
Another Classic Example: Dangling Else
Stm → if Exp then Stm else Stm
Stm → if Exp then Stm
Stm → … other productions for statements …
Now there is only one parse tree for the following statement.
What is it?
Back to Predictive Parsing:
Removing Ambiguity May not Help
Suppose we use an unambiguous infix grammar for arithmetic:
E → T | E + T | E - T              Expressions
T → F | T * F | T / F              Terms
F → ID(str) | INT(int) | ( E )     Factors
Parsing is still not predictive due to left recursion in E and T:
      ID(s)          INT(i)          OP(b)    (              print    begin    end
E     E → T          E → T                    E → T
      E → E + T      E → E + T                E → E + T
      E → E - T      E → E - T                E → E - T
T     T → F          T → F                    T → F
      T → T * F      T → T * F                T → T * F
      T → T / F      T → T / F                T → T / F
F     F → ID(str)    F → INT(num)             F → ( E )
Left recursion can be eliminated by transforming the grammar
E → T | E + T | E - T
T → F | T * F | T / F
F → ID(str) | INT(int) | ( E )
into
E → T E’
E’ → % | + T E’ | - T E’
T → F T’
T’ → % | * F T’ | / F T’
F → ID(str) | INT(int) | ( E )
The Transformed Grammar is Predictive!
E → T E’
E’ → % | + T E’ | - T E’
T → F T’
T’ → % | * F T’ | / F T’
F → ID(str) | INT(int) | ( E )
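Because the transformed grammar is predictive, it can be parsed by recursive descent with one function per variable. The sketch below is one way to do that, not the course's code: the datatypes, the Var constructor, the list-in-a-ref token stream, and the function names are all assumptions. The functions for E’ and T’ take the expression parsed so far as an argument, a design choice made here so that the resulting tree groups operators to the left even though the grammar itself is right-recursive.

```sml
(* Hypothetical stand-ins for the course's Token and AST definitions. *)
datatype binop = Add | Sub | Mul | Div
datatype token = ID of string | INT of int | OP of binop
               | LPAREN | RPAREN | EOF
datatype exp = Var of string | Int of int | BinApp of exp * binop * exp

(* Remaining input. EOF is implicit once the list is exhausted. *)
val tokens : token list ref = ref []
fun peekToken () = case !tokens of [] => EOF | t :: _ => t
fun nextToken () = case !tokens of [] => EOF | t :: ts => (tokens := ts; t)
fun eat token =
  if nextToken () = token then () else raise Fail "Unexpected token"

(* E -> T E' *)
fun eatE () = eatE' (eatT ())

(* E' -> % | + T E' | - T E'.  `left` is the expression parsed so far.
   Any other token selects E' -> %; a stricter parser would first check
   that the token is in FOLLOW(E'). *)
and eatE' left =
  case peekToken () of
      OP Add => (eat (OP Add); eatE' (BinApp (left, Add, eatT ())))
    | OP Sub => (eat (OP Sub); eatE' (BinApp (left, Sub, eatT ())))
    | _ => left

(* T -> F T' *)
and eatT () = eatT' (eatF ())

(* T' -> % | * F T' | / F T' *)
and eatT' left =
  case peekToken () of
      OP Mul => (eat (OP Mul); eatT' (BinApp (left, Mul, eatF ())))
    | OP Div => (eat (OP Div); eatT' (BinApp (left, Div, eatF ())))
    | _ => left

(* F -> ID(str) | INT(int) | ( E ) *)
and eatF () =
  case nextToken () of
      ID s => Var s
    | INT i => Int i
    | LPAREN => let val e = eatE ()
                    val _ = eat RPAREN
                in e
                end
    | _ => raise Fail "Unexpected token begins factor"

(* Hypothetical driver: parse a complete expression followed by EOF. *)
fun parse ts =
  (tokens := ts;
   let val e = eatE ()
       val _ = eat EOF
   in e
   end)

val example = parse [INT 2, OP Mul, INT 3, OP Add, INT 4]
(* example = BinApp (BinApp (Int 2,Mul,Int 3),Add,Int 4) *)
```

As in eatStmList, the functions for E’ and T’ must peek rather than eat, because choosing the % production depends on a token that belongs to whatever follows.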
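[Figure: parse trees for an expression beginning INT(2) OP(*) INT(3) …, drawn with the transformed grammar (nodes T E’ and F T’) and with the left-recursive grammar (node E OP(+) T).]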