TD2: LL(1) Parsing
Top-down Parsing
CMPT 379: Compilers
Instructor: Anoop Sarkar
anoopsarkar.github.io/compilers-class
Parsing - Roadmap
• Parser:
– decision procedure: builds a parse tree
• Top-down vs. bottom-up
• LL(1) – Deterministic Parsing
– recursive-descent
– table-driven
• LR(k) – Deterministic Parsing
– LR(0), SLR(1), LR(1), LALR(1)
• Parsing arbitrary CFGs – Polynomial time parsing
Top-Down vs. Bottom Up
Grammar: S → A B        Input string: ccbca
A → c | ε
B → cbB | ca
Top-Down (leftmost derivation):
S ⇒ AB          S → AB
  ⇒ cB          A → c
  ⇒ ccbB        B → cbB
  ⇒ ccbca       B → ca
Bottom-Up (rightmost derivation, in reverse):
ccbca ⇐ Acbca   A → c
      ⇐ AcbB    B → ca
      ⇐ AB      B → cbB
      ⇐ S       S → AB
Leftmost derivation for id + id * id
E → E + E
E → E * E
E → ( E )
E → - E
E → id
E ⇒ E + E
  ⇒ id + E
  ⇒ id + E * E
  ⇒ id + id * E
  ⇒ id + id * id
E ⇒*lm id + E * E   (lm = leftmost)
Predictive Top-Down Parser
• Knows which production to choose based on
single lookahead symbol
• Need LL(1) grammars
– First L: reads input Left to right
– Second L: produce Leftmost derivation
– 1: one symbol of lookahead
• Cannot have left-recursion
• Must be left-factored (no two alternatives of a non-terminal share a common prefix)
• Not all grammars can be made LL(1)
LL(1) Parser
• In recursive-descent
– for each non-terminal and input token, many
choices of production to use
– Backtracking to remove bad choices
• In LL(1)
– for each non-terminal and each token, only one
production
If S ⇒* ω A β and the next input token is t,
then A → α is the only production that can be used,
giving S ⇒* ω α β
Left Factoring
• Consider this grammar
– E → T + E | T
– T → id | id * T | ( E )
• Hard to predict because
– For T two productions start with id
– For E, both productions start with T, so it is not clear which one to predict
• The grammar must not have left-recursion
• The grammar should be left-factored
Left Factoring
• In general, for rules
A → α β1 | α β2
• Left factoring is achieved by the following grammar transformation:
A → α A'
A' → β1 | β2
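The transformation can be sketched in Python. This is a minimal illustration, assuming each right-hand side is a tuple of symbol strings with () standing for ε; the function names and the primed naming of the fresh non-terminal (T', E') are hypothetical choices, not taken from the slides. It performs one factoring step only, so suffixes that still share a prefix would need another pass.

```python
def common_prefix(seqs):
    """Longest common prefix of a list of symbol tuples."""
    prefix = []
    for column in zip(*seqs):            # stops at the shortest sequence
        if all(sym == column[0] for sym in column):
            prefix.append(column[0])
        else:
            break
    return tuple(prefix)


def left_factor_once(nt, alternatives):
    """One left-factoring step for non-terminal `nt`.

    Alternatives that start with the same symbol are grouped; for each group
    of two or more, the longest common prefix alpha is factored out:
        A → α β1 | α β2 | …   becomes   A → α A',  A' → β1 | β2 | …
    Returns a dict from non-terminal to its new list of alternatives.
    """
    groups = {}                          # first symbol -> alternatives
    for rhs in alternatives:
        groups.setdefault(rhs[0] if rhs else None, []).append(rhs)

    result = {nt: []}
    fresh = 0
    for first, group in groups.items():
        if first is None or len(group) == 1:
            result[nt].extend(group)     # nothing to factor
            continue
        alpha = common_prefix(group)
        fresh += 1
        new_nt = nt + "'" * fresh        # hypothetical naming: T', T'', ...
        result[nt].append(alpha + (new_nt,))
        result[new_nt] = [rhs[len(alpha):] for rhs in group]
    return result


# T → id | id * T | ( E )   becomes   T → id T' | ( E ),  T' → ε | * T
print(left_factor_once("T", [("id",), ("id", "*", "T"), ("(", "E", ")")]))
```

Renaming T' to Y gives exactly the factored T and Y rules of the example grammar.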
Left Factoring
• Recall the grammar
– E → T + E | T
– T → id | id * T | ( E )
• Factor out common prefixes for productions
– E → T X
– X → + E | ε
– T → id Y | ( E )
– Y → * T | ε
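As an illustration, each non-terminal of the factored grammar can become one procedure that picks its production by peeking at a single token, with no backtracking. The sketch below is a hypothetical Python parser (class and method names are assumptions) that returns the parse tree as nested tuples; it takes an ε-production only when the next token can follow the non-terminal, using FOLLOW(X) = {), $} and FOLLOW(Y) = {+, ), $}, which are derived later in the deck.

```python
class ParseError(Exception):
    pass


class Parser:
    """Predictive recursive-descent parser for the left-factored grammar
        E → T X     X → + E | ε     T → id Y | ( E )     Y → * T | ε
    Every procedure chooses its production by peeking at one token."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # $ marks the end of input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.peek() != expected:
            raise ParseError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def E(self):                             # E → T X
        return ("E", self.T(), self.X())

    def X(self):
        if self.peek() == "+":               # X → + E
            self.match("+")
            return ("X", "+", self.E())
        if self.peek() in (")", "$"):        # X → ε when next token is in FOLLOW(X)
            return ("X",)
        raise ParseError(f"unexpected {self.peek()!r} after a term")

    def T(self):
        if self.peek() == "id":              # T → id Y
            self.match("id")
            return ("T", "id", self.Y())
        if self.peek() == "(":               # T → ( E )
            self.match("(")
            inner = self.E()
            self.match(")")
            return ("T", "(", inner, ")")
        raise ParseError(f"unexpected {self.peek()!r} at start of a term")

    def Y(self):
        if self.peek() == "*":               # Y → * T
            self.match("*")
            return ("Y", "*", self.T())
        if self.peek() in ("+", ")", "$"):   # Y → ε when next token is in FOLLOW(Y)
            return ("Y",)
        raise ParseError(f"unexpected {self.peek()!r} after id")

    def parse(self):
        tree = self.E()
        self.match("$")                      # all input must be consumed
        return tree


print(Parser(["id", "*", "id"]).parse())
```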
Predictive Parsing Table
• Can be specified via 2D tables
• One dimension for current (leftmost) non-terminal to expand
• One dimension for next token
• Each table entry contains one production
Productions
1 E → TX
2 X → ε
3 X → +E
4 T → (E)
5 T → id Y
6 Y → *T
7 Y → ε
     +      *      (      )      id     $
E                  TX            TX
X    +E                   ε             ε
T                  (E)           id Y
Y    ε      *T            ε             ε
Predictive Parsing Table
• Consider [E, id] entry
• When current non-terminal is E and the next input
is id, use production E → T X
Predictive Parsing Table
• Consider [Y, +] entry
• When current non-terminal is Y and the next input
is +, get rid of Y
• Y cannot derive a string starting with +, so + can follow only if Y → ε
Predictive Parsing Table
• Blank entries indicate error situations
• Consider [E, *] entry
• There is no way to derive a string starting with *
from non-terminal E
Predictive Parsing
• Method similar to recursive descent, except
– For each non-terminal S
– We look at the next token a
– And choose the production shown at entry [S,a]
• We use a stack to keep track of pending non-terminals (frontier of parse tree)
• We reject when we encounter an error state
• We accept when we encounter end-of-input and empty stack
Table-Driven Parsing
// the stack keeps track of what is still pending in the derivation
stack.push($); stack.push(S);
a = input.read();
forever do begin
  X = stack.peek();
  if X = a and a = $ then return SUCCESS;
  elsif X = a and a != $ then
    stack.pop(X); a = input.read();
  elsif X != a and X ∈ N and M[X,a] not empty then
    stack.pop(X);
    /* M[X,a] = X ⟶ Y1…Yn: push Yn, …, Y1 so that Y1 ends up on top */
    stack.push(Yn); …; stack.push(Y1);
  else ERROR!
end
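The pseudocode can be sketched in Python as follows. The table M is hard-coded from the predictive parsing table for E → TX, X → +E | ε, T → (E) | id Y, Y → *T | ε; the names TABLE, ll1_parse and ParseError are assumptions for illustration. Besides accepting or rejecting, the function records (stack, input, action) rows so its output can be compared with the trace of id*id.

```python
class ParseError(Exception):
    pass


# M[X, a] for E → T X, X → + E | ε, T → ( E ) | id Y, Y → * T | ε.
# Each entry is the right-hand side Y1 … Yn; () stands for ε.
TABLE = {
    ("E", "("): ("T", "X"),        ("E", "id"): ("T", "X"),
    ("X", "+"): ("+", "E"),        ("X", ")"): (),   ("X", "$"): (),
    ("T", "("): ("(", "E", ")"),   ("T", "id"): ("id", "Y"),
    ("Y", "+"): (),  ("Y", "*"): ("*", "T"),  ("Y", ")"): (),  ("Y", "$"): (),
}
NONTERMINALS = {"E", "X", "T", "Y"}


def ll1_parse(tokens, start="E"):
    """Table-driven LL(1) parse; returns the trace as (stack, input, action)."""
    stack = ["$", start]                 # the top of the stack is stack[-1]
    tokens = list(tokens) + ["$"]
    i = 0                                # index of the lookahead token a
    trace = []
    while True:
        X, a = stack[-1], tokens[i]
        shown = ("".join(reversed(stack)), "".join(tokens[i:]))
        if X == a == "$":
            trace.append(shown + ("accept",))
            return trace
        if X == a:                       # terminal on top matches the input
            trace.append(shown + (f"match {a}",))
            stack.pop()
            i += 1
        elif X in NONTERMINALS and (X, a) in TABLE:
            rhs = TABLE[(X, a)]
            trace.append(shown + (f"{X} → {' '.join(rhs) or 'ε'}",))
            stack.pop()
            stack.extend(reversed(rhs))  # push Yn … Y1, leaving Y1 on top
        else:                            # blank table entry or terminal mismatch
            raise ParseError(f"unexpected {a!r} with {X!r} on the stack")


for row in ll1_parse(["id", "*", "id"]):
    print(*row, sep="\t")
```

Using a Python list as the stack keeps push and pop O(1), so the parse takes time linear in the number of steps.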
Trace "id*id"
Stack      Input      Action
E$         id*id$     TX
TX$        id*id$     id Y
idYX$      id*id$     terminal
YX$        *id$       *T
*TX$       *id$       terminal
TX$        id$        id Y
idYX$      id$        terminal
YX$        $          ε
X$         $          ε
$          $          Accept!
[Figure: parse tree for id*id: E → T X; T → id Y; Y → * T; T → id Y; Y → ε; X → ε]
When to pick Y → ε?
Productions
1 E → TX
2 X → ε
3 X → +E
4 T → (E)
5 T → id Y
6 Y → *T
7 Y → ε
• Choice between Y → * T and Y → ε
• FIRST(*T) = { * }
• For Y → ε we compute FOLLOW(Y)
• FOLLOW(Y) = ?
• FOLLOW(Y) = FOLLOW(T)   (Y ends T → id Y, and T ends Y → *T)
• FOLLOW(T) = ( FIRST(X) – {ε} ) ∪ FOLLOW(E)
• FOLLOW(T) = { + , ) , $ }
• FOLLOW(Y) = { + , ) , $ }
Predictive Parsing table
• Given a grammar produce the predictive parsing
table
• We need to know, for all rules A → α | β, the lookahead symbols that select each alternative
• Based on the lookahead symbol the table can be used to pick which rule to push onto the stack
• This can be done using two sets: FIRST and FOLLOW
Predictive Parsing Table
• For non-terminal A, rule A → α, and token t, M[A, t] = α in two cases:
• If α ⇒* t β
– α can derive a string with t in the first position
– We say that t ∈ First(α)
• If A → α and α ⇒* ε and S ⇒* β A t δ
– Useful if the stack has A, the input is t, and A cannot derive t
– In this case the only option is to get rid of A (by α ⇒* ε)
• Can work only if t can follow A in at least one derivation
– We say t ∈ Follow(A)
FIRST and FOLLOW
Conditions for LL(1)
• Necessary conditions:
– no ambiguity
– no left recursion
– Left factored grammar
• A grammar G is LL(1) if, whenever
A → α | β
1. First(α) ∩ First(β) = ∅
2. α ⇒* ε implies !(β ⇒* ε)
3. α ⇒* ε implies First(β) ∩ Follow(A) = ∅
ComputeFirst(α: string of symbols)
// assume α = X1 X2 X3 … Xn
if X1 ∈ T then First[α] := {X1}
else begin
  i := 1; First[α] := ComputeFirst(X1) \ {ε};
  while i <= n and Xi ⇒* ε do begin
    if i < n then
      First[α] := First[α] ∪ ComputeFirst(Xi+1) \ {ε}
    else
      First[α] := First[α] ∪ {ε};
    i := i + 1;
  end
end
• Recursion in computing FIRST causes problems when faced with recursive grammar rules: for a left-recursive rule A → A β, ComputeFirst(A) calls itself forever
ComputeFirst; modified
foreach X ∈ T do First[X] := {X};
foreach p ∈ P : X → ε do First[X] := {ε};   // all other First[X] start as ∅
repeat foreach X ∈ N, p : X → Y1 Y2 Y3 … Yn do begin
  i := 1;
  while i <= n do begin
    First[X] := First[X] ∪ First[Yi] \ {ε};
    if ε ∉ First[Yi] then break;
    i := i + 1;
  end
  if i = n+1 then First[X] := First[X] ∪ {ε};
until no change in First[X] for any X;
ComputeFirst; modified
• Non-recursive FIRST computation works with left-recursive grammars.
• Computes a fixed point for FIRST[X] for all non-terminals X in the grammar.
• But this algorithm is very inefficient: every pass rescans all productions until no FIRST set changes.
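A minimal Python sketch of the fixed-point computation, assuming the grammar is a dict from non-terminal to a list of right-hand-side tuples (() is an ε-production) and that any symbol not in the dict is a terminal. The names compute_first and first_of_string are hypothetical. Unlike the recursive ComputeFirst, it terminates on left-recursive rules because the sets only grow and are bounded by the set of terminals.

```python
EPS = "ε"

# Grammar from the slides: non-terminal -> list of right-hand sides;
# the empty tuple () is an ε-production, any other symbol is a terminal.
GRAMMAR = {
    "E": [("T", "X")],
    "X": [("+", "E"), ()],
    "T": [("(", "E", ")"), ("id", "Y")],
    "Y": [("*", "T"), ()],
}


def first_of_string(symbols, first):
    """FIRST(Y1 … Yn) from the current FIRST sets of single symbols."""
    result = set()
    for sym in symbols:
        f = first.get(sym, {sym})      # a terminal's FIRST is itself
        result |= f - {EPS}
        if EPS not in f:               # Yi is not (yet known to be) nullable
            return result
    return result | {EPS}              # n = 0, or every Yi is nullable


def compute_first(grammar):
    """Iterate until no FIRST set changes (a fixed point)."""
    first = {A: set() for A in grammar}
    changed = True
    while changed:
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first


print(compute_first(GRAMMAR))
```

For instance, on the left-recursive grammar E → E + T | T, T → id, the same function gives FIRST(E) = {id} instead of looping.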
First Sets
Productions
1 E → TX
2 X → ε
3 X → +E
4 T → (E)
5 T → id Y
6 Y → *T
7 Y → ε
First(+) = {+}          First(E) = ?
First(*) = {*}          First(T) ⊆ First(E)
First('(') = {'('}      First(T) = {id, '('}
First(')') = {')'}      First(E) = {id, '('}
First(id) = {id}        First(X) = {+, ε}
                        First(Y) = {*, ε}
Follow Sets
• Algorithm sketch
1. Add $ to Follow(S)
2. For each production A ⟶ α X β
• Add First(β) – {ε} to Follow(X)
3. For each A ⟶ α X β where ε ∈ First(β) (in particular when β is empty)
• Add Follow(A) to Follow(X)
– Repeat steps 2-3 until no follow set grows
ComputeFollow
Follow[S] := {$};
repeat
  foreach p ∈ P do   // for every occurrence of a non-terminal B in the right-hand side of p
    case p = A → αBβ begin
      Follow[B] := Follow[B] ∪ ComputeFirst(β) \ {ε};
      if ε ∈ First(β) then
        Follow[B] := Follow[B] ∪ Follow[A];
    end
    case p = A → αB
      Follow[B] := Follow[B] ∪ Follow[A];
until no change in any Follow[N]
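The same fixed-point style works for FOLLOW. The Python sketch below is self-contained, so it repeats a compact FIRST computation; the grammar encoding and the function names are assumptions for illustration. It applies steps 1 to 3 to every occurrence of a non-terminal in every right-hand side until no Follow set grows.

```python
EPS = "ε"

GRAMMAR = {   # () is an ε-production; symbols that are not keys are terminals
    "E": [("T", "X")],
    "X": [("+", "E"), ()],
    "T": [("(", "E", ")"), ("id", "Y")],
    "Y": [("*", "T"), ()],
}


def first_of_string(symbols, first):
    result = set()
    for sym in symbols:
        f = first.get(sym, {sym})
        result |= f - {EPS}
        if EPS not in f:
            return result
    return result | {EPS}                      # β empty, or every symbol nullable


def compute_first(grammar):
    first = {A: set() for A in grammar}
    changed = True
    while changed:
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {A: set() for A in grammar}
    follow[start].add("$")                     # step 1
    changed = True
    while changed:                             # repeat steps 2-3
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                for i, B in enumerate(rhs):    # each occurrence A → α B β
                    if B not in grammar:
                        continue               # terminals are skipped here
                    first_beta = first_of_string(rhs[i + 1:], first)
                    new = first_beta - {EPS}   # step 2: First(β) − {ε}
                    if EPS in first_beta:      # step 3: β ⇒* ε
                        new |= follow[A]
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow


follow = compute_follow(GRAMMAR, "E", compute_first(GRAMMAR))
print(follow)
```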
Follow Sets. Example
Productions
1 E → TX
2 X → ε
3 X → +E
4 T → (E)
5 T → id Y
6 Y → *T
7 Y → ε
Follow(E) = {$, )}
Follow(X) = {$, )}
Follow(T) = {+, $, )}
Follow(Y) = {+, $, )}
Follow('(') = {(, id}
Follow(')') = {+, $, )}
Follow(+) = {(, id}
Follow(*) = {(, id}
Follow(id) = {*, +, $, )}
Constraints used:
Follow(E) ⊆ Follow(X)
Follow(X) ⊆ Follow(E)
First(X) − {ε} ⊆ Follow(T)
Follow(E) ⊆ Follow(T)
Follow(Y) ⊆ Follow(T)
Follow(T) ⊆ Follow(Y)
Building the Parse Table
• Compute First and Follow sets
• For each production A → α
– For each t ∈ First(α)
• M[A,t] = α
– If ε ∈ First(α), for each t ∈ Follow(A)
• M[A,t] = α
– If ε ∈ First(α) and $ ∈ Follow(A)
• M[A,$] = α
– All undefined entries are errors
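A sketch of the construction in Python, assuming First and Follow sets are supplied as dicts (the values below are the ones from the slides). build_table is a hypothetical name; instead of silently overwriting a cell, it collects every cell that would receive a second production, which is exactly the situation that makes a grammar not LL(1).

```python
EPS = "ε"


def first_of_string(symbols, first):
    """FIRST(Y1 … Yn); a symbol missing from `first` is a terminal."""
    result = set()
    for sym in symbols:
        f = first.get(sym, {sym})
        result |= f - {EPS}
        if EPS not in f:
            return result
    return result | {EPS}


def build_table(grammar, first, follow):
    """Fill M[A, t] using First and Follow; returns (table, conflicts)."""
    table, conflicts = {}, []
    for A, alternatives in grammar.items():
        for rhs in alternatives:
            f = first_of_string(rhs, first)
            targets = f - {EPS}            # t ∈ First(α)
            if EPS in f:
                targets |= follow[A]       # t ∈ Follow(A), including $
            for t in sorted(targets):
                if (A, t) in table and table[(A, t)] != rhs:
                    conflicts.append((A, t, table[(A, t)], rhs))
                else:
                    table[(A, t)] = rhs
    return table, conflicts


GRAMMAR = {
    "E": [("T", "X")],
    "X": [("+", "E"), ()],
    "T": [("(", "E", ")"), ("id", "Y")],
    "Y": [("*", "T"), ()],
}
FIRST = {"E": {"id", "("}, "T": {"id", "("}, "X": {"+", EPS}, "Y": {"*", EPS}}
FOLLOW = {"E": {"$", ")"}, "T": {"+", "$", ")"}, "X": {"$", ")"}, "Y": {"+", "$", ")"}}

table, conflicts = build_table(GRAMMAR, FIRST, FOLLOW)
print(len(table), conflicts)
```

Run on the grammar S → AB, A → c | ε, B → cbB | ca with its First/Follow sets, the same function reports conflicts in M[A, c] and M[B, c].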
Predictive Parsing Table
First(E) = {id, '('}     First(T) = {id, '('}
Follow(E) = {$, )}       Follow(T) = {+, $, )}
First(X) = {+, ε}        First(Y) = {*, ε}
Follow(X) = {$, )}       Follow(Y) = {+, $, )}
Productions
1 E → TX
2 X → ε
3 X → +E
4 T → (E)
5 T → id Y
6 Y → *T
7 Y → ε
     +      *      (      )      id     $
E                  TX            TX
X    +E                   ε             ε
T                  (E)           id Y
Y    ε      *T            ε             ε
Example First/Follow
S → AB
A → c | ε
B → cbB | ca
Not an LL(1) grammar
First(A) = {c, ε}      Follow(A) = {c}
First(B) = {c}         Follow(B) = {$}
First(S) = {c}         Follow(S) = {$}
• A → c | ε: Follow(A) ∩ First(c) = {c} ≠ ∅
• B → cbB | ca: First(cbB) = First(ca) = {c}
Converting to LL(1)
S → AB
A → c | ε
B → cbB | ca
Note that the grammar is regular: c? (cb)* ca
c (c b c b … c b) c a   =   c c (b c b … c b c) a
(c b c b … c b) c a     =   c (b c b … c b c) a
same as: c c? (bc)* a
S → cAa
A → cB | B
B → bcB | ε
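A quick sanity check of the regular-expression argument using Python's re module: this brute-force comparison (not a proof) enumerates every string over {a, b, c} up to a fixed length and confirms that both patterns accept exactly the same strings. The helper name disagreements and the length bound are assumptions for illustration.

```python
import re
from itertools import product

# Language of S → AB, A → c | ε, B → cbB | ca
ORIGINAL = re.compile(r"c?(cb)*ca")
# Language of S → cAa, A → cB | B, B → bcB | ε
CONVERTED = re.compile(r"cc?(bc)*a")


def disagreements(max_len):
    """Strings over {a, b, c} of length <= max_len on which the patterns differ."""
    found = []
    for n in range(max_len + 1):
        for chars in product("abc", repeat=n):
            w = "".join(chars)
            if bool(ORIGINAL.fullmatch(w)) != bool(CONVERTED.fullmatch(w)):
                found.append(w)
    return found


print(disagreements(8))
```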
Verifying LL(1) using F/F sets
S → cAa
A → cB | B
B → bcB | ε
First(A) = {b, c, ε}    Follow(A) = {a}
First(B) = {b, ε}       Follow(B) = {a}
First(S) = {c}          Follow(S) = {$}
Building the Parse Table
• Compute First and Follow sets
• For each production A → α
– foreach a ∈ First(α) add A → α to M[A,a]
– If ε ∈ First(α) add A → α to M[A,b] for each b in Follow(A)
– If ε ∈ First(α) add A → α to M[A,$] if $ ∈ Follow(A)
– All undefined entries are errors
Predictive Parsing Table
Productions
1 T → F T'       FIRST(T) = {id, (}      FOLLOW(T) = {$, )}
2 T' → ε         FIRST(T') = {*, ε}      FOLLOW(T') = {$, )}
3 T' → * F T'    FIRST(F) = {id, (}      FOLLOW(F) = {*, $, )}
4 F → id
5 F → (T)
      *            (            )            id           $
T                  T → F T'                  T → F T'
T'    T' → * F T'               T' → ε                    T' → ε
F                  F → (T)                   F → id
Revisit conditions for LL(1)
• A grammar G is LL(1) iff, whenever
A → α | β
1. First(α) ∩ First(β) = ∅
2. α ⇒* ε implies !(β ⇒* ε)
3. α ⇒* ε implies First(β) ∩ Follow(A) = ∅
• Equivalently: no more than one production in any table entry M[A, t]
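The three conditions can be checked directly, one non-terminal at a time. This Python sketch assumes the First set of each alternative and Follow(A) are already known (for example, the sets on the earlier slides); check_ll1_rule is a hypothetical helper and its message strings are illustrative. Condition 1 is tested on non-ε symbols only, leaving ε to condition 2 so a clash is not reported twice.

```python
from itertools import combinations

EPS = "ε"


def check_ll1_rule(first_alts, follow_A):
    """Check LL(1) conditions 1-3 for one non-terminal A → α1 | α2 | ….

    first_alts: First(αi) for each alternative (containing ε if αi ⇒* ε)
    follow_A:   Follow(A)
    Returns a list of violated conditions (empty if A is fine)."""
    problems = []
    for (i, fa), (j, fb) in combinations(enumerate(first_alts), 2):
        if (fa & fb) - {EPS}:                          # condition 1
            problems.append(f"1: First of alternatives {i} and {j} overlap")
        if EPS in fa and EPS in fb:                    # condition 2
            problems.append(f"2: alternatives {i} and {j} both derive ε")
        # condition 3, checked in both directions
        for nullable, other, k in ((fa, fb, j), (fb, fa, i)):
            if EPS in nullable and (other - {EPS}) & follow_A:
                problems.append(f"3: First of alternative {k} meets Follow(A)")
    return problems


# A → c | ε with Follow(A) = {c}   (the non-LL(1) example)
print(check_ll1_rule([{"c"}, {EPS}], {"c"}))
# A → cB | B with First(B) = {b, ε} and Follow(A) = {a}
print(check_ll1_rule([{"c"}, {"b", EPS}], {"a"}))
```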
Error Handling
• Reporting & Recovery
– Report as soon as possible
– Suitable error messages
– Resume after error
– Avoid cascading errors
• Phrase-level vs. Panic-mode recovery
Panic-Mode Recovery
• Skip input tokens until a token in the synchronizing set is seen
– Follow(A)
• garbage or missing things after
– Higher-level start symbols
– First(A)
• garbage before
– Epsilon
• if nullable
– Pop/Insert terminal
• “auto-insert”
• Add “synch” actions to table
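One possible panic-mode scheme for the table-driven parser, sketched in Python. Assumptions: the synchronizing set for a non-terminal X is First(X) ∪ Follow(X), a terminal on the stack that does not match the input is popped as if it had been inserted ("auto-insert"), and the table and sets are hard-coded for the E/X/T/Y grammar. The function name and messages are hypothetical; other synch choices from the list above would slot into the same place.

```python
# Parse table and sets for E → T X, X → + E | ε, T → ( E ) | id Y, Y → * T | ε
TABLE = {
    ("E", "("): ("T", "X"),        ("E", "id"): ("T", "X"),
    ("X", "+"): ("+", "E"),        ("X", ")"): (),   ("X", "$"): (),
    ("T", "("): ("(", "E", ")"),   ("T", "id"): ("id", "Y"),
    ("Y", "+"): (),  ("Y", "*"): ("*", "T"),  ("Y", ")"): (),  ("Y", "$"): (),
}
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "X": {"+"}, "Y": {"*"}}  # ε omitted
FOLLOW = {"E": {")", "$"}, "X": {")", "$"},
          "T": {"+", ")", "$"}, "Y": {"+", ")", "$"}}


def parse_with_recovery(tokens, start="E"):
    """Table-driven LL(1) parse with panic-mode recovery.
    Returns the list of error messages; an empty list means no errors."""
    stack, tokens, i, errors = ["$", start], list(tokens) + ["$"], 0, []
    while True:
        X, a = stack[-1], tokens[i]
        if X == "$":                        # bottom of stack: input must be used up
            if a != "$":
                errors.append(f"extra input starting at {a!r}")
            return errors
        if X == a:                          # terminal matches the lookahead
            stack.pop()
            i += 1
        elif X in FIRST:                    # X is a non-terminal
            if (X, a) in TABLE:
                stack.pop()
                stack.extend(reversed(TABLE[(X, a)]))
                continue
            errors.append(f"unexpected {a!r} while expanding {X}")
            synch = FIRST[X] | FOLLOW[X]    # assumed synchronizing set
            while tokens[i] not in synch:   # $ ∈ FOLLOW(X) here, so this stops
                i += 1
            if tokens[i] not in FIRST[X]:   # synchronized on FOLLOW(X): drop X
                stack.pop()
        else:                               # terminal mismatch: "auto-insert" it
            errors.append(f"inserted missing {X!r} before {a!r}")
            stack.pop()


print(parse_with_recovery(["id", "*", "+", "id"]))
```

On "( id id" this reports the stray id while expanding Y, skips it, and then auto-inserts the missing ')' before reaching the end of input.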
Summary so far
• LL(1) grammars, necessary conditions
• No left recursion
• Left-factored
• Not all languages can be generated by an LL(1) grammar
• LL(1) – Parsing: O(n) time complexity
– recursive-descent and table-driven predictive parsing
• LL(1) grammars can be parsed by simple
predictive recursive-descent parser
– Alternative: table-driven top-down parser
Extra Slides
ComputeFirst on Left-recursive Grammars
• ComputeFirst as defined earlier loops on left-recursive grammars
• Here is an alternative algorithm for ComputeFirst
1. Compute non left-recursive cases of FIRST
2. Create a graph of recursive cases where FIRST of a non-terminal depends on another non-terminal
3. Compute Strongly Connected Components (SCC)
4. Compute FIRST starting from the root of each SCC, handling an SCC only after all SCCs it depends on, to avoid cycles
ComputeFirst on Left-recursive Grammars
• Each Strongly Connected Component can have
recursion
• But the graph of connections between SCCs is (by definition) a directed acyclic graph, hence without left recursion
• Unlike top-down LL parsing, bottom-up LR
parsing allows left-recursive grammars, so this
algorithm is useful for LR parsing
ComputeFirst on Left-recursive Grammars
• S → BD | D       • A → CB | a
• D → d | Sd       • C → Bb | ε
• B → Ab | b
Compute Strongly Connected Components
[Figure: dependency graph over S, D, C, B, A]
2 SCCs: e.g. consider B-A-C
FIRST0[A] := {a}
FIRST0[C] := {}
FIRST0[B] := {b}
FIRST0[S] := {b, d}
FIRST0[D] := {d}
FIRST[B] := FIRST0[B] + ComputeFirst(A)
FIRST[A] := FIRST0[A] + ComputeFirst(C)
FIRST[A] := FIRST[A] + FIRST0[B]
FIRST[C] := FIRST0[C] + FIRST0[B]
FIRST[C] := FIRST[C] + {ε}
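A Python sketch of the SCC approach for this grammar, assuming the same dict-of-tuples encoding (() is an ε-production). It computes the nullable non-terminals first, records in first0 the terminals that can start a right-hand side without passing through a non-terminal (this bookkeeping differs slightly from the slide: first0 of S is empty here, whereas the slide lists FIRST0[S] = {b, d}), builds the dependency graph, and runs Tarjan's SCC algorithm, which emits an SCC only after every SCC it depends on. All members of one SCC share a FIRST set; ε is added per nullable non-terminal. Function names are hypothetical.

```python
EPS = "ε"

GRAMMAR = {   # () is an ε-production; symbols that are not keys are terminals
    "S": [("B", "D"), ("D",)],
    "D": [("d",), ("S", "d")],
    "B": [("A", "b"), ("b",)],
    "A": [("C", "B"), ("a",)],
    "C": [("B", "b"), ()],
}


def nullable_set(grammar):
    nullable, changed = set(), True
    while changed:
        changed = False
        for A, alternatives in grammar.items():
            if A not in nullable and any(
                    all(s in nullable for s in rhs) for rhs in alternatives):
                nullable.add(A)
                changed = True
    return nullable


def first_via_scc(grammar):
    nullable = nullable_set(grammar)
    first0 = {A: set() for A in grammar}   # terminals seen directly
    deps = {A: set() for A in grammar}     # edge A -> B: FIRST(B) ⊆ FIRST(A)
    for A, alternatives in grammar.items():
        for rhs in alternatives:
            for sym in rhs:
                if sym not in grammar:     # terminal: it starts the string
                    first0[A].add(sym)
                    break
                deps[A].add(sym)
                if sym not in nullable:    # stop after a non-nullable symbol
                    break

    # Tarjan's algorithm: an SCC is emitted only after every SCC it reaches
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def strongconnect(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in deps[v]:
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:             # v is the root of an SCC
            scc = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.add(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in grammar:
        if v not in index:
            strongconnect(v)

    first = {}
    for scc in sccs:                       # the SCCs it depends on are finished
        shared = set()
        for A in scc:
            shared |= first0[A]
            for B in deps[A] - scc:
                shared |= first[B] - {EPS}
        for A in scc:                      # one FIRST set per SCC, plus ε if nullable
            first[A] = shared | ({EPS} if A in nullable else set())
    return first


print(first_via_scc(GRAMMAR))
```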
Examples
Example 1:
S → ABC
A → a | ε
B → b B | ε
C → c | ε
Is this LL(1)?
Example 2:
S → F
F → A ( B ) | B A
A → x | y
B → a B | b B | ε
Is this LL(1)?
Transition Diagram
S → cAa
A → cB | B
B → bcB | ε
[Figure: one transition diagram per non-terminal. S: c, then A, then a. A: c then B, or B alone. B: b, then c, then B; or ε.]
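Each transition diagram maps onto one recursive procedure: a terminal edge becomes a match, a non-terminal edge becomes a call. A hypothetical Python recognizer (names assumed) for S → cAa, A → cB | B, B → bcB | ε, working on single-character tokens:

```python
class ParseError(Exception):
    pass


def recognize(text):
    """Recursive-descent recognizer for S → c A a, A → c B | B, B → b c B | ε.
    Each nested function follows one transition diagram; raises ParseError."""
    s = text + "$"                    # $ marks the end of the input
    pos = 0

    def expect(ch):                   # follow an edge labelled with terminal ch
        nonlocal pos
        if s[pos] != ch:
            raise ParseError(f"expected {ch!r} at position {pos}, found {s[pos]!r}")
        pos += 1

    def S():                          # S: c, then A, then a
        expect("c")
        A()
        expect("a")

    def A():                          # A: c then B, or B alone
        if s[pos] == "c":
            expect("c")
        B()

    def B():                          # B: b, c, then B again; or the ε edge
        if s[pos] == "b":             # FIRST(bcB) = {b}
            expect("b")
            expect("c")
            B()
        # otherwise take ε; the caller's expect("a") checks FOLLOW(B) = {a}

    S()
    expect("$")
    return True


print(recognize("ccbca"))
```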