
TD2: LL(1) Parsing

Top-down Parsing

CMPT 379: Compilers


Instructor: Anoop Sarkar
anoopsarkar.github.io/compilers-class

1
Parsing - Roadmap
• Parser:
– decision procedure: builds a parse tree
• Top-down vs. bottom-up
• LL(1) – Deterministic Parsing
– recursive-descent
– table-driven
• LR(k) – Deterministic Parsing
– LR(0), SLR(1), LR(1), LALR(1)
• Parsing arbitrary CFGs – Polynomial time parsing
2
Top-Down vs. Bottom Up
Grammar:       S → A B
               A → c | ε
               B → cbB | ca
Input string:  ccbca

Top-Down/leftmost:
  S ⇒ AB         (S → AB)
    ⇒ cB         (A → c)
    ⇒ ccbB       (B → cbB)
    ⇒ ccbca      (B → ca)

Bottom-Up/rightmost:
  ccbca ⇐ Acbca  (A → c)
        ⇐ AcbB   (B → ca)
        ⇐ AB     (B → cbB)
        ⇐ S      (S → AB)
3
Leftmost derivation for
id + id * id
Grammar:
  E → E + E
  E → E * E
  E → ( E )
  E → - E
  E → id

Derivation:
  E ⇒ E + E
    ⇒ id + E
    ⇒ id + E * E
    ⇒ id + id * E
    ⇒ id + id * id

E ⇒*lm id + E * E
4
Predictive Top-Down Parser
• Knows which production to choose based on
single lookahead symbol
• Need LL(1) grammars
– First L: reads input Left to right
– Second L: produce Leftmost derivation
– 1: one symbol of lookahead
• Cannot have left-recursion
• Must be left-factored (no two productions of a non-terminal share a common prefix)
• Not all grammars can be made LL(1)
5
LL(1) Parser
• In recursive-descent
– for each non-terminal and input token, many
choices of production to use
– Backtracking to remove bad choices
• In LL(1)
– for each non-terminal and each token, only one
production
S ⇒* ω A β and next input token: t
A → α is the only production
S ⇒* ω α β
6
Left Factoring
• Consider this grammar
– E → T + E | T
– T → id | id * T | ( E )
• Hard to predict because
– For T two productions start with id
– For E it is not clear how to predict
• The grammar must not have left-recursion
• The grammar should be left-factored
7
Left Factoring
• In general, for rules
    A → α β1 | α β2 | … | α βn | γ
• Left factoring is achieved by the following grammar transformation:
    A → α A' | γ
    A' → β1 | β2 | … | βn
8
Left Factoring
• Recall the grammar
– E → T + E | T
– T → id | id * T | ( E )
• Factor out common prefixes for productions
– E → T X
– X → + E | ε
– T → id Y | ( E )
– Y → * T | ε
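The factoring step above can be sketched in Python. This is a minimal illustration, not the course's implementation: it groups alternatives by their first symbol only (full left factoring uses the longest common prefix and repeats until nothing changes), and the names `left_factor` and `fresh_name` are hypothetical.

```python
from collections import defaultdict

def left_factor(nonterminal, alternatives, fresh_name):
    """Factor out a shared first symbol from one non-terminal's alternatives.

    alternatives: list of tuples of symbols; the empty tuple () stands for ε.
    fresh_name:   hypothetical callback returning an unused non-terminal name.
    Returns a dict mapping each non-terminal to its new list of alternatives.
    """
    groups = defaultdict(list)
    for alt in alternatives:
        key = alt[0] if alt else None      # group alternatives by first symbol
        groups[key].append(alt)

    result = {nonterminal: []}
    for first, alts in groups.items():
        if first is None or len(alts) == 1:
            result[nonterminal].extend(alts)            # nothing to factor
        else:
            new_nt = fresh_name()
            result[nonterminal].append((first, new_nt))
            result[new_nt] = [alt[1:] for alt in alts]  # the remainders
    return result

# The grammar from the slide: E → T + E | T and T → id | id * T | ( E )
names = iter(["X", "Y"])
fresh = lambda: next(names)
e_rules = left_factor("E", [("T", "+", "E"), ("T",)], fresh)
t_rules = left_factor("T", [("id",), ("id", "*", "T"), ("(", "E", ")")], fresh)
print(e_rules)
print(t_rules)
```

Under these assumptions the two calls reproduce the factored grammar shown above, with ε represented by the empty tuple.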
9
Predictive Parsing Table
• Can be specified via 2D tables
• One dimension for current (leftmost) non-terminal to expand
• One dimension for next token
• Each table entry contains one production
Productions
1 E → T X
2 X → ε
3 X → + E
4 T → ( E )
5 T → id Y
6 Y → * T
7 Y → ε

     | +    | *    | (     | )  | id   | $
E    |      |      | T X   |    | T X  |
X    | + E  |      |       | ε  |      | ε
T    |      |      | ( E ) |    | id Y |
Y    | ε    | * T  |       | ε  |      | ε
10
Predictive Parsing Table
• Consider [E, id] entry
• When the current non-terminal is E and the next input
is id, use production E → T X

11
Predictive Parsing Table
• Consider [Y, +] entry
• When the current non-terminal is Y and the next input
is +, get rid of Y
• A + can come right after Y only if Y is expanded by Y → ε
12
Predictive Parsing Table
• Blank entries indicate error situations
• Consider [E, *] entry
• There is no way to derive a string starting with *
from non-terminal E
13
Predictive Parsing
• Method similar to recursive descent, except
– For each non-terminal S
– We look at the next token a
– And choose the production shown at entry [S,a]
• We use a stack to keep track of pending non-terminals (frontier of parse tree)
• We reject when we encounter an error state
• We accept when we encounter end-of-input
and empty stack
14
Table-Driven Parsing
Stack: to keep track of what is pending in the derivation

stack.push($); stack.push(S);
a = input.read();
forever do begin
    X = stack.peek();
    if X = a and a = $ then return SUCCESS;
    elsif X = a and a != $ then
        stack.pop(X); a = input.read();
    elsif X != a and X ∈ N and M[X,a] not empty then
        stack.pop(X);
        stack.push(M[X,a]);  /* M[X,a] = X ⟶ Y1…Yn; push Yn first so that Y1 ends up on top */
    else ERROR!
end
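The loop above can be written as a short runnable Python sketch. The table is copied from the slides' expression grammar; the function name `ll1_parse`, the tuple encoding of right-hand sides, and returning the list of applied productions are choices made for this illustration.

```python
EPS = ()  # an empty right-hand side stands for ε

# M[non-terminal][token] -> right-hand side, for the grammar on the slides
TABLE = {
    "E": {"(": ("T", "X"), "id": ("T", "X")},
    "X": {"+": ("+", "E"), ")": EPS, "$": EPS},
    "T": {"(": ("(", "E", ")"), "id": ("id", "Y")},
    "Y": {"+": EPS, "*": ("*", "T"), ")": EPS, "$": EPS},
}

def ll1_parse(tokens, table=TABLE, start="E"):
    """Return the productions applied, in order, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    pos = 0
    stack = ["$", start]               # top of the stack is the end of the list
    used = []
    while True:
        top, a = stack[-1], tokens[pos]
        if top == a == "$":
            return used                # end of input and nothing pending
        if top == a:                   # terminal on top matches the input
            stack.pop()
            pos += 1
        elif top in table and a in table[top]:
            rhs = table[top][a]
            stack.pop()
            stack.extend(reversed(rhs))  # reversed, so the first symbol is on top
            used.append((top, rhs))
        else:
            raise SyntaxError(f"unexpected {a!r} at position {pos} (stack top {top!r})")

for lhs, rhs in ll1_parse(["id", "*", "id"]):
    print(lhs, "->", " ".join(rhs) or "ε")
```

Blank table entries are simply missing dictionary keys here, so every lookup failure is reported as a syntax error.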
15
Trace "id*id"

Stack       | Input   | Action
E $         | id*id$  | E → T X
T X $       | id*id$  | T → id Y
id Y X $    | id*id$  | terminal
Y X $       | *id$    | Y → * T
* T X $     | *id$    | terminal
T X $       | id$     | T → id Y
id Y X $    | id$     | terminal
Y X $       | $       | Y → ε
X $         | $       | X → ε
$           | $       | Accept!

Resulting parse tree: E( T( id Y( * T( id Y( ε ) ) ) ) X( ε ) )
16
When to pick Y ® e?
Productions
1 E → T X
2 X → ε
3 X → + E
4 T → ( E )
5 T → id Y
6 Y → * T
7 Y → ε

• Choice between Y → * T and Y → ε
• FIRST(*T) = { * }
• For Y → ε we compute FOLLOW(Y)
• FOLLOW(Y) = ?
• FOLLOW(Y) = FOLLOW(T)
• FOLLOW(T) = ( FIRST(X) – {ε} ) ∪ FOLLOW(E)
• FOLLOW(T) = { + , ) , $ }
• FOLLOW(Y) = { + , ) , $ }

17
Predictive Parsing table
• Given a grammar, produce the predictive parsing
table
• For every pair of rules A → α | β we need to know which
lookahead symbols select each alternative
• Based on the lookahead symbol the table can be
used to pick which rule to push onto the stack
• This can be done using two sets: FIRST and
FOLLOW

18
Predictive Parsing Table
• For non-terminal A, rule A → α, and token t,
M[A, t] = α in two cases:
• If α ⇒* t β
– α can derive a t in the first position
– We say that t ∈ First(α)
• If A → α and α ⇒* ε and S ⇒* β A t δ
– Useful if stack has A, input is t and A cannot derive t
– In this case the only option is to get rid of A (by α ⇒* ε)
• Can work only if t can follow A in at least one derivation
– We say t ∈ Follow(A)
19
FIRST and FOLLOW
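The body of this slide did not survive extraction. As a hedged restatement of what the previous slide says in words, the two sets can be written as follows; the set-builder form and the use of T for the terminal alphabet are assumptions of this sketch, not necessarily the slide's original notation.

```latex
\mathrm{FIRST}(\alpha) = \{\, t \in T \mid \alpha \Rightarrow^{*} t\,\beta \,\}
  \;\cup\; \{\, \varepsilon \mid \alpha \Rightarrow^{*} \varepsilon \,\}

\mathrm{FOLLOW}(A) = \{\, t \in T \mid S \Rightarrow^{*} \beta\, A\, t\, \delta \,\}
  \;\cup\; \{\, \$ \mid S \Rightarrow^{*} \beta\, A \,\}
```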

20
Conditions for LL(1)
• Necessary conditions:
– no ambiguity
– no left recursion
– Left factored grammar
• A grammar G is LL(1) if - whenever
A → α | β
1. First(α) ∩ First(β) = ∅
2. α ⇒* ε implies !(β ⇒* ε)
3. α ⇒* ε implies First(β) ∩ Follow(A) = ∅
21
ComputeFirst(α: string of symbols)
// assume α = X1 X2 X3 … Xn
if X1 ∈ T then First[α] := {X1}
else begin
    i := 1; First[α] := ComputeFirst(X1) \ {ε};
    while i <= n and Xi ⇒* ε do begin
        if i < n then
            First[α] := First[α] ∪ ComputeFirst(Xi+1) \ {ε};
        else
            First[α] := First[α] ∪ {ε};
        i := i + 1;
    end
end
Recursion in computing FIRST causes problems when faced with
recursive grammar rules
22
ComputeFirst; modified
foreach X ∈ T do First[X] := {X};
foreach X ∈ N do First[X] := {};
foreach p ∈ P : X → ε do First[X] := {ε};
repeat foreach X ∈ N, p : X → Y1 Y2 Y3 … Yn do begin
    i := 1;
    while i <= n do begin
        First[X] := First[X] ∪ First[Yi] \ {ε};
        if not (Yi ⇒* ε) then break;   // stop at the first non-nullable Yi
        i := i + 1;
    end
    if i = n+1 then First[X] := First[X] ∪ {ε};
until no change in First[X] for any X;
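A runnable Python sketch of this fixed-point computation follows. The grammar encoding (a dict from non-terminal to a list of tuples, with `()` for ε) and the name `compute_first` are assumptions of the example; nullability of Yi is tested as ε ∈ First[Yi], which is what the iteration has established so far.

```python
EPS = "ε"

def compute_first(grammar, terminals):
    """grammar: dict non-terminal -> list of right-hand sides (tuples; () is ε)."""
    first = {t: {t} for t in terminals}
    first.update({nt: set() for nt in grammar})
    changed = True
    while changed:                         # repeat until no FIRST set grows
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                before = len(first[nt])
                all_nullable = True
                for sym in rhs:
                    first[nt] |= first[sym] - {EPS}
                    if EPS not in first[sym]:   # sym is not (yet) known nullable
                        all_nullable = False
                        break
                if all_nullable:           # every symbol nullable, or rhs is ε
                    first[nt].add(EPS)
                if len(first[nt]) != before:
                    changed = True
    return first

GRAMMAR = {
    "E": [("T", "X")],
    "X": [(), ("+", "E")],
    "T": [("(", "E", ")"), ("id", "Y")],
    "Y": [("*", "T"), ()],
}
FIRST = compute_first(GRAMMAR, {"+", "*", "(", ")", "id"})
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]))
```

Because the sets only ever grow and are bounded by the alphabet, the loop terminates even when a non-terminal appears at the start of its own right-hand side.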
23
ComputeFirst; modified
(Same algorithm as on the previous slide.)
• Non-recursive FIRST computation works with left-recursive grammars.
• Computes a fixed point for FIRST[X] for all non-terminals X in the grammar.
• But this algorithm is very inefficient.
24
First Sets
Productions
1 E → T X
2 X → ε
3 X → + E
4 T → ( E )
5 T → id Y
6 Y → * T
7 Y → ε

First(+) = {+}
First(*) = {*}
First( '(' ) = {'('}
First( ')' ) = {')'}
First(id) = {id}

First(E) = ?
First(T) ⊆ First(E)
First(T) = {id, '('}
First(E) = {id, '('}
First(X) = {+, ε}
First(Y) = {*, ε}

25
Follow Sets
• Algorithm sketch
1. Add $ to Follow(S)
2. For each production A ⟶ α X β
• Add First(β) – {ε} to Follow(X)
3. For each A ⟶ α X β where ε ∈ First(β) (this includes the case where β is empty)
• Add Follow(A) to Follow(X)
– Repeat steps 2-3 until no follow set grows

26
ComputeFollow
Follow[S] := {$};
repeat
    foreach p ∈ P do
        case p = A → α B β begin
            Follow[B] := Follow[B] ∪ ComputeFirst(β) \ {ε};
            if ε ∈ First(β) then
                Follow[B] := Follow[B] ∪ Follow[A];
        end
        case p = A → α B
            Follow[B] := Follow[B] ∪ Follow[A];
until no change in any Follow[N]
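The same iteration can be sketched in Python. To keep the block self-contained, the FIRST sets are typed in from the First Sets slide rather than computed; the helper `first_of_string` and the dict-based grammar encoding are assumptions of this example.

```python
EPS = "ε"

def first_of_string(symbols, first):
    """FIRST of a string of symbols, given FIRST of every single symbol."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)            # all symbols nullable, or the string is empty
    return result

def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:             # repeat until no FOLLOW set grows
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:        # only non-terminals get FOLLOW here
                        continue
                    tail = first_of_string(rhs[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:               # the rest can vanish: inherit Follow(lhs)
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

GRAMMAR = {
    "E": [("T", "X")],
    "X": [(), ("+", "E")],
    "T": [("(", "E", ")"), ("id", "Y")],
    "Y": [("*", "T"), ()],
}
FIRST = {
    "+": {"+"}, "*": {"*"}, "(": {"("}, ")": {")"}, "id": {"id"},
    "E": {"id", "("}, "T": {"id", "("}, "X": {"+", EPS}, "Y": {"*", EPS},
}
FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
for nt in GRAMMAR:
    print(nt, sorted(FOLLOW[nt]))
```

Treating an empty tail as nullable folds the two cases of the pseudocode (A → α B β and A → α B) into one code path.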
27
Follow Sets. Example
Productions
1 E → T X
2 X → ε
3 X → + E
4 T → ( E )
5 T → id Y
6 Y → * T
7 Y → ε

Constraints
Follow(E) ⊆ Follow(X)
Follow(X) ⊆ Follow(E)
First(X) – {ε} ⊆ Follow(T)
Follow(E) ⊆ Follow(T)
Follow(Y) ⊆ Follow(T)
Follow(T) ⊆ Follow(Y)

Results
Follow(E) = {$, )}
Follow(X) = {$, )}
Follow(T) = {+, $, )}
Follow(Y) = {+, $, )}
Follow( '(' ) = {(, id}
Follow( ')' ) = {+, $, )}
Follow(+) = {(, id}
Follow(*) = {(, id}
Follow(id) = {*, +, $, )}

28
Building the Parse Table
• Compute First and Follow sets
• For each production A → α
– For each t ∈ First(α)
• M[A,t] = α
– If ε ∈ First(α), for each t ∈ Follow(A)
• M[A,t] = α
– If ε ∈ First(α) and $ ∈ Follow(A)
• M[A,$] = α
– All undefined entries are errors
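As a minimal sketch of these rules, the following Python fills the table from hard-coded FIRST and FOLLOW sets (copied from the slides). The function name `build_table`, the `(A, t)` tuple keys, and the separate `conflicts` list are choices of this illustration; a non-empty conflict list would mean that some cell received two different productions.

```python
EPS = "ε"

def build_table(grammar, first, follow):
    """Return (table, conflicts); table[(A, t)] is the right-hand side to push."""
    table, conflicts = {}, []
    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            lookahead, nullable = set(), True
            for sym in rhs:                    # FIRST of the right-hand side
                lookahead |= first[sym] - {EPS}
                if EPS not in first[sym]:
                    nullable = False
                    break
            if nullable:                       # ε ∈ First(α): use Follow(A), incl. $
                lookahead |= follow[lhs]
            for t in lookahead:
                if (lhs, t) in table and table[(lhs, t)] != rhs:
                    conflicts.append((lhs, t))
                table[(lhs, t)] = rhs
    return table, conflicts

GRAMMAR = {
    "E": [("T", "X")],
    "X": [(), ("+", "E")],
    "T": [("(", "E", ")"), ("id", "Y")],
    "Y": [("*", "T"), ()],
}
FIRST = {
    "+": {"+"}, "*": {"*"}, "(": {"("}, ")": {")"}, "id": {"id"},
    "E": {"id", "("}, "T": {"id", "("}, "X": {"+", EPS}, "Y": {"*", EPS},
}
FOLLOW = {"E": {"$", ")"}, "X": {"$", ")"}, "T": {"+", "$", ")"}, "Y": {"+", "$", ")"}}

TABLE, CONFLICTS = build_table(GRAMMAR, FIRST, FOLLOW)
for (nt, tok), rhs in sorted(TABLE.items()):
    print(f"M[{nt}, {tok}] = {' '.join(rhs) or EPS}")
```

Entries that are never assigned stay absent from the dictionary, which corresponds to the blank (error) cells of the table.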
29
Predictive Parsing Table
First(E) = {id, '('}     Follow(E) = {$, )}
First(T) = {id, '('}     Follow(T) = {+, $, )}
First(X) = {+, ε}        Follow(X) = {$, )}
First(Y) = {*, ε}        Follow(Y) = {+, $, )}

Productions
1 E → T X
2 X → ε
3 X → + E
4 T → ( E )
5 T → id Y
6 Y → * T
7 Y → ε

     | +    | *    | (     | )  | id   | $
E    |      |      | T X   |    | T X  |
X    | + E  |      |       | ε  |      | ε
T    |      |      | ( E ) |    | id Y |
Y    | ε    | * T  |       | ε  |      | ε
30
Example First/Follow
S → A B
A → c | ε
B → cbB | ca
Not an LL(1) grammar

First(A) = {c, ε}
First(B) = {c}
First(cbB) = First(ca) = {c}
First(S) = {c}

Follow(A) = {c}
Follow(A) ∩ First(c) = {c}
Follow(B) = {$}
Follow(S) = {$}
31
Converting to LL(1)
S → A B
A → c | ε
B → cbB | ca

Note that grammar is regular: c? (cb)* ca

Regrouping the repeated part:
c (c b c b … c b) c a   =   c c (b c b … c b c) a
  (c b c b … c b) c a   =     c (b c b … c b c) a

same as: c c? (bc)* a

S → cAa
A → cB | B
B → bcB | ε
32
Verifying LL(1) using F/F sets
S → cAa
A → cB | B
B → bcB | ε

First(A) = {b, c, ε}    Follow(A) = {a}
First(B) = {b, ε}       Follow(B) = {a}
First(S) = {c}          Follow(S) = {$}

33
Building the Parse Table
• Compute First and Follow sets
• For each production A → α
– foreach a ∈ First(α) add A → α to M[A,a]
– If ε ∈ First(α) add A → α to M[A,b] for each b in
Follow(A)
– If ε ∈ First(α) add A → α to M[A,$] if $ ∈
Follow(A)
– All undefined entries are errors

34
Predictive Parsing Table
Productions
1 T → F T'
2 T' → ε
3 T' → * F T'
4 F → id
5 F → ( T )

FIRST(T) = {id, (}     FOLLOW(T) = {$, )}
FIRST(T') = {*, ε}     FOLLOW(T') = {$, )}
FIRST(F) = {id, (}     FOLLOW(F) = {*, $, )}

     | *            | (          | )       | id        | $
T    |              | T → F T'   |         | T → F T'  |
T'   | T' → * F T'  |            | T' → ε  |           | T' → ε
F    |              | F → ( T )  |         | F → id    |

35
Revisit conditions for LL(1)
• A grammar G is LL(1) iff - whenever
A → α | β
1. First(α) ∩ First(β) = ∅
2. α ⇒* ε implies !(β ⇒* ε)
3. α ⇒* ε implies First(β) ∩ Follow(A) = ∅
• No more than one entry per table field
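The three conditions can be checked pairwise, as in the following Python sketch. It is applied to the two grammars from the earlier example slides, with FIRST and FOLLOW typed in from those slides; the names `first_of` and `ll1_violations` and the reason strings are hypothetical.

```python
from itertools import combinations

EPS = "ε"

def first_of(rhs, first):
    """FIRST of a right-hand side, given FIRST of every single symbol."""
    out = set()
    for sym in rhs:
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    out.add(EPS)
    return out

def ll1_violations(grammar, first, follow):
    """List (A, α, β, reason) for every pair of alternatives breaking a condition."""
    problems = []
    for a, alternatives in grammar.items():
        for alpha, beta in combinations(alternatives, 2):
            fa, fb = first_of(alpha, first), first_of(beta, first)
            if (fa & fb) - {EPS}:                       # condition 1
                problems.append((a, alpha, beta, "FIRST/FIRST overlap"))
            if EPS in fa and EPS in fb:                 # condition 2
                problems.append((a, alpha, beta, "both alternatives nullable"))
            if EPS in fa and fb & follow[a]:            # condition 3
                problems.append((a, alpha, beta, "FIRST/FOLLOW overlap"))
            if EPS in fb and fa & follow[a]:            # condition 3, roles swapped
                problems.append((a, alpha, beta, "FIRST/FOLLOW overlap"))
    return problems

TERMINALS = {t: {t} for t in "abc"}

# S → A B, A → c | ε, B → cbB | ca (stated on the slides to be not LL(1))
G1 = {"S": [("A", "B")], "A": [("c",), ()], "B": [("c", "b", "B"), ("c", "a")]}
FIRST1 = {**TERMINALS, "S": {"c"}, "A": {"c", EPS}, "B": {"c"}}
FOLLOW1 = {"S": {"$"}, "A": {"c"}, "B": {"$"}}

# S → cAa, A → cB | B, B → bcB | ε (the converted grammar)
G2 = {"S": [("c", "A", "a")], "A": [("c", "B"), ("B",)], "B": [("b", "c", "B"), ()]}
FIRST2 = {**TERMINALS, "S": {"c"}, "A": {"b", "c", EPS}, "B": {"b", EPS}}
FOLLOW2 = {"S": {"$"}, "A": {"a"}, "B": {"a"}}

bad = ll1_violations(G1, FIRST1, FOLLOW1)
good = ll1_violations(G2, FIRST2, FOLLOW2)
print(bad)
print(good)
```

Each reported violation corresponds to a table cell that would receive more than one production.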

36
Error Handling
• Reporting & Recovery
– Report as soon as possible
– Suitable error messages
– Resume after error
– Avoid cascading errors
• Phrase-level vs. Panic-mode recovery

37
Panic-Mode Recovery
• Skip tokens until synchronizing set is seen
– Follow(A)
• garbage or missing things after
– Higher-level start symbols
– First(A)
• garbage before
– Epsilon
• if nullable
– Pop/Insert terminal
• “auto-insert”
• Add “synch” actions to table
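One common way to realise panic-mode recovery in a table-driven parser is sketched below. The slides do not fix a concrete scheme, so the policy here is an assumption: use Follow(A) as the synchronizing set, skip tokens outside it, and treat a mismatched terminal on the stack as if it had been inserted. The table and FOLLOW sets are those of the expression grammar; `parse_with_recovery` is a hypothetical name.

```python
EPS = ()

TABLE = {
    "E": {"(": ("T", "X"), "id": ("T", "X")},
    "X": {"+": ("+", "E"), ")": EPS, "$": EPS},
    "T": {"(": ("(", "E", ")"), "id": ("id", "Y")},
    "Y": {"+": EPS, "*": ("*", "T"), ")": EPS, "$": EPS},
}
FOLLOW = {"E": {"$", ")"}, "X": {"$", ")"}, "T": {"+", "$", ")"}, "Y": {"+", "$", ")"}}

def parse_with_recovery(tokens, table=TABLE, follow=FOLLOW, start="E"):
    """Parse to the end of input and return the list of error messages."""
    tokens = list(tokens) + ["$"]
    pos, stack, errors = 0, ["$", start], []
    while True:
        top, a = stack[-1], tokens[pos]
        if top == a:
            if a == "$":
                return errors
            stack.pop()
            pos += 1
        elif top in table:                       # non-terminal on top
            if a in table[top]:
                stack.pop()
                stack.extend(reversed(table[top][a]))
            elif a in follow[top] or a == "$":   # synch: give up on this non-terminal
                errors.append(f"pos {pos}: missing {top}")
                stack.pop()
            else:                                # garbage before: skip the token
                errors.append(f"pos {pos}: skipping {a!r}")
                pos += 1
        elif top == "$":                         # input left over after the start symbol
            errors.append(f"pos {pos}: skipping {a!r}")
            pos += 1
        else:                                    # terminal mismatch: pretend it was there
            errors.append(f"pos {pos}: inserted {top!r}")
            stack.pop()

print(parse_with_recovery(["id", "+", "*", "id"]))
print(parse_with_recovery(["(", "id"]))
```

Every branch either consumes a token, shrinks the stack, or expands a non-terminal by a table entry, so the loop always reaches the end of the input and reports each problem once rather than stopping at the first error.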

38
Summary so far
• LL(1) grammars, necessary conditions
• No left recursion
• Left-factored
• Not all languages can be generated by an LL(1)
grammar
• LL(1) – Parsing: O(n) time complexity
– recursive-descent and table-driven predictive parsing
• LL(1) grammars can be parsed by simple
predictive recursive-descent parser
– Alternative: table-driven top-down parser
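For comparison with the table-driven version, here is a hedged sketch of a predictive recursive-descent recognizer for the same expression grammar, with one method per non-terminal. The class and method names are choices of this example; the lookahead tests mirror the FIRST and FOLLOW sets from the slides.

```python
class Parser:
    """Predictive recursive descent for E → T X, X → + E | ε, T → ( E ) | id Y, Y → * T | ε."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]
        self.pos = 0

    @property
    def look(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.look != expected:
            raise SyntaxError(f"expected {expected!r}, got {self.look!r}")
        self.pos += 1

    def E(self):                               # E → T X
        self.T()
        self.X()

    def X(self):                               # X → + E | ε
        if self.look == "+":
            self.match("+")
            self.E()
        elif self.look not in (")", "$"):      # ε only on Follow(X)
            raise SyntaxError(f"unexpected {self.look!r}")

    def T(self):                               # T → ( E ) | id Y
        if self.look == "(":
            self.match("(")
            self.E()
            self.match(")")
        elif self.look == "id":
            self.match("id")
            self.Y()
        else:
            raise SyntaxError(f"unexpected {self.look!r}")

    def Y(self):                               # Y → * T | ε
        if self.look == "*":
            self.match("*")
            self.T()
        elif self.look not in ("+", ")", "$"): # ε only on Follow(Y)
            raise SyntaxError(f"unexpected {self.look!r}")

def accepts(tokens):
    parser = Parser(tokens)
    try:
        parser.E()
        parser.match("$")                      # all input must be consumed
        return True
    except SyntaxError:
        return False

print(accepts(["id", "*", "id"]))
print(accepts(["id", "id"]))
```

No backtracking is needed: each method decides which production to use from the single lookahead token, and the call stack plays the role of the explicit parser stack.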
39
Extra Slides

40
ComputeFirst on Left-recursive Grammars
• ComputeFirst as defined earlier loops on left-
recursive grammars
• Here is an alternative algorithm for
ComputeFirst
1. Compute non left-recursive cases of FIRST
2. Create a graph of recursive cases where FIRST of a
non-terminal depends on another non-terminal
3. Compute Strongly Connected Components (SCC)
4. Compute FIRST starting from root of SCC to avoid
cycles

41
ComputeFirst on Left-recursive Grammars
• Each Strongly Connected Component can have
recursion
• But the connections between SCCs form (by definition)
a directed acyclic graph, hence one without left
recursion
• Unlike top-down LL parsing, bottom-up LR
parsing allows left-recursive grammars, so this
algorithm is useful for LR parsing

42
ComputeFirst on Left-recursive Grammars
• S → BD | D
• D → d | Sd
• B → Ab | b
• A → CB | a
• C → Bb | ε

FIRST0[A] := {a}
FIRST0[C] := {}
FIRST0[B] := {b}
FIRST0[S] := {b, d}
FIRST0[D] := {d}

Compute Strongly Connected Components
[Figure: dependency graph over the non-terminals S, D, C, B, A]
2 SCCs: e.g. consider B-A-C
FIRST[B] := FIRST0[B] ∪ ComputeFirst(A)
FIRST[A] := FIRST0[A] ∪ ComputeFirst(C)
FIRST[A] := FIRST[A] ∪ FIRST0[B]
FIRST[C] := FIRST0[C] ∪ FIRST0[B]
FIRST[C] := FIRST[C] ∪ {ε}
43
Examples
Grammar 1:
S → A B C
A → a | ε
B → b B | ε
C → c | ε
Is this LL(1)?

Grammar 2:
S → F
F → A ( B ) | B A
A → x | y
B → a B | b B | ε
Is this LL(1)?

44
Transition Diagram
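S → cAa
A → cB | B
B → bcB | ε
[Figure: one transition diagram per non-terminal (S, A, B), with edges labeled by the symbols of each right-hand side, e.g. c, A, a for S]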

45
