0% found this document useful (0 votes)
3 views48 pages

Compiler 9

The document discusses top-down parsing techniques, specifically focusing on recursive descent parsers and LL(1) parsers. It explains the construction of parse trees, predictive parsing algorithms, and the handling of left recursion in grammars. Additionally, it includes examples and homework problems related to constructing LL(1) parsing tables and analyzing grammars.

Uploaded by

tmtsmbdalbary
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views48 pages

Compiler 9

The document discusses top-down parsing techniques, specifically focusing on recursive descent parsers and LL(1) parsers. It explains the construction of parse trees, predictive parsing algorithms, and the handling of left recursion in grammars. Additionally, it includes examples and homework problems related to constructing LL(1) parsing tables and analyzing grammars.

Uploaded by

tmtsmbdalbary
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 48

‫جامعة إفريقيا العالمية‬

‫كلية دراسات الحاسوب‬

‫تصميم‬
‫المترجمات‬
‫حسب (‪)412‬‬
:Lecture 9
Top-Down Parsing

Source code Front-End IR Object code


Lexical Syntax
Back-End
AnalysisAnalysis
Recursive Descent Parser
• Consider the grammar:
S→cAd
A → ab | a

The input string is “cad”

3
Recursive Descent Parser
(Cont.)
• Build parse tree:
step 1. From start symbol.
S

c A d

4
Recursive Descent Parser
(Cont.)
Step 2. We expand A using the first
alternative A → ab to obtain the following
tree:
S

c A d

a b
5
Recursive Descent Parser
(Cont.)
• Now, we have a match for the second input
symbol “a”, so we advance the input pointer to
“d”, the third input symbol, and compare d
against the next leaf “b”.

• Backtracking
– Since “b” does not match “d”, we report failure and go
back to A to see whether there is another alternative
for A that has not been tried - that might produce a
match!
– In going back to A, we must reset the input pointer to
“a”.
6
Recursive Descent Parser
(Cont.)
Step 3.

c A d

7
Creating a top-down parser
• Top-down parsing can be viewed as the
problem of constructing a parse tree for the
input string, starting form the root and
creating the nodes of the parse tree in
preorder.

• An example follows.

8
Creating a top-down parser
(Cont.)
• Given the grammar :
– E → TE’
– E’ → +TE’ | λ
– T → FT’
– T’ → *FT’ | λ
– F → (E) | id
• The input: id + id * id

9
Creating a top-down parser
(Cont.)

10
Top-down parsing

• A top-down parsing program consists of a set of


procedures, one for each non-terminal.

• Execution begins with the procedure for the start


symbol, which halts and announces success if its
procedure body scans the entire input string.

11
Top-down parsing
A typical procedure for non-terminal A in a top-down parser:

boolean A() {
choose an A-production, A → X1 X2 … Xk;
for (i= 1 to k) {
if (Xi is a non-terminal)
call procedure Xi();
else if (Xi matches the current input token “a”)
advance the input to the next token;
else /* an error has occurred */;
}
}

12
Top-down parsing
• Given a grammar:

input → expression
expression → term rest_expression
term → ID | parenthesized_expression
parenthesized_expression → ‘(‘ expression ‘)’
rest_expression → ‘+’ expression | λ

13
Top-down parsing
• For example:
input:

ID + (ID + ID)

14
Top-down parsing
Build parse tree:
start from start symbol to invoke:
int input (void)
input

expression $

Next, invoke expression()


15
Top-down parsing
input

expression $

term rest_expression

Next, invoke term()

16
Top-down parsing
input

expression $

term rest_expression

ID

select term → ID (matching input string “ID”)


17
Top-down parsing
Invoke rest_expression()
input

expression $

term rest_expression

ID + expression
18
Top-down parsing
The parse tree is:
input

expression $

term rest_expression

ID + expression
term rest_expression

parenthesized_expression λ

( expression )

term rest_expression

ID + expression

term rest_expression

ID λ 19
LL(1) Parsers
• The class of grammars for which we can
construct predictive parsers looking k symbols
ahead in the input is called the LL(k) class.

• Predictive parsers, that is, recursive-descent


parsers without backtracking, can be constructed
for the LL(1) class grammars.

• The first “L” stands for scanning input from left to


right. The second “L” for producing a leftmost
derivation. The “1” for using one input symbol of
look-ahead at each step to make parsing
decisions.
20
LL(1) Parsers (Cont.)
A → α | β are two distinct productions of
grammar G, G is LL(1) if the following 3
conditions hold:

1. FIRST(α) cannot contain any terminal in FIRST(β).


2. At most one of α and β can derive λ.
3. if β →* λ, FIRST(α) cannot contain
any terminal in FOLLOW(A).
if α →* λ, FIRST(β) cannot contain
any terminal in FOLLOW(A).
21
Construction of a predictive
parsing table
• The following rules are used to construct
the predictive parsing table:
– 1. for each terminal a in FIRST(α),
add A → α to matrix M[A,a]

– 2. if λ is in FIRST(α), then
for each terminal b in FOLLOW(A),
add A → α to matrix M[A,b]

22
LL(1) Parsers (Cont.)
Given the grammar:
input → expression 1
expression → term rest_expression 2
term → ID 3
| parenthesized_expression 4
parenthesized_expression → ‘(‘ expression ‘)’ 5
rest_expression → ‘+’ expression 6
| λ 7

Build the parsing table.

23
LL(1) Parsers (Cont.)

FIRST (input) = FIRST(expression)


=FIRST (term) = {ID, ‘(‘ }
FIRST (parenthesized_expression) ={ ‘(‘ }
FIRST (rest_expression) ={ ‘+’ λ}

FOLLOW (input) = {$ }
FOLLOW (expression) = {$ ‘)’ }
FOLLOW (term) =
FOLLOW (parenthesized_expression) = {$ ‘+’ ‘)’}
FOLLOW (rest_expression) = {$ ‘)’}

24
LL(1) Parsers (Cont.)
Non-terminal Input symbol
ID + ( ) $
Input 1 1
Expression 2 2
Term 3 4
parenthesized_e 5
xpression
rest_expression 6 7 7

25
Model of a table-driven
predictive parser

26
Predictive parsing algorithm
Set input pointer (ip) to the first token a;
Push $ and start symbol to the stack.
Set X to the top stack symbol;
while (X != $) { /*stack is not empty*/
if (X is token a) pop the stack and advance ip;
else if (X is another token) error();
else if (M[X,a] is an error entry) error();
else if (M[X,a] = X → Y1Y2…Yk) {
output the production X → Y1Y2…Yk;
pop the stack; /* pop X */
/* leftmost derivation*/
push Yk,Yk-1,…, Y1 onto the stack, with Y1 on top;
}
set X to the top stack symbol Y1;
} // end while 27
LL(1) Parsers (Cont.)
• Given the grammar:
– E → TE’ 1
– E’ → +TE’ 2
– E’ → λ 3
– T → FT’ 4
– T’ → *FT’ 5
– T’ → λ 6
– F → (E) 7
– F → id 8
28
LL(1) Parsers (Cont.)
FIRST(F) = FIRST(T) = FIRST(E) = { ( , id }
FIRST(E’) = { + , λ}
FIRST(T’) = { ﹡, λ}

FOLLOW(E) = FOLLOW(E’) = { ) , $ }
FOLLOW(T) = FOLLOW(T’) = { + , ) , $ }
FOLLOW(F) ={+, *, ),
$}
29
LL(1) Parsers (Cont.)
Non- Input symbols
terminal
Id + * ( ) $

E 1 1
E’ 2 3 3
T 4 4
T’ 6 5 6 6
F 8 7 30
LL(1) Parsers (Cont.)
Stack Input Output
$E id + id * id $
$E’T id + id * id $ E → TE’
$E’T’F id + id * id $ T → FT’
$E’T’id id + id * id $ F → id
$E’T’ + id * id $ match id
$E’ + id * id $ T’ → λ
$E’T+ + id * id $ E’ → +TE’
31
LL(1) Parsers (Cont.)
Stack Input Output
$E’T id * id $ match +
$E’T’F id * id $ T → FT’
$E’T’id id * id $ F → id
$E’T’ * id $ match id
$E’T’F* * id $ T’ → *FT’
$E’T’F id $ match *
$E’T’id id $ F → id
$E’T’ $ match id
$E’ $ T’ → λ
32
$ $ E’ → λ
Common Prefix
In Fig. 5.12, the common prefix:
if Expr then StmtList (R1,R2)
makes looking ahead to distinguish R1 from
R2 hard.

Just use Fig. 5.13 to factor it and “var”(R5,6)


The resulting grammar is in Fig. 5.14.

33
Left Recursion
• A production is left recursive
if its LHS symbol is the first symbol of
its RHS.

• In fig. 5.14, the production


StmtList→ StmtList ; Stmt
StmtList is left-recursion.

36
Left Recursion (Cont.)

37
Left Recursion (Cont.)
• Grammars with left-recursive productions
can never be LL(1).
– Some look-ahead symbol t predicts the
application of the left-recursive production
A → Aβ.
with recursive-descent parsing, the application
of this production will cause
procedure A to be invoked infinitely.
Thus, we must eliminate left-recursion.

38
Left Recursion (Cont.)
Consider the following left-recursive rules.
1. A → A α
2. | β the rules produce strings like β α α

we can change the grammar to:


1. A → X Y
2. X → β
3. Y → α Y
4. | λ the rules also produce strings like β α α

The EliminateLeftRecursion algorithm is shown in fig. 5.15.


Applying it to the grammar in fig. 5.14 results in fig. 5.16.

39
Left Recursion (Cont.)

3
4
5

40
Left Recursion (Cont.)
Now, we trace the algorithm with the grammar below:
(4) StmtList → StmtList ; Stmt
(5) | Stmt

first, the input is (4) StmtList → StmtList ; Stmt


because RHS(4) = StmtList α it is left-recursive (marker 1)
create two non-terminals X, and Y
for rule (4) (marker 2)
as StmtList = StmtList,
create StmtList → XY (marker 3)
for rule (5) (marker 2)
as StmtList != Stmt
create X → Stmt (marker 4)
finally, create Y → ; Stmt and Y → λ (marker 5)
Left Recursion (Cont.)

42
Homework 1
Construct the LL(1) table for the following
grammar:
1 Expr → - Expr
2 Expr → (Expr)
3 Expr → Var ExprTail
4 ExprTail → - Expr
5 ExprTail → λ
6 Var → id VarTail
7 VarTail → (Expr)
8 VarTail → λ
Homework 1 Solution
First(Expr) = {-, (, id}
First(ExprTail) = {-, λ}
First (Var) ={ id}
First (VarTail) = { (, λ}

Follow (Expr) = Follow (ExprTail) = {$, ) }


Follow (Var) = {$, ), -}
Follow (VarTail) = {$, ), -}
Homework 1 Solution (Cont.)
Non- Input Symbol
Terminal
- ( id ) $

Expr 1 2 3

ExprTail 4 5 5

Var 6

VarTail 8 7 8 8
Homework 2
• Given the grammar:
– S → i E t S S’ | a
– S’ → e S | λ
– E→b

– 1. Find the first set and follow set.


– 2. Build the parsing table.

46
Homework 2 Solution

First(S) = {i, a}
First(S’) = {e, λ}
First (E) = {b}

Follow (S) = Follow (S’) = {$, e}


Follow (E) = {t}

47
Homework 2 Solution (Cont.)

Non- Input Symbol


Terminal a b e i t $

S 2 1
S’ 3/4 4
E 5

As First(S’) contains λ and Follow (S’) = {$, e} So rule 4 is added to e, $.


3/4 (rule 3 or 4) means an error. This is not LL(1) grammar.

48

You might also like