Chapter 3 - Syntax Analysis
By Esubalew Alemneh
Contents (Session-1)
Introduction
Context-free grammar
Derivation
Parse Tree
Ambiguity
Resolving Ambiguity
Left Factoring
Non-Context Free Language Constructs
Introduction
The parser checks whether the source program can be generated by the grammar,
by detecting whether the program is written following the
grammar rules.
Reports syntax errors, attempts error correction and
recovery
Collects information into symbol tables
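Introduction
[Figure: position of the parser in the compiler. The lexical analyzer reads the source program and returns a token each time the parser requests one; the parser builds the parse tree and passes it to the rest of the front end, which produces intermediate code. The lexical analyzer, the parser and the rest of the front end all use the symbol table, and the parser reports errors.]
Parsers can be
Top-down or Bottom-up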
where
Vt = {+, -, *, /, (, ), id}, Vn = {E},
S = E (the start symbol),
and the productions are as shown above.
programming languages.
Derivation
A sequence of replacements of non-terminal symbols is called a derivation.
Derivation
Derive the string -(id+id) from G1
Parse Tree
A parse tree can be seen as a graphical
representation of a derivation
Inner nodes of a parse tree are non-terminal
symbols.
The leaves of a parse tree are terminal symbols.
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
[Figure: the parse tree for -(id+id), grown one step at a time alongside the derivation above]
Ambiguity
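An ambiguous grammar is one that produces more than one parse tree for some sentence.
Using the productions E → E+E, E → E*E and E → id, the sentence id+id*id has two left-most derivations:
E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
[Figure: the two parse trees for id+id*id, one with + at the root and one with * at the root]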
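One way to see the ambiguity concretely is to count parse trees. The sketch below assumes the simplified grammar E → E+E | E*E | id (only part of G1) and a hypothetical helper name, count_parse_trees; it is an illustration, not a general ambiguity test.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E -> E+E | E*E | id (assumed grammar)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct ways tokens[i:j] can be derived from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "*", "id"]))
```

A count greater than 1 for any sentence shows that the grammar is ambiguous.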
Ambiguity
For most parsers, the grammar must be
unambiguous.
If a grammar is unambiguous, there is a
unique parse tree for each sentence.
We should eliminate the ambiguity in the grammar
during the design phase of the compiler.
An unambiguous grammar should be written to
eliminate the ambiguity.
To disambiguate a grammar, we pick the preferred parse tree
of each sentence generated by the ambiguous grammar
and restrict the grammar to that
choice.
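Ambiguity: Dangling If
stmt → if expr then stmt
     | if expr then stmt else stmt
     | otherstmts
The sentence
if E1 then if E2 then S1 else S2
has two parse trees:
1. else attached to the inner if: if E1 then (if E2 then S1 else S2)
2. else attached to the outer if: if E1 then (if E2 then S1) else S2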
Resolving Ambiguity
Option 1: add a meta-rule, e.g. precedence and
associativity rules
For example, else associates with the closest previous if
works, keeps original grammar intact
ad hoc and informal
Option 2: rewrite the grammar to resolve the ambiguity
explicitly
stmt → matchedstmt | unmatchedstmt
matchedstmt → if expr then matchedstmt else matchedstmt |
otherstmts
unmatchedstmt → if expr then stmt |
if expr then matchedstmt else unmatchedstmt
formal, no additional rules beyond syntax
sometimes obscures original grammar
Resolving Ambiguity
Option 3: redesign the language to remove
the ambiguity
Stmt ::= ... |
if Expr then Stmt end |
if Expr then Stmt else Stmt end
formal, clear, elegant
allows sequence of Stmts in then and else
branches, no { , } needed
extra end required for every if
Left Recursion
A grammar is left recursive if it has a non-terminal A such that
there is a derivation A ⇒+ Aα for some string α.
Top-down parsing techniques cannot handle left-recursive grammars.
So, we have to convert our left-recursive grammar
into an equivalent grammar which is not left-recursive.
Two types of left-recursion
Immediate left-recursion: appears in a single step of the
derivation (A → Aα).
Indirect left-recursion: appears in more than one step of the
derivation.
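Immediate Left-Recursion
A → Aα | β     where β does not start with A
is replaced by an equivalent grammar
A → βA'
A' → αA' | ε
In general,
A → Aα1 | ... | Aαm | β1 | ... | βn     where β1 ... βn do not start with A
is replaced by an equivalent grammar
A → β1A' | ... | βnA'
A' → α1A' | ... | αmA' | ε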
Example: eliminate the left-recursion from the grammar below
E → E+T | T
T → T*F | F
F → id | (E)
Answer
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → id | (E)
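The transformation for immediate left-recursion can be sketched in a few lines. The representation here is an assumption made for illustration: each alternative is a list of symbols, the empty list stands for ε, and the helper name eliminate_immediate_left_recursion is hypothetical.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b A', A' -> a A' | epsilon.

    `alternatives` is a list of right-hand sides, each a list of symbols;
    the empty list [] stands for epsilon.
    """
    # alpha parts: what follows the leading A in left-recursive alternatives
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # beta parts: alternatives that do not start with A
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}
    new_nt = nonterminal + "'"
    return {
        nonterminal: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }


grammar = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["id"], ["(", "E", ")"]],
}
result = {}
for nt, alts in grammar.items():
    result.update(eliminate_immediate_left_recursion(nt, alts))
for nt, alts in result.items():
    print(nt, "->", " | ".join(" ".join(alt) or "ε" for alt in alts))
```

This handles only immediate left-recursion; indirect left-recursion needs the substitution step first.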
Indirect Left-Recursion
A grammar may not be immediately left-recursive,
but it can still be left-recursive.
S → Aa | b
A → Ac | Sd | f
- Order of non-terminals: S = A1, A = A2
A1 → A2 a | b
A2 → A2 c | A1 d | f
The only production with j<i is A2 → A1 d
- Replace it with A2 → A2 ad | bd
A2 → A2 c | A2 ad | bd | f
- Eliminate the immediate left-recursion in A2
A2 → bdA2' | fA2'
A2' → cA2' | adA2' | ε
Left Factoring
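A predictive parser (a top-down parser without
backtracking) insists that the grammar be left-factored.
If two alternatives of A share a common prefix α,
A → αβ1 | αβ2
then on seeing input derived from α we cannot tell whether to expand
A to αβ1
or
A to αβ2
Left Factoring
But, if we re-write the grammar as follows
A → αA'
A' → β1 | β2
we can immediately expand A to αA'.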
Left Factoring
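Example1
After factoring out the common prefix a:
A → aA' | cdg | cdeB | cdfB
A' → bB | B
After factoring out the common prefix cd:
A → aA' | cdA''
A' → bB | B
A'' → g | eB | fB
Example2
After factoring out the common prefix a:
A → aA' | b
A' → d | ε | b | bc
After factoring out the common prefix b in A':
A → aA' | b
A' → d | ε | bA''
A'' → ε | c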
Contents(Session-2)
Top Down Parsing
Recursive-Descent Parsing
Predictive Parser
Recursive Predictive Parsing
Non-Recursive Predictive Parsing
A top-down parser starts with the root of the parse tree, labeled with
the start symbol, and repeats the following steps until the fringe
of the parse tree matches the input string
1. At a node labeled A, select a production with A on its LHS
and, for each symbol on its RHS, construct the appropriate
child
2. When a terminal is added to the fringe that doesn't match
the input string, backtrack
3. Find the next node to be expanded
Minimize the number of backtracks as much as
possible
Recursive-Descent Parsing
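Backtracking is needed (if a choice of a production rule
does not work, we backtrack and try other alternatives).
It tries to find the left-most derivation.
Example
S → aBc
B → bc | b
input: abc
With input abc, the parser first tries B → bc, fails to match the final c, then backtracks and succeeds with B → b.
A left-recursive grammar can cause a recursive-descent parser to go into an infinite loop.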
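The backtracking behaviour on the example grammar can be sketched with Python generators: each alternative is tried in order, and a failed continuation simply falls through to the next alternative. The function names parse and accepts and the dictionary layout are illustrative assumptions.

```python
GRAMMAR = {
    "S": [["a", "B", "c"]],
    "B": [["b", "c"], ["b"]],
}


def parse(symbols, text, pos):
    """Yield every input position reachable after deriving `symbols` from text[pos:]."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Non-terminal: try each alternative in order; moving on to the
        # next alternative after a failure is the backtracking step.
        for alternative in GRAMMAR[head]:
            for mid in parse(alternative, text, pos):
                yield from parse(rest, text, mid)
    elif pos < len(text) and text[pos] == head:
        # Terminal: it must match the current input symbol.
        yield from parse(rest, text, pos + 1)


def accepts(text):
    return any(end == len(text) for end in parse(["S"], text, 0))


print(accepts("abc"))
```

On a left-recursive grammar this sketch would recurse without consuming input, which is the infinite loop mentioned above.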
Predictive Parser
A grammar → eliminate left recursion → left factor → a grammar suitable for predictive parsing (an LL(1) grammar);
no 100% guarantee.
When re-writing a non-terminal in a derivation step, a
predictive parser can uniquely choose a production
rule by just looking at the current symbol in the input
string.
stmt → if ......
     | while ......
     | begin ......
     | for ......
Note: When we are trying to write the non-terminal stmt, we can
uniquely choose the production rule by just looking at the current token.
Each non-terminal corresponds to a procedure/function.
Example
A → aBb | bAB
proc A {
  case of the current token {
    a: - match the current token with a, and move to the next token;
       - call B;
       - match the current token with b, and move to the next token;
    b: - match the current token with b, and move to the next token;
       - call A;
       - call B;
  }
}
A → aA | bB | ε
If all other productions fail, we should apply an ε-production.
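The procedure for A → aBb | bAB can be turned into runnable code as follows. The slides do not give the productions for B, so this sketch assumes a hypothetical rule B → c purely to make the example complete; the class and function names are also illustrative.

```python
class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # $ marks the end of input
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, expected):
        # Match the current token and move to the next token.
        if self.current != expected:
            raise ParseError(f"expected {expected!r}, found {self.current!r}")
        self.pos += 1

    def A(self):
        # A -> aBb | bAB, chosen by looking at the current token only.
        if self.current == "a":
            self.match("a")
            self.B()
            self.match("b")
        elif self.current == "b":
            self.match("b")
            self.A()
            self.B()
        else:
            raise ParseError(f"unexpected token {self.current!r} in A")

    def B(self):
        # Hypothetical production B -> c (not given in the slides).
        self.match("c")


def parse(tokens):
    parser = Parser(tokens)
    parser.A()
    parser.match("$")
    return True


print(parse("acb"))
```

No backtracking is needed because the first token of each alternative of A is different.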
Model of a table-driven
predictive parser
Output
a production rule representing a step of the derivation
Stack
contains the grammar symbols
at the bottom of the stack, there is a special end marker
symbol $.
initially the stack contains only the symbol $ and the starting
symbol S.
when the stack is emptied (i.e. only $ left in the stack), the
parsing is completed.
Parsing table
a two-dimensional array M[A,a]
LL(1) Parser Example
S → aBa
B → bB | ε
LL(1) Parsing Table
        a           b           $
S       S → aBa
B       B → ε       B → bB

stack     input     output
$S        abba$     S → aBa
$aBa      abba$
$aB       bba$      B → bB
$aBb      bba$
$aB       ba$       B → bB
$aBb      ba$
$aB       a$        B → ε
$a        a$
We will see how to construct the parsing table very soon.
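The stack trace above can be reproduced with a small table-driven driver. This is a minimal sketch: the table is hard-coded for S → aBa, B → bB | ε, and the names TABLE and ll1_parse are assumptions for illustration.

```python
# Parsing table M[A, a] for S -> aBa, B -> bB | epsilon ([] stands for epsilon).
TABLE = {
    ("S", "a"): ["a", "B", "a"],
    ("B", "b"): ["b", "B"],
    ("B", "a"): [],
}
NONTERMINALS = {"S", "B"}


def ll1_parse(tokens, start="S"):
    """Return the list of productions applied, or raise SyntaxError."""
    input_ = list(tokens) + ["$"]
    stack = ["$", start]  # $ at the bottom, start symbol on top
    pos = 0
    output = []
    while True:
        top, look = stack[-1], input_[pos]
        if top == "$" and look == "$":
            return output  # only $ left on the stack: parsing is completed
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                raise SyntaxError(f"no rule for M[{top},{look}]")
            stack.pop()
            stack.extend(reversed(rhs))  # leftmost RHS symbol ends up on top
            output.append(f"{top} -> {' '.join(rhs) or 'ε'}")
        elif top == look:
            stack.pop()  # terminal on top matches the input symbol
            pos += 1
        else:
            raise SyntaxError(f"expected {top!r}, found {look!r}")


print(ll1_parse("abba"))
```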
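E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
E is start symbol
        id          +            *            (           )          $
E       E → TE'                               E → TE'
E'                  E' → +TE'                             E' → ε     E' → ε
T       T → FT'                               T → FT'
T'                  T' → ε       T' → *FT'                T' → ε     T' → ε
F       F → id                                F → (E)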
if S Aa
1. If X is a terminal symbol, then FIRST(X)={X}
2. If X is ε, then FIRST(X)={ε}
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
From Rule 1
FIRST(id) = {id}
From Rule 2
FIRST(ε) = {ε}
From Rules 3 and 4
FIRST(F) = {(, id}
FIRST(T') = {*, ε}
FIRST(T) = {(, id}
FIRST(E') = {+, ε}
FIRST(E) = {(, id}
Others
FIRST(TE') = {(, id}
FIRST(+TE') = {+}
FIRST(FT') = {(, id}
FIRST(*FT') = {*}
FIRST((E)) = {(}
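The FIRST sets listed above can be checked with a fixed-point computation. This is a sketch under stated assumptions: the grammar is written as a Python dictionary, ε is represented by the string "ε", and the helper names are hypothetical.

```python
EPS = "ε"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols, given FIRST of the non-terminals."""
    result = set()
    for sym in symbols:
        # A terminal (or ε) is its own FIRST set.
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol can derive ε
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # repeat until no FIRST set grows
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for alt in alternatives:
                new = first_of_string(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


for nt, symbols in compute_first().items():
    print(f"FIRST({nt}) =", sorted(symbols))
```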
anything
If ( A → αB is a production rule )
or ( A → αBβ is
a production rule and ε is in FIRST(β) ), then
everything in FOLLOW(A) is in FOLLOW(B).
i. E → TE'
ii. E' → +TE' | ε
iii. T → FT'
iv. T' → *FT' | ε
v. F → (E) | id
FOLLOW(E) = { $, ) }, because
i. E is the start symbol
ii. ) follows E in F → (E)
FOLLOW(E') = FOLLOW(E) = { $, ) } (Rule 3)
FOLLOW(T) = { +, ), $ }
From Rule 2, + is in FOLLOW(T)
From Rule 3, everything in FOLLOW(E) is in FOLLOW(T), since
FIRST(E') contains ε.
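The FOLLOW rules can be applied mechanically until nothing changes. The sketch below hard-codes the FIRST sets from the earlier example rather than recomputing them; the dictionary layout and function names are assumptions for illustration.

```python
EPS = "ε"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F": [["(", "E", ")"], ["id"]],
}
# FIRST sets of the non-terminals, taken from the worked example.
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}


def first_of_string(symbols):
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})  # a terminal is its own FIRST
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # empty string, or every symbol derives ε
    return result


def compute_follow(start="E"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("$")  # $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in GRAMMAR.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in GRAMMAR:
                        continue  # FOLLOW is defined for non-terminals only
                    beta_first = first_of_string(alt[i + 1:])
                    new = beta_first - {EPS}  # FIRST(β) without ε
                    if EPS in beta_first:     # β is empty or derives ε
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


for nt, symbols in compute_follow().items():
    print(f"FOLLOW({nt}) =", sorted(symbols))
```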
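For each production rule A → α of the grammar:
1. For each terminal a in FIRST(α),
add A → α to M[A,a]
2. If ε is in FIRST(α), for each terminal a in FOLLOW(A), add A → α to M[A,a]
E → TE'      FIRST(TE')={(,id}     E → TE' into M[E,(] and M[E,id]
E' → +TE'    FIRST(+TE')={+}       E' → +TE' into M[E',+]
E' → ε       FIRST(ε)={ε}          none
             but since ε is in FIRST(ε)
             and FOLLOW(E')={$,)}  E' → ε into M[E',$] and M[E',)]
T → FT'      FIRST(FT')={(,id}     T → FT' into M[T,(] and M[T,id]
T' → *FT'    FIRST(*FT')={*}       T' → *FT' into M[T',*]
T' → ε       FIRST(ε)={ε}          none
             but since ε is in FIRST(ε)
             and FOLLOW(T')={$,),+}  T' → ε into M[T',$], M[T',)]
             & M[T',+]
F → (E)      FIRST((E))={(}        F → (E) into M[F,(]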
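The two table-filling rules can be sketched directly in code. The FIRST and FOLLOW sets are hard-coded here; FOLLOW(F) = {*, +, ), $} is not shown in the slides and is supplied as an assumption computed from the same rules. Names such as build_table and is_ll1 are hypothetical.

```python
EPS = "ε"
PRODUCTIONS = [
    ("E", ["T", "E'"]),
    ("E'", ["+", "T", "E'"]), ("E'", [EPS]),
    ("T", ["F", "T'"]),
    ("T'", ["*", "F", "T'"]), ("T'", [EPS]),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}
FOLLOW = {
    "E": {"$", ")"}, "E'": {"$", ")"},
    "T": {"+", "$", ")"}, "T'": {"+", "$", ")"},
    "F": {"*", "+", "$", ")"},  # assumption: not listed in the slides
}


def first_of_string(symbols):
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})  # a terminal is its own FIRST
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table():
    table = {}
    for lhs, rhs in PRODUCTIONS:
        rhs_first = first_of_string(rhs)
        targets = rhs_first - {EPS}      # rule 1: terminals in FIRST(α)
        if EPS in rhs_first:
            targets |= FOLLOW[lhs]       # rule 2: terminals in FOLLOW(A)
        for terminal in targets:
            table.setdefault((lhs, terminal), []).append(rhs)
    return table


def is_ll1(table):
    # LL(1) means no entry holds more than one production rule.
    return all(len(entries) == 1 for entries in table.values())


table = build_table()
for (nt, terminal), entries in sorted(table.items()):
    print(f"M[{nt},{terminal}] =", [" ".join(rhs) for rhs in entries])
print("LL(1):", is_ll1(table))
```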
LL(1) Grammars
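A grammar whose parsing table has no multiply-defined
entries is said to be an LL(1) grammar.
An entry of the parsing table may contain more than one production rule.
In this case, we say that the grammar is not an LL(1) grammar.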
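S → iCtSE | a
E → eS | ε
C → b
FIRST(iCtSE) = {i}
FIRST(a) = {a}
FIRST(eS) = {e}
FIRST(ε) = {ε}
FIRST(b) = {b}
FOLLOW(S) = { $,e }
FOLLOW(E) = { $,e }
FOLLOW(C) = { t }
        a         b         e                   i             t       $
S       S → a                                   S → iCtSE
E                           E → eS, E → ε                             E → ε
C                 C → b
M[E,e] holds two production rules, so this grammar is not LL(1).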
An error may occur in predictive parsing (LL(1) parsing)
if the terminal symbol on the top of the stack does not match the current input symbol.
Contents (Session-3)
Bottom Up Parsing
Handle Pruning
Implementation of A Shift-Reduce
Parser
LR Parsers
LR Parsing Algorithm
Actions of A LR-Parser
Constructing SLR Parsing Tables
SLR(1) Grammar
Error Recovery in LR Parsing
Bottom-Up Parsing
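A bottom-up parser creates the parse tree of the
given input starting from the leaves towards the root.
A shift-reduce parser tries to reduce the given input
string to the start symbol:
a string ω is reduced to ... S,
tracing the right-most derivation S ⇒rm ... ⇒rm ω in reverse.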
Handle
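Informally, a handle of a string is a substring that
matches the right side of a production rule.
Formally, if S ⇒*rm αAω ⇒rm αβω, then the production A → β,
in the position following α, is a handle of αβω.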
Handle Pruning
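A right-most derivation in reverse can be
obtained by handle-pruning.
S = γ0 ⇒rm γ1 ⇒rm γ2 ⇒rm ... ⇒rm γn-1 ⇒rm γn = ω (the input string)
Start from γn, find a handle An → βn in γn,
and replace βn by An to get γn-1. Repeat until the start symbol S is reached.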
Stack       Input        Action
$           id+id*id$    shift
$id         +id*id$      reduce by F → id
$F          +id*id$      reduce by T → F
$T          +id*id$      reduce by E → T
$E          +id*id$      shift
$E+         id*id$       shift
$E+id       *id$         reduce by F → id
$E+F        *id$         reduce by T → F
$E+T        *id$         shift
$E+T*       id$          shift
$E+T*id     $            reduce by F → id
$E+T*F      $            reduce by T → T*F
$E+T        $            reduce by E → E+T
$E          $            accept
Shift-Reduce Parsers
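The most prevalent type of bottom-up parser is the LR(k) parser:
L: left-to-right scanning of the input
R: right-most derivation (constructed in reverse)
k: number of lookahead symbols (if k is omitted, it is 1)
[Figure: nested classes of grammars, CFG ⊃ LR ⊃ LALR ⊃ SLR]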
LR Parsers
LR parsing is attractive because:
LR parsers can be constructed to recognize virtually all
programming-language constructs for which context-free grammars can be written.
LR parsing is the most general non-backtracking shift-reduce parsing method, yet it is still efficient.
The class of grammars that can be parsed using LR
methods is a proper superset of the class of grammars
that can be parsed with predictive parsers.
LL(1)-Grammars ⊂ LR(1)-Grammars
An LR-parser can detect a syntactic error as soon as
it is possible to do so on a left-to-right scan of the input.
The drawback of the LR method is that it is too much
work to construct an LR parser by hand.
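LR Parsing Algorithm
[Figure: model of an LR parser. The input buffer holds a1 ... ai ... an $. The stack holds S0 X1 S1 ... Xm-1 Sm-1 Xm Sm, with Sm on top. The LR parsing algorithm consults two tables and produces the output: the Action table, indexed by states and by terminals and $, whose entries are one of four different actions; and the Goto table, indexed by states and non-terminals, where each item is a state number.]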
A Configuration of LR Parsing Algorithm
A configuration of an LR parser is:
( S0 S1 ... Sm, ai ai+1 ... an $ )
  Stack         Rest of Input
It represents the right sentential form:
X1 ... Xm ai ai+1 ... an $
Xi is the grammar symbol represented by state Si
Actions of A LR-Parser
1. If ACTION[Sm, ai] = shift s, the parser executes a shift
move; it shifts the next state s onto the stack, entering the
configuration
( S0 S1 ... Sm, ai ai+1 ... an $ ) → ( S0 S1 ... Sm s, ai+1 ... an $ )
2. If ACTION[Sm, ai] = reduce A → β, the parser executes a reduce move, changing the configuration from
( S0 S1 ... Sm, ai ai+1 ... an $ ) to ( S0 S1 ... Sm-r s, ai ... an $ ) where
r is the
length of β, and s = GOTO[Sm-r, A]. Output is the reducing
production A → β
Here the parser first popped r state symbols off the stack,
exposing state Sm-r; then the parser pushed s.
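LR-parsing algorithm
Productions: (1) E → E+T  (2) E → T  (3) T → T*F  (4) T → F  (5) F → (E)  (6) F → id
        Action Table                              Goto Table
state   id    +     *     (     )     $       E    T    F
0       s5                s4                  1    2    3
1             s6                      acc
2             r2    s7          r2    r2
3             r4    r4          r4    r4
4       s5                s4                  8    2    3
5             r6    r6          r6    r6
6       s5                s4                       9    3
7       s5                s4                            10
8             s6                s11
9             r1    s7          r1    r1
10            r3    r3          r3    r3
11            r5    r5          r5    r5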
stack          input        action                output
0              id*id+id$    shift 5
0id5           *id+id$      reduce by F → id      F → id     b/c goto(0, F) = 3
0F3            *id+id$      reduce by T → F       T → F      b/c goto(0, T) = 2
0T2            *id+id$      shift 7
0T2*7          id+id$       shift 5
0T2*7id5       +id$         reduce by F → id      F → id     b/c goto(7, F) = 10
0T2*7F10 (*)   +id$         reduce by T → T*F     T → T*F    b/c goto(0, T) = 2
0T2            +id$         reduce by E → T       E → T      b/c goto(0, E) = 1
0E1            +id$         shift 6
0E1+6          id$          shift 5
0E1+6id5       $            reduce by F → id      F → id     b/c goto(6, F) = 3
0E1+6F3        $            reduce by T → F       T → F      b/c goto(6, T) = 9
0E1+6T9 (**)   $            reduce by E → E+T     E → E+T    b/c goto(0, E) = 1
0E1            $            accept
(*) T2*7F10 is popped and reduced to T by T → T*F
(**) E1+6T9 is popped and reduced to E by E → E+T
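The trace above can be reproduced with a short driver loop. This sketch assumes the standard SLR table for the expression grammar with productions numbered (1) E → E+T, (2) E → T, (3) T → T*F, (4) T → F, (5) F → (E), (6) F → id; the stack holds states only, and the names ACTION, GOTO and lr_parse are illustrative.

```python
# production number -> (left-hand side, length of right-hand side, text)
PRODUCTIONS = {
    1: ("E", 3, "E -> E+T"), 2: ("E", 1, "E -> T"),
    3: ("T", 3, "T -> T*F"), 4: ("T", 1, "T -> F"),
    5: ("F", 3, "F -> (E)"), 6: ("F", 1, "F -> id"),
}


def reduce_row(n):
    # States with a complete item reduce on +, ), $ (and * where shown).
    return {"+": f"r{n}", "*": f"r{n}", ")": f"r{n}", "$": f"r{n}"}


ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: reduce_row(4),
    4: {"id": "s5", "(": "s4"},
    5: reduce_row(6),
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: reduce_row(3),
    11: reduce_row(5),
}
GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def lr_parse(tokens):
    """Return the list of reducing productions, or raise SyntaxError."""
    input_ = list(tokens) + ["$"]
    stack = [0]  # stack of states; state 0 is the initial state
    pos = 0
    output = []
    while True:
        action = ACTION[stack[-1]].get(input_[pos])
        if action is None:
            raise SyntaxError(f"unexpected {input_[pos]!r} in state {stack[-1]}")
        if action == "acc":
            return output
        if action[0] == "s":
            stack.append(int(action[1:]))  # shift: push the next state
            pos += 1
        else:
            lhs, length, text = PRODUCTIONS[int(action[1:])]
            del stack[-length:]            # pop r states
            stack.append(GOTO[stack[-1]][lhs])
            output.append(text)


print(lr_parse(["id", "*", "id", "+", "id"]))
```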
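Ex: the production A → aBb yields four LR(0) items (the production with a dot at some position of its right side):
A → .aBb
A → a.Bb
A → aB.b
A → aBb.
Sets of LR(0) items will be the states of the action and goto tables of the
SLR parser,
i.e. states represent sets of items.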
Goto Operation
If I is a set of LR(0) items and X is a grammar symbol (terminal or non-terminal), then goto(I,X) is
defined as follows:
goto(I,X) is the closure of the set of all items A → αX.β such that A → α.Xβ is in I.
Example:
I = { E' → .E, E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id }
goto(I,E) = closure({ E' → E., E → E.+T }) = { E' → E., E → E.+T }
goto(I,T) = closure({ E → T., T → T.*F }) = { E → T., T → T.*F }
goto(I,F) = closure({ T → F. }) = { T → F. }
goto(I,id) = closure({ F → id. }) = { F → id. }
goto(I,() = closure({ F → (.E) }) = { F → (.E), E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id }
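closure({E' → .E}) = {E' → .E, E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id}.
This gives us the items for the first state (state 0,
I0) of our DFA
For symbol E, Goto(I0, E) = closure({E' → E., E → E.+T})
= {E' → E., E → E.+T} = call it I1
For symbol T, Goto(I0, T) = closure({E → T., T → T.*F}) = {E → T., T → T.*F} = I2
For symbol F, Goto(I0, F) = closure({T → F.}) = {T → F.} = I3
For symbol (, Goto(I0, () = closure({F → (.E)}) = {F → (.E), E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id} = I4
For symbol id, Goto(I0, id) = closure({F → id.}) = {F → id.} = I5
Repeat this step for newly created states (I1, I2, I3, I4, I5)
For I1 and symbol +, Goto(I1, +) = closure({E → E+.T}) = {E → E+.T, T → .T*F, T → .F, F → .(E), F → .id} = I6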
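The closure and goto operations, and the repeated step that builds the collection of states, can be sketched as follows. An item is represented as a (left side, right side, dot position) triple, which is an assumption of this sketch; the states come out in a different order than I0 ... I11 in the slides because symbols are visited in sorted order.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items):
    """Closure of a set of LR(0) items, each a (lhs, rhs, dot) triple."""
    result = set(items)
    work = list(items)
    while work:
        _, rhs, dot = work.pop()
        # If the dot is in front of a non-terminal B, add B -> .γ for every rule of B.
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for alt in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], alt, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(items, symbol):
    # Move the dot over `symbol` in every item where that is possible.
    moved = {
        (lhs, rhs, dot + 1)
        for lhs, rhs, dot in items
        if dot < len(rhs) and rhs[dot] == symbol
    }
    return closure(moved)


def canonical_collection():
    states = [closure({("E'", ("E",), 0)})]  # I0
    for state in states:  # the list grows while we iterate over it
        symbols = {rhs[dot] for _, rhs, dot in state if dot < len(rhs)}
        for symbol in sorted(symbols):
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
    return states


states = canonical_collection()
print(len(states), "states")
```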
closure({E' → .E}) =
{ E' → .E        Rule 1
  E → .E+T       Rule 2
  E → .T         Rule 2
  T → .T*F       Rule 2
  T → .F         Rule 2
  F → .(E)       Rule 2
  F → .id }      Rule 2
1. Construct the canonical collection of sets of LR(0) items for G.
C ← {I0,...,In}
2. Create the parsing action table as follows
Take E → E.+T from I1: Goto(I1, +) = I6, so action[1, +] =
shift 6
Take T → T.*F from I2: Goto(I2, *) = I7, so action[2, *] = shift
7
Other shifts can be populated in the same way.
        Action Table                              Goto Table
state   id    +     *     (     )     $       E    T    F
0       s5                s4                  1    2    3
1             s6                      acc
2             r2    s7          r2    r2
3             r4    r4          r4    r4
4       s5                s4                  8    2    3
5             r6    r6          r6    r6
6       s5                s4                       9    3
7       s5                s4                            10
8             s6                s11
9             r1    s7          r1    r1
10            r3    r3          r3    r3
11            r5    r5          r5    r5
Exercise
Construct SLR Parse table for the augmented
SLR(1) Grammar
An LR parser using SLR(1) parsing tables for a grammar G is called the SLR(1) parser for G.
goto table.
Some error recovery techniques are
Panic mode: discard zero or more input symbols until a symbol
a is found with which parsing can resume
Phrase level: mark each empty entry in the action table
with a specific error routine.
Assignment 3
Given the following grammar where a, b, & c