Ch3 SyntaxAnalysispdf 2024 01 01 08 48 28
Unit - 3
Parsing
Role of parser
Parse tree
Ambiguous grammar
Parser Classification
Top-down parsing
Bottom-up parsing
Why Context Free Grammar (CFG)
S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ (( )S) ⇒ (( )(S)) ⇒ (( )((S))) ⇒ (( )(( )))
The Role of Parser
In our compiler model, the parser obtains a string of tokens from the
lexical analyser and verifies that the string of token names can be
generated by the grammar for the source language.
Parse Tree
• A parse tree reflects the precedence of operators. The deepest sub-tree is traversed first, so the operator in a parent node has lower precedence than the operator in its sub-tree.
• A parse tree has a unique leftmost and a unique rightmost derivation (however, we cannot tell which one was used by looking at the tree).
Parse Tree v/s Syntax Tree
[Figure: parse tree and syntax tree for the expression 3*4+5]
Ambiguous grammar
A CFG is ambiguous if one or more terminal strings have multiple leftmost derivations from the start symbol.
Left Recursion in Grammar
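A grammar is left recursive if it has a non-terminal A with a derivation A ⇒+ Aα for some string α. Left recursion can be:
• direct (A → A x)
• indirect (A → B C, B → A)
• hidden (A → B A, B → ε)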
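Direct left recursion can be removed mechanically. The sketch below assumes the usual rewrite (A → Aα | β becomes A → βA’, A’ → αA’ | ε) and uses A → aB | Ad as sample input; the function name and the list-of-symbols representation are illustrative choices, not part of the slides, and indirect or hidden left recursion is not handled.

```python
def eliminate_direct_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b1 A' | ... and A' -> a1 A' | ... | ε.

    Each alternative is a list of symbols; the empty list stands for ε.
    Only direct left recursion is handled.
    """
    # Tails of the left-recursive alternatives (the alpha parts).
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # Alternatives that do not start with the nonterminal (the beta parts).
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}
    new_nt = nonterminal + "'"
    return {
        nonterminal: [alt + [new_nt] for alt in others],
        new_nt: [alt + [new_nt] for alt in recursive] + [[]],
    }


result = eliminate_direct_left_recursion("A", [["a", "B"], ["A", "d"]])
for head, alts in result.items():
    print(head, "->", " | ".join(" ".join(alt) or "ε" for alt in alts))
```

For the sample input this yields A → a B A’ and A’ → d A’ | ε.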
Left Factoring
• If the RHS of more than one production for the same non-terminal starts with the same symbol, the grammar is called a Grammar With Common Prefixes.
A → αβ1 / αβ2 / αβ3 (Grammar with common prefixes)
• This kind of grammar is a problem for top-down parsers.
• The parser cannot decide which production must be chosen to parse the string in hand.
• The grammar obtained after the process of left factoring is called a Left Factored Grammar.
Before left factoring:
A → αβ1 | αβ2 | αβ3 | …… | αβn
After left factoring:
A → αA’
A’ → β1 | β2 | β3 | …… | βn
Left Factor Grammar
Examples
1. Before: S → aX | aY | aZ
   After:  S → aS’
           S’ → X | Y | Z
2. Before: S → iEtS | iEtSeS | a
           E → b
   After:  S → iEtSS’ | a
           S’ → eS | ε
           E → b
Left Factor Grammar
Examples
Before: A → aAB | aBc | aAc
Step 1: A → aA’
        A’ → AB | Bc | Ac
Step 2: A → aA’
        A’ → AD | Bc
        D → B | c
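The left factoring steps in these examples can be sketched in code. The version below is a minimal illustration, assuming a grammar stored as a dictionary of alternatives (lists of symbols, with the empty list for ε); the helper names and the convention of naming new non-terminals by appending primes are assumptions, so the second new non-terminal comes out as A'' where the slide calls it D.

```python
def common_prefix(alternatives):
    """Longest sequence of symbols shared by the start of every alternative."""
    prefix = []
    for symbols in zip(*alternatives):
        if all(s == symbols[0] for s in symbols):
            prefix.append(symbols[0])
        else:
            break
    return prefix


def fresh_name(base, grammar):
    """Append primes until the name is unused (naming convention is an assumption)."""
    name = base + "'"
    while name in grammar:
        name += "'"
    return name


def left_factor(grammar):
    """Repeat left factoring until no two alternatives of a non-terminal share a first symbol."""
    grammar = {nt: [list(alt) for alt in alts] for nt, alts in grammar.items()}
    changed = True
    while changed:
        changed = False
        for nt in list(grammar):
            groups = {}
            for alt in grammar[nt]:
                groups.setdefault(alt[0] if alt else None, []).append(alt)
            for first_symbol, group in groups.items():
                if first_symbol is None or len(group) < 2:
                    continue
                prefix = common_prefix(group)
                new_nt = fresh_name(nt, grammar)
                factored = prefix + [new_nt]
                new_alts = []
                for alt in grammar[nt]:
                    if alt in group:
                        if factored not in new_alts:
                            new_alts.append(factored)
                    else:
                        new_alts.append(alt)
                grammar[nt] = new_alts
                grammar[new_nt] = [alt[len(prefix):] for alt in group]
                changed = True
                break  # regroup this non-terminal on the next pass
    return grammar


factored = left_factor({"A": [["a", "A", "B"], ["a", "B", "c"], ["a", "A", "c"]]})
for head, alts in factored.items():
    print(head, "->", " | ".join(" ".join(alt) or "ε" for alt in alts))
```

Because the longest common prefix is taken at each step, S → iEtS | iEtSeS | a is factored in a single pass into S → iEtSS’ | a and S’ → ε | eS.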
Parser Classification
Why Context Free Grammar
Top Down Parser
A top-down parser builds the parse tree for the given input string by expanding non-terminals with the grammar productions, i.e. it starts from the start symbol and ends at the terminals. It works as follows:
• Grow the tree downwards by expanding productions at the lower levels of the tree.
• Repeat until the lower fringe consists only of terminals and the input is consumed.
Top-down parsing finds a leftmost derivation for an input string.
Top down parser with backtracking
Backtracking parsers are a type of top-down parser that can handle non-deterministic grammars.
When a parsing decision leads to a dead end, the parser can backtrack and try another alternative.
Backtracking parsers are less efficient than other top-down parsers because they can potentially explore many parsing paths.
Top down parser with backtracking
Limitations
Backing up is not allowed; the approach is less general, but better than the brute force approach.
It can also be viewed as an attempt to construct a parse tree for the input string from the root, creating the nodes of the parse tree in preorder.
Main challenges:
1. Backtracking is messy, difficult and inefficient (solution: use input “lookahead” to help make the right choice).
2. More alternatives: even with one lookahead input character, more than one rule may still apply, e.g. A -> ab | a (solution: rewrite the grammar by left-factoring).
3. Left recursion might cause an infinite loop: what is the procedure for E -> E + E ? (solution: rewrite the grammar by eliminating left recursion).
4. Error handling: errors may be detected “far away” from their actual source.
Predictive Parser
• A predictive parser is a special case of a recursive descent parser.
• By eliminating left recursion from a grammar and left factoring the result, we can obtain a grammar that can be parsed by a recursive descent parser that needs no backtracking, i.e. a predictive parser.
stmt → if expr then stmt else stmt
| while expr do stmt
| begin stmt_list end
• The keywords if, while and begin tell us which alternative is the only one that could possibly succeed if we are to find a statement.
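A recursive descent version of this stmt grammar might look like the sketch below. The slide does not define expr or stmt_list and gives no base case, so three things here are assumptions made only to obtain a runnable example: expr is the single token e, a simple statement is the single token s, and stmt_list → stmt { ; stmt }. The class and function names are also illustrative.

```python
class ParseError(Exception):
    pass


class PredictiveParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead != expected:
            raise ParseError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def stmt(self):
        # The lookahead keyword alone selects the production: no backtracking.
        if self.lookahead == "if":
            self.match("if"); self.expr(); self.match("then"); self.stmt()
            self.match("else"); self.stmt()
        elif self.lookahead == "while":
            self.match("while"); self.expr(); self.match("do"); self.stmt()
        elif self.lookahead == "begin":
            self.match("begin"); self.stmt_list(); self.match("end")
        elif self.lookahead == "s":  # assumed simple statement (not in the slide)
            self.match("s")
        else:
            raise ParseError(f"no statement starts with {self.lookahead!r}")

    def stmt_list(self):
        # Assumed shape: stmt_list -> stmt { ; stmt }
        self.stmt()
        while self.lookahead == ";":
            self.match(";")
            self.stmt()

    def expr(self):
        self.match("e")  # assumed placeholder for an expression

    def parse(self):
        self.stmt()
        self.match("$")
        return True


def accepts(text):
    try:
        return PredictiveParser(text.split()).parse()
    except ParseError:
        return False


print(accepts("while e do begin s ; s end"))
print(accepts("if e then s"))
```

Each non-terminal becomes one procedure, and every choice between alternatives is made by inspecting a single token of lookahead.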
Predictive Parser
◼ How?
◼ What if we do some "preprocessing" to answer the question: Given
Productions:
C → bC / ε
D → EF
F → f / ε
FIRST sets:
First(B) = { c }
First(C) = { b , ε }
First(E) = { g , ε }
First(F) = { f , ε }
Predictive Parser…FIRST
Calculate the first for the given grammar:
S → A
A → aB / Ad
B → b
C → g
• The given grammar is left recursive.
• So, we first remove left recursion from the given grammar.
After eliminating left recursion, we get the following grammar:
S → A
A → aBA’
A’ → dA’ / ε
B → b
C → g
Predictive Parser…FIRST
S → (L) / a
L → SL’
L’ → ,SL’ / ε
First(S) = { ( , a }
First(L) = First(S) = { ( , a }
First(L’) = { , , ε }
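The FIRST sets above can be reproduced with a small fixed-point computation. This is a minimal sketch, assuming the grammar is stored as a dictionary from non-terminal to a list of alternatives (lists of symbols, empty list for ε); any symbol that is not a key of the dictionary is treated as a terminal. The function names are illustrative.

```python
EPS = "ε"


def first_of_sequence(symbols, first):
    """FIRST of a string of grammar symbols, given the FIRST sets computed so far."""
    result = set()
    for sym in symbols:
        if sym not in first:          # terminal: it starts the string
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:     # this non-terminal cannot vanish
            return result
    result.add(EPS)                   # every symbol can derive ε (or the string is empty)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                    # iterate until no set grows
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                before = len(first[nt])
                first[nt] |= first_of_sequence(alt, first)
                changed = changed or len(first[nt]) != before
    return first


LIST_GRAMMAR = {
    "S": [["(", "L", ")"], ["a"]],
    "L": [["S", "L'"]],
    "L'": [[",", "S", "L'"], []],
}

first_sets = compute_first(LIST_GRAMMAR)
for nt, symbols in first_sets.items():
    print(f"First({nt}) =", sorted(symbols))
```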
Predictive Parser…FOLLOW
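◼ Define: FOLLOW(A), for a non-terminal A, is the set of terminals that can appear immediately to the right of A in some sentential form. The end marker $ is in FOLLOW(S) for the start symbol S.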
Predictive Parser…FIRST & FOLLOW
Calculate the follow for the given grammar:
S → A
A → aBA’
A’ → dA’ / ε
B → b
C → g
First(S) = First(A) = { a }
First(A) = { a }
First(A’) = { d , ε }
First(B) = { b }
First(C) = { g }
Follow(S) = { $ }
Follow(A) = Follow(S) = { $ }
Follow(A’) = Follow(A) = { $ }
Follow(B) = { First(A’) – ε } ∪ Follow(A) = { d , $ }
Follow(C) = NA
Predictive Parser…FIRST & FOLLOW
Calculate the follow for the given grammar:
S → (L) / a
L → SL’
L’ → ,SL’ / ε
First(S) = { ( , a }
First(L) = First(S) = { ( , a }
First(L’) = { , , ε }
Follow(S) = { $ } ∪ { First(L’) – ε } ∪ Follow(L) ∪ Follow(L’) = { $ , , , ) }
Follow(L) = { ) }
Follow(L’) = Follow(L) = { ) }
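FOLLOW sets can be computed by the same kind of fixed-point iteration as FIRST. The sketch below is one possible formulation, assuming the same dictionary representation of a grammar (alternatives as lists of symbols, empty list for ε) and using $ as the end marker; the function names are illustrative.

```python
EPS, END = "ε", "$"


def first_of_sequence(symbols, first):
    result = set()
    for sym in symbols:
        if sym not in first:              # terminal
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                before = len(first[nt])
                first[nt] |= first_of_sequence(alt, first)
                changed = changed or len(first[nt]) != before
    return first


def compute_follow(grammar, start):
    first = compute_first(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue          # FOLLOW is defined for non-terminals only
                    tail = first_of_sequence(alt[i + 1:], first)
                    new = tail - {EPS}    # rule 2: FIRST of what comes after sym
                    if EPS in tail:       # rule 3: the rest can vanish
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


LIST_GRAMMAR = {
    "S": [["(", "L", ")"], ["a"]],
    "L": [["S", "L'"]],
    "L'": [[",", "S", "L'"], []],
}

follow_sets = compute_follow(LIST_GRAMMAR, "S")
for nt, symbols in follow_sets.items():
    print(f"Follow({nt}) =", sorted(symbols))
```

A non-terminal that is unreachable from the start symbol, such as C in the grammar S → A, A → aBA’, ends up with an empty set, which corresponds to the "NA" entry in the worked example.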
Predictive Parser…Exercise FIRST - FOLLOW
Q.1
S → aAbc | BCf
A → C | ε
B → Cd | c
C → df | ε
Q.2
S → qABC
A → a | bbD
B → a | ε
C → b | ε
D → c | ε
Q.3
A → B | C
B → n | i
C → (D)
D → DA | A
Predictive Parser…Non Recursive
[Figure: model of a non-recursive predictive parser; the input buffer holds a + b $]
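A non-recursive predictive parser replaces the recursive calls with an explicit stack and a parsing table. The sketch below is an illustration for the list grammar S → (L) | a, L → SL’, L’ → ,SL’ | ε; the table entries are written by hand from the FIRST and FOLLOW sets of that grammar, and the names TABLE and ll1_parse are assumptions, not taken from the slides.

```python
# Parsing table M[non-terminal, lookahead] -> production body ([] is ε).
TABLE = {
    ("S", "("): ["(", "L", ")"],
    ("S", "a"): ["a"],
    ("L", "("): ["S", "L'"],
    ("L", "a"): ["S", "L'"],
    ("L'", ","): [",", "S", "L'"],
    ("L'", ")"): [],              # ")" is in Follow(L'), so L' -> ε
}
NONTERMINALS = {"S", "L", "L'"}


def ll1_parse(tokens, start="S"):
    """Return the productions applied (a leftmost derivation), or None on a syntax error."""
    buffer = list(tokens) + ["$"]
    stack = ["$", start]          # top of the stack is the end of the list
    pos = 0
    output = []
    while stack:
        top = stack.pop()
        look = buffer[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:      # empty table entry
                return None
            output.append((top, body))
            stack.extend(reversed(body))  # leftmost symbol ends up on top
        elif top == look:         # terminal (or $) matches the input
            pos += 1
        else:
            return None
    return output if pos == len(buffer) else None


for head, body in ll1_parse("( a , a )".split()):
    print(head, "->", " ".join(body) or "ε")
```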
Bottom Up Parsing
◼ At each step, decide on some substring that matches the RHS of some production
➢ Replace this string by the LHS (called reduction).
◼ If the substring is chosen correctly at each step, it is the trace of a rightmost derivation in reverse.
◼ Handle:- A substring that matches the right side of a production, and which we can reduce to the non terminal on the left hand side of that production.
◼ Handle Pruning:- The process of discovering a handle and reducing it to the appropriate left hand side.
◼ Reduction:
abbcde    (reduce by A → b)
aAbcde    (reduce by A → Abc)
aAde      (reduce by B → d)
aABe      (reduce by S → aABe)
S
◼ Bottom-up parsing produces a rightmost derivation in reverse.
Bottom Up Parsing
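Grammar:
E → E+T / T
T → T*F / F
F → (E) / n
Reduction of n+n*n:
n+n*n    (reduce by F → n)
F+n*n    (reduce by T → F)
T+n*n    (reduce by E → T)
E+n*n    (reduce by F → n)
E+F*n    (reduce by T → F)
E+T*n    (reduce by F → n)
E+T*F    (reduce by T → T*F)
E+T      (reduce by E → E+T)
E
◼ Read from bottom to top, these steps form a rightmost derivation of n+n*n.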
Bottom Up Parsing
It has the following operations:
1. Shift:- Moving symbols from the input buffer onto the stack is called shift.
2. Reduce:- If a handle appears on the top of the stack, it is reduced by the appropriate rule: the R.H.S of the rule is popped off and the L.H.S is pushed in. This action is called reduce.
3. Accept:- If the stack contains only the start symbol and the input buffer is empty at the same time, the parser accepts the input.
Precedence Relation
◼ Here we define three precedence relations between certain pairs of
terminals.
E → E + E | E - E | E * E | E / E | E ^ E | ( E ) | - E | id
Precedence table for the terminals id, +, * and $ (row a, column b):
      id    +     *     $
id          .>    .>    .>
+     <.    .>    <.    .>
*     <.    .>    .>    .>
$     <.    <.    <.
The table can be encoded by two precedence functions f and g that map terminal symbols to integers. For symbols a and b:
whenever a <. b, f(a) < g(b) => fa ← gb
whenever a =· b, f(a) = g(b)
whenever a .> b, f(a) > g(b) => fa → gb
Bottom Up Parsing… Operator Precedence Parsing
Precedence Functions: build a graph with nodes fa and ga for each terminal a (fid, gid, …). For any a and b, if a <. b, place an edge from gb to fa; if a .> b, place an edge from fa to gb.
◼ Disadvantages:
◼ It cannot handle the unary minus (the lexical analyzer should handle the
unary minus).
◼ Small class of grammars.
◼ Difficult to decide which language is recognized by the grammar.
◼ Advantages:
◼ simple
◼ powerful enough for expressions in programming languages
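To see how the precedence relations drive the parser, the sketch below encodes the relations for id, +, * and $ and records the order in which handles are reduced. It is a simplified illustration: only terminals are kept on the stack, non-terminals are not tracked, and the names PREC and operator_precedence_parse are assumptions.

```python
# PREC[a][b] is the relation between a (top terminal on the stack) and b (next input).
PREC = {
    "id": {"+": ">", "*": ">", "$": ">"},
    "+": {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*": {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$": {"id": "<", "+": "<", "*": "<"},
}


def operator_precedence_parse(tokens):
    """Return the terminals of each handle in reduction order, or None if no relation holds."""
    buffer = list(tokens) + ["$"]
    stack = ["$"]                 # terminals only in this sketch
    pos = 0
    handles = []
    while True:
        top, look = stack[-1], buffer[pos]
        if top == "$" and look == "$":
            return handles        # accept
        relation = PREC[top].get(look)
        if relation == "<":       # shift (this table has no =· pairs)
            stack.append(look)
            pos += 1
        elif relation == ">":     # reduce: pop back to the matching <.
            handle = []
            while True:
                popped = stack.pop()
                handle.append(popped)
                if PREC[stack[-1]].get(popped) == "<":
                    break
            handles.append(" ".join(reversed(handle)))
        else:
            return None           # empty table entry: error


print(operator_precedence_parse("id + id * id".split()))
```

In the reduction order for id + id * id, the handle containing * is reduced before the one containing +, which is how the table encodes the higher precedence of *. Because non-terminals are ignored, this sketch cannot detect the second error case (a handle that matches no right side).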
Bottom Up Parsing… Operator Precedence Parsing
Error Cases:
1. No relation holds between the terminal on the top of stack and the next
input symbol.
2. A handle is found (reduction step), but there is no production with this
handle as a right side
Error Recovery:
1. Each empty entry is filled with a pointer to an error routine.
2. The parser decides which right hand side the popped handle “looks like” and tries to recover from that situation.
Bottom Up Parsing… LR(k) Parser
L: left-to-right scanning of the input
R: constructing a rightmost derivation in reverse
k: the number of input symbols of lookahead that are used in making parsing
decisions. (default value is 1)
Bottom Up Parsing… LR(k) Parser
An LR parser is driven by a finite automaton that recognizes the set of all viable prefixes by reading the stack from bottom to top.
Once the automaton for the viable prefixes of right sentential forms is constructed, it can be used to guide handle selection in the shift-reduce parser.
❖ 0 lookahead symbol
Bottom Up Parsing… LR(0) Parser
Augmented Grammar
➢ GOTO Function
◼ Augmented Grammar : For a grammar G with starting symbol S, the augmented grammar G’ has a new starting symbol S’ and the added production S’ → S.
◼ Closure Item : An Item created by the closure operation on a state.
◼ Complete Item : An Item where the Item Dot is at the end of the RHS.
Bottom Up Parsing… LR(0) Parser
LR(0) Items (Parser States) …Closure Operation
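1. Every item in I is in closure(I)
2. If A → α . B β is in closure(I) and B → γ is a production, then B → . γ is in closure(I)
3. Repeat until no new items are generated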
Bottom Up Parsing… LR(0) Parser
LR(0) Items (Parser States) …Closure Operation
Augmented
Grammar
0: E’ → E
1: E→E+T
2: E→T
3: T→T*F
4: T→F
5: F→(E)
6: F → id
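The closure operation can be sketched for this augmented grammar as follows. An LR(0) item is represented here as a tuple (head, body, dot position); that representation and the names closure and show are illustrative assumptions.

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}


def closure(items):
    """Add B -> . γ for every non-terminal B that appears right after a dot."""
    result = set(items)
    worklist = list(items)
    while worklist:                       # repeat until no new items are generated
        head, body, dot = worklist.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            for prod_head, prod_body in GRAMMAR:
                if prod_head == body[dot]:
                    item = (prod_head, prod_body, 0)
                    if item not in result:
                        result.add(item)
                        worklist.append(item)
    return result


def show(item):
    head, body, dot = item
    symbols = list(body)
    symbols.insert(dot, ".")
    return f"{head} → {' '.join(symbols)}"


I0 = closure({("E'", ("E",), 0)})
for item in sorted(I0):
    print(show(item))
```

Starting from the single kernel item E’ → . E, the closure pulls in every production of E, T and F, giving the seven items of state I0.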
Bottom Up Parsing… LR(0) Parser
LR(0) Items (Parser States) …GOTO Function
0: E’ → E
1: E→E+T
2: E→T
3: T→T*F
4: T→F
5: F→(E)
6: F → id
Bottom Up Parsing… LR(0) Parser
LR(0) Items (Parser States) …Example
A very simple grammar:
S’ → S
S → aSb | ab
I0: S’ → .S
    S → .aSb    (closure items for S)
    S → .ab
I1: GOTO(I0, S)
    S’ → S.
I2: GOTO(I0, a)
    S → a.Sb
    S → a.b
    S → .aSb
    S → .ab
I3: GOTO(I2, S)
    S → aS.b
I4: GOTO(I2, b)
    S → ab.
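The states of this example can be generated by applying GOTO repeatedly, starting from I0. The sketch below is one way to do it for S’ → S, S → aSb | ab; items are (head, body, dot) tuples, and the fixed symbol order S, a, b is an assumption chosen so that the state numbers follow the order in which they are discovered.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("a", "S", "b")), ("S", ("a", "b"))]
NONTERMINALS = {"S'", "S"}
SYMBOLS = ["S", "a", "b"]         # order in which transitions are tried


def closure(items):
    result = set(items)
    worklist = list(items)
    while worklist:
        head, body, dot = worklist.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            for prod_head, prod_body in GRAMMAR:
                item = (prod_head, prod_body, 0)
                if prod_head == body[dot] and item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then take the closure."""
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol}
    return closure(moved)


def canonical_collection():
    states = [closure({GRAMMAR[0] + (0,)})]
    transitions = {}
    for index, state in enumerate(states):   # the list grows while we iterate
        for symbol in SYMBOLS:
            target = goto(state, symbol)
            if not target:
                continue
            if target not in states:
                states.append(target)
            transitions[(index, symbol)] = states.index(target)
    return states, transitions


states, transitions = canonical_collection()
for (source, symbol), target in sorted(transitions.items()):
    print(f"GOTO(I{source}, {symbol}) = I{target}")
```

The construction finds six states for this grammar: besides I0 to I4 it adds I5 = GOTO(I3, b), and GOTO(I2, a) leads back to I2.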
Bottom Up Parsing… LR(0) Parser
LR(0) Items (Parser States) …Example
Ex: E’ → E
    E → E + T | T
    T → T * F | F
    F → ( E ) | id
I0: E’ → . E
    E → . E + T
    E → . T
    T → . T * F
    T → . F
    F → . ( E )
    F → . id
I1: E’ → E .
    E → E . + T
I2: E → T .
    T → T . * F
I3: T → F .
I4: F → ( . E )
    E → . E + T
    E → . T
    T → . T * F
    T → . F
    F → . ( E )
    F → . id
I5: F → id .
I6: E → E + . T
    T → . T * F
    T → . F
    F → . ( E )
    F → . id
I7: T → T * . F
    F → . ( E )
    F → . id
I8: F → ( E . )
    E → E . + T
I9: E → E + T .
    T → T . * F
I10: T → T * F .
I11: F → ( E ) .
Bottom Up Parsing… LR(0) Parser
LR(0) Parser…Conflicts
“shift/reduce” or “reduce/reduce”
Example:
stmt → if expr then stmt
| if expr then stmt else stmt
| other (any other statement)
Stack: if … then stmt        Input: else …        Shift/Reduce Conflict
We can’t tell whether the "if … then stmt" on the stack is a handle.
Bottom Up Parsing… LR(0) Parser
Draw backs of LR(0) Parser
❖ LR(0) parsing accepts only the small class of LR(0) grammars, because it has no lookahead to resolve conflicts when they occur.
• An SLR(1) parser performs a reduce action for configuration B → a• only if the lookahead symbol is in FOLLOW(B).
Bottom Up Parsing… SLR(1) Parser
0: E’ → E
1: E → E + T
2: E → T
3: T → T * F
4: T → F
5: F → ( E )
6: F → id
Bottom Up Parsing… SLR(1) Parser
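I0: E’ → . E
    E → . E + T
    E → . T
    T → . T * F
    T → . F
    F → . ( E )
    F → . id
I1 = GOTO(I0, E), transition s0 → s1:
    E’ → E .
    E → E . + T
I2 = GOTO(I0, T), transition s0 → s2:
    E → T .
    T → T . * F
I3 = GOTO(I0, F), transition s0 → s3:
    T → F .
I4 = GOTO(I0, ( ), transition s0 → s4:
    F → ( . E )
    E → . E + T
    E → . T
    T → . T * F
    T → . F
    F → . ( E )
    F → . id
I5 = GOTO(I0, id), transition s0 → s5:
    F → id .
Parsing table row for state 2:    + : R2    * : S7    ) : R2    $ : R2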
Bottom Up Parsing… SLR(1) Parser
0: E’ → E
1: E → E + T
2: E → T
3: T → T * F
4: T → F
5: F → ( E )
6: F → id
Bottom Up Parsing… SLR(1) Parser
0: E’ → E
1: E → E + T
2: E → T
3: T → T * F
4: T → F
5: F → ( E )
6: F → id
Parse of n*n+n (n stands for id):
Stack            Input     Action
$0               n*n+n$    Shift 5
$0n5             *n+n$     Reduce 6
$0F3             *n+n$     Reduce 4
$0T2             *n+n$     Shift 7
$0T2*7           n+n$      Shift 5
$0T2*7n5         +n$       Reduce 6
$0T2*7F10        +n$       Reduce 3
$0T2             +n$       Reduce 2
$0E1             +n$       Shift 6
$0E1+6           n$        Shift 5
$0E1+6n5         $         Reduce 6
$0E1+6F3         $         Reduce 4
$0E1+6T9         $         Reduce 1
$0E1             $         ACCEPT
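The trace above can be reproduced with a small table-driven driver. The ACTION and GOTO tables below are an assumption: they are the standard SLR(1) tables for this expression grammar with states numbered as in the trace (the slide's own table did not survive, apart from the row for state 2, which agrees). Only state numbers are kept on the stack, and the names are illustrative.

```python
# Production number -> (head, length of body), numbered as in the grammar listing.
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

# Assumed SLR(1) table: "sN" = shift to state N, "rN" = reduce by production N.
ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def lr_parse(tokens):
    """Return the list of actions taken, or None if an empty table entry is reached."""
    buffer = list(tokens) + ["$"]
    stack = [0]                           # state numbers only
    pos = 0
    actions = []
    while True:
        action = ACTION[stack[-1]].get(buffer[pos])
        if action is None:
            return None
        actions.append(action)
        if action == "acc":
            return actions
        number = int(action[1:])
        if action[0] == "s":              # shift: push the new state, advance input
            stack.append(number)
            pos += 1
        else:                             # reduce: pop |body| states, then GOTO on the head
            head, length = PRODUCTIONS[number]
            del stack[-length:]
            stack.append(GOTO[stack[-1]][head])


print(lr_parse("id * id + id".split()))
```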
Bottom Up Parsing… SLR(1) Parsing Conflicts
◼ SHIFT/REDUCE conflict
◼ REDUCE/REDUCE conflict
S → L=R | R
L → *R | id
R → L
I2: S → L•=R
    R → L•
read state: lookup the state I2, next symbol is = , so the action is S6 (shift)
reduce state: FOLLOW(R) = { =, $ }, so the action on = is R5 (reduce by R → L)
∩ intersection operation: { = } ∩ FOLLOW(R)
Result is { = }
so SHIFT/REDUCE conflict is here
Bottom Up Parsing… SLR(1) Parsing Conflicts
Ex.1
S → CC
C → cC
C → d
Ex.2
S → AaAb
S → BbBa
A → ε
B → ε
Ex.3
S → xAy | xBy | xAz
A → aS | q
B → q
Ex.4
S → aSbS | bSaS | ε
Bottom Up Parsing…CLR(1) and LALR(1)
S → L=R | R
L → *R | id
R → L
LR(1) Items
• LR(1) item = LR(0) item + lookahead
Ex:
S’ → S
S→CC
C→cC/d
Bottom Up Parsing…LALR(1) Parsing
• LALR(1) parsing (Look-Ahead LR) combines LR(1) states to reduce table size
• Combine 2 states if they have the same core items and compatible goto functions
• If combining two states introduces a conflict (it can only be reduce/reduce), then the grammar is not LALR(1) but still LR(1).
Bottom Up Parsing…LALR(1) Parsing
Ex:
S’ → S
S → CC
C → cC / d
I0: S’ → . S, $           GOTO(I0, S) = I1
    S → . C C, $          GOTO(I0, C) = I2
    C → . c C, c/d        GOTO(I0, c) = I3
    C → . d, c/d          GOTO(I0, d) = I4
I1: S’ → S . , $
I2: S → C . C, $          GOTO(I2, C) = I5
    C → . c C, $          GOTO(I2, c) = I6
    C → . d, $            GOTO(I2, d) = I7
I3: C → c . C, c/d        GOTO(I3, C) = I8
    C → . c C, c/d        GOTO(I3, c) = I3
    C → . d, c/d          GOTO(I3, d) = I4
I4: C → d . , c/d
I5: S → C C . , $
I6: C → c . C, $          GOTO(I6, C) = I9
    C → . c C, $          GOTO(I6, c) = I6
    C → . d, $            GOTO(I6, d) = I7
I7: C → d . , $
I8: C → c C . , c/d
I9: C → c C . , $
Bottom Up Parsing…LALR(1) Parsing
Ex:
S’ → S
S → CC
C → cC / d
I0: S’ → . S, $
    S → . C C, $
    C → . c C, c/d
    C → . d, c/d
I1: S’ → S . , $
I2: S → C . C, $
    C → . c C, $
    C → . d, $
I36: C → c . C, c/d / $
     C → . c C, c/d / $
     C → . d, c/d / $
I47: C → d . , c/d / $
I5: S → C C . , $
I89: C → c C . , c/d / $
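Merging LR(1) states with the same core can be sketched as below. The ten LR(1) states of the grammar S’ → S, S → CC, C → cC | d are written out by hand as sets of (core item, lookahead) pairs; the helper names and the string form of the items are illustrative assumptions, and GOTO transitions are not modelled.

```python
from collections import defaultdict


def items(cores, lookaheads):
    """Pair every core item with every lookahead character."""
    return frozenset((core, la) for core in cores for la in lookaheads)


CLOSURE_C = ["C → . c C", "C → . d"]
LR1_STATES = {
    0: items(["S' → . S", "S → . C C"], "$") | items(CLOSURE_C, "cd"),
    1: items(["S' → S ."], "$"),
    2: items(["S → C . C"] + CLOSURE_C, "$"),
    3: items(["C → c . C"] + CLOSURE_C, "cd"),
    4: items(["C → d ."], "cd"),
    5: items(["S → C C ."], "$"),
    6: items(["C → c . C"] + CLOSURE_C, "$"),
    7: items(["C → d ."], "$"),
    8: items(["C → c C ."], "cd"),
    9: items(["C → c C ."], "$"),
}


def merge_by_core(states):
    """Union the lookaheads of all states whose LR(0) cores are identical."""
    groups = defaultdict(list)
    for number, state in states.items():
        core = frozenset(item for item, _ in state)
        groups[core].append(number)
    merged = {}
    for numbers in groups.values():
        name = "I" + "".join(str(n) for n in sorted(numbers))
        merged[name] = frozenset().union(*(states[n] for n in numbers))
    return merged


merged = merge_by_core(LR1_STATES)
for name in sorted(merged):
    lookaheads = sorted({la for _, la in merged[name]})
    print(name, "lookaheads:", "/".join(lookaheads))
```

The ten LR(1) states collapse to seven LALR(1) states, with I3/I6, I4/I7 and I8/I9 merged and their lookahead sets unioned.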
Bottom Up Parsing…LALR(1) Parsing
Ex:
S’ → S
S→CC
C→cC/d
Bottom Up Parsing…Error Recovery
❑ Panic mode
➢ Pop until state with a goto on a nonterminal A is found, (where A
represents a major programming construct), push A
➢ Discard input symbols until one is found in the FOLLOW set of A
❑ Phrase-level recovery
➢ Implement error routines for every error entry in table
❑ Error productions
➢ Pop until state has error production, then shift on stack
➢ Discard input until symbol is encountered that allows parsing to continue
LL(1) V/S LR(1)