CD Unit3
Parsing is a process which takes an input string and produces either a parse tree or a syntax error.
• The Syntax Analyser takes tokens from the Lexical Analyser and groups them into a program structure, reporting syntax errors, if any.
Parsing Techniques
Depending upon how the parse tree is built, parsing techniques are classified into top-down and bottom-up parsing.
Top Down vs Bottom Up Parsing
Problems with Top Down Parsing
• The parse tree is generated from top to bottom (from root to leaves).
• The leftmost derivation is applied at each derivation step.
• In top-down parsing, selection of the proper production rule is very important.
1. Backtracking
2. Left Recursion
3. Left Factoring
4. Ambiguity
Top Down Parsing- Backtracking
Backtracking:- It is a technique in which non-terminal symbols are expanded on a trial & error basis until a match for the input string is found.
Disadvantages:
1. This technique is powerful but slow: expansions may have to be undone and re-tried many times before a match is found.
Top Down Parsing-Left Recursion
Left recursion is considered a problematic situation for top-down parsers, whereas right recursion and general recursion do not create any problem for them.
Therefore, left recursion has to be eliminated from the grammar.
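The standard transformation removes immediate left recursion by rewriting A → Aα | β as A → βA', A' → αA' | ε. A minimal sketch of automating this (the function name and the list-of-symbols grammar encoding are assumptions of this illustration, not from the slides):

```python
def eliminate_left_recursion(nt, prods):
    """Remove immediate left recursion for non-terminal `nt`.

    `prods` is a list of alternatives, each a list of symbols.
    A -> A alpha | beta   becomes   A -> beta A',  A' -> alpha A' | eps
    (epsilon is represented by the empty list [])."""
    rec = [p[1:] for p in prods if p and p[0] == nt]   # the alpha parts
    non = [p for p in prods if not p or p[0] != nt]    # the beta parts
    if not rec:                                        # nothing to eliminate
        return {nt: prods}
    new = nt + "'"
    return {
        nt: [beta + [new] for beta in non],
        new: [alpha + [new] for alpha in rec] + [[]],  # [] is epsilon
    }

# E -> E+T | T  becomes  E -> T E',  E' -> +T E' | eps
g = eliminate_left_recursion('E', [['E', '+', 'T'], ['T']])
```

The same rewrite is applied to the expression grammar later in this unit (E → E+T | T becomes E → TE', E' → +TE' | ε).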
Top Down Parsing- Left Factoring
Left Factoring is a process to transform a grammar whose productions have common prefixes.
The grammar obtained after the process of left factoring is called a Left Factored Grammar.
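A sketch of one round of left factoring, assuming the same list-of-symbols grammar encoding (illustrative names; a full algorithm would repeat until no common prefixes remain):

```python
def left_factor(nt, prods):
    """One round of left factoring: alternatives of `nt` that share a common
    prefix are collapsed into prefix + a fresh non-terminal.
    Alternatives are lists of symbols; ['ε'] marks an empty alternative."""
    groups = {}
    for p in prods:                       # group alternatives by first symbol
        groups.setdefault(p[0], []).append(p)
    out = {nt: []}
    for alts in groups.values():
        if len(alts) == 1:                # nothing to factor in this group
            out[nt].append(alts[0])
            continue
        prefix = []                       # longest common prefix of the group
        for syms in zip(*alts):
            if len(set(syms)) == 1:
                prefix.append(syms[0])
            else:
                break
        new = nt + "'"                    # naming simplified: one group assumed
        out[nt].append(prefix + [new])
        out[new] = [a[len(prefix):] or ['ε'] for a in alts]
    return out

# S -> iEtS | iEtSeS | a  becomes  S -> iEtS S' | a,  S' -> ε | eS
g = left_factor('S', [['i', 'E', 't', 'S'], ['i', 'E', 't', 'S', 'e', 'S'], ['a']])
```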
Top Down Parsing- Ambiguity
A grammar is said to be ambiguous if, for some string generated by it, it produces more than one parse tree (derivation tree / syntax tree), or more than one leftmost derivation, or more than one rightmost derivation.
Example:- X → X+X | X*X | a, input string a+a*a
Derivation 1: X → X+X → a+X → a+X*X → a+a*X → a+a*a
Derivation 2: X → X*X → X+X*X → a+X*X → a+a*X → a+a*a
Here we are able to create 2 parse trees for the given string, so the given grammar is ambiguous.
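The two parse trees can also be counted mechanically. A brute-force sketch (illustrative only) that tries every operator occurrence as the root of the tree:

```python
def count_parse_trees(s):
    """Number of distinct parse trees of s for  X -> X+X | X*X | a ."""
    if s == 'a':                      # base production X -> a
        return 1
    total = 0
    for i, ch in enumerate(s):        # each operator may be the root split
        if ch in '+*':
            left, right = s[:i], s[i + 1:]
            if left and right:
                total += count_parse_trees(left) * count_parse_trees(right)
    return total
```

For a+a*a it finds exactly the two trees shown in the derivations above; a string with more than one tree proves the grammar ambiguous.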
Recursive Descent
It is a kind of Top-Down Parser.
It uses a collection of recursive procedures for parsing the given input string (one procedure for each non-terminal).
The CFG is used to build the recursive routines: each non-terminal in the grammar is implemented as a function.
Begin with the start symbol S of the grammar by calling the function S().
Based on the first token received, apply the appropriate grammar rule for S.
Continue in this manner until S is “satisfied.”
Advantages:-
• Simple to build
Disadvantages:-
• Error-recovery is difficult.
• They are not able to handle as large a set of grammars as other parsing methods.
Recursive Descent- procedure
The procedure starts from the top (root) node and applies production rules until the bottom (leaf) nodes are reached.
Step1: If the current symbol is a non-terminal, then call the corresponding procedure of that non-terminal.
Step2: If the current symbol is a terminal, then match it with the current input symbol and advance the input pointer.
Step3: If the terminal does not match the input symbol, report a syntax error.
Step4: There is no need to define main(). If we define main(), then we have to call the start-symbol function in main().
Recursive Descent- procedure
Example:-
E -> i E’
E’ -> + i E’ | ε
E()
{
    if (input == ‘i’)
        input++;
    Eprime();
}

Eprime()
{
    if (input == ‘+’)
    {
        input++;
        if (input == ‘i’)
            input++;
        Eprime();
    }
    else
        return;
}
sample input string = i+i $
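The two procedures above translate almost directly into a runnable sketch; treating the input as a plain character string (rather than a lexer's token stream) is a simplification for illustration:

```python
def parse(s):
    """Recursive-descent parser for  E -> i E',  E' -> + i E' | ε .
    Returns True iff the whole input string is derivable."""
    pos = 0

    def E():
        nonlocal pos
        if pos < len(s) and s[pos] == 'i':   # E -> i E'
            pos += 1
            Eprime()
        else:
            raise SyntaxError("expected 'i'")

    def Eprime():
        nonlocal pos
        if pos < len(s) and s[pos] == '+':   # E' -> + i E'
            pos += 1
            if pos < len(s) and s[pos] == 'i':
                pos += 1
                Eprime()
            else:
                raise SyntaxError("expected 'i' after '+'")
        # otherwise E' -> ε : consume nothing

    try:
        E()
    except SyntaxError:
        return False
    return pos == len(s)
```

For the sample input "i+i" the parse succeeds; a leftover or misplaced symbol makes it fail.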
First & Follow functions
Compiler design is the process of creating software that translates source code written in a high-
level programming language into machine code that can be executed by a computer. Parsing is one of
the crucial steps in the process of compiler design that involves breaking down the source code into
smaller pieces and analyzing their syntax to generate a parse tree.
FIRST and FOLLOW in Compiler Design functions are used to generate a predictive parser, which is a
type of syntax analyzer used to check whether the input source code follows the syntax of the
programming language.
The FIRST and FOLLOW sets are used to construct a predictive parser, which can predict which production to apply for the next input token. FIRST tells which terminals can begin the string derived from a production, whereas FOLLOW tells the parser which terminals can follow a non-terminal.
By using FIRST and FOLLOW, the compiler knows in advance which production rule to apply to derive which part of the input string.
First & Follow functions
To compute the first set of a nonterminal, we must consider all possible productions of the
nonterminal symbol and compute the first set of the symbols that can appear as the first token in
the right-hand side of each production. If any of these symbols can derive the empty string (i.e., the
ε symbol), then we must also include the First set of the next symbol in the production.
RULES:-
1. If A → aα, where a is a terminal and α Є (V U T)*, then FIRST(A) = { a }
2. If A → ε then FIRST(A) = { ε }
3. If A → BC then
   a) FIRST(A) = FIRST(B), if FIRST(B) does not contain ε
   b) FIRST(A) = (FIRST(B) − {ε}) U FIRST(C), if FIRST(B) contains ε

Example:-
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

FIRST(E) = { (, id }
FIRST(E') = { +, ε }
FIRST(T) = { (, id }
FIRST(T') = { *, ε }
FIRST(F) = { (, id }
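The FIRST rules above can be computed by iterating until the sets stop growing. A sketch for the example grammar (the dictionary encoding of the grammar is an assumption of this illustration):

```python
EPS = 'ε'
GRAMMAR = {
    'E':  [['T', "E'"]],
    "E'": [['+', 'T', "E'"], []],        # [] is the ε-production
    'T':  [['F', "T'"]],
    "T'": [['*', 'F', "T'"], []],
    'F':  [['(', 'E', ')'], ['id']],
}

def first_sets(g):
    """Iterate the FIRST rules to a fixed point."""
    first = {nt: set() for nt in g}
    changed = True
    while changed:
        changed = False
        for nt, prods in g.items():
            for prod in prods:
                add, all_eps = set(), True
                for sym in prod:
                    if sym in g:                  # non-terminal: take its FIRST
                        add |= first[sym] - {EPS}
                        if EPS not in first[sym]:
                            all_eps = False
                            break
                    else:                         # terminal starts the string
                        add.add(sym)
                        all_eps = False
                        break
                if all_eps:                       # every symbol can derive ε
                    add.add(EPS)
                if not add <= first[nt]:
                    first[nt] |= add
                    changed = True
    return first

FIRST = first_sets(GRAMMAR)
```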
First & Follow functions
The Follow function of nonterminal symbols represents the set of terminal symbols that can appear
immediately after the nonterminal in any valid derivation.
To compute the FOLLOW set of a non-terminal B, we must consider every production in which B appears on the right-hand side. The terminals that can begin whatever follows B in such a production are added to FOLLOW(B), and when B appears at the end of a production (or everything after it can derive ε), the FOLLOW set of the production’s left-hand side is added to FOLLOW(B).
RULES:-
1. If S is the start symbol then $ Є FOLLOW(S).
2. If A → αBβ then
   a) FOLLOW(B) includes FIRST(β) − {ε};
   b) if ε Є FIRST(β) then FOLLOW(B) also includes FOLLOW(A).
3. If A → αB then FOLLOW(B) = FOLLOW(A).

Example (continuing the grammar above):
1) FOLLOW(E) = { $, ) }
• $ included as E is the start symbol
• ) included as it is the symbol following E in F → (E)
2) FOLLOW(E') = { $, ) }
• By applying rule 3 to E → TE', FOLLOW(E') = FOLLOW(E)
3) FOLLOW(T) = { +, $, ) }
• By applying rule 2(a) to E → TE', FOLLOW(T) = FIRST(E')
• FIRST(E') = { +, ε } ------- (1), but a FOLLOW set should not contain ε
• So substitute ε for E' in E → TE', resulting in E → T
• For E → T, FOLLOW(T) = FOLLOW(E) = { $, ) } ------- (2)
• Combining (1) and (2) we get FOLLOW(T) = { +, $, ) }
4) FOLLOW(T') = { +, $, ) }
• By applying rule 3 to T → FT', FOLLOW(T') = FOLLOW(T)
5) FOLLOW(F) = { *, +, $, ) }
• Apply the same procedure as for point 3.
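The FOLLOW computation can likewise be iterated to a fixed point. This sketch recomputes FIRST internally so it stands alone (the grammar encoding and names are illustrative):

```python
EPS = 'ε'
GRAMMAR = {
    'E':  [['T', "E'"]],
    "E'": [['+', 'T', "E'"], []],
    'T':  [['F', "T'"]],
    "T'": [['*', 'F', "T'"], []],
    'F':  [['(', 'E', ')'], ['id']],
}

def first_of(seq, first):
    """FIRST of a sequence of symbols (a terminal is its own FIRST)."""
    out = set()
    for sym in seq:
        f = first.get(sym, {sym})
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)                      # the whole sequence can derive ε
    return out

def follow_sets(g, start):
    # fixed-point FIRST computation
    first = {nt: set() for nt in g}
    changed = True
    while changed:
        changed = False
        for nt, prods in g.items():
            for prod in prods:
                add = first_of(prod, first)
                if not add <= first[nt]:
                    first[nt] |= add
                    changed = True
    # fixed-point FOLLOW computation
    follow = {nt: set() for nt in g}
    follow[start].add('$')            # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for nt, prods in g.items():
            for prod in prods:
                for i, sym in enumerate(prod):
                    if sym not in g:
                        continue
                    rest = first_of(prod[i + 1:], first)
                    add = rest - {EPS}        # rule 2(a): FIRST of what follows
                    if EPS in rest:           # rules 2(b)/3: end of production
                        add |= follow[nt]
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    return follow

FOLLOW = follow_sets(GRAMMAR, 'E')
```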
LL(1) Parser
• LL(1) parser / Predictive Parser / Non-Recursive Descent Parser
Components:
1) INPUT BUFFER: Contains the input string to be parsed, followed by a $ symbol.
2) STACK: Contains a sequence of grammar symbols with $ as its bottom marker. Initially the stack contains only $.
3) PARSING TABLE: A two-dimensional array M[A, a], where A is a non-terminal and a is a terminal.
It is a tabular implementation of recursive descent parsing, where the stack is maintained explicitly by the parser rather than implicitly through the recursion of the language in which the parser is written.
LL(1) Parser
Construction of an LL(1) parser:
STEP1:- Elimination of Left Recursion
STEP2:- Elimination of Left Factoring
STEP3:- Calculation of FIRST and FOLLOW
STEP4:- Construction of the Parsing Table
STEP5:- String Validation: check whether the input string is accepted by the parser or not
LL(1) Parser
Example:-
STEP1:- Elimination of Left Recursion
E → E+T | T
T → T*F | F
F → (E) | id

E → E+T | T is converted to E → TE', E' → +TE' | ε
T → T*F | F is converted to T → FT', T' → *FT' | ε

After removing Left Recursion we get
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
LL(1) Parser
STEP2:- Elimination of Left Factoring
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
The above grammar has no common prefixes, so no left factoring is needed.
LL(1) Parser
STEP3:- Calculation of FIRST and FOLLOW
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

FIRST(E) = { (, id }
FIRST(E') = { +, ε }
FIRST(T) = { (, id }
FIRST(T') = { *, ε }
FIRST(F) = { (, id }

1) FOLLOW(E) = { $, ) }
• $ included as E is the start symbol
• ) included as it is the symbol following E in F → (E)
2) FOLLOW(E') = { $, ) }
• By applying rule 3 to E → TE', FOLLOW(E') = FOLLOW(E)
3) FOLLOW(T) = { +, $, ) }
• By applying rule 2(a) to E → TE', FOLLOW(T) = FIRST(E')
• FIRST(E') = { +, ε } ------- (1), but a FOLLOW set should not contain ε
• So substitute ε for E' in E → TE', resulting in E → T
• For E → T, FOLLOW(T) = FOLLOW(E) = { $, ) } ------- (2)
• Combining (1) and (2) we get FOLLOW(T) = { +, $, ) }
4) FOLLOW(T') = { +, $, ) }
• By applying rule 3 to T → FT', FOLLOW(T') = FOLLOW(T)
5) FOLLOW(F) = { *, +, $, ) }
• Apply the same procedure as for point 3.
LL(1) Parser
STEP4:- Construction of the Parsing Table
• Rows are Non-Terminals
• Columns are Terminals
• All the null (ε) productions of the grammar go under the FOLLOW elements of their LHS, and the remaining productions go under the elements of the FIRST set of their RHS.
LL(1) Parser
STEP5:- String Validation: check whether the input string is accepted by the parser or not.
Actions are decided by looking up the top of the stack and the current input symbol in the parsing table.
After parsing the entire string, if the stack contains only $ then we can say that the string is accepted by the parser.
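Steps 4 and 5 together, as a sketch: the table below is hand-filled for the example grammar from its FIRST and FOLLOW sets (the dictionary encoding is an assumption of this illustration), and the driver is the standard stack algorithm:

```python
# M[A, a]: production body used to expand A on lookahead a ([] = ε)
TABLE = {
    ('E', '('): ['T', "E'"],  ('E', 'id'): ['T', "E'"],
    ("E'", '+'): ['+', 'T', "E'"], ("E'", ')'): [], ("E'", '$'): [],
    ('T', '('): ['F', "T'"],  ('T', 'id'): ['F', "T'"],
    ("T'", '+'): [], ("T'", '*'): ['*', 'F', "T'"],
    ("T'", ')'): [], ("T'", '$'): [],
    ('F', '('): ['(', 'E', ')'], ('F', 'id'): ['id'],
}
NONTERMS = {'E', "E'", 'T', "T'", 'F'}

def ll1_parse(tokens):
    """Returns the list of productions applied, or None on a syntax error."""
    stack = ['$', 'E']
    toks = tokens + ['$']
    i, trace = 0, []
    while stack:
        top = stack.pop()
        if top == '$':                       # stack empty: input must be too
            return trace if toks[i] == '$' else None
        if top in NONTERMS:
            body = TABLE.get((top, toks[i]))
            if body is None:                 # blank table cell: error
                return None
            trace.append((top, body))
            stack.extend(reversed(body))     # push RHS, leftmost on top
        elif top == toks[i]:                 # terminal: must match input
            i += 1
        else:
            return None
    return None
```

The trace returned corresponds to the Action column from which the parse tree is drawn.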
LL(1) Parser
Now draw the parse tree from the Action column of the parsing table.
BOTTOM UP PARSING
Reduction is the activity in which the parser tries to match a substring of the input string with the RHS of a production rule and replaces it by the corresponding LHS.
A handle is the substring of the input string that matches the RHS of a production.
Bottom-up parsing is the process of reducing the input string to the start symbol of the grammar, i.e., the rightmost derivation in reverse order.
STEP1: “abbcde”
STEP2: “aAbcde”. Here the substring ‘b’ of the input string matches the production A → b, so replace ‘b’ with the corresponding LHS A.
STEP3: “aAde”. Here the substring ‘Abc’ matches the production A → Abc, so replace ‘Abc’ with the corresponding LHS A.
STEP4: “aABe”. Here the substring ‘d’ matches the production B → d, so replace ‘d’ with the corresponding LHS B.
STEP5: “S”. Here the substring ‘aABe’ matches the production S → aABe, so replace ‘aABe’ with the corresponding LHS S.
BOTTOM UP PARSING
Handle pruning is the process of obtaining the rightmost derivation in reverse order.
Example:-
S → aABe
A → Abc | b
B → d

Rightmost derivation: S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde
Let the input string be “abbcde”.

Right sentential form    Handle    Production
abbcde                   b         A → b
aAbcde                   Abc       A → Abc
aAde                     d         B → d
aABe                     aABe      S → aABe
S
Shift Reduce Parser- Example
Consider the grammar S → (L) | a and L → L,S | S.
Perform Shift Reduce parsing for the input string “( a, ( a, a ) )”.

Stack          Input Buffer     Parsing Action
$              (a,(a,a))$       Shift
$(             a,(a,a))$        Shift
$(a            ,(a,a))$         Reduce S → a
$(S            ,(a,a))$         Reduce L → S
$(L            ,(a,a))$         Shift
$(L,           (a,a))$          Shift
$(L,(          a,a))$           Shift
$(L,(a         ,a))$            Reduce S → a
$(L,(S         ,a))$            Reduce L → S
$(L,(L         ,a))$            Shift
$(L,(L,        a))$             Shift
$(L,(L,a       ))$              Reduce S → a
$(L,(L,S       ))$              Reduce L → L,S
$(L,(L         ))$              Shift
$(L,(L)        )$               Reduce S → (L)
$(L,S          )$               Reduce L → L,S
$(L            )$               Shift
$(L)           $                Reduce S → (L)
$S             $                Accept
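A minimal sketch of such a shift-reduce parser; deciding when to reduce is done here with a one-symbol-lookahead heuristic instead of a real LR table, an assumption that happens to suffice for this grammar:

```python
def shift_reduce(tokens):
    """Shift-reduce parse for  S -> (L) | a ,  L -> L,S | S .
    Returns (accepted, reduction_trace)."""
    stack, toks, i, trace = ['$'], tokens + ['$'], 0, []

    def reduce_all():
        la = toks[i]                              # one-symbol lookahead
        changed = True
        while changed:
            changed = False
            if stack[-1] == 'a':                                  # S -> a
                stack[-1] = 'S'; trace.append('S -> a'); changed = True
            elif stack[-3:] == ['(', 'L', ')']:                   # S -> (L)
                del stack[-3:]; stack.append('S')
                trace.append('S -> (L)'); changed = True
            elif stack[-3:] == ['L', ',', 'S'] and la in ',)':    # L -> L,S
                del stack[-3:]; stack.append('L')
                trace.append('L -> L,S'); changed = True
            elif stack[-1] == 'S' and la in ',)':                 # L -> S
                stack[-1] = 'L'; trace.append('L -> S'); changed = True

    while toks[i] != '$':
        stack.append(toks[i]); i += 1             # shift one symbol
        reduce_all()                              # then reduce as far as possible
    return stack == ['$', 'S'], trace
```

Running it on "(a,(a,a))" reproduces the reduction sequence of the table above; a real LR parser would consult its ACTION table instead of the lookahead checks.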
LR Parser or LR(k) parser
LR parsing is one type of bottom-up parsing. It can parse a large class of grammars.
NOTE:-
LR(0) and SLR(1) uses canonical collection of LR(0) items
LALR(1) and CLR(1) uses canonical collection of LR(1) items
LR Parser or LR(k) parser
1. Input buffer contains the string to be parsed followed by a $ Symbol.
2. A stack is used to contain a sequence of grammar symbols with a $ at the bottom of the stack.
3. Parsing table is a two dimensional array. It contains two parts:
a) Action part and
b) Go To part.
NOTE:-The LR algorithm requires stack, input, output and parsing table. In all type of LR parsing, input,
output and stack are same but parsing table is different.
LR Parser or LR(k) parser-- terminology
The dot(.) is useful to indicate that how much of the input has been scanned up to a given point in the
process of parsing.
An item is any production rule with a dot (•) at the beginning of the RHS.
An LR(0) item is any production rule with a dot (•) at some position of the RHS.
A final item is any production rule with a dot (•) at the end of the RHS.
Ex:-
S → •AA // item
S → A•A // LR(0) item
S → AA• // final item
Canonical collection represents the set of valid states for the LR parser
a) Canonical LR (0) collection
b) Canonical LR (1) collection
Canonical LR (0) collection helps to construct LR parsers like LR(0) & SLR parsers
To create Canonical LR (0) collection for Grammar, 3 things are required −
1. Augmented Grammar
2. Closure Function
3. goto Function
LR Parser or LR(k) parser-- terminology
An augmented grammar is the given grammar extended with one extra production S' → S, so that the new start symbol S' appears only on the LHS of productions. It is used to identify when to stop the parser and declare the string as accepted.
Ex:- For the grammar S → AA, A → aA | b,
S' → S is the added production; here S' appears only on the LHS.
LR Parser or LR(k) parser-- Example
let
S` → S ----------------Accepted
S → AA ----------------r1
A → aA ----------------r2
A→b ----------------r3
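For this grammar, the closure and goto functions and the canonical LR(0) collection can be sketched directly. Items are (production index, dot position) pairs; all names are illustrative:

```python
# productions of the augmented grammar; index 0 is S' -> S
PRODS = [("S'", ['S']), ('S', ['A', 'A']), ('A', ['a', 'A']), ('A', ['b'])]
NONTERMS = {"S'", 'S', 'A'}

def closure(items):
    """Add B -> •γ for every non-terminal B right after a dot, until stable."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for pi, dot in list(items):
            body = PRODS[pi][1]
            if dot < len(body) and body[dot] in NONTERMS:
                for j, (head, _) in enumerate(PRODS):
                    if head == body[dot] and (j, 0) not in items:
                        items.add((j, 0))
                        changed = True
    return frozenset(items)

def goto(items, sym):
    """Advance the dot over `sym`, then take the closure."""
    return closure({(pi, d + 1) for pi, d in items
                    if d < len(PRODS[pi][1]) and PRODS[pi][1][d] == sym})

def canonical_collection():
    states = [closure({(0, 0)})]      # I0 = closure of S' -> •S
    symbols = {s for _, body in PRODS for s in body}
    for st in states:                 # the list grows as new states appear
        for sym in sorted(symbols):
            nxt = goto(st, sym)
            if nxt and nxt not in states:
                states.append(nxt)
    return states
```

For this grammar the construction yields the seven states I0 through I6 used in the table explanation that follows.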
LR Parser or LR(k) parser-- Example
LR Parser or LR(k) parser-- Example
Step7:- Parsing the input string
• For every push operation the pointer advances to the next symbol.
NOTE:- After a PUSH, increment the pointer to point to the next symbol in the input string.
LR Parser or LR(k) parser-- Example
Explanation
• I0 on S goes to I1, so write it as 1.
• I0 on A goes to I2, so write it as 2.
• I2 on A goes to I5, so write it as 5.
• I3 on A goes to I6, so write it as 6.
• I0, I2 and I3 on a go to I3, so write it as S3, which means shift 3.
• I0, I2 and I3 on b go to I4, so write it as S4, which means shift 4.
• I4, I5 and I6 all contain a final item, because the • is at the rightmost end; so write the reduce move with the corresponding production number.
• I1 contains the final item S' → S•, so action {I1, $} = Accept.
• I4 contains the final item A → b•, which corresponds to production number 3, so write r3 in the entire row.
• I5 contains the final item S → AA•, which corresponds to production number 1, so write r1 in the entire row.
• I6 contains the final item A → aA•, which corresponds to production number 2, so write r2 in the entire row.
SLR parser
• SLR is simple LR.
• It handles the smallest class of grammars among the LR parsers and has a small number of states.
• SLR is very easy to construct and is similar to LR(0) parsing.
• The only difference between the SLR(1) parser and the LR(0) parser is that in the SLR(1) parsing table we place the Reduce move only in the FOLLOW of the LHS, not in the entire row as in LR(0).
SLR Parser-Example
STEP6:- Find FOLLOW of the LHS of each production
FOLLOW(S) = { $ }
FOLLOW(A) = { a, b, $ }
for the grammar
S` → S ---------------- Accepted
S → AA ---------------- r1
A → aA ---------------- r2
A → b ---------------- r3
SLR Parser- Example
STEP7:- Construct the SLR(1) Parsing Table
• If a state goes to some other state on a terminal, it corresponds to a shift move (Sn).
• If a state goes to some other state on a non-terminal, it corresponds to a goto move (n).
• If a state contains a final item, then write the reduce move only in the columns of the FOLLOW of the LHS.
CLR Parser
• The CLR parser stands for canonical LR parser.
• It is a more powerful LR parser.
• It makes use of look-ahead symbols.
• This method uses a large set of items called LR(1) items.
CLR Parser--Example
The only difference between the SLR(1) parser and the CLR(1) parser is that in the CLR(1) parsing table we place the Reduce move only under the look-ahead symbols of the item, not under the entire FOLLOW of the LHS.
let
S` → S ----------------Accepted
S → AA ----------------r1
A → aA ----------------r2
A→b ----------------r3
CLR Parser -- Example
CLR Parser -- Example
Explanation
The placement of the shift moves in the CLR(1) parsing table is the same as in the SLR(1) parsing table; the only difference is in the placement of the reduce moves.
• I4 contains the final item (A → b•, a/b), so action {I4, a} = r3 and action {I4, b} = r3.
• I5 contains the final item (S → AA•, $), so action {I5, $} = r1.
• I7 contains the final item (A → b•, $), so action {I7, $} = r3.
• I8 contains the final item (A → aA•, a/b), so action {I8, a} = r2 and action {I8, b} = r2.
• I9 contains the final item (A → aA•, $), so action {I9, $} = r2.
CLR Parser-- Example
Step7:- Parsing the input string
NOTE:- After a PUSH, increment the pointer to point to the next symbol in the input string.

Clearly I3 and I6 are the same in their LR(0) items but differ in their look-aheads, so we can combine them and call the result I36.
I36 = { A → a•A, a/b/$
        A → •aA, a/b/$
        A → •b, a/b/$ }
I4 and I7 are the same but differ only in their look-aheads, so we can combine them and call the result I47.
I47 = { A → b•, a/b/$ }
I8 and I9 are the same but differ only in their look-aheads, so we can combine them and call the result I89.
I89 = { A → aA•, a/b/$ }
LALR Parser-- Example
STEP5:- Draw the DFA diagram (goto graph) of the item sets
LALR Parser--Example
The only difference between the CLR(1) parser and the LALR(1) parser is that in the LALR(1) parsing table we combine the states that have the same LR(0) items but different look-aheads.
let
S` → S ----------------Accepted
S → AA ----------------r1
A → aA ----------------r2
A→b ----------------r3
LALR Parser -- Example
LALR Parser -- Example
Explanation
The placement of the shift moves in the LALR(1) parsing table is the same as in the CLR(1) parsing table; the only difference is that in LALR table construction we merge the similar states (I36, I47, I89).