
Unit 2

Syntax Analysis
• The role of the parser
• Context-Free Grammars
• Derivations
• Parse Trees
• Ambiguity
• Left Recursion
• Left Factoring
• Syntax analysis is the second phase in the compilation of a source program, after lexical analysis.
• The lexical analyzer reads the input source program and produces a sequence of tokens as output.
• The syntax analyzer takes the tokens from the lexical analyzer and groups them into a programming structure called a "syntax tree" or "parse tree".
• If the syntax is not recognized, a syntax error is generated.
• Example: consider the source program statement (a+b)*c;
The lexical analyzer reads the above statement and breaks it into a sequence of tokens such as
(  left parenthesis
a  identifier
+  operator
b  identifier
)  right parenthesis
*  operator
c  identifier
• The syntax analyzer then collects these tokens from the lexical analyzer
and arranges them into a structure called a parse tree or syntax tree.
The role of the parser
• The parser or syntax analyzer takes the tokens (such as
keywords, identifiers, etc.) from the scanner (lexical analyzer).
• It then verifies whether the input string can be generated
from the grammar of the source language.
• The parser should also report syntax errors in a manner
that is easily understood by the user.
• These errors are recovered by an error handler.
• Parsers are classified into three types:
1. Universal parsers – these techniques have the capability
of parsing any type of grammar.
2. Top-down parsers
3. Bottom-up parsers
Context-Free Grammars
• A context-free grammar (CFG) is a formal grammar that is used to generate
all possible strings of a given formal language.
• A context-free grammar G is defined by a 4-tuple:
G = (V, T, P, S)
Where,
• G describes the grammar
• T is a finite set of terminal symbols (lowercase letters, operator symbols)
• V is a finite set of non-terminal symbols (uppercase letters)
• P is a set of production rules
• S is the start symbol.
• In a CFG, the start symbol is used to derive the string.
• You can derive a string by repeatedly replacing a non-terminal
with the right-hand side of a production, until all non-terminals
have been replaced by terminal symbols.
Production rules:
S → aSa
S → bSb
S → c
• Now check whether the string abbcbba can be derived from the given
CFG:
S ⇒ aSa
  ⇒ abSba
  ⇒ abbSbba
  ⇒ abbcbba
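As an illustration (a small sketch that is not part of the original notes), the following Python function checks whether a string can be derived from this grammar by undoing one production at a time:

```python
# Minimal sketch, not from the slides: membership test for the language of
# S -> aSa | bSb | c, by peeling off one production per recursive call.
def derives(s: str) -> bool:
    if s == "c":                                  # S -> c
        return True
    if len(s) >= 3 and s[0] == s[-1] and s[0] in "ab":
        return derives(s[1:-1])                   # S -> aSa or S -> bSb
    return False

print(derives("abbcbba"))  # True
print(derives("abcab"))    # False
```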
Derivations
• A derivation is a sequence of production rule applications. It is used to obtain
the input string from the start symbol. During parsing
we have to take two decisions.
These are as follows:
• We have to decide which non-terminal is to be replaced.
• We have to decide the production rule by which the non-terminal
will be replaced.
• We have two options for deciding which non-terminal to replace
with a production rule.
Left-most Derivation
• In a leftmost derivation, at each step the leftmost non-terminal in the
sentential form is replaced by the right-hand side of one of its productions.
In effect, the input string is read from left to right.
Example:
• Production rules:
S → S + S
S → S - S
S → a | b | c
Input:
a-b+c
The leftmost derivation is:
S ⇒ S + S
  ⇒ S - S + S
  ⇒ a - S + S
  ⇒ a - b + S
  ⇒ a - b + c
Right-most Derivation
• In a rightmost derivation, at each step the rightmost non-terminal in the
sentential form is replaced by the right-hand side of one of its productions.
In effect, the input string is read from right to left.
The rightmost derivation of a-b+c is:
S ⇒ S - S
  ⇒ S - S + S
  ⇒ S - S + c
  ⇒ S - b + c
  ⇒ a - b + c
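A tiny sketch (assumed, not from the slides) that performs the leftmost derivation of a-b+c mechanically, by always rewriting the leftmost occurrence of S:

```python
# Leftmost derivation of "a-b+c": rewrite the leftmost 'S' at every step.
productions = ["S+S", "S-S", "a", "b", "c"]   # right-hand sides applied in order

sentential = "S"
print(sentential)
for rhs in productions:
    i = sentential.index("S")                 # position of the leftmost non-terminal
    sentential = sentential[:i] + rhs + sentential[i + 1:]
    print("=>", sentential)
# last line printed: => a-b+c
```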
Parse Trees
• A parse tree is a graphical representation of a derivation. Its nodes are
labelled with grammar symbols, which can be terminals or non-terminals.
• In parsing, the string is derived from the start symbol, and the root
of the parse tree is that start symbol.
• A parse tree reflects the precedence of operators: the deepest
sub-tree is traversed (evaluated) first, so the operator in a parent node has
lower precedence than the operator in its sub-tree.
A parse tree satisfies the following properties:
• All leaf nodes are terminals.
• All interior nodes are non-terminals.
• An in-order traversal gives the original input string.
Production rules:
S → S + S | S * S
S → a | b | c
Input:
a*b+c
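As a hedged illustration (the tree encoding below is an assumption, not the notes' own notation), the parse tree for a*b+c can be written as nested tuples, with the * sub-tree nested deeper than +; an in-order walk of the leaves reproduces the input string:

```python
# Parse tree for a*b+c under S -> S+S | S*S | a | b | c, with '*' deeper than '+'.
tree = ("S", [("S", [("S", ["a"]), "*", ("S", ["b"])]), "+", ("S", ["c"])])

def leaves(node):
    """Yield the leaf symbols left to right (in-order traversal)."""
    if isinstance(node, str):
        yield node
    else:
        _, children = node
        for child in children:
            yield from leaves(child)

print("".join(leaves(tree)))  # a*b+c
```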
Ambiguity
• A context-free grammar is said to be ambiguous if some string in the
language of the grammar can be represented by two or more
different parse trees.
• Equivalently, a grammar which has more than one leftmost derivation (or more
than one rightmost derivation) for the same input string is called an ambiguous
grammar.
• For example:
• Consider the grammar E -> E+E | id. We can build two different parse trees
from this grammar for the string id+id+id, each corresponding to a different
leftmost derivation: one groups the string as (id+id)+id, the other as id+(id+id).
• Both parse trees are derived from the same grammar rules, but the two
trees are different. Hence the grammar is ambiguous.
Let us now consider the following grammar:
E -> I
E -> E + E
E -> E * E
E -> (E)
I -> ε | 0 | 1 | … | 9
• From the above grammar, the string 3*2+5 can be derived in two ways:
I) First leftmost derivation:
E => E * E
  => I * E
  => 3 * E
  => 3 * E + E
  => 3 * I + E
  => 3 * 2 + E
  => 3 * 2 + I
  => 3 * 2 + 5
II) Second leftmost derivation:
E => E + E
  => E * E + E
  => I * E + E
  => 3 * E + E
  => 3 * I + E
  => 3 * 2 + E
  => 3 * 2 + I
  => 3 * 2 + 5
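The two derivations correspond to two different parse trees: derivation I groups the string as 3*(2+5), derivation II as (3*2)+5. A small sketch (not from the notes) shows why this matters by evaluating each tree directly:

```python
# Two parse trees for the same string 3*2+5, written as nested tuples.
tree1 = ("*", 3, ("+", 2, 5))   # from derivation I:  3 * (2 + 5)
tree2 = ("+", ("*", 3, 2), 5)   # from derivation II: (3 * 2) + 5

def evaluate(node):
    if isinstance(node, int):
        return node
    op, left, right = node
    l, r = evaluate(left), evaluate(right)
    return l + r if op == "+" else l * r

print(evaluate(tree1))  # 21
print(evaluate(tree2))  # 11  -- same string, two trees, two different meanings
```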
Left Recursion
• A production of a grammar is said to be left recursive if the leftmost symbol of
its right-hand side is the same as the non-terminal on its left-hand side.
• A grammar which has left-recursive productions is called a left-recursive grammar:
A -> Aα | β
• If left recursion is present in the grammar, then a top-down parser can enter
an infinite loop, because the procedure for A calls itself before consuming any input:
A()
{
A();
α
}
• The problem is that top-down parsing cannot handle grammars that contain
left-recursive productions, so we have to eliminate the left-recursive productions.
• Left recursion is eliminated by converting the grammar into a right-recursive
grammar:
A -> βA'
A' -> αA' | ε
• Example: Eliminate the left recursion in the following grammar:
E -> E+T | T
T -> T*F | F
F -> (E) | id
Applying the transformation gives the grammar used in the rest of this unit:
E -> TE'
E' -> +TE' | ε
T -> FT'
T' -> *FT' | ε
F -> (E) | id
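The transformation A -> Aα | β  =>  A -> βA', A' -> αA' | ε can also be written as a small program. The sketch below (an illustration, not the notes' code; it handles only immediate left recursion) applies it to E and T:

```python
# Eliminate immediate left recursion: A -> Aα | β  becomes  A -> βA', A' -> αA' | ε.
def eliminate_left_recursion(nt, rhs_list):
    """rhs_list: right-hand sides of non-terminal nt, written as plain strings."""
    alphas = [rhs[len(nt):] for rhs in rhs_list if rhs.startswith(nt)]
    betas  = [rhs for rhs in rhs_list if not rhs.startswith(nt)]
    if not alphas:
        return {nt: rhs_list}                     # nothing to do
    new_nt = nt + "'"
    return {
        nt:     [beta + new_nt for beta in betas],
        new_nt: [alpha + new_nt for alpha in alphas] + ["ε"],
    }

print(eliminate_left_recursion("E", ["E+T", "T"]))   # {'E': ["TE'"], "E'": ["+TE'", 'ε']}
print(eliminate_left_recursion("T", ["T*F", "F"]))   # {'T': ["FT'"], "T'": ["*FT'", 'ε']}
```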
Left Factoring
• When two or more productions of the same non-terminal start with the same
prefix of symbols, the parser cannot decide which production to use, so we need
a grammar transformation called left factoring:
A -> αβ1 | αβ2   becomes   A -> αA',  A' -> β1 | β2
• There is confusion about how to expand the non-terminal A, i.e. which
production should be used.
• A is first expanded up to the common prefix α; based on the remaining input
symbols, A' can then be replaced by β1 or β2, so the confusion is eliminated.
Example: E -> T+E | T
Solution:
E -> TE'
E' -> +E | ε
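Left factoring can be sketched the same way (again an illustration rather than the notes' code; it factors only a single common prefix):

```python
import os

# Left factor A -> αβ1 | αβ2  into  A -> αA', A' -> β1 | β2.
def left_factor(nt, alternatives):
    prefix = os.path.commonprefix(alternatives)   # the common prefix α
    if not prefix:
        return {nt: alternatives}
    new_nt = nt + "'"
    return {
        nt:     [prefix + new_nt],
        new_nt: [alt[len(prefix):] or "ε" for alt in alternatives],
    }

print(left_factor("E", ["T+E", "T"]))   # {'E': ["TE'"], "E'": ['+E', 'ε']}
```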
Top-Down Parsing
• Top-down parsing is the process of constructing the parse tree starting from the
root and proceeding towards the children (the leaves),
i.e., starting from the start symbol of the grammar and reaching the input
string.
• Top-down parsers are constructed from grammars that are free from
ambiguity and left recursion.
• Top-down parsers use leftmost derivations to construct a parse tree.
• They do not allow grammars with common prefixes (such grammars must be
left factored first).
Classification of parsing techniques:

• Parser
  - Top-down parser
    • Backtracking
    • Predictive parser
      - Recursive descent parser
      - Non-recursive descent parser: LL(1)
  - Bottom-up parser
    • Operator precedence parser
    • Shift-reduce parser
    • LR parser: SLR, LALR, CLR
Working of a Top-Down Parser:
Let's consider an example where a grammar is given and you need to
construct a parse tree using the top-down parsing technique.
Example
S -> aABe
A -> Abc | b
B -> d
Now consider the input to read and construct a parse tree with the
top-down approach.
Input
abbcde
• Here you will see how the top-down approach works: how the input string is
generated from the grammar.
• First, start with S -> aABe; you can see the input string's a at the beginning
and e at the end.
• Now you need to generate abbcde.
• Expand A -> Abc and expand B -> d.
• You now have the string aAbcde, while the input string is abbcde.
• Expand A -> b.
• You finally get the string abbcde.
• The corresponding sequence of parse trees, one per expansion step, shows how
the input string is generated from the grammar with the top-down approach.
Classification of top-down parsing
• With backtracking – brute-force technique
• Without backtracking – predictive parser (parses the input
without using the backtracking concept)
Predictive parsing uses LL(1) grammars and is implemented either as a
recursive descent parser or as a table-driven (non-recursive) parser.
Recursive Descent Parser
• The parser is constructed from the top down and the input is read from left
to right.
• Top-down parsers are of two types:
1. Backtracking
2. Predictive parser
• A top-down parser is a recursive descent parser; it may be backtracking or
non-backtracking.
• The non-backtracking kind comes under predictive parsing.
• A predictive parser is an example of an LL parser.
Backtracking
• The parse tree is started from the root node, and the input string is matched
against the production rules by replacing non-terminals.
• If the chosen alternative fails, we need to backtrack and try other alternatives
until the tree matches the input string.
Limitations
If the given grammar has a large number of alternatives, then the cost
of backtracking is high.
Example:
Consider the grammar whose productions are:
S -> aAd | aB
A -> b | c
B -> ccd | ddc
where S is the goal or start symbol.
The following is the sequence of syntax trees generated during the parse
of the string accd.
Step 1: Start with the goal symbol S as the root of the tree.
Step 2: Select the first production for S, S -> aAd, giving the children a, A, d;
the leading a matches the first input symbol.
Step 3: Expand A with its first alternative, A -> b; b does not match the next
input symbol c.
Step 4: Backtrack and expand A with A -> c; c matches, but the next input symbol
is c while the tree expects d, so backtrack all the way to S.
Step 5: Try the second production for S, S -> aB, giving the children a and B.
Step 6: Expand B with B -> ccd; the remaining input ccd matches, so the parse of
the string accd succeeds.
• If we make a sequence of erroneous expansions and subsequently
discover a mismatch, we may have to undo all of these erroneous
expansions; for example, the entries made in the symbol table
have to be removed.
• To overcome this problem, it is reasonable to use a top-down
parsing method that does no backtracking. Predictive parsers are the
top-down parsers without backtracking.
Recursive Descent Parsing
• It is a top-down parser without backtracking.
Steps to construct the parse tree:
1. Each non-terminal acts as a procedure (a recursive descent parser is
implemented in terms of procedures).
2. If a symbol in a production is a terminal, it is compared with the current
input symbol (the aim is to match the given input string against the grammar).
If the terminal matches the input symbol, the input pointer is incremented
(whenever there is a match, comparison continues with the next character
of the input).
Consider the grammar
E -> E+T | T
T -> T*F | F
F -> (E) | id
3. A non-terminal normally has more than one production; all of its
productions are written in a single procedure. So the recursive descent parser
uses as many procedures as there are non-terminals.
Example: Let us consider the grammar (after eliminating left recursion)
E -> TE'
E' -> +TE' | ε
T -> FT'
T' -> *FT' | ε
F -> (E) | id
E -> TE'
• The parser starts with the procedure for the start symbol.
• If a symbol on the right-hand side is a non-terminal, its procedure is called (a
function calling another function, or itself).
• Primed non-terminals such as E' are written as EPRIME() in the procedures.
E' -> +TE' | ε
• If the symbol is a terminal, a comparison is performed: when the input symbol
matches (here +), the input pointer is incremented (i++ or advance()). The rest of
the production consists of non-terminals, so their procedures are called next. This
whole sequence goes inside an if condition.
• Since the production also has an ε alternative, an else branch is used to stop
the process: the else branch simply performs an empty return.
• So whenever a terminal appears, write an if condition and increment the input
pointer on a match.
• Consider an example input:
id+id$  (a string always ends with $)
Steps
1. Parsing starts from the start symbol E.
2. The procedure E() calls the function T().
3. Whenever T() is called, control goes to procedure T().
4. T() calls the function F().
5. Control goes to procedure F().
6. F() matches id, so the input pointer is incremented and now points to +; F() is
complete and control moves to the next call, TPRIME().
7. In TPRIME(), the input does not match *, so it performs an empty return.
8. Control moves to the next call, EPRIME(), which matches + and then calls T()
for the second id.
9. When the end of the input ($) is reached, the procedures return and the string
is accepted.
Example: Let us consider the grammar

E -> TE'
E' -> +TE' | ε
T -> FT'
T' -> *FT' | ε
F -> (E) | id

Procedure E();
begin
    T();
    EPRIME();
end

Procedure EPRIME();
begin
    if input == '+' then
    begin
        input++;
        T();
        EPRIME();
    end
end

Procedure T();
begin
    F();
    TPRIME();
end

Procedure TPRIME();
begin
    if input == '*' then
    begin
        input++;
        F();
        TPRIME();
    end
end

Procedure F();
begin
    if input == 'id' then
        input++
    else if input == '(' then
    begin
        input++;
        E();
        if input == ')' then
            input++
        else ERROR
    end
    else
        ERROR
end
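For readers who want to run these procedures, here is a compact Python version (a sketch, not the slides' code; it assumes the token list already comes from the lexical analyzer and ends with '$', and the class and method names are illustrative):

```python
class RecursiveDescentParser:
    """Recursive descent parser for E -> TE', E' -> +TE'|ε, T -> FT', T' -> *FT'|ε, F -> (E)|id."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.peek() == expected:
            self.pos += 1                      # advance the input pointer
        else:
            raise SyntaxError(f"expected {expected}, got {self.peek()}")

    def E(self):                               # E -> T E'
        self.T()
        self.EPRIME()

    def EPRIME(self):                          # E' -> + T E' | ε
        if self.peek() == '+':
            self.match('+')
            self.T()
            self.EPRIME()

    def T(self):                               # T -> F T'
        self.F()
        self.TPRIME()

    def TPRIME(self):                          # T' -> * F T' | ε
        if self.peek() == '*':
            self.match('*')
            self.F()
            self.TPRIME()

    def F(self):                               # F -> ( E ) | id
        if self.peek() == 'id':
            self.match('id')
        elif self.peek() == '(':
            self.match('(')
            self.E()
            self.match(')')
        else:
            raise SyntaxError(f"unexpected token {self.peek()}")

    def parse(self):
        self.E()
        self.match('$')                        # the whole input must be consumed
        return "String accepted"

print(RecursiveDescentParser(['id', '+', 'id', '$']).parse())
print(RecursiveDescentParser(['id', '+', 'id', '*', 'id', '$']).parse())
```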
LL(1) Grammars
• The first L means that the input is scanned from left to right, and
• the second L means that this parsing technique produces a leftmost
derivation.
• Finally, the 1 is the number of look-ahead symbols, i.e. how many input symbols
the parser examines when making a decision about which production to apply
while deriving the string.
• The main purpose of this technique is:
if a non-terminal has several productions, to decide which production
should be used to derive the string;
for this reason we construct an LL(1) parsing table.
• LL(1) parsing is also called non-recursive predictive parsing, or simply
predictive parsing.
Construction of an LL(1) parser:
Steps:
1. Eliminate left recursion.
2. Perform left factoring (eliminate common prefixes).
3. Compute FIRST and FOLLOW for all non-terminal symbols.
4. Construct the parsing table.
5. Check whether the input string is accepted by the grammar or not.
Given
E -> E+T | T
T -> T*F | F
F -> (E) | id
Eliminating left recursion:
E -> TE'
E' -> +TE' | ε
T -> FT'
T' -> *FT' | ε
F -> (E) | id
No two productions of the same non-terminal start with a common symbol at the
left end, so there is no left factoring to perform here.
Step 3: Calculate the FIRST and FOLLOW sets
E --> TE'
E' --> +TE' | ε
T --> FT'
T' --> *FT' | ε
F --> (E) | id
Construct the FIRST and FOLLOW sets for this grammar:
S -> ABCDE    FIRST(S) = {a, b, c}    FOLLOW(S) = {$}
A -> a | ε    FIRST(A) = {a, ε}       FOLLOW(A) = {b, c}
B -> b | ε    FIRST(B) = {b, ε}       FOLLOW(B) = {c}
C -> c        FIRST(C) = {c}          FOLLOW(C) = {d, e, $}
D -> d | ε    FIRST(D) = {d, ε}       FOLLOW(D) = {e, $}
E -> e | ε    FIRST(E) = {e, ε}       FOLLOW(E) = FOLLOW(S) = {$}
(If every symbol on the right-hand side of a production can derive ε, then ε is
also included in the FIRST set of the left-hand side.)
Construct the FIRST and FOLLOW sets for this grammar:
S -> ACB | CbB | Ba    FIRST(S) = {d, g, h, ε, b, a}    FOLLOW(S) = {$}
A -> da | BC           FIRST(A) = {d, g, h, ε}          FOLLOW(A) = {h, g, $}
B -> g | ε             FIRST(B) = {g, ε}                FOLLOW(B) = {$, g, a, h}
C -> h | ε             FIRST(C) = {h, ε}                FOLLOW(C) = {g, $, b, h}
• Rules for FIRST and FOLLOW
FIRST:
1. If X is a terminal, FIRST(X) = {X}.
2. To find FIRST of a non-terminal such as E, look at its productions: for
A -> Bβ, FIRST(A) includes FIRST(B) without ε; if B derives ε, FIRST(β) is
included as well.
3. If A derives ε, then ε is added to FIRST(A).
FOLLOW:
1. Add $ to the FOLLOW set of the start symbol of the grammar.
2. For a production A -> αBβ, where α and β are strings of terminals and non-
terminals, to find FOLLOW of the non-terminal B we consider β:
FOLLOW(B) includes all symbols of FIRST(β) except ε.
3. If β is empty or β can derive ε, then FOLLOW(B) also includes FOLLOW(A).
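The two sets can also be computed mechanically. The sketch below (an assumption, not the notes' code; 'eps' stands for ε and the grammar encoding is illustrative) computes FIRST and FOLLOW for the running expression grammar by iterating the rules above to a fixed point:

```python
EPS = "eps"
grammar = {                              # the left-recursion-free grammar above
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
START = "E"

def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        f = first[sym] if sym in grammar else {sym}
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)                      # every symbol could derive ε
    return result

def compute_first():
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                       # iterate until nothing changes
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                f = first_of_string(prod, first)
                if not f <= first[nt]:
                    first[nt] |= f
                    changed = True
    return first

def compute_follow(first):
    follow = {nt: set() for nt in grammar}
    follow[START].add("$")               # rule 1: $ goes into FOLLOW(start symbol)
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                for i, sym in enumerate(prod):
                    if sym not in grammar:
                        continue
                    tail = first_of_string(prod[i + 1:], first)
                    add = tail - {EPS}   # rule 2: FIRST of what follows sym
                    if EPS in tail:
                        add |= follow[nt]  # rule 3: β nullable -> add FOLLOW(nt)
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    return follow

first = compute_first()
follow = compute_follow(first)
print(first["E"], follow["E"])           # {'(', 'id'} and {')', '$'}
```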
4. Construction of the LL(1) table / predictive parsing table
Rows = non-terminals
Columns = terminals (including $)
1. For each production A -> α, consider the row for the non-terminal A and the
columns for the terminals in FIRST(α):
add the production A -> α to the cell M[A, a] for every a in FIRST(α).
2. If α can derive ε, then add the production A -> α to row A under every
column in FOLLOW(A)
(i.e., if the production is of the form A -> ε, then calculate FOLLOW(A) and add
the production to M[A, a] for each a in FOLLOW(A)).
M[A, a]:
M represents the parsing table,
A is the row for the non-terminal A,
a is the column for the terminal a, taken from FIRST of the right-hand side of the
production (or from FOLLOW(A) for ε-productions).
Non-terminal \ input symbol:

|    | +          | *          | (        | )       | id       | $       |
| E  |            |            | E -> TE' |         | E -> TE' |         |
| E' | E' -> +TE' |            |          | E' -> ε |          | E' -> ε |
| T  |            |            | T -> FT' |         | T -> FT' |         |
| T' | T' -> ε    | T' -> *FT' |          | T' -> ε |          | T' -> ε |
| F  |            |            | F -> (E) |         | F -> id  |         |
Rules:
1. If the production is of the form A -> α, then calculate FIRST(α) and add
A -> α to M[A, a] for every terminal a in FIRST(α).
2. If the production is of the form A -> ε (or ε is in FIRST(α)), then calculate
FOLLOW(A) and add the production to M[A, a] for every a in FOLLOW(A).
3. The remaining entries of the parsing table are marked as errors.
5. To check whether the string is accepted or not:
• We need to parse the input string w = id+id*id.
• We use three columns: stack, input and output (action).
• The stack is a memory unit.
• The initial configuration of the stack is $ and E:
$ is the bottom of the stack;
we load the start symbol of the grammar onto the stack, so E, the start symbol of
the grammar, is on top of the stack.
• In the input buffer we store the string to be parsed, followed by $.
• If a non-terminal is on top of the stack and the input pointer points to some
input symbol, then we refer to the parsing table.
• If a terminal symbol is on top of the stack, it must match the current input
symbol; then we remove the terminal symbol from the top of the stack and remove
the input symbol from the input buffer (i.e., we advance the input pointer to the
next input symbol).
• When E is on top of the stack, it is replaced by the symbols of the right-hand
side of the chosen production pushed in reverse order, so T ends up on top of
the stack.
• If ε is selected, nothing is pushed.
• The parse tree is generated from the sequence of productions used.
• The string is accepted when the stack is empty and the input buffer is also
empty (only $ remains in each).
| Stack   | Input     | Action          |
| $E      | id+id*id$ | E -> TE'        |
| $E'T    | id+id*id$ | T -> FT'        |
| $E'T'F  | id+id*id$ | F -> id         |
| $E'T'id | id+id*id$ | pop id, advance |
| $E'T'   | +id*id$   | T' -> ε         |
| $E'     | +id*id$   | E' -> +TE'      |
| $E'T+   | +id*id$   | pop +, advance  |
| $E'T    | id*id$    | T -> FT'        |
| $E'T'F  | id*id$    | F -> id         |
| $E'T'id | id*id$    | pop id, advance |
| $E'T'   | *id$      | T' -> *FT'      |
| $E'T'F* | *id$      | pop *, advance  |
| $E'T'F  | id$       | F -> id         |
| $E'T'id | id$       | pop id, advance |
| $E'T'   | $         | T' -> ε         |
| $E'     | $         | E' -> ε         |
| $       | $         | accept          |
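The same parsing loop can be written directly in Python. The sketch below (an illustration, not the notes' code) encodes the LL(1) table as a dictionary and runs the stack/input loop traced above:

```python
# Table-driven LL(1) (predictive) parser for the expression grammar.
table = {
    ("E", "id"): ["T", "E'"],  ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],  ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],       ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens):
    stack = ["$", "E"]                     # $ at the bottom, start symbol on top
    tokens = tokens + ["$"]
    i = 0
    while stack:
        top = stack.pop()
        if top in NONTERMINALS:
            rhs = table.get((top, tokens[i]))
            if rhs is None:
                raise SyntaxError(f"no entry M[{top}, {tokens[i]}]")
            stack.extend(reversed(rhs))    # push the RHS in reverse order
        elif top == tokens[i]:
            i += 1                         # terminal matched: advance the input
        else:
            raise SyntaxError(f"expected {top}, got {tokens[i]}")
    return "String accepted"

print(ll1_parse(["id", "+", "id", "*", "id"]))
```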
Not LL(1)
• A grammar is said to be LL(1) if its predictive parsing table has no
multiply-defined entries.
• In the previous table all filled entries are single entries and the remaining
cells are blank (errors), so that grammar is LL(1).
• If the predictive parsing table has a multiply-defined entry, the grammar is
not LL(1).
• This situation arises, for example, with the classic dangling-else grammar,
where the entry M[S', e] receives both the production S' -> eS and S' -> ε, so
that grammar is not LL(1).
Error Recovery in Predictive Parsing:
• An error is detected during predictive parsing when the terminal on top of the
stack does not match the next input symbol, or
• when a non-terminal A is on top of the stack, a is the next input
symbol, and the parsing table entry M[A, a] is empty.
• The technique of continuing the parse sensibly after such errors, so that as
many errors as possible can be reported in one pass, is called error recovery.
• An LL(1) parser uses "panic mode" error recovery.
Panic Mode Error Recovery:
• This is based on the idea of skipping input symbols until a token in a set
of synchronizing tokens appears.
• The synchronizing sets should be chosen so that the parser recovers quickly
from errors.
Rules:
1. If the parser looks up an entry M[A, b] which is blank, then the input
symbol b is skipped.
2. If the entry is "synch", then the non-terminal on top of the stack is popped
so that parsing can continue.
3. If a token on top of the stack does not match the input symbol, then the
token is popped off the stack.
Example : consider the parsing table for the grammar
E->TE’ Follow(E)= Follow(E’)={$,)}
E’->+TE’| ε
T->FT’ Follow(T)= Follow(T’)={+,$,)}
T’->*FT’| ε
F->(E)|id Follow(F)={+,*,$,)}
Using the FOLLOW symbols of the non-terminals as synchronizing tokens, we have:
|    | id       | +          | *          | (        | )       | $       |
| E  | E -> TE' |            |            | E -> TE' | synch   | synch   |
| E' |          | E' -> +TE' |            |          | E' -> ε | E' -> ε |
| T  | T -> FT' | synch      |            | T -> FT' | synch   | synch   |
| T' |          | T' -> ε    | T' -> *FT' |          | T' -> ε | T' -> ε |
| F  | F -> id  | synch      | synch      | F -> (E) | synch   | synch   |
• Thus the errors in the given input string can be recovered from.
• But panic mode recovery does not by itself give informative error messages.
• So informative error messages have to be supplied by the compiler
designer.
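A hedged sketch of panic mode recovery (an assumption, not the notes' code) can be obtained by extending the table-driven loop shown earlier; it reuses the `table` and `NONTERMINALS` definitions from that sketch and uses the FOLLOW sets above as synchronizing tokens:

```python
SYNCH = {                        # FOLLOW(A) for each non-terminal, used as synch sets
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}

def ll1_parse_with_recovery(tokens):
    stack, i, errors = ["$", "E"], 0, []
    tokens = tokens + ["$"]
    while stack:
        top = stack[-1]
        if top in NONTERMINALS:
            if (top, tokens[i]) in table:
                stack.pop()
                stack.extend(reversed(table[(top, tokens[i])]))
            elif tokens[i] in SYNCH[top]:
                errors.append(f"error: synch entry M[{top}, {tokens[i]}]; pop {top}")
                stack.pop()                  # rule 2: pop the non-terminal
            else:
                errors.append(f"error: skip input symbol {tokens[i]}")
                i += 1                       # rule 1: skip the input symbol
        elif top == tokens[i]:
            stack.pop()
            i += 1                           # terminal matched
        else:
            errors.append(f"error: pop unmatched terminal {top}")
            stack.pop()                      # rule 3: pop the terminal
    return errors or ["String accepted"]

print(ll1_parse_with_recovery(["+", "id", "*", "+", "id"]))
# reports two errors but finishes parsing the rest of the input
```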
Phrase-Level Recovery
1. In phrase-level recovery, the blank entries in the predictive
parsing table are filled with pointers to error routines.
2. These error routines may change, insert or delete symbols in the
input and issue appropriate error messages.
