Unit 2 Notes

Unit-II of Compiler Design focuses on Syntax Analysis, detailing the role of parsers, error handling, and various parsing techniques such as top-down and bottom-up parsing. It covers key concepts like Context-Free Grammar (CFG), ambiguity, left recursion, and the computation of FIRST and FOLLOW sets. The objective is to design parsers based on given specifications and to understand the structure and syntax of programming languages.

Unit-2 Subject: Compiler Design

UNIT-II

Syllabus:
Syntax Analysis (Parser): The Role of the Parser, Syntax Error Handling and
Recovery, Top-Down Parsing, Bottom-Up Parsing, Simple LR Parsing, More
Powerful LR Parsing, Using Ambiguous Grammars, Parser Generator YACC.
Objective: Design top-down and bottom-up parsers
Outcome: For a given parser specification, design top-down and bottom-up parsers.

Syntax Analysis (Parser)


Syntax analysis is the second phase of compilation. The syntax analyzer checks the
syntax of the program: it takes the sequence of tokens produced by the scanner and
groups them into a structure called a parse tree.

Parsing: determining the syntax, or structure, of a program. The parse tree
specifies how the statements of the program are grouped, and hence the order in
which they are to be executed.

The syntax of a programming language (the rules to be followed when writing
programming constructs) is usually specified by a CFG (Context-Free Grammar) or in
BNF (Backus-Naur Form).

Important steps followed in parsing are:


a. Specification of syntax (CFG)
b. Representation of input after parsing (parse tree)
c. Parsing algorithms (top-down and bottom-up algorithms)

The Role of the Parser:


The parser determines the syntactic structure of a program: it takes tokens as
input, verifies that the string of token names can be generated by the grammar of
the source language, and then constructs a parse tree (syntax tree).

sequence of tokens → Parser → parse tree

Structure of the syntax tree depends on the syntactic structure of the programming
language.

Dept. of CSE, MEC 2021-2022



There are three general types of parsers for grammars:


• universal
• top-down
• bottom-up
Universal parsing methods can parse any context-free grammar. The following two
algorithms are the best-known examples of universal parsers:
• the Cocke-Younger-Kasami (CYK) algorithm
• Earley's algorithm
These general methods are, however, too inefficient to use in production compilers.

Commonly used methods in compilers are either top-down or bottom-up.


As implied by their names, top-down methods build parse trees from the top
(root) to the bottom (leaves), while bottom-up methods start from the leaves
and work their way up to the root. In either case, the input to the parser is
scanned from left to right, one symbol at a time.

Syntax Error Handling and Recovery:


Most programming-language specifications do not describe how errors should be
handled, so it is the responsibility of the compiler. The compiler must track down
the errors made by the programmer while coding the program and handle them.

Common programming errors are:


Lexical Errors: misspellings of identifiers, keywords, or operators, and missing
quotes around text intended as a string.
Syntactic Errors: misplaced semicolons, extra or missing braces ("{" or "}"),
missing or extra else statements.
Semantic Errors: type mismatches between operators and operands.


Logical Errors: incorrect reasoning on the part of the programmer, such as using
the assignment operator = instead of the comparison operator ==.

Context Free Grammar (CFG):


The syntax of every programming-language construct is specified using a CFG.
Definition: a CFG is a collection of four things
G = (V, T, P, S)
• V is a collection of non-terminals
• T is a collection of terminals
• P is a set of production rules of the form A → α, where A ∈ V and α ∈ (V ∪ T)* (LHS → RHS)
• S is the start symbol, with S ∈ V (S belongs to V)
Non-terminal: a symbol in the grammar that is further expanded by a production
rule; usually represented with a capital letter.
Terminal: a symbol in the grammar that is not expandable further; usually
represented with a small letter (ε is also treated as a terminal here).
Terminals are the actual tokens.

Rules used in writing a grammar:


1. Every production rule of the grammar must be of the form
LHS → RHS
2. The LHS must be a single non-terminal. Every non-terminal symbol must be
expandable (i.e., every non-terminal must eventually produce at least one terminal string).
3. The RHS can be a combination of non-terminal and terminal symbols.
4. A null production rule is specified as non-terminal → ε.
5. One of the non-terminal symbols is designated the start symbol of the grammar.

Once a grammar is constructed, we can define its language.

The language of a grammar is the collection of strings (words) that can be derived
from that grammar.

Simplification of Grammar (CFG):


A grammar can be simplified/reduced by removing useless symbols, useless
production rules from the grammar.

1. A symbol can be removed from the grammar if it is not used in producing any
string of the language (useless non-terminal and terminal symbols).

S → 0T | 1T | X | 0 | 1
T → 00


In the above grammar, X has no productions, so it can derive nothing. The alternative S → X must be removed.

2. There should not be any production rule of the form
Non-terminal → Non-terminal (X → Y); these are called unit productions.

A → B, B → a | b   becomes   A → a | b, B → a | b

3. If ε is not in the language, epsilon productions (Non-terminal → ε) can be removed:

S → 0S | 1S | ε  =>  S → 0S | 1S | 0 | 1

Ambiguity
A grammar is ambiguous if there is any sentence for which there exists more than
one parse tree (equivalently, more than one leftmost derivation or more than one
rightmost derivation).

Any parser for an ambiguous grammar has to choose somehow which tree to
return. There are a number of solutions to this: the parser could pick one arbitrarily,
or we can provide some hints about which to choose. Best of all is to rewrite the
grammar so that it is not ambiguous.
There is no general method for removing ambiguity. Ambiguity is acceptable in
spoken languages, but ambiguous programming-language grammars are useless unless the
ambiguity can be resolved.

Left Recursion:
If there is any non-terminal A such that there is a derivation A ⇒+ Aα for some
string α, then the grammar is left recursive.

Algorithm for eliminating left recursion:

1. Group all the A-productions together like this:

A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn

where A is the left-recursive non-terminal, each αi is any string of terminals and
non-terminals, and each βi is any string of terminals and non-terminals that does
not begin with A.
2. Replace the above A-productions by the following:

A → β1 A′ | β2 A′ | ... | βn A′
A′ → α1 A′ | α2 A′ | ... | αm A′ | ε

where A′ is a new non-terminal.


If a grammar contains left recursion, eliminate it by using the rule below.

Remove the left recursion from the production:

A → Aα | β

Applying the transformation yields:

A → β A′
A′ → α A′ | ε

where A′ derives the part remaining after β.

Example 1:
Remove the left recursion from the productions:
E → E + T | T
T → T * F | F
Applying the transformation yields:
E → T E′
E′ → + T E′ | ε
T → F T′
T′ → * F T′ | ε
Example 2:
Remove the left recursion from the productions:
E → E + T | E - T | T
T → T * F | T / F | F
Applying the transformation yields:
E → T E′
E′ → + T E′ | - T E′ | ε
T → F T′
T′ → * F T′ | / F T′ | ε
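The transformation used in Examples 1 and 2 can be sketched in code. This is an illustrative sketch only: the function name, the grammar-as-dict representation, and the "eps" marker standing for ε are my own conventions, not from the notes.

```python
# Sketch: eliminating immediate left recursion A -> A a1 | ... | b1 | ...
# A grammar is a dict mapping a non-terminal to a list of alternatives,
# each alternative a list of symbols; "eps" stands for the empty string.

def elim_left_recursion(grammar, A):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | eps."""
    recursive = [alt[1:] for alt in grammar[A] if alt and alt[0] == A]
    other = [alt for alt in grammar[A] if not alt or alt[0] != A]
    if not recursive:          # nothing to do: A is not left recursive
        return grammar
    A2 = A + "'"               # the new non-terminal A'
    grammar[A] = [beta + [A2] for beta in other]
    grammar[A2] = [alpha + [A2] for alpha in recursive] + [["eps"]]
    return grammar

g = {"E": [["E", "+", "T"], ["T"]], "T": [["T", "*", "F"], ["F"]]}
elim_left_recursion(g, "E")
elim_left_recursion(g, "T")
# g now encodes: E -> T E',  E' -> + T E' | eps,
#                T -> F T',  T' -> * F T' | eps
```

Applied to the grammar of Example 1, this reproduces exactly the result shown above.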

Left Factoring:
Left factoring is a grammar transformation that is useful for producing a grammar
suitable for predictive parsing.

When it is not clear which of two alternative productions to use to expand a non-
terminal A, we may be able to rewrite the productions to defer the decision until
we have seen enough of the input to make the right choice.

Algorithm:
For each non-terminal A, find the longest prefix α that occurs in two or more right-hand sides of A.

If α ≠ ε, then replace all of the A-productions

A → αβ1 | αβ2 | ... | αβn | γ

(where γ represents the alternatives that do not begin with α) with

A → α A′ | γ
A′ → β1 | β2 | ... | βn


where A′ is a new non-terminal.

Repeat until no common prefixes remain.


It is easy to remove common prefixes by left factoring, creating a new non-terminal.
For example, consider:
V → αβ | αγ
Change to:
V → α V′
V′ → β | γ
Example:

Left-factor the grammar:

S → V := int
V → alpha '[' int ']' | alpha
Solution:
S → V := int
V → alpha V′
V′ → '[' int ']' | ε
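One pass of the left-factoring step above can also be sketched in code. The representation (grammar as a dict of symbol lists) and the helper names are my own illustrative conventions, with "eps" again standing for ε.

```python
def common_prefix(a, b):
    """Longest common prefix of two symbol lists."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return a[:i]

def left_factor(grammar, A):
    """One pass of left factoring: A -> a b1 | a b2 | ... | g
    becomes A -> a A', A' -> b1 | b2 | ..."""
    alts = grammar[A]
    best = []
    # find the longest prefix shared by at least two alternatives
    for i in range(len(alts)):
        for j in range(i + 1, len(alts)):
            p = common_prefix(alts[i], alts[j])
            if len(p) > len(best):
                best = p
    if not best:
        return grammar
    A2 = A + "'"
    factored = [a for a in alts if a[:len(best)] == best]
    rest = [a for a in alts if a[:len(best)] != best]
    grammar[A] = rest + [best + [A2]]
    # an alternative that was exactly the prefix leaves behind eps
    grammar[A2] = [a[len(best):] or ["eps"] for a in factored]
    return grammar

g = {"S": [["V", ":=", "int"]],
     "V": [["alpha", "[", "int", "]"], ["alpha"]]}
left_factor(g, "V")
# g now encodes: V -> alpha V',  V' -> [ int ] | eps
```

On the example grammar this yields the same factored productions as the solution above.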

FIRST and FOLLOW:

The functions FIRST and FOLLOW allow us to fill in the entries of a predictive
parsing table for G, whenever possible. The sets of tokens yielded by the FOLLOW
function can also be used as synchronizing tokens during error recovery.

FIRST(α): If α is any string of grammar symbols, let FIRST(α) be the set of
terminals that begin the strings derived from α. If α ⇒* ε, then ε is also
in FIRST(α).
(OR)
FIRST(A): If A is any non-terminal symbol of the grammar, let FIRST(A) be the set
of terminals that begin the strings derived from A. If A ⇒* ε, then ε is also in
FIRST(A).

FOLLOW(A), for a non-terminal A: it is the set of terminals a that can appear
immediately to the right of A in some sentential form, that is, the set of terminals
a such that there exists a derivation of the form S ⇒* αAaβ for some α and β.
• If A can be the rightmost symbol in some sentential form, then $ is in
FOLLOW(A).

Computation of FIRST ():


To compute FIRST(X) for all grammar symbols X, apply the following rules until
no more terminals or ε can be added to any FIRST set.


• If X is a terminal, then FIRST(X) is {X}.

• If X → ε is a production, then add ε to FIRST(X).

• If X is a non-terminal and X → Y1 Y2 … Yk is a production, then place a in
FIRST(X) if, for some i, a is in FIRST(Yi) and ε is in all of
FIRST(Y1), …, FIRST(Yi-1); that is, Y1 … Yi-1 ⇒* ε. If ε is in
FIRST(Yj) for all j = 1, 2, …, k, then add ε to FIRST(X).

For example, everything in FIRST(Y1) is surely in FIRST(X). If Y1 does not derive
ε, then we add nothing more to FIRST(X), but if Y1 ⇒* ε, then we add FIRST(Y2),
and so on.

FIRST(A) = FIRST(α1) ∪ FIRST(α2) ∪ … ∪ FIRST(αn)

where A → α1 | α2 | … | αn are all the productions for A.

FIRST(Aβ) = FIRST(A) if A doesn't derive ε;

otherwise FIRST(Aβ) = (FIRST(A) - {ε}) ∪ FIRST(β)

Computation of FOLLOW():

To compute FOLLOW(A) for all nonterminals A, apply the following rules until
nothing can be added to any FOLLOW set.

• Place $ in FOLLOW(S), where S is the start symbol and $ is the input
right-end marker.
• If there is a production A → αBβ, then everything in FIRST(β) except ε is
placed in FOLLOW(B).
• If there is a production A → αB, or a production A → αBβ where FIRST(β)
contains ε (i.e., β ⇒* ε), then everything in FOLLOW(A) is in FOLLOW(B).

Example:

Construct the FIRST and FOLLOW for the grammar:

A -> BC | EFGH | H
B -> b
C -> c | ε
E -> e | ε
F -> CE
G -> g


H -> h | ε

Solution:

1. Finding first () set:

first (H) = first (h) U first (ε) = {h, ε}
first (G) = first (g) = {g}
first (C) = first (c) U first (ε) = {c, ε}
first (E) = first (e) U first (ε) = {e, ε}
first (F) = first (CE) = (first (C) - {ε}) U first (E)
         = ({c, ε} - {ε}) U {e, ε} = {c, e, ε}
first (B) = first (b) = {b}
first (A) = first (BC) U first (EFGH) U first (H)
         = first (B) U (first (E) - {ε}) U first (FGH) U {h, ε}
         = {b} U {e} U (first (F) - {ε}) U first (GH) U {h, ε}
         = {b} U {e} U {c, e} U first (G) U {h, ε}
         = {b} U {e} U {c, e} U {g} U {h, ε}
         = {b, c, e, g, h, ε}

2. Finding follow() sets:

follow(A) = {$}
follow(B) = (first(C) - {ε}) U follow(A) = {c, $}
follow(G) = (first(H) - {ε}) U follow(A)
          = ({h, ε} - {ε}) U {$} = {h, $}
follow(H) = follow(A) = {$}
follow(F) = first(GH) = {g}
follow(E) = first(FGH) U follow(F)
          = ((first(F) - {ε}) U first(GH)) U follow(F)
          = {c, e} U {g} U {g}


          = {c, e, g}
follow(C) = follow(A) U (first(E) - {ε}) U follow(F)
          = {$} U {e} U {g}
          = {e, g, $}
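The hand computation above can be checked mechanically with a fixed-point iteration of the FIRST and FOLLOW rules. The sketch below is illustrative: the dict representation, the "eps" marker for ε, and the function names are my own conventions, not from the notes.

```python
GRAMMAR = {
    "A": [["B", "C"], ["E", "F", "G", "H"], ["H"]],
    "B": [["b"]],
    "C": [["c"], ["eps"]],
    "E": [["e"], ["eps"]],
    "F": [["C", "E"]],
    "G": [["g"]],
    "H": [["h"], ["eps"]],
}

def compute_first(grammar):
    """Iterate the FIRST rules until no set changes (a fixed point)."""
    first = {}
    def F(sym):   # a terminal (or "eps") has FIRST = {itself}
        return first.setdefault(sym, set() if sym in grammar else {sym})
    changed = True
    while changed:
        changed = False
        for A, alts in grammar.items():
            for alt in alts:
                before = len(F(A))
                nullable = True
                for sym in alt:
                    F(A).update(F(sym) - {"eps"})
                    if "eps" not in F(sym):
                        nullable = False
                        break
                if nullable:            # every symbol in alt can derive eps
                    F(A).add("eps")
                if len(F(A)) != before:
                    changed = True
    return first

def compute_follow(grammar, first, start):
    """Iterate the FOLLOW rules until no set changes."""
    follow = {A: set() for A in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for A, alts in grammar.items():
            for alt in alts:
                for i, B in enumerate(alt):
                    if B not in grammar:     # only non-terminals get FOLLOW
                        continue
                    before = len(follow[B])
                    tail_nullable = True
                    for sym in alt[i + 1:]:
                        fs = first.get(sym, {sym})
                        follow[B].update(fs - {"eps"})
                        if "eps" not in fs:
                            tail_nullable = False
                            break
                    if tail_nullable:        # B can end the RHS of A
                        follow[B].update(follow[A])
                    if len(follow[B]) != before:
                        changed = True
    return follow

first = compute_first(GRAMMAR)
follow = compute_follow(GRAMMAR, first, "A")
# first["A"] == {b, c, e, g, h, eps}, follow["C"] == {e, g, $}, etc.
```

Running this reproduces every set derived by hand above, which is a useful way to check the example.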
TOP-DOWN PARSING:
Top-down parsing is the construction of a parse tree by starting at the start symbol
and "guessing" each derivation until we reach a string that matches the input. That is,
we construct the tree from the root to the leaves.
The advantage of top-down parsing is that a parser can be written directly as a
program. Table-driven top-down parsers are of minor practical relevance.

1) Backtracking
2) Recursive descent parser
3) Predictive LL(1) parser

Top-down parsing can be viewed as an attempt to find a leftmost derivation for an
input string. Equivalently, it can be viewed as an attempt to construct a parse tree
for the input starting from the root and creating the nodes of the parse tree in
preorder.
The general form of top-down parsing, called recursive descent, may involve
backtracking, that is, making repeated scans of the input. The special case of
recursive descent parsing in which no backtracking is required is called predictive
parsing. Predictive parsing works only on grammars where the first terminal symbol
of each subexpression provides enough information to choose which production to use;
a backtracking recursive descent parser, by contrast, makes repeated scans of the input.

Backtracking
A backtracking parser tries different production rules to obtain the input string.
Backtracking is powerful, but slow.
Backtracking is not preferred for practical compilers.

Backtracking parsers are not seen frequently, as backtracking is rarely needed to
parse programming-language constructs.
Example: consider the grammar
S→cAd
A→ab|a


and the input string w = cad. To construct a parse tree for this string top-down, we
initially create a tree consisting of a single node labeled S. The input pointer points
to c, the first symbol of w. We then use the first production for S to expand the tree
and obtain the trees shown below.

[Figure: three partial parse trees. Fig (a): S with children c, A, d. Fig (b): the same tree with A expanded to a, b. Fig (c): the same tree with A expanded to a.]

In Fig (a) the leftmost leaf, labeled c, matches the first symbol of w, so we now
advance the input pointer to a, the second symbol of w, and consider the next leaf,
labeled A. We can then expand A using the first alternative for A to obtain the tree
in Fig (b). We now have a match for the second input symbol, so we advance the input
pointer to d, the third input symbol, and compare d against the next leaf, labeled b.
Since b does not match d, we report failure and go back to A to see whether there
is any alternative for A that we have not tried but that might produce a match.

In going back to A, we must reset the input pointer to position 2. We now try the
second alternative for A to obtain the tree of Fig (c). The leaf a matches the second
symbol of w and the leaf d matches the third symbol.
A left-recursive grammar can cause a recursive descent parser, even one with
backtracking, to go into an infinite loop. That is, when we try to expand A, we may
eventually find ourselves again trying to expand A without having consumed any
input.

Predictive Parser: a predictive parser tries to predict the construction of the tree
using one or more lookahead symbols from the input string. Its two common forms are:
• Recursive descent parser
• LL(1) parser

Recursive Descent Parser: (no backtracking but recursive)


It is a parser that uses collection of recursive procedures to parse an input string for
a given grammar.
• CFG is used to build the procedures for each nonterminal


• RHS of each production rule is directly converted to program code of that


procedure.
Rules for constructing RDP (recursive descent parser):
RHS of the production rule is directly converted into program code symbol by
symbol.
• If the symbol is a nonterminal, call the corresponding procedure
• If the symbol is a terminal, then it is matched with the input symbol in the
input buffer and input reader is moved to next symbol.
• If the production rule contains many alternatives, then all those alternatives
must be combined in the same procedure.
Parser must be activated from start symbol.
Example: Consider the grammar.
P->num T
T-> * num T | ε
procedure p()
{
    if lookahead == num
    then {
        match("num");
        T();
    }
    else
        error();
    if lookahead == $
    then {
        declare success;
    }
    else
        error();
}
Note: for start symbol we have to consider $ also.
procedure T()
{
    if lookahead == *
    then {
        match("*");
        if lookahead == num
        then {
            match("num");
            T();
        }
        else
            error();


}
}

procedure match(token t)
{
    if lookahead == t
        lookahead = nexttoken;
    else
        error();
}
procedure error()
{
    print("error");
}
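The pseudocode above translates almost line for line into a runnable sketch, shown here in Python. The token-list input and the ParseError name are my own conventions for illustration.

```python
class ParseError(Exception):
    pass

def parse(tokens):
    """Recursive descent parser for P -> num T ; T -> * num T | eps.
    tokens is a list such as ["num", "*", "num", "$"]."""
    pos = 0
    def lookahead():
        return tokens[pos]
    def match(t):
        nonlocal pos
        if tokens[pos] == t:
            pos += 1
        else:
            raise ParseError(f"expected {t}, got {tokens[pos]}")
    def P():
        match("num")      # P -> num T
        T()
        match("$")        # the start symbol also checks the end marker
    def T():
        if lookahead() == "*":    # T -> * num T
            match("*")
            match("num")
            T()
        # otherwise T -> eps: consume nothing
    P()
    return True

parse(["num", "*", "num", "$"])   # accepted without error
```

Each non-terminal becomes one procedure, and each terminal on an RHS becomes a match() call, exactly as the construction rules above describe.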

Predictive Parser Introduction:


Predictive parsing is top-down parsing without backtracking. For many languages,
the parser can make perfect guesses (and so avoid backtracking) by using one symbol of
lookahead. That is, if:
A → α1 | α2 | … | αn,

choose the correct αi by looking at the first symbol it can derive. If ε is an
alternative, choose it last.

This approach is called predictive parsing. For each non-terminal and lookahead
symbol there must be at most one applicable production in order to avoid backtracking.
If there is no such production, then no parse tree exists and an error is returned.
The crucial property is that the grammar must not be left-recursive. Predictive
parsing works well on those fragments of programming languages in which keywords
occur frequently.
The model of predictive parser is as follows:


A predictive parser has:

• Stack
• Input
• Parsing Table
• Output

The input buffer consists of the string to be parsed, followed by $, a symbol used as
a right end marker to indicate the end of the input string.
The stack consists of a sequence of grammar symbols with $ on the bottom,
indicating the bottom of the stack. Initially the stack consists of the start symbol of
the grammar on the top of $.
Recursive descent and LL parsers are often called predictive parsers because they
operate by predicting the next step in a derivation.

The algorithm for the Predictive Parser Program is as follows:

Input: A string w and a parsing table M for grammar G


Output: if w is in L(G), a leftmost derivation of w; otherwise, an error indication.
Method: Initially, the parser has $S on the stack, with S, the start symbol of G, on
top, and w$ in the input buffer. The program that uses the predictive parsing table
M to produce a parse of the input is:

Set ip to point to the first symbol of w$;
repeat
    let X be the top stack symbol and a the symbol pointed to by ip;
    if X is a terminal or $ then
        if X = a then
            pop X from the stack and advance ip
        else error()
    else /* X is a non-terminal */
        if M[X, a] = X -> Y1 Y2 … Yk then
        begin
            pop X from the stack;
            push Yk, Yk-1, …, Y1 onto the stack, with Y1 on top;
            output the production X -> Y1 Y2 … Yk
        end
        else error()
until X = $ /* stack is empty */


LL(1) Parser:
It is a non-recursive predictive parser and a table driven parser.
The first L stands for “Left-to-right scan of input”. The second L stands for “Left-most
derivation”. The ‘1’ stands for “1 token of look ahead”.
• No LL (1) grammar can be ambiguous or left recursive.

The following data structures are used by LL(1) parser:

• Input Buffer: an array used to hold the input string, with $ at the end of the string.
• Stack: used to hold the left-sentential form. The RHS of a production rule is
pushed onto the stack in reverse order (from right to left).
   o Initially the stack contains $ on top.
• Parsing Table: a 2-D array with all the non-terminals on the row side and the
terminals plus $ on the column side.

Because it uses an explicit stack, this parser is non-recursive.

Parser Working principle:


• Initially, start symbol of the grammar is pushed onto the stack
• Parser reads the top of the stack and current input symbol, and corresponding
action is determined with the help of parsing table.
Construction of the parsing table:
It depends on two important functions, FIRST() and FOLLOW().
1) Compute first() and follow() for all the non-terminals.
2) Construct the parsing table using first() and follow():
   a. For each production A → α, place A → α in the column of every
      terminal in first(α).
   b. If first(α) contains ε, also place A → α in the column of every
      terminal (and $) in follow(A).


Example of constructing an LL(1) parsing table:

If the table does not contain multiple entries in any cell, we say that the given
grammar is LL(1).
Parsing an input string using the LL(1) parser and table:
Consider the input string 'abba'. The parsing steps are:
• Initially the stack contains $ on top.
• Place the input string in the input buffer with $ at the end of the string.
• The parsing table is used to decide which production rule is used to parse the
string.
• If the stack top is a non-terminal, replace it with its RHS, pushed in reverse order.
• If the stack top is a terminal and it is the same as the current input symbol,
pop that terminal and advance the input.
Stack    Input    Parsing Action
$        abba$    push S onto the stack
$S       abba$    S → aABC: replace S with its RHS, pushed in reverse order
$CBAa    abba$    a is on the stack top and a is the current input symbol, so pop a
$CBA     bba$     A → bb: replace A with its RHS, pushed in reverse order
$CBbb    bba$     pop b
$CBb     ba$      pop b
$CB      a$       B → a
$Ca      a$       pop a
$C       $        C → ε, so pop C
$        $        string is accepted
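The trace above can be driven by a few lines of code. Since the parsing-table figure is not reproduced in these notes, the table entries below are reconstructed from the trace itself, under the assumption that the grammar is S → aABC, A → bb, B → a, C → ε; this is an illustrative sketch, not the notes' own table.

```python
# LL(1) table reconstructed from the trace (assumed grammar:
# S -> aABC, A -> bb, B -> a, C -> eps).
TABLE = {
    ("S", "a"): ["a", "A", "B", "C"],
    ("A", "b"): ["b", "b"],
    ("B", "a"): ["a"],
    ("C", "$"): [],          # C -> eps
}
NONTERMINALS = {"S", "A", "B", "C"}

def ll1_parse(w):
    stack = ["$", "S"]               # $ on the bottom, start symbol on top
    inp = list(w) + ["$"]
    i = 0
    while stack:
        top = stack.pop()
        if top in NONTERMINALS:
            rhs = TABLE.get((top, inp[i]))
            if rhs is None:          # empty table cell: error
                return False
            stack.extend(reversed(rhs))   # push the RHS right-to-left
        elif top == inp[i]:
            i += 1                   # match a terminal (or the final $)
        else:
            return False
    return i == len(inp)

ll1_parse("abba")   # follows exactly the stack trace shown above
```

Stepping through ll1_parse("abba") reproduces the stack contents of the table row by row.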


BOTTOM-UP PARSING:
1) Shift-Reduce Parser (SRP)
A shift-reduce parser uses a parse stack which (conceptually) contains grammar
symbols.

During the operation of the parser, symbols from the input are shifted onto
the stack. If a prefix of the symbols on top of the stack (a handle) matches the RHS
of a grammar rule, then the parser reduces the RHS of the rule to its LHS, replacing
the RHS symbols on top of the stack with the non-terminal occurring on the LHS of
the rule. This shift-reduce process continues until the parser terminates,
reporting either success or failure. It terminates with success when the input is legal
and is accepted by the parser. It terminates with failure if an error is detected in the
input.
The basic operations performed in shift-reduce parsing are:
Shift: shift the current input symbol onto the stack, until a reduction can
be applied.
Reduce: if a specific substring (a handle) matching the body (RHS) of a
production appears on top of the stack, replace it with the
non-terminal at the head (LHS) of the production.
Accept: the input string has been parsed completely (i.e., the start symbol is on
top of the stack and the current input symbol is $).
Error: if no handle is found on top of the stack while input is left in the input
buffer, it is an error; the string is not parsed.

The key decisions during bottom-up parsing are when to reduce and which
production to apply.
A reduction is the reverse of a step in a derivation.
The goal of a bottom-up parser is to construct the reverse of a rightmost derivation.

Handle: It is string or a substring that matches with any of the RHS part of the
production rule in the given grammar.
Handle pruning: Replacing Handle with LHS of the production rule.
A Handle is a substring that matches the body of a production and whose reduction
represents one step along the reverse of a rightmost derivation.

Right-sentential form    Handle    Reducing production
id*id                    id        F → id
F*id                     F         T → F
T*id                     id        F → id
T*F                      T*F       T → T*F
T                        T         E → T
E

Example: E ⇒ T ⇒ T*F ⇒ T*id ⇒ F*id ⇒ id*id

Stack    Input    Action
$        id*id$   shift
$id      *id$     reduce by F → id
$F       *id$     reduce by T → F
$T       *id$     shift
$T*      id$      shift
$T*id    $        reduce by F → id
$T*F     $        reduce by T → T*F
$T       $        reduce by E → T
$E       $        accept
LR PARSERS
In LR parser, "L" is for left-to-right scanning of the input and the "R" is for
constructing a rightmost derivation in reverse.

LR parsers can be constructed to recognize virtually all programming-language
constructs for which context-free grammars can be written. The class of grammars
that can be parsed using LR methods is a proper superset of the class of grammars
that can be parsed with predictive parsers. An LR parser can detect a syntactic error
as soon as it is possible to do so on a left-to-right scan of the input.

The disadvantage is that it takes too much work to construct an LR parser


by hand for a typical programming-language grammar. But there are lots of LR
parser generators available to make this task easy.
The schematic form of an LR parser is as follows:


The parser uses a stack to store states, input symbols and grammar symbols. The
combination of the state symbol on top of the stack and the current input symbol
are used to index the parsing table and determine the shift/reduce parsing
decision.
The parsing table consists of two parts: action part and goto part.

The action table is a table with rows indexed by states and columns indexed by
terminal symbols. When the parser is in some state s and the current lookahead
terminal is t, the action taken by the parser depends on the contents of action[s][t],
which can contain four different kinds of entries:
• Shift s': Shift state s' onto the parse stack.
• Reduce r: Reduce by rule r. This is explained in more detail below.
• Accept: Terminate the parse with success, accepting the input.
• Error: Signal a parse error

The goto table is a table with rows indexed by states and columns indexed by
nonterminal symbols. When the parser is in state ‘s’ immediately after reducing by
rule N, then the next state to enter is given by goto[s][N].

Types of LR parsers are:


1) SLR (Simple LR) parsing
2) CLR (Canonical LR) parsing
3) LALR (Look-Ahead LR) parsing

Augmented Grammar:
If G is a grammar with start symbol S, then G′, the augmented grammar for G, is G
with a new start symbol S′ and the added production S′ → S.

The purpose of this new starting production is to indicate to the parser when it
should stop parsing and announce acceptance of the input; i.e., acceptance occurs
when and only when the parser is about to reduce by S′ → S.

LR(k) parsers are the most general non-backtracking shift-reduce parsers.

Two cases of interest are k = 0 and k = 1; LR(1) is of practical relevance.
'k' stands for the number of lookahead symbols used in making parsing decisions.
When (k) is omitted, k is assumed to be 1.
LR(0) item: no lookahead.
LR(1) item: one lookahead symbol.
LR(0) items play a key role in the SLR parser.
LR(1) items play a key role in the CLR and LALR parsers.


States of an LR parser
States represent sets of items. An LR(0) item of G is a production of G with a dot at
some position of the body:
For A->XYZ we have following LR(0) items
A->.XYZ
A->X.YZ
A->XY.Z
A->XYZ.

Canonical LR(0) Item sets:

Constructing canonical LR(0) item sets:
Augmented grammar:
• G with the addition of a production S′ → S.
Closure of item sets (new items are added only when the dot stands before a non-terminal):
If I is a set of items, closure(I) is the set of items constructed from I by the following
rules:
• Add every item in I to closure(I).
• If A → α.Bβ is in closure(I) and B → γ is a production, then add the item B → .γ to
closure(I).
Goto:
goto(Ii, X), where Ii is an item set and X is a grammar symbol: for each item [A → α.Xβ]
in Ii, put the item [A → αX.β] into a new item set Ij (the dot is moved one symbol
ahead), then take the closure of Ij.

Example: Construction of LR(0) item sets for the grammar

E′ → E (augmented production rule)
E → E + T | T
T → T * F | F
F → (E) | id

I0 = closure({[E′ → .E]}):
E′ → .E
E → .E+T
E → .T
T → .T*F
T → .F
F → .(E)
F → .id

I1 = goto(I0, E):
E′ → E.
E → E.+T

I2 = goto(I0, T):
E → T.
T → T.*F

I3 = goto(I0, F):
T → F.

I4 = goto(I0, ( ):
F → (.E)
E → .E+T
E → .T
T → .T*F
T → .F
F → .(E)
F → .id

I5 = goto(I0, id):
F → id.

Note: this is just the beginning of the example; the complete collection of item sets
is built by continuing the goto computation from the sets above.
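The closure and goto computations above can be sketched directly in code. Representing an item as a (production index, dot position) pair is my own convention for illustration.

```python
# Augmented expression grammar, indexed so an LR(0) item is (prod, dot).
GRAMMAR = [
    ("E'", ["E"]),                                     # 0: E' -> E
    ("E", ["E", "+", "T"]), ("E", ["T"]),              # 1, 2
    ("T", ["T", "*", "F"]), ("T", ["F"]),              # 3, 4
    ("F", ["(", "E", ")"]), ("F", ["id"]),             # 5, 6
]
NONTERMS = {"E'", "E", "T", "F"}

def closure(items):
    """Add B -> .gamma whenever the dot stands before non-terminal B."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (p, d) in list(items):
            head, body = GRAMMAR[p]
            if d < len(body) and body[d] in NONTERMS:
                for q, (h, _) in enumerate(GRAMMAR):
                    if h == body[d] and (q, 0) not in items:
                        items.add((q, 0))
                        changed = True
    return items

def goto(items, X):
    """Move the dot over X in every item where it stands before X."""
    moved = {(p, d + 1) for (p, d) in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == X}
    return closure(moved)

I0 = closure({(0, 0)})    # the 7 items of I0 listed above
I1 = goto(I0, "E")        # {E' -> E. , E -> E.+T}
```

Computing I0 and I1 this way reproduces the item sets listed in the example.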

1. Simple LR parser(SLR):
Example:

Parsing Table Construction

Steps to be followed to construct the SLR parsing table from the LR(0) item sets:
1. If goto(Ii, a) = Ij for a terminal a, set action[i, a] = shift j.
2. If Ii contains a completed item A → α. (with A ≠ S′), set action[i, a] = reduce
   A → α for every a in FOLLOW(A).
3. If Ii contains the item S′ → S., set action[i, $] = accept.
4. If goto(Ii, A) = Ij for a non-terminal A, set goto[i, A] = j.
All remaining entries are errors.

The grammar productions are numbered for the reduce entries:
1. E → E + T   2. E → T   3. T → T * F   4. T → F   5. F → (E)   6. F → id

                     Action                    Goto
State    id    +     *     (     )     $      E    T    F
0        s5                s4                 1    2    3
1              s6                      acc
2              r2    s7          r2    r2
3              r4    r4          r4    r4
4        s5                s4                 8    2    3
5              r6    r6          r6    r6
6        s5                s4                      9    3
7        s5                s4                           10
8              s6                s11
9              r1    s7          r1    r1
10             r3    r3          r3    r3
11             r5    r5          r5    r5

Parsing an input string


Rules to be followed in parsing are:

STACK INPUT ACTION


1. 0 id*id+id$ shift by S5
2. 0id5 *id+id$ see 5 on *
reduce by F→id
If A A→
Pop 2*| | symbols.
=2*1=2 symbols.
Pop 2 symbols off the stack
State 0 is then exposed on F.
Since goto of state 0 on F is
3, F and 3 are pushed onto
the stack
3. 0F3 *id+id$ reduce by T →F
pop 2 symbols push T. Since
goto of state 0 on T is 2, T
and 2, T and 2 are pushed
onto the stack.
4. 0T2 *id+id$ shift by S7
5. 0T2*7 id+id$ shift by S5
6. 0T2*7id5 +id$ reduce by r6 i.e.
F →id
Pop 2 symbols,
Append F,
See 7 on F, it is 10
7. 0T2*7F10 +id$ reduce by r3, i.e.,
T →T*F
Pop 6 symbols, push T
Sec 0 on T, it is 2
Push 2 on stack.

Dept. of CSE, MEC 2021-2022


Unit-2 Subject: Compiler Design

8. 0T2 +id$ reduce by r2, i.e.,


E T
Pop two symbols,
Push E
See 0 on E. It 10 1
Push 1 on stack
9. 0E1 +id$ shift by S6.

10. 0E1+6 id$ shift by S5

11. 0E1+6id5 $ reduce by r6 i.e.,


F id
Pop 2 symbols, push F,
see 6 on F
It is 3, push 3
12. 0E1+6F3 $ reduce by r4, i.e.,
T F
Pop 2 symbols,
Push T, see 6 on T
It is 9, push 9.
13. 0E1+6T9 $ reduce by r1, i.e.,
E E+T
Pop 6 symbols, push E
See 0 on E, it is 1
Push 1.
14. 0E1 $ Accept
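The table-driven loop behind this trace fits in a short sketch. The action/goto table above is encoded by hand below, and names such as slr_parse are my own illustrative conventions.

```python
# Productions for the reduce entries: head non-terminal and RHS length.
PRODS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
         4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}
ACTION = {
    (0, "id"): "s5", (0, "("): "s4",
    (1, "+"): "s6", (1, "$"): "acc",
    (2, "+"): "r2", (2, "*"): "s7", (2, ")"): "r2", (2, "$"): "r2",
    (3, "+"): "r4", (3, "*"): "r4", (3, ")"): "r4", (3, "$"): "r4",
    (4, "id"): "s5", (4, "("): "s4",
    (5, "+"): "r6", (5, "*"): "r6", (5, ")"): "r6", (5, "$"): "r6",
    (6, "id"): "s5", (6, "("): "s4",
    (7, "id"): "s5", (7, "("): "s4",
    (8, "+"): "s6", (8, ")"): "s11",
    (9, "+"): "r1", (9, "*"): "s7", (9, ")"): "r1", (9, "$"): "r1",
    (10, "+"): "r3", (10, "*"): "r3", (10, ")"): "r3", (10, "$"): "r3",
    (11, "+"): "r5", (11, "*"): "r5", (11, ")"): "r5", (11, "$"): "r5",
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
        (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
        (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}

def slr_parse(tokens):
    stack = [0]                      # stack of states only
    tokens = tokens + ["$"]
    i = 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            return False             # blank entry: error
        if act == "acc":
            return True
        if act[0] == "s":            # shift: push the new state
            stack.append(int(act[1:]))
            i += 1
        else:                        # reduce: pop |RHS| states, then goto
            head, length = PRODS[int(act[1:])]
            del stack[-length:]
            stack.append(GOTO[(stack[-1], head)])

slr_parse(["id", "*", "id", "+", "id"])   # follows the 14-step trace above
```

For brevity the sketch keeps only states on the stack; the grammar symbols shown in the trace are implied by the states and are not needed for the parsing decisions.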

2. CLR Parser
Example:

S → CC
C → cC | d

1. Number the grammar productions:
1. S → CC
2. C → cC
3. C → d

2. The augmented grammar is:

S′ → S
S → CC
C → cC
C → d

Constructing the sets of LR(1) items:


We begin with:
S′ → .S, $   (the initial lookahead (LAH) is $).

We match the item [S′ → .S, $] against [A → α.Bβ, a] in the procedure closure,
i.e.,
A = S′, α = ε, B = S, β = ε, a = $.
The closure function tells us to add [B → .γ, b] for each production B → γ and each
terminal b in FIRST(βa). Here B → γ must be S → CC, and since β is ε and a is $, b may
only be $. Thus we add:

S → .CC, $

We continue to compute the closure by adding all items [C → .γ, b] for b in FIRST(C$),
i.e., matching [S → .CC, $] against [A → α.Bβ, a] we have A = S, α = ε, B = C, β = C
and a = $.

FIRST (C$) = FIRST (C)


FIRST(C) = {c,d}

We add the items:
C → .cC, c
C → .cC, d
C → .d, c
C → .d, d
None of the new items have a non-terminal immediately to the right of the dot,
so we have completed our first set of LR(1) items. The initial I0 items are:
I0:
SI->.S, $
S->.CC, $
C->.cC, c/d
C->.d, c/d
Now we start computing goto(I0, X) for the various grammar symbols X:
Goto (I0,S):
I1 : S′ → S., $  → reduced item.

Goto (I0,C): I2 :


S->C.C, $
C->.cC,$
C->.d,$

Goto (I0,c) : I3 :
C->c.C, c/d
C->.cC, c/d
C->.d, c/d

Goto (I0,d) : I4 :
C->d., c/d -> reduced item.

Goto (I2,C) : I5 :
S->CC., $ -> reduced item.

Goto (I2,c) : I6
C->c.C, $
C->.cC, $
C->.d, $

Goto (I2,d) : I7
C->d., $ -> reduced item.

Goto (I3,C) : I8
C->cC., c/d -> reduced item.

Goto (I3,c) : I3
C->c.C, c/d
C->.cC, c/d
C->.d, c/d

Goto (I3,d) : I4
C->d., c/d. -> reduced item.

Goto (I6,C) : I9
C->cC., $ -> reduced item.

Goto (I6,c) : I6
C->c.C, $
C->.cC, $


C->.d, $

Goto (I6,d) : I7
C->d., $ -> reduced item.

All the item sets are now complete, so we construct the canonical LR(1) parsing
table. Here there is no need to compute FOLLOW() sets, as we have already carried
a look-ahead with each item while constructing the states.

Constructing the LR(1) Parsing table:

                 Action                Goto
State       c       d       $        S     C
0           S3      S4               1     2
1                           Accept
2           S6      S7                     5
3           S3      S4                     8
4           r3      r3
5                           r1
6           S6      S7                     9
7                           r3
8           r2      r2
9                           r2
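The table above can be exercised with a small table-driven LR driver. In the Python sketch below the stack holds states only, and the ACTION and GOTO dictionaries simply transcribe the table just constructed:

```python
# Grammar: 1) S -> CC   2) C -> cC   3) C -> d
ACTION = {
    (0, 'c'): ('s', 3), (0, 'd'): ('s', 4),
    (1, '$'): ('acc', 0),
    (2, 'c'): ('s', 6), (2, 'd'): ('s', 7),
    (3, 'c'): ('s', 3), (3, 'd'): ('s', 4),
    (4, 'c'): ('r', 3), (4, 'd'): ('r', 3),
    (5, '$'): ('r', 1),
    (6, 'c'): ('s', 6), (6, 'd'): ('s', 7),
    (7, '$'): ('r', 3),
    (8, 'c'): ('r', 2), (8, 'd'): ('r', 2),
    (9, '$'): ('r', 2),
}
GOTO = {(0, 'S'): 1, (0, 'C'): 2, (2, 'C'): 5, (3, 'C'): 8, (6, 'C'): 9}
PRODS = {1: ('S', 2), 2: ('C', 2), 3: ('C', 1)}   # number -> (head, |body|)

def lr_parse(tokens):
    stack = [0]                          # stack of states only
    tokens = list(tokens) + ['$']
    i = 0
    while True:
        entry = ACTION.get((stack[-1], tokens[i]))
        if entry is None:
            return False                 # blank table entry: syntax error
        kind, arg = entry
        if kind == 's':                  # shift: push the new state
            stack.append(arg)
            i += 1
        elif kind == 'r':                # reduce: pop |body| states, then GOTO
            head, length = PRODS[arg]
            del stack[-length:]
            stack.append(GOTO[(stack[-1], head)])
        else:
            return True                  # accept

print(lr_parse('cdd'))    # True: cdd is a sentence of the grammar
print(lr_parse('cd'))     # False: rejected
```

The same driver works unchanged for SLR and LALR parsers; only the contents of ACTION and GOTO differ.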

3. LALR Parser
Example:

1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(1) items.

2. For each core present among the sets of LR(1) items, find all sets having that core
and replace these sets by their union (i.e., club them into a single item set).

I0 ->same as previous
I1 -> same as previous
I2 -> same as previous
I36 – Clubbing item I3 and I6 into one I36 item.


C -> c.C, c/d/$
C -> .cC, c/d/$
C -> .d, c/d/$
I5 -> same as previous
I47 - Clubbing items I4 and I7 into one I47 item
C -> d., c/d/$
I89 - Clubbing items I8 and I9 into one I89 item
C -> cC., c/d/$
LALR Parsing table construction:

                 Action                Goto
State       c       d       $        S     C
0           S36     S47              1     2
1                           Accept
2           S36     S47                    5
36          S36     S47                    89
47          r3      r3      r3
5                           r1
89          r2      r2      r2
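The clubbing step above can be sketched in Python: represent each item as a (production, dot position, lookahead) triple, take the core by dropping the lookahead, and union the item sets that share a core:

```python
# LALR construction step: merge LR(1) states that share the same core.
# An item is (production, dot_position, lookahead); the core ignores lookaheads.
def core(items):
    return frozenset((prod, dot) for (prod, dot, la) in items)

def merge_by_core(states):
    merged = {}                          # core -> union of the item sets
    for state in states:
        merged.setdefault(core(state), set()).update(state)
    return list(merged.values())

# I3 and I6 from the example share a core; only the lookaheads differ.
I3 = {('C->cC', 1, 'c'), ('C->cC', 1, 'd'),
      ('C->cC', 0, 'c'), ('C->cC', 0, 'd'),
      ('C->d', 0, 'c'), ('C->d', 0, 'd')}
I6 = {('C->cC', 1, '$'), ('C->cC', 0, '$'), ('C->d', 0, '$')}

I36 = merge_by_core([I3, I6])[0]         # the merged state I36
print(sorted(la for (p, d, la) in I36 if p == 'C->d'))  # ['$', 'c', 'd']
```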

Using Ambiguous Grammars:


If the grammar is ambiguous, it creates conflicts in the parsing table, and we cannot
parse the input string deterministically. An ambiguous grammar gives rise to the
following two kinds of conflicts while parsing:
• Shift-Reduce Conflict
• Reduce-Reduce Conflict
For arithmetic expressions, however, ambiguous grammars are more compact and provide
a more natural specification.
Ambiguous grammars also make it easy to add new productions for special constructs.
So, ambiguous grammars must be handled carefully, using disambiguating rules (i.e.,
precedence and associativity for arithmetic expressions), to obtain only one parse tree for
the specific input.
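As a sketch of how a parser generator accepts an ambiguous expression grammar, the YACC declarations below (an illustrative fragment; the token name ID is assumed, not from the original notes) assign precedence and associativity so that every shift-reduce conflict is resolved in exactly one way:

```yacc
%left '+' '-'    /* lower precedence, left-associative  */
%left '*' '/'    /* higher precedence, left-associative */
%%
expr : expr '+' expr
     | expr '-' expr
     | expr '*' expr
     | expr '/' expr
     | '(' expr ')'
     | ID
     ;
```

Tokens declared later by %left (or %right, %nonassoc) have higher precedence, so `a + b * c` is parsed as `a + (b * c)` even though the grammar itself is ambiguous.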

Parser Generator YACC:


To automate the process of parsing an input string by parser, certain automation tools for
parser generation are available.


YACC (Yet Another Compiler Compiler) is one such automatic tool for parser generation. It
is a UNIX-based utility for generating LALR parsers. LEX and YACC work together to
analyze the program syntactically, and can report conflicts or ambiguities in the form of
error messages.

YACC Specification

%{ declarations %}        /* required header files are declared */
%token ...                /* required tokens are declared */
%%
translation rules         /* each rule consists of a grammar production
                             and the associated semantic action */
%%
supporting C routines     /* main() calls yyparse(), which in turn calls yylex() */

A translator is constructed using a YACC specification. After the translator is prepared, it is
saved with a .y extension.
Compilation and execution steps:
• The UNIX system command to compile this specification is: yacc filename.y
• It transforms the specification into y.tab.c (and y.tab.h when the -d option is given)
using the LALR algorithm.
• Compile y.tab.c with the ly library using the cc compiler: cc y.tab.c -ly
• Compilation produces an a.out file; execute it on the input program to get the desired
output.
• To execute, use the ./a.out command.

A set of productions of the below form


<head> -> <body>1 | <body>2 | . . . | <body>n
would be written in YACC as
<head> : <body>1 { <semantic action>1 }
| <body>2 { <semantic action>2 }


...
| <body>n { <semantic action>n }
;
In a YACC production, unquoted strings of letters and digits not declared as tokens are
considered as non-terminals.
A quoted single character, e.g., 'c', is considered as terminal symbol c, as well as the integer
code for the token represented by that character (i.e., Lex would return the character code
for 'c' to the parser, as an integer).

Second part of YACC specification contains semantic action which are a sequence of C
statements.
In this, the symbol $$ refers to the attribute value associated with the nonterminal of the
head, while $i refers to the value associated with the i th grammar symbol (terminal or
nonterminal) of the body.

In a YACC specification, the E-productions

E -> E + T | T

and their associated semantic actions are written as:
expr : expr '+' term { $$ = $1 + $3; } // Note: + is $2
| term { $$ = $1; } // Note: we can omit this
;

The third part of a YACC specification consists of supporting C-routines. A lexical analyzer
by the name yylex() must be provided.

Example:
YACC source program for a simple desk calculator that reads an arithmetic expression,
evaluates it, and then prints its numeric value.
Grammar for arithmetic expressions
E -> E + T | T
T -> T * F | F
F -> ( E ) | digit
The token digit is a single digit between 0 and 9.
%{
#include<stdio.h>
#include<ctype.h>
%}
%token DIGIT
%%
line : expr '\n' { printf("%d\n", $1); }
;
expr : expr '+' term { $$ = $1 + $3; }
| term


;
term : term '*' factor { $$ = $1 * $3; }
| factor
;
factor : '(' expr ')' { $$ = $2; }
| DIGIT
;
%%
int main(void)
{
yyparse();
return 0;
}
void yyerror(char *s)
{
fprintf(stderr, "%s\n", s);
}
int yylex() {
int c;
c = getchar();
if (isdigit(c)) {
yylval = c-'0';
return DIGIT;
}
return c;
}
