
Chapter 3: Syntax Analysis

3.1 Introduction
The tokens generated by the lexical analyzer are accepted by the next phase of the compiler, i.e. the syntax analyzer. The syntax analyzer, or parser, comes after lexical analysis and is one of the important components of the front end of the compiler. In this chapter:
▪ we first understand the concept of parsing and then concentrate on a parsing technique called top-down parsing.
▪ We will discuss various methods such as:
o Backtracking
o LL (1) parsing
o Recursive descent parsing and
o Predictive parsing techniques
▪ We will also discuss the processing steps required for predictive parsing.
3.2 Concept of Syntax Analysis
3.2 Concept of Syntax Analysis
Syntax analysis is the second phase in compilation. The syntax analyzer (parser) basically checks the syntax of the language: it takes the tokens from the lexical analyzer and groups them in such a way that some programming structure (syntax) can be recognized. After grouping the tokens, if some construct cannot be recognized then a syntactic error is reported. This overall process is called syntax checking of the language.
Definition of Parser: Parsing, or syntax analysis, is a process which takes the input string w and produces either a parse tree (syntactic structure) or syntactic errors.
For example: a = b + 10;
The above programming statement is given to the lexical analyzer, which divides it into a group of tokens. The syntax analyzer takes these tokens as input and generates a tree-like structure called a parse tree, shown in Fig. 3.1 below. It shows how the statement gets parsed according to its syntactic specification.

Fig. 3.1 Parse Tree for a = b + 10

3.2.1 Basic Issues in Parsing
There are two important issues in parsing:
i. Specification of syntax
ii. Representation of the input after parsing
• A very important issue in parsing is the specification of syntax in a programming language. Specification of syntax means how to write any programming language statement. There are some characteristics of the specification of syntax:
i. The specification should be precise and unambiguous.
ii. The specification should be detailed; it should cover all the details of the programming language.
iii. The specification should be complete.
Such a specification is called a "Context Free Grammar" (CFG).
• Another important issue in parsing is the representation of the input after parsing. This is important because all the subsequent phases of the compiler take their information from the parse tree being generated: the information conveyed by any input programming statement should not be altered after building the syntax tree for it.
• Lastly, the most crucial issue is the parsing algorithm based on which we get the parse tree for the given input. There are two different approaches to parsing:
o Top-down parsing
o Bottom-up parsing
• The parsing algorithms are based on these approaches. These algorithms deal with the following issues:
o How do these algorithms work?
o Are they efficient in nature?
o What are their merits and limitations?
o What kind of input do they require?
Keep one thing in mind: we are now interested simply in the syntax of the language; we are not interested in meaning right now. Hence we will do only syntax analysis in this phase. Checking the meaning (semantics) of syntactically correct input will be studied in the next phase of compilation. The modified view of the front end is as shown in Fig. 3.2.

3.2.2 Role of Parser
In the process of compilation, the parser and the lexical analyzer work together: when the parser requires tokens it invokes the lexical analyzer, and in turn the lexical analyzer supplies tokens to the syntax analyzer (parser).

The parser collects a sufficient number of tokens and builds a parse tree. Thus, by building the parse tree, the parser detects the syntactical errors, if any. It is also necessary that the parser recover from commonly occurring errors so that the remaining task of processing the input can be continued.
Why are the lexical and syntax analyzers separated?
The lexical analyzer scans the input program and collects the tokens from it; the parser builds a parse tree using these tokens. These are two important activities, and they are carried out independently by these two phases of the compiler. Separating these two phases has two advantages:
1. It accelerates the process of compilation.

2. Errors in the source input can be identified precisely.
Now we will focus on the first issue of the parser, i.e. specification of the input. As we know, specification of the input can be done by using a context free grammar.
3.3 Context Free Grammar
A context free grammar G is a collection of the following things:
1. V, a set of non-terminals
2. T, a set of terminals
3. S, a start symbol
4. P, a set of production rules
Thus G can be represented as G = (V, T, S, P).
The production rules are given in the following form:
Non-terminal → (V ∪ T)*
Example 1: Let the language be L = {a^n b^n | n ≥ 1}.
Let G = (V, T, S, P), where V = {S}, T = {a, b} and S is the start symbol; then the production rules are given as:
P = {
S → aSb
S → ab
}

The production rules actually define the language a^n b^n.


The non-terminal symbols occur on the left-hand side (LHS). These are the symbols which need to be expanded. The terminal symbols are nothing but the tokens used in the language. Thus, any language construct can be defined by a context free grammar. For example, if we want to define a declaration statement using a context free grammar then it could be as follows:
State → Type List Terminator
Type → int | float
List → List, id
List → id
Terminator → ;
Using the above rules, we can derive:
State ⇒ Type List Terminator ⇒ int List Terminator ⇒ int List, id Terminator ⇒ int List, id, id Terminator ⇒ int id, id, id ;
Hence, int id, id, id; can be defined by means of the above context free grammar.
The following rules are to be followed while writing a CFG:
1. A single non-terminal should be at the LHS.
2. The rule should always be in the form LHS → RHS, where the RHS may be a combination of non-terminal and terminal symbols.
3. The NULL derivation can be specified as NT → ε.
4. One of the non-terminals should be the start symbol, and conventionally we should write the rules for this non-terminal first.
3.3.1 Derivation and Parse Tree
Derivation from S means generation of a string w from S. For constructing a derivation two things are important:
i. Choice of non-terminal from several others.
ii. Choice of rule from the production rules for the corresponding non-terminal.
Definition of Derivation Tree
Let G = (V, T, P, S) be a context free grammar.
The derivation tree is a tree which can be constructed with the following properties:
i. The root has label S.
ii. Every vertex has a label from (V ∪ T ∪ {ε}).
iii. If there exists a vertex A with children R1, R2, …, Rn, then there should be a production A → R1 R2 … Rn.
iv. The leaf nodes are from set T and the interior nodes are from set V.
Instead of choosing an arbitrary non-terminal, one can choose:
i. either the leftmost non-terminal in a sentential form, in which case it is called a leftmost derivation,
ii. or the rightmost non-terminal in a sentential form, in which case it is called a rightmost derivation.
Example 1: Let G be a context free grammar whose production rules are given below:

Derive the string aaabbabbba using the above grammar.


Solution: leftmost derivation rightmost derivation

The parse tree for the above derivation is as given below:

Leftmost derivation Rightmost derivation

The structure shown above is called parse tree.
Example 2: Design a derivation tree for the following grammar:

Also obtain the leftmost and rightmost derivations for the string 'aaabaab' using the above grammar.
Solution: Leftmost derivation Rightmost derivation

The derivation tree can be drawn as follows:

(a) Derivation Tree for Leftmost Derivation (b)Derivation Tree for Rightmost Derivation
Example: Consider the grammar given below:

Obtain leftmost and rightmost derivation for the string a + b * a + b

Solution:

(a) Derivation Tree for Leftmost Derivation (b) Derivation Tree for Rightmost Derivation
3.3.2 Ambiguous Grammar
A grammar G is said to be ambiguous if it generates more than one parse tree for some sentence of the language L(G).
Example 1: Consider the grammar E → E + E | E * E | id.
Then for id + id * id two different parse trees can be constructed:

Fig. 3.6 Ambiguous Grammar


Example 2: Find whether the following grammar is ambiguous or not.

Solution: We will construct parse trees for the string aab.

There are two different parse trees for deriving the string aab. Hence the given grammar is an ambiguous grammar.
Example 3: Show whether the following grammar is ambiguous or not.

Fig. 3.7 Ambiguous Grammar for Example 3.


3.4 Parsing Techniques
There are two parsing techniques:
1. Top-down parsing
2. Bottom-up parsing
These parsing techniques work on the following principles:
i. The parser scans the input string from left to right and identifies whether the derivation is leftmost or rightmost.
ii. The parser makes use of the production rules for choosing the appropriate derivation.
The different parsing techniques use different approaches in selecting the appropriate rules for a derivation, and finally a parse tree is constructed. When the parse tree is constructed from the root and expanded to the leaves, such a parser is called a top-down parser; the name itself tells us that the parse tree is built from top to bottom. When the parse tree is constructed from the leaves to the root, such a parser is called a bottom-up parser; the parse tree is built in a bottom-up manner. The next figure shows the types of parsers.

Types of Parsers
▪ Top-Down Parser
  o Backtracking
  o Predictive Parser
    - Recursive Descent Parser
    - LL (1) Parser
▪ Bottom-Up Parser
  o Shift Reduce Parser
  o LR Parser
    - SLR Parser
    - LALR Parser
    - Canonical LR Parser

Fig. 3.8 Parsing Techniques


3.4.1 Top-Down Parser
In top-down parsing the parse tree is generated from top to bottom (from root to leaves). The derivation terminates when the required input string is obtained. The process of constructing a parse tree starting from the root and proceeding to the children is called top-down parsing (TDP); that means starting from the start symbol of the given grammar and reaching the input string.
Example: Construct a parse tree for the string w using the given grammar.

TDP internally uses leftmost derivation, and a top-down parser is constructed for a grammar only if it is free from ambiguity and left recursion; the leftmost derivation matches this requirement. The main task in top-down parsing is to find the appropriate production rule in order to produce the correct input string.
Example: Consider the grammar
S → xPz
P → yw | y
and the input string xyz. Now we will construct the parse tree for the above grammar deriving the given input string, and for this derivation we will make use of the top-down approach.
Step 1:

The first leftmost leaf of the parse tree matches the first input symbol, so we advance the input pointer. The next leaf node is P. We have to expand the node P; after expansion we get the node y, which matches the input symbol y.
Step 2:

Now the next node is w, which does not match the input symbol. Hence we go back to see whether there is another alternative for P. The other alternative for P is y, which matches the current input symbol. And thus we can produce a successful parse tree for the given input.
Step 3:

We halt and declare that the parsing is completed successfully. In top-down parsing, selection of the proper rule is a very important task, and this selection is based on a trial-and-error technique: we select a particular rule and, if it does not produce the correct input string, we backtrack and try another production. This process is repeated until we get the correct input string. If, after trying all the productions, every production is found unsuitable for a match, then the parse tree cannot be built.
Problems with Top-Down Parsing
There are certain problems in top-down parsing. In order to implement the parsing, we need to
eliminate these problems. Let us discuss these problems and how to remove them.
1) Backtracking

Backtracking is a technique in which, for expansion of a non-terminal symbol, we choose one alternative and, if some mismatch occurs, we try another alternative, if any.
If for a non-terminal there are multiple production rules beginning with the same input symbol then, to get the correct derivation, we need to try all these possibilities. Secondly, in backtracking we need to move some levels upward in order to check other possibilities. This creates a lot of overhead in the implementation of parsing, and hence it becomes necessary to eliminate backtracking by modifying the grammar.
2) Left Recursion:
A left recursive grammar is a grammar in which there is a derivation of the form:
A ⇒+ Aα
Here ⇒+ means deriving the input in one or more steps; A is a non-terminal and α denotes some string of symbols. If left recursion is present in the grammar then it creates a serious problem: because of the left recursion the top-down parser can enter an infinite loop. This is shown in the figure below.

The expansion of A causes further expansion of A, and due to the generation of A, Aα, Aαα, Aααα, …, the input pointer is never advanced. This causes a major problem in top-down parsing, and therefore elimination of left recursion is a must. To eliminate left recursion we need to modify the grammar. Let G be a context free grammar having a production rule with left recursion:
A → Aα | β    …(1)
Then eliminate the left recursion by re-writing the production rule as:
A → βA'
A' → αA' | ε    …(2)
Thus, a new symbol A' is introduced. We can also verify whether the modified grammar is equivalent to the original or not.
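To make the transformation concrete, here is a small Python sketch (our own illustration; the function name and the encoding of alternatives as lists of symbols are assumptions, not part of the text):

def eliminate_left_recursion(a, alternatives):
    """Rewrite A -> A alpha | beta as A -> beta A' and A' -> alpha A' | epsilon."""
    a_prime = a + "'"
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == a]
    others = [alt for alt in alternatives if not alt or alt[0] != a]
    if not recursive:                   # no immediate left recursion
        return {a: alternatives}
    return {
        a: [beta + [a_prime] for beta in others],            # A  -> beta A'
        a_prime: [alpha + [a_prime] for alpha in recursive]  # A' -> alpha A'
                 + [[]],                                     # A' -> epsilon
    }

# E -> E + T | T  becomes  E -> T E' and E' -> + T E' | epsilon
print(eliminate_left_recursion("E", [["E", "+", "T"], ["T"]]))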

For example: Consider the grammar
E → E + T | T
Then, using equation (2) above, with A = E, α = +T and β = T, the rule becomes
E → TE'
E' → +TE' | ε
Similarly, for the rule
T → T * F | F
we can eliminate the left recursion:
T → FT'
T' → *FT' | ε
The grammar for arithmetic expressions can be equivalently written as:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
3) Left Factoring
If the grammar is left factored then it becomes suitable for top-down parsing. Basically, left factoring is used when it is not clear which of two alternatives should be used to expand a non-terminal. By left factoring we re-write the productions so that the decision can be deferred until enough of the input has been seen to make the right choice.
For example, consider productions of the form:
A → αβ1 | αβ2
The left factored grammar becomes:
A → αA'
A' → β1 | β2
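A corresponding Python sketch, under the same hypothetical grammar encoding, factors out the longest common prefix of the alternatives:

def left_factor(a, alternatives):
    """Rewrite A -> alpha beta1 | alpha beta2 as A -> alpha A', A' -> beta1 | beta2."""
    prefix = []
    for symbols in zip(*alternatives):        # compare alternatives column-wise
        if len(set(symbols)) == 1:
            prefix.append(symbols[0])
        else:
            break
    if not prefix:                            # nothing to factor
        return {a: alternatives}
    a_prime = a + "'"
    tails = [alt[len(prefix):] for alt in alternatives]   # [] means epsilon
    return {a: [prefix + [a_prime]], a_prime: tails}

# A -> a b | a c  becomes  A -> a A' and A' -> b | c
print(left_factor("A", [["a", "b"], ["a", "c"]]))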

4) Ambiguity
An ambiguous grammar is not desirable in top-down parsing; hence we need to remove the ambiguity from the grammar if it is present.
Example: E → E + E | E * E | id is an ambiguous grammar.
We will design the parse trees for id + id * id as follows:

Fig.3.10 Ambiguous Grammar

For removing the ambiguity we apply one rule: if the grammar has a left associative operator (such as +, -, *, /) then introduce left recursion, and if the grammar has a right associative operator (the exponentiation operator) then introduce right recursion. The unambiguous grammar is:
E → E + T | T
T → T * F | F
F → id
Note one thing: this grammar is unambiguous, but it is left recursive, and elimination of such left recursion is again a must.

Fig.3.11 Types of Top-Down Parsers


There are two ways in which top-down parsing can be performed:
1. Backtracking
2. Predictive Parsing
3.4.1.1 Backtracking
Will try different production rules to find the match for the input string by backtracking each
time. The backtracking is a powerful than predictive parsing. But this technique a backtracking
parser is slower and it requires exponential time in general. Hence, backtracking is not preferred
for practical compilers.
3.4.1.2 Predictive Parser
As the name indicates predictive parser tries to predict the next construction using one or more
lookahead symbols from input string. There are two types of predictive parser:
I. Recursive Descent Parser

II. LL (1) Parser
1. Recursive Descent Parser
A parser that uses collection of recursive procedures for parsing the given input string is called
Recursive Descent (RD) Parser). This type of parser the CFG is used to build the recursive
routines. The RHS of the production rule is directly converted to a program. For each non-terminal
a separate procedure is written and body of the procedure (code) is RHS of the corresponding non-
terminal.
Basic steps for construction of RD Parser
The RHS of the rule is directly converted into program code symbol by symbol.
1. If the input symbol is non-terminal, then a call to the procedure corresponding to the no
terminal is made.
2. If the input symbol is terminal, then it is matched with the lookahead from input.
3. If the production rule has many alternates, then all these alternates have to be combined
into a single body of procedure.
4. The parser should be activated by a procedure a corresponding to the start symbol.
Example: Consider the grammar with start symbol S:
S → cAd
A → ab | a
To construct a parse tree top-down for the input string w = cad, begin with a tree consisting of a single node labeled S and the input pointer pointing to c, the first symbol of w. S has only one production, so we use it to expand S and obtain the tree in the figure.

The leftmost leaf, labeled c, matches the first symbol of the input w, so we advance the input pointer to a, the second symbol of w, and consider the next leaf, labeled A.

Now we expand A using its leftmost alternative. We have a match for the second input symbol a, so we advance the input pointer to d, the third input symbol, and compare d against the next leaf, labeled b. Since b does not match d, we report failure and go back to A to see whether there is another alternative for A that has not been tried but might produce a match. In going back to A we must reset the input pointer to position 2, the position it had when we first came to A; this means that the procedure for A must store the input pointer in a local variable.

The second alternative for A produces the tree of Figure (c). The leaf a matches the second symbol of w and the leaf d matches the third symbol. Since we have produced a parse tree for w, we halt and announce successful completion of parsing.
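The procedure-per-non-terminal idea can be sketched in Python for this grammar. This is a minimal illustration of the backtracking just described, with the input pointer saved in a local variable exactly as the text requires (the function names are our own):

def parse(w):
    pos = 0

    def match(terminal):
        nonlocal pos
        if pos < len(w) and w[pos] == terminal:
            pos += 1
            return True
        return False

    def A():
        nonlocal pos
        saved = pos                     # store the input pointer locally
        if match('a') and match('b'):   # try the alternative A -> ab
            return True
        pos = saved                     # mismatch: reset and try A -> a
        return match('a')

    def S():                            # S -> cAd
        return match('c') and A() and match('d')

    return S() and pos == len(w)

print(parse("cad"))    # True: uses A -> a after backtracking
print(parse("cabd"))   # True: uses A -> ab
print(parse("cbd"))    # False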
2. LL (1) Parser
The simple block diagram for LL (1) parser is as given below:

Fig.3.12 Model for LL (1) Parser


The data structures used by the LL (1) parser are:
• Input buffer
• Stack
• Parsing table
The LL (1) parser uses the input buffer to store the input tokens. The stack is used to hold the left sentential form. The symbols in the RHS of a rule are pushed onto the stack in reverse order, i.e. from right to left. Thus, the use of the stack makes this algorithm non-recursive. The table is basically a two-dimensional array: it has a row for each non-terminal and a column for each terminal, and can be represented as M[A, a], where A is a non-terminal and a is the current input symbol. The parser works as follows:
The parsing program reads the top of the stack and the current input symbol. With the help of these two symbols the parsing action is determined. The parsing action can be:

Top    Input token    Parsing action
$      $              Parsing successful; halt
a      a              Pop a and advance the lookahead to the next token
a      b              Error
A      a              Refer to table M[A, a]; if the entry at M[A, a] is error, report error
A      a              Refer to table M[A, a]; if the entry at M[A, a] is A → PQR, then pop A, then push R, then push Q, then push P

The parser consults the table M[A, a] each time while taking a parsing action; hence this type of parsing method is called a table-driven parsing algorithm. A configuration of the LL (1) parser is defined by the top of the stack and the lookahead token. Configurations are stepped through one by one, and the input is successfully parsed if the parser reaches the halting configuration: when the stack holds only $ and the next token is $, this corresponds to a successful parse.
Construction of the Predictive LL (1) Parser
The construction of the predictive LL (1) parser is based on two very important functions, FIRST and FOLLOW.
For the construction of the predictive LL (1) parser we have to follow these steps:
1. Computation of FIRST and FOLLOW function
2. Construct the Predictive Parsing Table using FIRST and FOLLOW functions
3. Parse the input string with the help of Predictive Parsing Table
FIRST Function
FIRST(α) is the set of terminal symbols that appear first in the strings derived from α. If α ⇒* ε then ε is also in FIRST(α).
The following rules are used to compute the FIRST function:
1. For a terminal symbol a, FIRST(a) = {a}.
2. If there is a rule X → ε, then ε is in FIRST(X).
3. For the rule A → X1 X2 … Xk: FIRST(A) contains FIRST(X1) − {ε}; if ε is in FIRST(X1), …, FIRST(Xi−1), then FIRST(A) also contains FIRST(Xi) − {ε}; and if ε is in FIRST(Xj) for every j = 1, …, k, then ε is in FIRST(A).
FOLLOW Function
FOLLOW(A) is defined as the set of terminal symbols that can appear immediately to the right of A. In other words,
FOLLOW(A) = { a | S ⇒* αAaβ, where α and β are some strings of grammar symbols }
The rules for computing the FOLLOW function are given below:
1. For the start symbol S, place $ in FOLLOW(S).
2. If there is a production A → αBβ, then everything in FIRST(β) without ε is to be placed in FOLLOW(B).
3. If there is a production rule A → αB, or A → αBβ where ε is in FIRST(β), then everything in FOLLOW(A) is in FOLLOW(B).
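These rules translate directly into a fixed-point computation. The following Python sketch is our own illustration (the grammar is encoded as a dict of alternatives, with [] standing for ε); it computes FIRST and FOLLOW for the expression grammar used in the next example:

EPS = 'eps'
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
NTS = set(GRAMMAR)

def first_of_seq(seq, first):
    """FIRST of a sequence of symbols, given the current FIRST sets."""
    out = set()
    for sym in seq:
        f = first[sym] if sym in NTS else {sym}   # terminal a: FIRST(a) = {a}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)                                  # every symbol derives epsilon
    return out

def compute_first():
    first = {nt: set() for nt in NTS}
    changed = True
    while changed:                                # iterate to a fixed point
        changed = False
        for nt, alts in GRAMMAR.items():
            for alt in alts:
                f = first_of_seq(alt, first)
                if not f <= first[nt]:
                    first[nt] |= f
                    changed = True
    return first

def compute_follow(first, start="E"):
    follow = {nt: set() for nt in NTS}
    follow[start].add('$')                        # rule 1
    changed = True
    while changed:
        changed = False
        for nt, alts in GRAMMAR.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in NTS:
                        continue
                    tail = first_of_seq(alt[i + 1:], first)
                    add = tail - {EPS}            # rule 2
                    if EPS in tail:
                        add |= follow[nt]         # rule 3
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    return follow

first = compute_first()
print(first["E'"])                   # {'+', 'eps'}
print(compute_follow(first)["T"])    # {'+', ')', '$'}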
Example 1: Consider the grammar:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
Find the FIRST and FOLLOW functions for the above grammar
Solution: E → TE' is the rule in which the first symbol on the RHS is T. Now T → FT', in which the first symbol on the RHS is F, and there is a rule for F:
F → (E) | id.
FIRST(E) = FIRST(T) = FIRST(F)
As F → (E) and
F → id,
hence FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E') = {+, ε}
as E' → +TE' and
E' → ε, by referring to computation rule 2.
The first terminal symbol appearing at the RHS of a production rule for E' is added to the FIRST set.
FIRST(T') = {*, ε}
as T' → *FT' and
T' → ε.
The first terminal symbol appearing at the RHS of a production rule for T' is added to the FIRST set.
Now we will compute the FOLLOW function.
FOLLOW(E):
As there is a rule F → (E), the symbol ')' appears immediately to the right of E. Hence ')' will be in FOLLOW(E).
The computation rule is A → αBβ; we can map this rule with F → (E): then
A = F, α = (, B = E, β = ).
FOLLOW(B) ⊇ FIRST(β) = FIRST( ) ) = {)}
FOLLOW(B) = {)}
Since E is the start symbol, add $ to FOLLOW(E).
Hence FOLLOW(E) = {), $}
FOLLOW(E'):
i. E → TE'. With the computation rule A → αBβ:
A = E, α = T, B = E', β = ε. Then by computation rule 3, everything in FOLLOW(A) is in FOLLOW(B), i.e. everything in FOLLOW(E) is in FOLLOW(E').
FOLLOW(E') = {), $}
ii. E' → +TE'. With the computation rule A → αBβ:
A = E', α = +T, B = E', β = ε. Then by computation rule 3, everything in FOLLOW(E') is in FOLLOW(E').
FOLLOW(E') = {), $}
We can observe in the given grammar that ) really follows E'.
FOLLOW(T):
We have to observe two rules.
Consider E → TE'. We will map it with A → αBβ:
A = E, α = ε, B = T, β = E'. By computation rule 2, FOLLOW(B) ⊇ FIRST(β) − {ε},
that is, FOLLOW(T) ⊇ FIRST(E') − {ε}
= {+, ε} − {ε}
= {+}
Consider E' → +TE'. We will map it with A → αBβ:
A = E', α = +, B = T, β = E'. Since ε is in FIRST(E'), by computation rule 3, FOLLOW(B) includes FOLLOW(A),
that is, FOLLOW(T) ⊇ FOLLOW(E') = {), $}
Finally FOLLOW(T) = {+} ∪ {), $}
= {+, ), $}
We can observe in the given grammar that + and ) really follow T.
FOLLOW(T'):
Consider T → FT'. We will map this rule with A → αBβ: A = T, α = F, B = T', β = ε. Then
FOLLOW(T') = FOLLOW(T) = {+, ), $}

FOLLOW(F):
Consider T → FT' or T' → *FT'. Then by computation rule 2,
FOLLOW(F) ⊇ FIRST(T') − {ε} = {*}
Since ε is in FIRST(T'), by computation rule 3,
FOLLOW(F) ⊇ FOLLOW(T) = {+, ), $}
Finally, FOLLOW(F) = {*} ∪ {+, ), $}

FOLLOW(F) = {+, *, ), $}
To summarize the above computation:

Symbol    FIRST             FOLLOW
E         { (, id }         { ), $ }
E'        { +, ε }          { ), $ }
T         { (, id }         { +, ), $ }
T'        { *, ε }          { +, ), $ }
F         { (, id }         { +, *, ), $ }
Algorithm for the Predictive Parsing Table
The construction of the predictive parsing table is an important activity in the predictive parsing method. This algorithm requires the FIRST and FOLLOW functions.
Input: The context free grammar G.
Output: Predictive parsing table M.
Algorithm:
For each rule A → α of grammar G:
1. For each a in FIRST(α), create the entry M[A, a] = A → α, where a is a terminal symbol.
2. If ε is in FIRST(α), create M[A, b] = A → α for each symbol b in FOLLOW(A).
3. If ε is in FIRST(α) and $ is in FOLLOW(A), then create the entry M[A, $] = A → α.
4. All the remaining entries in the table M are marked as SYNTAX ERROR.
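This algorithm can be sketched in Python using the FIRST and FOLLOW sets summarized above (the grammar encoding and the helper names are our own assumptions):

EPS = 'eps'
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}

def first_of_seq(seq):
    out = set()
    for sym in seq:
        f = FIRST.get(sym, {sym})      # a terminal a has FIRST(a) = {a}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out

def build_table():
    table = {}
    for nt, alts in GRAMMAR.items():
        for alt in alts:
            f = first_of_seq(alt)
            # rule 1: an entry for each terminal in FIRST(alpha);
            # rules 2-3: if epsilon in FIRST(alpha), entries for FOLLOW(A)
            targets = (f - {EPS}) | (FOLLOW[nt] if EPS in f else set())
            for a in targets:
                if (nt, a) in table:   # multiple entries: grammar not LL(1)
                    raise ValueError(f"conflict at M[{nt}, {a}]")
                table[(nt, a)] = alt
    return table                       # undefined entries are syntax errors

M = build_table()
print(M[("E", "id")])    # ['T', "E'"]
print(M[("E'", ")")])    # []  i.e. E' -> epsilon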
Example 1: Now we will make use of the above algorithm to create the parsing table for the grammar:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
First create the empty table. Now we will fill up the entries in the table using the above algorithm, considering each rule one by one.

E → TE'
A → α: A = E, α = TE'
FIRST(TE') = FIRST(T) = {(, id}
M[E, (] = E → TE'
M[E, id] = E → TE'
E' → +TE'
A → α: A = E', α = +TE'
FIRST(+TE') = {+}
Hence M[E', +] = E' → +TE'
E' → ε
A → α: A = E', α = ε, then
FOLLOW(E') = {), $}
Hence M[E', )] = E' → ε
M[E', $] = E' → ε
T → FT'
A → α: A = T, α = FT', then
FIRST(FT') = FIRST(F) = {(, id}
Hence M[T, (] = T → FT'
and M[T, id] = T → FT'
T' → *FT'
A → α: A = T', α = *FT', then
FIRST(*FT') = {*}
Hence M[T', *] = T' → *FT'
T' → ε
A → α: A = T', α = ε
FOLLOW(T') = {+, ), $}
Hence M[T', +] = T' → ε
M[T', )] = T' → ε
M[T', $] = T' → ε
F → (E)
A → α: A = F, α = (E)
FIRST((E)) = {(}
Hence M[F, (] = F → (E)
F → id
A → α: A = F, α = id
FIRST(id) = {id}
Hence M[F, id] = F → id
The complete table is shown below:

      id         +            *            (          )          $
E     E → TE'    Error        Error        E → TE'    Error      Error
E'    Error      E' → +TE'    Error        Error      E' → ε     E' → ε
T     T → FT'    Error        Error        T → FT'    Error      Error
T'    Error      T' → ε       T' → *FT'    Error      T' → ε     T' → ε
F     F → id     Error        Error        F → (E)    Error      Error
Now the input string id + id * id $ can be parsed using the above table. In the initial configuration the stack contains the start symbol E, and the input string is placed in the input buffer:

Stack    Input             Action
$E       id + id * id $
Now the symbol E is on top of the stack and the input pointer is at the first id; hence M[E, id] is referred to. This entry tells us E → TE', so we pop E and push E' first, then T.

Stack      Input             Action
$E'T       id + id * id $    E → TE'
$E'T'F     id + id * id $    T → FT'
$E'T'id    id + id * id $    F → id
$E'T'      + id * id $       Pop id
$E'        + id * id $       T' → ε
$E'T+      + id * id $       E' → +TE'
$E'T       id * id $         Pop +
$E'T'F     id * id $         T → FT'
$E'T'id    id * id $         F → id
$E'T'      * id $            Pop id
$E'T'F*    * id $            T' → *FT'
$E'T'F     id $              Pop *
$E'T'id    id $              F → id
$E'T'      $                 Pop id
$E'        $                 T' → ε
$          $                 E' → ε
$          $                 Accept

Thus it is observed that the input is scanned from left to right and we always follow a leftmost derivation while parsing the input string. Also, only one input symbol at a time is referred to in taking a parsing action. Hence the name of this parser is LL (1). The LL (1) parser is a table-driven predictive parser. Left recursive and ambiguous grammars are not allowed for the LL (1) parser.
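The table-driven loop itself is short. Below is a minimal Python sketch of the driver for the table above; the table is hard-coded as a dict keyed by (stack top, lookahead), and all names are our own illustration:

TABLE = {
    ("E", "id"): ["T", "E'"],  ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],  ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],       ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def parse(tokens):
    tokens = tokens + ["$"]
    stack = ["$", "E"]                    # start symbol on top of $
    i = 0
    while True:
        top, look = stack[-1], tokens[i]
        if top == look == "$":
            return True                   # halting configuration: accept
        if top not in NONTERMINALS:       # terminal on top must match input
            if top != look:
                return False
            stack.pop()
            i += 1
        elif (top, look) in TABLE:
            stack.pop()
            stack.extend(reversed(TABLE[(top, look)]))  # push RHS in reverse
        else:
            return False                  # blank entry: syntax error

print(parse(["id", "+", "id", "*", "id"]))   # True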
Example 2: Show that following grammar:
S → AaAb | BbBa
A→ε
B→ε
is LL (1).
Solution: Consider the grammar:
S → AaAb
S → BbBa
A→ε

B→ε
Now we will compute the FIRST and FOLLOW functions.
FIRST(S) = {a, b}, since S → AaAb gives S ⇒ aAb when A → ε, and S → BbBa gives S ⇒ bBa when B → ε.
FIRST(A) = FIRST(B) = {ε}
FOLLOW(S) = {$}
FOLLOW(A) = FOLLOW(B) = {a, b}
The LL (1) parsing table is:

     a           b           $
S    S → AaAb    S → BbBa
A    A → ε       A → ε
B    B → ε       B → ε

Now consider the "ba" for Parsing:


Stack Input Action
$S ba$ S → BbBa
$ BbBa ba$ B→ε
$ bBa ba$ Pop b
$ Ba a$ B→ε
$a a$ Pop a
$ $ Accept.
This shows that the given grammar is LL (1).
Example 3: Construct LL (1) parsing table for the following grammar.
S → aB | aC | Sd | Se
B → bBc | f
C→g
Solution: Consider the grammar
S → aB
S → aC
S → Sd
S → Se

B → bBc
B→f
C→g
Now we will construct FIRST and FOLLOW for the above grammar.
FIRST(S) = {a}
FIRST(B) = {b, f}
FIRST(C) = {g}
FOLLOW(S) = {d, e, $}
FOLLOW(B) = {c, d, e, $}
FOLLOW(C) = {d, e, $}
The LL (1) parsing table can be as shown below:

     a                 b          c    d    e    f        g        $
S    S → aB, S → aC
B                      B → bBc                   B → f
C                                                         C → g

The above table shows multiple entries in M[S, a]. This shows that the given grammar is not LL (1).
3.4.2 Bottom-Up Parser
In the bottom-up parsing method the input string is taken first and we try to reduce this string with the help of the grammar, trying to obtain the start symbol. The process of parsing halts successfully as soon as we reach the start symbol.
The parse tree is constructed from the bottom up, that is, from the leaves to the root. In this process the input symbols are placed at the leaf nodes after successful parsing. The bottom-up parse tree is created starting from the leaves; the leaf nodes together are reduced further to internal nodes, these internal nodes are further reduced, and eventually the root node is obtained. The internal nodes are created from non-terminals, each obtained by reducing a string of terminal and non-terminal symbols.
In this process the parser basically tries to identify the RHS of a production rule and replace it by the corresponding LHS. This activity is called reduction. Thus the crucial but prime task in bottom-up parsing is to find the productions that can be used for reduction. The bottom-up parse tree construction process indicates that the tracing of derivations is to be done in reverse order.
Example 1: Consider the grammar for declarative statement:
S → TL;
T → int | float
L → L, id | id
The input string is float id, id;
Parse Tree
Step 1: We will start from leaf node.
Step 2:

Step 3: Reducing float to T. T → float

Step 4: Read next string from input.

Step 5: Reducing id to L. L → id

Step 6: Read next string from input.

Step 7: Read next string from input.

Step 8: L, id gets reduced to L (L → L, id).

Step 9: T L ; gets reduced to S (S → TL;).

Fig. Bottom-up parsing
Step 10: The sentential forms produced while constructing this parse tree are:
float id, id;
T id, id;
T L, id;
T L;
S
Step 11: Thus, looking at the sentential forms, we can say that the rightmost derivation in reverse order is performed.
Thus, the basic steps in bottom-up parsing are:
1. Reduction of the input string to the start symbol.
2. The sentential forms that are produced in the reduction process should trace out a rightmost derivation in reverse.
As said earlier, the crucial task in bottom-up parsing is to find the substring that can be reduced to an appropriate non-terminal. Such a substring is called a handle.
In other words, a handle is a substring that matches the right side of a production, and we can reduce such a substring by the non-terminal on the left-hand side of the production. Such a reduction represents one step along the reverse of a rightmost derivation. Formally we can define a handle as follows:
A handle of a right sentential form γ is a production A → β and a position of γ where the string β may be found and replaced by A to produce the previous right sentential form in a rightmost derivation of γ.
For example, consider the grammar
E → E + E
E → id
Now consider the string id + id + id; one rightmost derivation is
E ⇒ E + E ⇒ E + E + E ⇒ E + E + id ⇒ E + id + id ⇒ id + id + id
The handles are shown in the table below:

Right sentential form    Handle    Production
id + id + id             id        E → id
E + id + id              id        E → id
E + E + id               id        E → id
E + E + E                E + E     E → E + E
E + E                    E + E     E → E + E

Thus, bottom-up parsing is essentially a process of detecting handles and using them in reductions.
3.4.2.1 Shift Reduce Parser
A shift reduce parser attempts to construct the parse tree from the leaves to the root; thus it works on the same principle as any bottom-up parser. A shift reduce parser requires the following data structures:
1. An input buffer storing the input string
2. A stack for storing and accessing the LHS and RHS of rules
The initial configuration of the shift reduce parser is as shown below:

Fig.3.12 Bottom-up parsing


The parser performs the following basic operations:
1. Shift: moving symbols from the input buffer onto the stack; this action is called shift.
2. Reduce: if a handle appears on top of the stack then it is reduced by the appropriate rule; that means the RHS of the rule is popped off and the LHS is pushed on. This action is called the reduce action.
3. Accept: if the stack contains only the start symbol and the input buffer is empty at the same time, then that action is called accept. When the accept state is reached in the process of parsing, it means a successful parse is done.
4. Error: a situation in which the parser can neither shift nor reduce the symbols, and cannot even perform the accept action, is called an error.
Example 1: Consider the grammar
E→E-E
E→E*E
E →id
Perform shift-Reduce parsing of the input string id1 - id2 * id3
Solution:
Stack Input Buffer Parsing Action
$ id1 - id2 * id3$ Shift
$id1 - id2 * id3$ Reduce by E → id
$E - id2 * id3$ Shift
$E- id2 * id3$ Shift
$E-id2 * id3$ Reduce by E → id
$E-E * id3$ Shift
$E-E* id3$ Shift
$E-E*id3 $ Reduce by E → id
$E-E*E $ Reduce by E → E*E
$E-E $ Reduce by E → E-E
$E $ Accept

Here we have followed two rules:

1. If the incoming operator has higher priority than the operator on the stack, then perform a shift.
2. If the operator on the stack has the same or higher priority than the incoming operator, then perform a reduce.
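The two priority rules can be folded into a short Python sketch of the shift-reduce loop for this ambiguous grammar (a minimal illustration under our own encoding, not a general shift-reduce parser):

PREC = {"-": 1, "*": 2}

def parse(tokens):
    stack, i = [], 0
    while True:
        look = tokens[i] if i < len(tokens) else "$"
        if stack[-1:] == ["id"]:
            stack[-1:] = ["E"]                  # reduce by E -> id
        elif stack[-3:] in (["E", "-", "E"], ["E", "*", "E"]) and \
                (look == "$" or PREC.get(look, 0) <= PREC[stack[-2]]):
            stack[-3:] = ["E"]                  # reduce by E -> E op E
        elif look != "$":
            stack.append(look)                  # shift
            i += 1
        else:
            return stack == ["E"]               # accept iff only E remains

print(parse(["id", "-", "id", "*", "id"]))      # True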
Example 2: Consider the following grammar
S → TL;
T → int | float
L → L, id | id
Parse the input string int id, id; using shift reduce parser.
Solution:

Stack Input Buffer Parsing Action
$ int id, id; $ Shift
$int id, id; $ Reduce by T → int
$T id, id; $ Shift
$Tid id; $ Reduce by L→ id
$TL id; $ Shift
$TL, id; $ Shift
$TL, id ;$ Reduce by L → L, id
$TL ;$ Shift
$TL; $ Reduce by S → TL;
$S $ Accept
Example 3: Consider the following grammar
S → (L) | a
L → L, S | S
Parse the input string (a, (a, a)) using a shift reduce parser.
Solution:
Stack Input Buffer Parsing Action
$ (a, (a, a)) $ Shift
$( a, (a, a)) $ Shift
$(a , (a, a)) $ Reduce by S→ a
$(S , (a, a)) $ Reduce by L→ S
$(L , (a, a)) $ Shift
$(L, (a, a)) $ Shift
$(L, ( a, a)) $ Shift
$(L, (a , a)) $ Reduce by S → a
$(L, (S , a)) $ Reduce by L → S
$(L, (L , a)) $ Shift
$(L, (L, a)) $ Shift
$(L, (L,a )) $ Reduce by S → a
$(L, (L,S )) $ Reduce by L → L, S
$(L, (L )) $ Shift
$(L, (L) )$ Reduce by S → (L)
$(L, S )$ Reduce by L → L, S
$(L )$ Shift
$(L) $ Reduce by S → (L)
$S $ Accept
3.4.2.2 LR Parser
This is the most efficient method of bottom-up parsing, and it can be used to parse a large class of context free grammars. This method is also called LR(k) parsing. Here
• L stands for left-to-right scanning,
• R stands for rightmost derivation in reverse,
• k is the number of lookahead input symbols. When k is omitted, it is assumed to be 1.
Properties of LR Parser
LR parsers are widely used for the following reasons:
1. LR parsers can be constructed to recognize most of the programming language constructs for which a context free grammar can be written.
2. The class of grammars that can be parsed by an LR parser is a superset of the class of grammars that can be parsed using predictive parsers.
3. The LR parser works using a non-backtracking shift reduce technique, yet it is efficient.
4. LR parsers detect syntactical errors very efficiently.
Structure of LR Parsers
The structure of LR parser is as given in following Fig. 3.13.

Fig.3.13 Structure of LR Parser


It consists of an input buffer for storing the input, a stack for storing grammar symbols, the output, and a parsing table comprised of two parts, namely action and goto. There is one parsing program, which is actually a driving program that reads one input symbol at a time from the input buffer.
The driving program works along the following lines:
1. It initializes the stack with the initial state and invokes the scanner (lexical analyzer) to get the next token.
2. It determines sj, the state currently on top of the stack, and ai, the current input symbol.
3. It consults the parsing table entry action[sj, ai], which can have one of four values:
a. si means shift state i
b. rj means reduce by rule j
c. accept means successful parsing is done
d. error indicates a syntactical error
Types of LR Parser
Following diagram represents the types of LR parser.

Fig.3.13 Techniques of LR Parsers


SLR means simple LR parser, LALR means lookahead LR parser, and canonical LR (or simply "LR") parser: these are the three members of the LR family. The overall structure of all these LR parsers is the same; all are table-driven parsers. The relative power of these parsers is SLR (1) ≤ LALR (1) ≤ LR (1). That means canonical LR parses a larger class than LALR, and LALR parses a larger class than SLR.
1. SLR Parser
We will start with the simplest form of LR parsing, called the SLR parser. It is the weakest of the three methods, but the easiest to implement. The parsing can be done as:

Fig.3.13 Working of SLR (1) parser
A grammar for which SLR parser can be constructed is called SLR grammar.
Definition of LR (0) items and related terms:
1. An LR (0) item for grammar G is a production rule in which the symbol ● is inserted at some position in the RHS of the rule. For example:
S → ●ABC
S → A●BC
S → AB●C
S → ABC●
The production S → ε generates only one item, S → ●.
2. Augmented grammar: if G is a grammar with start symbol S, then the augmented grammar is a new grammar G' with a new start symbol S' and the production S' → S. The purpose of this grammar is to indicate acceptance of the input: when the parser is about to reduce by S' → S, it reaches the acceptance state.
3. Kernel items: the collection of the item S' → ●S and all the items whose dots are not at the leftmost end of the RHS of the rule.
Non-kernel items: the collection of all the items in which ● is at the left end of the RHS of the rule.
4. Functions closure and goto: these are two important functions required to create the collection of canonical sets of items.
5. Viable prefix: a prefix of a right sentential form that can appear on the stack, i.e. one that does not extend past the right end of the handle.
Closure operation
For a context free grammar G, if I is a set of items then the function closure(I) can be constructed using the following rules:
1. Initially, every item in I is added to closure(I).
2. If A → α●Bβ is a rule in closure(I) and there is another rule for B such as B → ɤ, then add B → ●ɤ:
closure(I): A → α●Bβ
B → ●ɤ
This rule has to be applied until no more new items can be added to closure(I).
The meaning of the rule A → α●Bβ is that during derivation of the input string, at some point we may require strings derivable from Bβ as input. A non-terminal immediately to the right of ● indicates that it has to be expanded shortly.
goto operation:
The function goto can be defined as follows:
If there is an item A → α●Bβ in I, then goto(I, B) contains the item A → αB●β. That means simply shifting the ● one position ahead over the grammar symbol (which may be a terminal or a non-terminal).
Example 1: Consider the grammar:
X → Xb | a
Compute closure(I) and goto(I).
Solution: Let
I: X → ●Xb
closure(I) = X → ●Xb
X → ●a
The goto function can be computed as:
goto(I, X) = X → X●b
Similarly goto(I,a) gives X → a●
Example 2: Consider the grammar
S → AS | b
A → SA | a
Compute closure(I) and goto (I)

Solution: We will first write the grammar using dot operator.

Let us call this state I0.


Now we apply goto on each symbol in I0

Example 3: Consider the following grammar:


S → Aa | bAc | Bc | bBa
A→d
B→d
Solution: First we will write the grammar using the dot operator.
I0 : S → ●Aa
S → ●bAc
S → ●Bc
S → ●bBa
A → ●d
B → ●d
Now we will apply goto on each symbol state I0. Each goto(I) will generate new subsequent
states.

I1: goto (I0, A)
S → A●a
I2: goto (I0, b)
S → b●Ac
S → b●Ba
A → ●d
B → ●d
I3: goto (I0, B)
S → B●c
I4: goto (I0, d)
A → d●
B → d●
Construction of the canonical collection of sets of items:
1. For the grammar G, initially add closure({S' → ●S}) to the collection C.
2. For each set of items Ii in C and for each grammar symbol X (terminal or non-terminal), add goto(Ii, X) to C. This process should be repeated for each X in Ii such that goto(Ii, X) is not empty and not already in C. Sets of items have to be constructed until no more sets of items can be added to C.
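The closure and goto functions, and the loop that builds the canonical collection, can be sketched in Python as follows. An item is encoded as a triple (lhs, rhs, dot position), and the grammar is the augmented expression grammar of the example that follows (the encoding is our own assumption):

GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NTS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(items):
            if dot < len(rhs) and rhs[dot] in NTS:   # non-terminal after the dot
                for l, r in GRAMMAR:
                    if l == rhs[dot] and (l, r, 0) not in items:
                        items.add((l, r, 0))         # add B -> .gamma
                        changed = True
    return frozenset(items)

def goto(items, x):
    """Shift the dot over x in every item where x follows the dot."""
    moved = {(l, r, d + 1) for l, r, d in items if d < len(r) and r[d] == x}
    return closure(moved) if moved else None

def canonical_collection():
    i0 = closure({("E'", ("E",), 0)})
    states, todo = [i0], [i0]
    while todo:
        state = todo.pop()
        symbols = {r[d] for _, r, d in state if d < len(r)}
        for x in symbols:
            new = goto(state, x)
            if new is not None and new not in states:
                states.append(new)
                todo.append(new)
    return states

print(len(canonical_collection()))   # 12 states: I0 .. I11, as derived below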
Now we will consider one grammar and construct the set of items by applying closure and goto
functions.
Example:
E→E+T
E→T
T→T*F
T→F
F → (E)
F → id

To this grammar we add the augmented production E' → ●E in I0, and then we apply closure(I0).

The item set I0 is constructed starting from E' → ●E. Immediately to the right of ● is E. Hence we apply closure(I0) and thereby add the E-productions with ● at the beginning of the rule: we add E → ●E + T and E → ●T to I0. But again, as we can see, the rule E → ●T which we added contains the non-terminal T immediately to the right of ●, so we have to add the T-productions T → ●T * F and T → ●F to I0.
In the T-productions, T and F respectively come after ●. We have already added the T-productions, so we will not add those again; but we will add all the F-productions with dots: F → ●(E) and F → ●id. Now we can see that ( and id come after the dot in these two productions. The symbols ( and id are terminals and do not derive any rule; hence our closure function terminates here. Since there is no further rule, we stop creating I0.
Now apply goto (I0, E)

Thus, I1 becomes:

Since in I1 there is no non-terminal after dot we cannot apply closure(I1).


By applying goto on T of I0

Since in I2 there is no non-terminal after the dot, we cannot apply closure(I2).

By applying goto on F of I0

Since there is nothing after the dot in I3, we cannot apply closure(I3).


By applying goto on ( of I0: after the dot E comes, hence we apply closure on E, then on T, then on F.

By applying goto on id of I0

Since in I5 there is no non-terminal to the right of the dot, we cannot apply the closure function here. Thus we have completed applying goto on I0. We will consider I1 for applying goto. In I1 there are two productions, E' → E● and E → E●+T. There is no point applying goto on E' → E●; hence we will consider E → E●+T for application of goto.

There is no other rule in I1 for applying goto. So,


we will move to I2. In I2 there are two productions, E → T● and T → T●*F. We will apply goto on *.
The goto cannot be applied on I3. Apply goto on E in I4. In I4 there are two productions having E after the dot (F → (●E) and E → ●E + T). Hence we apply goto on both of those productions. The I8 becomes:

If we apply goto on (I4, T) we get E → T● and T → T●*F, which is I2. Hence we will not apply goto on T. Similarly, we will not apply goto on F, ( and id, as we would get the states I3, I4, I5 again; these gotos are not applied, to avoid repetition.
There is no point applying goto on I5, hence now we move ahead by applying goto on I6 for T.

Then,

Then,

Applying goto on I9, I10, I11 is of no use. Thus, now there is no item that can be added to the set of items. The collection of sets of items is from I0 to I11.
Construction SLR Parsing Table
As we have seen in the structure of the SLR parser, there are two parts of the SLR parsing table, and those are action and goto. Considering the basic parsing actions, namely shift, reduce, accept and error, we will fill up the action table. The goto table can be filled up using the goto function. Let us see the algorithm:
Input: An augmented grammar G'.
Output: SLR parsing table.
Algorithm:
1. Initially construct the set of items C = {I0, I1, I2, …, In}, where C is the collection of sets of LR (0) items for the input grammar G'.
2. The parsing actions are based on each item Ii. The actions are as given below:
a. If A → α●aβ is in Ii and goto(Ii, a) = Ij, then set action[i, a] = "shift j". Note that a must be a terminal symbol.
b. If there is a rule A → α● in Ii, then set action[i, a] to "reduce A → α" for all symbols a where a ∈ FOLLOW(A). Note that A must not be the augmented start symbol S'.
c. If S' → S● is in Ii, then the entry in the action table is action[i, $] = "accept".
3. The goto part of the SLR table can be filled as follows: the goto transitions of state i are considered for non-terminals only. If goto(Ii, A) = Ij, then goto[i, A] = j.
4. All the entries not defined by rules 2 and 3 are considered to be "error".
Example 1: Construct the SLR (1) parsing table for the grammar:
E → E + T | T
T → T * F | F
F → (E) | id
Solution: We will first construct the collection of canonical sets of items for the above grammar. The sets of items generated by this method are also called LR (0) items; as there is no lookahead symbol in the sets of items, zero is put in the bracket.

We can design a DFA for above set of items as follows:

Fig.3.14 DFA for set of items


In the given DFA every state is a final state; the state I0 is the initial state. Note that the DFA recognizes viable prefixes.
For example: for the items:

The viable prefixes E, E+ and E+T are recognized here. Continuing in this fashion, the DFA can be constructed for the set of items; thus the DFA helps in recognizing the valid viable prefixes that can appear on top of the stack.
Now we will also prepare the FOLLOW sets for all the non-terminals, because we require them according to rule 2.b of the parsing table algorithm.
FOLLOW(E'): as E' is the start symbol, $ will be placed in it.
FOLLOW(E') = {$}
FOLLOW(E):
E' → E means E can end the whole input. ⸫ we will add $.
E → E + T: the + is following E. ⸫ we will add +.
F → (E): the ) is following E. ⸫ we will add ).
⸫ FOLLOW(E) = {+, ), $}
FOLLOW(T):
As E' → E and E → T, everything following E also follows T. ⸫ we will add $.
E → E + T with E → T: the + is following T, hence we will add +.
T → T * F: as * is following T, we will add *.
F → (E) with E → T: as ) is following T, we will add ).
⸫ FOLLOW(T) = {+, *, ), $}
FOLLOW(F):
As E' → E, E → T and T → F, everything following T also follows F. We will add $.
E → E + T: the + is following F, hence we will add +.
T → T * F: as * is following F, we will add *.
F → (E): as ) is following F, we will add ).
FOLLOW(F) = {+, *, ), $}
Building the Parsing Table
Now, from the canonical collection of sets of items, consider I0.
Consider F → ●(E). Matching with A → α●aβ:
A = F, α = ε, a = (, β = E)
goto(I0, () = I4
⸫ action[0, (] = shift 4
Similarly, for F → ●id, the entry in the action table is action[0, id] = shift 5, since goto(I0, id) = I5.
The other items in I0 do not give any shift action. In this way we find the shift actions for I0 through I11.

Thus the SLR parsing table is filled up with the shift actions. Now we will fill it with the reduce and accept actions.
According to rule 2.c of the parsing table algorithm, there is a production E' → E● in I1. Hence we add the action "accept" at action[1, $].
Now in state I2:

Hence add the rule E → T in the row of state 2 and in the columns of +, ) and $. In the given grammar E → T is rule number 2. ⸫ action[2, +] = r2, action[2, )] = r2, action[2, $] = r2.
Similarly, in state I3:

Hence add the rule T → F in the row of state 3 and in the columns of +, *, ) and $. In the given grammar T → F is rule number 4. Therefore action[3, +] = r4, action[3, *] = r4, action[3, )] = r4, action[3, $] = r4. Thus we can find matches for the rule A → α● in the remaining states I4 to I11 and fill up the action table with the respective "reduce" rules. The table with all the action[ ] entries will be:

Now we will fill up the goto table for all the non-terminals. In state I0, goto(I0, E) = I1, hence goto[0, E] = 1; similarly goto(I0, T) = I2, hence goto[0, T] = 2. Continuing in this fashion we can fill up the goto entries of the SLR parsing table.
Finally, the SLR (1) parsing table will look as:

The remaining blank entries in the table are considered as syntactical errors.


Parsing the input using the parsing table
Now it is time to parse an actual string using the above parsing table. Consider the parsing algorithm.
Input: The string w that is to be parsed, and the parsing table.
Output: A bottom-up parse of w if w ∈ L(G); if w ∉ L(G), report a syntactical error.
Algorithm:
1. Initially push 0 as the initial state onto the stack, and place the input string, with $ as end marker, on the input tape.
2. If S is the state on top of the stack and a is the symbol from the input buffer pointed to by the lookahead pointer, then:
• If action[S, a] = shift j, then push a, then push j onto the stack, and advance the input lookahead pointer.
• If action[S, a] = reduce A → β, then pop 2*|β| symbols. If state i is then on top of the stack, push A and then push goto[i, A] onto the stack.
• If action[S, a] = accept, then halt the parsing process; it indicates successful parsing.
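The driver loop can be sketched in Python. For brevity the sketch pushes only states (a common simplification of pushing symbol-state pairs) and hard-codes a tiny hand-built SLR table for the grammar 1: S → (S), 2: S → a; the table values are our own illustration, not the table of the example above:

RULES = {1: ("S", 3), 2: ("S", 1)}           # rule number -> (LHS, |RHS|)
ACTION = {
    (0, "("): ("s", 2), (0, "a"): ("s", 3),
    (1, "$"): ("acc", 0),
    (2, "("): ("s", 2), (2, "a"): ("s", 3),
    (3, ")"): ("r", 2), (3, "$"): ("r", 2),
    (4, ")"): ("s", 5),
    (5, ")"): ("r", 1), (5, "$"): ("r", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 4}

def parse(tokens):
    tokens = tokens + ["$"]
    stack = [0]                               # initial state 0 on the stack
    i = 0
    while True:
        s, a = stack[-1], tokens[i]
        if (s, a) not in ACTION:
            return False                      # blank entry: syntax error
        kind, n = ACTION[(s, a)]
        if kind == "s":                       # shift: push state n, advance
            stack.append(n)
            i += 1
        elif kind == "r":                     # reduce by rule n: pop |RHS|
            lhs, size = RULES[n]              # states, then goto on the
            del stack[-size:]                 # exposed state
            stack.append(GOTO[(stack[-1], lhs)])
        else:
            return True                       # accept

print(parse(["(", "(", "a", ")", ")"]))       # True
print(parse(["(", "a"]))                      # False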
Let us take one valid string for the above grammar:
Input string: id * id + id
We will consider two data structures while taking the parsing actions and those are stack and
input buffer.

In the first row of the above table we get action[0, id] = s5; that means shift id from the input buffer onto the stack and then push the state number 5. In the second row we get action[5, *] = r6; that means reduce by rule 6, F → id, hence on the stack id is replaced by F. By referring to goto[0, F] we get state number 3, hence 3 is pushed onto the stack. Note that for every reduce action a goto is performed. The process of parsing continues in this fashion, and finally, when we get action[1, $] = accept, we halt, having successfully parsed the input string.
Example 2: Consider the following grammar:
E → E + T | T
T → T F | F
F → F * | a | b
Construct the SLR parsing table for this grammar. Also parse the input a * b + a.
Solution: Let us number the production rules in the grammar:
1: E → E + T    2: E → T    3: T → T F    4: T → F    5: F → F *    6: F → a    7: F → b
Now we will build the canonical set of LR (0) items. We first introduce the augmented grammar with the production E' → ●E; then the initial set of items I0 will be generated as follows.

Now we will use the goto function. From state I0, goto on E, T, F, a and b will be applied step by step. Each goto transition will generate a new state Ii.

Now we will start applying goto transitions on state I1. From state I1 it is possible to apply a goto transition only on +. Hence,

The goto transitions will be applied on state I2 now; there is no point applying goto on T.

If we apply goto on a or b from state I2 then we get F → a● and F → b●, which are states I4 and I5 respectively. Hence we will not consider I4 and I5 again. Now move to state I3. From I3, a goto transition is possible on *. Hence,

As there is no point in applying goto on states I4 and I5, we will choose state I6 for the goto transition.

Now we will first obtain FOLLOW of E, T and F, as the FOLLOW computation is required when the SLR parsing table is built.
FOLLOW(E) = {+, $}
FOLLOW(T) = {+, a, b, $}
FOLLOW(F) = {+, *, a, b, $}
By rules 2 and 4, E → T and T → F, any string derivable from E can end with T or F; and E is the start symbol. ⸫ we have added $ in FOLLOW(E), FOLLOW(T) and FOLLOW(F).
The SLR parsing table can be constructed as follows:

Now we will parse the input a * b + a using the above parsing table.

Thus, the input string is parsed

Example 3: Show that the following grammar:
S → Aa | bAc | dc | bda
A → d
is not SLR (1).
Solution: We will number the production rules in the grammar:
1: S → Aa    2: S → bAc    3: S → dc    4: S → bda    5: A → d
Now let us construct the canonical set of items.

I0: S' → ●S
S → ●Aa
S → ●bAc
S → ●dc
S → ●bda
A → ●d
I1: goto (I0, S)
S’ → S ●
I2: goto (I0, A)
S→A●a
I3: goto (I0, b)
S → b ● Ac
S → b ● da
A→●d
I4: goto (I0, d)
S→d●c
A→d●
I5: goto (I2, a)
S → Aa●

I6: goto (I3, A)
S → bA ● c
I7: goto (I3, d)
S → bd ● a
A→d●
I8: goto (I4, c)
S → dc ●
I9: goto (I6, c)
S → bAc ●
I10: goto (I7, a)
S → bda ●
Now we will construct FOLLOW(S) and FOLLOW(A).
FOLLOW(S) = {$}
FOLLOW(A) = {a, c}
The construction of the SLR (1) parsing table is done with the help of the sets of items. The parsing table is given below:

The above table clearly shows that there are multiple entries at action[7, a] and action[4, c]. That means a shift/reduce conflict occurs while parsing the input using this SLR parsing table. This shows that the given grammar is not SLR (1).

LR(k) Parser
Canonical LR parsing is the technique in which a lookahead symbol is carried along while constructing the sets of items. Hence the collection of sets of items is referred to as LR (1): the value 1 in the bracket indicates that there is one lookahead symbol in each item.
We follow the same steps as discussed for the SLR parsing technique, and those are:
1. Construction of the canonical sets of items along with the lookahead.
2. Building the canonical LR parsing table.
3. Parsing the input string using the canonical LR parsing table.
Construction of the canonical sets of items along with the lookahead:
1. For the grammar G, initially add [S' → ●S, $] to the set of items C.
2. For each set of items Ii in C and for each grammar symbol X (terminal or non-terminal), add goto(Ii, X). Repeat applying goto(Ii, X) for each X in Ii such that goto(Ii, X) is not empty and not already in C, until no more sets of items can be added to C.
3. The closure function is computed as follows: if [A → α●Bβ, a] is in I, then for each rule B → ɤ and each terminal b in FIRST(βa), if [B → ●ɤ, b] is not in I then add [B → ●ɤ, b] to I.
4. Similarly, the goto function is computed as follows: for each item [A → α●Xβ, a] in I, if the item [A → αX●β, a] is not in the goto items then add [A → αX●β, a] to the goto items.
This process is repeated until no more sets of items can be added to the collection C.
Example 1: Construct the LR (1) sets of items for the grammar:
S' → S
S → CC
C → aC | d
Solution: We will initially add [S' → ●S, $] as the first item in I0. Now match:
You can note one thing: I3 and I6 are different, because the second component (the lookahead) in I3 and I6 differs.
Apply goto on d of I2 for the rule [C → ●d, $]:

Now, if we apply goto on a and d of I3 we get I3 and I4 respectively, and there is no point in repeating the states. So we will apply goto on C of I3:

For I4 and I5 there is no point in applying goto. Applying goto on a and d of I6 gives I6 and I7 respectively. Hence we will apply goto on C of I6 for the rule [C → ●d, $]:

For the remaining states I7, I8 and I9 we cannot apply goto. Hence the process of construction of the sets of LR (1) items is completed. Thus the sets of LR (1) items consist of the states I0 to I9.
Construction of the Canonical LR Parsing Table
To construct the canonical LR parsing table, first of all we see the actual algorithm and then we learn how to apply that algorithm to an example. The parsing table is similar to the SLR parsing table, comprised of action and goto parts.
Input: An augmented grammar G'.
Output: The canonical LR parsing table.
Algorithm:
1. Initially construct the set of items C = {I0, I1, I2, …, In}, where C is the collection of sets of LR (1) items for the input grammar G'.
2. The parsing actions are based on each item Ii. The actions are as given below:
a. If [A → α●aβ, b] is in Ii and goto(Ii, a) = Ij, then create an entry in the action table action[i, a] = shift j.
b. If there is an item [A → α●, a] in Ii, then in the action table action[i, a] = reduce by A → α. Here A should not be S'.
c. If the item [S' → S●, $] is in Ii, then action[i, $] = accept.
3. The goto part of the LR table can be filled as follows: the goto transitions of state i are considered for non-terminals only. If goto(Ii, A) = Ij, then goto[i, A] = j.
4. All the entries not defined by rules 2 and 3 are considered to be "error".
Example: Construct the LR (1) parsing table for the following grammar:
S' → S
S → CC
C → aC | d
Solution: First we will construct the sets of LR (1) items:
The DFA for the set of items can be drawn as follows:

Fig.3.15 DFA [goto graph]

Now consider I0, in which there is an item matching [A → α ● a β, b], namely C → ● aC, a | d.
If goto is applied on a then we get the state I3. Hence we will create the entry
action [0, a] = shift 3. Similarly, for d:
In I0
C → ● d, a | d
matches A → α ● a β, b with
A = C, α = ε, a = d, β = ε, b = a | d
goto (I0, d) = I4
Hence action [0, d] = shift 4
For state I4
C → d ●, a | d
matches A → α ●, a with
A = C, α = d, a = a | d
Hence action [4, a] = action [4, d] = reduce by C → d, i.e. rule (3) of the given grammar.
S’ → S ●, $ is in I1,
so we will create action [1, $] = accept.
The goto table can be filled by using the goto functions.
For instance goto (I0, S) = I1, hence goto [0, S] = 1. Continuing in this fashion we can fill up
the LR (1) parsing table as follows:

State      a        d        $        S      C
0          s3       s4                1      2
1                            accept
2          s6       s7                       5
3          s3       s4                       8
4          r3       r3
5                            r1
6          s6       s7                       9
7                            r3
8          r2       r2
9                            r2

(si means shift and go to state i; ri means reduce by rule (i).)
The remaining blank entries in the table are considered as syntactic errors.
Parsing the input using LR(1) Parsing Table:
Using the above parsing table we can parse the input string "aadd" as:

Stack              Input     Action
0                  aadd$     shift 3
0 a 3              add$      shift 3
0 a 3 a 3          dd$       shift 4
0 a 3 a 3 d 4      d$        reduce by C → d; goto (3, C) = 8
0 a 3 a 3 C 8      d$        reduce by C → aC; goto (3, C) = 8
0 a 3 C 8          d$        reduce by C → aC; goto (0, C) = 2
0 C 2              d$        shift 7
0 C 2 d 7          $         reduce by C → d; goto (2, C) = 5
0 C 2 C 5          $         reduce by S → CC; goto (0, S) = 1
0 S 1              $         accept
Thus, the given input string is successfully parsed using the LR (canonical) parser.
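The moves shown above can be reproduced with a short table-driven loop. This Python sketch
continues from the tables built earlier; it is an illustration of the driver, not a production
implementation (it tracks only states, not the grammar symbols shown in the trace).

def parse(tokens, action, goto_table):
    stack = [0]                                    # stack of states
    tokens = list(tokens) + ["$"]
    pos = 0
    while True:
        entry = action.get((stack[-1], tokens[pos]))
        if entry is None:
            return False                           # blank entry: syntax error
        if entry[0] == "accept":
            return True
        if entry[0] == "shift":
            stack.append(entry[1])
            pos += 1
        else:                                      # reduce by head -> body
            _, head, body = entry
            del stack[len(stack) - len(body):]     # pop one state per symbol
            stack.append(goto_table[(stack[-1], head)])

print(parse("aadd", action, goto_table))           # True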
3.4.2.3 LALR Parser
In this type of parser the lookahead symbols of states with common cores are merged. The tables
obtained by this method are smaller in size than the canonical LR (1) tables; in fact, the
numbers of states in SLR and LALR parsing tables are always the same. Most programming
languages use LALR parsers.
We follow the same steps as discussed in the SLR and canonical LR parsing techniques, and those
are:
1. Construction of canonical set of items along with the lookahead
2. Building LALR parsing table
3. Parsing the input string using the LALR parsing table
Construction of canonical set of items along with the lookahead:
The construction of LR (1) items is the same as discussed for the canonical LR (1) parser. The
only difference is that, in the construction of LR (1) items for the canonical LR parser, we
kept two states distinct if their second components differed; in this case we will merge any two
states whose first components are the same, taking the union of the second components from both
the states.
Example 1:
S’ → S
S → CC
C → aC | d
Construct the set of LR (1) items for the LALR parser.
Solution: Merging the canonical LR (1) states that share the same first components gives:
I36: C → a ● C, a | d | $
C → ● aC, a | d | $
C → ● d, a | d | $
I47: C → d ●, a | d | $
I89: C → aC ●, a | d | $
The states I0, I1, I2 and I5 remain as in the canonical collection.
We have merged the two states I3 and I6 and made the second components a or d or $. The
production rules remain as they are. Similarly for I4 and I7, and for I8 and I9. The set of
items consists of the states {I0, I1, I2, I36, I47, I5, I89}.
Construction of LALR Parsing Table
The algorithm for construction of LALR Parsing Table is as given below:
1. Construct the LR (1) set of items.
2. Merge two states Ii and Ij if their first components (i.e. the production rules with dots)
match, and create a new state replacing the older states: Iij = Ii U Ij.
3. The parsing actions are based on each item Ii. The actions are as given below.
a) If [A → α ● a β, b] is in Ii and goto (Ii, a) = Ij then create an entry in the action table
action [i, a] = shift j.
b) If there is a production [A → α ●, a] in Ii then in the action table action [i, a] = reduce
by A → α. Here A should not be S’.
c) If there is a production [S’ → S ●, $] in Ii then action [i, $] = accept.
4. The goto part of the LR table can be filled as follows: the goto transitions for state Ii are
considered for non-terminals only. If goto (Ii, A) = Ij then goto [i, A] = j.
5. If the parsing actions conflict, then the algorithm fails to produce an LALR parser and the
grammar is not LALR (1). All the entries not defined by rules 3 and 4 are considered to be
“error”. (A sketch of the merge step of rule 2 is given below.)
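As indicated in rule 2, merging can be sketched in Python by grouping LR (1) states by their
core, i.e. the items with the lookaheads stripped. Remapping the goto transitions onto the
merged states is omitted here for brevity.

def core(state):
    return frozenset((h, b, d) for (h, b, d, _) in state)

def merge_states(states):
    merged, index_of_core = [], {}
    for state in states:
        c = core(state)
        if c in index_of_core:
            merged[index_of_core[c]] |= state      # union the lookaheads
        else:
            index_of_core[c] = len(merged)
            merged.append(set(state))
    return [frozenset(s) for s in merged]

lalr_states = merge_states(states)
print(len(states), "->", len(lalr_states))         # 10 -> 7 for Example 1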
Example 1:
S → CC
C → aC
C→d
Construct the parsing table for LALR (1) parser.
Solution: First the set LR (1) items can be constructed as follows with merged states.
(the merged states I0, I1, I2, I36, I47, I5 and I89 shown above)
Now consider state I0: there is a match with the rule [A → α ● a β, b] and goto (Ii, a) = Ij,
namely C → ● aC, a | d | $. If goto is applied on a then we get the state I36. Hence, we will
create the entry action [0, a] = shift 36. Similarly,
action [0, d] = shift 47; action [2, a] = shift 36; action [2, d] = shift 47;
action [36, a] = shift 36; action [36, d] = shift 47. The item C → d ●, a | d | $ in I47 gives
action [47, a] = action [47, d] = action [47, $] = reduce by C → d, and the item
C → aC ●, a | d | $ in I89 gives action [89, a] = action [89, d] = action [89, $] = reduce by
C → aC.
S’ → S ●, $ in I1
So, we create action [1, $] =accept.
The goto table can be filled by using the goto functions.
For instance, goto (I0, S) = I1, hence goto [0, S] = 1. Continuing in this fashion we can fill
up the LALR (1) parsing table as follows.
State      a        d        $        S      C
0          s36      s47               1      2
1                            accept
2          s36      s47                      5
36         s36      s47                      89
47         r3       r3       r3
5                            r1
89         r2       r2       r2
Strings belonging to the given grammar can now be parsed using the LALR parser. The blank
entries are considered to be syntactic errors.
Parsing the Input String using LALR Parser
The strings of grammar G are described by the regular expression a*da*d. We will consider the
input string "aadd" for parsing using the LALR parsing table.
Stack                Input     Action
0                    aadd$     shift 36
0 a 36               add$      shift 36
0 a 36 a 36          dd$       shift 47
0 a 36 a 36 d 47     d$        reduce by C → d; goto (36, C) = 89
0 a 36 a 36 C 89     d$        reduce by C → aC; goto (36, C) = 89
0 a 36 C 89          d$        reduce by C → aC; goto (0, C) = 2
0 C 2                d$        shift 47
0 C 2 d 47           $         reduce by C → d; goto (2, C) = 5
0 C 2 C 5            $         reduce by S → CC; goto (0, S) = 1
0 S 1                $         accept
Thus, the LALR and LR parser will mimic one another on the same input.
Example 2: Construct LALR parsing table for the following grammar
S → Aa
S → bAc
S → dc
S → bda
A→d
Parse the input string bdc using the table you generate.
Solution: Let us first number the production rules as below:
(1) S → Aa
(2) S → bAc
(3) S → dc
(4) S → bda
(5) A → d
Now we will construct canonical set of LR (1) items for the above grammar.
I0:
S’ → ● S, $
S → ● Aa, $
S → ● bAc, $
S → ● dc, $
S → ● bda, $
A → ● d, a
In the above set of items, we start from S’ → ● S, $. The second component is initially $.
After the ● comes S, hence we add the rules deriving S. Now we have got the rule:
S → ● Aa, $
It resembles [A → α ● B β, a], with B mapped to A and β = a. The second component of the items
deriving A is then FIRST(βa).
In our rule
A → ● d, the second component is FIRST(a$) = a.
Hence A → ● d, a will be added in I0.
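This lookahead computation can be checked mechanically with the closure sketch given earlier,
by rebinding its grammar globals to the grammar of this example (in a real tool the grammar
would be a parameter rather than a module-level global):

GRAMMAR = {
    "S'": [("S",)],
    "S":  [("A", "a"), ("b", "A", "c"), ("d", "c"), ("b", "d", "a")],
    "A":  [("d",)],
}
NONTERMINALS = set(GRAMMAR)

I0 = closure({("S'", ("S",), 0, "$")})
print(("A", ("d",), 0, "a") in I0)     # True: [A -> . d, a], as derived above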
I1: goto (I0, S)
S’ → S ●, $ (we carry the second component as it is).
The remaining states are obtained by applying goto:
I2: goto (I0, A): S → A ● a, $
I3: goto (I0, b): S → b ● Ac, $
S → b ● da, $
A → ● d, c
I4: goto (I0, d): S → d ● c, $
A → d ●, a
I5: goto (I2, a): S → Aa ●, $
I6: goto (I3, A): S → bA ● c, $
I7: goto (I3, d): S → bd ● a, $
A → d ●, c
I8: goto (I4, c): S → dc ●, $
I9: goto (I6, c): S → bAc ●, $
I10: goto (I7, a): S → bda ●, $
In the above set of canonical items, no two states have common first components (production
rules with dots). Hence, we cannot merge any states. The same set of items will be considered
for building the LALR parsing table.
We will construct the LALR parsing table using the following rules.
1. If [A → α ● a β, b] is in Ii and goto (Ii, a) = Ij then action [i, a] = shift j.
2. If there is a production [A → α ●, a] in some state Ii then action [i, a] = reduce by A → α.
3. If there is a production S’ → S ●, $ in Ii then action [i, $] = accept.
Using these rules, the LALR parsing table for the grammar is:

State      a        b        c        d        $        S      A
0                   s3                s4                1      2
1                                              accept
2          s5
3                                     s7                       6
4          r5                s8
5                                              r1
6                            s9
7          s10               r5
8                                              r3
9                                              r2
10                                             r4
Consider the input "bdc" for parsing with the help of the above LALR parsing table.

Stack              Input     Action
0                  bdc$      shift 3
0 b 3              dc$       shift 7
0 b 3 d 7          c$        reduce by A → d; goto (3, A) = 6
0 b 3 A 6          c$        shift 9
0 b 3 A 6 c 9      $         reduce by S → bAc; goto (0, S) = 1
0 S 1              $         accept
Thus, the input string gets parsed completely.
3.4.2.4 Error recovery Strategies in Syntax Analysis
Error recovery strategies are used by the parser to recover from errors once they are detected.
The simplest recovery strategy is to quit parsing with an error message at the first error
itself.
Panic Mode Recovery
Once an error is found, the parser intends to find a designated set of synchronizing tokens by
discarding input symbols one at a time. Synchronizing tokens are delimiters, such as a semicolon
or }, whose role in the source program is clear. When the parser finds an error in a statement,
it ignores the rest of the statement by not processing the input. This is the easiest way of
error recovery, and it prevents the parser from developing infinite loops. (A small sketch of
this skipping step is given after the lists below.)
Advantage
• Simplicity
• Never get into infinite loop
Disadvantage
• Additional errors cannot be checked as some of the input symbols will be skipped.
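A minimal Python sketch of the skipping step described above, assuming the parser works over a
token list and that ; and } are the designated synchronizing tokens; the token values are
illustrative.

SYNC_TOKENS = {";", "}"}

def panic_mode_skip(tokens, pos):
    # Discard input symbols one at a time until a synchronizing token is
    # found, then resume parsing just past it.
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1
    return pos + 1

tokens = ["x", "=", "@", "@", ";", "y", "=", "1", ";"]
print(panic_mode_skip(tokens, 2))      # 5: parsing resumes at "y"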
Phrase Level Recovery
The parser performs local correction on the remaining input when an error is detected. When a
parser finds an error, it tries to take corrective measures so that the rest of the statement
allows the parser to parse ahead. A wrong correction, however, may lead to an infinite loop.
The local correction may
be:
• Replacing a prefix by some string.
• Replacing comma by semicolon.
• Deleting extraneous semicolon.
• Inserting missing semicolon.
Advantage
• It can correct any input string.
Disadvantage
• It is difficult to cope with the actual error if it occurred before the point of detection.
Error Production
Productions which generate erroneous constructs are added to the grammar, based on common errors
that occur. These productions detect the anticipated errors during parsing. Error diagnostics
about the erroneous constructs are generated by the parser.
Global Correction
There are algorithms which make changes to modify an incorrect string into a correct string.
These algorithms perform a minimal sequence of changes to obtain a globally least-cost
correction. When a grammar G and an incorrect string w are given, these algorithms find a parse
tree for a related string p with the smallest number of transformations. The transformations may
be insertions, deletions, and changes of tokens.
Advantage
• It has been used for phrase level recovery to find optimal replacement strings.
Disadvantage
• This strategy is too costly to implement in terms of time and space.
3.4.2.5 Parser generator
YACC is an automatic tool for generating the parser program. YACC stands for Yet Another
Compiler Compiler. YACC provides a tool to produce a parser for a given grammar. YACC is a
program designed to compile an LALR (1) grammar. It is used to produce the source code of the
syntactic analyzer of the language produced by the LALR (1) grammar. The input of YACC is the
rule set or grammar, and the output is a C program. These are some points about YACC:
• Input: a CFG (file.y)
• Output: a parser, y.tab.c
✓ The output file "file.output" contains the parsing tables.
✓ The file "file.tab.h" contains declarations.
✓ The generated parser routine is called yyparse ().
✓ The parser expects to use a function called yylex () to get tokens.
The typical YACC Translator can be represented as:
Fig.3.16 Parser Generator Model
First, we write a YACC specification file; let us name it x.y. This file is given to the YACC
compiler by using the command:
yacc x.y
It will then generate a parser program from your YACC specification file. The parser program has
the standard name y.tab.c; this is basically the parser program in C, generated automatically.
To also generate the token definitions, the command is:
yacc -d x.y
With the -d option two files get generated: one is y.tab.c and the other is y.tab.h. The header
file y.tab.h stores all the tokens, so you need not create it explicitly. The generated y.tab.c
program is then compiled by the C compiler, producing the executable a.out file. Then you can
test your YACC program with the help of some valid and invalid strings.
Writing the YACC specification program is the main logical activity. The specification file
contains the context free grammar, and using the production rules of this context free grammar
the parsing of the input string is done by y.tab.c.
YACC Specification
The YACC specification file consists of three parts: the declaration section, the translation
rule section and the supporting C functions section.
Fig.3.17 Parts of YACC Specification
The specification file with these sections can be written as:
%{
/* declaration section */
%}
%%
/* Translation rule section */
%%
/* Required C/C++ functions*/
1. Declaration part: In this section ordinary C declarations can be put (between %{ and %}). We
can also declare the grammar tokens in this section; the token declarations should come after
the %{ %} block.
2. The translation rule section: consists of all the production rules of the context free
grammar with corresponding actions. For instance:
rule 1 action 1
rule 2 action 2
...
rule n action n
If there is more than one alternative for a single rule then those alternatives should be
separated by the | character. The actions are typical C statements.
3. C function section: This section consists of the main function, in which the routine
yyparse () is called. It also contains the other required C functions.