Chapter 3 Syntax Analysis Full Reading Material
3.1 Introduction
The tokens generated by the lexical analyzer are accepted by the next phase of the compiler, i.e. the syntax
analyzer. The syntax analyzer or parser comes after lexical analysis and is one of the important
components of the front end of the compiler. In this chapter:
▪ we first introduce the concept of parsing and then concentrate on a parsing
technique called top-down parsing.
▪ We will discuss various methods such as:
o Backtracking
o LL (1) parsing
o Recursive descent parsing and
o Predictive parsing techniques
▪ We will also discuss the processing steps required for predictive parsing.
3.2 Concept of Syntax Analysis
Syntax analysis is the second phase of compilation. The syntax analyzer (parser) basically
checks the syntax of the language. The syntax analyzer takes the tokens from the lexical
analyzer and groups them in such a way that some programming structure (syntax) can be
recognized. After grouping the tokens, if any syntax cannot be recognized then a syntactic error will
be generated. This overall process is called syntax checking of the language.
Definition of Parser: Parsing or syntax analysis is a process which takes the input string w and
produces either a parse tree (syntactic structure) or syntactic errors.
For example: a = b + 10;
The above programming statement is given to the lexical analyzer, which
divides it into a group of tokens. The syntax analyzer takes the tokens as input and generates a tree-
like structure called a parse tree. The parse tree shows how the statement gets parsed according to
its syntactic specification.
Compiled by Fikru T.
3.2.1 Basic Issues in Parsing
There are two important issues in parsing
i. Specification of syntax
ii. Representation of input after parsing
• A very important issue in parsing is the specification of syntax in a programming language.
Specification of syntax means how to write any programming language statement. There
are some characteristics of the specification of syntax:
i. Specification should be precise and unambiguous
ii. Specification should be in detail; it should cover all the details of the programming language.
iii. Specification should be complete.
Such a specification is called a “Context Free Grammar” (CFG).
• Another important issue in parsing is the representation of the input after parsing. This is
important because all the subsequent phases of the compiler take their information from the parse
tree being generated, and because the information conveyed by any input
programming statement should not differ after building the syntax tree for it.
• Lastly, the most crucial issue is the parsing algorithm based on which we get the parse tree
for the given input. There are two different approaches to parsing:
o Top down parsing
o Bottom up parsing
• The parsing algorithms are based on these approaches. These algorithms deal with the following
issues:
o How do these algorithms work?
o Are they efficient in nature?
o What are their merits and limitations?
o What kind of input do they require?
Keep one thing in mind: we are now interested simply in the syntax of the language; we are
not interested in meaning right now. Hence, we will do only syntax analysis at this phase. Checking
of the meaning (semantics) of syntactically correct input will be studied in the next phase of
compilation. The modified view of the front end is as shown in Fig. 3.2.
3.2.2 Role of Parser
In the process of compilation, the parser and lexical analyzer work together. That means, when
the parser requires tokens, it invokes the lexical analyzer. In turn, the lexical analyzer supplies
tokens to the syntax analyzer (parser).
The parser collects a sufficient number of tokens and builds a parse tree. Thus, by building the
parse tree, the parser smartly finds the syntactical errors, if any. It is also necessary that the parser
should recover from commonly occurring errors so that the remaining task of processing the input can be
continued.
Why are the Lexical and Syntax Analyzers separated?
The lexical analyzer scans the input program and collects the tokens from it. The parser builds a parse
tree using these tokens. These are two important activities, and they are independently
carried out by these two phases of the compiler. Separating out these two phases has two advantages:
1. It accelerates the process of compilation
2. The error in the source input can be identified precisely
Now we will focus on the first issue of the parser, i.e. specification of input. As we know,
specification of input can be done by using a “Context Free Grammar”.
3.3 Context Free Grammar
The context free grammar G is a collection of the following things:
1. V is a set of non-terminals
2. T is a set of terminals
3. S is a start symbol
4. P is a set of production rules
Thus G can be represented as G = (V, T, S, P).
The production rules are given in the following form:
Non-terminal → (V U T)*
Example 1: Let the language L = {aⁿbⁿ | n ≥ 1}.
Let G = (V, T, S, P), where V = {S}, T = {a, b} and S is the start symbol. Then the production
rules are:
P = {
S → aSb
S → ab
}
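As a small illustration, membership in this language can be checked by undoing the two productions; the following sketch (the function name is my own, not from the text) mirrors the derivation S → aSb → … → ab:

```python
def derives(w: str) -> bool:
    """Check whether w is derivable from S using S -> aSb | ab."""
    if w == "ab":                       # base rule S -> ab
        return True
    if len(w) >= 4 and w[0] == "a" and w[-1] == "b":
        return derives(w[1:-1])         # rule S -> aSb: peel one a...b pair
    return False
```

Each recursive step removes one matched a...b pair, so only strings of the form aⁿbⁿ with n ≥ 1 are accepted.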
Hence, a declaration such as int id, id, id; can be defined by means of an appropriate context free grammar.
The following rules are to be followed while writing a CFG:
1. A single non-terminal should be at the LHS.
2. The rule should always be in the form LHS → RHS, where the RHS may be a
combination of non-terminal and terminal symbols.
3. The NULL derivation can be specified as NT → ε.
4. One of the non-terminals should be the start symbol, and conventionally we should write the
rules for this non-terminal first.
3.3.1 Derivation and Parse Tree
Derivation from S means generation of the string w from S. For constructing a derivation two things
are important:
i. Choice of non-terminal from several others.
ii. Choice of rule from the production rules for the corresponding non-terminal.
Definition of Derivation Tree
Let G = (V, T, P, S) be a Context Free Grammar.
The derivation tree is a tree which can be constructed by the following properties:
i. The root has label S.
ii. Every vertex has a label from (V U T U {ε}).
iii. If there exists a vertex A with children R1, R2, …, Rn then there should be a production A →
R1 R2 … Rn.
iv. The leaf nodes are from set T and the interior nodes are from set V.
Instead of choosing an arbitrary non-terminal, one can choose:
i. either the leftmost non-terminal in a sentential form, in which case it is called a leftmost derivation;
ii. or the rightmost non-terminal in a sentential form, in which case it is called a rightmost derivation.
Example 1: Let G be a context free grammar for which the production rules are given as
below:
The structure shown above is called parse tree.
Example 2: Design a derivation tree for the following grammar-
Also obtain the leftmost and rightmost derivation for the string ‘aaabaab’ using above grammar.
Solution: Leftmost derivation Rightmost derivation
(a) Derivation Tree for Leftmost Derivation (b)Derivation Tree for Rightmost Derivation
Example: Consider the grammar given below:
Solution:
(a) Derivation Tree for Leftmost Derivation (b) Derivation Tree for Rightmost Derivation
3.3.2 Ambiguous Grammar
A grammar G is said to be ambiguous if it generates more than one parse tree for some sentence of
the language L(G).
Example 1:
Then for id+id*id
There are two different parse trees for deriving the same string. Hence the above given grammar is
an ambiguous grammar.
Example 2: Show whether the following grammar is ambiguous or not.
Types of Parsers
A top-down parser (TDP) builds the parse tree from the root, and the derivation terminates when the required
input string is obtained. TDP internally uses leftmost
derivation. A TDP can be constructed for a grammar only if it is free from ambiguity and left recursion. The
leftmost derivation matches this requirement. The main task in top-down parsing is to find the
appropriate production rule in order to produce the correct input string.
Example: consider the grammar
Consider the input string xyz as shown below:
Now we will construct the parse tree for the above grammar deriving the given input string. For
this derivation we will make use of the top-down approach.
Step 1:
The first leftmost leaf of the parse tree matches the first input symbol. Hence, we
advance the input pointer. The next leaf node is P. We have to expand the node P. After expansion
we get the node y which matches the input symbol y.
Step 2:
Now the next node is w, which does not match the input symbol. Hence, we go back to see
whether there is another alternative for P. Another alternative for P is y, which matches the
current input symbol. And thus, we could produce a successful parse tree for the given input.
Step 3:
We halt and declare that the parsing is completed successfully. In top-down parsing, selection of
the proper rule is a very important task, and this selection is based on a trial-and-error technique. That
means we select a particular rule, and if it does not produce the correct input string then we
need to backtrack and try another production. This process has to be repeated
until we get the correct input string. After trying all the productions, if we find every production
unsuitable for the string match, then the parse tree cannot be built.
Problems with Top-Down Parsing
There are certain problems in top-down parsing. In order to implement the parsing, we need to
eliminate these problems. Let us discuss these problems and how to remove them.
1) Backtracking
Backtracking is a technique in which, for expansion of a non-terminal symbol, we choose one
alternative, and if some mismatch occurs then we try another alternative, if any.
For example:
If for a non-terminal there are multiple production rules beginning with the same input symbol,
then to get the correct derivation we need to try all these possibilities. Secondly, in backtracking we
need to move some levels upward in order to check other possibilities. This adds a lot of overhead to the
implementation of parsing. And hence it becomes necessary to eliminate backtracking by
modifying the grammar.
2) Left Recursion:
A left recursive grammar is a grammar containing a production of the form A → Aα, so that A ⇒⁺ Aα.
Here ⇒⁺ means deriving the input in one or more steps, A is a non-terminal,
and α denotes some string of grammar symbols. If left recursion is present in the grammar then it creates a serious
problem: because of the left recursion the top-down parser can enter an infinite loop. This is as shown
in the following Figure below.
The expansion of A causes further expansion of A, and due to the generation of A, Aα, Aαα, Aααα,
…, the input pointer will never be advanced. This causes a major problem in top-down parsing, and
therefore elimination of left recursion is a must. To eliminate left recursion, we need to modify the
grammar. Let G be a context free grammar having a production rule with left recursion:
A → Aα | β
This can be rewritten without left recursion as:
A → βA'
A' → αA' | ε
Thus, a new symbol A' is introduced. We can also verify whether the modified grammar is
equivalent to the original or not.
The grammar for arithmetic expressions can be equivalently written as:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
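The transformation A → Aα | β ⇒ A → βA', A' → αA' | ε can be sketched in code. This is a minimal illustration (the dictionary representation and names are my own) handling only immediate left recursion:

```python
def eliminate_left_recursion(rules):
    """Remove immediate left recursion.

    rules maps a non-terminal to a list of right-hand sides, each RHS a
    list of symbols.  A -> A alpha | beta becomes
    A -> beta A',  A' -> alpha A' | epsilon (epsilon written as []).
    """
    out = {}
    for A, rhss in rules.items():
        rec = [r[1:] for r in rhss if r and r[0] == A]   # the alpha parts
        non = [r for r in rhss if not r or r[0] != A]    # the beta parts
        if not rec:                                      # no left recursion
            out[A] = rhss
            continue
        A2 = A + "'"                                     # new symbol A'
        out[A] = [b + [A2] for b in non]                 # A  -> beta A'
        out[A2] = [a + [A2] for a in rec] + [[]]         # A' -> alpha A' | epsilon
    return out
```

For instance, E → E + T | T becomes E → T E' and E' → + T E' | ε, matching the expression grammar above.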
3) Left Factoring
If the grammar is left factored then it becomes suitable for use. Basically, left factoring is
used when it is not clear which of two alternatives should be used to expand a non-terminal. By
left factoring we may be able to rewrite the productions so that the decision can be deferred until
enough of the input is seen to make the right choice.
For example, consider two alternatives with a common prefix, A → αβ1 | αβ2. They can be left factored as A → αA' with A' → β1 | β2.
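As a sketch of the same idea (the helper name and tuple representation are my own assumptions), alternatives sharing a first symbol can be factored mechanically:

```python
from collections import defaultdict

def left_factor(rhss):
    """Left-factor one non-terminal's alternatives on their first symbol.

    rhss is a list of RHS tuples.  Alternatives sharing a first symbol X
    are replaced by a single (X, "A'") alternative, and the remainders
    become the alternatives of the new non-terminal A' (epsilon is ()).
    Returns (new_rhss, primed_rhss); primed_rhss is None when no two
    alternatives share a prefix.
    """
    groups = defaultdict(list)
    for r in rhss:
        groups[r[0] if r else None].append(r)
    for first, grp in groups.items():
        if first is not None and len(grp) > 1:           # common prefix found
            new = [r for r in rhss if r not in grp]
            new.append((first, "A'"))                    # A -> X A'
            primed = [r[1:] for r in grp]                # A' -> remainders
            return new, primed
    return rhss, None
```

Applied to A → ab | ac | d, this yields A → aA' | d and A' → b | c.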
4) Ambiguity
An ambiguous grammar is not desirable in top-down parsing. Hence, we need to remove the
ambiguity from the grammar if it is present.
Example: a grammar such as E → E+E | E*E | id is an ambiguous grammar.
We will design the parse tree for id+id*id as follows:
For removing the ambiguity, we apply one rule: if the grammar has a left associative operator
(such as +, -, *, /) then introduce left recursion, and if the grammar has a right associative operator
(such as the exponentiation operator) then introduce right recursion. The unambiguous grammar is:
E → E + T | T
T → T * F | F
F → id
Note that this grammar is unambiguous but it is left recursive, and elimination of such
left recursion is again a must.
1. Recursive Descent Parser
A parser that uses a collection of recursive procedures for parsing the given input string is called a
Recursive Descent (RD) parser. In this type of parser the CFG is used to build the recursive
routines. The RHS of the production rule is directly converted to program code. For each non-terminal
a separate procedure is written, and the body of the procedure (code) is the RHS of the corresponding non-
terminal.
Basic steps for construction of an RD Parser
The RHS of the rule is directly converted into program code symbol by symbol.
1. If the symbol is a non-terminal, then a call to the procedure corresponding to that non-
terminal is made.
2. If the symbol is a terminal, then it is matched with the lookahead from the input.
3. If the production rule has many alternates, then all these alternates have to be combined
into a single body of the procedure.
4. The parser should be activated by the procedure corresponding to the start symbol.
Example: Consider the grammar having start symbol S:
S → cAd
A → ab | a
To construct a parse tree top-down for the input string w = cad, begin with a tree consisting of a
single node labeled S and the input pointer pointing to c, the first symbol of w. S has only one
production, so we use it to expand S and obtain the tree as in the figure.
The leftmost leaf, labeled c, matches the first symbol of the input w, so we advance the input
pointer to a, the second symbol of w, and consider the next leaf, labeled A.
Now we expand A using its leftmost alternative. We have a match for the second input
symbol a, so we advance the input pointer to d, the third input symbol, and compare d against the
next leaf, labeled b. Since b does not match d, we report failure and go back to A to see whether
there is another alternative for A that has not been tried but might produce a match. In
going back to A, we must reset the input pointer to position 2, the position it had when we first
came to A, which means that the procedure for A must store the input pointer in a local variable.
The second alternative for A produces the tree of Figure (c). The leaf a
matches the second symbol of w and the leaf d matches the third symbol. Since we have produced
a parse tree for w, we halt and announce successful completion of parsing.
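The walk-through above, including storing and restoring the input pointer, can be sketched as a small backtracking recursive-descent parser for S → cAd, A → ab | a (a minimal sketch; the names are my own):

```python
def parse(w: str) -> bool:
    """Backtracking recursive-descent parser for S -> cAd, A -> ab | a."""
    pos = 0

    def match(c: str) -> bool:
        nonlocal pos
        if pos < len(w) and w[pos] == c:
            pos += 1
            return True
        return False

    def A() -> bool:
        nonlocal pos
        saved = pos                    # store the input pointer locally
        if match("a") and match("b"):
            return True                # first alternative A -> ab succeeded
        pos = saved                    # mismatch: reset pointer and backtrack
        return match("a")              # try second alternative A -> a

    def S() -> bool:
        return match("c") and A() and match("d")

    return S() and pos == len(w)
```

Note how A() saves the input position before trying its first alternative and restores it before trying the second, exactly as described above for w = cad.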
2. LL (1) Parser
The simple block diagram for LL (1) parser is as given below:
The parsing table M is a two-dimensional array. The table has a row for each non-terminal and a column for
each terminal. The table can be represented as M[A, a] where A is a non-terminal and a is the current input
symbol. The parser works as follows:

Top   Input token   Parsing Action
$     $             Parsing successful; halt
a     a             Pop a and advance lookahead to next token
a     b             Error
A     a             Refer to table M[A, a]; if the entry at M[A, a] is error, report error
A     a             Refer to table M[A, a]; if the entry at M[A, a] is A → PQR then pop A, then push R,
                    then push Q, then push P
The parsing program reads the top of the stack and the current input symbol. With the help of these
two symbols the parsing action is determined.
The parser consults the table M[A, a] each time while taking a parsing action; hence this type
of parsing method is called a table-driven parsing algorithm. The configuration of the LL (1) parser
is defined by the top of the stack and the lookahead token. Configurations are performed one by one,
and the input is successfully parsed if the parser reaches the halting configuration: when the stack
is empty and the next token is $, we have a successful parse.
Construction of Predictive LL (1) Parser
The construction of predictive LL (1) parser is based on two very important functions and those
are FIRST and FOLLOW.
For construction of Predictive LL (1) parser we have to follow the following steps:
1. Computation of FIRST and FOLLOW function
2. Construct the Predictive Parsing Table using FIRST and FOLLOW functions
3. Parse the input string with the help of Predictive Parsing Table
First Function
FIRST(α) is the set of terminal symbols that appear first in strings derivable from
α. If α ⇒* ε then ε is also in FIRST(α).
Following are the rules used to compute the FIRST function:
1. If a is a terminal symbol then FIRST(a) = {a}.
2. If there is a rule X → ε then ε is in FIRST(X).
3. For the rule A → X1 X2 X3 … Xk, FIRST(A) contains FIRST(X1) − {ε}; if ε is in FIRST(Xj)
for all j = 1 … i−1, then FIRST(A) also contains FIRST(Xi) − {ε}; and if ε is in FIRST(Xj)
for all j = 1 … k, then ε is in FIRST(A).
FOLLOW Function
FOLLOW(A) is defined as the set of terminal symbols that can appear immediately to the right of A
in some sentential form. In other words,
FOLLOW(A) = {a | S ⇒* αAaβ}, where α and β are strings of grammar symbols, terminal or
non-terminal.
The rules for computing FOLLOW function are as given Below:
1. For the start symbol S place $ in FOLLOW (S).
2. If there is a production A→αBβ then everything in FIRST(β) without ε is to be placed
in FOLLOW(B).
3. If there is a production rule A → αB, or a production A → αBβ where FIRST(β) contains ε,
then everything in FOLLOW(A) is in FOLLOW(B).
Example 1: Consider the grammar
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
Find the FIRST and FOLLOW functions for the above grammar.
Solution: As E→TE’ is the rule in which the first symbol at RHS is T.
Now T→FT’ in which the first symbol at RHS is F there is a rule for F as
F→(E)|id.
FIRST(E) = FIRST(T)=FIRST(F)
As F→(E)
F → id
Hence FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E') = {+, ε}
As E' → +TE'
E' → ε, by referring to computation rule 2.
The first symbol appearing at the RHS of the production rule for E' is added to the FIRST function.
FIRST(T') = {*, ε}
As T' → *FT'
T' → ε
The first terminal symbol appearing at the RHS of the production rule for T' is added to the FIRST
function.
Now we will compute FOLLOW Function.
FOLLOW(E) –
As there is a rule F → (E), the symbol ')' appears immediately to the right of E.
Hence ')' will be in FOLLOW(E).
The computation rule is A → αBβ; we can map this rule with F → (E): then
A = F, α = (, B = E, β = ).
FOLLOW(B) ⊇ FIRST(β) = FIRST( ) ) = { ) }
FOLLOW(E) ⊇ { ) }
Since E is the start symbol, add $ to FOLLOW(E).
Hence FOLLOW(E) = {), $}
FOLLOW(E') –
i. E → TE': with the computation rule A → αBβ,
A = E, α = T, B = E', β = ε. Then by computation rule 3 everything in FOLLOW(A)
is in FOLLOW(B), i.e. everything in FOLLOW(E) is in FOLLOW(E').
FOLLOW(E') = {), $}
ii. E' → +TE': with the computation rule A → αBβ,
A = E', α = +T, B = E', β = ε. Then by computation rule 3 everything in FOLLOW(A) is
in FOLLOW(B), i.e. everything in FOLLOW(E') is in FOLLOW(E').
FOLLOW(E') = {), $}
We can observe in the given grammar that ) really follows E'.
FOLLOW(T) –
We have to observe two rules.
Consider E → TE'. We will map it with A → αBβ:
A = E, α = ε, B = T, β = E'. By computation rule 2, FOLLOW(B) ⊇ FIRST(β) − {ε},
that is FOLLOW(T) ⊇ FIRST(E') − {ε}
= {+, ε} − {ε}
= {+}
Consider E' → +TE'. We will map it with A → αBβ:
A = E', α = +, B = T, β = E'. Since E' ⇒ ε, by computation rule 3, FOLLOW(B) ⊇ FOLLOW(A),
that is FOLLOW(T) ⊇ FOLLOW(E') = {), $}
Finally FOLLOW(T) = {+} U {), $}
= {+, ), $}
We can observe in the given grammar that + and ) really follow T.
FOLLOW(T') –
We will map the rule T → FT' with A → αBβ: A = T, α = F, B = T', β = ε. Then
FOLLOW(T') = FOLLOW(T) = {+, ), $}
FOLLOW(F) –
Consider T → FT' (or T' → *FT'). Then by computation rules 2 and 3, FOLLOW(F) ⊇ FIRST(T') − {ε} = {*},
and since T' ⇒ ε, FOLLOW(F) ⊇ FOLLOW(T) = {+, ), $}. Hence FOLLOW(F) = {+, *, ), $}.
The results are summarized in the following table:

Non-terminal   FIRST        FOLLOW
E              { (, id }    { ), $ }
E'             { +, ε }     { ), $ }
T              { (, id }    { +, ), $ }
T'             { *, ε }     { +, ), $ }
F              { (, id }    { +, *, ), $ }
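The FIRST computation above can be sketched as a fixpoint iteration over the grammar (a minimal illustration; the dictionary representation and names are my own, and FOLLOW can be computed analogously):

```python
EPS = "ε"
GRAMMAR = {                          # the expression grammar from the text
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}

def compute_first(grammar):
    """Fixpoint computation of the FIRST sets of all non-terminals."""
    first = {A: set() for A in grammar}
    changed = True
    while changed:                   # repeat until nothing new is added
        changed = False
        for A, rhss in grammar.items():
            for rhs in rhss:
                nullable = True
                for X in rhs:
                    fX = first[X] if X in grammar else {X}  # FIRST(a) = {a}
                    new = (fX - {EPS}) - first[A]
                    if new:
                        first[A] |= new
                        changed = True
                    if EPS not in fX:                       # X not nullable:
                        nullable = False                    # stop scanning RHS
                        break
                if nullable and EPS not in first[A]:
                    first[A].add(EPS)                       # whole RHS nullable
                    changed = True
    return first
```

Running it reproduces the FIRST column of the table above.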
Algorithm for Predictive Parsing Table
The construction of the predictive parsing table is an important activity in the predictive parsing method.
This algorithm requires the FIRST and FOLLOW functions.
Input: The Context Free Grammar G.
Output: Predictive Parsing Table M.
Algorithm:
For each rule A → α of grammar G:
1. For each a in FIRST(α), create the entry M [A, a] = A → α, where a is a terminal symbol.
2. If ε is in FIRST(α), create M [A, b] = A → α for each b in FOLLOW(A).
3. If ε is in FIRST(α) and $ is in FOLLOW(A), then create the entry M [A, $] = A → α.
4. All the remaining entries in the table M are marked as SYNTAX ERROR.
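The algorithm can be sketched directly in code; here the FIRST and FOLLOW sets computed in the text are transcribed by hand, and missing dictionary entries stand for SYNTAX ERROR (the representation and names are my own):

```python
EPS = "ε"
# FIRST of each rule's RHS and FOLLOW of each non-terminal, from the text
FIRST = {("T", "E'"): {"(", "id"}, ("+", "T", "E'"): {"+"}, (EPS,): {EPS},
         ("F", "T'"): {"(", "id"}, ("*", "F", "T'"): {"*"},
         ("(", "E", ")"): {"("}, ("id",): {"id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}
RULES = [("E", ("T", "E'")), ("E'", ("+", "T", "E'")), ("E'", (EPS,)),
         ("T", ("F", "T'")), ("T'", ("*", "F", "T'")), ("T'", (EPS,)),
         ("F", ("(", "E", ")")), ("F", ("id",))]

def build_table(rules, first, follow):
    """Fill M[A, a] per the algorithm: FIRST(alpha) entries first, then
    FOLLOW(A) entries for nullable alternatives; absent keys mean error."""
    M = {}
    for A, alpha in rules:
        for a in first[alpha]:
            if a == EPS:
                for b in follow[A]:        # rules 2 and 3: use FOLLOW(A)
                    M[(A, b)] = (A, alpha)
            else:
                M[(A, a)] = (A, alpha)     # rule 1: use FIRST(alpha)
    return M
```

The resulting dictionary matches the completed table shown for Example 1 below.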
Example 1: Now we will make use of the above algorithm to create the parsing table for the
grammar:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
Now we will fill up the entries in the table using the above algorithm. For that, consider each rule
one by one.
E → TE'
A→α
A=E, α=TE’
FIRST(TE') = FIRST(T) = {(, id}
M [E, (] = E → TE’
M [E, id] = E → TE’
E' → +TE'
A→α
A=E', α=+TE'
FIRST(+TE') = {+}
Hence M [E', +] = E' → +TE'
E' → ε
A→α
A=E', α= ε then
FOLLOW (E') = {), $}
Hence M [E',)] = E' → ε
M [E', $] = E' → ε
T → FT'
A→α
A=T, α= FT' then
FIRST (FT') = FIRST(F)= {(, id}
Hence M [T, (] = T → FT'
And M [T, id] = T → FT'
T' → *FT'
A→α
A=T', α= *FT' then
FIRST (*FT') = {*}
Hence M [T', *] = T' → *FT'
T' → ε
A→α
A=T', α=ε
FOLLOW (T') = {+,), $}
Hence M [T', +] = T' → ε
M [T',)] = T' → ε
M [T', $] = T' → ε
F → (E)
A→α
A=F, α=(E)
FIRST ((E)) = {(}
Hence M [F, (] = F → (E)
F → id
A→α
A=F, α=id
FIRST (id) = {id}
Hence M [F, id] = F → id
The complete table can be as shown below:

     id        +          *          (         )        $
E    E → TE'   Error      Error      E → TE'   Error    Error
E'   Error     E' → +TE'  Error      Error     E' → ε   E' → ε
T    T → FT'   Error      Error      T → FT'   Error    Error
T'   Error     T' → ε     T' → *FT'  Error     T' → ε   T' → ε
F    F → id    Error      Error      F → (E)   Error    Error
Now the input string id + id * id $ can be parsed using the above table. In the initial configuration
the stack contains the start symbol E, and the input string is placed in the input buffer.
Stack input Action
$E id + id * id $
Now symbol E is on top of the stack and the input pointer is at the first id; hence M [E, id] is referred to.
This entry tells us E → TE', so we will push E' first and then T.
Stack input Action
$E'T id + id * id $ E → TE'
$E'T'F id + id * id $ T → FT'
$E'T'id id + id * id $ F → id
$E'T' + id * id $ Pop id
$E' + id * id $ T' → ε
$E'T+ + id * id $ E' → +TE'
$E'T id * id $ Pop +
$E'T'F id * id $ T → FT'
$E'T'id id * id $ F → id
$E'T' * id $ Pop id
$E'T'F* * id $ T' → *FT'
$E'T'F id $ Pop *
$E'T'id id $ F → id
$E'T' $ Pop id
$E' $ T' → ε
$ $ E' → ε
$ $ Accept
Thus, it is observed that the input is scanned from left to right and we always follow the leftmost
derivation while parsing the input string. Also, at a time only one input symbol is referred to for taking
the parsing action. Hence the name of this parser is LL (1). The LL (1) parser is a table-driven
predictive parser. Left recursive and ambiguous grammars are not allowed for an LL (1) parser.
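The table-driven loop traced above can be sketched as follows (a minimal illustration; the table is transcribed from the text and the names are my own):

```python
# LL(1) table for the expression grammar; missing entries are syntax errors
TABLE = {
    ("E", "id"): ["T", "E'"],  ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],  ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],       ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens):
    """Table-driven LL(1) driver; returns True when the input is accepted."""
    stack = ["$", "E"]                     # start symbol on top of $
    toks = tokens + ["$"]
    i = 0
    while stack:
        top = stack.pop()
        if top == "$" and toks[i] == "$":
            return True                    # halting configuration: accept
        if top in NONTERMINALS:
            rhs = TABLE.get((top, toks[i]))
            if rhs is None:
                return False               # M[A, a] is a syntax-error entry
            stack.extend(reversed(rhs))    # pop A, push RHS right to left
        elif top == toks[i]:
            i += 1                         # pop terminal, advance lookahead
        else:
            return False
    return False
```

Running `ll1_parse(["id", "+", "id", "*", "id"])` reproduces exactly the stack configurations of the trace above.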
Example 2: Show that following grammar:
S → AaAb | BbBa
A→ε
B→ε
is LL (1).
Solution: Consider the grammar:
S → AaAb
S → BbBa
A→ε
B→ε
Now we will compute FIRST and FOLLOW functions.
FIRST(S) = {a, b}, since if we put
S → AaAb
then S ⇒ aAb when A → ε,
and from S → BbBa
S ⇒ bBa when B → ε.
FIRST(A) = FIRST(B) = {ε}
FOLLOW(S) = {$}
FOLLOW(A) = FOLLOW(B) = {a, b}
The LL (1) parsing table is:

     a          b          $
S    S → AaAb   S → BbBa
A    A → ε      A → ε
B    B → ε      B → ε

Since the table has no multiple entries, the grammar is LL (1).
B → bBc
B→f
C→g
Now we will construct FIRST and FOLLOW for the above grammar.
FIRST(S) = {a}
FIRST(B) = {b, f}
FIRST(C) = {g}
FOLLOW(S) = {d, e, $}
FOLLOW(B) = {c}
FOLLOW(C) = {d, e}
The LL (1) parsing table can be as shown below:
a b c d e f g $
S S → aC
S → aB
B B → bBc B→f
C C→g
The above table shows multiple entries at M [S, a]. This shows that the given grammar is not
LL (1).
3.4.2 Bottom-Up Parser
In the bottom-up parsing method, the input string is taken first and we try to reduce this string with
the help of the grammar, trying to obtain the start symbol. The process of parsing halts successfully
as soon as we reach the start symbol.
The parse tree is constructed from the bottom up, that is, from the leaves to the root. In this process, the
input symbols are placed at the leaf nodes. The bottom-up parse tree is
created starting from the leaves; the leaf nodes together are reduced further to internal nodes, these
internal nodes are further reduced, and eventually a root node is obtained. The leaves are
labeled with terminal symbols, while the internal nodes are labeled with non-terminals.
In this process, the parser basically tries to identify the RHS of a production rule and replace it by the
corresponding LHS. This activity is called reduction. Thus, the crucial task in bottom-
up parsing is to find the productions that can be used for reduction. The bottom-up parse tree
construction process indicates that the tracing of derivations is to be done in reverse order.
Example 1: Consider the grammar for declarative statement:
S → TL;
T → int | float
L → L, id | id
The input string is float id, id;
Parse Tree
Step 1: We will start from leaf node.
Step 2:
Step 5: Reducing id to L. L → id
Step 8: gets reduced
Fig.Bottom-up parsing
Step 10: The sentential forms produced while constructing this parse tree are:
float id, id;
T id, id;
T L, id;
T L;
S
Step 11: Thus, looking at the sentential forms we can say that the rightmost derivation is traced
in reverse order.
Thus, the basic steps in bottom-up parsing are:
1. Reduction of the input string to the start symbol.
2. The sentential forms that are produced in the reduction process should trace out the
rightmost derivation in reverse.
As said earlier, the crucial task in bottom-up parsing is to find the substring that can be
reduced to an appropriate non-terminal. Such a substring is called a handle.
In other words, a handle is a substring that matches the right side of a production, and
we can reduce such a string to the non-terminal on the left hand side of that production. Such a reduction
represents one step along the reverse of a rightmost derivation. Formally we can define a handle
as follows:
A handle of a right sentential form ɤ is a production A → β and a position of ɤ where the string β may
be found and replaced by A to produce the previous right sentential form in a rightmost
derivation of ɤ.
For example
Consider the grammar
E → E+E
E →id
Now consider the string id + id + id and the rightmost derivation is
1. Shift: in this action the next input symbol is shifted onto the top of the stack.
2. Reduce: if the handle appears on the top of the stack then a reduction of it by the appropriate rule
is done. That means the RHS of the rule is popped off and the LHS is pushed on. This action is called
the Reduce action.
3. Accept: if the stack contains only the start symbol and the input buffer is empty at the same time,
then that action is called accept. When the accept action is obtained in the process of parsing,
it means a successful parsing is done.
4. Error: a situation in which the parser can neither shift nor reduce the symbols, and cannot even
perform the accept action, is called an error.
Example 1: Consider the grammar
E→E-E
E→E*E
E →id
Perform shift-Reduce parsing of the input string id1 - id2 * id3
Solution:
Stack Input Buffer Parsing Action
$ id1 - id2 * id3$ Shift
$id1 - id2 * id3$ Reduce by E → id
$E - id2 * id3$ Shift
$E- id2 * id3$ Shift
$E-id2 * id3$ Reduce by E → id
$E-E * id3$ Shift
$E-E* id3$ Shift
$E-E*id3 $ Reduce by E → id
$E-E*E $ Reduce by E → E*E
$E-E $ Reduce by E → E-E
$E $ Accept
Example 2: Consider the grammar S → TL; , T → int | float, L → L, id | id and perform shift-reduce
parsing of the input string int id, id;
Stack Input Buffer Parsing Action
$ int id, id; $ Shift
$int id, id; $ Reduce by T → int
$T id, id; $ Shift
$Tid id; $ Reduce by L→ id
$TL id; $ Shift
$TL, id; $ Shift
$TL, id ;$ Reduce by L → L, id
$TL ;$ Shift
$TL; $ Reduce by S → TL;
$S $ Accept
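A naive shift-reduce loop for this declarative grammar can be sketched as follows; it reduces the longest right-hand side found on top of the stack and shifts otherwise (a simplification of real handle detection; the names are my own):

```python
GRAMMAR = [("S", ["T", "L", ";"]),         # declarative-statement grammar
           ("T", ["int"]), ("T", ["float"]),
           ("L", ["L", ",", "id"]), ("L", ["id"])]

def shift_reduce(tokens):
    """Naive shift-reduce parser: reduce the longest matching RHS on top of
    the stack, otherwise shift; accept when only the start symbol remains."""
    stack, buf, trace = [], list(tokens), []
    while True:
        for lhs, rhs in sorted(GRAMMAR, key=lambda p: -len(p[1])):
            if stack[-len(rhs):] == rhs:           # handle on top of stack
                stack[-len(rhs):] = [lhs]          # pop RHS, push LHS
                trace.append(f"reduce {lhs} -> {' '.join(rhs)}")
                break
        else:                                      # no reduction possible
            if not buf:
                break
            stack.append(buf.pop(0))               # shift next input token
            trace.append("shift")
    return stack == ["S"], trace
```

On the input float id, id; this loop reproduces the shift and reduce actions of the earlier bottom-up example; preferring the longest RHS is what keeps it from wrongly reducing id to L inside the handle L, id.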
Example 3: Consider the following grammar
S → (L) | a
L → L, S | S
Parse the input string (a, (a, a)) using a shift-reduce parser.
Solution:
Stack Input Buffer Parsing Action
$ (a, (a, a)) $ Shift
$( a, (a, a)) $ Shift
$(a , (a, a)) $ Reduce by S→ a
$(S , (a, a)) $ Reduce by L→ S
$(L , (a, a)) $ Shift
$(L, (a, a)) $ Shift
$(L, ( a, a)) $ Shift
$(L, (a , a)) $ Reduce by S → a
$(L, (S , a)) $ Reduce by L → S
$(L, (L , a)) $ Shift
$(L, (L, a)) $ Shift
$(L, (L,a )) $ Reduce by S → a
$(L, (L,S )) $ Reduce by L → L, S
$(L, (L )) $ Shift
$(L, (L) )$ Reduce by S → (L)
$(L, S )$ Reduce by L → L, S
$(L )$ Shift
$(L) $ Reduce by S → (L)
$S $ Accept
3.4.2.2 LR Parser
This is the most efficient method of bottom-up parsing and can be used to parse a large
class of context free grammars. This method is also called LR (k) parsing.
Here
• L stands for left to right scanning of the input
• R stands for rightmost derivation in reverse
• k is the number of lookahead input symbols. When k is omitted, k is assumed to be 1.
Properties of LR Parser
LR parser is widely used for following reasons:
1. LR parsers can be constructed to recognize most of the programming languages for which
context free grammar can be written.
2. The class of grammar that can be parsed by LR parser is a superset of class of grammars
that can be parsed using predictive parsers.
3. LR parsers work using a non-backtracking shift-reduce technique, yet they are efficient.
4. LR parsers detect syntactical errors very efficiently.
Structure of LR Parsers
The structure of LR parser is as given in following Fig. 3.13.
1. It initializes the stack with the initial state and invokes the scanner (lexical analyzer) to get the next
token.
2. It determines sj, the state currently on the top of the stack, and ai, the current input symbol.
3. It consults the parsing table for the action [sj, ai] which can have one of the four values
a. si means shift state i
b. rj means reduce by rule j
c. Accept means successful parsing is done.
d. Error indicates syntactical error.
Types of LR Parser
Following diagram represents the types of LR parser.
Fig.3.13 Working of SLR (1) parser
A grammar for which SLR parser can be constructed is called SLR grammar.
Definition of LR (0) items and related terms:
1. The LR (0) item for grammar G is production rule in which symbol ● is inserted at some
position in RHS of the rule. For example:
S → ●ABC
S → A●BC
S → AB●C
S → ABC●
The production S → ε generates only one item S → ●
2. Augmented grammar: if a grammar G has start symbol S, then the augmented grammar is
a new grammar G' with a new start symbol S' and an added production S' → S. The purpose of this
production is to indicate the acceptance of the input: when the parser is about to reduce by S' → S,
it reaches the acceptance state.
3. Kernel items: the collection of the item S' → ●S and all the items whose dots are not at the
leftmost end of the RHS of the rule.
Non-kernel items: the collection of all the items (other than S' → ●S) in which ● is at the left end of
the RHS of the rule.
4. Functions closure and goto: These are two important functions required to create collection
of canonical set of items.
5. Viable Prefix: a prefix of a right sentential form that can appear on the stack of a shift-reduce parser, i.e. one that does not extend past the right end of the rightmost handle.
Closure operation
For a context free grammar G, if I is the set of items then the function closure(I) can be
constructed using following rules.
1. Consider I is a set of canonical items; initially every item in I is added to closure(I).
2. If A → α●Bβ is a rule in closure(I) and there is another rule for B such as B → ɤ, then add B → ●ɤ, so that
closure(I): A → α●Bβ
B → ●ɤ
This rule has to be applied until no more new items can be added to closure(I).
The meaning of rule A →α ●Bβ is that during derivation of the input string at some point we
may require strings derivable from Bβ as input. A non-terminal immediately to the right of ●
indicates that it has to be expanded shortly.
goto operation:
The function goto can be defined as follows:
If there is an item A → α●Bβ then goto (A → α●Bβ, B) = A → αB●β. That means
simply shifting ● one position ahead over the grammar symbol B (which may be a terminal or non-
terminal). If the item A → α●Bβ is in I, then the same goto function can be written as goto (I, B).
Example 1: Consider the grammar:
X → Xb | a
Compute closure(I) and goto(I).
Solution: Let
I: X → ●Xb
closure(I) = X → ●Xb
X → ●a
The goto function can be computed as:
goto (I, X) = X → X●b
Similarly, goto (I, a) gives X → a●
Example 2: Consider the grammar
S → Aa | bAc | Bc | bBa
A → d
B → d
Compute closure(I) and goto (I)
Solution: We will first write the grammar using the dot operator. The initial set of items is
I0 : S → ●Aa
S → ●bAc
S → ●Bc
S → ●bBa
A → ●d
B → ●d
I1 : goto (I0, A)
S → A ● a
I2 : goto (I0, b)
S → b ● Ac
S → b ● Ba
A → ●d
B → ●d
I3 : goto (I0, B)
S → B ● c
I4 : goto (I0, d)
A → d●
B → d●
Construction of canonical collection of set of items:
1. For the grammar G initially add closure({S' → ●S}) to the set of items C.
2. For each set of items Ii in C and for each grammar symbol X (may be terminal or
non-terminal) add goto (Ii, X) to C if it is not empty and not already in C. This process
should be repeated for each X in Ii; the set of items has to be constructed until no more
sets of items can be added to C.
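The two rules above amount to a worklist loop: start from closure({S' → ●S}) and keep applying goto until the collection stops growing. A sketch, assuming the small augmented grammar S' → X, X → Xb | a and the (head, body, dot) item encoding; these choices are illustrative, not from the text:

```python
# Building the canonical collection C of LR(0) item sets for the
# augmented grammar S' -> X, X -> Xb | a.
GRAMMAR = [("S'", ('X',)), ('X', ('X', 'b')), ('X', ('a',))]
NONTERMINALS = {"S'", 'X'}
SYMBOLS = {'X', 'a', 'b'}

def closure(items):
    result, work = set(items), list(items)
    while work:
        head, body, dot = work.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            for h, b in GRAMMAR:
                if h == body[dot] and (h, b, 0) not in result:
                    result.add((h, b, 0))
                    work.append((h, b, 0))
    return frozenset(result)

def goto(items, symbol):
    return closure({(h, b, d + 1) for h, b, d in items
                    if d < len(b) and b[d] == symbol})

def canonical_collection():
    start = closure({("S'", ('X',), 0)})   # rule 1: start from S' -> .X
    C = [start]
    for I in C:                            # rule 2: apply goto on every symbol
        for X in sorted(SYMBOLS):
            J = goto(I, X)
            if J and J not in C:           # skip empty and already-seen sets
                C.append(J)
    return C

print(len(canonical_collection()), "item sets")
```

For this grammar the loop produces four sets, corresponding to the states I0 to I3 of Example 1 above.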
Now we will consider one grammar and construct the set of items by applying closure and goto
functions.
Example:
E→E+T
E→T
T→T*F
T→F
F → (E)
F → id
To this grammar we add the augmented production E' → E, start with the item E' → ●E in I0, and
then apply closure(I0).
The item set I0 is constructed starting from E' → ●E. Immediately to the right of ● is
E. Hence, we apply closure(I0) and thereby add the E-productions with ● at the left end of the
rule. That means we add E → ●E + T and E → ●T to I0. But again, as we can see, the
rule E → ●T which we added contains the non-terminal T immediately to the right of ●. So, we have to
add the T-productions T → ●T * F and T → ●F to I0.
In the T-productions, after ● come T and F respectively. We have already added the T-productions,
so we will not add those again. But we will add all the F-productions having dots: F → ●(E) and F
→ ●id will be added. Now we can see that after the dot, ( and id come in these two productions.
The ( and id are terminal symbols and do not derive any rule. Hence our closure function
terminates over here. Since there is no rule further, we stop creating I0.
Now apply goto (I0, E)
Thus, I1 becomes:
By applying goto on F of I0
By applying goto on id of I0
Since in I5 there is no non-terminal to the right of the dot, we cannot apply the closure function here. Thus,
we have completed applying goto on I0. We will consider I1 for applying goto. In I1 there are two
productions, E' → E ● and E → E ● + T. There is no point applying goto on E' → E ●, hence we
will consider E → E ● + T for the application of goto.
The goto cannot be applied on I3. Apply goto on E in I4. In I4 there are two productions having
E after the dot (F → (● E) and E → ● E + T). Hence, we will apply goto on both of those productions.
The I8 becomes
If we apply goto on (I4, T) we get E → T● and T → T ● *F, which is I2 itself. Hence,
we will not apply goto on T. Similarly, we will not apply goto on F, ( and id, as we get the states I3,
I4, I5 again. Hence these gotos are not applied, to avoid repetition.
There is no point applying goto on I5, hence now we will move ahead by applying goto on I6 for
T.
Then,
Then,
Applying goto on I9, I10, I11 is of no use. Thus, now there is no item that can be added to the set
of items. The collection of sets of items is from I0 to I11.
Construction of the SLR Parsing Table
As we have seen in the structure of the SLR parser, there are two parts of the SLR parsing table, and those
are action and goto. By considering the basic parsing actions, namely shift, reduce, accept and error,
we will fill up the action table. The goto table can be filled up using goto function. Let us see the
algorithm:
Input: An Augmented grammar G’
Output: SLR parsing table.
Algorithm:
1. Initially construct set of items C= {I0, I1, I2…In} where C is a collection of set of LR (0)
items for input grammar G’.
2. The parsing actions are based on each item Ii. The actions are as given below:
a. If A → α●aβ is in Ii and goto (Ii, a) = Ij then set action [i, a] as "shift j". Note that a must
be a terminal symbol.
b. If there is a rule A → α● in Ii then set action [i, a] to "reduce A → α" for all
symbols a where a ϵ FOLLOW(A). Note that A must not be the augmented start symbol
S'.
c. If S' → S● is in Ii then the entry in the action table is action [i, $] = "accept".
3. The goto part of the SLR table can be filled as follows: the goto transitions for state i are
considered for non-terminals only. If goto (Ii, A) = Ij then set goto [i, A] = j.
4. All the entries not defined by rules 2 and 3 are considered to be "error".
Example 1: Construct the SLR (1) parsing table for the expression grammar given above.
Solution: We will first construct a collection of canonical sets of items for the
above grammar. The items generated by this method are also called LR (0) items. As there
is no lookahead symbol in this set of items, zero is put in the bracket.
We can design a DFA for above set of items as follows:
The viable prefixes E, E+ and E+T are recognized here; continuing in this fashion, the DFA
can be constructed for the set of items. Thus, the DFA helps in recognizing the valid viable prefixes that can
appear on the top of the stack.
Now we will also prepare a FOLLOW set for all the non-terminals because we require
FOLLOW set according to rule 2.b of parsing table algorithm.
FOLLOW(E') = {$}, as E' is the start symbol.
FOLLOW(E):
From E' → E, whatever follows E' also follows E; as E' is the start symbol, ⸫ we will add $.
From E → E + T, the + is following E; ⸫ we will add +.
From F → (E), the ) is following E; ⸫ we will add ).
⸫ FOLLOW(E) = {+, ), $}
FOLLOW(T):
From E → T, everything in FOLLOW(E) is also in FOLLOW(T); ⸫ we will add +, ) and $.
From T → T * F, the * is following T; ⸫ we will add *.
⸫ FOLLOW(T) = {+, *, ), $}
FOLLOW(F):
From T → F, everything in FOLLOW(T) is also in FOLLOW(F); ⸫ we will add +, *, ) and $.
⸫ FOLLOW(F) = {+, *, ), $}
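The same FOLLOW sets can be obtained with a small fixpoint computation. The sketch below assumes the expression grammar of this example; because that grammar has no ε-productions, FIRST of the single symbol after each non-terminal is all that is needed. The function names are illustrative:

```python
# The expression grammar; no epsilon-productions, which keeps FIRST simple.
GRAMMAR = [("E'", ('E',)), ('E', ('E', '+', 'T')), ('E', ('T',)),
           ('T', ('T', '*', 'F')), ('T', ('F',)),
           ('F', ('(', 'E', ')')), ('F', ('id',))]
NONTERMS = {"E'", 'E', 'T', 'F'}

def first(symbol):
    # FIRST of one grammar symbol; directly left-recursive alternatives
    # are skipped since they contribute nothing new to FIRST.
    if symbol not in NONTERMS:
        return {symbol}
    out = set()
    for h, body in GRAMMAR:
        if h == symbol and body[0] != symbol:
            out |= first(body[0])
    return out

def follow_sets():
    follow = {nt: set() for nt in NONTERMS}
    follow["E'"].add('$')            # $ follows the start symbol
    changed = True
    while changed:                   # repeat until a fixpoint is reached
        changed = False
        for head, body in GRAMMAR:
            for i, sym in enumerate(body):
                if sym not in NONTERMS:
                    continue
                # FIRST of the next symbol, or FOLLOW(head) at the end
                new = first(body[i + 1]) if i + 1 < len(body) else follow[head]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow

print(follow_sets()['T'])   # contains +, *, ) and $
```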
Consider F → ●(E)
It matches A → ●aβ with
A = F, α = ε, a = (, β = E)
goto (I0, ( ) = I4
⸫ action [0, ( ] = shift 4
Similarly, for F → ●id, goto (I0, id) = I5, so the
entry in the action table is action [0, id] = shift 5.
The other items in I0 do not give any action. Hence, we find the actions for I0 through I11 in the same manner.
Thus, the SLR parsing table is filled up with the shift actions. Now we will fill it up with the reduce
and accept actions.
According to rule 2.c of the parsing table algorithm, there is an item E' → E● in I1.
Hence, we will add the action "accept" in action [1, $].
Now in state I2.
Hence we add the rule E → T in the row of state 2 and in the columns of +, ) and $. In the given
grammar E → T is rule number 2. ⸫ action [2, +] = r2, action [2, )] = r2, action [2, $] = r2.
Similarly, now in state I3,
we add the rule T → F in the row of state 3 and in the columns of +, *, ) and $. In the given
grammar T → F is rule number 4. Therefore action [3, +] = r4, action [3, *] = r4, action [3, )] = r4, action
[3, $] = r4. Thus, we can find the match for the rule A → α● in the remaining states I4 to I11 and fill
up the action table with the respective "reduce" rules. The table with all the action [ ] entries will be:
Now we will fill up the goto table for all the non-terminals. In state I0, goto (I0, E) = I1, hence
goto [0, E] = 1; similarly goto (I0, T) = I2, hence goto [0, T] = 2. Continuing in this fashion we can
fill up the goto entries of the SLR parsing table.
Finally, the SLR (1) parsing table will look as:
• If action [s, a] = shift j then push a, then push j onto the stack, and advance the input
lookahead pointer.
• If action [s, a] = reduce A → β then pop 2*|β| symbols; if state i is then on top of the stack,
push A and then push goto [i, A] onto the top of the stack.
• If action [s, a] = accept then halt the parsing process; it indicates successful
parsing.
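The three actions above can be put together in a driver loop. The sketch below uses the small grammar X → Xb | a with a hand-built SLR table and, for brevity, pushes only states onto the stack, so a reduce pops |β| entries rather than 2*|β|; the table and encoding are assumptions for illustration:

```python
# Hand-built SLR table for S' -> X, X -> Xb | a.
# States: 0 = initial, 1 = after X, 2 = after a, 3 = after Xb.
ACTION = {
    (0, 'a'): ('shift', 2),
    (1, 'b'): ('shift', 3),
    (1, '$'): ('accept', None),
    (2, 'b'): ('reduce', ('X', 1)), (2, '$'): ('reduce', ('X', 1)),
    (3, 'b'): ('reduce', ('X', 2)), (3, '$'): ('reduce', ('X', 2)),
}
GOTO = {(0, 'X'): 1}

def parse(tokens):
    stack = [0]                        # stack of states only
    tokens = tokens + ['$']
    i = 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            return False               # blank entry: syntactic error
        kind, arg = act
        if kind == 'shift':
            stack.append(arg)          # push the new state, advance input
            i += 1
        elif kind == 'reduce':
            head, size = arg
            del stack[len(stack) - size:]           # pop |beta| states
            stack.append(GOTO[(stack[-1], head)])   # then goto on A
        else:
            return True                # accept

print(parse(['a', 'b', 'b']))   # True
print(parse(['b']))             # False
```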
Let us take one valid string for the grammar
Input string: id * id + id
We will consider two data structures while performing the parsing actions: the stack and the
input buffer.
In the above table, at the first row we get action [0, id] = s5; that means shift id from the input buffer
onto the stack and then push the state number 5. On the second row we get action [5, *] as r6; that
means reduce by rule 6, F → id, hence on the stack id is replaced by F. By referring to goto [0, F] we
get state number 3, hence 3 is pushed onto the stack. Note that for every reduce action a goto is
performed. This process of parsing is continued in this fashion, and finally, when we get action [1,
$] = accept, we halt, having successfully parsed the input string.
Example 2: Consider the following grammar
Construct the SLR parsing table for this grammar. Also parse the input a * b + a.
Solution: Let us number the production rules in the grammar.
Now we will build the canonical set of LR (0) items. We will first introduce the augmented
production E' → ●E, and then the initial set of items I0 will be generated as follows.
Now we will use the goto function. From state I0, goto on E, T, F, a and b will be applied step by
step. Each goto transition will generate a new state Ii.
Now we will start applying goto transitions on state I1. From state I1 it is possible to apply a goto
transition only on +. Hence,
The goto transitions will be applied on state I2 now; there is no point applying goto on T.
If we apply goto on a or b from state I2 then we get the states I4 and I5
respectively.
Hence, we will not consider I4 and I5 again. Now move to state I3. From I3 a goto transition is possible on
*. Hence,
As there is no point in applying goto on states I4 and I5, we will choose state I6 for the goto transition.
Now we will first obtain FOLLOW of E, T and F, as the FOLLOW computation is required
when the SLR parsing table is being built.
FOLLOW(E) = {+, $}
FOLLOW(T) = {+, a, b, $}
FOLLOW(F) = {+, *, a, b, $}
By rules 2 and 4, E → T and T → F, whatever follows E also follows T and F. E is the
start symbol, so $ is in FOLLOW(E); ⸫ we have added $ in FOLLOW(E), FOLLOW(T)
and FOLLOW(F).
The SLR parsing table can be constructed as follows:
Now we will parse the input a * b + a using above parse table.
Example 3: Show that the following grammar:
S → Aa | bAc | dc | bda
A → d
is not SLR (1).
Solution: We will number the production rules in the grammar.
I6: goto (I3, A)
S → bA ● c
I7: goto (I3, d)
S → bd ● a
A→d●
I8: goto (I4, c)
S → dc ●
I9: goto (I6, c)
S → bAc ●
I10: goto (I7, a)
S → bda ●
Now we will construct FOLLOW(S) and FOLLOW(A).
FOLLOW(S) = {$}
FOLLOW(A) = {a, c}
The construction of the SLR (1) parsing table is done with the help of the set of items. The parsing table
is given below:
The above table clearly shows that there are multiple entries in action [7, a] and action [4, c].
That means a shift/reduce conflict will occur while parsing the input using this SLR parsing table.
This shows that the given grammar is not SLR (1).
LR(k) Parser
Canonical LR parsing is the technique in which a lookahead symbol is carried along
while constructing the sets of items. Hence the collection of sets of items is referred to as LR (1).
The value 1 in the bracket indicates that there is one lookahead symbol in each item.
We follow the same steps as discussed in SLR parsing techniques and those are:
1. Construction of canonical set of items along with the lookahead.
2. Building canonical LR Parsing Table.
3. Parsing the input string using canonical LR Parsing Table.
Construction of canonical set of items along with the lookahead:
1. For the grammar G initially add [S' → ●S, $] to the set of items C.
2. For each set of items Ii in C and for each grammar symbol X (may be terminal or non-
terminal) add goto (Ii, X) to C if it is not empty and not already in C. This process should
be repeated for each X in Ii; the set of items has to be
constructed until no more sets of items can be added to C.
3. The closure function can be computed as follows:
For each item [A → α●Xβ, a] in I, each production X → ɤ of the grammar, and each
terminal b ϵ FIRST(βa), if [X → ●ɤ, b] is not in I then add [X → ●ɤ, b] to I.
4. Similarly, the goto function can be computed as for each item [A →α ● X β, a] is in I and
rule [A →α X ● β, a] is not in goto items then add [A →α X ● β, a] to goto items.
This process is repeated until no more set of items can be added to the collection of C.
Example 1:
S’ → S
S → CC
C → aC | d
Construct LR (1) set of items for the grammar
Solution: We will initially add [S' → ●S, $] as the first rule in I0. Now match
Note that I3 and I6 are different states because the second component (the lookahead) in I3 and I6 is
different.
Apply goto on d of I2 for the rule C → ●d, $.
Now if we apply goto on a and d of I3 we get I3 and I4 respectively, and there is no point in
repeating the states. So, we will apply goto on C of I3.
For I4 and I5 there is no point in applying gotos. Applying goto on a and d of I6 gives I6 and I7
respectively. Hence, we will apply goto on C of I6 for the rule
C → ●d, $.
For the remaining states I7, I8 and I9 we cannot apply goto. Hence the process of constructing the set of
LR (1) items is completed. Thus, the set of LR (1) items consists of the states I0 to I9.
Construction of canonical LR Parsing Table
To construct the canonical LR parsing table, first of all we see the actual algorithm and then we
will learn how to apply that algorithm to an example. The parsing table is similar to the SLR
parsing table, comprising action and goto parts.
Input: An augmented grammar G'.
Output: The canonical LR parsing table
Algorithm:
1. Initially construct set of items C= {I0, I1, I2…In} where C is a collection of set of LR (1)
items for the input grammar G’.
2. The parsing actions are based on each item Ii. The actions are as given below:
a. If [A → α ● a β, b] is in Ii and goto (Ii, a) = Ij then create an entry in the action table
action [i, a] = shift j. Here a must be a terminal.
b. If there is an item [A → α ●, a] in Ii then in the action table action [i, a] = reduce
by A → α. Here, A should not be S'.
c. If there is an item [S' → S●, $] in Ii then action [i, $] = accept.
3. The goto part of the LR table can be filled as follows: the goto transitions for state i are considered
for non-terminals only. If goto (Ii, A) = Ij then goto [i, A] = j.
4. All the entries not defined by rule 2 and 3 are considered to be “error”.
Example: Construct the LR (1) parsing table for the following grammar:
Fig.3.15 DFA [goto graph]
Now consider I0, in which there is a rule matching [A → α ● a β, b], namely [C → ●aC, a | d]; if the
goto is applied on a then we get the state I3. Hence we will create the entry action [0, a] = shift 3. Similarly,
in I0
C → ●d, a | d
A → α ● a β, b
A = C, α = ε, a = d, β = ε, b = a | d
goto (I0, d) = I4
Hence action [0, d] = shift 4
For state I4
C → d●, a | d
A → α ●, a
A = C, α = d, a = a | d
action [4, a] = action [4, d] = reduce by C → d, i.e. rule (3) of the given grammar.
[S' → S ●, $] is in I1,
so we will create action [1, $] = accept.
The goto table can be filled in by using the goto function.
For instance, goto (I0, S) = I1, hence goto [0, S] = 1. Continuing in this fashion we can fill up the
LR (1) parsing table as follows:
Thus the given input string is successfully parsed using LR parser or canonical parser.
3.4.2.3 LALR Parser
In this type of parser the lookahead symbol is generated for each set of items, and states with the
same core are merged. The tables obtained by this method are smaller in size than those of the LR (k)
parser. In fact, the numbers of states of SLR and LALR parsing tables are always the same. Most
programming languages use LALR parsers.
We follow the same steps as discussed in the SLR and canonical LR parsing techniques, and those
are:
1. Construction of the canonical set of items along with the lookaheads
2. Building the LALR parsing table
3. Parsing the input string using the LALR parsing table
Construction of canonical set of items along with the lookahead:
The construction of LR (1) items is the same as discussed for the LR (1) parser. The only difference
is that, in the construction of LR (1) items for the canonical LR parser, we kept two states separate
if their second components (lookaheads) differed; in this case we merge two states having the
same first components (the production rules with dots), taking the union of the lookaheads from both
states.
Example 1:
We have merged the two states I3 and I6 and made the second component a or d or $; the
production rules remain as they are. Similarly for I4 and I7. The set of items consists of the states {I0, I1,
I2, I36, I47, I5, I89}.
Construction of LALR Parsing Table
The algorithm for construction of LALR Parsing Table is as given below:
1. Construct the LR (1) set of items.
2. Merge two states Ii and Ij if the first components (i.e. the production rules with dots)
match, and create a new state replacing the older states, Iij = Ii U Ij.
3. The parsing actions are based on each item Ii. The actions are as given below.
a) If [A → α ● a β, b] is in Ii and goto (Ii, a) = Ij then create an entry in the action table action
[i, a] = shift j.
b) If there is an item [A → α ●, a] in Ii then in the action table action [i, a] = reduce by A → α.
Here A should not be S'.
c) If there is an item [S' → S ●, $] in Ii then action [i, $] = accept
4. The goto part of the LR table can be filled as follows: the goto transitions for state i are considered for
non-terminals only. If goto (Ii, A) = Ij then goto [i, A] = j.
5. If the parsing actions conflict, then the algorithm fails to produce an LALR parser and the grammar
is not LALR (1). All the entries not defined by rules 3 and 4 are considered to be "error".
Example 1:
S → CC
C → aC
C→d
Construct the parsing table for LALR (1) parser.
Solution: First the set LR (1) items can be constructed as follows with merged states.
Now consider state I0: there is a match with the rule [A → α ● a β, b] and goto (Ii, a) = Ij, namely
[C → ●aC, a | d | $]; if the goto is applied on a then we get the state I36. Hence, we will create the
entry action [0, a] = shift 36. Similarly,
[S' → S ●, $] is in I1,
so we create action [1, $] = accept.
The goto table can be filled in by using the goto function.
For instance, goto (I0, S) = I1, hence goto [0, S] = 1. Continuing in this fashion we
can fill up the LALR (1) parsing table as follows.
Any string belonging to the given grammar can be parsed using the LALR parser. The blank entries
indicate syntactic errors.
Parsing the Input String using LALR Parser
The strings of grammar G match the regular expression a*d a*d. We will consider the input string
"aadd" for parsing using the LALR parsing table.
Thus, the LALR and LR parser will mimic one another on the same input.
Example 2: Construct LALR parsing table for the following grammar
S → Aa
S → bAc
S → dc
S → bda
A→d
Parse the input string bdc using table generated by you.
Solution: Let us first number the production rules as below:
Now we will construct canonical set of LR (1) items for the above grammar.
I0:
S’ → ● S, $
S → ● Aa, $
S → ● bAc, $
S → ● dc, $
S → ● bda, $
A → ● d, a
In the above set of items, we start from [S' → ●S, $]. The second component is $
initially. After ● the S comes, hence we add the rules deriving S. Now we have got the
rule
S → ● Aa, $
It resembles [A → α ● X β, a]
with X = A and β = a; the second component of the new items X → ●ɤ is FIRST(βa).
In our rule,
for A → ● d the second component is FIRST(a$) = a.
Hence A → ● d, a will be added in I0.
I1: goto (I0, S)
S’ → S ●, $ we will carry second component as it is.
In the above set of canonical items, no two states have common production rules. Hence, we
cannot merge these states. The same set of items will be considered for building the LALR parsing
table.
We will construct the LALR parsing table using the following rules.
1. If [A → α ●aβ, b] is in Ii and goto (Ii, a) = Ij then action [i, a] = shift j.
2. If there is a production [A → α ●, a] in some state Ii then action [i, a] = reduce by A → α.
3. If there is a production [S' → S ●, $] in Ii then action [i, $] = accept.
Consider the input "bdc” for parsing with the help of above LALR parsing table.
• Replacing a prefix by some string.
• Replacing comma by semicolon.
• Deleting extraneous semicolon.
• Inserting missing semicolon.
Advantage
• It can correct any input string.
Disadvantage
• It is difficult to cope with the actual error if it has occurred before the point of detection.
Error Production
Productions which generate erroneous constructs are added to the grammar, by considering
common errors that occur. These productions detect the anticipated errors during parsing. Error
diagnostics about the erroneous constructs are generated by the parser.
Global Correction
There are algorithms which make changes to modify an incorrect string into a correct string.
These algorithms perform a minimal sequence of changes to obtain a globally least-cost correction.
When a grammar G and an incorrect string w are given, these algorithms find a parse tree for a
string p related to w with the smallest number of transformations. The transformations may be insertions,
deletions, and changes of tokens.
Advantage
• It has been used for phrase level recovery to find optimal replacement strings.
Disadvantage
• This strategy is too costly to implement in terms of time and space.
3.4.2.5 Parser generator
YACC is an automatic tool for generating the parser program. YACC stands for Yet Another
Compiler Compiler. YACC provides a tool to produce a parser for a given grammar. YACC is a
program designed to compile an LALR (1) grammar. It is used to produce the source code of the
syntactic analyzer of the language produced by an LALR (1) grammar. The input of YACC is the rules
or grammar, and the output is a C program. These are some points about YACC:
• Input: A CFG- file.y
• Output: A parser y.tab.c (yacc)
✓ The output file "file.output" contains the parsing tables.
✓ The file "file.tab.h" contains declarations.
✓ The parser generated is called yyparse ().
✓ The parser expects to use a function called yylex () to get tokens.
Writing the YACC specification program is the main logical activity. This specification file
contains the context-free grammar, and using the production rules of the context-free grammar the
parsing of an input string can be done by y.tab.c.
YACC Specification
The YACC specification file consists of three parts: the declaration section, the translation rule section
and the supporting C functions.
. . .
rule n    action n
If there is more than one alternative for a single rule then those alternatives should be separated
by the | character. The actions are typical C statements. If the CFG is
3. C function section: This section consists of the main function, in which the routine yyparse() will
be called, and it also consists of the other required C functions.