This document discusses context-free grammars and parsing. It covers: 1) the limitations of regular languages and how context-free grammars can describe recursive structures in programming languages; 2) the role of a parser in taking a sequence of tokens and outputting a parse tree, and how context-free grammars provide a way to distinguish valid and invalid token sequences; 3) key concepts related to context-free grammars such as productions, languages, derivations, parse trees, leftmost and rightmost derivations, and ambiguity.

UNIT- 2

2.1. REVIEW OF CFG AMBIGUITY OF GRAMMARS

2.1.1. Limitations of Regular Language

• A finite automaton that runs long enough must repeat states
• A finite automaton cannot remember the number of times it has visited a particular state, because a finite automaton has finite memory
⸺ Only enough to store the state it is in
⸺ Cannot count, except up to a finite limit
• Many languages are therefore not regular
• E.g., the language of balanced parentheses is not regular: { (^i )^i | i ≥ 0 }
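To see why counting matters, here is a Python sketch (our own illustration, not part of the notes) of a recognizer for this language; note that it needs an unbounded counter, which is precisely what a finite automaton does not have:

```python
def balanced(s: str) -> bool:
    """Recognize { (^i )^i | i >= 0 }: i '('s followed by i ')'s."""
    depth = 0
    seen_close = False
    for ch in s:
        if ch == "(":
            if seen_close:        # '(' after ')' breaks the (^i )^i shape
                return False
            depth += 1
        elif ch == ")":
            seen_close = True
            depth -= 1
            if depth < 0:         # more ')' than '(' so far
                return False
        else:
            return False
    # the counter must return to zero -- a finite-state machine cannot check this
    return depth == 0
```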

2.1.2. The Functionality of the Parser

• Input: sequence of tokens from lexer


• Output: parse tree of the program

Example-
• If-then-else statement
if (x == y) then z =1; else z = 2;
• Parser input
IF (ID == ID) THEN ID = INT; ELSE ID = INT;
• Possible parser output

Role of Parser
• Not all sequences of tokens are programs
• Parser must distinguish between valid and invalid sequences of tokens
• We need
⸺ A language for describing valid sequences of tokens
⸺ A method for distinguishing valid from invalid sequences of tokens

2.1.3. Context- Free Grammar


• Many programming language constructs have a recursive structure
• A STMT is of the form
⸺ if COND then STMT else STMT , or
⸺ while COND do STMT , or …
• Context-free grammars are a natural notation for this recursive structure
• A CFG consists of
⸺ A set of terminals T
⸺ A set of non-terminals N
⸺ A start symbol S (a non-terminal)
⸺ A set of productions
• Assuming X ∈ N, the productions are of the form
X→ε, or
X → Y1 Y2 ... Yn where Yi ∈ N ∪T

Examples:
1. STMT → if COND then STMT else STMT
| while COND do STMT
| id = int
2. E→E*E
|E+E
|(E)
| id

The Language of a CFG:


Read productions as replacement rules:
X → Y1 ... Yn
Means X can be replaced by Y1 ... Yn
X→ε
Means X can be erased (replaced with empty string)
Key Idea
1. Begin with a string consisting of the start symbol “S”
2. Replace any non-terminal X in the string by a right-hand side of some production
X → Y1… Yn
3. Repeat (2) until there are no non-terminals in the string

• Let G be a context-free grammar with start symbol S. Then the language of G is:
{ a1 … an | S ⇒* a1 … an and every ai is a terminal }

Example:
Arithmetic Expressions:
E →E+E | E * E | (E) | id
Some elements of the language:
id, id + id, (id), id * id, (id) * id, id * (id)
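The "replace any non-terminal" key idea above can be mechanized: starting from the start symbol, repeatedly replace the leftmost non-terminal by each alternative and collect the strings containing only terminals. The Python sketch below (illustrative; the names are our own) enumerates the language of the expression grammar up to a length bound:

```python
from collections import deque

# E -> E+E | E*E | (E) | id, encoded as alternatives of token lists
GRAMMAR = {"E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["id"]]}

def language(grammar, start, max_len):
    """BFS over sentential forms, expanding the leftmost non-terminal."""
    seen, results = set(), set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # index of the leftmost non-terminal, or None if the form is all terminals
        nt = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if nt is None:
            results.add(" ".join(form))
            continue
        for rhs in grammar[form[nt]]:
            new = form[:nt] + tuple(rhs) + form[nt + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results
```

For max_len = 5 the result includes the short elements listed above, e.g. id, id + id, ( id ), id * id.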

2.1.4. Derivations and Parse Tree


• A derivation is a sequence of productions
𝑆→⋯→⋯→⋯
• A derivation can be drawn as a tree
⸺ Start symbol is the tree’s root
⸺ For a production 𝑋 → 𝑌1 … 𝑌𝑛 add children 𝑌1 … 𝑌𝑛 to node X

Example:
Grammar: E→ E+E | E * E | (E) | id
String: id * id + id

Notes:
• A parse tree has
⸺ Terminals at the leaves
⸺ Non-terminals at the interior nodes
• An in-order traversal of the leaves is the original input
• The parse tree shows the association of operations, the input string does not

2.1.4.1. Left- most Derivation


At each step, replace the left-most non-terminal
Example:
Grammar: E→ E+E | E * E | (E) | id
String: id * id + id
[Figure: the parse tree after each step]
E ⇒ E + E ⇒ E * E + E ⇒ id * E + E ⇒ id * id + E ⇒ id * id + id

2.1.4.2. Right-most Derivation


At each step, replace right-most non-terminal.

[Figure: the parse tree after each step]
E ⇒ E + E ⇒ E + id ⇒ E * E + id ⇒ E * id + id ⇒ id * id + id

• In this example, the right-most and left-most derivations have the same parse tree


• The difference is just in the order in which branches are added

2.1.5. Ambiguity

• A grammar is ambiguous if it has more than one parse tree for some string
⸺ Equivalently, there is more than one right-most or left-most derivation for some
string
• Ambiguity is bad
⸺ Leaves meaning of some programs ill-defined
• Ambiguity is common in programming languages
• Arithmetic expressions
• IF-THEN-ELSE
• There are several ways to handle ambiguity
• Most direct method is to rewrite grammar unambiguously
E→T+E|T
T → int * T | int | ( E )
• Enforces precedence of * over +

Example: The Dangling Else


• Consider the following grammar
S → if C then S
| if C then S else S
| OTHER
This grammar is ambiguous
• The expression
if C1 then if C2 then S3 else S4

has two parse trees


Typically we want the second form
• else matches the closest unmatched then
• We can describe this in the grammar
S → MIF /* all then are matched */
| UIF /* some then are unmatched */
MIF → if C then MIF else MIF
| OTHER
UIF → if C then S
| if C then MIF else UIF

2.1.5.1. Associativity Declarations


• Consider the grammar E → E + E | int
• Ambiguous: two parse trees of int + int + int

• Left associativity declaration: %left +

2.1.5.2. Precedence Declarations


• Consider the grammar E → E + E | E * E | int
• And the string int + int * int
• Precedence declarations: %left + %left *

2.2. ROLE OF PARSER


A parser for a grammar is a program that takes as input a string w (a sequence of tokens obtained from the lexical analyzer) and produces as output either a parse tree for w, if w is a valid sentence of the grammar, or an error message indicating that w is not a valid sentence of the grammar. The goal of the parser is to determine the syntactic validity of a source string. If the string is valid, a tree is built for use by the subsequent phases of the compiler. The tree reflects the sequence of derivations or reductions used during parsing; hence it is called a parse tree. If the string is invalid, the parser has to issue a diagnostic message identifying the nature and cause of the errors in the string. Every elementary subtree in the parse tree corresponds to a production of the grammar.
There are two ways of identifying an elementary sub tree:
1. By deriving a string from a non-terminal or
2. By reducing a string of symbol to a non-terminal.
The two types of parsers employed are:
a. Top-down parser: builds parse trees from the top (root) to the bottom (leaves).
b. Bottom-up parser: builds parse trees from the leaves and works up to the root.

2.2.1. TOP DOWN PARSING


It can be viewed as an attempt to find a left-most derivation for an input string or an attempt to
construct a parse tree for the input starting from the root to the leaves.

Types of top down Parsing:


1. Recursive descent parsing
2. Predictive parsing
2.2.1.1. RECURSIVE DESCENT PARSING
• Recursive descent parsing is one of the top-down parsing techniques that uses a set of
recursive procedures to scan its input.
• This parsing method may involve backtracking, that is, making repeated scans of the
input.

Example for backtracking :


Consider the grammar
G : S→cAd
A → ab | a
and the input string w=cad.
The parse tree can be constructed using the following top-down approach :
Step1:
Initially create a tree with single node labeled S. An input pointer points to ‘c’, the first symbol
of w. Expand the tree with the production of S.

Step2:
The leftmost leaf ‘c’ matches the first symbol of w, so advance the input pointer to the second
symbol of w ‘a’ and consider the next leaf ‘A’. Expand A using the first alternative.

Step3:
The second symbol ‘a’ of w also matches with second leaf of tree. So advance the input pointer
to third symbol of w ‘d’. But the third leaf of tree is b which does not match with the input
symbol d.
Hence discard the chosen production and reset the pointer to second position. This is called
backtracking.
Step4:
Now try the second alternative for A.

Now we can halt and announce the successful completion of parsing.
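The backtracking in Steps 1–4 can be made concrete with Python generators: when the first alternative for A fails downstream, the caller simply asks for the next one. A minimal sketch (our own illustration, not code from the notes):

```python
def parse_S(s, i):
    """S -> c A d : yield every input position reachable after matching S."""
    if i < len(s) and s[i] == "c":
        for j in parse_A(s, i + 1):           # each j is a possible position after A
            if j < len(s) and s[j] == "d":
                yield j + 1

def parse_A(s, i):
    """A -> ab | a : yielding both alternatives is the backtracking."""
    if s[i:i + 2] == "ab":
        yield i + 2
    if s[i:i + 1] == "a":
        yield i + 1

def accepts(s):
    # the parse succeeds if some alternative consumes the whole input
    return any(j == len(s) for j in parse_S(s, 0))
```

On w = cad, the alternative A -> ab fails (the 'd' does not match 'b'), so the parser falls back to A -> a, exactly as in Steps 3 and 4.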

Example for recursive descent parsing:


A left-recursive grammar can cause a recursive-descent parser to go into an infinite loop. Hence,
elimination of left-recursion must be done before parsing.
Consider the grammar for arithmetic expressions
E → E+T | T
T → T*F | F
F → (E) | id
After eliminating the left-recursion the grammar becomes,
E → TE’
E’ → +TE’ |ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
Now we can write the procedure for grammar as follows:

Recursive procedures:
Procedure E()
Begin
T( );
EPRIME( );
End

Procedure EPRIME()
Begin
If input_symbol = ‘+’ then
ADVANCE( );
T( );
EPRIME( );
End

Procedure T()
Begin
F( );
TPRIME( );
End

Procedure TPRIME()
Begin
If input_symbol = ‘*’ then
ADVANCE( );
F( );
TPRIME( );
End

Procedure F()
Begin
If input_symbol = ‘id’ then
ADVANCE( );
else if input_symbol = ‘(’ then
ADVANCE( );
E( );
If input_symbol = ‘)’ then
ADVANCE( );
else ERROR( );
else ERROR( );
End
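A runnable Python transcription of these procedures (an illustrative sketch under the assumption that the input is a list of tokens such as "id", "+", "("; the names Parser and parse are our own):

```python
class Parser:
    """Recursive-descent parser for E -> TE', E' -> +TE' | ε, T -> FT',
    T' -> *FT' | ε, F -> (E) | id, over a list of tokens."""
    def __init__(self, tokens):
        self.toks = tokens + ["$"]            # "$" marks end of input
        self.i = 0
    def look(self):
        return self.toks[self.i]
    def advance(self):
        self.i += 1
    def E(self):
        self.T(); self.Eprime()
    def Eprime(self):                         # E' -> +TE' | ε
        if self.look() == "+":
            self.advance(); self.T(); self.Eprime()
    def T(self):
        self.F(); self.Tprime()
    def Tprime(self):                         # T' -> *FT' | ε
        if self.look() == "*":
            self.advance(); self.F(); self.Tprime()
    def F(self):                              # F -> (E) | id
        if self.look() == "id":
            self.advance()
        elif self.look() == "(":
            self.advance(); self.E()
            if self.look() != ")":
                raise SyntaxError("expected )")
            self.advance()
        else:
            raise SyntaxError("unexpected " + self.look())

def parse(tokens):
    p = Parser(tokens)
    p.E()
    return p.look() == "$"                    # all input consumed?
```

Because the grammar has no left recursion, each call consumes at least one token before recursing on the same input, so the parser always terminates.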

Stack Implementation (trace of calls on the input id + id * id; ↑ marks the input pointer):

PROCEDURE        INPUT STRING
E( )             ↑ id + id * id
T( )             ↑ id + id * id
F( )             ↑ id + id * id
ADVANCE( )       id ↑ + id * id
TPRIME( )        id ↑ + id * id
EPRIME( )        id ↑ + id * id
ADVANCE( )       id + ↑ id * id
T( )             id + ↑ id * id
F( )             id + ↑ id * id
ADVANCE( )       id + id ↑ * id
TPRIME( )        id + id ↑ * id
ADVANCE( )       id + id * ↑ id
F( )             id + id * ↑ id
ADVANCE( )       id + id * id ↑
TPRIME( )        id + id * id ↑

2.2.1.2. PREDICTIVE PARSING


• Predictive parsing is a special case of recursive descent parsing where no backtracking is required.
• The key problem of predictive parsing is to determine the production to be applied for a non-terminal when there are several alternatives.

Non-recursive predictive Parsing:

The table-driven predictive parser has an input buffer, a stack, a parsing table, and an output stream.
Input buffer:
It consists of strings to be parsed, followed by $ to indicate the end of the input string.
Stack:
It contains a sequence of grammar symbols preceded by $ to indicate the bottom of the stack.
Initially, the stack contains the start symbol on top of $.
Parsing table:
It is a two-dimensional array M[A, a], where ‘A’ is a non-terminal and ‘a’ is a terminal.

Predictive parsing program:


The parser is controlled by a program that considers X, the symbol on top of stack, and a, the
current input symbol. These two symbols determine the parser action. There are three
possibilities:
1. If X = a = $, the parser halts and announces successful completion of parsing.
2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next
input symbol.
3. If X is a non-terminal, the program consults entry M[X, a] of the parsing table M. This
entry will either be an X-production of the grammar or an error entry
If M[X, a] = {X → UVW}, the parser replaces X on top of the stack by UVW.
If M[X, a] = error, the parser calls an error recovery routine.

Algorithm for non-recursive predictive parsing:

Input : A string w and a parsing table M for grammar G.


Output : If w is in L(G), a leftmost derivation of w; otherwise, an error indication.
Method : Initially, the parser has $S on the stack with S, the start symbol of G on top, and w$ in
the input buffer. The program that utilizes the predictive parsing table M to produce a parse for
the input is as follows:
set ip to point to the first symbol of w$;

repeat
    let X be the top stack symbol and a the symbol pointed to by ip;
    if X is a terminal or $ then
        if X = a then
            pop X from the stack and advance ip
        else error()
    else /* X is a non-terminal */
        if M[X, a] = X → Y1 Y2 … Yk then begin
            pop X from the stack;
            push Yk, Yk-1, … , Y1 onto the stack, with Y1 on top;
            output the production X → Y1 Y2 … Yk
        end
        else error()
until X = $
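A compact Python rendering of this loop (an illustrative sketch; the table below is the LL(1) table for the expression grammar worked out later in this section, with ε-productions written as empty lists):

```python
# LL(1) table for: E -> TE'  E' -> +TE' | ε  T -> FT'  T' -> *FT' | ε  F -> (E) | id
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMS = {"E", "E'", "T", "T'", "F"}

def predictive_parse(tokens):
    """Return the list of productions used (a leftmost derivation), or raise."""
    toks = tokens + ["$"]
    stack = ["$", "E"]                        # start symbol on top of $
    i, output = 0, []
    while stack:
        X, a = stack[-1], toks[i]
        if X not in NONTERMS:                 # X is a terminal or $
            if X != a:
                raise SyntaxError(f"expected {X}, got {a}")
            stack.pop()
            if X == "$":
                return output                 # X = a = $: accept
            i += 1
        else:                                 # consult M[X, a]
            rhs = TABLE.get((X, a))
            if rhs is None:
                raise SyntaxError(f"no entry M[{X}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))       # push Yk ... Y1, Y1 on top
            output.append((X, rhs))
    return output
```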

Predictive parsing table construction:

The construction of a predictive parser is aided by two functions associated with a grammar G :
1. FIRST
2. FOLLOW
Rules for first( ):
1. If X is terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is non-terminal and X → aα is a production then add a to FIRST(X).
4. If X is non-terminal and X → Y1 Y2 … Yk is a production, then place a in FIRST(X) if for
some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), … , FIRST(Yi-1); that is, Y1 … Yi-1
⇒* ε. If ε is in FIRST(Yj) for all j = 1, 2, …, k, then add ε to FIRST(X).
Rules for follow( ):
1. If S is a start symbol, then FOLLOW(S) contains $.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in
follow(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε,
then everything in FOLLOW(A) is in FOLLOW(B).

Algorithm for construction of predictive parsing table:


Input : Grammar G
Output : Parsing table M
Method :
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in
FIRST(α) and $ is in FOLLOW(A) , add A → α to M[A, $].
4. Make each undefined entry of M be error

Example:
Consider the following grammar :
E → E+T | T
T→T*F | F
F → (E) | id
After eliminating left-recursion the grammar is
E → TE’
E’ → +TE’ |ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
First( ) :
FIRST(E) = { ( , id}
FIRST(E’) ={+ ,ε}
FIRST(T) = { ( , id}
FIRST(T’) = {*, ε }
FIRST(F) = { ( , id }
Follow( ):
FOLLOW(E) = { $, ) }
FOLLOW(E’) = { $, ) }
FOLLOW(T) = { +, $, ) }
FOLLOW(T’) = { +, $, ) }
FOLLOW(F) = {+, * , $ , ) }
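These sets can be computed mechanically by iterating the FIRST/FOLLOW rules to a fixed point. A Python sketch (illustrative; ε is represented by the string "ε", ε-alternatives by empty lists):

```python
EPS = "ε"

# The left-recursion-free expression grammar from this example
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
START = "E"

def compute_first(g):
    first = {A: set() for A in g}
    changed = True
    while changed:
        changed = False
        for A, prods in g.items():
            for rhs in prods:
                nullable = True               # do all symbols so far derive ε?
                for s in rhs:
                    f = first[s] if s in g else {s}   # terminals: FIRST(a) = {a}
                    add = f - {EPS}
                    if not add <= first[A]:
                        first[A] |= add
                        changed = True
                    if s in g and EPS in first[s]:
                        continue              # keep scanning past a nullable symbol
                    nullable = False
                    break
                if nullable and EPS not in first[A]:
                    first[A].add(EPS)
                    changed = True
    return first

def first_of(seq, g, first):
    """FIRST of a sentential form (needed for the FOLLOW rules)."""
    out = set()
    for s in seq:
        f = first[s] if s in g else {s}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out

def compute_follow(g, first, start):
    follow = {A: set() for A in g}
    follow[start].add("$")                    # rule 1
    changed = True
    while changed:
        changed = False
        for A, prods in g.items():
            for rhs in prods:
                for k, B in enumerate(rhs):
                    if B not in g:
                        continue
                    f = first_of(rhs[k + 1:], g, first)
                    # rule 2: FIRST(β) - ε; rule 3: FOLLOW(A) if β nullable
                    add = (f - {EPS}) | (follow[A] if EPS in f else set())
                    if not add <= follow[B]:
                        follow[B] |= add
                        changed = True
    return follow
```

Running this on the grammar reproduces the FIRST and FOLLOW sets listed above.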
2.2.1.3. LL(1) grammar:
A grammar whose predictive parsing table has no multiply-defined entries (each location holds at most one production) is called an LL(1) grammar.
Consider this following grammar:
S → iEtS | iEtSeS | a
E→b
After left factoring, the grammar becomes
S → iEtSS’ | a
S’ → eS | ε
E → b
To construct a parsing table, we need FIRST() and FOLLOW() for all the non-terminals.
FIRST(S) = { i, a }
FIRST(S’) = {e,ε}
FIRST(E) = { b}
FOLLOW(S) = { $ ,e }
FOLLOW(S’) = { $ ,e }
FOLLOW(E) = {t}

Since M[S’, e] contains two productions (S’ → eS and S’ → ε), the table has a multiply-defined entry, so the grammar is not an LL(1) grammar.
Actions performed in predictive parsing:
1. Shift
2. Reduce
3. Accept
4. Error
Implementation of predictive parser:
1. Elimination of left recursion, left factoring and ambiguous grammar.
2. Construct FIRST() and FOLLOW() for all non-terminals.
3. Construct predictive parsing table.
4. Parse the given input string using stack and parsing table.
2.2.2. BOTTOM UP PARSING
Constructing a parse tree for an input string beginning at the leaves and going towards the root is
called bottom-up parsing. A general type of bottom-up parser is a shift-reduce parser

2.2.2.1. SHIFT-REDUCE PARSING


Shift-reduce parsing is a type of bottom-up parsing that attempts to construct a parse tree for an
input string beginning at the leaves (the bottom) and working up towards the root (the top).
Example: Consider the grammar:
S → aABe
A → Abc | b
B→d
The sentence to be recognized is abbcde.

REDUCTION (LEFTMOST)        RIGHTMOST DERIVATION

abbcde   (A → b)            S → aABe
aAbcde   (A → Abc)            → aAde
aAde     (B → d)              → aAbcde
aABe     (S → aABe)           → abbcde
S
The reductions trace out the rightmost derivation in reverse.

Handles:
A handle of a string is a substring that matches the right side of a production, and whose
reduction to the non-terminal on the left side of the production represents one step along the
reverse of a rightmost derivation.
Example:
Consider the grammar
E → E+E
E → E*E
E → (E)
E → id
And the input string id1 + id2 * id3
The rightmost derivation is:
E ⇒ E + E ⇒ E + E * E ⇒ E + E * id3 ⇒ E + id2 * id3 ⇒ id1 + id2 * id3
Reading the derivation in reverse, the substring reduced at each step (id1, then id2, then id3, then E * E, then E + E) is the handle of that right-sentential form.

Handle pruning

A rightmost derivation in reverse can be obtained by “handle pruning”


(i.e.) if w is a sentence of the grammar at hand, then w = γn, where γn is the nth right-sentential form of some rightmost derivation.

Stack implementation of shift-reduce parsing:


Actions in shift-reduce parser:
• shift- The next input symbol is shifted onto the top of the stack.
• reduce- The parser replaces the handle within a stack with a non-terminal.
• accept- The parser announces successful completion of parsing.
• error- The parser discovers that a syntax error has occurred and calls an error recovery
routine.
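A real shift-reduce parser finds each handle deterministically, but the underlying idea, "keep reducing until only the start symbol remains," can be illustrated by a brute-force search over the abbcde example (our own sketch, not the parser described here):

```python
from collections import deque

# Grammar S -> aABe, A -> Abc | b, B -> d, written as (lhs, rhs) pairs
PRODUCTIONS = [("S", "aABe"), ("A", "Abc"), ("A", "b"), ("B", "d")]

def reduce_to_start(s, productions, start):
    """Brute-force search for a sequence of reductions from s to the start
    symbol (exponential in general -- for illustration only)."""
    seen = {s}
    queue = deque([(s, [])])
    while queue:
        form, steps = queue.popleft()
        if form == start:
            return steps                      # each step: (new form, rule used)
        for lhs, rhs in productions:
            idx = form.find(rhs)
            while idx != -1:                  # try reducing at every occurrence
                new = form[:idx] + lhs + form[idx + len(rhs):]
                if new not in seen:
                    seen.add(new)
                    queue.append((new, steps + [(new, f"{lhs} -> {rhs}")]))
                idx = form.find(rhs, idx + 1)
    return None
```

The search discovers the same reduction sequence as the table above: abbcde → aAbcde → aAde → aABe → S. The job of a shift-reduce parser is to find exactly this sequence without any search.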

Conflicts in shift-reduce parsing:


There are two kinds of conflicts that occur in shift-reduce parsing.
1. Shift-reduce conflict: The parser cannot decide whether to shift or to reduce.
2. Reduce-reduce conflict: The parser cannot decide which of several reductions to make.

2.2.2.2 OPERATOR PRECEDENCE GRAMMAR

• Operator grammars are a small but important class of grammars.
• We may have an efficient operator-precedence parser (a shift-reduce parser) for an operator grammar.
• In an operator grammar, no production rule can have:
⸺ ε on the right side
⸺ two adjacent non-terminals on the right side

Example:
1. E → AB, A → a, B → b — not an operator grammar (AB has two adjacent non-terminals)
2. E → EOE, E → id, O → + | * | / — not an operator grammar (EOE has adjacent non-terminals)
3. E → E+E | E*E | E/E | id — an operator grammar

Precedence Relation

• In operator-precedence parsing, we define three disjoint precedence relations between certain pairs of terminals:
⸺ a <. b : b has higher precedence than a
⸺ a =. b : b has the same precedence as a
⸺ a .> b : b has lower precedence than a

• The determination of correct precedence relations between terminals is based on the
traditional notions of associativity and precedence of operators. (Unary minus causes a
problem.)

• The intention of the precedence relations is to find the handle of a right-sentential form,
with <. marking the left end,
=. appearing in the interior of the handle, and
.> marking the right end.

• In our input string $a1a2...an$, we insert the precedence relation between the pairs of
terminals (the precedence relation holds between the terminals in that pair).

Example:

E → E+E | E-E | E*E | E/E | E^E | (E) | -E | id


The partial operator-precedence table for this grammar is as shown.
Then the input string id+id*id with the precedence relations inserted will be:
$ <. id .> + <. id .> * <. id .> $

Using Precedence relations to find Handles

• Scan the string from the left end until the first .> is encountered.
• Then scan backwards (to the left) over any =. until a <. is encountered.
• The handle contains everything to the left of the first .> and to the right of the matching <..
• The handles thus obtained can be used to shift-reduce a given string.

Parsing Algorithm
The input string is w$, the initial stack is $ and a table holds precedence relations between
certain terminals.

set p to point to the first symbol of w$ ;


repeat forever
    if ( $ is on top of the stack and p points to $ ) then
        return    /* accept */
    else {
        let a be the topmost terminal symbol on the stack and let b be the symbol pointed to by p;
        if ( a <. b or a =. b ) then {    /* SHIFT */
            push b onto the stack;
            advance p to the next input symbol;
        }
        else if ( a .> b ) then    /* REDUCE */
            repeat pop the stack
            until ( the top-of-stack terminal is related by <. to the terminal most recently popped );
        else error();
    }

Example:

Stack      Input        Action

$          id+id*id$    $ <. id, shift
$ id       +id*id$      id .> +, reduce E → id
$          +id*id$      shift
$ +        id*id$       shift
$ + id     *id$         id .> *, reduce E → id
$ +        *id$         shift
$ + *      id$          shift
$ + * id   $            id .> $, reduce E → id
$ + *      $            * .> $, reduce E → E*E
$ +        $            + .> $, reduce E → E+E
$          $            accept
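The algorithm and trace can be sketched in Python. This is an illustrative implementation for the +, *, id fragment only; the precedence table is filled from the associativity/precedence rules below, error handling for malformed input is omitted, and a single marker "E" stands for any non-terminal on the stack:

```python
# Operator-precedence relations: "<" stands for <., ">" for .>
PREC = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}

def op_precedence_parse(tokens):
    """Shift-reduce driver; 'E' marks a reduced non-terminal on the stack."""
    toks = tokens + ["$"]
    stack, i, actions = ["$"], 0, []
    while True:
        top_t = next(s for s in reversed(stack) if s != "E")  # topmost terminal
        a = toks[i]
        if top_t == "$" and a == "$":
            actions.append("accept")
            return actions
        if PREC[(top_t, a)] == "<":           # shift (=. would also shift)
            stack.append(a)
            i += 1
            actions.append("shift " + a)
        else:                                 # top_t .> a : reduce
            handle = []
            while True:
                s = stack.pop()
                handle.append(s)
                if s != "E":
                    below = next(x for x in reversed(stack) if x != "E")
                    if PREC.get((below, s)) == "<":
                        break                 # popped terminal was the handle's left end
            if stack[-1] != "E":              # the non-terminal replacing the handle
                stack.append("E")
            actions.append("reduce " + " ".join(reversed(handle)))
```

On id + id * id this produces the same shift/reduce/accept sequence as the trace above.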

Creating Operator-Precedence Relations from Associativity and Precedence

1. If operator O1 has higher precedence than operator O2: O1 .> O2 and O2 <. O1

2. If operator O1 and operator O2 have equal precedence:
⸺ if they are left-associative: O1 .> O2 and O2 .> O1
⸺ if they are right-associative: O1 <. O2 and O2 <. O1

3. For all operators O:
O <. id, id .> O, O <. (, ( <. O, O .> ), ) .> O, O .> $, and $ <. O

4. Also:
( =. )    $ <. (    id .> )    ) .> $
( <. (    $ <. id   id .> $    ) .> )
( <. id

Example
The complete table for the Grammar E → E+E | E-E | E*E | E/E | E^E | (E) | -E | id is:
Operator Precedence Grammar:

There is another more general way to compute precedence relations among terminals:
1. a =. b if there is a right side of a production of the form αaβbγ, where β is either a single
non-terminal or ε.

2. a <. b if for some non-terminal A there is a right side of the form αaAβ and A derives
γbδ, where γ is a single non-terminal or ε.

3. a .> b if for some non-terminal A there is a right side of the form αAbβ and A derives
γaδ, where δ is a single non-terminal or ε.

Note that the grammar must be unambiguous for this method. Unlike the previous method, it
does not take into account any other property and is based purely on grammar productions. An
ambiguous grammar will result in multiple entries in the table and thus cannot be used.
2.2.3. LR PARSING
2.2.3.1. SLR PARSER
2.2.3.2. CLR PARSER
Construction of CLR Parsing Table
2.2.3.3. LALR PARSER
2.3. PARSING WITH AMBIGUOUS GRAMMAR
• All grammars used in the construction of LR-parsing tables must be unambiguous.
• Can we create LR-parsing tables for ambiguous grammars?
• Yes, but they will have conflicts.
• We can resolve these conflicts in favor of one of the alternatives to disambiguate the grammar.
• In the end, we will again have an unambiguous grammar.
• Why would we want to use an ambiguous grammar?
⸺ Some ambiguous grammars are more natural, and a corresponding unambiguous
grammar can be very complex.
⸺ Using an ambiguous grammar may eliminate unnecessary reductions.
