Unit-2
2.1. Review of CFG, Ambiguity of Grammars
2.1.1. Limitations of Regular Language
Example-
• If-then-else statement
if (x == y) then z =1; else z = 2;
• Parser input
IF (ID == ID) THEN ID = INT; ELSE ID = INT;
• Possible parser output
Role of Parser
• Not all sequences of tokens are programs
• Parser must distinguish between valid and invalid sequences of tokens
• We need
⸺ A language for describing valid sequences of tokens
⸺ A method for distinguishing valid from invalid sequences of tokens
Examples:
1. STMT → if COND then STMT else STMT
        | while COND do STMT
        | id = int
2. E → E * E
     | E + E
     | ( E )
     | id
• Let G be a context-free grammar with start symbol S. Then the language of G is:
{ a1 … an | S ⇒* a1 … an and every ai is a terminal }
where ⇒* means "derives in zero or more steps".
Example:
Arithmetic Expressions:
E →E+E | E * E | (E) | id
Some elements of the language:
id, id + id, (id), id * id, (id) * id, id * (id)
Example:
Grammar: E→ E+E | E * E | (E) | id
String: id * id + id
Notes:
• A parse tree has
⸺ Terminals at the leaves
⸺ Non-terminals at the interior nodes
• An in-order traversal of the leaves is the original input
• The parse tree shows how the operations associate; the input string alone does not
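The notes above can be checked on a small example. The sketch below encodes one parse tree of id * id + id as nested tuples (this encoding is an assumption made only for illustration) and reads the leaves from left to right.

```python
# A parse tree as nested tuples: (non_terminal, child, child, ...).
# Leaves are plain strings (terminals). Illustrative encoding only.
tree = (
    "E",
    ("E", ("E", "id"), "*", ("E", "id")),
    "+",
    ("E", "id"),
)

def leaves(node):
    """Return the terminals at the leaves, read left to right."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node[1:]:      # node[0] is the non-terminal label
        result.extend(leaves(child))
    return result

print(" ".join(leaves(tree)))   # id * id + id
```

The tree groups id * id under one interior node, which is exactly the association that the flat token string does not record.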
2.1.5. Ambiguity
• A grammar is ambiguous if it has more than one parse tree for some string
⸺ Equivalently, there is more than one right-most or left-most derivation for some
string
• Ambiguity is bad
⸺ Leaves meaning of some programs ill-defined
• Ambiguity is common in programming languages
• Arithmetic expressions
• IF-THEN-ELSE
• There are several ways to handle ambiguity
• Most direct method is to rewrite grammar unambiguously
E → T + E | T
T → int * T | int | ( E )
• Enforces precedence of * over +
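To see the ambiguity of E → E+E | E*E | (E) | id concretely, the following sketch counts the parse trees of a token list by trying every operator as the root of the tree. The function name count_parse_trees and the token-list input are illustrative choices, not part of the notes.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E -> E+E | E*E | (E) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E can derive tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # E -> ( E )
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)
        # E -> E + E and E -> E * E: try every operator as the root
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens)) if tokens else 0

print(count_parse_trees("id * id + id".split()))   # 2
```

A count greater than 1 for some string is what makes the grammar ambiguous; a string such as ( id * id ) + id has only one tree because the parentheses fix the grouping.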
Step2:
The leftmost leaf ‘c’ matches the first symbol of w, so advance the input pointer to the second symbol of w, ‘a’, and consider the next leaf, ‘A’. Expand A using its first alternative.
Step3:
The second symbol of w, ‘a’, matches the second leaf of the tree, so advance the input pointer to the third symbol of w, ‘d’. The third leaf of the tree is ‘b’, which does not match the input symbol ‘d’.
Hence discard the chosen production and reset the input pointer to the second position. This is called backtracking.
Step4:
Now try the second alternative for A.
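The steps above can be sketched as a small backtracking recognizer. The grammar and input are not stated in this part of the notes; the sketch assumes S → cAd, A → ab | a and w = cad, which is consistent with the symbols mentioned in the steps.

```python
# Grammar assumed for illustration (the steps mention c, A, a, b, d):
#   S -> c A d
#   A -> a b | a
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}

def derive(symbols, w, pos):
    """Yield every input position reachable after matching `symbols` against w[pos:]."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Try the alternatives in order. When one fails, the next one starts
        # again from `pos`: this is the pointer reset called backtracking.
        for alternative in GRAMMAR[head]:
            for mid in derive(alternative, w, pos):
                yield from derive(rest, w, mid)
    elif pos < len(w) and w[pos] == head:
        yield from derive(rest, w, pos + 1)   # terminal matches: advance

def accepts(w):
    return any(end == len(w) for end in derive(["S"], w, 0))

print(accepts("cad"))   # True
```

For cad the first alternative A → ab fails at ‘d’, and the second alternative A → a is then tried from the same input position. This approach only terminates for grammars without left recursion.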
Recursive procedure:
Procedure E()
Begin
    T();
    EPRIME();
End
Procedure EPRIME()
Begin
    If input_symbol = '+' then
    Begin
        ADVANCE();
        T();
        EPRIME();
    End
End
Procedure T()
Begin
    F();
    TPRIME();
End
Procedure TPRIME()
Begin
    If input_symbol = '*' then
    Begin
        ADVANCE();
        F();
        TPRIME();
    End
End
Procedure F()
Begin
    If input_symbol = 'id' then
        ADVANCE();
    else if input_symbol = '(' then
    Begin
        ADVANCE();
        E();
        If input_symbol = ')' then
            ADVANCE();
        else ERROR();
    End
    else ERROR();
End
Stack Implementation:
(Brackets mark the current input symbol once the step in that row has been taken; [ ] means the input is exhausted.)
PROCEDURE      INPUT STRING
E( )           [id]+id*id
T( )           [id]+id*id
F( )           [id]+id*id
ADVANCE( )     id[+]id*id
TPRIME( )      id[+]id*id
EPRIME( )      id[+]id*id
ADVANCE( )     id+[id]*id
T( )           id+[id]*id
F( )           id+[id]*id
ADVANCE( )     id+id[*]id
TPRIME( )      id+id[*]id
ADVANCE( )     id+id*[id]
F( )           id+id*[id]
ADVANCE( )     id+id*id[ ]
TPRIME( )      id+id*id[ ]
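The recursive procedures can be written as a runnable sketch. The class below follows the same five procedures; the "$" end marker, the trace list and the use of SyntaxError are assumptions added so that the order of calls can be compared with the trace above.

```python
class Parser:
    """Recursive-descent recognizer for
         E  -> T E'      E' -> + T E' | epsilon
         T  -> F T'      T' -> * F T' | epsilon
         F  -> ( E ) | id
    """

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of input (assumed)
        self.pos = 0
        self.trace = []                      # order in which procedures are entered

    @property
    def symbol(self):
        return self.tokens[self.pos]

    def advance(self):
        self.trace.append("ADVANCE")
        self.pos += 1

    def E(self):
        self.trace.append("E")
        self.T()
        self.EPRIME()

    def EPRIME(self):
        self.trace.append("EPRIME")
        if self.symbol == "+":
            self.advance()
            self.T()
            self.EPRIME()

    def T(self):
        self.trace.append("T")
        self.F()
        self.TPRIME()

    def TPRIME(self):
        self.trace.append("TPRIME")
        if self.symbol == "*":
            self.advance()
            self.F()
            self.TPRIME()

    def F(self):
        self.trace.append("F")
        if self.symbol == "id":
            self.advance()
        elif self.symbol == "(":
            self.advance()
            self.E()
            if self.symbol != ")":
                raise SyntaxError("expected ')'")
            self.advance()
        else:
            raise SyntaxError(f"unexpected symbol {self.symbol!r}")

def parse(tokens):
    """Return the call trace, or raise SyntaxError for invalid input."""
    parser = Parser(tokens)
    parser.E()
    if parser.symbol != "$":
        raise SyntaxError(f"unexpected symbol {parser.symbol!r}")
    return parser.trace

print(parse("id + id * id".split()))
```

For id+id*id the recorded calls follow the rows of the trace and include one further EPRIME call at the end, which returns without consuming input.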
The table-driven predictive parser has an input buffer, stack, a parsing table and an output
stream.
Input buffer:
It holds the string to be parsed, followed by $ to indicate the end of the input string.
Stack:
It contains a sequence of grammar symbols preceded by $ to indicate the bottom of the stack.
Initially, the stack contains the start symbol on top of $.
Parsing table:
It is a two-dimensional array M[A, a], where ‘A’ is a non-terminal and ‘a’ is a terminal.
repeat
    let X be the top stack symbol and a the symbol pointed to by ip;
    if X is a terminal or $ then
        if X = a then
            pop X from the stack and advance ip
        else error()
    else /* X is a non-terminal */
        if M[X, a] = X → Y1 Y2 … Yk then begin
            pop X from the stack;
            push Yk, Yk-1, … , Y1 onto the stack, with Y1 on top;
            output the production X → Y1 Y2 … Yk
        end
        else error()
until X = $
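The algorithm can be sketched in Python as follows. The parsing table here is written out by hand for the expression grammar E → TE’, E’ → +TE’ | ε, T → FT’, T’ → *FT’ | ε, F → (E) | id used later in this unit; the table contents, the dictionary layout and the function name are assumptions for illustration.

```python
NON_TERMINALS = {"E", "E'", "T", "T'", "F"}

# M[X, a] -> right side to push; an empty list stands for epsilon.
TABLE = {
    ("E", "id"): ["T", "E'"],       ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],  ("E'", ")"): [],  ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],       ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],  ("T'", "+"): [],
    ("T'", ")"): [],                ("T'", "$"): [],
    ("F", "id"): ["id"],            ("F", "("): ["(", "E", ")"],
}

def predictive_parse(tokens, start="E"):
    """Return the productions used, in order, or raise SyntaxError."""
    buffer = list(tokens) + ["$"]
    stack = ["$", start]            # top of the stack is the end of the list
    ip = 0
    output = []
    while True:
        X, a = stack[-1], buffer[ip]
        if X == "$":
            if a == "$":
                return output       # stack and input both exhausted: accept
            raise SyntaxError(f"unexpected {a!r} after end of expression")
        if X not in NON_TERMINALS:  # X is a terminal
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            stack.pop()
            ip += 1
        else:                       # X is a non-terminal
            if (X, a) not in TABLE:
                raise SyntaxError(f"no entry M[{X}, {a}]")
            body = TABLE[(X, a)]
            stack.pop()
            stack.extend(reversed(body))   # Y1 ends up on top
            output.append(f"{X} -> {' '.join(body) or 'epsilon'}")

for production in predictive_parse("id + id * id".split()):
    print(production)
```

The printed productions form a leftmost derivation of the input, which is why this style of parsing is called top-down.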
The construction of a predictive parser is aided by two functions associated with a grammar G :
1. FIRST
2. FOLLOW
Rules for FIRST( ):
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a non-terminal and X → aα is a production, then add a to FIRST(X).
4. If X is a non-terminal and X → Y1 Y2 … Yk is a production, then place a in FIRST(X) if, for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), …, FIRST(Yi-1); that is, Y1 … Yi-1 ⇒* ε. If ε is in FIRST(Yj) for all j = 1, 2, …, k, then add ε to FIRST(X).
Rules for FOLLOW( ):
1. If S is the start symbol, then FOLLOW(S) contains $.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
Example:
Consider the following grammar :
E → E+T | T
T → T*F | F
F → (E) | id
After eliminating left-recursion the grammar is
E → TE’
E’ → +TE’ |ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
FIRST( ):
FIRST(E) = { (, id }
FIRST(E’) = { +, ε }
FIRST(T) = { (, id }
FIRST(T’) = { *, ε }
FIRST(F) = { (, id }
FOLLOW( ):
FOLLOW(E) = { $, ) }
FOLLOW(E’) = { $, ) }
FOLLOW(T) = { +, $, ) }
FOLLOW(T’) = { +, $, ) }
FOLLOW(F) = { +, *, $, ) }
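The sets above can be reproduced by applying the FIRST and FOLLOW rules repeatedly until nothing changes. The sketch below does this for the grammar after left-recursion elimination; the dictionary representation, the EPS constant and the function names are illustrative assumptions.

```python
EPS = "ε"

# Grammar after eliminating left recursion; E' and T' stand for E’ and T’.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols (FIRST rules 1 to 4)."""
    result = set()
    for sym in symbols:
        # A terminal (or ε itself) has itself as its FIRST set.
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)          # every symbol can vanish, or the sequence is empty
    return result

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:           # repeat until no set grows any more
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                               # FOLLOW rule 1
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue                         # only non-terminals have FOLLOW
                    beta_first = first_of_string(body[i + 1:], first)
                    new = beta_first - {EPS}             # FOLLOW rule 2
                    if EPS in beta_first:
                        new |= follow[head]              # FOLLOW rule 3
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]), sorted(FOLLOW[nt]))
```

Iterating to a fixed point avoids having to order the non-terminals by hand, at the cost of a few redundant passes over the productions.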
2.2.1.3. LL(1) grammar:
A grammar whose predictive parsing table has no multiply defined entries, that is, at most one production in each location M[A, a], is called an LL(1) grammar.
Consider the following grammar:
S → iEtS | iEtSeS | a
E → b
After left factoring, we have
S → iEtSS’ | a
S’ → eS | ε
E → b
To construct a parsing table, we need FIRST() and FOLLOW() for all the non-terminals.
FIRST(S) = { i, a }
FIRST(S’) = {e,ε}
FIRST(E) = { b}
FOLLOW(S) = { $ ,e }
FOLLOW(S’) = { $ ,e }
FOLLOW(E) = {t}
Since the entry M[S’, e] contains more than one production (S’ → eS and S’ → ε), the grammar is not an LL(1) grammar.
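The conflict can be located mechanically. The sketch below fills the table with the usual rule (place A → α under every terminal in FIRST(α), and under every symbol in FOLLOW(A) when FIRST(α) contains ε) and reports cells holding more than one production. The FIRST set of each right side is written out by hand, and the data layout is an assumption for illustration.

```python
EPS = "ε"

# (head, right side, FIRST of the right side); S' stands for S’.
PRODUCTIONS = [
    ("S",  ["i", "E", "t", "S", "S'"], {"i"}),
    ("S",  ["a"],                      {"a"}),
    ("S'", ["e", "S"],                 {"e"}),
    ("S'", [EPS],                      {EPS}),
    ("E",  ["b"],                      {"b"}),
]
FOLLOW = {"S": {"$", "e"}, "S'": {"$", "e"}, "E": {"t"}}

def build_table(productions, follow):
    """Map (non_terminal, terminal) to the list of right sides placed there."""
    table = {}
    for head, body, first in productions:
        lookaheads = first - {EPS}
        if EPS in first:
            lookaheads |= follow[head]   # epsilon case: use FOLLOW(head)
        for a in lookaheads:
            table.setdefault((head, a), []).append(" ".join(body))
    return table

table = build_table(PRODUCTIONS, FOLLOW)
conflicts = {cell: bodies for cell, bodies in table.items() if len(bodies) > 1}
print(conflicts)   # {("S'", 'e'): ['e S', 'ε']}
```

The only multiply defined cell is M[S’, e], which is the dangling-else ambiguity showing up in the table.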
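Actions performed in predictive parsing:
1. Match (pop a terminal that equals the current input symbol and advance the input pointer)
2. Expand (replace the non-terminal on top of the stack using the parsing table entry)
3. Accept
4. Error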
Implementation of predictive parser:
1. Eliminate left recursion, left-factor the grammar and remove ambiguity.
2. Construct FIRST() and FOLLOW() for all non-terminals.
3. Construct predictive parsing table.
4. Parse the given input string using stack and parsing table.
2.2.2. BOTTOM UP PARSING
Constructing a parse tree for an input string beginning at the leaves and going towards the root is called bottom-up parsing. A general type of bottom-up parser is a shift-reduce parser.
Handles:
A handle of a string is a substring that matches the right side of a production, and whose
reduction to the non-terminal on the left side of the production represents one step along the
reverse of a rightmost derivation.
Example:
Consider the grammar
E → E+E
E → E*E
E → (E)
E → id
And the input string id1 + id2 * id3
The rightmost derivation is:
In the above derivation the underlined substrings are called handles.
Handle pruning
Example:
Grammar 1               Grammar 2               Grammar 3
E → AB                  E → EOE                 E → E+E | E*E | E/E | id
A → a                   E → id
B → b                   O → + | * | /
Not operator grammar    Not operator grammar    Operator grammar
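The classification above follows from the usual definition of an operator grammar, which is assumed here since the notes do not restate it: no right side is ε, and no right side contains two adjacent non-terminals. A minimal check under that definition:

```python
def is_operator_grammar(grammar):
    """grammar maps each non-terminal to a list of right sides (lists of symbols)."""
    for bodies in grammar.values():
        for body in bodies:
            if not body:                       # empty right side: an ε-production
                return False
            for left, right in zip(body, body[1:]):
                if left in grammar and right in grammar:
                    return False               # two adjacent non-terminals
    return True

g1 = {"E": [["A", "B"]], "A": [["a"]], "B": [["b"]]}
g2 = {"E": [["E", "O", "E"], ["id"]], "O": [["+"], ["*"], ["/"]]}
g3 = {"E": [["E", "+", "E"], ["E", "*", "E"], ["E", "/", "E"], ["id"]]}

print(is_operator_grammar(g1), is_operator_grammar(g2), is_operator_grammar(g3))
# False False True
```

The first grammar fails because of AB and the second because of EOE; substituting the alternatives of O into E → EOE gives the third grammar, which passes.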
Precedence Relation
• The determination of the correct precedence relations between terminals is based on the traditional notions of associativity and precedence of operators. (Unary minus causes a problem.)
• The intention of the precedence relations is to find the handle of a right-sentential form, with
<· marking the left end,
=· appearing in the interior of the handle, and
·> marking the right end.
• In our input string $a1a2...an$, we insert the precedence relation between each pair of adjacent terminals (the precedence relation that holds between the terminals in that pair).
Example:
• Scan the string from the left end until the first ·> is encountered.
• Then scan backwards (to the left) over any =· until a <· is encountered.
• The handle contains everything to the left of the first ·> and to the right of the <· just encountered.
• The handles thus obtained can be used to shift-reduce a given string.
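The scanning procedure above can be sketched as follows. The relation table is an assumption (the usual relations for id, + and * with $ as end marker), and non-terminals are left out of the string because the relations are defined between terminals only.

```python
# REL[a][b] is the relation between terminal a and the terminal b to its right:
# "<" stands for <·, "=" for =·, ">" for ·>. Assumed table for id, +, *.
REL = {
    "id": {"+": ">", "*": ">", "$": ">"},
    "+":  {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*":  {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$":  {"id": "<", "+": "<", "*": "<"},
}

def find_handle(terminals):
    """Return (start, end) such that terminals[start:end] is the handle.

    `terminals` is a list such as ['$', 'id', '+', 'id', '$'].
    """
    # 1. Scan from the left end until the first ·> is encountered.
    right = None
    for i in range(len(terminals) - 1):
        if REL[terminals[i]][terminals[i + 1]] == ">":
            right = i
            break
    if right is None:
        raise ValueError("no ·> relation found")
    # 2. Scan backwards over any =· until a <· is encountered.
    left = right
    while REL[terminals[left - 1]][terminals[left]] != "<":
        left -= 1
    # 3. The handle lies between that <· and the first ·>.
    return left, right + 1

tokens = ["$", "id", "+", "id", "*", "id", "$"]
start, end = find_handle(tokens)
print(tokens[start:end])   # ['id']
```

Once every id has been reduced, the terminal string is $ + * $ and the same function returns the position of *, so the multiplication is reduced before the addition.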
Parsing Algorithm
The input string is w$, the initial stack is $ and a table holds precedence relations between
certain terminals.
Example:
1. If operator O1 has higher precedence than operator O2, then O1 ·> O2 and O2 <· O1.
2. If operators O1 and O2 have equal precedence, then O1 ·> O2 and O2 ·> O1 if they are left-associative, and O1 <· O2 and O2 <· O1 if they are right-associative.
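The two rules translate directly into a small function. The precedence levels and the choice of ^ as the only right-associative operator are assumptions made for illustration; "<" and ">" stand for the relations written with a dot in the text.

```python
# Assumed precedence levels (a higher number binds tighter) and associativity.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}
RIGHT_ASSOCIATIVE = {"^"}

def relation(o1, o2):
    """Relation between operator o1 and an operator o2 appearing to its right."""
    if PRECEDENCE[o1] > PRECEDENCE[o2]:
        return ">"                    # rule 1: o1 has higher precedence
    if PRECEDENCE[o1] < PRECEDENCE[o2]:
        return "<"                    # rule 1 with the roles swapped
    # rule 2: equal precedence, decided by associativity
    return "<" if o1 in RIGHT_ASSOCIATIVE else ">"

operators = list(PRECEDENCE)
print("   " + "  ".join(operators))
for o1 in operators:
    print(o1 + "  " + "  ".join(relation(o1, o2) for o2 in operators))
```

This covers only the operator-versus-operator part of a precedence table; the entries involving id, parentheses and $ need their own rules.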
Example
The complete table for the Grammar E → E+E | E-E | E*E | E/E | E^E | (E) | -E | id is:
Operator Precedence Grammar:
There is another more general way to compute precedence relations among terminals:
1. a =· b if there is a right side of a production of the form αaβbγ, where β is either a single non-terminal or ε.
2. a <· b if for some non-terminal A there is a right side of the form αaAβ and A derives γbδ, where γ is a single non-terminal or ε.
3. a ·> b if for some non-terminal A there is a right side of the form αAbβ and A derives γaδ, where δ is a single non-terminal or ε.
Note that the grammar must be unambiguous for this method. Unlike the previous method, it
does not take into account any other property and is based purely on grammar productions. An
ambiguous grammar will result in multiple entries in the table and thus cannot be used.
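The three rules can be applied mechanically. The sketch below uses the unambiguous grammar E → E+T | T, T → T*F | F, F → (E) | id from earlier in this unit; the helper names and the data layout are assumptions, and relations involving $ are left out.

```python
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}

def edge_terminals(grammar, leading=True):
    """For each non-terminal A, the terminals b such that A derives γbδ with γ
    (leading=True) or δ (leading=False) a single non-terminal or empty."""
    sets = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                seq = body if leading else body[::-1]
                new = set()
                if seq[0] in grammar:
                    new |= sets[seq[0]]        # A -> B ... inherits from B
                    if len(seq) > 1 and seq[1] not in grammar:
                        new.add(seq[1])        # A -> B b ...
                else:
                    new.add(seq[0])            # A -> b ...
                if not new <= sets[head]:
                    sets[head] |= new
                    changed = True
    return sets

def precedence_relations(grammar):
    lead = edge_terminals(grammar, leading=True)
    trail = edge_terminals(grammar, leading=False)
    rel = {}
    def add(a, b, r):
        rel.setdefault((a, b), set()).add(r)
    for bodies in grammar.values():
        for body in bodies:
            for i in range(len(body) - 1):
                x, y = body[i], body[i + 1]
                if x not in grammar and y not in grammar:
                    add(x, y, "=")                    # rule 1 with β = ε
                if x not in grammar and y in grammar:
                    for b in lead[y]:
                        add(x, b, "<")                # rule 2
                    if i + 2 < len(body) and body[i + 2] not in grammar:
                        add(x, body[i + 2], "=")      # rule 1 with β a non-terminal
                if x in grammar and y not in grammar:
                    for a in trail[x]:
                        add(a, y, ">")                # rule 3
    return rel

rel = precedence_relations(GRAMMAR)
print(rel[("+", "*")], rel[("*", "+")], rel[("(", ")")])
```

Every pair in the result carries exactly one relation for this grammar. Running the same code on the ambiguous E → E+E | E*E | (E) | id puts both "<" and ">" in some cells, which is the multiple-entry problem noted above.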
2.2.3. LR PARSING
2.2.3.1. SLR PARSER
2.2.3.2. CLR PARSER
Construction of CLR Parsing Table
2.2.3.3. LALR PARSER
2.3. PARSING WITH AMBIGUOUS GRAMMAR
• All grammars used in the construction of LR-parsing tables must be unambiguous.
• Can we create LR-parsing tables for ambiguous grammars?
• Yes, but they will have conflicts.
• We can resolve each conflict in favor of one of the competing actions to disambiguate the grammar.
• In the end, we again have an unambiguous grammar.
• Why would we want to use an ambiguous grammar?
Some ambiguous grammars are much more natural, and a corresponding unambiguous grammar can be very complex.
• Usage of an ambiguous grammar may eliminate unnecessary reductions.
Example: