Syntax Analyzer 2-up to LALR
• Constructs parse tree for an input string beginning at the leaves (the
bottom) and working towards the root (the top)
• Example: id*id
E -> E + T | T
T -> T * F | F
F -> (E) | id
[Figure: the partial parse trees after each reduction]
id*id, F*id, T*id, T*F, T, E
Bottom Up Parsers
Shift-reduce parser
• The reductions trace out a rightmost derivation in reverse: E=>T=>T*F=>T*id=>F*id=>id*id
• Shift: moving a symbol from the input buffer onto the stack (push the current input symbol onto the stack).
• Example: parse the input string id – id x id using a shift-reduce parser, with the grammar
E → E – E
E → E x E
E → id
Stack           Input Buffer       Parsing Action
$               id – id x id $     Shift
$ id            – id x id $        Reduce E → id
$ E             – id x id $        Shift
$ E –           id x id $          Shift
$ E – id        x id $             Reduce E → id
$ E – E         x id $             Shift
$ E – E x       id $               Shift
$ E – E x id    $                  Reduce E → id
$ E – E x E     $                  Reduce E → E x E
$ E – E         $                  Reduce E → E – E
$ E             $                  Accept
• Handle: a substring of a right sentential form that matches the body (RHS) of a production and whose reduction is one step of a rightmost derivation in reverse.
• A handle is identified by a right sentential form, the position where the reduction can be performed, and the production used for the reduction.
Handle pruning
• Basic operations:
• Shift
• Reduce
• Accept
• Error
• Example: id*id
Stack     Input     Action
$         id*id$    shift
$id       *id$      reduce by F->id
$F        *id$      reduce by T->F
$T        *id$      shift
$T*       id$       shift
$T*id     $         reduce by F->id
$T*F      $         reduce by T->T*F
$T        $         reduce by E->T
$E        $         accept
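The trace for id*id can be replayed mechanically. The sketch below is only an illustration: it does not decide which action to take, it simply applies a given list of actions and checks that every reduction finds its handle on top of the stack. The names `PRODUCTIONS`, `replay` and `ACTIONS` are made up for this example.

```python
# Productions used in the id*id example: label -> (head, body).
PRODUCTIONS = {
    "E->T": ("E", ["T"]),
    "T->T*F": ("T", ["T", "*", "F"]),
    "T->F": ("T", ["F"]),
    "F->id": ("F", ["id"]),
}

def replay(tokens, actions):
    """Apply a caller-supplied sequence of actions and record each configuration."""
    stack, pos, trace = ["$"], 0, []
    buffer = tokens + ["$"]
    for action in actions:
        trace.append(("".join(stack), "".join(buffer[pos:]), action))
        if action == "shift":
            stack.append(buffer[pos])       # push the current input symbol
            pos += 1
        elif action == "accept":
            break
        else:                               # a reduction, named by its production
            head, body = PRODUCTIONS[action]
            if stack[-len(body):] != body:  # the handle must be on top of the stack
                raise ValueError(f"handle {body} is not on top of {stack}")
            del stack[-len(body):]
            stack.append(head)
    return trace, stack

ACTIONS = ["shift", "F->id", "T->F", "shift", "shift",
           "F->id", "T->T*F", "E->T", "accept"]
trace, final_stack = replay(["id", "*", "id"], ACTIONS)
for stack_text, rest, action in trace:
    print(f"{stack_text:<8}{rest:<8}{action}")
```

A real shift-reduce parser chooses the actions itself, for example from an LR parsing table; the replay only shows how the stack and the input buffer evolve.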
Handle will appear on top of the stack
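[Figure: the two possible cases for successive steps of a rightmost derivation]
Case 1: S => αAz => αβByz => αβγyz
Stack      Input
$αβγ       yz$
$αβB       yz$
$αβBy      z$
Case 2: S => αBxAz => αBxyz => αγxyz
Stack      Input
$αγ        xyz$
$αBxy      z$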
Conflicts during shift reduce parsing
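Stack                      Input
… if expr then stmt        else … $
• Shift-reduce conflict: the parser cannot tell whether to shift else or to reduce if expr then stmt.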
• There are two kinds of conflicts that can occur in an SLR(1) parsing table.
A shift-reduce conflict occurs in a state that requests both a shift action and
a reduce action. A reduce-reduce conflict occurs in a state that requests two
or more different reduce actions.
How to determine?
• A full parsing table is not needed, only the canonical collection. In the canonical collection, find every item set that contains a final (complete) item, and check whether:
• The same item set also contains an item with the dot before a terminal, so it requests both a shift and a reduce ("shift-reduce", s/r)
• The same item set contains two or more final items, so it requests two reduce actions ("reduce-reduce", r/r)
If neither holds for any item set, there are no conflicts, even in LR(0). If some item set has one of these, SLR(1) may still resolve it using FOLLOW sets.
Shift reduce conflict and Reduce/reduce conflict
1. LR(0) Parser
2. Simple LR-Parser (SLR)
3. Canonical LR Parser (CLR)
4. Look-Ahead LR Parser (LALR)
Comparison of LL & LR Methods
LL(1)
[Figure: model of an LR parser. The scanner supplies the input a1 a2 … ai … an $; the LR parsing engine reads it using a stack s0 … Xm-1 sm-1 Xm sm and the LR parsing tables (Action and Goto), which a parser generator produces from the grammar.]
Bottom-Up Parsing Algorithms
LR(k) parsing
L: scan input Left to right
R: produce a Rightmost derivation (in reverse)
k tokens of lookahead
LR(0)
zero tokens of look-ahead
SLR
Simple LR: like LR(0), but uses FOLLOW sets to build
more “precise” parsing tables
LR(0) is a toy, so we focus on SLR
LR Family (each class of grammars is contained in the next):
LR(0) ⊆ SLR(1) ⊆ LALR(1) ⊆ CLR(1)
• Shift: move the current input symbol from the input buffer onto the stack.
• Reduce: replace the handle on top of the stack with the head (left-hand side) of the matching production.
• Accept: the stack contains only the start symbol and the input buffer is empty at the same time.
• Error: the parser can neither shift nor reduce, and it cannot accept.
LR(0) steps:
• Right sentential form
For a CFG G with start symbol S, if S =>* α, where α is a string of terminals (T) and nonterminals (NT), then α is a sentential form. A right sentential form is a sentential form that can be derived by a rightmost derivation.
• LR(0) items
An LR(0) item is a production with a dot anywhere in the right side, including the beginning or the end. For example, A-> XYZ yields four items:
• A-> •XYZ
• A-> X•YZ
• A-> XY•Z
• A-> XYZ•
• The production A-> ε generates only one item, A-> •.
• Complete item: an item where the dot is at the end of the RHS.
• The LR parser consists of
1) Input 2) Output 3) Stack 4) Driver Program 5) Parsing Table
• Only the Parsing Table changes from one parser to the other.
• In the SLR method the stack holds states from the LR(0) automaton; the canonical LR and LALR methods are similar.
• The Driver Program uses the Stack to store a string s0 X1 s1 X2 … Xm sm, where each Xi is a grammar symbol and each si is a state.
• The Parsing Table consists of an ACTION function and a GOTO function.
Closure algorithm
SetOfItems CLOSURE(I) {
    J = I;
    repeat
        for (each item A->α.Bβ in J)
            for (each production B->γ of G)
                if (B->.γ is not in J)
                    add B->.γ to J;
    until no more items are added to J on one round;
    return J;
}
GOTO algorithm
SetOfItems GOTO(I,X) {
    J = empty;
    for (each item A->α.Xβ in I)
        add A->αX.β to J;
    return CLOSURE(J);
}
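The two algorithms can be tried out on the expression grammar used throughout these slides. The following is a minimal sketch, not a reference implementation: the item representation (a tuple of head, body and dot position) and the names `GRAMMAR`, `closure`, `goto` and `show` are choices made for this example only.

```python
# Augmented expression grammar. An LR(0) item is a tuple (head, body, dot).
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}

def closure(items):
    """Add B -> .gamma for every nonterminal B that appears right after a dot."""
    result = set(items)
    worklist = list(items)
    while worklist:
        _, body, dot = worklist.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for production in GRAMMAR[body[dot]]:
                item = (body[dot], production, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)

def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then close the set."""
    moved = {(head, body, dot + 1) for (head, body, dot) in items
             if dot < len(body) and body[dot] == symbol}
    return closure(moved)

def show(item):
    head, body, dot = item
    return f"{head}->{''.join(body[:dot])}.{''.join(body[dot:])}"

I0 = closure({("E'", ("E",), 0)})
print(sorted(show(item) for item in I0))
print(sorted(show(item) for item in goto(I0, "T")))
```

Using a worklist instead of the repeat/until loop gives the same set; it only avoids rescanning items that were already expanded.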
The Action Table
• The ACTION table is indexed by a state of the parser and a terminal (or $).
ex : ACTION[S, a]
• Each entry is shift, reduce, accept (parsing is completed) or error.
• The GOTO table specifies which state to put on top of the stack after a reduce.
• The GOTO Table is important to find out the next state after every reduction.
• The GOTO Table is indexed by a state of the parser and a Non Terminal (Grammar Symbol).
ex : GOTO[S, A]
• The GOTO Table simply indicates the next state of the parser once it has recognized a certain Non Terminal.
LR(0) Parser
• The LR Parser is a Shift-reduce Parser that makes use of a Deterministic Finite Automaton, which recognizes the set of all viable prefixes by reading the stack from bottom to top.
• Augmented grammar:
• G with addition of a production: S’->S
• Closure of item sets:
• If I is a set of items, closure(I) is a set of items constructed from I by the
following rules:
• Add every item in I to closure(I)
• If A->α.Bβ is in closure(I) and B->γ is a production then add the item B->.γ to closure(I).
• Example:
Grammar:
E’->E
E -> E + T | T
T -> T * F | F
F -> (E) | id
I0=closure({[E’->.E]})
E’->.E
E->.E+T
E->.T
T->.T*F
T->.F
F->.(E)
F->.id
Constructing canonical LR(0) item sets
(cont.)
• Goto (I,X) where I is an item set and X is a grammar symbol is closure of
set of all items [A-> αX. β] where [A-> α.X β] is in I
• Example
I0=closure({[E’->.E]})
E’->.E, E->.E+T, E->.T, T->.T*F, T->.F, F->.(E), F->.id
Goto(I0, E) = I1
E’->E., E->E.+T
Goto(I0, T) = I2
E->T., T->T.*F
Goto(I0, "(") = I4
F->(.E), E->.E+T, E->.T, T->.T*F, T->.F, F->.(E), F->.id
Canonical LR(0) items
void items(G’) {
    C = CLOSURE({[S’->.S]});
    repeat
        for (each set of items I in C)
            for (each grammar symbol X)
                if (GOTO(I,X) is not empty and not in C)
                    add GOTO(I,X) to C;
    until no new sets of items are added to C on a round;
}
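The loop above can be sketched in Python for the expression grammar. This is an illustration under assumptions: items are (head, body, dot) tuples, `SYMBOLS` fixes the order in which grammar symbols are tried, and all names are invented for the example. The closure and goto helpers are repeated so the block runs on its own.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}
SYMBOLS = ["E", "T", "F", "+", "*", "(", ")", "id"]

def closure(items):
    result, work = set(items), list(items)
    while work:
        _, body, dot = work.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for production in GRAMMAR[body[dot]]:
                item = (body[dot], production, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)

def goto(items, symbol):
    return closure({(h, b, d + 1) for (h, b, d) in items
                    if d < len(b) and b[d] == symbol})

def canonical_collection():
    """Return the item sets in discovery order plus the transition map."""
    states = [closure({("E'", ("E",), 0)})]
    transitions = {}
    for state in states:                 # the list grows while it is traversed
        for symbol in SYMBOLS:
            target = goto(state, symbol)
            if not target:
                continue
            if target not in states:
                states.append(target)
            transitions[(states.index(state), symbol)] = states.index(target)
    return states, transitions

states, transitions = canonical_collection()
print(len(states), "item sets,", len(transitions), "transitions")
```

With this symbol order the sets are discovered in the order I0, I1, …, I11 used in the automaton for this grammar; a different order would produce the same sets under different numbers.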
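Example
E’->E
E -> E + T | T
T -> T * F | F
F -> (E) | id
[Figure: the LR(0) automaton for this grammar, listed state by state with its transitions]
I0=closure({[E’->.E]}): E’->.E, E->.E+T, E->.T, T->.T*F, T->.F, F->.(E), F->.id
    on E: I1, on T: I2, on F: I3, on (: I4, on id: I5
I1: E’->E., E->E.+T
    on $: acc, on +: I6
I2: E->T., T->T.*F
    on *: I7
I3: T->F.
I4: F->(.E), E->.E+T, E->.T, T->.T*F, T->.F, F->.(E), F->.id
    on E: I8, on T: I2, on F: I3, on (: I4, on id: I5
I5: F->id.
I6: E->E+.T, T->.T*F, T->.F, F->.(E), F->.id
    on T: I9, on F: I3, on (: I4, on id: I5
I7: T->T*.F, F->.(E), F->.id
    on F: I10, on (: I4, on id: I5
I8: F->(E.), E->E.+T
    on ): I11, on +: I6
I9: E->E+T., T->T.*F
    on *: I7
I10: T->T*F.
I11: F->(E).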
Use of LR(0) automaton
• Example: id*id
[Figure: LR parser reading the input a1 … ai … an $]
• Method (constructing the SLR(1) parsing table)
• Construct C={I0,I1, … , In}, the collection of sets of LR(0) items for G’
• State i is constructed from Ii:
• If [A->α.aβ] is in Ii and Goto(Ii,a)=Ij, then set ACTION[i,a] to “shift j” (a must be a terminal)
• If [A->α.] is in Ii, then set ACTION[i,a] to “reduce A->α” for all a in FOLLOW(A) (A may not be S’)
• If [S’->S.] is in Ii, then set ACTION[i,$] to “Accept”
• If any conflict appears then we say that the grammar is not SLR(1).
• If GOTO(Ii,A) = Ij then GOTO[i,A]=j
• All entries not defined by the above rules are made “error”
• The initial state of the parser is the one constructed from the set of items
containing [S’->.S]
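Once the table exists, the driver program is small. The sketch below hard-codes the SLR(1) table that the method above yields for the expression grammar, assuming the state numbering I0 to I11 and the production numbering 1: E->E+T, 2: E->T, 3: T->T*F, 4: T->F, 5: F->(E), 6: F->id. The encoding ("s5", "r6", "acc") and the names `PRODS`, `ACTION`, `GOTO` and `parse` are choices made for this example.

```python
# Production number -> (head, length of body).
PRODS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

# Missing entries mean "error".
ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}

def parse(tokens):
    """Return the list of production numbers used, or None on a syntax error."""
    stack = [0]                       # only states are kept on the stack
    buffer = tokens + ["$"]
    pos = 0
    reductions = []
    while True:
        action = ACTION[stack[-1]].get(buffer[pos])
        if action is None:
            return None               # error entry
        if action == "acc":
            return reductions
        if action[0] == "s":          # shift: push the new state, advance input
            stack.append(int(action[1:]))
            pos += 1
        else:                         # reduce: pop |body| states, then use GOTO
            number = int(action[1:])
            head, length = PRODS[number]
            del stack[-length:]
            stack.append(GOTO[(stack[-1], head)])
            reductions.append(number)

print(parse(["id", "*", "id"]))
```

For id*id the reductions come out in the same order as in the handle-pruning example: F->id, T->F, F->id, T->T*F, E->T.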
Example grammar which is not SLR(1)
S -> L=R | R
L -> *R | id
R -> L
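I0: S’->.S, S->.L=R, S->.R, L->.*R, L->.id, R->.L
I1: S’->S.
I2: S->L.=R, R->L.
I3: S->R.
I4: L->*.R, R->.L, L->.*R, L->.id
I5: L->id.
I6: S->L=.R, R->.L, L->.*R, L->.id
I7: L->*R.
I8: R->L.
I9: S->L=R.
Action table entry for state 2 on input =:
Shift 6 (from S->L.=R)
Reduce R->L (from R->L., because = is in FOLLOW(R))
Both actions are requested, so state 2 has a shift-reduce conflict and the grammar is not SLR(1).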
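The conflict in state 2 comes from = being in FOLLOW(R). A small fixed-point computation can confirm this. The sketch relies on the fact that this grammar has no ε-productions, so FIRST of a body is just FIRST of its first symbol; it is not a general FIRST/FOLLOW routine, and the names used are illustrative.

```python
GRAMMAR = [
    ("S'", ["S"]),
    ("S", ["L", "=", "R"]),
    ("S", ["R"]),
    ("L", ["*", "R"]),
    ("L", ["id"]),
    ("R", ["L"]),
]
NONTERMINALS = {head for head, _ in GRAMMAR}

def first_sets():
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for head, body in GRAMMAR:
            x = body[0]  # no epsilon-productions: only the first symbol matters
            new = first[x] if x in NONTERMINALS else {x}
            if not new <= first[head]:
                first[head] |= new
                changed = True
    return first

def follow_sets():
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    follow["S'"].add("$")
    changed = True
    while changed:
        changed = False
        for head, body in GRAMMAR:
            for i, x in enumerate(body):
                if x not in NONTERMINALS:
                    continue
                if i + 1 < len(body):        # something follows x in this body
                    nxt = body[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:                        # x ends the body: inherit FOLLOW(head)
                    new = follow[head]
                if not new <= follow[x]:
                    follow[x] |= new
                    changed = True
    return follow

follow = follow_sets()
print(sorted(follow["R"]))
```

Because = is in FOLLOW(R), the SLR rule places "reduce R->L" in ACTION[2, =], where "shift 6" is already present.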
More powerful LR parsers
SetOfItems Goto(I,X) {
initialize J to be the empty set;
for (each item [A->α.Xβ,a] in I)
add item [A->αX.β,a] to set J;
return closure(J);
}
void items(G’){
initialize C to Closure({[S’->.S,$]});
repeat
for (each set of items I in C)
for (each grammar symbol X)
if (Goto(I,X) is not empty and not in C)
add Goto(I,X) to C;
until no new sets of items are added to C;
}
Example
S’->S
S->CC
C->cC
C->d
Canonical LR(1) parsing table
• Method
• Construct C={I0,I1, … , In}, the collection of LR(1) items for G’
• State i is constructed from Ii:
• If [A->α.aβ, b] is in Ii and Goto(Ii,a)=Ij, then set ACTION[i,a] to “shift j” (a must be a terminal)
• If [A->α., a] is in Ii and A is not S’, then set ACTION[i,a] to “reduce A->α”
• If [S’->S.,$] is in Ii, then set ACTION[i,$] to “Accept”
• If any conflict appears then we say that the grammar is not LR(1).
• If GOTO(Ii,A) = Ij then GOTO[i,A]=j
• All entries not defined by the above rules are made “error”
• The initial state of the parser is the one constructed from the set of items
containing [S’->.S,$]
LALR Parsing Table
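• States with the same core are merged and their lookaheads are unioned:
I4: C->d. , c/d
I7: C->d. , $
I47: C->d. , c/d/$
• Example grammar that is LR(1) but not LALR(1) (merging states produces a reduce-reduce conflict):
S’->S
S -> aAd | bBd | aBe | bAe
A -> c
B -> c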
An easy but space-consuming LALR table
construction
• Method:
1. Construct C={I0,I1,…,In} the collection of LR(1) items.
2. For each core among the set of LR(1) items, find all sets having that core,
and replace these sets by their union.
3. Let C’={J0,J1,…,Jm} be the resulting sets. The parsing actions for state i are constructed from Ji as before. If there is a conflict, the grammar is not LALR(1).
4. If J is the union of one or more sets of LR(1) items, that is J = I1 ∪ I2 ∪ … ∪ Ik, then the cores of Goto(I1,X), …, Goto(Ik,X) are the same, since I1, …, Ik all have the same core. Let K be the union of all sets of items having the same core as Goto(I1,X); then we set Goto(J,X) = K.
• This method is not efficient; a more efficient one is discussed in
the book
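Steps 1 and 2 of this method can be sketched for the example grammar S->CC, C->cC | d. The code is an illustration under assumptions: an LR(1) item is a tuple (head, body, dot, lookahead), and `first` is only valid because this grammar has no ε-productions and no left recursion. All names are invented for the example.

```python
GRAMMAR = {"S'": [("S",)], "S": [("C", "C")], "C": [("c", "C"), ("d",)]}
SYMBOLS = ["S", "C", "c", "d"]

def first(symbol):
    """FIRST of a single symbol (assumes no epsilon and no left recursion)."""
    if symbol not in GRAMMAR:
        return {symbol}
    result = set()
    for body in GRAMMAR[symbol]:
        result |= first(body[0])
    return result

def closure(items):
    result, work = set(items), list(items)
    while work:
        _, body, dot, look = work.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            # For [A -> alpha . B beta, a] the new lookaheads are FIRST(beta a).
            looks = first(body[dot + 1]) if dot + 1 < len(body) else {look}
            for production in GRAMMAR[body[dot]]:
                for b in looks:
                    item = (body[dot], production, 0, b)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)

def goto(items, symbol):
    return closure({(h, b, d + 1, a) for (h, b, d, a) in items
                    if d < len(b) and b[d] == symbol})

def lr1_collection():
    states = [closure({("S'", ("S",), 0, "$")})]
    for state in states:                     # the list grows while it is traversed
        for symbol in SYMBOLS:
            target = goto(state, symbol)
            if target and target not in states:
                states.append(target)
    return states

def core(state):
    """The set of first components: the items with lookaheads stripped."""
    return frozenset((h, b, d) for (h, b, d, _) in state)

def merge_by_core(states):
    merged = {}
    for state in states:
        merged.setdefault(core(state), set()).update(state)
    return merged

lr1_states = lr1_collection()
lalr_states = merge_by_core(lr1_states)
print(len(lr1_states), "LR(1) sets ->", len(lalr_states), "LALR sets")
```

The merged set with core C->d. carries the lookaheads c, d and $, which is the state I47 shown for this grammar.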
Compaction of LR parsing table
Disadvantage:
[Figure: parse tree for the string abcd, with S -> a A d and A -> b c]
String : abcd
$ ⋖ a ⋖ b ≐ c ⋗ d ⋗ $
Operator precedence parsing
1. Suppose A -> X1 X2 … Xn is a production.
• If Xi and Xi+1 are terminals, then Xi ≐ Xi+1
• If Xi and Xi+2 are terminals and Xi+1 is a non terminal, then Xi ≐ Xi+2
• If Xi is a terminal and Xi+1 is a non terminal, then Xi ⋖ a for every a in Lead(Xi+1)
• If Xi is a non terminal and Xi+1 is a terminal, then a ⋗ Xi+1 for every a in Trail(Xi)
• $ ⋖ Lead(S) and Trail(S) ⋗ $
Operator precedence parsing
• Example:
1) T -> T +T | T * T | id
Rows: terminal on top of the stack. Columns: current input terminal.
      id    +     *     $
id    -     ⋗     ⋗     ⋗
+     ⋖     ⋗     ⋖     ⋗
*     ⋖     ⋗     ⋗     ⋗
$     ⋖     ⋖     ⋖     -
Operator precedence parsing
1) T -> T + T | T * T | id
Input String: id+id*id $
Operator Precedence Relation: id ⋗ * ⋗ + ⋗ $
Step   Relation    Action    Top of the stack
1.     $ ⋖ id      push      $ id
2.     id ⋗ +      pop       $
3.     $ ⋖ +       push      $ +
4.     + ⋖ id      push      $ + id
5.     id ⋗ *      pop       $ +
6.     + ⋖ *       push      $ + *
7.     * ⋖ id      push      $ + * id
8.     id ⋗ $      pop       $ + *
9.     * ⋗ $       pop       $ +
10.    + ⋗ $       pop       $
11.    $ $         stop      Accept
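The push/pop trace can be reproduced with a few lines of Python. The sketch follows the trace literally: the stack holds terminals only, ⋖ pushes the input terminal and ⋗ pops the top of the stack. It does not check that what was popped matches a production body, so it is a simplification rather than a complete operator precedence parser. `TABLE` and `parse` are names made up for the example, and "<" and ">" stand for ⋖ and ⋗.

```python
# Precedence relations: TABLE[top of stack][current input].
TABLE = {
    "id": {"+": ">", "*": ">", "$": ">"},
    "+": {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*": {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$": {"id": "<", "+": "<", "*": "<"},
}

def parse(tokens):
    """Return the list of steps taken; raise SyntaxError on a blank table entry."""
    stack = ["$"]
    buffer = tokens + ["$"]
    pos = 0
    steps = []
    while True:
        top, current = stack[-1], buffer[pos]
        if top == "$" and current == "$":
            steps.append("accept")
            return steps
        relation = TABLE[top].get(current)
        if relation == "<":                  # top yields precedence: push input
            stack.append(current)
            pos += 1
            steps.append(f"{top} < {current}: push")
        elif relation == ">":                # top takes precedence: pop it
            stack.pop()
            steps.append(f"{top} > {current}: pop")
        else:
            raise SyntaxError(f"no precedence relation between {top} and {current}")

for number, step in enumerate(parse(["id", "+", "id", "*", "id"]), start=1):
    print(number, step)
```

Because handles are not matched against productions, this sketch accepts some strings a full parser would reject (for example a lone +); it is meant only to mirror the trace.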
[Figure: parse tree built for the input string id + id * id $]
Operator precedence parsing
• Example:
1) T -> T + T | T * T | id
Precedence relation table, Size: N*N
      id    +     *     $
id    -     ⋗     ⋗     ⋗
+     ⋖     ⋗     ⋖     ⋗
*     ⋖     ⋗     ⋗     ⋗
$     ⋖     ⋖     ⋖     -
The graph representing the precedence functions is- (not in closed loop)
fid → g* → f+ → g+ → f$
gid → f* → g* → f+ → g+ → f$
f(a) and g(a) are the lengths of the longest paths starting at fa and ga.
Precedence function table, Size: 2*N
      id    +     *     $
f     4     2     4     0
g     5     1     3     0
fid < gid …. 4<5
f+ < g* ……. 2<3
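The f and g values can be computed from the relation table by building the graph and taking longest paths. In the sketch below, a ⋗ b adds an edge fa → gb and a ⋖ b adds an edge gb → fa. It assumes the graph has no cycles and that the table contains no ≐ entries, which holds for this example; the names are illustrative and "<" and ">" stand for ⋖ and ⋗.

```python
from functools import lru_cache

# Precedence relations: TABLE[a][b] is the relation between a (stack) and b (input).
TABLE = {
    "id": {"+": ">", "*": ">", "$": ">"},
    "+": {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*": {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$": {"id": "<", "+": "<", "*": "<"},
}

def precedence_functions(table):
    """Return (f, g) as dictionaries, using longest paths in the f/g graph."""
    edges = {}
    for a, row in table.items():
        for b, relation in row.items():
            if relation == ">":              # a > b: edge from f_a to g_b
                edges.setdefault(("f", a), set()).add(("g", b))
            elif relation == "<":            # a < b: edge from g_b to f_a
                edges.setdefault(("g", b), set()).add(("f", a))

    @lru_cache(maxsize=None)
    def longest(node):
        # Length of the longest path starting at node (0 for a sink).
        return max((1 + longest(nxt) for nxt in edges.get(node, ())), default=0)

    f = {a: longest(("f", a)) for a in table}
    g = {a: longest(("g", a)) for a in table}
    return f, g

f, g = precedence_functions(TABLE)
print("f:", f)
print("g:", g)
```

The two dictionaries take 2*N entries instead of the N*N entries of the relation table, at the cost of losing the blank (error) entries.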