Compiler Design Unit 3
UNIT – III
More Powerful LR Parsers (LR(1), LALR), Using Ambiguous Grammars, Error Recovery in LR Parsers,
Syntax Directed Translations: Definition, Evaluation Order of SDDs, Applications of SDT, Syntax
Directed Translation Schemes.
CLR stands for canonical LR. CLR parsing uses the canonical collection of LR(1) items to build the
CLR(1) parsing table, which has more states than the SLR(1) parsing table.
In CLR(1), we place a reduce entry only under the lookahead symbols of the completed item.
In the SLR method we worked with LR(0) items; in CLR parsing we use LR(1) items. An LR(k) item
carries a lookahead of length k, so an LR(1) item has two parts: the LR(0) item and the lookahead
associated with it. The lookahead determines where the reduce action for a completed item is placed.
The lookahead $ is always used for the augmented production.
LR(1) parsers are more powerful than SLR(1) parsers.
For LR(1) items we modify the Closure and GOTO functions.
Closure Operation
Closure(I)
repeat
    for ( each item [ A -> α.Bβ, a ] in I )
        for ( each production B -> γ in G' )
            for ( each terminal b in FIRST(βa) )
                add [ B -> .γ, b ] to set I;
until no more items are added to I;
return I;
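As a rough illustration of how the Closure operation can be realized, the following Python sketch computes Closure for LR(1) items. The grammar encoding, the item representation (lhs, rhs, dot position, lookahead) and the helper first_of_string are made-up conveniences, and the grammar hard-coded here is the example grammar used later in this section (S' -> S, S -> CC, C -> cC | d).

# A rough Python sketch of Closure for LR(1) items (illustrative names only).
GRAMMAR = {
    "S'": [("S",)],
    "S":  [("C", "C")],
    "C":  [("c", "C"), ("d",)],
}
NONTERMINALS = set(GRAMMAR)

def first_of_string(symbols):
    # FIRST of a string of grammar symbols; this grammar has no epsilon productions
    if not symbols:
        return set()
    sym = symbols[0]
    if sym not in NONTERMINALS:          # a terminal: FIRST is the symbol itself
        return {sym}
    result = set()
    for rhs in GRAMMAR[sym]:
        result |= first_of_string(rhs)
    return result

def closure(items):
    # items: a set of LR(1) items, each written as (lhs, rhs, dot_position, lookahead)
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot, a) in list(items):
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:   # item [A -> alpha . B beta, a]
                B, beta = rhs[dot], rhs[dot + 1:]
                for b in first_of_string(beta + (a,)):        # each terminal b in FIRST(beta a)
                    for gamma in GRAMMAR[B]:
                        item = (B, gamma, 0, b)               # add [B -> . gamma, b]
                        if item not in items:
                            items.add(item)
                            changed = True
    return frozenset(items)

I0 = closure({("S'", ("S",), 0, "$")})    # closure of [S' -> .S, $], i.e. I0 below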
Goto Operation
Goto(I, X)
Initialise J to be the empty set;
for ( each item [ A -> α.Xβ, a ] in I )
    add item [ A -> αX.β, a ] to set J; /* move the dot one step */
return Closure(J); /* apply closure to the set */
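Continuing the same sketch, GOTO(I, X) can be written by moving the dot over X and closing the result; this reuses the item representation and closure() from the sketch above.

def goto(items, X):
    # move the dot over X in every item where X follows the dot, then take the closure
    moved = set()
    for (lhs, rhs, dot, a) in items:
        if dot < len(rhs) and rhs[dot] == X:
            moved.add((lhs, rhs, dot + 1, a))
    return closure(moved)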
LR(1) items
void items(G')
Initialise C to { Closure({ [ S' -> .S, $ ] }) };
repeat
    for ( each set of items I in C )
        for ( each grammar symbol X )
            if ( GOTO(I, X) is not empty and not in C )
                add GOTO(I, X) to C;
until no new sets of items are added to C;
The initial state of the parser is the one constructed from the set of items containing [ S' -> .S, $ ].
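A sketch of the items(G') procedure in the same style: start from Closure({[S' -> .S, $]}) and repeatedly apply GOTO until no new set of items appears. It reuses the closure() and goto() sketches above; the helper names are illustrative.

def canonical_collection():
    # collect every grammar symbol that appears in some production body
    symbols = set()
    for bodies in GRAMMAR.values():
        for rhs in bodies:
            symbols.update(rhs)
    C = [closure({("S'", ("S",), 0, "$")})]   # C = { Closure({[S' -> .S, $]}) }
    changed = True
    while changed:
        changed = False
        for I in list(C):
            for X in symbols:
                J = goto(I, X)
                if J and J not in C:          # GOTO(I, X) not empty and not already in C
                    C.append(J)
                    changed = True
    return C

lr1_states = canonical_collection()
print(len(lr1_states))    # 10 sets of items, matching I0..I9 in the example below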
Example,
Consider the following grammar,
S' -> S
S -> CC
C -> cC
C -> d
Sets of LR(1) items
I0: S' -> .S, $
    S -> .CC, $
    C -> .cC, c/d
    C -> .d, c/d
I1: S' -> S., $
I2: S -> C.C, $
    C -> .cC, $
    C -> .d, $
I3: C -> c.C, c/d
    C -> .cC, c/d
    C -> .d, c/d
I4: C -> d., c/d
I5: S -> CC., $
I6: C -> c.C, $
    C -> .cC, $
    C -> .d, $
I7: C -> d., $
I8: C -> cC., c/d
I9: C -> cC., $
3.3 LALR PARSER:
We begin with two observations. First, some of the states generated for LR(1) parsing
have the same set of core (or first) components and differ only in their second component,
the lookahead symbol. Our intuition is that we should be able to merge these states and
reduce the number of states we have, getting close to the number of states that would be
generated for LR(0) parsing. This observation suggests a hybrid approach: We can construct
the canonical LR(1) sets of items and then look for sets of items having the same core. We
merge these sets with common cores into one set of items. The merging of states with
common cores can never produce a shift/reduce conflict that was not present in one of the
original states because shift actions depend only on the core, not the lookahead. But it is
possible for the merger to produce a reduce/reduce conflict.
Our second observation is that we are really only interested in the lookahead symbol
in places where there is a problem. So our next thought is to take the LR(0) set of items and
add lookaheads only where they are needed. This leads to a more efficient, but much more
complicated method.
3.4 ALGORITHM FOR EASY CONSTRUCTION OF AN LALR TABLE
Input: G'
Output: LALR parsing table functions with action and goto for G'.
Method:
1. Construct C = {I0, I1 , ..., In} the collection of sets of LR(1) items for G'.
2. For each core present among the set of LR(1) items, find all sets having that core
and replace these sets by the union.
3. Let C' = {J0, J1 , ..., Jm} be the resulting sets of LR(1) items. The parsing actions
for state i are constructed from Ji in the same manner as in the construction of the
canonical LR parsing table.
4. If there is a conflict, the grammar is not LALR(1) and the algorithm fails.
5. The goto table is constructed as follows: If J is the union of one or more sets of
LR(1) items, that is, J = I0 U I1 U ... U Ik, then the cores of goto(I0, X), goto(I1,
X), ..., goto(Ik, X) are the same, since I0, I1, ..., Ik all have the same core. Let K
be the union of all sets of items having the same core as goto(I1, X).
6. Then goto(J, X) = K.
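Step 2 of the algorithm (merging sets with a common core) can be sketched as follows, again building on the Python sketches above; core() simply drops the lookahead component of every item.

def core(item_set):
    # the LR(0) core of an LR(1) set of items: the same items with lookaheads dropped
    return frozenset((lhs, rhs, dot) for (lhs, rhs, dot, _a) in item_set)

def merge_by_core(lr1_states):
    merged = {}                                # core -> union of the LR(1) items
    for I in lr1_states:
        merged.setdefault(core(I), set()).update(I)
    return [frozenset(s) for s in merged.values()]

lalr_states = merge_by_core(lr1_states)
print(len(lalr_states))    # 7 sets: I0, I1, I2, I5 and the unions I36, I47, I89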
Consider the above example. I3 and I6, I4 and I7, and I8 and I9 have the same cores, so each pair can be
replaced by its union:
I36: C -> c.C, c/d/$
     C -> .cC, c/d/$
     C -> .d, c/d/$
I47: C -> d., c/d/$
I89: C -> cC., c/d/$
Parsing Table
State |   c     d      $     |  S    C
  0   |  S36   S47           |  1    2
  1   |               Accept |
  2   |  S36   S47           |       5
 36   |  S36   S47           |       89
 47   |  R3    R3     R3     |
  5   |               R1     |
 89   |  R2    R2     R2     |
3.5 HANDLING ERRORS
The LALR parser may continue to do reductions after the LR parser would have spotted an
error, but the LALR parser will never do a shift after the point the LR parser would have
discovered the error and will eventually find the error.
In many programming languages one may write conditionally executed code in two forms: the if-then
form and the if-then-else form, where the else clause is optional; the optional else is one classic source
of a shift/reduce conflict. Another arises in an ambiguous expression grammar. If we have a*b+c and
we have parsed a*b, do we reduce using E ::= E * E or do we shift more symbols? In the former
case we get the parse tree (a*b)+c; in the latter case we get a*(b+c). To resolve this conflict, we
can specify that * has higher precedence than +. The precedence of a grammar production is
equal to the precedence of the rightmost token at the rhs of the production. For example, the
precedence of the production E ::= E * E is equal to the precedence of the operator *, the
precedence of the production E ::= ( E ) is equal to the precedence of the token ), and the
precedence of the production E ::= if E then E else E is equal to the precedence of the token
else. The idea is that if the look ahead has higher precedence than the production currently
used, we shift. For example, if we are parsing E + E using the production rule E ::= E + E
and the look ahead is *, we shift *. If the look ahead has the same precedence as that of the
current production and is left associative, we reduce, otherwise we shift. The above grammar
is valid if we define the precedence and associativity of all the operators. Thus, it is very
important when you write a parser using CUP or any other LALR(1) parser generator to
specify associativities and precedences for most tokens (especially for those used as
operators). Note: you can explicitly define the precedence of a rule in CUP using the %prec
directive:
E ::= MINUS E %prec UMINUS
where UMINUS is a pseudo-token that has higher precedence than TIMES, MINUS etc, so
that -1*2 is equal to (-1)*2, not to -(1*2).
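The shift/reduce decision described above can be summarized in a few lines. The sketch below is not tied to CUP or any particular generator; the precedence and associativity tables are illustrative.

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
ASSOCIATIVITY = {"+": "left", "-": "left", "*": "left", "/": "left"}

def resolve(production_prec, lookahead):
    # decide a shift/reduce conflict from precedence and associativity
    la_prec = PRECEDENCE[lookahead]
    if la_prec > production_prec:
        return "shift"                 # lookahead binds tighter than the production
    if la_prec < production_prec:
        return "reduce"                # production binds tighter than the lookahead
    return "reduce" if ASSOCIATIVITY[lookahead] == "left" else "shift"

print(resolve(PRECEDENCE["*"], "+"))   # 'reduce': after a*b with lookahead +, giving (a*b)+c
print(resolve(PRECEDENCE["+"], "*"))   # 'shift':  after a+b with lookahead *, giving a+(b*c)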
S ::= L = E ;
  | { SL }
  | error ;
SL ::= S ;
  | SL S ;
The special token error indicates to the parser what to do in case of invalid syntax for S (an
invalid statement). In this case, it reads all the tokens from the input stream until it finds the
first semicolon. The way the parser handles this is to first push an error state in the stack. In
case of an error, the parser pops out elements from the stack until it finds an error state where
it can proceed. Then it discards tokens from the input until a restart is possible. Inserting
error handling productions in the proper places in a grammar to do good error recovery is
considered very hard.
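A rough sketch of this recovery loop, with a made-up ACTION table format (state -> {symbol: next state}): it pops states until one can shift the error pseudo-token, then skips input up to the next semicolon.

def recover(stack, tokens, pos, action):
    # pop states until one that has a transition on the 'error' pseudo-token
    while stack and "error" not in action.get(stack[-1], {}):
        stack.pop()
    if not stack:
        raise SyntaxError("no recovery possible")
    stack.append(action[stack[-1]]["error"])    # push the error state
    while pos < len(tokens) and tokens[pos] != ";":
        pos += 1                                # discard tokens until a restart is possible
    return stack, pos

ACTION = {0: {"error": 7}}                      # toy table: only state 0 can shift 'error'
print(recover([0, 3, 5], ["x", "@", "+", ";", "y"], 0, ACTION))   # ([0, 7], 3)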
An LR parser will detect an error when it consults the parsing action table and finds a
blank or error entry. Errors are never detected by consulting the goto table. An LR parser will
detect an error as soon as there is no valid continuation for the portion of the input scanned thus far.
The actions may include insertion or deletion of symbols from the stack or the input
or both, or alteration and transposition of input symbols. We must make our choices so that
the LR parser will not get into an infinite loop. A safe strategy will assure that at least one
input symbol will be removed or shifted eventually, or that the stack will eventually shrink if
the end of the input has been reached. Popping a stack state that covers a nonterminal should
be avoided, because this modification eliminates from the stack a construct that has already
been successfully parsed.
Syntax Directed Translations
We associate information with a language construct by attaching attributes to the grammar symbol(s)
representing the construct. A syntax-directed definition (SDD) specifies the values of attributes by
associating semantic rules with the grammar productions. For example, an infix-to-postfix translator
might have the production E -> E1 + T with the rule E.code = E1.code || T.code || '+'.
A syntax-directed translation scheme (SDT) embeds program fragments called semantic actions within
production bodies.
There are two kinds of attributes:
1. Synthesized attributes: computed from the values of the attributes of the child nodes.
2. Inherited attributes: computed from the values of the attributes of the siblings and the parent node.
S-ATTRIBUTED DEFINITIONS
Definition. An S-Attributed Definition is a Syntax Directed Definition that uses
only synthesized attributes.
L-attributed definition
Definition: An SDD is L-attributed if each inherited attribute of Xi in the RHS of A -> X1 X2 ... Xn
depends only on
1. the attributes of X1, X2, ..., Xi-1 (the symbols to the left of Xi in the RHS), and
2. the inherited attributes of A.
Evaluation orders for SDDs:
1. Dependency Graphs
2. Ordering the Evaluation of Attributes
3. S-Attributed Definitions
4. L-Attributed Definitions
"Dependency graphs" are a useful tool for determining an evaluation order for the attribute
instances in a given parse tree. While an annotated parse tree shows the values of attributes, a
dependency graph helps us determine how those values can be computed.
1 Dependency Graphs
A dependency graph depicts the flow of information among the attribute instances in a
particular parse tree; an edge from one attribute instance to another means that the value of the first is
needed to compute the second. Edges express constraints implied by the semantic rules. In more detail:
Suppose that a semantic rule associated with a production p defines the value of inherited
attribute B.c in terms of the value of X.a. Then, the dependency graph has an edge from X.a to B.c. For
each node N labeled B that corresponds to an occurrence of this B in the body of production p, create an
edge to attribute c at N from the attribute a at the node M that corresponds to this occurrence of X. Note
that M could be either the parent or a sibling of N.
Since a node N can have several children labeled X, we again assume that subscripts distinguish
among uses of the same symbol at different places in the production.
At every node N labeled E, with children corresponding to the body of the production E -> E1 + T
(with the rule E.val = E1.val + T.val), the synthesized attribute val at N is computed using the values of
val at the two children, labeled E and T. Thus, in every parse tree in which this production is used, the
dependency graph has an edge to the parent's val from the val of each child. As a convention, we show
the parse tree edges as dotted lines, while the edges of the dependency graph are solid.
The dependency graph characterizes the possible orders in which we can evaluate the attributes
at the various nodes of a parse tree. If the dependency graph has an edge from node M to node N, then
the attribute corresponding to M must be evaluated before the attribute of N. Thus, the only allowable
orders of evaluation are those sequences of nodes N1, N2,... ,Nk such that if there is an edge of the
dependency graph from Ni to Nj, then i < j. Such an ordering embeds a directed graph into a linear
order, and is called a topological sort of the graph.
If there is any cycle in the graph, then there are no topological sorts; that is, there is no way to
evaluate the SDD on this parse tree. If there are no cycles, however, then there is always at least one
topological sort
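A small sketch of this idea: represent the dependency graph as a dict from each attribute instance to the set of attribute instances computed from it, and pick an evaluation order by topological sort. The attribute names are illustrative.

def topological_order(edges):
    # edges: attribute instance -> set of attribute instances that depend on it
    nodes = set(edges) | {m for targets in edges.values() for m in targets}
    indegree = {n: 0 for n in nodes}
    for targets in edges.values():
        for m in targets:
            indegree[m] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for m in edges.get(n, ()):
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("cycle in the dependency graph: the SDD cannot be evaluated")
    return order

# for E -> E1 + T with E.val = E1.val + T.val: edges run from the children to the parent
print(topological_order({"E1.val": {"E.val"}, "T.val": {"E.val"}}))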
3. S-Attributed Definitions
An SDD is S-attributed if every attribute is synthesized. When an SDD is S-attributed, we can
evaluate its attributes in any bottom-up order of the nodes of the parse tree. It is often especially simple
to evaluate the attributes by performing a postorder traversal of the parse tree and evaluating the
attributes at a node N when the traversal leaves N for the last time.
S-attributed definitions can be implemented during bottom-up parsing, since a bottom-up parse
corresponds to a postorder traversal. Specifically, postorder corresponds exactly to the order in which an
LR parser reduces a production body to its head.
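For instance, an S-attributed rule such as E.val = E1.val + T.val can be evaluated by a plain postorder walk of the parse tree; the tuple encoding of the tree below is made up for the example.

def evaluate(node):
    # node is ('num', value) for a leaf, or (operator, left_subtree, right_subtree)
    if node[0] == "num":
        return node[1]                               # leaf: val supplied by the lexer
    op, left, right = node
    lval, rval = evaluate(left), evaluate(right)     # evaluate children first (postorder)
    return lval + rval if op == "+" else lval * rval

tree = ("+", ("*", ("num", 3), ("num", 5)), ("num", 4))   # parse tree for 3 * 5 + 4
print(evaluate(tree))                                     # 19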
4 L-Attributed Definitions
The idea behind this class is that, between the attributes associated with a production body,
dependency-graph edges can go from left to right, but not from right to left (hence "L-attributed"). More
precisely, each attribute must be either
1. Synthesized, or
2. Inherited, but with the rules limited as follows. Suppose that there is a production A -> X1 X2 .......
Xn, and that there is an inherited attribute Xi.a computed by a rule associated with this production.
Application of SDTS
1 Construction of Syntax Trees
2 The Structure of a Type
The main application is the construction of syntax trees. Since some compilers use syntax trees
as an intermediate representation, a common form of SDD turns its input string into a tree. To complete
the translation to intermediate code, the compiler may then walk the syntax tree, using another set of
rules that are in effect an SDD on the syntax tree rather than the parse tree.
Each node in a syntax tree represents a construct; the children of the node represent the
meaningful components of the construct. A syntax-tree node representing an expression E1 + E2 has
label + and two children representing the subexpressions E1 and E2.
We implement the nodes of a syntax tree by objects with a suitable number of fields. Each object
will have an op field that is the label of the node.
The objects will have additional fields as follows:
• If the node is a leaf, an additional field holds the lexical value for the leaf. A constructor function Leaf
(op, val) creates a leaf object. Alternatively, if nodes are viewed as records, then Leaf returns a pointer
to a new record for a leaf.
• If the node is an interior node, there are as many additional fields as the node has children in the
syntax tree. A constructor function Node takes two or more arguments: Node(op, c1, c2, ..., ck) creates
an object with first field op and k additional fields for the k children c1, ..., ck.
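A minimal sketch of these constructors in Python; the class and field names follow the description above (op, val, and the children), but the exact representation is illustrative.

class Leaf:
    def __init__(self, op, val):
        self.op = op              # label of the node, e.g. 'num' or 'id'
        self.val = val            # lexical value supplied for the leaf

class Node:
    def __init__(self, op, *children):
        self.op = op              # label of the node, e.g. '+' or '-'
        self.children = list(children)   # the k children c1, ..., ck

# syntax tree for a - 4 + c, grouped as (a - 4) + c
tree = Node("+", Node("-", Leaf("id", "a"), Leaf("num", 4)), Leaf("id", "c"))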
Example
Figure 5.11 shows the construction of a syntax tree for the input a - 4 + c. The nodes of the
syntax tree are shown as records, with the op field first. Syntax-tree edges are shown as solid lines.
The underlying parse tree, which need not actually be constructed, is shown with dotted edges.
Nonterminal B generates one of the basic types int and float. T generates a basic type when T derives B
C and C derives e. Otherwise, C generates array components consisting of a sequence of integers, each
integer surrounded by brackets.
An annotated parse tree for the input string int [ 2 ] [ 3 ] is shown in Fig. 5.17. The corresponding type
expression in Fig. 5.15 is constructed by passing the type integer from B, down the chain of C's through
the inherited attributes b. The array type is synthesized up the chain of C's through the attributes t.
In more detail, at the root for T -> B C, nonterminal C inherits the type from B, using the inherited
attribute C.b. At the rightmost node for C, the production is C -> e, so C.t equals C.b. The semantic rules
for the production C -> [ num ] C1 form C.t by applying the operator array to the operands num.val and
C1.t.
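The flow of the inherited attribute b down the chain of C's and of the synthesized attribute t back up can be sketched as follows, with the array bounds pre-collected into a Python list; the function names and the tuple encoding of types are illustrative.

def type_of(basic, bounds):
    # T -> B C: C.b = B.t, and T.t = C.t
    return c_type(bounds, b=basic)

def c_type(bounds, b):
    if not bounds:                        # C -> e: C.t = C.b
        return b
    num, rest = bounds[0], bounds[1:]     # C -> [ num ] C1
    return ("array", num, c_type(rest, b))    # C.t = array(num.val, C1.t)

print(type_of("integer", [2, 3]))   # int [2][3] ==> ('array', 2, ('array', 3, 'integer'))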
• If the parse is top-down, we perform the action a just before we attempt to expand this occurrence of
Y (if Y is a nonterminal) or check for Y on the input (if Y is a terminal).
First, consider the simple case, in which the only thing we care about is the order in which the
actions in an SDT are performed. For example, if each action simply prints a string, we care only about
the order in which the strings are printed. In this case, the following principle can guide us:
When transforming the grammar, treat the actions as if they were terminal symbols.
This principle is based on the idea that the grammar transformation preserves the order of the terminals
in the generated string. The actions are therefore executed in the same order in any left-to-right parse,
top-down or bottom-up.
The "trick" for eliminating left recursion is to take two productions
A -> Aa | b
that generate strings consisting of a j3 and any number of en's, and replace them by productions that
generate the same strings using a new nonterminal R (for "remainder") of the first production:
A->bR
R —»• aR | e
If (3 does not begin with A, then A no longer has a left-recursive production. In regular-definition terms,
with both sets of productions, A is defined by 0(a)*.
Example 5.17: Consider the following E-productions from an SDT for translating infix expressions
into postfix notation:
E -> E1 + T { print('+'); }
E -> T
If we apply the standard transformation to E, the remainder of the left-recursive production is
α = + T { print('+'); }
and the body of the other production is T. If we introduce R for the remainder of E, we get the set of
productions:
E -> T R
R -> + T { print('+'); } R
R -> ε
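Running the transformed SDT as a recursive-descent translator makes the order of the actions visible. The sketch below simplifies T to a single digit token and prints postfix output; the tokenization and function names are made up for the example.

def parse_E(tokens, pos=0):
    pos = parse_T(tokens, pos)            # E -> T R
    return parse_R(tokens, pos)

def parse_R(tokens, pos):
    if pos < len(tokens) and tokens[pos] == "+":   # R -> + T { print('+') } R
        pos = parse_T(tokens, pos + 1)
        print("+", end=" ")                        # the embedded semantic action
        return parse_R(tokens, pos)
    return pos                                     # R -> epsilon

def parse_T(tokens, pos):
    print(tokens[pos], end=" ")           # simplified: T is a single digit, printed directly
    return pos + 1

parse_E(list("9+5+2"))
print()                                   # prints: 9 5 + 2 +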
When the actions of an SDD compute attributes rather than merely printing output, we must be more
careful about how we eliminate left recursion from a grammar. However, if the SDD is S-attributed,
then we can always construct an SDT by placing attribute-computing actions at appropriate positions in
the new productions.