Syntax Directed Translation

Uploaded by

Laahiri
Copyright
© © All Rights Reserved
Syntax directed Translation

• A notational framework for intermediate code
generation that is an extension of context free
grammars.
• This allows subroutines or semantic actions to
be attached to the productions of context free
grammar.
• These subroutines generate intermediate code
when called at appropriate times by a parser
for that grammar.
• This enables the compiler designer to express
the generation of intermediate code directly in
terms of the syntactic structure of the source
language.
Semantic actions:
• In designing an intermediate code there are two
issues.
– What intermediate code should we generate?
– Implement an algorithm for generating this code.
• A syntax directed translation is merely a context
free grammar in which a program fragment
called an output action (semantic action or
semantic rule) is associated with each
production.
For ex: A → XYZ { α }
The action α is executed whenever the syntax analyzer
recognizes in its input a substring w which has a
derivation of the form A ⇒ XYZ ⇒* w.
• The output action may involve the
computation of values for variables belonging to the
compiler, generation of intermediate code,
printing of error messages or placement of some
value in a table.
• A value associated with a grammar symbol is
called a translation of that symbol.
• We usually denote the translation fields of
a grammar symbol X with names such as X.VAL, X.TRUE.
• For ex: suppose we have
E → E(1) + E(2) { E.VAL := E(1).VAL + E(2).VAL }
This kind of translation is called a synthesized translation.
Synthesized translations:
The value of the non terminal on the left side of
the production is defined as a function of the translations
of the non terminals on the right side.
Inherited Translations:
A → XYZ { Y.VAL := 2 * A.VAL }
Here the value of a non terminal on the right
side of the production is defined as a function of the
translation of the non terminal on the left side.
We shall use synthesized translations almost
exclusively, as they are more natural than inherited ones.
• Computation of Synthesized Attributes –
• Write the SDD using appropriate semantic
rules for each production in given grammar.
• The annotated parse tree is generated and
attribute values are computed in bottom up
manner.
• The value obtained at root node is the final
output.
Consider the following grammar
S --> E
E --> E1 + T
E --> T
T --> T1 * F
T --> F
F --> digit
The SDD for the above grammar is
Production        Semantic action
S --> E           { print E.VAL }
E --> E1 + T      { E.VAL := E1.VAL + T.VAL }
E --> T           { E.VAL := T.VAL }
T --> T1 * F      { T.VAL := T1.VAL * F.VAL }
T --> F           { T.VAL := F.VAL }
F --> digit       { F.VAL := digit.VAL }

Let us assume the input string 4 * 5 + 6 for computing
synthesized attributes, and consider the annotated parse tree
for this string.
• For computation of attributes we start from the leftmost
bottom node.
• The rule F --> digit is used to reduce digit to F. The
value of digit is obtained from the lexical analyzer and
becomes the value of F through the semantic action F.val =
digit.lexval. Hence F.val = 4, and since T is the parent node of F,
we get T.val = 4 from the semantic action T.val = F.val.
• Then, for the production T --> T1 * F, the corresponding
semantic action is T.val = T1.val * F.val. Hence T.val = 4 * 5
= 20.
• Similarly, E1.val + T.val becomes E.val, i.e.
E.val = E1.val + T.val = 20 + 6 = 26.
• Finally, the production S --> E is applied, and the
semantic action associated with it prints the result E.val = 26.
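The walk-through above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a hand-written recursive-descent routine rather than a bottom-up parser, the name evaluate is hypothetical, and multi-digit numbers are accepted where the grammar says digit. Each function returns the synthesized VAL of the non terminal it recognizes.

```python
import re

def evaluate(text):
    """Evaluate an expression over digits, + and *; every value returned is a synthesized .VAL."""
    # Assumption: any character other than digits, + and * is silently skipped.
    tokens = re.findall(r"\d+|[+*]", text)
    pos = 0

    def parse_f():
        # F --> digit   { F.VAL := digit.VAL }
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError("digit expected")
        val = int(tokens[pos])
        pos += 1
        return val

    def parse_t():
        # T --> T1 * F  { T.VAL := T1.VAL * F.VAL }   and   T --> F
        nonlocal pos
        val = parse_f()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            val = val * parse_f()
        return val

    def parse_e():
        # E --> E1 + T  { E.VAL := E1.VAL + T.VAL }   and   E --> T
        nonlocal pos
        val = parse_t()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            val = val + parse_t()
        return val

    result = parse_e()  # S --> E  { print E.VAL }, returned here instead of printed
    if pos != len(tokens):
        raise SyntaxError("unexpected token " + tokens[pos])
    return result
```

Calling evaluate("4 * 5 + 6") follows the same order as the bullets: F and T receive 4, T becomes 20, and E becomes 26.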
Computation of Inherited Attributes –
• Construct the SDD using semantic actions.
• The annotated parse tree is generated and
attribute values are computed in top down
manner.
Example: Consider the following grammar
S --> T L
T --> int
T --> float
T --> double
L --> L1, id
L --> id
The SDD for the above grammar is:
Production        Semantic Action
S --> T L         { L.in := T.Type }
T --> int         { T.Type := int }
T --> float       { T.Type := float }
T --> double      { T.Type := double }
L --> L1, id      { L1.in := L.in;
                    Enter_type(id.entry, L.in) }
L --> id          { Enter_type(id.entry, L.in) }
Let us assume the input string int a, c for computing
inherited attributes, and consider the annotated parse tree
for this string.
• The value of L.in at each L node is obtained from T.Type
of the sibling T node, which is the declared type:
int, float or double.
• Each L node then passes this type on to the identifiers a and c.
The computation of type is done in a top down
manner, i.e. in a preorder traversal.
• Using the function Enter_type, the type of the
identifiers a and c is inserted in the symbol table
at the corresponding id.entry.
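A minimal sketch of this top down flow, assuming the input has already been split into tokens. The names declare and enter_type and the dictionary used as a symbol table are illustrative stand-ins, not part of the original scheme.

```python
TYPES = ("int", "float", "double")

def enter_type(symbol_table, name, type_name):
    # Stand-in for Enter_type(id.entry, L.in): record the type of one identifier.
    symbol_table[name] = type_name

def declare(tokens):
    """Process a token list such as ["int", "a", ",", "c"] for S --> T L."""
    if not tokens or tokens[0] not in TYPES:
        raise SyntaxError("type keyword expected")
    t_type = tokens[0]      # T.Type, synthesized from the keyword
    l_in = t_type           # S --> T L   { L.in := T.Type }
    symbol_table = {}
    for tok in tokens[1:]:
        if tok != ",":
            # Every id in the list sees the same inherited value L.in.
            enter_type(symbol_table, tok, l_in)
    return symbol_table
```

For the input int a, c the call declare(["int", "a", ",", "c"]) enters both a and c with type int.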
Implementation of S.D.T
• The output defined is independent of the kind of
parsing technique used and of the kind of
mechanism used to compute the translations.
• Another convenience is that it is easy to
modify.
• New productions and semantic actions can
often be added without disturbing the existing
translations being computed.
One way to implement S.D.T is to use extra fields
in the parser stack entries corresponding to the
grammar symbols. These extra fields hold the
values of the corresponding translations.
If the ith STATE symbol is E, then VAL[i] will hold
the value of the translation E.VAL associated with
the parse tree node corresponding to this E.
For A → XYZ
STATE   VAL
Z       Z.VAL   ← TOP
Y       Y.VAL
X       X.VAL
S.D.T for Desk Calculator
Production          Semantic Action
S → E$              { print E.VAL }
E → E(1) + E(2)     { E.VAL := E(1).VAL + E(2).VAL }
E → E(1) * E(2)     { E.VAL := E(1).VAL * E(2).VAL }
E → (E(1))          { E.VAL := E(1).VAL }
E → I               { E.VAL := I.VAL }
I → I(1) digit      { I.VAL := 10 * I(1).VAL + LEXVAL }
I → digit           { I.VAL := LEXVAL }
Implementation of Desk calculator
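Production          Program Fragment
S → E$              print VAL[TOP]
E → E(1) + E(2)     VAL[TOP] := VAL[TOP] + VAL[TOP+2]
E → E(1) * E(2)     VAL[TOP] := VAL[TOP] * VAL[TOP+2]
E → (E(1))          VAL[TOP] := VAL[TOP+1]
E → I               none
I → I(1) digit      VAL[TOP] := 10 * VAL[TOP] + LEXVAL
I → digit           VAL[TOP] := LEXVAL
Here TOP denotes the stack position of the leftmost symbol of the
right side, which becomes the top of the stack after the reduction.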
Intermediate Code
In many compilers the source code is translated
into a language which is intermediate in
complexity between a programming language
and machine code.
Such a language is called intermediate code or
intermediate text.
Four kinds of intermediate code are commonly used in
compilers:
• Postfix notation
• Syntax trees
• Quadruples
• Triples.
Postfix notation:
The ordinary way of writing an expression is infix
notation (a+b), whereas the postfix notation or
reverse Polish notation for the same expression
is ab+.
In general, if e1 and e2 are two postfix
expressions and θ is a binary operator, the result
of applying θ to the values denoted by e1 and
e2 is indicated in postfix notation by e1 e2 θ.
Ex:
(a+b)*c → ab+c*
Postfix expressions can be easily derived for
binary operator expressions.
For a ternary operator such as the conditional
operator, the postfix expression is derived in the same way.
The statement if e then x else y can be written with the
ternary operator as e ? x : y, which is
represented in postfix notation as exy?

Ex: if a then if c-d then a+c else a*c else a+b
can be represented in postfix notation as
acd-ac+ac*?ab+?
One language that normally uses postfix
notation is SNOBOL language.
Control flow in Postfix code:
The solution for control flow in postfix code is to introduce
labels together with an unconditional jump and a variety of
conditional jumps such as jlt or jeqz.
Ex: if e then x else y
can be represented in control flow form as
e l1 jeqz x l2 jump l1: y l2:
Similarly,
if a then if c-d then a+c else a*c else a+b
becomes
a l1 jeqz cd- l2 jeqz ac+ l3 jump l2: ac* l3 jump
l1: ab+ l3:
S.D.T for Postfix code:
Production          Semantic Action
E → E(1) op E(2)    E.CODE := E(1).CODE || E(2).CODE || 'op'
E → (E(1))          E.CODE := E(1).CODE
E → id              E.CODE := id
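The three rules above can be tried out with a short sketch. It assumes the expression has already been parsed into nested tuples of the form (op, left, right) with identifier names as strings; that representation and the function name postfix are illustrative choices. Parentheses leave no trace in such a tree, which mirrors the rule E.CODE := E(1).CODE.

```python
def postfix(node):
    """Return E.CODE for an expression tree built from names and (op, left, right) tuples."""
    if isinstance(node, str):
        # E -> id : E.CODE := id
        return node
    op, left, right = node
    # E -> E(1) op E(2) : E.CODE := E(1).CODE || E(2).CODE || 'op'
    return postfix(left) + postfix(right) + op
```

For (a+b)*c, written as ("*", ("+", "a", "b"), "c"), the function concatenates the operand codes first and appends the operator last.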
Syntax Trees:
The parse tree itself is a suitable intermediate
representation. A parse tree, however, contains
redundant information which can be eliminated,
thus producing a more economical representation of the
source program called a syntax tree (abstract
syntax tree): a tree in which each leaf represents an operand
and each interior node an operator.
S.D.T for Syntax Trees:
Production          Semantic Action
E → E(1) op E(2)    { E.VAL := NODE(op, E(1).VAL, E(2).VAL) }
E → (E(1))          { E.VAL := E(1).VAL }
E → -E(1)           { E.VAL := UNARY(-, E(1).VAL) }
E → id              { E.VAL := LEAF(id) }
Three Address Code:
Three address code is a sequence of statements, typically of the
general form A := B op C.
Ex: the expression X + Y * Z can be written as
T1 := Y * Z
T2 := X + T1
where T1 and T2 are compiler generated
temporary names.
The three address statements commonly used are:
• A := B op C
• A := op B
• goto L
• if A relop B goto L
• param A and call P, n
• Indexed assignments: A := B[I] and A[I] := B
• Address and pointer assignments: A := addr B, A := *B and *A := B
Quadruples:
We may use a record structure with four fields:
Op, Arg1, Arg2 and Result. This representation is
called quadruples.
Ex: A := -B * (C + D)
T1 := -B
T2 := C + D
T3 := T1 * T2
A := T3
The quadruples for the statements are:
      Op       Arg1   Arg2   Result
(0)   Uminus   B      -      T1
(1)   +        C      D      T2
(2)   *        T1     T2     T3
(3)   :=       T3     -      A

Triples:
To avoid entering temporary names into the symbol
table, one can allow the statement computing a
temporary value to represent that value.
The triples for the statements are:
      Op       Arg1   Arg2
(0)   Uminus   B      -
(1)   +        C      D
(2)   *        (0)    (1)
(3)   :=       A      (2)
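The difference between the two representations can be seen by generating both from the same tree. The sketch below is an assumption-laden illustration: statements are given as (target, expr) pairs, expressions as names or tuples such as ("uminus", e) and (op, e1, e2), and the function names quadruples and triples are hypothetical.

```python
def quadruples(stmt):
    """Return a list of (Op, Arg1, Arg2, Result) tuples using temporaries T1, T2, ..."""
    quads = []
    counter = 0

    def place(e):
        nonlocal counter
        if isinstance(e, str):
            return e
        args = [place(sub) for sub in e[1:]]        # operands are translated first
        counter += 1
        temp = f"T{counter}"
        quads.append((e[0], args[0], args[1] if len(args) > 1 else "-", temp))
        return temp

    target, expr = stmt
    value = place(expr)
    quads.append((":=", value, "-", target))
    return quads

def triples(stmt):
    """Return a list of (Op, Arg1, Arg2) tuples; results are named by their position."""
    trips = []

    def ref(e):
        if isinstance(e, str):
            return e
        args = [ref(sub) for sub in e[1:]]
        trips.append((e[0], args[0], args[1] if len(args) > 1 else "-"))
        return f"({len(trips) - 1})"                # the triple number stands for its value

    target, expr = stmt
    value = ref(expr)
    trips.append((":=", target, value))
    return trips

# A := -B * (C + D)
EXAMPLE = ("A", ("*", ("uminus", "B"), ("+", "C", "D")))
```

Running both functions on EXAMPLE reproduces the rows of the two tables above, with Uminus spelled uminus.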

Indirect Triples:
Another implementation of three address code
is to list pointers to triples, rather than listing the
triples themselves. This implementation is naturally
called indirect triples.
The indirect triples for the statements are:
Statement list      Triples
                           Op       Arg1   Arg2
(0)   (14)          (14)   Uminus   B      -
(1)   (15)          (15)   +        C      D
(2)   (16)          (16)   *        (14)   (15)
(3)   (17)          (17)   :=       A      (16)

S.D.T for Assignment statements
Assume that all identifiers denote primitive data
types. First we begin with a simple scheme in which
semantic checking is not necessary.
Assignment statements with integer types:
A → id := E
E → E+E | E*E | -E | (E) | id
Here A stands for an assignment statement.
Abstract translation scheme:
We wish the translation of E to be a structure with two
fields:
• E.PLACE, the name that will hold the value of the
expression.
• E.CODE, a sequence of three address statements
evaluating the expression.
For the translation of A there is one field, A.CODE,
which is the three address code to execute the
assignment.
We use id.PLACE to denote the name corresponding
to this instance of the token id.
Production          Semantic Action
A → id := E         { A.CODE := E.CODE ||
                      id.PLACE || ':=' || E.PLACE }
E → E(1) + E(2)     { T := NEWTEMP();
                      E.PLACE := T;
                      E.CODE := E(1).CODE || E(2).CODE || E.PLACE || ':=' ||
                      E(1).PLACE || '+' || E(2).PLACE }
E → E(1) * E(2)     { T := NEWTEMP();
                      E.PLACE := T;
                      E.CODE := E(1).CODE || E(2).CODE || E.PLACE || ':=' ||
                      E(1).PLACE || '*' || E(2).PLACE }
E → -E(1)           { T := NEWTEMP();
                      E.PLACE := T;
                      E.CODE := E(1).CODE || E.PLACE || ':= -' || E(1).PLACE }
E → (E(1))          { E.PLACE := E(1).PLACE;
                      E.CODE := E(1).CODE }
E → id              { E.PLACE := id.PLACE;
                      E.CODE := null }

More concrete translation scheme:
E.PLACE and id.PLACE can be represented by symbol table pointers, and
these pointers can be stored on the parsing stack of a bottom up parser
along with the grammar symbols. E.CODE and A.CODE do not have to be
attached to a parse tree: it would be a time consuming process if we had
to repeatedly copy sequences of three address statements. Instead, each
statement is emitted as soon as it is known. For
notational convenience we shall use a procedure GEN(A := B + C).
GEN would enter the operator + and the values of A, B
and C into the quadruple array.

Production          Call to GEN
A → id := E         GEN(id.PLACE := E.PLACE)
E → E(1) + E(2)     GEN(E.PLACE := E(1).PLACE + E(2).PLACE)
E → E(1) * E(2)     GEN(E.PLACE := E(1).PLACE * E(2).PLACE)
E → -E(1)           GEN(E.PLACE := -E(1).PLACE)
E → (E(1))          None
E → id              None
Assignment statements with mixed types:
We can use an additional field, E.MODE, in the
translation of E, whose value is either real or
integer.
Ex:
X := Y + I * J where X and Y are real, I and J are integers.
T1 := I int* J
T2 := inttoreal T1
T3 := Y real+ T2
X := T3
The semantic action for E → E(1) op E(2) is:
T := NEWTEMP();
If E(1) .MODE = integer and E(2) .MODE = integer then
Begin
GEN( T:= E(1).PLACE intop E(2).PLACE );
E.MODE := integer
End
Else If E(1) .MODE = real and E(2) .MODE = real then
Begin
GEN( T:= E(1).PLACE realop E(2).PLACE );
E.MODE := real
End
Else If E(1) .MODE = integer and E(2) .MODE = real then
Begin
U:= NEWTEMP();
GEN(U:=inttoreal E(1).PLACE);
GEN( T:= U realop E(2).PLACE );
E.MODE := real
End
Else If E(1) .MODE = real and E(2) .MODE = integer then
Begin
U:= NEWTEMP();
GEN(U:=inttoreal E(2).PLACE);
GEN( T:= E(1).PLACE realop U);
E.MODE := real
End
E.PLACE := T;
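The four cases above can be folded into a short sketch. The assumptions are: expressions arrive as names or (op, left, right) tuples, the mode of every identifier is supplied in a dictionary, and temporaries are allocated in the order they are needed so that the output matches the X := Y + I * J example. The class name Translator and its methods are illustrative, and the assignment itself does no conversion.

```python
class Translator:
    def __init__(self, modes):
        self.modes = modes      # assumed input: "integer" or "real" for each identifier
        self.code = []          # generated three address statements
        self.count = 0

    def newtemp(self):
        self.count += 1
        return f"T{self.count}"

    def to_real(self, place, mode):
        # Emit an inttoreal conversion only when the operand is an integer.
        if mode == "real":
            return place
        u = self.newtemp()
        self.code.append(f"{u} := inttoreal {place}")
        return u

    def expr(self, e):
        """Return (PLACE, MODE) of e, appending statements to self.code."""
        if isinstance(e, str):
            return e, self.modes[e]
        op, left, right = e
        p1, m1 = self.expr(left)
        p2, m2 = self.expr(right)
        if m1 == "integer" and m2 == "integer":
            kind, mode = "int", "integer"
        else:
            kind, mode = "real", "real"
            p1 = self.to_real(p1, m1)
            p2 = self.to_real(p2, m2)
        t = self.newtemp()
        self.code.append(f"{t} := {p1} {kind}{op} {p2}")
        return t, mode

    def assign(self, target, e):
        place, _ = self.expr(e)
        self.code.append(f"{target} := {place}")
        return self.code
```

With X and Y declared real and I and J declared integer, assign("X", ("+", "Y", ("*", "I", "J"))) yields the four statements of the example.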
Boolean Expressions

Boolean expressions have two primary functions:
• Acting as conditional expressions in statements that alter the flow of
control.
• Computing logical values.
The following grammar describes Boolean expressions:
E → E or E | E and E | not E | (E) | id | id relop id
where relop is any of <, <=, =, !=, >, >=
Translating Boolean Expressions
There are two principal methods of representing the
value of a boolean expression.
• The first method is to encode true and false
numerically and evaluate a boolean expression
analogously to an arithmetic expression.
1 is used to denote true and 0 is used to denote false.
• The second method is by flow of control, i.e.
representing the value of a boolean expression by a
position reached in the program. This is convenient in
flow of control statements.
Numerical representation:
In numerical representation 1 is used to denote True and 0 is used to denote False.

For example, the translation for A or B and C is
T1 := B and C
T2 := A or T1
A relational expression such as A<B is equivalent to the
conditional statement
if A<B then 1 else 0
which can be translated into the three address statements
(1) if A<B goto (4)
(2) T := 0
(3) goto (5)
(4) T := 1
where (5) is the statement that follows.
Thus the boolean expression A<B or C
can be translated into
(1) if A<B goto (4)
(2) T := 0
(3) goto (5)
(4) T := 1
(5) T2 := T or C
S.D.T For Boolean expressions with
Numerical representation:
Production              Semantic Rule
E → E(1) or E(2)        { T := NEWTEMP();
                          E.PLACE := T;
                          GEN(T := E(1).PLACE or E(2).PLACE) }
E → id(1) relop id(2)   { T := NEWTEMP();
                          E.PLACE := T;
                          GEN(if id(1).PLACE relop id(2).PLACE goto
                          NEXTQUAD+3);
                          GEN(T := 0);
                          GEN(GOTO NEXTQUAD+2);
                          GEN(T := 1) }
S.D.T For Boolean expressions with Control Flow
representation:
If we represent boolean values by position in the program, we may
be able to avoid evaluating the entire expression.
Ex: given the expression A or B, if we determine that A
is true, then we can conclude that the expression is
true without evaluating B.
Similarly, for the expression A and B, if we
determine that A is false, then we can conclude that the
expression is false without evaluating B.
The semantic definition of the language determines whether all
parts of a boolean expression must be evaluated.
In the context of conditional
statements:
if E then S(1) else S(2) and while E do S
In these contexts we can associate two
kinds of exits with the boolean
expression: a true exit and a false exit.
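IF statement: code for E, whose TRUE exit leads to the code for S(1)
and whose FALSE exit leads to the code for S(2).
WHILE statement: code for E, whose TRUE exit leads to the code for S,
followed by a GOTO back to the code for E; the FALSE exit leaves the loop.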
An expression E will be translated into a
sequence of three address statements that
evaluate E. This translation is a sequence of
conditional and unconditional jumps to one of
two locations.
Ex: consider the expression E(1) or E(2). If E(1) is
true, then we know that E itself is true. So we
can make the location TRUE for E(1) be the same
as TRUE for E.
If E(1) is false then we must evaluate E(2), so we
make FALSE for E(1) be the first statement in the
code for E(2). The true and false exits of E(2) can be
made the same as the true and false exits of E.
The problem that arises when we produce code
bottom up is that we may not yet have generated the
actual quadruples to which jumps are to be
made.
We therefore generate jumps with their targets left unfilled and
keep lists of such quadruples. We call the subsequent filling in of
quadruples backpatching.
To manipulate the lists of quadruples, we use
three functions:
MAKELIST(i): creates a new list containing only i,
an index into the array of quadruples being
generated.
MERGE(p1, p2): takes the lists pointed to by p1 and p2 and
concatenates them into one list.
BACKPATCH(p, i): makes each of the quadruples
on the list pointed to by p take quadruple i as its
target.
We need to introduce a marker non terminal M, with
production
M → ε { M.QUAD := NEXTQUAD }
The revised grammar is
E → E(1) or M E(2)
  | E(1) and M E(2)
  | not E(1)
  | (E(1))
  | id
  | id(1) relop id(2)
M → ε
SDT for boolean expressions:
E → E(1) or M E(2)
{ BACKPATCH(E(1).FALSE, M.QUAD);
E.TRUE := MERGE(E(1).TRUE, E(2).TRUE);
E.FALSE := E(2).FALSE }
E → E(1) and M E(2)
{ BACKPATCH(E(1).TRUE, M.QUAD);
E.TRUE := E(2).TRUE;
E.FALSE := MERGE(E(1).FALSE, E(2).FALSE) }
E → not E(1)
{ E.TRUE := E(1).FALSE;
E.FALSE := E(1).TRUE }
E → (E(1))
{ E.TRUE := E(1).TRUE;
E.FALSE := E(1).FALSE }
E → id
{ E.TRUE := MAKELIST(NEXTQUAD);
E.FALSE := MAKELIST(NEXTQUAD+1);
GEN(if id.PLACE goto _);
GEN(goto _) }
E → id(1) relop id(2)
{ E.TRUE := MAKELIST(NEXTQUAD);
E.FALSE := MAKELIST(NEXTQUAD+1);
GEN(if id(1).PLACE relop id(2).PLACE goto _);
GEN(goto _) }
M → ε { M.QUAD := NEXTQUAD }
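A small sketch may help to see backpatching at work. It assumes quadruples are numbered from 0 and stored as [text, target] pairs with target None while unfilled; the class Quads and the helper names relop, or_ and and_ are illustrative. The reductions are called by hand in the order a bottom up parser would perform them for A < B or C < D and P < Q.

```python
class Quads:
    def __init__(self):
        self.quads = []                 # each entry is [text, target]; None = not yet filled

    @property
    def nextquad(self):
        return len(self.quads)

    def gen(self, text):
        self.quads.append([text, None])

    def backpatch(self, lst, target):
        for i in lst:
            self.quads[i][1] = target

def makelist(i):
    return [i]

def merge(p1, p2):
    return p1 + p2

def relop(q, a, op, b):
    # E -> id(1) relop id(2): returns (E.TRUE, E.FALSE)
    true, false = makelist(q.nextquad), makelist(q.nextquad + 1)
    q.gen(f"if {a} {op} {b} goto")
    q.gen("goto")
    return true, false

def or_(q, e1, m_quad, e2):
    # E -> E(1) or M E(2)
    q.backpatch(e1[1], m_quad)
    return merge(e1[0], e2[0]), e2[1]

def and_(q, e1, m_quad, e2):
    # E -> E(1) and M E(2)
    q.backpatch(e1[0], m_quad)
    return e2[0], merge(e1[1], e2[1])

q = Quads()
e1 = relop(q, "A", "<", "B")
m1 = q.nextquad                         # M.QUAD recorded before the right operand of "or"
e2 = relop(q, "C", "<", "D")
m2 = q.nextquad                         # M.QUAD recorded before the right operand of "and"
e3 = relop(q, "P", "<", "Q")
e_true, e_false = or_(q, e1, m1, and_(q, e2, m2, e3))
```

After these calls only the jumps on e_true and e_false are still unfilled; they are patched later by the statement that contains the expression.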


SDT for Mixed Mode Expressions
E → E(1) + E(2)
{ E.MODE := arith;
If (E(1) .MODE = arith and E(2) .MODE = arith)
then
Begin
T := NEWTEMP();
E.PLACE := T;
GEN(T := E(1) .PLACE + E(2) .PLACE)
END
Else if (E(1) .MODE = bool and E(2) .MODE = arith)
then
Begin
T := NEWTEMP();
E.PLACE := T;
BACKPATCH(E(1) .FALSE, NEXTQUAD);
GEN(T := E(2) .PLACE);
GEN(GOTO NEXTQUAD+2);
BACKPATCH(E(1) .TRUE, NEXTQUAD);
GEN(T := E(2) .PLACE+1);
END
Else if (E(1) .MODE = arith and E(2) .MODE = bool)
then
Begin
T := NEWTEMP();
E.PLACE := T;
BACKPATCH(E(2) .FALSE, NEXTQUAD);
GEN(T := E(1) .PLACE);
GEN(GOTO NEXTQUAD+2);
BACKPATCH(E(2) .TRUE, NEXTQUAD);
GEN(T := E(1) .PLACE+1);
END
Else if (E(1) .MODE = bool and E(2) .MODE = bool)
then
Begin
T := NEWTEMP();
E.PLACE := T;
BACKPATCH(E(1) .FALSE, NEXTQUAD);
BACKPATCH(E(2) .FALSE, NEXTQUAD);
GEN(T := 0);
GEN(GOTO NEXTQUAD+2);
BACKPATCH(E(2) .TRUE, NEXTQUAD);
GEN(T := 1);
BACKPATCH(E(1) .TRUE, NEXTQUAD);
BACKPATCH(E(2) .FALSE, NEXTQUAD);
GEN(T := 1);
GEN(GOTO NEXTQUAD+2);
BACKPATCH(E(2) .TRUE, NEXTQUAD);
GEN(T := 2);
END
Statements that alter flow of control:
S → if E then S(1)
S → if E then S(1) else S(2)
S → while E do S(1)
S → begin L end
S → A
L → L(1) ; S
L → S
Statements that alter flow of control:
Revised grammar after including marker non terminals
S → if E then M S(1)
S → if E then M(1) S(1) N else M(2) S(2)
S → while M(1) E do M(2) S(1)
S → begin L end
S → A
L → L(1) ; M S
L → S
M → ε { M.QUAD := NEXTQUAD }
N → ε { N.NEXT := MAKELIST(NEXTQUAD);
GEN(GOTO _) }
Statements that alter flow of control:
S → if E then M S(1)
{ BACKPATCH(E.TRUE, M.QUAD);
S.NEXT := MERGE(E.FALSE, S(1).NEXT) }
S → if E then M(1) S(1) N else M(2) S(2)
{ BACKPATCH(E.TRUE, M(1).QUAD);
BACKPATCH(E.FALSE, M(2).QUAD);
S.NEXT := MERGE(S(1).NEXT, MERGE(N.NEXT, S(2).NEXT)) }
S → while M(1) E do M(2) S(1)
{ BACKPATCH(S(1).NEXT, M(1).QUAD);
BACKPATCH(E.TRUE, M(2).QUAD);
S.NEXT := E.FALSE;
GEN(GOTO M(1).QUAD) }
S → begin L end
{ S.NEXT := L.NEXT }
S → A
{ S.NEXT := MAKELIST() }
L → L(1) ; M S
{ BACKPATCH(L(1).NEXT, M.QUAD);
L.NEXT := S.NEXT }
L → S
{ L.NEXT := S.NEXT }
Postfix Translations:
We call a translation scheme postfix if, for each
production A → α, the translation rule for A.CODE consists of
the concatenation of the translations of the non terminals in α, in
the same order as they appear, followed by a tail of output.
Factoring productions to achieve postfix form:
taking a production of the form A → x1x2…xn,
we introduce a new non terminal B and, for some i, replace
A → x1x2…xn by the pair of productions
A → B xi+1 xi+2 … xn and B → x1x2…xi
Since there is only one production for B, the two new productions can be
used exactly where the old one was.
Ex: S → while M(1) E do M(2) S for the while statement.
M(1) could record the first quadruple of the code for E, and
M(2) the first quadruple of the code for S.
An alternative approach is to rewrite the production as
S → C S
C → W E do
W → while
where C and W are new non terminals introduced for the
purpose of postfix translation. We could give W the
translation W.QUAD, which would serve for M(1).QUAD, and C
would have the translation C.QUAD with the same value as W.QUAD.
When we reduce to C, we can backpatch
E.TRUE to position NEXTQUAD immediately,
so M(2).QUAD is never needed.
A suitable S.D.T is
W → while     { W.QUAD := NEXTQUAD }
C → W E do    { C.QUAD := W.QUAD;
                BACKPATCH(E.TRUE, NEXTQUAD);
                C.FALSE := E.FALSE }
S → C S(1)    { BACKPATCH(S(1).NEXT, C.QUAD);
                S.NEXT := C.FALSE;
                GEN(GOTO C.QUAD) }
S.D.T for Procedure Call Statements:
Grammar for simple procedure call statements
S → call id ( elist )
elist → elist , E
elist → E
• The translation for a call includes a calling
sequence, a sequence of actions taken on entry to
and exit from each procedure.
• Calling sequences can differ, even for
implementations of the same language.
• First the arguments are evaluated and put in some
known place so that they may be accessed by the
called procedure.
• Also put in a known place is the return address.
• Let us assume that parameters are passed by
reference. When generating three address code
for this type of call, it is sufficient to generate
three address code to evaluate the arguments,
followed by a list of param three address
statements, one for each argument.
• We need to save the value of E.PLACE for each
expression E in id(E,E,E…E).
• A convenient data structure to save these values
is a queue.
S.D.T:
S → call id ( elist )
{ for each item p on QUEUE do
GEN(param p);
GEN(call id.PLACE) }
elist → elist , E
{ append E.PLACE to the end of QUEUE }
elist → E
{ initialize QUEUE to contain only E.PLACE }
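The queue discipline can be sketched as follows, assuming the code that evaluates each argument has already been generated and only the resulting E.PLACE names are passed in. The function name translate_call is hypothetical.

```python
from collections import deque

def translate_call(proc, arg_places):
    """Return the param/call statements for a call of proc with the given E.PLACE values."""
    queue = deque()
    for i, place in enumerate(arg_places):
        if i == 0:
            queue = deque([place])      # elist -> E : QUEUE contains only E.PLACE
        else:
            queue.append(place)         # elist -> elist , E : append to the end
    code = [f"param {p}" for p in queue]    # one param statement per argument, in order
    code.append(f"call {proc}")
    return code
```

Because the queue is first in, first out, the param statements come out in the same left to right order as the arguments.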
S.D.T for declaration Statements:
The simplest form of declaration syntax found in
programming languages is a keyword denoting an
attribute, followed by list of names having that
attribute.

Grammar:
D → integer namelist
  | real namelist
namelist → id , namelist
  | id
The problem with this grammar is that one cannot get the
attribute associated with namelist until the entire
list of names has been parsed.
To avoid this difficulty we can use another variation.
Grammar:
D → integer intlist
  | real reallist
intlist → id , intlist
  | id
reallist → id , reallist
  | id
This approach is undesirable since the number of
productions grows with the number of possible
attributes.
A more efficient variation is
Grammar:
D → D , id
  | integer id
  | real id
S.D.T
D → integer id   { ENTER(id.PLACE, integer);
                   D.ATTR := integer }
  | real id      { ENTER(id.PLACE, real);
                   D.ATTR := real }
  | D(1) , id    { ENTER(id.PLACE, D(1).ATTR);
                   D.ATTR := D(1).ATTR }
S.D.T for Record Structure statements:
Record structures are found in languages such as COBOL,
PL/I, PASCAL and C.
The basic types in C are characters, integers,
reals and doubles. From these we can recursively
create new types:
• An array of elements of a given type is a type.
• A pointer to any type is a type.
• A procedure returning an item of any type is a type.
• A structure consisting of a list of fields is a type.
EX:
struct {
    struct {
        char FIRST[15];
        char MIDDLE;
        char LAST[15];
    } NAME;
    int GRADE[20];
} CLASS[100];
Translating structure declarations:
Grammar:
type → struct { fieldlist }
  | ptr
  | char
  | int
  | float
  | double
fieldlist → fieldlist field ;
  | field ;
field → type id
• Field names are stored in the symbol table, and an offset
is associated with each one. That offset is the no. of
memory units preceding the field in any structure in
which it is declared.
• In order to compute the offset, we must know the widths
of the previous fields in the structure.
• Thus the width of a field must also be computed.
• If a name is declared as an array, the no. of elements
must be computed. This figure also affects the width
of the field with that name.
The following symbol table routines are used:
D_ENTER(name, size): records the no. of elements of an array field
W_ENTER(name, width): records the width of a field
O_ENTER(name, offset): records the offset of a field
field → type id
{ field.WIDTH := type.WIDTH;
field.NAME := id.NAME;
W_ENTER(id.NAME, type.WIDTH) }
field → field(1) [integer]
{ field.WIDTH := field(1).WIDTH * integer.VAL;
field.NAME := field(1).NAME;
D_ENTER(field(1).NAME, integer.VAL) }
fieldlist → field ;
{ O_ENTER(field.NAME, 0);
fieldlist.WIDTH := field.WIDTH }
fieldlist → fieldlist(1) field ;
{ fieldlist.WIDTH := fieldlist(1).WIDTH + field.WIDTH;
O_ENTER(field.NAME, fieldlist(1).WIDTH) }
type → struct { fieldlist }
{ type.WIDTH := fieldlist.WIDTH }
type → char { type.WIDTH := 1 }
type → ptr { type.WIDTH := 4 }
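The offset and width computation can be checked on the CLASS example with a small sketch. It rests on assumptions not stated in the rules above: an int is taken to be 4 memory units wide, no alignment padding is inserted, and each field is described as a (type name, field name, element count) triple. The function name layout and the type name NAME_T are hypothetical.

```python
def layout(fields, widths):
    """Return ({field name: offset}, total width) for one struct."""
    offsets = {}
    total = 0
    for type_name, name, count in fields:
        offsets[name] = total                  # O_ENTER(field.NAME, width of preceding fields)
        total += widths[type_name] * count     # field.WIDTH = type.WIDTH * no. of elements
    return offsets, total

# char is 1 unit as in the rules above; the width of int is an assumption.
WIDTHS = {"char": 1, "int": 4}

name_offsets, name_width = layout(
    [("char", "FIRST", 15), ("char", "MIDDLE", 1), ("char", "LAST", 15)], WIDTHS
)

# The inner struct becomes a type whose width is the width of its field list.
WIDTHS["NAME_T"] = name_width
class_offsets, class_width = layout(
    [("NAME_T", "NAME", 1), ("int", "GRADE", 20)], WIDTHS
)
```

Each offset is simply the accumulated width of the fields before it, which is what the fieldlist rules compute during the reductions.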
Symbol Tables
• The compiler needs to collect and use
information about the names appearing in the
source program. This information is entered into a
data structure called a symbol table.
• The information collected about a name includes
name, type, form, location and other attributes.
• Each entry in the symbol table is a pair (name,
information).
• Each time a name is encountered, the symbol
table is searched to see whether the name
already exists.
Contents of Symbol table:
A symbol table is a table with two fields, a name
field and an information field.
The information associated with a symbol table entry is:
• String of characters denoting the name.
• Attributes of the name.
• Parameters such as the no. of dimensions.
• Offset describing the position.
This information is entered at various times.
Attributes are entered in response to declarations.
Capabilities of Symbol table:

• Determine whether a given name is in the table
• Add a new name to the table
• Access the information associated with a name
• Add new information to a given name
• Delete a name or group of names from the
symbol table.
Names and symbol table records:
• The simplest way to implement a symbol table
is a linear array of records, one record per
name.
• A record consists of a known no. of
consecutive words of memory.
• This method is appropriate if there is a
modest upper limit on the length of an identifier.
Ex: IBM FORTRAN permits identifiers of up to 8
characters in length, which can fit in two words of
memory on the IBM 370 machine.
In the case of ALGOL, with no limit on the length of an
identifier, or of PL/I, with the rarely approached
limit of 31, it would be better to use an indirect
scheme.

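[Figure: two symbol table layouts. In the direct scheme each record stores the identifier (e.g. DIMPLE) next to its information and attributes. In the indirect scheme each record stores a pointer and a length (6 for DIMPLE) into a separate character array that holds the identifiers.]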
Reusing Symbol table space:
• The identifier used by a program to denote a
particular name must be preserved in the
symbol table until there are no further references to
that identifier.
• This is essential so that all uses of identifier
can be associated with same symbol table
entry.
• However a compiler can be designed to run in
less space if the space used to store id’s can be
reused by subsequent passes.
• One exception concerns external names.
• When a name is declared external to the
program, the corresponding identifier must be
preserved.

Array names in symbol table:


If the language places a limit on the no. of
dimensions, then all subscript information can
be placed in the symbol table.
Ex:
ANS FORTRAN limits arrays to 3 dimensions, and in
each case the lower limit is 1. Thus the
information associated with the array subscripts is:
[Figure: symbol table record for an array, with fields A, UL1, UL2, UL3 and B]
The portion of the word denoted A consists of 2 bits
which indicate the no. of dimensions. 00 means the
name is not an array. The rest of the word
• In the next record there is space for 3 values UL1,
UL2 and UL3, as needed to indicate the upper
limits along each dimension.
• If the array is a formal parameter, an upper
limit may be another formal parameter. For
that possibility we must provide 3 bits,
denoted B.
• If the ith limit is such a parameter, then the ith bit of B will
be 1 and ULi will be a pointer to the symbol
table record for the formal parameter
representing the upper limit.
• In programming languages like
PL/I, on the other hand, the upper and lower limits can be any
expression that can be evaluated at run time.
• Since the no. of dimensions is not limited, we
cannot store all information regarding limits
in a fixed size entry.
[Figure: symbol table record for an array A with n dimensions: the no. of dimensions followed by the limit pairs LL1, UL1, LL2, UL2, ..., LLn, ULn]
• Implementation of Symbol table –
Following are commonly used data structure for
implementing symbol table :-
• List –
– In this method, an array is used to store names and associated
information.
– A pointer “available” is maintained at the end of all stored records,
and new names are added in the order in which they arrive.
– To search for a name we scan from the beginning of the list up to the
available pointer; if the name is not found we report the error “use of
undeclared name”.
– While inserting a new name we must ensure that it is not
already present, otherwise the error “multiply defined
name” occurs.
– Appending a record is fast, O(1), but lookup (and hence the
duplicate check) is slow for large tables: O(n) on average.
• Self organizing List –
– This implementation uses a linked list. A link field is
added to each record.
– Searching for names is done in the order given by the
link fields.
– A pointer “First” is maintained to point to the first record
of the symbol table.
– Insertion is fast, O(1), but lookup is slow for large tables:
O(n) on average.
• Hash Table –
– In the hashing scheme two tables are maintained: a hash
table and a symbol table. This is the most commonly
used method to implement symbol tables.
– A hash table is an array with index range 0 to table
size – 1. Its entries are pointers to the names in the
symbol table.
– To search for a name we use a hash function that
yields an integer between 0 and table size – 1.
– Insertion and lookup can be made very fast: O(1) on average.
– The advantage is that quick search is possible; the disadvantage
is that hashing is more complicated to implement.
• Binary Search Tree –
– Another approach to implementing a symbol table is to use a
binary search tree, i.e. we add two link fields, left
and right child, to each record.
– Each name is inserted as a descendant of the root node at the
position required by the binary search tree property.
– Insertion and lookup are O(log2 n) on average.
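As an illustration of the hash table variant, here is a minimal sketch with chained buckets. The class name SymbolTable, the table size of 211 and the multiplier 31 in the hash function are arbitrary assumptions, and the information stored with a name is just a Python dictionary.

```python
class SymbolTable:
    def __init__(self, size=211):
        # The hash table: one chain of (name, information) pairs per index 0..size-1.
        self.buckets = [[] for _ in range(size)]

    def _index(self, name):
        # Simple string hash yielding an integer between 0 and table size - 1.
        h = 0
        for ch in name:
            h = (h * 31 + ord(ch)) % len(self.buckets)
        return h

    def lookup(self, name):
        """Return the information for name, or None if the name is undeclared."""
        for entry_name, info in self.buckets[self._index(name)]:
            if entry_name == name:
                return info
        return None

    def insert(self, name, info):
        """Add a new name; a second declaration of the same name is an error."""
        bucket = self.buckets[self._index(name)]
        if any(entry_name == name for entry_name, _ in bucket):
            raise KeyError("multiply defined name: " + name)
        bucket.append((name, info))
```

Only the chain selected by the hash value is scanned, so both operations stay fast as long as the chains remain short.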
Representing Scope information:
• Many languages have facilities for defining names
with limited scopes.
• Two canonical examples are FORTRAN, where the
scope of a name is a single subroutine, and ALGOL,
where the scope of a name is the block or
procedure in which it is declared.
• This means that in the same program the same
identifier may be declared several times as distinct
names with different attributes.
• It is the responsibility of the symbol table to keep
different declarations of the same identifier distinct.
Limited scopes give the compiler an opportunity to allow
symbol table entries to share space.

FORTRAN:
• A FORTRAN program consists of main program,
subroutines and functions.
• Each name has a scope consisting of one routine only.
• We can generate object code for each routine upon
reaching the end of that routine.
• If we do so it is possible that most of the information
in the symbol table can be expunged.
• We need to preserve only names that are external to
the routine just processed.
[Figure: hash table with permanent and temporary storage. Hash table entries point to names in permanent data storage (NAME 1, NAME 2) and in reusable data storage (NAME 3, NAME 4).]


ALGOL:
• The block structure in ALGOL makes all names local to
the block or procedure in which they are declared.
• It is required to apply the most closely nested rule
for binding identifiers to declarations.
• We need a data structure which, as the source
program is scanned, makes currently active names
available for reference.
• On the other hand, names that are no longer active cannot
disappear without a trace, as the information about
them will be needed in subsequent passes.
• The general idea is to divide the symbol table into
two parts, one for active names and the other for inactive names.
• To add a new name, say XYZ, declared in the current
block, we search down its chain to make sure it is
not already declared in the block.
• Assuming no other declaration of XYZ is found in the
current block, we proceed as follows.
• We place XYZ in the first available locations in the
string table and make a record for it in the first
available space in the symbol table.
• The record for XYZ is placed at the beginning of
chain, since the chain is searched from beginning
to end.
• The most difficult set of operations is required when
we reach the end of a block, say BLOCK 4:
• Transfer the record for BLOCK 4 to the position
marked END_B in the block storage table.
• Transfer the record for each name belonging to
BLOCK 4 in the symbol table to positions above the
point marked END in the symbol table.
• Position the AVAIL pointer for the string table at
the beginning of the space used for BLOCK 4. This
action has the effect of erasing all names for
BLOCK 4.
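The active/inactive split and the most closely nested rule can be sketched as below. This is a simplified illustration only: it uses one Python dictionary per block instead of the chains, string table and block storage table described above, and the class and method names are assumptions.

```python
class ScopedSymbolTable:
    """Active blocks are searched innermost first; closed blocks are kept."""

    def __init__(self):
        self.active = [{}]    # one dictionary per open block, innermost last
        self.inactive = []    # closed blocks, preserved for later passes

    def enter_block(self):
        self.active.append({})

    def exit_block(self):
        # Names of the closed block stop being visible but are not discarded.
        self.inactive.append(self.active.pop())

    def declare(self, name, **attributes):
        block = self.active[-1]
        if name in block:
            raise KeyError(f"{name} is already declared in this block")
        block[name] = attributes

    def lookup(self, name):
        # Most closely nested rule: the innermost declaration wins.
        for block in reversed(self.active):
            if name in block:
                return block[name]
        return None


symbols = ScopedSymbolTable()
symbols.declare("X", type="real")
symbols.enter_block()
symbols.declare("X", type="integer")
inner = symbols.lookup("X")["type"]
symbols.exit_block()
outer = symbols.lookup("X")["type"]
```

Inside the inner block the use of X refers to the integer declaration; after the block ends the real declaration is visible again, while the inner record survives in the inactive part.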
UNIT IV
Runtime Storage Administration
• A Compiler may allocate the resources of target
machine to represent data objects.
– Elementary data types: equivalent data objects
– Aggregates: several words of memory
• The rules that define scope determine the
strategies that can be used to allocate storage to
data objects.
• Simplest strategy is static allocation scheme.
• Complex strategy required for languages like
ALGOL
• Yet more complex strategy required for languages
such as PL/I.
Implementation of a simple stack allocation scheme:

• Consider the implementation of a programming
language like C.
• Data in C can be global, allocated in static storage and
available to any procedure, or local, available only to a
particular procedure.
• With the stack allocation strategy each procedure has
an activation record on the stack, in which the
values of local names are kept.
Memory organization for C program (lowest addresses first; the stack grows toward the static area)

Static area for programs and global (static) data
TOP: Extra storage for R
Activation record for R
Extra storage for Q
Activation record for Q
Extra storage for P
Activation record for P
SP points into the activation record of the currently executing procedure.
The low numbered memory locations contain the
code for the various procedures and space for the
global data.

Organization of C activation record:


In addition to local data there are 5 other items that
appear in the activation record.
• The values of actual parameters
• The count of no. of arguments
• The return address
• The return value
• The value of SP for the activation record below
Activation record for a C procedure (lowest address first)

Local data
Old SP (SP points here)
Return value (offset 1 from SP)
Return address (offset 2 from SP)
Arg. count (offset 3 from SP)
Actual parameters
• In C all local data (including arrays) are of fixed size,
so the size of the activation record can be computed by the
compiler.
• Hence a simple local name X can be referred to as
X[SP], where X stands for the offset of X.
Procedure calls in C:
Param T1
Param T2
.
Param Tn
Call P,n
PUSH(n) - store the argument count
PUSH(l1) - l1 is the label of return address
PUSH() - leave space for return value
PUSH(SP) - store the old stack pointer
Goto l2 - l2 is the first statement of the called
procedure P.
• The first statement of the called procedure must be a
special three address statement “procbegin”,
• which sets the stack pointer to the place holding the
old SP, and sets TOP to the top of the activation record:
SP := TOP
TOP := SP + sp - sp is the size of P (local data for P)
• This assumes that TOP points to the lowest numbered used
location on the stack and that memory locations are
counted by words.
We could translate each param T into PUSH(T),
where PUSH(X) stands for
TOP := TOP – 1
*TOP := X

The translation of the call P,n statement is to
• first store the argument count n,
• store the return address,
• leave space for the return value,
• store the old stack pointer and
• then jump to the first statement of the called procedure.
The return statement in C can have the form
Return (expression)
The statement can be implemented by three address
code to evaluate the expression into temporary T
followed by
1[SP] := T - 1 is the offset for location of return value
TOP := SP+2 - TOP now points to return address
SP := *SP - restore SP
L := *TOP - the value of L is now return address
TOP := TOP+1 - TOP points to argument count
TOP := TOP+1+*TOP - *TOP is the no. of parameters
Goto *L
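The calling and return sequences above can be traced with a small simulation. This is a sketch under stated assumptions: memory is a Python list of words, the class name StackMachine and its method names are invented for the example, and procbegin subtracts the local data size because the stack here grows toward lower addresses (the slide writes TOP := SP + sp, which agrees if sp is read as a signed displacement).

```python
class StackMachine:
    def __init__(self, size=32):
        self.mem = [None] * size
        self.top = size      # TOP: lowest numbered used location
        self.sp = size       # SP: location holding the old SP

    def push(self, value):
        self.top -= 1               # TOP := TOP - 1
        self.mem[self.top] = value  # *TOP := X

    def call(self, args, return_label, local_size):
        for arg in args:            # param T1 ... param Tn
            self.push(arg)
        self.push(len(args))        # argument count
        self.push(return_label)     # return address
        self.push(None)             # space for the return value
        self.push(self.sp)          # old stack pointer
        # procbegin of the called procedure
        self.sp = self.top
        self.top = self.sp - local_size   # assumption: locals below SP

    def ret(self, value):
        self.mem[self.sp + 1] = value     # 1[SP] := T
        self.top = self.sp + 2            # TOP points to the return address
        self.sp = self.mem[self.sp]       # SP := *SP
        label = self.mem[self.top]        # L := *TOP
        self.top += 1                     # TOP points to the argument count
        self.top = self.top + 1 + self.mem[self.top]  # discard count and args
        return label                      # goto *L


machine = StackMachine()
machine.call([10, 20], return_label="L1", local_size=3)
sp_in_callee, top_in_callee = machine.sp, machine.top
label = machine.ret(99)
```

After the return, TOP and SP are back at their values from before the call, and the returned value is left in the word that was reserved for it.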
Implementation of Block structured Languages:
ALGOL, PL/I present certain complexities which are
not found in C.
• Not only procedures, even blocks may have their
own data. - Activation records must be
preserved.
• Permits arrays of adjustable length.
• Data referencing environment.

• The compiler for these languages determines to
which declaration of a name X a use of X refers
using the most-closely-nested rule.
Displays:
• Let us ignore blocks and consider only nested
procedures.
• During execution a procedure P can refer a data object
in the top most activation record of any procedure that
surrounds P in the program.
• Due to the adjustable length arrays the position of
activation record for a procedure may vary.
• It is required to have some method to keep track of
location of various activation records.

1. Store with each procedure a pointer, called a static link,
to the topmost activation record of the procedure
which physically surrounds it in the program.
• If the nesting level of procedures is deep,
resolving references to non local data can
become expensive.
• Display, is a common way of providing more
direct access to non local data.
• A display consists of an array of pointers to the
currently accessible activation records of
procedures surrounding current active
procedure.
• There are several places where displays can be
stored.
– Memory : indirect addressing/ indexed addressing
– Run time stack. Create new copy at each block/
procedure entry.

• Use SP and an offset to find the appropriate display
pointer.
• Use the display pointer + an offset to find the desired
data object in the activation record.
EXAMPLE:

Procedure MAIN();
Procedure P(a);
Procedure Q(b);
L1: R(x,y);
end Q;
L2: Q(z);
end P;
Procedure R(c,d);
end R;
L3: P(w);
L4: R(u,v);
end MAIN;
• When MAIN executes, it calls P(w) at L3,
which calls Q(z) at L2, which calls R(x,y) at L1.
The stack and displays are then as shown:
[Figure: run-time stack, top first: activation records for R (at TOP), Q, P and MAIN, with the display pointers DISPLAY[1] and DISPLAY[2].]
• Suppose Q calls R(x,y) at L1, and R has the
following declaration of local data

Integer I;
Real Array A[0:n-1, 1:m];
Real Array B[2:10];
• Activation Record for R(x,y), from TOP downward (pointer labels as in the figure):

TOP: Data descriptor and elements of B
Data descriptor and elements of A
Display pointer for MAIN
Pointer to B
Pointer to A
Value of I
Old SP
SP: Return address
Arg count = 2
Actual parameter y
Actual parameter x
Procedure calls:
• Suppose we are currently executing procedure Q and
we execute the procedure call R(x,y) at L1.
• The level of Q is three (the number of procedures in its
static environment).
• While Q is in execution the DISPLAY has 3
pointers, one each to MAIN, P and Q.
• However R, at level two, has an environment
consisting only of MAIN and itself.
• Thus we must create a proper display for R.
• In general, when a procedure P1 at level l1 calls P2
at level l2, P2 must be declared in the static
environment of P1.
• That is, P2 must be defined by some procedure P3
at level l3 in the environment of P1.
• In the above example of Q calling R, P1, P2 and P3
are Q, R and MAIN respectively.
• When P1 calls P2, to create the display for P2 the
top l1 – l3 display pointers must be popped off P1’s
display
• and replaced by a pointer to P2’s activation
record.
The display pointers for Procedure Q are as shown:

[Figure: stack holding the activation records for R (at TOP), Q, P and MAIN. At LEVEL 3, DISPLAY[1], DISPLAY[2] and DISPLAY[3] point to the activation records for MAIN, P and Q.]

The display pointers for Procedure R are as shown:

[Figure: the same stack. DISPLAY[1] and DISPLAY[2] point to the activation records for MAIN and R.]

l1 is 3 for Q
l3 is 1 for MAIN
l1 – l3 = 3 – 1 = 2, so pop 2 display pointers from the display and push a pointer to R's activation record.
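The display update on a call can be sketched in a few lines. The function name display_for_call is hypothetical, activation records are represented by plain strings, and list position k stands for DISPLAY[k+1].

```python
def display_for_call(caller_display, definer_level, callee_record):
    """Build the callee's display from the caller's display.

    caller_display: pointers for levels 1 .. l1 of the caller P1.
    definer_level:  l3, the level of the procedure P3 that declares the callee.
    """
    caller_level = len(caller_display)        # l1
    popped = caller_level - definer_level     # pop the top l1 - l3 pointers
    kept = caller_display[:caller_level - popped]
    return kept + [callee_record]             # then add the callee's record


# Q (level 3) calls R, which is declared by MAIN (level 1).
display_of_r = display_for_call(["MAIN", "P", "Q"], 1, "R")
```

For the example above this drops the pointers to P and Q and leaves a display holding MAIN and R.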
Error Detection and Recovery:

• There are a variety of ways in which a compiler can react to
mistakes in the source program.
• Unacceptable modes of behavior are to
– produce a system crash
– emit invalid output
– quit on the first detected error
• A compiler should attempt to recover from each error and
continue analyzing its input.
• A simple compiler may stop all activities other than lexical
analysis (LA) and syntax analysis (SA).
• A more complex compiler may attempt to repair the error.
• A sophisticated compiler may attempt to correct the error.
• Detection, recovery, repair and correction are just terms used
to describe possible reactions to errors.
• No compiler can do true correction.
Reporting errors
Good error diagnostics can significantly help reduce
debugging and maintenance effort.

Properties:
• Should pinpoint the errors in terms of the original
source program.
• Understandable by the user.
• Should be specific and should localize the problem
• Should not be redundant.
Sources of error:

• Design specification for the program may be
inconsistent
• Algorithm used to meet design specification may be
inadequate or incorrect.
• Programmer may introduce errors (logical errors)
• Keypunching or transcription errors
• The program may exceed a compiler or machine
limit
• Compiler can insert errors
The source of an error determines to some degree how well
a compiler can repair the error.
• With algorithmic errors the compiler can do little.
• With coding errors the compiler may do better.
• But some errors are very difficult to correct.
Ex: DO WHILE (0 < I < 10)
It is unclear whether the programmer knew that 0 < I < 10 always
evaluates to true, or intended DO I = 1 TO 9 or
DO WHILE (0 < I && I < 10).
With transcription errors there is much more
redundancy.
• Insertion of an extra character
• Deletion of a required character
• Replacement of character or token by an incorrect
character or token
• Transposition of two adjacent characters or tokens.
Syntactic Errors:

Common examples of syntactic errors


• Min(A, 2 *(3+B))
• DO 10 I = 1, 100
• I = 1;
J = 2;
• F: PORCEDURE OPTIONS(MAIN)
• /* COMMENT * /

In these examples it is easy for a human being to
determine the type of error and the position at which the error
occurs.
For example consider the statement
A:= B – C * D + E)
Here we know an error has occurred, but we cannot be certain
of the exact position of the error.

Similarly consider the statement

IFA = B THEN
SUM = SUM + A;
ELSE
SUM = SUM – A ;
Minimum distance correction of errors:
• One theoretical way of defining errors and their
location is the minimum Hamming distance method.
• We define a collection of error transformations.
• A program P has k errors if the shortest sequence of
error transformations that will map some valid
program into P has length k.
• Although the minimum distance method is a convenient
theoretical notion, it is not generally used because it is
costly to implement.
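To make the notion of distance concrete, the sketch below counts the fewest insertions, deletions, replacements and adjacent transpositions needed to turn one given valid text into the erroneous one. It is an illustration only: a real minimum distance corrector would have to minimize over all valid programs, which is what makes the method costly, and the function name error_distance is an assumption.

```python
def error_distance(valid, actual):
    """Fewest error transformations mapping `valid` into `actual`.

    Works on strings (characters) or lists (tokens). Each adjacent pair
    is transposed at most once (optimal string alignment distance).
    """
    m, n = len(valid), len(actual)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # delete i required symbols
    for j in range(n + 1):
        d[0][j] = j                      # insert j extra symbols
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            replace = 0 if valid[i - 1] == actual[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,            # deletion
                          d[i][j - 1] + 1,            # insertion
                          d[i - 1][j - 1] + replace)  # replacement
            if (i > 1 and j > 1 and valid[i - 1] == actual[j - 2]
                    and valid[i - 2] == actual[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]


keyword_errors = error_distance("PROCEDURE", "PORCEDURE")
token_errors = error_distance("A := B - C * D + E".split(),
                              "A := B - C * D + E )".split())
```

Both examples are at distance one from the valid text: a transposition of two adjacent characters in the keyword, and one extra right parenthesis token in the assignment.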
Semantic errors:
• Can be detected both at compile time and at run
time.
• The most common semantic errors are errors of
declaration and scope.
Typical examples are
• Undeclared or multiply declared identifiers.
• Type incompatibilities between operators and
operands.
• Type incompatibilities between formal and actual
parameters
II Assignment Questions
1. Explain about the data structures used for symbol
table.
2. Discuss syntax-directed translations for statements
that alter flow of control.
3. Solve the expression (a+b)*(c+d)+(a+b+c) into
quadruples, triples and indirect triples.
4. Explain syntax-directed translations for Assignment
Statements.
5. Apply the syntax-directed translations for Boolean
expressions and translate the following into three
address code.
A<B or C<D and E<F
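Plan of error detector and corrector
[Figure: source code passes through the lexical analyzer (producing tokens), the parser and the semantic checker (producing intermediate code). A lexical corrector and a syntactic corrector are attached, together with the symbol table and a diagnostic message printer.]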
Lexical phase errors
• The lexical analyzer detects an error when it discovers that an input's
prefix does not fit the specification of any token class.
• After detecting an error, the lexical analyzer can invoke an error
recovery routine. This can entail a variety of remedial actions.
• The simplest possible error recovery is to skip the erroneous
characters until the lexical analyzer finds another token.
• But this is likely to cause the parser to see a deletion error, which can
cause severe difficulties in the syntax analysis and remaining phases.
• One way the parser can help the lexical analyzer improve its
ability to recover from errors is to make its list of legitimate tokens (in
the current context) available to the error recovery routine.
• The error-recovery routine can then decide whether a remaining
input's prefix matches one of these tokens closely enough to be treated
as that token.
Syntactic Phase errors
• A parser detects an error when it has no legal move from its
current configuration.
• LL(1) and LR(1) parsers have the valid prefix property;
therefore, they are capable of announcing an error as soon as they
read an input symbol that is not a valid continuation of the
input read so far.
• This is earliest time that a left-to-right parser can announce an
error. But there are a variety of other types of parsers that do not
necessarily have this property.
• The advantages of using a parser with a valid-prefix-property
capability is that it reports an error as soon as possible, and it
minimizes the amount of erroneous output passed to subsequent
phases of the compiler.
Panic Mode Recovery
• Panic mode recovery is an error recovery method
that can be used with any kind of parser, whereas
other error recovery methods depend somewhat on the type of
parsing technique used.
• In panic mode recovery, a parser discards input
symbols until a statement delimiter, such as a
semicolon or an end, is encountered .
• The parser then deletes stack entries until it finds an
entry that will allow it to continue parsing, given the
synchronizing token on the input.
• This method is simple to implement, and it never
gets into an infinite loop.
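Panic mode can be sketched with a tiny hand-written parser. Everything here is an assumption made for illustration: the toy statement form id = id { + id } ; and the function name parse_statements. The stack popping step is implicit, since abandoning a statement returns control to the statement-level loop.

```python
def parse_statements(tokens):
    """Parse statements of the form  id = id { + id } ;
    Returns the token positions at which errors were detected."""
    errors = []
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == kind:
            pos += 1
        else:
            raise SyntaxError(kind)

    while pos < len(tokens):
        try:
            expect("id")
            expect("=")
            expect("id")
            while pos < len(tokens) and tokens[pos] == "+":
                pos += 1
                expect("id")
            expect(";")
        except SyntaxError:
            errors.append(pos)
            # Panic mode: discard input up to the statement delimiter.
            while pos < len(tokens) and tokens[pos] != ";":
                pos += 1
            pos += 1   # consume the delimiter and resume with the next statement
    return errors


source = "id = id + id ; id = + id ; id = id ;".split()
error_positions = parse_statements(source)
```

Every pass through the loop consumes at least one token, which is why this kind of recovery cannot loop forever; the price is that the rest of the faulty statement is skipped without being checked.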
Error Recovery in LR Parsing:
• A systematic method for error recovery
in LR parsing is to scan down the stack until a
state S with a goto on a particular nonterminal A is
found
• Then discard zero or more input symbols until a
symbol a is found that can legitimately follow A .
• The parser then shifts the state goto [ S, A ] on the
stack and resumes normal parsing.
• There might be more than one choice for the
nonterminal A . Normally, these would be
nonterminals representing major program pieces,
such as statements.
• Another method of error recovery that can be
implemented is called "phrase level recovery".
• Each error entry in the LR parsing table is examined, and,
based on language usage, an appropriate error-recovery
procedure is constructed .
• For example, to recover from a construct error that
starts with an operator, the error-recovery routine will
push an imaginary id onto the stack and cover it with the
appropriate state.
• While doing this, the error entries in a particular state
that calls for a particular reduction on some input symbols
are replaced by that reduction.
• This has the effect of postponing the error detection until
one or more reductions are made; but the error will still
be caught before any shift move takes place.
LR parsing table (a dash marks a blank entry):

State  id   +    *    (    )    $       E
0      S3   -    -    S2   -    -       1
1      -    S4   S5   -    -    ACCEPT  -
2      S3   -    -    S2   -    -       6
3      -    R4   R4   -    R4   R4      -
4      S3   -    -    S2   -    -       7
5      S3   -    -    S2   -    -       8
6      -    S4   S5   -    S9   -       -
7      -    R1   S5   -    R1   R1      -
8      -    R2   R2   -    R2   R2      -
9      -    R3   R3   -    R3   R3      -

LR parsing table with error routines:

State  id   +    *    (    )    $       E
0      S3   e1   e1   S2   e2   e1      1
1      e3   S4   S5   e3   e2   ACCEPT  -
2      S3   e1   e1   S2   e2   e1      6
3      R4   R4   R4   R4   R4   R4      -
4      S3   e1   e1   S2   e2   e1      7
5      S3   e1   e1   S2   e2   e1      8
6      e3   S4   S5   e3   S9   e4      -
7      R1   R1   S5   R1   R1   R1      -
8      R2   R2   R2   R2   R2   R2      -
9      R3   R3   R3   R3   R3   R3      -
• e1: This routine is called from states 0, 2, 4 and 5,
all of which expect the beginning of an operand.
Instead an operator or the end of input was
found.
Action: push an imaginary id onto the stack and
cover it with state 3
Error diagnostic: Missing operand
• e2: This routine is called from states 0, 1, 2, 4 and
5 on finding a right parenthesis.
Action: remove the right parenthesis from the input
Error diagnostic: Unbalanced right parenthesis
• e3: This routine is called from states 1 or 6 when
an operator is expected and an id or left
parenthesis is found.
Action: push + onto the stack and cover it with
state 4
Error diagnostic: Missing operator
• e4: This routine is called from state 6 when the
end of input is found. State 6 expects an
operator or a right parenthesis.
Action: push a right parenthesis onto the stack
and cover it with state 9
Error diagnostic: Missing right parenthesis
• Consider the input id+)
Stack         Input
0             id+)$
0id3          +)$
0E1           +)$
0E1+4         )$    /* ) removed by routine e2 */
0E1+4         $
0E1+4id3      $     /* id pushed onto the stack by e1 */
0E1+4E7       $
0E1           $     /* parsing completed */
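The table with error routines can be run directly. The sketch below is a minimal table-driven driver; the production numbering (1: E → E+E, 2: E → E*E, 3: E → (E), 4: E → id) is an assumption inferred from the reductions in the trace above, the stack holds states only, and the names parse, ACTION and GOTO_E are invented for the example.

```python
# Assumed productions: 1: E -> E+E   2: E -> E*E   3: E -> (E)   4: E -> id
RHS_LENGTH = {1: 3, 2: 3, 3: 3, 4: 1}
COLUMNS = ["id", "+", "*", "(", ")", "$"]
ROWS = {
    0: "s3 e1 e1 s2 e2 e1", 1: "e3 s4 s5 e3 e2 acc",
    2: "s3 e1 e1 s2 e2 e1", 3: "r4 r4 r4 r4 r4 r4",
    4: "s3 e1 e1 s2 e2 e1", 5: "s3 e1 e1 s2 e2 e1",
    6: "e3 s4 s5 e3 s9 e4", 7: "r1 r1 s5 r1 r1 r1",
    8: "r2 r2 r2 r2 r2 r2", 9: "r3 r3 r3 r3 r3 r3",
}
ACTION = {s: dict(zip(COLUMNS, row.split())) for s, row in ROWS.items()}
GOTO_E = {0: 1, 2: 6, 4: 7, 5: 8}
DIAGNOSTIC = {"e1": "Missing operand", "e2": "Unbalanced right parenthesis",
              "e3": "Missing operator", "e4": "Missing right parenthesis"}
COVER_STATE = {"e1": 3, "e3": 4, "e4": 9}   # state pushed by the routine


def parse(tokens):
    """Parse the token list and return the diagnostics that were issued."""
    tokens = list(tokens) + ["$"]
    stack, pos, diagnostics = [0], 0, []
    while True:
        action = ACTION[stack[-1]][tokens[pos]]
        if action == "acc":
            return diagnostics
        kind, number = action[0], int(action[1:])
        if kind == "s":                       # shift
            stack.append(number)
            pos += 1
        elif kind == "r":                     # reduce by production `number`
            del stack[-RHS_LENGTH[number]:]
            stack.append(GOTO_E[stack[-1]])
        else:                                 # error routine e1 .. e4
            diagnostics.append(DIAGNOSTIC[action])
            if action == "e2":
                pos += 1                      # remove the right parenthesis
            else:
                stack.append(COVER_STATE[action])


messages = parse(["id", "+", ")"])
```

For the input id+) the driver issues the same two repairs as the trace: the stray right parenthesis is removed by e2 and the missing operand is supplied by e1.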
Error Recovery in LL Parsing:

• Error correction strategies in an LL parser parallel
the strategies for LR.
1. Panic mode error correction for an LL parser.
2. Fill in the LL parser's blank entries with pointers
to error routines.
Constructing Predictive Parsing table:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

Predictive parsing table (a dash marks a blank entry):

      id         +            *            (          )         $
E     E → TE'    -            -            E → TE'    -         -
E'    -          E' → +TE'    -            -          E' → ε    E' → ε
T     T → FT'    -            -            T → FT'    -         -
T'    -          T' → ε       T' → *FT'    -          T' → ε    T' → ε
F     F → id     -            -            F → (E)    -         -

Predictive parsing table with error routines:

      id         +            *            (          )         $
E     E → TE'    e1           e1           E → TE'    e1        e1
E'    E' → ε     E' → +TE'    E' → ε       E' → ε     E' → ε    E' → ε
T     T → FT'    e1           e1           T → FT'    e1        e1
T'    T' → ε     T' → ε       T' → *FT'    T' → ε     T' → ε    T' → ε
F     F → id     e1           e1           F → (E)    e1        e1
id    pop        -            -            -          -         -
+     -          pop          -            -          -         -
*     -          -            pop          -          -         -
(     -          -            -            pop        -         -
)     e2         e2           e2           e2         pop       e2
$     e3         e3           e3           e3         e3        accept
• The table includes pop actions, which match an
input symbol against an identical terminal on
the stack.
• Some entries are still left blank. These are entries
that can never be exercised, even
on erroneous input.
• We have also inserted E' → ε and T' → ε in certain
places where an error could be detected.
• This may postpone some error detection, but
cannot cause an error to be missed.
• e1: This routine is called when an operand
beginning with an id or a left parenthesis is
expected, but an operator, a right parenthesis or
the end of input was found.
Action: push id onto the input
Error diagnostic: Missing operand
• e2: Here we have a right parenthesis on top of
the stack but not on the input.
Action: pop the right parenthesis from the stack
Error diagnostic: Missing right parenthesis
• e3: The stack has been emptied, but input
remains.
Action: remove all remaining symbols from the
input.
Error diagnostic: Unexpected input
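The predictive table with error routines can be exercised the same way. This is a sketch under assumptions: only the non-ε productions are stored, every other entry for E' and T' is treated as the ε production, and the names TABLE and parse_ll are invented for the example.

```python
# Non-epsilon entries of the predictive table; right sides are symbol lists.
TABLE = {
    "E":  {"id": ["T", "E'"], "(": ["T", "E'"]},
    "E'": {"+": ["+", "T", "E'"]},
    "T":  {"id": ["F", "T'"], "(": ["F", "T'"]},
    "T'": {"*": ["*", "F", "T'"]},
    "F":  {"id": ["id"], "(": ["(", "E", ")"]},
}


def parse_ll(tokens):
    """Predictive parse with routines e1, e2, e3; returns the diagnostics."""
    remaining = list(tokens) + ["$"]
    stack = ["$", "E"]
    diagnostics = []
    while True:
        top, symbol = stack[-1], remaining[0]
        if top == "$" and symbol == "$":
            return diagnostics                       # accept
        if top in TABLE:
            if symbol in TABLE[top]:
                stack.pop()
                stack.extend(reversed(TABLE[top][symbol]))
            elif top in ("E'", "T'"):
                stack.pop()                          # epsilon production
            else:                                    # e1
                diagnostics.append("Missing operand")
                remaining.insert(0, "id")
        elif top == symbol:
            stack.pop()                              # pop: terminals match
            remaining.pop(0)
        elif top == ")":                             # e2
            diagnostics.append("Missing right parenthesis")
            stack.pop()
        elif top == "$":                             # e3
            diagnostics.append("Unexpected input")
            del remaining[:-1]                       # keep only the end marker
        else:
            raise AssertionError("blank entry reached")


ll_messages = parse_ll(["id", "+", ")"])
```

On the input id+) this parser reports a missing operand and then unexpected input, whereas the LR table reported an unbalanced right parenthesis first; the two tables repair the same input differently.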
Semantic errors:
The primary sources of semantic errors are
– Undeclared names
– Type incompatibilities
• Recovery from undeclared names is straightforward.
• The first time we encounter an undeclared name we
make an entry for that name in the symbol table with
appropriate attributes.
• The attributes can be determined by the context in
which the name is used.
• A flag in the symbol table entry is set to indicate that
the entry was made in response to a semantic error
rather than a declaration.
Code Generation
• Input to our code generation is an optimized
intermediate code that can be a sequence of
– Quadruples
– Triples
– A Tree
– Postfix polish string
• The output of Code generator is the object
program. This may take a variety of forms
– An absolute Machine language program
– A relocatable Machine language program
– An assembly language program
– Some other programming language
• The advantage of generating absolute machine code
is it can be placed in a fixed memory location and
immediately executed.
ex: student job compilers
• Relocatable object program allows subprograms to
be compiled separately. Set of relocatable object
modules can be linked together and loaded for
executing.
– More flexible
• Producing an assembly language as output
makes the process easier.
• Producing high level language as output
simplifies code generation even further.
FORTRAN is the output of ALTRAN Compiler.
Problems in code generation:

There are three main sources of difficulty.


• Deciding what machine instructions to
generate.
• Deciding what order the computations should
be done
• Deciding which registers to use
What instructions should we generate?
• Most machines permit certain computations to be done in
variety of ways.

Ex: “add one to storage”
The three address statement
A:= A+1
can be implemented by generating the single instruction
AOS A
or by the sequence
LOAD A
ADD #1
STORE A
In what order should we perform computation?
• Some computation order requires fewer
registers to hold intermediate results than
others.
• Picking the best order is very difficult problem in
general.
• We shall generate the code for three address
statements in the order in which they are generated
by the semantic routines.
What registers should we use?
• Deciding the optimal assignment of registers to variables is
difficult.
• The problem is further complicated as some machines require
register pairs for some operands and results.

EX: The IBM 370 requires register pairs for integer
multiplication and integer division.
M X,Y where X, the multiplicand, refers to the even
register of an even/odd register pair. The product occupies the
entire register pair.
D X,Y where the 64 bit dividend occupies the even/odd
register pair whose even register is X, and Y represents the divisor.
After division the even register holds the remainder and the odd
register holds the quotient.
Machine Model:
A good code generator requires an intimate
knowledge of the target machine.
Addressing modes:
1. r(register mode)
2. *r (indirect register mode)
3. X(r) (indexed mode)
4. *X(r) (indirect indexed mode)
5. #X (immediate)
6. X (absolute)
• The cost of an instruction is taken to be its length.
• If space is important we should minimize the
length.
• On some machines the time taken to fetch a
word from memory is more than the time taken to
execute the instruction.
Ex: 1. MOV R0, R1 Cost = 1
2. MOV R5, M Cost = 2
3. ADD #1, R3 Cost = 2
4. SUB 4(R0), *5(R1) Cost = 3
• For a quadruple of the form A := B + C, where B and C
are simple variables in distinct memory locations,
we can generate a variety of code sequences:
1. MOV B, R0
ADD C, R0
MOV R0, A cost = 6
2. MOV B, A
ADD C, A cost = 6
3. MOV *R1, *R0
ADD *R2, *R0 cost = 2 (assuming R0, R1 and R2 contain the addresses of A, B and C)
4. ADD R2, R1
MOV R1, A cost = 3 (assuming R1 and R2 contain the values of B and C)
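The costs quoted above are consistent with a simple rule: one word for the instruction itself plus one word for every operand that carries an address or a constant. The sketch below applies that rule; the rule is inferred from the examples rather than stated in the slides, and the function names are assumptions.

```python
import re

# Register and indirect register modes (R3, *R3) need no extra word.
REGISTER_MODE = re.compile(r"\*?R\d+")


def operand_cost(operand):
    return 0 if REGISTER_MODE.fullmatch(operand) else 1


def instruction_cost(instruction):
    """Cost = 1 + one word per operand holding an address or constant."""
    _, operands = instruction.split(None, 1)
    return 1 + sum(operand_cost(op.strip()) for op in operands.split(","))


def sequence_cost(instructions):
    return sum(instruction_cost(i) for i in instructions)


cost_via_register = sequence_cost(["MOV B, R0", "ADD C, R0", "MOV R0, A"])
cost_indirect = sequence_cost(["MOV *R1, *R0", "ADD *R2, *R0"])
```

Under this rule the first sequence costs 6 and the indirect register sequence costs 2, matching the figures given for sequences 1 and 3.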
Simple Code Generator
Next Use information
• To make more informed decisions regarding
register allocation we compute the next use of
each name in a quadruple.
• If quadruple i assigns a value to A, quadruple j
has A as an operand, and control can
flow from i to j without any intervening
assignment to A, then we say quadruple j
uses the value of A computed at i.
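Within a single basic block, next-use information can be collected by one backward scan. The sketch below assumes statements are given as (result, left operand, right operand) triples and ignores liveness on exit from the block; the function name next_use is hypothetical.

```python
def next_use(block):
    """For each statement i, map its names to the index of their next use
    after i inside the block (None if there is none)."""
    upcoming = {}                    # name -> index of its next use
    info = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):     # scan backward
        result, left, right = block[i]
        info[i] = {name: upcoming.get(name) for name in (result, left, right)}
        upcoming.pop(result, None)   # the result is assigned here: kill it
        upcoming[left] = i           # the operands are used here
        upcoming[right] = i
    return info


block = [("T", "A", "B"), ("U", "A", "C"), ("V", "T", "U"), ("W", "V", "U")]
uses = next_use(block)
```

For the block T := A - B, U := A - C, V := T + U, W := V + U (statements numbered from 0), the scan shows that T computed in statement 0 is next used in statement 2, and that U still has a use in statement 3 after statement 2.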
Register Descriptors
• To perform register allocation we shall maintain
register descriptor that keeps track of what is
currently in each register.
• Initially the register descriptor says all registers
are empty.
Address Descriptors
• For each name in the block we shall maintain an
address descriptor that keeps track of the
location where the current value of the name
can be found at run time.
• The location might be a register, a stack location or a memory address.
The function GETREG():
GETREG() returns the location L to hold
the value of A for the assignment A := B op C.
1. If the name B is in a register that holds the value of
no other names, and B is not live and has no next
use after execution of A := B op C, then return
the register of B for L.
2. Failing (1), return an empty register for L if there is one.
3. Failing (2), if A has a next use in the block, or op is an
indexing operator, find an occupied register R, save its value
in memory if it is not already there, and return R for L.
4. If A is not used in the block, or no suitable
occupied register can be found, select the memory
location of A as L.
Code Generation Algorithm:
For each three address statement A := B op C:
• Invoke GETREG() to determine the location L in which the
result of B op C is to be computed.
• Determine the present location B’ of B by consulting the
address descriptor of B. If B is not presently in
L, then generate the following instruction to
copy the value of B to L
MOV B’, L
• Generate the instruction OP C’, L, where C’ is the
current location of C.
• Update the address descriptor of A to indicate that A
is in location L. If L is a register, update its register
descriptor to indicate that it contains A.
• If B and C have no next use, are not live on exit from the
block and are in registers, alter the register descriptor to indicate
that those registers no longer contain B and/or C.
Ex: W:= (A-B) +(A-C) + (A-C)
Three address statements:
T := A – B
U := A – C
V := T + U
W := V + U
Statement     Code          Register descriptor   Address descriptor
T := A – B    MOV A, R0     R0 contains T         T in R0
              SUB B, R0
U := A – C    MOV A, R1     R0 contains T         T in R0
              SUB C, R1     R1 contains U         U in R1
V := T + U    ADD R1, R0    R0 contains V         U in R1
                            R1 contains U         V in R0
W := V + U    ADD R1, R0    R0 contains W         W in R0
              MOV R0, W                           W in R0 and memory
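The code sequence in the table can be reproduced by a small generator. The sketch below is a simplification of the algorithm, not a full implementation: each register holds at most one name, "needed later" is approximated by scanning the remaining statements without tracking reassignments, spilling always picks the first register, and the names generate and getreg follow the slides only loosely.

```python
def generate(block, registers=("R0", "R1"), live_on_exit=()):
    """block: list of (A, OP, B, C) meaning A := B OP C. Returns the code."""
    reg = dict.fromkeys(registers)   # register descriptor: register -> name
    where = {}                       # address descriptor: name -> register
    code = []

    def getreg(b, needed):
        if b in where and b not in needed:
            return where[b]                      # rule 1: reuse B's register
        for r in registers:
            if reg[r] is None:
                return r                         # rule 2: an empty register
        r = registers[0]                         # rule 3 (simplified): spill
        code.append(f"MOV {r}, {reg[r]}")
        del where[reg[r]]
        reg[r] = None
        return r

    for i, (a, op, b, c) in enumerate(block):
        needed = set(live_on_exit)
        for _, _, x, y in block[i + 1:]:
            needed.update((x, y))
        loc = getreg(b, needed)
        if where.get(b) != loc:
            code.append(f"MOV {where.get(b, b)}, {loc}")
        code.append(f"{op} {where.get(c, c)}, {loc}")
        if reg[loc] is not None:
            where.pop(reg[loc], None)            # old contents are overwritten
        reg[loc] = a
        where[a] = loc
        for x in (b, c):                         # free registers of dead names
            if x in where and x not in needed and x != a:
                reg[where.pop(x)] = None
    for name in live_on_exit:                    # store live names at block end
        if name in where:
            code.append(f"MOV {where[name]}, {name}")
    return code


block = [("T", "SUB", "A", "B"), ("U", "SUB", "A", "C"),
         ("V", "ADD", "T", "U"), ("W", "ADD", "V", "U")]
code = generate(block, live_on_exit=("W",))
```

With two registers and W live on exit, the generated instructions are the seven shown in the Code column of the table.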
