
Unit-4-4

The document discusses intermediate code generation and representations including three-address code. It also covers declarations, nested procedures, records, and assignment statements as they relate to code generation and symbol tables.

Uploaded by

Jefferson Aaron
Copyright
© © All Rights Reserved

18CSC304J- COMPILER DESIGN

UNIT-4

SRMIST, Vadapalani Campus


UNIT-IV SYLLABUS
1. Intermediate Code Generation
2. Intermediate Languages - prefix - postfix
3. Quadruple - triple - indirect triples Representation
4. Syntax tree - Evaluation of expression - three-address code
5. Synthesized attributes – Inherited attributes
6. Intermediate languages – Declarations
7. Assignment Statements
8. Boolean Expressions, Case Statements
9. Back patching – Procedure calls
10. Code Generation
11. Issues in the design of code generator
12. The target machine – Runtime Storage management
13. A simple Code generator
14. Code Generation Algorithm
15. Register and Address Descriptors
16. Generating Code of Assignment Statements
17. Cross Compiler – T diagrams
18. Issues in Cross compilers

Intermediate Languages

● Syntax trees and postfix notation are two kinds of intermediate representations.
● A third, called three-address code, will be used here.
● The semantic rules for generating three-address code from common
programming language constructs are similar to those for constructing
syntax trees or for generating postfix notation.

CS416 Compiler Design
Graphical Representations
● A syntax tree depicts the natural hierarchical structure of a source program.
● A dag gives the same information but in a more compact way because common subexpressions are identified.
● For example, the assignment statement a := b * -c + b * -c has both a syntax tree and a dag; in the dag, the common subexpression b * -c is represented only once.
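
The difference between the two graphical representations can be sketched in a few lines of Python. This is only an illustration under assumed names: the `GraphBuilder` class, its `share` flag, and the tuple node layout are hypothetical and do not come from the slides. With sharing enabled, a node is created only if an identical node does not exist yet, which is what turns the tree into a dag.

```python
# Hypothetical sketch: GraphBuilder and its node layout are illustrative only.
class GraphBuilder:
    def __init__(self, share):
        self.share = share   # True builds a dag, False builds a plain syntax tree
        self.nodes = []      # nodes[i] is the tuple (op, left, right)
        self.index = {}      # maps a node tuple to its position in nodes

    def node(self, op, left=None, right=None):
        key = (op, left, right)
        if self.share and key in self.index:
            return self.index[key]          # reuse the identical existing node
        self.nodes.append(key)
        self.index[key] = len(self.nodes) - 1
        return self.index[key]


def build_assignment(g):
    """Build a := b * -c + b * -c and return the index of the root node."""
    def term():
        return g.node("*", g.node("id b"), g.node("uminus", g.node("id c")))
    return g.node(":=", g.node("id a"), g.node("+", term(), term()))


tree = GraphBuilder(share=False)
dag = GraphBuilder(share=True)
build_assignment(tree)
build_assignment(dag)
print(len(tree.nodes), len(dag.nodes))
```

The tree needs a separate set of nodes for each occurrence of b * -c, while the dag builds that subexpression once and points to it twice.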
DECLARATIONS
● As the sequence of declarations in a procedure or block is examined, storage is laid out for the names local to the procedure.
● For each local name, a symbol-table entry is created with information such as the type and the relative address of the storage for the name.
● The relative address consists of an offset from the base of the static data area or the field for local data in an activation record.
● When the front end generates addresses, it may have a target machine in mind.
● Suppose that addresses of consecutive integers differ by 4 on a byte-addressable machine.
● The address calculations generated by the front end may therefore include multiplications by 4.

Declarations in a procedure
● The syntax of languages such as C, Pascal, and Fortran allows all the declarations in a single procedure to be processed as a group.
● A global variable, offset, can keep track of the next available relative address.
● The procedure enter(name, type, offset) creates a symbol-table entry for name, gives it type type and relative address offset in its data area.
● We use synthesized attributes type and width for nonterminal T to indicate the type and the width, or number of memory units, taken by objects of that type.
● Attribute type represents a type expression constructed from the basic types integer and real by applying the type constructors pointer and array.
● If type expressions are represented by graphs, then attribute type might be a pointer to the node representing a type expression.
SDT for declarations

In the translation scheme, nonterminal P generates a sequence of declarations of the form id : T.

Before the first declaration is considered, offset is set to 0.

As each new name is seen, that name is entered in the symbol table with offset equal to the current value of offset, and offset is incremented by the width of the data object denoted by that name.

The width of an array is obtained by multiplying the width of each element by the number of elements in the array.

The initialization of offset in the translation scheme is more evident if the first production appears on one line as:

P → { offset := 0 } D

Nonterminals generating ε, called marker nonterminals, can be used to rewrite productions so that all actions appear at the ends of right sides. Using a marker nonterminal M, the production above can be restated as:

P → M D
M → ε { offset := 0 }
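
The bookkeeping described above can be sketched as follows. This is a minimal illustration, not the translation scheme itself: the widths (4 for integer and pointer, 8 for real), the nested-tuple encoding of types, and the function names `type_info` and `process_declarations` are all assumptions made for the example.

```python
# Assumed widths in memory units; the slides only fix integers at 4.
INT_WIDTH, REAL_WIDTH, POINTER_WIDTH = 4, 8, 4


def type_info(t):
    """Return (type expression, width) for a type given as a string or tuple."""
    if t == "integer":
        return "integer", INT_WIDTH
    if t == "real":
        return "real", REAL_WIDTH
    if t[0] == "array":                      # ("array", count, element type)
        elem_type, elem_width = type_info(t[2])
        return ("array", t[1], elem_type), t[1] * elem_width
    if t[0] == "pointer":                    # ("pointer", target type)
        target_type, _ = type_info(t[1])
        return ("pointer", target_type), POINTER_WIDTH
    raise ValueError(f"unknown type: {t!r}")


def process_declarations(decls):
    """Process a list of (name, type) declarations of the form id : T."""
    symtab = {}
    offset = 0                               # set to 0 before the first declaration
    for name, t in decls:
        type_expr, width = type_info(t)
        symtab[name] = (type_expr, offset)   # plays the role of enter(name, type, offset)
        offset += width                      # advance past the new data object
    return symtab, offset


symtab, total = process_declarations([
    ("i", "integer"),
    ("x", "real"),
    ("a", ("array", 10, "integer")),
    ("p", ("pointer", "real")),
])
for name, (type_expr, off) in symtab.items():
    print(name, type_expr, off)
```

Each name receives the value of offset at the moment it is seen, so the array a of ten integers pushes the next name 40 units further along.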
Keeping Track of scope Information

● In a language with nested procedures, names local to each procedure can be assigned relative addresses using the same approach as for declarations in a single procedure.
● When a nested procedure is seen, processing of declarations in the enclosing procedure is temporarily suspended.
● This approach will be illustrated by adding semantic rules to the following language:

P → D
D → D ; D | id : T | proc id ; D ; S

● The nonterminal T has synthesized attributes type and width.

Example: Nested Procedures
The nesting of procedure definitions in a Pascal program can be indicated by indentation, with each nested procedure indented under the procedure that encloses it.

The semantic rules are defined in terms of the following operations:

1. mktable(previous):
-creates a new symbol table and returns a pointer to the new table.
-The argument previous points to a previously created symbol table.
-The pointer previous is placed in a header for the new symbol table, along with additional information such as the nesting depth of a procedure.
2. enter(table, name, type, offset):
-creates a new entry for name name in the symbol table pointed to by table.
-enter places type type and relative address offset in fields within the entry.
3. addwidth(table, width):
-records the cumulative width of all the entries in table in the header associated with this symbol table.
4. enterproc(table, name, newtable):
-creates a new entry for procedure name in the symbol table pointed to by table.
-The argument newtable points to the symbol table for this procedure name.

The action for nonterminal M initializes stack tblptr with a symbol table for the outermost scope, created by operation mktable(nil).

The action also pushes relative address 0 onto stack offset.

The nonterminal N plays a similar role when a procedure declaration appears: its action uses the operation mktable(top(tblptr)) to create a new symbol table.

For each variable declaration id : T, an entry is created for id in the current symbol table.

This declaration leaves the stack tblptr unchanged; the top of stack offset is incremented by T.width.

When the action on the right side of D → proc id ; N D1 ; S occurs, the width of all declarations generated by D1 is on top of stack offset; it is recorded using addwidth.

Stacks tblptr and offset are then popped, and we revert to examining the declarations in the enclosing procedure. At this point the name of the enclosed procedure is entered into the symbol table of its enclosing procedure.
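
The four operations and the two stacks can be sketched in Python as below. The class `SymbolTable`, its field names, and the helper functions `start_program`, `declare`, `start_proc`, `end_proc`, and `end_program` are hypothetical names chosen for this illustration; each helper stands in for the semantic action noted in its comment.

```python
class SymbolTable:
    """Illustrative table layout; the field names are assumptions."""
    def __init__(self, previous):
        self.previous = previous      # header link to the enclosing table
        self.depth = 0 if previous is None else previous.depth + 1
        self.width = 0                # filled in later by addwidth
        self.entries = {}


def mktable(previous):
    return SymbolTable(previous)

def enter(table, name, type_, offset):
    table.entries[name] = {"type": type_, "offset": offset}

def addwidth(table, width):
    table.width = width

def enterproc(table, name, newtable):
    table.entries[name] = {"type": "proc", "table": newtable}


tblptr, offset = [], []               # the two stacks used by the actions

def start_program():                  # action for M -> empty
    tblptr.append(mktable(None))
    offset.append(0)

def declare(name, type_, width):      # action for D -> id : T
    enter(tblptr[-1], name, type_, offset[-1])
    offset[-1] += width

def start_proc():                     # action for N -> empty
    tblptr.append(mktable(tblptr[-1]))
    offset.append(0)

def end_proc(name):                   # action for D -> proc id ; N D1 ; S
    table = tblptr.pop()
    addwidth(table, offset.pop())
    enterproc(tblptr[-1], name, table)

def end_program():                    # action for P -> M D
    table = tblptr.pop()
    addwidth(table, offset.pop())
    return table


start_program()
declare("a", "integer", 4)
declare("x", "real", 8)
start_proc()                          # declarations of the outer scope are suspended
declare("i", "integer", 4)
end_proc("q")                         # q is entered into the outer table
declare("y", "integer", 4)            # outer offsets resume where they stopped
outer = end_program()
inner = outer.entries["q"]["table"]
print(outer.width, inner.width, outer.entries["y"]["offset"])
```

Because each scope has its own entry on the offset stack, the declaration of i inside q does not disturb the relative address later given to y.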
Field Names in Records

After the keyword record is seen, the action associated with the marker L creates a new symbol table for the field names.

A pointer to this symbol table is pushed onto stack tblptr, and relative address 0 is pushed onto stack offset.

The action for D → id : T enters information about the field name id into the symbol table for the record.

The top of stack offset will hold the width of all the data objects within the record after the fields have been examined.

The action following end returns this width as synthesized attribute T.width.

The type T.type is obtained by applying the constructor record to the pointer to the symbol table for this record.
ASSIGNMENT STATEMENTS
● Expressions can be of type integer, real, array, and record. The translation of assignments into three-address code shows:
○ How names can be looked up in the symbol table
○ How elements of arrays and records can be accessed.
Names in the Symbol Table
● The translation scheme shows how the symbol-table entries for names can be found.
● The lexeme for the name represented by id is given by attribute id.name.
● Operation lookup(id.name) checks if there is an entry for this occurrence of
the name in the symbol table.
● If so, a pointer to the entry is returned; otherwise, lookup returns nil to
indicate that no entry was found.

● The semantic actions use procedure emit to emit three-address statements to an output file, rather than building up code attributes for nonterminals.
● Example:
Suppose that the context in which an assignment appears is given by the following grammar:

P → M D
M → ε
D → D ; D | id : T | proc id ; N D ; S
N → ε

Nonterminal P becomes the new start symbol when these productions are added to the existing productions for assignment statements.

Reusing Temporary Names
● Assume that newtemp generates a new temporary name each time a temporary is needed.
● It is useful, especially in optimizing compilers, to actually create a distinct name each time newtemp is called.
● However, the temporaries used to hold intermediate values in expression calculations need space allocated to hold their values.
● Temporaries can be reused by changing newtemp.

● From the rules for the synthesized attribute E.place, it follows that the temporaries t1 and t2 that hold the values of E1 and E2 in the code for E → E1 + E2 are not used elsewhere in the program.
● The lifetimes of these temporaries are nested like matching pairs of balanced parentheses.
● It is possible to modify newtemp so that it uses a small array in a procedure's data area, managed like a stack, to hold temporaries.
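The modified newtemp keeps a counter c, initialized to zero: c is decremented whenever a temporary is used as an operand and incremented whenever a new temporary is generated. The sequence of three-address statements generated by the semantic rules with this newtemp can be tabulated together with the "current" value of c after the generation of each statement.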
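
A minimal sketch of such a counter-based newtemp is shown below. The class name `TempPool`, the `$` prefix for temporaries, the tuple encoding of expressions, and the sample statement x := a * b + c * d - e * f are assumptions made for the illustration; the sketch also assumes that no source-level name begins with `$`.

```python
class TempPool:
    """Counter-based temporaries; temporaries are named $0, $1, ... (assumed)."""
    def __init__(self):
        self.c = 0

    def use(self, operand):
        if operand.startswith("$"):   # a temporary used as an operand is freed
            self.c -= 1

    def newtemp(self):
        name = f"${self.c}"
        self.c += 1
        return name


def gen(expr, pool, code):
    """Translate expr (a name or a tuple (op, left, right)); return its place."""
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    left_place = gen(left, pool, code)
    right_place = gen(right, pool, code)
    pool.use(left_place)
    pool.use(right_place)
    place = pool.newtemp()
    code.append((f"{place} := {left_place} {op} {right_place}", pool.c))
    return place


def gen_assign(target, expr):
    pool, code = TempPool(), []
    place = gen(expr, pool, code)
    pool.use(place)
    code.append((f"{target} := {place}", pool.c))
    return code


# x := a * b + c * d - e * f
code = gen_assign("x", ("-", ("+", ("*", "a", "b"), ("*", "c", "d")), ("*", "e", "f")))
for statement, c in code:
    print(f"{statement:20} c = {c}")
```

Only two temporary names are ever needed for this statement, because the lifetimes of the temporaries nest and the counter returns to zero once the assignment is complete.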
Temporaries that may be assigned and/or used more than once

Temporaries that may be assigned and/or used more than once, for example in a conditional assignment, cannot be assigned names in the last-in first-out manner.

The problem of temporaries defined or used more than once occurs when we perform code optimization, such as combining common subexpressions or moving a computation out of a loop.

A reasonable strategy is to create a new name whenever we create an additional definition or use for a temporary or move its computation.
Addressing Array Elements
● Elements of an array can be accessed quickly if the elements are stored in a block of consecutive locations. If the width of each array element is w, then the ith element of array A begins in location

base + (i - low) × w

● where low is the lower bound on the subscript and base is the relative address of the storage allocated for the array, i.e., base is the relative address of A[low].
● The above expression can be partially evaluated at compile time if it is rewritten as

i × w + (base - low × w)

● The subexpression c = base - low × w can be evaluated when the declaration of the array is seen.
● Assume that c is saved in the symbol-table entry for A, so the relative address of A[i] is obtained by simply adding i × w to c.
● Compile-time precalculation can similarly be applied to address calculations of elements of multi-dimensional arrays.
● A two-dimensional array is normally stored in one of two forms, either row-major (row-by-row) or column-major (column-by-column), e.g., for a 2 x 3 array A.
Fortran uses column-major form.

Pascal uses row-major form, because A[i, j] is equivalent to A[i][j] and the elements of each array A[i] are stored consecutively.

For a two-dimensional array stored in row-major form, the relative address of A[i1, i2] can be written as

base + ((i1 - low1) × n2 + i2 - low2) × w

where low1 and low2 are the lower bounds on the values of i1 and i2, and n2 is the number of values that i2 can take.

If high2 is the upper bound on the value of i2, then

n2 = high2 - low2 + 1

Assuming that i1 and i2 are the only values that are not known at compile time, we can rewrite the above expression as

((i1 × n2) + i2) × w + (base - ((low1 × n2) + low2) × w)
Row- or column-major form can be generalized to many dimensions.

The expression above generalizes to the following expression for the relative address of A[i1, i2, i3, ..., ik]:

((...((i1 × n2 + i2) × n3 + i3)...) × nk + ik) × w
+ base - ((...((low1 × n2 + low2) × n3 + low3)...) × nk + lowk) × w

Since nj = highj - lowj + 1 is assumed fixed for all j, the term on the second line can be computed by the compiler and saved with the symbol-table entry for A.

Column-major form generalizes to the opposite arrangement, with the leftmost subscripts varying fastest.

Some languages permit the sizes of arrays to be specified dynamically, when a procedure is called at run time.

The formulas for accessing the elements of such arrays are the same as for fixed-size arrays, but the upper and lower limits are not known at compile time.
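
The split between the compile-time constant and the run-time part of the row-major address can be checked with a short sketch. The function names `constant_part`, `address`, and `direct_address` are hypothetical; `direct_address` computes the offset the long way so that the two results can be compared.

```python
def constant_part(base, w, lows, highs):
    """c = base - ((...((low1*n2 + low2)*n3 + low3)...)*nk + lowk) * w."""
    acc = lows[0]
    for low, high in zip(lows[1:], highs[1:]):
        acc = acc * (high - low + 1) + low
    return base - acc * w


def address(indices, c, w, lows, highs):
    """((...((i1*n2 + i2)*n3 + i3)...)*nk + ik) * w + c, the run-time part plus c."""
    acc = indices[0]
    for i, low, high in zip(indices[1:], lows[1:], highs[1:]):
        acc = acc * (high - low + 1) + i
    return acc * w + c


def direct_address(indices, base, w, lows, highs):
    """Row-major address computed directly from (i_j - low_j), for comparison."""
    acc = 0
    for i, low, high in zip(indices, lows, highs):
        acc = acc * (high - low + 1) + (i - low)
    return base + acc * w


# A 10 x 20 array with lower bounds 1, element width 4, stored at relative address 0.
lows, highs, w, base = (1, 1), (10, 20), 4, 0
c = constant_part(base, w, lows, highs)
print(c, address((2, 3), c, w, lows, highs))
```

For this array the constant is base - 84, so only the multiplications and additions involving the subscripts remain to be done at run time.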

Example:
Array references can be permitted in assignments if nonterminal L is given the following productions:

L → id [ Elist ] | id
Elist → Elist , E | E

As we group index expressions into an Elist, it is useful to rewrite the productions as

L → Elist ] | id
Elist → Elist , E | id [ E

These productions allow a pointer to the symbol-table entry for the array name to be passed as a synthesized attribute array of Elist.

Elist.ndim records the number of dimensions (index expressions) in the Elist.

Elist.place denotes the temporary holding a value computed from the index expressions in Elist.

An l-value L will have two attributes, L.place and L.offset.

If L is a simple name, L.place will be a pointer to the symbol-table entry for that name, and L.offset will be null, indicating that the l-value is a simple name rather than an array reference.

The nonterminal E has the same translation E.place as before.
The Translation Scheme for Addressing Array Elements

We generate a normal assignment if L is a simple name, and an indexed assignment into the location denoted by L otherwise.
The code for arithmetic expressions is the same as for expressions without array references.

When an array reference L is reduced to E, we use indexing to obtain the contents of the location L.place[L.offset].

Here L.offset is a new temporary representing the first term of the address expression for a multi-dimensional array; function width(Elist.array) returns w.

L.place represents the second term, returned by the function c(Elist.array).

A null offset indicates a simple name.

When the next index expression is seen, the recurrence e_m = e_(m-1) × n_m + i_m is applied, where Elist1.place corresponds to e_(m-1) and Elist.place to e_m.

For the first index expression, E.place holds both the value of the expression E and the value of e_m for m = 1.
Example: Let A be a 10 x 20 array with low1 = low2 = 1, so that n1 = 10 and n2 = 20. Take w to be 4.

An annotated parse tree can be drawn for the assignment x := A[y, z].

The assignment is translated into a sequence of three-address statements that computes the offset of A[y, z] from the index expressions and then fetches the element by indexing.
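
The sketch below generates one plausible three-address sequence for this example by applying the recurrence to the index expressions and then emitting the indexed fetch. The function `translate_array_ref`, the temporary names, and the spelling `base_A` for the relative address of the array are assumptions for illustration; a real translation scheme would take the dimensions and the constant from the symbol-table entry for A.

```python
def translate_array_ref(target, array, indices, dims, lows, w):
    """Emit three-address code for target := array[indices] (row-major)."""
    code = []
    count = 0

    def newtemp():
        nonlocal count
        count += 1
        return f"t{count}"

    # Index expressions: e1 = i1, then e_m = e_(m-1) * n_m + i_m
    place = indices[0]
    for index, n in zip(indices[1:], dims[1:]):
        t = newtemp()
        code.append(f"{t} := {place} * {n}")
        code.append(f"{t} := {t} + {index}")
        place = t

    # End of the index list: compute the constant part and the scaled offset
    acc = lows[0]
    for low, n in zip(lows[1:], dims[1:]):
        acc = acc * n + low
    l_place = newtemp()
    code.append(f"{l_place} := base_{array} - {acc * w}")   # constant c for the array
    l_offset = newtemp()
    code.append(f"{l_offset} := {w} * {place}")

    # Array reference used as an expression: indexed fetch, then the assignment
    value = newtemp()
    code.append(f"{value} := {l_place}[{l_offset}]")
    code.append(f"{target} := {value}")
    return code


code = translate_array_ref("x", "A", ["y", "z"], dims=[10, 20], lows=[1, 1], w=4)
print("\n".join(code))
```

The constant 84 is ((low1 × n2) + low2) × w = (1 × 20 + 1) × 4, computed once by the compiler rather than at run time.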

Type Conversions within Assignment

There are many different types of variables and constants, so the compiler must either reject certain mixed-type operations or generate appropriate coercion (type conversion) instructions.

Introduce another attribute E.type, whose value is either real or integer.

The semantic rule for E.type associated with the production E → E1 + E2 is:

E.type := if E1.type = integer and E2.type = integer then integer else real

The semantic rule for E → E1 + E2 must also be modified to generate, when necessary, three-address statements of the form x := inttoreal y, whose effect is to convert integer y to a real of equal value, called x.
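
A hedged sketch of the modified rule follows. The class `Generator`, the operator spellings `int+`, `real+`, `int*`, `real*`, and the sample statement x := y + i * j (with x and y real, i and j integer) are assumptions chosen for the illustration.

```python
class Generator:
    """Minimal emitter with type-aware binary operators (illustrative only)."""
    def __init__(self):
        self.code = []
        self.count = 0

    def newtemp(self):
        self.count += 1
        return f"t{self.count}"

    def to_real(self, place):
        u = self.newtemp()
        self.code.append(f"{u} := inttoreal {place}")
        return u

    def binary(self, op, left, right):
        """left and right are (place, type) pairs; returns (place, type)."""
        (lp, lt), (rp, rt) = left, right
        for t in (lt, rt):
            if t not in ("integer", "real"):
                raise TypeError(f"unsupported operand type: {t}")
        place = self.newtemp()
        if lt == "integer" and rt == "integer":
            self.code.append(f"{place} := {lp} int{op} {rp}")
            return place, "integer"
        if lt == "integer":                 # coerce the integer operand
            lp = self.to_real(lp)
        if rt == "integer":
            rp = self.to_real(rp)
        self.code.append(f"{place} := {lp} real{op} {rp}")
        return place, "real"


g = Generator()
product = g.binary("*", ("i", "integer"), ("j", "integer"))
total = g.binary("+", ("y", "real"), product)
g.code.append(f"x := {total[0]}")
print("\n".join(g.code))
```

The integer product is computed with an integer operator and converted only at the point where it meets the real operand y.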
Accessing Fields in Records

The compiler must keep track of both the types and relative addresses of the fields of a record

An advantage of keeping this information in symbol-table entries for the field names is that the routine for
looking up names in the symbol table can also be used for field names.

If t is a pointer to the symbol table for a record type, then the type record(t), formed by applying the constructor record to the pointer, was returned as T.type.

We use the expression p↑.info + 1 to illustrate how a pointer to the symbol table can be extracted from an attribute E.type.

Here p must be a pointer to a record with a field name info whose type is arithmetic.

If types are constructed as described above, the type of p must be given by a type expression of the form pointer(record(t)).
BOOLEAN EXPRESSIONS
● Boolean expressions have two primary purposes. They are used to compute logical
values, but more often they are used as conditional expressions in statements that
alter the flow of control, such as if-then, if-then-else, or while-do statements.
● Boolean expressions are composed of the boolean operators (and, or, and not)
applied to elements that are boolean variables or relational expressions.
● Relational expressions are of the form E1 relop E2, where E1 and E2 are arithmetic
expressions.
● Consider boolean expressions generated by the following grammar:

E → E or E | E and E | not E | ( E ) | id relop id | true | false
Methods of Translating Boolean Expressions
● There are two principal methods of representing the value of a boolean expression
● The first method is to encode true and false numerically and to evaluate a boolean expression analogously to an arithmetic expression. Often 1 is used to denote true and 0 to denote false.
Example: another possible encoding lets any nonnegative quantity denote true and any negative number denote false.
● The second principal method of implementing boolean expressions is by flow of
control, i.e., representing the value of a boolean expression by a position reached
in a program.
● This method is convenient in implementing the boolean expressions in
flow-of-control statements, such as the if-then and while-do statements.
Example: Consider the expression E1 or E2, if we determine that E1 is true, then we
can conclude that the entire expression is true without having to evaluate E2.

Numerical Representation
● Consider the implementation of boolean expressions using 1 to denote true and 0
to denote false.
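● Expressions will be evaluated completely, from left to right, in a manner similar to arithmetic expressions.
● Example: the translation for a or b and not c is the three-address sequence

t1 := not c
t2 := b and t1
t3 := a or t2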
Example:
A translation scheme for producing three-address code for boolean expressions uses the following conventions.

Here nextstat gives the index of the next three-address statement in the output sequence, and emit increments nextstat after producing each three-address statement.
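
A sketch of the numerical translation, with nextstat and emit as described, is given below. The class `Emitter`, the tuple encoding of expressions, and the starting statement index 100 are assumptions for the illustration. A relational expression is translated into four statements that set a temporary to 0 or 1.

```python
class Emitter:
    def __init__(self, start=100):
        self.start = start
        self.code = []
        self.temps = 0

    @property
    def nextstat(self):
        # index of the next statement; grows by one with every emit
        return self.start + len(self.code)

    def emit(self, statement):
        self.code.append(statement)

    def newtemp(self):
        self.temps += 1
        return f"t{self.temps}"


def translate(expr, out):
    """Return the place holding the 0/1 value of a boolean expression."""
    if isinstance(expr, str):              # a boolean variable
        return expr
    kind = expr[0]
    if kind == "relop":                    # ("relop", op, id1, id2)
        _, op, left, right = expr
        place = out.newtemp()
        out.emit(f"if {left} {op} {right} goto {out.nextstat + 3}")
        out.emit(f"{place} := 0")
        out.emit(f"goto {out.nextstat + 2}")
        out.emit(f"{place} := 1")
        return place
    if kind == "not":                      # ("not", E1)
        operand = translate(expr[1], out)
        place = out.newtemp()
        out.emit(f"{place} := not {operand}")
        return place
    if kind in ("and", "or"):              # ("and" | "or", E1, E2)
        left = translate(expr[1], out)
        right = translate(expr[2], out)
        place = out.newtemp()
        out.emit(f"{place} := {left} {kind} {right}")
        return place
    raise ValueError(f"unknown expression: {expr!r}")


# a < b or c < d and e < f
out = Emitter()
result = translate(
    ("or", ("relop", "<", "a", "b"),
           ("and", ("relop", "<", "c", "d"), ("relop", "<", "e", "f"))),
    out,
)
for k, statement in enumerate(out.code):
    print(f"{out.start + k}: {statement}")
```

Every relational subexpression is evaluated regardless of the others, which is the behaviour that short-circuit code avoids.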

Short-Circuit Code
The scheme on the previous slide generates three-address code that fully evaluates an expression such as a < b or c < d and e < f.

We can also translate a boolean expression into three-address code without generating code for any of the boolean operators and without having the code necessarily evaluate the entire expression.
This style of evaluation is sometimes called "short-circuit" or "jumping" code.
Flow-of-control statements
● Consider the translation of boolean expressions into three-address code in the context of if-then, if-then-else, and while-do statements such as are generated by the following grammar:

S → if E then S1 | if E then S1 else S2 | while E do S1

● E is the boolean expression to be translated; the function newlabel returns a new symbolic label each time it is called.
● A boolean expression E is associated with two labels: E.true, the label to which control flows if E is true, and E.false, the label to which control flows if E is false.
● The semantic rules for translating a flow-of-control statement S allow control to flow from the translation S.code to the three-address instruction immediately following S.code.
● The value of S.next is a label that is attached to the first three-address instruction to be executed after the code for S.
If-then
The code for E generates a jump to E.true if E is true and a jump to S.next if E is false.
If-then-else
The code for the boolean expression E has jumps out of it to the first instruction of the code for S1 if E is true, and to the first instruction of the code for S2 if E is false.
While-do
A new label S.begin is created and attached to the first instruction generated for E. Another new label E.true is attached to the first instruction for S1. The code for E generates a jump to this label if E is true and a jump to S.next if E is false.
An alternative method, called "backpatching", emits code for such statements in one pass.
Control-Flow Translation of Boolean Expressions
E is translated into a sequence of three-address statements that evaluates E as a sequence of conditional and unconditional jumps to one of two locations: E.true, the place control is to reach if E is true, and E.false, the place control is to reach if E is false.

The basic idea behind the translation is the following. Suppose E is of the form a < b. Then the generated code is of the form

if a < b goto E.true
goto E.false

Suppose E is of the form E1 or E2. If E1 is true, then E itself is true, so E1.true is the same as E.true.

If E1 is false, then E2 must be evaluated, so E1.false is made the label of the first statement in the code for E2.
The true and false exits of E2 can be made the same as the true and false exits of E.

Example: An SDD can be written along these lines to generate three-address code for boolean expressions.

Note: the true and false attributes are inherited.
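
The rules above can be sketched as a recursive function in which the true and false labels are passed down as inherited attributes. The function names `make_newlabel` and `jumping_code`, the tuple encoding of expressions, and the label names Ltrue and Lfalse are assumptions for the illustration; the and and not cases follow the same pattern as the or case described above.

```python
def make_newlabel():
    """Return a newlabel function that yields L1, L2, ... on successive calls."""
    count = 0
    def newlabel():
        nonlocal count
        count += 1
        return f"L{count}"
    return newlabel


def jumping_code(expr, true_label, false_label, newlabel):
    """Return the jumping code for expr as a list of lines."""
    kind = expr[0]
    if kind == "relop":                      # ("relop", op, id1, id2)
        _, op, left, right = expr
        return [f"if {left} {op} {right} goto {true_label}",
                f"goto {false_label}"]
    if kind == "or":                         # E1 false: fall into the code for E2
        e2_start = newlabel()
        return (jumping_code(expr[1], true_label, e2_start, newlabel)
                + [f"{e2_start}:"]
                + jumping_code(expr[2], true_label, false_label, newlabel))
    if kind == "and":                        # E1 true: continue with E2
        e2_start = newlabel()
        return (jumping_code(expr[1], e2_start, false_label, newlabel)
                + [f"{e2_start}:"]
                + jumping_code(expr[2], true_label, false_label, newlabel))
    if kind == "not":                        # swap the two exits
        return jumping_code(expr[1], false_label, true_label, newlabel)
    raise ValueError(f"unknown expression: {expr!r}")


# a < b or c < d and e < f
code = jumping_code(
    ("or", ("relop", "<", "a", "b"),
           ("and", ("relop", "<", "c", "d"), ("relop", "<", "e", "f"))),
    "Ltrue", "Lfalse", make_newlabel(),
)
print("\n".join(code))
```

In the output, goto L1 is immediately followed by the label L1, so that jump is redundant; no code is generated for the operators or and and themselves.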

The code generated in these examples is not optimal: it contains redundant instructions, which can subsequently be removed by local transformations in a simple peephole optimizer.
Mixed-Mode Boolean Expressions

E + E produces an integer arithmetic result, while expressions E and E and E relop E produce boolean values represented by flow of control.

Expression E and E requires both arguments to be boolean, but the operations + and relop take either type of argument, including mixed ones.

In the three-address code generated for the mixed-mode case of E → E1 + E2, the second statement is a jump over the third. E → id is also arithmetic.
CASE STATEMENTS
● The "switch" or "case" statement is available in a variety of languages.
● In the switch-statement syntax, there is a selector expression, which is to be evaluated, followed by n constant values that the expression might take, perhaps including a default "value", which always matches the expression if no other value does.
● The intended translation of a switch is code to:
1. Evaluate the expression
2. Find which value in the list of cases is the same as the
value of the expression. Recall that the default value
matches the expression if none of the values explicitly
mentioned in cases does
3. Execute the statement associated with the value found
Syntax Directed Translation of Case Statements
With an SDT scheme, it is convenient to translate a case statement into intermediate code as follows.

On seeing the keyword switch, generate two new labels test and next and a new temporary t.
Then, while parsing the expression E, generate code to evaluate E into t.
After processing E, generate the jump goto test.
We can also generate a sequence of three-address statements of the form

case V1 L1
case V2 L2
...
case Vn-1 Ln-1
case t Ln
label next

where t is the name holding the value of the selector expression E, and Ln is the label for the default statement.

At the code-generation phase, these sequences of case statements can be translated into an n-way branch of the most efficient type, depending on how many there are and whether the values fall into a small range.
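
One way to lay out the intermediate code for a switch, with the tests collected at the end, is sketched below. The function `translate_switch`, the fixed label names test and next, and the sample case bodies are assumptions made for the illustration; in a compiler the two labels would be fresh ones and the bodies would come from translating the case statements.

```python
def translate_switch(selector_code, t, cases, default_body):
    """cases is a list of (constant value, body lines); returns the code lines."""
    code = list(selector_code)            # code to evaluate E into t
    code.append("goto test")
    pending = []                          # (value, label) pairs for the test section
    for number, (value, body) in enumerate(cases, start=1):
        label = f"L{number}"
        pending.append((value, label))
        code.append(f"{label}:")
        code.extend(body)
        code.append("goto next")
    default_label = f"L{len(cases) + 1}"  # Ln, the label of the default statement
    code.append(f"{default_label}:")
    code.extend(default_body)
    code.append("goto next")
    code.append("test:")
    for value, label in pending:
        code.append(f"if {t} = {value} goto {label}")
    code.append(f"goto {default_label}")
    code.append("next:")
    return code


code = translate_switch(
    selector_code=["t := i + 1"],
    t="t",
    cases=[(1, ["x := 10"]), (2, ["x := 20"])],
    default_body=["x := 0"],
)
print("\n".join(code))
```

Placing the tests together at the end makes it easy for a later phase to recognise the sequence and replace it with a jump table or another n-way branch.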

BACK PATCHING
● The easiest way to implement the syntax-directed definitions is to use two passes.
● First, construct a syntax tree for the input, and then walk the tree in depth-first order, computing the translations given in the definition.
● The main problem with generating code for boolean expressions and flow-of-control statements in a single pass is that we may not know the labels that control must go to at the time the jump statements are generated.
● We can get around this problem by generating a series of branching statements with the targets of the jumps temporarily left unspecified.
● Each such statement will be put on a list of goto statements whose labels will be filled in when the proper label can be determined.
● This subsequent filling in of labels is called backpatching.
● We show how backpatching can be used to generate code for boolean
expressions and flow-of-control statements in one pass.
● To manipulate lists of labels, we use three functions:
○ makelist(i) creates a new list containing only i, an index into the array of
quadruples. makelist returns a pointer to the list it has made.
○ merge (p1,p2) concatenates the lists pointed to by p1 and p2, and returns a
pointer to the concatenated list.
○ backpatch(p,i) inserts i as the target label for each of the statements on the
list pointed to by p
Boolean Expression
● We now construct a translation scheme suitable for producing quadruples for boolean expressions during bottom-up parsing.
● We insert a marker nonterminal M into the grammar to cause a semantic action to pick up the index of the next quadruple to be generated.
Synthesized attributes truelist and falselist of nonterminal E are used to generate jumping code for boolean expressions.

As code is generated for E, jumps to the true and false exits are left incomplete, with the label field unfilled.

These incomplete jumps are placed on lists pointed to by E.truelist and E.falselist.

Consider the production E → E1 and M E2.

Attribute M.quad records the number of the first statement of E2.code.

With the production M → ε we associate the semantic action

{ M.quad := nextquad }

The variable nextquad holds the index of the next quadruple to follow.

This value will be backpatched onto E1.truelist when the remainder of the production E → E1 and M E2 has been seen.
The translation scheme is
as follows:

The marker nonterminal M in the production E → E1 or M E2 records the current value of nextquad, just as it does in E → E1 and M E2.
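
The three list functions and the role of the marker M can be sketched as follows. The class `QuadArray`, the tuple encoding of expressions, and the starting index 100 are assumptions for the illustration. The sketch walks an expression tree recursively instead of running inside a bottom-up parser, so the point at which `nextquad` is read between the two operands stands in for the action of M.

```python
class QuadArray:
    def __init__(self, start=100):
        self.start = start
        self.quads = []                    # each quad is [text, target or None]

    @property
    def nextquad(self):
        return self.start + len(self.quads)

    def emit(self, text):
        self.quads.append([text, None])    # jump target left unfilled

    def backpatch(self, plist, target):
        for i in plist:
            self.quads[i - self.start][1] = target

    def listing(self):
        return [f"{self.start + k}: {text} {'_' if target is None else target}"
                for k, (text, target) in enumerate(self.quads)]


def makelist(i):
    return [i]

def merge(p1, p2):
    return p1 + p2


def translate(expr, q):
    """Return (truelist, falselist) for expr, emitting quadruples into q."""
    kind = expr[0]
    if kind == "relop":                    # ("relop", op, id1, id2)
        _, op, left, right = expr
        truelist = makelist(q.nextquad)
        falselist = makelist(q.nextquad + 1)
        q.emit(f"if {left} {op} {right} goto")
        q.emit("goto")
        return truelist, falselist
    if kind == "not":
        truelist, falselist = translate(expr[1], q)
        return falselist, truelist
    t1, f1 = translate(expr[1], q)
    m_quad = q.nextquad                    # what the marker M records
    t2, f2 = translate(expr[2], q)
    if kind == "or":
        q.backpatch(f1, m_quad)
        return merge(t1, t2), f2
    if kind == "and":
        q.backpatch(t1, m_quad)
        return t2, merge(f1, f2)
    raise ValueError(f"unknown expression: {expr!r}")


# a < b or c < d and e < f
q = QuadArray()
truelist, falselist = translate(
    ("or", ("relop", "<", "a", "b"),
           ("and", ("relop", "<", "c", "d"), ("relop", "<", "e", "f"))),
    q,
)
print("\n".join(q.listing()))
print(truelist, falselist)
```

The jumps still marked with _ are exactly those on the final truelist and falselist; they are filled in later, once the enclosing statement knows where the true and false exits lead.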

Flow-of-Control Statements

● Backpatching can be used to translate flow-of-control statements in one pass.

We assume that the code that follows a given statement in execution also follows it physically in the quadruple array. If that is not true, an explicit jump must be provided.
Schemes to Implement the Translation

● An SDT scheme generates translations for the flow-of-control constructs.
● The nonterminal E has two attributes, E.truelist and E.falselist.
● L and S each also need a list of unfilled quadruples that must eventually be completed by backpatching.
● These lists are pointed to by the attributes L.nextlist and S.nextlist.
● S.nextlist is a pointer to a list of all conditional and unconditional jumps to the quadruple following the statement S in execution order.
● L.nextlist is defined similarly.
Example:
For production (1), S → if E then M1 S1 N else M2 S2:
Backpatch the jumps when E is true to the quadruple M1.quad, which is the beginning of the code for S1.

Similarly, backpatch the jumps when E is false to go to the beginning of the code for S2.

The list S.nextlist includes all jumps out of S1 and S2, as well as the jump generated by N.

What the flow of control does is cause the proper backpatching, so that the assignments and boolean expression evaluations will connect properly.
Labels and Gotos

● The most elementary programming language construct for changing the flow of control in a program is the label and goto.
● When a compiler encounters a statement like goto L, it must check that there is exactly one statement with label L in the scope of this goto statement.
● When a label L is encountered for the first time in the source program, either in a declaration or as the target of a forward goto, we enter L into the symbol table and generate a symbolic label for L.
PROCEDURE CALLS
● The procedure is such an important and frequently used programming
construct that it is imperative for a compiler to generate good code for
procedure calls and returns.
● The run-time routines that handle procedure argument passing, calls, and
returns are part of the run-time support package
● This section describes the code that is typically generated for procedure calls and returns.
Calling Sequences
● The translation for a call includes a calling sequence, a sequence of actions taken
on entry to and exit from each procedure
● When a procedure call occurs, space must be allocated for the activation record
of the called procedure.
● The arguments of the called procedure must be evaluated and made available to
the called procedure in a known place
● Environment pointers must be established to enable the called procedure to access
data in enclosing blocks
● The state of the calling procedure must be saved so it can resume execution after
the call.
● The return address is usually the location of the instruction that follows the call in
the calling procedure.
● Finally, a jump to the beginning of the code for the called procedure must be
generated.
● When a procedure returns, several actions also must take place.
● If the called procedure is a function, the result must be stored in a known
place
● The activation record of the calling procedure must be restored.
● A jump to the calling procedure’s return address must be generated

A simple Example
Let us consider a simple example in which parameters are passed by reference and storage is statically allocated.

The code for S is the code for Elist, which evaluates the arguments, followed by a param p statement for each argument, followed by a call statement.

The translation of each argument expression E includes a step to store E.place on a queue.

A convenient data structure in which to save these values is a queue, a first-in first-out list.
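
The use of the queue can be sketched as follows. The functions `gen_expr` and `translate_call`, the tuple encoding of argument expressions, and the sample call f(a + b, c) are assumptions for the illustration; the sketch only shows the ordering of the generated statements, not argument passing itself.

```python
from collections import deque


def gen_expr(expr, code, counter):
    """Emit code for expr (a name or (op, left, right)); return its place."""
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    left_place = gen_expr(left, code, counter)
    right_place = gen_expr(right, code, counter)
    counter[0] += 1
    place = f"t{counter[0]}"
    code.append(f"{place} := {left_place} {op} {right_place}")
    return place


def translate_call(proc, args):
    """Translate a call statement with the given argument expressions."""
    code, counter = [], [0]
    queue = deque()
    for arg in args:                       # code for the argument list comes first
        queue.append(gen_expr(arg, code, counter))
    while queue:                           # then one param statement per argument,
        code.append(f"param {queue.popleft()}")   # in first-in first-out order
    code.append(f"call {proc}")
    return code


code = translate_call("f", [("+", "a", "b"), "c"])
print("\n".join(code))
```

Because the places are taken from the queue in the order they were stored, the param statements appear in the same order as the arguments in the source call.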
