Code Generation I
Code generation
• The final phase in the compiler model is the code
generator.
• It takes as input an intermediate
representation of the source program and
produces as output an equivalent target
program.
• The code generation techniques presented below
can be used whether or not an optimizing phase
occurs before code generation.
Position of a Code Generator in the
Compiler Model
ISSUES IN THE DESIGN OF A CODE GENERATOR
(15 marks UQ CUSAT April 2017)
• The following issues arise during the
code generation phase:
• 1. Input to code generator
• 2. Target program
• 3. Memory management
• 4. Instruction selection
• 5. Register allocation
• 6. Evaluation order
1. Input to code generator:
• The input to the code generator consists
of the intermediate representation of
the source program produced by the front
end, together with information in the
symbol table that is used to determine the
run-time addresses of the data objects
denoted by the names in the intermediate
representation.
• Intermediate representation can be :
• a. Linear representation such as postfix
notation
• b. Three address representation such as
quadruples
• c. Virtual machine representation such
as stack machine code
• d. Graphical representations such as
syntax trees and dags.
2. Target program:
• The output of the code generator is the target
program. The output may be:
• a) Absolute machine language
  - It can be placed in a fixed memory location and can
    be executed immediately.
• b) Relocatable machine language
  - It allows subprograms to be compiled separately.
• c) Assembly language
  - It makes code generation easier.
3. Memory management:
• Names in the source program are mapped to
addresses of data objects in run-time memory
by the front end and code generator.
• Code generation makes use of the symbol table:
a name in a three-address statement refers to
the symbol-table entry for that name.
• Labels in three-address statements have to
be converted to addresses of instructions.
• A quadruple j: goto i generates a jump instruction as follows:
• * If i < j, the jump is backward. A jump instruction with
target address equal to the location of the code for
quadruple i is generated.
• * If i > j, the jump is forward. We must store on
a list for quadruple i the location of the first
machine instruction generated for quadruple j.
When quadruple i is processed, the target address is
filled in for all the instructions on that list,
that is, for all forward jumps to i.
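The bookkeeping for forward jumps can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `backpatch_jumps`, the `("op", [...])` / `("goto", i)` quadruple encoding, and the `JMP <address>` syntax are all hypothetical, and every machine instruction is assumed to occupy one address.

```python
def backpatch_jumps(quads):
    """Translate quadruples to machine code, resolving goto targets.

    quads[j] is either ("op", [instructions...]) or ("goto", i).
    """
    code = []      # generated machine instructions
    start = {}     # quadruple index -> address of its first instruction
    pending = {}   # quadruple index i -> addresses of forward jumps to i

    for j, quad in enumerate(quads):
        start[j] = len(code)
        # Quadruple j is being processed: fill in every forward jump to it.
        for pos in pending.pop(j, []):
            code[pos] = f"JMP {start[j]}"

        if quad[0] == "goto":
            i = quad[1]
            if i <= j:
                # Backward jump: the target address is already known.
                code.append(f"JMP {start[i]}")
            else:
                # Forward jump: remember where the jump is, patch it later.
                pending.setdefault(i, []).append(len(code))
                code.append(None)  # placeholder until quadruple i is reached
        else:
            code.extend(quad[1])
    return code


quads = [
    ("op", ["MOV a,R0", "ADD b,R0"]),  # quadruple 0
    ("goto", 3),                       # quadruple 1: forward jump
    ("op", ["MOV R0,c"]),              # quadruple 2
    ("goto", 0),                       # quadruple 3: backward jump
]
code = backpatch_jumps(quads)
```

Here the forward jump in quadruple 1 is emitted as a placeholder and filled in once quadruple 3 is reached, while the backward jump in quadruple 3 is resolved immediately.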
4. Instruction selection:
• The instruction set of the target machine
should be complete and uniform.
• Instruction speeds and machine idioms
are important factors when efficiency of
target program is considered.
• The quality of the generated code is
determined by its speed and size.
Example
• Three-address code:
a:=b+c
d:=a+e
• Target code:
MOV b,R0
ADD c,R0
MOV R0,a
MOV a,R0
ADD e,R0
MOV R0,d
5. Register allocation
• Instructions involving register operands are
shorter and faster than those involving
operands in memory. The use of registers is
subdivided into two subproblems :
• 1. Register allocation - the set of variables that
will reside in registers at a point in the
program is selected.
• 2. Register assignment - the specific register
that a variable will reside in is picked.
• Certain machines require even-odd register
pairs for some operands and results.
• For example, consider a division
instruction of the form:
Div x, y
where x is the even register of the even/odd
register pair that holds the dividend, and
y is the divisor.
• After the division, the even register holds the remainder
• and the odd register holds the quotient.
6. Evaluation order
• The order in which the computations
are performed can affect the
efficiency of the target code.
• Some computation orders require
fewer registers to hold intermediate
results than others.
Target Program Code
• The back-end code generator of a
compiler may generate different forms of
code, depending on the requirements:
– Absolute machine code (executable code)
– Relocatable machine code (object files for
linker)
– Assembly language (facilitates debugging)
– Byte code forms for interpreters (e.g. JVM)
The Target Machine
• Implementing code generation requires
thorough understanding of the target machine
architecture and its instruction set
• Our (hypothetical) machine:
– Byte-addressable (word = 4 bytes)
– Has n general purpose registers R0,
R1, …, Rn-1
– Two-address instructions of the form
op source, destination
The Target Machine: Op-codes and
Address Modes
• Op-codes (op), for example
MOV (move content of source to destination)
ADD (add content of source to destination)
SUB (subtract content of source from dest.)
• Address modes
Mode              Form    Address                    Added Cost
Absolute          M       M                          1
Register          R       R                          0
Indexed           c(R)    c+contents(R)              1
Indirect indexed  *c(R)   contents(c+contents(R))    1
Instruction Costs
• Define the cost of an instruction
= 1 + cost(source-mode) + cost(destination-mode)
• Eg: The instruction MOV R0,R1 copies the contents
of register R0 into register R1.
• This instruction has cost 1, since it occupies only one
word of memory.
• The instruction MOV R5, M , copies the contents of
register R5 into memory location M. This instruction
has cost 2, since the address of memory location M is
in the word following the instruction.
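The cost rule can be checked mechanically. The sketch below is an illustration only: the function names are hypothetical, registers are assumed to be written `R0`, `R1`, ..., and every operand that is not a plain register is charged one extra word, which covers the absolute, indexed and indirect indexed modes listed in the address-mode table.

```python
import re


def mode_cost(operand):
    """Added cost of an operand: 0 for a register, 1 otherwise (assumption)."""
    if re.fullmatch(r"R\d+", operand):
        return 0  # register mode: no extra word needed
    return 1      # operand needs an address or constant in a following word


def instruction_cost(instr):
    """cost = 1 + cost(source-mode) + cost(destination-mode)."""
    _op, operands = instr.split(None, 1)
    src, dst = [part.strip() for part in operands.split(",")]
    return 1 + mode_cost(src) + mode_cost(dst)


def sequence_cost(code):
    """Total cost of a list of instructions."""
    return sum(instruction_cost(instr) for instr in code)


cost_reg_to_reg = instruction_cost("MOV R0,R1")  # register to register
cost_reg_to_mem = instruction_cost("MOV R5,M")   # register to memory
cost_sequence = sequence_cost(["MOV y,R0", "ADD z,R0", "MOV R0,x"])
```

With these definitions `MOV R0,R1` costs 1 and `MOV R5,M` costs 2, matching the two examples, and the three-instruction translation of x:=y+z costs 6.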
Instruction Selection
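• Instruction selection is important to obtain efficient
code
• Suppose we translate three-address code
x:=y+z
to: MOV y,R0
ADD z,R0
MOV R0,x
• Then a:=a+1 is translated to:
MOV a,R0
ADD #1,R0
MOV R0,a
Cost = 6
• Better: a single instruction that updates a in memory
directly, such as INC a on a machine that has an
increment instruction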
Need for Global Machine-Specific
Code Optimizations
• Suppose we translate three-address code
x:=y+z
to: MOV y,R0
ADD z,R0
MOV R0,x
• Then, we translate
a:=b+c
d:=a+e
to: MOV b,R0
ADD c,R0
MOV R0,a
MOV a,R0 (redundant: a is already in R0)
ADD e,R0
MOV R0,d
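One simple way to catch this kind of redundancy is a pass over the generated code that drops a load immediately following a store of the same value. The sketch below is a hypothetical illustration, not part of the code generator described in these slides. It assumes straight-line code within one basic block, so that no jump can land between the store and the load.

```python
def drop_redundant_loads(code):
    """Remove 'MOV x,R' when it directly follows 'MOV R,x'."""
    out = []
    for instr in code:
        if out and instr.startswith("MOV ") and out[-1].startswith("MOV "):
            prev_src, prev_dst = out[-1][4:].split(",")
            src, dst = instr[4:].split(",")
            if src == prev_dst and dst == prev_src:
                continue  # the destination already holds this value
        out.append(instr)
    return out


code = [
    "MOV b,R0", "ADD c,R0", "MOV R0,a",   # a := b + c
    "MOV a,R0", "ADD e,R0", "MOV R0,d",   # d := a + e
]
optimized = drop_redundant_loads(code)
```

Applied to the six-instruction sequence for a:=b+c and d:=a+e, the pass removes the second `MOV a,R0` and leaves five instructions.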
Register Allocation and Assignment
• Efficient utilization of the limited set of registers is
important to generate good code
• Registers are assigned by
– Register allocation to select the set of variables that will
reside in registers at a point in the code
– Register assignment to pick the specific register that a
variable will reside in
• Finding an optimal register assignment in general is
NP-complete
Example
  t:=a+b        t:=a*b
  t:=t*c        t:=t+a
  t:=t/d        t:=t/d

Example: a+b-(c+d)*e
  t1:=a+b       MOV a,R0
                ADD b,R0
                MOV R0,t1
  t2:=c+d       MOV c,R1
                ADD d,R1
  t3:=e*t2      MOV e,R0
                MUL R1,R0
  t4:=t1-t3     MOV t1,R1
                SUB R0,R1
                MOV R1,t4
reorder
  t2:=c+d       MOV c,R0
                ADD d,R0
  t3:=e*t2      MOV e,R1
                MUL R0,R1
  t1:=a+b       MOV a,R0
                ADD b,R0
  t4:=t1-t3     SUB R1,R0
                MOV R0,t4
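To check that the two orderings for a+b-(c+d)*e compute the same value, the instruction sequences can be run on a tiny simulator of the hypothetical two-address machine. The sketch below is an illustration only: the function `run`, the register-name pattern and the sample input values are assumptions, and only MOV, ADD, SUB and MUL with register or absolute operands are modelled.

```python
import re


def is_reg(name):
    return re.fullmatch(r"R\d+", name) is not None


def run(code, memory):
    """Execute 'op source,destination' instructions; return final memory."""
    mem, regs = dict(memory), {}

    def load(x):
        return regs[x] if is_reg(x) else mem[x]

    def store(x, value):
        (regs if is_reg(x) else mem)[x] = value

    ops = {
        "ADD": lambda d, s: d + s,
        "SUB": lambda d, s: d - s,   # subtract source from destination
        "MUL": lambda d, s: d * s,
    }
    for instr in code:
        op, operands = instr.split()
        src, dst = operands.split(",")
        if op == "MOV":
            store(dst, load(src))
        else:
            store(dst, ops[op](load(dst), load(src)))
    return mem


original = ["MOV a,R0", "ADD b,R0", "MOV R0,t1", "MOV c,R1", "ADD d,R1",
            "MOV e,R0", "MUL R1,R0", "MOV t1,R1", "SUB R0,R1", "MOV R1,t4"]
reordered = ["MOV c,R0", "ADD d,R0", "MOV e,R1", "MUL R0,R1",
             "MOV a,R0", "ADD b,R0", "SUB R1,R0", "MOV R0,t4"]

inputs = {"a": 7, "b": 3, "c": 2, "d": 4, "e": 5}  # arbitrary sample values
result_original = run(original, inputs)["t4"]
result_reordered = run(reordered, inputs)["t4"]
```

Both sequences leave the same value in t4, but the reordered one needs 8 instructions instead of 10 because t1 never has to be stored to memory and reloaded.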
A Simple Code Generator
• A code generator generates target code for a
sequence of three- address statements and
effectively uses registers to store operands of
the statements.
• For example, consider the three-address
statement a := b+c. Several code sequences can
implement it, depending on where b and c
currently reside.
2. Consult the address descriptor for y to
determine y’, the current location of y.
• Prefer the register for y’ if the value of y is
currently both in memory and a register.
• If the value of y is not already in L,
generate the instruction MOV y’ , L to
place a copy of y in L.
3. Generate the instruction OP z’ , L
where z’ is a current location of z.
• Prefer a register to a memory location if
z is in both.
• Update the address descriptor of x to
indicate that x is in location L.
• If L is a register, update its descriptor to
indicate that it contains the value of x, and
remove x from all other register descriptors.
4. If the current values of y or z have
no next uses, are not live on exit
from the block, and are in registers,
alter the register descriptor to
indicate that, after execution of
x := y op z, those registers will no longer
contain y or z.
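The descriptor updates in steps 2 to 4 can be sketched for a single statement x := y op z as follows. This is a simplified illustration, not a complete generator: the class `CodeGen`, its `getreg` method (a stand-in for whatever routine chooses the location L), and the `dead` parameter (names with no next use that are not live on exit) are all hypothetical, and register spilling is not modelled.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


class CodeGen:
    def __init__(self, num_regs):
        self.reg_desc = {f"R{i}": set() for i in range(num_regs)}  # reg -> names
        self.addr_desc = {}  # name -> locations holding its current value
        self.code = []

    def locations(self, name):
        # Assumption: a name not seen before lives in its own memory location.
        return self.addr_desc.setdefault(name, {name})

    def best_location(self, name):
        # Prefer a register if the value is in both memory and a register.
        regs = sorted(l for l in self.locations(name) if l in self.reg_desc)
        return regs[0] if regs else name

    def getreg(self, y, dead):
        # Simplified: reuse y's register if it holds only y and y is dead,
        # otherwise take any empty register.
        for loc in sorted(self.locations(y)):
            if loc in self.reg_desc and self.reg_desc[loc] == {y} and y in dead:
                return loc
        for reg, names in self.reg_desc.items():
            if not names:
                return reg
        raise RuntimeError("no free register (spilling is not modelled)")

    def gen(self, x, y, op, z, dead=()):
        L = self.getreg(y, dead)
        y_loc = self.best_location(y)              # step 2
        if y_loc != L:
            self.code.append(f"MOV {y_loc},{L}")
        z_loc = self.best_location(z)              # step 3
        self.code.append(f"{OPCODES[op]} {z_loc},{L}")
        for names in self.reg_desc.values():
            names.discard(x)                       # x is now only in L
        for name in self.reg_desc[L]:
            self.addr_desc[name].discard(L)        # old contents of L are gone
        self.reg_desc[L] = {x}
        self.addr_desc[x] = {L}
        for name in dead:                          # step 4
            if name == x:
                continue
            for reg, names in self.reg_desc.items():
                if name in names:
                    names.discard(name)
                    self.addr_desc[name].discard(reg)


cg = CodeGen(num_regs=2)
cg.gen("t", "a", "-", "b")
cg.gen("u", "a", "-", "c")
cg.gen("v", "t", "+", "u", dead=("t",))
cg.gen("d", "v", "+", "u", dead=("v", "u"))
```

In this run the last two statements reuse R0 without any extra MOV, because the register descriptor shows that t, and later v, is already there and has no further use.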