Code Generation and Instruction Selection Unit-8
The final phase of a compiler is the code generator. It takes as input an intermediate
representation of the source program and produces as output an equivalent target program,
as shown in the figure. The optimization phase is optional as far as the correct working of
the compiler is concerned. For a compiler to be good, the following conditions should hold:
1. Output code must be correct: The meaning of the source and the target program must remain
the same, i.e., given the same input, the target program should produce the same output as the
source program. There is no definite way to guarantee this condition in general. In practice,
all we can do is maintain a test suite and check the generated code against it.
2. Output code must be of high quality: The target code should make effective use of the
resources of the target machine.
3. Code generator must run efficiently: A code generator is of little use if it takes minutes
or hours to translate a small piece of code.
• Input to the code generator: The input to the code generator consists of the intermediate
representation of the source program produced by the front end, together with the
information in the symbol table that is used to determine the runtime addresses of the
data objects denoted by the names in the intermediate representation. We assume that
prior to code generation the input has been validated by the front end i.e., type checking,
syntax, semantics etc. have been taken care of. The code generation phase can therefore
proceed on the assumption that the input is free of errors.
• Target programs: The output of the code generator is the target program. This output may
take a variety of forms: absolute machine language, relocatable machine language, or
assembly language.
• Producing an absolute machine language program as output has the advantage that it can be
placed in a fixed location in memory and immediately executed. A small program can thus be
compiled and executed quickly.
• Producing relocatable machine code as output allows subprograms to be compiled
separately. Although we must pay the added expense of linking and loading if we
produce relocatable object modules, we gain a great deal of flexibility in being able to
compile subroutines separately and to call other previously compiled programs from an
object module.
• Producing assembly code as output makes the process of code generation easier, as we
can generate symbolic instructions. The price paid is the assembling, linking and loading
steps after code generation.
The nature of the instruction set of the target machine determines the difficulty of instruction
selection. The uniformity and completeness of the instruction set are important factors. So,
instruction selection depends upon:
• Instructions used, i.e., which instructions should be chosen when several instructions do
the same job.
• Uniformity, i.e., support for different object/data types, which op-codes are applicable to
which data types, etc.
• Completeness: Not every operation in the source program has a direct counterpart in the
instruction set of every machine. E.g., on a processor without a hardware multiply
instruction, multiplication must be synthesized from shifts and adds.
• Instruction speed: Faster instructions should be preferred for better performance.
• Register allocation: Instructions involving register operands are usually faster than those
involving memory operands, so long-lived values that are used often should be kept in
registers.
• Evaluation order: The order in which computations are performed. A good order can improve
the performance of the code, for example because some orders need fewer registers to hold
intermediate results than others.
E.g.:
    a = b + c        Mov b, R0
                     Add c, R0
                     Mov R0, a
    d = a + e        Mov a, R0      (can be eliminated)
                     Add e, R0
                     Mov R0, d
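The redundant load in the example above can be found mechanically. The following is a minimal
Python sketch, not a definitive implementation: it assumes every statement is an addition of the
form x = y + z, that R0 is the only register used, and that the instructions lie in a single basic
block (no jump targets between them). The function names gen_naive and drop_redundant_loads are
hypothetical and chosen for illustration only.

```python
def gen_naive(stmts):
    """Translate each (dst, left, right) addition into Mov/Add/Mov on R0."""
    code = []
    for dst, left, right in stmts:
        code.append(("Mov", left, "R0"))   # load the left operand into R0
        code.append(("Add", right, "R0"))  # R0 = R0 + right operand
        code.append(("Mov", "R0", dst))    # store the result
    return code


def drop_redundant_loads(code):
    """Drop 'Mov x, R' when the previous instruction was 'Mov R, x'.

    Assumption: the instructions form one basic block, so no jump can
    land between the store and the load that follows it.
    """
    out = []
    for ins in code:
        if out:
            prev = out[-1]
            same_pair = prev[1] == ins[2] and prev[2] == ins[1]
            if ins[0] == "Mov" and prev[0] == "Mov" and same_pair:
                continue  # the destination already holds this value
        out.append(ins)
    return out


# The two statements from the example: a = b + c, d = a + e
statements = [("a", "b", "c"), ("d", "a", "e")]
for op, src, dst in drop_redundant_loads(gen_naive(statements)):
    print(f"{op} {src}, {dst}")
```

Under these assumptions the naive generator emits six instructions and the clean-up pass keeps
five, dropping the "Mov a, R0" that immediately follows "Mov R0, a".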
Instruction speed matters in the same way. For a statement that increments a (such as
a = a + 1), "Inc a" takes less time than the equivalent Mov/Add/Mov sequence: each instruction
in that sequence takes almost 3 cycles, whereas "Inc a" takes only one cycle. Therefore, we
should use "Inc a" in place of the longer sequence.
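The choice between "Inc a" and the longer sequence can be phrased as picking the cheapest of
several candidate instruction sequences. The sketch below is only an illustration in Python. The
cycle counts are the rough figures quoted in the text, not real timings; the statement is assumed
to be a = a + 1; the "#1" notation for the constant one and the names CYCLES, candidates, cost and
select are hypothetical.

```python
# Rough cycle counts taken from the text's figures; assumed, not measured.
CYCLES = {"Mov": 3, "Add": 3, "Inc": 1}


def candidates(dst, left, right):
    """List the instruction sequences that implement dst = left + right."""
    seqs = [[("Mov", left, "R0"), ("Add", right, "R0"), ("Mov", "R0", dst)]]
    if dst == left and right == "#1":
        # "#1" is an assumed notation for the literal constant one.
        seqs.append([("Inc", dst)])
    return seqs


def cost(seq):
    """Total cycles of a sequence under the assumed CYCLES table."""
    return sum(CYCLES[ins[0]] for ins in seq)


def select(dst, left, right):
    """Pick the cheapest candidate sequence."""
    return min(candidates(dst, left, right), key=cost)


print(select("a", "a", "#1"))   # the single Inc instruction
print(select("a", "b", "c"))    # the three-instruction sequence
```

With these assumed costs the increment is selected at 1 cycle against 9 for the general
sequence. A real code generator would take its costs from the target machine's documentation.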
Target Machine
Familiarity with the target machine and its instruction set is a prerequisite for designing a good
code generator. Our target computer is a byte-addressable machine with four bytes to a word and
n general-purpose registers, R0, R1, ..., Rn-1. It has two-address instructions of the form
    op source, destination
in which op is an op-code, and source and destination are data fields. Its op-codes include,
among others, Mov (move source to destination) and Add (add source to destination).
The source and destination fields are not long enough to hold memory addresses, so certain bit
patterns in these fields specify that the words following an instruction contain operands and/or
addresses. The addressing modes together with their assembly-language forms are shown above.
We can add flow-of-control information to the set of basic blocks making up a program by
constructing a directed graph called a flow graph. The nodes of the flow graph are the basic
blocks. One node is distinguished as initial; it is the block whose leader is the first statement.
There is a directed edge from block B1 to block B2 if B2 can immediately follow B1 in some
execution sequence; that is, if
• there is a conditional or unconditional jump from the last statement of B1 to the first
statement of B2, or
• B2 immediately follows B1 in the order of the program, and B1 does not end in an
unconditional jump.
We say that B1 is a predecessor of B2, and B2 is a successor of B1.
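The two edge rules above translate directly into code. The following Python sketch is one
possible way to do it, under stated assumptions: the program has already been split into basic
blocks, each block is given as a (name, jump) pair in program order, and jump is None for a block
that simply falls through, ("goto", target) for an unconditional jump, or ("if", target) for a
conditional jump. This representation and the names build_flow_graph and predecessors are
hypothetical.

```python
def build_flow_graph(blocks):
    """Return a dict mapping each block name to its list of successors.

    blocks: list of (name, jump) pairs in program order, where jump is
    None, ("goto", target) or ("if", target). Assumed representation.
    """
    succ = {name: [] for name, _ in blocks}
    for i, (name, jump) in enumerate(blocks):
        # Rule 1: an edge to the target of a conditional or unconditional jump.
        if jump is not None:
            succ[name].append(jump[1])
        # Rule 2: an edge to the next block in program order, unless this
        # block ends in an unconditional jump.
        falls_through = jump is None or jump[0] == "if"
        if falls_through and i + 1 < len(blocks):
            nxt = blocks[i + 1][0]
            if nxt not in succ[name]:
                succ[name].append(nxt)
    return succ


def predecessors(succ):
    """Invert the successor map to obtain the predecessors of each block."""
    pred = {name: [] for name in succ}
    for name, targets in succ.items():
        for target in targets:
            pred[target].append(name)
    return pred


# B1 falls into B2; B2 loops on itself or falls into B3; B3 jumps back to B1.
blocks = [("B1", None), ("B2", ("if", "B2")), ("B3", ("goto", "B1"))]
graph = build_flow_graph(blocks)
print(graph)
print(predecessors(graph))
```

In this small example B2 has two successors (itself and B3) because a conditional jump both
branches and falls through, while B3 has only B1 because it ends in an unconditional jump.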
Assume that all temporaries are dead on exit from a basic block and that all user variables are
live on exit.
Example
1: t1 = a * a
2: t2 = a * b
3: t3 = 2 * t2
4: t4 = t1 + t3
5: t5 = b * b
6: t6 = t4 + t5
7: X = t6
The six temporaries in the basic block can be packed into two locations. These locations
correspond to t1 and t2 in:
1: t1 = a * a
2: t2 = a * b
3: t2 = 2 * t2
4: t1 = t1 + t2
5: t2 = b * b
6: t1 = t1 + t2
7: X = t1
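The packing above can be reproduced by tracking the last use of each temporary and reusing a
location as soon as its temporary is dead. The Python sketch below is one simple way to do this
and is not claimed to be the method behind the notes. It assumes that temporaries are exactly the
names starting with "t", that each temporary is defined once, that all temporaries are dead on
exit from the block, and that a statement reads its operands before writing its result. The name
pack_temporaries and the tuple format (dst, op, left, right) are hypothetical.

```python
import heapq


def pack_temporaries(block, is_temp=lambda name: name.startswith("t")):
    """Rename temporaries so that dead ones share locations.

    block: list of (dst, op, left, right); a plain copy has op=None and
    right=None. Returns (renamed block, number of locations used).
    """
    # Last statement index at which each temporary is read.
    last_use = {}
    for i, (_, _, left, right) in enumerate(block):
        for name in (left, right):
            if name is not None and is_temp(name):
                last_use[name] = i

    slot_of = {}   # live temporary -> location number
    free = []      # min-heap of locations that can be reused
    n_slots = 0
    packed = []
    for i, (dst, op, left, right) in enumerate(block):
        # Rename the operands before any location is released.
        ops = [f"t{slot_of[x] + 1}" if x in slot_of else x for x in (left, right)]
        # A temporary read here for the last time frees its location.
        for x in {left, right}:
            if x in slot_of and last_use[x] == i:
                heapq.heappush(free, slot_of.pop(x))
        if is_temp(dst):
            if free:
                slot = heapq.heappop(free)   # reuse the lowest free location
            else:
                slot = n_slots               # otherwise open a new one
                n_slots += 1
            slot_of[dst] = slot
            dst = f"t{slot + 1}"
        packed.append((dst, op, ops[0], ops[1]))
    return packed, n_slots


block = [
    ("t1", "*", "a", "a"), ("t2", "*", "a", "b"), ("t3", "*", "2", "t2"),
    ("t4", "+", "t1", "t3"), ("t5", "*", "b", "b"), ("t6", "+", "t4", "t5"),
    ("X", None, "t6", None),
]
packed, used = pack_temporaries(block)
for number, (dst, op, left, right) in enumerate(packed, start=1):
    rhs = left if op is None else f"{left} {op} {right}"
    print(f"{number}: {dst} = {rhs}")
print("locations used:", used)
```

On the seven-statement example this sketch uses two locations and yields the same renaming as
the packed listing above. Always reusing the lowest-numbered free location is a design choice
made here only to keep the output deterministic.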