Code Generation
Position of a Code Generator in the Compiler Model
[Figure: Front-End → intermediate code → Code Optimizer → intermediate code → Code Generator. The front end reports lexical, syntax, and semantic errors; the Symbol Table is available to all three phases.]
Requirements
• Output code must be correct
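• Output code must be of high quality: it must make effective use of the target machine's resources
• Code generator itself must run efficiently
• Good vs. optimal: truly optimal code is rarely attainable in practice, so the aim is code that is good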
Issues in the Design of a Code Generator
• Input to the Code Generator
• Target Programs
• Memory Management
• Instruction Selection
• Register Allocation
• Choice of evaluation order
Input to the Code Generator
• Input
– Intermediate representation produced by the front end
• Linearized representation: postfix form, three-address statements (quads)
• Virtual machine representation: stack machine code
• Graphical representation: syntax trees, DAGs
– The intermediate representation, together with information in the symbol table, is used to determine the run-time addresses of the data objects.
– The intermediate code is assumed to be detailed enough that the values of names appearing in it can be represented by quantities the target machine manipulates directly: bits, integers, reals, and pointers.
– Type checking has already been done
– The input is therefore assumed to be free of errors
Target Programs
• Absolute machine language: can be placed in a fixed memory location and executed immediately.
• Relocatable machine language: allows subprograms to be compiled separately.
– A set of relocatable object modules can be linked together and loaded for execution by a linking loader.
– Linking and loading add expense
– Flexible
• Assembly language
– The code generator can emit symbolic instructions and use the macro facilities of the assembler to help generate code.
– Makes code generation easier
Memory management
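• A name in a three-address statement refers to its symbol table entry, which records the name's width and relative address.
• Symbol table entries are created as declarations are processed.
• Static allocation vs. stack allocation
• Labels in three-address statements have to be converted to instruction addresses
• For each quad, the address of its first machine instruction is tracked
• Backpatching: consider the jump j: goto i
• i < j: backward jump; the address of quad i is already known
• i > j: forward jump; the address is filled in once quad i has been generated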
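The label-to-address conversion above can be sketched as follows. This is a minimal illustration under stated assumptions, not the slides' own algorithm: `quad_sizes` (the number of machine words each quad expands to) and `jumps` (a mapping from a jump quad j to its target quad i) are hypothetical inputs invented for the example.

```python
def resolve_jumps(quad_sizes, jumps):
    """Return (addr, targets).

    addr[q]    -- machine address of the first instruction of quad q
    targets[j] -- machine address that jump quad j branches to
    quad_sizes -- assumed size in words of the code for each quad
    jumps      -- {j: i}, meaning quad j is 'goto i'
    """
    addr = {}       # quad number -> machine address
    targets = {}    # jump quad -> resolved target address
    pending = {}    # target quad -> jump quads waiting for its address
    location = 0
    for q, size in enumerate(quad_sizes):
        addr[q] = location
        # Backpatch every forward jump that was waiting for quad q.
        for j in pending.pop(q, []):
            targets[j] = location
        if q in jumps:
            i = jumps[q]
            if i <= q:      # backward jump: address already known
                targets[q] = addr[i]
            else:           # forward jump: record it and patch later
                pending.setdefault(i, []).append(q)
        location += size
    return addr, targets
```

For example, with `quad_sizes = [3, 2, 1, 4]` and `jumps = {1: 3, 2: 0}`, quad 2 is a backward jump resolved on the spot, while quad 1 is a forward jump that is patched when quad 3 is reached.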
Instruction Selection
• Uniformity and completeness of instruction set
• Instruction speed
• For each type of statement a skeleton can be
maintained
x := y + z
MOV y, R0
ADD z, R0
MOV R0, x
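• Statement-by-statement translation may produce redundant code. Ex: a := b + c followed by d := a + e gives the code below, whose fourth instruction reloads a value that is already in R0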
MOV b, R0
ADD c, R0
MOV R0, a
MOV a, R0
ADD e, R0
MOV R0, d
• There may be several ways of implementing a statement
– Increment: a := a + 1
MOV a, R0
ADD #1, R0
MOV R0, a
– If the target machine has an increment instruction, a single INC a does the same job
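The skeleton approach and the redundancy it causes can be sketched in a few lines. This is an illustrative sketch only: `gen_add` and `drop_redundant_loads` are hypothetical helper names, the skeleton always uses R0 as in the slides, and the cleanup pass is safe only when no label or jump target sits between the two instructions it compares.

```python
def gen_add(dst, src1, src2):
    """Expand 'dst := src1 + src2' using the fixed three-instruction skeleton."""
    return [f"MOV {src1}, R0", f"ADD {src2}, R0", f"MOV R0, {dst}"]


def drop_redundant_loads(code):
    """Drop a MOV that merely copies back the value moved by the MOV before it."""
    out = []
    for instr in code:
        if out and instr.startswith("MOV ") and out[-1].startswith("MOV "):
            prev_src, prev_dst = out[-1][4:].split(", ")
            src, dst = instr[4:].split(", ")
            if src == prev_dst and dst == prev_src:
                continue    # both locations already hold the same value
        out.append(instr)
    return out


# a := b + c followed by d := a + e, translated one statement at a time
naive = gen_add("a", "b", "c") + gen_add("d", "a", "e")
improved = drop_redundant_loads(naive)
```

Here `naive` holds the six instructions from the slide, and `improved` omits the reload `MOV a, R0`.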
Register Allocation
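• Instructions involving register operands are shorter and faster than those involving operands in memory.
• Efficient utilization of registers is needed
• Register allocation: select the set of variables that will reside in registers at each point in the program
• Register assignment: pick the specific register in which each such variable will reside
– Optimal assignment of registers to variables is an NP-complete problem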
• Register pairs may be required for some
instructions
Choice of Evaluation order
• Order in which computations are
performed can affect the efficiency of the
target code.
• Some orders require fewer registers to hold intermediate results than others
• Choosing the best order is also an NP-complete problem
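A small sketch can show how evaluation order changes the number of registers needed. It rests on simplifying assumptions that are not from the slides: expressions are binary trees written as nested tuples, every leaf is first loaded into a register, and the result of the first operand stays in a register while the second is evaluated. The function names are hypothetical.

```python
def regs_left_to_right(tree):
    """Registers needed when the left operand is always evaluated first."""
    if isinstance(tree, str):       # leaf: load the name into a register
        return 1
    left, right = tree
    # The left result occupies one register while the right is evaluated.
    return max(regs_left_to_right(left), regs_left_to_right(right) + 1)


def regs_best_order(tree):
    """Registers needed when the more demanding operand is evaluated first."""
    if isinstance(tree, str):
        return 1
    larger, smaller = sorted((regs_best_order(t) for t in tree), reverse=True)
    return max(larger, smaller + 1)


# a + (b + (c + d)) as a nested tuple
expr = ("a", ("b", ("c", "d")))
```

Under these assumptions `regs_left_to_right(expr)` is 4 while `regs_best_order(expr)` is 2, so reordering halves the register demand for this expression.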
Target machine
• Prerequisite for a good code generator: familiarity with the target machine and its instruction set.
• Byte addressable
• 4 bytes/word
• n general purpose registers
• 2 address instruction format
– op source, destination
• Opcodes
– MOV (move source to destination)
– ADD (add source to destination)
– SUB (subtract source from destination)
Target Machine Addressing Modes
Mode                Form     Address                        Added Cost
Absolute            M        M                              1
Register            R        R                              0
Indexed             c(R)     c + contents(R)                1
Indirect register   *R       contents(R)                    0
Indirect indexed    *c(R)    contents(c + contents(R))      1
Literal             #c       constant c (no address)        1
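The meaning of each mode can be checked with a small operand evaluator. This is a sketch under assumptions: `fetch` is a hypothetical helper, registers live in a dict `regs`, memory is a dict `mem` keyed by numeric addresses or by symbolic names for absolute locations, and the literal form `#c` is included because the slides use it in `MOV #1, R0`.

```python
import re

INDEXED = re.compile(r"(\*?)(-?\d+)\((\w+)\)")    # c(R) or *c(R)


def fetch(operand, regs, mem):
    """Return the value a source operand denotes under the modes above."""
    m = INDEXED.fullmatch(operand)
    if m:
        star, c, r = m.groups()
        address = int(c) + regs[r]      # indexed: c + contents(R)
        if star:                        # indirect indexed
            address = mem[address]      # contents(c + contents(R))
        return mem[address]
    if operand.startswith("#"):         # literal constant
        return int(operand[1:])
    if operand.startswith("*"):         # indirect register: contents(R) is the address
        return mem[regs[operand[1:]]]
    if operand in regs:                 # register
        return regs[operand]
    return mem[operand]                 # absolute: M names a memory location


regs = {"R0": 100}
mem = {100: 5, 104: 200, 200: 7, "M": 9}
```

With this state, `fetch("4(R0)", regs, mem)` reads location 104, while `fetch("*4(R0)", regs, mem)` follows the address stored there and reads location 200.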
Instruction Cost
• Define the cost of an instruction as
1 + cost(source-mode) + cost(destination-mode)
• MOV R0, M
– Stores contents(R0) into M
• MOV 4(R0), M
– Stores the value contents(4 + contents(R0)) into M
• MOV *4(R0), M
– Stores the value contents(contents(4 + contents(R0))) into M
• MOV #1, R0
– Loads the constant 1 into R0
Instruction costs
Instruction          Operation                                                              Cost
MOV R0,R1            Store contents(R0) into register R1                                    1
MOV R0,M             Store contents(R0) into memory location M                              2
MOV M,R0             Store contents(M) into register R0                                     2
MOV 4(R0),M          Store contents(4+contents(R0)) into M                                  3
MOV *4(R0),M         Store contents(contents(4+contents(R0))) into M                        3
MOV #1,R0            Store 1 into R0                                                        2
ADD 4(R0),*12(R1)    Add contents(4+contents(R0)) to contents(contents(12+contents(R1)))    3
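The cost rule can be written as a short function and checked against the table. This is a sketch with assumptions: the function names are hypothetical, registers are assumed to be named R0, R1, ..., and every operand that is not a register or an indirect register is charged an added cost of 1.

```python
import re


def mode_cost(operand):
    """Added cost of one operand under the addressing-mode table."""
    if re.fullmatch(r"\*?R\d+", operand):   # register or indirect register
        return 0
    return 1    # absolute, indexed, indirect indexed, or literal


def instruction_cost(instr):
    """Cost = 1 + cost(source-mode) + cost(destination-mode)."""
    _, operands = instr.split(None, 1)
    src, dst = (s.strip() for s in operands.split(","))
    return 1 + mode_cost(src) + mode_cost(dst)


def sequence_cost(code):
    """Total cost of a list of instructions."""
    return sum(instruction_cost(instr) for instr in code)
```

For instance, `instruction_cost("ADD 4(R0),*12(R1)")` gives 3, and `sequence_cost(["MOV b,R0", "ADD c,R0", "MOV R0,a"])` gives 6.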
Instruction Selection
Can translate a := b + c into
MOV b,R0
ADD c,R0
MOV R0,a
Cost = 6
or
MOV b,a
ADD c,a
Cost = 6
Assuming the addresses of a, b, and c are stored in R0, R1, and R2:
MOV *R1,*R0
ADD *R2,*R0
Cost = 2
Assuming R1 and R2 contain the values of b and c, and the value of b is not needed afterwards:
ADD R2,R1
MOV R1,a
Cost = 3