Code Generator
• The code generator is the final phase of a compiler.
• The front end translates the source program into intermediate code.
• The code optimizer is an optional phase that improves the intermediate
code: it can reduce the number of statements in the intermediate code or
replace complex operations with simpler ones. Its output is also
intermediate code.
• The input to the code generator is the intermediate representation and
its output is the target program.
• All the phases are connected to the symbol table.
Issues In The Design of Code Generator
1. Input to the Code Generator
2. Target Program
3. Memory Management
4. Instruction Selection
5. Register Allocation
6. Choice of Evaluation Order
Input to the Code Generator
• The input to the code generator is intermediate code.
• The intermediate code may be a
Linear Representation: postfix notation
Three-Address Representation: quadruples, triples, indirect triples
Graphical Representation: syntax tree or DAG
In a syntax tree, interior nodes represent operators and leaves represent operands.
In a DAG, a common subexpression is identified and its node is reused instead of being constructed again.
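The difference between a syntax tree and a DAG can be sketched in a few lines of Python. This is only an illustration: the nested-tuple encoding of expressions and the names build_dag, nodes and index are assumptions made for the sketch, not part of these notes.

```python
def build_dag(expr, nodes, index):
    """Return the node id for expr, reusing an existing node when possible."""
    if isinstance(expr, str):
        key = ("leaf", expr)                 # leaf: an operand name
    else:
        op, left, right = expr               # interior node: an operator
        key = (op,
               build_dag(left, nodes, index),
               build_dag(right, nodes, index))
    if key not in index:                     # first occurrence: create the node
        index[key] = len(nodes)
        nodes.append(key)
    return index[key]                        # later occurrences reuse the node


# (a - b) + (a - c) + (a - c), written as nested (operator, left, right) tuples
expr = ("+", ("+", ("-", "a", "b"), ("-", "a", "c")), ("-", "a", "c"))
nodes, index = [], {}
root = build_dag(expr, nodes, index)
for node_id, node in enumerate(nodes):
    print(node_id, node)
```

The second occurrence of a - c maps to the node already built for the first one, so the DAG has fewer nodes than the corresponding syntax tree would.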
Target Program
• It is the output of the code generator.
• The code generator may produce one of 3 kinds of output:
1. Absolute Machine Language
2. Relocatable Machine Language
3. Assembly Language
Absolute Machine Language
• It is placed at a fixed location in memory and can be executed
immediately, so there is no need to change any address.
• It is practical only for small programs, because the fixed memory
locations must be available.
Relocatable Machine Language
• The program can be loaded anywhere in memory and executed.
• For example, if the program's addresses run from 0 to 10 but the free
memory runs from 100 to 110, all the references 0...10 must be converted
to 100...110, i.e. the base address 100 is added to every address.
• Because all references are adjusted to the load address (100) in this
way, the output is called Relocatable Machine Language; it is the object
code.
• Subprograms are compiled separately. Their relocatable object files and
the library files are linked together by the linker, and the result is
placed at the proper location in memory by the loader.
• A linker and a loader are therefore necessary for Relocatable Machine
Language.
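The relocation example above (addresses 0 to 10 loaded at base 100) can be written as a minimal Python sketch. The function name relocate and the list representation of addresses are assumptions for illustration; a real loader works on relocation records in the object file.

```python
def relocate(addresses, load_base):
    """Add the load base to every relocatable address."""
    return [addr + load_base for addr in addresses]


program_addresses = list(range(0, 11))       # references 0 .. 10
loaded = relocate(program_addresses, 100)    # free memory starts at 100
print(loaded[0], loaded[-1])
```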
Assembly Language
• The code generator produces mnemonic instructions, i.e. assembly
language:
MOV a, R0
ADD b, R0
• With assembly language as the target program, the code generation phase
is easier: the previous 2 kinds of output require the 0s and 1s to be
constructed, whereas this 3rd kind uses mnemonic instructions.
• An assembler is necessary to convert the assembly language into machine
language.
• Even though the assembler is a separate step, this is the easiest way to
build the code generation phase.
• It is also used for target machines with small memory.
Memory Management
• Names in the source program must be mapped to addresses in run-time
memory. This is done with the help of symbol table information.
• When a sequence of declarations is seen in the source program, each name
is entered into the symbol table along with its type and its address
(offset).
• The stored offset gives the relative address of each name, and from it
the name is mapped to its address in run-time memory.
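A minimal Python sketch of how offsets might be assigned while declarations are entered into the symbol table. The type widths (int 4, float 8, char 1) and the names TYPE_WIDTH and layout are assumptions for illustration only.

```python
TYPE_WIDTH = {"int": 4, "float": 8, "char": 1}   # assumed widths in bytes


def layout(declarations):
    """Enter each declared name with its type and relative address (offset)."""
    table, offset = {}, 0
    for name, type_name in declarations:
        table[name] = {"type": type_name, "offset": offset}
        offset += TYPE_WIDTH[type_name]          # next name starts after this one
    return table, offset


table, total = layout([("a", "int"), ("b", "float"), ("c", "char")])
for name, entry in table.items():
    print(name, entry)
```

The code generator can then turn a name into a run-time address by adding its offset to the base address of the data area.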
Instruction Selection
• Instruction selection plays an important role in the efficiency of the
generated code.
• In an instruction such as MOV b, R0, MOV is the opcode and b and R0 are
the source and destination operands.
• Translating three-address code statement by statement sometimes
produces poor code.
• Some redundant code will also be generated.
• Consider another example. Here the 4th statement is redundant: the 3rd
statement stores R0 into x, so R0 already holds x, and the 4th statement
moves the same x into R0 again. It is not necessary.
• If x is not used afterwards, the 3rd statement is also redundant,
because there is no need to move the result from R0 to x.
• After z is added to R0, R0 holds y+z, and S can be added directly to
that content.
• So 4 statements can be used instead of 6.
• Suppose the constant 1 is to be added to x.
• #1 denotes immediate addressing mode.
• If the target machine has an INC instruction, INC x can replace these 3
statements, so the number of instructions is reduced.
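The points above can be sketched in Python. Everything here is an assumption made for illustration: the statement pair x = y + z followed by w = x + S (the name w is hypothetical), the tuple encoding of instructions, the use of a single register R0, and the function names. The INC rewrite is only valid when the value left in R0 is not needed afterwards.

```python
def naive_codegen(statements):
    """Translate each statement dst = left + right on its own."""
    code = []
    for dst, left, right in statements:
        code += [("MOV", left, "R0"), ("ADD", right, "R0"), ("MOV", "R0", dst)]
    return code


def drop_redundant_moves(code, not_used_later=()):
    """Remove a reload of x that directly follows the store MOV R0, x."""
    out = []
    for instr in code:
        is_reload = (instr[0] == "MOV" and instr[2] == "R0"
                     and out and out[-1] == ("MOV", "R0", instr[1]))
        if is_reload:
            if instr[1] in not_used_later:
                out.pop()            # the store to x is unnecessary as well
            continue                 # R0 already holds x: skip the reload
        out.append(instr)
    return out


def use_inc(code):
    """Replace MOV x,R0 / ADD #1,R0 / MOV R0,x by INC x."""
    out, i = [], 0
    while i < len(code):
        w = code[i:i + 3]
        if (len(w) == 3 and w[0][0] == "MOV" and w[0][2] == "R0"
                and w[1] == ("ADD", "#1", "R0")
                and w[2] == ("MOV", "R0", w[0][1])):
            out.append(("INC", w[0][1]))
            i += 3
        else:
            out.append(code[i])
            i += 1
    return out


six = naive_codegen([("x", "y", "z"), ("w", "x", "S")])
five = drop_redundant_moves(six)
four = drop_redundant_moves(six, not_used_later={"x"})
inc = use_inc(naive_codegen([("x", "x", "#1")]))
print(four)
print(inc)
```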
Object Code Forms
• It is also known as the target machine, or a simple target machine model.
• Some of the operations the machine performs are as follows:
1. Load Operations: load a memory word into a register
LD R1, X // load the value in location X into register R1
2. Store Operations: store the contents of a register into memory
ST X, R1 // store the value in register R1 into location X
3. Computational Operations: add, sub, mul and similar instructions are
computational operations. They have the form op dst, src1, src2, where
op is the operation being performed, dst is the destination argument
(which sometimes acts as both a source and the destination), and the
remaining arguments are sources.
Ex: ADD R1, R2, R3 computes R1 = R2 + R3
Ex: INC X increments the contents of X and stores the result in X
4. Unconditional Jump: control is transferred to the given location
without checking any condition
Ex: BR L
BR 1000 // control is transferred to L, which is 1000
5. Conditional Jump: control goes to the given location only if the
condition holds
Ex: BLTZ r, L // Branch if Less Than Zero; r is a register, L is a label (address)
If the contents of the register are < 0, branch to the label.
If the register contains -2 and L is 1000,
then -2 < 0, so control goes to L.
If the condition is false, the next statement is executed.
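To make the five kinds of operation concrete, here is a minimal Python sketch of an interpreter for them. The tuple encoding of instructions, the labels table mapping a label to an instruction index, and the small program are all assumptions made for illustration.

```python
def run(program, memory, labels):
    """Execute LD, ST, ADD, BR and BLTZ instructions given as tuples."""
    regs, pc = {}, 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1                                  # default: fall through
        if op == "LD":                           # LD r, x : r = contents of x
            regs[args[0]] = memory[args[1]]
        elif op == "ST":                         # ST x, r : x = contents of r
            memory[args[0]] = regs[args[1]]
        elif op == "ADD":                        # ADD dst, src1, src2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "BR":                         # unconditional jump
            pc = labels[args[0]]
        elif op == "BLTZ":                       # jump only if register < 0
            if regs[args[0]] < 0:
                pc = labels[args[1]]
    return memory


# y = 0 if x < 0, otherwise y = x + x
program = [
    ("LD", "R1", "x"),
    ("BLTZ", "R1", "L"),
    ("ADD", "R1", "R1", "R1"),
    ("BR", "END"),
    ("LD", "R1", "zero"),        # label L
    ("ST", "y", "R1"),           # label END
]
labels = {"L": 4, "END": 5}
negative = run(program, {"x": -2, "zero": 0}, labels)
positive = run(program, {"x": 5, "zero": 0}, labels)
print(negative["y"], positive["y"])
```

With x = -2 the test -2 < 0 succeeds and control goes to L; with x = 5 the branch is not taken and the next statement is executed.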
• The target machine has a variety of addressing modes.
• An addressing mode determines the effective address, i.e. the location
that contains the operand.
Addressing Mode          Form     Address
Absolute                 M        M
Register                 R        R
Indexed                  C(R)     C + Contents(R)
Indirect Register        *R       Contents(R)
Indirect Indexed         *C(R)    Contents(C + Contents(R))
Literal (or) Immediate   #        N/A
Absolute addressing mode is written M, where M is the memory location.
• An instruction format mainly contains 2 fields
1. opcode // specifies the operation being performed
2. address of operand // memory location
• If M is 1000 and location 1000 holds 10, then the operand is 10.
Register addressing mode – in place of the memory location the instruction
holds the address of a register. Let the address of R1 be 2000; then 2000
contains the operand value.
Indexed addressing mode – R is an index register; its contents are added to
C to give the effective address of the operand // C is the address of the
corresponding memory location
Indirect register addressing mode – the instruction holds the address of a
register, but the register does not contain the operand. It contains an
address, and that address is the effective address at which the operand is
found. * means "the value at the address".
Indirect Indexed addressing mode – similar to indexed addressing mode; it
combines indexed and indirect register addressing.
• R is the address of a register. The contents of R are added to C to get
an address, but this address does not hold the operand itself; it holds
the effective address, which gives the operand.
• Literal addressing mode: the instruction holds the value itself instead
of an address.
N/A stands for not applicable: the operand is in the instruction, so no
address is needed. The constant is marked with #.
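The table of addressing modes can be read as a rule for fetching the operand, sketched below in Python. The function name fetch_operand, the mode names as strings and the memory and register contents are assumptions for illustration (only "location 1000 holds 10" comes from the notes); the rules themselves follow the Address column of the table.

```python
def fetch_operand(mode, memory, regs, c=None, r=None):
    """Return the operand selected by an addressing mode."""
    if mode == "absolute":                # M      : operand is at address M
        return memory[c]
    if mode == "register":                # R      : operand is in register R
        return regs[r]
    if mode == "indexed":                 # C(R)   : address C + Contents(R)
        return memory[c + regs[r]]
    if mode == "indirect_register":       # *R     : address Contents(R)
        return memory[regs[r]]
    if mode == "indirect_indexed":        # *C(R)  : address Contents(C + Contents(R))
        return memory[memory[c + regs[r]]]
    if mode == "literal":                 # #      : operand is the constant itself
        return c
    raise ValueError(f"unknown addressing mode: {mode}")


memory = {1000: 10, 1004: 2000, 2000: 77}
regs = {"R1": 4, "R2": 2000}
for mode, kwargs in [
    ("absolute", {"c": 1000}),
    ("register", {"r": "R1"}),
    ("indexed", {"c": 1000, "r": "R1"}),
    ("indirect_register", {"r": "R2"}),
    ("indirect_indexed", {"c": 1000, "r": "R1"}),
    ("literal", {"c": 1}),
]:
    print(mode, fetch_operand(mode, memory, regs, **kwargs))
```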
Code Generation Algorithm
• It generates target code for a sequence of instructions // generates
machine code for optimized intermediate code
• It uses a function getReg() to assign registers to variables
• It uses 2 data structures
1. Register Descriptor
2. Address Descriptor
• Register Descriptor – keeps track of which variable is stored in each
register. Initially all registers are empty.
• Address Descriptor – keeps track of the location where each variable is
stored. The location may be a register, a memory address, a stack
location, etc.
The algorithm takes a sequence of three-address statements as input. For each
three-address statement of the form x := y op z it performs the following
actions:
1. Invoke a function getreg to determine the location L where the
result of the computation y op z should be stored.
Normally getreg returns an empty register, so if any register is
empty it is returned as location L.
If no empty register is available, getreg checks whether there is a
suitable occupied register (one holding a dead variable), i.e. a register
occupied by some name that is no longer used.
If neither an empty register nor a suitable occupied register is
available, getreg returns a memory location as L.
2. Consult the address descriptor for y to determine y', the current location
of y. If the value of y is currently in both memory and a register, prefer the
register for y'. If the value of y is not already in L, generate the
instruction MOV y', L to place a copy of y in L.
The address descriptor gives the location of y.
If y is not in a register, its memory location is used.
For example, take x = y + z with R0 as the location L:
Check whether y is in a register; it is not, so the memory location of y
is used.
Check whether y is already in R0; it is not.
So the target instruction MOV y, R0 is produced.
3. Generate the instruction OP z', L where z' is the current location of z.
If z is in both a register and memory, prefer the register. Update the address
descriptor of x to indicate that x is in location L. If L is a register,
update its register descriptor to indicate that it contains the value of x.
OP represents the operator.
The address descriptor of z is consulted to determine z', the current
location of z.
The register descriptor tells which value each register currently holds.
In the previous example the operator is +.
Assume that z is not in a register and resides in memory, so ADD z, R0
is generated.
In this example L is the register R0, so x is now in R0.
4. If y and z (the right-hand-side variables) have no next use and are not
live on exit, update the descriptors to remove y and z (because they are no
longer needed).
In our example x = y + z, if this is the final statement of the basic block
then x is considered live on exit, and in that case a store instruction must
be generated at the end.
Register and Address Descriptors:
• A register descriptor keeps track of what is currently in each
register. Initially the register descriptors show that all registers are
empty.
• An address descriptor stores the location where the current value of
a name can be found at run time.
Generating Code for Assignment Statements:
• Consider the example
• The assignment statement d:= (a-b) + (a-c) + (a-c) can be translated into
the following sequence of three address code:
t1:= a-b
t2:= a-c
t3:= t1 +t2
d:= t3+t2
Assume that these 4 statements form a basic block and that d is live on exit.
Code sequence for the example is as follows:
Two registers, R0 and R1, are considered.
Initially all the registers are empty.
• For the 1st statement, getReg() returns R0, so L is R0. The computation
is carried out in this location.
• The current location y' of a is checked; it is not L, so MOV y', L must
be generated.
• a is not in a register, so its memory location is used: MOV a, R0.
• The 3rd step is OP z', L. b is in memory, so SUB b, R0 is generated and
the descriptors are updated: R0 now holds t1.
• The address descriptor records in which location each variable resides.
For the 2nd statement, getReg() is invoked and returns R1.
• Now L is R1.
• The location of a is looked up: is it in memory or in a register?
• The register descriptor shows that R0 contains only t1, not a.
• So a MOV instruction is needed: a is loaded into R1 (MOV a, R1).
• c is checked in the same way. It is not in any register, so SUB c, R1 is
generated and R1 now holds t2.
For the 3rd statement, step 3 of the algorithm applies directly: ADD R1, R0,
i.e. t2 (in R1) is added to t1 (in R0), and R0 now holds t3.
For the 4th statement, the add instruction can again be generated directly
from the register descriptor.
• d = t3 + t2: the contents of t2 (R1) are added to t3 (R0).
• This is the final statement and d is live on exit, so after the ADD a
store is also needed: MOV R0, d.
• d is then in both memory and a register.
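The walkthrough above can be reproduced with a small Python sketch of the algorithm. It is a simplified illustration under stated assumptions, not a full implementation: each register holds at most one name, getreg first tries to reuse the register of y when y has no further use (an extra rule assumed here so that y is never overwritten while still needed), nothing is spilled, and the tuple encoding (dst, op, y, z) and all function names are hypothetical.

```python
OPCODES = {"+": "ADD", "-": "SUB"}


def generate(block, live_on_exit, registers=("R0", "R1")):
    reg_desc = {r: None for r in registers}    # register -> name it holds
    addr_desc = {}                             # name -> set of locations
    code = []

    def location(name):
        """Current location of name, preferring a register to memory."""
        places = addr_desc.setdefault(name, {name})   # inputs start in memory
        in_regs = sorted(p for p in places if p in reg_desc)
        return in_regs[0] if in_regs else name

    def dead_after(name, i):
        """True if name has no next use after statement i and is not live on exit."""
        later = any(name in (y, z) for _, _, y, z in block[i + 1:])
        return not later and name not in live_on_exit

    def getreg(i, dst, y, z):
        y_loc = location(y)
        if y_loc in reg_desc and dead_after(y, i):
            return y_loc                       # reuse the register holding y
        for r, held in reg_desc.items():
            if held is None:
                return r                       # an empty register
        for r, held in reg_desc.items():
            if held != z and dead_after(held, i):
                return r                       # occupied by a dead name
        return dst                             # otherwise a memory location

    for i, (dst, op, y, z) in enumerate(block):
        L = getreg(i, dst, y, z)               # step 1
        y_loc = location(y)
        if y_loc != L:                         # step 2
            code.append(f"MOV {y_loc},{L}")
        code.append(f"{OPCODES[op]} {location(z)},{L}")   # step 3
        for places in addr_desc.values():
            places.discard(L)                  # L no longer holds anything else
        addr_desc[dst] = {L}
        if L in reg_desc:
            reg_desc[L] = dst
        for name in (y, z):                    # step 4
            if dead_after(name, i):
                for r, held in reg_desc.items():
                    if held == name:
                        reg_desc[r] = None
                        addr_desc[name].discard(r)

    for name in sorted(live_on_exit):          # store live names at block end
        places = addr_desc.get(name, {name})
        if name not in places:
            code.append(f"MOV {sorted(places)[0]},{name}")
            places.add(name)
    return code


block = [("t1", "-", "a", "b"), ("t2", "-", "a", "c"),
         ("t3", "+", "t1", "t2"), ("d", "+", "t3", "t2")]
target = generate(block, live_on_exit={"d"})
print("\n".join(target))
```

For the block t1 := a-b, t2 := a-c, t3 := t1+t2, d := t3+t2 with d live on exit, the sketch emits seven instructions, ending with the store MOV R0,d.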
Register Allocation and Assignment
• Efficient use of registers is important in any code generation strategy.
Register Allocation decides which values in a program should reside in registers.
Register Assignment decides in which register each value should reside.
• Within a basic block some variables are kept in registers, and at the
end of the basic block all the live variables must be stored in memory.
• In global register assignment, variables that are used frequently
throughout a loop are kept in fixed registers for the whole loop.
• For example, if there are 6 registers, 3 of them may be fixed for holding
the frequently used variables.
Usage count in Register Allocation
• Consider a variable x
1. If x is in a register, we save 1 unit of cost for each reference to x
that is not preceded by an assignment to x in the same block.
2. We save 2 units of cost if we can avoid a store of x at the end of a
block in which x is assigned a value and is live on exit (x stays in its
register instead of being stored into a memory location).
3. The benefit of allocating x to a register is calculated with the
formula
sum over all blocks B in L of ( use(x, B) + 2 * live(x, B) )
4. B is a block and L is a loop, which can have any number of blocks. The
summation is performed over all the blocks in L. use(x, B) is the number
of uses of x in B before any assignment to x. live(x, B) is 1 if x is
assigned in B and live on exit from B; if x is dead it is 0.
Simple Flow Graph for Inner Loop
• This loop consists of 4 blocks.
• Every block contains instructions.
• The variables b, c, d, f are live on entry to basic block B1, and
a, c, d, e, f are live on exit from B1.
Finding of Usage Count
• In B1, take x to be the variable a. Count how many times a appears on a
right-hand side without being preceded by an assignment to a. Here a is
used once, but the 1st instruction assigns a before that use, so the
count is 0.
• Check whether a is live on exit. It contributes only if a is also
assigned a value in the block.
• The total for a is 4, which means that we save 4 units of cost by
keeping a in one of the global registers.
• We save 6 units of cost if we assign b to a register.
• Assume 3 global registers R0, R1, R2.
• We can assign a to R0 (e or f would do equally well), b to R1 and d to R2.
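The usage counts can be checked with a short Python sketch. The flow graph figure is not reproduced in these notes, so the block contents and live-on-exit sets below are assumptions, chosen to be consistent with the values stated above (4 for a, 6 for b). The names BLOCKS, LIVE_ON_EXIT, use_count, live_and_assigned and savings are hypothetical.

```python
# Each statement is (dst, operand1, operand2). ASSUMED block contents.
BLOCKS = {
    "B1": [("a", "b", "c"), ("d", "d", "b"), ("e", "a", "f")],
    "B2": [("f", "a", "d")],
    "B3": [("b", "d", "f"), ("e", "a", "c")],
    "B4": [("b", "d", "c")],
}
# ASSUMED sets of variables live on exit from each block.
LIVE_ON_EXIT = {
    "B1": set("acdef"), "B2": set("cdef"),
    "B3": set("bcdef"), "B4": set("bcdef"),
}


def use_count(x, block):
    """Uses of x in the block that are not preceded by an assignment to x."""
    count = 0
    for dst, y, z in block:
        count += (y == x) + (z == x)   # operands are read before dst is written
        if dst == x:
            break                      # later uses follow an assignment to x
    return count


def live_and_assigned(x, name):
    """1 if x is assigned in the block and live on exit from it, else 0."""
    assigned = any(dst == x for dst, _, _ in BLOCKS[name])
    return int(assigned and x in LIVE_ON_EXIT[name])


def savings(x):
    """Sum over all blocks B of use(x, B) + 2 * live(x, B)."""
    return sum(use_count(x, BLOCKS[b]) + 2 * live_and_assigned(x, b)
               for b in BLOCKS)


for var in "abcdef":
    print(var, savings(var))
```

Under these assumed blocks, b and d have the largest savings, while a, e and f tie, which is why any one of a, e or f can take the third register.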