06 Codegeneration PDF
Code generation
Lexical analysis
token stream
Syntax analysis
syntax tree
Semantic analysis
syntax tree
Code generation
machine code
Code optimization
machine code
1. Introduction
2. Instruction selection
3. Register allocation
4. Memory management
Figure 9.3: succ, gen and kill for the program
Liveness analysis: gen and kill
For each IR instruction, we define two functions:
- gen[i]: set of variables that may be read by instruction i
- kill[i]: set of variables that may be assigned a value by instruction i
Instruction i                     gen[i]     kill[i]
LABEL l                           ∅          ∅
x := y                            {y}        {x}
x := k                            ∅          {x}
x := unop y                       {y}        {x}
x := unop k                       ∅          {x}
x := y binop z                    {y, z}     {x}
x := y binop k                    {y}        {x}
x := M[y]                         {y}        {x}
x := M[k]                         ∅          {x}
M[x] := y                         {x, y}     ∅
M[k] := y                         {y}        ∅
GOTO l                            ∅          ∅
IF x relop y THEN lt ELSE lf      {x, y}     ∅
x := CALL f(args)                 args       {x}
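The gen/kill table can be encoded directly. The following is a sketch, not the book's code: the tuple encoding of IR instructions and all names are our assumptions (variables as strings, constants as ints).

```python
# Sketch of gen/kill per IR instruction (the tuple encoding is an
# assumption, not from the slides): variables are strings, constants ints.

def gen_kill(instr):
    """Return (gen, kill) sets for one IR instruction."""
    op = instr[0]
    if op in ("label", "goto"):             # LABEL l, GOTO l
        return set(), set()
    if op == "copy":                        # x := y  or  x := k
        _, x, y = instr
        return ({y} if isinstance(y, str) else set()), {x}
    if op == "binop":                       # x := y binop z
        _, x, y, z = instr
        return {v for v in (y, z) if isinstance(v, str)}, {x}
    if op == "load":                        # x := M[y]
        _, x, y = instr
        return ({y} if isinstance(y, str) else set()), {x}
    if op == "store":                       # M[x] := y
        _, x, y = instr
        return {v for v in (x, y) if isinstance(v, str)}, set()
    if op == "if":                          # IF x relop y THEN lt ELSE lf
        _, x, y, _lt, _lf = instr
        return {x, y}, set()
    raise ValueError("unknown op: %r" % op)
```

Constants are filtered out with `isinstance`, which collapses the k-variants of each table row into one case.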
Code generation 325
Liveness analysis: in and out
For each program instruction i, we use two sets to hold liveness information:
- in[i]: the variables that are live before instruction i
- out[i]: the variables that are live at the end of i
in and out are defined by these two equations:

in[i] = gen[i] ∪ (out[i] \ kill[i])
out[i] = ⋃_{j ∈ succ[i]} in[j]

(Mogensen)
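The two equations, in[i] = gen[i] ∪ (out[i] \ kill[i]) and out[i] = ⋃ in[j] over j ∈ succ[i], can be solved by fixed-point iteration. This sketch uses our own encoding (lists indexed by instruction number), not the book's code:

```python
# Fixed-point liveness sketch (our own encoding, not from the slides):
#   in[i]  = gen[i] | (out[i] - kill[i])
#   out[i] = union of in[j] for all j in succ[i]

def liveness(succ, gen, kill):
    n = len(succ)
    live_in = [set() for _ in range(n)]
    live_out = [set() for _ in range(n)]
    changed = True
    while changed:                      # iterate until nothing changes
        changed = False
        for i in reversed(range(n)):    # backwards pass converges faster
            out_i = set().union(*[live_in[j] for j in succ[i]]) if succ[i] else set()
            in_i = gen[i] | (out_i - kill[i])
            if in_i != live_in[i] or out_i != live_out[i]:
                live_in[i], live_out[i] = in_i, out_i
                changed = True
    return live_in, live_out

# Tiny example: 0: a := 1 ; 1: b := a + a ; 2: use b (e.g. return)
succ = [[1], [2], []]
gen  = [set(), {"a"}, {"b"}]
kill = [{"a"}, {"b"}, set()]
live_in, live_out = liveness(succ, gen, kill)
```

Here `a` is live out of instruction 0 and into instruction 1, `b` is live into instruction 2, and nothing is live before instruction 0.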
We perform global register allocation, i.e., we find for each variable a register that it can stay in at all points in the program (procedure, actually, since a program in the intermediate language corresponds to a procedure in a high-level language).

For the purpose of register allocation, two variables interfere if they cannot share a register at any point in the program. Even though interference is defined in an asymmetric way in definition 9.2, the conclusion is that the two involved variables interfere symmetrically. We can use definition 9.2 to generate interference for each assignment in the program in figure 9.2.

Figure 9.5: Interference graph for the program in figure 9.2
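Edge generation in the spirit of definition 9.2 can be sketched as follows (the instruction encoding and names are our assumptions): x interferes with y when x is assigned by instruction i, y is live at the end of i, x ≠ y, and i is not the copy x := y.

```python
# Interference edges in the spirit of definition 9.2 (encoding assumed):
# x interferes with y if x != y, x is in kill[i], y is in out[i], and
# instruction i is not the copy x := y.

def interference_edges(instrs, kill, live_out):
    edges = set()
    for i, ins in enumerate(instrs):
        for x in kill[i]:
            for y in live_out[i]:
                if x == y:
                    continue
                if ins[0] == "copy" and ins[2] == y:
                    continue        # x := y does not make x and y interfere
                edges.add(frozenset((x, y)))     # symmetric edge
    return edges

# 0: a := 1 ; 1: b := 2 ; 2: c := a + b
instrs = [("const", "a", 1), ("const", "b", 2), ("binop", "c", "a", "b")]
kill = [{"a"}, {"b"}, {"c"}]
live_out = [{"a"}, {"a", "b"}, {"c"}]
```

Storing edges as `frozenset` pairs makes the symmetry of the relation explicit: here the only edge is between `a` and `b`, which are live at the same time.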
Code generation 330
Register allocation
[Figure: mapping of variables to registers R0, R1, R2, R3]
(Keith Schwarz)
Chaitin's algorithm
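The simplify/select core of Chaitin-style graph coloring can be sketched as below. This is a hedged illustration under our own encoding (an adjacency dict of the interference graph); spilling and coalescing, which a full Chaitin allocator also needs, are omitted.

```python
# Sketch of Chaitin-style graph coloring, simplify/select phases only
# (spilling and coalescing omitted). `graph` maps each variable to the
# set of variables it interferes with; symmetric adjacency is assumed.

def color(graph, k):
    """Return a {variable: register-number} map using at most k colors,
    or None when some node would have to be spilled."""
    work = {v: set(ns) for v, ns in graph.items()}   # mutable copy
    stack = []
    while work:
        # Simplify: remove a node with fewer than k remaining neighbours.
        node = next((v for v in work if len(work[v]) < k), None)
        if node is None:
            return None                              # spill would be needed
        stack.append(node)
        for n in work[node]:
            work[n].discard(node)
        del work[node]
    # Select: pop in reverse order; give each node the lowest colour not
    # used by its already-coloured neighbours (one exists by construction).
    colors = {}
    while stack:
        v = stack.pop()
        used = {colors[n] for n in graph[v] if n in colors}
        colors[v] = min(c for c in range(k) if c not in used)
    return colors
```

A path a–b–c is 2-colorable, while a triangle needs 3 registers; with only 2 the sketch reports failure (where the real algorithm would pick a node to spill).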
1. Introduction
2. Instruction selection
3. Register allocation
4. Memory management
Memory
Memory is generally divided into four main parts:
- Code: contains the code of the program
- Static data: contains static data allocated at compile-time
- Stack: used for function calls and local variables
- Heap: for the rest (e.g., data allocated at run-time)
Computers have registers that contain addresses that delimit these different parts.
[Figure: memory layout: code, global/static data, runtime stack, heap]

Stack frames
Each active function call has its own unique stack frame. In a stack frame (activation record) we hold the following information:
1) frame pointer: pointer value of the previous stack frame so we can reset the top of stack when we exit this function. This is also sometimes called the dynamic link.
2) static link: in languages (like Pascal, but not C or Decaf) that allow nested function declarations, a function may be able to access the variables of the function(s) within which it is declared. In the static link, we hold the pointer value of the stack frame in which the current function was declared.
3) return address: the point in the code to which we return at the end of execution of the current function.
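The three fields can be illustrated with a toy activation record. All names here are ours, and a real stack frame is of course a region of machine memory addressed through the frame pointer, not a Python dict:

```python
# Toy activation-record sketch (names are our own invention; real frames
# are raw memory regions addressed via the frame pointer register).

stack = []

def push_frame(static_link, return_address):
    frame = {
        "dynamic_link": stack[-1] if stack else None,  # 1) previous frame
        "static_link": static_link,                    # 2) defining frame
        "return_address": return_address,              # 3) where to resume
        "locals": {},
    }
    stack.append(frame)
    return frame

def pop_frame():
    """Leaving a function: drop the frame, resume at its return address."""
    return stack.pop()["return_address"]

outer = push_frame(static_link=None, return_address="main+4")
inner = push_frame(static_link=outer, return_address="outer+12")
```

For the non-nested `inner` call shown here the static link and the dynamic link happen to point to the same frame; in a nested-function language they differ whenever the caller is not the lexically enclosing function.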
Static data
- Contains data allocated at compile-time: the size and address are known at compile time, and the allocated memory stays allocated throughout the execution of the program
- The address of such data is then hardwired in the generated code
- Used e.g. in C to allocate global variables
- There are facilities in assemblers to allocate such space

Most modern computers divide their logical address space into a text section (used for code) and a data section (used for data). Assemblers (programs that convert symbolic machine code into binary machine code) usually maintain current address pointers to both the text area and the data area. They also have pseudo-instructions (directives) that can place labels at these addresses and move them. So you can allocate space for, say, an array in the data space by placing a label at the current-address pointer in the data area and then moving the current-address pointer up by the size of the array. The code can use the label to access the array. Allocation of space for an array A of 1000 32-bit integers (i.e., 4000 bytes) can look like this in symbolic code:
.data # go to data area for allocation
baseofA: # label for array A
.space 4000 # move current-address pointer up 4000 bytes
.text # go back to text area for code generation
Malloc:
- Search through the free list for a block of sufficient size
- If found, it is possibly split in two, with one part removed from the free list
- If not found, ask the operating system for a new chunk of memory

Free:
- Insert the block back into the free list

Allocation is linear in the size of the free list; deallocation is done in constant time.
[Figure: a free list (a) initially, with blocks of sizes 12, 12 and 20, and (b) after allocating 12 bytes]
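The scheme can be simulated with a list of (address, size) blocks, first fit with splitting. The block addresses and sizes below are illustrative assumptions:

```python
# Free-list allocator simulation (first fit with splitting); the block
# layout and sizes are illustrative assumptions, not from the slides.

free_list = [(0, 12), (16, 12), (32, 20)]      # (address, size) blocks

def malloc(size):
    for idx, (addr, blk_size) in enumerate(free_list):  # linear search
        if blk_size >= size:
            if blk_size > size:                # split: keep the remainder
                free_list[idx] = (addr + size, blk_size - size)
            else:
                free_list.pop(idx)             # exact fit: remove block
            return addr
    return None    # a real allocator would request a new chunk from the OS

def free(addr, size):
    free_list.append((addr, size))             # constant-time insertion
```

Allocating 12 bytes consumes the first block exactly; allocating 8 more splits the second block and leaves a 4-byte remainder on the free list.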
Advantages:
- The programmer does not have to worry about freeing unused resources
Limitations:
- The programmer can't reclaim unused resources
- Difficult to implement and adds a significant overhead
Reference counting
Add an extra field in each memory block (of the free list) with a count of the incoming pointers:
- When creating an object, set its counter to 0
- When creating a reference to an object, increment its counter
- When removing a reference, decrement its counter
- If zero, remove all outgoing references from that object and reclaim the memory
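The bullets map onto a few lines of code. This is a minimal sketch with invented names; real collectors store the counter in the block header, not in a Python object:

```python
# Minimal reference-counting sketch (names are ours): each object carries
# a counter of incoming pointers and a list of outgoing references.

class Obj:
    def __init__(self):
        self.refcount = 0          # creation: counter starts at 0
        self.refs = []             # outgoing references

heap = set()

def new_obj():
    o = Obj()
    heap.add(o)
    return o

def add_ref(src, dst):
    src.refs.append(dst)
    dst.refcount += 1              # creating a reference: increment

def del_ref(src, dst):
    src.refs.remove(dst)
    dst.refcount -= 1              # removing a reference: decrement
    if dst.refcount == 0:          # unreachable: drop outgoing refs,
        for child in list(dst.refs):       # then reclaim the block
            del_ref(dst, child)
        heap.discard(dst)
```

Reclamation cascades: dropping the last reference to an object also decrements the counters of everything it pointed to. Note the known weakness: two objects referencing each other never reach count 0.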
(Keith Schwarz)
Mark-and-sweep garbage collection
[Figure: tracing reachable objects from the root set]
Advantage:
I More precise than reference counting
I Can handle circular references
I Run time can be made proportional to the number of reachable
objects (typically much lower than number of free blocks)
Disadvantages:
I Introduce huge pause times
I Consume lots of memory
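The two phases can be sketched in a few lines; the object encoding (dicts with a "refs" list) is our assumption. The mark phase visits only reachable objects, which is where the run-time bound above comes from, and a cycle unreachable from the roots is collected without trouble:

```python
# Mark-and-sweep sketch (object encoding assumed): objects are dicts
# holding a list of outgoing references.

def mark(roots):
    """Mark phase: collect ids of everything reachable from the roots."""
    reached = set()
    stack = list(roots)
    while stack:
        o = stack.pop()
        if id(o) not in reached:
            reached.add(id(o))
            stack.extend(o["refs"])
    return reached

def sweep(heap, reached):
    """Sweep phase: keep marked objects, reclaim the rest."""
    return [o for o in heap if id(o) in reached]

a = {"refs": []}
b = {"refs": [a]}
c = {"refs": []}
c["refs"].append(c)            # circular reference, unreachable from roots
heap = [a, b, c]
heap = sweep(heap, mark([b]))  # root set = {b}: keeps a and b, drops c
```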
Conclusion
Structure of a compiler
character stream
Lexical analysis
token stream
Syntax analysis
syntax tree
Semantic analysis
syntax tree
Code generation
machine code
Code optimization
machine code
Summary
Part 1, Introduction:
- Overview and motivation...
Part 2, Lexical analysis:
- Regular expressions, finite automata, implementation, Flex...
Part 3, Syntax analysis:
- Context-free grammars, top-down (predictive) parsing, bottom-up parsing (SLR and operator precedence parsing)...
Part 4, Semantic analysis:
- Syntax-directed translation, abstract syntax trees, type and scope checking...
Part 5, Intermediate code generation and optimization:
- Intermediate representations, IR code generation, optimization...
Part 6, Code generation:
- Instruction selection, register allocation, liveness analysis, memory management...
More on compilers
Related topics:
- Natural language processing
- Domain-specific languages
- ...