Code Gen 1
Approach #1
Syntax-Directed Translations
E0 → E1 + E2     E0.place := NewTemp ()
                 E0.code := E1.code || E2.code ||
                            IR (E0.place, ‘:=’, E1.place, ‘+’, E2.place)
CFG Rule
“IR” - A routine to create IR instructions
Approach #2
We have parsed the program and built an in-memory representation
(Abstract Syntax Tree)
We will create methods to walk this AST and emit code
Graphical Representations
[Figure: a Tree and a DAG for the same assignment, built from the nodes :=, a, +, *, b, unary-, and c. The DAG shares the common parts that the tree repeats.]
Structures and Pointers
An array of fixed-sized records:
   0   id       b   -
   1   id       c   -
   2   unary-   1   -
   3   mult     0   2
   4   id       b   -
   5   id       c   -
   6   unary-   5   -
   7   mult     4   6
   8   add      3   7
   9   id       a   -
  10   assign   9   8
Q: How to build DAGs (i.e., trees with shared, common parts)?
A: When you are about to allocate a new node, look to see if you already have one with the same info.
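A minimal sketch of that answer, assuming the records are kept in a Python list and found again through a dictionary keyed on (op, arg1, arg2). The names NodeTable, node, and build are hypothetical, not taken from the slides.

```python
class NodeTable:
    """Array of fixed-size records (op, arg1, arg2), with sharing."""

    def __init__(self):
        self.records = []   # position in this list = node number
        self.index = {}     # (op, arg1, arg2) -> node number

    def node(self, op, arg1=None, arg2=None):
        key = (op, arg1, arg2)
        # Before allocating, look for an existing record with the same info.
        if key in self.index:
            return self.index[key]
        self.records.append(key)
        self.index[key] = len(self.records) - 1
        return self.index[key]


def build(table):
    """Build a := (b * -c) + (b * -c) bottom-up."""
    def b_times_minus_c():
        b = table.node("id", "b")
        c = table.node("id", "c")
        return table.node("mult", b, table.node("unary-", c))

    left = b_times_minus_c()
    right = b_times_minus_c()   # finds the records already made for `left`
    total = table.node("add", left, right)
    return table.node("assign", table.node("id", "a"), total)


table = NodeTable()
root = build(table)
for number, record in enumerate(table.records):
    print(number, *("-" if field is None else field for field in record))
```

With the lookup in place the table holds 7 records instead of 11, and the add record points at the same mult record twice.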
© Harry H. Porter, 2006 6
CS-322 Code Generation-Part 1
Three-Address Instructions
Idea:
One operation
Three “addresses” (fields, args), at most
Two operands
One result
(Some instructions have only 0, 1, or 2 addresses)
Much closer to machine language
Notation used in Textbook
Examples:
t1 := -c          neg  c → t1
t2 := b * t1      mult b,t1 → t2
t3 := -c          neg  c → t3
t4 := b * t3      mult b,t3 → t4
t5 := t2 + t4     add  t2,t4 → t5
a := t5           move t5 → a
• Looks like source code, but...
• NOTE: Lots of temp variables
Tend to create many
Will try to eliminate during optimization
Source:
a = (b * -c) + (b * -c);
Translation #1:
t1 := -c
t2 := b * t1
t3 := -c
t4 := b * t3
t5 := t2 + t4
a := t5
Translation #2:
(The shared sub-expression is computed only once.)
t1 := -c
t2 := b * t1
t5 := t2 + t2
a := t5
You can design the exact instruction set in any way that facilitates compilation.
But you must be careful to be clear and precise about the IR instructions’ meanings.
More IR instructions → Easier to compile to.
...but more work during final code generation
Fewer IR instructions → the reverse
Triples
Don’t store the result directly.
Implicitly associate a temporary result with each triple.
  t0 := x + y
  t1 := t0 * z
  a := t1
Avoids creating the temporaries.
Saves storage.
Difficult to re-order instructions.
  0   +    x   y
  1   *    0   z
  2   :=   a   1
[Figure: syntax tree with := at the root over a and *; * over + and z; + over x and y]
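A small sketch of how triples might be executed, to make the implicit temporaries concrete. It assumes an integer operand refers to the result of an earlier triple and a string operand names a variable. The function run_triples and the sample values for x, y, z are hypothetical.

```python
def run_triples(triples, env):
    """Execute triples in order; results[i] is the implicit temporary of triple i."""
    results = [None] * len(triples)

    def value(arg):
        # An int refers to an earlier triple's result; a str names a variable.
        return results[arg] if isinstance(arg, int) else env[arg]

    for i, (op, arg1, arg2) in enumerate(triples):
        if op == "+":
            results[i] = value(arg1) + value(arg2)
        elif op == "*":
            results[i] = value(arg1) * value(arg2)
        elif op == ":=":
            env[arg1] = value(arg2)   # arg1 is the target variable name
        else:
            raise ValueError(f"unknown operator {op!r}")
    return env


triples = [
    ("+", "x", "y"),   # 0
    ("*", 0, "z"),     # 1
    (":=", "a", 1),    # 2
]
env = run_triples(triples, {"x": 2, "y": 3, "z": 4})
print(env["a"])   # (2 + 3) * 4
```

Because operands name triples by position, moving a triple to a different position would require renumbering every reference to it. That is the re-ordering difficulty noted above.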
Indirect Triples
Get around the re-ordering problem
... by introducing another data structure.
  0: 100        100:  +    x    y
  1: 101        101:  *    100  z
  2: 102        102:  :=   a    101
  3: 103        103:  :=   b    w
Quadruples
Less indirection, simpler
Easier to manipulate, reorder
Triples
Indirect Triples
About same amount of space as quadruples
May save space when lots of shared sub-expressions
More complex
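Indirect Triples (after re-ordering)
To re-order instructions, change only the index array. The triples themselves, and the references between them, do not move.
Here the last two instructions have been swapped:
  0: 100        100:  +    x    y
  1: 101        101:  *    100  z
  2: 103        102:  :=   a    101
  3: 102        103:  :=   b    w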
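A sketch of the same idea in code, assuming the triples live in a dictionary keyed by address and the index array is a plain list. The function run and the sample variable values are hypothetical.

```python
# Triple store: address -> (op, arg1, arg2). Int operands are triple addresses.
triples = {
    100: ("+", "x", "y"),
    101: ("*", 100, "z"),
    102: (":=", "a", 101),
    103: (":=", "b", "w"),
}
order = [100, 101, 102, 103]   # the extra index array: execution order


def run(order, triples, env):
    """Execute the triples in the sequence given by the index array."""
    results = {}

    def value(arg):
        return results[arg] if isinstance(arg, int) else env[arg]

    for addr in order:
        op, arg1, arg2 = triples[addr]
        if op == ":=":
            env[arg1] = value(arg2)
        elif op == "+":
            results[addr] = value(arg1) + value(arg2)
        elif op == "*":
            results[addr] = value(arg1) * value(arg2)
        else:
            raise ValueError(f"unknown operator {op!r}")
    return env


before = run(order, triples, {"x": 2, "y": 3, "z": 4, "w": 7})
order[2], order[3] = order[3], order[2]   # swap the last two instructions
after = run(order, triples, {"x": 2, "y": 3, "z": 4, "w": 7})
```

Only the list `order` changed. No triple was edited, and the reference from 102 to 101 is still valid.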
Translating Expressions
Idea: Use Syntax-directed translations
For each expression, we’ll synthesize two attributes:
E.code
This is the code we will generate for expression E.
(It is a sequence of all the IR instructions in the translation.)
When executed (at runtime), these instructions will compute the
value of the expression and place the value into some variable.
E.place
The name of the variable (often a temporary variable)
into which this code will move the final result value when executed.
For each statement, we will synthesize one attribute:
S.code
The IR code for this source statement.
Goal
Take a source statement and produce a sequence of IR quads:
Example:
x := y + z;
IR Quads:
t1 := y + z
x := t1
Example:
x := (y + z) * (u + v);
IR Quads:
t1 := y + z
t2 := u + v
t3 := t1 * t2
x := t3
Work bottom-up.
Assume we already have
  E1.place = “y”
  E1.code = “ ”
  E2.place = “z”
  E2.code = “ ”
[Figure: parse tree with E0 above E1 + E2; each E node carries code= and place= attributes still to be filled in; an ID node with sval=“x” is also shown]
E0 → E1 + E2     E0.place := NewTemp ()
                 E0.code := E1.code || E2.code ||
                            IR (E0.place, ‘:=’, E1.place, ‘+’, E2.place)
E0 → E1 * E2     E0.place := NewTemp ()
                 E0.code := E1.code || E2.code ||
                            IR (E0.place, ‘:=’, E1.place, ‘*’, E2.place)
E0 → ID          E0.place := ID.svalue
                 E0.code := “ ”
E0 → - E1        E0.place := NewTemp ()
                 E0.code := E1.code || IR (E0.place, ‘:=’, ‘-’, E1.place)
E0 → ( E1 )      E0.place := E1.place
                 E0.code := E1.code
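The rules can be tried out with a short sketch. Assumptions: expressions arrive as nested tuples rather than from a parser, IR instructions are plain strings, and the assignment rule (translate_assign) is inferred from the “Goal” examples rather than from a rule on the slides. The names new_temp, translate, and translate_assign are hypothetical.

```python
import itertools

_counter = itertools.count(1)


def new_temp():
    """Plays the role of NewTemp (): returns t1, t2, t3, ..."""
    return f"t{next(_counter)}"


def translate(e):
    """Return (place, code) for an expression tree.

    A str is an ID; ('-', e1) is unary minus; (op, e1, e2) is a binary
    operator. Parentheses need no case: tuple nesting already encodes them.
    """
    if isinstance(e, str):                  # E0 -> ID
        return e, []
    if len(e) == 2:                         # E0 -> - E1
        p1, c1 = translate(e[1])
        place = new_temp()
        return place, c1 + [f"{place} := -{p1}"]
    op, e1, e2 = e                          # E0 -> E1 + E2  |  E1 * E2
    p1, c1 = translate(e1)
    p2, c2 = translate(e2)
    place = new_temp()
    return place, c1 + c2 + [f"{place} := {p1} {op} {p2}"]


def translate_assign(target, e):
    """Assumed statement rule: S.code := E.code || IR (ID, ':=', E.place)."""
    place, code = translate(e)
    return code + [f"{target} := {place}"]


code = translate_assign("x", ("*", ("+", "y", "z"), ("+", "u", "v")))
print("\n".join(code))
```

For x := (y + z) * (u + v) this prints the four quads from the “Goal” example: t1 := y + z, t2 := u + v, t3 := t1 * t2, x := t3.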
Run-time Activations
At any moment at runtime, “foo” may have
• Zero activations
• One activation
• Many activations (if “foo” is recursive)
Standard Terminology
“Routine”
“Procedure” - Returns no result
“Function” - Returns a result
Other Terminology
Terminology
Static
A routine is called at a place in the program
e.g., “quicksort” is called on lines 16, 17, and 23
“formal parameters” Like variables
“actual arguments” Expressions (which yield a value when executed)
Runtime
When a routine is called, it is a new “activation” (or “invocation”)
The routine is “invoked”.
The “lifetime of an activation”
From the moment of invocation to the moment of return
The “Caller”
The “Callee” (the “called” routine)
Static:
“quicksort” calls “partition” on line 12.
Dynamic:
This activation of quicksort calls quicksort with arguments 4 and 7.
Nested Activations
At runtime...
Assume p calls f
Then: f must return before p returns
Assumptions:
• Single thread of control
• No errors / exceptions
• No suspended closures
• No gotos.
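A small sketch that checks the nesting property at runtime, under the assumptions listed above. The decorator traced and the lists active and trace are hypothetical helpers; p and f stand for the two routines in the example.

```python
import functools

active = []   # names of live activations, innermost last
trace = []    # log of enter / return events


def traced(fn):
    @functools.wraps(fn)
    def wrapper(*args):
        active.append(fn.__name__)
        trace.append(f"{fn.__name__} entered")
        try:
            return fn(*args)
        finally:
            # The activation that returns must be the most recently entered one.
            assert active.pop() == fn.__name__
            trace.append(f"{fn.__name__} returns")
    return wrapper


@traced
def f(n):
    return n + 1


@traced
def p(n):
    return f(n) * 2


result = p(3)
print(trace)
```

The log shows f returning before p returns, and the list of live activations behaves as a stack.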
Recursion
Recursive routines
  foo entered
  foo entered
  foo entered
  ...
  foo returns
  foo returns
  foo returns
Mutually recursive routines (“Indirectly recursive”)
  foo entered
  bar entered
  goo entered
  foo entered
  ...
  foo returns
  goo returns
  bar returns
  foo returns
[Figure: “Calling Graph” for a program, with routines (quicksort, foo, bar, goo, myFunct) as nodes and “calls” edges between them]
[Figure: activation tree drawn against time, with nodes readarray, qs(1,9), partition(1,3), qs(1,0), qs(2,3), partition(2,3), qs(2,1)]
[Figure: the stack of frames at successive moments, from main at the bottom up to TOP, with frames such as readArray, qs(1,9), qs(1,3), qs(2,3), part(2,3), qs(2,1)]
Declarations
There may be several declarations for some variable name:
  procedure foo (...) is
    var i: integer := ...;
    procedure bar (...) is
      var i: integer := ...;
    begin ... end;
  begin ... end;
“Scope Rules”
The scope of a declaration
The part of the program (static) where we can use the declared name.
“Local”
“Non-Local”
Symbol Table
Match variable uses to declarations
  var x: ...
  ...
  ... x ...
[Figure: the Variable node for the use of x is linked by myDef to the VarDecl node for its declaration]
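A minimal sketch of matching uses to declarations with nested scopes, assuming one table per routine and a link to the enclosing routine's table. The class Scope and the extra variable n are hypothetical; foo, bar, and i follow the Declarations example.

```python
class Scope:
    """One symbol table per routine, linked to the enclosing routine's table."""

    def __init__(self, parent=None):
        self.parent = parent
        self.decls = {}   # name -> declaration record

    def declare(self, name, decl):
        self.decls[name] = decl

    def lookup(self, name):
        # Search the local scope first, then each enclosing scope in turn.
        scope = self
        while scope is not None:
            if name in scope.decls:
                return scope.decls[name]
            scope = scope.parent
        raise KeyError(f"undeclared name {name!r}")


foo_scope = Scope()
foo_scope.declare("i", "i declared in foo")
foo_scope.declare("n", "n declared in foo")   # hypothetical extra variable

bar_scope = Scope(parent=foo_scope)
bar_scope.declare("i", "i declared in bar")

print(bar_scope.lookup("i"))   # local declaration hides the outer one
print(bar_scope.lookup("n"))   # non-local: found in foo
```

Inside bar, a use of i resolves to bar's own declaration, while a use of n resolves to the non-local declaration in foo.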
Binding of Names
name → storage → value
The “environment” maps a name to storage; the “state” maps storage to a value.
Done Statically
Done Dynamically
L-Values
R-Values
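A sketch of the two mappings, assuming storage locations are small integers handed out in declaration order. The helpers declare, l_value, r_value, and assign are hypothetical names.

```python
environment = {}   # name -> storage location
state = {}         # storage location -> value
next_location = 0


def declare(name):
    """Bind a name to a fresh storage location."""
    global next_location
    environment[name] = next_location
    state[next_location] = None
    next_location += 1


def l_value(name):
    """The storage location a name denotes (left side of an assignment)."""
    return environment[name]


def r_value(name):
    """The value currently held in that storage (right side)."""
    return state[environment[name]]


def assign(target, source):
    # target := source uses the l-value of target and the r-value of source.
    state[l_value(target)] = r_value(source)


declare("x")
declare("y")
state[l_value("y")] = 42
assign("x", "y")
print(l_value("x"), r_value("x"))
```

The assignment changes only the state. The environment, which binds x and y to their locations, is untouched.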
Memory
.text segment
will be read-only
code (including library routines)
fixed (constant) data
.asciz strings
Dangling References
“A pointer to storage that has been freed / deallocated”
Dangling References
Problem:
The programmer explicitly frees / deallocates data
...but the programmer frees data “too soon”.
Solution:
Don’t let programmer free data!
Problem:
The program uses more memory than necessary.
Solution:
The runtime system identifies objects that
cannot possibly be used again by the program
“Garbage” objects
The runtime system reclaims this space
The “Garbage Collector”
The Heap
Simplest Organization
Allocate space at the end (top) of the heap
Abort when “StackTop < HeapTop”
Never free / release space
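A sketch of this simplest organization. It assumes the heap grows upward from address 0 and the stack grows downward from the high end of memory, which is one layout consistent with the “StackTop < HeapTop” test. The class Memory, its methods, and the exception OutOfMemory are hypothetical.

```python
class OutOfMemory(Exception):
    pass


class Memory:
    def __init__(self, size):
        self.heap_top = 0        # next free address; the heap grows upward
        self.stack_top = size    # the stack grows downward from the high end

    def push_frame(self, nbytes):
        self.stack_top -= nbytes
        self._check()
        return self.stack_top

    def allocate(self, nbytes):
        address = self.heap_top
        self.heap_top += nbytes  # allocate at the top; space is never freed
        self._check()
        return address

    def _check(self):
        # Abort when StackTop < HeapTop.
        if self.stack_top < self.heap_top:
            raise OutOfMemory("stack and heap have collided")


mem = Memory(100)
a = mem.allocate(40)
b = mem.allocate(40)
mem.push_frame(20)   # stack_top and heap_top now meet at 80
```

Any further allocation or frame raises OutOfMemory, since nothing is ever released.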
Garbage
Memory is organized into “objects” which point to each other.
Objects are allocated at random during program execution.
Some objects are “reachable” from the program variables.
All other objects are considered to be garbage.
[Figure: stack frames holding the variables w, x, y, z, a, b, i, j, with pointers into the heap of objects]
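Reachability can be sketched as a graph traversal from the program variables. This is an illustration of the definition, not a particular collector from the slides. The object names A to D, the contents of roots, and the function reachable are hypothetical.

```python
def reachable(roots, heap):
    """Return the set of objects reachable from the roots.

    heap maps each object id to the list of object ids it points to.
    """
    seen = set()
    worklist = [obj for obj in roots if obj is not None]
    while worklist:
        obj = worklist.pop()
        if obj in seen:
            continue
        seen.add(obj)
        worklist.extend(heap[obj])
    return seen


heap = {
    "A": ["B"],   # A points to B
    "B": [],
    "C": ["A"],   # C points to A, but nothing points to C
    "D": ["D"],   # D points only to itself
}
roots = {"w": "A", "x": None, "y": "B"}   # program variables

live = reachable(roots.values(), heap)
garbage = set(heap) - live
print(sorted(garbage))
```

C and D are garbage even though they hold pointers: what matters is whether something reachable points to them.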
“Compact” Memory!
Pointer Problems
The program frees memory “too soon”.
Dangling references
Program crashes
Program accesses/overwrites other data
Incorrect / weird behavior
Pointer Problems
Automatic Garbage Collection
Dangling references no longer possible.
Freeing memory...
The Garbage Collector is conservative.
May fail to collect some garbage.
What if this object is
never used again?
(Ought to collect it!)
[Figure: stack of frames with variables w, x, y, z pointing into the heap of objects; the object in question is still pointed to]
Set this field to NULL, so the collector will identify the object as garbage.
[Figure: the same stack of frames and heap, now with x: NULL]