Unit-4
Intermediate Code Generation
What is intermediate code?
• While translating a source program into the
object code for a target machine, a compiler may
generate code in a middle-level language, which is known
as intermediate code.
Intermediate code
• The following are commonly used intermediate code
representations:
• Syntax tree
• Postfix Notation
• Three-Address Code
Syntax tree
• A syntax tree is nothing more than a condensed form of a parse
tree.
• The operator and keyword nodes of the parse tree are
moved to their parents, and a chain of single productions
is replaced by a single link in the syntax tree.
• The internal nodes are operators and their child nodes are
operands.
• To form a syntax tree, fully parenthesize the expression;
this makes it easy to recognize which operands should
be combined first.
Syntax tree
• Example: x = (a + b * c) / (a - b * c)
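The tree for this example can be sketched in a few lines of Python. This is only an illustration: it borrows Python's own `ast` parser (which happens to share the precedence of +, -, *, /) and the helper names `build`, `syntax_tree`, and `render` are made up for this sketch. The minus sign is written as ASCII `-`.

```python
import ast

# Map Python's operator node types to the symbols used in the example.
SYMBOLS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def build(node):
    """Return an operand name, or an (operator, left, right) tuple."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.BinOp):
        return (SYMBOLS[type(node.op)], build(node.left), build(node.right))
    raise ValueError("unsupported expression")

def syntax_tree(statement):
    """Build the tree for a single assignment such as 'x = a + b'."""
    assign = ast.parse(statement).body[0]
    return ("=", assign.targets[0].id, build(assign.value))

def render(tree, depth=0):
    """List the tree one node per line, children indented under parents."""
    if isinstance(tree, str):
        return ["  " * depth + tree]
    op, left, right = tree
    return ["  " * depth + op] + render(left, depth + 1) + render(right, depth + 1)

if __name__ == "__main__":
    print("\n".join(render(syntax_tree("x = (a + b * c) / (a - b * c)"))))
```

Every internal node is an operator and every leaf is an operand, as described above.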
Postfix Notation
• The ordinary (infix) way of writing the sum of a and b places the
operator in the middle: a + b
• The postfix notation for the same expression places the
operator at the right end: ab +
• In general, if e1 and e2 are any postfix expressions, and + is
any binary operator, the result of applying + to the values
denoted by e1 and e2 is written in postfix notation as e1 e2 +.
• No parentheses are needed in postfix notation because the
position and arity (number of arguments) of the operators
permit only one way to decode a postfix expression.
• In postfix notation the operator follows its operands.
Postfix Notation
• Example:
• The postfix representation of the expression
(a - b) * (c + d) + (a - b) is
ab-cd+*ab-+
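One way to produce postfix mechanically is the operator-stack (shunting-yard) method, sketched below. It is not taken from the text; it assumes single-character operands, the four operators + - * /, and ASCII `-` for the minus sign. The function name `to_postfix` is hypothetical.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_postfix(infix):
    """Convert an infix string with single-character operands to postfix."""
    output, stack = [], []
    for tok in infix.replace(" ", ""):
        if tok.isalnum():                 # operand: goes straight to the output
            output.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":                  # flush operators back to the matching "("
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()
        else:                             # binary operator (left-associative)
            while (stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
    while stack:                          # flush whatever operators remain
        output.append(stack.pop())
    return "".join(output)

if __name__ == "__main__":
    print(to_postfix("(a - b) * (c + d) + (a - b)"))  # ab-cd+*ab-+
```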
Three-Address Code
• A statement involving no more than three references (two for operands and
one for the result) is known as a three-address statement.
• A sequence of three-address statements is known as three-address code.
A three-address statement has the form x = y op z, where x, y, and z each have an
address (memory location).
• Sometimes a statement contains fewer than three references, but it is
still called a three-address statement.
• For example: a = b + c * d
The intermediate code generator divides this expression into sub-
expressions and then generates the corresponding code:
r1 = c * d
r2 = b + r1
a = r2
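The splitting into sub-expressions can be sketched as a post-order walk over the expression tree. The sketch below is illustrative only: it reuses Python's `ast` parser to obtain the tree, the function name `three_address_code` is hypothetical, and the temporaries are named r1, r2, ... to match the example.

```python
import ast
from itertools import count

SYMBOLS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def three_address_code(statement):
    """Return three-address statements for an assignment such as 'a = b + c * d'."""
    code = []
    temp_ids = count(1)

    def emit(node):
        # A plain name needs no code; its own name is its address.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.BinOp):
            left = emit(node.left)        # code for the operands comes first
            right = emit(node.right)
            temp = f"r{next(temp_ids)}"
            code.append(f"{temp} = {left} {SYMBOLS[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported expression")

    assign = ast.parse(statement).body[0]
    result = emit(assign.value)
    code.append(f"{assign.targets[0].id} = {result}")
    return code

if __name__ == "__main__":
    print("\n".join(three_address_code("a = b + c * d")))
```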
Common Three Address Code
Instructions
1. Arithmetic operations:
• x = y op z (where op is +, -, *, /, etc.)
2. Assignment:
• x = y
3. Unary operations:
• x = op y (where op is -, !, ~, etc.)
4. Conditional jumps:
• if x relop y goto L (where relop is a relational operator)
7. Function calls:
• param x
• call f, n (where n is number of parameters)
• x = call f, n
8. Pointer operations:
• S --> E
• E --> E1 + T
• E --> T
• T --> T1 * F
• T --> F
• F --> digit
Annotated parse tree for 4*5+6
Note: the values in the tree below correspond to the expression 3 * (5 + 2), which evaluates to 21.

              E (val=21)
                  |
              T (val=21)
            /     |     \
   T (val=3)      *      F (val=7)
       |               /     |     \
   F (val=3)         (   E (val=7)   )
       |               /     |     \
    num (3)        E (5)     +     T (2)
                     |               |
                   T (5)           F (2)
                     |               |
                   F (5)          num (2)
                     |
                  num (5)
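For the grammar above, the val attribute of each node can be computed bottom-up. The sketch below assumes the usual semantic rules (E.val = E1.val + T.val, T.val = T1.val * F.val, F.val = digit.lexval), which are not listed in the text, and replaces the left-recursive productions with loops so a simple recursive-descent routine can be used. The function name `evaluate` is hypothetical.

```python
def evaluate(text):
    """Compute S.val for an input such as '4*5+6' (single digits, + and * only)."""
    tokens = list(text.replace(" ", ""))
    pos = 0

    def factor():
        # F --> digit        F.val = digit.lexval
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError("digit expected")
        value = int(tokens[pos])
        pos += 1
        return value

    def term():
        # T --> T1 * F | F   T.val = T1.val * F.val
        nonlocal pos
        value = factor()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            value *= factor()
        return value

    def expr():
        # E --> E1 + T | T   E.val = E1.val + T.val
        nonlocal pos
        value = term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            value += term()
        return value

    result = expr()                       # S --> E   S.val = E.val
    if pos != len(tokens):
        raise SyntaxError("unexpected input")
    return result

if __name__ == "__main__":
    print(evaluate("4*5+6"))  # 26
```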
Inherited Attributes
• Definition:
An attribute whose value is computed from parent or
sibling nodes in the parse tree.
• Direction of Flow:
Information flows top-down or sideways (from parent
to child or sibling to sibling).
• Used in:
• Typically used with LL parsers (top-down parsing).
• Example:
Type checking, symbol table propagation.
Consider the following grammar
S --> T L
T --> int
T --> float
T --> double
L --> L1, id
L --> id
Input string: int a, c
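A small sketch of how the type could flow through this grammar as an inherited attribute. The rules assumed here (L.inh = T.type, L1.inh = L.inh, and recording each id with L.inh in the symbol table) are the conventional ones and are not spelled out in the text; the names `declare` and `visit_L` are hypothetical.

```python
def declare(declaration):
    """Return a symbol table for a declaration such as 'int a, c'."""
    type_name, _, rest = declaration.partition(" ")
    if type_name not in ("int", "float", "double"):   # T --> int | float | double
        raise SyntaxError("unknown type: " + type_name)
    ids = [name.strip() for name in rest.split(",")]
    symbol_table = {}

    def visit_L(names, inherited_type):
        # L --> L1, id : pass the inherited type down to L1, then record id.
        if len(names) > 1:
            visit_L(names[:-1], inherited_type)
        # L --> id     : record id with the inherited type.
        symbol_table[names[-1]] = inherited_type

    # S --> T L : the type synthesized by T becomes L's inherited attribute.
    visit_L(ids, type_name)
    return symbol_table

if __name__ == "__main__":
    print(declare("int a, c"))  # {'a': 'int', 'c': 'int'}
```

The type is known at T but needed at every id under L, so it has to travel sideways and downward rather than upward.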
Intermediate Language or Three-Address Code
• Three-address code (TAC) is an intermediate
representation where each instruction has at most three
operands (typically two sources and one destination).
The semantic rules for generating TAC for various
programming language constructs can be specified
using a syntax-directed definition (SDD).
SDT for Assignment
Productions        Semantic Rules
S → id := E        { gen(id.place = E.place) }
E → E₁ + E₂        { E.place = newtemp();
                     gen(E.place = E₁.place + E₂.place) }
E → E₁ * E₂        { E.place = newtemp();
                     gen(E.place = E₁.place * E₂.place) }
E → id             { E.place = id.place }
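The rules in this table can be tried out with a minimal sketch. It assumes the expression has already been parsed into nested tuples (op, E1, E2) with identifiers as strings; the class name `Translator` and the temporary names t1, t2, ... are choices made for this sketch, while `newtemp` and `gen` mirror the names in the rules.

```python
class Translator:
    """Applies the assignment SDT to an already-parsed expression."""

    def __init__(self):
        self.code = []       # generated three-address statements
        self.temps = 0       # number of temporaries handed out so far

    def newtemp(self):
        self.temps += 1
        return f"t{self.temps}"

    def gen(self, statement):
        self.code.append(statement)

    def expr(self, node):
        """Return E.place for node, emitting code as a side effect."""
        if isinstance(node, str):            # E → id : E.place = id.place
            return node
        op, e1, e2 = node                    # E → E1 + E2 | E1 * E2
        place1 = self.expr(e1)
        place2 = self.expr(e2)
        place = self.newtemp()               # E.place = newtemp()
        self.gen(f"{place} = {place1} {op} {place2}")
        return place

    def assign(self, target, node):          # S → id := E
        self.gen(f"{target} = {self.expr(node)}")

if __name__ == "__main__":
    t = Translator()
    t.assign("x", ("+", "a", ("*", "b", "c")))
    print("\n".join(t.code))
```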
Semantic Rules for Boolean Expression:
Example: Code for a < b or c < d and e < f
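One common scheme for this example is the numerical representation, where each relational test stores 1 (true) or 0 (false) in a temporary and the temporaries are then combined. The sketch below assumes that scheme, a starting statement number of 100, and that "and" binds tighter than "or"; none of these details are given in the text, and the class name `BoolGen` is hypothetical.

```python
class BoolGen:
    """Emits numbered three-address statements for Boolean expressions."""

    def __init__(self, start=100):
        self.start = start
        self.code = []
        self.temps = 0

    def nextstat(self):
        # Number that the next emitted statement will receive.
        return self.start + len(self.code)

    def newtemp(self):
        self.temps += 1
        return f"t{self.temps}"

    def emit(self, text):
        self.code.append(f"{self.nextstat()}: {text}")

    def relop(self, left, op, right):
        # Four statements: test, store 0, skip, store 1.
        temp = self.newtemp()
        self.emit(f"if {left} {op} {right} goto {self.nextstat() + 3}")
        self.emit(f"{temp} = 0")
        self.emit(f"goto {self.nextstat() + 2}")
        self.emit(f"{temp} = 1")
        return temp

    def binary(self, op, place1, place2):
        temp = self.newtemp()
        self.emit(f"{temp} = {place1} {op} {place2}")
        return temp

def example():
    g = BoolGen()
    t1 = g.relop("a", "<", "b")
    t2 = g.relop("c", "<", "d")
    t3 = g.relop("e", "<", "f")
    t4 = g.binary("and", t2, t3)      # "and" is applied before "or"
    g.binary("or", t1, t4)
    return g.code

if __name__ == "__main__":
    print("\n".join(example()))
```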
SDT for Switch Case statement:
• Syntax for switch case statement
switch E
begin
case V1: S1
case V2: S2
…
case Vn-1: Sn-1
default: Sn
end
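One conventional layout for the generated code places the statement bodies first and the sequence of tests at the end. The sketch below assumes that layout; the label names (L1, L2, ..., test, next) and the function name `translate_switch` are hypothetical, and the selector is assumed to be already evaluated into a temporary.

```python
def translate_switch(selector, cases, default_body):
    """selector: place holding the value of E.
    cases: list of (value, body) pairs, each body a list of statements.
    default_body: list of statements for the default branch."""
    code = ["goto test"]
    dispatch = []
    for number, (value, body) in enumerate(cases, start=1):
        label = f"L{number}"
        dispatch.append((value, label))
        code.append(f"{label}:")
        code.extend(body)
        code.append("goto next")          # leave the switch after each case
    default_label = f"L{len(cases) + 1}"
    code.append(f"{default_label}:")
    code.extend(default_body)
    code.append("goto next")
    code.append("test:")                  # all comparisons are grouped here
    for value, label in dispatch:
        code.append(f"if {selector} = {value} goto {label}")
    code.append(f"goto {default_label}")
    code.append("next:")
    return code

if __name__ == "__main__":
    cases = [("1", ["x = 1"]), ("2", ["x = 2"])]
    print("\n".join(translate_switch("t", cases, ["x = 0"])))
```

Grouping the tests in one place makes it easy for a later phase to replace them with a jump table.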
Procedure call
• Consider the grammar for a simple procedure call
statement,
(1) S → call id(Elist)
(2) Elist → E
(3) Elist → Elist, E
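A minimal sketch of how a call could be translated with this grammar: the places of the arguments are collected in a queue while Elist is parsed, and the param and call statements are emitted once the whole call has been seen. The function name `translate_call` is hypothetical and the arguments are assumed to be already evaluated into places.

```python
def translate_call(procedure, argument_places):
    """Return the param/call statements for procedure(arg1, arg2, ...)."""
    queue = []
    for place in argument_places:
        # Elist → E  and  Elist → Elist, E : append E.place to the queue.
        queue.append(place)
    # S → call id(Elist) : emit one param per queued place, then the call.
    code = [f"param {place}" for place in queue]
    code.append(f"call {procedure}, {len(queue)}")
    return code

if __name__ == "__main__":
    print("\n".join(translate_call("f", ["t1", "b"])))
```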
Backpatching
• Backpatching is a technique used in compiler design to resolve forward
references in control flow statements during code generation. It allows
the compiler to generate code for statements like if, while, and goto
without knowing the target addresses of jumps at the time of initial
code generation.
How Backpatching Works
1. Forward References Problem: When compiling control structures,
the target of a jump may not be known when the jump instruction is
first generated.
2. Two-Step Solution (both steps happen within a single pass over the source):
1. First step: Generate code with placeholders for unknown jump targets.
2. Second step: Fill in the actual addresses once they are known (backpatching).
- Maintain a list of instructions that need to be patched when the target becomes
known.
Example
• if (x < y) goto L1;
• ...
• L1: ...
Steps:
• When generating the goto L1, L1's location isn't known yet.
• When L1's location is encountered, all instructions in its patch list are updated
with the correct address.
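The steps above can be sketched with a small code buffer that keeps one patch list per label. The class name `CodeBuffer`, the `_` placeholder, and the use of instruction indices as addresses are assumptions made for this illustration.

```python
class CodeBuffer:
    """Holds generated instructions and patches forward jumps."""

    def __init__(self):
        self.code = []
        self.labels = {}      # label -> address, once the label is placed
        self.pending = {}     # label -> [(instruction index, template), ...]

    def emit(self, text):
        self.code.append(text)

    def emit_jump(self, template, label):
        """template contains '{}' where the target address belongs."""
        if label in self.labels:                 # backward jump: target known
            self.code.append(template.format(self.labels[label]))
        else:                                    # forward jump: leave a hole
            self.pending.setdefault(label, []).append((len(self.code), template))
            self.code.append(template.format("_"))

    def place_label(self, label):
        """Bind label to the next instruction and fill every waiting hole."""
        address = len(self.code)
        self.labels[label] = address
        for index, template in self.pending.pop(label, []):
            self.code[index] = template.format(address)

if __name__ == "__main__":
    buf = CodeBuffer()
    buf.emit_jump("if x < y goto {}", "L1")      # L1 is not known yet
    buf.emit("t1 = 0")
    buf.place_label("L1")                        # patches instruction 0
    buf.emit("t1 = 1")
    print("\n".join(buf.code))
```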
Backpatching with Boolean Expressions:
Grammar and Semantic Rules
• Grammar for Boolean Expressions
E → E₁ or E₂ | E₁ and E₂ | not E | (E) | id relop id | true | false
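For this grammar, a backpatching translation usually gives every expression two lists of unfilled jumps, a truelist and a falselist. The sketch below assumes that scheme: quadruples are numbered from 100, `_` marks an unfilled target, and the marker value m is the number of the first quadruple of the second operand. The class `Quads` and its method names are hypothetical.

```python
class Quads:
    """Quadruple buffer with truelist/falselist backpatching."""

    def __init__(self, start=100):
        self.start = start
        self.code = []

    def nextquad(self):
        return self.start + len(self.code)

    def emit(self, text):
        self.code.append(text)

    def backpatch(self, quad_list, target):
        # Replace the trailing "_" of every listed jump with the target.
        for quad in quad_list:
            index = quad - self.start
            self.code[index] = self.code[index][:-1] + str(target)

    def relop(self, left, op, right):
        # E → id relop id : two jumps, both with unknown targets.
        truelist = [self.nextquad()]
        falselist = [self.nextquad() + 1]
        self.emit(f"if {left} {op} {right} goto _")
        self.emit("goto _")
        return truelist, falselist

    def or_(self, e1, m, e2):
        # E → E1 or E2 : if E1 is false, control continues at E2.
        self.backpatch(e1[1], m)
        return e1[0] + e2[0], e2[1]

    def and_(self, e1, m, e2):
        # E → E1 and E2 : if E1 is true, control continues at E2.
        self.backpatch(e1[0], m)
        return e2[0], e1[1] + e2[1]

def not_(e):
    # E → not E1 : swap the two lists.
    return e[1], e[0]

def example():
    q = Quads()
    e1 = q.relop("a", "<", "b")
    m1 = q.nextquad()
    e2 = q.relop("c", "<", "d")
    m2 = q.nextquad()
    e3 = q.relop("e", "<", "f")
    e = q.or_(e1, m1, q.and_(e2, m2, e3))
    return q.code, e

if __name__ == "__main__":
    code, (truelist, falselist) = example()
    print("\n".join(code))
    print("truelist:", truelist, "falselist:", falselist)
```

The jumps still marked `_` are exactly those on the final truelist and falselist; they are filled in later, when the enclosing statement knows where true and false should lead.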
Stack Allocation:
Used to allocate memory for function calls.
Automatic LIFO memory for function calls (locals, params)
Very fast, but limited in size (deep recursion can overflow the stack)
Heap Allocation:
Used to allocate memory for dynamic memory allocation statements.
Dynamic runtime memory (malloc, new; freed manually)
Flexible but slower (fragmentation and leaks are possible)
Activation Record