
Unit-4

Intermediate code is a middle-level language generated by compilers during the translation of source programs into object code. Common representations of intermediate code include syntax trees, postfix notation, and three-address code, which facilitate easier manipulation and optimization. The document also discusses semantic rules, backpatching techniques, and the code generation phase, highlighting the importance of instruction selection, register allocation, and evaluation order.

Uploaded by

atkalajadu69
Copyright
© © All Rights Reserved

Unit-4

Intermediate Code
Generation
What is intermediate code?
• During the translation of a source program into the
object code for a target machine, a compiler may
generate a middle-level language code, which is known
as intermediate code
Intermediate code
• The following are commonly used intermediate code
representations:
• Syntax tree
• Postfix Notation
• Three-Address Code
Syntax tree
•A syntax tree is a condensed form of a parse tree.
•The operator and keyword nodes of the parse tree are
moved to their parents, and a chain of single productions
is replaced by a single link in the syntax tree.
•The internal nodes are operators and the leaf nodes are
operands.
• To form a syntax tree, first parenthesize the expression fully;
this makes it easy to see which operands each operator
applies to.
Syntax tree
• Example : x = (a + b * c) / (a – b * c)
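The tree for this example can be written out as data. The following is a minimal sketch, assuming a hypothetical `Node` class and `show` helper that are not part of the slides; it only illustrates that operators become internal nodes and operands become leaves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One syntax-tree node: an operator with children, or a leaf operand."""
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def leaf(name: str) -> Node:
    """Build an operand node (no children)."""
    return Node(name)


def show(node: Node, depth: int = 0) -> str:
    """Render the tree one node per line, children indented under parents."""
    lines = ["  " * depth + node.label]
    for child in (node.left, node.right):
        if child is not None:
            lines.append(show(child, depth + 1))
    return "\n".join(lines)


# x = (a + b * c) / (a - b * c)
numerator = Node("+", leaf("a"), Node("*", leaf("b"), leaf("c")))
denominator = Node("-", leaf("a"), Node("*", leaf("b"), leaf("c")))
tree = Node("=", leaf("x"), Node("/", numerator, denominator))

print(show(tree))
```

The root is the assignment, its right child is the division, and each `b * c` sits below the `+` or `-` that uses it, which mirrors the precedence in the parenthesized expression.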
Postfix Notation
• The ordinary (infix) way of writing the sum of a and b is with the
operator in the middle: a + b
• The postfix notation for the same expression places the
operator at the right end: ab +
• In general, if e1 and e2 are any postfix expressions, and + is
any binary operator, the result of applying + to the values
denoted by e1 and e2 is written in postfix notation as e1 e2 +.
• No parentheses are needed in postfix notation because the
position and arity (number of arguments) of the operators
permit only one way to decode a postfix expression.
• In postfix notation the operator follows its operands.
Postfix Notation
• Example –
• The postfix representation of the expression
(a – b) * (c + d) + (a – b) is

ab–cd+*ab–+
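One common way to obtain this result mechanically is the operator-stack (shunting-yard) method. The slides do not say which method they use, so the sketch below is an assumption; the function name `to_postfix` is hypothetical, operands are single letters, and the ASCII `-` stands in for the dash used in the slide.

```python
# Higher number = binds tighter; all four operators are left-associative.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_postfix(infix: str) -> str:
    """Convert an infix string with single-letter operands to postfix."""
    output = []
    stack = []
    for ch in infix.replace(" ", ""):
        if ch.isalnum():
            output.append(ch)                 # operands go straight to the output
        elif ch == "(":
            stack.append(ch)
        elif ch == ")":
            while stack[-1] != "(":           # flush operators back to the "("
                output.append(stack.pop())
            stack.pop()                       # discard the "(" itself
        else:
            # Pop operators of equal or higher precedence before pushing ch.
            while stack and stack[-1] != "(" and PRECEDENCE[stack[-1]] >= PRECEDENCE[ch]:
                output.append(stack.pop())
            stack.append(ch)
    while stack:
        output.append(stack.pop())
    return "".join(output)


print(to_postfix("(a - b) * (c + d) + (a - b)"))  # ab-cd+*ab-+
```

The sketch does no error handling: an unbalanced parenthesis or an unknown operator raises an exception rather than producing a diagnostic.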
Three-Address Code
•A statement involving no more than three references (two for operands and
one for the result) is known as a three-address statement.
•A sequence of three-address statements is known as three-address code.
A three-address statement has the form x = y op z, where x, y and z each have an
address (memory location).
•Sometimes a statement contains fewer than three references, but it is
still called a three-address statement.
•For example: a = b + c * d
The intermediate code generator divides this expression into
sub-expressions and then generates the corresponding code:
r1 = c * d
r2 = b + r1
a = r2
Common Three Address Code
Instructions
1.Arithmetic operations:
•x = y op z (where op is +, -, *, /, etc.)
2.Assignment:
•x = y
3.Unary operations:
•x = op y (where op is -, !, ~, etc.)
4.Conditional jumps:
•if x relop y goto L (where relop is <, >, ==, etc.)
5.Unconditional jumps:
•goto L
6.Array operations:
•x = y[i]
•x[i] = y
7.Function calls:
•param x
•call f, n (where n is number of parameters)
•x = call f, n
8.Pointer operations:
•x = *y
•*x = y
Example
• Original expression: x = (a + b) * (c - d)

• Three Address Code:


t1 = a + b
t2 = c - d
t3 = t1 * t2
x = t3
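The translation above can be sketched as a post-order walk over the expression tree. This is an illustration under assumptions, not the slides' own algorithm: the function name `generate_tac` and the nested-tuple tree encoding `(op, left, right)` are hypothetical.

```python
from itertools import count


def generate_tac(target, expr):
    """Return three-address statements for `target = expr`.

    `expr` is either a variable name (str) or a tuple (op, left, right).
    """
    code = []
    temp_ids = count(1)

    def walk(node):
        """Emit code for a subtree and return the name holding its value."""
        if isinstance(node, str):
            return node                      # a plain operand needs no code
        op, left, right = node
        left_place = walk(left)              # children first (post-order)
        right_place = walk(right)
        temp = f"t{next(temp_ids)}"          # fresh temporary for this operator
        code.append(f"{temp} = {left_place} {op} {right_place}")
        return temp

    place = walk(expr)
    code.append(f"{target} = {place}")
    return code


# x = (a + b) * (c - d)
for line in generate_tac("x", ("*", ("+", "a", "b"), ("-", "c", "d"))):
    print(line)
```

Each operator node produces exactly one statement with a new temporary, which is why the output never needs more than three addresses per line.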
Three-Address Code
• Three-address code uses at most three addresses
per statement to compute an expression.
Three-address code can be represented in three forms:
o Quadruples
o Triples
o Indirect Triples
Quadruples
Example: a = b + c * d
In indirect triples, a separate list of pointers refers to the triples.
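The three forms can be compared side by side for a = b + c * d. The sketch below uses the conventional textbook layouts; the class names `Quad` and `Triple`, the temporaries t1 and t2, and the zero-based numbering are assumptions, since the original tables are not reproduced in the text.

```python
from typing import NamedTuple, Optional, Union


class Quad(NamedTuple):
    """Quadruple: operator, two arguments, and an explicit result name."""
    op: str
    arg1: str
    arg2: Optional[str]
    result: str


class Triple(NamedTuple):
    """Triple: no result field; an int argument refers to another triple."""
    op: str
    arg1: Union[str, int]
    arg2: Union[str, int]


# Quadruples: every intermediate result gets a named temporary.
quadruples = [
    Quad("*", "c", "d", "t1"),
    Quad("+", "b", "t1", "t2"),
    Quad("=", "t2", None, "a"),
]

# Triples: a result is referred to by the position of the triple computing it.
triples = [
    Triple("*", "c", "d"),   # (0)
    Triple("+", "b", 0),     # (1) second operand is the result of (0)
    Triple("=", "a", 1),     # (2) assign the result of (1) to a
]

# Indirect triples: a separate statement list holds pointers to the triples,
# so statements can be reordered without renumbering the triples themselves.
statement_list = [0, 1, 2]
execution_order = [triples[i] for i in statement_list]

for quad in quadruples:
    print(quad)
```

The trade-off is that quadruples spend space on temporary names, while triples save that space but make reordering harder, which the pointer list of indirect triples addresses.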
S-Attributed and Inherited Attributes
in Syntax-Directed Definitions (SDD)
• Syntax-Directed Translation (SDT): a formal
specification that associates semantic rules with
grammar productions. Semantic rules are the actual
computations attached to productions in an SDD.
• In compiler design, attributes are used to associate
semantic information (like type, value, scope) with
grammar symbols. These are classified into two main
types:
1.S-Attributed (Synthesized Attributes)
2.I-Attributed (Inherited Attributes)
S-Attributed (Synthesized
Attributes)
• Definition:
An attribute whose value is computed only from its children
nodes in the parse tree.
• Direction of Flow:
Information flows bottom-up (from leaves to root).
• Used in:
• Typically used with LR parsers (bottom-up parsing).
• Example:
Computing the value of an arithmetic expression.

• The parse tree containing the values of attributes at each node
for a given input string is called an annotated or decorated parse
tree.
• Consider the following grammar

• S --> E
• E --> E1 + T
• E --> T
• T --> T1 * F
• T --> F
• F --> digit
Annotated parse tree for 4*5+6

S (val=26)
  E (val=26)
    E (val=20)
      T (val=20)
        T (val=4)
          F (val=4)
            digit (4)
        *
        F (val=5)
          digit (5)
    +
    T (val=6)
      F (val=6)
        digit (6)
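The bottom-up flow of the synthesized attribute val can be sketched in code for the grammar above. The semantic rules in the comments (E.val = E1.val + T.val and so on) are the usual ones for this grammar and are assumed here, as the slides do not list them. The function name `evaluate` is hypothetical, and loops replace the left-recursive productions so that a simple hand-written parser can be used instead of an LR parser.

```python
def evaluate(text: str) -> int:
    """Compute E.val for a string of single digits, '+' and '*'."""
    tokens = list(text.replace(" ", ""))
    pos = 0

    def parse_f() -> int:
        # F -> digit        { F.val = digit.lexval }
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise ValueError("expected a digit")
        value = int(tokens[pos])
        pos += 1
        return value

    def parse_t() -> int:
        # T -> T1 * F       { T.val = T1.val * F.val }
        # T -> F            { T.val = F.val }
        nonlocal pos
        val = parse_f()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            val = val * parse_f()
        return val

    def parse_e() -> int:
        # E -> E1 + T       { E.val = E1.val + T.val }
        # E -> T            { E.val = T.val }
        nonlocal pos
        val = parse_t()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            val = val + parse_t()
        return val

    result = parse_e()                       # S -> E  { S.val = E.val }
    if pos != len(tokens):
        raise ValueError("unexpected input after expression")
    return result


print(evaluate("4*5+6"))  # 26
```

Every function returns a value computed only from the values returned by the calls it makes, which is the defining property of a synthesized attribute.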
Inherited Attributes
• Definition:
An attribute whose value is computed from parent or
sibling nodes in the parse tree.
• Direction of Flow:
Information flows top-down or sideways (from parent
to child or sibling to sibling).
• Used in:
• Typically used with LL parsers (top-down parsing).
• Example:
Type checking, symbol table propagation.
Consider the following grammar

S --> T L
T --> int
T --> float
T --> double
L --> L1, id
L --> id
Input string : int a, c
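For this grammar the type found by T has to travel sideways to L and then down to each id. The sketch below assumes the usual rules (L.in = T.type, L1.in = L.in, and an addtype action on each id), which the slides do not spell out; the names `declare` and `visit_l` are hypothetical.

```python
def declare(declaration: str) -> dict:
    """Return a symbol table {identifier: type} for input such as 'int a, c'."""
    type_name, _, id_list = declaration.strip().partition(" ")
    if type_name not in ("int", "float", "double"):      # T -> int | float | double
        raise ValueError(f"unknown type: {type_name!r}")

    names = [name.strip() for name in id_list.split(",")]
    if any(not name for name in names):
        raise ValueError("missing identifier in declaration")

    symbol_table = {}

    def visit_l(ids, inherited_type):
        # L -> L1 , id   { L1.in = L.in; addtype(id, L.in) }
        # L -> id        { addtype(id, L.in) }
        if len(ids) > 1:
            visit_l(ids[:-1], inherited_type)    # pass the type down to L1
        symbol_table[ids[-1]] = inherited_type   # record the type for this id

    visit_l(names, type_name)                    # S -> T L  { L.in = T.type }
    return symbol_table


print(declare("int a, c"))  # {'a': 'int', 'c': 'int'}
```

The type is passed into `visit_l` as a parameter rather than returned from it, which is how inherited attributes differ from synthesized ones in a hand-written translator.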
Intermediate Language or Three
address code
• Three-address code (TAC) is an intermediate
representation where each instruction has at most three
operands (typically two sources and one destination).
The semantic rules for generating TAC for various
programming language constructs can be specified
using SDD.
SDT for Assignment
Productions        Semantic Rules
S → id := E        {gen(id.place = E.place)}

E → E₁ + E₂        {E.place = newtemp();
                    gen(E.place = E₁.place + E₂.place)}

E → E₁ * E₂        {E.place = newtemp();
                    gen(E.place = E₁.place * E₂.place)}

E → (E₁)           {E.place = E₁.place}

E → id             {E.place = id.place}
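The rules in this table can be exercised with a small translator. This is a sketch under assumptions: the grammar as written is ambiguous, so the code assumes that * binds tighter than + and that both are left-associative. `newtemp` and `gen` follow the names in the table, while `translate` and the tokenizer are hypothetical.

```python
import re


def translate(statement: str) -> list:
    """Translate `id := E` into three-address code using E.place."""
    tokens = re.findall(r"[A-Za-z_]\w*|:=|[+*()]", statement)
    pos = 0
    code = []
    temp_count = 0

    def newtemp() -> str:
        nonlocal temp_count
        temp_count += 1
        return f"t{temp_count}"

    def gen(text: str) -> None:
        code.append(text)

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance() -> str:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor() -> str:
        if peek() == "(":            # E -> ( E1 )   { E.place = E1.place }
            advance()
            place = expr()
            advance()                # skip the closing ")"
            return place
        return advance()             # E -> id       { E.place = id.place }

    def term() -> str:
        place = factor()
        while peek() == "*":         # E -> E1 * E2
            advance()
            right = factor()
            temp = newtemp()
            gen(f"{temp} = {place} * {right}")
            place = temp
        return place

    def expr() -> str:
        place = term()
        while peek() == "+":         # E -> E1 + E2
            advance()
            right = term()
            temp = newtemp()
            gen(f"{temp} = {place} + {right}")
            place = temp
        return place

    target = advance()               # S -> id := E
    advance()                        # skip ":="
    place = expr()
    gen(f"{target} = {place}")
    return code


print(translate("x := (a + b) * c"))
```

Parenthesized and identifier expressions generate no code of their own; they only pass a place upward, exactly as the last two rows of the table specify.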
Semantic Rules for Boolean
Expression:
Example: Code for a < b or c < d and e < f
SDT for Switch Case statement:
• Syntax for switch case statement
switch E
begin
case V1: S1
case V2: S2
...
case Vn-1: Sn-1
default: Sn
end
Procedure call
• Consider the grammar for a simple procedure call
statement,
(1) S→ call id(Elist)
(2) Elist → E
(3) Elist → Elist,E
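A common translation for this grammar collects the place of each argument while Elist is parsed and then emits one param statement per argument followed by the call. The slides do not show their semantic rules, so the sketch below is an assumption; `translate_call` is a hypothetical name, and the output uses the param and call f, n forms from the instruction list earlier in the unit.

```python
def translate_call(name: str, arg_places: list) -> list:
    """Emit three-address code for `call name(args)`.

    `arg_places` holds the E.place of each argument, already computed;
    evaluating the argument expressions is outside this sketch.
    """
    queue = []
    for place in arg_places:
        # Elist -> E  and  Elist -> Elist , E : append E.place to the queue
        queue.append(place)

    # S -> call id ( Elist ) : one param per queued place, then the call
    code = [f"param {place}" for place in queue]
    code.append(f"call {name}, {len(queue)}")
    return code


for line in translate_call("f", ["t1", "b"]):
    print(line)
```

Queuing the places first keeps all param statements together immediately before the call, after the code for every argument expression.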
Backpatching
• Backpatching is a technique used in compiler design to resolve forward
references in control flow statements during code generation. It allows
the compiler to generate code for statements like if, while, and goto
without knowing the target addresses of jumps at the time of initial
code generation.
How Backpatching Works
1.Forward References Problem: When compiling control structures,
the target of a jump may not be known when the jump instruction is
first generated.
2.Two-Pass Solution:
  1.First pass: Generate code with placeholders for unknown jump targets
  2.Second pass: Fill in the actual addresses (backpatching)
  - Maintain a list of instructions that need to be patched when the target becomes known.
Example
• if (x < y) goto L1;
• ...
• L1: ...

Steps:
• When generating the goto L1, L1's location isn't known yet.

• The compiler emits a jump instruction with a placeholder address.

• The placeholder is added to the patch list for L1.

• When L1's location is encountered, all instructions in its patch list are updated
with the correct address.
Backpatching with Boolean Expressions:
Grammar and Semantic Rules
• Grammar for Boolean Expressions
E → E₁ or E₂ | E₁ and E₂ | not E | (E) | id relop id | true | false

• Key attributes in Backpatching Semantic Rules:

E.truelist: List of instructions to be patched when E evaluates to true
E.falselist: List of instructions to be patched when E evaluates to false
M.quad: Instruction number where E₂ begins (used in or and and productions)
nextquad: Current instruction number
makelist(i): Creates a new list containing instruction i
merge(p₁,p₂): Merges two lists of instructions
backpatch(p,i): Patches all instructions in list p to jump to i
• Productions          Semantic Rules
-----------------------------------------------------------------------
• E → E₁ or M E₂       { backpatch(E₁.falselist, M.quad);
                         E.truelist = merge(E₁.truelist, E₂.truelist);
                         E.falselist = E₂.falselist; }

• E → E₁ and M E₂      { backpatch(E₁.truelist, M.quad);
                         E.truelist = E₂.truelist;
                         E.falselist = merge(E₁.falselist, E₂.falselist); }

• E → not E₁           { E.truelist = E₁.falselist;
                         E.falselist = E₁.truelist; }

• E → (E₁)             { E.truelist = E₁.truelist;
                         E.falselist = E₁.falselist; }

• E → id₁ relop id₂    { E.truelist = makelist(nextquad);
                         E.falselist = makelist(nextquad+1);
                         emit('if ' id₁.place ' ' relop.op ' ' id₂.place ' goto _');
                         emit('goto _'); }

• E → true             { E.truelist = makelist(nextquad);
                         emit('goto _'); }

• E → false            { E.falselist = makelist(nextquad);
                         emit('goto _'); }

• M → ε                { M.quad = nextquad; }



Example: (a < b) or (c > d)
• For (a < b):
100: if a < b goto _
101: goto _
truelist = [100], falselist = [101]
• For (c > d):
102: if c > d goto _
103: goto _
truelist = [102], falselist = [103]
• For OR operation:
Backpatch falselist of first expression (101) with 102
Merge truelists: [100, 102]
New falselist: [103]
• Final code after backpatching (T and F stand for the true and false exits):
100: if a < b goto T
101: goto 102
102: if c > d goto T
103: goto F
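The steps of this example can be replayed in code. The sketch below is an assumption about how the helpers might be implemented: `makelist`, `merge`, `backpatch`, `emit` and `nextquad` follow the names in the semantic rules, while the `CodeBuffer` class, the `relop` helper and the starting address 100 are hypothetical choices made to match the example.

```python
class CodeBuffer:
    """Holds emitted instructions; each jump target starts out unfilled."""

    def __init__(self, start: int = 100):
        self.start = start
        self.quads = []              # each entry is [text, jump_target]

    @property
    def nextquad(self) -> int:
        return self.start + len(self.quads)

    def emit(self, text: str) -> None:
        self.quads.append([text, None])      # None marks a placeholder "_"

    def backpatch(self, patch_list, target) -> None:
        for index in patch_list:
            self.quads[index - self.start][1] = target

    def listing(self) -> list:
        return [
            f"{self.start + i}: {text} {'_' if target is None else target}"
            for i, (text, target) in enumerate(self.quads)
        ]


def makelist(index: int) -> list:
    return [index]


def merge(p1: list, p2: list) -> list:
    return p1 + p2


def relop(buf: CodeBuffer, left: str, op: str, right: str):
    """E -> id1 relop id2: emit two jumps, return (truelist, falselist)."""
    truelist = makelist(buf.nextquad)
    falselist = makelist(buf.nextquad + 1)
    buf.emit(f"if {left} {op} {right} goto")
    buf.emit("goto")
    return truelist, falselist


buf = CodeBuffer()
e1_true, e1_false = relop(buf, "a", "<", "b")
m_quad = buf.nextquad                         # M -> ε  { M.quad = nextquad }
e2_true, e2_false = relop(buf, "c", ">", "d")

# E -> E1 or M E2
buf.backpatch(e1_false, m_quad)
truelist = merge(e1_true, e2_true)
falselist = e2_false

# The enclosing statement would supply real targets; T and F stand in here.
buf.backpatch(truelist, "T")
buf.backpatch(falselist, "F")
print("\n".join(buf.listing()))
```

Keeping the jump target in its own field, instead of editing the instruction text, avoids accidentally rewriting an identifier that happens to contain an underscore.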
• 1. Create the annotated parse tree for 5+6*7 by evaluating the S-attribute
values, considering the following grammar

• 2. Create three-address code using backpatching for (c < d) and (e > f)
Code Generation phase
A code generator is a component of a compiler that translates the intermediate
representation (IR) of a program into executable machine code or another target
language (e.g., assembly). Its primary function is to produce efficient and correct
output code while adhering to the semantics of the source program.
Key Functions of a Code Generator:
1. Instruction Selection
Chooses appropriate machine instructions for each IR operation.
Example: t1 = b + c
Mov b, R1
Add c, R1
Mov R1, t1
2. Register Allocation
Assigns variables to CPU registers to minimize memory access.
3. Instruction Scheduling
Orders instructions to maximize CPU pipeline efficiency and avoid stalls.
Inputs to the Code Generator Phase
The code generator in a compiler can receive different
intermediate representations (IR) as input, each with its
own advantages. The three most common forms are:
1.Three-Address Code (TAC)
2.Postfix Notation (Reverse Polish Notation - RPN)
3.Syntax Tree (Abstract Syntax Tree - AST)
Target Programs in Compiler Code Generation
The code generator in a compiler can produce different
forms of target programs, depending on the design of
the compiler and the needs of the system. The three
main types are:
1.Assembly Language
2.Absolute Machine Language
3.Relocatable Machine Language
Instruction Selection
Chooses appropriate machine instructions for each
IR(Intermediate representation) operation.
Example: t1 = b + c
Mov b, R1
Add c, R1
Mov R1, t1
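The mapping in this example can be sketched as a simple template expansion: load the first operand, apply the operation, store the result. `Mov`, `Add` and `Mul` appear in the slides; the `Sub` and `Div` mnemonics, the function name `select_instructions`, and the fixed register R1 are assumptions for illustration, not a real instruction set.

```python
import re

# Hypothetical opcode table for the two-operand target machine in the slides.
OPCODES = {"+": "Add", "-": "Sub", "*": "Mul", "/": "Div"}

# Matches a three-address statement of the form  x = y op z
TAC_PATTERN = re.compile(r"\s*(\w+)\s*=\s*(\w+)\s*([+\-*/])\s*(\w+)\s*")


def select_instructions(tac: str, register: str = "R1") -> list:
    """Expand one statement `x = y op z` into load, operate, store."""
    match = TAC_PATTERN.fullmatch(tac)
    if match is None:
        raise ValueError(f"not of the form x = y op z: {tac!r}")
    result, left, op, right = match.groups()
    return [
        f"Mov {left}, {register}",              # load y into the register
        f"{OPCODES[op]} {right}, {register}",   # register = register op z
        f"Mov {register}, {result}",            # store the register into x
    ]


for line in select_instructions("t1 = b + c"):
    print(line)
```

Expanding every statement independently like this is correct but wasteful, since a value just stored is often reloaded by the next statement; reducing that traffic is the job of register allocation.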
Register Allocation
• Register Allocation is a crucial step in the code
generation phase of a compiler. Its main task is to
assign a limited number of CPU registers to a potentially
large number of variables or temporary values
(intermediate results like t1, t2, etc.) generated during
intermediate code generation.
Example: t1 = b + c
Mov b, R1
Add c, R1
Mov R1, t1
• In the code generation phase of a compiler, evaluation order refers to the
sequence in which sub-expressions of a complex expression are evaluated.
The order affects the efficiency, correctness, and register usage of the
generated machine code.
• Example
Step 1: t1 = b * c
Step 2: t2 = a + t1

Correct evaluation order:
Mov b, R1 ; R1 = b
Mul c, R1 ; R1 = b * c → t1
Add a, R1 ; R1 = a + (b * c) → t2
Mov R1, result ; store final result
Concept            Explanation
Evaluation Order   Sequence in which expressions are evaluated
Why It Matters     Ensures correct results and efficient code
Influenced by      Operator precedence, associativity
Run time environment
Storage Organization
Static Allocation:
Used to allocate memory for the whole program and its variables.
Memory is fixed at compile time and lives for the entire program run.
Fast but inflexible (globals, static variables).

Stack Allocation:
Used to allocate memory for function calls.
Automatic LIFO memory for function calls (locals, parameters).
Very fast but limited in size (stack overflow is possible).

Heap Allocation:
Used for dynamic memory allocation statements.
Dynamic run-time memory (malloc, new; released manually with free).
Flexible but slower (fragmentation and leaks are possible).
Activation Record
