cd_3rd unit _15

The document discusses intermediate code generation in compiler design, focusing on the role of intermediate representations and static type checking. It covers various methods for constructing directed acyclic graphs (DAGs), three-address code, and data structures like quadruples and triples for representing code. Additionally, it addresses type expressions, type checking, and flow-of-control statements in the context of compiler design.

Uploaded by

vainatha32
Copyright
© © All Rights Reserved

COMPILER DESIGN

Topic: Intermediate code generation


Introduction

• Intermediate code is the interface between front end and back end in a
compiler
• Ideally the details of source language are confined to the front end and
the details of target machines to the back end (a m*n model)
• In this chapter we study intermediate representations, static type
checking and intermediate code generation

[Figure: Parser -> Static Checker -> Intermediate Code Generator -> Code Generator]
Front end: parser, static checker, intermediate code generator. Back end: code generator.


Variants of syntax trees

1)DAG
A directed acyclic graph (DAG) for an expression identifies the common subexpressions (subexpressions that occur more than once) of the expression.
• It is sometimes beneficial to create a DAG instead of a tree for expressions.
• This way we can easily show the common subexpressions and then use that knowledge during code generation.
• Example: a+a*(b-c)+(b-c)*d
[Figure: DAG for the expression a+a*(b-c)+(b-c)*d]

• The leaf for ‘a’ has two parents, because ‘a’ appears twice in the
expression. More interestingly, the two occurrences of the common
subexpression b-c are represented by one node, the node labeled -.
That node has two parents, representing its two uses in the
subexpressions a*(b-c) and (b-c)*d. Even though b and c appear
twice in the complete expression, their nodes each have one parent,
since both uses are in the common subexpression b-c
SDD for creating DAG’s

Production Semantic Rules


1) E -> E1+T E.node= new Node(‘+’, E1.node,T.node)
2) E -> E1-T E.node= new Node(‘-’, E1.node,T.node)
3) E -> T E.node = T.node
4) T -> (E) T.node = E.node
5) T -> id T.node = new Leaf(id, id.entry)
6) T -> num T.node = new Leaf(num, num.val)
Example:
1) p1=Leaf(id, entry-a)
2) p2=Leaf(id, entry-a)=p1
3) p3=Leaf(id, entry-b)
4) p4=Leaf(id, entry-c)
5) p5=Node(‘-’,p3,p4)
6) p6=Node(‘*’,p1,p5)
7) p7=Node(‘+’,p1,p6)
8) p8=Leaf(id,entry-b)=p3
9) p9=Leaf(id,entry-c)=p4
10) p10=Node(‘-’,p3,p4)=p5
11) p11=Leaf(id,entry-d)
12) p12=Node(‘*’,p5,p11)
13) p13=Node(‘+’,p7,p12)
2)Value-number method for
constructing DAG’s

[Figure: (a) DAG for i = i + 10; (b) the same nodes stored in an array]

Value number   Label   Fields
1              id      to entry for i
2              num     10
3              +       1, 2
4              =       1, 3


• The node labeled + has value number 3, and its left and right children have value numbers 1 and 2, respectively. In practice, we could use pointers to records or references to objects instead of integer indexes, but we shall still refer to the reference to a node as its "value number."
• Suppose that nodes are stored in an array, as in Fig. 6.6, and each node is referred to by its value number. Let the signature of an interior node be the triple (op, l, r), where op is the label, l its left child's value number, and r its right child's value number. A unary operator may be assumed to have r = 0.
2)Value-number method for
constructing DAG’s
• Algorithm 6.3: The value-number method for constructing the nodes of a DAG.
• INPUT: Label op, node l, and node r.
• OUTPUT: The value number of a node in the array with signature (op, l, r).
• METHOD: Search the array for a node M with label op, left child l, and right child r. If there is such a node, return the value number of M. If not, create in the array a new node N with label op, left child l, and right child r, and return its value number.
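The method above can be sketched in a few lines of Python. This is only an illustration: the class name DagBuilder and the dictionary used to look up signatures (in place of a linear search of the array) are assumptions of the sketch, not part of Algorithm 6.3.

```python
class DagBuilder:
    """Stores DAG nodes in an array; a node's index is its value number."""

    def __init__(self):
        self.nodes = [None]   # slot 0 is unused so that r = 0 can mean "no right child"
        self.index = {}       # signature (op, l, r) -> value number (assumed lookup table)

    def value_number(self, op, l, r=0):
        signature = (op, l, r)
        if signature in self.index:          # a node M with this signature exists
            return self.index[signature]
        self.nodes.append(signature)         # otherwise create a new node N
        self.index[signature] = len(self.nodes) - 1
        return self.index[signature]


# Build the DAG for i = i + 10.
dag = DagBuilder()
i = dag.value_number("id", "i")
ten = dag.value_number("num", 10)
plus = dag.value_number("+", i, ten)
assign = dag.value_number("=", i, plus)

# Asking for i + 10 again returns the existing node instead of a new one.
again = dag.value_number("+", dag.value_number("id", "i"), dag.value_number("num", 10))
```

Here the four nodes receive value numbers 1 to 4, and the repeated request for i + 10 yields 3 again.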
3)Three address code
In three-address code, there is at most one operator on the right side of an instruction; that is, no built-up arithmetic expressions are permitted. Thus a source-language expression like x+y*z might be translated into the sequence of three-address instructions
t1 = y * z
t2 = x + t1
where t1 and t2 are compiler-generated temporary names.
• Example: three-address code corresponding to the DAG for a+a*(b-c)+(b-c)*d

t1 = b - c
t2 = a * t1
t3 = a + t2
t4 = t1 * d
t5 = t3 + t4

[Figure: (a) DAG, (b) three-address code]
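As a rough sketch, the instructions in this example can be produced by walking the expression and remembering which subexpressions already have a temporary. The tuple encoding of expressions and the function name gen_tac are assumptions made for the illustration.

```python
def gen_tac(expr):
    """Emit three-address code for a nested (op, left, right) tuple.

    Identical subexpressions share one temporary, mirroring the DAG.
    """
    code = []
    temps = {}                       # subexpression -> temporary holding its value

    def walk(e):
        if isinstance(e, str):       # a leaf: the name is its own address
            return e
        if e in temps:               # common subexpression: reuse the temporary
            return temps[e]
        op, left, right = e
        l, r = walk(left), walk(right)
        temp = f"t{len(temps) + 1}"
        temps[e] = temp
        code.append(f"{temp} = {l} {op} {r}")
        return temp

    walk(expr)
    return code


b_minus_c = ("-", "b", "c")
expression = ("+", ("+", "a", ("*", "a", b_minus_c)), ("*", b_minus_c, "d"))
tac = gen_tac(expression)
```

For a+a*(b-c)+(b-c)*d this yields the same five instructions as the example, with b - c computed only once.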


Data structures for three address
codes
• Quadruples
• Has four fields: op, arg1, arg2 and result
• Triples
• Temporaries are not used and instead references to instructions are made
• Indirect triples
• In addition to triples we use a list of pointers to triples
1)Quadruple

• A quadruple (or just "quad") has four fields, which we call op, arg1, arg2, and result. The op field contains an internal code for the operator. For instance, the three-address instruction x = y + z is represented by placing + in op, y in arg1, z in arg2, and x in result.
• The following are some exceptions to this rule:
• 1. Instructions with unary operators like x = minus y or x = y do not use arg2. Note that for a copy statement like x = y, op is =, while for most other operations, the assignment operator is implied.
• 2. Operators like param use neither arg2 nor result.
• 3. Conditional and unconditional jumps put the target label in result.
2)TRIPLE

• A triple has only three fields, which we call op, arg1, and arg2. Note that the result field of a quadruple is used primarily for temporary names. Using triples, we refer to the result of an operation x op y by its position, rather than by an explicit temporary name. Thus, instead of the temporary t1 used in the quadruples, a triple representation would refer to position (0). Parenthesized numbers represent pointers into the triple structure itself.
• 3)Indirect triples consist of a listing of pointers to triples, rather than a listing of triples themselves. For example, let us use an array instruction to list pointers to triples in the desired order.
Example
• a = b * minus c + b * minus c

Three address code
t1 = minus c
t2 = b * t1
t3 = minus c
t4 = b * t3
t5 = t2 + t4
a = t5

a)Quadruples
     op     arg1   arg2   result
0    minus  c             t1
1    *      b      t1     t2
2    minus  c             t3
3    *      b      t3     t4
4    +      t2     t4     t5
5    =      t5            a

b)Triples
     op     arg1   arg2
0    minus  c
1    *      b      (0)
2    minus  c
3    *      b      (2)
4    +      (1)    (3)
5    =      a      (4)

c)Indirect Triples
instruction        op     arg1   arg2
35   (0)      0    minus  c
36   (1)      1    *      b      (0)
37   (2)      2    minus  c
38   (3)      3    *      b      (2)
39   (4)      4    +      (1)    (3)
40   (5)      5    =      a      (4)
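The three representations can be compared with a small sketch. Quadruples are written as Python tuples, and a hypothetical helper to_triples replaces each temporary name by the position of the triple that computes it; integers in the result stand for the parenthesized positions in the tables.

```python
# Quadruples (op, arg1, arg2, result) for a = b * minus c + b * minus c
quads = [
    ("minus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("minus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    ("=", "t5", None, "a"),
]


def to_triples(quads):
    """Convert quadruples to triples (op, arg1, arg2) without temporaries."""
    position = {}                      # temporary name -> index of its triple
    triples = []

    def ref(arg):
        # A temporary becomes an integer position; other names stay as they are.
        return position.get(arg, arg)

    for op, arg1, arg2, result in quads:
        if op == "=":
            # Copy: the target goes in arg1 and the value in arg2.
            triples.append((op, result, ref(arg1)))
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            position[result] = len(triples) - 1
    return triples


triples = to_triples(quads)

# Indirect triples: a separate list of pointers gives the execution order,
# so instructions can be reordered without renumbering the triples.
instruction = list(range(len(triples)))
```

Reordering code then only means permuting instruction, which is the reason the indirection is useful to an optimizer.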
Types and Declarations

1)Type Expressions
Example: int[2][3]
array(2,array(3,integer))

• A basic type is a type expression


• A type name is a type expression
• A type expression can be formed by applying the array type constructor to a
number and a type expression.
• A record is a data structure with named fields
• A type expression can be formed by using the type constructor -> for function types
• If s and t are type expressions, then their Cartesian product s*t is a type
expression
• Type expressions may contain variables whose values are type expressions
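2)Type Equivalence

Two type expressions are structurally equivalent if and only if one of the following conditions holds:
• They are the same basic type.
• They are formed by applying the same constructor to structurally equivalent types.
• One is a type name that denotes the other.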
3)Declarations

Grammar for declarations:
D -> T id ; D | ε
T -> B C | record '{' D '}'
B -> int | float
C -> ε | [ num ] C

Nonterminal D generates a sequence of declarations.
Nonterminal T generates basic, array, or record types.
Nonterminal B generates one of the basic types int and float.
Nonterminal C, for "component," generates strings of zero or more integers, each integer surrounded by brackets. An array type consists of a basic type specified by B, followed by array components specified by nonterminal C. A record type (the second production for T) is a sequence of declarations for the fields of the record, all surrounded by curly braces.
4)Storage Layout for Local
Names
• Computing types and their widths
Storage Layout for Local Names

• The body of the T-production consists of nonterminal B, an action, and nonterminal C. The action between B and C sets t to B.type and w to B.width. If B -> int then B.type is set to integer and B.width is set to 4, the width of an integer. Similarly, if B -> float then B.type is float and B.width is 8, the width of a float. The productions for C determine whether T generates a basic type or an array type. If C -> ε, then t becomes C.type and w becomes C.width. Otherwise, C specifies an array component. The action for C -> [ num ] C1 forms C.type by applying the type constructor array to the operands num.value and C1.type.
Storage Layout for Local
Names
Syntax-directed translation of array types
5)Sequences of Declarations
The translation scheme in the figure deals with a sequence of declarations of the form T id, where T generates a type as in the previous figure. Before the first declaration is considered, offset is set to 0.

The semantic action within the production D -> T id ; D1 creates a symbol-table entry by executing top.put(id.lexeme, T.type, offset). Here top denotes the current symbol table.
• The method top.put creates a symbol-table entry for id.lexeme, with type T.type and relative address offset in its data area.
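A minimal sketch of both steps, computing a type and its width and then assigning relative addresses, might look as follows. The widths (4 for int, 8 for float) are the ones stated above. The string format of the declarations, the function names, and the step that advances offset by the width of each declared type are assumptions of the sketch.

```python
import re

# Basic types and their widths, as stated in the text.
BASIC = {"int": ("integer", 4), "float": ("float", 8)}


def type_and_width(type_text):
    """Return (type expression, width) for text such as 'int[2][3]'."""
    base = re.match(r"[a-z]+", type_text).group()
    dims = [int(n) for n in re.findall(r"\[(\d+)\]", type_text)]
    t, w = BASIC[base]
    # C -> [ num ] C1 builds the innermost array first, so go right to left.
    for n in reversed(dims):
        t, w = ("array", n, t), n * w
    return t, w


def layout(declarations):
    """Assign relative addresses to a sequence of (type text, name) pairs."""
    table, offset = {}, 0              # offset is set to 0 before the first declaration
    for type_text, name in declarations:
        t, w = type_and_width(type_text)
        table[name] = (t, offset)      # plays the role of top.put(id.lexeme, T.type, offset)
        offset += w                    # assumed: the next name starts after this one
    return table, offset


array_type, array_width = type_and_width("int[2][3]")
table, total = layout([("int", "x"), ("float", "y"), ("int[2][3]", "a")])
```

With these widths, int[2][3] has type array(2, array(3, integer)) and width 24, and the three names land at offsets 0, 4, and 12.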

• Actions at the end:



6)Fields in Records and
Classes


TYPE CHECKING
• Rules for Type Checking
• Type checking can take on two forms: synthesis and inference.
• Type synthesis builds up the type of an expression from the types of its subexpressions. It requires names to be declared before they are used. The type of E1 + E2 is defined in terms of the types of E1 and E2. A typical rule for type synthesis has the form
• if f has type s -> t and x has type s,
• then expression f(x) has type t
• Here, f and x denote expressions, and s -> t denotes a function from s to t. This rule for functions with one argument carries over to functions with several arguments. The rule can be adapted for E1 + E2 by viewing it as a function application add(E1, E2).
• Type inference determines the type of a language construct from the way it is used.
TYPE CHECKING
Conversions between
primitive types in Java
TYPE CHECKING
TYPE CHECKING

The semantic action for checking E -> E1 + E2 uses two functions:
• 1. max(t1, t2) takes two types t1 and t2 and returns the maximum (or least upper bound) of the two types in the widening hierarchy. It declares an error if either t1 or t2 is not in the hierarchy, e.g., if either type is an array or a pointer type.
• 2. widen(a, t, w) generates type conversions if needed to widen an address a of type t into a value of type w. It returns a itself if t and w are the same type. Otherwise, it generates an instruction to do the conversion and place the result in a temporary, which is returned as the result. Pseudocode for widen, assuming that the only types are integer and float, appears in Fig. 6.26.
Introducing type conversions into
expression evaluation
• E -> E1 + E2 { E.type = max(E1.type, E2.type);
                 a1 = widen(E1.addr, E1.type, E.type);
                 a2 = widen(E2.addr, E2.type, E.type);
                 E.addr = new Temp();
                 gen(E.addr '=' a1 '+' a2); }

FIG: Introducing type conversions into expression evaluation
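The action above can be tried out with a small Python sketch that assumes, as the text does, that integer and float are the only types in the hierarchy. The Emitter class, the name max_type, and the (address, type) pairs are illustrative choices, not taken from the slides.

```python
HIERARCHY = ["integer", "float"]       # widening order, narrowest first


class Emitter:
    """Collects generated instructions and hands out temporaries."""

    def __init__(self):
        self.code = []
        self.count = 0

    def new_temp(self):
        self.count += 1
        return f"t{self.count}"

    def gen(self, instruction):
        self.code.append(instruction)


def max_type(t1, t2):
    """Least upper bound of two types in the widening hierarchy."""
    if t1 not in HIERARCHY or t2 not in HIERARCHY:
        raise TypeError(f"cannot combine {t1} and {t2}")
    return max(t1, t2, key=HIERARCHY.index)


def widen(em, a, t, w):
    """Return an address holding a converted from type t to type w."""
    if t == w:
        return a
    if t == "integer" and w == "float":
        temp = em.new_temp()
        em.gen(f"{temp} = (float) {a}")
        return temp
    raise TypeError(f"cannot widen {t} to {w}")


def translate_add(em, e1, e2):
    """Action for E -> E1 + E2; each operand is an (addr, type) pair."""
    e_type = max_type(e1[1], e2[1])
    a1 = widen(em, e1[0], e1[1], e_type)
    a2 = widen(em, e2[0], e2[1], e_type)
    addr = em.new_temp()
    em.gen(f"{addr} = {a1} + {a2}")
    return addr, e_type


em = Emitter()
result = translate_add(em, ("i", "integer"), ("x", "float"))
```

Adding an integer i to a float x first converts i into a temporary and then performs the addition on floats.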


Introducing type conversions into
expression evaluation
Type Inference and
polymorphic functions
fun length(x) =
if null(x) then 0 else length(tl(x)) + 1

Abstract syntax tree for the function definition

This is a polymorphic function in the ML language
Inferring a type for the function
length
Algorithm for Unification
Unification algorithm
Unification determines whether two type expressions can be made identical by substituting type expressions for the variables in them.
boolean unify (Node m, Node n) {
    s = find(m); t = find(n);
    if ( s = t ) return true;
    else if ( nodes s and t represent the same basic type ) return true;
    else if ( s is an op-node with children s1 and s2 and
              t is an op-node with children t1 and t2 ) {
        union(s, t);
        return unify(s1, t1) and unify(s2, t2);
    }
    else if ( s or t represents a variable ) {
        union(s, t);
        return true;
    }
    else return false;
}
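The pseudocode can be turned into a runnable Python sketch under a few assumptions: type nodes are objects of a hypothetical class TypeNode, find follows parent pointers to a representative, union keeps a non-variable node as representative when there is one, and two op-nodes are unified only when they carry the same operator.

```python
class TypeNode:
    def __init__(self, kind, label, children=()):
        self.kind = kind              # "basic", "var", or "op"
        self.label = label            # e.g. "integer", "alpha", "->"
        self.children = list(children)
        self.parent = self            # pointer used for the equivalence classes


def find(node):
    """Return the representative of the class containing node."""
    while node.parent is not node:
        node = node.parent
    return node


def union(s, t):
    """Merge two classes, preferring a non-variable representative."""
    if s.kind == "var":
        s.parent = t
    else:
        t.parent = s


def unify(m, n):
    s, t = find(m), find(n)
    if s is t:
        return True
    if s.kind == "basic" and t.kind == "basic" and s.label == t.label:
        return True
    if (s.kind == "op" and t.kind == "op" and s.label == t.label
            and len(s.children) == len(t.children)):
        union(s, t)
        return all(unify(a, b) for a, b in zip(s.children, t.children))
    if s.kind == "var" or t.kind == "var":
        union(s, t)
        return True
    return False


# Unify  alpha -> integer  with  boolean -> beta.
alpha, beta = TypeNode("var", "alpha"), TypeNode("var", "beta")
left = TypeNode("op", "->", [alpha, TypeNode("basic", "integer")])
right = TypeNode("op", "->", [TypeNode("basic", "boolean"), beta])
unified = unify(left, right)
```

After the call, alpha stands for boolean and beta for integer, which can be read off with find.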
Control Flow

boolean expressions are often used to:


• Alter the flow of control.
• Compute logical values.
Short-Circuit Code


Flow-of-Control Statements

We now consider the translation of boolean expressions into three-address code in the context of statements such as those generated by the following grammar:

S -> if ( B ) S1
S -> if ( B ) S1 else S2
S -> while ( B ) S1

In these productions, nonterminal B represents a boolean expression and nonterminal S represents a statement.

The translation of if (B) S1 consists of B.code followed by S1.code, as illustrated in Fig. 6.35(a). Within B.code are jumps based on the value of B. If B is true, control flows to the first instruction of S1.code, and if B is false, control flows to the instruction immediately following S1.code.
Flow-of-Control Statements

• The labels for the jumps in B.code and S.code are managed using
inherited attributes. With a boolean expression B, we associate two
labels: B.true, the label to which control flows if B is true, and B.false,
the label to which control flows if B is false. With a statement S, we
associate an inherited attribute S.next denoting a label for the
instruction immediately after the code for S. In some cases, the
instruction immediately following S.code is a jump to some label L. A
jump to a jump to L from within S.code is avoided using S.next.
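The role of the inherited labels can be illustrated with a short Python sketch. Statements are nested tuples, a boolean expression is kept as plain text and translated into one conditional jump plus one unconditional jump, and the labels B.true, B.false, and S.next are passed down as function arguments. The tuple format and function names are assumptions of the sketch.

```python
import itertools


def translate(stmt):
    """Translate a statement tree into a list of three-address instructions."""
    counter = itertools.count(1)

    def new_label():
        return f"L{next(counter)}"

    def gen_bool(cond, true_label, false_label):
        # B.code: jump to B.true if the condition holds, else to B.false.
        return [f"if {cond} goto {true_label}", f"goto {false_label}"]

    def gen_stmt(s, next_label):
        kind = s[0]
        if kind == "assign":                       # ("assign", "x = 0")
            return [s[1]]
        if kind == "if":                           # ("if", cond, S1)
            b_true = new_label()
            return (gen_bool(s[1], b_true, next_label)
                    + [f"{b_true}:"] + gen_stmt(s[2], next_label))
        if kind == "ifelse":                       # ("ifelse", cond, S1, S2)
            b_true, b_false = new_label(), new_label()
            return (gen_bool(s[1], b_true, b_false)
                    + [f"{b_true}:"] + gen_stmt(s[2], next_label)
                    + [f"goto {next_label}", f"{b_false}:"]
                    + gen_stmt(s[3], next_label))
        if kind == "while":                        # ("while", cond, S1)
            begin, b_true = new_label(), new_label()
            return ([f"{begin}:"] + gen_bool(s[1], b_true, next_label)
                    + [f"{b_true}:"] + gen_stmt(s[2], begin)
                    + [f"goto {begin}"])
        raise ValueError(f"unknown statement kind: {kind}")

    end = new_label()                              # S.next for the whole program
    return gen_stmt(stmt, end) + [f"{end}:"]


if_code = translate(("if", "x < 100", ("assign", "x = 0")))
while_code = translate(("while", "i < n", ("assign", "i = i + 1")))
```

In the while case the body receives the loop's begin label as its S.next, so control returns to the test after the body.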
Flow-of-Control
Statements
FIG 6.3 Code for if-, if-else-, and while-statements
Syntax-directed definition for Flow-
of-Control Statements
Control-Flow Translation of Boolean Expressions

Generating three-address code for booleans


translation of a simple if-
statement


Translation of a switch-statement
The "switch" or "case" statement is available in a variety of languages. Our switch-statement syntax is shown in Fig. 6.48. There is a selector expression E, which is to be evaluated, followed by n constant values V1, V2, ..., Vn that the expression might take, perhaps including a default "value," which always matches the expression if no other value does.
Translation of Switch-Statements

• The intended translation of a switch is code to:
• 1. Evaluate the expression E.
• 2. Find the value Vj in the list of cases that is the same as the value of the expression. Recall that the default value matches the expression if none of the values explicitly mentioned in cases does.
• 3. Execute the statement Sj associated with the value found.
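One possible layout for the generated code is sketched below. It places the statements first and a sequence of tests at the end; this layout, the label names, and the function translate_switch are assumptions made for illustration, since the slides do not show the translation itself.

```python
def translate_switch(selector, cases, default=None):
    """cases is a list of (value, [instructions]); default is a list or None."""
    code = [f"t = {selector}", "goto test"]        # step 1: evaluate E
    tests = []
    for i, (value, body) in enumerate(cases, start=1):
        code.append(f"L{i}:")
        code.extend(body)                          # step 3: code for S_i
        code.append("goto next")
        tests.append(f"if t == {value} goto L{i}") # step 2: compare with V_i
    if default is not None:
        label = f"L{len(cases) + 1}"
        code.append(f"{label}:")
        code.extend(default)
        code.append("goto next")
        tests.append(f"goto {label}")              # default matches if nothing else does
    code.append("test:")
    code.extend(tests)
    code.append("next:")
    return code


switch_code = translate_switch(
    "x", [(1, ["y = 10"]), (2, ["y = 20"])], default=["y = 0"]
)
```

Keeping the tests together at the end makes it easy for a later phase to replace them with a jump table when the case values are dense.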
Translation of a switch-
statement
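Intermediate code for procedures

Suppose that a is an array of integers, and that f is a function from integers to integers. Then, the assignment
n = f(a[i])
might translate into the following three-address code:
1) t1 = i * 4
2) t2 = a [ t1 ]
3) param t2
4) t3 = call f, 1
5) n = t3
The first two lines compute a[i] into t2, assuming 4-byte integers. Line 3 passes t2 as the actual parameter, and the 1 in line 4 is the number of arguments in the call.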
Intermediate code for procedures

• Nonterminals D and T generate declarations and types, respectively,


as in Section 6.3. A function definition generated by D consists of
keyword define, a return type, the function name, formal parameters
in parentheses and a function body consisting of a statement.
Nonterminal F generates zero or more formal parameters, where a
formal parameter consists of a type followed by an identifier.
Nonterminals S and E generate statements and expressions,
respectively. The production for S adds a statement that returns the
value of an expression. The production for E adds function calls, with
actual parameters generated by A. An actual parameter is an
expression.
Intermediate code for procedures

D -> define T id ( F ) { S }
F -> ε | T id , F
S -> return E ;
E -> id ( A )
A -> ε | E , A
Figure 6.52: Adding functions to the source language
Syntax-Directed Definitions

• Syntax-Directed Definitions
• A syntax-directed definition (SDD) is a context-free grammar together with attributes and rules. Attributes are associated with grammar symbols and rules are associated with productions.
• If X is a symbol and a is one of its attributes, then we write X.a to denote the value of a at a particular parse-tree node labeled X.
• Inherited and Synthesized Attributes
• 1. A synthesized attribute for a nonterminal A at a parse-tree node N is defined by a semantic rule associated with the production at N. Note that the production must have A as its head. A synthesized attribute at node N is defined only in terms of attribute values at the children of N and at N itself.
• 2. An inherited attribute for a nonterminal B at a parse-tree node N is defined by a semantic rule associated with the production at the parent of N. Note that the production must have B as a symbol in its body. An inherited attribute at node N is defined only in terms of attribute values at N's parent, N itself, and N's siblings.
Syntax Directed Translation

• An SDT is a context-free grammar with program fragments embedded within production bodies
• Those program fragments are called semantic actions
• They can appear at any position within production body
• Any SDT can be implemented by first building a parse tree
and then performing the actions in a left-to-right depth first
order
• Typically SDT’s are implemented during parsing without
building a parse tree
Postfix Translation Schemes
• The simplest SDDs to implement are those whose grammar can be parsed bottom-up and that are S-attributed
• For such cases we can construct an SDT in which each action is placed at the end of the production and is executed along with the reduction of the body to the head of that production
• SDT’s with all actions at the right ends of the production bodies are called postfix SDT’s
Example of postfix SDT

1) L -> E n {print(E.val);}
2) E -> E1 + T {E.val=E1.val+T.val;}
3) E -> T {E.val = T.val;}
4) T -> T1 * F {T.val=T1.val*F.val;}
5) T -> F {T.val=F.val;}
6) F -> (E) {F.val=E.val;}
7) F -> digit {F.val=digit.lexval;}
Parse-Stack implementation of
postfix SDT’s
• In a shift-reduce parser we can easily implement semantic actions using the parser stack
• For each nonterminal (or state) on the stack we can associate a record holding its attributes
• Then in a reduction step we can execute the semantic action at the end of a production to evaluate the attribute(s) of the nonterminal on the left side of the production
• And put the value on the stack in place of the right side of the production
Example
L -> E n     {print(stack[top-1].val);
              top=top-1;}
E -> E1 + T  {stack[top-2].val=stack[top-2].val+stack[top].val;
              top=top-2;}
E -> T
T -> T1 * F  {stack[top-2].val=stack[top-2].val*stack[top].val;
              top=top-2;}
T -> F
F -> (E)     {stack[top-2].val=stack[top-1].val;
              top=top-2;}
F -> digit
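To see these stack actions at work, the sketch below replays by hand the shifts and reductions a bottom-up parser would perform on the input 3*5+4 n. It does not contain a parser: the sequence of moves is written out explicitly, and the helper names are hypothetical. Each stack entry is a (symbol, val) pair.

```python
def evaluate_3_times_5_plus_4():
    stack, output = [], []

    def shift(symbol, val=None):
        stack.append((symbol, val))

    def reduce_unit(head):
        # E -> T, T -> F, F -> digit: the value stays, only the symbol changes.
        _, val = stack.pop()
        stack.append((head, val))

    def reduce_binary(head, op):
        # E -> E1 + T or T -> T1 * F: operands sit at top-2 and top.
        _, right = stack.pop()
        stack.pop()                      # the operator itself carries no value
        _, left = stack.pop()
        stack.append((head, left + right if op == "+" else left * right))

    def reduce_line():
        # L -> E n: print the value of E, which sits just below n.
        stack.pop()
        _, val = stack.pop()
        output.append(val)
        stack.append(("L", None))

    shift("digit", 3); reduce_unit("F"); reduce_unit("T")
    shift("*")
    shift("digit", 5); reduce_unit("F")
    reduce_binary("T", "*")              # T -> T1 * F
    reduce_unit("E")                     # E -> T
    shift("+")
    shift("digit", 4); reduce_unit("F"); reduce_unit("T")
    reduce_binary("E", "+")              # E -> E1 + T
    shift("n")
    reduce_line()
    return output


printed = evaluate_3_times_5_plus_4()
```

The single value collected in printed is the one the action for L -> E n would print.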
SDT’s with actions inside productions
• For a production B->X {a} Y
– If the parse is bottom-up then we perform action “a” as soon as this occurrence of X appears on the top of the parser stack
– If the parser is top down we perform “a” just before we expand Y
• Sometimes we can’t do things as easily as explained above
• One example is when we are parsing this SDT with a bottom-up parser:
1) L -> E n
2) E -> {print(‘+’);} E1 + T
3) E -> T
4) T -> {print(‘*’);} T1 * F
5) T -> F
6) F -> (E)
7) F -> digit {print(digit.lexval);}
SDT’s with actions inside productions (cont)
• Any SDT can be implemented as follows
1. Ignore the actions and produce a parse tree
2. Examine each interior node N and add actions as new children at the correct position
3. Perform a left-to-right depth-first traversal and execute actions when their nodes are visited
[Figure: parse tree for 3*5+4 with the actions {print(‘+’);}, {print(‘*’);}, {print(3);}, {print(5);} and {print(4);} added as children]
Infix to prefix translation during parsing
• Show the parse tree for expression 3*5+4
[Figure: parse tree for 3*5+4 with embedded actions; executing them in left-to-right depth-first order prints + * 3 5 4]
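The three-step implementation can be sketched in Python for this SDT. The sketch builds the parse tree with the action nodes already inserted at the positions the SDT gives them, then runs the actions in a left-to-right depth-first traversal. It covers only single digits, + and * (no parentheses and no L -> E n), and the list and tuple encoding of the tree is an assumption of the sketch.

```python
def parse(tokens):
    """Build a parse tree; lists are interior nodes, ("act", text) are actions."""
    pos = 0

    def factor():
        nonlocal pos
        digit = tokens[pos]
        pos += 1
        return ["F", digit, ("act", digit)]      # F -> digit {print(digit.lexval);}

    def term():
        nonlocal pos
        node = ["T", factor()]                   # T -> F
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            node = ["T", ("act", "*"), node, "*", factor()]   # T -> {print('*');} T1 * F
        return node

    def expr():
        nonlocal pos
        node = ["E", term()]                     # E -> T
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = ["E", ("act", "+"), node, "+", term()]     # E -> {print('+');} E1 + T
        return node

    return expr()


def run_actions(node, out):
    """Visit children left to right and execute action nodes as they are met."""
    if isinstance(node, tuple):                  # an action node
        out.append(node[1])
    elif isinstance(node, list):                 # interior node: node[0] is its label
        for child in node[1:]:
            run_actions(child, out)
    return out                                   # plain strings are terminals


prefix = run_actions(parse(list("3*5+4")), [])
```

The collected output is the prefix form of the input, with each operator appearing before its operands.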
SDT’s for L-Attributed definitions
• We can convert an L-attributed SDD into an SDT using following two
rules:
– Embed the action that computes the inherited attributes for a nonterminal A
immediately before that occurrence of A. if several inherited attributes of A
are dependent on one another in an acyclic fashion, order them so that those
needed first are computed first
– Place the action of a synthesized attribute for the head of a production at the
end of the body of the production
Implementing L-Attributed SDD

• A recursive-descent parser has a function A for each non-terminal A. We can extend the parser into a translator as follows:
• a) The arguments of function A are the inherited attributes of nonterminal A.
• b) The return-value of function A is the collection of synthesized attributes of
nonterminal A.
• In the body of function A, we need to both parse and handle attributes:
– 1. Decide upon the production used to expand A.
– 2. Check that each terminal appears on the input when it is required. We shall assume
that no backtracking is needed, but the extension to recursive-descent parsing with
backtracking can be done by restoring the input position upon failure
– 3. Preserve, in local variables, the values of all attributes needed to compute inherited
attributes for non-terminals in the body or synthesized attributes for the head non-
terminal.
– 4. Call functions corresponding to non-terminals in the body of the selected
production, providing them with the proper arguments. Since the underlying SDD is
L-attributed, we have already computed these attributes and stored them in local
variables.
Example
SDD:
S -> while (C) S1    L1=new();
                     L2=new();
                     S1.next=L1;
                     C.false=S.next;
                     C.true=L2;
                     S.code=label||L1||C.code||label||L2||S1.code
SDT:
S -> while ( {L1=new(); L2=new(); C.false=S.next; C.true=L2;}
     C ) {S1.next=L1;}
     S1 {S.code=label||L1||C.code||label||L2||S1.code;}
