MUUnit 4

The document outlines the concepts of Intermediate Code Generation and Memory Management in compiler design, including the role of intermediate code generators, various intermediate languages, and the structure of directed acyclic graphs (DAGs). It discusses the representation of syntax trees, three-address code, and different forms of intermediate code such as quadruples, triples, and indirect triples. Additionally, it covers syntax-directed definitions, synthesized and inherited attributes, and the management of runtime environments in relation to source language issues.

Uploaded by

Dev Savaliya

Department of CE

CD : COMPILER DESIGN

Unit no : 4
Intermediate Code Generation and Memory Management
(01CE0714)

Prof. Jaydip Siyara

Outline :
Role of Intermediate Code Generator
Intermediate Languages
Directed Acyclic Graph (DAG)
Quadruple, Triples, Indirect Triples
SDD : Synthesized and Inherited Attribute
Syntax Directed Translation
Activation Record
Parameter Passing Methods
Symbol Table
Prof. Dhara Joshi
Role of Intermediate Code Generator
 In the analysis-synthesis model of a compiler, the front
end translates a source program into an intermediate
representation from which the back end generates target
code.
[Figure: Parser -> Static Checker -> Intermediate Code Generator -> Intermediate Code -> Code Generator]
 Although a source program can be translated directly into
machine language, some benefits of using a
machine-independent intermediate form are :
1) Retargeting is facilitated ; a compiler for a different
machine can be created by attaching a back end for the
new machine to an existing front end.
2) A machine-independent code optimizer can be
applied to the intermediate representation.
Different Intermediate Languages / Forms
1) Abstract Syntax Tree
2) Postfix notation
3) Three Address Code
Abstract Syntax Tree
 Syntax Tree + DAG (Directed Acyclic Graph)
 A syntax tree depicts the natural hierarchical structure of a
source program.
 A DAG (Directed Acyclic Graph) gives the same information
but in a more compact way because common
subexpressions are identified.
 Example : a := b * -c + b * -c
[Figure: syntax tree and DAG for a := b * -c + b * -c. In the syntax tree the subtree b * uminus c appears twice under +; in the DAG it appears once and is shared by both operands of +.]
Postfix Notation
• Postfix notation is a linearized representation of a syntax
tree; it is a list of the nodes of the tree in which a node
appears immediately after its children.
• For example : a + b * c
Postfix notation : a b c * +
 Example : a := b * -c + b * -c
Postfix notation : a b c uminus * b c uminus * + :=
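The definition above (a node appears immediately after its children) is a postorder traversal. A minimal sketch, assuming a hypothetical tuple encoding of the syntax tree that is not part of the slides: interior nodes are (operator, child, ...) and leaves are plain strings.

```python
def postfix(node):
    """Return the postfix token list: all children first, then the node itself."""
    if isinstance(node, str):        # leaf: a name or constant
        return [node]
    op, *children = node             # interior node: operator plus its children
    tokens = []
    for child in children:
        tokens.extend(postfix(child))
    tokens.append(op)
    return tokens

# a := b * -c + b * -c, written with the assumed tuple encoding
neg_c = ("uminus", "c")
tree = (":=", "a", ("+", ("*", "b", neg_c), ("*", "b", neg_c)))
print(" ".join(postfix(tree)))
```

The same function applied to ("+", "a", ("*", "b", "c")) gives the tokens for a + b * c.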
Three Address Code
 Three address code is a sequence of statements of the general
form
x := y op z
where x, y and z are names, constants, or compiler generated
temporaries ; op stands for any operator, such as a fixed or
floating point arithmetic operator, or a logical operator on a
Boolean data value.
 Thus a source language expression like x + y * z might be
translated into the sequence
t1 := y * z
t2 := x + t1
where t1 and t2 are compiler generated temporary names.
 Three address code is a linearized representation of a
syntax tree or a DAG in which explicit names correspond to
the interior nodes of the graph.
 Example : a := b * -c + b * -c

t1 := uminus c
t2 := b * t1
t3 := uminus c
t4 := b * t3
t5 := t2 + t4
a := t5

 The reason for the term “Three Address Code” is that each
statement usually contains three addresses, two for the
operands and one for the result.
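One way to picture how such a sequence is produced is a recursive walk that creates a fresh temporary for every interior node of the syntax tree. The sketch below is illustrative only; the tuple tree encoding and the function names gen_tac and translate_assignment are assumptions, not taken from the slides.

```python
import itertools

def gen_tac(node, code, counter):
    """Append three address statements for node to code; return the name holding its value."""
    if isinstance(node, str):                  # leaf: name or constant, no code needed
        return node
    op, *children = node
    args = [gen_tac(child, code, counter) for child in children]
    temp = f"t{next(counter)}"                 # fresh compiler generated temporary
    if len(args) == 1:                         # unary operator
        code.append(f"{temp} := {op} {args[0]}")
    else:                                      # binary operator
        code.append(f"{temp} := {args[0]} {op} {args[1]}")
    return temp

def translate_assignment(target, expr):
    code = []
    result = gen_tac(expr, code, itertools.count(1))
    code.append(f"{target} := {result}")
    return code

neg_c = ("uminus", "c")
for line in translate_assignment("a", ("+", ("*", "b", neg_c), ("*", "b", neg_c))):
    print(line)
```

Because the walk works on a tree rather than a DAG, uminus c and b * t are computed twice, as in the six statement example above.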
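Directed Acyclic Graph Representation
 Each node of a DAG contains a unique value.
 It does not contain any cycles, hence it is called acyclic.

Example : a = a + 10

Node  Op   Left        Right
1     id   entry to a
2     num  10
3     +    1           2
4     =    1           3

[Figure: DAG with root = whose children are a and +; + has children a and 10; the leaf a is shared.]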
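The numbered table can be built by reusing a node whenever the same (op, left, right) record is requested again. A minimal sketch, assuming a hypothetical DagBuilder class that stores records in a list and uses a dictionary to find existing ones:

```python
class DagBuilder:
    def __init__(self):
        self.nodes = []    # nodes[i - 1] is the record numbered i
        self.index = {}    # record -> node number, used to reuse existing nodes

    def node(self, op, left=None, right=None):
        key = (op, left, right)
        if key not in self.index:          # create the node only the first time
            self.nodes.append(key)
            self.index[key] = len(self.nodes)
        return self.index[key]

dag = DagBuilder()
a = dag.node("id", "a")
ten = dag.node("num", 10)
plus = dag.node("+", a, ten)
assign = dag.node("=", dag.node("id", "a"), plus)   # second request for a reuses node 1

for number, record in enumerate(dag.nodes, start=1):
    print(number, *[field for field in record if field is not None])
```

Only four nodes are created for a = a + 10, because the two occurrences of a share one leaf.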
Directed Acyclic Graph Examples
1) ( a + b ) * ( a + b + c )
2) a + a * (b - c ) + ( b - c ) * d

Directed Acyclic Graph Examples (Self Practice)
3) i = i + 10
4) ( ( ( a + a ) + ( a + a ) ) + ( ( a + a ) + ( a + a ) ) )

Representation of Syntax Tree
[Figure: representation of the syntax tree for a := b * -c + b * -c]
Quadruple, Triples and Indirect Triples
 A three address statement is an abstract form of
intermediate code.
 In a compiler, these statements can be implemented as
records with fields for the operator and the operands.
 Three such representations are
1) Quadruples
2) Triples
3) Indirect Triples
Quadruples :
 A Quadruple is a record structure with four fields :
 Op
 Arg1
 Arg2
 Result
 The three address statement x := y op z is represented by
placing y in arg1, z in arg2, and x in result.
 Statements with unary operators like x := -y or x := y do
not use arg2.
 Conditional and unconditional jumps put the target label
in result.
 The contents of fields arg1, arg2 and result are normally
pointers to the symbol table entries for the names
represented by these fields.
 If so, temporary names must be entered into the symbol
table as they are created.
Example : a := b * -c + b * -c
      op      arg1  arg2  result
(0)   uminus  c           t1
(1)   *       b     t1    t2
(2)   uminus  c           t3
(3)   *       b     t3    t4
(4)   +       t2    t4    t5
(5)   :=      t5          a
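As an illustration, the quadruple table can be held as a list of four field records. This sketch assumes a hypothetical Quad record type and plain strings in place of symbol table pointers.

```python
from typing import NamedTuple, Optional

class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: Optional[str]    # unused (None) for unary operators and copies
    result: str

quads = [
    Quad("uminus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("uminus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad(":=", "t5", None, "a"),
]

def show(q):
    """Render one quadruple back as a three address statement."""
    if q.op == ":=":
        return f"{q.result} := {q.arg1}"
    if q.arg2 is None:
        return f"{q.result} := {q.op} {q.arg1}"
    return f"{q.result} := {q.arg1} {q.op} {q.arg2}"

for q in quads:
    print(show(q))
```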
Triples :
 To avoid entering temporary names into the symbol
table, we might refer to a temporary value by the position
of the statement that computes it.
 A Triple is a record structure with three fields :
 Op
 Arg1
 Arg2
 Note that the copy statement a := t5 is encoded in the
triple representation by placing a in the arg1 field and
using the operator assign.
 x [i] := y requires two entries in the triple structure.
      op      arg1  arg2
(0)   []=     x     i
(1)   assign  (0)   y
Example : a := b * -c + b * -c
      op      arg1  arg2
(0)   uminus  c
(1)   *       b     (0)
(2)   uminus  c
(3)   *       b     (2)
(4)   +       (1)   (3)
(5)   :=      a     (4)
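The relation between the two tables can be sketched as a conversion that replaces each temporary name by the position of the triple that computes it. The function name quads_to_triples and the convention that an integer argument means "position of a triple" are assumptions made for this example.

```python
def quads_to_triples(quads):
    """quads: list of (op, arg1, arg2, result). Returns a list of (op, arg1, arg2)."""
    position = {}    # temporary name -> index of the triple that computes it
    triples = []

    def ref(x):
        # A temporary becomes a position; names and None pass through unchanged.
        return position.get(x, x)

    for op, arg1, arg2, result in quads:
        if op == ":=":
            # Copy statement: the target goes in arg1, the value in arg2.
            triples.append((op, result, ref(arg1)))
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            position[result] = len(triples) - 1
    return triples

quad_list = [
    ("uminus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("uminus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    (":=", "t5", None, "a"),
]
for i, triple in enumerate(quads_to_triples(quad_list)):
    print(i, triple)
```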
Indirect Triples :
 Another implementation of three address code that has
been considered is that of listing pointers to triples,
rather than listing the triples themselves.
 This implementation is naturally called indirect triples.

Statement list :
      statement
(0)   (14)
(1)   (15)
(2)   (16)
(3)   (17)
(4)   (18)
(5)   (19)

Triples :
      op      arg1  arg2
(14)  uminus  c
(15)  *       b     (14)
(16)  uminus  c
(17)  *       b     (16)
(18)  +       (15)  (17)
(19)  :=      a     (18)
Practice Examples :

1) a * – (b + c)

2) a + b * c / e ↑ f + b * c

Quadruples for example 2 :
Location  Op  Arg1  Arg2  Result
(0)       ↑   e     f     T1
(1)       *   b     c     T2
(2)       /   T2    T1    T3
(3)       *   b     c     T4
(4)       +   a     T3    T5
(5)       +   T5    T4    T6
Syntax Directed Definition
 A syntax directed definition (SDD) specifies the values of
attributes by associating semantic rules with the
grammar productions.
 Grammar + Semantic Rules
 A syntax-directed translation is an executable specification
of an SDD.
 A syntax directed definition is a generalization of a context
free grammar in which each grammar symbol has an
associated set of attributes.
 Types of attributes are:
1. Synthesized attribute
2. Inherited attribute
Synthesized Attribute
 The value of a synthesized attribute at a node can be
computed from the values of attributes at the children of
that node in the parse tree.
 A syntax directed definition that uses synthesized attributes
exclusively is said to be an S-attributed definition.
 A parse tree for an S-attributed definition can always be
annotated by evaluating the semantic rules for the
attributes at each node bottom up, from the leaves to the root.
 An annotated parse tree is a parse tree showing the
values of the attributes at each node.
 The process of computing the attribute values at the nodes
is called annotating or decorating the parse tree.
 Example: Simple desk calculator

Production      Semantic rules
L → E n         print(E.val)
E → E1 + T      E.val = E1.val + T.val
E → T           E.val = T.val
T → T1 * F      T.val = T1.val * F.val
T → F           T.val = F.val
F → ( E )       F.val = E.val
F → digit       F.val = digit.lexval
 Example: Simple desk calculator
 String: 3*5+4n

L
├─ E.val=19
│  ├─ E.val=15
│  │  └─ T.val=15
│  │     ├─ T.val=3
│  │     │  └─ F.val=3
│  │     │     └─ digit.lexval=3
│  │     ├─ *
│  │     └─ F.val=5
│  │        └─ digit.lexval=5
│  ├─ +
│  └─ T.val=4
│     └─ F.val=4
│        └─ digit.lexval=4
└─ n
Annotated parse tree for 3*5+4n
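Bottom up annotation can be sketched as a function that computes val at a node only from the val of its children. The tuple encoding of parse tree nodes below is an assumption made for illustration; the rules applied are the semantic rules of the desk calculator.

```python
def val(node):
    """Compute the synthesized attribute val of a parse tree node."""
    symbol, *kids = node
    if symbol == "digit":
        return kids[0]                          # digit.lexval
    if symbol == "F":
        # F -> digit has one child; F -> ( E ) has three, with E in the middle
        return val(kids[0]) if len(kids) == 1 else val(kids[1])
    if symbol in ("E", "T"):
        if len(kids) == 1:                      # E -> T and T -> F
            return val(kids[0])
        left, op, right = kids                  # E -> E1 + T and T -> T1 * F
        return val(left) + val(right) if op == "+" else val(left) * val(right)
    raise ValueError(f"unknown symbol {symbol}")

# Parse tree of E for the input 3*5+4
tree = ("E",
        ("E", ("T", ("T", ("F", ("digit", 3))), "*", ("F", ("digit", 5)))),
        "+",
        ("T", ("F", ("digit", 4))))
print(val(tree))    # the rule for L -> E n prints E.val
```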
Inherited Attribute
 An inherited attribute at a node in a parse tree is defined
in terms of attributes at the parent and/or siblings of the
node.
 It is a convenient way of expressing the dependency of a
programming language construct on the context in which
it appears.
 We can use inherited attributes to keep track of whether
an identifier appears on the left or right side of an
assignment, to decide whether the address or the value of the
identifier is needed.
 In the example below, the inherited attribute distributes type
information to the various identifiers in a declaration.
Production      Semantic rules
D → T L         L.in = T.type
T → int         T.type = integer
T → real        T.type = real
L → L1 , id     L1.in = L.in, addtype(id.entry, L.in)
L → id          addtype(id.entry, L.in)

Symbol T is associated with a synthesized attribute type.
Symbol L is associated with an inherited attribute in.
Example : real id1, id2, id3

D
├─ T (T.type = real)
│  └─ real
└─ L (L.in = real)
   ├─ L1 (L.in = real)
   │  ├─ L1 (L.in = real)
   │  │  └─ id1
   │  ├─ ,
   │  └─ id2
   ├─ ,
   └─ id3

Productions used : D → T L, L → L1 , id, L → id
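The flow of the inherited attribute can be sketched as a value passed downwards as a function argument. In this illustrative sketch the symbol table is a plain dictionary, and the tuple encoding of L nodes and the helper names declaration and decl_list are assumptions; addtype mirrors the semantic action of the same name.

```python
symbol_table = {}

def addtype(entry, type_):
    """Record the type of an identifier in the (simplified) symbol table."""
    symbol_table[entry] = type_

def decl_list(node, inherited):
    """node is ("L", L1, id) for L -> L1 , id, or ("L", id) for L -> id."""
    if len(node) == 3:
        _, l1, ident = node
        decl_list(l1, inherited)      # L1.in = L.in
        addtype(ident, inherited)     # addtype(id.entry, L.in)
    else:
        addtype(node[1], inherited)   # addtype(id.entry, L.in)

def declaration(keyword, l_node):
    t_type = {"int": "integer", "real": "real"}[keyword]   # T.type (synthesized)
    decl_list(l_node, t_type)                               # L.in = T.type (inherited)

declaration("real", ("L", ("L", ("L", "id1"), "id2"), "id3"))
print(symbol_table)
```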
Difference between synthesized and inherited attributes
Synthesized attribute | Inherited attribute
The value of a synthesized attribute at a node can be computed from the values of attributes at the children of that node in the parse tree. | The value of an inherited attribute at a node can be computed from the values of attributes at the parent and/or siblings of the node.
Passes the information from bottom to top in the parse tree. | Passes the information from top to bottom in the parse tree, or from left siblings to right siblings.
A synthesized attribute is used by both S-attributed SDT and L-attributed SDT. | An inherited attribute is used only by L-attributed SDT.
Run time Environment
 A compiler must accurately implement the abstractions
embodied in the source language definition. These
abstractions typically include concepts such as
names, scopes, bindings, data types, operators,
procedures, parameters, and flow-of-control constructs.
 The compiler must cooperate with the operating system
and other systems software to support these abstractions
on the target machine.
 To do so, the compiler creates and manages a run-time
environment in which it assumes its target programs are
being executed.
Source Language Issues
1. Procedure Call:-
 A procedure definition is a declaration that associates an
identifier with a statement.
 The identifier is the procedure name and the statement is
the procedure body.
 For example, the following is the definition of procedure
named readarray :
procedure readarray;
var i : integer;
begin
for i := 1 to 9 do read(a[i])
end;
 When a procedure name appears within an executable
statement, the procedure is said to be called at that point.
2. Activation tree
 Each execution of a procedure body is referred to as an
activation of the procedure.
 The lifetime of an activation of a procedure p is the
sequence of steps between the first and last steps in the
execution of the procedure body.
 An activation tree is used to depict the way control enters
and leaves activations. In an activation tree,
1. Each node represents an activation of a procedure.
2. The root represents the activation of the main program.
3. The node for a is the parent of the node for b if and only
if control flows from activation a to b.
4. The node for a is to the left of the node for b if and only
if the lifetime of a occurs before the lifetime of b.
3. Control Stack
 The flow of control in a program corresponds to a depth
first traversal of the activation tree that starts at the root,
visits a node before its children, and recursively visits
children at each node in a left to right order.
 A control stack is used to keep track of live procedure
activations.
 The idea is to push the node for an activation onto the
control stack as the activation begins and to pop the node
when the activation ends.
 The contents of the control stack are related to paths to
the root of the activation tree.
 When node n is at the top of the control stack, the stack
contains the nodes along the path from n to the root.
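The push and pop discipline can be sketched by walking an activation tree depth first. The activation tree used here (main calling readarray and then sort, which calls partition) is a made up example, and the names activate, control_stack and snapshots are assumptions for this illustration.

```python
control_stack = []   # names of live activations, bottom of the stack first
snapshots = []       # contents of the stack each time an activation begins

def activate(name, children):
    control_stack.append(name)               # push when the activation begins
    snapshots.append(tuple(control_stack))
    for child_name, grandchildren in children:   # children in left to right order
        activate(child_name, grandchildren)
    control_stack.pop()                      # pop when the activation ends

# Hypothetical activation tree: (procedure name, list of child activations)
activation_tree = ("main", [("readarray", []), ("sort", [("partition", [])])])
activate(*activation_tree)

for stack in snapshots:
    print(" -> ".join(stack))
```

Each printed line is the path from the root to the node on top of the stack, which is the property stated in the last bullet.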
4. The Scope of a Declaration
 A declaration is a syntactic construct that associates
information with a name. Declarations may be explicit, such
as: var i : integer ;
 Or they may be implicit.
 For example, any variable name starting with I is assumed to
denote an integer in a Fortran program unless otherwise
declared.
 The portion of the program to which a declaration
applies is called the scope of that declaration.
 An occurrence of a name in a procedure is said to be local to
the procedure if it is in the scope of a declaration within
the procedure ; otherwise, the occurrence is said to be
non local.
5. Binding of names
 Even if each name is declared once in a program, the same
name may denote different data objects at run time.
 “Data object” corresponds to a storage location that holds
values.
 The term environment refers to a function that maps a
name to a storage location.
 The term state refers to a function that maps a storage
location to the value held there.
 When an environment associates storage location s with a
name x, we say that x is bound to s.
 This association is referred to as a binding of x.
Activation Record
 The activation record is a block of memory used for
managing information needed by a single execution of a
procedure.
[Figure: Activation record]
 Fortran uses the static data area to store the activation
record, whereas in Pascal and C the activation record is
situated in the stack area.
1. Temporary values: Temporary variables are needed
during the evaluation of expressions. Such variables are
stored in the temporary field of the activation record.
2. Local variables: Local data is data that is local to
the execution of the procedure; it is stored in this field
of the activation record.
3. Saved machine registers: This field holds the information
regarding the status of the machine just before the
procedure is called. This field contains the machine
registers and the program counter.
4. Control link: This field is optional. It points to the
activation record of the calling procedure. This link is
also called the dynamic link.
5. Access link: This field is also optional. It refers to the
non local data in other activation records. This field is
also called the static link field.
6. Actual parameters: This field holds the information about the
actual parameters. These actual parameters are passed to the
called procedure.
7. Return values: This field is used to store the result of a
function call.
Static and Dynamic Memory Allocation
Static memory allocation | Dynamic memory allocation
Memory is allocated before the execution of program begins. | Memory is allocated during the execution of program.
No memory allocation or de-allocation actions are performed during execution. | Memory bindings are established and destroyed during execution.
Execution speed is faster compared to dynamic allocation. | Execution speed is slower compared to static allocation.
Variables are accessed directly; no pointers to dynamically allocated memory are needed. | A pointer is needed to access a dynamically allocated variable.
More memory space is required as we allocate memory before execution. | Less memory space is required in dynamic memory allocation.
Variable remains permanently allocated. | Allocated only when program unit is active.
Implemented using data segment. | Implemented using stack and heaps.
Parameter Passing Methods
 All programming languages have a notion of a procedure,
but they can differ in how these procedures get their
arguments. In this section, we shall consider how the
actual parameters are associated with the formal
parameters.
1. Call by value
2. Call by reference
3. Copy restore
4. Call by name
1. Call by Value:
 In call-by-value, the actual parameter is evaluated (if it is
an expression) or copied (if it is a variable). The value is
placed in the location belonging to the corresponding
formal parameter of the called procedure.
 Operations on the formal parameter do not change the
value of the actual parameter.
 This method is used in C and Java, and is a common option in
C++, as well as other languages.
2. Call by Reference:
 This method is also called call by address or call by
location.
 In call-by-reference, the address of the actual parameter is
passed to the callee as the value of the corresponding
formal parameter.
 Uses of the formal parameter in the code of the callee are
implemented by following this pointer to the location
indicated by the caller.
 Changes to the formal parameter thus appear as changes
to the actual parameter.
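The contrast between the two methods can be sketched as follows. Python has no address-of operator, so this example makes an assumption: a one element list stands in for "the address of the actual parameter", and the function names are invented for illustration.

```python
def increment_by_value(x):
    # x is the callee's own copy of the value; the caller's variable is untouched.
    x = x + 1
    return x

def increment_by_reference(cell):
    # cell plays the role of a pointer to the caller's storage location.
    cell[0] = cell[0] + 1

a = 10
increment_by_value(a)        # the returned value is ignored on purpose

b = [10]                     # storage location whose "address" is passed
increment_by_reference(b)

print(a, b[0])
```

After both calls a still holds 10, while the location passed by reference holds 11.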
Parameters | Call by Value | Call by Reference
Basic | A copy of the variable is passed. | The variable itself is passed.
Effect | A change to the copy of the variable doesn’t modify the original value of the variable. | A change to the variable modifies the original value of the variable.
Syntax | function_name(variable_name1, variable_name2…) | function_name(&variable_name1, &variable_name2…)
Default Calling | Primitive types are passed using “call_by_value”. | Objects are implicitly passed using “call_by_reference”.
3. Call by Copy - Restore:
 This method is a hybrid between call by value and call by
reference. This method is also called copy-in-copy-out or
value-result.
 The calling procedure calculates the value of the actual
parameter and then copies it to the activation record for the
called procedure.
 During execution of the called procedure, the actual
parameter's value is not affected.
 If the actual parameter has an L-value then, at return, the
value of the formal parameter is copied to the actual parameter.
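Copy-in and copy-out can be sketched with a dictionary standing in for the caller's storage. The helper call_copy_restore, the dictionary store and the procedure add_one are hypothetical names used only for this illustration.

```python
def call_copy_restore(proc, store, *names):
    """Call proc with copies of store[name]; copy the final formals back at return."""
    formals = [store[name] for name in names]    # copy in
    results = proc(*formals)                     # callee works on its own copies
    for name, value in zip(names, results):      # copy out (restore) at return
        store[name] = value

store = {"a": 1}
seen_during_call = []

def add_one(x):
    x = x + 1
    seen_during_call.append(store["a"])   # the actual parameter is still unchanged here
    return (x,)                           # final values of the formal parameters

call_copy_restore(add_one, store, "a")
print(seen_during_call, store["a"])
```

The actual parameter keeps its old value while the callee runs and receives the new value only at return.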
4. Call by Name:
 This is a less popular method of parameter passing.
 The procedure is treated like a macro. The procedure body is
substituted for the call in the caller, with the actual parameters
substituted for the formals.
 The actual parameters can be surrounded by parentheses
to preserve their integrity.
 The local names of the called procedure and the names of the
calling procedure are kept distinct.
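One common way to model call by name without textual substitution is to pass each actual parameter as a parameterless function (a thunk) that is re-evaluated at every use. This is an assumption of the sketch, not something stated in the slides; sum_by_name, env and the array a are invented for illustration.

```python
def sum_by_name(term, set_i, lo, hi):
    """Add up term for i = lo..hi, where term is re-evaluated on every use."""
    total = 0
    for k in range(lo, hi + 1):
        set_i(k)            # assign to the caller's variable i
        total += term()     # the actual parameter expression a[i] is evaluated now
    return total

env = {"i": 0}              # the caller's variable i
a = [0, 10, 20, 30]

def actual_term():          # thunk for the actual parameter a[i]
    return a[env["i"]]

def assign_i(value):        # thunk that lets the callee assign to i
    env["i"] = value

result = sum_by_name(actual_term, assign_i, 1, 3)
print(result)
```

Under call by value the expression a[i] would be evaluated once before the call, so the loop would add the same element three times.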
Symbol table
 Definition:
 A symbol table is a data structure used by the compiler to keep
track of the semantics of names. That is, the symbol table
stores scope and binding information about names.
 The symbol table is built in the lexical and syntax analysis phases.
 Symbol table entries:
1) Variable names
2) Constants
3) Function names
4) Procedure names
5) Literals and strings
6) Compiler generated temporaries
7) Labels in source language
 Information used by the compiler from the symbol table:
1) Data types
2) Name
3) Declaring procedure
4) Offset in storage
5) If structure or record, then pointer to structure table
6) For parameters, whether parameter passing is by value
or reference
7) Number and type of arguments passed to the function
8) Base address
How to store names in symbol table:
There are two types of representation:
1. Fixed length name
A fixed space for each name is allocated in the symbol table. In
this type of storage, if a name is shorter than the allotted
space, the remaining space is wasted.
The name can be referred to by a pointer to the symbol table
entry.
2. Variable length name
Only the amount of space required by the string is used to store
each name. A name is located with the help of its starting
index and its length.
Data structure for Symbol table
1) List data structure:
 A linear list is the simplest mechanism to implement
the symbol table.
 In this method an array is used to store names and
associated information.
 New names are added in the order in which they arrive.
 A pointer “available” is maintained at the end of all stored
records.
 To search for a name we scan from the beginning of the list up
to the available pointer; if the name is not found we report the
error “use of undeclared name”.
 While inserting a new name we must ensure that it is not
already present; otherwise an error occurs, i.e. “multiple
defined name”.
 The advantage is that it takes the minimum amount of space.
2) Self organizing list:
 This symbol table implementation uses a linked list. A
link field is added to each record.
 We search the records in the order given by the link
fields.
 A pointer first is maintained to point to the first record of the
symbol table.
 For example, the records may be linked in the order name 3,
name 1, name 4 and name 2.
 When a name is referenced or created, it is moved to the
front of the list.
 The most frequently referenced names will tend to be at the front
of the list. Hence the access time for the most referenced names
will be the least.
[Figure: linked list of symbol table records starting at the pointer first]
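The move to front rule can be sketched as below. For brevity this example uses a Python list whose index 0 plays the role of the record pointed to by first, instead of explicit link fields; the class name SelfOrganizingTable is an assumption.

```python
class SelfOrganizingTable:
    def __init__(self):
        self.records = []    # index 0 is the front of the list ("first")

    def insert(self, name, info):
        # A newly created name is placed at the front.
        self.records.insert(0, (name, info))

    def lookup(self, name):
        for i, (stored_name, info) in enumerate(self.records):
            if stored_name == name:
                # Move the referenced record to the front, then stop scanning.
                self.records.insert(0, self.records.pop(i))
                return info
        return None          # not found

table = SelfOrganizingTable()
for n in ["name1", "name2", "name3", "name4"]:
    table.insert(n, "integer")
table.lookup("name2")
print([name for name, _ in table.records])
```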
3) Hash Tables :
 In the hashing scheme two tables are maintained, a hash
table and a symbol table. It is the most commonly used
method to implement symbol tables.
 A hash table is an array with index range
0 to t_size – 1.
 Its entries are pointers to names in the symbol
table.
 To search for a name we use a hash function that produces
an integer between 0 and t_size – 1.
 Insertion and lookup can be made very fast, O(1) on average.
 The advantage is that quick search is possible; the disadvantage
is that hashing is more complicated to implement.
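A minimal sketch of such a scheme, assuming collisions are resolved by chaining (the slides do not say how) and using an arbitrary multiplicative hash function; the class name, the default t_size and the error messages are illustrative choices.

```python
class HashSymbolTable:
    def __init__(self, t_size=211):
        self.t_size = t_size
        self.buckets = [[] for _ in range(t_size)]   # hash table of chains

    def _hash(self, name):
        """Map a name to an integer between 0 and t_size - 1."""
        h = 0
        for ch in name:
            h = (h * 31 + ord(ch)) % self.t_size
        return h

    def insert(self, name, info):
        bucket = self.buckets[self._hash(name)]
        for stored_name, _ in bucket:
            if stored_name == name:
                raise KeyError(f"multiple defined name: {name}")
        bucket.append((name, info))

    def lookup(self, name):
        for stored_name, info in self.buckets[self._hash(name)]:
            if stored_name == name:
                return info
        raise KeyError(f"use of undeclared name: {name}")

symbols = HashSymbolTable()
symbols.insert("count", "integer")
symbols.insert("rate", "real")
print(symbols.lookup("rate"))
```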
4) Binary Search Tree
 Another approach to implementing the symbol table is to use a
binary search tree, i.e. we add two link fields to each record,
left child and right child.
 Each new name is inserted as a descendant of the root node so
that the binary search tree property always holds.
 Insertion and lookup are O(log2 n) on average.
Questions
1) Explain Quadruples and Triples form of three address code with
example.
2) Draw a DAG for expression: a + a * (b – c) + (b – c) * d.
3) Construct syntax tree and DAG for following expression.
a = (b+c+d) * (b+c-d) + a
4) Explain quadruples, triples and indirect triples with examples.
5) Translate following arithmetic expression ( a * b ) + ( c + d ) - ( a + b)
into
1] Quadruples
2] Triples
3] Indirect Triples
6) Draw syntax tree and DAG for the statement
x=(a+b)*(a+b+c)*(a+b+c+d)
7) What is Symbol Table?
8) Differentiate Static and Dynamic Memory Allocation.
9) Explain Activation record.
10) Explain Parameter passing methods with example.
Thanks
Prof. Jaydip Siyara
