MUUnit 4
CD : COMPILER DESIGN
Symbol Table
Prof. Dhara Joshi
Role of Intermediate Code Generator
In the analysis-synthesis model of a compiler, the front end translates a source program into an intermediate representation from which the back end generates target code.
[Figure: Parser -> Static Checker -> Intermediate Code Generator -> Code Generator]
Although a source program can be translated directly into machine language, some benefits of using a machine-independent intermediate form are:
1) Retargeting is facilitated ; a compiler for a different
machine can be created by attaching a back end for the
new machine to an existing front end.
2) A machine independent code optimizer can be
applied to the intermediate representation.
Different Intermediate Languages / Forms
1) Abstract Syntax Tree
2) Postfix notation
3) Three Address Code
Syntax Tree + DAG (Directed Acyclic Graph)
A syntax tree depicts the natural hierarchical structure of a
source program.
A DAG (Directed Acyclic Graph) gives the same information
but in a more compact way because common
subexpressions are identified.
Example : a := b * -c + b * -c
[Figure: abstract syntax tree and DAG for a := b * -c + b * -c]
Postfix Notation
• Postfix notation is a linearized representation of a syntax tree; it is a list of the nodes of the tree in which a node appears immediately after its children.
• For example: a + b * c
  Postfix notation: a b c * +
Example : a := b * -c + b * -c
Postfix notation : a b c uminus * b c uminus * + =
Three Address Code
Three address code is a sequence of statements of the general form
x := y op z
where x, y and z are names, constants, or compiler generated temporaries; op stands for any operator, such as a fixed or floating point arithmetic operator, or a logical operator on a Boolean data value.
Thus a source language expression like x + y * z might be translated into a sequence
t1 := y * z
t2 := x + t1
Where t1 and t2 are compiler generated temporary names
Three address code is a linearized representation of a
syntax tree or a DAG in which explicit names correspond to
the interior nodes of the graph.
Example : a := b * -c + b * -c
t1 := uminus c
t2 := b * t1
t3 := uminus c
t4 := b * t3
t5 := t2 + t4
a := t5
The reason for the term “Three Address Code” is that each
statement usually contains three addresses, two for the
operands and one for the result.
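One way to see how explicit names come to stand for the interior nodes is to walk the syntax tree bottom-up and give every interior node a fresh temporary. The sketch below assumes a hypothetical tuple encoding of the tree; the helper names gen_tac and translate_assignment are illustrative only.

```python
import itertools

def gen_tac(node, code, temps):
    """Emit statements for node into code and return the name holding its value."""
    if isinstance(node, str):
        return node                      # names and constants need no statement
    op, *children = node
    args = [gen_tac(child, code, temps) for child in children]
    temp = f"t{next(temps)}"             # one temporary per interior node
    if len(args) == 1:
        code.append(f"{temp} := {op} {args[0]}")
    else:
        code.append(f"{temp} := {args[0]} {op} {args[1]}")
    return temp

def translate_assignment(target, expr):
    code = []
    temps = itertools.count(1)
    result = gen_tac(expr, code, temps)
    code.append(f"{target} := {result}")
    return code

# a := b * -c + b * -c
product = ("*", "b", ("uminus", "c"))
tac = translate_assignment("a", ("+", product, product))
print("\n".join(tac))
```

Because this walks a tree rather than a DAG, the common subexpression b * -c is computed twice, as in the example above.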
Each Node of DAG contains a unique value.
It does not contain any cycles in it, hence called Acyclic.
Example : a = a + 10
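Directed Acyclic Graph Representation
Each node is a numbered row; operator rows refer to their children by row number.
1 | id  | entry to a
2 | num | 10
3 | +   | 1 | 2
4 | =   | 1 | 3
[Figure: DAG for a = a + 10, with = at the root, + below it, and leaves a and 10]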
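The numbered table above can be built by looking up each node before creating it, so that an identical node is reused rather than duplicated. This is a minimal sketch, assuming a hypothetical tuple encoding of the expression and treating digit strings as constants:

```python
def build_dag(node, table, index):
    """Return the 1-based row number of node, reusing an existing row if any."""
    if isinstance(node, str):
        # Assumption: digit strings are constants, everything else an identifier.
        key = ("num", node) if node.isdigit() else ("id", node)
    else:
        op, left, right = node
        key = (op, build_dag(left, table, index), build_dag(right, table, index))
    if key not in index:                 # create the node only if it is new
        table.append(key)
        index[key] = len(table)
    return index[key]

# a = a + 10
table, index = [], {}
build_dag(("=", "a", ("+", "a", "10")), table, index)
for number, row in enumerate(table, start=1):
    print(number, *row)
```

Both occurrences of a map to row 1, which is why the table has four rows instead of five.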
Directed Acyclic Graph Examples
1) ( a + b ) * ( a + b + c )
2) a + a * ( b - c ) + ( b - c ) * d
3) i = i + 10
4) ( ( ( a + a ) + ( a + a ) ) + ( ( a + a ) + ( a + a ) ) ) (Self Practice)
Representation
of Syntax Tree
a := b * -c + b * -c
A three address statement is an abstract form of
intermediate code.
Examples:
1) a * - (b + c)
2) a + b * c / e ↑ f + b * c
Quadruples and Triples
Quadruple fields: Location | Op | Arg1 | Arg2 | Result
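As an illustration of the two record layouts, the sketch below emits quadruples (with an explicit result field) and triples (where a result is referred to by the position of the statement that computes it) for a * - (b + c). The tuple tree encoding and the function names are assumptions made for this example.

```python
def to_quadruples(node, quads):
    """Append (op, arg1, arg2, result) rows; return the name holding the value."""
    if isinstance(node, str):
        return node
    op, *children = node
    args = [to_quadruples(child, quads) for child in children]
    result = f"t{len(quads) + 1}"
    quads.append((op, args[0], args[1] if len(args) > 1 else "", result))
    return result

def to_triples(node, triples):
    """Append (op, arg1, arg2) rows; a result is named by its position, e.g. (0)."""
    if isinstance(node, str):
        return node
    op, *children = node
    args = [to_triples(child, triples) for child in children]
    triples.append((op, args[0], args[1] if len(args) > 1 else ""))
    return f"({len(triples) - 1})"

# a * - (b + c)
expr = ("*", "a", ("uminus", ("+", "b", "c")))
quads, triples = [], []
to_quadruples(expr, quads)
to_triples(expr, triples)
for location, row in enumerate(quads):
    print(location, *row)
for location, row in enumerate(triples):
    print(f"({location})", *row)
```

Triples avoid entering temporaries into the symbol table, at the cost of making statements harder to reorder because references are positional.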
Synthesized Attribute
Production | Semantic rules
L → E n | Print (E.val)
E → E1 + T | E.val = E1.val + T.val
E → T | E.val = T.val
T → T1 * F | T.val = T1.val * F.val
T → F | T.val = F.val
F → ( E ) | F.val = E.val
F → digit | F.val = digit.lexval
Example: Simple desk calculator
String: 3*5+4n
[Figure: annotated parse tree for 3*5+4n, showing digit.lexval=3, E.val=15, T.val=4 and E.val=19 at the root E]
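The annotated values can be reproduced by computing each val from the vals of the children. The sketch below is a small recursive-descent evaluator; as an assumption for the sake of a runnable example, the left-recursive productions are handled with loops, and only single-digit operands are supported.

```python
def evaluate(line):
    """Return E.val for input of the form 'E n', where n marks the end of line."""
    tokens = list(line)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_f():
        if peek() == "(":
            advance()
            val = parse_e()          # F.val = E.val
            advance()                # skip ')'
            return val
        val = int(peek())            # F.val = digit.lexval
        advance()
        return val

    def parse_t():
        val = parse_f()              # T.val = F.val
        while peek() == "*":
            advance()
            val = val * parse_f()    # T.val = T1.val * F.val
        return val

    def parse_e():
        val = parse_t()              # E.val = T.val
        while peek() == "+":
            advance()
            val = val + parse_t()    # E.val = E1.val + T.val
        return val

    val = parse_e()
    if peek() != "n":
        raise SyntaxError("expected end marker n")
    return val                       # L -> E n would print this value

print(evaluate("3*5+4n"))
```

Every value flows upward from children to parent, which is what makes val a synthesized attribute.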
Inherited Attribute
An inherited attribute at a node in a parse tree is defined in terms of attributes at the parent and/or siblings of the node.
Inherited attributes are a convenient way of expressing the dependency of a programming language construct on the context in which it appears.
We can use inherited attributes to keep track of whether an identifier appears on the left or right side of an assignment, to decide whether the address or the value of the identifier is needed.
The inherited attribute distributes type information to the various identifiers in a declaration.
Production | Semantic rules
D → T L | L.in = T.type
T → real | T.type = real
L → L1 , id | L1.in = L.in, addtype(id.entry, L.in)
L → id | addtype(id.entry, L.in)
[Figure: parse tree for real id1, id2, id3, with T.type = real and L.in = real at every L node]
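To illustrate how the type travels downward, the sketch below passes L.in as a function argument through the list of identifiers and calls addtype at each id. The dictionary used as the symbol table and the function declare are assumptions made for this example.

```python
def declare(type_name, identifiers):
    """D -> T L: T.type is computed first, then handed down as L.in."""
    symbol_types = {}

    def addtype(entry, inherited_type):
        symbol_types[entry] = inherited_type

    def visit_list(ids, l_in):
        # L -> L1 , id : L1.in = L.in, then addtype(id.entry, L.in)
        # L -> id      : addtype(id.entry, L.in)
        if len(ids) > 1:
            visit_list(ids[:-1], l_in)
        addtype(ids[-1], l_in)

    visit_list(identifiers, type_name)   # L.in = T.type
    return symbol_types

print(declare("real", ["id1", "id2", "id3"]))
```

The parameter l_in plays the role of the inherited attribute: it is supplied by the parent call and never computed from the children.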
Difference
Synthesized attribute | Inherited attribute
The value of a synthesized attribute at a node can be computed from the values of attributes at the children of that node in the parse tree. | The value of an inherited attribute at a node can be computed from the values of attributes at the parent and/or siblings of the node.
Passes information from bottom to top in the parse tree. | Passes information from top to bottom in the parse tree, or from left siblings to right siblings.
Synthesized attributes are used by both S-attributed SDT and L-attributed SDT. | Inherited attributes are used only by L-attributed SDT.
Run time Environment
A compiler must accurately implement the abstractions embodied in the source language definition. These abstractions typically include concepts such as names, scopes, bindings, data types, operators, procedures, parameters, and flow-of-control constructs.
The compiler must cooperate with the operating system and other systems software to support these abstractions on the target machine.
To do so, the compiler creates and manages a run-time environment in which it assumes its target programs are being executed.
Source Language Issues
1. Procedure Call:-
A procedure definition is a declaration that associates an identifier with a statement.
The identifier is the procedure name and the statement is the procedure body.
For example, the following is the definition of a procedure named readarray:
procedure readarray;
var i : integer;
begin
    for i := 1 to 9 do read(a[i])
end;
When a procedure name appears within an executable statement, the procedure is said to be called at that point.
2. Activation tree
Each execution of a procedure body is referred to as an
activation of the procedure.
The lifetime of an activation of a procedure p is the
sequence of steps between the first and last steps in the
execution of the procedure body.
An activation tree is used to depict the way control enters and leaves activations. In an activation tree,
1. Each node represents an activation of a procedure.
2. The root represents the activation of the main program.
3. The node for a is the parent of the node for b if and only if control flows from activation a to b.
4. The node for a is to the left of the node for b if and only if the lifetime of a occurs before the lifetime of b.
3. Control Stack
The flow of control in a program corresponds to a depth
first traversal of the activation tree that starts at the root,
visits a node before its children, and recursively visits
children at each node in a left to right order.
A control stack is used to keep track of live procedure
activations.
The idea is to push the node for an activation onto the control stack as the activation begins and to pop the node when the activation ends.
The contents of the control stack are related to paths to the root of the activation tree.
When node n is at the top of the control stack, the stack contains the nodes along the path from n to the root.
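The push-on-entry, pop-on-exit discipline can be observed by instrumenting a few procedures. In this sketch the decorator activation, the list control_stack, and the procedures main and fact are all hypothetical names chosen for the illustration.

```python
control_stack = []
trace = []

def activation(proc):
    """Wrap proc so each call pushes its activation and pops it on return."""
    def wrapper(*args):
        arg_text = ", ".join(map(str, args))
        control_stack.append(f"{proc.__name__}({arg_text})")
        trace.append(list(control_stack))    # snapshot: path from the root to this node
        try:
            return proc(*args)
        finally:
            control_stack.pop()              # the activation ends
    return wrapper

@activation
def fact(n):
    return 1 if n == 0 else n * fact(n - 1)

@activation
def main():
    return fact(2)

main()
for snapshot in trace:
    print(" -> ".join(snapshot))
```

Each printed line is the stack at the moment an activation begins, read from the root of the activation tree down to the node on top of the stack.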
4. The Scope of a Declaration
A declaration is a syntactic construct that associates information with a name. Declarations may be explicit, such as:
var i : integer ;
Or they may be implicit.
For example, any variable name starting with I is assumed to denote an integer in a Fortran program unless otherwise declared.
The portion of the program to which a declaration applies is called the scope of that declaration.
An occurrence of a name in a procedure is said to be local to the procedure if it is in the scope of a declaration within the procedure; otherwise, the occurrence is said to be non local.
5. Binding of names
Even if each name is declared once in a program, the same
name may denote different data objects at run time.
“Data object” corresponds to a storage location that holds
values.
The term environment refers to a function that maps a name to a storage location.
The term state refers to a function that maps a storage location to the value held there.
When an environment associates storage location s with a
name x, we say that x is bound to s.
This association is referred to as a binding of x.
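The two-stage mapping can be modelled with two dictionaries, one for the environment and one for the state. The location numbers 100 and 200 below are arbitrary values assumed for the illustration.

```python
environment = {}   # name -> storage location
state = {}         # storage location -> value

def bind(name, location):
    """Change the environment: name is now bound to location."""
    environment[name] = location

def assign(name, value):
    """Change the state: the location bound to name now holds value."""
    state[environment[name]] = value

def lookup(name):
    return state[environment[name]]

bind("x", 100)
assign("x", 3)
bind("x", 200)     # a new binding of x, e.g. in another activation
assign("x", 7)
print(lookup("x"), state)
```

An assignment changes the state but not the environment, while a new binding changes the environment and leaves the old location and its value intact.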
Activation Record
The activation record is a block of memory used for managing information needed by a single execution of a procedure.
Fortran uses the static data area to store the activation record, whereas in Pascal and C the activation record is located in the stack area.
1. Temporary values: Temporary variables are needed during the evaluation of expressions. Such variables are stored in the temporary field of the activation record.
2. Local variables: Data that is local to the execution of the procedure is stored in this field of the activation record.
3. Saved machine registers: This field holds information regarding the status of the machine just before the procedure is called. It contains the machine registers and the program counter.
4. Control link: This field is optional. It points to the activation record of the calling procedure. This link is also called the dynamic link.
5. Access link: This field is also optional. It refers to the non local data in other activation records. This field is also called the static link field.
6. Actual parameters: This field holds information about the actual parameters. These actual parameters are passed to the called procedure.
7. Return values: This field is used to store the result of a function call.
Static memory allocation | Dynamic memory allocation
Memory is allocated before the execution of the program begins. | Memory is allocated during the execution of the program.
No memory allocation or de-allocation actions are performed during execution. | Memory bindings are established and destroyed during execution.
Execution speed is faster compared to dynamic allocation. | Execution speed is slower compared to static allocation.
A pointer is needed to access the variable. | No need of dynamically allocated pointers.
Parameter Passing Methods
2. Call by Reference:
This method is also called call by address or call by location.
In call-by-reference, the address of the actual parameter is passed to the callee as the value of the corresponding formal parameter.
Uses of the formal parameter in the code of the callee are implemented by following this pointer to the location indicated by the caller.
Changes to the formal parameter thus appear as changes to the actual parameter.
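Python has no address-of operator, so the sketch below simulates storage with a list named memory (a hypothetical stand-in) and passes list indices as addresses. It contrasts a swap that receives copies of the values with a swap that receives the addresses of the actual parameters.

```python
memory = [10, 20]          # two storage locations, at addresses 0 and 1

def swap_by_value(x, y):
    # x and y are copies of the actual values; the caller's storage is untouched.
    x, y = y, x

def swap_by_reference(addr_x, addr_y):
    # The formals hold addresses; every use follows the address into memory.
    memory[addr_x], memory[addr_y] = memory[addr_y], memory[addr_x]

swap_by_value(memory[0], memory[1])
after_value = list(memory)
swap_by_reference(0, 1)
after_reference = list(memory)
print(after_value, after_reference)
```

Only the second call changes the caller's data, because the callee writes through the addresses it was given.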
[Table: comparison of Call by Value and Call by Reference; columns: Parameters | Call by Value | Call by Reference]
4. Call by Name:
This is a less popular method of parameter passing.
The procedure is treated like a macro. The procedure body is substituted for the call in the caller, with the actual parameters substituted for the formals.
The actual parameters can be surrounded by parentheses to preserve their integrity.
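Textual substitution means the actual parameter expression is evaluated again at every use of the formal. One common way to model this without macros is to pass a parameterless function (a thunk) in place of the value; the sketch below uses that technique, and all names in it are hypothetical.

```python
evaluations = []

def next_reading():
    """An actual parameter expression with a visible side effect."""
    evaluations.append("read")
    return len(evaluations)

def twice_by_value(x):
    return x + x                       # x was evaluated once, before the call

def twice_by_name(x_thunk):
    # Each use of the formal re-evaluates the actual parameter expression.
    return x_thunk() + x_thunk()

by_value = twice_by_value(next_reading())         # one evaluation: 1 + 1
by_name = twice_by_name(lambda: next_reading())   # two evaluations: 2 + 3
print(by_value, by_name, len(evaluations))
```

The different results show why call by name and call by value can disagree whenever the actual parameter has side effects.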
Symbol table
Definition:
A symbol table is a data structure used by the compiler to keep track of the semantics of variables. That is, the symbol table stores scope and binding information about names.
The symbol table is built during the lexical and syntax analysis phases.
Symbol table entries:
1) Variable names
2) Constants
3) Function names
4) Procedure names
Data structure for Symbol table
1) List data structure:
A linear list is the simplest mechanism to implement the symbol table.
In this method an array is used to store names and associated information.
New names are added in the order in which they arrive.
A pointer "available" is maintained at the end of all stored records.
To search for a name we scan from the beginning of the list up to the available pointer; if the name is not found we report the error "use of undeclared name".
While inserting a new name we must ensure that it is not already present; otherwise the error "multiply defined name" occurs.
The advantage is that it takes a minimum amount of space.
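A minimal sketch of the list organization, assuming each record is a (name, info) pair and that the two errors are reported as exceptions; the class name LinearSymbolTable is hypothetical.

```python
class LinearSymbolTable:
    def __init__(self):
        self.records = []              # (name, info) pairs in arrival order

    @property
    def available(self):
        return len(self.records)       # index just past the last stored record

    def lookup(self, name):
        # Scan from the beginning of the list up to the available pointer.
        for stored_name, info in self.records[: self.available]:
            if stored_name == name:
                return info
        raise KeyError(f"use of undeclared name: {name}")

    def insert(self, name, info):
        if any(stored == name for stored, _ in self.records):
            raise KeyError(f"multiply defined name: {name}")
        self.records.append((name, info))

table = LinearSymbolTable()
table.insert("i", "integer")
table.insert("rate", "real")
print(table.lookup("rate"), table.available)
```

Both lookup and insert scan the list, so the cost per operation grows linearly with the number of names; the saving is in space, not time.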
2) Self organizing list:
This symbol table implementation uses a linked list. A link field is added to each record.
We search the records in the order given by the link fields.
A pointer "first" is maintained to point to the first record of the symbol table.
For example, following the links from first, the names may be reached in the order name 3, name 1, name 4 and name 2.
When a name is referenced or created it is moved to the front of the list.
The most frequently referenced names will tend to be at the front of the list. Hence the access time for the most referenced names will be the least.
[Figure: linked records reached from the pointer first]
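The move-to-front behaviour can be sketched with a singly linked list. The class and field names below (Record, SelfOrganizingList, link, first) follow the description but are otherwise assumptions.

```python
class Record:
    def __init__(self, name, info, link=None):
        self.name, self.info, self.link = name, info, link

class SelfOrganizingList:
    def __init__(self):
        self.first = None

    def insert(self, name, info):
        self.first = Record(name, info, self.first)   # a new name goes to the front

    def lookup(self, name):
        prev, cur = None, self.first
        while cur is not None and cur.name != name:
            prev, cur = cur, cur.link
        if cur is None:
            raise KeyError(f"use of undeclared name: {name}")
        if prev is not None:           # unlink the record and move it to the front
            prev.link = cur.link
            cur.link = self.first
            self.first = cur
        return cur.info

    def names(self):
        out, cur = [], self.first
        while cur is not None:
            out.append(cur.name)
            cur = cur.link
        return out

symbols = SelfOrganizingList()
for n in ["name1", "name2", "name3", "name4"]:
    symbols.insert(n, "integer")
symbols.lookup("name2")
print(symbols.names())
```

After the lookup, name2 is the first record, so a repeated reference to it is found in one step.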
3) Hash Tables:
Hashing is the most commonly used method to implement symbol tables. In a hashing scheme two tables are maintained: a hash table and a symbol table.
A hash table is an array with index range 0 to t_size - 1.
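A minimal sketch of the two-table scheme: the hash table holds, for each index in 0 to t_size - 1, a chain of positions in the symbol table. The table size, the hash function, and the use of chaining for collisions are assumptions made for this example.

```python
T_SIZE = 8   # hypothetical table size

def hash_name(name):
    # Illustrative hash only: sum of character codes modulo the table size.
    return sum(ord(ch) for ch in name) % T_SIZE

hash_table = [[] for _ in range(T_SIZE)]   # each slot chains symbol-table positions
symbol_table = []                          # records: (name, info)

def insert(name, info):
    slot = hash_table[hash_name(name)]
    if any(symbol_table[i][0] == name for i in slot):
        raise KeyError(f"multiply defined name: {name}")
    symbol_table.append((name, info))
    slot.append(len(symbol_table) - 1)

def lookup(name):
    # Only the records chained from one slot are compared.
    for i in hash_table[hash_name(name)]:
        if symbol_table[i][0] == name:
            return symbol_table[i][1]
    raise KeyError(f"use of undeclared name: {name}")

insert("a", "integer")
insert("i", "real")        # collides with "a" under this hash function
print(lookup("a"), lookup("i"), hash_name("a"), hash_name("i"))
```

With a reasonable hash function each chain stays short, so lookups inspect only a few records instead of the whole symbol table.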
Questions
1) Explain Quadruples and Triples form of three address code with example.
2) Draw a DAG for expression: a + a * (b - c) + (b - c) * d.
3) Construct syntax tree and DAG for following expression.
a = (b+c+d) * (b+c-d) + a
4) Explain quadruples, triples and indirect triples with examples.
5) Translate the following arithmetic expression ( a * b ) + ( c + d ) - ( a + b ) into
1] Quadruples
2] Triples
3] Indirect Triples
6) Draw syntax tree and DAG for the statement
x=(a+b)*(a+b+c)*(a+b+c+d)
7) What is Symbol Table?
8) Differentiate Static and Dynamic Memory Allocation
9) Explain Activation record.
10) Explain Parameter passing methods with example.
Thanks
Prof. Jaydip Siyara