
Please read this disclaimer before proceeding:

This document is confidential and intended solely for the educational purpose of RMK Group of Educational Institutions. If you have received this document through email in error, please notify the system manager. This document contains proprietary information and is intended only for the respective group / learning community as intended. If you are not the addressee, you should not disseminate, distribute or copy it through e-mail. Please notify the sender immediately by e-mail if you have received this document by mistake, and delete this document from your system. If you are not the intended recipient, you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.
1. CONTENTS

1. Course Objectives
2. Pre Requisites
3. Syllabus
4. Course Outcomes
5. CO - PO/PSO Mapping
6. Lecture Plan
7. Activity Based Learning
8. UNIT IV Lecture Notes
9. Assignments
10. Part A Questions & Answers
11. Part B Questions
12. Supportive Online Certification Courses
13. Real Time Applications
14. Contents Beyond the Syllabus
15. Assessment Schedule
16. Prescribed Text Books & Reference Books
17. Mini Project Suggestions
21CS601 Compiler Design

Department: CSE
Batch/Year: 2021-25 / III

Created by:
Dr. P. EZHUMALAI, Prof & Head/RMDEC
Dr. A. K. JAITHUNBI, Associate Professor/RMDEC
V. SHARMILA, Assistant Professor/RMDEC

Date: 26.02.2024
2. COURSE OBJECTIVES

• To study the different phases of a compiler.
• To understand the techniques for tokenization and parsing.
• To understand the conversion of a source program into an intermediate representation.
• To learn the different techniques used for assembly code generation.
• To analyze various code optimization techniques.
3. PRE REQUISITES

• Pre-requisite Chart (foundation courses leading to 21CS601 - COMPILER DESIGN):

21GE101 - Problem Solving and C Programming
21CS02 - Python Programming (Lab Integrated)
21CS201 - Data Structures
21MA302 - Discrete Mathematics
21CS503 - Theory of Computation
-> 21CS601 - Compiler Design
4. SYLLABUS

21CS601 COMPILER DESIGN (Lab Integrated)    L T P C
                                            3 0 2 4
OBJECTIVES

• To study the different phases of a compiler.
• To understand the techniques for tokenization and parsing.
• To understand the conversion of a source program into an intermediate representation.
• To learn the different techniques used for assembly code generation.
• To analyze various code optimization techniques.

UNIT I INTRODUCTION TO COMPILERS 9
Introduction - Structure of a Compiler - Role of the Lexical Analyzer - Input Buffering - Specification of Tokens - Recognition of Tokens - The Lexical Analyzer Generator LEX - Finite Automata - From Regular Expressions to Automata - Conversion from NFA to DFA, Epsilon NFA to DFA - Minimization of Automata.

UNIT II SYNTAX ANALYSIS 9
Role of the Parser - Context-free Grammars - Derivation Trees - Ambiguity in Grammars and Languages - Writing a Grammar - Top-Down Parsing - Bottom-Up Parsing - LR Parser - SLR, CLR - Introduction to LALR Parser - Parser Generators - Design of a Parser Generator - YACC.

UNIT III INTERMEDIATE CODE GENERATION 9
Syntax Directed Definitions - Evaluation Orders for Syntax Directed Definitions - Application of Syntax Directed Translation - Intermediate Languages - Syntax Tree - Three Address Code - Types and Declarations - Translation of Expressions - Type Checking.

UNIT IV RUN-TIME ENVIRONMENT AND CODE GENERATION 9
Run Time Environment: Storage Organization - Stack Allocation of Space - Access to Nonlocal Data on Stack - Heap Management - Parameter Passing - Issues in Code Generation - Design of a Simple Code Generator - Code Generator using DAG - Dynamic Programming Based Code Generation.

UNIT V CODE OPTIMIZATION 9
Principal Sources of Optimization - Peephole Optimization - Register Allocation and Assignment - DAG - Basic Blocks and Flow Graph - Optimization in Basic Blocks - Data Flow Analysis.
4. SYLLABUS

LIST OF EXPERIMENTS:
1. Develop a lexical analyzer to recognize a few patterns in C (e.g. identifiers, constants, comments, operators etc.). Create a symbol table while recognizing identifiers.
2. Design a lexical analyzer for the given language. The lexical analyzer should ignore redundant spaces, tabs and new lines, comments etc.
3. Implement a Lexical Analyzer using the Lex tool.
4. Design a Predictive Parser for the given language.
5. Implement an Arithmetic Calculator using LEX and YACC.
6. Generate three address code for a simple program using LEX and YACC.
7. Implement simple code optimization techniques (constant folding, strength reduction and algebraic transformation).
8. Implement the back-end of the compiler, for which the three address code is given as input and the 8086 assembly language code is produced as output.
5. COURSE OUTCOMES

• At the end of the course, the student should be able to:

CO1: Understand the different phases of compiler and identify the tokens using automata. (HKL: K2)
CO2: Construct the parse tree and check the syntax of the given source program. (HKL: K3)
CO3: Generate intermediate code representation for any source programs. (HKL: K4)
CO4: Analyze the different techniques used for assembly code generation. (HKL: K4)
CO5: Implement code optimization techniques with simple code generators. (HKL: K3)

• HKL = Highest Knowledge Level
6. CO - PO / PSO MAPPING

(Knowledge/attitude levels associated with the program outcomes in the source table: PO-1: K3, PO-2: K4, PO-3: K5, PO-4: K5, PO-5: K3/K5, PO-6: K4/A3, PO-7: A2, PO-8: A3, PO-9: A3, PO-10: A3, PO-11: A3, PO-12: A2.)

CO    HKL  PO-1 PO-2 PO-3 PO-4 PO-5 PO-6 PO-7 PO-8 PO-9 PO-10 PO-11 PO-12  PSO1 PSO2 PSO3
CO1   K2    3    2    1    -    -    -    -    1    1    1     -     1      2    -    -
CO2   K3    3    2    1    -    -    -    -    1    1    1     -     1      2    -    -
CO3   K4    3    2    1    -    -    -    -    1    1    1     -     1      2    -    -
CO4   K4    3    2    1    -    -    -    -    1    1    1     -     1      2    -    -
CO5   K3    3    2    1    -    -    -    -    1    1    1     -     1      2    -    -

• Correlation Level: 1. Slight (Low), 2. Moderate (Medium), 3. Substantial (High). If there is no correlation, put "-".
7. LECTURE PLAN
UNIT - IV RUN-TIME ENVIRONMENT AND CODE GENERATION

Columns: S.No | Topic | Pertaining CO(s) | Highest Cognitive Level | Mode of Delivery | Delivery Resources | LU Outcome (after successful completion of the course, the students should be able to). Proposed/actual lecture periods and remarks are filled in during delivery.

1 | Storage Organization | CO313.4 | K2 | MD1 | T1 | understand the storage organization concepts
2 | Stack Allocation Space | CO313.4 | K2 | MD1, MD5 | T1 | describe stack allocation space
3 | Access to Non-local Data on the Stack | CO313.4 | K3 | MD1 | T1 | use procedures to access non-local data on the stack
4 | Heap Management | CO313.4 | K2 | MD1, MD2 | T1 | get an idea of managing heap storage
5 | Parameter Passing | CO313.4 | K2 | MD1, MD2 | T1 | compare different parameter passing methods
6 | Issues in Code Generation | CO313.4 | K2 | MD1 | T1 | identify the issues in code generation
7 | Design of a Simple Code Generator | CO313.4 | K2 | MD1 | T1 | design a code generator using DAG
8 | Dynamic Programming Based Code Generation | CO313.4 | K3 | MD1 | T1 | design a code generator using dynamic programming
8. ACTIVITY BASED LEARNING : UNIT - IV

• To understand the basic concepts of compilers, students can take a quiz as an activity.
• Link: https://create.kahoot.it/share/cs8602-unit-4/2e5c742f-541a-4bd0-84b2-9e2ca3b47170
• Join at www.kahoot.it or with the Kahoot! app, and use the game pin to play the quiz.

Hands-on Assignment:

1. Visualize the use/liveness of variables in programs; illustrate it with examples.

2. Here is a sketch of two C functions f and g:

int f(int x) { int i; ... return i+1; ... }
int g(int y) { int j; ... f(j+1) ... }

That is, function g calls f. Draw the top of the stack, starting with the activation record for g, after g calls f and f is about to return. You need to consider only return values, parameters, control links, and space for local variables; you do not have to consider stored state or temporary or local values not shown in the code sketch. However, you should indicate:

• Which function creates the space on the stack for each element?
• Which function writes the value of each element?
• To which activation record does the element belong?
9. LECTURE NOTES : UNIT - IV

STORAGE ORGANIZATION

o When the target program executes, it runs in its own logical address space, in which each program value has a location.

o The logical address space is shared among the compiler, operating system and target machine for management and organization. The operating system maps the logical addresses into physical addresses, which are usually spread throughout memory.

SUBDIVISION OF RUN-TIME MEMORY

Run-time storage comes in blocks, where a byte is the smallest unit of addressable memory and four bytes form a machine word. Multibyte objects are stored in consecutive bytes and addressed by their first byte.
Run-time storage can be subdivided to hold the different components of an executing program:
• Generated executable code
• Static data objects
• Dynamic data objects - heap
• Automatic data objects - stack
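
A typical arrangement of these components (a sketch; the exact layout and growth directions are machine- and OS-dependent) places code and static data at fixed addresses, with the heap and stack growing toward each other through the free space between them:

    +------------------+  low addresses
    |       Code       |  generated executable code
    +------------------+
    |      Static      |  static data objects
    +------------------+
    |       Heap       |  dynamic data objects
    |        |         |  (grows toward the stack)
    |        v         |
    |   Free memory    |
    |        ^         |
    |        |         |  (grows toward the heap)
    |       Stack      |  automatic data objects / activation records
    +------------------+  high addresses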

Run-time Storage Organization

The run-time environment is the structure of the target computer's registers and memory that serves to manage memory and maintain the information needed to guide a program's execution.

Types of Runtime Environments:

1. Fully Static:
A fully static runtime environment may be useful for languages in which pointers or dynamic allocation are not possible, and recursive function calls are not supported.
Every procedure has only one activation record, which is allocated before execution.
Variables are accessed directly via fixed addresses.
There is little bookkeeping overhead; i.e., at most a return address may have to be stored in the activation record.
The calling sequence involves calculating each argument address, storing the arguments into their parameter locations, saving the return address, and then making a jump.

2. Stack Based:
In this scheme, an activation record is allocated (pushed) whenever a function call is made. The necessary memory is taken from the stack portion of the program. When program execution returns from the function, the memory used by the activation record is deallocated (popped). Thus, the stack grows and shrinks with the chain of function calls.

3. Fully Dynamic:
Functional languages such as Lisp, ML, etc. use this style of call stack management. Here an activation record is deallocated only when all references to it have disappeared, which requires activation records to be dynamically freed at arbitrary times during execution. A memory manager (garbage collector) is needed. The data structure that handles such management is the heap, and this is also called Heap Management.
Activation Record:

• Information needed by a single execution of a procedure is managed using a contiguous block of storage called an "activation record".

• An activation record is allocated when a procedure is entered and deallocated when that procedure is exited. It contains temporary data, local data, machine status, an optional access link, an optional control link, actual parameters and returned values.

∙ Program Counter (PC) - whose value is the address of the next instruction to be executed.

∙ Stack Pointer (SP) - whose value is the address of the top of the stack (ToS).

∙ Frame Pointer (FP) - which points to the current activation record.

The control stack is a run-time stack used to keep track of live procedure activations, i.e., it is used to find the procedures whose execution has not yet been completed.

When a procedure is called (activation begins), its name is pushed onto the stack; when it returns (activation ends), it is popped.

An activation record is pushed onto the stack when a procedure is called, and popped when control returns to the caller.

The contents of an activation record are:

1. RETURN VALUE
2. ACTUAL PARAMETERS
3. CONTROL LINK
4. ACCESS LINK
5. SAVED MACHINE STATUS
6. LOCAL DATA
7. TEMPORARIES
Return Value: Used by the called procedure to return a value to the calling procedure.

Actual Parameters: Used by the calling procedure to supply parameters to the called procedure.

Control Link: Points to the activation record of the caller.

Access Link: Used to refer to non-local data held in other activation records.

Saved Machine Status: Holds information about the status of the machine just before the procedure is called.

Local Data: Holds data that is local to the execution of the procedure.

Temporaries: Stores values that arise during the evaluation of an expression.
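
As a concrete picture, the fields above can be sketched as a C struct (field names and sizes here are illustrative assumptions; real layouts are fixed by the compiler and ABI, not declared by the programmer):

#include <stddef.h>

struct activation_record {
    int   return_value;            /* value handed back to the caller       */
    int   actual_params[4];        /* arguments supplied by the caller      */
    void *control_link;            /* -> caller's activation record         */
    void *access_link;             /* -> record holding non-local data      */
    long  saved_machine_status[8]; /* saved registers, return address, etc. */
    int   local_data[8];           /* locals of this activation             */
    int   temporaries[4];          /* intermediates from expression eval    */
};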

ACTIVATION TREES

Stack allocation would not be feasible if procedure calls, or activations of procedures, did not nest in time.

The activations of procedures during the running of an entire program can be represented by a tree, called an activation tree.

The use of a run-time stack is enabled by several useful relationships between the activation tree and the behavior of the program:

1. The sequence of procedure calls corresponds to a preorder traversal of the activation tree.

2. The sequence of returns corresponds to a postorder traversal of the activation tree.

3. Suppose that control lies within a particular activation of some procedure, corresponding to a node N of the activation tree. Then the activations that are currently live are those corresponding to N and its ancestors.

Calling sequences

Procedure calls are implemented by what are known as calling sequences, which consist of code that allocates an activation record on the stack and enters information into its fields.
Properties of activation trees:
• Each node represents an activation of a procedure.
• The root represents the activation of the main function.
• The node for procedure x is the parent of the node for procedure y if and only if control flows from procedure x to procedure y.

Example - Consider the following sketch of Quicksort:

main() {
    int n;
    readarray();
    quicksort(1,n);
}

quicksort(int m, int n)
{
    int i = partition(m,n);
    quicksort(m,i-1);
    quicksort(i+1,n);
}

The activation tree for this program will be:
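
A possible shape of the tree for one run (assuming the first call to partition returns some index i; the recursive quicksort calls expand further in the same pattern):

main
+-- readarray
+-- quicksort(1,n)
    +-- partition(1,n)
    +-- quicksort(1,i-1)
    +-- quicksort(i+1,n)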

First, the main function is the root; main calls readarray and quicksort. Quicksort in turn calls partition and quicksort again. The flow of control in a program corresponds to a depth-first traversal of the activation tree, which starts at the root.
Example 2: Activation Trees

Recall the Fibonacci sequence 1, 1, 2, 3, 5, 8, ... defined by f(1) = f(2) = 1 and, for n > 2, f(n) = f(n-1) + f(n-2). Consider the function calls that result from a main program calling f(5). The calls and returns can be listed in a linear fashion or shown in tree form; the latter is sometimes called the activation tree or call tree.
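
A sketch of the call tree for f(5) (each node calls its two children, left before right):

f(5)
+-- f(4)
|   +-- f(3)
|   |   +-- f(2)
|   |   +-- f(1)
|   +-- f(2)
+-- f(3)
    +-- f(2)
    +-- f(1)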

We can make the following observations about these procedure calls.

1. If an activation of p calls q, then that activation of p terminates no earlier than the activation
of q.
2. The order of activations (procedure calls) corresponds to a preorder traversal of the call tree.
3. The order of de-activations (procedure returns) corresponds to postorder traversal of the call
tree.
4. If execution is currently in an activation corresponding to a node N of the activation tree,
then the activations that are currently live are those corresponding to N and its ancestors in
the tree.
5. These live activations were called in the order given by the root-to-N path in the tree, and the
returns will occur in the reverse order.

STORAGE ALLOCATION TECHNIQUES

I. Static Storage Allocation

○ For any program, if we create memory at compile time, memory will be created in the static area.
○ Memory created at compile time is created only once; it is deallocated only after program completion.
○ It does not support dynamic data structures, i.e., the static allocation cannot manage the allocation of memory at run-time.
○ A drawback of static storage allocation is that recursion is not supported.
○ Another drawback is that the size of the data must be known at compile time.

Eg: FORTRAN was designed to permit static storage allocation.

II. Stack Storage Allocation

● Storage is organised as a stack, and activation records are pushed and popped as activations begin and end respectively. Locals are contained in activation records, so they are bound to fresh storage in each activation.
● The stack is dynamic: it grows and shrinks.
● When node n is at the top of the control stack, the stack contains the nodes along the path from n to the root.
● Recursion is supported in stack allocation.
● Limitation: memory addressing is done using pointers and index registers, so this type of allocation is slower than static allocation.
● The flow of control in a program corresponds to a depth-first traversal of the activation tree that:
  ○ starts at the root,
  ○ visits a node before its children, and
  ○ recursively visits the children at each node in left-to-right order.
Calling sequences (division of tasks between caller and callee):

The calling sequence, executed when one procedure (the caller) calls another (the callee), allocates an activation record (AR) on the stack and fills in its fields. Part of this work is done by the caller; the remainder by the callee. Although the work is shared, the AR is called the callee's AR.

The following actions occur during a call:

1. The caller begins the process of creating the callee's AR by evaluating the arguments and placing them in the AR of the callee.

2. The caller stores the return address and the (old) sp in the callee's AR.

3. The caller increments sp so that, instead of pointing into its own AR, it points to the corresponding point in the callee's AR.

4. The callee saves the registers and other system-dependent information.

5. The callee allocates and initializes its local area.

6. The callee begins execution.

Return Sequence (callee's responsibility)

When the procedure returns, the following actions are performed by the callee, essentially undoing the effects of the calling sequence:

1. The callee stores the return value. Note that this address can be determined by the caller using the old (soon to be restored) sp.

2. The callee restores sp and the registers.

3. The callee jumps to the return address.
III. Heap Storage Allocation

• Memory allocation and deallocation can be done at any time and at any place, depending on the requirements of the user.
• Heap allocation is used to dynamically allocate memory to variables and claim it back when the variables are no longer required.
• If the values of non-local variables must be retained even after the activation record ends, such retention is not possible with stack allocation, because of its Last-In First-Out nature. For retaining such variables, the heap allocation strategy is used.
• Heap allocation allocates a continuous block of memory when required, for storage of activation records or other data objects. This allocated memory can be deallocated when the activation ends, and the deallocated space can then be reused by the heap manager.
• Efficient heap management can be done by keeping a linked list of the free blocks; when any memory is deallocated, that block of memory is appended to the linked list.
• Allocate the most suitable block of memory from the linked list, i.e., use the best-fit technique for allocation of a block.
• Recursion is supported.
Variable-Length Data on the Stack

There are two flavors of variable-length data.

• Data obtained by malloc/new have hard-to-determine lifetimes and are stored in the heap instead of the stack.

• Data such as arrays with bounds determined by the parameters are still stack-like in their lifetimes (if A calls B, these variables of A are allocated before, and released after, the corresponding variables of B).

It is the second flavor that we wish to allocate on the stack. The goal is for the callee to be able to access these arrays using addresses determined at compile time, even though the size of the arrays is not known until the procedure is called, and indeed often differs from one call to the next (even when the two calls correspond to the same source statement).

The solution is to leave room for pointers to the arrays in the AR. These pointers are fixed size and can thus be accessed using static offsets. When the procedure is invoked and the sizes are known, the pointers are filled in and the space allocated.

A difficulty caused by storing these variable-size items on the stack is that it is no longer obvious where the real top of the stack is located relative to sp. Consequently another pointer (call it real-top-of-stack) is also kept. This is used on a call to tell where the new activation record should begin.
Variable-Length Data on the Stack (contd.)

The run-time memory management system must deal frequently with the allocation of space for objects whose sizes are not known at compile time, but which are local to a procedure and thus may be allocated on the stack.

In modern languages, objects whose size cannot be determined at compile time are allocated space in the heap.

Variable-length arrays are arrays whose size depends on the value of one or more parameters of the called procedure.
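
A minimal C99 sketch of the second flavor (function names here are illustrative): the array's extent depends on a parameter, the compiler keeps a fixed-offset descriptor for it in the AR, and the storage disappears when the activation ends.

#include <stdio.h>

void callee(int n) {
    int a[n];                    /* variable-length array: size fixed at call time */
    for (int i = 0; i < n; i++)
        a[i] = i * i;
    printf("last element: %d\n", a[n - 1]);
}                                /* storage for a is released with the activation */

int main(void) {
    callee(4);                   /* different calls, different sizes,  */
    callee(9);                   /* same source statement              */
    return 0;
}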

ACCESS TO NONLOCAL DATA ON THE STACK

Access becomes more complicated in languages where procedures can be declared inside other procedures.

1. Data access without nested procedures
For languages that do not allow nested procedure declarations, allocation of storage for variables and access to those variables is simple:
• Global variables are allocated static storage.
• Any other name must be local to the activation at the top of the stack.

2. Issues with nested procedures
Access becomes far more complicated when a language allows procedure declarations to be nested and also uses the normal static scoping rule; that is, a procedure can access variables of the procedures whose declarations surround its own declaration, following the nested scoping rule.

3. A language with nested procedure declarations
ML is a language with nested procedures. ML is a functional language, meaning that variables, once declared and initialized, are not changed. There are only a few exceptions, such as the array, whose elements can be changed by special function calls.

4. Nesting depth
The function activations form a tree structure, and the number of levels from the root to a given node is its nesting depth. Nesting depth 1 is given to procedures that are not nested within any other procedure.
5. Access links
A direct implementation of the normal static scope rule for nested functions is obtained by adding a pointer called the access link to each activation record. If procedure p is nested immediately within procedure q in the source code, then the access link in any activation of p points to the most recent activation of q.

6. Manipulating access links
When a procedure call is to a particular procedure whose name is given explicitly in the procedure call, the access links can be set up directly. The harder case is when the call is to a procedure parameter; in that case, the particular procedure being called is not known until run time, and the nesting depth of the called procedure may differ in different executions of the call.

7. Access links for procedure parameters
When a procedure p is passed to another procedure q as a parameter, and q then calls its parameter (and therefore calls p in this activation of q), it is possible that q does not know the context in which p appears in the program. If so, it is impossible for q to know how to set the access link for p.
The solution to this problem is as follows: when procedures are used as parameters, the caller needs to pass, along with the name of the procedure parameter, the proper access link for that parameter.

8. Displays
The function activations are maintained in the form of an array for the purpose of display. A more efficient implementation uses an auxiliary array d, called the display, which consists of one pointer for each nesting depth.
The advantage of using a display is that if procedure p is executing and needs to access an element x belonging to some procedure q, we need to look only in d[i], where i is the nesting depth of q; we follow the pointer d[i] to the activation record for q, wherein x is found at a known offset.
For languages that do not allow nested procedure declarations, allocation of storage for variables and access to those variables is simple:
1. Global variables are allocated static storage. The locations of these variables remain fixed and are known at compile time. So to access any variable that is not local to the currently executing procedure, we simply use the statically determined address.
2. Any other name must be local to the activation at the top of the stack. We may access these variables through the top-of-stack pointer sp.

An important benefit of static allocation for globals is that declared procedures may be passed as parameters or returned as results (in C, a pointer to the function is passed), with no substantial change in the data-access strategy. With the C static-scoping rule, and without nested procedures, any name nonlocal to one procedure is nonlocal to all procedures, regardless of how they are activated. Similarly, if a procedure is returned as a result, then any nonlocal name refers to the storage statically allocated for it.

• The scope rules of a language determine the treatment of references to non-local names.
• Common rule: lexical or static scope determines the declaration that applies to a name by examining the program text alone.
  Eg: Pascal, Ada, C
• Alternative rule: the dynamic-scope rule determines the declaration applicable to a name (at run time) by considering the current activations.
  Eg: Lisp, APL, Snobol

BLOCKS
• A block is a statement containing its own local data declarations.
• In C, a block has the syntax { declarations statements }.
• In Algol, begin ... end are used as delimiters.
• Nesting property of block structure: it is not possible for two blocks B1 and B2 to overlap in such a way that first B1 begins, then B2 begins, but B1 ends before B2.
• The scope of a declaration in a block-structured language is given by the most closely nested rule:
  - The scope of a declaration in a block B includes B.
  - If a name x is not declared in a block B, then an occurrence of x in B is in the scope of a declaration of x in an enclosing block B' such that
    a) B' has a declaration of x, and
    b) B' is more closely nested around B than any other block with a declaration of x.
• The scope of the declaration of b in B0 does not include B1, because b is redeclared in B1; this is indicated by B0 - B1, and such a gap is called a hole in the scope of the declaration.
• Block structure can be implemented using stack allocation.
• Since the scope of a declaration does not extend outside the block in which it appears, the space for the declared name can be allocated when the block is entered and deallocated when control leaves the block.
• This view treats a block as a parameterized procedure, called only from the point just before the block and returning only to the point just after the block.
• An alternative is to allocate storage for a complete procedure body at one time.
• If there are blocks within the procedure, then allowance is made for the storage needed for declarations within the blocks.
• For block B0, we can allocate storage as in the figure, with subscripts on locals a and b identifying the blocks in which the locals are declared. Note that a2 and b3 may be assigned the same storage because they are in blocks that are not alive at the same time.

• In the absence of variable-length data, the maximum amount of storage needed during any execution of a block can be determined at compile time.
• In making this determination, we conservatively assume that all control paths in the program can indeed be taken: that both the then- and else-parts of a conditional statement can be executed, and that all statements within a while loop can be reached.
Lexical Scope Without Nested Procedures
• A procedure definition cannot appear within another.
• If there is a non-local reference to a name a in some function, then it must be declared outside any function. The scope of a declaration outside a function consists of the function bodies that follow the declaration, with holes if the name is redeclared within a function.
• In the absence of nested procedures, we use the stack-allocation strategy for local names.
• Storage for all names declared outside any procedures can be allocated statically.
• The position of this storage is known at compile time, so if a name is nonlocal in some procedure body, we simply use the statically determined address.
• Any other name must be a local of the activation at the top of the stack, accessible through the top pointer.
• An important benefit of static allocation for non-locals is that declared procedures can freely be passed as parameters and returned as results.

Lexical Scope with Nested Procedures
• A non-local occurrence of a name a in a Pascal procedure is in the scope of the most closely nested declaration of a in the static program text.
• The nesting of procedure definitions in the Pascal program for quicksort is indicated by the following indentation:

sort( )
    readarray( )
    exchange( )
    quicksort( )
        partition( )

Nesting Depth
• The notion of nesting depth of a procedure is used below to implement lexical scope.
• Let the name of the main program be at nesting depth 1; we add 1 to the nesting depth as we go from an enclosing to an enclosed procedure.

Access Links
• A direct implementation of lexical scope for nested procedures is obtained by adding a pointer called an access link to each activation record.
• If procedure p is nested immediately within q in the source text, then the access link in an activation record for p points to the access link in the record for the most recent activation of q.

Suppose procedure p at nesting depth np refers to a non-local a with nesting depth na <= np. The storage for a can be found as follows.

1. When control is in p, an activation record for p is at the top of the stack. Follow np - na access links from the record at the top of the stack. The value of np - na can be precomputed at compile time. If the access link in one record points to the access link in another, then a single indirection operation follows a link.
2. After following np - na links, we reach an activation record for the procedure that a is local to. As discussed in the last section, its storage is at a fixed offset relative to a position in the record. In particular, the offset can be relative to the access link.

Hence, the address of non-local a in procedure p is given by the following pair, computed at compile time and stored in the symbol table:

(np - na, offset within activation record containing a)
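
For example (an illustrative case, not from the text): if p at nesting depth 3 uses a variable a declared in the main program at depth 1, the pair is (2, offset_a); at run time the generated code follows two access links from p's activation record and then adds offset_a.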

Suppose procedure p at nesting depth np calls procedure x at nesting depth nx. The code for setting up the access link in the called procedure depends on whether or not the called procedure is nested within the caller.

1. Case np < nx. Since the called procedure x is nested more deeply than p, it must be declared within p, or it would not be accessible to p. In this case, the access link in the new activation record for x points to the activation record of the caller p itself.

2. Case np >= nx. From the scope rules, the enclosing procedures at nesting depths 1, 2, ..., nx - 1 of the called and calling procedures must be the same. Following np - nx + 1 access links from the caller, we reach the most recent activation record of the procedure that statically encloses both the called and calling procedures most closely. The access link reached is the one to which the access link in the called procedure must point. Again, np - nx + 1 can be computed at compile time.
Displays

• Faster access to non-locals than with access links can be obtained using an array d of pointers to activation records, called a display.
• We maintain the display so that storage for a non-local a at nesting depth i is in the activation record pointed to by display element d[i].
• Suppose control is in an activation of a procedure p at nesting depth j.
• Then the first j - 1 elements of the display point to the most recent activations of the procedures that lexically enclose procedure p, and d[j] points to the activation of p.
• Using a display is generally faster than following access links, because the activation record holding a non-local is found by accessing an element of d and then following just one pointer.
• When a new activation record for a procedure at nesting depth i is set up, we save the value of d[i] in the activation record and set d[i] to the new activation record. Just before the activation ends, d[i] is reset to the saved value. A sketch of this maintenance follows.
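
A minimal C sketch of that save/restore discipline (the names d, AR, saved_d, on_entry and on_exit are illustrative assumptions, not a real runtime API):

#define MAX_DEPTH 8

struct AR {
    struct AR *saved_d;   /* old display entry, saved in the new AR */
    /* ... other activation-record fields ... */
};

struct AR *d[MAX_DEPTH];  /* the display: one slot per nesting depth */

void on_entry(int i, struct AR *ar) {
    ar->saved_d = d[i];   /* save the old display entry in the new AR */
    d[i] = ar;            /* d[i] now points to this activation       */
}

void on_exit(int i, struct AR *ar) {
    d[i] = ar->saved_d;   /* restore just before the activation ends  */
}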

PARAMETER PASSING

The communication medium among procedures is known as parameter passing. The values of the variables from a calling procedure are transferred to the called procedure by some mechanism.

Various parameter passing methods:

o Call by value: passing r-values
o Call by reference: passing l-values
o Call by copy-restore: hybrid between call-by-value and call-by-reference
o Call by name: passing via name substitution

Basic terminology:

R-value: The value of an expression is called its r-value. The value contained in a single variable also becomes an r-value if it appears on the right side of the assignment operator. An r-value can always be assigned to some other variable.

L-value: The location of memory (the address) where the expression is stored is known as the l-value of that expression. It always appears on the left side of the assignment operator.

Formal Parameter: Variables that take the information passed by the caller procedure are called formal parameters. These variables are declared in the definition of the called function.

Actual Parameter: Variables whose values and functions are passed to the called function are called actual parameters. These variables are specified in the function call as arguments.

Different ways of passing parameters to the procedure:

Call by Value
In call by value, the calling procedure passes the r-value of the actual parameters, and the compiler puts that into the called procedure's activation record. Formal parameters hold the values passed by the calling procedure; thus any changes made to the formal parameters do not affect the actual parameters.
Call by Reference
In call by reference, the formal and actual parameters refer to the same memory location. The l-value of the actual parameter is copied to the activation record of the called function; thus the called function has the address of the actual parameter. If the actual parameter does not have an l-value (e.g. i+3), then it is evaluated in a new temporary location and the address of that location is passed. Any change made to the formal parameter is reflected in the actual parameter (because changes are made at the address).

Call by Copy-Restore
In call by copy-restore, the compiler copies the values into the formal parameters when the procedure is called and copies them back into the actual parameters when control returns to the calling function. The r-values are passed, and on return the r-values of the formals are copied into the l-values of the actuals.

Call by Name
In call by name, the actual parameters are substituted for the formals in all the places the formals occur in the procedure. It is also referred to as lazy evaluation, because parameters are evaluated only when needed.

Syntax:

Call by value: swap(10,20);
Call by name: swap(a,b);
Call by reference: swap(&a,&b);
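
A small C illustration of value vs. reference semantics (call-by-name and copy-restore have no direct C equivalent; this sketch covers the two mechanisms C supports):

#include <stdio.h>

void swap_value(int x, int y) {    /* r-values copied: caller unaffected    */
    int t = x; x = y; y = t;
}

void swap_ref(int *x, int *y) {    /* l-values passed: caller sees the swap */
    int t = *x; *x = *y; *y = t;
}

int main(void) {
    int a = 10, b = 20;
    swap_value(a, b);
    printf("%d %d\n", a, b);       /* prints: 10 20 */
    swap_ref(&a, &b);
    printf("%d %d\n", a, b);       /* prints: 20 10 */
    return 0;
}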
HEAP MANAGEMENT

The heap is the portion of the store that is used for data that lives indefinitely, or until the program explicitly deletes it. The topics covered are:

1. The Memory Manager
2. The Memory Hierarchy of a Computer
3. Locality in Programs
4. Reducing Fragmentation
5. Manual Deallocation Requests

While local variables typically become inaccessible when their procedures end, many languages enable us to create objects or other data whose existence is not tied to the procedure activation that creates them. For example, both C++ and Java give the programmer new to create objects; these objects, or pointers to them, may be passed from procedure to procedure, so they continue to exist long after the procedure that created them is gone.

1. The Memory Manager

The memory manager keeps track of all the free space in heap storage at all times. It performs two basic functions:

• Allocation. When a program requests memory for a variable or object, the memory manager produces a chunk of contiguous heap memory of the requested size. If possible, it satisfies an allocation request using free space in the heap; if no chunk of the needed size is available, it seeks to increase the heap storage space by getting consecutive bytes of virtual memory from the operating system. If space is exhausted, the memory manager passes that information back to the application program.

• Deallocation. The memory manager returns deallocated space to the pool of free space, so it can reuse the space to satisfy other allocation requests. Memory managers typically do not return memory to the operating system, even if the program's heap usage drops.
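
In C, these two requests are exposed through malloc and free; a minimal sketch of their use:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int *p = malloc(100 * sizeof *p);   /* allocation request for 100 ints */
    if (p == NULL) {                    /* manager reported exhaustion     */
        fprintf(stderr, "out of memory\n");
        return 1;
    }
    p[0] = 42;
    free(p);                            /* chunk returned to the free pool */
    return 0;
}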

Memory management would be simpler if (a) all allocation requests were for chunks of the same size, and (b) storage were released predictably, say, first-allocated first-deallocated. There are some languages, such as Lisp, for which condition (a) holds; pure Lisp uses only one data element (a two-pointer cell) from which all data structures are built. Condition (b) also holds in some situations, the most common being data that can be allocated on the run-time stack. However, in most languages, neither (a) nor (b) holds in general. Rather, data elements of different sizes are allocated, and there is no good way to predict the lifetimes of all allocated objects.

Thus, the memory manager must be prepared to service, in any order, allocation and deallocation requests of any size, ranging from one byte to as large as the program's entire address space.

Here are the properties we desire of memory managers:

• Space Efficiency. A memory manager should minimize the total heap space needed by a program. Doing so allows larger programs to run in a fixed virtual address space. Space efficiency is achieved by minimizing "fragmentation," discussed under Reducing Fragmentation below.

• Program Efficiency. A memory manager should make good use of the memory subsystem to allow programs to run faster. The time taken to execute an instruction can vary widely depending on where objects are placed in memory. Fortunately, programs tend to exhibit "locality," a phenomenon which refers to the nonrandom, clustered way in which typical programs access memory. By paying attention to the placement of objects in memory, the memory manager can make better use of space and, hopefully, make the program run faster.

• Low Overhead. Because memory allocations and deallocations are frequent operations in many programs, it is important that these operations be as efficient as possible. That is, we wish to minimize the overhead: the fraction of execution time spent performing allocation and deallocation. Notice that the cost of allocations is dominated by small requests; the overhead of managing large objects is less important, because it usually can be amortized over a larger amount of computation.
2. The Memory Hierarchy of a Computer

Memory management and compiler optimization must be done with an awareness of how memory behaves. Modern machines are designed so that programmers can write correct programs without concerning themselves with the details of the memory subsystem. However, the efficiency of a program is determined not just by the number of instructions executed, but also by how long it takes to execute each of these instructions. The time taken to execute an instruction can vary significantly, since the time taken to access different parts of memory can vary from nanoseconds to milliseconds. Data-intensive programs can therefore benefit significantly from optimizations that make good use of the memory subsystem. They can take advantage of the phenomenon of "locality": the nonrandom behavior of typical programs.

The large variance in memory access times is due to a fundamental limitation of hardware technology: we can build small and fast storage, or large and slow storage, but not storage that is both large and fast.

A memory hierarchy consists of a series of storage elements, with the smaller, faster ones "closer" to the processor, and the larger, slower ones further away.

Typically, a processor has a small number of registers, whose contents are under software control. Next, it has one or more levels of cache, usually made out of static RAM, that are kilobytes to several megabytes in size. The next level of the hierarchy is the physical (main) memory, made out of hundreds of megabytes or gigabytes of dynamic RAM. The physical memory is then backed up by virtual memory, which is implemented by gigabytes of disks. Upon a memory access, the machine first looks for the data in the closest (lowest-level) storage and, if the data is not there, looks in the next higher level, and so on.

3. Locality in Programs

Most programs exhibit a high degree of locality; that is, they spend most of their time executing a relatively small fraction of the code and touching only a small fraction of the data. We say that a program has temporal locality if the memory locations it accesses are likely to be accessed again within a short period of time. We say that a program has spatial locality if memory locations close to the location accessed are likely also to be accessed within a short period of time.
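
A tiny C loop showing both kinds of locality (illustrative only):

#include <stdio.h>

int main(void) {
    int a[1000];
    long sum = 0;
    for (int i = 0; i < 1000; i++)
        a[i] = i;
    for (int i = 0; i < 1000; i++)
        sum += a[i];   /* spatial: a[i] and a[i+1] are adjacent in memory;  */
                       /* temporal: sum and i are reused on every iteration */
    printf("%ld\n", sum);
    return 0;
}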

The conventional wisdom is that programs spend 90% of their time executing 10% of the code. Here is why:

Programs often contain many instructions that are never executed. Programs built with components and libraries use only a small fraction of the provided functionality. Also, as requirements change and programs evolve, legacy systems often contain many instructions that are no longer used.
Static and Dynamic RAM

Most random-access memory is dynamic, which means that it is built of very simple electronic circuits that lose their charge (and thus "forget" the bit they were storing) in a short time. These circuits need to be refreshed periodically, that is, their bits read and rewritten. On the other hand, static RAM is designed with a more complex circuit for each bit, and consequently the bit stored can stay indefinitely, until it is changed. Evidently, a chip can store more bits if it uses dynamic-RAM circuits than if it uses static-RAM circuits, so we tend to see large main memories of the dynamic variety, while smaller memories, like caches, are made from static circuits.

4. Reducing Fragmentation

At the beginning of program execution, the heap is one contiguous unit of free space. As the program allocates and deallocates memory, this space is broken up into free and used chunks of memory, and the free chunks need not reside in a contiguous area of the heap. We refer to the free chunks of memory as holes. With each allocation request, the memory manager must place the requested chunk of memory into a large-enough hole. Unless a hole of exactly the right size is found, we need to split some hole, creating a yet smaller hole.

With each deallocation request, the freed chunks of memory are added back to the pool of free space. We coalesce contiguous holes into larger holes, as the holes can only get smaller otherwise. If we are not careful, the memory may end up getting fragmented, consisting of large numbers of small, noncontiguous holes. It is then possible that no hole is large enough to satisfy a future request, even though there may be sufficient aggregate free space.

Best-Fit and Next-Fit Object Placement

We reduce fragmentation by controlling how the memory manager places new objects in the heap. It has been found empirically that a good strategy for minimizing fragmentation for real-life programs is to allocate the requested memory in the smallest available hole that is large enough. This best-fit algorithm tends to spare the large holes to satisfy subsequent, larger requests. An alternative, called first-fit, where an object is placed in the first (lowest-address) hole in which it fits, takes less time to place objects, but has been found inferior to best-fit in overall performance. A sketch of the best-fit search follows.
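
A minimal best-fit search over a singly linked free list (the structures and names are illustrative assumptions; a real allocator would also split the chosen hole and coalesce on free):

#include <stddef.h>

struct hole {
    size_t size;            /* bytes available in this free chunk */
    struct hole *next;      /* next hole in the free list         */
};

struct hole *best_fit(struct hole *free_list, size_t request) {
    struct hole *best = NULL;
    for (struct hole *h = free_list; h != NULL; h = h->next)
        if (h->size >= request && (best == NULL || h->size < best->size))
            best = h;       /* smallest hole that is still large enough */
    return best;            /* NULL: no hole can satisfy the request    */
}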

Managing and Coalescing Free Space

When an object is deallocated manually, the memory manager must make its chunk free, so it can be allocated again. In some circumstances, it may also be possible to combine (coalesce) that chunk with adjacent chunks of the heap, to form a larger chunk. There is an advantage to doing so, since a large chunk can always do the work of small chunks of equal total size, but many small chunks cannot hold one large object, as the combined chunk could.

5. Manual Deallocation Requests

In manual memory management, the programmer must explicitly arrange for the deallocation of data, as in C and C++. Ideally, any storage that will no longer be accessed should be deleted. Conversely, any storage that may be referenced must not be deleted. Unfortunately, it is hard to enforce either of these properties. In addition to considering the difficulties with manual deallocation, we shall describe some of the techniques programmers use to help with these difficulties.
Problems with Manual Deallocation

Manual memory management is error-prone. The common mistakes take two forms: failing ever to delete data that cannot be referenced is called a memory-leak error, and referencing deleted data is a dangling-pointer-dereference error.

It is hard for programmers to tell if a program will never refer to some storage in the future, so the first common mistake is not deleting storage that will never be referenced. Note that although memory leaks may slow down the execution of a program due to increased memory usage, they do not affect program correctness, as long as the machine does not run out of memory. Many programs can tolerate memory leaks, especially if the leakage is slow. However, for long-running programs, and especially nonstop programs like operating systems or server code, it is critical that they not have leaks.

Automatic garbage collection gets rid of memory leaks by deallocating all the garbage. Even with automatic garbage collection, a program may still use more memory than necessary. A programmer may know that an object will never be referenced, even though references to that object exist somewhere. In that case, the programmer must deliberately remove references to objects that will never be referenced, so the objects can be deallocated automatically.

Being overly zealous about deleting objects can lead to even worse problems than memory leaks. The second common mistake is to delete some storage and then try to refer to the data in the deallocated storage. Pointers to storage that has been deallocated are known as dangling pointers. Once the freed storage has been reallocated to a new variable, any read, write, or deallocation via the dangling pointer can produce seemingly random effects. We refer to any operation, such as read, write, or deallocate, that follows a pointer and tries to use the object it points to, as dereferencing the pointer.
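
Both mistakes in a few lines of C (the last assignment in main is deliberately buggy):

#include <stdlib.h>

int main(void) {
    int *p = malloc(sizeof *p);
    p = NULL;        /* memory leak: the chunk can no longer be freed     */

    int *q = malloc(sizeof *q);
    free(q);
    *q = 42;         /* dangling-pointer dereference: undefined behavior  */
    return 0;
}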

Programming Conventions and Tools

We now present a few of the most popular conventions and tools that have been developed to help programmers cope with the complexity of managing memory:

• Object ownership is useful when an object's lifetime can be statically reasoned about. The idea is to associate an owner with each object at all times. The owner is a pointer to that object, presumably belonging to some function invocation. The owner (i.e., its function) is responsible for either deleting the object or for passing the object to another owner. It is possible to have other, nonowning pointers to the same object; these pointers can be overwritten at any time, and no deletes should ever be applied through them. This convention eliminates memory leaks, as well as attempts to delete the same object twice. However, it does not help solve the dangling-pointer-reference problem, because it is possible to follow a nonowning pointer to an object that has been deleted.

• Reference counting is useful when an object's lifetime needs to be determined dynamically. The idea is to associate a count with each dynamically allocated object. Whenever a reference to the object is created, we increment the reference count; whenever a reference is removed, we decrement the reference count. When the count goes to zero, the object can no longer be referenced and can therefore be deleted (see the sketch after this list). This technique, however, does not catch useless, circular data structures, where a collection of objects cannot be accessed, but their reference counts are not zero, since they refer to each other. Reference counting does eradicate all dangling-pointer references, since there are no outstanding references to any deleted objects. Reference counting is expensive because it imposes an overhead on every operation that stores a pointer.

• Region-based allocation is useful for collections of objects whose lifetimes are tied to specific phases in a computation. When objects are created to be used only within some step of a computation, we can allocate all such objects in the same region. We then delete the entire region once that computation step completes. This region-based allocation technique has limited applicability. However, it is very efficient whenever it can be used; instead of deallocating objects one at a time, it deletes all objects in the region in a wholesale fashion.
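
The sketch referred to above: a minimal reference-counting scheme in C (the Obj type and the retain/release names are illustrative assumptions):

#include <stdlib.h>

typedef struct Obj {
    int refcount;            /* number of live references to this object */
    /* ... payload ... */
} Obj;

Obj *obj_new(void) {
    Obj *o = malloc(sizeof *o);
    if (o) o->refcount = 1;  /* the creating reference */
    return o;
}

void retain(Obj *o)  { o->refcount++; }   /* a reference is created */

void release(Obj *o) {                    /* a reference is removed */
    if (--o->refcount == 0)
        free(o);             /* count reached zero: safe to delete */
}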

CODE GENERATION

The final phase in the compiler model is the code generator. It takes as input an intermediate representation of the source program and produces as output an equivalent target program. The code generation techniques presented below can be used whether or not an optimizing phase occurs before code generation.

POSITION OF THE CODE GENERATOR

(source program -> front end -> intermediate code -> code optimizer -> intermediate code -> code generator -> target program; the symbol table is consulted by all of the back-end phases)

ISSUES IN THE DESIGN OF A CODE GENERATOR

The following issues arise during the code generation phase:

1. Input to code generator
2. Target program
3. Memory management
4. Instruction selection
5. Register allocation
6. Evaluation order

1. Input to code generator:
The input to the code generator consists of the intermediate representation of the source program produced by the front end, together with information in the symbol table used to determine the run-time addresses of the data objects denoted by the names in the intermediate representation.
Intermediate representation can be:
• Linear representation such as postfix notation
• Three-address representation such as quadruples
• Virtual machine representation such as stack machine code
• Graphical representations such as syntax trees and DAGs

Prior to code generation, the source program must have been scanned, parsed and translated into intermediate representation, along with the necessary type checking. Therefore, the input to code generation is assumed to be error-free.

2. Target program:
The output of the code generator is the target program. The output may be:
• Absolute machine language: it can be placed in a fixed memory location and executed immediately.
• Relocatable machine language: it allows subprograms to be compiled separately.
• Assembly language: code generation is made easier.

3. Memory management:
Names in the source program are mapped to addresses of data objects in run-time memory by the front end and code generator. This makes use of the symbol table; that is, a name in a three-address statement refers to a symbol-table entry for the name. Labels in three-address statements have to be converted to addresses of instructions. For example, the statement j: goto i generates a jump instruction as follows:
• If i < j, a backward jump instruction is generated, with target address equal to the location of the code for quadruple i.
• If i > j, the jump is forward. We must store, on a list for quadruple i, the location of the first machine instruction generated for quadruple j. When i is processed, the machine locations for all instructions that jump forward to i are filled in.

4. Instruction selection:
The instruction set of the target machine should be complete and uniform. Instruction speeds and machine idioms are important factors when the efficiency of the target program is considered. The quality of the generated code is determined by its speed and size. For example, a three-address statement of the form x := y + z, where x, y and z are statically allocated, can be translated into the code sequence shown below.
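
This is the standard skeleton (reconstructed here from the usual textbook figure, since the original is missing); applied one statement at a time, it often produces redundant loads and stores:

MOV y, R0      load y into register R0
ADD z, R0      add z to R0
MOV R0, x      store R0 into x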

5. Register allocation:
Instructions involving register operands are shorter and faster than those involving operands in memory. The use of registers is subdivided into two subproblems:
• Register allocation: the set of variables that will reside in registers at a point in the program is selected.
• Register assignment: the specific register that a variable will reside in is picked.

Certain machines require even-odd register pairs for some operands and results. For example, consider a division instruction of the form:

D x, y

where x, the dividend, is the even register of an even/odd register pair, and y is the divisor. After division, the even register holds the remainder and the odd register holds the quotient.

6. Evaluation order:
The order in which computations are performed can affect the efficiency of the target code. Some computation orders require fewer registers to hold intermediate results than others.
Instruction costs:

The cost of an instruction is taken to be one plus the costs associated with the addressing modes of its source and destination (the standard textbook convention, consistent with the totals below). For example:

• MOV R0, R1 copies the contents of register R0 into R1. It has cost one, since no memory operand is involved.
• MOV R0, M stores the contents of register R0 into memory location M (cost two).
• MOV 4(R0), M stores the value contents(4 + contents(R0)) into M (cost three).

The three-address statement a := b + c can be implemented by many different instruction sequences:

MOV b, R0
ADD c, R0      cost = 6
MOV R0, a

MOV b, a
ADD c, a       cost = 6

Assuming R0, R1 and R2 contain the addresses of a, b, and c:

MOV *R1, *R0
ADD *R2, *R0   cost = 2

In order to generate good code for the target machine, we must utilize its addressing capabilities efficiently.
Code for other statement types:

Indexing and pointer operations in three-address statements are handled in the same way as binary operations. The code sequences for the indexed assignment statements a := b[i] and a[i] := b:

Statement    i in register Ri       i in memory Mi          i in stack
             Code          Cost     Code           Cost     Code            Cost
a := b[i]    MOV b(Ri), R    2      MOV Mi, R        4      MOV Si(A), R      4
                                    MOV b(R), R             MOV b(R), R
a[i] := b    MOV b, a(Ri)    3      MOV Mi, R        5      MOV Si(A), R      5
                                    MOV b, a(R)             MOV b, a(R)

The code sequences generated for the pointer assignment statements a := *p and *p := a:

Statement    p in register Rp       p in memory Mp          p in stack
             Code          Cost     Code           Cost     Code            Cost
a := *p      MOV *Rp, a      2      MOV Mp, R        3      MOV Sp(A), R      3
                                    MOV *R, R               MOV *R, R
*p := a      MOV a, *Rp      2      MOV Mp, R        4      MOV Sp(A), R      4
                                    MOV a, *R               MOV a, *R
A SIMPLE CODE GENERATOR

A code generator generates target code for a sequence of three-address statements and effectively uses registers to store the operands of the statements. Familiarity with the target machine and its instruction set is a prerequisite for designing a good code generator.

• The target computer is a byte-addressable machine with 4 bytes to a word.
• It has n general-purpose registers, R0, R1, . . . , Rn-1.
• It has two-address instructions of the form:

    op source, destination

  where op is an op-code, and source and destination are data fields.
• It has the following op-codes:
  MOV (move source to destination)
  ADD (add source to destination)
  SUB (subtract source from destination)
• The source and destination of an instruction are specified by combining registers and memory locations with address modes.

Address modes with their assembly-language forms

45
A code generation algorithm

The algorithm takes as input a sequence of three-address statements constituting a basic block. For each three-address statement of the form x := y op z, it performs the following actions (a simplified sketch in C follows the list):
1. Invoke a function getreg to determine the location L where the result of the computation y op z should be stored.

2. Consult the address descriptor for y to determine y′, the current location of y. Prefer the register for y′ if the value of y is currently both in memory and a register. If the value of y is not already in L, generate the instruction MOV y′, L to place a copy of y in L.

3. Generate the instruction op z′, L where z′ is a current location of z. Prefer a register to a memory location if z is in both. Update the address descriptor of x to indicate that x is in location L. If x is in L, update its descriptor and remove x from all other descriptors.

4. If the current values of y or z have no next uses, are not live on exit from the block, and are in registers, alter the register descriptor to indicate that, after execution of x := y op z, those registers will no longer contain y or z.
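Below is a minimal C sketch of these actions, under deliberately simplified assumptions: getreg just picks the first free register (a real getreg also reuses registers holding dead values and spills when none are free), the operand z is always taken from memory, and the liveness bookkeeping of step 4 is omitted. All names and data structures are illustrative, not from any particular implementation:

#include <stdio.h>
#include <string.h>

#define NREGS 2

/* Register descriptor: the name each register currently holds ("" = free). */
static char regdesc[NREGS][16];

/* Step 1 (simplified): return the first free register. A real getreg
   consults next-use/liveness information and spills when necessary. */
static int getreg(void) {
    for (int r = 0; r < NREGS; r++)
        if (regdesc[r][0] == '\0')
            return r;
    return 0; /* placeholder: a real generator would spill here */
}

/* Emit code for x := y op z, where op is '+' or '-'. */
static void gen(const char *x, const char *y, char op, const char *z) {
    int L = getreg();                               /* step 1 */
    if (strcmp(regdesc[L], y) != 0)                 /* step 2: y not already in L */
        printf("MOV %s, R%d\n", y, L);
    printf("%s %s, R%d\n",
           op == '+' ? "ADD" : "SUB", z, L);        /* step 3: z taken from memory */
    strncpy(regdesc[L], x, sizeof regdesc[L] - 1);  /* R_L now holds x */
    /* step 4 (freeing registers whose values are dead) is omitted here */
}

int main(void) {
    gen("t", "a", '-', "b");    /* emits: MOV a, R0  then  SUB b, R0 */
    gen("u", "a", '-', "c");    /* emits: MOV a, R1  then  SUB c, R1 */
    return 0;
}

These two calls reproduce the first two rows of the table below.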
The code sequence generated for d := (a-b) + (a-c) + (a-c), with the register and address descriptors after each statement:

Statement     Code generated   Register descriptor     Address descriptor
t := a - b    MOV a, R0        R0 contains t           t in R0
              SUB b, R0
u := a - c    MOV a, R1        R0 contains t           t in R0
              SUB c, R1        R1 contains u           u in R1
v := t + u    ADD R1, R0       R0 contains v           u in R1
                               R1 contains u           v in R0
d := v + u    ADD R1, R0       R0 contains d           d in R0
              MOV R0, d                                d in R0 and memory
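Under the cost model above, the total cost of this code sequence is 2 + 2 + 2 + 2 + 1 + 1 + 2 = 12 words.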
9. ASSIGNMENTS
S.No  Questions                                                        K Level  CO Level

1.    Find the instruction code for different addressing modes.        K3       CO4

2.    Generate code for the given statements:                          K3       CO4
          x = u - t;
          y = x * v;
          x = y + w;
          y = t - z;
          y = x * y;

3.    Let fib(n) be the function                                       K3       CO4

          int fib(int n) {
              if (n == 0)
                  return 0;
              else if (n == 1)
                  return 1;
              else
                  return fib(n-1) + fib(n-2);
          }

      Show the activation tree for fib(3).
      Show the activation records that are on the run-time stack when fib(1) is
      invoked for the first time during the invocation of fib(3). Just show four
      fields in each activation record: the returned value, the argument, the
      control link (which is a pointer to the caller's AR), and the return address.
      As a function of n, what is the time complexity of this program?
10. PART A : Q & A : UNIT – IV

1. What is meant by storage organization? (CO4, K1)
The operating system maps logical addresses into physical addresses, which are usually spread throughout memory. The management and organization of this logical address space is shared between the compiler, the operating system and the target machine.

2. Mention the two strategies for dynamic storage allocation. (CO4, K2)
Stack storage and heap storage.

3. Define activation tree. (CO4, K1)
An activation tree depicts the way control enters and leaves activations. Stack allocation would not be feasible if procedure calls, or activations of procedures, did not nest in time.

4. Define activation record. (CO4, K2)
Information needed by a single execution of a procedure is managed using a contiguous block of storage called an activation record. The various fields of an activation record are: temporary values, local values, saved machine registers, control link, access link, actual parameters, and return values.

5. What is meant by calling sequences? (CO4, K1)
Procedure calls are implemented by what are known as calling sequences, which consist of code that allocates an activation record on the stack and enters information into its fields.
6. What do you mean by dangling reference? (CO4, K2)
A dangling reference occurs when there is a reference to storage that has been deallocated.

7. What are the limitations of static allocation? (CO4, K2)
The size of every data object must be known at compile time, since the compiler lays out storage for all data objects before execution; as a consequence, recursive procedures and dynamically created data structures cannot be supported.

8. What are the different storage allocation strategies? (CO4, K2)
Static allocation – lays out storage for all data objects at compile time.
Stack allocation – manages the runtime storage as a stack.
Heap allocation – allocates and deallocates storage as needed at runtime from a data area called the heap.

9. What are the issues with nested procedures? (CO4, K2)
Access becomes far more complicated when a language allows procedure declarations to be nested and also uses the normal static scoping rule.

10. How do you calculate the cost of an instruction? (CO4, K2)
We take the cost of an instruction to be one plus the costs associated with the source and destination address modes. This cost corresponds to the length (in words) of the instruction. Address modes involving registers have cost zero, while those with a memory location or literal in them have cost one, because such operands have to be stored with the instruction.
11. Define access links. (CO4, K1)
A direct implementation of the normal static scope rule for nested functions is obtained by adding a pointer called the access link to each activation record.

12. Define code generation with an example. (CO4, K2)
Code generation is the final phase in the compiler model. It takes as input an intermediate representation of the source program and produces an equivalent target program as output. Each intermediate instruction is translated into a sequence of machine instructions that performs the same task. For example, a := b + c may be translated into MOV b, R0; ADD c, R0; MOV R0, a.

13. What are the issues in the design of a code generator? (CO4, K2)
Input to the generator, target programs, memory management, instruction selection, register allocation, choice of evaluation order, and the approach to code generation.

14. Give the variety of forms of a target program. (CO4, K2)
Absolute machine language, relocatable machine language, and assembly language.

15. Give the factors of instruction selection. (CO4)
Uniformity and completeness of the instruction set, instruction speeds and machine idioms, and the size of the instruction set.
16. What are the actions performed by the code generation algorithm? (CO4, K1)
Invoke a function getreg to determine the location L. Consult the address descriptor for y to determine y′, the current location of y. If the current values of y and/or z have no next uses, are not live on exit from the block, and are in registers, alter the register descriptor.

17. What is meant by an address descriptor? (CO4, K1)
An address descriptor keeps track of the location where the current value of a name can be found at run time. The location might be a register, a stack location, or a memory address.

18. What is meant by a register descriptor? (CO4, K1)
A register descriptor keeps track of what is currently in each register. It is consulted whenever a new register is needed.

19. What are the subproblems in register allocation strategies? (CO4, K1)
During register allocation, we select the set of variables that will reside in registers at a point in the program. During a subsequent register assignment phase, we pick the specific register that a variable will reside in.

20. Define heap. What is meant by heap management? (CO4)
The heap is the portion of the store that is used for data that lives indefinitely, or until the program explicitly deletes it. The memory manager keeps track of the allocation and deallocation of free space within the heap; this is heap management.
11. PART B QUESTIONS : UNIT – IV
(CO4, K2)
1. Explain in detail about stack allocation of space.
2. Draw the activation tree for the quicksort algorithm.
3. (i) Describe the source language issues in detail.
   (ii) Describe in detail about storage organization.
4. What are the uses of a symbol table? Describe the symbol-table storage allocation information.
5. (i) Explain non-local names in runtime storage management.
   (ii) Explain activation records and their purpose.
6. (i) What are the issues in the design of a code generator?
   (ii) Explain the simple code generator with a suitable example.

PART C QUESTIONS
(CO4, K3)
1. Write a code generation algorithm. Explain the descriptors and the function getreg(). Generate code for x = ((a + b) / (b - c)) - (a + b) * (b - c) + f.
2. Draw the symbol tables for each of the procedures in the following Pascal code (including main) and show their nesting relationship by linking them via a pointer reference in the structure (or record) used to implement them in memory. Include the entries or fields for the local variables, arguments and any other information you find relevant for the purposes of code generation, such as its type and location at run-time.
12. Supportive online Certification courses
NPTEL : https://nptel.ac.in/courses/106/105/106105190/
Swayam : https://www.classcentral.com/course/swayam-compiler-design-12926
coursera : https://www.coursera.org/learn/nand2tetris2
Udemy : https://www.udemy.com/course/introduction-to-compiler-construction-and-design/
Mooc : https://www.mooc-list.com/course/compilers-coursera
edx : https://www.edx.org/course/compilers
13. Real-time Applications in Day-to-day Life and in Industry

1. μAO-MPC: a free code generation tool for embedded real-time linear model predictive control.
2. Automatic code generation for real-time implementation of model predictive control.
3. Platform-dependent code generation for embedded real-time software.
4. Create an activation tree for a forward with the ball during a corner kick in the system.
5. SMART adjusts the allocation of resources dynamically and seamlessly.
14. CONTENTS BEYOND SYLLABUS : UNIT – IV
Code generation using tree matching and dynamic programming
Compiler-component generators, such as lexical analyzer generators and parser
generators, have long been used to facilitate the construction of compilers. A
tree-manipulation language called twig has been developed to help construct
efficient code generators. Twig transforms a tree-translation scheme into a code
generator that combines a fast top-down tree-pattern matching algorithm with
dynamic programming. Twig has been used to specify and construct code
generators for several experimental compilers targeted for different machines.
A dynamic programming approach to optimal integrated code generation
• We report on research in progress on a novel method for fully integrated
code generation that is based on dynamic programming. In particular, we
introduce the concept of a time profile. We focus on the basic block level
where the data dependences among the instructions form a DAG. Our
algorithm aims at combining time-optimal scheduling with optimal instruction
selection, given a limited number of general-purpose registers. An extension
for irregular register sets, spilling of register contents, and intricate structural
constraints on code compaction based on register usage is currently under
development, as well as a generalization for global code generation.
• A prototype implementation is operational, and we present first experimental
results that show that our algorithm is practical also for medium-size problem
instances. Our implementation is intended to become the core of a future,
retargetable code generation system.
• The dynamic programming algorithm proceeds in three phases (suppose the target machine has r registers):
1. Compute bottom-up, for each node n of the expression tree T, an array C of costs, in which the i-th component C[i] is the optimal cost of computing the subtree S rooted at n into a register, assuming i registers are available for the computation, for 1 ≤ i ≤ r.
2. Traverse T, using the cost vectors to determine which subtrees of T must be computed into memory.
3. Traverse each tree, using the cost vectors and associated instructions, to generate the final target code. The code for the subtrees computed into memory locations is generated first.
• Each of these phases can be implemented to run in time linearly proportional to the size of the expression tree; a sketch of the first phase appears below.
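The following is a minimal C sketch of phase 1 for r = 2 registers and the unit-cost instruction templates of the example that follows; restricting the matching to just the two templates op Ri, Ri, Mj and op Ri, Ri, Rj is a simplification for illustration, as is the tree representation:

#include <stdio.h>

#define R 2        /* number of registers */

typedef struct Node {
    char op;                  /* 0 for a leaf (a variable already in memory) */
    struct Node *left, *right;
    int C[R + 1];             /* C[0]: cost into memory; C[i]: cost with i regs */
} Node;

static int min(int a, int b) { return a < b ? a : b; }

static void costs(Node *n) {
    if (!n->op) {             /* leaf: 0 to leave in memory, one LD into a register */
        n->C[0] = 0;
        for (int i = 1; i <= R; i++) n->C[i] = 1;
        return;
    }
    costs(n->left);
    costs(n->right);
    for (int i = 1; i <= R; i++) {
        /* template op Ri, Ri, Mj: left subtree into a register, right into memory */
        int best = n->left->C[i] + n->right->C[0] + 1;
        /* template op Ri, Ri, Rj: both subtrees in registers, trying both
           evaluation orders (the second subtree has one register fewer) */
        if (i >= 2) {
            best = min(best, n->left->C[i]  + n->right->C[i - 1] + 1);
            best = min(best, n->right->C[i] + n->left->C[i - 1]  + 1);
        }
        n->C[i] = best;
    }
    n->C[0] = n->C[R] + 1;    /* compute into a register, then ST to memory */
}

int main(void) {
    Node a = {0, NULL, NULL, {0}};     /* leaves get cost vector (0,1,1) */
    Node b = {0, NULL, NULL, {0}};
    Node plus = {'+', &a, &b, {0}};
    costs(&plus);
    printf("root cost vector = (%d, %d, %d)\n", plus.C[0], plus.C[1], plus.C[2]);
    return 0;
}

For a + node with two variable leaves, this prints root cost vector = (3, 2, 2): for example, LD R0, a; ADD R0, R0, b computes the sum into a register at cost 2, and one more ST reaches memory.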
The cost of computing a node n includes whatever loads and stores are necessary to
evaluate S in the given number of registers. It also includes the cost of computing the
operator at the root of S. The zeroth component of the cost vector is the optimal cost of
computing the subtree S into memory. The contiguous evaluation property ensures that an
optimal program for S can be generated by considering combinations of optimal programs
only for the subtrees of the root of S. This restriction reduces the number of cases that need
to be considered.
In order to compute the costs C[i] at node n, we view the instructions as tree-rewriting rules, as in Section 8.9. Consider each template E that matches the input tree at
node n. By examining the cost vectors at the corresponding descendants of n, determine
the costs of evaluating the operands at the leaves of E. For those operands of E that are
registers, consider all possible orders in which the corresponding subtrees of T can be
evaluated into registers. In each ordering, the first subtree corresponding to a register
operand can be evaluated using i available registers, the second using i -1 registers, and so
on. To account for node n, add in the cost of the instruction associated with the template E.
The value C[i] is then the minimum cost over all possible orders.
The cost vectors for the entire tree T can be computed bottom up in time linearly
proportional to the number of nodes in T. It is convenient to store at each node the
instruction used to achieve the best cost for C[i] for each value of i. The smallest cost in the
vector for the root of T gives the minimum cost of evaluating T.
Example: Consider a machine having two registers R0 and R1, and the following instructions, each of unit cost (this is the instruction set of the textbook's example; op stands for an arithmetic operator):

LD Ri, Mj       (load a memory location into a register)
op Ri, Ri, Rj   (register-register operation)
op Ri, Ri, Mj   (register-memory operation)
LD Ri, Rj       (register-to-register move)
ST Mi, Rj       (store a register into memory)

In these instructions, Ri is either R0 or R1, and Mi is a memory location. The operator op corresponds to arithmetic operators.
Let us apply the dynamic programming algorithm to generate optimal code for the syntax tree in Fig. 8.26 of the textbook. In the first phase, we compute the cost vectors shown at each node. To illustrate this cost computation, consider the cost vector at the leaf a. C[0], the cost of computing a into memory, is 0, since it is already there. C[1], the cost of computing a into a register, is 1, since we can load it into a register with the instruction LD R0, a. C[2], the cost of loading a into a register with two registers available, is the same as that with one register available. The cost vector at leaf a is therefore (0, 1, 1).
Next, consider the cost vector at the root. We first determine the minimum cost of computing the root with one and two registers available. The machine instruction ADD R0, R0, M matches the root, because the root is labeled with the operator +. Using this instruction, the minimum cost of evaluating the root with one register available is the minimum cost of computing its right subtree into memory, plus the minimum cost of computing its left subtree into the register, plus 1 for the instruction. No other way exists. The cost vectors at the right and left children of the root show that the minimum cost of computing the root with one register available is 5 + 2 + 1 = 8.
Now consider the minimum cost of evaluating the root with two
registers available. Three cases arise depending on which instruction is
used to compute the root and in what order the left and right subtrees
of the root are evaluated.
15. ASSESSMENT SCHEDULE

• Tentative schedule for the assessments:

S.NO   Name of the Assessment   Start Date   End Date    Portion
1      Unit Test 1              29.1.2024    3.2.2024    UNIT 1
2      IAT 1                    10.2.2024    16.2.2024   UNIT 1 & 2
3      Unit Test 2              11.3.2024    16.3.2024   UNIT 3
4      IAT 2                    1.4.2024     6.4.2024    UNIT 3 & 4
5      Revision 1               13.5.2024    16.5.2024   UNIT 5, 1 & 2
6      Revision 2               17.4.2024    19.4.2024   UNIT 3 & 4
7      Model                    20.4.2024    30.4.2024   ALL 5 UNITS
16. PRESCRIBED TEXT BOOKS & REFERENCE BOOKS

TEXT BOOKS:
• Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman, "Compilers: Principles, Techniques and Tools", Second Edition, Pearson Education, 2009.

REFERENCE BOOKS:
• Randy Allen, Ken Kennedy, "Optimizing Compilers for Modern Architectures: A Dependence-based Approach", Morgan Kaufmann Publishers, 2002.
• Steven S. Muchnick, "Advanced Compiler Design and Implementation", Morgan Kaufmann Publishers - Elsevier Science, India, Indian Reprint 2003.
• Keith D. Cooper and Linda Torczon, "Engineering a Compiler", Morgan Kaufmann Publishers - Elsevier Science, 2004.
• V. Raghavan, "Principles of Compiler Design", Tata McGraw Hill Education Publishers, 2010.
• Allen I. Holub, "Compiler Design in C", Prentice-Hall Software Series, 1993.
17. MINI PROJECT SUGGESTIONS

1. Design a code generator for any one microprocessor chip.
2. Identify the microprocessor configuration and, according to the intermediate code generated, generate code independently for that chip.
3. Design optimized code for the storage organization.
4. Design dynamic-programming-based code generation.
Thank you