Compiler Design - CS3501 - Notes

CS3501 Compiler Design

UNIT-I
INTRODUCTION TO LANGUAGE PROCESSING:
As computers became an inevitable and indispensable part of human life, and as several languages with different and more advanced features evolved to make communication with the machine easier, the development of translator (mediator) software became essential to bridge the large gap between human and machine understanding. This activity is called Language Processing, a name that reflects the goal and intent of the process. To understand the process better, we must first be familiar with some key terms and concepts, explained in the following sections.

LANGUAGE TRANSLATORS :

A language translator is a computer program that translates a program written in one (source) language into its equivalent program in another (target) language. The source program is usually in a high-level language, whereas the target language can be anything from the machine language of a target machine (from microprocessor to supercomputer) to another high-level language.

 The two most commonly used translators are the compiler and the interpreter.


1. Compiler: A compiler is a program that reads a program in one language, called the source language, and translates it into its equivalent program in another language, called the target language; in addition, it presents error information to the user.

 If the target program is an executable machine-language program, it can then be called by the user to process inputs and produce outputs.

Input -> Target Program -> Output

Figure 1.1: Running the target program


2. Interpreter: An interpreter is another commonly used language processor. Instead of producing a target program as a single translation unit, an interpreter appears to directly execute the operations specified in the source program on inputs supplied by the user.

Source Program + Input -> Interpreter -> Output

Figure 1.2: Running an interpreter

LANGUAGE PROCESSING SYSTEM:


Based on the input a translator takes and the output it produces, it can be classified as any one of the following.
Preprocessor: A preprocessor takes the skeletal source program as input and produces an extended version of it, which is the result of expanding macros and manifest constants, if any, and including header files in the source file. For example, the C preprocessor is a macro processor that is used automatically by the C compiler to transform the source before actual compilation. In addition, a preprocessor performs the following activities (see the sketch after this list):
 Collects all the modules and files, in case the source program is divided into different modules stored in different files.
 Expands shorthands / macros into source-language statements.
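As a brief, hedged illustration (the macro names here are invented for the example), the C preprocessor expands macros and pulls in headers before actual compilation:

/* before preprocessing */
#include <stdio.h>                  /* the contents of stdio.h get pasted in */
#define PI 3.14159                  /* manifest constant */
#define AREA(r) (PI * (r) * (r))    /* parameterized macro */

double a = AREA(2.0);

/* after preprocessing, the last line conceptually becomes: */
double a = (3.14159 * (2.0) * (2.0));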
Compiler: A translator that takes as input a source program written in a high-level language and converts it into its equivalent target program in machine language. In addition to the above, the compiler also
 Reports to its user the presence of errors in the source program.
 Facilitates the user in rectifying the errors and executing the code.
Assembler: A program that takes as input an assembly-language program and converts it into its equivalent machine-language code.
Loader / Linker: A program that takes as input relocatable code, collects the library functions and relocatable object files, and produces the equivalent absolute machine code.
Specifically,
 Loading consists of taking the relocatable machine code, altering the relocatable addresses, and placing the altered instructions and data in memory at the proper locations.
 Linking allows us to make a single program from several files of relocatable machine code. These files may have been the result of several different compilations, and one or more may be library routines provided by the system and available to any program that needs them.


In addition to these translators, programs like interpreters, text formatters, etc., may be used in a language processing system. To translate a high-level language program into an executable one, the compiler performs by default the compile and link functions.
Normally the steps in a language processing system include preprocessing the skeletal source program, which produces an extended or expanded source program (a ready-to-compile unit); compiling the result; linking / loading; and finally producing the equivalent executable code. As noted earlier, not all these steps are mandatory: in some cases the compiler performs the linking and loading functions implicitly.
The steps involved in a typical language processing system can be understood with the following diagram.
Source Program [Example: filename.c]
        |
   Preprocessor
        |
Modified Source Program [Example: filename.c]
        |
    Compiler
        |
Target Assembly Program
        |
    Assembler
        |
Relocatable Machine Code [Example: filename.obj]
        |
Loader/Linker  <--  Library files, Relocatable object files
        |
Target Machine Code [Example: filename.exe]

Figure 1.3: Context of a compiler in a language processing system

TYPES OF COMPILERS:
Based on the specific input it takes and the output it produces, compilers can be classified into the following types:

Traditional Compilers (C, C++, Pascal): These compilers convert a source program in a HLL into its equivalent native machine code or object code.


Interpreters (LISP, SNOBOL, Java 1.0): These language processors first convert the source code into an intermediate code, and then interpret (emulate) it instead of generating equivalent machine code.

Cross-Compilers: These are compilers that run on one machine and produce code for another machine.

Incremental Compilers: These compilers separate the source into user-defined steps, compiling/recompiling step by step and interpreting the steps in a given order.

Converters (e.g., COBOL to C++): These programs compile from one high-level language to another.

Just-In-Time (JIT) Compilers (Java, Microsoft .NET): These are runtime compilers from an intermediate language (byte code, MSIL) to executable or native machine code. They perform type-based verification, which makes the executable code more trustworthy.

Ahead-of-Time (AOT) Compilers (e.g., .NET ngen): These are pre-compilers to native code for Java and .NET.

Binary Compilation: These compilers compile object code of one platform into object code of another platform.

PHASES OF A COMPILER:

Due to the complexity of the compilation task, a compiler typically proceeds in a sequence of compilation phases. The phases communicate with each other via clearly defined interfaces. Generally an interface contains a data structure (e.g., a tree) and a set of exported functions. Each phase works on an abstract intermediate representation of the source program, not the source program text itself (except the first phase).

Compiler phases are the individual modules which are executed in order to perform their respective sub-activities, and finally integrate the solutions to give the target code.

It is desirable to have relatively few phases, since it takes time to read and write intermediate files. The following diagram (Figure 1.4) depicts the phases a compiler goes through during compilation. A typical compiler therefore has the following phases:

1. Lexical Analyzer (Scanner), 2. Syntax Analyzer (Parser), 3. Semantic Analyzer, 4. Intermediate Code Generator (ICG), 5. Code Optimizer (CO), and 6. Code Generator (CG)

In addition to these, it also has symbol-table management and error-handler components. Not all the phases are mandatory in every compiler; e.g., the Code Optimizer phase is optional in some cases. The description is given in the next section.

The phases of a compiler are divided into two parts: the first three phases are called the analysis part, and the remaining three the synthesis part.

Figure 1.4: Phases of a Compiler

PHASES AND PASSES OF A COMPILER:

In some applications we can have a compiler that is organized into what are called passes, where a pass is a collection of phases that convert the input from one representation to a completely different representation. Each pass makes a complete scan of the input and produces output to be processed by the subsequent pass. An example is a two-pass assembler.

THE FRONT-END & BACK-END OF A COMPILER


All the phases of a general compiler are conceptually divided into the front-end and the back-end. This division is based on their dependence on either the source language or the target machine. This model is called the analysis & synthesis model of a compiler.
The front-end of the compiler consists of the phases that depend primarily on the source language and are largely independent of the target machine. For example, the front-end includes the scanner, the parser, creation of the symbol table, the semantic analyzer, and the intermediate code generator.

The back-end of the compiler consists of the phases that depend on the target machine; these portions do not depend on the source language, just on the intermediate language. The back-end covers the different aspects of the code optimization phase and code generation, along with the necessary error handling and symbol-table operations.

LEXICAL ANALYZER (SCANNER): The scanner is the first phase; it works as the interface between the compiler and the source-language program and performs the following functions:

 Reads the characters in the source program and groups them into a stream of tokens, in which each token specifies a logically cohesive sequence of characters, such as an identifier, a keyword, a punctuation mark, or a multi-character operator like :=.

 The character sequence forming a token is called a lexeme of the token.

 The scanner generates a token-id, and also enters the identifier's name in the symbol table if it does not already exist.

 It also removes comments and unnecessary spaces.

The format of a token is <token name, attribute value>.
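For example, for the input statement count = count + 1, a scanner would typically emit a stream such as the following (the token names and attribute values here are illustrative; the attribute of an id is a symbol-table index, and that of a number is its literal value):

<id, 1>  <assign>  <id, 1>  <plus>  <number, 1>

Both occurrences of count share the same symbol-table entry, so they carry the same attribute value.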

SYNTAX ANALYZER (PARSER): The parser interacts with the scanner and with its subsequent phase, the semantic analyzer, and performs the following functions:

 Groups the received and recorded token stream into syntactic structures, usually into a structure called a parse tree whose leaves are tokens.

 An interior node of this tree represents a stream of tokens that logically belongs together.

 In effect, it checks the syntax of the program elements.

SEMANTIC ANALYZER: This phase receives the syntax tree as input and checks the semantic correctness of the program. Though the tokens are valid and syntactically correct, it may happen that they are not correct semantically. Therefore the semantic analyzer checks the semantics (meaning) of the statements formed.

 The syntactically and semantically correct structures are produced here in the form of a syntax tree, a DAG, or some other sequential representation such as a matrix.

INTERMEDIATE CODE GENERATOR (ICG): This phase takes the syntactically and semantically correct structure as input and produces its equivalent intermediate notation of the source program. The intermediate code should have two important properties: it should be easy to produce, and easy to translate into the target program. Example intermediate code forms are:

 Three-address code,

 Polish notation, etc.
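As a brief sketch, the assignment a = b + c * 10 might be rendered in three-address code as follows (the temporaries t1 and t2 are compiler-generated names):

t1 = c * 10
t2 = b + t1
a = t2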

CODE OPTIMIZER: This phase is optional in some compilers, but it is very useful and beneficial in terms of saving development time, effort, and cost. It performs the following specific functions:

 Attempts to improve the intermediate code so as to obtain faster machine code. Typical functions include loop optimization, removal of redundant computations, strength reduction, frequency reduction, etc.

 Sometimes the data structures used to represent the intermediate forms may also be changed.

CODE GENERATOR: This is the final phase of the compiler; it generates the target code, normally consisting of relocatable machine code, assembly code, or absolute machine code.

 Memory locations are selected for each variable used, and variables are assigned to registers.

 Intermediate instructions are translated into a sequence of machine instructions.

The compiler also performs symbol-table management and error handling throughout the compilation process. The symbol table is a data structure that stores the different source-language constructs and tokens generated during compilation. These two components interact with all phases of the compiler.


For example, suppose the source program is an assignment statement; the following figure shows how the phases of the compiler process the program.

The input source program is: position = initial + rate * 60

Figure 1.5: Translation of an assignment statement
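A textual sketch of the translation in Figure 1.5, following the standard textbook treatment (the target instructions shown are illustrative):

After lexical analysis:    <id,1> <=> <id,2> <+> <id,3> <*> <60>
After syntax analysis:     a syntax tree for id1 = id2 + id3 * 60
After semantic analysis:   id1 = id2 + id3 * inttofloat(60)
After intermediate code:   t1 = inttofloat(60)
                           t2 = id3 * t1
                           t3 = id2 + t2
                           id1 = t3
After code optimization:   t1 = id3 * 60.0
                           id1 = id2 + t1
After code generation:     LDF  R2, id3
                           MULF R2, R2, #60.0
                           LDF  R1, id2
                           ADDF R1, R1, R2
                           STF  id1, R1

Here id1, id2, and id3 denote the symbol-table entries for position, initial, and rate respectively.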


LEXICAL ANALYSIS:
As the first phase of a compiler, the main task of the lexical analyzer is to read the
input characters of the source program, group them into lexemes, and produce as output tokens
for each lexeme in the source program. This stream of tokens is sent to the parser for syntax
analysis. It is common for the lexical analyzer to interact with the symbol table as well.
When the lexical analyzer discovers a lexeme constituting an identifier, it needs to
enter that lexeme into the symbol table. This process is shown in the following figure.

Figure 1.6: Lexical Analyzer

When the lexical analyzer identifies the first token, it sends it to the parser; the parser receives the token and calls the lexical analyzer for the next token by issuing the getNextToken() command. This process continues until the lexical analyzer has identified all the tokens. During this process the lexical analyzer discards white space and comment lines, as sketched below.
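A minimal C sketch of this interaction (the Token type, the EOF_TOKEN value, and the consume routine are hypothetical names for illustration; only the getNextToken() call is named in the text above):

typedef struct { int name; int attribute; } Token;   /* illustrative token layout */
extern Token getNextToken(void);    /* supplied by the lexical analyzer */
extern void  consume(Token t);      /* hypothetical parser action */
#define EOF_TOKEN 0

void parse(void)
{
    Token tok = getNextToken();        /* demand the first token */
    while (tok.name != EOF_TOKEN) {
        consume(tok);                  /* the parser uses the token */
        tok = getNextToken();          /* demand the next token */
    }
}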

TOKENS, PATTERNS AND LEXEMES:

A token is a pair consisting of a token name and an optional attribute value. The token
name is an abstract symbol representing a kind of lexical unit, e.g., a particular keyword, or a
sequence of input characters denoting an identifier. The token names are the input symbols that
the parser processes. In what follows, we shall generally write the name of a token in boldface.
We will often refer to a token by its token name.

A pattern is a description of the form that the lexemes of a token may take [ or match]. In the
case of a keyword as a token, the pattern is just the sequence of characters that form the
keyword. For identifiers and some other tokens, the pattern is a more complex structure that is
matched by many strings.


A lexeme is a sequence of characters in the source program that matches the pattern for a
token and is identified by the lexical analyzer as an instance of that token.

Example: In the following C language statement,

printf("Total = %d\n", score);

both printf and score are lexemes matching the pattern for token id, and "Total = %d\n" is a lexeme matching the pattern for token literal (string).

Figure 1.7: Examples of Tokens

LEXICAL ANALYSIS Vs PARSING:

There are a number of reasons why the analysis portion of a compiler is normally separated into
lexical analysis and parsing (syntax analysis) phases.

1. Simplicity of design is the most important consideration. The separation of lexical and syntactic analysis often allows us to simplify at least one of these tasks. For example, a parser that had to deal with comments and whitespace as syntactic units would be considerably more complex than one that can assume comments and whitespace have already been removed by the lexical analyzer.

2. Compiler efficiency is improved. A separate lexical analyzer allows us to apply specialized techniques that serve only the lexical task, not the job of parsing. In addition, specialized buffering techniques for reading input characters can speed up the compiler significantly.

3. Compiler portability is enhanced. Input-device-specific peculiarities can be restricted to the lexical analyzer.


INPUT BUFFERING:

Before discussing the problem of recognizing lexemes in the input, let us examine some ways in which the simple but important task of reading the source program can be sped up. This task is made difficult by the fact that we often have to look one or more characters beyond the next lexeme before we can be sure we have the right lexeme. There are many situations where we need to look at least one additional character ahead. For instance, we cannot be sure we've seen the end of an identifier until we see a character that is not a letter or digit, and therefore is not part of the lexeme for id. In C, single-character operators like -, =, or < could also be the beginning of a two-character operator like ->, ==, or <=. Thus, we shall introduce a two-buffer scheme that handles long lookaheads safely. We then consider an improvement involving "sentinels" that saves time checking for the ends of buffers.

Buffer Pairs

Because of the amount of time taken to process characters and the large number of characters
that must be processed during the compilation of a large source program, specialized buffering
techniques have been developed to reduce the amount of overhead required to process a single
input character. An important scheme involves two buffers that are alternately reloaded.

Figure 1.8: Using a Pair of Input Buffers

Each buffer is of the same size N, and N is usually the size of a disk block, e.g., 4096
bytes. Using one system read command we can read N characters in to a buffer, rather than
using one system call per character. If fewer than N characters remain in the input file, then a
special character, represented by eof, marks the end of the source file and is different from any
possible character of the source program.

 Two pointers into the input are maintained:

1. The pointer lexemeBegin marks the beginning of the current lexeme, whose extent we are attempting to determine.

2. The pointer forward scans ahead until a pattern match is found; the exact strategy whereby this determination is made will be covered in the balance of this chapter.

12 | P a g e
CS3501 Compiler Design

Once the next lexeme is determined, forward is set to the character at its right end. Then, after the lexeme is recorded as an attribute value of a token returned to the parser, lexemeBegin is set to the character immediately after the lexeme just found. In Figure 1.8, we see that forward has passed the end of the next lexeme, ** (the Fortran exponentiation operator), and must be retracted one position to its left.

Advancing forward requires that we first test whether we have reached the end of one
of the buffers, and if so, we must reload the other buffer from the input, and move forward to
the beginning of the newly loaded buffer. As long as we never need to look so far ahead of the
actual lexeme that the sum of the lexeme's length plus the distance we look ahead is greater
than N, we shall never overwrite the lexeme in its buffer before determining it.
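For contrast, here is a sketch of advancing forward without sentinels; every advance pays for an explicit buffer-end test in addition to the test on the character itself:

if (forward is at end of first buffer)
{
    reload second buffer;
    forward = beginning of second buffer;
}
else if (forward is at end of second buffer)
{
    reload first buffer;
    forward = beginning of first buffer;
}
else
    forward++;    /* the common case still paid for the two tests above */

The sentinel scheme described next folds the buffer-end test into the ordinary character dispatch.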

Sentinels to Improve Scanner Performance:

If we use the above scheme as described, we must check, each time we advance forward, that we have not moved off one of the buffers; if we have, then we must also reload the other buffer. Thus, for each character read, we make two tests: one for the end of the buffer, and one to determine what character was read (the latter may be a multiway branch). We can combine the buffer-end test with the test for the current character if we extend each buffer to hold a sentinel character at the end. The sentinel is a special character that cannot be part of the source program, and a natural choice is the character eof. Figure 1.9 shows the same arrangement as Figure 1.8, but with the sentinels added. Note that eof retains its use as a marker for the end of the entire input.

Figure 1.9: Sentinels at the end of each buffer

Any eof that appears other than at the end of a buffer means that the input is at an end. Figure 1.10 summarizes the algorithm for advancing forward. Notice how the first test, which can be part of a multiway branch based on the character pointed to by forward, is the only test we make, except in the case where we actually are at the end of a buffer or the end of the input.

switch ( *forward++ )
{
case eof:
    if (forward is at end of first buffer)
    {
        reload second buffer;
        forward = beginning of second buffer;
    }
    else if (forward is at end of second buffer)
    {
        reload first buffer;
        forward = beginning of first buffer;
    }
    else /* eof within a buffer marks the end of input */
        terminate lexical analysis;
    break;
/* cases for the other characters */
}

Figure 1.10: Use of switch-case for the sentinel

SPECIFICATION OF TOKENS:

Regular expressions are an important notation for specifying lexeme patterns. While they cannot express
all possible patterns, they are very effective in specifying those types of patterns that we actually need for
tokens.
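For instance, typical regular definitions for the tokens of a small language (these reappear in the Lex example later) are:

delim  -> [ \t\n]
ws     -> {delim}+
letter -> [A-Za-z]
digit  -> [0-9]
id     -> {letter}({letter}|{digit})*
number -> {digit}+(\.{digit}+)?(E[+-]?{digit}+)?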

LEX the Lexical Analyzer generator

Lex is a tool used to generate a lexical analyzer. The input notation for the Lex tool is referred to as the Lex language, and the tool itself is the Lex compiler. Behind the scenes, the Lex compiler transforms the input patterns into a transition diagram and generates code in a file called lex.yy.c; this is a C program which, when given to a C compiler, yields the object code of the lexical analyzer. Here we need to know how to write the Lex language. The structure of a Lex program is given below.


Structure of LEX Program : A Lex program has the following form:

Declarations
%%
Translation rules
%%

Auxiliary function definitions

The declarations section includes declarations of variables, manifest constants (identifiers declared to stand for a constant, e.g., the name of a token), and regular definitions. C declarations appear between %{ ... %}.

In the translation rules section, we place pattern-action pairs, where each pair has the form

Pattern {Action}

The auxiliary functions section includes the definitions of functions used to install identifiers and numbers in the symbol table.

LEX Program Example:


%{
/* definitions of manifest constants
   LT, LE, EQ, NE, GT, GE,
   IF, THEN, ELSE, ID, NUMBER, RELOP */
%}

/* regular definitions */
delim    [ \t\n]
ws       {delim}+
letter   [A-Za-z]
digit    [0-9]
id       {letter}({letter}|{digit})*
number   {digit}+(\.{digit}+)?(E[+-]?{digit}+)?

%%

{ws}      {/* no action and no return */}
if        {return(IF);}
then      {return(THEN);}
else      {return(ELSE);}
{id}      {yylval = (int) installID(); return(ID);}
{number}  {yylval = (int) installNum(); return(NUMBER);}
"<"       {yylval = LT; return(RELOP);}
"<="      {yylval = LE; return(RELOP);}
"="       {yylval = EQ; return(RELOP);}
"<>"      {yylval = NE; return(RELOP);}
">"       {yylval = GT; return(RELOP);}
">="      {yylval = GE; return(RELOP);}

%%

int installID() {/* function to install the lexeme, whose first character is
                    pointed to by yytext and whose length is yyleng, into the
                    symbol table, and return a pointer thereto */}

int installNum() {/* similar to installID, but puts numerical constants into
                     a separate table */}

Figure 1.11: Lex program for recognizing common tokens

SYNTAX ANALYSIS (PARSER)


THE ROLE OF THE PARSER:

In our compiler model, the parser obtains a string of tokens from the lexical analyzer, as shown in the figure below, and verifies that the string of token names can be generated by the grammar for the source language. We expect the parser to report any syntax errors in an intelligible fashion and to recover from commonly occurring errors to continue processing the remainder of the program. Conceptually, for well-formed programs, the parser constructs a parse tree and passes it to the rest of the compiler for further processing.

Figure 2.1: The Parser in the Compiler

During parsing the parser may encounter errors and must present the error information back to the user.

Syntactic errors include misplaced semicolons and extra or missing braces, i.e., "{" or "}". As another example, in C or Java, the appearance of a case statement without an enclosing switch is a syntactic error (however, this situation is usually allowed by the parser and caught later in processing, as the compiler attempts to generate code).

Based on the way/order in which the parse tree is constructed, parsing is classified into the following two types:

1. Top-Down Parsing: Parse-tree construction starts at the root node and proceeds towards the children nodes (i.e., in top-down order).

2. Bottom-Up Parsing: Parse-tree construction begins at the leaf nodes and proceeds towards the root node (i.e., in bottom-up order).

IMPORTANT (OR) EXPECTED QUESTIONS

1. What is a Compiler? Explain the working of a Compiler with your own example?
2. What is the Lexical analyzer? Discuss the Functions of Lexical Analyzer.
3. Write short notes on tokens, pattern and lexemes?
4. Write short notes on Input buffering scheme? How do you change the basic input
buffering algorithm to achieve better performance?
5. What do you mean by a Lexical analyzer generator? Explain LEX tool.

ASSIGNMENT QUESTIONS:
1. Write the differences between compilers and interpreters?

2. Write short notes on token recognition?

3. Write the Applications of the Finite Automata?

4. Explain How Finite automata are useful in the lexical analysis?

5. Explain DFA and NFA with an Example?


UNIT-III SYNTAX DIRECTED TRANSLATION

SEMANTIC ANALYSIS
 Semantic Analysis computes additional information related to the meaning of the
program once the syntactic structure is known.
 In typed languages such as C, semantic analysis involves adding information to the symbol table and performing type checking.
 The information to be computed is beyond the capabilities of standard parsing techniques, therefore it is not regarded as syntax.
 As with lexical and syntax analysis, for semantic analysis we need both a representation formalism and an implementation mechanism.
 As the representation formalism, this lecture illustrates what are called Syntax Directed Translations.
SYNTAX DIRECTED TRANSLATION
 The Principle of Syntax Directed Translation states that the meaning of an input
sentence is related to its syntactic structure, i.e., to its Parse-Tree.
 By Syntax Directed Translations we indicate those formalisms for specifying
translations for programming language constructs guided by context-free grammars.
o We associate Attributes to the grammar symbols representing the language
constructs.
o Values for attributes are computed by Semantic Rules associated with
grammar productions.
 Evaluation of Semantic Rules may:
o Generate Code;
o Insert information into the Symbol Table;
o Perform Semantic Check;
o Issue error messages;
o etc.
There are two notations for attaching semantic rules:
1. Syntax Directed Definitions. High-level specification hiding many implementation
details (also called Attribute Grammars).
2. Translation Schemes. More implementation oriented: Indicate the order in which
semantic rules are to be evaluated.
Syntax Directed Definitions
• Syntax Directed Definitions are a generalization of context-free grammars in which:
1. Grammar symbols have an associated set of Attributes;
2. Productions are associated with Semantic Rules for computing the values of attributes.
 Such a formalism generates Annotated Parse-Trees, where each node of the tree is a record with a field for each attribute (e.g., X.a indicates the attribute a of the grammar symbol X).
 The value of an attribute of a grammar symbol at a given parse-tree node is defined by a
semantic rule associated with the production used at that node.

We distinguish between two kinds of attributes:


1. Synthesized Attributes. They are computed from the values of the attributes of the
children nodes.
2. Inherited Attributes. They are computed from the values of the attributes of both the siblings and the parent nodes.

Syntax Directed Definitions: An Example


• Example. Let us consider the grammar for arithmetic expressions. The syntax directed definition associates to each nonterminal a synthesized attribute called val, as reconstructed in the table below.
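The definition in question, reconstructed here from the standard textbook treatment, pairs each production with a semantic rule:

Production        Semantic Rule
L → E n           print(E.val)
E → E1 + T        E.val = E1.val + T.val
E → T             E.val = T.val
T → T1 * F        T.val = T1.val * F.val
T → F             T.val = F.val
F → ( E )         F.val = E.val
F → digit         F.val = digit.lexval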
S-ATTRIBUTED DEFINITIONS
Definition. An S-Attributed Definition is a Syntax Directed Definition that uses only
synthesized attributes.
• Evaluation Order. Semantic rules in an S-attributed definition can be evaluated by a bottom-up, or post-order, traversal of the parse tree.
• Example. The above arithmetic grammar is an example of an S-attributed definition. The annotated parse tree for the input 3*5+4n is evaluated as sketched below.
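A textual sketch of that evaluation (the annotated tree itself is not reproduced here): the leaves contribute digit.lexval values 3, 5, and 4; the product node computes T.val = 3 * 5 = 15; the sum node computes E.val = 15 + 4 = 19; and the rule for L → E n prints 19.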
L-ATTRIBUTED DEFINITIONS
Definition: An SDD is L-attributed if each inherited attribute of Xi in the RHS of A → X1 X2 ... Xn depends only on
1. the attributes of X1, X2, ..., Xi-1 (the symbols to the left of Xi in the RHS), and
2. the inherited attributes of A.
Restrictions for translation schemes:
1. An inherited attribute of Xi must be computed by an action before Xi.
2. An action must not refer to a synthesized attribute of any symbol to the right of that action.
3. A synthesized attribute for A can only be computed after all attributes it references have been computed (usually at the end of the RHS).
SYMBOL TABLES
A symbol table is a major data structure used in a compiler. It associates attributes with the identifiers used in a program; for instance, a type attribute is usually associated with each identifier. A symbol table is a necessary component because the definition (declaration) of an identifier appears once in a program, while uses of the identifier may appear in many places in the program text.
Identifiers and attributes are entered by the analysis phases when processing a definition (declaration) of an identifier. In simple languages with only global variables and implicit declarations, the scanner can enter an identifier into the symbol table if it is not already there. In block-structured languages with scopes and explicit declarations, the parser and/or semantic analyzer enter identifiers and the corresponding attributes.
Symbol table information is used by the analysis and synthesis phases:
 To verify that used identifiers have been defined (declared)
 To verify that expressions and assignments are semantically correct (type checking)
 To generate intermediate or target code

 Symbol Table Interface

The basic operations defined on a symbol table include:
 allocate – allocate a new empty symbol table
 free – remove all entries and free the storage of a symbol table
 insert – insert a name in a symbol table and return a pointer to its entry
 lookup – search for a name and return a pointer to its entry
 set_attribute – associate an attribute with a given entry
 get_attribute – get an attribute associated with a given entry
Other operations can be added depending on requirements. For example, a delete operation removes a name previously inserted; some identifiers become invisible (out of scope) after exiting a block.
This interface provides an abstract view of a symbol table. It supports the simultaneous existence of multiple tables, and the implementation can vary without modifying the interface.
Basic Implementation Techniques
The first consideration is how to insert and look up names; there is a variety of implementation techniques (a minimal C sketch follows this list).
Unordered List
 Simplest to implement
 Implemented as an array or a linked list
 A linked list can grow dynamically, alleviating the problem of a fixed-size array
 Insertion is fast, O(1), but lookup is slow for large tables – O(n) on average
Ordered List
 If an array is sorted, it can be searched using binary search – O(log2 n)
 Insertion into a sorted array is expensive – O(n) on average
 Useful when the set of names is known in advance – e.g., a table of reserved words
Binary Search Tree
 Can grow dynamically
 Insertion and lookup are O(log2 n) on average
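A minimal sketch of the unordered-list technique in C (the struct layout and function names are illustrative, not from any particular compiler):

#include <stdlib.h>
#include <string.h>

struct entry {
    char         *name;       /* identifier lexeme */
    int           attribute;  /* e.g., a type or token code */
    struct entry *next;
};

static struct entry *table = NULL;    /* head of the linked list */

/* insert: add a name at the front of the list and return its entry -- O(1) */
struct entry *insert(const char *name, int attribute)
{
    struct entry *e = malloc(sizeof *e);
    e->name = malloc(strlen(name) + 1);
    strcpy(e->name, name);
    e->attribute = attribute;
    e->next = table;
    table = e;
    return e;
}

/* lookup: linear search for a name -- O(n) on average */
struct entry *lookup(const char *name)
{
    struct entry *e;
    for (e = table; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

Replacing the list with a sorted array or a binary search tree changes only these two functions, not the interface.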
RUNTIME ENVIRONMENT
 Runtime organization of different storage locations
 Representation of scopes and extents during program execution
 Components of an executing program reside in blocks of memory (supplied by the OS)
 Three kinds of entities need to be managed at runtime:
o Generated code for the various procedures and programs: this forms the text or code segment of the program; its size is known at compile time.
o Data objects:
Global variables/constants: size known at compile time
Variables declared within procedures/blocks: size known at compile time
Variables created dynamically: size unknown
o A stack to keep track of procedure activations.
Memory is subdivided conceptually into code and data areas:
 Code: program instructions
 Stack: manages activation of procedures at runtime
 Heap: holds variables created dynamically
STORAGE ORGANIZATION
1. Fixed-size objects can be placed in predefined locations.
2. The run-time stack and heap. The STACK is used to store:
o Procedure activations.
o The status of the machine just before calling a procedure, so that the status can be restored when the called procedure returns.
The HEAP stores data allocated under program control (e.g., by malloc() in C).
Activation records
Any information needed for a single activation of a procedure is stored in the ACTIVATION RECORD (sometimes called the STACK FRAME). Here we assume the stack grows DOWNWARD, as on, e.g., the Intel architecture. The activation record gets pushed for each procedure call and popped for each procedure return.
STATIC ALLOCATION
Statically allocated names are bound to storage at compile time. The storage bindings of statically allocated names never change, so even if a name is local to a procedure, it is always bound to the same storage. The compiler uses the type of a name (retrieved from the symbol table) to determine the storage size required; the required number of bytes (possibly aligned) is set aside for the name. The address of the storage is fixed at compile time.
Limitations:
− The size required must be known at compile time.
− Recursive procedures cannot be implemented, as all locals are statically allocated.
− No data structure can be created dynamically, as all data is static.
For example, for the following function, the activation record could be laid out as shown below it:

float f(int k)
{
    float c[10], b;
    b = c[k] * 3.14;
    return b;
}

Return value   offset = 0
Parameter k    offset = 4
Local c[10]    offset = 8
Local b        offset = 48

 Stack-dynamic allocation
 Storage is organized as a stack.
 Activation records are pushed and popped.
 Locals and parameters are contained in the activation records for the call.
 This means locals are bound to fresh storage on every call.
 If we have a stack growing downwards, we just need a stack_top pointer.
 To allocate a new activation record, we just increase stack_top.
 To deallocate an existing activation record, we just decrease stack_top.

 Address generation in stack allocation


The position of the activation record on the stack cannot be determined statically.
Therefore the compiler must generate addresses RELATIVE to the activation record.
If we have a downward-growing stack and a stack_top pointer, we generate addresses
of the form stack_top + offset
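Tying this to the static-allocation layout shown earlier (the mnemonic below is illustrative pseudo-assembly, not a specific instruction set), a reference to local b at offset 48 becomes an address computed at run time:

LOAD R1, stack_top + 48    ; fetch local b of the current activation record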
HEAP ALLOCATION
Some languages do not have tree-structured allocations. In these cases, activations have to be allocated on the heap. This allows strange situations, like callee activations that live longer than their callers' activations; this is not common.
The heap is used for allocating space for objects created at run time, for example the nodes of dynamic data structures such as linked lists and trees. Dynamic memory allocation and deallocation are based on the requirements of the program:
 malloc() and free() in C programs
 new and delete in C++ programs
 new and garbage collection in Java programs
Allocation and deallocation may be completely manual (C/C++), semi-automatic (Java), or fully automatic (Lisp).
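A small C sketch of heap allocation for a node of a dynamic data structure (the node layout is illustrative):

#include <stdlib.h>

struct node { int data; struct node *next; };

int main(void)
{
    struct node *n = malloc(sizeof *n);   /* space obtained from the heap at run time */
    if (n != NULL) {
        n->data = 42;
        n->next = NULL;
        free(n);                          /* manual deallocation in C */
    }
    return 0;
}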
PARAMETER PASSING
A language has first-class functions if functions can be declared within any scope, passed as arguments to other functions, and returned as results of functions. In a language with first-class functions and static scope, a function value is generally represented by a closure: a pair consisting of a pointer to the function code and a pointer to an activation record. Passing functions as arguments is very useful in structuring systems using upcalls.
An example (pseudocode with nested functions; this is not standard C):

main()
{
    int x = 4;
    int f(int y) {
        return x * y;
    }
    int g(int → int h) {
        int x = 7;
        return h(3) + x;
    }
}

Call-by-Value

The actual parameters are evaluated and their r-values are passed to the called procedure.

A procedure called by value can affect its caller only through nonlocal names or through pointers.
Parameters in C are always passed by value. Arrays are unusual: what is passed by value is a pointer.
Pascal uses pass-by-value by default, but var parameters are passed by reference. (A C sketch contrasting the two follows.)
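A minimal C sketch of the contrast (the function names are illustrative): swap1 receives r-values and cannot affect its caller, while swap2 receives addresses, simulating call-by-reference with pointers:

#include <stdio.h>

void swap1(int a, int b)   { int t = a;  a = b;   b = t;  }   /* swaps local copies only */
void swap2(int *a, int *b) { int t = *a; *a = *b; *b = t; }   /* swaps through pointers */

int main(void)
{
    int x = 1, y = 2;
    swap1(x, y);                  /* x == 1, y == 2 : caller unaffected */
    swap2(&x, &y);                /* x == 2, y == 1 : caller's variables changed */
    printf("%d %d\n", x, y);      /* prints: 2 1 */
    return 0;
}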

Call-by-Reference

Also known as call-by-address or call-by-location. The caller passes to the called procedure the l-value of the parameter.
If the parameter is an expression, then the expression is evaluated in a new location, and the address of the new location is passed.
Parameters in Fortran are passed by reference; an old implementation bug in Fortran (in pseudocode):

func(a,b) { a = b; }
call func(3,4); print(3);    /* could print 4: the storage holding the constant 3 was overwritten */

Copy-Restore
A hybrid between call-by-value and call-by-reference.
The actual parameters are evaluated and their r-values are passed as in call-by-value. In addition, l-values are determined before the call.
When control returns, the current r-values of the formal parameters are copied back into the l-values of the actual parameters, as sketched below.
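A brief pseudocode sketch (illustrative syntax): the formal receives the actual's r-value on entry, and its final value is copied back on return.

procedure inc(x) { x = x + 1; }

a = 5;
inc(a);     /* on entry x = 5; on return x's value 6 is copied back into a */
            /* afterwards a == 6 */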
Call-by-Name
The actual parameters are literally substituted for the formals. This is like macro expansion or in-line expansion. Call-by-name is not used in practice; however, the conceptually related technique of in-line expansion is commonly used. In-lining may be one of the most effective optimization transformations if guided by execution profiles.