CD Unit-1 (Complete)
UNIT-1
Translators
A translator is a program that takes as input a program written in one programming language (the source language) and produces as output a program in another language (the object or target language).
It takes a program written in the source language and converts it into machine code.
It also discovers and reports errors during translation.
Need for Translators
Binary representation used for communication within a
computer system is called machine language.
With machine language we can communicate with the
computer in terms of bits.
Machine Language
Computer can understand only one language i.e. machine language.
Normally written as strings of binary 0’s and 1’s.
Due to these limitations, some languages have been designed which are easily
understandable by the user and also machine independent.
A software program is required which can convert this machine independent
language into machine language.
This software program is called a Translator.
Assembly Language
To overcome limitations of machine language, assembly language was
introduced in 1952.
Instructions can be written in the form of letters and symbols and not in
binary form.
For example,
to add two numbers
ADD A, B
Types of Translators
Compilers
Assemblers
Interpreters
Macros
Preprocessors
Linkers & Loaders
Compilers
A compiler is a translator which converts a high level language program (the source program) into a low level language program (the object or machine program).
Advantages & Disadvantages of Compiler
Advantages:
Translates the complete program in a single run.
Overall execution takes less time.
Better CPU utilization.
Compilers are available for most high level languages like C, C++ etc.
Disadvantages:
Not flexible.
Consumes more space.
Error localization is difficult.
Source program has to be compiled for every modification.
Interpreters
Like a compiler, an interpreter is a translator for a high level language (source) program, but it translates and executes the program line by line rather than producing a complete object program first.
An interpreter reads a source program written in a high level language as well as the data for this program.
It runs the program against the data to produce results.
Advantages:
Translates the program line by line.
Flexible
Error localization is easier.
Disadvantages:
Execution is slower, so it consumes more time.
CPU utilization is less.
Less efficient.
Other Translators
Assembler
An assembler is a translator that translates the assembly
language instructions into machine code.
Macros
A macro is a single name that stands for a sequence of assembly language instructions; the macro processor expands each use of the macro into that sequence.
It is a variation (extension) of the assembler.
Preprocessor
It is a program that transforms the source code before compilation.
The preprocessor tells the compiler to include header files into the program.
Linkers and Loaders
A linker is a program that combines object modules to form
an executable program
A loader is a program which accepts the linked modules as input and loads them into main memory for execution by the computer.
Analysis-Synthesis Model of Compiler
There are two parts of compilation:
Analysis
Breaks up the source program into constituent pieces
Creates an intermediate representation of source program
Synthesis
Constructs the desired target program from the intermediate representation
Analysis of source program
In compilation, analysis consists of three parts:
Linear analysis (lexical analysis)
Stream of characters in source program is read from left to right and
grouped into tokens.
These tokens are the sequence of characters having a collective meaning.
Hierarchical analysis (syntax analysis or parsing)
Characters or tokens are grouped hierarchically into nested collections
with collective meaning.
Semantic analysis
Certain checks are performed to ensure that the components of a
program fit together meaningfully.
Phases of a Compiler
A compiler takes as input a source program and produces as output an equivalent sequence of machine instructions.
It does this in a series of phases: lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization and code generation, supported by a symbol table and an error handler.
Lexical Analysis
The lexical analyzer is the interface between the source program and the compiler.
It reads the source program character by character, separating the characters into groups that logically belong together.
These character groups are called tokens.
The character sequence that forms a token is called a "lexeme".
The software that performs lexical analysis is called a lexical analyzer or scanner.
An Example
Consider the statement:
Sum:=bonus+basic*50
The statement is grouped into 7 tokens as follows:
S. No.   Lexeme (sequence of characters)   Token (category of lexeme)
1        Sum                               Identifier
2        :=                                Assignment operator
3        bonus                             Identifier
4        +                                 Addition operator
5        basic                             Identifier
6        *                                 Multiplication operator
7        50                                Integer constant
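
As a rough illustration of how a scanner forms these tokens, here is a minimal C++ sketch; the category names and the hand-written classification logic are assumptions made for this example, not the scanner of any particular compiler.

#include <cctype>
#include <iostream>
#include <string>
#include <vector>

// A token is a lexeme (the character sequence) plus its category.
struct Token { std::string lexeme; std::string category; };

// Minimal hand-written scanner for statements like "Sum:=bonus+basic*50".
std::vector<Token> scan(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }
        if (std::isalpha(static_cast<unsigned char>(c))) {          // identifier
            std::size_t start = i;
            while (i < src.size() && std::isalnum(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back({src.substr(start, i - start), "Identifier"});
        } else if (std::isdigit(static_cast<unsigned char>(c))) {   // integer constant
            std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back({src.substr(start, i - start), "Integer constant"});
        } else if (c == ':' && i + 1 < src.size() && src[i + 1] == '=') {
            tokens.push_back({":=", "Assignment operator"}); i += 2;
        } else if (c == '+') { tokens.push_back({"+", "Addition operator"}); ++i; }
        else if (c == '*')   { tokens.push_back({"*", "Multiplication operator"}); ++i; }
        else                 { tokens.push_back({std::string(1, c), "Unknown"}); ++i; }
    }
    return tokens;
}

int main() {
    for (const Token& t : scan("Sum:=bonus+basic*50"))
        std::cout << t.lexeme << " -> " << t.category << '\n';     // prints the 7 tokens above
}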
Semantic Analysis
This phase checks that the components of the program fit together meaningfully and gathers type information for the later phases.
For example, in a + b * 10, if a and b are real, the integer constant 10 is converted, giving a + b * inttoreal(10).
Intermediate Code Generation
After performing syntax and semantic analysis on the program, the compiler generates an intermediate code.
This code is intermediate between the source language and the machine language.
Code Optimization
This phase improves the intermediate code so that faster
running object code can be produced.
It performs the following tasks:
Improve the target code
Eliminate redundant computations (common sub-expressions)
Remove unnecessary operations
Replace slow instructions with faster ones
Types of Optimization:
Local optimization
Loop optimization
Global data flow analysis
Types of Optimization
Local Optimization
Removes common sub-expressions (redundant computations).
E.g.
Before local optimization:        After local optimization:
S = P + Q + R                     T1 = P + Q
Z = P + Q + M                     S = T1 + R
                                  Z = T1 + M
Loop Optimization
It is very important to optimize loops so as to increase the performance of the whole
program.
A statement which computes the same value every time the loop is executed is called a "loop invariant computation".
Such statements can be taken outside the loop, decreasing the execution time of the loop and of the whole program.

Before loop optimization:          After loop optimization:
int a=5;                           int a=5;
int c;                             int c;
for (int i=1; i<=5; i++)           c=a+2;
{ cout<<i;                         for (int i=1; i<=5; i++)
  c=a+2;                           { cout<<i;
}                                  }
Symbol Table & Error Handler
Symbol Table
It is a data structure which keeps a record of each identifier appearing in the program together with its attributes (identifier name, data type, location etc.); a minimal sketch is shown below.
This information will be used later by the semantic analyzer and the code generator.
Error Handler
It detects and reports errors that occur in each phase of the compiler.
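A minimal C++ sketch of a symbol table as a map from an identifier's name to its attributes; the attribute fields and method names are illustrative assumptions, not a prescribed layout.

#include <iostream>
#include <string>
#include <unordered_map>

// Illustrative attributes recorded for each identifier.
struct SymbolInfo {
    std::string type;   // e.g. "int", "real"
    int location;       // e.g. relative address assigned so far
};

class SymbolTable {
public:
    // Insert the identifier if it is not already present.
    void insert(const std::string& name, const SymbolInfo& info) {
        table.emplace(name, info);
    }
    // Look up an identifier; returns nullptr if it was never recorded.
    const SymbolInfo* lookup(const std::string& name) const {
        auto it = table.find(name);
        return it == table.end() ? nullptr : &it->second;
    }
private:
    std::unordered_map<std::string, SymbolInfo> table;
};

int main() {
    SymbolTable st;
    st.insert("sum",   {"real", 0});
    st.insert("bonus", {"real", 4});
    if (const SymbolInfo* s = st.lookup("bonus"))
        std::cout << "bonus : " << s->type << " @ " << s->location << '\n';
}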
Compiler Construction Tools
Software tools developed to create one or more phases of compiler are called
compiler construction tools.
Some of these are:
Scanner generator
Generates lexical analyzers
The generated lexical analyzer is based on finite automata, built from a specification of the tokens given as regular expressions.
Parser generator
Produces a syntax analyzer, taking as input a specification of the programming language in the form of a context-free grammar.
Role of Lexical Analyzer
Main functions of lexical analyzer are:
Separate tokens from program and return those tokens to
parser.
Eliminate comments, white spaces, new line characters etc.
from string.
Inserts identifiers into the symbol table.
Returns a numeric code for each token to the parser.
Input Buffering
Without buffering, the lexical analyzer would have to access secondary memory each time it needs a character to identify tokens.
That is costly and time consuming.
So, the input string is stored in a buffer and then scanned by the lexical analyzer.
The lexical analyzer scans the input from left to right, one character at a time, to identify tokens (a minimal sketch of the idea follows).
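
A rough C++ sketch of the idea: characters are read from the file into a buffer in large blocks, and the scanner then takes them one at a time from memory; the class name, buffer size and file name are assumptions made for illustration.

#include <fstream>
#include <iostream>
#include <string>

// Very small buffered reader: refills a fixed-size buffer from the file
// only when the scanner has consumed everything read so far.
class InputBuffer {
public:
    explicit InputBuffer(const std::string& path) : in(path) {}

    // Returns the next character, or -1 at end of input.
    int nextChar() {
        if (pos == len) {                       // buffer exhausted: one file read
            in.read(buf, sizeof buf);
            len = static_cast<std::size_t>(in.gcount());
            pos = 0;
            if (len == 0) return -1;
        }
        return static_cast<unsigned char>(buf[pos++]);
    }

private:
    std::ifstream in;
    char buf[4096];
    std::size_t pos = 0, len = 0;
};

int main() {
    InputBuffer src("program.txt");             // hypothetical source file
    for (int c; (c = src.nextChar()) != -1; )
        std::cout.put(static_cast<char>(c));    // here the scanner would group chars into tokens
}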
Transition Diagrams
A transition diagram is made up of states and edges.
States are connected by edges (arrows) labelled with input symbols.
Transition Diagram
(Constants and Identifiers)
Transition Diagram (Keywords)
Transition Diagram (Relational Operators)
Specification of Tokens(Regular Expressions)
Regular expressions are used to specify tokens.
Provides convenient and useful notation.
RE’s define the language accepted by Finite Automata (Transition
Diagram).
RE’s are defined over an alphabet ∑.
If R is a regular expression, then L(R) represents language
denoted by RE.
Language
It is a collection of strings over some fixed alphabet.
Empty string can be denoted by ε.
E.g.
If L= set of strings of 0’s and 1’s of length two
Then, L= {00, 01, 10, 11}
Operations on Languages
Operations that can be performed on Languages are:
Union
Concatenation
Kleene Closure
Positive Closure
Union
L1 ∪ L2 = {strings that are in L1 or in L2}
E.g. if L1 = {a, b} and L2 = {0, 1}, then L1 ∪ L2 = {a, b, 0, 1}
Concatenation
L1L2 = {strings formed by a string of L1 followed by a string of L2}
E.g. for the same L1 and L2, L1L2 = {a0, a1, b0, b1}
Kleene Closure
L1* = L1^0 ∪ L1^1 ∪ L1^2 ∪ ... (zero or more concatenations of L1)
Positive Closure
L1+ = L1^1 ∪ L1^2 ∪ L1^3 ∪ ... (one or more concatenations of L1)
Rules of Regular Expressions
ε is a Regular expression, and so is every single symbol a of the alphabet ∑.
Union of two Regular expressions R1 and R2 is also a Regular
expression.
Concatenation of two Regular expressions R1 and R2 is also a
Regular expression.
Closure of Regular Expression is also a Regular Expression.
If R is a Regular Expression then (R) is also a Regular
Expression.
Algebraic Laws
R1|R2 = R2|R1 or R1+R2 = R2+R1 (Union is commutative)
εR = Rε = R (ε is the identity for concatenation)
Recognition of Token (Finite Automata)
It is a machine or a recognizer for a language that is used to
check whether string is accepted by a language or not.
In Finite Automata,
Finite means a finite number of states.
Automata means an automatic machine, which works without any human intervention.
Finite Automata
An FA can be represented by a 5-tuple (Q, ∑, δ, q0, F)
Where,
Q: finite non-empty set of states
∑: Finite set of input symbols
δ: Transition function
q0 : Initial state
F: Set of final states
Types of Finite Automata
Deterministic Finite Automata (DFA)
Deterministic means that on each input there is one and only one state to which the automaton can make a transition from its current state.
Non-Deterministic Finite Automata (NFA)
Non-deterministic means that for an input there can be more than one possible transition from the current state.
Deterministic Finite Automata (DFA)
DFA is a 5 tuple (Q, ∑, δ, q0, F)
Where,
Q: finite non empty set of states
∑: Finite set of input symbols
δ: Transition function to move from current
state to next state.
δ : Q × ∑ -> Q
q0 : Initial state
F: Set of final states
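
A minimal C++ sketch of this 5-tuple view: δ is stored as a table and running the DFA is a single loop; the example machine, which accepts strings over {a, b} ending in "ab", is a hypothetical choice for illustration.

#include <iostream>
#include <set>
#include <string>
#include <vector>

// DFA as a 5-tuple: states 0..n-1, alphabet {a, b}, transition table,
// start state q0 = 0, and a set of final states F.
struct DFA {
    std::vector<std::vector<int>> delta;  // delta[state][symbol] -> next state
    int q0 = 0;
    std::set<int> finals;
};

int symIndex(char c) { return c == 'a' ? 0 : 1; }   // map 'a'->0, 'b'->1

bool accepts(const DFA& d, const std::string& w) {
    int q = d.q0;
    for (char c : w) q = d.delta[q][symIndex(c)];   // exactly one move per symbol
    return d.finals.count(q) > 0;
}

int main() {
    // Example DFA accepting strings over {a,b} that end in "ab":
    // state 0: nothing useful seen, state 1: last symbol was 'a', state 2: just saw "ab".
    DFA d;
    d.delta = {{1, 0},   // from state 0: a->1, b->0
               {1, 2},   // from state 1: a->1, b->2
               {1, 0}};  // from state 2: a->1, b->0
    d.finals = {2};
    std::cout << std::boolalpha
              << accepts(d, "aab") << ' '   // true
              << accepts(d, "aba") << '\n'; // false
}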
Non- Deterministic Finite Automata (NFA)
NFA is a 5 tuple (Q, ∑, δ, q0, F)
Where,
Q: finite non empty set of states
∑: Finite set of input symbols
δ: Transition function mapping the current state and an input symbol to a set of next states.
δ : Q × ∑ -> 2^Q
q0 : Initial state
F: Set of final states
Difference between DFA and NFA
DFA:
Every transition from one state to another is unique and deterministic in nature.
Null transitions (ε) are not allowed.
Transition function: δ : Q × ∑ -> Q
Generally needs more states, and hence more memory, than the equivalent NFA.

NFA:
There can be multiple transitions for an input, i.e. non-deterministic.
Null transitions (ε) are allowed, i.e. a transition from the current state to a next state without any input.
Transition function: δ : Q × ∑ -> 2^Q
Requires less memory, as it generally has fewer states.
Conversion of Regular expression to NFA
NFA for RE (a+b)*
NFA for a(a+b)*ab
NFA for a+b+ab
NFA for (0+1)*1(0+1)
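
The figures for the NFAs above are not reproduced here. As a rough illustration of how such NFAs are built, the following C++ sketch composes NFA fragments in the style of Thompson's construction for (a+b)*; the edge-list representation (with -1 standing for ε) is an assumption made for this example.

#include <iostream>
#include <string>
#include <vector>

// Each NFA fragment has one start and one accept state; states are global ints.
// An edge is (from, symbol, to); symbol == -1 denotes an epsilon transition.
struct Edge { int from; int symbol; int to; };
struct Frag { int start; int accept; };

std::vector<Edge> edges;
int stateCount = 0;
int newState() { return stateCount++; }

// Fragment for a single symbol c:  start --c--> accept
Frag symbol(int c) {
    Frag f{newState(), newState()};
    edges.push_back({f.start, c, f.accept});
    return f;
}

// Fragment for r1 + r2 (union): new start/accept with epsilon edges into both branches.
Frag union_(Frag a, Frag b) {
    Frag f{newState(), newState()};
    edges.push_back({f.start, -1, a.start});
    edges.push_back({f.start, -1, b.start});
    edges.push_back({a.accept, -1, f.accept});
    edges.push_back({b.accept, -1, f.accept});
    return f;
}

// Fragment for r1 r2 (concatenation): link a's accept to b's start by epsilon.
Frag concat(Frag a, Frag b) {
    edges.push_back({a.accept, -1, b.start});
    return {a.start, b.accept};
}

// Fragment for r* (Kleene closure).
Frag star(Frag a) {
    Frag f{newState(), newState()};
    edges.push_back({f.start, -1, a.start});
    edges.push_back({f.start, -1, f.accept});
    edges.push_back({a.accept, -1, a.start});
    edges.push_back({a.accept, -1, f.accept});
    return f;
}

int main() {
    Frag re = star(union_(symbol('a'), symbol('b')));   // NFA for (a+b)*
    std::cout << "start " << re.start << ", accept " << re.accept << '\n';
    for (const Edge& e : edges) {
        std::string label = (e.symbol == -1) ? "eps" : std::string(1, char(e.symbol));
        std::cout << e.from << " --" << label << "--> " << e.to << '\n';
    }
}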
ε-closure (s)
ε-closure (s): it is a set of states that can be reached from
state s on ε-transitions alone.
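
A minimal C++ sketch of computing ε-closure(s) with a work-list over the ε-edges; the adjacency-list representation and the small NFA in main are hypothetical examples.

#include <iostream>
#include <set>
#include <stack>
#include <vector>

// epsilon[s] lists the states reachable from s by a single epsilon edge.
std::set<int> epsilonClosure(int s, const std::vector<std::vector<int>>& epsilon) {
    std::set<int> closure{ s };            // a state is always in its own closure
    std::stack<int> work;
    work.push(s);
    while (!work.empty()) {
        int q = work.top(); work.pop();
        for (int r : epsilon[q])
            if (closure.insert(r).second)  // newly reached state: explore it too
                work.push(r);
    }
    return closure;
}

int main() {
    // Hypothetical 5-state NFA with epsilon edges 0->1, 1->2, 1->3.
    std::vector<std::vector<int>> eps = {{1}, {2, 3}, {}, {}, {}};
    for (int q : epsilonClosure(0, eps)) std::cout << q << ' ';   // prints 0 1 2 3
    std::cout << '\n';
}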
Example: Find ε-closure of all states
ε-closure(0) = {0, 1, 2, 5, 7}
ε-closure(1) = {1, 2, 5}
ε-closure(2) = {2}
ε-closure(3) = {3}
ε-closure(4) = {4, 7, 1, 2, 5}
ε-closure(5) = {5}
ε-closure(6) = {6}
ε-closure(7) = {7}
NFA to DFA Conversion
Example: Draw NFA for RE (a+b)*abb.
Convert NFA to DFA
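
The worked figures for this example are not reproduced here. Instead, a rough, self-contained C++ sketch of the subset construction is given: each DFA state corresponds to a set of NFA states, obtained with ε-closure and move; the hand-built NFA for (a+b)*abb and the printed state names (D0, D1, ...) are assumptions made for illustration.

#include <iostream>
#include <map>
#include <set>
#include <vector>

// NFA over alphabet {'a','b'}: trans[q] maps a symbol to the set of next states,
// eps[q] lists the states reachable from q by one epsilon edge.
struct NFA {
    std::vector<std::map<char, std::set<int>>> trans;
    std::vector<std::vector<int>> eps;
    int start;
    std::set<int> finals;
};

// epsilon-closure of a set of NFA states.
std::set<int> closure(const NFA& n, std::set<int> S) {
    std::vector<int> work(S.begin(), S.end());
    while (!work.empty()) {
        int q = work.back(); work.pop_back();
        for (int r : n.eps[q])
            if (S.insert(r).second) work.push_back(r);
    }
    return S;
}

// States reachable from S on symbol c (before taking the closure).
std::set<int> moveOn(const NFA& n, const std::set<int>& S, char c) {
    std::set<int> out;
    for (int q : S) {
        auto it = n.trans[q].find(c);
        if (it != n.trans[q].end()) out.insert(it->second.begin(), it->second.end());
    }
    return out;
}

int main() {
    // Hand-built NFA for (a+b)*abb: loop in state 0 on a/b, or guess that "abb" starts now.
    NFA n;
    n.trans = {{{'a', {0, 1}}, {'b', {0}}},   // state 0
               {{'b', {2}}},                  // state 1
               {{'b', {3}}},                  // state 2
               {}};                           // state 3 (accepting)
    n.eps = {{}, {}, {}, {}};
    n.start = 0;
    n.finals = {3};

    // Subset construction: every reachable set of NFA states becomes one DFA state.
    std::map<std::set<int>, int> dfaId;
    std::vector<std::set<int>> work;
    std::set<int> s0 = closure(n, {n.start});
    dfaId[s0] = 0;
    work.push_back(s0);
    while (!work.empty()) {
        std::set<int> S = work.back(); work.pop_back();
        for (char c : {'a', 'b'}) {
            std::set<int> T = closure(n, moveOn(n, S, c));
            if (T.empty()) continue;
            if (!dfaId.count(T)) {
                int id = static_cast<int>(dfaId.size());
                dfaId[T] = id;
                work.push_back(T);
            }
            std::cout << "D" << dfaId.at(S) << " --" << c << "--> D" << dfaId.at(T) << '\n';
        }
    }
    for (const auto& entry : dfaId)               // DFA states containing NFA state 3 accept
        if (entry.first.count(3)) std::cout << "D" << entry.second << " is accepting\n";
}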
Minimizing number of states of DFA
Minimizing means reducing the number of states in a DFA.
States should be merged or eliminated in such a way that the resulting DFA accepts exactly the same language as the original DFA (a minimal sketch of the usual approach follows).
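
A minimal C++ sketch of the usual partition-refinement idea: start with the two groups {final states} and {non-final states}, and keep splitting a group whenever two of its states reach different groups on some input; the small DFA in the sketch (where states 0 and 3 turn out to be equivalent) is a hypothetical example.

#include <iostream>
#include <map>
#include <vector>

int main() {
    // Hypothetical DFA over {a, b} (symbol 0 = a, 1 = b) with a redundant state:
    // states 0 and 3 behave identically, so minimization should put them in one block.
    std::vector<std::vector<int>> delta = {{1, 0}, {1, 2}, {1, 3}, {1, 3}};
    std::vector<bool> isFinal = {false, false, true, false};
    int n = static_cast<int>(delta.size());

    // group[q] = block that state q currently belongs to; start with non-final / final.
    std::vector<int> group(n);
    for (int q = 0; q < n; ++q) group[q] = isFinal[q] ? 1 : 0;
    int blockCount = 2;   // both the non-final and the final block are non-empty here

    // Refine: states stay together only if they agree on their own block and on the
    // blocks reached on every input symbol; repeat until no block splits any further.
    bool changed = true;
    while (changed) {
        std::map<std::vector<int>, int> sigToBlock;
        std::vector<int> newGroup(n);
        for (int q = 0; q < n; ++q) {
            std::vector<int> sig = {group[q], group[delta[q][0]], group[delta[q][1]]};
            auto it = sigToBlock.find(sig);
            if (it == sigToBlock.end())
                it = sigToBlock.emplace(sig, static_cast<int>(sigToBlock.size())).first;
            newGroup[q] = it->second;
        }
        changed = (static_cast<int>(sigToBlock.size()) != blockCount);
        blockCount = static_cast<int>(sigToBlock.size());
        group = newGroup;
    }

    // Each block of equivalent states becomes a single state of the minimized DFA.
    for (int q = 0; q < n; ++q)
        std::cout << "state " << q << " -> block " << group[q] << '\n';
}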
Example: Minimize the following DFA
Language for Lexical Analyzers
LEX is a language for specifying lexical analyzers.
It is a tool (software) which automatically generates a Lexical Analyzer (Finite Automata) from such a specification.
It takes as input a LEX source program and produces a Lexical Analyzer as its output.
The Lexical Analyzer then converts the input string entered by the user into tokens as its output.
LEX source program -> LEX -> Lexical Analyzer
A LEX source program consists of two parts: Auxiliary Definitions and Translation Rules.
Auxiliary Definitions
It denotes Regular Expressions of the form:
D1 = R1
D2 = R2
:
Dn = Rn
Where,
Distinct name (Di) -> shortcut name of a Regular Expression
Regular Expression (Ri) -> notation to represent a collection of input symbols (it may also use the earlier names D1 ... Di-1)
Auxiliary Definition for Identifiers
letter = A | B | ... | Z | a | b | ... | z
digit = 0 | 1 | ... | 9
identifier = letter (letter | digit)*
Translation Rules
It is a set of rules or actions which tells the Lexical Analyzer what it has to do, or what it has to return to the parser, on encountering a token.
It consists of statements of the form:
P1 {Action1}
P2 {Action2}
:
:
Pn {Actionn}
Where,
Pi -> pattern or Regular Expression consisting of input alphabet symbols and Auxiliary Definition names.
Actioni -> a piece of code which gets executed whenever a token matching Pi is recognised.
END OF UNIT-I