CD Unit-1 (Complete)

The document provides an overview of compiler design, detailing the role of translators in converting source code from one programming language to another. It discusses various types of programming languages, including machine, assembly, and high-level languages, as well as the different types of translators such as compilers, interpreters, and assemblers. Additionally, it outlines the phases of compilation, including lexical analysis, syntax analysis, semantic analysis, intermediate code generation, optimization, and code generation.

Uploaded by

Shikha Kamra

COMPILER DESIGN

UNIT-1
Translators
 A translator is a program that
 takes as input a program written in one programming language
(the source language)
 produces as output a program in another language (object or
target language).
 It converts the program written in the source language into an
equivalent target program (often machine code).
 It also discovers and reports errors during translation.

Need for Translators
 Binary representation used for communication within a
computer system is called machine language.
 With machine language we can communicate with the
computer in terms of bits.

 There are three main kinds of programming languages:


 Machine language
 Assembly language
 High Level language

Machine Language
 Machine language
 Computer can understand only one language i.e. machine language.
 Normally written as strings of binary 0’s and 1’s.

 A program written in machine language has the following disadvantages:


 Difficult to read & understand
 Machine dependent
 Error prone.

 Due to these limitations, some languages have been designed which are easily
understandable by the user and also machine independent.
 A software program is required which can convert this machine independent
language into machine language.
 This software program is called a Translator.

Assembly Language
 To overcome limitations of machine language, assembly language was
introduced in 1952.
 Instructions can be written in the form of letters and symbols and not in
binary form.
 For example,
 to add two numbers
ADD A, B

 Advantages of Assembly language over machine language:


 Uses mnemonics (symbols) instead of bits.
 More readable.
 Permits programmers to use labels to identify and name particular memory
words that hold instructions or data.
 Locating and correcting errors is easier.

 The main disadvantage of assembly language is that programs
written in assembly language are machine dependent.
High Level Language
 A high level language lets the user write programs that are easily
understandable.
 While writing program in high level language, programmer
need not know the internal structure of the computer.
 Machine independent language.
 For example,
 to add two numbers simply write,
C=a+b
 Some of the high level languages are:
 C, C++, Java etc.

Types of Translators
 Compilers
 Assemblers
 Interpreters
 Macros
 Preprocessors
 Linkers & Loaders

Compilers
 A compiler is a translator which converts high level language
(source program) into low level language (object program or
machine program)

 It also generates diagnostic messages for errors encountered during
compilation of the program.

Advantages & Disadvantages of Compiler
 Advantages:
 Compiler translates complete program in a single run.
 It takes less time.
 More CPU utilization.
 Easily supported by many high level languages like C, C++ etc.

 Disadvantages:
 Not flexible.
 Consumes more space.
 Error localization is difficult.
 Source program has to be compiled for every modification.
Interpreters
 An interpreter, like a compiler, translates a high level language
(source) program, but it does so statement by statement instead of
producing a complete object program in advance.
 An interpreter reads a source program written in a high level language
as well as data for this program.
 It runs the program against the data to produce results.

 Advantages:
 Translates the program line by line.
 Flexible
 Error localization is easier.

 Disadvantages:
 Slower, since statements are re-translated every time they are executed.
 CPU utilization is less.
 Less efficient.
Other Translators
 Assembler
 An assembler is a translator that translates the assembly
language instructions into machine code.

 Macros
 A macro is a named shorthand for a sequence of assembly language
instructions; a macro processor expands each macro use before assembly.
 It is usually implemented as part of the assembler (a macro assembler).

 Preprocessor
 It is a program that transforms the source code before
compilation.
 For example, a preprocessor tells the compiler to include header files
into the program.
Linkers and Loaders
 A linker is a program that combines object modules to form
an executable program
 Loader is a program which accepts the input as linked
modules & loads them into main memory for execution by
the computer.

Analysis-Synthesis Model of Compiler
 There are two parts of compilation:
 Analysis
 Breaks up the source program into constituent pieces
 Creates an intermediate representation of source program

 Synthesis
 Constructs the desired target program from the intermediate
representation

Analysis of source program
 In compilation, analysis consists of three parts:
 Linear analysis
 Stream of characters in source program is read from left to right and
grouped into tokens.
 These tokens are the sequence of characters having a collective meaning.

 Hierarchical analysis
 Characters or tokens are grouped hierarchically into nested collections
with collective meaning.

 Semantic analysis
 Certain checks are performed to ensure that the components of a
program fit together meaningfully.
Phases of a Compiler
 A compiler takes as input a source program and produces as
output an equivalent sequence of machine instructions.

 It is difficult to implement the whole process in one step.

 This process is broken down into subtasks called Phases.

 The phases are interdependent: the output of one phase is the
input to the next.

 The first phase of the compiler takes the source program as input, and
the last phase produces the required object program.
Phases of Compiler

Lexical Analysis
 The lexical analysis is an interface between the source
program and the compiler.
 It reads the source program character by character separating
them into groups that logically belongs together.
 The sequence of character groups are called tokens.
 The character sequence that forms a token is called a
“lexeme”.
 The software that performs lexical analysis is called a lexical
analyzer or scanner.

An Example
 Consider the statement:
Sum:=bonus+basic*50
 The statement is grouped into 7 tokens as follows:
S. No.   Lexeme (sequence of characters)   Token (category of lexeme)
1        Sum                               Identifier
2        :=                                Assignment operator
3        bonus                             Identifier
4        +                                 Addition operator
5        basic                             Identifier
6        *                                 Multiplication operator
7        50                                Integer constant

 The output of lexical analysis phase is of the form:


[id1,500] := [id2,700] + [id3,800] * [const,900]
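The grouping above can be sketched as a small scanner. This is a sketch, not the slides' exact lexer: the token categories are the ones from the table, and the longest-match behaviour comes from Python's `re` alternation.

```python
import re

# (pattern, token category) pairs, mirroring the table above
TOKEN_SPEC = [
    (r"[A-Za-z_][A-Za-z0-9_]*", "identifier"),
    (r"\d+",                    "integer constant"),
    (r":=",                     "assignment operator"),
    (r"\+",                     "addition operator"),
    (r"\*",                     "multiplication operator"),
]
MASTER = re.compile("|".join(f"(?P<T{i}>{p})"
                             for i, (p, _) in enumerate(TOKEN_SPEC)))

def scan(text):
    tokens = []
    for m in MASTER.finditer(text):
        idx = int(m.lastgroup[1:])          # which alternative matched
        tokens.append((m.group(), TOKEN_SPEC[idx][1]))
    return tokens

print(scan("Sum:=bonus+basic*50"))
```

Running it on the slide's statement yields the same 7 (lexeme, token) pairs as the table.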
Syntax analysis
 Second phase of compiler and performed by software called
Parser or Syntax Analyser.
 Creates the syntactic structure (Parse Tree)of the given program.
 Parser receives input in the form of token from its previous phase
and determines the structural elements of the program and their
relationship.
 Then parser constructs a parse tree from various tokens obtained
from lexical analyzer.

 There are 2 types of parsers:

 Bottom-up Parser
 Constructs the parse tree from the leaves and scans towards the root of the tree.
 Top-down Parser
 Constructs the parse tree from the root and moves downwards towards the
leaves.
Semantic Analysis
 Semantic refers to the “meaning of the program”.
 Following are the functions performed by semantic analyzer:
 Type checking
 Checks or verifies that each operator has operands that are permitted by
source language definition or there should by type compatibility between
operator & operands.
 Implicit type conversion
 Changing one data type to another automatically when the data types of the
operands of an operator are different or mismatched.
 E.g. int + real  ->  real + real = real
 E.g. a + b * 10  ->  a + b * inttoreal(10)
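The implicit conversion step can be sketched as a tiny tree walk. This is only an illustration: the `inttoreal` name and the two-type system (`int`/`real`) are assumptions matching the example, not a real compiler API.

```python
# Insert inttoreal() when an operator mixes int and real operands.
def widen(node, types):
    # node: a name (str) or a tuple (op, left, right); returns (node, type)
    if isinstance(node, str):
        return node, types[node]
    op, left, right = node
    ln, lt = widen(left, types)
    rn, rt = widen(right, types)
    if {lt, rt} == {"int", "real"}:        # mismatch: convert the int side
        if lt == "int":
            ln = ("inttoreal", ln)
        else:
            rn = ("inttoreal", rn)
        return (op, ln, rn), "real"
    return (op, ln, rn), lt

types = {"a": "real", "b": "real", "10": "int"}
tree, t = widen(("+", "a", ("*", "b", "10")), types)
print(tree, t)
```

For a + b * 10 this produces the annotated tree with `inttoreal` wrapped around 10, matching the slide's example.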

Intermediate Code Generation
 After performing syntax and semantic analysis on program, compiler generates an intermediate
code
 Intermediate between source language and machine language

 Types of intermediate code:


 Postfix notation
 E.g. (a+b)*(c+d)
ab+cd+*
 Three address code
 These are statements of the form c = a op b
 i.e. there can be at most three addresses: two for operands and one for the result.
 Each instruction has at most one operator on the right hand side.
 E.g. d=a*b+c
 Three address code: t1=a*b
t2=t1+c
d=t2
 Syntax trees
 A condensed form of the parse tree in which leaves are identifiers and interior nodes are operators.
 E.g. 2*7+3
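The three-address-code example above can be sketched as a tiny bottom-up generator. This is a sketch under the slide's conventions (temporaries t1, t2, …; one operator per instruction), not a full code generator.

```python
# Walk a small expression tree bottom-up, allocating one temporary
# per operator and emitting one three-address statement per node.
temps = iter(f"t{i}" for i in range(1, 100))
code = []

def gen(node):
    # node is either a variable name (str) or a tuple (op, left, right)
    if isinstance(node, str):
        return node
    op, left, right = node
    lv, rv = gen(left), gen(right)      # generate operands first
    t = next(temps)
    code.append(f"{t}={lv}{op}{rv}")
    return t

# d = a*b + c, from the slide
code.append("d=" + gen(("+", ("*", "a", "b"), "c")))
print(code)   # ['t1=a*b', 't2=t1+c', 'd=t2']
```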

Code Optimization
 This phase improves the intermediate code so that faster
running object code can be produced.
 It performs the following tasks:
 Improves the target code
 Eliminates redundant computations (common sub-expressions)
 Removes unnecessary operations
 Replaces slow instructions with faster ones.
 Types of Optimization:
 Local optimization
 Loop optimization
 Global data flow analysis

Types of Optimization
 Local Optimization
 Removes common sub expressions or redundant information
 E.g.
    Before:             After local optimization:
    T1=P+Q              T1=P+Q
    S=P+Q+R             S=T1+R
    Z=P+Q+M             Z=T1+M
 Loop Optimization
 It is very important to optimize loops so as to increase the performance of the whole
program.
 Statement which computes same value every time when the loop is executed is called
“Loop invariant computation”.
 These statements can be taken outside the loop, decreasing the execution time
of the loop and the whole program.
    Before:                         After loop optimization:
    int a=5;                        int a=5;
    int c;                          int c;
    for (int i=1; i<=5; i++)        c=a+2;
    { cout<<i;                      for (int i=1; i<=5; i++)
      c=a+2;                        { cout<<i;
    }                               }

 Global Data Flow Analysis


 Performs optimization by examining the information flow between various data items.
 Determines information regarding the definition and use of data in a program
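The local-optimization example above can be sketched as a naive common-subexpression pass. This is deliberately simplistic: it matches expressions by their text (real compilers match them structurally), which is enough to reproduce the slide's T1 example.

```python
# Reuse a name that already holds an expression instead of recomputing it.
def eliminate_cse(stmts):
    seen = {}                      # expression text -> name holding it
    out = []
    for lhs, rhs in stmts:
        for expr, name in seen.items():
            rhs = rhs.replace(expr, name)
        seen[rhs] = lhs
        out.append(f"{lhs}={rhs}")
    return out

print(eliminate_cse([("T1", "P+Q"), ("S", "P+Q+R"), ("Z", "P+Q+M")]))
```

The output matches the slide: S and Z reuse T1 instead of recomputing P+Q.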
Code Generation
 Final phase of compilation process
 Converts optimized intermediate code given by code
optimizer into Assembly/ Machine language.

 Main tasks of code generation:


 Register Allocation
 What names in a program should be stored in registers
 Register Assignment
 In which register, names should be stored.

Symbol Table & Error Handler
 Symbol Table
 It is a data structure which contains tokens.
 Keeps record of each token & its attributes (i.e. identifier name,
data types, location etc.)
 This information will be used later by semantic analyzer and
code generator.

 Error Handler
 It detects and reports errors occurred at each phase of
compiler.

Compiler Construction Tools
 Software tools developed to create one or more phases of compiler are called
compiler construction tools.
 Some of these are:
 Scanner generator
 Generates lexical analyzers
 Takes regular expressions as input and produces a lexical analyzer based on
finite automata.

 Parser generator
 Produces syntax analyzers, taking as input a context free grammar of the
programming language.

 Syntax directed translation engines


 Produces intermediate codes.

 Data flow engines


 Used in code optimization.
 Produces optimized code.

 Automatic code generators
 Take intermediate code as input and convert it into machine language.
Lexical analysis
 First phase of compiler
 Reads source program one character at a time and convert it
into sequence of tokens.

Role of Lexical Analyzer
 Main functions of lexical analyzer are:
 Separate tokens from the program and return them to the
parser.
 Eliminate comments, white spaces, new line characters etc.
from the input string.
 Insert tokens into the symbol table.
 Return a numeric code for each token to the parser.

Input Buffering
 To identify tokens, lexical analyzer has to access secondary
memory every time.
 It is costly and time consuming.
 So, input strings are stored into buffer and scanned by lexical
analyzer.
 Lexical analyzer scans input strings from left to right one
character at a time to identify tokens.

 It uses two pointers to scan tokens:


 Begin pointer (bptr)
 Points to the beginning of the string to be read.
 Look ahead pointer (lptr)
 Moves ahead to search for the end of the token.
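The two-pointer scan can be sketched as follows. This is a minimal illustration (identifiers/numbers only, via `isalnum`), not a complete buffering scheme:

```python
# bptr marks where the token starts; lptr advances until the current
# character can no longer extend the token.
def next_token(buf, bptr):
    lptr = bptr
    while lptr < len(buf) and buf[lptr].isalnum():
        lptr += 1
    return buf[bptr:lptr], lptr     # the lexeme and where to resume

print(next_token("sum=a+b", 0))     # ('sum', 3)
```

The caller then restarts the scan with bptr set to the returned lptr (one-character operators like `=` would be handled by a separate case).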
Design of Lexical Analyzer
 Can be designed using Finite Automata or Transition
Diagrams.
 Finite Automata (Transition Diagram)
 It is a directed graph or flowchart used to recognize token.

 Transition diagram has 2 parts:


 States
 Represented by circles.

 Edges
 States are connected by edges (arrows).
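A transition diagram can be coded directly as states and labelled edges. The sketch below recognizes identifiers of the form letter (letter | digit)*; the state names and edge table are illustrative, since the slides' own diagrams are figures.

```python
def kind(ch):
    return "letter" if ch.isalpha() else "digit" if ch.isdigit() else "other"

edges = {("start", "letter"): "in_id",     # first character must be a letter
         ("in_id", "letter"): "in_id",
         ("in_id", "digit"):  "in_id"}

def is_identifier(s):
    state = "start"
    for ch in s:
        state = edges.get((state, kind(ch)))
        if state is None:                  # no edge on this input: reject
            return False
    return state == "in_id"

print(is_identifier("sum1"), is_identifier("1sum"))   # True False
```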
Transition Diagram
(Constants and Identifiers)

Transition Diagram (Keywords)

Transition Diagram (Relational Operators)

Specification of Tokens(Regular Expressions)
 Regular expressions are used to specify tokens.
 Provides convenient and useful notation.
 RE’s define the language accepted by Finite Automata (Transition
Diagram).
 RE’s are defined over an alphabet ∑.
 If R is a regular expression, then L(R) represents language
denoted by RE.

 Language
 It is a collection of strings over some fixed alphabet.
 Empty string can be denoted by ε.
 E.g.
 If L= set of strings of 0’s and 1’s of length two
 Then, L= {00, 01, 10, 11}
Operations on Languages
 Operations that can be performed on Languages are:
 Union
 Concatenation
 Kleen Closure
 Positive Closure

 Union
 L1 ∪ L2 = {set of strings in L1 & set of strings in L2}
 Concatenation
 L1L2 = {set of strings in L1 followed by strings in L2}
 Kleen Closure
 L1* = L1^0 ∪ L1^1 ∪ L1^2 ∪ …
 Positive Closure
 L1+ = L1^1 ∪ L1^2 ∪ L1^3 ∪ …
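The four operations can be sketched on small finite languages (the example alphabets are made up; the closures, being infinite, are truncated at a bounded number of concatenations):

```python
from itertools import product

L1, L2 = {"a", "b"}, {"0", "1"}

union  = L1 | L2                                   # L1 ∪ L2
concat = {x + y for x, y in product(L1, L2)}       # L1L2

def closure_upto(L, n, positive=False):
    """Strings of L* (or L+ if positive) using at most n concatenations."""
    result = set() if positive else {""}           # L^0 = {ε} for Kleene
    layer = {""}
    for _ in range(n):
        layer = {x + y for x in layer for y in L}  # L^(k+1) from L^k
        result |= layer
    return result

print(sorted(closure_upto({"0", "1"}, 2)))
```

For n = 2 this yields ε plus all strings of 0's and 1's of length one and two; with `positive=True` the empty string ε is excluded, matching the difference between L* and L+.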
Rules of Regular Expressions
 ε and each symbol of ∑ are Regular expressions.
 Union of two Regular expressions R1 and R2 is also a Regular
expression.
 Concatenation of two Regular expressions R1 and R2 is also a
Regular expression.
 Closure of Regular Expression is also a Regular Expression.
 If R is a Regular Expression then (R) is also a Regular
Expression.

Algebraic Laws
 R1|R2 = R2|R1 or R1+R2 = R2+R1 (Commutative)

 R1|(R2|R3) = (R1|R2)|R3 (Associative)
or R1+(R2+R3) = (R1+R2)+R3

 R1(R2R3) = (R1R2)R3 (Associative)

 R1(R2|R3) = R1R2 | R1R3 (Distributive)
or R1(R2+R3) = R1R2+R1R3

 εR = Rε = R (ε is the identity for concatenation)
Recognition of Token (Finite Automata)
 It is a machine or a recognizer for a language that is used to
check whether string is accepted by a language or not.
 In Finite Automata,
 Finite means finite number of states .
 Automata means Automatic machine which works without
any interference of human being.

Finite Automata
 An FA can be represented by a 5-tuple (Q, ∑, δ, q0, F)

 Where,
 Q: finite non empty set of states
 ∑: finite set of input symbols
 δ: transition function
 q0: initial state
 F: set of final states
Types of Finite Automata
 Deterministic Finite Automata
 Deterministic means on each input there is one and only one
state to which automata can have transition from its current
state

 Non-Deterministic Finite Automata

 Non-deterministic means there can be several possible
transitions from a state on the same input.
 The next state is not uniquely determined for a given input.
Deterministic Finite Automata (DFA)
 DFA is a 5 tuple (Q, ∑, δ, q0, F)

 Where,
 Q: finite non empty set of states
 ∑: Finite set of input symbols
 δ: Transition function to move from current
state to next state.
δ : Q × ∑ -> Q
 q0 : Initial state
 F: Set of final states
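A DFA given as this 5-tuple can be run directly. The example machine below is an assumption, not from the slides: it accepts binary strings ending in 1.

```python
Q     = {"q0", "q1"}
Sigma = {"0", "1"}
delta = {("q0", "0"): "q0", ("q0", "1"): "q1",
         ("q1", "0"): "q0", ("q1", "1"): "q1"}
q0, F = "q0", {"q1"}

def accepts(w):
    state = q0
    for ch in w:
        state = delta[(state, ch)]   # exactly one next state: deterministic
    return state in F

print(accepts("1011"), accepts("10"))   # True False
```

Because δ maps Q × ∑ to a single state, the loop body is one dictionary lookup per input symbol.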

Non- Deterministic Finite Automata (NFA)
 NFA is a 5 tuple (Q, ∑, δ, q0, F)

 Where,
 Q: finite non empty set of states
 ∑: Finite set of input symbols
 δ: Transition function mapping a state and an input
symbol to a set of next states.
δ : Q × ∑ -> 2^Q (the power set of Q)
 q0 : Initial state
 F: Set of final states

Difference between DFA and NFA
DFA:
 Every transition from one state to another is unique and
deterministic in nature.
 Null transitions (ε) are not allowed.
 Transition function: δ : Q × ∑ -> Q
 Requires less memory, as transitions and states are fewer.

NFA:
 There can be multiple transitions for an input, i.e.
non-deterministic.
 Null transitions (ε) are allowed, i.e. transition from the current
state to the next state without any input.
 Transition function: δ : Q × ∑ -> 2^Q
 Requires more memory.
Conversion of Regular expression to NFA

NFA for RE (a+b)*

NFA for a(a+b)*ab

NFA for a+b+ab

NFA for (0+1)*1(0+1)

ε-closure (s)
 ε-closure(s): the set of states that can be reached from
state s on ε-transitions alone (including s itself).
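The definition translates directly into a depth-first search over the ε-edges. The ε-move table below is an assumption standing in for the slides' figure, chosen so that ε-closure(0) and ε-closure(4) match the worked answers later in this unit.

```python
def eps_closure(states, eps):
    """All states reachable from `states` via ε-transitions alone."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

# hypothetical ε-moves: state -> set of states reachable on ε
eps = {0: {1, 7}, 1: {2, 5}, 4: {7, 1}}
print(sorted(eps_closure({0}, eps)))   # [0, 1, 2, 5, 7]
```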

Example: Find ε-closure of all states

Example: Find ε-closure of states 0,1,4

Example: Find ε-closure of all states

ε-closure(0) = {0, 1, 2, 5, 7}
ε-closure(1) = {1, 2, 5}
ε-closure(2) = {2}
ε-closure(3) = {3}
ε-closure(4) = {4, 7, 1, 2, 5}
ε-closure(5) = {5}
ε-closure(6) = {6}
ε-closure(7) = {7}
NFA to DFA Conversion

Example: Draw NFA for RE (a+b)*abb.
Convert NFA to DFA
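The subset construction can be sketched as follows. The NFA here is a hand-built one for (a+b)*abb without ε-moves (the slides' own state numbering is not reproduced): state 0 loops on a and b, and 0 -a-> 1 -b-> 2 -b-> 3.

```python
nfa = {0: {"a": {0, 1}, "b": {0}},      # state -> {symbol: next states}
       1: {"b": {2}},
       2: {"b": {3}}}
start, accept = 0, 3

def move(S, sym):
    return set().union(*(nfa.get(s, {}).get(sym, set()) for s in S))

def subset_construction():
    d0 = frozenset({start})
    dstates, trans, work = {d0: 0}, {}, [d0]
    while work:
        S = work.pop()
        for sym in "ab":
            T = frozenset(move(S, sym))  # each DFA state is a set of NFA states
            if T not in dstates:
                dstates[T] = len(dstates)
                work.append(T)
            trans[(dstates[S], sym)] = dstates[T]
    return dstates, trans

dstates, trans = subset_construction()

def accepts(w):
    state = 0
    for ch in w:
        state = trans[(state, ch)]
    return state in {i for S, i in dstates.items() if accept in S}

print(len(dstates), accepts("aabb"))
```

The construction finds the classic 4 DFA states for this language; a DFA state is accepting when its set contains the NFA's accepting state 3.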

Minimizing number of states of DFA
 Minimizing means reducing the number of
states in DFA.
 States should be merged or eliminated in such a
way that the language accepted by the resulting DFA is
not affected.
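Minimization can be sketched by partition refinement: start from the accepting/non-accepting split and keep splitting groups whose members transition into different groups. The example DFA is made up; states a and c turn out to be equivalent and merge.

```python
states   = {"a", "b", "c", "d"}
alphabet = "01"
delta = {("a", "0"): "b", ("a", "1"): "c",
         ("b", "0"): "b", ("b", "1"): "d",
         ("c", "0"): "b", ("c", "1"): "c",
         ("d", "0"): "b", ("d", "1"): "d"}
finals = {"d"}

def minimize():
    part = {s: (s in finals) for s in states}   # initial split: final vs not
    while True:
        # signature: own group plus the groups reached on each symbol
        sig = {s: (part[s], tuple(part[delta[(s, a)]] for a in alphabet))
               for s in states}
        ids = {v: i for i, v in enumerate(sorted(set(sig.values()), key=repr))}
        new = {s: ids[sig[s]] for s in states}
        if len(set(new.values())) == len(set(part.values())):
            return new          # stable: state -> equivalence-class id
        part = new

groups = minimize()
print(groups["a"] == groups["c"])   # True: a and c are merged
```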

63
64
65
66
67
68
69
Example: Minimize the following DFA

Language for Lexical Analyzers
 LEX is a language used for the specification of lexical
analyzers.
 It is a tool or software which automatically generates Lexical Analyzer
(Finite Automata).
 It takes as input a LEX source program and produces Lexical Analyzer
as its output.
 Then Lexical Analyzer will convert the input string entered by user
into tokens as its output.

    LEX Source Program  -->  [ LEX ]  -->  Lexical Analyzer

    Input String  -->  [ Lexical Analyzer ]  -->  Tokens
LEX Source Program
 Language for specifying or representing Lexical Analyzer.
 Components of LEX source program:
 Auxiliary Definitions
 Translation Rules

    LEX Source Program  =  Auxiliary Definitions + Translation Rules

Auxiliary Definitions
 It consists of Regular Expression definitions of the form:

    D1 = R1
    D2 = R2
    ...
    Dn = Rn

Where,
 Distinct name (Di) -> shortcut name for a Regular Expression
 Regular Expression (Ri) -> notation representing a collection of
input symbols.

Auxiliary Definition for Identifiers

Translation Rules
 It is a set of rules or actions which tells Lexical Analyzer what it
has to do.
or
 what it has to return to parser on encountering token.
 It consists of statements of the form:
P1{Action1}
P2{Action2}
:
:
Pn{Actionn}

Where,
 Pi -> pattern or Regular Expression consisting of input alphabets and
Auxiliary definition names.
 Actioni -> a piece of code which is executed whenever a token matching Pi is
recognised.
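The pattern/action pairing can be sketched as a LEX-style table driving a loop. This is an illustration, not LEX itself: the integer codes returned by the actions are made up.

```python
import re

# (pattern Pi, action): the action runs whenever its pattern matches next.
rules = [
    (r"[A-Za-z][A-Za-z0-9]*", lambda lexeme: ("id", 6)),
    (r"\d+",                  lambda lexeme: ("const", 7)),
    (r":=|\+|\*",             lambda lexeme: (lexeme, 1)),
]

def lex(text):
    out, pos = [], 0
    while pos < len(text):
        for pat, action in rules:
            m = re.match(pat, text[pos:])
            if m:
                out.append(action(m.group()))
                pos += m.end()
                break
        else:
            pos += 1          # skip characters no rule matches
    return out

print(lex("x:=y+10"))
```

Each recognized token triggers its rule's action, just as a LEX rule's code fragment runs when its pattern Pi matches.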
Example

 If Lexical analyzer recognizes an “identifier”, the action


taken by the Lexical Analyzer is
 to install or store the name in symbol table
 return value 6 as integer code to the parser.
Implementation of Lexical Analyzer
 LEX generates Lexical Analyzer as its output by taking LEX program as
its input.
 LEX program is a collection of patterns (Regular expressions) & their
corresponding actions.
 Patterns represent the tokens to be recognized by lexical analyzer to be
generated.
 For each pattern, a corresponding NFA will be designed.
 There can be n number of patterns.
 A start state is taken and using ε-transition, all these NFAs can be
connected together to make combined NFA.
 The final state of each NFA shows that it has found its own token Pi.
 Convert the NFA to DFA.
 The final state shows which token we have found.
 If a DFA state does not include any final state of the NFA, there is an error
condition.
END OF
UNIT-I
