
COMPILERS

PREPARED BY:
ER. INDERJEET BAL, ASSISTANT PROFESSOR,
DEPT. OF CS & IT,
HINDU KANYA COLLEGE, KAPURTHALA
INTRODUCTION
• A compiler is a computer program that translates source code written in a
high-level language into low-level machine language.

• It translates the code written in one programming language to some other language
without changing the meaning of the code.

• The compiler also makes the generated code efficient: it is optimized for execution time and
memory space.

• The compiling process includes basic translation mechanisms and error detection.

• The compilation process goes through lexical, syntax, and semantic analysis at the front end, and
code generation and optimization at the back end.
INTRODUCTION
Features of Compilers

• Correctness

• Speed of compilation

• Preserves the correct meaning of the code

• The speed of the target code

• Recognize legal and illegal program constructs

• Good error reporting/handling

• Code debugging help


TYPES OF COMPILER

Following are the different types of Compiler:


• Single Pass Compilers

• Two Pass Compilers

• Multipass Compilers
TYPES OF COMPILER

• Single Pass Compiler : In a single-pass compiler, the source code is transformed
directly into machine code in one pass. Pascal is an example of a language designed
for single-pass compilation.
TYPES OF COMPILER…
• Two Pass Compiler: A two-pass compiler is divided into two sections, viz.
1. Front end: It maps legal source code into an Intermediate Representation (IR).
2. Back end: It maps the IR onto the target machine.
The two-pass method also simplifies the retargeting process. It also
allows multiple front ends to share a single back end.
TYPES OF COMPILER…
• Multipass Compilers

• A multipass compiler processes the source code or syntax tree of a program
several times.
• It divides a large program into multiple small programs and processes them,
developing multiple intermediate codes.
• Each pass takes the output of the previous pass as its input, so the compiler
requires less memory at any one time. It is also known as a 'Wide Compiler'.
TASKS OF COMPILER
• Breaks up the source program into pieces and imposes grammatical structure on them

• Allows you to construct the desired target program from the intermediate representation
and also create the symbol table

• Compiles source code and detects errors in it

• Manages storage of all variables and code

• Support for separate compilation

• Reads and analyzes the entire program, and translates it to a semantically equivalent target program

• Translating the source code into object code depending upon the type of machine
HISTORY OF COMPILER

• Important Landmark of Compiler's history are as follows:

• The "compiler" word was first used in the early 1950s by Grace Murray Hopper

• The first compiler was built by John Backus and his group between 1954 and
1957 at IBM

• COBOL was the first programming language which was compiled on multiple
platforms in 1960

• Scanning and parsing issues were studied throughout the 1960s and 1970s
to provide complete solutions
STEPS FOR LANGUAGE PROCESSING SYSTEMS

Before learning about the concept of compilers, we first need to understand a few
other tools which work with compilers.
STEPS FOR LANGUAGE PROCESSING SYSTEMS…
• Preprocessor: The preprocessor is considered as a part of the Compiler. It is a tool which
produces input for Compiler. It deals with macro processing, augmentation, language
extension, etc.

• Interpreter: An interpreter, like a compiler, translates high-level language into low-
level machine language. The main difference is that an interpreter reads and
transforms code line by line, whereas a compiler reads the entire code at once and
creates the machine code.

• Assembler: It translates assembly language code into machine-understandable language.
The output of the assembler is known as an object file, which is a combination of
machine instructions and the data required to place these instructions in memory.
STEPS FOR LANGUAGE PROCESSING SYSTEMS…
• Linker: The linker helps you to link and merge various object files to create an executable file. All
these files might have been compiled with separate assemblers. The main task of a linker is to
search for called modules in a program and to find out the memory location where all modules are
stored.

• Loader: The loader is a part of the OS which loads executable files into
memory and runs them. It also calculates the size of a program and allocates the
additional memory space it needs.

• Cross-compiler: A cross-compiler runs on one platform and generates
executable code for a different platform.

• Source-to-source Compiler: Source to source compiler is a term used when the source code of
one programming language is translated into the source of another language.
WHY USE A COMPILER?
• The compiler verifies the entire program, so syntax and semantic errors are caught before execution
• The executable file is optimized by the compiler, so it executes faster
• Allows you to create internal structure in memory
• There is no need to execute the program on the same machine it was built
• Translate entire program in other language
• Generate files on disk
• Link the files into an executable format
• Check for syntax errors and data types
• Helps you to enhance your understanding of language semantics
• Helps to handle language performance issues
• Opportunity for a non-trivial programming project
• The techniques used for constructing a compiler can be useful for other purposes as well
APPLICATION OF COMPILERS

• Compiler design enables the full implementation of high-level programming languages

• Supports optimization for computer-architecture parallelism

• Design of New Memory Hierarchies of Machines

• Widely used for Translating Programs

• Used with other Software Productivity Tools


PHASES OF COMPILER
1. LEXICAL ANALYSIS
• Lexical analysis is the first phase of a compiler.
• It takes the modified source code from language preprocessors that are written in the form of
sentences.
• The lexical analyzer breaks the source into a series of tokens, removing any whitespace
or comments in the source code.
• If the lexical analyzer finds a token invalid, it generates an error. The lexical analyzer works
closely with the syntax analyzer.
• It reads character streams from the source code, checks for legal tokens, and passes the data to
the syntax analyzer when it demands.
1. LEXICAL ANALYSIS…
Tokens

• A lexeme is a sequence of characters (alphanumeric) that forms a token. There are
predefined rules for every lexeme to be identified as a valid token.

• These rules are defined by grammar rules, by means of a pattern. A pattern explains what can
be a token, and these patterns are defined by means of regular expressions.

• In a programming language, keywords, constants, identifiers, strings, numbers, operators and
punctuation symbols can be considered as tokens.
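As a sketch of these ideas, a minimal lexer can be built from regular expressions, one pattern per token class. The token classes and keyword set below are illustrative assumptions, not a real language definition:

```python
import re

# Each token class is defined by a regular expression, mirroring how
# patterns (regular expressions) define valid tokens in a lexer.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()]"),
    ("SKIP",   r"\s+"),        # whitespace is discarded, not tokenized
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))
KEYWORDS = {"if", "else", "while"}  # keywords: identifiers with reserved meaning

def tokenize(source):
    tokens = []
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue                       # remove whitespace/comments
        if kind == "IDENT" and lexeme in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, lexeme))
    return tokens

print(tokenize("cost = rate * 2"))
```

Note that `finditer` silently skips characters matching no pattern; a real lexical analyzer would report such invalid input as an error.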
2. SYNTAX ANALYSIS
• Syntax analysis or parsing is the second phase of a compiler.

• We have seen that a lexical analyzer can identify tokens with the help of regular
expressions and pattern rules. But a lexical analyzer cannot check the syntax of
a given sentence due to the limitations of the regular expressions.

• A syntax analyzer or parser takes the input from a lexical analyzer in the form
of token streams.

• The parser analyzes the source code (token stream) against the production rules
to detect any errors in the code. The output of this phase is a parse tree.
2. SYNTAX ANALYSIS…

• This way, the parser accomplishes two tasks, i.e., parsing the code, looking for errors and

generating a parse tree as the output of the phase.

• Parsers are expected to parse the whole code even if some errors exist in the program.
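As a sketch of how a parser turns a token stream into a parse tree, here is a small recursive-descent parser for a simplified expression grammar. The grammar and the tuple-based tree shape are illustrative assumptions:

```python
# Grammar (illustrative):
#   expr   -> term (('+'|'-') term)*
#   term   -> factor (('*'|'/') factor)*
#   factor -> NUMBER | IDENT | '(' expr ')'
# The parse tree is built as nested (op, left, right) tuples.

def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected=None):
        nonlocal pos
        tok = tokens[pos]
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected}, got {tok}")
        pos += 1
        return tok

    def factor():
        if peek() == "(":
            eat("(")
            node = expr()
            eat(")")        # a missing parenthesis raises SyntaxError here
            return node
        return eat()        # NUMBER or IDENT leaf

    def term():
        node = factor()
        while peek() in ("*", "/"):
            node = (eat(), node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            node = (eat(), node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree

print(parse(["a", "+", "b", "*", "c"]))   # ('+', 'a', ('*', 'b', 'c'))
```

For simplicity the tokens are bare strings here; a real parser would consume the (type, lexeme) pairs produced by the lexical analyzer.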
3. INTERMEDIATE CODE GENERATION
• If a compiler translates the source language to its target machine language without having the
option for generating intermediate code, then for each new machine, a full native compiler is
required.

• Intermediate code eliminates the need of a new full compiler for every unique machine by
keeping the analysis portion same for all the compilers.

• The second part of compiler, synthesis, is changed according to the target machine.

• It becomes easier to apply source-code modifications to improve code performance by
applying code-optimization techniques on the intermediate code.
4. CODE OPTIMIZATION

• Optimization is a program transformation technique, which tries to improve the code by


making it consume less resources (i.e. CPU, Memory) and deliver high speed.

• In optimization, high-level general programming constructs are replaced by very efficient
low-level programming code. A code-optimizing process must follow the three rules given
below:
 The output code must not, in any way, change the meaning of the program.

 Optimization should increase the speed of the program and, if possible, the program should
demand fewer resources.

 Optimization should itself be fast and should not delay the overall compiling process.
4. CODE OPTIMIZATION…
There are different code optimization techniques that are organized into different categories

depending on their functional areas or scopes. These categories are:

1. Peephole optimization

2. Local optimization

3. Loop optimization

4. Global or intraprocedural optimizations

5. Interprocedural or whole-program optimization


5. STORAGE ALLOCATION

The different storage allocation strategies are :

1. Static allocation - lays out storage for all data objects at compile time

2. Dynamic allocation - allocates and deallocates storage as needed at run
time from a data area known as the heap.


6. CODE GENERATION
• Code generation is the last and final phase of a compiler. It gets input from the code
optimization phase and produces the target code or object code as a result. The
objective of this phase is to allocate storage and generate relocatable machine code.

• It also allocates memory locations for the variables. The instructions in the
intermediate code are converted into machine instructions. This phase converts the
optimized intermediate code into the target language.

• The target language is the machine code. Therefore, all the memory locations and
registers are also selected and allotted during this phase. The code generated by this
phase is executed to take inputs and generate expected outputs.
Table Management

• The table-management task is also known as bookkeeping.

• A table contains a record for each identifier with fields for the attributes of the
identifier.

• The data structure used to record this information is called a Uniform Symbol Table.

• This component makes it easier for the compiler to search the identifier record and
retrieve it quickly.

• The symbol table also helps you for the scope management.

• The symbol table and error handler interact with all the phases, and the symbol table is
updated accordingly.
Error Handling Routine:

• In the compiler design process error may occur in all the below-given phases:

• Lexical analyzer: Wrongly spelled tokens

• Syntax analyzer: Missing parenthesis

• Intermediate code generator: Mismatched operands for an operator

• Code Optimizer: When the statement is not reachable

• Code Generator: Unreachable statements

• Symbol tables: Error of multiple declared identifiers


GROUPING OF PHASES
• Front End

• Lexical analysis

• Syntax Analysis

• Generation of intermediate code

• Back End

• Code optimization

• Code generation
DATABASES USED BY COMPILER

• Source Code • Literal Table

• Uniform Symbol Table • Reductions

• Terminal Table • Matrix

• Identifier Table • Code Productions


LEXICAL ANALYSIS
Performs following tasks:

1. Parses the source program into the basic elements or tokens of the
language and produces a stream of tokens.

2. Eliminates blanks and comments

3. Build a literal table and an identifier table

4. Build a uniform symbol table

5. Keep track of line numbers

6. Reports errors encountered while generating tokens.


LEXICAL ANALYSIS… DATABASES USED BY LEXICAL ANALYSIS
LEXICAL ANALYSIS…

Algorithms used by Lexical Analysis Phase

1. The source string is first separated into tokens at break characters.

2. The validity of source characters is checked.

3. Non-break characters are accumulated into tokens.

4. Three categories of tokens are recognized.

5. Each token is first checked by comparing it with the entries in the
terminal table.
LEXICAL ANALYSIS…

6. If a match is found, the token is recognized as a terminal symbol; a
uniform symbol of type ‘TRM’ is created and inserted into the uniform symbol table.

7. If the token is not a terminal symbol, the lexical analyzer checks whether it
is an identifier or a literal.

8. If the token is an identifier, it is entered into the identifier table, and into the
uniform symbol table with the type ‘IDN’.

9. If the token is a literal, it is entered into the literal table, and into the
uniform symbol table with the type ‘LIT’.
LEXICAL ANALYSIS…
SYNTAX ANALYSIS
• Syntax Analysis is a second phase of the compiler design process in
which the given input string is checked for the confirmation of rules and
structure of the formal grammar.

• It analyses the syntactical structure and checks if the given input is in the
correct syntax of the programming language or not.

• Syntax analysis in the compiler design process comes after the lexical
analysis phase. Its output is known as the parse tree or syntax tree.
SYNTAX ANALYSIS…

• The Parse Tree is developed with the help of pre-defined grammar of the

language. The syntax analyser also checks whether a given program

fulfills the rules implied by a context-free grammar.

• If it satisfies, the parser then creates the parse tree of that source program.

Otherwise, it will display error messages.


SYNTAX ANALYSIS…

Role of Parser
SYNTAX ANALYSIS…
INTERMEDIATE CODE GENERATION
• In the analysis-synthesis model of a compiler, the front end translates a source program
into an intermediate representation (IR) and back end generates the target code.

• The type of intermediate form generated by compiler depends on the syntactic


construction i.e it depends upon the type of statement e.g. arithmetic, non-arithmetic or
non executable statement.

• The intermediate forms for an arithmetic statement are:

1. Postfix Notation

2. Parse tree

3. Matrix or triple
INTERMEDIATE CODE GENERATION…

1. Postfix Notation:

• In postfix notation, the operator follows its operands.

• The postfix notation is used in many high-level languages, including SNOBOL.

• The postfix notation is a popular intermediate code in non-optimizing compilers due to
ease of generation and use.
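A standard way to produce postfix notation from infix is Dijkstra's shunting-yard algorithm. The sketch below is illustrative and assumes only left-associative binary operators:

```python
# Infix -> postfix conversion (shunting-yard algorithm).
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_postfix(tokens):
    out, stack = [], []
    for tok in tokens:
        if tok in PREC:
            # pop operators of greater or equal precedence (left-associative)
            while stack and stack[-1] != "(" and PREC[stack[-1]] >= PREC[tok]:
                out.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                out.append(stack.pop())
            stack.pop()                 # discard the "("
        else:
            out.append(tok)             # operand goes straight to output
    while stack:
        out.append(stack.pop())
    return out

print(to_postfix(["a", "+", "b", "*", "c"]))            # ['a', 'b', 'c', '*', '+']
print(to_postfix(["(", "a", "+", "b", ")", "*", "c"]))  # ['a', 'b', '+', 'c', '*']
```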
INTERMEDIATE CODE GENERATION…
2. Parse tree : Parse tree represents the source program statements in a hierarchical form.

COST = RATE * (START - FINISH) + 2 * RATE * (START - FINISH - 100)


INTERMEDIATE CODE GENERATION…

3. Matrix or triple

• Matrix or triple is a linear representation of Parse Tree.

• In matrix or triple operations of the program are listed sequentially in the order they
would be executed.

• Each matrix entry has one operator and two operands.

• The operands are uniform symbols representing either variables, literals or other matrix
entries Mi (i denotes a matrix entry number).

• Figure shows the matrix representation for the following arithmetic statement.

• COST = RATE * (START - FINISH) + 2 * RATE * (START - FINISH - 100)


INTERMEDIATE CODE GENERATION…
3. Matrix or triple

Figure : Matrix representation
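The figure itself does not reproduce here, but the matrix for this statement can be sketched as a list of triples. The entry order below is one plausible layout and is an assumption, since the exact order depends on the evaluation order the compiler chooses:

```python
# Hypothetical triple (matrix) listing for
#   COST = RATE * (START - FINISH) + 2 * RATE * (START - FINISH - 100)
# Each entry is (operator, operand1, operand2); "Mi" refers to entry i.
matrix = [
    ("-", "START", "FINISH"),   # M1
    ("*", "RATE",  "M1"),       # M2
    ("*", "2",     "RATE"),     # M3
    ("-", "START", "FINISH"),   # M4  (same as M1: a CSE candidate)
    ("-", "M4",    "100"),      # M5
    ("*", "M3",    "M5"),       # M6
    ("+", "M2",    "M6"),       # M7
    ("=", "COST",  "M7"),       # M8
]
for i, (op, a, b) in enumerate(matrix, start=1):
    print(f"M{i}: {op} {a} {b}")
```

Note that M1 and M4 compute the same sub-expression; the code optimization phase can later eliminate this redundancy.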


DATABASES USED BY INTERMEDIATE CODE
GENERATION PHASE
DATABASES USED BY INTERMEDIATE CODE
GENERATION PHASE…
CODE OPTIMIZATION
• The main purpose of code optimization is to improve execution efficiency of a program.

• This execution efficiency is achieved in two ways: by reducing the execution time of the program and by reducing its memory requirement.

• Figure shows the schematic of an optimizing compiler. The front end generates an
intermediate representation(IR) which could consist of parse trees, matrix etc. The
optimization phase transforms this to achieve optimization. The transformed
intermediate representation (IR) is given as input to back end to generate target
program.
CODE OPTIMIZATION
However, any optimization attempted by the compiler must satisfy following
conditions:

1. The algorithm should not be modified in any sense

2. Semantic equivalence with the source program must be maintained.

• Efficient code generation for a specific target machine is beyond the scope of the
optimization phase; it is the task of the back end of a compiler.

• The optimization techniques are thus independent of both the programming


language and the target code.
CODE OPTIMIZATION : DATABASES USED
CODE OPTIMIZATION…

Advantages- The optimized code has the following advantages-

• Optimized code has faster execution speed.

• Optimized code utilizes the memory efficiently.

• Optimized code gives better performance.


CODE OPTIMIZATION TECHNIQUES
The various code optimization techniques that can be used are:

1. Elimination of common sub-expression

2. Compile time compute or evaluation

3. Variable propagation

4. Code movement optimization

5. Strength reduction

6. Dead code elimination

7. Loop optimization

8. Boolean expression optimization


CODE OPTIMIZATION TECHNIQUES…
1. Elimination of common sub-expression
As the name suggests, in this technique:

• common sub-expressions are eliminated;

• redundant expressions are removed to avoid their re-computation;

• the already computed result is reused later in the program when required.
CODE OPTIMIZATION TECHNIQUES…
Example : Elimination of common sub-expression
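As a concrete sketch, the elimination can be simulated over three-address statements. The (dest, op, arg1, arg2) tuple format and the `copy` operation are illustrative assumptions, and the sketch assumes operands are not redefined between occurrences:

```python
# If an identical right-hand side was already computed, reuse its result
# instead of recomputing it.
def eliminate_cse(stmts):
    seen = {}      # (op, arg1, arg2) -> variable holding the result
    out = []
    for dest, op, a1, a2 in stmts:
        key = (op, a1, a2)
        if key in seen:
            out.append((dest, "copy", seen[key], None))  # reuse earlier result
        else:
            seen[key] = dest
            out.append((dest, op, a1, a2))
    return out

code = [
    ("t1", "-", "START", "FINISH"),
    ("t2", "*", "RATE", "t1"),
    ("t3", "-", "START", "FINISH"),   # same as t1: redundant
    ("t4", "+", "t2", "t3"),
]
print(eliminate_cse(code))
```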
CODE OPTIMIZATION TECHNIQUES…

2. Compile time compute or evaluation :

• The execution efficiency of a program can be improved by performing certain actions
or computations specified in the program during compilation itself.

• In this way these actions or computations need not be performed during the execution
of the program, and this reduces the execution time of the program.

Two techniques that fall under compile-time evaluation are-

a) Constant Folding

b) Constant Propagation
CODE OPTIMIZATION TECHNIQUES…
2. Compile time compute or evaluation …

Constant Folding
• As the name suggests, this technique involves folding the constants.
• Expressions whose operands have constant values known at compile time are
evaluated.
• Those expressions are then replaced with their respective results.

Example-

Circumference of Circle = (22/7) x Diameter

Here, this technique evaluates the expression 22/7 at compile time. The expression
is then replaced with its result 3.14. This saves time at run time.
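The idea can be sketched with Python's `ast` module, which exposes a real syntax tree. The `Folder` transformer below is illustrative, not production compiler code:

```python
import ast

# Constant folding over a Python AST: binary operations whose operands
# are both constants are evaluated at "compile time", bottom-up.
class Folder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)   # fold children first
        if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            wrapped = ast.fix_missing_locations(ast.Expression(body=node))
            value = eval(compile(wrapped, "<fold>", "eval"))
            return ast.copy_location(ast.Constant(value), node)
        return node

tree = ast.parse("(22 / 7) * diameter", mode="eval")
folded = ast.fix_missing_locations(Folder().visit(tree))
print(ast.unparse(folded))   # 22 / 7 has been folded into a single constant
```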
CODE OPTIMIZATION TECHNIQUES…
2. Compile time compute or evaluation …

Constant Propagation : In this technique,


• If some variable has been assigned a constant value, then that variable is replaced with
its constant value in the rest of the program during compilation.
• The condition is that the value of the variable must not get altered in between.
• Example-
pi = 3.14 and radius = 10
Area of circle = pi x radius x radius
• Here, This technique substitutes the value of variables ‘pi’ and ‘radius’ at compile time.
• It then evaluates the expression 3.14 x 10 x 10.
• The expression is then replaced with its result 314.
• This saves the time at run time.
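A toy sketch of constant propagation over (variable, expression) pairs; the statement format and the use of `eval` are illustrative assumptions:

```python
# Variables known to hold constants are substituted into later
# expressions; an all-constant expression is then evaluated (folded).
def propagate(stmts):
    consts, out = {}, []
    for dest, expr in stmts:
        tokens = [str(consts.get(t, t)) for t in expr.split()]
        new_expr = " ".join(tokens)
        try:
            consts[dest] = eval(new_expr)   # expression is now all-constant
            out.append((dest, str(consts[dest])))
        except NameError:
            consts.pop(dest, None)          # dest is no longer a known constant
            out.append((dest, new_expr))
    return out

code = [
    ("pi", "3.14"),
    ("radius", "10"),
    ("area", "pi * radius * radius"),   # becomes 3.14 * 10 * 10, then a constant
]
print(propagate(code))
```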
CODE OPTIMIZATION TECHNIQUES…

3. Variable propagation

• Variable propagation is similar to constant propagation

• Here, Variable is replaced by another variable that is having same values.

• For example, Consider the following piece of code:


Before optimization:
c = d;
…
z = c + e;

After optimization:
c = d;
…
z = d + e;
CODE OPTIMIZATION TECHNIQUES…
4. Code Movement optimization
• As the name suggests, this technique involves movement of the code.
• The code present inside the loop is moved out if it does not matter whether it is present inside or
outside.
• Such code unnecessarily gets executed again and again with each iteration of the loop.
• This leads to the wastage of time at run time.
CODE OPTIMIZATION TECHNIQUES…
4. Code Movement optimization
Code movement is performed so as to:
 Reduce the size of program
 Reduce execution frequency of the code subjected to movement.
a) Code Space Reduction : It is similar to common sub-expression elimination, but
the objective is not to reduce the execution frequency of the common sub-
expression; it is to reduce the code size by generating the code for the common sub-
expression only once. This technique is called Code Hoisting.
Before optimization:
If a < b then
    z = x*4;
---
Else
    y = x*4 + 20;

After optimization:
Temp = x*4;
If a < b then
    z = Temp;
---
Else
    y = Temp + 20;
CODE OPTIMIZATION TECHNIQUES
4. Code Movement optimization

b) Execution frequency reduction

• Frequency reduction means to make common code execute fewer times.

• This identifies the common code that is to be evaluated at various places in the
program and moves them to the place where they are evaluated fewer times.
Hence, reducing the frequency of their execution.

• There are two ways of implementing frequency reduction:

I. Code Hoisting

II. Loop Optimization


CODE OPTIMIZATION TECHNIQUES
Code Hoisting : code hoisting refers to promoting the code to some earlier part of the
program.
• It can be used to reduce execution frequency of an expression if the expression is
partially available along at least one path reaching the evaluation of the expression.
• For example: consider the following code movement
Before optimization:
If a < b then
    z = x*2;
…
else
    y = 20;

After optimization:
temp = x*2;
If a < b then
    z = temp;
…
else
    y = 20;

Loop Optimization: this involves moving loop-invariant computations outside the loop.
CODE OPTIMIZATION TECHNIQUES…
5. Strength Reduction : In this technique,
• As the name suggests, it involves reducing the strength of expressions.
• This technique replaces the expensive and costly operators with the simple and
cheaper ones.

For example, the expression “A x 2” can be replaced with the expression “A + A”,
because the cost of the multiplication operator is higher than that of the addition operator.
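The replacement can be sketched over three-address statements. The (dest, op, arg1, arg2) tuple format is an illustrative assumption, and the power-of-two case uses a shift, a common further reduction:

```python
# Replace expensive operators with cheaper ones:
#   A * 2 -> A + A,  A * (power of two) -> A << k
def reduce_strength(stmt):
    dest, op, a1, a2 = stmt
    if op == "*" and a2 == 2:
        return (dest, "+", a1, a1)
    if op == "*" and isinstance(a2, int) and a2 > 0 and a2 & (a2 - 1) == 0:
        return (dest, "<<", a1, a2.bit_length() - 1)
    return stmt                      # no cheaper equivalent known

print(reduce_strength(("B", "*", "A", 2)))   # ('B', '+', 'A', 'A')
print(reduce_strength(("B", "*", "A", 8)))   # ('B', '<<', 'A', 3)
```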
CODE OPTIMIZATION TECHNIQUES…

6. Dead Code Elimination : In this technique,


• As the name suggests, it involves eliminating the dead code.
• Statements of the code which either never execute, are unreachable, or whose
output is never used are eliminated.
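A minimal backward-pass sketch: starting from the variables the program must output, any assignment whose destination is never used later is dropped. The (destination, used-variables) statement format is an illustrative assumption, and the sketch ignores control flow:

```python
# Walk the statements backwards, tracking which variables are "live"
# (still needed). A dead assignment is simply not kept.
def eliminate_dead(stmts, outputs):
    live = set(outputs)
    kept = []
    for dest, used_vars in reversed(stmts):
        if dest in live:
            kept.append((dest, used_vars))
            live.discard(dest)
            live.update(used_vars)   # the operands become live
        # else: the assignment is dead and is eliminated
    return list(reversed(kept))

code = [
    ("t1", ["a", "b"]),
    ("t2", ["t1"]),
    ("t3", ["a"]),       # t3 is never used afterwards: dead
    ("z",  ["t2"]),
]
print(eliminate_dead(code, outputs=["z"]))
```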
CODE OPTIMIZATION TECHNIQUES…

7. Loop optimization : The various loop optimization techniques are:

• Loop fission
• Loop fusion or loop combining
• Loop interchange
• Loop reversal
• Loop unrolling
Loop optimization : Loop fission
• Loop fission breaks a loop into multiple loops over the same index range.
Hence the name fission.

• Loop fission improves the locality of reference.

• For example: Consider the following code segment:


Before optimization:
for(i=0;i<100;i++)
{
    a[i]=…
    b[i]=…
}

After optimization:
for(i=0;i<100;i++)
    a[i]=…
for(i=0;i<100;i++)
    b[i]=…
Loop optimization : Loop fusion

• Loop fusion transformation is just the opposite of loop fission.

• Loop fusion consists of combining adjacent or closely located loops into


a single loop.

• The benefits of loop fusion are similar to loop unrolling.

• The approach attempts to reduce the overhead due to loop setup, loop
condition check and loop termination.
Loop optimization : Loop interchange
• This optimization technique interchanges inner loops with outer loops.

• When the loop variables index into an array, such a transformation can improve the
locality of reference, depending on the array’s layout.

Before optimization:
for(i=0;i<100;i++)
    for(j=0;j<100;j++)
        a[j][i]=…

After optimization:
for(j=0;j<100;j++)
    for(i=0;i<100;i++)
        a[j][i]=…
Loop optimization : Loop Reversal

• Reverses the order in which values are assigned to the index variable.

• This can help eliminate dependencies and thus enable other optimizations.

Before optimization:
for(i=0;i<100;i++)
    a[99-i]=…

After optimization:
for(i=99;i>=0;i--)
    a[i]=…
Loop optimization : Loop Unrolling

• The objective of loop unrolling is to increase a program’s speed by

reducing instructions that control the loop, such as pointer arithmetic and

“end of loop” test condition on each iteration.

• To eliminate or reduce this computational overhead, loops can be rewritten
as a repeated sequence of similar independent statements.

• It also reduces the number of jumps from end of the loop to the beginning.
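The transformation can be sketched in Python by unrolling a summation loop by a factor of 4; both functions compute the same result, but the unrolled one runs the loop-control test a quarter as often (the function names are illustrative):

```python
def sum_rolled(a):
    total = 0
    for i in range(len(a)):        # one test + increment per element
        total += a[i]
    return total

def sum_unrolled(a):
    total = 0
    i, n = 0, len(a)
    while i + 4 <= n:              # unrolled body: 4 elements per test
        total += a[i] + a[i+1] + a[i+2] + a[i+3]
        i += 4
    while i < n:                   # cleanup loop for the remainder
        total += a[i]
        i += 1
    return total

data = list(range(10))
print(sum_rolled(data), sum_unrolled(data))   # both print 45
```

In compiled languages this trade of code size for fewer branches is usually performed by the compiler itself.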
8. Boolean Expression Optimization

• We can also use the properties of Boolean expression to minimize their computations.

• For example, consider the following statements:


If a or b or c then
….
….

• Here a, b and c are expressions.

• Rather than generating code that always tests each of the expressions a, b and c,

we generate code so that if a evaluates to true, then b and c are not computed,

and similarly for b.
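This short-circuit behaviour can be observed directly: the helper below records which expressions actually run (the `expr` helper and the recorded list are illustrative):

```python
# Each call records its name; with `or`, evaluation stops at the first
# expression that is true, so later ones are never computed.
evaluated = []

def expr(name, value):
    def f():
        evaluated.append(name)
        return value
    return f

a, b, c = expr("a", True), expr("b", False), expr("c", True)

if a() or b() or c():
    pass

print(evaluated)   # ['a'] -- b and c were skipped
```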
Local Optimization

• Scope of local optimization is limited to basic block.

• Cost of local optimization is low.

• The most commonly used local optimization transformations are variable

propagation and elimination of common sub expression.

• It has limited scope: it is restricted to a single basic block.


Local Optimization
Global Optimization

• Global optimization is the optimization that can be applied to a program

unit i.e its scope is a program unit.

• Can achieve better transformations and can produce more optimized code

than local optimization.


Storage Allocation
Runtime environment manages runtime memory requirements for the following
entities:

• Code : It is known as the text part of a program that does not change at runtime. Its
memory requirements are known at the compile time.

• Procedures : Their text part is static but they are called in a random manner. That is
why, stack storage is used to manage procedure calls and activations.

• Variables : Variables are known at the runtime only, unless they are global or
constant. Heap memory allocation scheme is used for managing allocation and de-
allocation of memory for variables in runtime.
Storage Allocation : Databases used
Storage Allocation methods

Memory Allocation
• Static Memory Allocation
• Dynamic Memory Allocation
Storage Allocation : Static Memory allocation

• In static allocation, names are bound to storage locations.

• If memory is created at compile time then the memory will be created in static
area and only once.

• Static allocation does not support dynamic data structures: memory is
created at compile time only and deallocated after program completion.

• The drawback with static storage allocation is that the size and position of data
objects should be known at compile time.

• Another drawback is that recursive procedures are restricted.


Storage Allocation : Dynamic Memory allocation

• Memory bindings are established and destroyed during the execution of the
program.

• Dynamic memory allocation is implemented by languages like PL/I,
PASCAL, ADA etc.

• Two types are:


• Automatic Allocation

• Program controlled allocation


Storage Allocation
Code Generation
• Code generation can be considered as the final phase of compilation. Optimization process
can be applied on the code, but that can be seen as a part of code generation phase itself.

• The code generated by the compiler is an object code of some lower-level programming
language, for example, assembly language.

• We have seen that the source code written in a higher-level language is transformed into a
lower-level language that results in a lower-level object code, which should have the
following minimum properties:

• It should carry the exact meaning of the source code.

• It should be efficient in terms of CPU usage and memory management.


Code Generation : Code productions

Code Productions
Code Generation : Databases used
Code Generation : Code generation using Code productions
END
https://forms.gle/bjV9UY1Tx3cyxbqR8
