Compilers
PREPARED BY:
ER. INDERJEET BAL, ASSISTANT PROFESSOR,
DEPT. OF CS & IT,
HINDU KANYA COLLEGE, KAPURTHALA
INTRODUCTION
• A compiler is a computer program that transforms source code written in a high-level language into low-level machine language.
• It translates code written in one programming language into another language without changing the meaning of the code.
• The compiler also makes the final code efficient, optimizing it for execution time and memory space.
• The compiling process includes basic translation mechanisms and error detection.
• The compilation process goes through lexical, syntax, and semantic analysis at the front end, and code generation and optimization at the back end.
Features of Compilers
• Correctness
• Speed of compilation
TYPES OF COMPILER
• Multipass Compilers
• The multipass compiler processes the source code or syntax tree of a program
several times.
• It divides a large program into multiple small programs and processes them. It develops multiple intermediate codes.
• Each pass takes the output of the previous pass as its input, so it requires less memory. It is also known as a 'Wide Compiler'.
TASKS OF COMPILER
• Breaks up the source program into pieces and imposes grammatical structure on them
• Allows you to construct the desired target program from the intermediate representation
and also create the symbol table
• Translates the source code into object code depending upon the type of machine
HISTORY OF COMPILER
• The "compiler" word was first used in the early 1950s by Grace Murray Hopper
• The first compiler was build by John Backum and his group between 1954 and
1957 at IBM
• COBOL was the first programming language which was compiled on multiple
platforms in 1960
• The study of scanning and parsing issues was pursued in the 1960s and 1970s to provide a complete solution
STEPS FOR LANGUAGE PROCESSING SYSTEMS
• Interpreter: An interpreter, like a compiler, translates a high-level language into low-level machine language. The main difference between the two is that an interpreter reads and transforms the code line by line, whereas a compiler reads the entire code at once and creates the machine code.
• Loader: The loader is a part of the OS which performs the task of loading executable files into memory and running them. It also calculates the size of a program in order to allocate the additional memory space it needs.
• Cross-compiler: A cross compiler is a compiler that runs on one platform and generates executable code for a different platform.
• Source-to-source Compiler: A source-to-source compiler translates the source code of one programming language into the source code of another language.
WHY USE A COMPILER?
• The compiler verifies the entire program, so there are no syntax or semantic errors
• The executable file is optimized by the compiler, so it executes faster
• Allows you to create internal structure in memory
• There is no need to execute the program on the same machine it was built on
• Translates the entire program into another language
• Generate files on disk
• Link the files into an executable format
• Check for syntax errors and data types
• Helps you to enhance your understanding of language semantics
• Helps to handle language performance issues
• Opportunity for a non-trivial programming project
• The techniques used for constructing a compiler can be useful for other purposes as well
APPLICATION OF COMPILERS
1. LEXICAL ANALYSIS
• Lexemes are sequences of characters (alphanumeric) that make up a token. There are predefined rules for every lexeme to be identified as a valid token.
• These rules are defined by the grammar, by means of a pattern. A pattern describes what can be a token, and these patterns are defined by means of regular expressions (see the example below).
• We have seen that a lexical analyzer can identify tokens with the help of regular
expressions and pattern rules. But a lexical analyzer cannot check the syntax of
a given sentence due to the limitations of the regular expressions.
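For illustration (the statement and token names below are an assumed example, not taken from the slides), a lexical analyzer would break the statement

    position = initial + rate * 60 ;

into lexemes and classify each one as a token:

    position   →  identifier
    =          →  assignment operator
    initial    →  identifier
    +          →  arithmetic operator
    rate       →  identifier
    *          →  arithmetic operator
    60         →  integer literal
    ;          →  punctuation

A typical pattern is the regular expression letter (letter | digit)*, which describes every valid identifier.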
2. SYNTAX ANALYSIS
• A syntax analyzer or parser takes the input from a lexical analyzer in the form
of token streams.
• The parser analyzes the source code (token stream) against the production rules
to detect any errors in the code. The output of this phase is a parse tree.
2. SYNTAX ANALYSIS…
• This way, the parser accomplishes two tasks, i.e., parsing the code and looking for errors, and generating a parse tree as the output of the phase.
• Parsers are expected to parse the whole code even if some errors exist in the program.
3. INTERMEDIATE CODE GENERATION
• If a compiler translates the source language to its target machine language without having the
option for generating intermediate code, then for each new machine, a full native compiler is
required.
• Intermediate code eliminates the need of a new full compiler for every unique machine by
keeping the analysis portion same for all the compilers.
• The second part of the compiler, synthesis, is changed according to the target machine; it takes the intermediate code as its input.
4. CODE OPTIMIZATION
Optimization should increase the speed of the program and, if possible, the program should demand fewer resources.
Optimization should itself be fast and should not delay the overall compiling process.
4. CODE OPTIMIZATION…
There are different code optimization techniques, organized into the following categories:
1. Peephole optimization (see the example below)
2. Local optimization
3. Loop optimization
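For instance, peephole optimization looks at a small window of generated instructions and removes obvious redundancies. The mnemonics below are generic and used only for illustration:

    MOV R0, a      ; load a into register R0
    MOV a, R0      ; store R0 back into a  -- redundant, can be removed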
5. CODE GENERATION
• Static allocation lays out storage for all data objects at compile time.
• The code generator also allocates memory locations for the variables. The instructions in the intermediate code are converted into machine instructions. This phase converts the optimized intermediate code into the target language.
• The target language is the machine code. Therefore, all the memory locations and
registers are also selected and allotted during this phase. The code generated by this
phase is executed to take inputs and generate expected outputs.
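As an illustrative sketch (the register names and instruction mnemonics below are a generic assembly-style notation, not a specific machine's instruction set), the statement x = a + b * c might first be reduced to intermediate code and then to target code:

Intermediate code:
    t1 = b * c
    t2 = a + t1
    x  = t2

Possible target code:
    MOV R1, b      ; load b into register R1
    MUL R1, c      ; R1 = b * c
    ADD R1, a      ; R1 = a + b * c
    MOV x, R1      ; store the result in x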
Table Management
• A table contains a record for each identifier with fields for the attributes of the
identifier.
• The data structure used to record this information is called a symbol table.
• This component makes it easier for the compiler to search the identifier record and
retrieve it quickly.
• The symbol table also helps with scope management.
• The symbol table and error handler interact with all the phases, and the symbol table is updated correspondingly.
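A minimal sketch in C of what one symbol table record might look like (the field names, sizes and the linear-search lookup are illustrative assumptions, not the deck's actual design):

    #include <string.h>

    /* one record per identifier, holding its attributes */
    struct symbol {
        char name[32];    /* the identifier's lexeme */
        char type[16];    /* e.g. "int", "float" */
        int  scope;       /* nesting level, used for scope management */
        int  address;     /* relative storage location */
    };

    struct symbol symtab[512];   /* a simple fixed-size symbol table */
    int nsyms = 0;               /* number of entries currently in use */

    /* linear search; returns the record index, or -1 if the identifier is unknown */
    int lookup(const char *name) {
        for (int i = 0; i < nsyms; i++)
            if (strcmp(symtab[i].name, name) == 0)
                return i;
        return -1;
    }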
Error Handling Routine:
• In the compiler design process, errors may occur in any of the following phases:
• Lexical analysis
• Syntax Analysis
• Back End
• Code optimization
• Code generation
DATABASES USED BY COMPILER
1. Parse the source program into the basic elements or tokens of the language and produce a stream of tokens.
6. If a match is found, the token is recognized as a terminal symbol; a uniform symbol of type 'TRM' is then created and inserted into the uniform symbol table.
7. If the token is not a terminal symbol, the lexical analyzer checks whether it is an identifier or a literal.
8. If the token is an identifier, it is entered into the identifier table. The token is also entered into the uniform symbol table with the type 'IDN'.
9. If the token is a literal, the lexical analyzer enters it into the literal table. The literal is also entered into the uniform symbol table with type 'LIT'.
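A minimal C sketch of the classification described in the steps above (the token text field, the table sizes and the tiny terminal list are illustrative assumptions):

    #include <ctype.h>
    #include <string.h>

    enum usym_type { TRM, IDN, LIT };            /* terminal, identifier, literal */

    struct uniform_symbol {
        enum usym_type type;
        char text[32];                           /* the token's characters */
    };

    struct uniform_symbol uniform_table[1024];
    int nuniform = 0;

    /* a tiny terminal (keyword/operator) list, for illustration only */
    static const char *terminals[] = { "if", "else", "while", "=", "+", "*", ";" };

    static int is_terminal(const char *tok) {
        for (unsigned i = 0; i < sizeof terminals / sizeof terminals[0]; i++)
            if (strcmp(tok, terminals[i]) == 0)
                return 1;
        return 0;
    }

    /* classify one token and append it to the uniform symbol table */
    void enter_token(const char *tok) {
        struct uniform_symbol u;
        if (is_terminal(tok))                     u.type = TRM;
        else if (isalpha((unsigned char)tok[0]))  u.type = IDN;
        else                                      u.type = LIT;
        strncpy(u.text, tok, sizeof u.text - 1);
        u.text[sizeof u.text - 1] = '\0';
        uniform_table[nuniform++] = u;
    }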
SYNTAX ANALYSIS
• Syntax Analysis is the second phase of the compiler design process, in which the given input string is checked for conformance to the rules and structure of the formal grammar.
• It analyses the syntactical structure and checks if the given input is in the
correct syntax of the programming language or not.
• The Parse Tree is developed with the help of the pre-defined grammar of the language. The syntax analyzer checks whether the given program satisfies the rules implied by a context-free grammar.
• If it does, the parser creates the parse tree of that source program.
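As a small illustration (the grammar and input below are a standard textbook example, not taken from the slides), consider the grammar

    E → E + T | T
    T → T * F | F
    F → id

For the input id + id * id, the parser builds a parse tree whose root expands as E → E + T; the left E derives the first id, while the right T expands as T * F and derives id * id. The shape of the tree reflects that * binds more tightly than +.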
Role of Parser
SYNTAX ANALYSIS…
INTERMEDIATE CODE GENERATION
• In the analysis-synthesis model of a compiler, the front end translates a source program
into an intermediate representation (IR) and back end generates the target code.
1. Postfix Notation
2. Parse tree
3. Matrix or triple
INTERMEDIATE CODE GENERATION…
2. The postfix notation is used in many high-level languages including SNOBOL.
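For example (an illustrative expression, not from the slides), the infix expression A + B * C becomes A B C * + in postfix (reverse Polish) notation, while (A + B) * C becomes A B + C *.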
3. Matrix or triple
• In the matrix or triple representation, the operations of the program are listed sequentially in the order in which they would be executed.
• The operands are uniform symbols representing either variables, literals or other matrix entries Mi (i denotes a matrix entry number).
• Figure shows the matrix representation for the following arithmetic statement.
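As an illustration (the statement and entries below are an assumed example; the original figure is not reproduced here), the statement A = B + C * D could be represented by the matrix entries:

    M1:  *   C    D
    M2:  +   B    M1
    M3:  =   A    M2

Each entry holds one operator and its two operands, and later entries refer to earlier results by their entry numbers.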
• The figure shows the schematic of an optimizing compiler. The front end generates an intermediate representation (IR), which could consist of parse trees, matrices, etc. The optimization phase transforms this IR to achieve optimization. The transformed intermediate representation (IR) is given as input to the back end to generate the target program.
CODE OPTIMIZATION
However, any optimization attempted by the compiler must satisfy the following conditions:
• Efficient code generation for a specific target machine is beyond the scope of the optimization phase; it is the task of the back end of the compiler.
3. Variable propagation
5. Strength reduction
7. Loop optimization
• In common sub-expression elimination, an already computed result is reused later in the program whenever it is required again.
CODE OPTIMIZATION TECHNIQUES…
Example : Elimination of common sub-expression
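A representative before/after illustration (an assumed example, since the slide's original figure is not reproduced here):

Before:
    t1 = a + b;
    x  = t1 * c;
    t2 = a + b;        /* the same sub-expression computed again */
    y  = t2 * d;

After elimination:
    t1 = a + b;
    x  = t1 * c;
    y  = t1 * d;       /* the earlier result is reused */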
CODE OPTIMIZATION TECHNIQUES…
• In this way these actions or computations need not be performed during the execution
of the program and this reduces the execution time of the program.
a) Constant Folding
b) Constant Propagation
CODE OPTIMIZATION TECHNIQUES…
2. Compile-time computation or evaluation…
Constant Folding
• As the name suggests, this technique involves folding the constants.
• Expressions whose operands have constant values at compile time are evaluated at compile time.
• Those expressions are then replaced with their respective results.
Example-
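(An assumed, representative example; the slide's original example is not shown here.)

    Constant folding:      x = 2 * 3.14;                →   x = 6.28;
    Constant propagation:  pi = 3.14;  area = pi*r*r;   →   area = 3.14*r*r;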
3. Variable propagation
• In this technique, a use of a variable that simply holds a copy of another variable is replaced by that other variable. For example, after x = d, the statement z = x + e can be rewritten as z = d + e.
CODE OPTIMIZATION TECHNIQUES…
4. Code Movement optimization
• As the name suggests, this technique involves movement of code.
• Code inside a loop is moved out of the loop if the result does not depend on whether it executes inside or outside the loop.
• Otherwise such code gets executed again and again with each iteration of the loop.
• This leads to a wastage of time at run time.
CODE OPTIMIZATION TECHNIQUES…
4. Code Movement optimization
Code movement is performed so as to:
• Reduce the size of the program
• Reduce the execution frequency of the code subjected to movement
a) Code Space Reduction: It is similar to common sub-expression elimination, but the objective here is not to reduce the execution frequency of the common sub-expression; rather, it reduces the code size by generating the code for the common sub-expression only once. This technique is called Code Hoisting.
Before code hoisting:
    if a < b then
        z = x*4;
        ---
        ---
    else
        y = x*4 + 20

After code hoisting:
    temp = x*4;
    if a < b then
        z = temp;
        ---
        ---
    else
        y = temp + 20
CODE OPTIMIZATION TECHNIQUES
4. Code Movement optimization
b) Execution Frequency Reduction: This identifies common code that is evaluated at various places in the program and moves it to a place where it is evaluated fewer times, hence reducing the frequency of its execution.
I. Code Hoisting

Before:
    z = x*2;
    …
    else
        y = 20;

After:
    temp = x*2;
    z = temp;
    …
    else
        y = 20;

II. Loop Optimization: this involves moving invariant computations outside the loop.
CODE OPTIMIZATION TECHNIQUES…
5. Strength Reduction
• As the name suggests, this technique involves reducing the strength of expressions.
• It replaces expensive and costly operators with simpler and cheaper ones.
Here,
The expression “A x 2” is replaced with the expression “A + A”.
This is because the cost of the multiplication operator is higher than that of the addition operator.
CODE OPTIMIZATION TECHNIQUES…
7. Loop Optimization
• The approach attempts to reduce the overhead due to loop setup, loop condition checks and loop termination.
Loop optimization : Loop Interchange
• This optimization technique exchanges inner loops with outer loops.
• When the loop variables index into an array, such a transformation can improve the
locality of reference, depending on the array’s layout.
Before:
    for(i=0;i<100;i++)
        for(j=0;j<100;j++)
            a[j][i]=…

After:
    for(j=0;j<100;j++)
        for(i=0;i<100;i++)
            a[j][i]=…
Loop optimization : Loop Reversal
• Reverses the order in which values are assigned to the index variable.
• This can help eliminate dependencies and thus enable other optimizations.
Before:
    for(i=0;i<100;i++)
        a[99-i]=…

After:
    for(i=99;i>=0;i--)
        a[i]=…
Loop optimization : Loop Unrolling
• Loop unrolling attempts to optimize a program's execution speed by reducing the instructions that control the loop, such as pointer arithmetic and end-of-loop tests, on each iteration.
• It also reduces the number of jumps from the end of the loop to its beginning.
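A small before/after sketch of loop unrolling (an illustrative fragment, assuming the loop count is a multiple of the unroll factor):

Before:
    for(i=0;i<100;i++)
        a[i]=0;

After unrolling by a factor of 4:
    for(i=0;i<100;i+=4) {
        a[i]=0;
        a[i+1]=0;
        a[i+2]=0;
        a[i+3]=0;
    }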
8. Boolean Expression Optimization
• We can also use the properties of Boolean expressions to minimize their computation.
• Rather than generating code that always tests each of the expressions a, b and c, we can stop as soon as the result is known: in an expression such as a OR b OR c, if a is true the whole expression is true and b and c need not be evaluated; similarly for b.
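For example (an assumed C-style illustration of this short-circuit evaluation):

    if (a || b || c)
        count++;

If a is non-zero, b and c are never evaluated; if a is zero but b is non-zero, c is never evaluated.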
Local Optimization
• Can achieve better transformations and can produce more optimized code
• Code : It is known as the text part of a program that does not change at runtime. Its
memory requirements are known at the compile time.
• Procedures : Their text part is static, but they are called in a random manner. That is why stack storage is used to manage procedure calls and activations.
• Variables : Variables are known at the runtime only, unless they are global or
constant. Heap memory allocation scheme is used for managing allocation and de-
allocation of memory for variables in runtime.
Storage Allocation : Databases used
Storage Allocation methods
Memory Allocation
• Static Memory Allocation
• Dynamic Memory Allocation
Storage Allocation : Static Memory allocation
• If memory is created at compile time then the memory will be created in static
area and only once.
• Static allocation does not support dynamic data structures: memory is created only at compile time and deallocated after the program completes.
• The drawback of static storage allocation is that the size and position of data objects must be known at compile time.
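A small C illustration of the contrast (an assumed example): the array below is laid out statically at compile time, while the buffer returned by malloc is allocated from the heap at run time, so its size need not be known in advance.

    #include <stdlib.h>

    static int table[100];                    /* static allocation: size fixed at compile time */

    int *make_buffer(int n) {
        return malloc(n * sizeof(int));       /* dynamic (heap) allocation at run time */
    }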
• The code generated by the compiler is an object code of some lower-level programming
language, for example, assembly language.
• We have seen that the source code written in a higher-level language is transformed into a
lower-level language that results in a lower-level object code, which should have the
following minimum properties:
Code Productions
Code Generation : Databases used
Code Generation : Code generation using Code productions
END
https://forms.gle/bjV9UY1Tx3cyxbqR8