Module 1
Compiler Design
Module 1 - Introduction
Objective: To understand the processes involved in Compiler Design.
1. Introduction
This module begins by discussing the need for translators and compilers. It then groups
the work of a compiler into phases, which are discussed in the later part of this
module. To begin, here is a brief history of compilers.
1.1 A Brief History
Software is an essential component of present-day computing. In the early days,
software was written in assembly language, with the instructions written as mnemonic
codes. For example, the following assembly code adds two numbers.
MOV R1, a
MOV R2, b
ADD R1, R1, R2          (1.1)
In statement (1.1), MOV is a command that moves the value stored in variable ‘a’
to register R1 and the value stored in ‘b’ to register R2. The command ADD then adds
the contents of registers R1 and R2 and stores the result in R1. As one can observe,
these instructions are closer to the machine than to the human. The drawbacks of
writing programs in assembly language are:
– Instructions are very difficult to remember
– Programs are tied to one CPU; the benefits of reusing software on different CPUs
became greater than the cost of designing a compiler
– Programs are very cumbersome to write
These drawbacks triggered the need for software that understands a language closer to
human language, and this led to the birth of language processors called translators.
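The effect of the three instructions in (1.1) can be traced with a minimal sketch of a register machine. This is an illustration only: the function name run, the dictionary used as memory, and the sample values for a and b are assumptions, not part of any real assembler.

```python
def run(program, memory):
    """Execute MOV/ADD instructions and return the final register contents."""
    registers = {}
    for line in program:
        opcode, operands = line.split(maxsplit=1)
        args = [a.strip() for a in operands.split(",")]
        if opcode == "MOV":
            # MOV Rd, var: copy the value of a memory variable into register Rd
            dest, var = args
            registers[dest] = memory[var]
        elif opcode == "ADD":
            # ADD Rd, Rs1, Rs2: Rd = Rs1 + Rs2
            dest, src1, src2 = args
            registers[dest] = registers[src1] + registers[src2]
        else:
            raise ValueError(f"unknown instruction: {opcode}")
    return registers


program = ["MOV R1, a", "MOV R2, b", "ADD R1, R1, R2"]
result = run(program, {"a": 2, "b": 3})   # sample values, chosen for illustration
print(result)  # {'R1': 5, 'R2': 3}
```

Even in this small case the programmer has to track which register holds which value, which is the bookkeeping a compiler later takes over.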
1.2 Language Processors
A translator converts a source program written in one language into a target
program in another language. This is similar to needing a translator when two people who
don’t know each other’s language want to communicate. In the context of
computer science, the source program is written in a programming language and the
target program is typically in machine language, the language that the machine can
execute directly. Assemblers, compilers and interpreters are all translators. A compiler
converts programs written in a high-level programming language to assembly language.
An assembler converts assembly language programs to machine language (object language).
Translators let programmers write programs in a language that is easier for them
to remember and understand, and convert those programs into a language that is closer
to the machine. This leads to the following ways of designing software:
a. Design an interpreter / translator to convert human language to machine language.
The interpreter will have difficulty parsing human language, which may be ambiguous.
For example, incorrectly identified word boundaries during interpretation result in
ambiguity.
b. Design a compiler that understands a high-level language, which is not necessarily
English but is closer to English, and converts it to assembly language.
The design is complex, but parsing ambiguity can be avoided. The major drawback
is the mapping of the high-level language to assembly language. This also necessitates
designing a compiler for every high-level programming language, keeping in mind
the instruction set of the target assembly language.
c. Design an assembler that converts assembly language to machine language.
The drawback is that the target language needs to be specified, and the output of the
various compilers has to be known ahead of time.
So, our aim is to design a compiler and an assembler for converting high-level language
to machine language. In addition, certain other components are needed for pre-processing
and execution, which are discussed next.
A typical language processing system is given in Figure 1.1. The source program,
written in a high-level programming language, first goes through a pre-processor. The
pre-processor expands macros so that the compiler receives complete code. For example, if
the source program contains the statement #define MAX 100, the pre-processor
replaces MAX with 100 everywhere in the source program and passes the result to the
compiler. The compiler converts this to assembly language and the assembler converts the
assembly language to object language. At this point, the object language is called
re-locatable object code. The code is re-locatable because it does not yet contain the
exact memory addresses at which it will be loaded for execution. This re-locatable
machine code is passed on to the linker. The linker combines the object code of multiple
source files, or links the object code of the current source files with the object code
of the standard library, and produces one object file. This file is then
loaded into main memory for execution by the loader.
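The macro replacement step described above can be sketched in a few lines. This is a simplified stand-in for a real pre-processor, under stated assumptions: the function name preprocess is hypothetical, only object-like macros of the form #define NAME value are handled, and text inside string literals or comments is not treated specially.

```python
import re


def preprocess(source):
    """Expand object-like #define macros and drop the directives themselves."""
    macros = {}
    output = []
    for line in source.splitlines():
        match = re.match(r"\s*#define\s+(\w+)\s+(.+)", line)
        if match:
            macros[match.group(1)] = match.group(2).strip()
            continue  # the directive is consumed, not passed on to the compiler
        for name, value in macros.items():
            # \b limits the replacement to whole identifiers, so MAXIMUM is untouched
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, v=value: v, line)
        output.append(line)
    return "\n".join(output)


source = "#define MAX 100\nint table[MAX];\nint MAXIMUM = MAX;"
expanded = preprocess(source)
print(expanded)
# int table[100];
# int MAXIMUM = 100;
```

The compiler in this picture never sees the name MAX, only the text that replaced it.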
[Figure: a compiler translates a programming language (source) into machine language (target).]
1.3 Interpreter
An interpreter is a language processor that directly executes the operations specified
in the source program on inputs supplied by the user. The interpreter processes an
internal form of the source program and the data at the same time (at run time), and
therefore no object program is generated.
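As a minimal sketch of this idea, the toy interpreter below executes each statement as soon as it has been read, using the source text and the user's data together. The two-statement language (target = left + right, and print name) and the function names interpret and value are assumptions made for illustration.

```python
def value(token, env):
    """Return the integer value of a literal or of a variable in env."""
    return int(token) if token.isdigit() else env[token]


def interpret(source, inputs):
    """Execute each statement directly; no target program is produced."""
    env = dict(inputs)            # data supplied by the user at run time
    printed = []
    for line in source.splitlines():
        tokens = line.split()     # internal form: the tokens of one statement
        if not tokens:
            continue
        if len(tokens) == 2 and tokens[0] == "print":
            printed.append(env[tokens[1]])
        elif len(tokens) == 5 and tokens[1] == "=" and tokens[3] == "+":
            # statement of the form: target = left + right
            target, _, left, _, right = tokens
            env[target] = value(left, env) + value(right, env)
        else:
            raise SyntaxError(f"cannot interpret: {line!r}")
    return printed


program = "c = a + b\nc = c + 1\nprint c"
output = interpret(program, {"a": 2, "b": 3})
print(output)  # [6]
```

Because every statement is executed with its source text at hand, an error can be reported against the exact statement that caused it.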
1.3.1 Compiler vs Interpreter
The following are some points of comparison between the compiler and the interpreter.
– An interpreter offers a higher degree of machine independence and hence high
portability.
– An interpreter supports dynamic execution. This allows modification of or addition
to user programs even during execution.
– An interpreter also supports dynamic data types, so the type of an object can change
even at run time.
– An interpreter requires no synthesis part, which makes it easier to write.
– An interpreter provides better error diagnostics, as it executes the source program
statement by statement and has more source text information available.
– The machine-language target program produced by a compiler is much faster than
an interpreter at mapping input to output.
The processes of compilation and interpretation are shown in Figures 1.3 and 1.4
respectively.
[Figures 1.3 and 1.4: block diagrams of compilation and interpretation. Recoverable
labels: Source Program, Compiler, Target Program, Data, Result.]
The compiler’s analysis and synthesis parts are grouped into six phases, shown in Figure
1.6. The first three phases belong to the analysis part and the last three phases to the
synthesis part. All the phases of the compiler interact with the symbol table and the error
handler.
Code Optimization: This phase can operate either before or after code generation. The aim of
this phase is to improve the intermediate code so that better target code results. It aims at
faster, shorter code, and at target code that consumes less power. An important characteristic
of this phase is that it carries out simple optimizations that significantly improve the
running time of the target program without slowing down compilation.
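One simple optimization of this kind is constant folding, sketched below. The representation of intermediate code as (dest, op, arg1, arg2) tuples and the function name fold_constants are assumptions for illustration. The sketch also assumes straight-line code in which folded names are temporaries that are not needed after the block.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(code):
    """Fold and propagate constants in a list of (dest, op, arg1, arg2) tuples."""
    known = {}        # names whose value is already known at compile time
    optimized = []
    for dest, op, arg1, arg2 in code:
        arg1 = known.get(arg1, arg1)   # substitute known constants for names
        arg2 = known.get(arg2, arg2)
        if isinstance(arg1, int) and isinstance(arg2, int):
            # Both operands are constants: compute now, emit no instruction.
            # Assumption: dest is a temporary, not needed after this block.
            known[dest] = OPS[op](arg1, arg2)
        else:
            known.pop(dest, None)      # dest no longer holds a known constant
            optimized.append((dest, op, arg1, arg2))
    return optimized


code = [
    ("t1", "*", 60, 60),
    ("t2", "*", "t1", 24),
    ("x", "+", "y", "t2"),
]
optimized = fold_constants(code)
print(optimized)  # [('x', '+', 'y', 86400)]
```

Three intermediate instructions shrink to one, and the two multiplications are done once at compile time instead of on every run.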
Code Generation: This phase generates the target assembly language. In this phase, registers
or memory locations are selected for each of the variables used by the program. The intermediate
instructions that form the input to this phase are translated into sequences of machine
instructions that perform the same operation. An important consideration in code generation is
the assignment of registers to hold variables, as only a limited number of registers is
available. This phase also needs to decide on the choice of instructions involving registers,
memory or a mix of the two.
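A minimal sketch of this translation, with a deliberately naive register assignment, is given below. It emits instructions in the MOV/ADD style of statement (1.1). The tuple format (dest, op, arg1, arg2), the function name generate, and the SUB and MUL mnemonics are assumptions. Operands are taken to be variable names, and running out of registers raises an error because spilling to memory is not modelled.

```python
MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL"}   # SUB and MUL assumed by analogy with ADD


def generate(code, num_registers=4):
    """Translate (dest, op, arg1, arg2) tuples into MOV/ADD-style assembly."""
    location = {}     # variable name -> register that currently holds it
    asm = []

    def register_for(name):
        # Give each new name the next free register; never reuse one (naive).
        if name not in location:
            if len(location) >= num_registers:
                raise RuntimeError("out of registers (spilling is not modelled)")
            location[name] = f"R{len(location) + 1}"
        return location[name]

    def load(name):
        # Emit a MOV only if the variable is not already in a register.
        if name not in location:
            asm.append(f"MOV {register_for(name)}, {name}")
        return location[name]

    for dest, op, arg1, arg2 in code:
        r1, r2 = load(arg1), load(arg2)
        rd = register_for(dest)
        asm.append(f"{MNEMONIC[op]} {rd}, {r1}, {r2}")
    return asm


code = [("t1", "+", "a", "b"), ("c", "*", "t1", "a")]
asm = generate(code)
print("\n".join(asm))
# MOV R1, a
# MOV R2, b
# ADD R3, R1, R2
# MUL R4, R3, R1
```

With num_registers=3 the same input fails, which is the situation a real code generator handles by reusing registers or storing values back to memory.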
Symbol Table: The symbol table is implemented as a data structure containing a record for each
variable name, with fields for the attributes of the name. The symbol table is designed to help the
compiler find and fetch the record for each name quickly. The attributes may provide
information about the storage allocated for a name, its type and its scope. For function or
procedure names, the table also records such things as the number and types of the
arguments, the method of passing each argument and the return type.
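The record structure described above can be sketched with a hash-based table. The class names Symbol and SymbolTable, the field names, and the two-level lookup (current scope, then a scope called "global") are assumptions chosen for illustration, not a prescribed design.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Symbol:
    name: str
    type: str                        # for a function: its return type
    scope: str
    offset: Optional[int] = None     # storage allocated for a variable
    param_types: List[str] = field(default_factory=list)
    passing: List[str] = field(default_factory=list)   # e.g. "value" or "reference"


class SymbolTable:
    def __init__(self):
        self._records = {}           # (scope, name) -> Symbol; hashing gives quick lookup

    def insert(self, symbol):
        key = (symbol.scope, symbol.name)
        if key in self._records:
            raise KeyError(f"{symbol.name!r} already declared in scope {symbol.scope!r}")
        self._records[key] = symbol

    def lookup(self, name, scope):
        # Search the current scope first, then fall back to the global scope.
        for candidate in (scope, "global"):
            symbol = self._records.get((candidate, name))
            if symbol is not None:
                return symbol
        return None


table = SymbolTable()
table.insert(Symbol("count", "int", "global", offset=0))
table.insert(Symbol("max", "int", "global",
                    param_types=["int", "int"], passing=["value", "value"]))
table.insert(Symbol("count", "float", "max", offset=8))

print(table.lookup("count", "max").type)    # float: the local declaration hides the global one
print(table.lookup("count", "main").type)   # int: falls back to the global scope
print(len(table.lookup("max", "main").param_types))  # 2
```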
Error Handler: The errors encountered in each phase are logged with the error handler for
subsequent reporting to the user. The compiler, however, recovers from the errors in each phase
so that it can proceed with the compilation process. The compiler recovers from errors using
either panic-mode error recovery or phrase-level error recovery.
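Panic-mode recovery can be sketched as follows: on an error, the parser logs a message and discards tokens until it has passed a synchronizing token, then resumes. The toy grammar (statements of the form NAME = NUMBER ;), the choice of ';' as the synchronizing token, and the function name parse_statements are assumptions for illustration.

```python
def parse_statements(tokens):
    """Check statements of the form  NAME = NUMBER ;  with panic-mode recovery."""
    accepted = []
    errors = []
    i = 0
    while i < len(tokens):
        stmt = tokens[i:i + 4]
        if (len(stmt) == 4 and stmt[0].isidentifier() and stmt[1] == "="
                and stmt[2].isdigit() and stmt[3] == ";"):
            accepted.append((stmt[0], int(stmt[2])))
            i += 4
            continue
        # Log the error for later reporting instead of stopping the compilation.
        errors.append(f"syntax error in statement starting at token {i}: {tokens[i]!r}")
        # Panic mode: discard tokens until the synchronizing token ';' has been passed.
        while i < len(tokens) and tokens[i] != ";":
            i += 1
        i += 1
    return accepted, errors


tokens = "a = 1 ; b = = 2 ; c = 3 ;".split()
accepted, errors = parse_statements(tokens)
print(accepted)     # [('a', 1), ('c', 3)]
print(len(errors))  # 1
```

The faulty statement is skipped as a whole, so the statement after it is still checked and a single run can report several independent errors.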
Multi-pass Compiler: Several phases can be implemented as a single pass, which consists of
reading an input file and writing an output file. A typical multi-pass compiler could do the
following:
• First pass: preprocessing, macro expansion
• Second pass: syntax-directed translation, IR code generation
• Third pass: optimization
• Last pass: target machine code generation
1.5 Summary
This module discussed the need for a compiler and the various phases of the compiler.