Compiler Design
Introduction
Figure: A language-processing system. Source Program → Preprocessor → Modified Source Program → Compiler → Target Assembly Program → Assembler → Relocatable Machine Code → Linker (with Library Files and Relocatable Object Files) → Target Machine Code → Loader → Results
Compiler Construction
Language Processors: Translators.
• A translator takes a source program as input and converts it into
an object (target) program.
• The source program is written in a source language.
• The object program is written in an object language.
• A translator can be an assembler, a compiler, or an interpreter.
Assembler:
source program (in assembly language) → Assembler → object program (in machine language)
Language Processors: Compiler
• A compiler is a program that reads a program in one language (the source
language) and translates it into an equivalent program in another language (the
target language).
• An important role of the compiler is to report any errors in the source program that
it detects during the translation process.
• If the target program is an executable machine-language program, it can then be
called by the user to process inputs and produce outputs.
Language Processors: Interpreter
An interpreter is another common kind of language processor. Instead of producing a
target program as a translation, an interpreter appears to directly execute the operations
specified in the source program on inputs supplied by the user.
The machine-language target program produced by a compiler is usually much faster
than an interpreter at mapping inputs to outputs. An interpreter, however, can usually
give better error diagnostics than a compiler, because it executes the source program
statement by statement.
Compilers and Interpreters
Why Interpretation?
❖ A higher degree of machine independence: high portability.
❖ Dynamic execution: user programs can be modified or extended
as execution proceeds.
❖ Dynamic data types: the type of an object may change at run time.
❖ Easier to write: there is no synthesis part.
❖ Better diagnostics: more source-text information is available.
Compilers: an Overview
- Compiler: translates a source program written in a
High-Level Language (HLL), such as Pascal or C++, into the
computer's machine language (a Low-Level Language
(LLL)).
* The time at which the source program is converted into the
object program is called compile time.
* The object program is executed at run time.
- Interpreter: processes an internal form of the source
program and data at the same time (at run time); no
object program is generated.
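The split between compile time and run time can be tried out with Python's built-in compile() and exec(). This is only an analogy: CPython translates source text into bytecode for its own virtual machine rather than into machine language, and the variable names and the run() helper are made up for this sketch.

```python
# Compile time: translate the source text once into a code object.
source = "result = base + rate * 60"
code_object = compile(source, "<example>", "exec")

# Run time: execute the already-translated program on different data.
def run(base, rate):
    env = {"base": base, "rate": rate}
    exec(code_object, env)
    return env["result"]

print(run(100, 2))    # 220
print(run(10, 0.5))   # 40.0
```

The translation happens once; the resulting object is then executed as many times as needed with new inputs.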
Compilers: an Overview (cont.)
Compilation Process:
Source program → Compiler → Object program (compile time)
Object program + Data → Executing Computer → Results (run time)
Interpretation Process:
Source program + Data → Interpreter → Result
Combining Both Interpreter and Compiler
Example:
• Java language processors combine
compilation and interpretation.
• A Java source program may first be compiled into an intermediate form called
bytecodes.
• The bytecodes are then interpreted by a virtual machine. A benefit of this arrangement
is that bytecodes compiled on one machine can be interpreted on another machine,
perhaps across a network.
• To map inputs to outputs faster, some Java compilers, called
just-in-time compilers, translate the bytecodes into machine language immediately
before they run the intermediate program to process the input.
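The compile-then-interpret arrangement can be sketched in a few lines. The instruction set below (PUSH, LOAD, ADD, SUB, MUL) is invented for illustration and is not Java bytecode; Python's standard ast module is used only to obtain a syntax tree for the expression.

```python
import ast

def compile_to_bytecode(expr):
    """Compile an arithmetic expression into stack-machine instructions."""
    ops = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
    code = []

    def emit(node):
        if isinstance(node, ast.Constant):
            code.append(("PUSH", node.value))
        elif isinstance(node, ast.Name):
            code.append(("LOAD", node.id))
        elif isinstance(node, ast.BinOp) and type(node.op) in ops:
            emit(node.left)                 # operands first (postfix order)
            emit(node.right)
            code.append((ops[type(node.op)], None))
        else:
            raise SyntaxError("unsupported construct in this sketch")

    emit(ast.parse(expr, mode="eval").body)
    return code

def interpret(code, variables):
    """A tiny virtual machine: run the instructions on an operand stack."""
    stack = []
    for opcode, arg in code:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "LOAD":
            stack.append(variables[arg])
        else:
            right, left = stack.pop(), stack.pop()
            if opcode == "ADD":
                stack.append(left + right)
            elif opcode == "SUB":
                stack.append(left - right)
            else:                           # "MUL"
                stack.append(left * right)
    return stack.pop()

bytecode = compile_to_bytecode("base + rate * 60")
print(bytecode)
print(interpret(bytecode, {"base": 100, "rate": 2}))  # 220
```

The instruction list is plain data, so it could be produced on one machine and interpreted on another that has the same virtual machine.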
Compiler Model
• A compiler must perform two tasks:
  • Analysis of the source program: the analysis part breaks up the
  source program into constituent pieces and imposes a
  grammatical structure on them. It then uses this structure to
  create an intermediate representation of the source program.
  • Synthesis of the target program: the synthesis part constructs the desired
  target program from the intermediate representation and the
  information in the symbol table.
• The analysis part is often called the front end of the
compiler; the synthesis part is the back end.
Compiler Model (cont.)
Figure: Compiler model. Input source program → Analysis → Synthesis → Output object program.
Analysis: Lexical Analysis, Syntactic Analysis, Semantic Analysis.
Synthesis: Code Generator, Code Optimizer.
Symbol Tables.
Tasks of Compilation Process & Output
Figure: Compiler phases and the error handler.
First Phase: Lexical Analysis (Scanner):
• The lexical analyzer reads the stream of characters making up the
source program and groups the characters into meaningful
sequences called lexemes.
• For each lexeme, the lexical analyzer produces a token of the
form
(token-name, attribute-value)
that it passes on to the subsequent phase, syntax analysis.
• Token-name: an abstract symbol that is used during syntax
analysis.
• Attribute-value: points to an entry in the symbol table for this
token.
Example:
position = initial + rate * 60
1. “position” is a lexeme mapped into the token (id, 1), where id is an abstract symbol
standing for identifier and 1 points to the symbol-table entry for position. The
symbol-table entry for an identifier holds information about the identifier, such as
its name and type.
2. = is a lexeme that is mapped into the token (=). Since this token needs no attribute
value, the second component is omitted. For notational convenience, the
lexeme itself is used as the name of the abstract symbol.
3. “initial” is a lexeme that is mapped into the token (id, 2), where 2 points to the
symbol-table entry for initial.
4. + is a lexeme that is mapped into the token (+).
5. “rate” is a lexeme mapped into the token (id, 3), where 3 points to the
symbol-table entry for rate.
6. * is a lexeme that is mapped into the token (*).
7. 60 is a lexeme that is mapped into the token (60).
Blanks separating the lexemes are discarded by the lexical
analyzer.
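The mapping from lexemes to tokens above can be reproduced with a small scanner. This is a minimal sketch: the regular expression, the tokenize() name, and the use of a plain list as the symbol table are choices made for this illustration, and only identifiers, integers, and the operators =, +, * are recognized.

```python
import re

# One alternative per kind of lexeme; leading blanks are skipped.
TOKEN_PATTERN = re.compile(
    r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<op>[=+*]))"
)

def tokenize(source):
    symbol_table = []   # entry i-1 holds the name of identifier number i
    tokens = []
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        match = TOKEN_PATTERN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        pos = match.end()
        if match.group("id"):
            name = match.group("id")
            if name not in symbol_table:
                symbol_table.append(name)
            # (token-name, attribute-value): the attribute is the table index.
            tokens.append(("id", symbol_table.index(name) + 1))
        elif match.group("number"):
            tokens.append((match.group("number"),))   # no attribute value
        else:
            tokens.append((match.group("op"),))       # lexeme names the token
    return tokens, symbol_table

tokens, table = tokenize("position = initial + rate * 60")
print(tokens)
print(table)
```

For the example statement this yields (id, 1), (=), (id, 2), (+), (id, 3), (*), (60), with position, initial, and rate as symbol-table entries 1 to 3.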
Second Phase: Syntax Analysis (parser)
• The parser uses the first components of the tokens produced by the lexical
analyzer to create a tree-like intermediate representation that depicts the
grammatical structure of the token stream.
• A typical representation is a syntax tree in which each interior node
represents an operation and the children of the node represent the
arguments of the operation.
Syntax Analysis: Example
Pay := Base + Rate * 60
❖ The seven tokens are grouped into a parse tree:
Assignment stmt
  identifier: Pay
  :=
  expression
    expression
      identifier: Base
    +
    expression: Rate * 60
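One way to build such a tree is recursive descent, with one function per grammar rule. The sketch below assumes a tiny grammar (assignment, +, *, identifiers, integers) and represents each interior node as a tuple (operator, left, right); these choices are illustrative and not taken from the slides. For brevity the regular expression silently skips characters it does not recognize.

```python
import re

def parse_assignment(source):
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|:=|[+*]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():                       # factor -> identifier | number
        token = advance()
        if token is None or not (token[0].isalnum() or token[0] == "_"):
            raise SyntaxError(f"expected identifier or number, got {token!r}")
        return token

    def term():                         # term -> factor { * factor }
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expression():                   # expression -> term { + term }
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = advance()
    if target is None or not (target[0].isalpha() or target[0] == "_"):
        raise SyntaxError("expected an identifier on the left-hand side")
    if advance() != ":=":
        raise SyntaxError("expected ':='")
    tree = (":=", target, expression())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse_assignment("Pay := Base + Rate * 60"))
```

Because term() is called from expression(), the multiplication ends up deeper in the tree than the addition, which encodes the usual operator precedence.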
Third phase: Semantic Analysis
• The semantic analyzer uses the syntax tree and the information in the symbol table to
check the source program for semantic consistency with the language definition.
• It also gathers type information and saves it in either the syntax tree or the symbol table, for
subsequent use during intermediate-code generation.
• An important part of semantic analysis is type checking, where the compiler checks
that each operator has matching operands. For example, many programming
language definitions require an array index to be an integer; the compiler must report
an error if a floating-point number is used to index an array.
• The language specification may permit some type conversions, called coercions. For
example, a binary arithmetic operator may be applied to either a pair of integers or to
a pair of floating-point numbers. If the operator is applied to a floating-point number
and an integer, the compiler may convert (coerce) the integer into a floating-point
number.
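Both checks described above, the integer array index and the int-to-float coercion, can be sketched as one recursive pass over the tree. The tuple-based tree, the "index" operator, the "float[]" array type string, and the inttofloat node name are assumptions of this example.

```python
NUMERIC = ("int", "float")

def check(node, symbol_types):
    """Return (annotated_node, type); raise TypeError on a semantic error."""
    if isinstance(node, int):
        return node, "int"
    if isinstance(node, float):
        return node, "float"
    if isinstance(node, str):                     # identifier: consult the table
        return node, symbol_types[node]
    op, left, right = node
    left, left_type = check(left, symbol_types)
    right, right_type = check(right, symbol_types)
    if op == "index":
        if not left_type.endswith("[]"):
            raise TypeError("only arrays can be indexed")
        if right_type != "int":
            raise TypeError("array index must be an integer")
        return (op, left, right), left_type[:-2]  # element type
    if left_type not in NUMERIC or right_type not in NUMERIC:
        raise TypeError(f"operator {op!r} needs numeric operands")
    if left_type == right_type:
        return (op, left, right), left_type
    # Mixed int/float operands: coerce the integer side.
    if left_type == "int":
        left = ("inttofloat", left)
    else:
        right = ("inttofloat", right)
    return (op, left, right), "float"

types = {"initial": "float", "rate": "float", "a": "float[]"}
print(check(("+", "initial", ("*", "rate", 60)), types))
```

With rate declared as a floating-point number, the integer 60 is wrapped in an explicit conversion node, while an expression such as ("index", "a", 1.5) is rejected.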
Phase Four: Intermediate Code Generation
Intermediate code generation produces three-address code.
After syntax and semantic analysis of the source program, many compilers
generate an explicit low-level or machine-like intermediate representation (a
program for an abstract machine). This intermediate representation should
have two important properties:
• it should be easy to produce, and
• it should be easy to translate into the target machine.
The intermediate form considered here is called three-address code, which consists
of a sequence of assembly-like instructions with at most three operands per
instruction. Each operand can act like a register.
Example: Pay := Base + Rate * 60
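As a rough illustration, three-address code for this statement can be produced by a post-order walk over its syntax tree, creating a fresh temporary for every operator. The tuple tree format and the temporary names t1, t2, ... are assumptions of this sketch, and type conversions are left out.

```python
from itertools import count

def generate_tac(tree):
    """Return three-address instructions for a tree (':=', target, expr)."""
    temps = count(1)
    code = []

    def emit(node):
        if not isinstance(node, tuple):     # identifier or constant
            return str(node)
        op, left, right = node
        left_addr = emit(left)              # operands are evaluated first
        right_addr = emit(right)
        temp = f"t{next(temps)}"
        code.append(f"{temp} = {left_addr} {op} {right_addr}")
        return temp

    _, target, expr = tree
    result = emit(expr)
    code.append(f"{target} = {result}")
    return code

for line in generate_tac((":=", "Pay", ("+", "Base", ("*", "Rate", 60)))):
    print(line)
# t1 = Rate * 60
# t2 = Base + t1
# Pay = t2
```

Each instruction has one operator and at most three addresses, and the order of instructions fixes the order of evaluation.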
Code Optimization
Code optimization is applied to generate better target code.
• The machine-independent code-optimization phase attempts to improve the
intermediate code so that better target code will result.
• Usually better means:
  • faster, shorter code, or target code that consumes less power.
• The optimizer can deduce that the conversion of 60 from integer to floating
point can be done once and for all at compile time, so the int-to-float
operation can be eliminated by replacing the integer 60 with the floating-point
number 60.0. Moreover, t3 is used only once, so it too can be eliminated.
• There are simple optimizations that significantly improve the running time
of the target program without slowing down compilation too much.
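The two improvements mentioned above can be sketched on a small instruction list. The input below is an assumed textbook-style sequence (the names id1 to id3 and t1 to t3 are illustrative), instructions are tuples (dest, op, arg1, arg2), and the copy-merging step assumes every temporary is used exactly once.

```python
def is_temp(name):
    return isinstance(name, str) and name[:1] == "t" and name[1:].isdigit()

def optimize(code):
    # Pass 1: perform inttofloat on integer constants at compile time.
    constants = {}
    folded = []
    for dest, op, a, b in code:
        a = constants.get(a, a)
        b = constants.get(b, b)
        if op == "inttofloat" and isinstance(a, int):
            constants[dest] = float(a)      # remember the value, drop the instruction
        else:
            folded.append((dest, op, a, b))
    # Pass 2: merge "x = t" into the instruction just before it that defines t.
    result = []
    for dest, op, a, b in folded:
        if op == "copy" and result and result[-1][0] == a and is_temp(a):
            _, prev_op, prev_a, prev_b = result.pop()
            result.append((dest, prev_op, prev_a, prev_b))
        else:
            result.append((dest, op, a, b))
    return result

before = [
    ("t1", "inttofloat", 60, None),
    ("t2", "*", "id3", "t1"),
    ("t3", "+", "id2", "t2"),
    ("id1", "copy", "t3", None),
]
for instruction in optimize(before):
    print(instruction)
# ('t2', '*', 'id3', 60.0)
# ('id1', '+', 'id2', 't2')
```

Four instructions shrink to two: the conversion disappears and the temporary that only carried the final value is removed.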
Code Generation
Code generation takes as input an intermediate representation of
the source program and maps it into the target (machine / object)
language.
• If the target language is machine code, registers or memory locations are
selected for each of the variables used by the program.
• The intermediate instructions are translated into sequences of machine
instructions that perform the same task.
• A crucial aspect of code generation is the judicious assignment of
registers to hold variables.
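A very naive code generator illustrates the idea. The mnemonics LDF, MULF, ADDF, and STF and the registers R1 to R4 describe a hypothetical machine, not a real instruction set, and the sketch assumes each temporary is used once and never spills registers to memory.

```python
def is_temp(name):
    return isinstance(name, str) and name[:1] == "t" and name[1:].isdigit()

def generate(code):
    """Translate (dest, op, arg1, arg2) tuples into assembly-like text."""
    mnemonics = {"+": "ADDF", "*": "MULF"}
    registers = iter(["R1", "R2", "R3", "R4"])
    in_register = {}            # temporary name -> register holding its value
    asm = []

    def load(arg):
        if arg in in_register:
            return in_register[arg]
        reg = next(registers, None)
        if reg is None:
            raise RuntimeError("out of registers (no spilling in this sketch)")
        source = f"#{arg}" if isinstance(arg, float) else arg
        asm.append(f"LDF {reg}, {source}")
        return reg

    def operand(arg):
        # Constants become immediates; everything else must sit in a register.
        return f"#{arg}" if isinstance(arg, float) else load(arg)

    for dest, op, a, b in code:
        left = load(a)
        right = operand(b)
        asm.append(f"{mnemonics[op]} {left}, {left}, {right}")
        if is_temp(dest):
            in_register[dest] = left        # keep the result in its register
        else:
            asm.append(f"STF {dest}, {left}")
    return asm

for line in generate([("t1", "*", "id3", 60.0), ("id1", "+", "id2", "t1")]):
    print(line)
# LDF R1, id3
# MULF R1, R1, #60.0
# LDF R2, id2
# ADDF R2, R2, R1
# STF id1, R2
```

Keeping t1 in R1 instead of storing it to memory and reloading it is a small instance of the register assignment decision described above.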
Symbol-Table Management
• The symbol table is a data structure containing a record
for each variable name, with fields for the attributes of the
name.
• The data structure should be designed to allow the
compiler to find the record for each name quickly and to
store or retrieve data from that record quickly.
• These attributes may provide information about the
storage allocated for a name, its type, its scope (where in
the program its value may be used), and, in the case of
method names, such things as the number and types of its
arguments, the method of passing each argument (for
example, by value or by reference), and the type returned.
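A common way to meet these requirements is a stack of hash tables, one per scope. The class below is a minimal sketch; the method names and the attribute keys (type, offset, params, returns) are chosen for this example only.

```python
class SymbolTable:
    def __init__(self):
        self.scopes = [{}]              # the innermost scope is last

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **attributes):
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name!r} is already declared in this scope")
        scope[name] = attributes        # the record for this name
        return attributes

    def lookup(self, name):
        # Search from the innermost scope outward.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise KeyError(f"{name!r} is not declared")

table = SymbolTable()
table.declare("rate", type="float", offset=0)
table.declare("area", type="method", returns="float",
              params=[("width", "float", "by value"),
                      ("height", "float", "by reference")])
table.enter_scope()
table.declare("rate", type="int", offset=8)     # shadows the outer rate
print(table.lookup("rate")["type"])             # int
table.exit_scope()
print(table.lookup("rate")["type"])             # float
```

Dictionary lookups take constant time on average, which addresses the need to find and update the record for a name quickly.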
Compiler Phases: Example with Output
Figure: Translation of an assignment statement.
Compiler Phases: Grouping
• Front end
❖ Consists of those phases that depend on the source language but are
largely independent of the target machine.
• Back end
❖ Consists of those phases that are usually target-machine dependent,
such as optimization and code generation.
Common Back-end Compiling System
Front ends (Fortran, C/C++, Pascal, Cobol) → Common IR (e.g., Ucode) → Optimizer → Target Machine Code Gen
Compiling Passes
• Several phases can be implemented as a single pass, which consists of
reading an input file and writing an output file.
• A typical multi-pass compiler looks like:
  • First pass: preprocessing, macro expansion
  • Second pass: syntax-directed translation, IR code generation
  • Third pass: optimization
  • Last pass: target machine code generation
Cousins of Compilers
• Preprocessors
• Assemblers
  • A compiler may produce assembly code instead of
  generating relocatable machine code directly.
• Loaders and Linkers
  • The loader copies code and data into memory, allocates
  storage, sets protection bits, maps virtual addresses,
  etc.
  • The linker handles relocation and resolves symbol references.
• Debuggers
Tasks of Compilation Process & Output
• Each task is assigned to a phase, e.g., the Lexical
Analyzer phase, the Syntax Analyzer phase, and so on.
• Each task has an input and an output.
• Anything between brackets in the last figure is the
output of a phase.
• The compiler first analyzes the program; the result
is a set of representations suitable for later translation:
- Parse tree
- Symbol table
Parse Tree & Symbol Table
• The parse tree (syntax tree) defines the program
structure: how parts of the program combine to
produce larger parts, and so on.
• The symbol table provides
- the associations between all occurrences of each
identifier name given in the program, and
- a link between each identifier name
and its declaration.