Introduction to Compilers
How to translate?
• Source code and machine code are mismatched in their level of
abstraction
• Goals of translation
– Good performance for the generated code
– Good compile time performance
– Maintainable code
– High level of abstraction
The big picture
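• Compiler is part of program development environment
[Figure: the program development cycle. The programmer's code is translated to machine code; the linker produces resolved machine code; the loader produces an executable image, which is executed on the target machine. Execution normally ends up with an error, so the image is executed under control of a debugger, and the programmer uses the debugging results to correct the code manually.]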
How to translate easily?
• Translate in steps. Each step handles a reasonably
simple, logical, and well-defined task
The first few steps
• The first few steps can be understood by
analogies to how humans comprehend a natural
language
Example sentence: I am going to market
[Figure: parse tree of the example sentence, rooted at "Sentence"]
Parsing
• Parsing a program works exactly the same way as parsing a sentence
• Consider the statement
if x == y then z = 1 else z = 2
             if stmt
          /     |     \
        ==      =      =
       /  \    / \    / \
      x    y  z   1  z   2
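The tree above can be produced by a very small hand-written parser. The following is a minimal sketch, assuming the statement always has the fixed shape "if a == b then c = d else e = f"; the function names `tokenize` and `parse_if` and the tuple-based tree encoding are illustrative choices, not part of the slides.

```python
import re

def tokenize(text):
    # Words (keywords and identifiers), integers, and "==" before "=".
    return re.findall(r"[A-Za-z_]\w*|\d+|==|=", text)

def parse_if(tokens):
    """Parse: if <operand> == <operand> then <assign> else <assign>."""
    pos = 0

    def expect(word):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != word:
            raise SyntaxError(f"expected {word!r} at token {pos}")
        pos += 1

    def operand():
        # For brevity any token is accepted as an operand.
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def assignment():
        target = operand()
        expect("=")
        return ("=", target, operand())

    expect("if")
    left = operand()
    expect("==")
    condition = ("==", left, operand())
    expect("then")
    then_branch = assignment()
    expect("else")
    else_branch = assignment()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after statement")
    return ("if stmt", condition, then_branch, else_branch)

tree = parse_if(tokenize("if x == y then z = 1 else z = 2"))
print(tree)
```

Each nested tuple corresponds to one interior node of the tree: the operator first, then its children from left to right.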
Understanding the meaning
• Once the sentence structure is understood we try
to understand the meaning of the sentence
(semantic analysis)
• Example: in "Prateek said Nitin left his assignment at home",
"his" may refer to either Prateek or Nitin
{ int Amit = 3;
    { int Amit = 4;
      cout << Amit;    // prints 4: the innermost declaration of Amit is used
    }
}
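One common way to resolve such bindings is to keep a stack of scopes and search it from the innermost block outwards. The sketch below assumes that approach; the class name `Scopes` and its methods are hypothetical and only illustrate the idea.

```python
class Scopes:
    """A stack of dictionaries, one dictionary per open block."""

    def __init__(self):
        self.stack = [{}]            # the outermost (global) scope

    def enter(self):
        self.stack.append({})        # a "{" opens a new scope

    def leave(self):
        self.stack.pop()             # a "}" closes the innermost scope

    def declare(self, name, value):
        self.stack[-1][name] = value

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.stack):
            if name in scope:
                return scope[name]
        raise NameError(f"{name} is not declared")

scopes = Scopes()
scopes.enter()
scopes.declare("Amit", 3)
scopes.enter()
scopes.declare("Amit", 4)
inner = scopes.lookup("Amit")    # binding seen by "cout << Amit"
scopes.leave()
outer = scopes.lookup("Amit")    # binding visible after the inner block closes
print(inner, outer)
```

The inner lookup finds the declaration with value 4 because it shadows the outer one; once the inner block is left, the outer declaration is visible again.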
More on Semantic Analysis
• Compilers perform many other checks
besides variable bindings
• Type checking
– Example: "Amit left her work at home" (a type mismatch between "her" and Amit)
[Figure: Front End (language specific)]
Front End Phases
• Lexical Analysis
– Recognize tokens and ignore whitespace and
comments
– Error reporting
– Model using regular expressions
– Recognize using Finite State Automata
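The points above can be sketched with Python's `re` module, which compiles regular expressions into a matcher that plays the role of the finite state automaton. The token classes and their patterns below are assumptions chosen for illustration (a small C-like language); they are not taken from the slides.

```python
import re

# Token classes modelled as regular expressions (illustrative choices).
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),                 # ignored
    ("SKIP",    r"\s+"),                      # whitespace, ignored
    ("NUMBER",  r"\d+"),
    ("ID",      r"[A-Za-z_][A-Za-z_0-9]*"),   # keywords are lexed as ID here
    ("OP",      r"==|[=;(){}+\-*/<>]"),
    ("ERROR",   r"."),                        # any other character
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("SKIP", "COMMENT"):
            continue
        if kind == "ERROR":
            # Error reporting: say what was found and where.
            raise SyntaxError(f"unexpected character {text!r} at offset {match.start()}")
        tokens.append((kind, text))
    return tokens

tokens = tokenize("if (b == 0) a = b; // copy b")
print(tokens)
```

The order of the alternatives matters: `COMMENT` is tried before `OP` so that `//` is not read as two division operators, and `==` is listed before `=` inside `OP`.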
Syntax Analysis
• Check syntax and construct abstract
syntax tree
            if
         /   |   \
       ==    =    ;
      /  \  / \
     b   0 a   b
Semantic Analysis
• Check semantics
• Error reporting
• Disambiguate overloaded operators
• Type coercion
• Static checking
– Type checking
– Control flow checking
– Uniqueness checking
– Name checks
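Two of the static checks listed above, type checking and name checks, can be sketched on a tuple-encoded syntax tree. This is a minimal sketch under several assumptions: the tree encoding `(operator, left, right)`, the type names "int" and "boolean", and the functions `type_of` and `check_if` are all illustrative, and no type coercion is attempted.

```python
def type_of(node, symbols):
    """Return the type of an expression node; raise on a failed check."""
    if isinstance(node, int):
        return "int"                               # integer literal
    if isinstance(node, str):
        if node not in symbols:                    # name check
            raise NameError(f"undeclared variable {node!r}")
        return symbols[node]
    op, left, right = node
    left_type = type_of(left, symbols)
    right_type = type_of(right, symbols)
    if left_type != right_type:                    # type check, no coercion
        raise TypeError(f"operands of {op!r} have types {left_type} and {right_type}")
    # A comparison yields boolean; an assignment yields the assigned type.
    return "boolean" if op == "==" else left_type

def check_if(condition, body, symbols):
    if type_of(condition, symbols) != "boolean":
        raise TypeError("if condition must be boolean")
    type_of(body, symbols)

symbols = {"a": "int", "b": "int"}                 # declared types
check_if(("==", "b", 0), ("=", "a", "b"), symbols) # passes silently
condition_type = type_of(("==", "b", 0), symbols)
print(condition_type)
```

A real compiler would record the computed type on each tree node instead of recomputing it, which is what a type-annotated syntax tree represents.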
Code Optimization
• No strong counterpart in English, but is similar
to editing/précis writing
• Example: x = 15 * 3 is transformed to x = 45
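This transformation is usually called constant folding. The sketch below applies it to Python source using the standard `ast` module (Python 3.9 or later for `ast.unparse`); it is only an illustration of the idea, and the names `ConstantFolder` and `fold_constants` are hypothetical.

```python
import ast
import operator

# Operators this sketch is willing to evaluate at compile time.
FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def is_number(node):
    return isinstance(node, ast.Constant) and isinstance(node.value, (int, float))

class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)                   # fold the operands first
        fold = FOLDABLE.get(type(node.op))
        if fold and is_number(node.left) and is_number(node.right):
            value = fold(node.left.value, node.right.value)
            return ast.copy_location(ast.Constant(value=value), node)
        return node

def fold_constants(source):
    tree = ConstantFolder().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))

folded = fold_constants("x = 15 * 3")
print(folded)   # x = 45
```

Expressions that involve a variable are left alone, but constant subexpressions inside them are still folded: `y = x * (2 + 3)` becomes `y = x * 5`.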
Example of Optimizations
PI = 3.14159
Area = 4 * PI * R^2
Volume = (4/3) * PI * R^3
Cost: 3A+4M+1D+2E
--------------------------------
X = 3.14159 * R * R
Area = 4 * X
Volume = 1.33 * X * R
Cost: 3A+5M
--------------------------------
Area = 4 * 3.14159 * R * R
Volume = ( Area / 3 ) * R
Cost: 2A+4M+1D
--------------------------------
Area = 12.56636 * R * R
Volume = ( Area / 3 ) * R
Cost: 2A+3M+1D
--------------------------------
X = R * R
Area = 12.56636 * X
Volume = 4.18879 * X * R
Cost: 3A+4M
A : assignment    M : multiplication
D : division      E : exponent
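The operation counts above can be checked mechanically. The following sketch simply counts operator characters in each variant, which is enough here because every "=" is an assignment, every "*" a multiplication, every "/" a division and every "^" an exponentiation; the dictionary `VARIANTS` and the function `cost` are names made up for this illustration.

```python
VARIANTS = {
    "original": [
        "PI = 3.14159",
        "Area = 4 * PI * R^2",
        "Volume = (4/3) * PI * R^3",
    ],
    "common subexpression": [
        "X = 3.14159 * R * R",
        "Area = 4 * X",
        "Volume = 1.33 * X * R",
    ],
    "reuse Area": [
        "Area = 4 * 3.14159 * R * R",
        "Volume = (Area / 3) * R",
    ],
    "fold constant": [
        "Area = 12.56636 * R * R",
        "Volume = (Area / 3) * R",
    ],
    "fold and share": [
        "X = R * R",
        "Area = 12.56636 * X",
        "Volume = 4.18879 * X * R",
    ],
}

def cost(lines):
    """Count assignments, multiplications, divisions and exponentiations."""
    text = "\n".join(lines)
    counts = {"A": text.count("="), "M": text.count("*"),
              "D": text.count("/"), "E": text.count("^")}
    # Format as in the table, omitting operations that do not occur.
    return "+".join(f"{n}{kind}" for kind, n in counts.items() if n)

costs = {name: cost(lines) for name, lines in VARIANTS.items()}
for name, summary in costs.items():
    print(f"{name}: {summary}")
```

Counting operations ignores the fact that the variants are not all numerically identical: replacing 4/3 by 1.33 changes the computed volume slightly.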
Code Generation
• Usually a two-step process
– Generate intermediate code from the semantic
representation of the program
– Generate machine code from the intermediate code
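The first of the two steps can be sketched as follows, assuming three-address code as the intermediate form (one common choice; the slides do not name a specific one). The tuple-encoded expression tree and the functions `generate_tac` and `compile_assignment` are hypothetical names used only for this illustration.

```python
import itertools

def generate_tac(expr, code, temps):
    """Emit three-address instructions for expr; return the name holding its value."""
    if not isinstance(expr, tuple):
        return str(expr)                     # a variable name or a constant
    op, left, right = expr
    a = generate_tac(left, code, temps)
    b = generate_tac(right, code, temps)
    result = f"t{next(temps)}"               # fresh temporary for this node
    code.append(f"{result} = {a} {op} {b}")
    return result

def compile_assignment(target, expr):
    code = []
    temps = itertools.count(1)
    value = generate_tac(expr, code, temps)
    code.append(f"{target} = {value}")
    return code

# x = a + b * c
intermediate = compile_assignment("x", ("+", "a", ("*", "b", "c")))
for line in intermediate:
    print(line)
```

Each instruction has at most one operator, so the second step (machine code generation) can translate the list one instruction at a time.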
Intermediate Code Generation …
• Map identifiers to locations (memory/storage
allocation)
• Lay out parameter passing protocols:
locations for parameters and return
values, layout of activation frames,
etc.
Post translation Optimizations
• Algebraic transformations and re-ordering
– Remove/simplify operations like
• Multiplication by 1
• Multiplication by 0
• Addition with 0
• Instruction selection
– Addressing mode selection
– Opcode selection
– Peephole optimization
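The algebraic simplifications listed above (multiplication by 1, multiplication by 0, addition with 0) can be sketched as a bottom-up rewrite of an expression tree. The tuple encoding and the function name `simplify` are assumptions made for this example, and the rules are stated for integer arithmetic on operands without side effects.

```python
def simplify(expr):
    """Apply x*1 -> x, x*0 -> 0 and x+0 -> x, starting from the leaves."""
    if not isinstance(expr, tuple):
        return expr                          # a variable name or a constant
    op, left, right = expr
    left, right = simplify(left), simplify(right)
    if op == "*":
        if left == 0 or right == 0:
            return 0                         # multiplication by 0
        if left == 1:
            return right                     # multiplication by 1
        if right == 1:
            return left
    if op == "+":
        if left == 0:
            return right                     # addition with 0
        if right == 0:
            return left
    return (op, left, right)

simplified = simplify(("+", ("*", "x", 1), 0))   # (x * 1) + 0
print(simplified)
```

Simplifying the children first lets one rewrite expose another: `(y + 0) * 0` first becomes `y * 0` and then `0`.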
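[Figure: type-annotated syntax tree. The root "if" has children "==" (boolean), "=" (int) and ";". The operands b and 0 of "==" and the operands a and b of "=" are all int.]
Generated code:
CMP   Cx, 0
CMOVZ Dx, Cx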
Compiler structure
[Figure: structure of the compiler]
Source Program
  -> Lexical Analysis    -> Token stream
  -> Syntax Analysis     -> Abstract Syntax tree
  -> Semantic Analysis   -> Unambiguous Program representation
  -> Optimizer           -> Optimized code
  -> IL code generator   -> IL code
  -> Code generator      -> Target Program
Symbol Table
• Information required about the program variables
during compilation
– Class of variable: keyword, identifier etc.
– Type of variable: integer, float, array, function etc.
– Amount of storage required
– Address in the memory
– Scope information
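The attributes listed above map naturally onto one record per name. The sketch below is one possible layout, not the structure used by any particular compiler: the class names `Symbol` and `SymbolTable`, the byte sizes, and the use of nesting depth to represent scope are all assumptions. Keying entries by (name, depth) is a simplification that cannot distinguish sibling blocks at the same depth.

```python
from dataclasses import dataclass

@dataclass
class Symbol:
    name: str
    category: str    # class of the name: keyword, identifier, ...
    type: str        # integer, float, array, function, ...
    size: int        # amount of storage required, in bytes
    address: int     # offset assigned by the compiler
    scope: int       # nesting depth of the declaring block

class SymbolTable:
    def __init__(self):
        self.entries = {}
        self.next_address = 0

    def declare(self, name, type_, size, scope=0, category="identifier"):
        key = (name, scope)
        if key in self.entries:
            raise NameError(f"{name} is already declared in this scope")
        symbol = Symbol(name, category, type_, size, self.next_address, scope)
        self.next_address += size            # simple sequential storage allocation
        self.entries[key] = symbol
        return symbol

    def lookup(self, name, scope):
        # Search the current depth first, then the enclosing depths.
        for depth in range(scope, -1, -1):
            if (name, depth) in self.entries:
                return self.entries[(name, depth)]
        raise NameError(f"{name} is not declared")

table = SymbolTable()
a = table.declare("a", "integer", 4)
b = table.declare("b", "float", 8)
print(table.lookup("b", scope=2))
```

Because every phase consults and extends these records, the table is usually drawn as a component shared by the whole compiler.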
Final Compiler structure
[Figure: final structure of the compiler, with a Symbol Table shared by all phases]
Source Program
  -> Lexical Analysis    -> Token stream
  -> Syntax Analysis     -> Abstract Syntax tree
  -> Semantic Analysis   -> Unambiguous Program representation
  -> Optimizer           -> Optimized code
  -> IL code generator   -> IL code
  -> Code generator      -> Target Program
Advantages of the model …
• Compiler is retargetable
Issues in Compiler Design
• Compilation appears to be very simple, but there
are many pitfalls
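[Figure: M front ends (F1, F2, F3, ..., FM) and N back ends (B1, B2, B3, ..., BN). Left: every front end is connected directly to every back end, which requires M*N compilers. Right: front ends and back ends are connected through a Universal IL, which requires only M front ends and N back ends.]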
Universal Intermediate Language
• Universal Computer/Compiler Oriented
Language (UNCOL)
– There was a vast demand for different compilers, as
potentially one would require a separate
compiler for each combination of source
language and target architecture. To
counteract the anticipated combinatorial
explosion, the idea of a linguistic switchbox
materialized in 1958
– UNCOL (UNiversal Computer Oriented Language) is an
intermediate language, which was proposed in
1958 to reduce the developmental effort of
compiling many different languages to different
architectures
– UNCOL was the first intermediate language
proposed for use in compilers to reduce
the development effort of compiling many
different languages to many different
architectures; a first version of UNCOL
was described in 1961
How do we know compilers generate correct code?
• Prove that the compiler is correct.
• Regression testing
– Maintain a suite of test programs
– Expected behaviour of each program is
documented
– All the test programs are compiled using the
compiler and deviations are reported to the
compiler writer
• GENERATE compilers
[Figure: a Compiler Generator takes a Source Language Specification and a Target Machine Specification and produces a Compiler.]
Specifications and Compiler Generator
• How to write specifications of the source
language and the target machine?
– Language is broken into subcomponents like
lexemes, structure, semantics etc.
– Each component can be specified separately.
For example, an identifier may be specified as
• A string of characters that has at least one letter
• A string that starts with a letter followed by alphanumeric characters
• letter(letter|digit)*
– Similarly syntax and semantics can be
described
• Can target machine be described using
specifications?
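The three identifier specifications above become progressively more precise, and the last one is directly executable as a regular expression. The sketch below contrasts it with the first, looser wording; treating "letter" as ASCII a-z/A-Z, "digit" as 0-9, and "string of characters" as an alphanumeric string are assumptions made for this example.

```python
import re

# letter(letter|digit)* with ASCII letters and digits.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")

# "A string of characters that has at least one letter", read here as
# an alphanumeric string containing a letter anywhere.
LOOSE = re.compile(r"[A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*")

def is_identifier(text):
    return IDENTIFIER.fullmatch(text) is not None

def is_loose_identifier(text):
    return LOOSE.fullmatch(text) is not None

for candidate in ["count", "x1", "1x", ""]:
    print(repr(candidate), is_identifier(candidate), is_loose_identifier(candidate))
```

The string `1x` satisfies the loose wording but not `letter(letter|digit)*`, which is why a formal notation is preferred as input to a compiler generator.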
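Tool-based Compiler Development
[Figure: tool-based compiler development]
Source Program -> Lexical Analyzer -> Parser -> Semantic Analyzer -> Optimizer -> IL code generator -> Code generator -> Target Program
Each phase is produced by a generator from a specification:
  Lexeme specs           -> Lexical Analyzer Generator -> Lexical Analyzer
  Parser specs           -> Parser Generator           -> Parser
  Phase Specifications   -> Other phase Generators     -> Semantic Analyzer, Optimizer, IL code generator
  Machine specifications -> Code generator generator   -> Code generator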
How to Retarget Compilers?
• Changing specifications of a phase can lead to a
new compiler
– If machine specifications are changed then compiler can
generate code for a different machine without changing
any other phase
– If front end specifications are changed then we can get
a compiler for a new language
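[Figure: T-diagrams. A compiler from L to N written in S, compiled by a compiler from S to M that runs on M, yields a compiler from L to N that runs on M. For example, the EQN to TROFF translator written in C, compiled by the C compiler on the PDP11, yields an EQN to TROFF translator that runs on the PDP11.]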
Bootstrapping a Compiler
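• Suppose L_L N (a compiler from L to N, written in L) is to be
developed on a machine M where L_M M (a compiler from L to M,
running on M) is available
[Figure: T-diagrams. Compiling L_L N with L_M M yields L_M N, a compiler from L to N that runs on M. Compiling L_L N again with L_M N yields L_N N, a compiler from L to N that runs on N.]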
Bootstrapping a Compiler: the Complete picture
[Figure: the two steps combined in one T-diagram. L_L N compiled by L_M M gives L_M N; L_L N compiled by L_M N gives L_N N.]
Compilers of the 21st Century
• The overall structure of almost all compilers is
similar to the structure we have discussed