Lecture 2
Compiler as a Translator
Goals of translation
• Good compile-time performance
• Good performance of the
generated code
• Correctness
– A very important issue.
– Can compilers be proven to be
correct?
• Tedious even for toy compilers!
Undecidable in general.
– However, correctness has
implications for the development cost
How to translate?
• Direct translation is difficult. Why?
The first few steps
• The next step to understand the sentence
is recognizing words
– How to recognize English words?
– Words found in standard dictionaries
– Dictionaries are updated regularly
The first few steps
• How to recognize words in a
programming language?
– a dictionary (of keywords
etc.)
– rules for constructing words (identifiers,
numbers etc.)
• This is called lexical analysis
• Recognizing words is not completely
trivial. For example:
what is this sentence?
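The two ingredients above (a dictionary of keywords plus rules for constructing words) can be sketched as a small lexer. This is a minimal illustration, not a production scanner: the keyword set, the token class names and the function name tokenize are all assumptions made for a toy language.

```python
import re

# Hypothetical keyword "dictionary" for a toy language (an assumption).
KEYWORDS = {"if", "then", "else", "while"}

# Rules for constructing words: each named group is one token class.
TOKEN_RULES = re.compile(r"""
    (?P<NUMBER>\d+)
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<OP>==|[=+\-*/])
  | (?P<SKIP>\s+)
""", re.VERBOSE)


def tokenize(text):
    """Return a list of (class, lexeme) pairs; raise on an unknown character."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RULES.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue  # whitespace only separates words
        if kind == "NAME" and lexeme in KEYWORDS:
            kind = "KEYWORD"  # dictionary lookup decides keyword vs identifier
        tokens.append((kind, lexeme))
    return tokens
```

Calling tokenize("if x1 then 42") classifies if and then as keywords, x1 as a name and 42 as a number.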
Lexical Analysis: Challenges
• We must know what the word
separators are
Lexical Analysis: Challenges
• In programming languages, a character
from a different class may also be
treated as a word separator.
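A rough sketch of this idea: a word ends as soon as the next character belongs to a different class, so x==y splits into three words even though it contains no blanks. The three character classes and the names char_class and split_words are assumptions for illustration. Real lexers refine this with longest-match rules for operators.

```python
def char_class(ch):
    """Classify a character; the three classes are an assumption for this sketch."""
    if ch.isalnum() or ch == "_":
        return "word"
    if ch.isspace():
        return "space"
    return "symbol"


def split_words(text):
    """Split text wherever the character class changes; drop whitespace runs."""
    words = []
    current = ""
    current_class = None
    for ch in text:
        cls = char_class(ch)
        if cls != current_class and current:
            # A change of class acts as a word separator.
            if current_class != "space":
                words.append(current)
            current = ""
        current += ch
        current_class = cls
    if current and current_class != "space":
        words.append(current)
    return words
```

Both split_words("x==y") and split_words("x == y") yield the same three words: x, == and y.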
The next step
• Once the words are understood, the next
step is to understand the structure of
the sentence
Parsing
• Parsing a program is exactly the same
process as shown in the previous slide.
• Consider the statement
if x == y then z = 1 else z = 2
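As an illustration, the statement above can be parsed by a small recursive-descent parser. The grammar is an assumption chosen to fit this one example (a statement is either an if-then-else or an assignment, and a condition is name == name). The input is split on blanks to keep the sketch short, and error handling is minimal.

```python
def expect(tokens, pos, word):
    """Check that tokens[pos] is the given word and step past it."""
    if pos >= len(tokens) or tokens[pos] != word:
        raise SyntaxError(f"expected {word!r} at token {pos}")
    return pos + 1


def parse_condition(tokens, pos):
    """condition -> name '==' name"""
    left = tokens[pos]
    pos = expect(tokens, pos + 1, "==")
    right = tokens[pos]
    return ("==", left, right), pos + 1


def parse_statement(tokens, pos=0):
    """statement -> 'if' condition 'then' statement 'else' statement
                  | name '=' value"""
    if tokens[pos] == "if":
        cond, pos = parse_condition(tokens, pos + 1)
        pos = expect(tokens, pos, "then")
        then_branch, pos = parse_statement(tokens, pos)
        pos = expect(tokens, pos, "else")
        else_branch, pos = parse_statement(tokens, pos)
        return ("if", cond, then_branch, else_branch), pos
    name = tokens[pos]
    pos = expect(tokens, pos + 1, "=")
    value = tokens[pos]
    return ("assign", name, value), pos + 1


def parse(text):
    """Parse a whole statement and return its tree as nested tuples."""
    tokens = text.split()
    tree, pos = parse_statement(tokens)
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after the statement")
    return tree
```

The result for the example is a tree with the condition and the two assignments as children of the if node, which mirrors the sentence diagram for English.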
Understanding the meaning
• Once the sentence structure is
understood we try to understand the
meaning of the sentence (semantic
analysis)
• A challenging task
• Example:
Qasim said Hashir left his assignment at
home
• What does his refer to? Qasim or Hashir?
Understanding the meaning
• A worse case
Qasim said Qasim left his
assignment at home
• Even worse
Qasim said Qasim left Qasim’s
assignment at home
• How many Qasims are there?
Which one left the assignment?
Whose assignment got left?
Semantic Analysis
• Too hard for compilers. They do not have
capabilities similar to human understanding
• However, compilers do perform analysis to
understand the meaning and catch
inconsistencies
• Programming languages define strict rules to
avoid such ambiguities
{ int Qasim = 3;
  { int Qasim = 4;
    cout << Qasim;   // prints 4: the inner Qasim hides the outer one
  }
}
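The binding rule behind this example (use the declaration in the innermost enclosing block) can be sketched with a chain of scopes. The class name Scope and its methods are assumptions for illustration, not a description of how any particular compiler stores bindings.

```python
class Scope:
    """One block's declarations plus a link to the enclosing block."""

    def __init__(self, parent=None):
        self.parent = parent
        self.names = {}

    def declare(self, name, value):
        if name in self.names:
            raise NameError(f"{name!r} is already declared in this block")
        self.names[name] = value

    def lookup(self, name):
        """Search this block first, then each enclosing block in turn."""
        scope = self
        while scope is not None:
            if name in scope.names:
                return scope.names[name]
            scope = scope.parent
        raise NameError(f"{name!r} is not declared")


# Mirror the two nested blocks from the example.
outer = Scope()
outer.declare("Qasim", 3)
inner = Scope(parent=outer)
inner.declare("Qasim", 4)
```

Here inner.lookup("Qasim") finds the inner declaration, while outer.lookup("Qasim") still sees the outer one, so there is no ambiguity about which Qasim is meant.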
More on Semantic Analysis
• Compilers perform many other checks
besides variable bindings
• Type checking
Qasim left her work at home
• There is a type mismatch between her
and Qasim. Presumably Qasim is
male, so her and Qasim cannot be the
same person.
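A compiler's analogue of this check might look like the sketch below. The expression encoding (tagged tuples), the type names and the rule that both operands of an addition must have the same type are assumptions made for a toy language.

```python
def type_of(expr, env):
    """Return the type name of expr, or raise TypeError on a mismatch.

    expr is a tagged tuple: ("num", 3), ("str", "hi"), ("var", "x")
    or ("add", left, right). env maps variable names to declared types.
    """
    kind = expr[0]
    if kind == "num":
        return "int"
    if kind == "str":
        return "string"
    if kind == "var":
        if expr[1] not in env:
            raise NameError(f"{expr[1]!r} is not declared")
        return env[expr[1]]
    if kind == "add":
        left = type_of(expr[1], env)
        right = type_of(expr[2], env)
        if left != right:
            raise TypeError(f"cannot add {left} and {right}")
        return left
    raise ValueError(f"unknown expression kind {kind!r}")
```

With x declared as int, the expression x + 1 checks as int, while x + "home" is rejected with a type error, just as her is rejected for Qasim.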
Compiler structure once again
Code Optimization
• No strong counterpart in
English, but it is similar to
editing or précis writing
Code Optimization
• Some common optimizations
–Common sub-expression elimination
–Copy propagation
–Dead code elimination
–Code motion
–Strength reduction
–Constant folding
• Example: x = 15 * 3 is transformed
to x = 45
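Constant folding can be sketched with Python's standard ast module, using Python source as a stand-in for the language being compiled. The names ConstantFolder and fold are assumptions, and only +, - and * on numeric literals are folded in this sketch.

```python
import ast
import operator

# Operators this sketch is willing to evaluate at compile time.
FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def is_number(node):
    """True for an int or float literal (booleans are deliberately excluded)."""
    return isinstance(node, ast.Constant) and type(node.value) in (int, float)


class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first (bottom-up)
        operation = FOLDABLE.get(type(node.op))
        if operation and is_number(node.left) and is_number(node.right):
            value = operation(node.left.value, node.right.value)
            return ast.copy_location(ast.Constant(value=value, kind=None), node)
        return node


def fold(source):
    """Return the source text with constant sub-expressions evaluated."""
    tree = ConstantFolder().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

For the slide's example, fold("x = 15 * 3") returns the text x = 45. Sub-expressions that involve variables, such as a in y = a + 2 * 4, are left alone and only the constant part is folded.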
Example of Optimizations
Legend: A = assignment, M = multiplication, D = division, E = exponent
Code Generation
• Usually a two-step process
– Generate intermediate code from the
semantic representation of the program
– Generate machine code from the
intermediate code
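The first of these two steps can be sketched as a walk over an expression tree that emits three-address code, one common style of intermediate code. The tree encoding (nested tuples), the temporary names t1, t2, ... and the function names are assumptions for illustration.

```python
import itertools


def to_three_address(expr, code, temps):
    """Emit code for expr into the list `code`; return the name holding its value."""
    if isinstance(expr, str):
        return expr  # a leaf: a variable name or a literal
    op, left, right = expr
    left_name = to_three_address(left, code, temps)
    right_name = to_three_address(right, code, temps)
    temp = f"t{next(temps)}"  # fresh temporary for this result
    code.append(f"{temp} = {left_name} {op} {right_name}")
    return temp


def generate_ir(target, expr):
    """Return three-address instructions for the assignment target = expr."""
    code = []
    result = to_three_address(expr, code, itertools.count(1))
    code.append(f"{target} = {result}")
    return code
```

For x = a + b * c, written as generate_ir("x", ("+", "a", ("*", "b", "c"))), the sketch emits t1 = b * c, then t2 = a + t1, then x = t2. Each instruction has at most one operator, which makes the second step (mapping to machine instructions) much simpler.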
Code Generation
• Abstractions at the source level
identifiers, operators, expressions, statements,
conditionals, iteration, functions (user defined,
system defined or libraries)
• Abstractions at the target level
memory locations, registers, stack, opcodes,
addressing modes, system libraries, interface to
the operating system
Code Generation
• Map identifiers to locations
(memory/storage allocation)
• Explicate variable accesses (change
identifier references to
relocatable/absolute addresses)
• Map source operators to opcodes
or a sequence of opcodes
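These three mappings can be sketched for a made-up target machine. Everything about the target is an assumption: the opcode names, the registers R1 and R2, the frame pointer FP and the 4-byte word size do not describe a real instruction set.

```python
# Hypothetical mapping from source operators to target opcodes.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}


def allocate(variables, word_size=4):
    """Map each identifier to a frame offset, in declaration order."""
    return {name: index * word_size for index, name in enumerate(variables)}


def emit(instruction, offsets):
    """Translate 'dest = left op right' into load/operate/store pseudo-assembly."""
    dest, _, left, op, right = instruction.split()
    return [
        # Identifier references become explicit addresses relative to FP.
        f"LOAD  R1, [FP+{offsets[left]}]",
        f"LOAD  R2, [FP+{offsets[right]}]",
        # One source operator becomes one opcode here.
        f"{OPCODES[op]:<5} R1, R1, R2",
        f"STORE R1, [FP+{offsets[dest]}]",
    ]
```

With offsets = allocate(["a", "b", "c"]), the variables land at offsets 0, 4 and 8, and emit("c = a + b", offsets) produces two loads, an ADD and a store to [FP+8]. On a machine without a matching instruction, a single source operator would instead expand to a sequence of opcodes.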
Code Generation
Post translation Optimizations
• Instruction selection
– Addressing mode selection
– Opcode selection
– Peephole optimization
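A peephole optimizer slides a small window over the generated instructions and rewrites wasteful patterns. The sketch below handles a single pattern, a LOAD that immediately follows a STORE of the same register to the same address. The tuple encoding of instructions is an assumption, and the sketch also assumes that no jump targets the removed LOAD.

```python
def peephole(instructions):
    """Drop a LOAD that directly follows a STORE of the same register and address."""
    result = []
    for instr in instructions:
        if result:
            prev = result[-1]
            redundant = (
                prev[0] == "STORE"
                and instr[0] == "LOAD"
                and prev[1:] == instr[1:]  # same register, same address
            )
            if redundant:
                continue  # the register already holds this value
        result.append(instr)
    return result
```

Given STORE R1, x followed by LOAD R1, x and then ADD R1, R2, the function keeps the STORE and the ADD and drops the LOAD.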
Compiler structure
Something is missing
• Information required about the program variables during
compilation
– Class of variable: keyword, identifier etc.
– Type of variable: integer, float, array, function etc.
– Amount of storage required
– Address in the memory
– Scope information
• Location to store this information
– Attributes with the variable (has obvious problems)
– At a central repository and every phase refers to the repository
whenever information is required
• Normally the second approach is preferred
– Use a data structure called a symbol table
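A minimal sketch of such a central repository follows. The field names track the attributes listed above, but the class names, the use of an integer scope level and the key layout are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Symbol:
    name: str
    kind: str                       # class of the name: "keyword", "identifier", ...
    type: Optional[str] = None      # filled in by semantic analysis
    size: Optional[int] = None      # storage required, in bytes
    address: Optional[int] = None   # filled in during code generation
    scope: int = 0                  # block nesting level (a simplification)


class SymbolTable:
    """Central repository that every phase consults and updates."""

    def __init__(self):
        self._entries = {}

    def insert(self, name, kind, scope=0):
        key = (scope, name)
        if key in self._entries:
            raise KeyError(f"{name!r} is already declared at scope {scope}")
        self._entries[key] = Symbol(name, kind, scope=scope)
        return self._entries[key]

    def lookup(self, name, scope=0):
        return self._entries[(scope, name)]
```

A typical flow: the lexical analyzer calls insert("x", "identifier"), semantic analysis later sets type and size on the entry returned by lookup("x"), and the code generator sets address. No phase needs to pass these attributes along with the variable itself.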
Final Compiler structure
Advantages of the model
• Also known as the Analysis-Synthesis model of
compilation
– Front end phases are known as analysis
phases
– Back end phases are known as synthesis
phases