Lecture 2

The document discusses the process of compilation, describing how compilers translate programs from high-level source code to low-level machine code. It explains the major phases of a compiler including lexical analysis, parsing, semantic analysis, code optimization, code generation, and the use of symbol tables and intermediate representations.

What are Compilers?

• A compiler translates a program from one representation to another
• Typically from high level source code to low level machine code or object code
• Source code is normally optimized for human readability
– Expressive: matches our notion of languages (and application?!)
– Redundant to help avoid programming errors
• Machine code is optimized for hardware
– Redundancy is reduced
– Information about the intent is lost
Compiler as a Translator

[Figure: high level program → Compiler → low level code]
Goals of translation
• Good compile time performance
• Good performance for the generated code
• Correctness
– A very important issue
– Can compilers be proven to be correct?
• Tedious even for toy compilers! Undecidable in general.
– However, correctness has implications for the development cost
How to translate?
• Direct translation is difficult. Why?
• Source code and machine code mismatch in level of abstraction
– Variables vs memory locations/registers
– Functions vs jump/return
– Parameter passing
– structs
• Some languages are farther from machine code than others
– For example, languages supporting the object-oriented paradigm
How to translate easily?
• Translate in steps. Each step handles a reasonably simple, logical, and well defined task
• Design a series of program representations
• Intermediate representations should be amenable to program manipulation of various kinds (type checking, optimization, code generation etc.)
• Representations become more machine specific and less language specific as the translation proceeds
The first few steps
• The first few steps can be understood by analogy with how humans comprehend a natural language
• The first step is recognizing the alphabet of a language. For example
– English text consists of lower and upper case letters, digits, punctuation and white space
– Written programs consist of characters from the ASCII character set (normally 9-13, 32-126)
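As a rough illustration of the alphabet step, the Python sketch below flags characters that fall outside the ranges quoted above (ASCII 9-13 and 32-126). The function name `illegal_characters` is made up for this example.

```python
# Allowed source alphabet from the slide: ASCII codes 9-13
# (tab, newline and similar) and the printable characters 32-126.
ALLOWED = set(range(9, 14)) | set(range(32, 127))

def illegal_characters(source):
    """Return (position, character) pairs for characters outside the alphabet."""
    return [(i, ch) for i, ch in enumerate(source) if ord(ch) not in ALLOWED]

print(illegal_characters("a = 1;\n"))       # nothing is flagged
print(illegal_characters("a = \x00 1;"))    # the NUL character at index 4 is flagged
```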
The first few steps
• The next step in understanding a sentence is recognizing words
– How to recognize English words?
– Words found in standard dictionaries
– Dictionaries are updated regularly
The first few steps
• How to recognize words in a programming language?
– a dictionary (of keywords etc.)
– rules for constructing words (identifiers, numbers etc.)
• This is called lexical analysis
• Recognizing words is not completely trivial. For example:
what is this sentence?
Lexical Analysis: Challenges
• We must know what the word separators are
• The language must define rules for breaking a sentence into a sequence of words
• Normally white space and punctuation are word separators in languages
Lexical Analysis: Challenges
• In programming languages a character from a different class may also be treated as a word separator
• The lexical analyzer breaks a sentence into a sequence of words or tokens:
– if a == b then a = 1 ; else a = 2 ;
– Sequence of words (total 14 words)
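A minimal sketch of such a lexical analyzer in Python is shown here. The keyword set, the token class names and the regular expression are assumptions made for this toy language, not part of the lecture. Note that `a==b` is still split into three tokens even without white space, because `=` belongs to a different character class than letters.

```python
import re

# Hypothetical dictionary of keywords and token rules for the toy language.
KEYWORDS = {"if", "then", "else"}
TOKEN_RE = re.compile(
    r"(?P<space>\s+)|(?P<number>\d+)|(?P<word>[A-Za-z_]\w*)|(?P<operator>==|=|;)"
)

def tokenize(text):
    """Break text into (kind, lexeme) pairs; white space only separates tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "space":
            continue
        if kind == "word":
            # The dictionary decides between keyword and identifier.
            kind = "keyword" if lexeme in KEYWORDS else "identifier"
        tokens.append((kind, lexeme))
    return tokens

tokens = tokenize("if a == b then a = 1 ; else a = 2 ;")
print(len(tokens))        # 14
print(tokenize("a==b"))   # three tokens, no white space needed
```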
The next step
• Once the words are understood, the next step is to understand the structure of the sentence
• The process is known as syntax checking or parsing
Parsing
• Parsing a program is exactly the same process as shown in the previous slide
• Consider the statement
if x == y then z = 1 else z = 2
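To make the idea concrete, here is a small recursive-descent parser in Python for exactly this kind of statement. The grammar it accepts (an `if ... then ... else ...` statement or a simple assignment) and the tuple shape of the resulting tree are assumptions for illustration. Tokens are obtained by splitting on white space to keep the sketch short.

```python
def take(tokens):
    """Remove and return the next token."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    return tokens.pop(0)

def expect(tokens, word):
    """Consume the next token and insist that it is the given word."""
    found = take(tokens)
    if found != word:
        raise SyntaxError(f"expected {word!r}, found {found!r}")

def parse_condition(tokens):
    left = take(tokens)
    expect(tokens, "==")
    return ("==", left, take(tokens))

def parse_statement(tokens):
    """Parse one statement from the front of the token list and return a tree."""
    if tokens and tokens[0] == "if":
        take(tokens)
        condition = parse_condition(tokens)
        expect(tokens, "then")
        then_branch = parse_statement(tokens)
        expect(tokens, "else")
        else_branch = parse_statement(tokens)
        return ("if", condition, then_branch, else_branch)
    target = take(tokens)
    expect(tokens, "=")
    return ("assign", target, take(tokens))

tree = parse_statement("if x == y then z = 1 else z = 2".split())
print(tree)
# ('if', ('==', 'x', 'y'), ('assign', 'z', '1'), ('assign', 'z', '2'))
```

A sentence with the wrong structure, such as one missing `then`, is rejected with a `SyntaxError`.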
Understanding the meaning
• Once the sentence structure is understood we try to understand the meaning of the sentence (semantic analysis)
• A challenging task
• Example:
Qasim said Hashir left his assignment at home
• What does "his" refer to? Qasim or Hashir?
Understanding the meaning
• Worse case
Qasim said Qasim left his assignment at home
• Even worse
Qasim said Qasim left Qasim’s assignment at home
• How many Qasims are there? Which one left the assignment? Whose assignment got left?
Semantic Analysis
• Full understanding of meaning is too hard for compilers: they do not have capabilities similar to human understanding
• However, compilers do perform analysis to understand the meaning and catch inconsistencies
• Programming languages define strict rules to avoid such ambiguities
{ int Qasim = 3;           // outer declaration
    { int Qasim = 4;       // inner declaration hides the outer one
        cout << Qasim;     // prints 4
    }
}
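One common way to implement such a scoping rule is to keep a stack of scopes and search it from the innermost block outwards. The Python sketch below follows that idea; the class `ScopeStack` and its method names are hypothetical and only mirror the nested-block example above.

```python
class ScopeStack:
    """A stack of dictionaries, one per open block; the innermost is last."""

    def __init__(self):
        self.scopes = [{}]

    def enter(self):
        self.scopes.append({})

    def leave(self):
        self.scopes.pop()

    def declare(self, name, value):
        if name in self.scopes[-1]:
            raise NameError(f"{name} redeclared in the same block")
        self.scopes[-1][name] = value

    def lookup(self, name):
        # The innermost declaration wins, which resolves the ambiguity.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"{name} is not declared")

scopes = ScopeStack()
scopes.enter()
scopes.declare("Qasim", 3)        # outer block
scopes.enter()
scopes.declare("Qasim", 4)        # inner block hides the outer declaration
inner = scopes.lookup("Qasim")
scopes.leave()
outer = scopes.lookup("Qasim")
print(inner, outer)               # 4 3
```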
More on Semantic Analysis
• Compilers perform many other checks besides variable bindings
• Type checking
Qasim left her work at home
• There is a type mismatch between "her" and Qasim: presumably Qasim is male, so the two cannot refer to the same person
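The programming-language counterpart of this check can be sketched as follows. The expression encoding (integers and floats as literals, strings as variable names, tuples as binary operations) and the rule that both operands must have the same type are simplifying assumptions for this example; real languages usually also define conversions.

```python
def type_of(expr, env):
    """Infer the type of expr; env maps variable names to declared types."""
    if isinstance(expr, int):
        return "int"
    if isinstance(expr, float):
        return "float"
    if isinstance(expr, str):                  # a variable reference
        if expr not in env:
            raise NameError(f"{expr} is not declared")
        return env[expr]
    op, left, right = expr                     # a binary operation
    left_type, right_type = type_of(left, env), type_of(right, env)
    if left_type != right_type:
        raise TypeError(f"type mismatch: {left_type} {op} {right_type}")
    return left_type

env = {"count": "int", "name": "string"}
print(type_of(("+", "count", 1), env))         # int
try:
    type_of(("+", "count", "name"), env)
except TypeError as error:
    print(error)                               # type mismatch: int + string
```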
Compiler structure once again

Code Optimization
• No strong counterpart in English, but it is similar to editing/précis writing
• Automatically modify programs so that they
– Run faster
– Use fewer resources (memory, registers, space, fewer fetches etc.)
Code Optimization
• Some common optimizations
– Common sub-expression elimination
– Copy propagation
– Dead code elimination
– Code motion
– Strength reduction
– Constant folding
• Example: x = 15 * 3 is transformed to x = 45
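Constant folding can be tried out with a few lines of Python. In this sketch Python's own `ast` module stands in for the compiler's syntax tree, and only `+`, `-` and `*` on numeric constants are folded; both are choices made for brevity. `ast.unparse` needs Python 3.9 or later.

```python
import ast
import operator

# Operators this sketch knows how to fold.
FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def is_number(node):
    return isinstance(node, ast.Constant) and isinstance(node.value, (int, float))

class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)               # fold the operands first
        fold = FOLDABLE.get(type(node.op))
        if fold and is_number(node.left) and is_number(node.right):
            value = fold(node.left.value, node.right.value)
            return ast.copy_location(ast.Constant(value=value), node)
        return node

def fold_constants(source):
    """Return the source with constant sub-expressions evaluated."""
    tree = ConstantFolder().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))

print(fold_constants("x = 15 * 3"))       # x = 45
print(fold_constants("y = a + 2 * 8"))    # y = a + 16
```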
Example of Optimizations
[Figure: optimization example. Legend: A = assignment, M = multiplication, D = division, E = exponent]
Code Generation
• Usually a two step process
– Generate intermediate code from the semantic representation of the program
– Generate machine code from the intermediate code
• The advantage is that each phase is simple
• Requires the design of an intermediate language
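The first of the two steps can be sketched with three-address code, one widely used form of intermediate language (the lecture does not name a specific one, so this is an assumed choice). Expressions are nested tuples, and the temporaries `t1`, `t2`, ... are invented names.

```python
from itertools import count

def to_three_address(expr, code, temps):
    """Append instructions for expr to code; return the name holding its value."""
    if not isinstance(expr, tuple):            # a variable name or a constant
        return str(expr)
    op, left, right = expr
    left_name = to_three_address(left, code, temps)
    right_name = to_three_address(right, code, temps)
    result = f"t{next(temps)}"                 # fresh temporary for this operation
    code.append(f"{result} = {left_name} {op} {right_name}")
    return result

def lower_assignment(target, expr):
    """Translate target = expr into a list of three-address instructions."""
    code = []
    value = to_three_address(expr, code, count(1))
    code.append(f"{target} = {value}")
    return code

for line in lower_assignment("x", ("+", "a", ("*", "b", 4))):
    print(line)
# t1 = b * 4
# t2 = a + t1
# x = t2
```

Each instruction has at most one operator, which is what keeps the second step (machine code from intermediate code) simple.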
Code Generation
• Most compilers perform translation between successive intermediate representations
• Intermediate languages are generally ordered in decreasing level of abstraction from highest (source) to lowest (machine)
Code Generation
• Abstractions at the source level: identifiers, operators, expressions, statements, conditionals, iteration, functions (user defined, system defined or libraries)
• Abstractions at the target level: memory locations, registers, stack, opcodes, addressing modes, system libraries, interface to the operating system
• Code generation is a mapping from source level abstractions to target machine abstractions
Code Generation
• Map identifiers to locations (memory/storage allocation)
• Explicate variable accesses (change identifier reference to relocatable/absolute address)
• Map source operators to opcodes or a sequence of opcodes
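These three mappings can be illustrated on a made-up target. In the Python sketch below the instruction names (`LOAD`, `ADD`, `STORE`), the single register `R1`, the `[fp+offset]` addressing syntax and the 4-byte word size are all assumptions, not a real instruction set.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}    # source operator -> opcode

def allocate(variables, word_size=4):
    """Map each identifier to an offset from a hypothetical frame pointer."""
    return {name: index * word_size for index, name in enumerate(variables)}

def operand(value, locations):
    """Render a variable as a memory operand and a constant as an immediate."""
    if isinstance(value, int):
        return f"#{value}"
    return f"[fp+{locations[value]}]"

def generate(target, left, op, right, locations):
    """Emit instructions for target = left op right using one register."""
    return [
        f"LOAD R1, {operand(left, locations)}",
        f"{OPCODES[op]} R1, {operand(right, locations)}",
        f"STORE R1, {operand(target, locations)}",
    ]

locations = allocate(["a", "b", "x"])
for instruction in generate("x", "a", "+", "b", locations):
    print(instruction)
# LOAD R1, [fp+0]
# ADD R1, [fp+4]
# STORE R1, [fp+8]
```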
Code Generation

• Convert conditionals and iterations to test/jump or compare instructions
• Lay out parameter passing protocols: locations for parameters, return values
• Interface calls to library, runtime system, operating systems
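As a sketch of the first point, the Python function below lowers an `if left == right then ... else ...` statement into compare and jump instructions. The mnemonics `CMP`, `JNE`, `JMP`, `MOV` and the label names are hypothetical, chosen only to resemble common assembly languages.

```python
from itertools import count

def lower_if(condition, then_code, else_code, labels):
    """Lower an equality-tested if/else into compare and jump instructions."""
    left, right = condition
    else_label = f"L{next(labels)}"
    end_label = f"L{next(labels)}"
    return [
        f"CMP {left}, {right}",
        f"JNE {else_label}",      # condition false: jump to the else-part
        *then_code,
        f"JMP {end_label}",       # then-part done: skip the else-part
        f"{else_label}:",
        *else_code,
        f"{end_label}:",
    ]

code = lower_if(("x", "y"), ["MOV z, 1"], ["MOV z, 2"], count(1))
print("\n".join(code))
# CMP x, y
# JNE L1
# MOV z, 1
# JMP L2
# L1:
# MOV z, 2
# L2:
```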
Post translation Optimizations

• Algebraic transformations and reordering
– Remove/simplify operations like
• Multiplication by 1
• Multiplication by 0
• Addition with 0
– Reorder instructions based on
• Commutative properties of operators
• For example x+y is the same as y+x
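The simplifications listed here can be sketched in Python on nested-tuple expressions. The function name `simplify` and the expression encoding are assumptions. The sketch also assumes integer arithmetic without side effects, since rewriting `x * 0` to `0` is not safe for every type (floating-point NaN, for instance).

```python
def simplify(expr):
    """Apply x*1 -> x, x*0 -> 0 and x+0 -> x bottom-up on a tuple expression."""
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    left, right = simplify(left), simplify(right)
    # Commutativity: move a constant on the left to the right, so that
    # 0 + x and 1 * x are handled by the same rules as x + 0 and x * 1.
    if op in ("+", "*") and isinstance(left, int):
        left, right = right, left
    if op == "*" and right == 1:
        return left
    if op == "*" and right == 0:
        return 0
    if op == "+" and right == 0:
        return left
    return (op, left, right)

print(simplify(("+", ("*", "x", 1), 0)))       # x
print(simplify(("*", 0, ("+", "y", "z"))))     # 0
print(simplify(("+", "x", "y")))               # ('+', 'x', 'y')
```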
Post translation Optimizations

• Instruction selection
– Addressing mode selection
– Opcode selection
– Peephole optimization
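Peephole optimization looks at a small window of adjacent instructions and replaces wasteful patterns. The Python sketch below handles just one such pattern, a load that immediately follows a store of the same register to the same address. The instruction syntax is hypothetical, and the sketch assumes no jump targets the load.

```python
def peephole(instructions):
    """Drop a LOAD that directly follows a STORE of the same register and address."""
    result = []
    for instruction in instructions:
        if result and instruction.startswith("LOAD "):
            previous = result[-1]
            same_operands = previous[len("STORE "):] == instruction[len("LOAD "):]
            if previous.startswith("STORE ") and same_operands:
                continue          # the register already holds this value
        result.append(instruction)
    return result

code = ["ADD R1, R2", "STORE R1, x", "LOAD R1, x", "MUL R1, R3"]
print(peephole(code))
# ['ADD R1, R2', 'STORE R1, x', 'MUL R1, R3']
```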
Compiler structure

Something is missing
• Information required about the program variables during
compilation
– Class of variable: keyword, identifier etc.
– Type of variable: integer, float, array, function etc.
– Amount of storage required
– Address in the memory
– Scope information
• Location to store this information
– Attributes with the variable (has obvious problems)
– At a central repository and every phase refers to the repository
whenever information is required
• Normally the second approach is preferred
– Use a data structure called a symbol table
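A minimal sketch of such a central repository in Python is given below. The record fields follow the list above (class, type, storage, address, scope), while the class names, the storage sizes in `SIZES` and the sequential address assignment are assumptions. Names are keyed without regard to scope here, which a real table would have to refine.

```python
from dataclasses import dataclass

SIZES = {"int": 4, "float": 8}     # assumed storage sizes in bytes

@dataclass
class Symbol:
    name: str
    kind: str          # class of the name, e.g. "identifier"
    type_name: str
    size: int
    address: int
    scope: int         # nesting depth of the declaring block

class SymbolTable:
    """Central repository that every phase can query by name."""

    def __init__(self):
        self.entries = {}
        self.next_address = 0

    def declare(self, name, type_name, scope=0):
        size = SIZES[type_name]
        symbol = Symbol(name, "identifier", type_name, size, self.next_address, scope)
        self.entries[name] = symbol
        self.next_address += size      # the next variable goes right after this one
        return symbol

    def lookup(self, name):
        return self.entries[name]

table = SymbolTable()
table.declare("count", "int")
table.declare("ratio", "float")
print(table.lookup("ratio"))
```

The lexical analyzer would typically create the entry, the semantic analyzer would fill in the type, and the code generator would read the address.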
Final Compiler structure

Advantages of the model
• Also known as the Analysis-Synthesis model of compilation
– Front end phases are known as analysis phases
– Back end phases are known as synthesis phases
• Each phase has a well defined task
• Each phase handles a logical activity in the process of compilation
Advantages of the model


• The compiler is re-targetable
• Source and machine independent code optimization is possible
• An optimization phase can be inserted after the front and back end phases have been developed and deployed
