Chapter-1[1]

The document outlines the design and construction of compilers, detailing the translation process which includes analysis (lexical, syntax, and semantic analysis) and synthesis (code generation and optimization). It discusses the roles of various programs related to compilers, such as interpreters, assemblers, linkers, and loaders, as well as the importance of intermediate code generation and optimization techniques. The document emphasizes the significance of compilers in bridging high-level programming languages with machine code across different platforms.


Compiler design

(CoSc4103 )

Melese Alemante

1
Outline
 Introduction
 Programs related to compiler
 The translation process
 Analysis
• Lexical analysis
• Syntax analysis
• Semantic analysis
 Synthesis
• IC generator
• IC optimizer
• Code generator
• Code optimizer
 Phases of a compiler
 Major data and structures in a compiler
 Compiler construction tools
2
Introduction
What is a compiler?
 a program that reads a program written in one language (the
source language) and translates it into an equivalent
program in another language (the target language).
 Why we design compiler?
 Why we study compiler construction techniques?
• Compilers provide an essential interface between
applications and architectures
• Compilers embody a wide range of theoretical techniques
Source program                         Target program
(high-level language) --> Compiler --> (assembly or machine language)
                             |
                             v
                       Error messages

Input --> Target program (exe) --> Output
3
Introduction…
 Using a high-level language for programming has a large
impact on how fast programs can be developed.

 The main reasons for this are:


 Compared to machine language, the notation used by
programming languages is closer to the way humans think
about problems.
 The compiler can spot some obvious programming
mistakes.
 Programs written in a high-level language tend to be
shorter than equivalent programs written in machine
language.
 The same program can be compiled to many different
machine languages and, hence, be brought to run on
many different machines.
4
Introduction…
 Since different platforms, or hardware architectures
along with the operating systems (Windows, Macs, Unix),
require different machine code, you must compile most
programs separately for each platform.

                program
           /       |       \
    compiler   compiler   compiler
       |           |          |
      Win         Mac        Unix

5
Programs related to compilers
➢ Interpreter
 Is a program that reads a source program and executes it
 Works by analyzing and executing the source program
commands one at a time
 Does not translate the whole source program into object
code
 Interpretation is important when:
 Programmer is working in interactive mode and needs to
view and update variables
 Running speed is not important
 Commands have simple formats, and thus can be
quickly analyzed and executed
 Modification or addition to user programs is required
as execution proceeds
6
Programs related to compilers
➢ Interpreter and compiler

Source code --compilation--> Exe code --processing--> Machine

Source code --compilation--> Intermediate code --interpretation--> Interpreter

7
Programs related to compilers
➢ Interpreter and compiler differences

➢ An interpreter takes one statement, translates it, and executes
it, then takes the next statement; a compiler translates the
entire program in one go and then executes it.
➢ An interpreter stops translating after it finds the first error;
a compiler generates its error report after translating the
entire program.
➢ An interpreter takes less time to analyze the source code; a
compiler takes a large amount of time analyzing and processing
the high-level language code.
➢ Overall execution speed with an interpreter is slower; overall
execution of compiled code is faster.
8
Programs related to compilers…
➢ Interpreter…

 Well-known examples of interpreters:


 Basic interpreter, Lisp interpreter, UNIX shell command
interpreter, SQL interpreter, Java interpreter…
 In principle, any programming language can be either
interpreted or compiled:
 Some languages are designed to be interpreted, others are
designed to be compiled

9
E.g., Compiling Java Programs
 The Java compiler produces bytecode not machine code
 Bytecode is converted into machine code using a Java
Interpreter
 You can run bytecode on any computer that has a Java
Interpreter installed

Java Program --compiler--> Java bytecode --interpreter--> Win / Mac / Unix

10
Programs related to compiler…
➢ Assemblers
 Translator for the assembly language.
 Assembly code is translated into machine code
 Output is relocatable machine code.
➢ Linker
 Links object files separately compiled or
assembled
 Links object files to standard library functions
 Generates a file that can be loaded and executed

11
Programs related to compiler…
➢ Loader
 Loading of the executable codes, which are the outputs
of linker, into main memory.
➢ Pre-processors
 A pre-processor is a separate program that is called by
the compiler before actual translation begins.
 Such a pre-processor:
• Produce input to a compiler
• can delete comments,
• Macro processing (substitutions)
• include other files...

12
Programs related to compiler
C or C++ program
       |
   Preprocessor
       |
C or C++ program with macro substitutions and file inclusions
       |
    Compiler
       |
  Assembly code
       |
    Assembler
       |
Relocatable object module
       |
     Linker  <-- other relocatable object modules or library modules
       |
 Executable code
       |
     Loader
       |
Absolute machine code
13
The translation process
 A compiler consists internally of a number of steps,
or phases, that perform distinct logical operations.
 The phases of a compiler are shown in the next slide,
together with three auxiliary components that interact
with some or all of the phases:
 The symbol table,
 the literal table,
 and the error handler.

 There are two important parts in compilation process:


 Analysis and
 Synthesis.

14
The translation process…
Source code --> Scanner --> Tokens --> Parser --> Syntax tree
  --> Semantic analyzer --> Annotated tree
  --> Intermediate code generator --> Intermediate code
  --> Intermediate code optimizer --> Intermediate code
  --> Target code generator --> Target code
  --> Target code optimizer --> Target code

(The literal table, symbol table, and error handler interact
with some or all of the phases.)
15
Analysis and Synthesis
➢ Analysis (front end)
 Breaks up the source program into constituent pieces and
 Creates an intermediate representation of the source
program.
 During analysis, the operations implied by the source
program are determined and recorded in hierarchical
structure called a tree.
➢ Synthesis (back end)
 The synthesis part constructs the desired program from the
intermediate representation.

16
Analysis of the source program

➢ Analysis consists of three phases:


 Linear/Lexical analysis
 Hierarchical/Syntax analysis
 Semantic analysis

17
1. Lexical analysis or Scanning
 The stream of characters making up the source program is
read from left to right and is grouped into tokens.
 A token is a sequence of characters having a collective
meaning.
 A lexical analyzer, also called a lexer or a scanner,
receives a stream of characters from the source program and
groups them into tokens.
 Examples: Source Lexical Streams of
program analyzer tokens
• Identifiers
• Keywords
• Symbols (+, -, …)
• Numbers …
 Blanks, new lines, tabulation marks will be removed during
lexical analysis.

18
Lexical analysis or Scanning…
 Example
a[index] = 4 + 2;
a identifier
[ left bracket
index identifier
] right bracket
= assignment operator
4 number
+ plus operator
2 number
; semicolon
 A scanner may perform other operations along with the
recognition of tokens.
• It may enter identifiers into the symbol table, and
• It may enter literals into the literal table.
19
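The tokenization above can be sketched as a tiny hand-written scanner in C. This is an illustrative sketch only, not the scanner of any real compiler: the token-kind names, the `Token` struct, and the 32-character lexeme limit are our own assumptions for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token kinds, one per row of the example table. */
typedef enum { T_IDENT, T_LBRACK, T_RBRACK, T_ASSIGN,
               T_NUMBER, T_PLUS, T_SEMI, T_EOF } TokKind;

typedef struct {
    TokKind kind;
    char text[32];   /* the lexeme, e.g. "index" or "4" */
} Token;

/* Scan one token starting at *src and advance *src past it.
   Blanks are skipped, mirroring the slide: whitespace is
   removed during lexical analysis. */
Token next_token(const char **src) {
    const char *p = *src;
    while (*p == ' ' || *p == '\t' || *p == '\n') p++;
    Token t = { T_EOF, "" };
    if (*p == '\0') { *src = p; return t; }
    if (isalpha((unsigned char)*p)) {           /* identifier */
        int n = 0;
        while (isalnum((unsigned char)*p)) t.text[n++] = *p++;
        t.text[n] = '\0';
        t.kind = T_IDENT;
    } else if (isdigit((unsigned char)*p)) {    /* number */
        int n = 0;
        while (isdigit((unsigned char)*p)) t.text[n++] = *p++;
        t.text[n] = '\0';
        t.kind = T_NUMBER;
    } else {                                    /* single-character symbols */
        t.text[0] = *p; t.text[1] = '\0';
        switch (*p++) {
        case '[': t.kind = T_LBRACK; break;
        case ']': t.kind = T_RBRACK; break;
        case '=': t.kind = T_ASSIGN; break;
        case '+': t.kind = T_PLUS;   break;
        case ';': t.kind = T_SEMI;   break;
        default:  t.kind = T_EOF;    break;
        }
    }
    *src = p;
    return t;
}
```

Calling `next_token` repeatedly on `"a[index] = 4 + 2;"` yields exactly the token sequence shown in the table above.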
Lexical Analysis Tools

 There are tools available to assist in the writing of
lexical analyzers.
 lex - produces C source code (UNIX/linux).
 flex - produces C source code (gnu).
 JLex - produces Java source code.
 We will use flex.

20
2. Syntax analysis or Parsing

 The parser receives the source code in the form of tokens
from the scanner and performs syntax analysis.
 The results of syntax analysis are usually represented by a
parse tree or a syntax tree.
 Syntax tree- each interior node represents an operation
and the children of the node represent the arguments of the
operation.
 The syntactic structure of a programming language is
determined by context free grammar (CFG).

Stream of tokens --> Syntax analyzer --> Abstract syntax tree
21
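The idea that a parser is driven by a CFG, and that each interior node of the syntax tree holds an operation with its arguments as children, can be sketched in C with a toy recursive-descent parser. The grammar here (expr -> number ('+' number)*) and the `Node` layout are our own simplifications; a real parser would consume tokens from the scanner rather than raw characters.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy syntax-tree node: an interior node holds the operation,
   its children hold the arguments (as described on the slide). */
typedef struct Node {
    char op;                 /* '+' for an addition node, 0 for a leaf */
    int  value;              /* meaningful only for leaves */
    struct Node *left, *right;
} Node;

static Node *leaf(int v) {
    Node *n = calloc(1, sizeof *n);
    n->value = v;
    return n;
}

/* Recursive-descent sketch for the toy grammar
     expr -> number ('+' number)*
   over a single-digit string such as "4+2". */
Node *parse_expr(const char **src) {
    const char *p = *src;
    Node *n = leaf(*p++ - '0');          /* number */
    while (*p == '+') {
        p++;                             /* consume '+' */
        Node *op = calloc(1, sizeof *op);
        op->op = '+';
        op->left = n;
        op->right = leaf(*p++ - '0');    /* number */
        n = op;                          /* tree grows left-associatively */
    }
    *src = p;
    return n;
}
```

Parsing `"4+2"` produces a `+` node whose children are the leaves 4 and 2, matching the running example in these slides.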
Syntax analysis or Parsing…
 Ex. Consider again the line of C code: a[index] = 4 + 2

22
Syntax analysis or Parsing…

 Sometimes syntax trees are called abstract syntax trees, since
they represent a further abstraction from parse trees. An example
is shown in the following figure.

23
Syntax Analysis Tools

 There are tools available to assist in the writing of
parsers.
 yacc - produces C source code (UNIX/Linux).
 bison - produces C source code (gnu).
 CUP - produces Java source code.
 We will use bison.

24
3. Semantic analysis
 The semantics of a program are its meaning as opposed
to syntax or structure
 The semantics consist of:
 Runtime semantics – behavior of program at runtime
 Static semantics – checked by the compiler
 Static semantics include:
 Declarations of variables and constants before use
 Calling functions that exist (predefined in a library or defined
by the user)
 Passing parameters properly.
 Type checking.

 The semantic analyzer does the following:
 Checks the static semantics of the language
 Annotates the syntax tree with type information
25
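One static-semantic rule, type checking an addition, can be sketched as a single C function. The mini type system here (just int, float, and an error type) is an assumption made for the example, not the type system of any particular language.

```c
#include <assert.h>

/* Hypothetical mini type system for the sketch. */
typedef enum { TY_INT, TY_FLOAT, TY_ERROR } Type;

/* Static-semantic rule sketched here: both operands of '+'
   must have the same type; the result then has that type,
   which is what the analyzer would record on the tree node. */
Type check_add(Type left, Type right) {
    if (left == right && left != TY_ERROR)
        return left;             /* annotate the node with this type */
    return TY_ERROR;             /* otherwise, report a semantic error */
}
```

Annotating the tree then amounts to calling such rules bottom-up and storing the result on each node.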
Semantic analysis…
 Ex. Consider again the line of C code: a[index] = 4 + 2

Annotated (integer)
syntax tree

26
Synthesis of the target program

❑ Intermediate code generator
❑ Intermediate code optimizer
❑ The target code generator
❑ The target code optimizer

27
Code Improvement
 Code improvement techniques can be applied to:
 Intermediate code – independent of the target machine
 Target code – dependent on the target machine
 Intermediate code improvement include:
 Constant folding
 Elimination of common sub-expressions
 Improving loops- Optimizes loops to reduce the number of
iterations or computations
 Improving function calls-Optimizes function calls to reduce
overhead
 Target code improvement include:
 Allocation and use of registers-Efficiently assigns variables
and intermediate values to CPU registers instead of memory, as
register access is faster
 Selection of better (faster) instructions and addressing
28
modes
Code Improvement
 Code improvement techniques can be applied to:
 Constant folding is an optimization technique where constant
expressions are evaluated at compile-time instead of runtime.
Before optimization
int x = 5 * 10; // The multiplication happens at runtime
After Optimization
int x = 50; // The compiler precomputes 5 * 10
Elimination of Common Sub-expressions
If the same expression is computed multiple times, it is calculated
once and reused.
Before Optimization
 int x = a * b;
 int y = a * b + c;
After Optimization
 int temp = a * b;
 int x = temp;
 int y = temp + c;
29
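Constant folding itself is a small computation inside the compiler: when both operands of an expression are constants, the compiler evaluates the operation instead of emitting code for it. A minimal sketch in C, with a made-up `ConstExpr` representation:

```c
#include <assert.h>

/* Hypothetical constant expression for the sketch:
   lhs op rhs, where both operands are integer constants. */
typedef struct { char op; int lhs, rhs; } ConstExpr;

/* Constant folding: evaluate a constant expression at
   "compile time" so that no runtime instruction is needed. */
int fold(ConstExpr e) {
    switch (e.op) {
    case '+': return e.lhs + e.rhs;
    case '-': return e.lhs - e.rhs;
    case '*': return e.lhs * e.rhs;
    default:  return 0;   /* other operators not folded in this sketch */
    }
}
```

Folding `5 * 10` yields 50 and `4 + 2` yields 6, reproducing both folding examples used in this chapter.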
Code Improvement
 Code improvement techniques can be applied to:
Improving Loops (Loop Optimization)
Optimizing loops by reducing redundant computations.
Before
 for (int i = 0; i < n; i++) {
 int temp = x * y; // Computed in every iteration
 arr[i] = temp + i;
 }
After
int temp = x * y; // Computed once before the loop
for (int i = 0; i < n; i++) {
arr[i] = temp + i;
}
30
Code Improvement
 Code improvement techniques can be applied to:
Improving Function Calls (Inlining)
Reducing function call overhead by replacing function calls with their
body.
Before optimization
int square(int x) {
return x * x;
}

int result = square(5);


After optimization
int result = 5 * 5; // The function call is replaced with its body

31
Intermediate code generator
 Comes after syntax and semantic analysis
 Separates the compiler front end from its backend
 Intermediate representation should have 2 important
properties:
 Should be easy to produce
 Should be easy to translate into the target program
 Intermediate representation can have a variety of forms:
 Three-address code, P-code for an abstract machine, Tree
or DAG representation

Intermediate code
Abstract syntax Intermediate code
generator

❑ Three-address code for the original C expression a[index] = 4 + 2 is:
t1 = 2
t2 = 4 + t1
a[index] = t2
Intermediate code generator

Three-Address Code (TAC)


• Uses at most three operands per instruction.
• Breaks down complex expressions into simple steps.
• Commonly used in compiler design.
For the expression
a = b + c * d;
TAC representations.
t1 = c * d
t2 = b + t1
a = t2
P-Code (Pseudo Machine Code)
• Used in stack-based abstract machines.
• Represents operations independent of hardware.
• Often used in interpreters.
For the expression
a = b + c;
the P-code representation is:
LOAD b
LOAD c
ADD
STORE a
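A common in-memory layout for three-address code is the quadruple: result, two arguments, and an operator. The sketch below hand-emits the TAC shown above for `a = b + c * d;` using a `Quad` struct of our own naming; a real compiler would derive the sequence by walking the syntax tree.

```c
#include <assert.h>
#include <string.h>

/* One three-address instruction: result = arg1 op arg2.
   Field names and sizes are assumptions made for this sketch. */
typedef struct {
    char result[8], arg1[8], arg2[8];
    char op;            /* '+', '*', or '=' for a plain copy */
} Quad;

/* Hand-emit the TAC from the slide for  a = b + c * d;  */
int gen_tac(Quad *out) {
    int n = 0;
    out[n++] = (Quad){"t1", "c",  "d",  '*'};  /* t1 = c * d */
    out[n++] = (Quad){"t2", "b",  "t1", '+'};  /* t2 = b + t1 */
    out[n++] = (Quad){"a",  "t2", "",   '='};  /* a  = t2     */
    return n;                                  /* instruction count */
}
```

Each instruction names at most three locations, which is exactly why the form is called three-address code.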
Intermediate code generator

Tree or Directed Acyclic Graph (DAG) Representation


• Used for optimizing expressions.
• DAG eliminates redundant sub-expressions.
• Useful for common subexpression elimination.
Example DAG for a = b + c * d; x = b + c * d;

+
/\
b *
/\
c d

30
IC optimizer
 An IC optimizer reviews the code, looking for ways to reduce:
 the number of operations and
 the memory requirements.
 A program may be optimized for speed or for size.
 This phase changes the IC so that the code generator produces
a faster and less memory consuming program.
 The optimized code does the same thing as the original (non-
optimized) code but with less cost in terms of CPU time and
memory space.

Intermediate code --> IC Optimizer --> Intermediate code

35
IC optimizer…
 There are several techniques of optimizing code
and they will be discussed in the forthcoming
chapters.

 Ex. Unnecessary lines of code in loops (i.e. code
that could be executed outside of the loop) are
moved out of the loop.
for(i=1; i<10; i++){
x = y+1;
z = x+i;
}
x = y+1;
for(i=1; i<10; i++)
z = x+i;
36
IC optimizer…
 In our previous example, we have included an opportunity
for source level optimization; namely, the expression 4 + 2
can be precomputed by the compiler to the result 6 (this
particular optimization is called constant folding).
 This optimization can be performed directly on the syntax
tree as shown below.

37
IC optimizer…
 Many optimizations can be performed directly on the tree.
 However, in a number of cases, it is easier to optimize a linearized
form of the tree that is closer to assembly code.
 A standard choice is Three-address code, so called because it
contains the addresses of up to three locations in memory.
 In our example, three address code for the original C expression
might look like this:
t1=2
t2 = 4 + t1
a[index] = t2
❑ Now the optimizer would improve this code in two steps, first
computing the result of the addition
t = 4+2
a[index] = t
 And then replacing t by its value to get the three-address statement
a[index] = 6
38
Code generator
 The machine code generator receives the (optimized)
intermediate code, and then it produces either:
 Machine code for a specific machine, or
 Assembly code for a specific machine and assembler.
 Code generator
 Selects appropriate machine instructions
 Allocates memory locations for variables
 Allocates registers for intermediate computations

39
Code generator…
 The code generator takes the IR code and generates code for the
target machine.
 Here we will write target code in assembly language: a[index]=6

MOV R0, index ;; value of index -> R0
MUL R0, 2 ;; double value in R0
MOV R1, &a ;; address of a ->R1
ADD R1, R0 ;; add R0 to R1
MOV *R1, 6 ;; constant 6 -> address in R1

 &a – the address of a (the base address of the array)
 *R1 – indirect register addressing (the last instruction stores the
value 6 to the address contained in R1)

40
The target code optimizer
 In this phase, the compiler attempts to improve the
target code generated by the code generator.
 Such improvement includes:
• Choosing addressing modes to improve performance
• Replacing slow instruction by faster ones
• Eliminating redundant or unnecessary operations
 In the sample target code given, use a shift instruction to
replace the multiplication in the second instruction.
 Another is to use a more powerful addressing mode, such as
indexed addressing to perform the array store.
 With these two optimizations, our target code becomes:
MOV R0, index ;; value of index -> R0
SHL R0 ;; double value in R0
MOV &a [R0], 6 ;; constant 6 -> address a + R0
41
Grouping of phases
 The discussion of phases deals with the logical organization of
a compiler.
 In practice most compilers are divided into:
 Front end - language-specific and machine-independent.
 Back end - machine-specific and language-independent.

Compiler passes:
 A pass consists of reading an input file and writing an output
file.
 Several phases may be grouped in one pass.
 For example, the front-end phases of lexical analysis, syntax
analysis, semantic analysis, and intermediate code generation
might be grouped together into one pass.
42
Grouping of phases…
 Single pass
 is a compiler that passes through the source code of
each compilation unit only once.
 A one-pass compiler does not look back at code it
previously processed.
 A one-pass compiler is faster than a multi-pass compiler.
 It is unable to generate programs that are as efficient,
due to the limited scope available.
 Multi pass
 is a type of compiler that processes the source code or
abstract syntax tree of a program several times.
 A collection of phases is done multiple times.
43
Major Data and Structures in a Compiler
 Token
 Represented by an integer value or an
enumeration literal
 Sometimes, it is necessary to preserve the string
of characters that was scanned
 For example, name of an identifiers or value of
a literal
 Syntax Tree
 Constructed as a pointer-based structure
 Dynamically allocated as parsing proceeds
 Nodes have fields containing information
collected by the parser and semantic analyzer
44
Major Data and Structures in a Compiler…

 Symbol Table
 Keeps information associated with all kinds of tokens:
• Identifiers, numbers, variables, functions, parameters, types, fields, etc.
 Tokens are entered by the scanner and parser

 Semantic analyzer adds type information and other attributes


 Code generation and optimization phases use the
information in the symbol table
Performance Issues
 Insertion, deletion, and search operations need to be
efficient because they are frequent
 A hash table with constant-time operations is usually
the preferred choice
 More than one symbol table may be used

45
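The hash-table choice above can be sketched as a minimal chained symbol table in C. The fixed-size pool, the bucket count, and the single `type` attribute are simplifications for the example; a real table stores many more attributes and grows dynamically.

```c
#include <assert.h>
#include <string.h>

#define BUCKETS 64

/* One symbol-table entry; the attribute here is just a type
   tag, standing in for the richer information the slide lists. */
typedef struct Sym {
    char name[32];
    int  type;            /* attribute added by the semantic analyzer */
    struct Sym *next;     /* chain for hash collisions */
} Sym;

typedef struct {
    Sym *bucket[BUCKETS];
    Sym  pool[256];       /* fixed pool instead of malloc, for brevity */
    int  used;
} SymTab;

static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % BUCKETS;
}

/* Insert a name with its attribute; constant time on average,
   which is why a hash table is the usual choice. */
void symtab_insert(SymTab *t, const char *name, int type) {
    unsigned h = hash(name);
    Sym *s = &t->pool[t->used++];
    strncpy(s->name, name, sizeof s->name - 1);
    s->name[sizeof s->name - 1] = '\0';
    s->type = type;
    s->next = t->bucket[h];
    t->bucket[h] = s;
}

/* Look a name up; returns NULL when it was never entered. */
Sym *symtab_lookup(SymTab *t, const char *name) {
    for (Sym *s = t->bucket[hash(name)]; s; s = s->next)
        if (strcmp(s->name, name) == 0) return s;
    return NULL;
}
```

The scanner and parser would call `symtab_insert` as names are encountered; the semantic analyzer and code generator would call `symtab_lookup` to retrieve and extend the stored attributes.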
Major Data and Structures in a Compiler…
 Literal Table
 Stores constant values and string literals in a
program.
 One literal table applies globally to the entire
program.
 Used by the code generator to:
• Assign addresses for literals.
 Avoids the replication of constants and strings.
 Quick insertion and lookup are essential.

MOV AX, 25H ; Directly using a hexadecimal literal
MOV BX, 100 ; Directly using a decimal literal
MOV DX, 'A' ; Character literal
46
Compiler construction tools
 Various tools are used in the construction of the
various parts of a compiler.
➢ Scanner generators
 Ex. Lex, flex, JLex
 These tools generate a scanner /lexical analyzer/
if given a regular expression.
➢ Parser Generators
 Ex. Yacc, Bison, CUP
 These tools produce a parser /syntax analyzer/
if given a Context Free Grammar (CFG)
that describes the syntax of the source language.

47
Compiler construction tools…
➢ Syntax directed translation engines
 Ex. Cornell Synthesizer Generator

➢ Automatic code generators


 Take a collection of rules that define
the translation of the IC to target
code and produce a code generator.

48
End
