CSC 437 Chapter 1

This document provides an overview of compilers and language processors. It discusses: 1) Compilers translate an entire source program into machine code before execution. Interpreters translate each statement individually before executing. 2) Assemblers translate assembly language programs into machine code. 3) Java uses a hybrid approach, compiling to bytecode then interpreting with a virtual machine, allowing cross-platform execution. Just-in-time compilers may further translate bytecode to native machine code. 4) Additional programs like preprocessors, linkers, and debuggers help process modules and create executable target programs.


CSC 437 (Compilers)

Basic Concepts

Md. Alomgir Hossain


Faculty
Department of Computer Science and Engineering
IUBAT - International University of Business Agriculture and Technology

Computer system

 A typical computer system can be described in terms of levels of abstraction.
 Each level of a computer system has its own artificial language.
 Automatic translation between these languages is a central concern of computer science.
Application Level
Higher Order Language Level
Assembly Level
Operating System Level
Machine Level
Micro-Programming Level
Logic Gate Level
Introduction

 Programming languages are notations for describing computations to people and to machines.
 The world as we know it depends on programming languages, because all the software running on all computers was written in some programming language.
 But before a program can be run, it first must be translated into a form in which it can be executed by a computer.
 The software systems that do this translation are called compilers.
CSC 437 (Compilers) 2/28
Definition of Program
 Definition: A program is a set of instructions developed by a programmer in some computer programming language. These languages are human-readable.
Definition of Machine Language
• Definition: Machine code or machine language is a set of instructions executed directly by a computer's central processing unit (CPU).
• Computers can only understand machine language: a language of numbers that is impractical, if not impossible, for most humans to read.
• Machine language consists of binary digits, or bits, which come in the form of zeros (0) and ones (1).
• Machine language is also called binary language.
Language Processors
 Computers understand only instructions in machine code, i.e. in the form of 0s and 1s.
 It is difficult to write a computer program directly in machine code.
 Nowadays, programs are written mostly in high-level languages such as BASIC, C++, and PASCAL. A program written in any high-level programming language (or in assembly language) is called the Source Program.
Language Processors
 The source code cannot be executed directly by the computer.
 The source program must be converted into machine code before it can run on the computer.
 The program translated into machine code is known as the Object Program.
 Every language has its own language processor (or translator). A language processor is therefore defined as the special translator system software that translates a program written in a high-level language (or assembly language) into machine code.
Language Processors
 Language processors are divided into three types:

 Compilers
 Interpreters
 Assemblers
Language Processors
 1. Compiler: The language processor that translates the complete source program as a whole into machine code before execution is called a compiler. The C and C++ compilers are typical examples.
 The program translated into machine code is called the object program.
 The source code is translated to object code successfully only if it is free of errors.
 If there are errors in the source code, the compiler reports them at the end of compilation.
 The errors must be removed before the compiler can successfully compile the source code.
Language Processors
 2. Interpreter: The language processor that translates (converts) each statement of the source program into machine code and executes it immediately, before translating the next statement, is called an interpreter.
 If there is an error in a statement, the interpreter terminates its translating process at that statement and displays an error message. GW-BASIC is an example of an interpreter.
Language Processors
3. Assembler
 An assembler is the third type of translator program. It is used to translate a program written in assembly language into machine code. An assembler performs the translation process in a similar way to a compiler, but an assembler is the translator for a low-level programming language, while a compiler is the translator for high-level programming languages.
Language Processors — continued

Compiler vs. Interpreter

 The machine-language target program produced by a compiler is usually much faster than an interpreter at mapping inputs to outputs.
 An interpreter, however, can usually give better error diagnostics than a compiler, because it executes the source program statement by statement.

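The contrast can be seen in a toy Python sketch (an illustration of the idea, not part of the slides): the `interpret` function below translates and executes one assignment statement at a time, while CPython's built-in `compile` translates the whole program before any of it runs.

```python
# Toy comparison: interpret one statement at a time vs. translate the whole
# program first and execute it later. All names here are illustrative.

def interpret(lines):
    """Translate and execute each statement immediately, one at a time."""
    env = {}
    for line in lines:
        name, expr = line.split("=", 1)
        env[name.strip()] = eval(expr, {}, env)   # translate + run this statement
    return env

def compile_whole(lines):
    """Translate the complete program as a whole; execution happens later."""
    return compile("\n".join(lines), "<toy>", "exec")   # errors reported here

program = ["rate = 3", "initial = 4", "position = initial + rate * 60"]

by_interpreter = interpret(program)
code = compile_whole(program)     # the whole translated program, before execution
by_compiler = {}
exec(code, {}, by_compiler)
print(by_interpreter["position"], by_compiler["position"])
```

Note that `compile_whole` would report a syntax error anywhere in the program before executing anything, whereas `interpret` would stop only when it reaches the faulty statement.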
Interpreter:
 An interpreter is a program which translates the statements of a program into machine code, one statement at a time.
 Difference between Interpreter & Compiler:
 An interpreter reads one statement of the program, translates it, and executes it. Then it reads the next statement, translates it, and executes it, proceeding in this way until all the statements are translated and executed.
 A compiler, on the other hand, goes through the entire program and translates the entire program into machine code. A compiled program typically runs 5 to 25 times faster than an interpreted one.
 With a compiler, the machine code is saved permanently for future use.
 The machine code produced by an interpreter, on the other hand, is not saved.
 An interpreter is a small program compared to a compiler. It occupies less memory space, so it can be used on a smaller system with limited memory.
Introduction to Compiler
• A compiler is software that converts programs from a high-level language into machine language.

• To program a computer, software developers use compilers.
Importance of Compiler
 For performing different tasks on a computer, a compiler is the communication medium that translates a source language into a target language.
 A compiler allows the user to perform customized tasks on a machine.
 It allows us to communicate with the hardware.
 It is also used to bridge the gap between human languages and the computer's language.
Advantage of Compiler
 When only machine language existed, programmers had to write their programs directly in it, which was a very difficult and tedious job.
 The role of a compiler is to take source code written in a high-level language (Java, C++, VB.Net, etc.).
 High-level languages are easily understood by humans, so the compiler converts the program written in the formal source language into the machine (target) language, which computers can understand directly.
 There are different programs related to the compiler that work around compilation, such as the editor, preprocessor, assembler, linker or loader, debugger, and profiler.
A compiler takes a source program as input and produces a target program as output, along with error & diagnostic messages:

Source program → Compiler → Target program

 Source languages: programming languages such as FORTRAN, PASCAL, C, etc.
 Target language: machine code for the central processing unit being used (the target machine).
Hybrid Compiler
 A hybrid compiler is a compiler which translates human-readable source code into an intermediate byte code for later interpretation, so these language implementations have features of both a compiler and an interpreter. These implementations are commonly associated with Just-In-Time (JIT) compilation.

 Java is a good example of this type of language processor.
Example of Hybrid Compiler

 Java language processors combine compilation and interpretation.
Example — continued

 A Java source program may first be compiled into an intermediate form called bytecodes.
 The bytecodes are then interpreted by a virtual machine.
 A benefit of this arrangement is that bytecodes compiled on one machine can be interpreted on another machine, perhaps across a network.
 In order to achieve faster processing of inputs to outputs, some Java compilers, called just-in-time compilers, translate the bytecodes into machine language immediately before they are run.
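CPython happens to follow the same hybrid scheme, so it makes a convenient hands-on analogy (this is Python machinery, not Java's): the source is first compiled to bytecodes for a virtual machine, which the standard `dis` module can display.

```python
import dis

# Compile the running example into bytecode for the CPython virtual machine,
# analogous to javac producing JVM bytecodes in a .class file.
code = compile("position = initial + rate * 60", "<example>", "exec")

opnames = [ins.opname for ins in dis.get_instructions(code)]
print(opnames)   # LOAD_NAME ... STORE_NAME; exact opcode names vary by version
```

The resulting code object is what the virtual machine later interprets (or, in versions with a JIT, translates further to native code).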
Language Processors — continued

 In addition to a compiler, several other programs may be required to create an executable target program.
 A source program may be divided into modules stored in separate files.
 The task of collecting the source program is sometimes entrusted to a separate program, called a preprocessor.
Language Processors — continued
• A macro or "macro-instruction" in computer science is a rule or pattern that specifies how a certain input sequence (often a sequence of characters) should be mapped to a replacement input sequence (also often a sequence of characters) according to a defined procedure.
• The mapping process that instantiates (transforms) a macro use into a specific sequence is known as macro expansion.

 The preprocessor may also expand shorthands, called macros, into source-language statements.
Language Processors — continued

 The modified source program is then fed to a compiler.
 The compiler may produce an assembly-language program as its output, because assembly language is easier to produce as output and is easier to debug.
 The assembly language is then processed by a program called an assembler that produces relocatable machine code as its output, also known as an object file.
Language Processors — continued

 A linker takes one or more object files or libraries as input and combines them to produce a single (usually executable) file.

 The loader then puts together all of the executable object files into memory for execution.
Linker & Loader
 Linker: High-level languages come with built-in header files or libraries. These libraries are predefined and contain basic functions which are essential for executing a program. Calls to these functions are connected to the libraries by a program called the linker. If the linker cannot find the library for a function, it reports an error and the build fails.

 Loader: A loader is a program that loads the machine code of a program into system memory. In computing, a loader is the part of an operating system that is responsible for loading programs. It is one of the essential stages in the process of starting a program, because it places programs into memory and prepares them for execution.
The Structure of a Compiler

 Up to this point we have treated a compiler as a single box that maps a source program into a semantically equivalent target program.
 If we open up this box a little, we see that there are two parts to this mapping: analysis and synthesis.
 The analysis part breaks up the source program into constituent pieces and imposes a grammatical structure on them.
 It then uses this structure to create an intermediate representation of the source program.
 If the analysis part detects that the source program is either syntactically ill formed or semantically unsound, then it must provide informative messages, so the user can take corrective action.
 The analysis part also collects information about the source program and stores it in a data structure called a symbol table, which is passed along with the intermediate representation to the synthesis part.
 The synthesis part constructs the desired target program from the intermediate representation and the information in the symbol table.
 The analysis part is often called the front end of the compiler; the synthesis part is the back end.
 If we examine the compilation process in more detail, we see that it operates as a sequence of phases, each of which transforms one representation of the source program to another.
 In practice, several phases may be grouped together, and the intermediate representations between the grouped phases need not be constructed explicitly.
 The symbol table, which stores information about the entire source program, is used by all phases of the compiler.
 Some compilers have a machine-independent optimization phase between the front end and the back end.
 The purpose of this optimization phase is to perform transformations on the intermediate representation, so that the back end can produce a better target program than it would otherwise have produced from an unoptimized intermediate representation.
 Since optimization is optional, one or the other of the two optimization phases shown may be missing.
Lexical Analysis

 The first phase of a compiler is called lexical analysis or scanning.
Lexical Analysis — continued

 The lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes.
Lexical Analysis — continued

 For each lexeme, the lexical analyzer produces as output a token of the form
〈token-name, attribute-value〉
that it passes on to the subsequent phase, syntax analysis.
 In the token, the first component token-name is an abstract symbol that is used during syntax analysis, and the second component attribute-value points to an entry in the symbol table for this token.
 Information from the symbol-table entry is needed for semantic analysis and code generation.
Lexical Analysis — continued

 We consider a source program that contains the assignment statement
position = initial + rate * 60
 The characters in this assignment could be grouped into the following lexemes and mapped into the tokens passed on to the syntax analyzer.
Lexical Analysis — continued

 1. position is a lexeme that would be mapped into a token 〈id, 1〉.
 id is an abstract symbol standing for identifier.
 1 points to the symbol-table entry for position.
 The symbol-table entry for an identifier holds information about the identifier, such as its name and type.
Lexical Analysis — continued

 2. The assignment symbol = is a lexeme that is mapped into the token 〈=〉.
 Since this token needs no attribute-value, we have omitted the second component.
 We could have used any abstract symbol such as assign for the token-name, but for notational convenience we have chosen to use the lexeme itself as the name of the abstract symbol.
Lexical Analysis — continued

 3. initial is a lexeme that is mapped into the token 〈id, 2〉, where 2 points to the symbol-table entry for initial.
Lexical Analysis — continued

 4. + is a lexeme that is mapped into the token 〈+〉.
Lexical Analysis — continued

 5. rate is a lexeme that is mapped into the token 〈id, 3〉, where 3 points to the symbol-table entry for rate.
 6. * is a lexeme that is mapped into the token 〈∗〉.
Lexical Analysis — continued

 7. 60 is a lexeme that is mapped into the token 〈60〉.
Lexical Analysis — continued

 Blanks separating the lexemes would be discarded by the lexical analyzer.
Lexical Analysis — continued

 The representation of the assignment statement after lexical analysis is the sequence of tokens
〈id, 1〉 〈=〉 〈id, 2〉 〈+〉 〈id, 3〉 〈∗〉 〈60〉
 In this representation, the token names =, +, and * are abstract symbols for the assignment, addition, and multiplication operators, respectively.
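The scanning step just described can be sketched in a few lines of Python (a minimal illustration; real scanners are usually generated from regular-expression specifications, as discussed later under compiler-construction tools):

```python
import re

symbol_table = []   # identifiers are entered here as the scanner first sees them

def tokenize(src):
    """Group characters into lexemes and map each lexeme to a token."""
    tokens = []
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|[=+*]", src):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))   # <id, n>
        else:
            tokens.append((lexeme,))   # operator or number: lexeme as token name
    return tokens

print(tokenize("position = initial + rate * 60"))
# → [('id', 1), ('=',), ('id', 2), ('+',), ('id', 3), ('*',), ('60',)]
```

The whitespace between lexemes is discarded automatically, because the regular expression only matches identifiers, numbers, and operators.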
Syntax Analysis

 The second phase of the compiler is syntax analysis or parsing.
 The parser uses the first components of the tokens produced by the lexical analyzer to create a tree-like intermediate representation that depicts the grammatical structure of the token stream.
Syntax Analysis

 A typical representation is a syntax tree in which each interior node represents an operation and the children of the node represent the arguments of the operation.
Syntax Analysis

 This tree shows the order in which the operations in the assignment
position = initial + rate * 60
are to be performed.
Syntax Analysis

 The tree has an interior node labeled * with 〈id, 3〉 as its left child and the integer 60 as its right child. The node 〈id, 3〉 represents the identifier rate.
 The node labeled * makes it explicit that we must first multiply the value of rate by 60.
Syntax Analysis

 The node labeled + indicates that we must add the result of this multiplication to the value of initial.
Syntax Analysis

 The root of the tree, labeled =, indicates that we must store the result of this addition into the location for the identifier position.
Syntax Analysis

 This ordering of operations is consistent with the usual conventions of arithmetic, which tell us that multiplication has higher precedence than addition, and hence that the multiplication is to be performed before the addition.
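The tree described above can be written down directly as nested tuples, and a postorder walk over it makes the evaluation order explicit (a hand-built sketch of the parser's output, not a parser):

```python
# Syntax tree for the running example: each interior node is
# (operator, left-child, right-child); leaves are ("id", n) or ("num", value).
tree = ("=", ("id", 1),
             ("+", ("id", 2),
                   ("*", ("id", 3), ("num", 60))))

def postorder(node):
    """Visit children before the node itself: the order the tree encodes."""
    if node[0] in ("=", "+", "*"):
        yield from postorder(node[1])
        yield from postorder(node[2])
    yield node[0]

order = list(postorder(tree))
print(order)   # '*' appears before '+', which appears before '='
```

Because * sits deeper in the tree than +, the postorder walk emits it first, matching the precedence rule stated above.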
Semantic Analysis

 The semantic analyzer uses the syntax tree and the information in the symbol table to check the source program for semantic consistency with the language definition.
 It also gathers type information and saves it in either the syntax tree or the symbol table, for subsequent use during intermediate-code generation.
Semantic Analysis

 An important part of semantic analysis is type checking, where the compiler checks that each operator has matching operands.
 For example, many programming language definitions require an array index to be an integer.
 The compiler must report an error if a floating-point number is used to index an array.
Semantic Analysis

 The language specification may permit some type conversions called coercions.
 For example, a binary arithmetic operator may be applied either to a pair of integers or to a pair of floating-point numbers.
 If the operator is applied to a floating-point number and an integer, the compiler may convert, or coerce, the integer into a floating-point number.
Semantic Analysis

 Suppose that position, initial, and rate have been declared to be floating-point numbers, and that the lexeme 60 by itself forms an integer.
 The type checker in the semantic analyzer discovers that the operator * is applied to a floating-point number rate and an integer 60.
 In this case, the integer may be converted into a floating-point number.
Semantic Analysis

 Notice that the output of the semantic analyzer has an extra node for the operator inttofloat, which explicitly converts its integer argument into a floating-point number.
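The coercion step can be sketched as a bottom-up walk that infers a type for each node and wraps integer operands of mixed-type operators in an inttofloat node (the `ID_TYPES` table is an assumption matching the slides' example, with all three identifiers declared float):

```python
ID_TYPES = {1: "float", 2: "float", 3: "float"}   # position, initial, rate

def coerce(node):
    """Return (possibly rewritten node, its inferred type)."""
    if node[0] == "id":
        return node, ID_TYPES[node[1]]
    if node[0] == "num":
        return node, "int"
    op, left, right = node
    left, lt = coerce(left)
    right, rt = coerce(right)
    if lt != rt:                                   # mixed int/float operands
        if lt == "int":
            left, lt = ("inttofloat", left), "float"
        else:
            right, rt = ("inttofloat", right), "float"
    return (op, left, right), lt

tree = ("=", ("id", 1), ("+", ("id", 2), ("*", ("id", 3), ("num", 60))))
typed_tree, _ = coerce(tree)
print(typed_tree)   # the integer 60 is now wrapped in an inttofloat node
```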
Intermediate Code Generation

 In the process of translating a source program into target code, a compiler may construct one or more intermediate representations, which can have a variety of forms.
 Syntax trees are a form of intermediate representation.
 They are commonly used during syntax and semantic analysis.
Intermediate Code Generation

 After syntax and semantic analysis of the source program, many compilers generate an explicit low-level or machine-like intermediate representation, which we can think of as a program for an abstract machine.
 This intermediate representation should have two important properties:
 it should be easy to produce, and
 it should be easy to translate into the target machine.
Intermediate Code Generation

 We consider an intermediate form called "three-address code", which is like the assembly language for a machine in which every memory location can act like a register.
 Three-address code consists of a sequence of instructions, each of which has at most three operands.
Intermediate Code Generation

t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
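That instruction sequence can be produced mechanically from the coerced syntax tree by a bottom-up walk that invents one temporary per instruction (a sketch; the mutable-default counter is just a compact way to number temporaries):

```python
def gen(node, out, counter=[0]):
    """Emit three-address code for node into out; return the name of its value."""
    if node[0] == "id":
        return f"id{node[1]}"
    if node[0] == "num":
        return str(node[1])
    if node[0] == "inttofloat":
        arg = gen(node[1], out)
        counter[0] += 1
        t = f"t{counter[0]}"
        out.append(f"{t} = inttofloat({arg})")
        return t
    op, left, right = node
    lv, rv = gen(left, out), gen(right, out)
    if op == "=":
        out.append(f"{lv} = {rv}")           # fewer than three operands
        return lv
    counter[0] += 1
    t = f"t{counter[0]}"
    out.append(f"{t} = {lv} {op} {rv}")
    return t

tree = ("=", ("id", 1), ("+", ("id", 2),
             ("*", ("id", 3), ("inttofloat", ("num", 60)))))
tac = []
gen(tree, tac)
print("\n".join(tac))
```

Because children are generated before their parent, the multiplication is emitted before the addition, matching the order shown on the slide.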
Intermediate Code Generation

 Each three-address instruction has at most one operator in addition to the assignment.
 The compiler has to decide on the order in which operations are to be done; here the multiplication precedes the addition, as in the source program.
 The compiler must generate a temporary name to hold the value computed by each instruction.
 Some "three-address" instructions have fewer than three operands.
Code Optimization

 The machine-independent code-optimization phase attempts to improve the intermediate code so that better target code will result.
 Usually better means faster, but other objectives may be desired, such as shorter code, or target code that consumes less power.
Code Optimization

 A simple intermediate code generation algorithm followed by code optimization is a reasonable way to generate good target code.
 The optimizer can deduce that the conversion of 60 from integer to floating point can be done once and for all at compile time, so the inttofloat operation can be eliminated by replacing the integer 60 with the floating-point number 60.0.
 Moreover, t3 is used only once, to transmit its value to 〈id, 1〉.
Code Optimization

 So the optimizer can transform the code into the shorter sequence
t1 = id3 * 60.0
id1 = id2 + t1
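The two improvements described, folding the inttofloat of a literal into a floating-point constant and removing the temporary that only feeds a copy, can be sketched directly on the three-address code (an illustrative pass; note the surviving temporary keeps its original number, t2, where the slide renames it t1):

```python
def optimize(tac):
    folded, kept = {}, []
    for line in tac:
        dst, rhs = [s.strip() for s in line.split("=", 1)]
        if rhs.startswith("inttofloat(") and rhs[11:-1].isdigit():
            folded[dst] = rhs[11:-1] + ".0"        # constant-fold the conversion
        else:
            for t, const in folded.items():
                rhs = rhs.replace(t, const)        # substitute the folded constant
            kept.append((dst, rhs))
    out = []
    for i, (dst, rhs) in enumerate(kept):
        # a temporary that only feeds the very next line's bare copy is removed
        if dst.startswith("t") and i + 1 < len(kept) and kept[i + 1][1] == dst:
            kept[i + 1] = (kept[i + 1][0], rhs)
        else:
            out.append(f"{dst} = {rhs}")
    return out

optimized = optimize(["t1 = inttofloat(60)", "t2 = id3 * t1",
                      "t3 = id2 + t2", "id1 = t3"])
print("\n".join(optimized))
```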
Code Optimization

 There is great variation in the amount of code optimization different compilers perform.
 In those that do the most, the so-called "optimizing compilers," a significant amount of time is spent on this phase.
 There are simple optimizations that significantly improve the running time of the target program without slowing down compilation too much.
Code Generation

 The code generator takes as input an intermediate representation of the source program and maps it into the target language.
 If the target language is machine code, registers or memory locations are selected for each of the variables used by the program.
 Then, the intermediate instructions are translated into sequences of machine instructions that perform the same task.
 A crucial aspect of code generation is the judicious assignment of registers to hold variables.
Code Generation

 For example, using registers R1 and R2, the intermediate code might get translated into the machine code
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1
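This last step can be sketched as a tiny translator from the optimized three-address code into the same LDF/MULF/ADDF/STF style, with a naive allocator that hands out R1, R2, ... in order of first use (so the register numbering differs from the slide's, where id3 lands in R2):

```python
def codegen(tac):
    out, regs, nxt = [], {}, [1]

    def value(operand):
        """Return a register or immediate for an operand, loading if needed."""
        if operand.replace(".", "").isdigit():
            return "#" + operand                 # floating-point immediate
        if operand not in regs:
            regs[operand] = f"R{nxt[0]}"
            nxt[0] += 1
            out.append(f"LDF {regs[operand]}, {operand}")
        return regs[operand]

    for line in tac:
        dst, rhs = [s.strip() for s in line.split("=", 1)]
        parts = rhs.split()
        if len(parts) == 3:                      # binary operation
            a, op, b = parts
            ra, rb = value(a), value(b)
            out.append(f"{'MULF' if op == '*' else 'ADDF'} {ra}, {ra}, {rb}")
            regs[dst] = ra                       # result stays in the left register
        else:                                    # plain copy
            regs[dst] = value(rhs)
        if not dst.startswith("t"):              # program variable: store it back
            out.append(f"STF {dst}, {regs[dst]}")

    return out

print("\n".join(codegen(["t1 = id3 * 60.0", "id1 = id2 + t1"])))
```

Temporaries stay in registers; only program variables such as id1 are stored back to memory.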
Symbol-Table Management

 An essential function of a compiler is to record the variable names used in the source program and collect information about various attributes of each name.
Symbol-Table Management

 These attributes may provide information about the storage allocated for a name, its type, and its scope (where in the program its value may be used).
 In the case of procedure names, they include such things as the number and types of the arguments, the method of passing each argument (for example, by value or by reference), and the type returned.
Symbol-Table Management

 The symbol table is a data structure containing a record for each variable name, with fields for the attributes of the name.
 The data structure should be designed to allow the compiler to find the record for each name quickly and to store or retrieve data from that record quickly.
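A minimal symbol table meeting those requirements can be built on a dictionary, one record per name with fields for its attributes (a sketch; nested scopes, which real compilers also handle, are deliberately omitted, and the attribute names are illustrative):

```python
class SymbolTable:
    """One record per declared name; constant-time lookup by name."""

    def __init__(self):
        self._records = {}

    def declare(self, name, **attributes):
        self._records[name] = dict(attributes)   # fields for the name's attributes

    def lookup(self, name):
        return self._records.get(name)           # None if the name is undeclared

table = SymbolTable()
for name in ("position", "initial", "rate"):
    table.declare(name, type="float", storage_bytes=8)

print(table.lookup("rate")["type"])              # float
print(table.lookup("velocity"))                  # None: never declared
```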
The Grouping of Phases into Passes

 The discussion of phases deals with the logical organization of a compiler.
 In an implementation, activities from several phases may be grouped together into a pass that reads an input file and writes an output file.
 For example, the front-end phases of lexical analysis, syntax analysis, semantic analysis, and intermediate code generation might be grouped together into one pass.
 Code optimization might be an optional pass.
 Then there could be a back-end pass consisting of code generation for a particular target machine.
The Grouping of Phases into Passes — continued

 Some compiler collections have been created around carefully designed intermediate representations that allow the front end for a particular language to interface with the back end for a certain target machine.
 With these collections, we can produce compilers for different source languages for one target machine by combining different front ends with the back end for that target machine.
 Similarly, we can produce compilers for different target machines by combining a front end with back ends for different target machines.
Compiler-Construction Tools

 The compiler writer, like any software developer, can profitably use modern software development environments containing tools such as language editors, debuggers, version managers, profilers, test harnesses, and so on.
 In addition to these general software-development tools, other more specialized tools have been created to help implement various phases of a compiler.
Compiler-Construction Tools — continued

 These tools use specialized languages for specifying and implementing specific components, and many use quite sophisticated algorithms.
 The most successful tools are those that hide the details of the generation algorithm and produce components that can be easily integrated into the remainder of the compiler.
Compiler-Construction Tools — continued

 Some commonly used compiler-construction tools include:
 1. Parser generators that automatically produce syntax analyzers from a grammatical description of a programming language.
 2. Scanner generators that produce lexical analyzers from a regular-expression description of the tokens of a language.
 3. Syntax-directed translation engines that produce collections of routines for walking a parse tree and generating intermediate code.
Compiler-Construction Tools — continued

 4. Code-generator generators that produce a code generator from a collection of rules for translating each operation of the intermediate language into the machine language for a target machine.
 5. Data-flow analysis engines that facilitate the gathering of information about how values are transmitted from one part of a program to each other part. Data-flow analysis is a key part of code optimization.
 6. Compiler-construction toolkits that provide an integrated set of routines for constructing various phases of a compiler.
End of Slides