Compiler CH-2

Uploaded by

Yohannes Dereje

PRINCIPLES OF COMPILER DESIGN

Chapter 2:

Program Language Translation

1
Program Language Translation
 After going through this chapter:
 Students must be quite comfortable with the concepts related
to compilers and should be able to deploy their knowledge in
various related fields.
 Students should be confident that they can use language
processing technology for various software developments.

2
Why Do We Study Compilers?
Reason #1: understand compilers and languages.
 Understand code structure and language semantics.
 Understand the relation between source code and generated machine code.
 Become a better programmer, and write more productive and portable
code.
Reason #2: nice balance of theory and practice.
 Theory:
 Mathematical models: regular expressions, automata, grammars, graphs.
 Algorithms that use these models.
 Practice:
 Apply theoretical notions to build a real compiler.

Reason #3: programming experience.
 Creating a compiler entails writing a large program that manipulates
complex data structures and implements sophisticated algorithms.
 Increasing programming capability.

3
Bit of History
 How are programming languages implemented?
 Two major strategies:
 Interpreters
 Compilers (very well understood with mathematical
foundations)
 Some environments provide both an interpreter and a
compiler. Lisp and Scheme, for example, provide:
 An interpreter for development
 A compiler for deployment
4
Con…
 Java
 Java compiler: Java to interpretable bytecode
 Java JIT(Just In Time): bytecode to executable image

 Some early machines and implementations:
 IBM developed the 704 in 1954.
 All programming was done in assembly language.
 The cost of software development far exceeded the cost of
hardware.
 Low productivity.
 Speedcoding interpreter:
– Programs ran about 10 times slower than hand-written assembly code.

5
Con…
 Fortran I project (1954-1957): The first compiler was
released.
 The Fortran I compiler had a huge impact on programming
languages and computer science.
 A whole new field, compiler design, was started.
 More than half of all programmers were using Fortran by
1958.
 Development time was cut in half.
 It led to an enormous amount of theoretical work (lexical
analysis, parsing, optimization, structured programming,
code generation, error recovery).

6
Con…
 John Backus (in 1954): Proposed a program
that translated high-level expressions into
native machine code. There was skepticism all around;
most people thought it was impossible.
 Throughout the 1950s, compilers were considered
notoriously difficult programs to write.
 The first FORTRAN compiler, for example, took 18
staff-years to implement.

7
A Language-processing System

8
A Language-processing System
 Preprocessor:
 A preprocessor takes the skeletal source program as input and
produces an extended version of it, which results from expanding
macros and manifest constants and including header files in the
source file.
 In addition, a preprocessor performs the following activities:
– Collects all the modules and files when the source program is
divided into modules stored in different files.
– Expands shorthands (macros) into source-language statements.
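
The macro-expansion step can be sketched in a few lines of Python. This is a minimal illustration, not how any real preprocessor is implemented: it assumes only object-like `#define NAME value` directives, and the function name `expand_macros` is made up for this example.

```python
import re

def expand_macros(source: str) -> str:
    """Expand object-like '#define NAME value' macros (a toy subset of a C-style preprocessor)."""
    macros = {}
    output = []
    for line in source.splitlines():
        match = re.match(r"\s*#define\s+(\w+)\s+(.*)", line)
        if match:
            # Record the macro and drop the directive itself from the output.
            macros[match.group(1)] = match.group(2).strip()
            continue
        # Replace whole-word occurrences of every macro defined so far.
        for name, body in macros.items():
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, body=body: body, line)
        output.append(line)
    return "\n".join(output)

skeletal = "#define MAX 100\nint table[MAX];\nint maximum = MAX;"
print(expand_macros(skeletal))
```

The word-boundary match keeps `maximum` intact while `MAX` is replaced, which is the difference between macro expansion and blind text substitution.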

9
Language Processors:
 Computer programs are generally written in high-level
languages (like C++, Python, and Java).
 A language processor, or language translator, is a computer
program that converts source code from one programming
language into another language.
 Language processors include:
– Compilers, interpreters, preprocessors, assemblers, linkers, and
loaders.

10
Compilers
 A compiler is a program that reads a program
written in one language (the source language)
and translates it into an equivalent program in
another language (the target language).
 Usually the source language is a high-level
language such as Java, C, or C++, whereas the target
language is machine code, that is, code that a
computer's processor understands.
 A compiler also reports any errors in the
source program that it detects during the
translation process.

11
Compilers
 There are thousands of source languages, ranging
from traditional programming languages such as
FORTRAN and Pascal to specialized languages.

12
Compilers
 Target languages are equally varied.
 The basic tasks that any compiler must perform are essentially the same.
 By understanding these tasks, we can construct compilers for a wide
variety of source languages and target machines using the same basic
techniques.
 If the target program is an executable machine-language program, it can
then be called by the user to process inputs and produce outputs.

13
Interpreters
 An interpreter processes a program written in a high-level
programming language statement by statement as the program runs,
executing each statement directly rather than first producing a
separate machine-code program.
 Translation and execution proceed line by line.
 Errors are reported when the offending line is reached.

14
Example 2.1:
 Java language processors combine compilation and
interpretation.
A Java source program may first be compiled into an
intermediate form called bytecodes.
 The bytecodes are then interpreted by a virtual machine.
A benefit of this arrangement is that bytecodes compiled on
one machine can be interpreted on another machine,
perhaps across a network.

15
Example 2.1
 In order to achieve faster processing of inputs to outputs, some Java
compilers, called just-in-time compilers, translate the bytecodes into machine
language immediately before they run the intermediate program to process the
input.

16
Compilers….
 The source language is optimized for humans.
 It is more user-friendly and, to some extent, platform-independent.
 Source programs are easier to read, write, and maintain, which makes
errors easier to avoid.
 A program written in any language must be translated into a
form that the computer understands.
 This form is typically known as machine language (ML),
machine code, or object code.
 It consists of streams of 0s and 1s.


17
Compilers….
 Some examples of compilers are:
– A Java compiler for the Apple Macintosh
– A COBOL compiler for the SUN
– A C++ compiler for the Apple Macintosh
 If a portion of the input to a Java compiler looked like this:

– a = b + c ∗ d;
– the output corresponding to this input might look something like this:

18
Sample Problem 2.2
 Show the output of a Java native code compiler, in
any typical assembly language, for the following
Java input string: while (x<a+b) x = 2*x;

19
Compilers vs. Interpreter
 The machine-language target program produced by a
compiler is usually much faster than an interpreter at
mapping inputs to outputs.
 An interpreter, however, can usually give better error
diagnostics than a compiler, because it executes the source
program statement by statement.

20
Compilers vs. Interpreter

Figure 2.4: A Compiler and Interpreter produce very different output for the
same input.
21
Compilers vs. Interpreter
 The input to an interpreter is a program written in a high-level
language, but rather than generating a machine language
program, the interpreter actually carries out the computations
specified in the source program.
 In other words, the output of a compiler is a program, whereas
the output of an interpreter is the source program’s output.
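
The distinction can be made concrete with a small Python sketch. It assumes a deliberately tiny source form, a counting loop that prints `i * factor`, stored as a dictionary; that representation, the function names, and the assembly mnemonics (`MOV`, `CMP`, `BGT`, and so on) are all hypothetical and belong to no real machine.

```python
# The loop "for (i = 1; i <= 4; i++) print(i * 3)" in a made-up dictionary form.
PROGRAM = {"var": "i", "start": 1, "stop": 4, "factor": 3}

def interpret(prog):
    """Carry out the computation directly: the result is the source program's output."""
    output = []
    i = prog["start"]
    while i <= prog["stop"]:
        output.append(i * prog["factor"])
        i += 1
    return output

def compile_to_assembly(prog):
    """Translate into an equivalent program for an imaginary machine; nothing is executed."""
    v = prog["var"]
    return [
        f"      MOV  {v}, ={prog['start']}",
        f"L1:   CMP  {v}, ={prog['stop']}",
        "      BGT  L2",
        f"      LOD  R1, {v}",
        f"      MUL  R1, ={prog['factor']}",
        "      OUT  R1",
        f"      INC  {v}",
        "      JMP  L1",
        "L2:   HLT",
    ]

print(interpret(PROGRAM))                      # the values the loop prints
print("\n".join(compile_to_assembly(PROGRAM)))  # another program, still to be run
```

`interpret` returns the numbers the loop would print, while `compile_to_assembly` returns text that must itself be assembled and run before any number appears.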

22
Sample Problem 2.3
 Show the compiler output and the interpreter output for the
following Java source code:
for (i=1; i<=4; i++) System.out.println (i*3);

23
Compiler vs. Interpreter

Generally, compilers and interpreters differ in the
following points:

24
Exercise
 Show assembly language for a machine of your choice,
corresponding to each of the following Java statements:

1. a = b + c;

2. a = (b+c) * (c-d);

3. for (i=1; i<=10; i++) a = a+i;


 Show the difference between compiler output and
interpreter output for each of the following source inputs:

25
Big C notation for compilers
 It is important to remember that a compiler is a
program, and it must be written in some language
(machine, assembly, high-level).
 In describing this program, we are dealing with three
languages:
1) The source language, i.e. the input to the compiler,
2) The object language, i.e. the output of the compiler,
3) The language in which the compiler is written

26
Exercise
 Using the big C notation, show each of the
following compilers:
1) An Ada compiler which runs on the PC and compiles
to the PC machine language.
2) An Ada compiler which compiles to the PC machine
language, but which is written in Ada.
3) An Ada compiler which compiles to the PC machine
language, but which runs on a Sun.

27
Assemblers
 Assemblers translate programs written in assembly
language into machine code.
 Assembly language is called a low-level language
because there is a one-to-one correspondence between
assembly language statements and machine language
statements.
 It is a symbolic form of the machine language, which makes it easy to
translate.
 A compiler may generate assembly language as its target
language, and the assembler then translates it into object code.
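
The one-to-one correspondence is what makes an assembler simple. The sketch below assumes an invented instruction set in which every statement becomes one opcode byte followed by one address byte; the opcode values and the `assemble` function are illustrative only.

```python
# Hypothetical instruction set: each mnemonic corresponds to exactly one opcode.
OPCODES = {"LOD": 0x01, "ADD": 0x02, "MUL": 0x03, "STO": 0x04}

def assemble(lines, addresses):
    """Translate 'MNEMONIC operand' statements into opcode/address byte pairs."""
    machine_code = bytearray()
    for line in lines:
        mnemonic, operand = line.split()
        machine_code.append(OPCODES[mnemonic])    # symbolic operation -> numeric opcode
        machine_code.append(addresses[operand])   # symbolic name -> memory address
    return bytes(machine_code)

program = ["LOD b", "ADD c", "STO a"]             # a = b + c
symbols = {"a": 0x10, "b": 0x11, "c": 0x12}       # assumed memory layout
print(assemble(program, symbols).hex(" "))        # prints: 01 11 02 12 04 10
```

Each source statement yields exactly one machine instruction, so translation is a table lookup with no analysis of program structure.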

28
Assemblers
 The assembler is essentially the first interface that lets
humans communicate with the machine.

29
Linker
 The linker is responsible for taking object code
generated by the compiler and linking it with
libraries and other object code to create a
final executable program.
 It resolves references to external symbols and
creates the complete executable.
 Common examples include the GNU linker
(ld) and the Microsoft Visual C++ linker.
30
Loader
 The loader is responsible for loading an executable
program into memory for execution.
 It prepares the memory space, reads the binary code, and
resolves any dynamic linking or loading of shared libraries.
 This is often done by the operating system's loader
component.

31
The Phases of a Compiler
 The structure of a compiler includes:
 Lexical analysis,
 Syntax analysis,
 Semantic analysis,
 Intermediate code generation,
 Code optimization,
 Code generation,
 Bookkeeping, and
 Error handling

32
The Analysis-Synthesis Model of Compilation:
 There are two parts of compilation:
 Analysis: breaks up the source program into
constituent pieces and creates an intermediate representation
of it. It includes:
– Lexical Analysis
– Syntax Analysis
– Semantic Analysis
 Synthesis: constructs the target program from the
intermediate representation. It includes:
– Code Optimization
– Code Generation

33
Overview of compiler

34
1. Lexical Analysis
 In a compiler, lexical analysis is also called linear
analysis or scanning.
 The lexical analysis phase reads the characters in
the source program and groups them into a stream
of tokens, in which each token represents a logically
cohesive sequence of characters, such as an identifier, a
keyword, or a punctuation character.
 The character sequence forming a token is called
the lexeme for the token.
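
Grouping characters into tokens can be sketched with Python's `re` module. The token class names and patterns below are an illustrative subset chosen for this example, and the sample input is the assignment `Fahrenheit = centigrade * 1.8 + 32`.

```python
import re

# Token classes and the regular expressions describing their lexemes (illustrative subset).
TOKEN_SPEC = [
    ("NUMBER",     r"\d+(?:\.\d+)?"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN",     r"="),
    ("OPERATOR",   r"[+\-*/]"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Group characters into (token, lexeme) pairs, discarding white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

for token in tokenize("Fahrenheit = centigrade * 1.8 + 32"):
    print(token)
```

Each pair holds the token class first and the lexeme second, for example `("IDENTIFIER", "Fahrenheit")`.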
35
Lexical Analysis
 NOTE: In computer science, a program that performs
lexical analysis is called a scanner, tokenizer, or lexer.
 A token (also called a lexical item or lexical token) is a
sequence of characters that forms a single unit of information in
the source code.

36
Lexical Analysis
 Token categories include:
 Keywords - while, void, if, for, ...
 Identifiers - declared by the programmer
 Operators - +, -, *, /, =, ==, ...
 Numeric constants - numbers such as 124, 12.35, 0.09E-23,
etc.
 Character constants - a character or string of characters
enclosed in quotes
 Special characters - characters used as delimiters, such as . ( ) , ; :
 Comments - ignored by subsequent phases. These must be …

37
Lexical Analysis
 Roles and Responsibilities of the Lexical Analyzer
 It is responsible for removing comments and
white space from the source program.
 It helps in identifying the tokens.
 It categorizes lexical units.
 Example: the list of tokens is:
– Identifier (Fahrenheit)
– Assignment (=)
– Identifier (centigrade)
– Operator (*)
– Numeric constant (1.8)
– Operator

38
Sample Problem 2.4

39
2. Syntax Analysis
 The second phase of a compiler is syntax analysis, also
known as parsing.
 This phase takes the stream of tokens generated by the
lexical analysis phase and checks whether they conform to
the grammar of the programming language.
 The output of this phase is usually an Abstract Syntax Tree
(AST).

40
Syntax Analysis
 It accepts tokens as input and provides a parse tree as
output.
 The parser will check for proper syntax, issue appropriate error
messages, and determine the underlying structure of the source
program.
 The output of this phase may be a stream of atoms or a
collection of syntax trees.
 Roles and Responsibilities of the Syntax Analyzer
– Helps in building a parse tree.
– Acquires tokens from the lexical analyzer.
– Checks for syntax errors, if any.

41
Syntax Analysis
 Example: Fahrenheit = centigrade * 1.8 + 32

42
Parsing or syntax analysis
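
One common way to build such a tree by hand is recursive descent. The sketch below assumes a small grammar written for this example (stmt -> id '=' expr, expr -> term {(+|-) term}, term -> factor {(*|/) factor}, factor -> id | number | '(' expr ')'), and represents tree nodes as nested tuples. It is not the parser of any particular compiler.

```python
import re

def tokenize(text):
    """Split the input into identifier, number, and symbol lexemes.
    Characters outside these classes are silently skipped in this sketch."""
    return re.findall(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|[=+\-*/()]", text)

SYMBOLS = {"=", "+", "-", "*", "/", "(", ")"}

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def expect(self, symbol):
        token = self.advance()
        if token != symbol:
            raise SyntaxError(f"expected {symbol!r} but found {token!r}")

    def statement(self):                      # stmt -> id '=' expr
        target = self.advance()
        if target in SYMBOLS or target[0].isdigit():
            raise SyntaxError(f"expected an identifier but found {target!r}")
        self.expect("=")
        tree = ("=", target, self.expr())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):                           # expr -> term {(+|-) term}
        node = self.term()
        while self.peek() in ("+", "-"):
            operator = self.advance()
            node = (operator, node, self.term())
        return node

    def term(self):                           # term -> factor {(*|/) factor}
        node = self.factor()
        while self.peek() in ("*", "/"):
            operator = self.advance()
            node = (operator, node, self.factor())
        return node

    def factor(self):                         # factor -> id | number | '(' expr ')'
        token = self.advance()
        if token == "(":
            node = self.expr()
            self.expect(")")
            return node
        if token in SYMBOLS:
            raise SyntaxError(f"unexpected symbol {token!r}")
        return token

print(Parser(tokenize("Fahrenheit = centigrade * 1.8 + 32")).statement())
```

Because `term` is called from inside `expr`, the multiplication ends up deeper in the tree than the addition, which is how the grammar encodes operator precedence.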

43
Semantic Analysis
 This phase checks whether the code is semantically
correct, i.e., whether it conforms to the language’s
type system and other semantic rules.
 Roles and Responsibilities of the Semantic Analyzer:
– Saves collected information in symbol tables or syntax
trees.
– Scans for semantic errors.
– Reports semantic errors.

44
Semantic Analysis
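
A typical semantic check is type checking with implicit conversion. The sketch below assumes expression trees in the form of nested `(operator, left, right)` tuples with string leaves, only two types (`int` and `real`), and a plain dictionary standing in for the symbol table. The `inttoreal` node name and the rule that a real value may not be assigned to an int variable are choices made for this example.

```python
def widen(tree, tree_type, wanted):
    """Wrap an int-typed subtree in an explicit conversion when a real is wanted."""
    if tree_type == "int" and wanted == "real":
        return ("inttoreal", tree)
    return tree

def check(node, symbols):
    """Return (typed_tree, type) for a tree, inserting int-to-real conversions."""
    if isinstance(node, str):
        if node in symbols:                       # declared identifier
            return node, symbols[node]
        if node.replace(".", "", 1).isdigit():    # numeric constant
            return node, "real" if "." in node else "int"
        raise NameError(f"undeclared identifier {node!r}")
    operator, left, right = node
    left, left_type = check(left, symbols)
    right, right_type = check(right, symbols)
    if operator == "=":
        if left_type == "int" and right_type == "real":
            raise TypeError(f"cannot assign a real value to int variable {left!r}")
        return (operator, left, widen(right, right_type, left_type)), left_type
    result_type = "real" if "real" in (left_type, right_type) else "int"
    typed = (operator, widen(left, left_type, result_type), widen(right, right_type, result_type))
    return typed, result_type

declared = {"Fahrenheit": "real", "centigrade": "real"}
tree = ("=", "Fahrenheit", ("+", ("*", "centigrade", "1.8"), "32"))
print(check(tree, declared))
```

For this input the only change to the tree is that the integer constant `32` is wrapped as `("inttoreal", "32")`, because the other operand of `+` is real.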

45
Semantic Analysis
 Example:

46
Intermediate Code Generation
 This phase generates an intermediate representation of the source code that
can be easily translated into machine code.
 A middle-level language code generated by a compiler at the time of the
translation of a source program into the object code is known as intermediate
code.
 Roles and Responsibilities:
– Preserves the operator precedence and evaluation order of the source language.
– Produces a form that is easy to translate into machine code.
– Holds the operands of each instruction explicitly.

47
Intermediate Code Generation
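
Three-address code is one widely used intermediate form: each instruction has at most one operator and names its operands explicitly. The sketch below assumes the syntax tree is given as nested `(operator, left, right)` tuples; the temporary names `t1`, `t2`, ... and the function name are illustrative.

```python
from itertools import count

def to_three_address(tree):
    """Flatten an assignment tree into three-address instructions, one operator each."""
    code = []
    temporaries = count(1)

    def emit(node):
        if isinstance(node, str):              # leaf: identifier or constant
            return node
        operator, left, right = node
        left_place = emit(left)                # operands are computed first,
        right_place = emit(right)              # so inner operations come out earlier
        place = f"t{next(temporaries)}"
        code.append(f"{place} = {left_place} {operator} {right_place}")
        return place

    operator, target, expression = tree
    if operator != "=":
        raise ValueError("expected an assignment at the root of the tree")
    code.append(f"{target} = {emit(expression)}")
    return code

tree = ("=", "Fahrenheit", ("+", ("*", "centigrade", "1.8"), "32"))
print("\n".join(to_three_address(tree)))
```

For the tree above this yields `t1 = centigrade * 1.8`, then `t2 = t1 + 32`, then `Fahrenheit = t2`, so the multiplication is still performed before the addition.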

48
Optimization
 This phase applies various optimization
techniques to the intermediate code to improve
the performance of the generated machine code.
 Roles and Responsibilities:
– Remove unused variables and unreachable code.
– Improve the running time of the program.
– Produce streamlined code from the intermediate representation.

49
Code optimization
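
Two simple optimizations, constant folding and removal of unused temporaries, can be sketched as follows. The sketch assumes straight-line code with no branches, instructions stored as `(dest, op, left, right)` tuples with a `"copy"` pseudo-operator, and integer constants only; these conventions are invented for the example.

```python
import operator

FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(code):
    """Evaluate operations on integer constants at compile time and propagate the results."""
    known = {}                                   # names currently holding a known constant
    result = []
    for dest, op, left, right in code:
        left = known.get(left, left)             # substitute operands already known
        right = known.get(right, right)
        if op in FOLDABLE and isinstance(left, int) and isinstance(right, int):
            left, op, right = FOLDABLE[op](left, right), "copy", None
        if op == "copy" and isinstance(left, int):
            known[dest] = left
        else:
            known.pop(dest, None)                # dest no longer holds a known constant
        result.append((dest, op, left, right))
    return result

def remove_unused_temporaries(code, temporaries):
    """Drop assignments to temporaries that no later instruction reads."""
    live = set()
    kept = []
    for dest, op, left, right in reversed(code):
        if dest in temporaries and dest not in live:
            continue                             # value is never read: dead code
        live.discard(dest)
        live.update(x for x in (left, right) if isinstance(x, str))
        kept.append((dest, op, left, right))
    return kept[::-1]

# seconds = hours * (60 * 60)
code = [("t1", "*", 60, 60), ("t2", "*", "hours", "t1"), ("seconds", "copy", "t2", None)]
optimized = remove_unused_temporaries(fold_constants(code), {"t1", "t2"})
print(optimized)
```

Folding turns `60 * 60` into the constant `3600` and substitutes it where `t1` was read, after which the assignment to `t1` has no readers and is removed.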

50
Principal Sources of Optimization

51
Code Generation
 The final phase of a compiler is code generation.
 This phase takes the optimized intermediate code
and generates the actual machine code that can
be executed by the target hardware.
 Roles and Responsibilities:
– Translate the intermediate code to target
machine code.
– Select and allocate memory locations and registers.

52
Code Generation
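
A deliberately naive code generator can be sketched as a direct mapping from three-address instructions to load/operate/store sequences. The target here is an imaginary machine with a single register `R1`; the mnemonics `LOD`, `STO`, `ADD`, `SUB`, `MUL`, `DIV` and the `(dest, op, left, right)` tuple form are assumptions made for this example.

```python
MNEMONICS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}   # hypothetical instruction names

def generate(code):
    """Emit a load/operate/store sequence for each (dest, op, left, right) instruction."""
    assembly = []
    for dest, op, left, right in code:
        assembly.append(f"LOD R1, {left}")                  # bring the first operand into R1
        if op != "copy":
            assembly.append(f"{MNEMONICS[op]} R1, {right}")  # combine with the second operand
        assembly.append(f"STO R1, {dest}")                  # write the result back to memory
    return assembly

intermediate = [
    ("t1", "*", "centigrade", "1.8"),
    ("t2", "+", "t1", "32"),
    ("Fahrenheit", "copy", "t2", None),
]
print("\n".join(generate(intermediate)))
```

The output stores `t1` and immediately reloads it, which is exactly the kind of redundancy that register allocation is meant to avoid by keeping intermediate values in registers.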

53
Phases in Compiler
 Generally, phases are divided into two parts:
1. Front End phases:
– Source-language dependent and target-machine independent.
– The front end consists of lexical analysis, syntactic analysis,
semantic analysis, symbol table creation, and intermediate code
generation.
– The front end also includes the error handling that goes along
with each of these phases.
2. Back End phases:
– Depend on the target machine and do not depend on the source
language.
– The back end consists of the code optimization and code generation
phases, along with error handling and symbol table operations.

54
Symbol Table
 The symbol table is the main data structure of the compiler.
 It stores identifiers together with their names and types.
 It stores:
 Information about each identifier, such as
– its type (used by semantic analysis and intermediate code generation),
– its scope (used by semantic analysis and intermediate code generation),
– its storage allocation (used by code generation),
– for a procedure, the number and types of its arguments and the type returned.
 Literal constants and strings.
 Function names.
 Variable names and constants.
 Labels in the source language.
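
One simple way to organize such a table is a stack of dictionaries, one per scope. The class below is a minimal sketch under that assumption; the method names and the attribute keys (`type`, `kind`, `params`, `returns`) are chosen for illustration and are not a fixed convention.

```python
class SymbolTable:
    """Scoped symbol table: a stack of dictionaries, innermost scope last."""

    def __init__(self):
        self.scopes = [{}]                      # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **attributes):
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"{name!r} is already declared in this scope")
        scope[name] = attributes

    def lookup(self, name):
        # Search from the innermost scope outward.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"{name!r} is not declared")

symbols = SymbolTable()
symbols.declare("rate", type="real")
symbols.declare("convert", kind="procedure", params=["real"], returns="real")
symbols.enter_scope()
symbols.declare("rate", type="int")             # shadows the outer declaration
print(symbols.lookup("rate"))
symbols.exit_scope()
print(symbols.lookup("rate"))
```

Inside the inner scope the lookup finds the `int` declaration; after `exit_scope` the same name resolves to the outer `real` one again.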
55
Error Detecting and Reporting
 Each phase can encounter errors.
 The lexical phase detects input that does not
form a token.
 The syntax phase detects token sequences that violate
the syntax rules.
 The semantic phase detects constructs that
have no meaning for the operation involved.

56
Error handling in Compilers
Compile time errors
 Syntax errors are reported by the compiler at compile time.
 Other kinds of errors not generally detected by the compiler are
called run-time errors.

57
Compiler writing tools
 The compiler writer can use some specialized tools that
help in implementing various phases of a compiler.
 These tools assist in the creation of an entire compiler or
its parts.
 Some commonly used compiler construction tools include:
– Parser Generators
– Scanner Generators
– Syntax-directed translation engine
– Automatic code generators
– Data-flow analysis Engines
– Compiler Construction toolkits

58
Parser Generators
 Input : Grammatical description of a programming language
 Output : Syntax analyzers.
 These produce syntax analyzers, normally from input that is
based on a context-free grammar.
 With such tools, syntax analysis is one of the easiest phases to
implement.
– e.g., YACC, Bison

59
Scanner Generators
 Input : Regular expression description of the tokens of a
language
 Output : Lexical analyzers.
 These automatically generate lexical analyzers,
normally from a specification based on regular
expressions.
 The basic organization of the resulting lexical analyzer
is in effect a finite automaton.
 e.g., LEX
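
The finite-automaton organization can be illustrated with a small hand-written, table-driven recognizer. This is not the output of LEX or of any scanner generator; the state names, character classes, and the `classify` function are assumptions made for this sketch, which accepts identifiers, integers, and simple real numbers.

```python
def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    if ch == ".":
        return "dot"
    return "other"

# (current state, input class) -> next state; missing entries mean "reject".
TRANSITIONS = {
    ("start", "letter"): "identifier",
    ("start", "digit"): "integer",
    ("identifier", "letter"): "identifier",
    ("identifier", "digit"): "identifier",
    ("integer", "digit"): "integer",
    ("integer", "dot"): "point",
    ("point", "digit"): "real",
    ("real", "digit"): "real",
}
ACCEPTING = {"identifier", "integer", "real"}

def classify(lexeme):
    """Run the automaton over the whole lexeme; return the accepting state or None."""
    state = "start"
    for ch in lexeme:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return None
    return state if state in ACCEPTING else None

for lexeme in ["centigrade", "32", "1.8", "1.", "9lives"]:
    print(lexeme, classify(lexeme))
```

The recognizer itself is only a loop over a table, so supporting a different token set means changing the table rather than the code.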
60
