Compiler CH-2
Chapter 2: Program Language Translation
After going through this chapter:
Students should be comfortable with the concepts related
to compilers and able to apply this knowledge in
various related fields.
Students should be confident that they can use language
processing technology for various software developments.
Why Do We Study Compilers?
Reason #1: understand compilers and languages.
Understand the code structure and language semantics.
Understand the relation between source code and generated machine code.
Become a better programmer, with higher productivity and more
portable code.
Reason #2: nice balance of theory and practice.
Theory:
Mathematical models: regular expressions, automata, grammars, graphs.
Algorithms that use these models.
Practice:
Apply theoretical notions to build a real compiler.
Speedcoding interpreter:
– programs ran about 10 times slower than hand-written code.
Con…
Fortran I project (1954-1957): the first widely used
high-level-language compiler was released.
The Fortran I compiler had a huge impact on programming
languages and computer science.
A Language-processing System
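Preprocessor
A preprocessor takes the skeletal source program as input and
produces the source program that the compiler reads, for example by
expanding macros and including the contents of other files.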
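Macro expansion is one typical preprocessing task. The sketch below assumes only simple object-like `#define NAME body` directives; the function name `preprocess` is hypothetical, and a real C preprocessor also handles `#include`, function-like macros, and conditional compilation, and never substitutes inside string literals or comments, all of which this sketch ignores.

```python
import re


def preprocess(source: str) -> str:
    """Expand object-like '#define NAME body' macros (hypothetical minimal preprocessor)."""
    macros: dict[str, str] = {}
    output_lines: list[str] = []
    for line in source.splitlines():
        directive = re.match(r"\s*#define\s+(\w+)\s+(.*)", line)
        if directive:
            # Record the macro; the directive line itself is not passed on to the compiler.
            macros[directive.group(1)] = directive.group(2).strip()
            continue
        for name, body in macros.items():
            # \b restricts replacement to whole identifiers (SIZE, but not SIZES).
            # A function replacement keeps backslashes in `body` from being interpreted.
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, b=body: b, line)
        output_lines.append(line)
    return "\n".join(output_lines)


print(preprocess("#define SIZE 10\nint a[SIZE];\nint SIZES;"))
```

This prints `int a[10];` followed by `int SIZES;`: the directive line disappears, and only the whole identifier `SIZE` is replaced.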
Language Processors:
Computer programs are generally written in high-level
languages (like C++, Python, and Java).
A language processor, or language translator, is a computer
program that converts source code from one programming
language to another language.
Language processors include:
– Compilers, Interpreters, Preprocessors, Assemblers, Linkers,
Loaders.
Compilers
A compiler is a program that reads a program
written in one language (the source language)
and translates it into an equivalent program in
another language (the target language).
Usually the source language is a high-level
language such as Java, C, or C++, whereas the target
language is machine code, the "code" that a
computer's processor understands.
A compiler also reports any errors in the source program that it
detects during the translation process.
Target languages are as varied as source languages, yet the basic tasks that
any compiler must perform are essentially the same.
By understanding these tasks, we can construct compilers for a wide
variety of source languages and target machines using the same basic
techniques.
If the target program is an executable machine-language program, it can
then be called by the user to process inputs and produce outputs.
Interpreters
An interpreter translates code written in a high-level
programming language and executes it as the program runs,
instead of producing a separate target program.
Conversion is done line by line.
Errors are reported only when the offending line is reached during execution.
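The line-by-line behavior can be sketched with a tiny interpreter for a hypothetical toy language whose statements are either `name = expr` or `print expr`, where `expr` uses only integers, variables, `+`, and `*` (no parentheses). The function names are illustrative. Each line is executed as soon as it is read, so output from earlier lines is produced before an error on a later line is even discovered, and lines after the error never run.

```python
def evaluate(expr: str, env: dict[str, int]) -> int:
    """Evaluate a sum of products such as 'x * 3 + 1' (no parentheses)."""
    total = 0
    for term in expr.split("+"):
        product = 1
        for factor in term.split("*"):
            factor = factor.strip()
            if factor.isdigit():
                product *= int(factor)
            elif factor in env:
                product *= env[factor]
            else:
                raise NameError(f"undefined variable {factor!r}")
        total += product
    return total


def interpret(program: str) -> list[str]:
    """Run the program one line at a time, collecting printed output and the first error."""
    env: dict[str, int] = {}
    output: list[str] = []
    for lineno, line in enumerate(program.strip().splitlines(), start=1):
        line = line.strip()
        try:
            if line.startswith("print "):
                output.append(str(evaluate(line[len("print "):], env)))
            else:
                name, expr = line.split("=", 1)  # ValueError if there is no '='
                env[name.strip()] = evaluate(expr, env)
        except (NameError, ValueError) as err:
            # The error surfaces only when this line is reached at run time.
            output.append(f"error on line {lineno}: {err}")
            break
    return output


print(interpret("x = 2\nprint x * 3\nprint y\nprint x"))
```

Here line 2 has already printed `6` when line 3 fails on the undefined `y`; line 4 is never executed.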
Example 2.1:
Java language processors combine compilation and
interpretation.
A Java source program may first be compiled into an
intermediate form called bytecodes.
The bytecodes are then interpreted by a virtual machine.
A benefit of this arrangement is that bytecodes compiled on
one machine can be interpreted on another machine,
perhaps across a network.
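To make "bytecodes interpreted by a virtual machine" concrete, here is a minimal stack-based virtual machine. The instruction names (`PUSH`, `LOAD`, `STORE`, `ADD`, `MUL`) are hypothetical and much simpler than real JVM bytecodes; the point is only that the same instruction list can run on any machine that has this interpreter.

```python
def run(bytecode: list[tuple], env: dict[str, int]) -> dict[str, int]:
    """Interpret stack-machine bytecode, updating and returning the variable environment."""
    stack: list[int] = []
    for op, *args in bytecode:
        if op == "PUSH":        # push a constant
            stack.append(args[0])
        elif op == "LOAD":      # push the value of a variable
            stack.append(env[args[0]])
        elif op == "STORE":     # pop the top of the stack into a variable
            env[args[0]] = stack.pop()
        elif op in ("ADD", "MUL"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return env


# Bytecode that a hypothetical compiler might emit for: a = b + c * 2;
program = [("LOAD", "b"), ("LOAD", "c"), ("PUSH", 2), ("MUL",), ("ADD",), ("STORE", "a")]
print(run(program, {"b": 1, "c": 5})["a"])  # 11
```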
In order to achieve faster processing of inputs to outputs, some Java
compilers, called just-in-time compilers, translate the bytecodes into machine
language immediately before they run the intermediate program to process the
input.
Compilers (continued)
The source language is optimized for humans: it is more
user-friendly and, to some extent, platform-independent.
Source programs are easier to read, write, and maintain, which
makes errors easier to avoid.
A program written in any language must be translated into a
form that is understood by the computer.
This form is typically known as Machine Language (ML)
or Machine Code, or Object Code.
– a = b + c * d;
– the output corresponding to this input might look something like this:
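One way a compiler could produce such output is sketched below. It assumes the statement has already been parsed into a nested-tuple syntax tree, an unlimited supply of registers, and a hypothetical two-address register machine with instructions `LOD`, `ADD`, `MUL`, and `STO`, where `MUL R2,R3` means R2 ← R2 * R3. Real instruction sets and register allocation differ.

```python
from itertools import count
from typing import Iterator

OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def gen(node, code: list[str], registers: Iterator[int]) -> str:
    """Emit code for `node` and return the register that holds its value."""
    if isinstance(node, str):                 # a variable: load it into a fresh register
        reg = f"R{next(registers)}"
        code.append(f"LOD {reg},{node}")
        return reg
    op, left, right = node
    left_reg = gen(left, code, registers)     # operands are evaluated left to right
    right_reg = gen(right, code, registers)
    code.append(f"{OPCODES[op]} {left_reg},{right_reg}")  # result overwrites left_reg
    return left_reg


def compile_assignment(target: str, expr) -> list[str]:
    code: list[str] = []
    result = gen(expr, code, count(1))
    code.append(f"STO {result},{target}")
    return code


# The tree already encodes precedence: c * d is nested inside the addition.
for line in compile_assignment("a", ("+", "b", ("*", "c", "d"))):
    print(line)
```

The output is `LOD R1,b`, `LOD R2,c`, `LOD R3,d`, `MUL R2,R3`, `ADD R1,R2`, `STO R1,a`.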
Sample Problem 2.2
Show the output of a Java native code compiler, in
any typical assembly language, for the following
Java input string: while (x<a+b) x = 2*x;
Compilers vs. Interpreters
Figure 2.4: A compiler and an interpreter produce very different output for the
same input.
Sample Problem 2.3
Show the compiler output and the interpreter output for the
following Java source code:
for (i=1; i<=4; i++) System.out.println (i*3);
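The contrast asked for in Sample Problem 2.3 can be sketched as two functions under simplifying assumptions: `interpret_loop` produces the values the program prints, while `compile_loop` produces a target program in a hypothetical assembly language (including a hypothetical `PRINT` run-time routine) and prints nothing itself. Both names are illustrative, and the assembly is one plausible translation among many.

```python
def interpret_loop(start: int, stop: int) -> list[int]:
    """Interpreter view: run for (i=start; i<=stop; i++) println(i*3) and collect what it prints."""
    return [i * 3 for i in range(start, stop + 1)]


def compile_loop(start: int, stop: int) -> list[str]:
    """Compiler view: translate the same loop into hypothetical assembly without running it."""
    lines = [
        ("", f"MOV i,={start}"),
        ("L1:", f"CMP i,={stop}"),
        ("", "JGT L2"),       # leave the loop once i > stop
        ("", "LOD R1,i"),
        ("", "MUL R1,=3"),
        ("", "PRINT R1"),     # hypothetical run-time output routine
        ("", "ADD i,=1"),
        ("", "JMP L1"),
        ("L2:", ""),
    ]
    return [f"{label:<4}{instruction}".rstrip() for label, instruction in lines]


print(interpret_loop(1, 4))           # [3, 6, 9, 12]
print("\n".join(compile_loop(1, 4)))
```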
Exercise
Show assembly language for a machine of your choice,
corresponding to each of the following Java statements:
1. a = b + c;
2. a = (b+c) * (c-d);
Big C notation for compilers
It is important to remember that a compiler is a
program, and it must be written in some language
(machine, assembly, high-level).
In describing this program, we are dealing with three
languages:
1) The source language, i.e. the input to the compiler,
2) The object language, i.e. the output of the compiler,
3) The language in which the compiler is written
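The three languages can be recorded as a small data type. The sketch below, with hypothetical names `Compiler` and `translate_compiler`, also shows a rule that follows from the definition: running a compiler written in language L through another compiler that translates L into M yields a compiler with the same source and object languages, now written in M. The example languages are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compiler:
    source: str      # 1) language the compiler accepts as input
    target: str      # 2) object language it produces
    written_in: str  # 3) language the compiler itself is written in


def translate_compiler(program: Compiler, translator: Compiler) -> Compiler:
    """Compile the compiler `program` with `translator`; only its implementation language changes."""
    if program.written_in != translator.source:
        raise ValueError(
            f"a {translator.source} compiler cannot translate a program written in {program.written_in}"
        )
    return Compiler(program.source, program.target, translator.target)


java_compiler_in_c = Compiler(source="Java", target="x86", written_in="C")
c_compiler_on_x86 = Compiler(source="C", target="x86", written_in="x86")
print(translate_compiler(java_compiler_in_c, c_compiler_on_x86))
```

This prints `Compiler(source='Java', target='x86', written_in='x86')`: a Java compiler that now runs directly on x86.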
Exercise
Using the big C notation, show each of the
following compilers:
1) An Ada compiler which runs on the PC and compiles
to the PC machine language.
Assemblers
Assemblers translate programs written in assembly
language into machine code.
Assembly language is called a low-level language
because there is a one-to-one correspondence between
assembly language statements and machine language
statements.
It is a symbolic form of machine language, which makes it easy to
translate.
A compiler may generate assembly language as its target
language, which the assembler then translates into object code.
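Because each assembly statement corresponds to exactly one machine instruction, a basic assembler is mostly table lookup. The sketch assumes a hypothetical accumulator machine whose 16-bit instruction word holds a 4-bit opcode and a 12-bit address; the mnemonics, opcode values, and symbol addresses are invented for illustration.

```python
OPCODES = {"LOD": 0x1, "STO": 0x2, "ADD": 0x3, "MUL": 0x4}  # hypothetical encoding


def assemble(lines: list[str], symbols: dict[str, int]) -> list[int]:
    """Translate 'MNEMONIC operand' statements one-to-one into 16-bit machine words."""
    words = []
    for line in lines:
        mnemonic, operand = line.split()
        address = symbols[operand]
        if not 0 <= address < 0x1000:
            raise ValueError(f"address of {operand!r} does not fit in 12 bits")
        words.append((OPCODES[mnemonic] << 12) | address)  # opcode in the top 4 bits
    return words


machine_code = assemble(["LOD b", "ADD c", "STO a"], {"a": 0x100, "b": 0x101, "c": 0x102})
print([f"{word:04X}" for word in machine_code])  # ['1101', '3102', '2100']
```

Three statements in, three machine words out: the one-to-one correspondence described above.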
The assembler was essentially the first interface that allowed
humans to communicate with the machine.
Linker
A linker takes the object code
generated by the compiler and links it with
libraries and other object code to create the
final executable program.
It resolves references to external symbols and
creates the complete executable.
Common examples include the GNU linker
(ld) and the Microsoft Visual C++ linker.
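Symbol resolution, the central linker step described above, can be sketched in two passes: lay the object modules out one after another and record the absolute address of every symbol they define, then check that every external reference now has an address. The `ObjectModule` fields and the module names are simplified assumptions; real object formats (such as ELF) also carry relocation entries that the linker patches into the code.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectModule:
    name: str
    size: int                                             # bytes of code in the module
    defines: dict[str, int]                               # symbol -> offset inside the module
    references: list[str] = field(default_factory=list)  # external symbols it uses


def link(modules: list[ObjectModule], base: int = 0) -> dict[str, int]:
    """Return the absolute address of every symbol, or raise on a linking error."""
    symbols: dict[str, int] = {}
    address = base
    # Pass 1: place modules consecutively and record where each definition ends up.
    for module in modules:
        for name, offset in module.defines.items():
            if name in symbols:
                raise ValueError(f"duplicate symbol {name!r} in {module.name}")
            symbols[name] = address + offset
        address += module.size
    # Pass 2: every external reference must resolve to some definition.
    for module in modules:
        for name in module.references:
            if name not in symbols:
                raise ValueError(f"undefined reference to {name!r} in {module.name}")
    return symbols


main = ObjectModule("main.o", size=0x40, defines={"main": 0x0}, references=["print_int"])
runtime = ObjectModule("runtime.o", size=0x100, defines={"print_int": 0x10})
print({name: hex(addr) for name, addr in link([main, runtime], base=0x1000).items()})
```

Linking `main.o` alone would fail with an "undefined reference" error, the same class of message real linkers report.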
Loader
A loader is responsible for loading an executable program into main
memory so that it can be run.
The Phases of a Compiler
The structure of a compiler includes:
Lexical analysis,
Syntax analysis,
Semantic Analysis,
Overview of compiler
1. Lexical Analysis
In a compiler, lexical analysis is also called linear
analysis or scanning.
The lexical analysis phase reads the characters in
the source program and groups them into a stream
of tokens, in which each token represents a logically
cohesive sequence of characters, such as an identifier,
a keyword, or a punctuation character.
The character sequence forming a token is called
the lexeme for the token.
NOTE: In computer science, a program that executes the process
of lexical analysis is called a scanner, tokenizer, or lexer.
Token classes include:
Keywords - while, void, if, for, ...
Identifiers - declared by the programmer
Operators - +, -, *, /, =, ==, ...
Numeric constants - numbers such as 124, 12.35, 0.09E-23,
etc.
Character constants - a character or a string of characters
enclosed in quotes
Special characters - characters used as delimiters, such as
. ( ) , ; :
Comments - ignored by subsequent phases. These must be
Roles and Responsibilities of Lexical Analyzer
It removes comments and white space from the source program.
It identifies the tokens.
It categorizes the lexical units.
Example: the tokens in Fahrenheit = centigrade * 1.8 + 32 are:
– Identifier (Fahrenheit)
– Assignment (=)
– Identifier (centigrade)
– Operator (*)
– Numeric constant (1.8)
– Operator (+)
– Numeric constant (32)
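A minimal hand-written scanner for the statement `Fahrenheit = centigrade * 1.8 + 32` can be built from one regular expression per token class. The class names, the keyword set, and the use of Python's `re` module are assumptions for illustration. The order of the alternatives matters: the first pattern that matches wins, so `==` is listed before `=`, and numbers before identifiers.

```python
import re

KEYWORDS = {"while", "void", "if", "for"}
TOKEN_SPEC = [
    ("number", r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"),  # 124, 12.35, 0.09E-23
    ("identifier", r"[A-Za-z_]\w*"),
    ("operator", r"==|[+\-*/]"),                    # '==' is tried before '=' below
    ("assignment", r"="),
    ("skip", r"\s+"),                               # white space is discarded
    ("error", r"."),                                # anything else is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> list[tuple[str, str]]:
    """Group characters into (token class, lexeme) pairs."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "skip":
            continue
        if kind == "error":
            raise SyntaxError(f"unexpected character {lexeme!r} at position {match.start()}")
        if kind == "identifier" and lexeme in KEYWORDS:
            kind = "keyword"  # keywords look like identifiers but are reserved
        tokens.append((kind, lexeme))
    return tokens


for token in tokenize("Fahrenheit = centigrade * 1.8 + 32"):
    print(token)
```

Each printed pair is a token class together with its lexeme, matching the list above.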
Sample Problem 2.4
2. Syntax Analysis
The second phase of a compiler is syntax analysis, also
known as parsing.
This phase takes the stream of tokens generated by the
lexical analysis phase and checks whether they conform to
the grammar of the programming language.
The output of this phase is usually an Abstract Syntax Tree
(AST).
It accepts tokens as input and provides a parse tree as
output.
The parser will check for proper syntax, issue appropriate error
messages, and determine the underlying structure of the source
program.
The output of this phase may be a stream of atoms or a
collection of syntax trees.
Roles and Responsibilities of Syntax Analyzer
– Acquires tokens from the lexical analyzer.
– Builds a parse tree.
– Detects syntax errors, if any.
Example: Fahrenheit = centigrade * 1.8 + 32
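A recursive-descent parser for this example can be sketched as follows. It assumes a hypothetical grammar in which `*` binds tighter than `+`, takes as input the (token class, lexeme) list a scanner would produce, and returns an abstract syntax tree as nested tuples `(operator, left, right)`. A parser for Java or C handles a far larger grammar.

```python
from typing import Optional


class Parser:
    """Recursive-descent parser for the (hypothetical) grammar
         assign -> identifier '=' expr
         expr   -> term   ('+' term)*
         term   -> factor ('*' factor)*
         factor -> identifier | number
    """

    def __init__(self, tokens: list[tuple[str, str]]):
        self.tokens = tokens
        self.pos = 0

    def peek_kind(self) -> Optional[str]:
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def peek_lexeme(self) -> Optional[str]:
        return self.tokens[self.pos][1] if self.pos < len(self.tokens) else None

    def expect(self, kind: str) -> str:
        if self.peek_kind() != kind:
            raise SyntaxError(f"expected {kind} at token {self.pos}, found {self.peek_lexeme()!r}")
        self.pos += 1
        return self.tokens[self.pos - 1][1]

    def parse_assign(self):
        target = self.expect("identifier")
        self.expect("assignment")
        tree = ("=", target, self.parse_expr())
        if self.pos != len(self.tokens):
            raise SyntaxError(f"unexpected token {self.peek_lexeme()!r}")
        return tree

    def parse_expr(self):
        node = self.parse_term()
        while self.peek_lexeme() == "+":
            self.pos += 1
            node = ("+", node, self.parse_term())  # left-associative
        return node

    def parse_term(self):
        node = self.parse_factor()
        while self.peek_lexeme() == "*":
            self.pos += 1
            node = ("*", node, self.parse_factor())
        return node

    def parse_factor(self):
        return self.expect("number" if self.peek_kind() == "number" else "identifier")


tokens = [("identifier", "Fahrenheit"), ("assignment", "="), ("identifier", "centigrade"),
          ("operator", "*"), ("number", "1.8"), ("operator", "+"), ("number", "32")]
print(Parser(tokens).parse_assign())
```

The result, `('=', 'Fahrenheit', ('+', ('*', 'centigrade', '1.8'), '32'))`, places the multiplication below the addition, which is how the tree records operator precedence.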
Semantic Analysis
This phase checks whether the code is semantically
correct, i.e., whether it conforms to the language’s
type system and other semantic rules.
Roles and Responsibilities of Semantic Analyzer:
– Saves collected information in the symbol table or syntax
tree.
– Detects semantic errors.
– Reports semantic errors.
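Type checking with implicit conversion is a typical semantic check. The sketch below walks the syntax tree of `Fahrenheit = centigrade * 1.8 + 32` (as nested tuples), looks identifiers up in a symbol table modeled as a plain dictionary, reports undeclared identifiers, and wraps an integer operand in an `inttofloat` node where it meets a float. The two type names, the conversion rule, and the rejection of float-to-int assignment are assumptions of this sketch, not the rules of any particular language.

```python
class SemanticError(Exception):
    pass


def leaf_type(leaf: str, symbols: dict[str, str]) -> str:
    if leaf[0].isdigit():                       # numeric constant
        return "float" if "." in leaf else "int"
    if leaf not in symbols:
        raise SemanticError(f"undeclared identifier {leaf!r}")
    return symbols[leaf]


def check(node, symbols: dict[str, str]):
    """Return (annotated tree, type), inserting int-to-float conversions where needed."""
    if isinstance(node, str):
        return node, leaf_type(node, symbols)
    op, left, right = node
    if op == "=":
        target_type = leaf_type(left, symbols)
        value, value_type = check(right, symbols)
        if value_type == target_type:
            return (op, left, value), target_type
        if (target_type, value_type) == ("float", "int"):
            return (op, left, ("inttofloat", value)), target_type
        raise SemanticError(f"cannot assign {value_type} to {target_type} variable {left!r}")
    left, left_type = check(left, symbols)
    right, right_type = check(right, symbols)
    if left_type != right_type:                 # mixed int/float: widen the int operand
        if left_type == "int":
            left = ("inttofloat", left)
        else:
            right = ("inttofloat", right)
        return (op, left, right), "float"
    return (op, left, right), left_type


tree = ("=", "Fahrenheit", ("+", ("*", "centigrade", "1.8"), "32"))
print(check(tree, {"Fahrenheit": "float", "centigrade": "float"})[0])
```

The integer constant `32` comes back wrapped as `('inttofloat', '32')`, while an undeclared identifier raises `SemanticError`.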
Intermediate Code Generation
This phase generates an intermediate representation of the source code that
can be easily translated into machine code.
Intermediate code is the middle-level code that a compiler generates
while translating a source program into object code.
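Three-address code is one common intermediate form. The sketch below assumes the nested-tuple syntax tree of the running example `Fahrenheit = centigrade * 1.8 + 32` and introduces a fresh temporary (`t1`, `t2`, ...) for each operator node, so every instruction has at most one operator. The function name is hypothetical.

```python
from itertools import count


def to_three_address(tree) -> list[str]:
    """Translate ('=', target, expr) into three-address instructions."""
    code: list[str] = []
    temps = count(1)

    def gen(node) -> str:
        if isinstance(node, str):          # identifier or constant: usable directly
            return node
        op, left, right = node
        left_name, right_name = gen(left), gen(right)
        temp = f"t{next(temps)}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    _, target, expr = tree
    result = gen(expr)
    code.append(f"{target} = {result}")
    return code


for instruction in to_three_address(("=", "Fahrenheit", ("+", ("*", "centigrade", "1.8"), "32"))):
    print(instruction)
```

The output is `t1 = centigrade * 1.8`, `t2 = t1 + 32`, `Fahrenheit = t2`.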
Optimization
This phase applies various optimization
techniques to the intermediate code to improve
the performance of the generated machine code.
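Constant folding, evaluating operations on constants at compile time, is one such technique. The sketch below works on a nested-tuple syntax tree and, as a simplifying assumption, folds integer constants only, since floating-point folding must reproduce the target machine's arithmetic exactly. The function names are hypothetical.

```python
import operator

FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def is_int_constant(node) -> bool:
    # A negative result such as "-3" is not recognized here, so it is simply not folded further.
    return isinstance(node, str) and node.isdigit()


def fold_constants(node):
    """Replace operator nodes whose operands are both integer constants by their value."""
    if isinstance(node, str):
        return node
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)  # fold bottom-up
    if op in FOLDABLE and is_int_constant(left) and is_int_constant(right):
        return str(FOLDABLE[op](int(left), int(right)))
    return (op, left, right)


# seconds = hours * (60 * 60)  becomes  seconds = hours * 3600
print(fold_constants(("=", "seconds", ("*", "hours", ("*", "60", "60")))))
```

The multiplication `60 * 60` now costs nothing at run time.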
Principal Sources of Optimization
Code Generation
The final phase of a compiler is code generation.
This phase takes the optimized intermediate code
and generates the actual machine code that can
be executed by the target hardware.
Roles and Responsibilities:
– Translates the intermediate code into target
machine code.
– Selects memory locations and allocates registers.
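A minimal sketch of this step, assuming three-address instructions written as tuples `(dest, left, op, right)`, a hypothetical load/store instruction set (`LD`, `LDI`, `ST`, `ADD`, `MUL`, ...), and an unlimited supply of registers. Temporaries such as `t1` stay in registers and only program variables are stored back to memory; a real code generator must also cope with running out of registers.

```python
import re
from itertools import count

OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def generate(three_address: list[tuple[str, str, str, str]]) -> list[str]:
    asm: list[str] = []
    in_register: dict[str, str] = {}      # temporary -> register holding it
    fresh = (f"R{n}" for n in count(1))    # unlimited registers (assumption)

    def operand(name: str) -> str:
        if name in in_register:            # temporary already in a register
            return in_register[name]
        reg = next(fresh)
        if name[0].isdigit():
            asm.append(f"LDI {reg}, #{name}")  # load an immediate constant
        else:
            asm.append(f"LD {reg}, {name}")    # load a variable from memory
        return reg

    for dest, left, op, right in three_address:
        left_reg, right_reg = operand(left), operand(right)
        result = next(fresh)
        asm.append(f"{OPCODES[op]} {result}, {left_reg}, {right_reg}")
        if re.fullmatch(r"t\d+", dest):
            in_register[dest] = result     # keep temporaries in registers
        else:
            asm.append(f"ST {dest}, {result}")  # store program variables to memory
    return asm


for line in generate([("t1", "centigrade", "*", "1.8"), ("Fahrenheit", "t1", "+", "32")]):
    print(line)
```

Note that `t1` is never written to memory: its register `R3` is reused directly by the `ADD`.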
Phases in a Compiler
Generally, phases are divided into two parts:
1. Front End phases:
– Depend on the source language and are
independent of the target machine.
– The front end consists of lexical
analysis, syntax analysis, semantic
analysis, the symbol table, and intermediate code
generation.
– The front end also includes the
error handling that goes along
with each of these phases.
2. Back End phases:
– Depend on the target machine
and do not depend on the source
language.
– The back end includes the code optimization and code
generation phases, along with error
handling and symbol table operations.
Symbol Table
The symbol table is a central data structure of the compiler.
It stores identifiers together with their names and types.
It stores:
Information about each identifier, such as
– its type (by semantic analysis and intermediate code generation),
– its scope (by semantic analysis and intermediate code generation),
– its storage allocation (by code generation),
– for a procedure, the number and types of its arguments and the type returned.
It stores literal constants and strings.
It stores function names.
It also stores variable names and constants.
It stores labels used in the source program.
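A symbol table with nested scopes is often kept as a stack of hash tables. The sketch below uses a hypothetical `SymbolTable` class that stores arbitrary attributes per name (type, parameter types, return type, and so on). Lookup searches from the innermost scope outward, so an inner declaration hides an outer one until its scope is exited.

```python
class SymbolTable:
    """Symbol table with nested scopes, kept as a stack of dictionaries (one per scope)."""

    def __init__(self) -> None:
        self.scopes: list[dict[str, dict]] = [{}]  # scopes[0] is the global scope

    def enter_scope(self) -> None:
        self.scopes.append({})

    def exit_scope(self) -> None:
        if len(self.scopes) == 1:
            raise RuntimeError("cannot exit the global scope")
        self.scopes.pop()

    def declare(self, name: str, **attributes) -> None:
        current = self.scopes[-1]
        if name in current:
            raise ValueError(f"{name!r} is already declared in this scope")
        current[name] = {"scope_level": len(self.scopes) - 1, **attributes}

    def lookup(self, name: str):
        """Search from the innermost scope outward; return None if undeclared."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.declare("x", type="int")
table.declare("area", type="function", params=["float"], returns="float")
table.enter_scope()
table.declare("x", type="float")      # inner x hides the global x
print(table.lookup("x"))              # {'scope_level': 1, 'type': 'float'}
table.exit_scope()
print(table.lookup("x"))              # {'scope_level': 0, 'type': 'int'}
```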
Error Detecting and Reporting
Each phase can encounter errors.
The lexical phase detects input characters that do not
form a token.
The syntax phase detects token sequences that violate
the syntax rules.
The semantic phase detects constructs that are syntactically
correct but have no meaning for the operation involved.
Error handling in Compilers
Compile time errors
Syntax errors are reported
Other kinds of errors not
Compiler writing tools
The compiler writer can use some specialized tools that
help in implementing various phases of a compiler.
These tools assist in the creation of an entire compiler or
its parts.
Some commonly used compiler construction tools include:
– Parser Generators
– Scanner Generators
– Syntax-directed translation engines
– Automatic code generators
– Data-flow analysis engines
– Compiler Construction toolkits
Parser Generators
Input : Grammatical description of a programming language
Output : Syntax analyzers.
These produce syntax analyzers, normally from input that is
based on a context-free grammar.
With such a generator, syntax analysis is one of the easiest phases to implement.
– Examples: YACC, Bison
Scanner Generators
Input : Regular expression description of the tokens of a
language
Output : Lexical analyzers.
These automatically generate lexical analyzers,
normally from a specification based on regular
expressions.
The basic organization of the resulting lexical analyzer