Chapter 1 Introduction
(SENG471)
Preliminaries Required
Textbook:
Addison-Wesley, 2007.
Objective
At the end of this session students will be able to:
Introduction
Definition
Compiler is an executable program that can read a program in one high-level language and
A compiler is a computer program that translates an executable program in a source language into
high-level language.
Assembly language
Software Engineering
Computer Architecture
Discrete Mathematics
Why Study Theory of Compiler?
Curiosity
To apply almost all of the major computer science fields, such as: automata theory,
Link/Load.
The growth of portable computers has created a market for minimizing the
Contd.
source program
-> preprocessor -> modified source program
-> compiler -> target assembly program
-> assembler
-> linker/loader (with library files) -> target machine code
C. Linker: a program that takes one or more object files generated by a
compiler and combines them into a single executable program.
D. Loader: the part of an operating system that is responsible for loading
programs from executable files into memory, preparing them for execution,
and then executing them.
Compiler vs. Interpreter
Ideal concept:
Compiler: Source code -> Compiler -> Executable
          Input data -> Executable -> Output data
Interpreter: Source code + Input data -> Interpreter -> Output data
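The difference between the two models can be sketched in a few lines of Python. This is only an analogy under stated assumptions: the "source code" is a small expression tree, the "executable" is a Python closure rather than real machine code, and the names `interpret` and `compile_tree` are hypothetical.

```python
def interpret(tree, env):
    """Interpreter model: source and input data are processed together."""
    if isinstance(tree, (int, float)):
        return tree
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    left_value = interpret(left, env)
    right_value = interpret(right, env)
    if op == "+":
        return left_value + right_value
    if op == "*":
        return left_value * right_value
    raise ValueError(f"unknown operator {op!r}")


def compile_tree(tree):
    """Compiler model: translate the source once, run the result many times."""
    if isinstance(tree, (int, float)):
        return lambda env: tree
    if isinstance(tree, str):
        return lambda env: env[tree]
    op, left, right = tree
    left_code = compile_tree(left)
    right_code = compile_tree(right)
    if op == "+":
        return lambda env: left_code(env) + right_code(env)
    if op == "*":
        return lambda env: left_code(env) * right_code(env)
    raise ValueError(f"unknown operator {op!r}")


source = ("+", "initial", ("*", "rate", 60))
data = {"initial": 10, "rate": 2}

executable = compile_tree(source)   # translation happens here, without input data
print(executable(data))             # the "executable" consumes the input data
print(interpret(source, data))      # the interpreter needs source and data together
```

Both calls compute the same value; the compiled form simply avoids re-examining the source tree on every run.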
Most languages are usually thought of as using either one or the other:
constituent pieces
3. Semantic Analysis Creates intermediate representation of
source program
General Structure of a Compiler
Stream of characters
-> scanner -> Stream of tokens
-> parser -> Parse/syntax tree
-> Semantic analyzer -> Annotated tree
-> Intermediate code generator -> Intermediate code
-> Code optimization -> Intermediate code
-> Code generator -> Target code
-> Code optimization -> Target code
Phase I: Lexical Analysis
The low-level text processing portion of the compiler
The source file, a stream of characters, is broken into larger chunks called
tokens.
For example:
void main()
{
int x;
x=3;
}
It will be broken into 13 tokens as below:
void main ( ) { int x ; x = 3 ; }
The lexical analyzer (scanner) reads a stream of characters and puts them
together into some meaningful (with respect to the source language) units
called tokens.
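A minimal scanner sketch for the example above, assuming a regular-expression based approach. The token category names (`KEYWORD`, `IDENTIFIER`, and so on) and the function name `tokenize` are illustrative assumptions, not part of the course material.

```python
import re

# Hypothetical token categories for the tiny C-like example.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/|//[^\n]*"),
    ("WHITESPACE", r"\s+"),
    ("KEYWORD", r"\b(?:void|int)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("NUMBER", r"\d+"),
    ("SYMBOL", r"[(){};=]"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(source):
    """Group a stream of characters into (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind = match.lastgroup
        # Whitespace and comments carry no meaning for later phases.
        if kind not in ("WHITESPACE", "COMMENT"):
            tokens.append((kind, match.group()))
        pos = match.end()
    return tokens


program = "void main()\n{\n  int x;  // a declaration\n  x=3;\n}\n"
tokens = tokenize(program)
print(len(tokens))
print(" ".join(text for _, text in tokens))
```

Running it on the example program yields the same 13 tokens listed above.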
Typically, spaces, tabs, end-of-line characters and comments are ignored by
the lexical analyzer.
Phase II: Syntax Analysis
Constructed by repeated application of rules in Context Free Grammar (CFG)
Syntax structures are analyzed by DPDA (Deterministic Push Down Automata)
Example: parse tree for position:=initial + rate*60
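The tree for this statement can be sketched with a small hand-written recursive-descent parser. This swaps in a simpler technique than a table-driven pushdown automaton, and it builds a compact operator tree rather than a full parse tree with every grammar rule shown. The grammar (assignment, expression, term, factor) and the class name `Parser` are assumptions made for illustration.

```python
import re


def tokenize(text):
    # Identifiers, integer literals, ':=' and single-character operators.
    # Characters that match none of these are silently skipped in this sketch.
    return re.findall(r"[A-Za-z_]\w*|\d+|:=|[+*()]", text)


class Parser:
    """Assumed grammar:
    assignment -> identifier ':=' expression
    expression -> term { '+' term }
    term       -> factor { '*' factor }
    factor     -> identifier | number | '(' expression ')'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, predicate, description):
        token = self.peek()
        if token is None or not predicate(token):
            raise SyntaxError(f"expected {description}, found {token!r}")
        self.pos += 1
        return token

    def assignment(self):
        target = self.expect(str.isidentifier, "identifier")
        self.expect(lambda t: t == ":=", "':='")
        value = self.expression()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return (":=", target, value)

    def expression(self):
        node = self.term()
        while self.peek() == "+":
            self.pos += 1
            node = ("+", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "*":
            self.pos += 1
            node = ("*", node, self.factor())
        return node

    def factor(self):
        if self.peek() == "(":
            self.pos += 1
            node = self.expression()
            self.expect(lambda t: t == ")", "')'")
            return node
        return self.expect(lambda t: t.isidentifier() or t.isdigit(),
                           "identifier or number")


tree = Parser(tokenize("position := initial + rate*60")).assignment()
print(tree)
```

Because `term` is called from `expression`, the multiplication ends up deeper in the tree than the addition, which is how the grammar encodes operator precedence.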
Phase III: Semantic Analysis
It gets the parse tree from the parser together with information about some
syntactic elements
It determines whether the semantics (meaning) of the program are correct.
It detects errors in the program, such as using variables before they are
declared or assigning an integer value to a Boolean variable.
This part deals with static semantics:
the semantics of programs that can be checked by reading the program
text only;
the rules of the language that cannot be described by a context-free
grammar.
Mostly, a semantic analyzer does type checking (i.e., it gathers type information
for subsequent code generation).
It modifies the parse tree in order to obtain (statically) semantically correct code.
Contd.
The main tool used by the semantic analyzer is a symbol table
Symbol table: a data structure with a record for each identifier and its
attributes
Attributes include storage allocation, type, scope, etc.
All the compiler phases insert into and modify the symbol table
Discovery of meaning in a program using the symbol table
Perform static semantic checks
Simplify the structure of the parse tree (from parse tree to abstract syntax tree
(AST))
Static semantic checks:
Making sure identifiers are declared before use
Type checking for assignments and operators
Checking types and number of parameters to subroutines
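Two of these checks, declaration before use and type checking for assignments and operators, can be sketched with a dictionary-backed symbol table. The class and function names (`SymbolTable`, `type_of`, `check_assignment`) and the two-type system (`int`, `bool`) are assumptions chosen to keep the example small.

```python
class SemanticError(Exception):
    pass


class SymbolTable:
    """One record per identifier, holding its attributes."""

    def __init__(self):
        self.records = {}

    def declare(self, name, type_name, scope="global"):
        if name in self.records:
            raise SemanticError(f"'{name}' is already declared")
        self.records[name] = {"type": type_name, "scope": scope}

    def lookup(self, name):
        if name not in self.records:
            raise SemanticError(f"'{name}' is used before it is declared")
        return self.records[name]


def type_of(node, table):
    """Type of a literal, an identifier (str), or an (op, left, right) node."""
    if isinstance(node, bool):  # test bool first: bool is a subclass of int
        return "bool"
    if isinstance(node, int):
        return "int"
    if isinstance(node, str):
        return table.lookup(node)["type"]
    op, left, right = node
    left_type = type_of(left, table)
    right_type = type_of(right, table)
    if op in ("+", "*"):
        if left_type != "int" or right_type != "int":
            raise SemanticError(f"operator '{op}' needs int operands")
        return "int"
    raise SemanticError(f"unknown operator '{op}'")


def check_assignment(target, value, table):
    target_type = table.lookup(target)["type"]
    value_type = type_of(value, table)
    if target_type != value_type:
        raise SemanticError(
            f"cannot assign {value_type} to {target_type} variable '{target}'")


table = SymbolTable()
table.declare("x", "int")
table.declare("flag", "bool")

check_assignment("x", ("+", "x", 3), table)  # passes silently
for target, value in [("flag", 3), ("y", 1)]:
    try:
        check_assignment(target, value, table)
    except SemanticError as error:
        print("semantic error:", error)
```

Neither error could be caught by a context-free grammar alone: both statements are syntactically valid assignments.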
Phase IV: Intermediate Code Generation
An intermediate code generator
In some compilers, a source program is translated into an intermediate code
first and then the intermediate code is translated into the target language.
In other compilers, a source program is translated directly into the target
language.
The compiler makes a second pass over the parse tree to produce the translated
code.
If there are no compile-time errors, the semantic analyzer translates the
abstract syntax tree into the abstract assembly tree.
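As a rough illustration of a tree-walking translation, the sketch below lowers an abstract syntax tree into three-address code. Three-address code is a commonly used intermediate form substituted here for the abstract assembly tree, which these slides do not define in detail. The function name `generate_tac` and the temporary names `t1`, `t2` are assumptions.

```python
def generate_tac(tree):
    """Translate (':=', target, expression) into a list of three-address instructions."""
    code = []
    counter = 0

    def emit_expr(node):
        nonlocal counter
        if isinstance(node, str):  # identifier or literal: already an operand
            return node
        op, left, right = node
        left_operand = emit_expr(left)
        right_operand = emit_expr(right)
        counter += 1
        temp = f"t{counter}"  # fresh temporary for this sub-expression
        code.append(f"{temp} = {left_operand} {op} {right_operand}")
        return temp

    op, target, value = tree
    if op != ":=":
        raise ValueError("expected an assignment at the root of the tree")
    code.append(f"{target} = {emit_expr(value)}")
    return code


tree = (":=", "position", ("+", "initial", ("*", "rate", "60")))
for line in generate_tac(tree):
    print(line)
# t1 = rate * 60
# t2 = initial + t1
# position = t2
```

Each instruction has at most one operator, so the nesting of the tree is replaced by an explicit order of evaluation.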
Phase V: Assembly Code Generation
The code generator converts the abstract assembly tree into the actual assembly
code
To do code generation
The generator covers the abstract assembly tree with tiles (each tile
Output the actual assembly code associated with the tiles that we used to cover
the tree
Phase VI: Machine Code Generation and Linking
The final phase of compilation converts the assembly code into machine code
instructions.
Sometimes called code improvement.
Code optimization can be done:
after semantic analysis
performed on the parse tree
after intermediate code generation
performed on the intermediate code
after code generation
performed on the target code
The compiler looks at a very small block of instructions and tries to determine
1. Constant evaluation
2. Strength reduction
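The two local optimizations named above can be sketched on a small block of three-address instructions. The instruction format `(dest, left, op, right)`, the function name `optimize_block`, and the choice of rewriting `x * 2` as `x + x` are assumptions for illustration; a real optimizer handles many more cases.

```python
import operator

FOLDABLE = {"+": operator.add, "*": operator.mul}


def optimize_block(instructions):
    """Apply constant evaluation and strength reduction to a block of
    (dest, left, op, right) instructions. A copy is (dest, value, None, None)."""
    result = []
    for dest, left, op, right in instructions:
        if op in FOLDABLE and isinstance(left, int) and isinstance(right, int):
            # Constant evaluation: compute the value at compile time.
            result.append((dest, FOLDABLE[op](left, right), None, None))
        elif op == "*" and right == 2:
            # Strength reduction: replace a multiplication with a cheaper addition.
            result.append((dest, left, "+", left))
        else:
            result.append((dest, left, op, right))
    return result


block = [
    ("t1", 4, "*", 15),
    ("t2", "rate", "*", 2),
    ("t3", "t1", "+", "t2"),
]
for instruction in optimize_block(block):
    print(instruction)
# ('t1', 60, None, None)
# ('t2', 'rate', '+', 'rate')
# ('t3', 't1', '+', 't2')
```

Both rewrites look only at one instruction at a time, which is what makes them local.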
Global Optimization
The compiler looks at large segments of the program to decide how to improve
performance
Much more difficult; usually omitted from all but the most sophisticated and
The Phases of a Compiler
Phase: Programmer (source code producer)
Output: Source string
Sample: A=B+C;

Phase: Scanner (performs lexical analysis)
Output: Token string, and symbol table with names
Sample: ‘A’, ‘=’, ‘B’, ‘+’, ‘C’, ‘;’

Phase: Parser (performs syntax analysis based on the grammar of the programming language)
Output: Parse tree or abstract syntax tree
Sample:
    ;
    |
    =
   / \
  A   +
     / \
    B   C

Phase: Semantic analyzer (type checking, etc.)
Output: Annotated parse tree or abstract syntax tree
Compiler Construction Tools
Software development tools are available to implement one or more compiler phases:
Scanner generators
Parser generators
Syntax-directed translation engines
Automatic code generators
Data flow engines
Other compiler tools:
JavaCC, a parser generator for Java, including scanner generator and parser
generator. Input specifications are different from those suitable for Lex/YACC.
Also, unlike YACC, JavaCC generates a top-down parser.
ANTLR, a set of language translation tools (formerly PCCTS). Includes
scanner/parser generators for C, C++, and Java.