
Introduction Compiler

The document discusses compilers and interpreters. It explains that a compiler translates source code into machine code that can be executed, while an interpreter executes source code directly without compiling. The compilation process involves an analysis phase that checks for errors, and a synthesis phase that generates the target code. Key phases of a compiler include lexical analysis, syntax analysis, and semantic analysis in the front-end, which generate an intermediate representation, and code generation in the back-end.

Uploaded by

Harshit Singh

© All Rights Reserved

Introduction to Compiler Construction

Fall 2023
Compiler
• A given source language is either compiled or
interpreted for execution.
• A compiler is a program that translates a source
program written in a high-level language (HLL; e.g. C, Java)
into target code: relocatable machine code or assembly code.
– The generated machine code can later be
executed many times, against different data
each time.
– The generated code is not portable to other
systems.
Interpreter

 In an interpreted language, implementations
execute instructions directly, without previously
compiling the program into machine-code
instructions.
 Translation occurs at the same time as the
program is being executed.
 An interpreter reads an executable source
program written in an HLL, as well as data for this
program, and runs the program against the
data to produce results.
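As a minimal sketch of this translate-at-runtime behavior — a hypothetical one-statement-per-line toy language, with Python's `eval` standing in for the translate-and-execute step — an interpreter can read statements and run them against input data immediately:

```python
# Minimal interpreter sketch: each statement is translated and executed on
# the spot; no machine code is ever produced. The toy language here is an
# assumption for illustration, not a real tool.
def interpret(program, data):
    env = dict(data)                      # input data for the program
    results = []
    for line in program.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("print "):
            results.append(eval(line[6:], {}, env))   # translate + execute now
        else:
            name, expr = line.split("=", 1)
            env[name.strip()] = eval(expr, {}, env)
    return results

print(interpret("c = a + b * 5\nprint c", {"a": 1, "b": 2}))  # [11]
```

Note that the source is re-translated on every run, which is exactly the efficiency cost mentioned below.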

Interpreter
 Common interpreters include the Perl, Python, and
Ruby interpreters, which execute Perl, Python,
and Ruby code respectively.
 Others include the Unix shell interpreter, which
runs operating system commands interactively.
 The source program is interpreted every time it is
executed (less efficient).

Interpreter
 Interpreted languages are portable, since they
are not machine dependent; they can run on
different operating systems and platforms.
 They are translated on the spot and are thus
optimized for the system on which they are
being run.

Phases
Compilers and Interpreters

• “Compilation”
– Translation of a program written in a source
language into a semantically equivalent
program written in a target language
Source Program → Compiler → Target Program
(the compiler reports error messages; the target program reads Input and produces Output)


Compilers and Interpreters (cont’d)

• “Interpretation”
– Performing the operations implied by the
source program

Source Program + Input → Interpreter → Output
(the interpreter reports error messages)
The Analysis-Synthesis Model of
Compilation
• There are two parts to compilation:
– Analysis Phase
Also known as the front-end of the compiler. It
reads the source program, divides it into its core parts, and
then checks for lexical, grammar, and syntax errors. The
analysis phase generates an intermediate representation
of the source program and a symbol table, which are
fed to the synthesis phase as input.
– Synthesis Phase
Also known as the back-end of the compiler.
It generates the target program with the help of the
intermediate code representation and the symbol
table.
Other Tools that Use the
Analysis-Synthesis Model
• Editors (syntax highlighting)
• Pretty printers (e.g. Doxygen)
• Static checkers (e.g. Lint and Splint)
• Interpreters
• Text formatters (e.g. TeX and LaTeX)
• Silicon compilers (e.g. VHDL)
• Query interpreters/compilers (Databases)
Preprocessors, Compilers, Assemblers and
Linkers
• A preprocessor, considered part of the compiler, is a
tool that produces input for compilers. It handles
macro processing, file inclusion, language
extensions, etc.
• Assembler
An assembler translates assembly language programs
into machine code. The output of an assembler is called
an object file, which contains a combination of
machine instructions as well as the data required to
place these instructions in memory.
Preprocessors, Compilers, Assemblers and
Linkers
• Linker
A computer program that links and merges various
object files together in order to make an executable
file.

All these object files may have been produced by
separate assemblers. The major task of a linker is to
search for and locate the referenced modules/routines
in a program and to determine the memory locations
where this code will be loaded, so that program
instructions have absolute references.
Compiler Design - Architecture of a
Compiler
• A compiler can have many phases and passes.
• Pass : A pass refers to the traversal of a compiler
through the entire program.
• Phase : A phase of a compiler is a distinguishable
stage, which takes input from the previous stage,
processes and yields output that can be used as input
for the next stage. A pass can have more than one
phase.
Phases of a Compiler
• The compilation process is a sequence of various
phases.
• Each phase takes input from the previous stage,
has its own representation of the source program, and
feeds its output to the next phase of the compiler.
Traditional three pass compiler

Source code → Front end → IR → Middle end → IR → Back end → Machine code
(errors reported from each stage)
Phases of a Compiler - Front end
 The front end analyzes the source code to
build an internal representation of the
program, called the intermediate
representation (IR).
 It also manages the symbol table, a data
structure mapping each symbol in the source
code to associated information such as
location, type and scope.
Phases of a Compiler - Front end cont’d

 The front end includes all analysis phases and
the intermediate code generator.
• Lexical analysis is the first phase of the compiler;
it is also termed scanning.
• During this phase, the source program is scanned:
the stream of characters is read, and the characters
are grouped into sequences called lexemes,
which produce tokens as output. Tokens are defined
by regular expressions, which are understood by the
lexical analyzer.
Lexical Analysis

lexical analysis: The process of converting a sequence
of characters (such as in a computer program) into a
sequence of tokens (strings with an identified
"meaning")

Lexical analysis takes the modified source code from
language preprocessors, written in the form of
sentences.

The lexical analyzer breaks these sentences into a
series of tokens, removing any whitespace and
comments in the source code.
Lexical Analysis

The lexical analyzer (either generated automatically
by a tool like lex, or hand-crafted) reads in a stream
of characters, identifies the lexemes in the stream, and
categorizes them into tokens.

This is called "tokenizing". If the lexer finds an
invalid token, it will report an error.
Front end: Terminologies
• Token: A token is a sequence of characters
that represents a lexical unit matching a
pattern, such as keywords, operators,
identifiers, etc.
• Lexeme: A lexeme is an instance of a token,
i.e., a group of characters forming a token.
• Pattern: A pattern describes the rule that the
lexemes of a token take. It is the structure
that must be matched by strings.
Token and Lexeme
 Once a token is generated, a corresponding
entry is made in the symbol table.
At the lexical analysis phase:
Input: stream of characters
Output: tokens
Token template:
<token-name, attribute-value>
 For example, for c=a+b*5; we get
<id, 1><=><id, 2><+><id, 3><*><5>
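The example above can be sketched as a small scanner. The regular expressions and the symbol-table numbering scheme below are assumptions for illustration, not a fixed standard:

```python
import re

# Lexeme patterns (assumed): identifiers, numbers, single-character operators.
TOKEN_SPEC = [
    ("id",  r"[A-Za-z_]\w*"),
    ("num", r"\d+"),
    ("op",  r"[=+\-*/;]"),
    ("ws",  r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    symtab, tokens = {}, []
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "ws":
            continue                          # whitespace is discarded
        if kind == "id":
            index = symtab.setdefault(lexeme, len(symtab) + 1)
            tokens.append(("id", index))      # <id, symbol-table entry>
        else:
            tokens.append((lexeme,))          # the lexeme is its own token
    return tokens, symtab

tokens, symtab = tokenize("c=a+b*5;")
# tokens = [('id',1), ('=',), ('id',2), ('+',), ('id',3), ('*',), ('5',), (';',)]
# symtab = {'c': 1, 'a': 2, 'b': 3}
```

Each identifier gets a symbol-table index the first time it is seen, which is exactly the `<id, 1>`-style attribute value on the slide.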
Token and Lexeme Cont’d
Syntax Analysis
 The syntax analyzer is sometimes called the
parser. It constructs the parse tree: it takes the
tokens one by one and uses a context-free
grammar to construct the tree.
Why Grammar ?
 The rules of a programming language can be
entirely represented by a few productions. Using
these productions we can represent what the
program actually is. The input has to be
checked to see whether it is in the desired
format or not.
Syntax Analysis cont’d

 A syntax error can be detected at this level if
the input is not in accordance with the
grammar.
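Such a check can be sketched as a recursive-descent parser over the token stream. The toy grammar — expr → term (('+'|'-') term)*, term → factor (('*'|'/') factor)*, factor → id | num — and the token shapes are assumptions for illustration:

```python
# Recursive-descent syntax check: accepts token sequences derivable from the
# toy grammar above and raises SyntaxError otherwise.
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat():
        nonlocal pos
        pos += 1

    def factor():
        tok = peek()
        if tok is not None and (tok.isidentifier() or tok.isdigit()):
            eat()
        else:
            raise SyntaxError(f"expected id or num at position {pos}, got {tok!r}")

    def term():
        factor()
        while peek() in ("*", "/"):
            eat()
            factor()

    def expr():
        term()
        while peek() in ("+", "-"):
            eat()
            term()

    expr()
    if pos != len(tokens):
        raise SyntaxError(f"trailing input at position {pos}")
    return True

print(parse(["a", "+", "b", "*", "5"]))   # True: derivable from the grammar
# parse(["a", "+", "*"]) raises SyntaxError: not in accordance with the grammar
```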
Syntactic Analysis

Parsing or syntactic analysis is the process of
analyzing a string of symbols, either in natural
language or in computer languages,
conforming to the rules of a formal grammar

Parse: to analyze (a string or text) into logical
syntactic components, typically in order to test
conformity to a formal grammar.
Syntactic Analysis cont’d

If the lexical analyzer finds a token invalid, it
generates an error.

The lexical analyzer works closely with the
syntax analyzer: it reads character streams
from the source code, checks for legal tokens,
and passes the data to the syntax analyzer
on demand.
Semantic Analysis

 The semantic analyzer takes the output of the
syntax analyzer and produces another tree.
 Similarly, the intermediate code generator takes
the tree produced by the semantic analyzer as
input and produces intermediate code.
Semantic Analyzer
Semantic Analysis cont’d
The syntax tree is a compressed representation of
the parse tree (the hierarchical structure that
represents the derivation of the grammar used to
obtain the input string), in which operators
appear as interior nodes and the operands of an
operator are the children of that operator's
node.

Example of syntax tree
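For the earlier example a + b * 5, the syntax tree can be sketched with plain tuples — operators as interior nodes, operands as leaves. The tuple layout is an assumption for illustration:

```python
# Syntax tree for a + b * 5: '*' binds tighter than '+', so the '*' subtree
# becomes the right child of '+'.
tree = ("+", "a", ("*", "b", "5"))

def evaluate(t, env):
    """Walk the tree: interior nodes are operators, leaves are operands."""
    if isinstance(t, tuple):
        op, left, right = t
        lv, rv = evaluate(left, env), evaluate(right, env)
        return lv + rv if op == "+" else lv * rv
    return env[t] if t in env else int(t)     # identifier or numeric literal

print(evaluate(tree, {"a": 1, "b": 2}))       # 11
```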
Semantic Analyzer
Semantic analysis is the third phase of the compiler.
 It checks for semantic consistency.
 Type information is gathered and stored in the
symbol table or in the syntax tree.
 It performs type checking.
 It verifies whether the parse tree is
meaningful or not, and produces a
verified parse tree.
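One such check — type checking over the tuple-shaped syntax tree used earlier — can be sketched as follows; the type names and the rule that both operands must agree are assumptions for illustration:

```python
# Semantic check: gather the type of each node and reject mixed-type arithmetic.
def check(tree, symtab):
    if isinstance(tree, tuple):                   # interior node: an operator
        op, left, right = tree
        lt, rt = check(left, symtab), check(right, symtab)
        if lt != rt:
            raise TypeError(f"type mismatch at '{op}': {lt} vs {rt}")
        return lt                                  # result type of the operator
    if tree.isdigit():
        return "int"                               # numeric literal
    return symtab[tree]                            # declared type from the table

symtab = {"a": "int", "b": "int"}
print(check(("+", "a", ("*", "b", "5")), symtab))  # int
```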
Semantic Analyzer
Front-end, Back-end division

Source code → Front end → IR → Back end → Machine code
(errors)

• The front end maps legal code into IR
• The back end maps IR onto the target machine
• Allows multiple front ends
• Multiple passes -> better code
Front end
Source code → Scanner → tokens → Parser → IR
(errors)

• Recognize legal code
• Report errors
• Produce IR
• Preliminary storage maps
Front end
Source code → Scanner → tokens → Parser → IR
(errors)

• Scanner:
– Maps characters into tokens – the basic units of syntax
• x = x + y becomes <id, x> = <id, x> + <id, y>
– Typical tokens: number, id, +, -, *, /, do, end
– Eliminates white space (tabs, blanks, comments)
• A key issue is speed, so instead of using a tool like
LEX it is sometimes necessary to write your own
scanner.
Front end
Source code → Scanner → tokens → Parser → IR
(errors)
• Parser:
– Recognize context-free syntax
– Guide context-sensitive analysis
– Construct IR
– Produce meaningful error messages
– Attempt error correction
• There are parser generators, like YACC, which
automate much of the work.
Phases of a Compiler cont’d
Middle End – The Optimizer
 The middle end performs optimizations on the
intermediate representation in order to improve the
performance and the quality of the produced
machine code.
 The middle end contains those optimizations that
are independent of the CPU architecture being
targeted.
– Effort to realize efficiency
– Can be very computationally intensive
Middle end (optimizer)
• Modern optimizers are usually built as a set
of passes
• Typical passes
– Constant propagation
– Common sub-expression elimination
– Redundant store elimination
– Dead code elimination
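Two of these passes can be sketched over a toy three-address IR; the instruction tuples and the `live` set are assumptions for illustration:

```python
# Each instruction is (dest, operands): a 1-tuple operand is a plain constant
# or copy, a 3-tuple is "arg1 op arg2".
def constant_propagation(ir):
    consts, out = {}, []
    for dest, operands in ir:
        operands = tuple(consts.get(x, x) for x in operands)
        if len(operands) == 1 and isinstance(operands[0], int):
            consts[dest] = operands[0]            # dest now has a known value
        out.append((dest, operands))
    return out

def dead_code_elimination(ir, live):
    used = set(live)                              # names needed after this block
    for _dest, operands in ir:
        used.update(x for x in operands if isinstance(x, str))
    return [(d, ops) for d, ops in ir if d in used]

ir = [
    ("t1", (5,)),              # t1 = 5
    ("t2", ("b", "*", "t1")),  # t2 = b * t1
    ("t3", (7,)),              # t3 = 7   (never used: dead)
    ("c",  ("a", "+", "t2")),  # c  = a + t2
]
opt = dead_code_elimination(constant_propagation(ir), live={"c"})
# opt = [('t2', ('b', '*', 5)), ('c', ('a', '+', 't2'))]
```

After propagation, t1's value 5 is folded into t2's operands, so both t1 and the never-used t3 are eliminated as dead stores.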
Back end
IR → Instruction selection → Register allocation → Machine code
(errors)

• Translate IR into target machine code
• Choose instructions for each IR operation
• Decide what to keep in registers at each
point
• Ensure conformance with system interfaces
Phases of a Compiler
 Back End – This is responsible for the CPU
architecture specific optimizations and for code
generation.
 Machine dependent optimizations: optimizations that
depend on the details of the CPU architecture that the
compiler targets
 Code generation. The transformed intermediate
language is translated into the output language, usually
the native machine language of the system.

Phases of a Compiler
 It also involves resource and storage decisions, such
as deciding which variables to fit into registers or
memory, and the selection and scheduling of
appropriate machine instructions along with their
associated addressing modes.
– Processor- (target-) dependent optimization

Phases of a Compiler - Instruction selection
• Instruction selection is the stage of a compiler back-
end that transforms its middle-level intermediate
representation (IR) into a low-level IR where each
operation directly corresponds to an instruction
available on the target machine.
• In a typical compiler, instruction selection precedes
both instruction scheduling and register allocation;
hence its output IR uses an unbounded set of
pseudo-registers.
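The idea can be sketched as follows; the instruction names LOAD/ADD/MUL and the vN pseudo-register naming are assumptions, not a real ISA. Each IR operation maps to one target instruction, and every result goes into a fresh pseudo-register, leaving real register assignment to a later phase:

```python
import itertools

OPCODE = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def select(ir):
    fresh = itertools.count()          # unbounded supply of pseudo-registers
    reg_of, instrs = {}, []

    def reg(name):
        """Pseudo-register holding `name`, loading it first if needed."""
        if name not in reg_of:
            reg_of[name] = f"v{next(fresh)}"
            instrs.append(f"LOAD {reg_of[name]}, {name}")
        return reg_of[name]

    for dest, (a, op, b) in ir:        # one IR op -> one target instruction
        d = f"v{next(fresh)}"
        instrs.append(f"{OPCODE[op]} {d}, {reg(a)}, {reg(b)}")
        reg_of[dest] = d
    return instrs

for line in select([("t1", ("b", "*", "5")), ("c", ("a", "+", "t1"))]):
    print(line)
# LOAD v1, b
# LOAD v2, 5
# MUL v0, v1, v2
# LOAD v4, a
# ADD v3, v4, v0
```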
Back end
IR → Instruction selection → Register allocation → Machine code
(errors)

• Have a value in a register when it is used
• Limited resources
• Optimal allocation is difficult
Intermediate Code Generation
After semantic analysis, the compiler generates an
intermediate representation of the source code for the
target machine.
– It represents a program for some abstract
machine.
– It is in between the high-level language and the
machine language.
– This intermediate code should be generated in
such a way that it makes it easier to be
translated into the target machine code.
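Sketching this for the tuple-shaped syntax tree used earlier, three-address code can be emitted in a post-order walk; the tN temporary naming is an assumption for illustration:

```python
import itertools

# Post-order walk: children first, then one three-address instruction per operator.
def gen_ir(tree, code, temps=None):
    temps = temps if temps is not None else itertools.count(1)
    if not isinstance(tree, tuple):
        return tree                      # leaf: identifier or constant
    op, left, right = tree
    l = gen_ir(left, code, temps)
    r = gen_ir(right, code, temps)
    t = f"t{next(temps)}"                # fresh temporary for this result
    code.append(f"{t} = {l} {op} {r}")
    return t

code = []
gen_ir(("+", "a", ("*", "b", "5")), code)
print(code)                              # ['t1 = b * 5', 't2 = a + t1']
```

Each instruction has at most one operator on the right-hand side, which is what makes this form easy to translate into target machine code.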
Code Optimization
 Optimization can be thought of as something that
removes unnecessary code lines and arranges the
sequence of statements in order to speed up
program execution without wasting resources
(CPU, memory).
Code Generation
• In this phase, the code generator takes the
optimized representation of the intermediate code
and maps it to the target machine language.
• The code generator translates the intermediate
code into a sequence of (generally) relocatable
machine code. This sequence of machine
instructions performs the same task as the
intermediate code would.
Symbol Table
 It is a data structure maintained throughout all
phases of a compiler.
 All identifier names, along with their types,
are stored here.
 The symbol table makes it easier for the compiler
to quickly search for an identifier record and retrieve
it. The symbol table is also used for scope
management.
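A sketch of such a table with scope management (the class and method names are assumptions for illustration): a stack of scopes, where lookup walks from the innermost scope outward:

```python
class SymbolTable:
    """Stack of scopes; each scope maps identifier names to their attributes."""

    def __init__(self):
        self.scopes = [{}]                # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def define(self, name, **attrs):      # attrs: e.g. type, location
        self.scopes[-1][name] = attrs

    def lookup(self, name):               # innermost declaration wins
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise KeyError(f"undeclared identifier: {name}")

st = SymbolTable()
st.define("a", type="int")
st.enter_scope()
st.define("a", type="float")              # shadows the outer 'a'
print(st.lookup("a")["type"])             # float
st.exit_scope()
print(st.lookup("a")["type"])             # int
```

Entering and exiting scopes in step with the parser is what lets the same name refer to different declarations in nested blocks.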
Compiler-Construction Tools
• Software development tools are available to
implement one or more compiler phases
– Scanner generators
– Parser generators
– Syntax-directed translation engines
– Automatic code generators
– Data-flow engines

Common questions

A linker contributes to the creation of an executable file by linking and merging the various object files (produced by assemblers) into a single executable. It resolves external references, assigns memory locations, and generates the absolute references necessary for the final executable. In contrast, preprocessors handle tasks like macro processing and file inclusion, compilers translate high-level code into machine code or assembly, and assemblers convert that assembly code into object files containing machine instructions. The linker ensures all components are correctly combined and all references are properly resolved to produce a functioning executable.

In compiler design, a 'pass' refers to a traversal of the entire program by the compiler to perform specific operations through various phases. Each pass may involve executing multiple phases, such as scanning, parsing, and optimization. The concept of passes influences the design of a compiler by dictating how often the source or intermediate representation is scanned and transformed before generating the final output. Multiple passes allow for more thorough analysis and optimization, leading to better generated code, but may increase complexity and resource usage.

In the analysis-synthesis model of compilation, the front-end (analysis) phase involves reading the source program, dividing it into its core parts, checking for errors, and generating an intermediate representation and a symbol table. The back-end (synthesis) phase uses this intermediate representation and symbol table to generate the target program, translating it into machine-specific instructions. While the front-end focuses on correctness and representation, the back-end focuses on optimization and efficient instruction generation for the specific target machine.

The symbol table is a crucial data structure maintained throughout the compilation process, used to store information about identifiers, including their types, locations, and scopes. Tokens generated during lexical analysis are recorded in the symbol table; during semantic analysis, it helps in verifying identifier declarations and usages, ensuring semantic consistency. The symbol table facilitates quick lookup and retrieval of identifier records across all compilation phases, playing a vital role in type checking and scope management.

Interpreted languages offer advantages in system compatibility because they are not tied to any specific machine architecture. Since interpretation occurs at runtime, the same source code can run on different platforms without modification, making interpreted languages highly portable. This offers flexibility in developing cross-platform applications, where the code needs only to meet the interpreter's requirements rather than those of the target machine. Compiled languages, in contrast, generate machine-specific code that needs recompilation for each different system.

Instruction selection in the back-end stage of a compiler affects performance by determining how intermediate representations are mapped onto the machine's instruction set. Efficient instruction selection can minimize the number of instructions, optimize instruction combinations, and enhance CPU utilization, directly affecting execution speed. Choices made during instruction selection, such as which operations to prioritize and how to manage registers, influence the effectiveness of later phases like scheduling and register allocation, ultimately leading to higher-quality machine code that executes quickly and uses processor resources effectively.

Lexical analysis is the first phase of compilation, where the source code is scanned and divided into tokens by identifying lexemes, which are sequences of characters that form a token. This process involves removing whitespace and comments from the source code and converting it into a series of tokens. Syntax analysis (or parsing) follows lexical analysis and constructs a parse tree from the tokens to check for syntactic correctness according to grammar rules. The lexical analyzer works closely with the syntax analyzer, providing tokens one by one as input.

The middle-end phase of a compiler performs several optimizations, including constant propagation, common sub-expression elimination, redundant store elimination, and dead code elimination. These optimizations are significant because they improve both the performance and the size of the generated machine code by removing unnecessary computations, reducing operand loads, and minimizing resource usage. Performing these optimizations at the IR level allows for improvements that are independent of the target CPU architecture, enabling the generation of more efficient, faster-running programs.

A compiler translates a high-level language (HLL) source program into machine code or assembly code, which can then be executed many times on the system for which it was compiled; this makes the generated machine code non-portable to other systems. In contrast, an interpreter executes instructions directly, translating the source program at runtime, which makes interpreted languages portable across different systems and platforms since no precompiled machine code is generated.

During the intermediate code generation phase, the compiler takes the output of semantic analysis and generates an intermediate representation of the source code. This representation is designed to be independent of the machine architecture and is usually simpler to translate into target machine code. The intermediate code serves as a bridge between the high-level source code and machine code, ensuring that optimizations can be applied effectively. This phase aims to preserve the semantics of the original program while facilitating easy translation into machine instructions.
