Compiler Design

Introduction

Falguni Sinhababu
Government College of Engineering and Leather Technology

1
Books
 Compilers: Principles, Techniques, and Tools by Aho,
Lam, Sethi, Ullman (the "Dragon Book")
 Engineering a Compiler by Cooper and Torczon
 The Essence of Compilers by Hunter (Prentice-
Hall)
 Modern Compiler Design by Grune (Wiley)

2
Definitions
 What is a compiler?
 A program that accepts as input a program text in a
certain language and produces as output a program
text in another language, while preserving the meaning
of that text (Grune et al, 2000).
 A program that reads a program written in one
language (source language) and translates it into an
equivalent program in another language (target
language) (Aho et al)
3
General Structure of a Compiler

Source code Object code


Compiler

Error message

The compiler:
 must generate correct code.
 must recognise errors.
 analyses and synthesises.

4
Definitions
 Interpreter
 A computer program that translates an instruction into machine language and
executes it before going to the next instruction.
 Cross-compiler
 A cross-compiler is a compiler that runs on one machine and produces object
code for another machine. If a compiler has been implemented in its own language,
this arrangement is called bootstrapping.
 Assembler
 An assembler translates an assembly language source code to executable or
almost executable object code.
 Macro assembler
 A macro assembler is a type of assembler that supports the use of macros, which are blocks of
code that can be defined and then invoked multiple times within a program, potentially saving time.

5
Qualities of a good compiler
 Generates correct code (first and foremost).
 Generates fast code.
 Conforms to the specification of the input language.
 Copes with essentially arbitrary input size, variables, etc.
 Compilation time (linearly) proportional to size of source.
 Good diagnostics.
 Consistent optimization.
 Works well with the debugger.

6
Principles of compilation
 Preserve the meaning of the program being compiled.
 Improve the source code in some way.
 Speed (of compiled code).
 Space (size of compiled code).
 Feedback (information provided to the user).
 Debugging (transformations should preserve the relationship between source
code and target).
 Compilation time efficiency (fast or slow compiler).

7
Language Processing System

8
Phases of a Compiler

9
Compilers and Interpreters
• Compilers generate machine code, whereas interpreters interpret
intermediate code
• Interpreters are easier to write and can provide better error messages
(symbol table still available)
• Interpreters are at least 5 times slower than machine code generated by
compilers
• Interpreters also require much more memory than machine code
generated by compilers
• Examples: Java, Scala, C#, C, and C++ use compilers; Perl, Ruby, and PHP use interpreters.

10
Translation Overview – Lexical Analysis

11
Lexical Analysis (Scanning)
 Reads characters in the source program and groups them into a stream of tokens (the basic
units of syntax).
 Each token represents a logically cohesive sequence of characters, such as identifiers,
operators and keywords.
 The character sequence that forms a token is called a lexeme.
 The output is a stream of tokens, each a pair of the form <type, lexeme> or <token_class,
attribute>
 a = b + c becomes
 <id, a> <=> <id, b> <+> <id, c>
 position = initial + rate * 60 becomes
 <id, 1> <=> <id, 2> <+> <id, 3> <*> <60>
 Each id attribute points to that identifier's entry in the symbol table
 Lexical analysis eliminates white space
 FLEX or LEX is used for generating scanners: programs which recognize lexical patterns in
text.
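The grouping of characters into <token_class, lexeme> pairs can be sketched with a minimal hand-written scanner (an illustration only — real scanners are generated from a LEX/FLEX specification; the `TOKEN_SPEC` table and `tokenize` function here are our own toy stand-ins):

```python
import re

# Simplified stand-in for a LEX/FLEX specification:
# (token class, regular expression) pairs, tried in order.
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("id",  r"[A-Za-z_]\w*"),
    ("op",  r"[=+\-*/]"),
    ("ws",  r"\s+"),            # matched, then discarded
]

def tokenize(source):
    """Group characters into a stream of <token_class, lexeme> pairs."""
    tokens, pos = [], 0
    while pos < len(source):
        for tclass, pattern in TOKEN_SPEC:
            m = re.match(pattern, source[pos:])
            if m:
                if tclass != "ws":  # lexical analysis eliminates white space
                    tokens.append((tclass, m.group()))
                pos += m.end()
                break
        else:
            raise SyntaxError(f"illegal character at position {pos}")
    return tokens
```

For example, `tokenize("a = b + c")` yields `[("id", "a"), ("op", "="), ("id", "b"), ("op", "+"), ("id", "c")]`, matching the token stream shown above.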

12
Token Types
 Keywords: if, else, int, char, do, while, for, struct, return etc
 Constants: often these are numbers, strings or characters
 Numbers are numeric literals of the types the language supports
 Strings are text items the language can recognize
 In C or C++ → “This is string”
 Characters are single letters
 In C or C++ →‘C’
 Identifiers: names the programmer has given to something. These include variables, functions,
classes, enumerations, etc. Each language has rules specifying how these names can be
written.
 Operators: these are the mathematical, logical and other operators that the language can
recognize
 +, -, *, /, % (modulo), -- (decrement), ++ (increment) etc
 Other tokens: symbols such as { ( ) } may be valid in the language but are not treated as keywords or operators

13
Attributes of Tokens
 The lexical analyser returns to the parser a representation of the token it has found. The
representation is an integer code if the token is a simple construct such as a left parenthesis, comma or
colon. It is a pair of an integer code and a pointer to a table entry if the token is a more
complex element such as an identifier or constant. The integer code gives the token type and the
pointer points to the value of that token.
 The token names and the associated attribute values for the FORTRAN statement
 E = M * C ** 2 are written below as a sequence of pairs
 <id, pointer to symbol table entry for E>
 <assign_op>
 <id, pointer to symbol table entry for M>
 <multi_op>
 <id, pointer to symbol table entry for C>
 <exp_op>
 <number, integer value 2>
 For special operators, punctuation and keywords, there is no need for an attribute value. In this example,
the token number has been given an integer-valued attribute.
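The pairing of token codes with symbol-table pointers can be sketched as follows (a toy illustration: a dict stands in for the compiler's symbol table, and an index plays the role of the pointer):

```python
def attach_attributes(lexemes):
    """Turn <kind, lexeme> pairs into <token, attribute> pairs,
    interning identifiers in a symbol table as we go."""
    symtab = {}
    out = []
    for kind, lex in lexemes:
        if kind == "id":
            if lex not in symtab:
                symtab[lex] = len(symtab)     # new symbol-table entry
            out.append(("id", symtab[lex]))   # 'pointer' to that entry
        elif kind == "num":
            out.append(("number", int(lex)))  # integer-valued attribute
        else:
            out.append((kind,))               # operators and keywords need no attribute
    return out, symtab
```

For E = M * C ** 2, feeding in the seven lexemes returns the pair stream `[("id", 0), ("assign_op",), ("id", 1), ("multi_op",), ("id", 2), ("exp_op",), ("number", 2)]` together with the table `{"E": 0, "M": 1, "C": 2}`.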

14
Lexical Analysis
• LA can be generated automatically from a regular-expression specification
• LEX and Flex are two such tools

• LA is a deterministic finite state automaton


• Why is LA separated from parsing?
• Simplification of design – a software engineering reason
• I/O issues are limited to the LA alone
• An LA based on a finite automaton is more efficient to implement than the pushdown
automaton used for parsing (which needs a stack)

15
Translation Overview – Syntax Analysis

16
Parsing or Syntax Analysis
• Syntax analyzers (parsers) can be generated automatically from several
variants of context-free grammar specifications
• LL(1) or LALR(1) are the most popular ones
• ANTLR (LL-based), YACC and Bison (for LALR(1)) are such tools

• Parsers are deterministic push-down automata


• Parsers cannot handle context-sensitive features of programming languages
• Variables are declared before use
• Types match on both sides of assignments
• Parameter types and number match in declaration and use
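To sketch what a generated LL(1) parser does internally, here is a hand-written recursive-descent parser for a toy grammar (our own example, not ANTLR/YACC output):

```python
def parse(tokens):
    """Recursive-descent (LL(1)) parser building a nested-tuple syntax tree.
    Grammar: expr -> term ('+' term)* ; term -> factor ('*' factor)* ;
             factor -> NUM"""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        tok = advance()
        if not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        return ("num", tok)

    def term():                       # '*' binds tighter than '+'
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError("unexpected trailing input")
    return tree
```

`parse(["2", "+", "3", "*", "4"])` yields `("+", ("num", "2"), ("*", ("num", "3"), ("num", "4")))` — the grammar encodes the usual operator precedence.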

17
Translation Overview – Semantic Analysis

18
Semantic Analysis
• Semantic consistency that cannot be handled at the parsing stage is handled
here
• Type checking of various language constructs is one of the most important tasks
• Stores type information in the symbol table or the syntax tree
• Types of variables, function parameters, array dimensions, etc.
• Used not only for semantic validation but also for subsequent phases of compilation

• Static semantics of a programming language can be specified using an
attribute grammar
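A minimal sketch of the type-checking task described above (the tuple node shapes and dict-based symbol table are our assumptions, not a real compiler's data structures):

```python
def check_types(node, symtab):
    """Return the type of an expression tree node, raising on an error.
    Nodes: ('num', value), ('var', name) or (op, left, right)."""
    kind = node[0]
    if kind == "num":
        return "int"
    if kind == "var":
        if node[1] not in symtab:        # declared-before-use check
            raise TypeError(f"undeclared variable {node[1]!r}")
        return symtab[node[1]]           # type recorded by earlier phases
    op, left, right = node
    lt = check_types(left, symtab)
    rt = check_types(right, symtab)
    if lt != rt:                         # operand types must match
        raise TypeError(f"type mismatch: {lt} {op} {rt}")
    return lt
```

With `symtab = {"x": "int"}`, `check_types(("+", ("var", "x"), ("num", 1)), symtab)` returns `"int"`; mixing an `"int"` and a `"float"` operand raises `TypeError`.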

19
Translation overview – Intermediate Code Generation

20
Intermediate Code Generation
• Generating machine code directly from source code entails two
problems
• With m languages and n target machines, we need to write m×n compilers
• The code optimizer, which is one of the largest and most difficult-to-write components of
any compiler, cannot be reused

• By converting source code to an intermediate code, a machine-independent code
optimizer may be used
• Intermediate code must be easy to produce and easy to translate to machine
code
• A sort of universal language.
• Should not contain machine-specific parameters (registers, addresses, etc.)
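The kind of easy-to-produce intermediate code meant here can be sketched as three-address code generated from an expression tree (the `t1, t2, ...` temporaries and tuple node format are our illustration):

```python
import itertools

def gen_tac(tree):
    """Flatten ('var'|'num', v) / (op, left, right) expression trees
    into three-address code; return (code, name holding the result)."""
    code = []
    temps = itertools.count(1)

    def walk(node):
        kind = node[0]
        if kind in ("var", "num"):
            return str(node[1])
        op, left, right = node
        l, r = walk(left), walk(right)
        t = f"t{next(temps)}"
        code.append(f"{t} = {l} {op} {r}")  # at most one operator per instruction
        return t

    result = walk(tree)
    return code, result
```

For `initial + rate * 60` this emits `t1 = rate * 60` then `t2 = initial + t1` — note that no registers or machine addresses appear anywhere in the output.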

21
Different types of Intermediate Code
• The type of intermediate code deployed depends on the application
• Quadruples, triples, indirect triples, abstract syntax trees are the classical forms
used for machine-independent optimizations
• Static Single Assignment (SSA) form is a more recent form and enables more effective
optimizations
• Conditional constant propagation and global value numbering are more effective on
SSA

• The Program Dependence Graph is useful in automatic parallelization,
instruction scheduling and software pipelining
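For straight-line code, SSA construction amounts to renaming so that each variable is assigned exactly once (a sketch only; real SSA construction also inserts phi-functions at control-flow join points, which we omit here):

```python
def to_ssa(instrs):
    """Rename the targets of straight-line 'x = f(operands)' instructions
    so each variable is assigned exactly once. Instructions are
    (target, [operand names]); unassigned inputs keep their names."""
    version = {}   # variable -> latest version number
    out = []
    for target, rhs in instrs:
        # uses refer to the latest version of each operand
        new_rhs = [f"{v}{version[v]}" if v in version else v for v in rhs]
        version[target] = version.get(target, 0) + 1
        out.append((f"{target}{version[target]}", new_rhs))
    return out
```

After renaming, every use refers to exactly one definition, which is what makes constant propagation and value numbering more effective on SSA.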

22
Translation Overview – Code Optimization

23
Machine Independent Code Optimization
• Intermediate code generation process introduces many inefficiencies.
• Extra copies of variables, use of variables instead of constants, repeated evaluation
of expressions, etc.

• Code optimization removes such inefficiencies and improves code.


• The improvement may be in time, space or power consumption.
• It changes the structure of programs, sometimes beyond recognition.
• Inlines functions, unrolls loops, eliminates some programmer-defined variables, etc.

• Code optimization consists of a bunch of heuristics, and the percentage of
improvement depends on the program (it may also be zero).

24
Examples of Machine Independent Code Optimization
• Common sub-expression elimination
• Copy propagation
• Loop invariant code motion
• Partial redundancy elimination
• Induction variable elimination and strength reduction
• Code optimization needs information about the program
• Which expressions are being recomputed in a function?
• Which definitions reach a point?

• All such information is gathered through data-flow analysis
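Local common sub-expression elimination within one basic block can be sketched with a hash of already-computed right-hand sides (an illustration only; a real pass must also invalidate entries when an operand is redefined, and a global version needs data-flow analysis):

```python
def eliminate_cse(instrs):
    """Replace recomputed (op, a, b) right-hand sides in a basic block
    with a copy from the first temporary that computed them.
    Instructions are (target, op, a, b) tuples."""
    seen = {}   # (op, a, b) -> name already holding that value
    out = []
    for target, op, a, b in instrs:
        key = (op, a, b)
        if key in seen:
            out.append((target, "copy", seen[key], None))  # reuse earlier result
        else:
            seen[key] = target
            out.append((target, op, a, b))
    return out
```

Given `t1 = a + b` followed by `t2 = a + b`, the second computation becomes a copy of `t1`.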

25
Translation Overview – Code Generation

26
Code Generation
 Converts intermediate code to machine code
 Each intermediate code instruction may result in many machine instructions or
vice-versa
 Must handle all aspects of machine architecture
 Registers, pipelining, cache, multiple function units, etc.
 Generating efficient code is an NP-complete problem
 Tree pattern matching-based strategies are the best and most common
 Needs tree intermediate code
 Storage allocation decisions are made here
 Register allocation and assignment are the most important problems
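A deliberately naive sketch of this phase: each three-address instruction `x = a op b` expands into several instructions for a hypothetical two-register load/store machine (real code generators do instruction selection and register allocation far more carefully):

```python
def codegen(tac):
    """Translate 'x = a OP b' three-address strings into a toy
    load/op/store sequence for a two-register machine."""
    ops = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}
    asm = []
    for instr in tac:
        target, rhs = instr.split(" = ")
        a, op, b = rhs.split()
        asm += [f"LD {a}, R1",          # one TAC instruction may expand
                f"LD {b}, R2",          # into several machine instructions
                f"{ops[op]} R2, R1",    # R1 <- R1 op R2
                f"ST R1, {target}"]
    return asm
```

`codegen(["t1 = a + b"])` yields `["LD a, R1", "LD b, R2", "ADD R2, R1", "ST R1, t1"]` — needlessly reloading operands on every instruction, which is exactly the kind of waste register allocation exists to avoid.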

27
Machine-Dependent Optimization
 Peephole optimization
 Analyze a sequence of instructions in a small window (peephole) and, using preset
patterns, replace them with a more efficient sequence
 Redundant instruction elimination
 E.g. replace the sequence [LD A, R1][ST R1, A] by [LD A, R1]
 Eliminate “jump to jump” instructions
 Use machine idioms (use INC instead of LD and ADD)
 Instruction scheduling (reordering) to eliminate pipeline interlocks and to increase
parallelism
 Trace scheduling to increase the size of basic blocks and increase parallelism
 Software pipelining to increase parallelism in loops
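The redundant-instruction example above — replacing [LD A, R1][ST R1, A] by [LD A, R1] — can be sketched as a two-instruction peephole pass (the tuple encoding of instructions is our own toy notation):

```python
def peephole(instrs):
    """Drop a store that writes back what was just loaded:
    [LD A, R1][ST R1, A] -> [LD A, R1].
    Instructions are (mnemonic, src, dst) tuples."""
    out = []
    for ins in instrs:
        if out:
            prev = out[-1]
            # ST R1, A is redundant immediately after LD A, R1:
            # memory location A already holds the value in R1
            if prev[0] == "LD" and ins[0] == "ST" and \
               prev[1] == ins[2] and prev[2] == ins[1]:
                continue
        out.append(ins)
    return out
```

A real peephole optimizer slides such pattern checks over the whole instruction stream, with many patterns (jump-to-jump elimination, machine idioms like INC) handled the same way.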

28
Thank You

29