Unit - 1 Compiler Design

The document discusses the structure and phases of a compiler. A compiler converts high-level code into assembly language in multiple phases including lexical analysis, syntax analysis, semantic analysis, code generation, and optimization. It also discusses interpreters, assemblers, linkers, loaders, cross-compilers, and basic parsing techniques like top-down and bottom-up parsing.

Uploaded by

Mandeep Sharma

Structure of Compiler

The high-level language is converted into the target machine language in various phases. A
compiler is a program that converts high-level language to assembly or machine language.

Interpreter

An interpreter, like a compiler, translates high-level language into low-level machine language.
The difference lies in the way they read the source code. A compiler reads the whole source
code at once, creates tokens, checks semantics, generates intermediate code, and translates the
whole program, possibly in many passes. In contrast, an interpreter reads one statement from
the input, converts it to intermediate code, executes it, then takes the next statement in
sequence. If an error occurs, an interpreter stops execution and reports it, whereas a compiler
reads the whole program and can report several errors at once.

Assembler
An assembler translates assembly language programs into machine code. The output of an
assembler is called an object file, which contains machine instructions together with the data
required to place these instructions in memory.

Linker

A linker is a computer program that links and merges various object files together in order to
make an executable file. These files may have been produced by separate assemblers. The
major task of a linker is to search for and locate referenced modules and routines in a program
and to determine the memory locations where this code will be loaded, so that the program's
instructions have absolute references.

Loader

Loader is a part of the operating system and is responsible for loading executable files into
memory and executing them. It calculates the size of a program (instructions and data) and
creates memory space for it. It initializes various registers to initiate execution.

Cross-compiler

A compiler that runs on platform (A) and is capable of generating executable code for platform
(B) is called a cross-compiler.

Translators

A compiler that takes the source code of one programming language and translates it into the
source code of another programming language is called a source-to-source compiler.

Phases of Compiler

The compilation process is a sequence of various phases. Each phase takes input from its
previous stage, has its own representation of the source program, and feeds its output to the
next phase of the compiler. Let us understand the phases of a compiler.

Lexical Analysis
The first phase of the compiler, the lexical analyzer, works as a text scanner. This phase scans
the source code as a stream of characters and converts it into meaningful lexemes. The lexical
analyzer represents these lexemes in the form of tokens as:

<token-name, attribute-value>
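As a sketch of this phase, the scanner below matches the character stream against regular-expression patterns and emits <token-name, attribute-value> pairs. The token classes (NUM, ID, OP) are an illustrative assumption, not the full token set of any particular language.

```python
import re

# Illustrative token classes; a real language has a much larger set.
TOKEN_SPEC = [
    ("NUM",  r"\d+"),           # integer constants
    ("ID",   r"[A-Za-z_]\w*"),  # identifiers
    ("OP",   r"[+\-*/=]"),      # single-character operators
    ("SKIP", r"\s+"),           # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Scan the character stream and emit <token-name, attribute-value> pairs."""
    tokens = []
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind != "SKIP":
            tokens.append((kind, match.group()))
    return tokens
```

For example, `tokenize("x = 42 + y")` yields `[("ID", "x"), ("OP", "="), ("NUM", "42"), ("OP", "+"), ("ID", "y")]`.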

Syntax Analysis

The next phase is called syntax analysis or parsing. It takes the tokens produced by lexical
analysis as input and generates a parse tree (or syntax tree). In this phase, the token
arrangement is checked against the source-code grammar, i.e. the parser checks whether the
expression made by the tokens is syntactically correct.

Semantic Analysis

Semantic analysis checks whether the parse tree constructed follows the rules of the language:
for example, that values are assigned between compatible data types, and that a string is not
added to an integer. The semantic analyzer also keeps track of identifiers, their types and
expressions, and whether identifiers are declared before use. The semantic analyzer produces
an annotated syntax tree as an output.

Intermediate Code Generation

After semantic analysis, the compiler generates an intermediate code of the source code for
the target machine. It represents a program for some abstract machine, in between the
high-level language and the machine language. This intermediate code should be generated in
such a way that it is easy to translate into the target machine code.

Code Optimization

The next phase does code optimization of the intermediate code. Optimization removes
unnecessary code lines and arranges the sequence of statements in order to speed up program
execution without wasting resources (CPU, memory).

Code Generation

In this phase, the code generator takes the optimized representation of the intermediate code
and maps it to the target machine language. The code generator translates the intermediate
code into a sequence of (generally) relocatable machine code. This sequence of machine
instructions performs the same task as the intermediate code.

Symbol Table

It is a data structure maintained throughout all the phases of a compiler. All identifier names
along with their types are stored here. The symbol table makes it easier for the compiler to
quickly search for an identifier record and retrieve it. The symbol table is also used for scope
management.

Basic Parsing Techniques


The parser is the part of the compiler that breaks the data coming from the lexical analysis
phase into smaller elements.

A parser takes input in the form of a sequence of tokens and produces output in the form of a
parse tree.

Parsing is of two types: top-down parsing and bottom-up parsing.


Top down parsing

● Top down parsing is also known as recursive parsing or predictive parsing.

● Top down parsing is used to construct a parse tree for an input string.

● In top down parsing, the parsing starts from the start symbol and transforms it into
the input symbols.
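A minimal sketch of top-down (recursive, predictive) parsing: one function per non-terminal, starting from the start symbol and consuming the input left to right. The toy grammar E → T { '+' T }, T → number is an assumption for illustration.

```python
def parse_expr(tokens):
    """Parse a whole expression; raise SyntaxError on leftover input."""
    tree, rest = parse_E(list(tokens))
    if rest:
        raise SyntaxError(f"unconsumed input: {rest}")
    return tree

def parse_E(toks):
    # E -> T { '+' T }: parse a T, then greedily absorb '+ T' pairs.
    left, toks = parse_T(toks)
    while toks and toks[0] == "+":
        right, toks = parse_T(toks[1:])
        left = ("+", left, right)   # build the syntax-tree node
    return left, toks

def parse_T(toks):
    # T -> NUMBER
    if not toks or not toks[0].isdigit():
        raise SyntaxError("expected a number")
    return ("num", toks[0]), toks[1:]
```

For instance, `parse_expr(["1", "+", "2", "+", "3"])` builds the left-associative tree for 1 + 2 + 3.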

Bottom up parsing

● Bottom up parsing is also known as shift-reduce parsing.

● Bottom up parsing is used to construct a parse tree for an input string.

● In the bottom up parsing, the parsing starts with the input symbol and constructs the
parse tree up to the start symbol by tracing out the rightmost derivations of the string in
reverse.

Operator precedence parsing

Operator precedence grammar is a kind of shift-reduce parsing method. It is applied to a small
class of operator grammars.

A grammar is said to be operator precedence grammar if it has two properties:

● No R.H.S. of any production contains ε (there are no ε-productions).


● No two non-terminals are adjacent.

Operator precedence can only be established between the terminals of the grammar;
non-terminals are ignored.

There are the three operator precedence relations:

a ⋗ b means that terminal "a" has higher precedence than terminal "b".

a ⋖ b means that terminal "a" has lower precedence than terminal "b".

a ≐ b means that terminals "a" and "b" have the same precedence.
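The three relations can be stored in a table indexed by pairs of terminals. The sketch below assumes a tiny expression grammar with '+' and '*' (plus '$' as the end marker); the entries shown are an illustrative assumption, not derived from a specific production set.

```python
# Precedence relations between terminals: "<" is a ⋖ b, ">" is a ⋗ b.
# In a fuller table, "=" (a ≐ b) would relate pairs such as '(' and ')'.
PRECEDENCE = {
    ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "+"): "<", ("$", "*"): "<",
}

def relation(a, b):
    """Return the precedence relation holding between terminals a and b."""
    return PRECEDENCE[(a, b)]
```

A shift-reduce parser would shift while the relation is ⋖ or ≐ and reduce on ⋗.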

LR Parser

LR parsing is one type of bottom-up parsing. It can parse a large class of grammars.

In LR(k) parsing, "L" stands for left-to-right scanning of the input.

"R" stands for constructing a rightmost derivation in reverse.

"k" is the number of lookahead input symbols used to make parsing decisions.

LR parsing is divided into four kinds: LR(0) parsing, SLR parsing, CLR parsing and LALR parsing.
Canonical Collection of LR(0) items
An LR(0) item is a production of grammar G with a dot at some position on the right side of the
production.

LR(0) items are useful to indicate how much of a production has been matched at a given point
in the process of parsing.

In an LR(0) table, we place the reduce entry in the entire row.

LR(0) Table

● If a state goes to some other state on a terminal, it corresponds to a shift move.

● If a state goes to some other state on a variable (non-terminal), it corresponds to a goto move.

● If a state contains a final item, write the reduce entry in the entire row.

Explanation:

● I0 on S goes to I1, so write it as 1.

● I0 on A goes to I2, so write it as 2.

● I2 on A goes to I5, so write it as 5.

● I3 on A goes to I6, so write it as 6.

● I0, I2 and I3 on a go to I3, so write S3, which means shift 3.

● I0, I2 and I3 on b go to I4, so write S4, which means shift 4.

● I4, I5 and I6 all contain final items because the dot (•) is at the right-most end, so
write the reduce entry with the production number.
SLR (1) Table Construction

The steps used to construct the SLR(1) table are given below:

If a state (Ii) goes to some other state (Ij) on a terminal, it corresponds to a shift move in
the action part.

If a state (Ii) goes to some other state (Ij) on a variable, it corresponds to a goto move in
the goto part.

If a state (Ii) contains a final item like A → ab•, which has no transition to a next state, the
production is a reduce production. For every terminal X in FOLLOW(A), write the reduce entry
along with its production number.
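Since the reduce entries of an SLR(1) table are placed exactly under FOLLOW(A), computing FOLLOW sets is the key step. A sketch for a small ε-free, non-left-recursive grammar; the grammar S → A A, A → a A | b is an assumed toy example.

```python
# Toy grammar: S -> A A ; A -> a A | b  (ε-free, non-left-recursive).
GRAMMAR = {
    "S": [["A", "A"]],
    "A": [["a", "A"], ["b"]],
}
START = "S"
TERMINALS = {"a", "b"}

def first(symbol):
    """FIRST set; with no ε-productions this is FIRST of the first symbol."""
    if symbol in TERMINALS:
        return {symbol}
    out = set()
    for prod in GRAMMAR[symbol]:
        out |= first(prod[0])
    return out

def follow_sets():
    """Iterate to a fixed point, propagating FIRST/FOLLOW information."""
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add("$")              # end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, prods in GRAMMAR.items():
            for prod in prods:
                for i, sym in enumerate(prod):
                    if sym not in GRAMMAR:
                        continue
                    if i + 1 < len(prod):
                        new = first(prod[i + 1])   # symbol after sym
                    else:
                        new = follow[head]         # sym at the end inherits FOLLOW(head)
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow
```

Here FOLLOW(S) = {$} and FOLLOW(A) = {a, b, $}, so a reduction by an A-production is entered under a, b and $.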

LALR (1) Parsing:

LALR refers to lookahead LR. To construct the LALR(1) parsing table, we use the canonical
collection of LR(1) items.

In LALR(1) parsing, the LR(1) items which have the same productions but different lookaheads
are combined to form a single set of items.

LALR(1) parsing is the same as CLR(1) parsing; the only difference is in the parsing table.

LALR (1) Parsing table:

UNIT - 2

Syntax directed translation

In syntax directed translation, along with the grammar we associate some informal notations,
and these notations are called semantic rules.

So we can say that

1. Grammar + semantic rule = SDT (syntax directed translation)


● In syntax directed translation, every non-terminal can have zero or more attributes,
depending on the type of the attribute. The values of these attributes are evaluated by the
semantic rules associated with the production rules.

● In the semantic rules below, the attribute is VAL; an attribute may hold anything: a string, a
number, a memory location or a complex record.

● In syntax directed translation, whenever a construct is encountered in the programming
language, it is translated according to the semantic rules defined for it in that particular
programming language.

Example

Production      Semantic Rules

E → E + T       E.val := E.val + T.val

E → T           E.val := T.val

T → T * F       T.val := T.val * F.val

T → F           T.val := F.val

F → (E)         F.val := E.val

F → num         F.val := num.lexval

E.val is one of the attributes of E.

num.lexval is the attribute returned by the lexical analyzer.

Syntax directed translation scheme

● The syntax directed translation scheme is a context-free grammar.

● The syntax directed translation scheme is used to specify the order of evaluation of the
semantic rules.

● In the translation scheme, the semantic actions are embedded within the right side of the
productions.

● The position at which an action is to be executed is shown by enclosing it between braces,
written within the right side of the production.

Example

Production      Semantic Rules

S → E $         { print E.VAL }

E → E + E       { E.VAL := E.VAL + E.VAL }

E → E * E       { E.VAL := E.VAL * E.VAL }

E → (E)         { E.VAL := E.VAL }

E → I           { E.VAL := I.VAL }

I → I digit     { I.VAL := 10 * I.VAL + LEXVAL }

I → digit       { I.VAL := LEXVAL }
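The rules I → I digit { I.VAL := 10 * I.VAL + LEXVAL } and I → digit { I.VAL := LEXVAL } can be traced with a small sketch that folds a digit string left to right, exactly as the semantic actions would; the function name is an assumption for illustration.

```python
def I_val(digits):
    """Evaluate I.VAL for a digit string using the two I-productions."""
    val = int(digits[0])          # I -> digit       { I.VAL := LEXVAL }
    for d in digits[1:]:          # I -> I digit     { I.VAL := 10 * I.VAL + LEXVAL }
        val = 10 * val + int(d)
    return val
```

So for the input 234, the actions compute 2, then 2·10+3 = 23, then 23·10+4 = 234.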

Implementation of Syntax directed translation


Syntax directed translation is implemented by constructing a parse tree and performing the
actions in a left-to-right, depth-first order.

SDT is implemented by parsing the input and producing a parse tree as a result.

Example

Production      Semantic Rules

S → E $         { print E.VAL }

E → E + E       { E.VAL := E.VAL + E.VAL }

E → E * E       { E.VAL := E.VAL * E.VAL }

E → (E)         { E.VAL := E.VAL }

E → I           { E.VAL := I.VAL }

I → I digit     { I.VAL := 10 * I.VAL + LEXVAL }

I → digit       { I.VAL := LEXVAL }
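The left-to-right depth-first evaluation described above can be sketched as a tree walk. The node shapes — ("digit", lexval) for leaves and (op, left, right) for interior nodes — are an assumption for illustration.

```python
def evaluate(node):
    """Perform the semantic actions in left-to-right depth-first order."""
    op = node[0]
    if op == "digit":
        return node[1]                 # I -> digit { I.VAL := LEXVAL }
    left = evaluate(node[1])           # visit children depth-first,
    right = evaluate(node[2])          # left child before right child
    return left + right if op == "+" else left * right
```

For the tree of (2 + 3) * 4, the actions fire bottom-up and yield 20.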

Parse tree for SDT:


Intermediate code

Intermediate code is an intermediate step in translating the source code into machine code. It
lies between the high-level language and the machine language.

Fig: Position of intermediate code generator

● If the compiler directly translated source code into machine code without generating
intermediate code, a full native compiler would be required for each new machine.
● The intermediate code keeps the analysis portion the same for all compilers, so a full
compiler is not needed for every unique machine.

● The intermediate code generator receives its input from its predecessor, the semantic
analysis phase, in the form of an annotated syntax tree.

● Using the intermediate code, the second part of the compiler, the synthesis phase, is
changed according to the target machine.

Intermediate representation

Intermediate code can be represented in two ways:

1. High Level intermediate code:

High-level intermediate code is close to the source language, so code modifications are easy
to apply at this level. However, it is less suitable for optimizing for the target machine.

2. Low Level intermediate code

Low-level intermediate code is close to the target machine, which makes it suitable for register
and memory allocation. It is used for machine-dependent optimizations.

Postfix Notation

● Postfix notation is a useful form of intermediate code when the language mainly consists of
expressions.

● Postfix notation is also called 'suffix notation' or 'reverse Polish notation'.

● Postfix notation is a linear representation of a syntax tree.

● In postfix notation, any expression can be written unambiguously without parentheses.

● The ordinary (infix) way of writing the product of x and y puts the operator in the middle:
x * y. In postfix notation, we place the operator at the right end, as x y *.
● In postfix notation, the operator follows the operands.
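Because postfix is a linear representation of a syntax tree, it can be obtained by a postorder walk. The tuple node shapes — ("id", name) for leaves, (op, left, right) for operator nodes — are an assumption for illustration.

```python
def to_postfix(node):
    """Read a syntax tree in postorder: left operand, right operand, operator."""
    if node[0] == "id":
        return [node[1]]                 # a leaf is just its name
    op, left, right = node
    return to_postfix(left) + to_postfix(right) + [op]
```

For a + b * c the walk yields a b c * + — unambiguous without any parentheses.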

Example

Production        Semantic Rule                        Program fragment

1. E → E1 op E2   E.code = E1.code || E2.code || op    print op
2. E → (E1)       E.code = E1.code
3. E → id         E.code = id                          print id

Parse tree and Syntax tree

A parse tree contains more details than are actually needed, so it is difficult for a compiler to
process. Take the following parse tree as an example:
● In the parse tree, most of the leaf nodes are single children of their parent nodes.

● In the syntax tree, we can eliminate this extra information.

● A syntax tree is a variant of the parse tree in which interior nodes are operators and
leaves are operands.

● A syntax tree is usually used when representing a program in a tree structure.

A sentence id + id * id would have the following syntax tree:

Abstract syntax tree can be represented as:


Abstract syntax trees are important data structures in a compiler. They carry the least
unnecessary information, are more compact than a parse tree, and can be easily used by a
compiler.

Three address code

● Three-address code is an intermediate code. It is used by optimizing compilers.

● In three-address code, the given expression is broken down into several separate
instructions. These instructions can easily translate into assembly language.

● Each Three address code instruction has at most three operands. It is a combination of
assignment and a binary operator.

Example

Given expression:

1. a := (-c * b) + (-c * d)

Three-address code is as follows:

t1 := -c

t2 := b*t1

t3 := -c
t4 := d * t3

t5 := t2 + t4

a := t5

The names t1, ..., t5 are compiler-generated temporaries; they act like registers in the target program.
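A sketch of how such code can be generated from a syntax tree, with a counter standing in for newtemp(). The node shapes — ("id", name), ("neg", e) for unary minus, and (op, left, right) — are assumptions for illustration.

```python
import itertools

def gen_tac(node, code, fresh):
    """Emit three-address instructions into `code`; return the operand's name."""
    if node[0] == "id":
        return node[1]
    if node[0] == "neg":                       # unary minus
        arg = gen_tac(node[1], code, fresh)
        t = f"t{next(fresh)}"
        code.append(f"{t} := -{arg}")
        return t
    op, left, right = node                     # binary operator node
    l = gen_tac(left, code, fresh)
    r = gen_tac(right, code, fresh)
    t = f"t{next(fresh)}"
    code.append(f"{t} := {l} {op} {r}")
    return t
```

For a := (-c * b) + (-c * d) this yields t1 := -c, t2 := t1 * b, t3 := -c, t4 := t3 * d, t5 := t2 + t4 — the same shape as above, up to operand order.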

The three-address code can be represented in two forms: quadruples and triples.

Quadruples

The quadruples have four fields to implement the three address code. The field of quadruples
contains the name of the operator, the first source operand, the second source operand and the
result respectively.

Fig: Quadruples field

Example

1. a := -b * (c + d)

Three-address code is as follows:

t1 := -b

t2 := c + d
t3 := t1 * t2

a := t3

These statements are represented by quadruples as follows:


     Operator   Source 1   Source 2   Destination

(0)  uminus     b          -          t1

(1)  +          c          d          t2

(2)  *          t1         t2         t3

(3)  :=         t3         -          a

Triples

The triples have three fields to implement the three-address code. The fields of a triple contain
the name of the operator, the first source operand and the second source operand.

In triples, the results of sub-expressions are referred to by the position of the expression that
computes them. A triple representation is equivalent to a DAG when representing expressions.
Fig: Triples field

Example:

1. a := -b * (c + d)

Three address code is as follows:


t1 := -b

t2 := c + d

t3 := t1 * t2

a := t3

These statements are represented by triples as follows:

     Operator   Source 1   Source 2

(0)  uminus     b          -

(1)  +          c          d

(2)  *          (0)        (1)

(3)  :=         (2)        -
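In code, the two forms can be held as lists of tuples for the example above: in quadruples the result field names an explicit temporary, while in triples an operand may be the position of an earlier triple.

```python
# Quadruples: (operator, source1, source2, destination).
quadruples = [
    ("uminus", "b",  None, "t1"),
    ("+",      "c",  "d",  "t2"),
    ("*",      "t1", "t2", "t3"),
    (":=",     "t3", None, "a"),
]

# Triples: (operator, source1, source2); an integer operand refers to
# the position of an earlier triple instead of a named temporary.
triples = [
    ("uminus", "b", None),   # (0)
    ("+",      "c", "d"),    # (1)
    ("*",      0,   1),      # (2) uses the results of (0) and (1)
    (":=",     2,   None),   # (3)
]
```

Dropping the destination field is what makes triples more compact, at the cost of position-dependent references.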
Translation of Assignment Statements

In syntax directed translation, the assignment statement mainly deals with expressions, which
can be of type real, integer, array or record.

Consider the grammar

1. S → id := E
2. E → E1 + E2
3. E → E1 * E2
4. E → (E1)
5. E → id

The translation scheme of above grammar is given below:

Production rule    Semantic actions

S → id := E    { p = look_up(id.name);
                 if p ≠ nil then
                   Emit (p ':=' E.place)
                 else
                   error }

E → E1 + E2    { E.place = newtemp();
                 Emit (E.place ':=' E1.place '+' E2.place) }

E → E1 * E2    { E.place = newtemp();
                 Emit (E.place ':=' E1.place '*' E2.place) }

E → (E1)       { E.place = E1.place }

E → id         { p = look_up(id.name);
                 if p ≠ nil then
                   E.place = p
                 else
                   error }

● look_up returns the symbol-table entry for id.name, or nil if the name is not in the symbol
table; an undeclared name is reported as an error.

● The Emit function is used for appending the three-address code to the output file.

● newtemp() is a function used to generate new temporary variables.

● E.place holds the value of E.


Boolean expressions

Boolean expressions have two primary purposes. They are used for computing the logical
values. They are also used as conditional expressions using if-then-else or while-do.

Consider the grammar

1. E → E OR E
2. E → E AND E
3. E → NOT E
4. E → (E)
5. E → id relop id
6. E → TRUE
7. E → FALSE

The relop stands for a relational operator such as <, <=, >, >=.

The AND and OR operators are left-associative. NOT has the highest precedence, then AND, and lastly OR.

Production rule    Semantic actions

E → E1 OR E2    { E.place = newtemp();
                  Emit (E.place ':=' E1.place 'OR' E2.place) }

E → E1 AND E2   { E.place = newtemp();
                  Emit (E.place ':=' E1.place 'AND' E2.place) }

E → NOT E1      { E.place = newtemp();
                  Emit (E.place ':=' 'NOT' E1.place) }

E → (E1)        { E.place = E1.place }

E → id1 relop id2   { E.place = newtemp();
                      Emit ('if' id1.place relop.op id2.place 'goto' nextstat + 3);
                      Emit (E.place ':=' '0');
                      Emit ('goto' nextstat + 2);
                      Emit (E.place ':=' '1') }

E → TRUE        { E.place := newtemp();
                  Emit (E.place ':=' '1') }

E → FALSE       { E.place := newtemp();
                  Emit (E.place ':=' '0') }

The Emit function is used to generate the three-address code and the newtemp() function is
used to generate the temporary variables.

The rule for E → id1 relop id2 uses nextstat, which gives the index of the next three-address
statement in the output sequence.

Here is the example which generates the three address code using the above translation
scheme:

1. p>q AND r<s OR u>v


2. 100: if p>q goto 103
3. 101: t1:=0
4. 102: goto 104
5. 103: t1:=1
6. 104: if r<s goto 107
7. 105: t2:=0
8. 106: goto 108
9. 107: t2:=1
10. 108: if u>v goto 111
11. 109: t3:=0
12. 110: goto 112
13. 111: t3:= 1
14. 112: t4:= t1 AND t2
15. 113: t5:= t4 OR t3
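The numbered pattern for each relational sub-expression (if ... goto, t := 0, goto, t := 1) can be sketched as a small emitter. The base address 100 and the helper name gen_relop are assumptions for illustration.

```python
def gen_relop(code, id1, op, id2, temp, base=100):
    """Append the four-instruction pattern for temp := (id1 op id2)."""
    n = base + len(code)                     # address of the next statement
    code.append(f"{n}: if {id1}{op}{id2} goto {n + 3}")
    code.append(f"{n + 1}: {temp} := 0")     # fall-through: condition is false
    code.append(f"{n + 2}: goto {n + 4}")    # skip over the 'true' assignment
    code.append(f"{n + 3}: {temp} := 1")     # condition is true
```

Calling it for p>q and then r<s reproduces statements 100-107 of the example above.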

Statements that alter the flow of control

The goto statement alters the flow of control. If we implement goto statements then we need to
define a LABEL for a statement. A production can be added for this purpose:

1. S → LABEL : S
2. LABEL → id
In this production, a semantic action is attached to record the LABEL and its value in the
symbol table.

The following grammar can be used to incorporate structured flow-of-control constructs:

1. S → if E then S
2. S → if E then S else S
3. S → while E do S
4. S → begin L end
5. S→ A
6. L→ L ; S
7. L→ S

Here, S is a statement, L is a statement-list, A is an assignment statement and E is a
Boolean-valued expression.

Symbol Table

Symbol table is an important data structure used in a compiler.

The symbol table is used to store information about the occurrence of various entities such as
objects, classes, variable names, interfaces, function names, etc. It is used by both the analysis
and synthesis phases.

The symbol table is used for the following purposes:

● It is used to store the name of all entities in a structured form at one place.

● It is used to verify if a variable has been declared.

● It is used to determine the scope of a name.

● It is used to implement type checking by verifying that assignments and expressions in the
source code are semantically correct.

A symbol table can be implemented as either a linear list or a hash table. It maintains an entry
for each name in the following format.
Data structure for symbol table

● A compiler maintains two types of symbol table: a global symbol table and scope symbol
tables.

● The global symbol table can be accessed by all procedures, while a scope symbol table is
visible only within the scope that creates it.

The hierarchy of scope symbol tables is maintained by the semantic analyzer. If you want to
search for a name in the symbol table, use the following algorithm:

● First, the symbol is searched in the current symbol table.

● If the name is found, the search is complete; otherwise the name is searched in the
parent's symbol table, until

● the name is found or the global symbol table has been searched.

Representing Scope Information

In the source program, every name possesses a region of validity, called the scope of that name.

The rules in a block-structured language are as follows:

1. If a name is declared within block B then it will be valid only within B.

2. If block B1 is nested within B2, then a name that is valid for block B2 is also valid in B1
unless the identifier is re-declared in B1.

● These scope rules need a more complicated organization of symbol tables than a list of
associations between names and attributes.

● Tables are organized into a stack and each table contains the list of names and their
associated attributes.

● Whenever a new block is entered, a new table is pushed onto the stack. The new table
holds the names that are declared as local to this block.

● When a declaration is compiled, the table is searched for the name.

● If the name is not found in the table, the new name is inserted.

● When a reference to a name is translated, the tables are searched starting from the
top table on the stack.
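The stack-of-tables organization described above can be sketched as follows; the class and method names are assumptions for illustration.

```python
class SymbolTable:
    """Stack of scope tables; the bottom entry is the global table."""

    def __init__(self):
        self.scopes = [{}]

    def enter_block(self):
        self.scopes.append({})           # push a fresh table on block entry

    def exit_block(self):
        self.scopes.pop()                # discard locals on block exit

    def declare(self, name, attrs):
        self.scopes[-1][name] = attrs    # declarations go in the current scope

    def lookup(self, name):
        # Search the current table first, then each enclosing scope.
        for table in reversed(self.scopes):
            if name in table:
                return table[name]
        return None                      # not found even in the global table
```

A redeclaration in an inner block shadows the outer one until the block is exited, matching the scope rules above.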

UNIT - 3

Run time Storage administration

● When the target program executes, it runs in its own logical address space, in which each
program value has a location.

● The logical address space is shared among the compiler, operating system and target
machine for management and organization. The operating system maps logical addresses
into physical addresses, which are usually spread throughout memory.

Subdivision of Run-time Memory:

● Run-time storage comes in blocks, where a byte is the smallest unit of addressable
memory. Four consecutive bytes form a machine word. Multibyte objects are stored in
consecutive bytes and addressed by their first byte.

● Run-time storage can be subdivided to hold the different components of an executing
program:

1. Generated executable code


2. Static data objects

3. Dynamic data-object- heap

4. Automatic data objects- stack

Storage Allocation

The different ways to allocate memory are:

1. Static storage allocation

2. Stack storage allocation

3. Heap storage allocation

Static storage allocation

● In static allocation, names are bound to storage locations at compile time.

● Since memory is created at compile time, it is created in the static area, and only once.

● Static allocation does not support dynamic data structures: memory is created only at
compile time and deallocated only after program completion.

● A drawback of static storage allocation is that the size and position of data objects must
be known at compile time.

● Another drawback is the restriction on recursive procedures.

Stack Storage Allocation

● In stack storage allocation, storage is organized as a stack.

● An activation record is pushed onto the stack when an activation begins and popped
when the activation ends.

● The activation record contains the locals, so they are bound to fresh storage in each
activation. The values of locals are discarded when the activation ends.
● It works on a last-in-first-out (LIFO) basis, and this allocation supports the recursion
process.

Heap Storage Allocation

● Heap allocation is the most flexible allocation scheme.

● Allocation and deallocation of memory can be done at any time and at any place
depending upon the user's requirement.

● Heap allocation is used to allocate memory to variables dynamically, and to reclaim it
when the variables are no longer used.

● Heap storage allocation supports the recursion process.

Lexical Error

This type of error is detected during the lexical analysis phase.

A lexical error is a sequence of characters that does not match the pattern of any token.
Lexical-phase errors are found while the program is being scanned, not during its execution.

A lexical-phase error can be:

● Spelling errors.

● Exceeding the length limit of an identifier or numeric constant.

● Appearance of illegal characters.

● Removal of a character that should be present.

● Replacement of a character with an incorrect character.

● Transposition of two characters.

Syntax Error

During the syntax analysis phase, this type of error appears. Syntax errors are found during
compilation, while the program is being parsed.
Some syntax error can be:

● Error in structure

● Missing operators

● Unbalanced parenthesis

When an invalid calculation enters into a calculator then a syntax error can also occur. This can
be caused by entering several decimal points in one number or by opening brackets without
closing them.

Semantic Error

During the semantic analysis phase, this type of error appears. These types of errors are
detected at compile time.

Most of the compile-time errors are scope and declaration errors, for example undeclared or
multiply declared identifiers. Type mismatch is another compile-time error.

A semantic error can arise from using the wrong variable, using the wrong operator, or
performing operations in the wrong order.

Some semantic error can be:

● Incompatible types of operands

● Undeclared variable

● Mismatch between actual and formal arguments

UNIT - 4

Code Optimisation and principal source

Optimization is a program transformation technique, which tries to improve the code by making
it consume less resources (i.e. CPU, Memory) and deliver high speed.
In optimization, high-level general programming constructs are replaced by very efficient
low-level programming codes. A code optimizing process must follow the three rules given
below:

● The output code must not, in any way, change the meaning of the program.
● Optimization should increase the speed of the program and if possible, the program
should demand less resources.
● Optimization should itself be fast and should not delay the overall compiling process.

Efforts for an optimized code can be made at various levels of compiling the process.

● At the beginning, users can change/rearrange the code or use better algorithms to write
the code.
● After generating intermediate code, the compiler can modify the intermediate code by
address calculations and improving loops.
● While producing the target machine code, the compiler can make use of memory
hierarchy and CPU registers.

Optimization can be categorized broadly into two types: machine independent and machine
dependent.

DAG representation for basic blocks

A DAG for basic block is a directed acyclic graph with the following labels on nodes:

1. The leaves of the graph are labeled by unique identifiers, which can be variable names or
constants.

2. Interior nodes of the graph are labeled by an operator symbol.

3. Nodes are also given a sequence of identifiers for labels to store the computed value.

● DAGs are a type of data structure used to implement transformations on basic blocks.

● A DAG provides a good way to determine common sub-expressions.

● It gives a picture representation of how the value computed by the statement is used in
subsequent statements.
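A sketch of DAG construction by value numbering: a node is identified by its operator and children, so an already-seen combination is shared rather than duplicated, which is what exposes common sub-expressions. The function and variable names are assumptions for illustration.

```python
def dag_node(op, children, nodes):
    """Return the node id for (op, children), reusing an existing node if any."""
    key = (op,) + tuple(children)
    if key not in nodes:
        nodes[key] = len(nodes)          # allocate a fresh node id
    return nodes[key]

# Building the block  t1 := -c ; t3 := -c  shares a single 'uminus' node.
nodes = {}
c = dag_node("leaf", ["c"], nodes)
t1 = dag_node("uminus", [c], nodes)
t3 = dag_node("uminus", [c], nodes)      # common sub-expression: same node as t1
```

Only two nodes exist for the two statements, so the second -c need not be recomputed.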
Global data flow analysis

● To efficiently optimize the code, the compiler collects all the information about the
program and distributes it to each block of the flow graph. This process is known as
data-flow analysis.

● Certain optimizations can only be achieved by examining the entire program; they cannot
be achieved by examining just a portion of it.

● Use-definition (ud) chaining is one particular problem of this kind.

● Here, from a use of a variable, we try to find out which definitions of that variable can
reach the statement.

Data flow analysis is used to discover this kind of property. The data flow analysis can be
performed on the program's control flow graph (CFG).

The control flow graph of a program is used to determine those parts of a program to which a
particular value assigned to a variable might propagate.

Code Generator

Code generator is used to produce the target code for three-address statements. It uses
registers to store the operands of the three address statements.

Design Issues

In the code generation phase, various issues arise:

1. Input to the code generator

2. Target program

3. Memory management

4. Instruction selection

5. Register allocation

6. Evaluation order
1. Input to the code generator

● The input to the code generator contains the intermediate representation of the source
program and the information in the symbol table. This intermediate representation is
produced by the front end.

● Intermediate representation has the several choices:


a) Postfix notation
b) Syntax tree
c) Three address code

● We assume the front end produces a low-level intermediate representation, i.e. one in
which the values of names can be directly manipulated by machine instructions.

● The code generation phase requires complete, error-free intermediate code as its
input.

2. Target program:

The target program is the output of the code generator. The output can be:

a) Assembly language: It allows subprograms to be separately compiled.

b) Relocatable machine language: It makes the process of code generation easier.


c) Absolute machine language: It can be placed in a fixed location in memory and can be
executed immediately.

3. Memory management

● During the code generation process, symbol table entries have to be mapped to actual
memory addresses, and labels have to be mapped to instruction addresses.

● Mapping names in the source program to addresses of data is done cooperatively by the
front end and the code generator.
● Local variables are stack-allocated in the activation record, while global variables are kept
in static areas.

4. Instruction selection:

● The nature of the instruction set of the target machine should be complete and uniform.

● When considering the efficiency of the target machine, instruction speed and machine
idioms are important factors.

● The quality of the generated code can be determined by its speed and size.

5. Register allocation

Register can be accessed faster than memory. The instructions involving operands in register
are shorter and faster than those involving memory operands.

The following sub problems arise when we use registers:

Register allocation: In register allocation, we select the set of variables that will reside in the
register.

Register assignment: In register assignment, we pick the specific register in which each variable will reside.

Certain machines require even-odd register pairs for some operands and results. For example,
in integer division:

The even register is used to hold the remainder.

The odd register is used to hold the quotient.

6. Evaluation order

The efficiency of the target code can be affected by the order in which the computations are
performed. Some computation orders need fewer registers to hold intermediate results than
others.
Peephole Optimization

This optimization technique works locally on the source code to transform it into optimized
code. By locally, we mean a small portion of the code block at hand. These methods can be
applied to intermediate code as well as to target code. A bunch of statements is analyzed and
checked for the following possible optimizations:

● Redundant instruction elimination

● Unreachable code

● Flow of control optimization

● Algebraic expression simplification

● Strength reduction

● Accessing machine instructions
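A sketch of two of these checks over a list of three-address instructions: redundant instruction elimination (x := y immediately followed by y := x) and algebraic simplification (x := x + 0). The string patterns and function name are assumptions for illustration.

```python
import re

def peephole(code):
    """One pass over the instruction list, dropping two redundant patterns."""
    out = []
    for instr in code:
        m = re.fullmatch(r"(\w+) := (\w+) \+ 0", instr)
        if m and m.group(1) == m.group(2):
            continue                      # x := x + 0 is a no-op
        if out:
            prev = re.fullmatch(r"(\w+) := (\w+)", out[-1])
            cur = re.fullmatch(r"(\w+) := (\w+)", instr)
            if prev and cur and (cur.group(1), cur.group(2)) == (prev.group(2), prev.group(1)):
                continue                  # y := x immediately after x := y
        out.append(instr)
    return out
```

Each rule looks only at the small window of instructions at hand, which is what makes the optimization "peephole".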
