
CSC 304: COMPILER CONSTRUCTION

PROGRAM TRANSLATORS

A program translator is a computer program that translates a program written in a given programming language into a functionally equivalent program in a different computer language, without losing the functional or logical structure of the original code (“the essence” of each program).

These include translations between high-level, human-readable computer languages such as C++, Java and COBOL; intermediate-level languages such as Java bytecode; low-level languages such as assembly language and machine code; and between similar levels of language on different computing platforms, as well as from any of these to any other of these.

They also include translators between software implementations and hardware/ASIC microchip implementations
of the same program, and from software descriptions of a microchip to the logic gates needed to build it.

COMPILERS

A compiler is a computer program (or set of programs) that transforms source code written in a programming
language (the source language) into another computer language (the target language, often having a binary form
known as object code).

The most common reason for converting source code is to create an executable program.

The name “compiler” is primarily used for programs that translate source code from a higher-level programming
language to a lower-level language (e.g., assembly language or machine code).

If the compiled program can run on a computer whose CPU or operating system is different from the one on which the compiler runs, the compiler is known as a cross-compiler. More generally, compilers are a specific type of translator.

A program that translates from a low-level language to a higher level one is a decompiler.

A program that translates between high-level languages is usually called a source-to-source compiler or transpiler.

A language rewriter is usually a program that translates the form of expressions without a change of language.

The term compiler-compiler is sometimes used to refer to a parser generator, a tool often used to help create the
lexer and parser.

A compiler is likely to perform many or all of the following operations:

1. Lexical analysis
2. Preprocessing
3. Parsing
4. Semantic analysis (syntax-directed translation)
5. Code generation and code optimization
Program faults caused by incorrect compiler behavior can be very difficult to track down and work around;
therefore, compiler implementors invest significant effort to ensure compiler correctness.

Compilers enabled the development of programs that are machine-independent.

Before the development of FORTRAN, the first higher-level language, in the 1950s, machine-dependent assembly language was widely used.

While assembly language provides more abstraction than machine code on the same architecture, just as with machine code, it has to be modified or rewritten if the program is to be executed on a different computer hardware architecture.

With the advent of high-level programming languages that followed FORTRAN, such as COBOL, C and BASIC,
programmers could write machine-independent source programs. A compiler translates the high-level source
programs into target programs in machine languages for the specific hardware. Once the target program is
generated, the user can execute the program.

INTERPRETERS

In computer science, an interpreter is a computer program that directly executes, i.e., performs, instructions
written in a programming or scripting language, without previously compiling them into a machine language
program.

An interpreter is a program that reads in as input a source program, along with data for the program, and
translates the source program instruction by instruction.

EXAMPLE

 The Java virtual machine (invoked with the java command) interprets the bytecode in a .class file and executes it on the underlying machine; javac is the compiler that produces the .class file from Java source.
 The program VirtualPC interprets programs written for the Intel Pentium architecture (IBM-PC clone) on the PowerPC architecture (Macintosh). This enables Macintosh users to run Windows programs on their computers.

An interpreter generally uses one of the following strategies for program execution:

1. Parse the source code and perform its behavior directly (illustrated in the sketch below).
2. Translate the source code into some efficient intermediate representation and immediately execute this.
3. Explicitly execute stored precompiled code made by a compiler which is part of the interpreter system.
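
A minimal sketch of the first strategy in Python (the names run_source and evaluate are invented for illustration): the source is parsed and its behavior performed directly by walking the parse tree, with no machine code ever produced.

# Strategy 1: parse the source code and perform its behavior directly.
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def evaluate(node):
    # Execute each node of the parse tree as it is visited.
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.BinOp):
        return OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
    raise ValueError("unsupported construct")

def run_source(text):
    tree = ast.parse(text, mode="eval")   # parse ...
    return evaluate(tree)                 # ... and execute directly

print(run_source("2 + 3 * 4"))            # prints 14

Each run re-walks the tree, which is exactly the per-statement overhead that makes interpretation slower than running compiled code.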

APPLICATIONS

1. Interpreters are frequently used to execute command languages and glue languages, since each operator executed in a command language is usually an invocation of a complex routine such as an editor or compiler.
2. Self-modifying code can easily be implemented in an interpreted language. This relates to the origins of interpretation in Lisp and artificial intelligence research.
3. Virtualization: machine code intended for one hardware architecture can be run on another using a virtual machine, which is essentially an interpreter.
4. Sandboxing: an interpreter or virtual machine is not compelled to actually execute all the instructions of the source code it is processing. In particular, it can refuse to execute code that violates any security constraints it is operating under.

ADVANTAGES OF INTERPRETERS

1. Easier to debug (check errors) than a compiler.
2. Easier to create multi-platform code, as each different platform has its own interpreter to run the same code.
3. Useful for prototyping software and testing basic program logic.

DISADVANTAGES OF INTERPRETERS

1. Source code is required for the program to be executed, and this source code can be read, making it insecure.
2. Interpreters are generally slower than compiled programs due to the per-line translation method.

ASSEMBLERS

An assembler translates assembly language into machine code. An assembler is a program that creates object
code by translating combinations of mnemonics and syntax for operations and addressing modes into their
numerical equivalents.

Assembly language

 Assembly language consists of mnemonics for machine opcodes, so assemblers perform a 1:1 translation from each mnemonic to the corresponding machine instruction.
 An assembly language (or assembler language) is a low-level programming language for a computer, or
other programmable device, in which there is a very strong (generally one-to-one) correspondence
between the language and the architecture’s machine code instructions.
 Each assembly language is specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple architectures but require interpreting or compiling.
 Assembly language is converted into executable machine code by a utility program referred to as an
assembler; the conversion process is referred to as assembly, or assembling the code.

For example:

LDA #4 converts to 0001001000100100

Conversely, one instruction in a high-level language will translate to one or more instructions at machine level.

TYPES OF ASSEMBLERS

There are two types of assemblers based on how many passes through the source are needed to produce the
executable program.

1. One-pass assemblers go through the source code once. Any symbol used before it is defined will require “errata” at the end of the object code (or, at least, no earlier than the point where the symbol is defined) telling the linker or the loader to “go back” and overwrite a placeholder which had been left where the as-yet undefined symbol was used.

2. Multi-pass assemblers create a table with all symbols and their values in the first passes, then use the table in later passes to generate code (a minimal sketch appears at the end of this subsection).

In both cases, the assembler must be able to determine the size of each instruction on the initial passes in order to
calculate the addresses of subsequent symbols.

This means that if the size of an operation referring to an operand defined later depends on the type or distance
of the operand, the assembler will make a pessimistic estimate when first encountering the operation, and if
necessary pad it with one or more “no operation” instructions in a later pass or the errata. In an assembler with
peephole optimization, addresses may be recalculated between passes to allow replacing pessimistic code with
code tailored to the exact distance from the target.

The original reason for the use of one-pass assemblers was speed of assembly – often a second pass would require
rewinding and rereading a tape or rereading a deck of cards.

With modern computers this has ceased to be an issue. The advantage of the multi-pass assembler is that the
absence of errata makes the linking process (or the program load if the assembler directly produces executable
code) faster.
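
A minimal two-pass sketch in Python (the mnemonics, opcodes and 12-bit instruction format are invented for illustration, not taken from any real architecture): the first pass records every label's address in a symbol table, and the second pass emits numeric code using that table, so no errata are needed.

# Toy two-pass assembler: pass 1 builds the symbol table,
# pass 2 translates mnemonics and resolves label operands.
OPCODES = {"LDA": 0x1, "ADD": 0x2, "JMP": 0x3}   # invented encodings

def assemble(lines):
    symbols, address = {}, 0
    for line in lines:                   # pass 1: record label addresses
        if line.endswith(":"):
            symbols[line[:-1]] = address
        else:
            address += 1                 # every instruction is one word here
    code = []
    for line in lines:                   # pass 2: generate code
        if line.endswith(":"):
            continue
        mnemonic, operand = line.split()
        value = symbols[operand] if operand in symbols else int(operand.lstrip("#"))
        code.append((OPCODES[mnemonic] << 8) | value)
    return code

program = ["JMP end", "LDA #4", "end:", "ADD #1"]
print([format(word, "012b") for word in assemble(program)])

The forward reference to end resolves cleanly because pass 1 completed the symbol table before any code was generated; a one-pass assembler would have had to leave a placeholder for it.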

APPLICATIONS OF ASSEMBLERS

1. Assembly language is typically used in a system’s boot code, the low-level code that initializes and tests the system hardware prior to booting the operating system, and is often stored in ROM. (The BIOS on IBM-compatible PC systems and CP/M are examples.)
2. Some compilers translate high-level languages into assembly first before fully compiling, allowing the
assembly code to be viewed for debugging and optimization purposes.
3. Relatively low-level languages, such as C, allow the programmer to embed assembly language directly in
the source code. Programs using such facilities, such as the Linux kernel, can then construct abstractions
using different assembly language on each hardware platform. The system’s portable code can then use
these processor-specific components through a uniform interface.
4. Assembly language is useful in reverse engineering. Many programs are distributed only in machine code form, which is straightforward to translate into assembly language but more difficult to translate into a higher-level language. Tools such as the Interactive Disassembler (IDA) make extensive use of disassembly for such a purpose.
5. Assemblers can be used to generate blocks of data, with no higher-level language overhead, from
formatted and commented source code, to be used by other code.

ADVANTAGES OF ASSEMBLERS

1. Very fast in translating assembly language to machine code, thanks to the 1-to-1 relationship between mnemonics and machine instructions.
2. Assembly code is often very efficient (and therefore fast) because it is a low-level language.
3. Assembly code is fairly easy to understand due to the use of English-like mnemonics.

DISADVANTAGES OF ASSEMBLERS

1. Assembly language is written for a certain instruction set and/or processor.
2. Assembly tends to be optimized for the hardware it is designed for, meaning it is often incompatible with different hardware.
3. Lots of assembly code is needed to do relatively simple tasks, and complex programs require lots of programming time.

STRUCTURE OF COMPILER
Compilers bridge source programs in high-level languages with the underlying hardware. A compiler verifies code
syntax, generates efficient object code, performs run-time organization, and formats the output according to
assembler and linker conventions.

A compiler consists of:

1. THE FRONT END


 It verifies syntax and semantics, and generates an intermediate representation or IR of the source
code for processing by the middle-end.
 Performs type checking by collecting type information.
 Generates errors and warnings, if any, in a useful way.
 Aspects of the front end include lexical analysis, syntax analysis, and semantic analysis.
 The compiler frontend analyzes the source code to build an internal representation of the
program, called the intermediate representation or IR.
 It also manages the symbol table, a data structure mapping each symbol in the source code to
associated information such as location, type and scope.
 While the frontend can be a single monolithic function or program, as in a scannerless parser, it is
more commonly implemented and analyzed as several phases, which may execute sequentially or
concurrently.
 In some cases additional phases are used, notably line reconstruction and preprocessing, but these are rare.

PHASES OF COMPILATION

1. Line reconstruction:
 Languages which strop their keywords or allow arbitrary spaces within identifiers require a phase
before parsing, which converts the input character sequence to a canonical form ready for the
parser.
 The top-down, recursive-descent, table-driven parsers used in the 1960s typically read the source one character at a time and did not require a separate tokenizing phase.
 Atlas Autocode and Imp (and some implementations of ALGOL and Coral 66) are examples of stropped languages whose compilers would have a line reconstruction phase.
2. Lexical analysis
 It breaks the source code text into small pieces called tokens. Each token is a single atomic unit of
the language, for instance a keyword, identifier or symbol name.
 The token syntax is typically a regular language, so a finite-state automaton constructed from a regular expression can be used to recognize it.
 This phase is also called lexing or scanning, and the software doing lexical analysis is called a lexical
analyzer or scanner.
 This may not be a separate step – it can be combined with the parsing step in scannerless parsing,
in which case parsing is done at the character level, not the token level.
3. Preprocessing
 Some languages, e.g., C, require a preprocessing phase which supports macro substitution and
conditional compilation. Typically the preprocessing phase occurs before syntactic or semantic
analysis: e.g. in the case of C, the preprocessor manipulates lexical tokens rather than syntactic
forms. However, some languages such as Scheme support macro substitutions based on syntactic forms.
4. Syntax analysis
 It involves parsing the token sequence to identify the syntactic structure of the program.
 This phase typically builds a parse tree, which replaces the linear sequence of tokens with a tree structure built according to the rules of a formal grammar which define the language’s syntax.
 The parse tree is often analyzed, augmented, and transformed by later phases in the compiler.
5. Semantic analysis
 It is the phase in which the compiler adds semantic information to the parse tree and builds the symbol table. This phase performs semantic checks such as type checking (checking for type errors), object binding (associating variable and function references with their definitions), or definite assignment (requiring all local variables to be initialized before use), rejecting incorrect programs or issuing warnings.
 Semantic analysis usually requires a complete parse tree, meaning that this phase logically follows the parsing phase and logically precedes the code generation phase, though it is often possible to fold multiple phases into one pass over the code in a compiler implementation. (A combined sketch of lexical, syntax and semantic analysis follows this list.)
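
To make phases 2, 4 and 5 concrete, here is a minimal front-end sketch in Python for an invented one-operator expression language. The grammar, token names and symbol table contents are illustrative assumptions, not any particular compiler’s design.

import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")

def lex(text):
    # Lexical analysis: break the text into (kind, value) tokens.
    # The regular expression stands in for the finite-state automaton.
    tokens = []
    for number, name, symbol in TOKEN_RE.findall(text):
        if number:
            tokens.append(("NUM", int(number)))
        elif name:
            tokens.append(("ID", name))
        else:
            tokens.append(("OP", symbol))
    return tokens

def parse(tokens):
    # Syntax analysis: expr -> operand ('+' operand)*, built as a tree.
    pos = 0
    def operand():
        nonlocal pos
        kind, value = tokens[pos]
        pos += 1
        return (kind, value)
    node = operand()
    while pos < len(tokens) and tokens[pos] == ("OP", "+"):
        pos += 1
        node = ("ADD", node, operand())
    return node

def check(node, symbol_table):
    # Semantic analysis: reject references to undeclared variables.
    if node[0] == "ID" and node[1] not in symbol_table:
        raise NameError("undeclared variable: " + node[1])
    if node[0] == "ADD":
        check(node[1], symbol_table)
        check(node[2], symbol_table)

tree = parse(lex("x + 2 + y"))
check(tree, {"x": "int", "y": "int"})    # passes; {"x": "int"} would raise
print(tree)  # ('ADD', ('ADD', ('ID', 'x'), ('NUM', 2)), ('ID', 'y'))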

2. THE MIDDLE END


Performs optimizations, including removal of useless or unreachable code, discovery and propagation of constant values, relocation of computation to a less frequently executed place (e.g., out of a loop), and specialization of computation based on the context. Generates another IR for the back end.
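
As a concrete middle-end example, the sketch below (Python, reusing the tuple-shaped IR from the front-end sketch above) performs constant folding, one simple form of discovering and propagating constant values.

def fold(node):
    # Replace any subtree whose operands are all constants with its value.
    if node[0] == "ADD":
        left, right = fold(node[1]), fold(node[2])
        if left[0] == "NUM" and right[0] == "NUM":
            return ("NUM", left[1] + right[1])   # computed at compile time
        return ("ADD", left, right)
    return node

print(fold(("ADD", ("ID", "x"), ("ADD", ("NUM", 1), ("NUM", 4)))))
# ('ADD', ('ID', 'x'), ('NUM', 5)) -- the 1 + 4 no longer runs at run time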

3. THE BACK END


Generates the assembly code, performing register allocation in the process (assigning processor registers to program variables where possible).

Optimizes target code utilization of the hardware by figuring out how to keep parallel execution units busy and filling delay slots.

Although most code optimization problems are NP-hard, heuristic techniques are well developed.
The main phases of the back end include the following:

1. Analysis: This is the gathering of program information from the intermediate representation derived from the input; data-flow analysis is used to build use-define chains, together with dependence analysis, alias analysis, pointer analysis, escape analysis, etc. Accurate analysis is the basis for any compiler optimization. The call graph and control flow graph are usually also built during the analysis phase.
2. Optimization: The intermediate language representation is transformed into functionally equivalent but faster (or smaller) forms. Popular optimizations are inline expansion, dead code elimination, constant propagation, loop transformation, register allocation and even automatic parallelization.
3. Code generation: The transformed intermediate language is translated into the output language, usually the native machine language of the system. This involves resource and storage decisions, such as deciding which variables to fit into registers and memory, and the selection and scheduling of appropriate machine instructions along with their associated addressing modes (see also the Sethi-Ullman algorithm). Debug data may also need to be generated to facilitate debugging. (A minimal code-generation sketch follows below.)

Compiler analysis is the prerequisite for any compiler optimization, and the two work tightly together. For example, dependence analysis is crucial for loop transformation.
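
A minimal code-generation sketch in Python, again over the tuple IR used above. The target here is an invented stack machine, which sidesteps register allocation entirely; real back ends select machine instructions and assign registers for a concrete architecture.

def generate(node, out):
    # Post-order walk: emit operand code first, then the operation.
    if node[0] == "NUM":
        out.append("PUSH " + str(node[1]))
    elif node[0] == "ID":
        out.append("LOAD " + node[1])
    elif node[0] == "ADD":
        generate(node[1], out)
        generate(node[2], out)
        out.append("ADD")       # pops two operands, pushes the sum
    return out

print(generate(("ADD", ("ID", "x"), ("NUM", 2)), []))
# ['LOAD x', 'PUSH 2', 'ADD']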

TYPES OF COMPILERS

1. SINGLE PASS COMPILER


 The ability to compile in a single pass has classically been seen as a benefit because it simplifies
the job of writing a compiler and one-pass compilers generally perform compilations faster than
multi-pass compilers.
 Thus, partly driven by the resource limitations of early systems, many early languages were
specifically designed so that they could be compiled in a single pass (e.g., Pascal).
 In some cases, the design of a language feature may require a compiler to perform more than one
pass over the source. For instance, consider a declaration appearing on line 20 of the source which
affects the translation of a statement appearing on line 10. In this case, the first pass needs to
gather information about declarations appearing after statements that they affect, with the actual
translation happening during a subsequent pass.
 The disadvantage of compiling in a single pass is that it is not possible to perform many of the sophisticated optimizations needed to generate high-quality code. It can be difficult to count exactly how many passes an optimizing compiler makes. For instance, different phases of optimization may analyze one expression many times but only analyze another expression once.
 Splitting a compiler up into small programs is a technique used by researchers interested in producing provably correct compilers. Proving the correctness of a set of small programs often requires less effort than proving the correctness of a larger, single, equivalent program.
2. MULTI PASS COMPILER
 While the typical multi-pass compiler outputs machine code from its final pass, there are several
other types:
A “source-to-source compiler” is a type of compiler that takes a high-level language as its input
and outputs a high-level language. For example, an automatic parallelizing compiler will frequently
take in a high-level language program as an input and then transform the code and annotate it
with parallel code annotations (e.g., OpenMP) or language constructs (e.g., Fortran’s DOALL
statements).
3. INCREMENTAL COMPILER
 Individual functions can be compiled in a run-time environment that also includes interpreted functions. Incremental compilation dates back to 1962 and the first Lisp compiler, and is still used in Common Lisp systems.
4. STAGE COMPILER
 The compilers that compile to assembly language of a theoretical machine, like some Prolog
implementations.
 This Prolog machine is also known as the Warren abstract machine (or WAM). Bytecode compilers for Java, Python (and many more) are also a subtype of this.
5. JUST IN TIME COMPILER
 In computing, just-in-time (JIT) compilation, also known as dynamic translation, is compilation done during execution of a program (at run time) rather than prior to execution.
 Most often this consists of translation to machine code, which is then executed directly, but it can also refer to translation to another format.
 JIT compilation is a combination of the two traditional approaches to translation to machine code, ahead-of-time compilation (AOT) and interpretation, and combines some advantages and drawbacks of both.
 JIT compilation combines the speed of compiled code with the flexibility of interpretation, at the cost of the overhead of an interpreter and the additional overhead of compiling (not just interpreting).
 JIT compilation is a form of dynamic compilation, and allows adaptive optimization such as dynamic recompilation; thus in theory JIT compilation can yield faster execution than static compilation. Interpretation and JIT compilation are particularly suited for dynamic programming languages, as the runtime system can handle late-bound data types and enforce security guarantees. (A loose sketch follows.)
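
Python’s built-in compile gives a loose feel for the idea, though it produces Python bytecode rather than the native machine code a real JIT would emit: the source text exists only at run time and is translated immediately before execution.

def run_at_runtime(source_text):
    # Translation happens during execution of the enclosing program,
    # not ahead of time; a real JIT would emit native machine code here.
    code_object = compile(source_text, "<runtime>", "eval")
    return eval(code_object)

expression = "2 ** 10 + 1"         # imagine this string arrived at run time
print(run_at_runtime(expression))  # 1025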

ADVANTAGES OF COMPILERS

1. Source code is not included; therefore compiled code is more secure than interpreted code.
2. Tends to produce faster code than interpreting source code.
3. Produces an executable file, and therefore the program can be run without the source code.

DISADVANTAGES OF COMPILERS

1. Object code needs to be produced before a final executable file; this can be a slow process.
2. The source code must be 100 percent correct for the executable file to be produced.
