Lecture 08 Language Translation PDF
Concepts
Language translators convert source code into the language that the computer processor
understands. Source code has various structures and commands, but computer processors
only understand machine language.
Different types of translations must occur to turn source code into machine language, which
is made up of bits of binary data. The three major types of language translators are
compilers, assemblers, and interpreters.
A. Compilers
A compiler is a special program that takes written source code (in a high-level language) and
translates it into an executable machine-language program.
When a compiler executes, it analyzes all of the language statements in the source code and
builds the machine language object code.
Once the translation is done, the machine-language program can be run any number of times.
However, it can only be run on one type of computer since each type of computer has its own
machine language. Therefore if the program is to run on another type of computer it has to be
re-translated, using a different compiler, into the appropriate machine language.
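Compilation is a slow mode of translation, because the whole program is translated before it
runs, but the resulting program is fast in execution.
Examples of languages that are usually compiled: COBOL, C, C++, Pascal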
History bit: The first compiler was written by Grace Hopper, in 1952, for the A-0
programming language.
Compiler characteristics:
• spends a lot of time analyzing and processing the program
• the resulting executable is some form of machine-specific binary code
• the computer hardware executes the resulting code
• program execution is fast
The compilation process
The compilation process has several phases which in summary include:
• Lexical analysis – converting characters in the source program into lexical
units (tokens)
• Syntax analysis – transforming lexical units into parse trees, which represent the syntactic
structure of the program
• Semantic analysis – checking the meaning of the program and generating intermediate code
• Code generation – generating machine code
1. Lexical Analysis
Also known as scanning, this phase involves the scanner breaking down the high-level language
program into its smallest meaningful symbols (tokens, atoms). A symbol table is then started
with a list of all tokens (symbols) found.
Tokens are the smallest meaningful units of the language, and they are faster to manipulate
than individual characters, e.g.:
• Reserved words: do, if, float, while
• Special characters: (, {, +, -, =, !, /
• Names and numbers: myValue, 3.07e02
Figure 1: Symbol Table Classification
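The scanning step can be sketched in a few lines of Python. This is a minimal illustration, not a production scanner: the token class names (RESERVED, SPECIAL, NAME, NUMBER), the regular expression, and the helper names tokenize and symbol_table are all assumptions made for this example.

```python
import re

# Hypothetical token classes, mirroring the three categories listed above.
RESERVED = {"do", "if", "float", "while"}
TOKEN_PATTERN = re.compile(r"""
    (?P<NUMBER>\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)   # e.g. 3.07e02
  | (?P<NAME>[A-Za-z_]\w*)                       # e.g. myValue, while
  | (?P<SPECIAL>[(){}+\-=!/*;<>])                # single special characters
  | (?P<SKIP>\s+)                                # whitespace, discarded
  | (?P<ERROR>.)                                 # anything else
""", re.VERBOSE)


def tokenize(source):
    """Return a list of (kind, text) pairs for the source string."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise ValueError(f"unexpected character {text!r}")
        if kind == "NAME" and text in RESERVED:
            kind = "RESERVED"
        tokens.append((kind, text))
    return tokens


def symbol_table(tokens):
    """Record each distinct token once, in order of first appearance."""
    table = {}
    for kind, text in tokens:
        table.setdefault(text, kind)
    return table


tokens = tokenize("while (myValue = 3.07e02)")
print(tokens)
print(symbol_table(tokens))
```

Reserved words are matched as names first and reclassified afterwards, which keeps the regular expression short.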
2. Syntax Analysis
Also known as parsing, this phase determines whether the tokens recognized by
the scanner form syntactically legal statements. It is performed by a parser.
The syntax of the language is usually defined by a formal context-free (CF) grammar, that is,
a set of rules that identify legal (meaningful) combinations of symbols. The parser applies
these rules repeatedly to the program until the leaves of the parse tree are “atoms”; each
application of a rule results in a node in the parse tree. If no rule matches, a syntax error
arises.
In summary, the output of a parser is either a parse tree or an error message if the tree
cannot be constructed.
Example
High-level language statement: a = b + c
The resulting parse tree for the above HLL statement is as shown below:
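A parse tree for this statement can also be built programmatically. The sketch below is a tiny hand-written parser for an assumed grammar (statement → NAME "=" expression; expression → NAME ("+" NAME)*). The grammar, the nested-tuple tree format, and the function name parse_statement are illustrative assumptions, not the notation used in the lecture.

```python
def parse_statement(tokens):
    """Parse `NAME = NAME (+ NAME)*` into a nested-tuple parse tree."""
    pos = 0

    def expect(predicate, description):
        # Consume one token if it satisfies the rule, else report a syntax error.
        nonlocal pos
        if pos >= len(tokens) or not predicate(tokens[pos]):
            found = tokens[pos] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {description}, found {found!r}")
        token = tokens[pos]
        pos += 1
        return token

    def name():
        return ("name", expect(str.isidentifier, "a name"))

    target = name()
    expect(lambda t: t == "=", "'='")
    tree = name()
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        tree = ("+", tree, name())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return ("=", target, tree)


print(parse_statement("a = b + c".split()))
# prints ('=', ('name', 'a'), ('+', ('name', 'b'), ('name', 'c')))
```

The root of the tree is the assignment, its left child is the target a, and its right child is the subtree for b + c. An input such as a = + c matches no rule and raises SyntaxError instead of producing a tree.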
3. Semantic Analysis
This phase discovers the meaning of the program using the symbol table.
The compiler makes a first pass over the parse tree to determine whether all
branches of the tree are semantically valid. If they are valid, the compiler can generate
machine language instructions. If not, there is a semantic error and machine language
instructions are not generated.
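One common semantic check is that every name used in the program appears in the symbol table. The sketch below walks a parse tree and collects such errors. The nested-tuple tree format, the dictionary used as a symbol table, and the name check_semantics are assumptions made for this illustration; real compilers check much more, such as type compatibility.

```python
def check_semantics(tree, symbols):
    """Walk the tree and return a list of error messages (empty if valid)."""
    kind = tree[0]
    if kind == "name":
        identifier = tree[1]
        if identifier not in symbols:
            return [f"'{identifier}' is used but never declared"]
        return []
    # Any other node is an operator whose remaining items are subtrees.
    errors = []
    for child in tree[1:]:
        errors.extend(check_semantics(child, symbols))
    return errors


# Hypothetical parse tree for: a = b + c
tree = ("=", ("name", "a"), ("+", ("name", "b"), ("name", "c")))
symbols = {"a": "float", "b": "float"}  # c was never declared

print(check_semantics(tree, symbols))
# prints ["'c' is used but never declared"]
```

Code generation would proceed only when the returned list is empty.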
7. Machine-dependent optimization
Make improvements that require specific knowledge of machine architecture, e.g.
• Optimize use of available registers
• Reorder instructions to avoid waits
B. Interpreters
An interpreter translates a high-level language program instruction by instruction into
machine-executable code, i.e. it analyzes and executes each line of source code in order.
Instead of requiring a separate translation step before program execution, an interpreter
processes the program as it is being executed. It is therefore slower in execution.
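The analyze-then-execute cycle can be sketched with a toy interpreter. The three statement forms it accepts (name = value, name = value + value, print value) and the function name run are made up for this illustration.

```python
def run(source):
    """Interpret a tiny made-up language one line at a time."""
    variables = {}
    output = []

    def value(word):
        # A word is either a non-negative integer literal or a variable name.
        return int(word) if word.isdigit() else variables[word]

    for line_number, line in enumerate(source.splitlines(), start=1):
        words = line.split()
        if not words:
            continue
        # Each line is analyzed and then executed before the next is read.
        if words[0] == "print" and len(words) == 2:
            output.append(value(words[1]))
        elif len(words) == 3 and words[1] == "=":
            variables[words[0]] = value(words[2])
        elif len(words) == 5 and words[1] == "=" and words[3] == "+":
            variables[words[0]] = value(words[2]) + value(words[4])
        else:
            raise SyntaxError(f"line {line_number}: cannot interpret {line!r}")
    return output


print(run("x = 2\ny = x + 3\nprint y"))
# prints [5]
```

Because nothing is checked ahead of time, an error on a later line is reported only after all earlier lines have already run, and every run repeats the analysis work.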
One use of interpreters is to execute HLL programs. Another is to let you use a
machine-language program meant for one type of computer on a completely different type of
computer, thus performing a last-moment translation service.
Interpreter characteristics:
• relatively little time is spent analyzing and processing the program
• the resulting code is some sort of intermediate code
• the resulting code is interpreted by another program
• program execution is relatively slow
Examples:
• Perl programs are partially compiled to detect errors before interpretation.
• All initial implementations of Java were hybrid; the intermediate form, bytecode,
provides portability to any machine that has a bytecode interpreter and a run-time
system (together, these are called the Java Virtual Machine).
This is one of the essential features of Java: the same compiled program can be run on many
different types of computer.
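The same compile-once, interpret-anywhere idea can be observed in Python, whose standard implementation also compiles source to an intermediate bytecode that a virtual machine interprets. The sketch below is an analogy only: it shows Python bytecode, not Java bytecode, and the variable names are arbitrary.

```python
import dis

source = "total = price * quantity"

# Translate once into a code object holding Python bytecode.
code = compile(source, "<example>", "exec")

# Run the same compiled object several times with different inputs.
for price, quantity in [(2, 3), (10, 4)]:
    namespace = {"price": price, "quantity": quantity}
    exec(code, namespace)
    print(namespace["total"])

# Show the intermediate instructions the Python virtual machine interprets.
dis.dis(code)
```

The loop prints 6 and then 40. The exact instruction listing printed by dis.dis differs between Python versions, which is itself a reminder that bytecode is an intermediate form defined by the virtual machine rather than by the hardware.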
Why, you might wonder, use the intermediate Java bytecode at all? Why not just distribute
the original Java program and let each person compile it into the machine language of
whatever computer they want to run it on? There are many reasons, but the main one is a
security concern. Many Java programs are meant to be downloaded over a network. This
leads to obvious security concerns: you don't want to download and run a program that will
damage your computer or your files. The bytecode interpreter acts as a buffer between you
and the program you download.
It should be noted that there is no necessary connection between Java and Java bytecode. A
program written in Java could certainly be compiled into the machine language of a real
computer. And programs written in other languages could be compiled into Java bytecode.
However, it is the combination of Java and Java bytecode that is platform-independent,
secure, and network-compatible while allowing you to program in a modern high-level
object-oriented language.
Compiler vs Interpreter
• Compilers translate a source (human-writable) program into an executable
(machine-readable) program, while interpreters translate a source program and execute it
at the same time.
• Interpretation is comparatively simple, whereas compilation is complicated because it
involves understanding the whole source program.
• Not all computing languages are compiled; some are interpreted. In interpreted languages,
the source code is always present, and when the program is run it is interpreted line by
line and then executed. Most scripting languages are of this sort, e.g. JavaScript and VBScript.
Editors
Editors are used for the creation and modification of programs, and possibly the associated
documentation. They can be general-purpose text or document editors, or they can be
specialized for a target language, either monolingual (e.g., a Java editor) or multilingual.
Linkers
Linkers combine object-code fragments into a larger program; they can be monolingual or
multilingual. In a broader sense, they are tools for linking specification modules, able to
perform checking and binding across various specification modules.
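The combining and binding steps can be sketched as a two-pass procedure. The fragment format below (a dictionary with a "code" list and a "defines" table) and the function name link are hypothetical; real object files are considerably more elaborate.

```python
def link(fragments):
    """Combine fragments into one program and resolve symbol references.

    Each fragment is a dict with:
      "code":    list of instructions; ("call", name) is an unresolved reference
      "defines": mapping of symbol name -> offset within this fragment
    """
    # Pass 1: lay the fragments out one after another and record where
    # each defined symbol ends up in the combined program.
    addresses = {}
    base = 0
    for fragment in fragments:
        for name, offset in fragment["defines"].items():
            if name in addresses:
                raise ValueError(f"duplicate symbol {name!r}")
            addresses[name] = base + offset
        base += len(fragment["code"])

    # Pass 2: copy the code, replacing symbolic references with addresses.
    program = []
    for fragment in fragments:
        for instruction in fragment["code"]:
            if isinstance(instruction, tuple) and instruction[0] == "call":
                name = instruction[1]
                if name not in addresses:
                    raise ValueError(f"undefined symbol {name!r}")
                instruction = ("call", addresses[name])
            program.append(instruction)
    return program


main_obj = {"code": ["push 2", ("call", "square"), "halt"], "defines": {"main": 0}}
math_obj = {"code": ["mul top top", "ret"], "defines": {"square": 0}}

print(link([main_obj, math_obj]))
# prints ['push 2', ('call', 3), 'halt', 'mul top top', 'ret']
```

The checking mentioned above shows up as the two error cases: a symbol defined twice, and a reference to a symbol that no fragment defines.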
Loader
This tool loads a machine-language program into memory so that it can be executed.
Debuggers
These are special kinds of interpreters in which:
– the execution state is inspectable
– the execution mode is definable
– animation is provided to support program understanding
Preprocessor
Preprocessor macros (instructions) are commonly used to specify that code from another file
is to be included. A preprocessor processes a program immediately before the program is
compiled to expand embedded preprocessor macros.
For example, the C preprocessor processes directives such as #include and #define.
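The expansion step can be sketched as plain text substitution performed before compilation. The function below is a simplified illustration, not the C preprocessor: it handles only #include "name" (looked up in an in-memory dictionary standing in for the file system) and #define NAME value with a value present. The name preprocess and the files parameter are assumptions for this example.

```python
import re


def preprocess(source, files=None):
    """Expand `#include "name"` and object-like `#define NAME value` lines."""
    files = files or {}
    macros = {}
    output = []
    lines = source.splitlines()
    while lines:
        line = lines.pop(0)
        stripped = line.strip()
        if stripped.startswith("#include"):
            name = stripped.split('"')[1]
            # Splice the included text in place so it is processed next.
            lines = files[name].splitlines() + lines
        elif stripped.startswith("#define"):
            _, name, value = stripped.split(None, 2)
            macros[name] = value
        else:
            # Replace whole-word occurrences of every macro defined so far.
            for name, value in macros.items():
                pattern = rf"\b{re.escape(name)}\b"
                line = re.sub(pattern, lambda m, v=value: v, line)
            output.append(line)
    return "\n".join(output)


files = {"limits.h": "#define MAX 10"}
print(preprocess('#include "limits.h"\nint buffer[MAX];', files))
# prints int buffer[10];
```

The compiler proper never sees the #include or #define lines, only the expanded text.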
Machine-dependent vs machine-independent aspects of translation
The object code is machine-dependent, meaning that a compiled program can only be
executed on the type of machine for which it has been compiled, whereas an interpreted
program is not machine-dependent because the machine-dependent part is in the interpreter itself.
Hardware compilation
Hardware compilation involves taking a user’s specifications and automatically generating
integrated circuits (ICs). Some compiler output targets hardware at a very low level, e.g.
Field Programmable Gate Arrays. Such compilers are said to be hardware compilers or synthesis
tools because the source code they compile effectively controls the final configuration of the
hardware and how it operates; the output of the compilation is not a set of instructions
executed in sequence, only an interconnection of transistors or lookup tables.
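The lookup-table idea can be illustrated by tabulating a Boolean function over all input combinations, which is roughly what configuring a single lookup table amounts to. This is a conceptual sketch only: the function name synthesize_lut is made up, and real synthesis tools work from HDL descriptions and map logic onto many interconnected tables.

```python
from itertools import product


def synthesize_lut(function, input_count):
    """Tabulate a Boolean function of `input_count` bits into a lookup table."""
    return {
        bits: int(bool(function(*bits)))
        for bits in product((0, 1), repeat=input_count)
    }


# A 2-input XOR becomes a 4-entry table; nothing is "executed" afterwards,
# the output is simply looked up from the inputs.
xor_lut = synthesize_lut(lambda a, b: a ^ b, 2)
for bits, out in xor_lut.items():
    print(bits, "->", out)
```

The result contains no sequence of instructions, only a fixed mapping from input bits to an output bit.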
For HDLs, 'compiler' refers to synthesis, a process of transforming the HDL code listing into
a physically realizable gate netlist. The netlist output can take any of many forms e.g. a
"simulation" netlist with gate-delay information.
On the other hand, a software compiler converts the source-code listing into
microprocessor-specific object code for execution on the target microprocessor. As HDLs
and programming languages borrow concepts and features from each other, the boundary
between them is becoming less distinct.