
UNIT -1

What Is a Language Processor?

A language processor, or translator, is a computer program that translates source code
from one programming language to another. It also identifies errors during
translation.
Computer programs are usually written in high-level programming languages (like C++,
Python, and Java). Further, to make them understandable by the computer, a language
processor needs to translate the source code into machine code (also known as object
code, which is made up of ones and zeroes).
There are three types of language processors: assembler, compiler, and interpreter.

3. Assembler

The assembler translates a program written in assembly language into machine code.
Assembly language is a low-level, machine-dependent symbolic code that consists of
instructions such as ADD, SUB, MUL, and MOV.
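The heart of this translation can be sketched as a table lookup from mnemonic to opcode. In the minimal C sketch below, the numeric opcode values are invented for illustration and do not come from any real instruction set:

```c
#include <string.h>

/* Toy opcode table: the mnemonics follow the text (ADD, SUB, MUL, MOV);
 * the numeric opcodes are invented for illustration, not a real ISA. */
struct op { const char *mnemonic; unsigned char opcode; };

static const struct op optab[] = {
    {"ADD", 0x01}, {"SUB", 0x02}, {"MUL", 0x03}, {"MOV", 0x04},
};

/* Return the opcode for a mnemonic, or -1 if it is unknown
 * (a real assembler would report an error here). */
int assemble_mnemonic(const char *m) {
    for (size_t i = 0; i < sizeof optab / sizeof optab[0]; i++)
        if (strcmp(optab[i].mnemonic, m) == 0)
            return optab[i].opcode;
    return -1;
}
```

A real assembler also resolves operands, labels, and addresses; this sketch covers only the mnemonic-to-opcode step.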

4. Compiler

A compiler reads the entire source code and then translates it into machine code.
This machine code, also known as object code, is stored in an object file.
If the compiler encounters any errors during the compilation process, it continues to
read the source code to the end and then reports the errors and their line numbers to
the user.
Compiled programming languages are high-level and machine-independent. Examples
of compiled programming languages are C, C++, C#, Rust, and Go; Java is usually
compiled to bytecode, which is then run by a virtual machine.
5. Interpreter

An interpreter receives the source code and then reads it line by line, translating each
line of code to machine code and executing it before moving on to the next line.
If the interpreter encounters any errors during its process, it stops the process and
shows an error message to the user.
Interpreted programming languages are also high-level and machine-independent.
Python, JavaScript, PHP, and Ruby are examples of interpreted programming
languages.
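The line-by-line behavior described above can be sketched in C with a toy two-instruction language (ADD n and SUB n, invented purely for this example): each line is decoded and executed before the next is read, and the interpreter stops at the first faulty line.

```c
#include <stdio.h>
#include <string.h>

/* Interpret a program of lines "ADD n" / "SUB n" over an accumulator.
 * Each line is translated and executed before the next one is read;
 * on the first bad line the interpreter stops and reports it. */
long interpret(const char *src) {
    long acc = 0;
    int line_no = 1;
    while (*src) {
        char op[8];
        long n;
        if (sscanf(src, "%7s %ld", op, &n) != 2) {
            fprintf(stderr, "error on line %d\n", line_no);
            return acc;                    /* stop at the faulty line */
        }
        if (strcmp(op, "ADD") == 0)      acc += n;
        else if (strcmp(op, "SUB") == 0) acc -= n;
        else {
            fprintf(stderr, "error on line %d: unknown op %s\n", line_no, op);
            return acc;
        }
        src = strchr(src, '\n');           /* move on to the next line */
        if (!src) break;
        src++;
        line_no++;
    }
    return acc;
}
```

For the program "ADD 5 / SUB 2 / ADD 1" the accumulator ends at 4; a program containing an unknown instruction stops at that line with the value computed so far.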

6. Comparison Between Interpreter and Compiler

Compilers and interpreters both have their pros and cons:

6.1. Debugging

Debugging is easier with an interpreter, since it stops as soon as it encounters an error,
whereas a compiler reports error messages only after reading the entire program.

6.2. Object File

A compiler generates a file containing machine code after translating the source code.
This file is known as an object file.
An interpreter doesn’t create an object file.
6.3. Execution Time

The execution time of a program written in an interpreted language is slower since an
interpreter needs to translate and execute each line of the source code. However,
since a compiler generates an object file, the execution time is faster.

6.4. Needs Source Code

A compiler generates an object file, so we don’t need the source code to execute the
program later. In contrast, an interpreter requires source code to execute the program.

6.5. Memory Usage

A compiler needs to generate object code, so it requires more memory than an
interpreter.

What is a Compiler?

A compiler is a software tool that converts high-level programming language code
(source code) into machine code (target code) that a computer's processor can
execute. It reads the entire program and translates it into an efficient, executable
format, often performing optimizations to improve performance and reduce resource
consumption. The compilation process involves analyzing the source code,
understanding its structure and semantics, and then systematically transforming it into
a lower-level, hardware-compatible format. This enables developers to write code in
human-readable languages, such as C, C++, or Java, and then use a compiler to create
executable programs for various computing environments.

What is an Interpreter?

An interpreter is a software tool that directly executes instructions written in a
programming language without requiring them to be compiled into machine code first.
It reads the source code line by line or statement by statement, translating and running
it on the fly. This approach allows for immediate program execution, making
interpreters ideal for scripting, rapid prototyping, and languages that prioritize flexibility
and ease of use, such as Python and JavaScript. By eliminating the compilation step,
interpreters facilitate a more interactive development process, though at the cost of
slower execution speed compared to compiled languages.

Compiler Vs Interpreter

• Type Checking: a compiler determines types during compilation, leading to early detection of type errors; an interpreter determines types at runtime, which can introduce type-related errors during execution.
• Translation Time: a compiler translates the entire program during compilation, before execution; an interpreter translates the program on-the-fly during runtime, interpreting code line by line.
• Output Storage: a compiler generates machine code stored as an executable file on disk; an interpreter's translated instructions are temporarily stored in RAM, without creating a separate executable.
• Optimization: a compiler can perform extensive compile-time optimizations for enhanced performance; an interpreter has limited optimization opportunities due to runtime translation, affecting performance.
• Application Suitability: a compiler is ideal for performance-critical applications like operating systems and games; an interpreter is suited for scripting, rapid prototyping, and applications where immediate execution is beneficial.
• Security: a compiler offers better security through code obfuscation, as the source is compiled into an executable; with an interpreter, the source code is more exposed, potentially posing higher security risks if left unprotected.
• Debugging: with a compiler, debugging can be complex, as errors need to be traced back from the executable to the source code; an interpreter facilitates easier debugging, with errors reported in the context of the source code during execution.
• Execution Speed: compiled code is generally faster due to direct execution of machine code; interpreted code is slower owing to on-the-fly interpretation and execution.
• Platform Dependency: a compiler produces platform-specific executables, requiring recompilation for different platforms; interpreted code is highly portable, as the same code can run on any platform with a compatible interpreter.
• Memory Efficiency: compiled programs are more memory-efficient, as unnecessary code can be optimized out during compilation; interpreted programs are less memory-efficient, due to the need to keep the entire program in memory during interpretation.
• Development Cycle: a compiler implies longer development cycles due to the compile-link-execute steps; an interpreter gives shorter development cycles, beneficial for rapid testing and prototyping.
• Runtime Flexibility: compiled code is less flexible at runtime, since changes require recompilation; interpreted code is highly flexible, allowing for dynamic code execution and modification at runtime.
• Language Examples: C, C++, and Rust are commonly compiled languages; Python, JavaScript, and Ruby are popular interpreted languages.

Role of a Compiler

• Compilers convert high-level code to executable machine instructions, facilitating
program execution.
• They optimize code for better speed and efficiency through in-depth analysis and
enhancements.
• By enforcing type checking and static type systems, compilers aid in early error
detection and enhance reliability.
• Compilers ensure software security and correctness, verifying consistency and
conducting thorough checks for potential errors.

Types of Compilers

• Single-Pass Compilers: Quickly process code in one go, trading off advanced
optimizations for speed.
• Multi-Pass Compilers: Analyze code in several stages, enhancing optimization at
the cost of more time and memory.
• Source-to-Source Compilers: Convert code from one high-level language to
another, aiding in language migration.
• Cross Compilers: Create executable code for different platforms, enabling
development for multiple architectures.
• Native Compilers: Generate machine code for the host system, maximizing
performance by exploiting hardware capabilities.
• Just-In-Time (JIT) Compilers: Compile code at runtime, offering a balance
between compilation and interpretation.
• Ahead-of-Time (AOT) Compilers: Compile entire programs before execution,
improving startup times and runtime efficiency.
• Optimizing Compilers: Focus on code analysis to apply optimizations, improving
the program's speed and resource usage.

Role of an Interpreter
• Interpreters originated to mitigate early computer memory limitations, facilitating
program execution with minimal memory requirements.
• They execute code on-demand, bypassing the need for pre-compilation, which
suits immediate and adaptable execution scenarios.
• Real-time code translation by interpreters allows for dynamic responses to user
inputs and changing conditions.
• The ability of interpreted languages to instantaneously generate and assess
expressions supports dynamic data handling and swift adaptability to new
situations.

Types of Interpreters

• Sequential Interpreters: Execute code line by line in sequence.
• Batch Interpreters: Run entire programs at once.
• Bytecode Interpreters: Convert source code to an intermediate form before
execution, enhancing portability.
• Just-In-Time (JIT) Interpreters: Blend interpreting and compiling, translating code
on-demand for efficient execution.
• Tree-Walk Interpreters: Navigate an abstract syntax tree to execute code, offering
a structured approach to interpretation.
• Source-to-Source Interpreters: Translate code from one high-level language to
another, aiding in cross-language development.
• Hardware Interpreters: Embedded in hardware to directly execute high-level
language constructs, optimizing performance.
• Dynamic Translators: Adapt code between architectures in real-time, facilitating
cross-platform software use.
• Domain-Specific Interpreters: Tailored for particular fields or applications,
optimizing for specialized tasks.

Cross-Compiler:
• A cross-compiler, on the other hand, is a compiler that runs on one platform but
generates executable code for another platform.
• It is useful in scenarios where the development environment is different from the
target environment. For instance, if you're developing software for an embedded
system with a different architecture, you would use a cross-compiler to compile
your code on your development machine (which could be a PC or a workstation)
into machine code that can run on the target embedded system.
• Cross-compilers are commonly used in embedded systems development, where
the target device might have limited resources or a different architecture.
• They require knowledge of both the source and target architectures and often
come as part of a software development kit (SDK) provided by the hardware
manufacturer or community.

Bootstrapping in Compiler Design


Bootstrapping means writing a compiler in the very source language that it is intended
to compile, ensuring self-sufficiency and supporting language evolution. In simple
terms, a self-compiling compiler compiles its own source code. The compiled compiler
can then compile everything else, including its own future versions.

Representation of Compiler in Bootstrapping


In Bootstrapping, a compiler is represented by three languages:
1. Source Language (S) is the language that the compiler compiles.

2. Implementation Language (I) is the language that the compiler is written in.

3. Target Language (T) is the language generated by the compiler.

A T-diagram is used to represent these three languages. For a compiler with Source
Language S and Target Language T, implemented in Implementation Language I, the
diagram is written SIT.
Steps for Bootstrapping in Compiler Design
There are several steps involved in bootstrapping process for Compilers:
1. Select a high-level language for which the compiler has to be implemented,
i.e., the target language. The language chosen should be capable and
effective in generating quality code.

2. Implement the initial version of the compiler with the chosen high-level
language. This initial version of the compiler should be capable of generating
a partially functional code of the target language. This initial version is known
as the stage 0 compiler.

3. Rather than directly generating the machine code, an intermediate compiler
representation should be made. The bootstrap compiler should be extended
to create an Intermediate Representation.

4. Perform some optimizations on the bootstrapped compiler and use this
compiler to compile its source code. The output is called a stage 1 compiler.
5. Test the stage 1 compiler and iterate the same process using the stage 1
compiler to compile its source code, leading to a newer version of the
compiler called the stage 2 compiler. The same process can be repeated
multiple times, leading to a more complex and advanced compiler version.
Cross Compilers in Bootstrapping
A cross-compiler is a compiler that creates executable code for a platform other than
the one on which it is running, while bootstrapping uses a simple version of the
compiler to compile a more complex version of the same compiler.
Cross-compilers in bootstrapping are used to compile further, more advanced versions
of the compiler. With the help of a cross-compiler, we can generate executable code
for the target platform without needing a compiler that runs directly on that platform.
• A cross compiler for a source language ‘C,’ using implementation language
‘N’ and producing target code for machine ‘S,’ is represented as CNS.

• A cross compiler for a source language ‘N,’ using implementation language
‘M’ and generating code for machine ‘M,’ is represented by NMM.

• A cross compiler for a source language ‘C,’ using implementation language
‘M’ and generating code for machine ‘S,’ is represented by CMS.

Using bootstrapping, the first two cross Compilers (CNS and NMM) can generate the
third compiler (CMS).
CNS + NMM -> CMS
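The composition rule behind CNS + NMM -> CMS can be sketched as a small C function over language triples (the one-letter language names follow the text; the function and struct names are ours): compiler b can compile compiler a's source text only if b's source language matches a's implementation language, and the result is re-implemented in b's target language.

```c
/* A compiler as a T-diagram triple: source, implementation, target. */
struct tdiag { char src, impl, tgt; };

/* Compile compiler a with compiler b. b must accept a's implementation
 * language; the result keeps a's source/target pair but is now
 * implemented in b's target language. '?' marks an invalid pairing. */
struct tdiag compose(struct tdiag a, struct tdiag b) {
    struct tdiag r = {'?', '?', '?'};
    if (b.src == a.impl) {
        r.src  = a.src;
        r.impl = b.tgt;
        r.tgt  = a.tgt;
    }
    return r;
}
```

Applying compose to CNS and NMM yields exactly the CMS triple from the equation above.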
UNIT -1

Creating a Cross-Compiler using Bootstrapping


Let’s create our own cross-compiler with the help of a compiler that has source
language ‘D’, implementation language ‘O’, and target language ‘O’, using a compiler
that runs on source language ‘D’ with implementation language ‘D’ and target
language ‘T.’

Step-1: Start with writing the compiler representation for both compilers.
The source cross-compiler can be represented as DDT.
The intermediate cross-compiler can be represented as DOO.

Step-2: Create T-diagrams for both the cross compilers.


T-diagram for DDT :

T-diagram for DOO :

Step-3: Compiling the intermediate compiler with the source compiler, we get:
DDT + DOO -> DOT

Step-4: Finally, we have a cross-compiler written in the O language, which compiles the
D language and generates code in the T language.
What is Input Buffering in Compiler Design?

Lexical analysis would otherwise have to access secondary memory each time it
identifies a token, which is time-consuming and costly. So the input string is stored in
a buffer and then scanned by the lexical analyzer.

Lexical analysis scans the input string from left to right, one character at a time, to
identify tokens. It uses two pointers to scan tokens −

• Begin Pointer (bptr) − It points to the beginning of the string to be read.

• Look Ahead Pointer (lptr) − It moves ahead to search for the end of the token.

Example − For the statement int a, b;

• Both pointers start at the beginning of the string, which is stored in the buffer.

• Look Ahead Pointer scans buffer until the token is found.

• The character ("blank space") beyond the token ("int") has to be examined
before the token ("int") can be determined.

• After processing the token ("int"), both pointers are set to the next token ('a'), and
this process is repeated for the whole program.
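The two-pointer scan can be sketched in C as follows; the helper name get_token and the delimiter set " ,;" are simplifications for this example, not part of a real lexer:

```c
#include <string.h>

/* bptr marks the start of the current token; the look-ahead pointer
 * lptr moves right until a delimiter (blank, comma, or semicolon in
 * this simplified sketch) marks the end of the token. The token is
 * copied into tok, and a pointer just past it is returned so the
 * next call starts at the following token. */
const char *get_token(const char *bptr, char *tok) {
    while (*bptr == ' ')
        bptr++;                           /* skip leading blanks */
    const char *lptr = bptr;              /* look-ahead starts at bptr */
    while (*lptr && strchr(" ,;", *lptr) == NULL)
        lptr++;                           /* advance past the token */
    memcpy(tok, bptr, (size_t)(lptr - bptr));
    tok[lptr - bptr] = '\0';
    if (*lptr && *lptr != ' ')
        lptr++;                           /* step over ',' or ';' */
    return lptr;
}
```

For the statement int a, b; successive calls yield the tokens "int", "a", and "b".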

A buffer can be divided into two halves. When the look-ahead pointer reaches the end
of the first half, the second half is filled with new characters to be read. When it
reaches the right end of the second half, the first half is refilled with new characters,
and so on.

Sentinels − A sentinel is a special character (typically eof) placed at the end of each
buffer half. Each time the forward pointer is advanced, a single comparison against the
sentinel shows whether we have moved off one half of the buffer; if so, the other half is
reloaded.
Buffer Pairs − A specialized buffering technique can decrease the overhead needed to
process an input character. It uses two buffers, each of N-character size, which are
reloaded alternately.

Two pointers, lexemeBegin and forward, are maintained. lexemeBegin points to the
start of the current lexeme being discovered. forward scans ahead until a match for a
pattern is found; once the lexeme is found, forward is set to the character at its right
end, and lexemeBegin is set to the character immediately after the lexeme so that the
search for the next one can begin.
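The scheme can be sketched in C as follows; here the "input file" is simulated by an in-memory string, the half size N is kept tiny so the reload logic is exercised, and '\0' serves as the sentinel (real scanners typically use a special eof character instead):

```c
#include <string.h>

#define N 8   /* tiny half size for illustration; real lexers use e.g. 4096 */

/* Buffer pair: two N-char halves, each followed by a sentinel slot,
 * refilled alternately from the (simulated) input. */
struct dbuf {
    char half[2][N + 1];
    const char *input;   /* remaining simulated input */
    int cur;             /* which half the forward pointer is in */
    int pos;             /* forward's offset within that half */
};

static void fill(struct dbuf *b, int h) {
    size_t n = strlen(b->input);
    if (n > N) n = N;
    memcpy(b->half[h], b->input, n);
    b->half[h][n] = '\0';   /* sentinel marks the end of valid data */
    b->input += n;
}

void dbuf_init(struct dbuf *b, const char *src) {
    b->input = src;
    b->cur = 0;
    b->pos = 0;
    fill(b, 0);
}

/* Advance the forward pointer: one sentinel test per character. */
char dbuf_next(struct dbuf *b) {
    char c = b->half[b->cur][b->pos];
    if (c != '\0') {
        b->pos++;
        return c;
    }
    if (*b->input == '\0')
        return '\0';         /* sentinel plus exhausted input: real eof */
    b->cur = 1 - b->cur;     /* sentinel mid-stream: reload other half */
    b->pos = 0;
    fill(b, b->cur);
    return dbuf_next(b);     /* recurses at most once */
}
```

Note the payoff: the hot path of dbuf_next performs a single comparison per character, exactly the saving the sentinel is there to provide.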

Preliminary Scanning − Certain processes are best performed as characters are moved
from the source file to the buffer. For example, comments can be deleted, and
languages like FORTRAN, which ignore blanks, can delete them from the character
stream. Strings of several blanks can also be collapsed into one blank. Pre-processing
the character stream in this way, before it is subjected to lexical analysis, saves the
trouble of moving the look-ahead pointer back and forth over a string of blanks.
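As a sketch of such a preprocessing pass, the following C function collapses each run of blanks into a single blank while copying characters toward the buffer (comment deletion would sit in the same loop; the function name is ours):

```c
/* Copy src into dst, collapsing every run of blanks into one blank.
 * dst must be at least as large as src. */
void collapse_blanks(const char *src, char *dst) {
    int in_blanks = 0;
    for (; *src; src++) {
        if (*src == ' ') {
            if (!in_blanks)
                *dst++ = ' ';   /* keep only the first blank of a run */
            in_blanks = 1;
        } else {
            *dst++ = *src;
            in_blanks = 0;
        }
    }
    *dst = '\0';
}
```

After this pass, the look-ahead pointer never needs to walk across more than one blank at a time.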

Lexical Analyzer Generator Tool

A lexical analyzer generator is typically implemented using a tool. There are some
standard tools available in the UNIX environment. Some of the standard tools are:

• LEX – It helps in writing programs whose flow of control is regulated by the
various definitions of regular expressions in the input stream. The wrapper
programming language is C.
• FLEX – A faster lexical analyzer tool; this is also a C-language version of the
LEX tool.
• JLEX – This is a Java version of the LEX tool.
The input will be a source file with a ".l" extension. This will be compiled by the Lex
compiler, and the output will be a C source file named "lex.yy.c". This can be
compiled using a C compiler to get the desired output.

8.3 Components of a LEX program

A LEX program typically consists of three parts: an initial declaration section, a
middle set of translation rules, and a last section that consists of other auxiliary
procedures. The "%%" acts as a delimiter, separating the declaration section from
the translation rules section and the translation rules section from the auxiliary
procedures section. A program may omit the declaration section, but the delimiter
is mandatory.

declaration
%%
translation rules
%%
auxiliary procedures

In the declaration section, declarations and initialization of variables take place. In
the translation rules section, regular expressions to match tokens, along with the
necessary actions, are defined. The auxiliary procedures section consists of a main
function, as in a C program, and any other functions that are required.

8.3.1 Declaration

In the declaration section, regular expressions can be defined and named. The
following is an example of a declaration section. Each statement has two
components: a name and the regular expression that the name denotes.

delim   [ \t\n]
ws      {delim}+
letter  [A-Za-z]
digit   [0-9]
id      {letter}({letter}|{digit})*
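As a cross-check of what the id definition accepts, here is a hand-coded C equivalent of {letter}({letter}|{digit})* (the function name matches_id is ours):

```c
#include <ctype.h>

/* Return 1 if s matches letter(letter|digit)*, i.e. a non-empty
 * string starting with a letter, followed by letters or digits. */
int matches_id(const char *s) {
    if (!isalpha((unsigned char)*s))
        return 0;        /* must start with a letter (also rejects "") */
    for (s++; *s; s++)
        if (!isalnum((unsigned char)*s))
            return 0;    /* anything else ends the match early */
    return 1;
}
```

So "count1" is an id, while "1count" is not, because the first character must be a letter.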

Table 8.1 summarizes the operators and special characters used in the regular
expressions which are part of the declaration and translation rules section.
Table 8.1 Meta Characters

Meta Character   Match
.                any character except newline
\n               newline
*                zero or more copies of the preceding expression
+                one or more copies of the preceding expression
?                zero or one copy of the preceding expression
^                beginning of line
$                end of line
a|b              a or b
(ab)+            one or more copies of ab (grouping)
"a+b"            literal "a+b" (C escapes still work)
[]               character class

In addition, the declaration section may also contain some local variable
declarations and definitions which can be modified in the subsequent sections.

8.3.2 Translation Rules

This is the second section of the LEX program, after the declarations. The
declarations section is separated from the translation rules section by the "%%"
delimiter. Here, each statement consists of two components: a pattern and an
action. The pattern is matched against the input; if there is a match, the action
listed against the pattern is carried out. Thus the LEX tool can be looked upon as a
rule-based programming language. The following is an example of patterns p1,
p2, ..., pn and their corresponding actions 1 to n.

p1 {action1}    /* p1 ... pn are patterns (regular expressions) */
...
pn {actionn}
For example, if the keyword IF is to be returned as a token for a match with the
input string "if", then the translation rule is defined as

if   {return(IF);}

The ";" at the end of return(IF) indicates the end of the first statement of an
action; an entire sequence of actions is enclosed in a pair of braces. If an action
spans multiple lines, the continuation character needs to be used. Similarly, the
following is an example for an identifier "id", where the definition of "id" was
already given in the "declaration" section above.
{id} {yylval=install_id();return(ID);}

In the above statement, when an identifier is encountered, two actions are taken.
The first is to call the install_id() function and assign its result to yylval; the
second is to return the token ID.

8.3.3 Auxiliary procedures

This section is separated from the translation rules section using the "%%"
delimiter. In this section, the C program's main function is declared, and the other
necessary functions are also defined. In the example from the translation rules
section, install_id() is a procedure that installs the lexeme, whose first character is
pointed to by yytext and whose length is given by yyleng, into the symbol table
and returns a pointer to its beginning.

install_id() {
    /* insert the lexeme at yytext (length yyleng) into the symbol
       table and return a pointer to its entry */
}

The functionality of install_id() can be written separately or combined with the
main function. yytext and yyleng are LEX-provided variables that hold the text of
the matched input and the length of the string.

8.4 Example LEX program

The following LEX program is used to count the number of lines in the input
data stream.

%{
int num_lines = 0;
%}
%%
\n      ++num_lines;
.       ;
%%
int main()
{
    yylex();
    printf("# of lines = %d\n", num_lines);
    return 0;
}
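For comparison, the scanner that LEX generates for this program behaves like the following hand-written C function: \n matches the first rule and increments the counter, while . matches and discards every other character (count_lines is our name for this sketch):

```c
/* Count newline characters, as the \n rule of the LEX program does;
 * every other character is examined and discarded, like the . rule. */
int count_lines(const char *text) {
    int num_lines = 0;
    for (; *text; text++)
        if (*text == '\n')
            ++num_lines;
    return num_lines;
}
```

This makes visible what the generated yylex() is doing underneath: per-character dispatch driven by the rule patterns.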
