What Is A Language Processor
3. Assembler
The assembler translates a program written in assembly language into machine code.
Assembly language is a low-level, machine-dependent symbolic code that consists of
instructions (like ADD, SUB, MUL, MOV, etc.).
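For example, on a 16-bit x86 processor the assembly instruction MOV AX, 5 is
assembled into the machine-code bytes B8 05 00 (the encoding shown is for x86 real
mode and is given only for illustration).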
4. Compiler
A compiler reads the entire source code and then translates it into machine code.
The machine code, also known as the object code, is stored in an object file.
If the compiler encounters any errors during the compilation process, it continues to
read the source code to the end and then shows the errors and their line numbers to
the user.
Compiled programming languages are high-level and machine-independent. Examples
of compiled programming languages are C, C++, C#, Java, Rust, and Go.
5. Interpreter
An interpreter receives the source code and then reads it line by line, translating each
line of code to machine code and executing it before moving on to the next line.
If the interpreter encounters any errors during its process, it stops the process and
shows an error message to the user.
Interpreted programming languages are also high-level and machine-independent.
Python, JavaScript, PHP, and Ruby are examples of interpreted programming
languages.
6.1. Debugging
Debugging is easier with an interpreter since it stops right after encountering an error,
whereas a compiler shows error messages only after reading the entire program.
6.2. Object File
A compiler generates a file containing machine code after translating the source code.
This file is known as an object file.
An interpreter doesn’t create an object file.
6.3. Execution Time
A compiler generates an object file, so we don’t need the source code to execute the
program later. In contrast, an interpreter requires source code to execute the program.
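For example, with a C compiler such as GCC (file names here are illustrative
assumptions), the command gcc -c hello.c produces the object file hello.o, and
gcc hello.o -o hello links it into an executable that runs without the source file
being present. An interpreter, by contrast, must be given the source file itself
(for example, python hello.py) every time the program runs.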
What is a Compiler?
What is an Interpreter?
Compiler Vs Interpreter
Role of a Compiler
Types of Compilers
• Single-Pass Compilers: Quickly process code in one go, trading off advanced
optimizations for speed.
• Multi-Pass Compilers: Analyze code in several stages, enhancing optimization at
the cost of more time and memory.
• Source-to-Source Compilers: Convert code from one high-level language to
another, aiding in language migration.
• Cross Compilers: Create executable code for different platforms, enabling
development for multiple architectures.
• Native Compilers: Generate machine code for the host system, maximizing
performance by exploiting hardware capabilities.
• Just-In-Time (JIT) Compilers: Compile code at runtime, offering a balance
between compilation and interpretation.
• Ahead-of-Time (AOT) Compilers: Compile entire programs before execution,
improving startup times and runtime efficiency.
• Optimizing Compilers: Focus on code analysis to apply optimizations, improving
the program's speed and resource usage.
Role of an Interpreter
• Interpreters originated to mitigate early computer memory limitations, facilitating
program execution with minimal memory requirements.
• They execute code on-demand, bypassing the need for pre-compilation, which
suits immediate and adaptable execution scenarios.
• Real-time code translation by interpreters allows for dynamic responses to user
inputs and changing conditions.
• The ability of interpreted languages to instantaneously generate and assess
expressions supports dynamic data handling and swift adaptability to new
situations.
Types of Interpreters
Cross-Compiler:
• A cross-compiler is a compiler that runs on one platform but generates
executable code for another platform.
• It is useful in scenarios where the development environment is different from the
target environment. For instance, if you're developing software for an embedded
system with a different architecture, you would use a cross-compiler to compile
your code on your development machine (which could be a PC or a workstation)
into machine code that can run on the target embedded system.
• Cross-compilers are commonly used in embedded systems development, where
the target device might have limited resources or a different architecture.
• They require knowledge of both the source and target architectures and often
come as part of a software development kit (SDK) provided by the hardware
manufacturer or community.
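As a minimal sketch, assume an x86-64 development PC with an ARM Linux GCC
cross-toolchain installed (the toolchain prefix arm-linux-gnueabihf- is an assumption;
it varies by target):

/* hello.c - compiled on the host PC, executed on the embedded target */
#include <stdio.h>

int main(void) {
    printf("Hello from the target device!\n");
    return 0;
}

Building with the native compiler (gcc hello.c) yields a binary for the development
machine, while building with the cross-compiler (arm-linux-gnueabihf-gcc hello.c -o
hello) yields an ARM binary that runs only on the target system.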
In a T-diagram, a compiler is characterized by three languages:
1. Source Language (S) is the language that the compiler translates.
2. Implementation Language (I) is the language that the compiler is written in.
3. Target Language (T) is the language that the compiler generates.
The T-diagram for a compiler with Source Language S and Target Language T,
implemented in Implementation Language I, is written as SIT.
Steps for Bootstrapping in Compiler Design
There are several steps involved in the bootstrapping process for compilers:
1. Select a high-level language for which the compiler has to be implemented,
i.e., the source language. The language chosen should be capable and
effective in generating quality code.
2. Implement the initial version of the compiler in the chosen high-level
language. This initial version of the compiler should be capable of generating
partially functional code in the target language. This initial version is known
as the stage-0 compiler.
Using bootstrapping, the first two cross-compilers, CNS (source C, written in N,
targeting S) and NMM (source N, written in M, targeting M), can generate the third
compiler, CMS (source C, written in M, targeting S):
CNS + NMM -> CMS
Step-1: Start by writing down the representation of both compilers.
The source cross-compiler can be represented as DDT (source D, written in D,
targeting T).
The intermediate cross-compiler can be represented as DOO (source D, written in O,
targeting O).
Step-3: Compiling the source compiler with the intermediate compiler, we get:
DDT + DOO -> DOT
Step-4: Finally, we have a cross-compiler written in the O language, which compiles
the D language and generates code in the T language.
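Both steps follow the same general rule. Writing a compiler as the triple (source,
implementation, target), compiling a compiler (S, I, T), which is just a program
written in I, with a compiler (I, J, M) translates its implementation from I to M:
(S, I, T) + (I, J, M) -> (S, M, T)
Substituting (C, N, S) + (N, M, M) -> (C, M, S) or (D, D, T) + (D, O, O) -> (D, O, T)
reproduces the two examples above.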
What is Input Buffering in Compiler Design?
Without buffering, lexical analysis would have to access secondary memory each time
to identify tokens, which is time-consuming and costly. So the input strings are stored
in a buffer and then scanned by lexical analysis.
Lexical analysis scans the input string from left to right one character at a time to
identify tokens. It uses two pointers to scan tokens −
• Both pointers start at the beginning of the string, which is stored in the buffer.
• The character (the blank space) beyond the token ("int") has to be examined
before the token ("int") can be determined.
• After processing the token ("int"), both pointers are set to the next token ('a'),
and this process is repeated for the whole program.
A buffer can be divided into two halves. When the lookahead pointer reaches the end
of the first half, the second half is filled with new characters to be read. When the
lookahead pointer reaches the right end of the second half, the first half is refilled with
new characters, and so on.
Sentinels − Each time the forward pointer is advanced, a check must be made to
ensure that it has not moved off one half of the buffer; if it has, the other half must be
reloaded. A sentinel is a special character (one that cannot occur in the source
program, such as eof) placed at the end of each buffer half, so that this check can be
done with a single comparison per character.
Buffer Pairs − A specialized buffering technique can decrease the amount of overhead
needed to process an input character while transferring characters. It uses two buffers,
each of N-character size, which are reloaded alternately.
Two pointers, lexemeBegin and forward, are maintained. lexemeBegin points to the
start of the current lexeme, which is being discovered. forward scans ahead until a
match for a pattern is discovered. Once the lexeme is determined, forward is set to the
character at its right end, and after the lexeme is recorded, lexemeBegin is set to the
character immediately following the lexeme just found.
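A minimal C sketch of the buffer-pair scheme with sentinels follows (the buffer size,
the function names, and the choice of '\0' as the sentinel character are assumptions
made for illustration):

#include <stdio.h>

#define N        4096   /* size of each buffer half (assumed)                */
#define SENTINEL '\0'   /* end-of-half marker; assumed absent from source    */

static char  buf[2 * N + 2];  /* two halves, each followed by a sentinel slot */
static char *lexemeBegin;     /* start of the current lexeme                  */
static char *forward;         /* scans ahead until a pattern is matched       */
static FILE *src;

/* Fill one half with up to N characters and terminate it with the sentinel;
   a short read leaves the sentinel at the true end of the input. */
static void reload(char *half) {
    size_t n = fread(half, 1, N, src);
    half[n] = SENTINEL;
}

static void init(FILE *fp) {
    src = fp;
    reload(buf);                  /* load the first half */
    forward = lexemeBegin = buf;
}

/* Advance forward by one character, reloading a half when its sentinel is
   reached; returns SENTINEL only at the true end of the input. */
static char next_char(void) {
    char c = *forward++;
    if (c == SENTINEL) {
        if (forward == buf + N + 1) {            /* end of the first half  */
            reload(buf + N + 1);                 /* refill the second half */
        } else if (forward == buf + 2 * N + 2) { /* end of the second half */
            reload(buf);                         /* refill the first half  */
            forward = buf;                       /* wrap around            */
        } else {
            return SENTINEL;                     /* true end of input      */
        }
        c = *forward++;
    }
    return c;
}

The point of the sentinel is visible in next_char(): the single comparison
(c == SENTINEL) replaces the two separate tests for "end of a buffer half" and
"end of input" that would otherwise be needed on every character.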
Preliminary Scanning − Certain processes are best performed as characters are moved
from the source file to the buffer. For example, comments can be deleted at this stage.
Languages like FORTRAN, which ignore blanks, can have blanks deleted from the
character stream, and strings of several blanks can be collapsed into one blank. Pre-
processing the character stream before it is subjected to lexical analysis saves the
trouble of moving the lookahead pointer back and forth over a string of blanks.
A lexical analyzer generator is typically implemented using a tool. There are some
standard tools available in the UNIX environment, the best known being LEX. A LEX
program is organized into three sections, separated by the %% delimiter:
declaration
%%
translation rules
%%
auxiliary procedures
8.3.1 Declaration
delim    [ \t\n]
ws       {delim}+
letter   [A-Za-z]
digit    [0-9]
id       {letter}({letter}|{digit})*
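In the translation-rules section these names are then referred to in braces. For
instance, a common pair of rules in this style (the token handling shown is an
assumption for illustration):

{ws}     { /* no action and no return: whitespace is skipped */ }
{digit}+ { yylval = atoi(yytext); return(NUMBER); }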
Table 8.1 summarizes the operators and special characters used in the regular
expressions which are part of the declaration and translation rules sections.
Table 8.1 Meta Characters

Meta Character    Match
\n                newline
^                 beginning of line
$                 end of line
a|b               a or b
[]                character class
In addition, the declaration section may also contain some local variable
declarations and definitions which can be modified in the subsequent sections.
8.3.2 Translation Rules
This is the second section of the LEX program after the declarations. The
declarations section is separated from the translation rules section by means of
the "%%" delimiter. Here, each statement consists of two components: a
pattern and an action. The pattern is matched against the input; if there is a
match, the action listed against the pattern is carried out. Thus the LEX tool
can be looked upon as a rule-based programming language. The following is an
example of patterns p1, p2, ..., pn and their corresponding actions action1 to
actionn:
p1 {action1}
p2 {action2}
...
pn {actionn}
For example, if the keyword IF is to be returned as a token for a match with the
input string "if", then the translation rule is defined as
if {return(IF);}
The ";" at the end of return(IF) indicates the end of the first statement of an
action, and the entire sequence of actions is enclosed between a pair of braces.
If an action spans multiple lines, it must be enclosed in braces. Similarly, the
following is an example for an identifier "id", where "id" is already defined in
the first "declaration" section.
{id} {yylval=install_id();return(ID);}
8.3.3 Auxiliary Procedures
This section is separated from the translation rules section using the delimiter
"%%". In this section, the C program's main function is declared, and the other
necessary functions are also defined. In the example given in the translation
rules section, the function install_id() is a procedure that installs the lexeme,
whose first character is pointed to by yytext and whose length is provided by
yyleng, into the symbol table, and returns a pointer to the beginning of the
lexeme.
install_id() { ... }
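A minimal sketch of what install_id() could look like follows, assuming a simple
linear symbol table and an integer index as the return value (the table structure,
its size, and the return convention are all assumptions for illustration):

#include <string.h>
#include <stdlib.h>

extern char *yytext;   /* first character of the lexeme (provided by LEX) */
extern int   yyleng;   /* length of the lexeme (provided by LEX)          */

#define MAX_SYMS 1024
static char *symtab[MAX_SYMS];  /* linear symbol table of lexeme strings */
static int   nsyms = 0;

/* Install the lexeme yytext[0..yyleng-1] in the symbol table if it is not
   already present; return its index in the table. */
int install_id(void) {
    int i;
    for (i = 0; i < nsyms; i++)
        if (strlen(symtab[i]) == (size_t)yyleng &&
            strncmp(symtab[i], yytext, yyleng) == 0)
            return i;                      /* already installed */
    symtab[nsyms] = malloc(yyleng + 1);
    memcpy(symtab[nsyms], yytext, yyleng);
    symtab[nsyms][yyleng] = '\0';
    return nsyms++;                        /* index of the new entry */
}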
The following LEX program is used to count the number of lines in the input
data stream.

    int num_lines = 0;
%%
\n      ++num_lines;
.       ;
%%
main()
{
    yylex();
    printf("# of lines = %d\n", num_lines);
}
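Assuming the program above is saved in a file named count.l (the file name is an
assumption), it can be built and run on a typical UNIX system with lex count.l,
followed by cc lex.yy.c -ll, and then ./a.out < input.txt. Here lex generates the
scanner lex.yy.c, and linking with -ll supplies the default yywrap() routine (with
flex, the library is -lfl instead).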