Compiler Construction CSEC325 Token

Uploaded by Jaki Love

WHAT IS A COMPILER AND INTERPRETER?

• Compiler: A compiler is a computer program that translates computer code written in one programming language (a higher-level language) into another language (machine code).
• Interpreter: An interpreter is a program that directly executes the instructions in a high-level language, without first converting them into machine code.

TYPES OF COMPILER

• Cross-Compiler: These are compilers that run on one machine and produce code for another machine. A cross-compiler generates executable code for a platform other than the one on which the compiler itself is running. Cross-compiler tools are used to create executables for embedded systems or for several platforms.
• Source-to-source Compiler: Source-to-source compilers typically perform a series of transformations on the original source code, analyzing its structure, syntax, and semantics, and generating an equivalent representation in the target language.

Language Processing System

• Language processors are vital in converting human-readable code into machine-executable instructions. They consist of various components that work together to translate, analyze, and execute code efficiently.
• Preprocessor: It includes all header files and evaluates whether a macro (a macro is a piece of code that is given a name; whenever the name is used, it is replaced by the contents of the macro by the interpreter or compiler) is included.

STORAGE MANAGEMENT

• Storage Allocation Strategies: There are mainly three types of storage allocation strategies:
• Static Allocation
• Heap Allocation
• Stack Allocation
• Static allocation: Static allocation lays out or assigns the storage for all data objects at compile time. In static allocation, names are bound to storage, and the address of each identifier remains the same throughout execution. The memory is allocated in a fixed location determined at compile time. C and C++ use static allocation for global and static data.
• Advantages of Static Allocation
• 1. It is easy to understand.
• 2. The memory is allocated only once, at compile time, and remains the same throughout program execution.
• 3. Memory allocation is done before the program starts, so no allocation work is needed at run time.
• Disadvantages of Static Allocation
• 1. It is not highly scalable.
• 2. Static storage allocation is not very efficient.
• 3. The size of the data must be known at compile time.
• Heap allocation: Heap allocation is used where stack allocation falls short: when we want to retain the values of local variables after the activation record ends, which stack allocation cannot do because its LIFO scheme ties allocation and de-allocation to the activation record. The heap is the most flexible storage allocation strategy; we can dynamically allocate and de-allocate variables as needed.

OPERATORS

• An operator is a sign or symbol that specifies the type of calculation to perform within an expression.

Symbol  Operation       Example  Description
+       Addition        a+b      Add the two operands
-       Subtraction     a-b      Subtract the second operand from the first
*       Multiplication  a*b      Multiply the two operands
/       Division        a/b      Divide the first operand by the second
**      Power           a**b     Raise the first operand to the power of the second
%       Modulo          a%b      Divide the first operand by the second and yield the remainder
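The operator table above can be exercised directly in code. The following is a minimal sketch in Python, whose arithmetic operators happen to use the same symbols as the table, including ** for power and % for modulo; the sample operand values are my own choice.

```python
# Demonstrate each operator from the table with sample operands a=7, b=3.
a, b = 7, 3

operations = {
    "+":  a + b,   # addition: 10
    "-":  a - b,   # subtraction: 4
    "*":  a * b,   # multiplication: 21
    "/":  a / b,   # division: 2.333...
    "**": a ** b,  # power: 343
    "%":  a % b,   # modulo: remainder of 7 / 3, which is 1
}

for symbol, result in operations.items():
    print(f"a {symbol} b = {result}")
```

Note that / here is true (floating-point) division; languages such as C truncate when both operands are integers.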

Phases of a Compiler

• We basically have two phases of compilers, namely the Analysis phase and the Synthesis phase.
• Lexical Analysis: The first phase of a compiler is lexical analysis, also known as scanning. This phase reads the source code and breaks it into a stream of tokens, which are the basic units of the programming language. The tokens are then passed on to the next phase for further processing.
• Syntax Analysis: The second phase of a compiler is syntax analysis. This phase takes the stream of tokens generated by the lexical analysis phase and checks whether they conform to the grammar of the programming language. The output of this phase is usually an Abstract Syntax Tree (AST).
• Semantic Analysis: The third phase of a compiler is semantic analysis. This phase checks whether the code is semantically correct, i.e., whether it conforms to the language's type system and other semantic rules. In this stage, the compiler checks the meaning of the source code to ensure that it makes sense. The compiler performs type checking, which ensures that variables are used correctly and that operations are performed on compatible data types. The compiler also checks for other semantic errors, such as undeclared variables and incorrect function calls.
• Intermediate Code Generation: The fourth phase of a compiler is intermediate code generation. This phase generates an intermediate representation of the source code that can be easily translated into machine code.
• Optimization: The fifth phase of a compiler is optimization. This phase applies various optimization techniques to the intermediate code to improve the performance of the generated machine code.

Introduction of Lexical Analysis

• Lexical Analysis is the first phase of the compiler, also known as the scanner. It converts the high-level input program into a sequence of tokens.
• 1. Lexical analysis can be implemented with a deterministic finite automaton.
• 2. The output is a sequence of tokens that is sent to the parser for syntax analysis.

What is a Token?

• A lexical token is a sequence of characters that can be treated as a unit in the grammar of the programming language.
• Examples of tokens:
• Type tokens (id, number, real, ...)
• Punctuation tokens (IF, void, return, ...)
• Alphabetic tokens (keywords)
• Keywords; examples: for, while, if, etc.
• Identifiers; examples: variable names, function names, etc.
• Operators; examples: '+', '++', '-', etc.
• Separators; examples: ',', ';', etc.

Example of Lexical Analyzer

• Suppose we pass a statement through the lexical analyzer: a = b + c;
• It will generate a token sequence like this: id = id + id;
• where each id refers to its variable in the symbol table, referencing all its details.
• For example, consider the program:

int main()
{
// 2 variables
int a, b;
a = 10;
return 0;
}

• All the valid tokens are:
'int' 'main' '(' ')' '{' 'int' 'a' ',' 'b' ';' 'a' '=' '10' ';' 'return' '0' ';' '}'
• Above are the valid tokens. You can observe that we have omitted comments.
• As another example, consider the printf statement below. There are 5 valid tokens in this printf statement.

Let's Try

• Exercise 1:

int main()
{
int a = 10, b = 20;
printf("sum is:%d",a+b);
return 0;
}

• Total number of tokens?

Syntax Analysis
o Syntax Analysis or Parsing is the second phase, i.e. after lexical
analysis. It checks the syntactical structure of the given input, i.e.
whether the given input is in the correct syntax (of the language in
which the input has been written) or not. It does so by building a
data structure, called a Parse tree or Syntax tree.
o The parse tree is constructed by using the pre-defined Grammar
of the language and the input string. If the given input string can
be derived using the grammar (in the derivation process), the input
string is in the correct syntax. If not, the error is reported by the
syntax analyzer.
o The main goal of syntax analysis is to create a parse tree or
abstract syntax tree (AST) of the source code.
Some Parsing Algorithm
o LL parsing: This is a top-down parsing algorithm that starts
with the root of the parse tree and constructs the tree by
successively expanding non-terminals. LL parsing is known
for its simplicity and ease of implementation.
o LR parsing: This is a bottom-up parsing algorithm that starts
with the leaves of the parse tree and constructs the tree by
successively reducing substrings of the input to non-terminals.
LR parsing is more powerful than LL parsing and can handle a
larger class of grammars.
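The top-down (LL) approach can be illustrated with a recursive-descent parser. The sketch below is in Python and uses a toy arithmetic-expression grammar of my own, not one from the slides: it expands the non-terminals expr, term, and factor from the top down and returns the parse tree as nested tuples.

```python
import re

# Toy grammar:
#   expr   -> term (('+' | '-') term)*
#   term   -> factor (('*' | '/') factor)*
#   factor -> NUMBER | ID | '(' expr ')'

def parse(source: str):
    tokens = re.findall(r"\d+|\w+|[+\-*/()]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def factor():
        if peek() == "(":
            eat("(")
            node = expr()
            eat(")")
            return node
        return eat()  # a NUMBER or ID leaf

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = eat()
            node = (op, node, factor())  # reduce left-to-right
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = eat()
            node = (op, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token {peek()!r}")
    return tree

print(parse("a + b * c"))  # ('+', 'a', ('*', 'b', 'c'))
```

Because term is expanded below expr, multiplication binds tighter than addition, so the tree for a + b * c groups b * c together, exactly the precedence a grammar encodes.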
Features of syntax analysis
o Syntax Trees: Syntax analysis creates a syntax tree, which is
a hierarchical representation of the code’s structure. The tree
shows the relationship between the various parts of the code,
including statements, expressions, and operators.
o Context-Free Grammar: Syntax analysis uses context-free
grammar to define the syntax of the programming language.
Context-free grammar is a formal language used to describe
the structure of programming languages.
• Heap memory can be allocated and de-allocated whenever the user wants, according to the user's needs at run-time.
• Advantages of Heap Allocation
• 1. Heap allocation is useful when we have data whose size is not fixed and can change during run time.
• 2. We can retain the values of variables even if the activation records end.
• 3. Heap allocation is the most flexible allocation scheme.
• Disadvantages of Heap Allocation
• 1. Heap allocation is slower as compared to stack allocation.
• 2. There is a chance of memory leaks.
• Stack Allocation: Stack allocation is commonly known as dynamic allocation; dynamic allocation means the allocation of memory at run-time.

Assignment Statements

• Assignment statements give values to variables and do arithmetic operations within a command list.

Language Processing System

• Preprocessor: The preprocessor takes source code as input and produces modified source code as output. It is also known as a macro evaluator. This processing is optional: a language that does not support #include or macros does not require preprocessing.
• Compiler: The compiler takes the modified code as input and produces the target code as output.

[Diagram: Source Program → Compiler → Target Program, with error messages and warnings as additional outputs]

PARAMETER TRANSMISSION

• Parameter transmission is an essential step in compiler design. It refers to the exchange of information between methods, functions, and procedures. Some mechanism transfers the values from the calling procedure to the called procedure.
• Parameters are of two types: Actual Parameters and Formal Parameters. Let us discuss each of them in detail.


• Actual parameters: Actual parameters are the variables or values supplied by the calling procedure in the function call; they carry the calling procedure's data to the called function.
• Formal parameters: Formal parameters are the variables listed in the called function's definition; they receive the values of the actual parameters, and they must include the data type.
• Assembler: The assembler takes the target code as input and produces relocatable machine code as output.
• Linker: A linker or link editor is a program that takes a collection of objects (created by assemblers and compilers) and combines them into an executable program.
• Loader: The loader places the linked program in main memory.
• Executable Code: It is low-level, machine-specific code that the machine can easily understand.

• Code Generation: The final phase of a compiler is code generation. This phase takes the optimized intermediate code and generates the actual machine code that can be executed by the target hardware.

[Diagram: High Level Language → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate Code Generator → Code Optimizer → Target Code Generation → Assembly Code, with the Symbol Table and Error Handling shared by all phases]

Advantages & Disadvantages

• Advantages
• Simplifies Parsing: Breaking down the source code into tokens makes it easier for computers to understand and work with the code. This helps programs like compilers or interpreters figure out what the code is supposed to do. It is like breaking down a big puzzle into smaller pieces, which makes it easier to put together and solve.
• Error Detection: Lexical analysis will detect lexical errors such as misspelled keywords or undefined symbols early in the compilation process. This helps improve the overall efficiency of the compiler or interpreter by identifying errors sooner rather than later.
• Efficiency: Once the source code is converted into tokens, subsequent phases of compilation or interpretation can operate more efficiently. Parsing and semantic analysis become faster and more streamlined when working with tokenized input.
• Disadvantages
• Limited Context: Lexical analysis operates on individual tokens and does not consider the overall context of the code. This can sometimes lead to ambiguity or misinterpretation of the code's intended meaning, especially in languages with complex syntax or semantics.
• Overhead: Although lexical analysis is necessary for the compilation or interpretation process, it adds an extra layer of overhead. Tokenizing the source code requires additional computational resources, which can impact the overall performance of the compiler or interpreter.
• Debugging Challenges: Lexical errors detected during the analysis phase may not always provide clear indications of their origins in the original source code. Debugging such errors can be challenging, especially if they result from subtle mistakes in the lexical analysis process.

Non-Tokens

• Comments, preprocessor directives, macros, blanks, tabs, newlines, etc.
• Lexeme: The sequence of characters matched by a pattern to form the corresponding token, or a sequence of input characters that comprises a single token, is called a lexeme. E.g. "float", "abs_zero_Kelvin", "=", "-", "273", ";"

How Lexical Analyzer Works?

• Input preprocessing: This stage involves cleaning up the input text and preparing it for lexical analysis. This may include removing comments, whitespace, and other non-essential characters from the input text.
• Tokenization: This is the process of breaking the input text into a sequence of tokens. This is usually done by matching the characters in the input text against a set of patterns or regular expressions that define the different types of tokens.
• Token classification: In this stage, the lexer determines the type of each token. For example, in a programming language, the lexer might classify keywords, identifiers, operators, and punctuation symbols as separate token types.
• Token validation: In this stage, the lexer checks that each token is valid according to the rules of the programming language. For example, it might check that a variable name is a valid identifier, or that an operator has the correct syntax.
• Output generation: In this final stage, the lexer generates the output of the lexical analysis process, which is typically a list of tokens. This list of tokens can then be passed to the next stage of compilation or interpretation.
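The stages above can be sketched end-to-end on the lexeme example from this page ("float", "abs_zero_Kelvin", "=", "-", "273", ";"). This is a minimal illustration in Python; the pattern set and keyword list are my own simplifications, and full token validation is omitted.

```python
import re

KEYWORDS = {"float", "int", "return", "if", "while"}
PATTERNS = [
    ("number", r'\d+'),
    ("id",     r'[A-Za-z_]\w*'),
    ("op",     r'[+\-*/%=]'),
    ("sep",    r'[;,(){}]'),
    ("skip",   r'\s+'),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in PATTERNS))

def lex(source: str):
    # Stage 1: input preprocessing: drop // comments (non-tokens).
    source = re.sub(r"//[^\n]*", "", source)
    tokens = []
    # Stage 2: tokenization by matching against the regular expressions.
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":
            continue  # blanks, tabs, and newlines are non-tokens
        # Stage 3: classification: keywords are separated from identifiers.
        if kind == "id" and lexeme in KEYWORDS:
            kind = "keyword"
        # Stage 4: validation is omitted here; a real lexer would report
        # any input that matches no pattern instead of skipping it.
        tokens.append((kind, lexeme))
    # Stage 5: output generation: the token list for the next phase.
    return tokens

print(lex("float abs_zero_Kelvin = -273;  // absolute zero"))
```

Each output pair is a (token type, lexeme), so the comment and the blanks disappear while "float" is classified as a keyword rather than an ordinary identifier.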
