Compiler Construction CSEC325
Syntax Analysis
o Syntax Analysis, or parsing, is the second phase of compilation, i.e.
it follows lexical analysis. It checks the syntactic structure of the
given input, i.e. whether the input is in the correct syntax of the
language in which it is written. It does so by building a data
structure called a parse tree or syntax tree.
o The parse tree is constructed using the pre-defined grammar of the
language and the input string. If the given input string can be
produced from the grammar by a derivation, the input string is in
the correct syntax; if not, the syntax analyzer reports an error.
o The main goal of syntax analysis is to create a parse tree or
abstract syntax tree (AST) of the source code.
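For instance, the assignment x = a + b * 2 can be represented by a
small abstract syntax tree. The sketch below is illustrative only; the
Node class and its field names are assumptions made for this example,
not part of any particular compiler.

```python
# A minimal AST sketch (illustrative, assumed names) for: x = a + b * 2

class Node:
    def __init__(self, kind, value=None, children=()):
        self.kind = kind            # e.g. "assign", "binop", "id", "num"
        self.value = value          # operator, identifier name, or number
        self.children = list(children)

    def show(self, indent=0):
        print(" " * indent + f"{self.kind}({self.value})")
        for child in self.children:
            child.show(indent + 2)

# x = a + b * 2  is represented as  assign(x, +(a, *(b, 2)))
ast = Node("assign", "x", [
    Node("binop", "+", [
        Node("id", "a"),
        Node("binop", "*", [Node("id", "b"), Node("num", 2)]),
    ]),
])
ast.show()
```

The tree makes operator precedence explicit: the multiplication is a
subtree of the addition, so later phases never have to re-derive it.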
Some Parsing Algorithms
o LL parsing: This is a top-down parsing algorithm that starts
with the root of the parse tree and constructs the tree by
successively expanding non-terminals. LL parsing is known
for its simplicity and ease of implementation (a recursive-descent
sketch follows this list).
o LR parsing: This is a bottom-up parsing algorithm that starts
with the leaves of the parse tree and constructs the tree by
successively reducing substrings of the input to non-terminals.
LR parsing is more powerful than LL parsing and can handle a
larger class of grammars.
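To make the top-down case concrete, here is a minimal recursive-descent
(LL) parser for a toy arithmetic grammar. The grammar and all class and
function names are assumptions chosen for this sketch, not a definitive
implementation.

```python
# A recursive-descent (LL) parser sketch for the assumed toy grammar:
#   expr   -> term   (('+' | '-') term)*
#   term   -> factor (('*' | '/') factor)*
#   factor -> NUMBER | '(' expr ')'
import re

def tokenize(text):
    """Split the input into numbers, operators, and parentheses."""
    return re.findall(r"\d+|[-+*/()]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def expr(self):        # expr -> term (('+'|'-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            node = (self.eat(), node, self.term())
        return node

    def term(self):        # term -> factor (('*'|'/') factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            node = (self.eat(), node, self.factor())
        return node

    def factor(self):      # factor -> NUMBER | '(' expr ')'
        if self.peek() == "(":
            self.eat("(")
            node = self.expr()
            self.eat(")")
            return node
        return int(self.eat())

print(Parser(tokenize("2 + 3 * (4 - 1)")).expr())
# ('+', 2, ('*', 3, ('-', 4, 1)))
```

Each non-terminal of the grammar becomes one function, and the parse
tree grows from the root downward, which is exactly the top-down
behavior described above.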
Features of syntax analysis
o Syntax Trees: Syntax analysis creates a syntax tree, which is
a hierarchical representation of the code’s structure. The tree
shows the relationship between the various parts of the code,
including statements, expressions, and operators.
o Context-Free Grammar: Syntax analysis uses a context-free
grammar to define the syntax of the programming language.
A context-free grammar is a formal notation for describing
the structure of programming languages (an example grammar is
sketched after this list).
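As a small illustration (the grammar below is assumed for this example,
not taken from the slides), a context-free grammar is just a set of
productions, and a string is syntactically correct when some derivation
produces it:

```python
# An assumed toy context-free grammar, written as a Python dict that
# maps each non-terminal to its list of productions.
grammar = {
    "stmt": [["id", "=", "expr", ";"]],
    "expr": [["expr", "+", "term"], ["term"]],
    "term": [["id"], ["num"]],
}

# One leftmost derivation of the string:  id = id + num ;
derivation = [
    ["stmt"],
    ["id", "=", "expr", ";"],
    ["id", "=", "expr", "+", "term", ";"],
    ["id", "=", "term", "+", "term", ";"],
    ["id", "=", "id", "+", "term", ";"],
    ["id", "=", "id", "+", "num", ";"],
]
for step in derivation:
    print(" ".join(step))
```

Because every step rewrites a single non-terminal regardless of its
surroundings, the grammar is context-free; the sequence of steps is
exactly the structure the syntax analyzer must reconstruct.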
• Heap Allocation: Memory is allocated and deallocated whenever the
user wants, according to the user's needs, at run-time.
• Advantages of Heap Allocation
• 1. Heap allocation is useful when we have data whose size is not
fixed and can change during run time.
• 2. We can retain the values of variables even after the activation
records end.
• 3. Heap allocation is the most flexible allocation scheme.
• Disadvantages of Heap Allocation
• 1. Heap allocation is slower as compared to stack allocation.
• 2. There is a chance of memory leaks.
• Stack Allocation: Stack allocation is commonly known as dynamic
allocation. Dynamic allocation means the allocation of memory at
run-time.
Assignment Statements
• Assignment statements give values to variables and perform
arithmetic operations within a command list.
Language Processing System
• Preprocessor: The preprocessor takes the source code as input and
produces modified source code as output. The preprocessor is also
known as a macro evaluator; this processing is optional, i.e. for a
language that does not support #include or macros, preprocessing is
not required.
• Compiler: The compiler takes the modified code as input and
produces the target code as output.

[Figure: Source Program -> Compiler -> Target Program; the compiler
also reports error messages and warnings.]

• Assembler: The assembler takes the target code as input and
produces relocatable machine code as output.
• Linker: A linker or link editor is a program that takes a
collection of object files (created by assemblers and compilers) and
combines them into an executable program.
• Loader: The loader places the linked program in main memory.
• Executable Code: It is low-level, machine-specific code that the
machine can easily understand.
PARAMETER TRANSMISSION
• Parameter transmission is an essential step in compiler design. It
refers to the exchange of information between methods, functions,
and procedures: some mechanism transfers the values from the calling
procedure to the called procedure.
• Parameters are of two types: actual parameters and formal
parameters. Let us discuss each of them in detail (a short sketch
follows this list).
• Actual parameters: the variables or expressions supplied by the
calling process. They are listed in the function call and carry the
caller's data into the called function.
• Formal parameters: the variables that receive the values passed by
the caller. They are listed in the called function's definition and
must include a data type.
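To make the distinction concrete, here is a small Python sketch (all
names are illustrative). Note that Python passes object references by
value: rebinding a formal parameter inside the function is invisible to
the caller, while mutating the object a formal parameter refers to is
visible, which mimics call-by-value and call-by-reference respectively.

```python
# Illustrative sketch (assumed names): actual vs. formal parameters.

def record(n, totals):        # n and totals are FORMAL parameters
    n = n + 1                 # rebinding n: invisible to the caller
    totals.append(n)          # mutating totals: visible to the caller

count = 10
history = []
record(count, history)        # count and history are ACTUAL parameters

print(count)    # 10   -> rebinding behaved like call-by-value
print(history)  # [11] -> mutation behaved like call-by-reference
```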
Non-Tokens
• Comments, preprocessor directives, macros, blanks, tabs, newlines,
etc.
• Lexeme: The sequence of characters matched by a pattern to form the
corresponding token, or a sequence of input characters that
comprises a single token, is called a lexeme. e.g. "float",
"abs_zero_Kelvin", "=", "-", "273", ";"
How Lexical Analyzer Works?
• Input preprocessing: This stage involves cleaning up the input text
and preparing it for lexical analysis. This may include removing
comments, whitespace, and other non-essential characters from the
input text.
• Tokenization: This is the process of breaking the input text into a
sequence of tokens. This is usually done by matching the characters
in the input text against a set of patterns or regular expressions
that define the different types of tokens.
• Token classification: In this stage, the lexer determines the type
of each token. For example, in a programming language, the lexer
might classify keywords, identifiers, operators, and punctuation
symbols as separate token types.
• Token validation: In this stage, the lexer checks that each token
is valid according to the rules of the programming language. For
example, it might check that a variable name is a valid identifier,
or that an operator has the correct syntax.
• Output generation: In this final stage, the lexer generates the
output of the lexical analysis process, which is typically a list of
tokens. This list of tokens can then be passed to the next stage of
compilation or interpretation. (A small tokenizer sketch is given at
the end of this section.)
Advantages & Disadvantages
• Advantages
• Simplifies Parsing: Breaking down the source code into tokens makes
it easier for computers to understand and work with the code. This
helps programs like compilers or interpreters figure out what the
code is supposed to do. It is like breaking a big puzzle into
smaller pieces, which makes it easier to put together and solve.
• Error Detection: Lexical analysis detects lexical errors such as
misspelled keywords or undefined symbols early in the compilation
process. This improves the overall efficiency of the compiler or
interpreter by identifying errors sooner rather than later.
• Efficiency: Once the source code is converted into tokens,
subsequent phases of compilation or interpretation can operate more
efficiently. Parsing and semantic analysis become faster and more
streamlined when working with tokenized input.
• Disadvantages
• Limited Context: Lexical analysis operates on individual tokens and
does not consider the overall context of the code. This can
sometimes lead to ambiguity or misinterpretation of the code's
intended meaning, especially in languages with complex syntax or
semantics.
• Overhead: Although lexical analysis is necessary for the
compilation or interpretation process, it adds an extra layer of
overhead. Tokenizing the source code requires additional
computational resources, which can impact the overall performance
of the compiler or interpreter.
• Debugging Challenges: Lexical errors detected during the analysis
phase may not always provide clear indications of their origins in
the original source code. Debugging such errors can be challenging,
especially if they result from subtle mistakes in the lexical
analysis process.
• Code Generation: The final phase of a compiler is code generation.
This phase takes the optimized intermediate code and generates the
actual machine code that can be executed by the target hardware.

[Figure: Phases of a compiler - High Level Language -> Lexical
Analyzer -> Syntax Analyzer -> Semantic Analyzer -> Intermediate Code
Generator -> Code Optimizer -> Target Code Generation -> Assembly
Code, with the Symbol Table and Error Handling used by every phase.]
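To tie the stages above together, here is a compact tokenizer sketch.
The token categories, patterns, and function names are assumptions made
for this example; it reuses the lexeme examples given earlier ("float",
"abs_zero_Kelvin", "=", "-", "273", ";") and treats whitespace and
comments as non-tokens.

```python
# A lexical-analyzer sketch (assumed patterns): preprocessing,
# tokenization, classification, validation, and output in one pass.
import re

TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:float|int|if|else|return)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("NUMBER",     r"\d+(?:\.\d+)?"),
    ("OPERATOR",   r"[=+\-*/]"),
    ("PUNCT",      r"[;(),{}]"),
    ("SKIP",       r"\s+|//[^\n]*"),   # non-tokens: whitespace, comments
    ("MISMATCH",   r"."),              # anything else is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})"
                             for name, pattern in TOKEN_SPEC))

def lex(source):
    tokens = []
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":             # input preprocessing: drop non-tokens
            continue
        if kind == "MISMATCH":         # token validation: reject bad lexemes
            raise SyntaxError(f"unexpected character {lexeme!r}")
        tokens.append((kind, lexeme))  # classification + output generation
    return tokens

print(lex("float abs_zero_Kelvin = -273; // a non-token comment"))
# [('KEYWORD', 'float'), ('IDENTIFIER', 'abs_zero_Kelvin'),
#  ('OPERATOR', '='), ('OPERATOR', '-'), ('NUMBER', '273'), ('PUNCT', ';')]
```

Ordering the KEYWORD pattern before IDENTIFIER is what classifies
"float" as a keyword rather than an identifier; a real lexer would
typically look identifiers up in a keyword table instead.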