Lexical Analysis (Scanning)
1. THE ROLE OF THE LEXICAL ANALYZER
(Diagram: the source program is read by the lexical analyzer (scanner), which passes tokens to the syntax analyzer (parser); both components interact with the symbol table manager.)
Main task: to read input characters and group them into
“tokens.”
Secondary tasks:
Skip comments and whitespace;
Correlate error messages with the source program (e.g., the line number of an error), as in the sketch below.
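A minimal C sketch of these housekeeping tasks (an illustration assuming a hand-written scanner; the names line_no and next_significant_char are made up for the example): it skips blanks, tabs, and // line comments, and counts newlines so that error messages can report a line number.

    #include <stdio.h>

    static int line_no = 1;        /* current line in the source program */

    /* Return the next character that can start a token, skipping white
       space and "//" line comments and counting newlines along the way. */
    static int next_significant_char(FILE *src)
    {
        int c;
        for (;;) {
            c = getc(src);
            if (c == '\n') {
                line_no++;                         /* track line numbers */
            } else if (c == ' ' || c == '\t') {
                /* skip white space */
            } else if (c == '/') {
                int d = getc(src);
                if (d == '/') {                    /* skip a line comment */
                    while ((c = getc(src)) != '\n' && c != EOF)
                        ;
                    if (c == '\n')
                        line_no++;
                } else {
                    ungetc(d, src);
                    return '/';                    /* a lone '/' is a real character */
                }
            } else {
                return c;                          /* start of a token, or EOF */
            }
        }
    }

    int main(void)
    {
        int c;
        while ((c = next_significant_char(stdin)) != EOF)
            printf("line %d: '%c'\n", line_no, c);
        return 0;
    }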
Different approaches for implementing Lexical Analyzers:
Using a scanner generator, e.g., lex or flex. This automatically
generates a lexical analyzer from a high-level description of the tokens.
(easiest to implement; least efficient)
Programming it in a language such as C, using the I/O facilities of the
language.
(intermediate in ease of implementation and in efficiency)
Writing it in assembly language and explicitly managing the input.
(hardest to implement, but most efficient)
token: a name for a set of input strings with related
structure.
Example: “identifier,” “integer constant”
pattern: a rule describing the set of strings
associated with a token.
Example: “a letter followed by zero or more letters, digits, or
underscores.”
lexeme: the actual input string that matches a
pattern.
Example: count
Examples
Input: count = 123
Tokens:
identifier : Rule: “letter followed by …”
Lexeme: count
assg_op : Rule: =
Lexeme: =
integer_const : Rule: “digit followed by …”
Lexeme: 123
If more than one lexeme can match the pattern for a
token, the scanner must indicate the actual lexeme
that matched.
This information is given using an attribute
associated with the token.
Example: The program statement
count = 123
yields the following token-attribute pairs:
⟨identifier, pointer to the string “count”⟩
⟨assg_op, ⟩ (no attribute needed)
⟨integer_const, the integer value 123⟩
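One possible C representation of such token-attribute pairs (a sketch; the names TokenName, Token, and attr are invented for the example):

    typedef enum { TOK_IDENTIFIER, TOK_ASSG_OP, TOK_INTEGER_CONST } TokenName;

    typedef struct {
        TokenName name;
        union {                   /* attribute: its meaning depends on the token */
            const char *lexeme;   /* identifier: pointer to the string           */
            long value;           /* integer_const: the numeric value            */
        } attr;                   /* assg_op: no attribute is stored             */
    } Token;

Scanning count = 123 would then produce the values {TOK_IDENTIFIER, lexeme = "count"}, {TOK_ASSG_OP}, and {TOK_INTEGER_CONST, value = 123}.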
2. Input Buffering Scheme
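The details of this section are not included above; what follows is only a sketch of the usual two-buffer scheme, as an assumption about the intended content: the input is read into two buffer halves, each terminated by a sentinel character, so that the forward pointer needs just one comparison per character in the common case. The buffer size and all names below are made up for the example, and the sketch cannot distinguish a real NUL byte in the source from end of input.

    #include <stdio.h>

    #define HALF 4096
    #define SENTINEL '\0'                  /* assumed end-of-half marker */

    /* Two halves, each followed by one sentinel slot. */
    static char buf[2 * HALF + 2];
    static char *forward;
    static FILE *src;

    /* Read up to HALF bytes into one half and plant the sentinel after them. */
    static void load_half(char *half)
    {
        size_t n = fread(half, 1, HALF, src);
        half[n] = SENTINEL;
    }

    static void init_buffering(FILE *f)
    {
        src = f;
        load_half(buf);                    /* fill the first half */
        forward = buf;
    }

    /* Return the next input character; a half is reloaded only when its
       sentinel is reached, so the common case is a single comparison. */
    static int next_char(void)
    {
        char c = *forward++;
        if (c != SENTINEL)
            return (unsigned char)c;
        if (forward == buf + HALF + 1) {           /* end of first half  */
            load_half(buf + HALF + 1);
            return next_char();
        }
        if (forward == buf + 2 * HALF + 2) {       /* end of second half */
            forward = buf;
            load_half(buf);
            return next_char();
        }
        return EOF;   /* sentinel inside a half: true end of input */
    }

    int main(void)
    {
        init_buffering(stdin);
        int c, count = 0;
        while ((c = next_char()) != EOF)
            count++;
        printf("%d characters read\n", count);
        return 0;
    }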
LEX tool
Consider the language generated by a grammar whose tokens must be recognized by the lexical analyzer.
A LEX program can recognize tokens of the various categories: white space, identifiers, numbers, relational operators, and the keywords if, then, else.
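A minimal sketch of such a specification (an assumption modeled on the usual textbook example, not the original program from these notes; the token codes IF, THEN, ELSE, ID, NUMBER, RELOP and their values are placeholders):

    %{
    /* Placeholder token codes; in a full compiler these usually come from
       the parser (e.g., a yacc/bison-generated header). */
    #define IF     256
    #define THEN   257
    #define ELSE   258
    #define ID     259
    #define NUMBER 260
    #define RELOP  261
    %}

    delim   [ \t\n]
    ws      {delim}+
    letter  [A-Za-z]
    digit   [0-9]
    id      {letter}({letter}|{digit})*
    number  {digit}+(\.{digit}+)?

    %%
    {ws}        { /* skip white space: no token is returned */ }
    if          { return IF; }
    then        { return THEN; }
    else        { return ELSE; }
    {id}        { return ID; }      /* the lexeme is available in yytext */
    {number}    { return NUMBER; }
    "<"|"<="|"="|"<>"|">"|">="  { return RELOP; }
    %%
    int yywrap(void) { return 1; }

With flex, such a file (say scanner.l) is typically processed with "flex scanner.l", and the generated lex.yy.c is compiled together with a driver that repeatedly calls yylex().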
9. Design of a scanner (Lexical Analyzer) generator
First, construct an NFA for each pattern Pi in the LEX program.
Then construct a compound NFA that recognizes every string described by any of the patterns.
(Diagram: the NFAs for pattern P1, pattern P2, and pattern P3, joined through a common start state into the compound NFA.)
In the next step, this compound NFA is converted to a DFA with which the lexical analyzer recognizes the tokens.
Converting the above NFA to a DFA:
The start state of the DFA is A = ε-closure(0) = {0, 1, 3, 7}, i.e., the set of NFA states reachable from state 0 by ε-transitions alone.
The remaining DFA states B, C, D, E, and F are built in the same way: for each DFA state S and input symbol a, the successor state is ε-closure(move(S, a)).
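To make the ε-closure step concrete, here is a small C sketch over a hypothetical ε-transition table in which state 0 (the common start state of the compound NFA) has ε-edges to states 1, 3, and 7; those edges are an assumption chosen only so that the program reproduces ε-closure(0) = {0, 1, 3, 7}.

    #include <stdio.h>
    #include <stdbool.h>

    #define NSTATES 16

    /* eps[s][t] is true if the NFA has an epsilon-edge from s to t.
       The edges set in main() are hypothetical. */
    static bool eps[NSTATES][NSTATES];

    /* Compute the epsilon-closure of a state set in place, using a
       simple work-list of newly added states. */
    static void eps_closure(bool set[NSTATES])
    {
        int stack[NSTATES], top = 0;
        for (int s = 0; s < NSTATES; s++)
            if (set[s]) stack[top++] = s;

        while (top > 0) {
            int s = stack[--top];
            for (int t = 0; t < NSTATES; t++)
                if (eps[s][t] && !set[t]) {   /* t reachable by an epsilon-edge */
                    set[t] = true;
                    stack[top++] = t;
                }
        }
    }

    int main(void)
    {
        eps[0][1] = eps[0][3] = eps[0][7] = true;   /* hypothetical edges */

        bool A[NSTATES] = { false };
        A[0] = true;                 /* start from NFA state 0      */
        eps_closure(A);              /* A becomes {0, 1, 3, 7}      */

        printf("A = {");
        for (int s = 0; s < NSTATES; s++)
            if (A[s]) printf(" %d", s);
        printf(" }\n");              /* prints: A = { 0 1 3 7 }     */
        return 0;
    }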