Lexical Analysis

It is all about lexical analysis

Uploaded by

harsh raj chikku
Copyright
© © All Rights Reserved
Lexical Analysis

HARSH RAJ
Galgotias University
Greater Noida
Overview
source program → lexical analyzer (scanner) → tokens → syntax analyzer (parser)
(The lexical analyzer and the syntax analyzer share a symbol table manager.)

• Main task: to read input characters and group them into “tokens.”
• Secondary tasks:
  • Skip comments and whitespace;
  • Correlate error messages with the source program (e.g., line number of error).

Lexical Analysis 2
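The secondary tasks listed in the overview can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the scanner from these slides: the function name `skip_blanks_and_comments` is hypothetical, and C-style `/* ... */` comments are assumed because the example input on the next slide is a C file.

```python
def skip_blanks_and_comments(text, pos, line):
    """Advance past whitespace and /* ... */ comments.

    Returns the new (pos, line) pair. `line` is the 1-based line number,
    kept up to date so that later error messages can cite it.
    """
    while pos < len(text):
        if text[pos] == "\n":
            line += 1
            pos += 1
        elif text[pos].isspace():
            pos += 1
        elif text.startswith("/*", pos):
            end = text.find("*/", pos + 2)
            if end == -1:
                # The line number lets the message point at the source program.
                raise SyntaxError(f"line {line}: unterminated comment")
            line += text.count("\n", pos, end)  # newlines inside the comment
            pos = end + 2
        else:
            break
    return pos, line


source = "/* pgm.c */\nint main"
pos, line = skip_blanks_and_comments(source, 0, 1)
print(source[pos:pos + 3], "starts on line", line)
```

The scanner would call a helper like this before recognizing each token, carrying `line` along so that any error it reports can name the offending line.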
Overview (cont’d)
Input file:
    /* pgm.c */
    int main(int argc, char **argv) {
        int x, Y;
        float w;
        ...

Token sequence produced by the lexical analyzer:
    keywd_int
    identifier: “main”
    left_paren
    keywd_int
    identifier: “argc”
    comma
    keywd_char
    star
    star
    identifier: “argv”
    right_paren
    left_brace
    keywd_int

CSc 453: Lexical Analysis 3
Implementing Lexical Analyzers
Different approaches:
• Using a scanner generator, e.g., lex or flex. This automatically generates a lexical analyzer from a high-level description of the tokens.
  (easiest to implement; least efficient)
• Programming it in a language such as C, using the I/O facilities of the language.
  (intermediate in both ease of implementation and efficiency)
• Writing it in assembly language and explicitly managing the input.
  (hardest to implement, but most efficient)

Lexical Analysis: Terminology
• token: a name for a set of input strings with related structure.
  Example: “identifier,” “integer constant”
• pattern: a rule describing the set of strings associated with a token.
  Example: “a letter followed by zero or more letters, digits, or underscores.”
• lexeme: the actual input string that matches a pattern.
  Example: count

Examples
Input: count = 123
Tokens:
    identifier      Rule: “letter followed by …”    Lexeme: count
    assg_op         Rule: =                          Lexeme: =
    integer_const   Rule: “digit followed by …”     Lexeme: 123
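
To make the token / pattern / lexeme distinction concrete, here is a small regex-driven tokenizer for the input above. It is a sketch under stated assumptions: the slide elides the full rules (“letter followed by …”), so the identifier pattern is borrowed from the terminology slide and the integer pattern is assumed to be one or more digits. The names `TOKEN_SPEC`, `MASTER`, and `tokenize` are hypothetical.

```python
import re

# (token name, pattern) pairs. The patterns are assumptions for this sketch.
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z][A-Za-z0-9_]*"),
    ("integer_const", r"[0-9]+"),
    ("assg_op", r"="),
    ("skip", r"\s+"),  # whitespace is matched but never emitted
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token, lexeme) pairs for `text`."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "skip":
            # m.lastgroup names the token; m.group() is the lexeme.
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


for token, lexeme in tokenize("count = 123"):
    print(f"{token:14} lexeme: {lexeme}")
```

Each named group plays the role of a pattern, the group name is the token, and the matched text is the lexeme.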

Algorithm / Pseudo code
BEGIN
  Initialize character pointer to the start of the source code
  WHILE not end of source code
    Skip any white spaces and newlines
    IF character is a letter
      Begin identifier or keyword
      WHILE character is a letter or digit
        Add character to current token
        Advance character pointer
      END WHILE
      IF current token is a keyword
        Output keyword token
      ELSE
        Output identifier token
      END IF
    ELSE IF character is a digit
      Begin number
      WHILE character is a digit
        Add character to current token
        Advance character pointer
      END WHILE
      Output number token
    ELSE
      Output error token
      Advance character pointer
    END IF
  END WHILE
END
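
The pseudocode above translates almost line for line into a hand-written scanner. The following Python version is a sketch, not a definitive implementation: the `KEYWORDS` set and the token kind names are assumptions chosen for illustration, and Python's `str.isalpha` / `str.isdigit` accept more than ASCII letters and digits.

```python
KEYWORDS = {"int", "char", "float"}  # assumed keyword set, for illustration only


def scan(source):
    """Hand-written scanner following the pseudocode; returns (kind, text) pairs."""
    tokens = []
    i = 0  # character pointer
    while i < len(source):
        ch = source[i]
        if ch.isspace():      # skip white space and newlines
            i += 1
        elif ch.isalpha():    # begin identifier or keyword
            start = i
            while i < len(source) and source[i].isalnum():
                i += 1
            word = source[start:i]
            kind = "keyword" if word in KEYWORDS else "identifier"
            tokens.append((kind, word))
        elif ch.isdigit():    # begin number
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("number", source[start:i]))
        else:                 # anything else becomes an error token
            tokens.append(("error", ch))
            i += 1
    return tokens


print(scan("int x1 = 42"))
```

Whitespace is handled as its own branch of the loop rather than as a separate skipping step, so trailing whitespace at the end of the input cannot push the pointer past the last character. Because the pseudocode recognizes only identifiers, keywords, and numbers, `=` is reported as an error token here.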
Regular Expressions
A pattern notation for describing certain kinds of sets of strings.
Given an alphabet Σ:
• ε is a regular exp. (denotes the language {ε})
• for each a ∈ Σ, a is a regular exp. (denotes the language {a})
• if r and s are regular exps. denoting L(r) and L(s) respectively, then so are:
  • (r) | (s) (denotes the language L(r) ∪ L(s))
  • (r)(s) (denotes the language L(r)L(s))
  • (r)* (denotes the language L(r)*)

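The inductive definition of regular expressions can be checked on small cases by computing the denoted language directly. The sketch below enumerates every string up to a length bound, since languages built with `*` are infinite. The nested-tuple encoding and the function name `language` are assumptions made for this example, not notation from the slides.

```python
def language(r, max_len):
    """Set of strings of length <= max_len denoted by regular expression `r`.

    `r` is a nested tuple (an encoding assumed for this sketch):
      ("eps",)          the empty string
      ("sym", a)        a single symbol a from the alphabet
      ("alt", r, s)     (r) | (s)
      ("cat", r, s)     (r)(s)
      ("star", r)       (r)*
    """
    op = r[0]
    if op == "eps":
        return {""}
    if op == "sym":
        return {r[1]} if max_len >= 1 else set()
    if op == "alt":  # union of the two languages
        return language(r[1], max_len) | language(r[2], max_len)
    if op == "cat":  # every string of L(r) followed by every string of L(s)
        left, right = language(r[1], max_len), language(r[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if op == "star":  # zero or more concatenations of strings from L(r)
        base = language(r[1], max_len)
        result = {""}
        frontier = {""}
        while frontier:  # stops once no new string fits within max_len
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown operator {op!r}")


# (a | b)* a : strings over {a, b} that end in a
r = ("cat", ("star", ("alt", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
print(sorted(language(r, 2)))
```

The length bound only keeps the enumeration finite; it is not part of the definition.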
Working of Lexical Analyzer

Conclusion
• Key Takeaways:
  • Definition
  • Purpose
  • Components
• Importance in Compiler Design:
  • Error Detection
  • Efficiency
  • Foundation for Parsing
• Practical Considerations:
  • Token Definitions
  • Handling Errors
  • Tools and Libraries



Thank You!
