Unit 2. Compiler Design: Lexical Analysis
Lexical Analysis
[Figure: lexical analyzer diagram; only the label "Symbol table" is recoverable]
switch (*forward++) {
    case eof:
        if (forward is at end of first buffer) {
            reload second buffer;
            forward = beginning of second buffer;
        }
        else if (forward is at end of second buffer) {
            reload first buffer;
            forward = beginning of first buffer;
        }
        else /* eof within a buffer marks the end of input */
            terminate lexical analysis;
        break;
    cases for the other characters;
}
Compiler Design by Varun Arora
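The sentinel scheme in the pseudocode above can be sketched in C roughly as follows. This is a minimal illustration, not the slides' implementation: the `Scanner` struct, `reload`, `next_char`, `read_all`, the tiny half-buffer size `N`, the in-memory input source, and the use of NUL as the eof sentinel are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N 4            /* assumed half-buffer size; kept tiny to force reloads */
#define SENTINEL '\0'  /* assumed eof marker; the input must not contain NUL */

typedef struct {
    const char *src;        /* unread input (hypothetical in-memory source) */
    char buf[2 * (N + 1)];  /* two halves, each followed by a sentinel slot */
    char *forward;          /* scanning pointer */
} Scanner;

/* Copy up to N input characters into one half and put the sentinel after them. */
static void reload(Scanner *s, int half) {
    char *start = s->buf + half * (N + 1);
    size_t n = strlen(s->src);
    if (n > N) n = N;
    memcpy(start, s->src, n);
    start[n] = SENTINEL;
    s->src += n;
}

static void scanner_init(Scanner *s, const char *input) {
    assert(input != NULL);
    s->src = input;
    reload(s, 0);
    s->forward = s->buf;
}

/* Return the next character, or SENTINEL at the real end of input. */
static char next_char(Scanner *s) {
    char c = *s->forward++;
    if (c != SENTINEL) return c;      /* common case: one test per character */
    char *hit = s->forward - 1;       /* position where the sentinel was read */
    if (hit == s->buf + N) {          /* end of first half */
        reload(s, 1);
        s->forward = s->buf + N + 1;
        return next_char(s);
    }
    if (hit == s->buf + 2 * N + 1) {  /* end of second half */
        reload(s, 0);
        s->forward = s->buf;
        return next_char(s);
    }
    s->forward = hit;                 /* eof within a buffer: end of input */
    return SENTINEL;
}

/* Collect every character the scanner yields; out must be large enough. */
static size_t read_all(const char *input, char *out) {
    Scanner s;
    size_t k = 0;
    char c;
    scanner_init(&s, input);
    while ((c = next_char(&s)) != SENTINEL) out[k++] = c;
    out[k] = '\0';
    return k;
}
```

The point of the sentinel is that the common path in `next_char` performs a single comparison per character; the buffer-boundary checks run only when a sentinel is actually read.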
Specification of tokens
In compiler theory, regular expressions are used
to formalize the specification of tokens.
Regular expressions are a means of specifying regular
languages.
Example:
letter_ (letter_ | digit)*
Each regular expression is a pattern specifying the
form of strings
Example:
letter_ -> A | B | … | Z | a | b | … | z | _
digit -> 0 | 1 | … | 9
id -> letter_ (letter_ | digit)*
Example:
letter_ -> [A-Za-z_]
digit -> [0-9]
id -> letter_ (letter_ | digit)*
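As a rough illustration of what the id definition describes, the hand-written C check below tests whether an entire string matches letter_ (letter_ | digit)*. The function names `is_letter_`, `is_digit`, and `is_id` are hypothetical and chosen only to mirror the regular definitions above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* letter_ -> [A-Za-z_] */
static bool is_letter_(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
}

/* digit -> [0-9] */
static bool is_digit(char c) {
    return c >= '0' && c <= '9';
}

/* id -> letter_ (letter_ | digit)* ; true only if the whole string matches */
static bool is_id(const char *s) {
    assert(s != NULL);
    if (!is_letter_(*s)) return false;  /* also rejects the empty string */
    for (++s; *s != '\0'; ++s) {
        if (!is_letter_(*s) && !is_digit(*s)) return false;
    }
    return true;
}
```

For instance, `is_id("count_1")` and `is_id("_tmp")` hold, while `is_id("9lives")` does not, because the first character must be a letter or underscore.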
[Figure: lex.yy.c -> C compiler -> a.out; input stream -> a.out -> sequence of tokens]
declarations
%%
translation rules        (each rule has the form: Pattern {Action})
%%
auxiliary functions
%%
{ws} {/* no action and no return */}
if {return(IF);}
then {return(THEN);}
else {return(ELSE);}
{id} {yylval = (int) installID(); return(ID); }
{number} {yylval = (int) installNum(); return(NUMBER);}
…
Finite Automata
Regular expressions = specification
Finite automata = implementation
At end of input:
    if in an accepting state => accept, otherwise => reject
If no transition is possible => reject
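The accept/reject rules above can be sketched as a small simulation loop. The automaton used here is hypothetical (it is not taken from the slides): state 0 is the start state, state 1 is the only accepting state, and the machine accepts any number of 1s followed by a single 0. The names `step`, `accepts`, and `NO_TRANSITION` are likewise assumptions for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NO_TRANSITION = -1 };

/* Hypothetical automaton: 0 --1--> 0 and 0 --0--> 1.
   Every other (state, input) pair has no transition. */
static int step(int state, char c) {
    if (state == 0 && c == '1') return 0;
    if (state == 0 && c == '0') return 1;
    return NO_TRANSITION;
}

static bool accepts(const char *input) {
    assert(input != NULL);
    int state = 0;  /* start state */
    for (; *input != '\0'; ++input) {
        state = step(state, *input);
        if (state == NO_TRANSITION) return false;  /* no transition possible => reject */
    }
    return state == 1;  /* end of input: accept only in an accepting state */
}
```

Under these assumptions `accepts("110")` holds, `accepts("11")` fails because the input ends in a non-accepting state, and `accepts("100")` fails because state 1 has no outgoing transition.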
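• An accepting state
• A transition (an arrow labeled with an input symbol, e.g. a)
[Figure: example state graph with transitions on 0 and 1; diagram not recoverable]
• Input: 1 0 1
[Figure: example automaton with transitions on 0 and 1; diagram not recoverable]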
Lexical specification -> Regular expressions -> NFA -> DFA -> Table-driven implementation of DFA
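• For ε
• For input a
• For A | B
[Figures: the NFA fragment built for each case above; diagrams not recoverable]
[Figure: example NFA with states A through J and transitions on 0 and 1; diagram not recoverable]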
[Figure: DFA with states ABCDHI, FGABCDHI and EJGABCDHI and transitions on 0 and 1; diagram not recoverable]
      0    1
S     T    U
T     T    U
U     T    U
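A table-driven implementation of this DFA might look like the sketch below, where the two-dimensional array mirrors the transition table row for row. Two things are assumptions, since the table itself does not mark them: that S is the start state and that U is the only accepting state. The names `next_state`, `accepting`, and `dfa_accepts` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum State { S, T, U, NUM_STATES };

/* Rows are states, columns are the input symbols 0 and 1, as in the table. */
static const enum State next_state[NUM_STATES][2] = {
    /* S */ { T, U },
    /* T */ { T, U },
    /* U */ { T, U },
};

/* Assumption: U is the only accepting state. */
static const bool accepting[NUM_STATES] = { false, false, true };

static bool dfa_accepts(const char *input) {
    assert(input != NULL);
    enum State state = S;  /* assumption: S is the start state */
    for (; *input != '\0'; ++input) {
        if (*input != '0' && *input != '1') return false;  /* symbol outside the alphabet */
        state = next_state[state][*input - '0'];            /* one table lookup per symbol */
    }
    return accepting[state];
}
```

With those assumptions the machine accepts exactly the binary strings that end in 1, for example `dfa_accepts("101")`. Each input symbol costs one array lookup, which is the main appeal of the table-driven form.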