Lab Session 1 - Lexical Analyzer
1 Introduction
In this lab session, we look at how to generate a lexical analyzer using LexLab, view the
generated NFA and DFA, and then use the generated analyzer to scan some input strings and
view the output symbol table.
To do this, we shall use the following simple abstract example, which contains three regular
expressions: r1 = a, r2 = abb, r3 = a*b+.
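As a rough picture of what a generated DFA for these three expressions might look like, here is a minimal hand-built sketch in Python. The state names S0 to S5 and the function name classify are illustrative assumptions; the automaton that LexLab displays may number or minimize its states differently.

```python
# Hand-built DFA for r1 = a, r2 = abb, r3 = a*b+.
# State names are illustrative, not taken from LexLab.
TRANSITIONS = {
    ("S0", "a"): "S1", ("S0", "b"): "S2",
    ("S1", "a"): "S3", ("S1", "b"): "S4",
    ("S2", "b"): "S2",
    ("S3", "a"): "S3", ("S3", "b"): "S2",
    ("S4", "b"): "S5",
    ("S5", "b"): "S2",
}

# Accepting states mapped to the rule they report. S5 is reached on "abb",
# which matches both r2 and r3; r2 is reported because it is listed first.
ACCEPTING = {"S1": "r1", "S2": "r3", "S4": "r3", "S5": "r2"}


def classify(text):
    """Return the rule that matches the whole of text, or None if no rule does."""
    state = "S0"
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition defined: the DFA is stuck
            return None
    return ACCEPTING.get(state)


for sample in ["a", "abb", "ab", "aab", "ba"]:
    print(sample, "->", classify(sample))
```

Each accepting state records a single rule, so the tie between r2 and r3 on "abb" is settled when the table is built rather than while scanning.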
Next, we shall construct a lexical analyzer for simpleJava, a small subset of the Java language
that serves as a more realistic example, and then use the constructed lexical analyzer to scan some
realistic input strings.
2) Edit the text editor value to the string "abb". After scanning the string, note that the symbol
list has changed: the lexical analyzer has recognized one symbol, r2 = "abb".
Note that the grammar we defined relies on the conflict resolution rules we
discussed in class. The string "abb" matches both regular expressions r2 and r3, but it is reported
as a symbol for r2, since expression r2 is listed first in the grammar.
3) Again, edit the text editor value to the string "ababcdb". Note that the symbol list has changed: the
lexical analyzer has reported an error, i.e. a substring "cd" of length 2, starting at position 4 in the
input string.
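The behaviour observed in steps 2 and 3 can be reproduced with a small scanner sketch in Python. It assumes the usual rules: prefer the longest match, and on a tie prefer the rule listed first. The function name scan, the tuple layout, the 0-based positions and the merging of adjacent unmatched characters into one error are assumptions chosen to agree with the description above, not LexLab's actual implementation.

```python
import re

# Rules in grammar order; on equal match length the earlier rule wins.
RULES = [
    ("r1", re.compile(r"a")),
    ("r2", re.compile(r"abb")),
    ("r3", re.compile(r"a*b+")),
]


def scan(text):
    """Return (rule, lexeme, start) tuples; rule is 'error' for unmatched runs."""
    symbols = []
    pos = 0
    while pos < len(text):
        best_rule, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so ties are resolved in favour of the rule listed first.
            if m and len(m.group()) > best_len:
                best_rule, best_len = name, len(m.group())
        if best_rule is not None:
            symbols.append((best_rule, text[pos:pos + best_len], pos))
            pos += best_len
            continue
        # No rule matches here: record the character as an error,
        # merging it with an immediately preceding error symbol.
        last = symbols[-1] if symbols else None
        if last and last[0] == "error" and last[2] + len(last[1]) == pos:
            symbols[-1] = ("error", last[1] + text[pos], last[2])
        else:
            symbols.append(("error", text[pos], pos))
        pos += 1
    return symbols


print(scan("abb"))
print(scan("ababcdb"))
```

For "ababcdb" the sketch yields two r3 symbols "ab", one error "cd" starting at position 4 (counting from 0), and a final r3 symbol "b".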
Now, edit the text editor value to the following Java program text:
class pay{
    int items;
    int pay;
    void computePay(){
        if(items<10)
            pay=1000;
        else
            pay=items * 10;
    }
}
Note: Practice by editing the above program; try to include both valid and invalid tokens. Note down
what you observe.
Fig 1: GUI of LexLab showing a sample lexical grammar (top-left), input
string (bottom-left) and the generated list of symbols (right).
SimpleJava EBNF
ClassDeclaration = "class" Identifier "{" VarDeclaration* MethodDeclaration* "}"
VarDeclaration = Type Identifier ";"
MethodDeclaration = Type Identifier "(" ")" "{" Statement* "}"
Type = "int" | "boolean"
Statement = "{" Statement* "}"
    | Identifier "=" Expression ";"
    | "if" "(" Expression ")" Statement "else" Statement
Expression = Expression ("<" | ">" | "+" | "-" | "*") Expression
    | "true"
    | "false"
    | Identifier
    | Number
Lexical Aspects
An identifier is a sequence of letters (lowercase and uppercase) and digits, starting with a
letter. A number is a sequence of digits 0 to 9.
A binary operator is any of the following: <, >, +, -, *
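To make the lexical rules concrete, here is a minimal tokenizer sketch in Python built from the definitions above. The token class names (KEYWORD, IDENTIFIER, NUMBER, BINARY_OP, PUNCT, ERROR) and the function name tokenize are assumptions for illustration; the keyword and punctuation sets are read off the quoted terminals in the EBNF. Because "void" is not among those terminals, this sketch reports it as an ordinary identifier.

```python
import re

# Reserved words taken from the quoted terminals of the SimpleJava EBNF.
KEYWORDS = {"class", "int", "boolean", "if", "else", "true", "false"}

# Token classes; the names are illustrative. ERROR catches any other character.
TOKEN_SPEC = [
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),  # letters and digits, starting with a letter
    ("NUMBER", r"[0-9]+"),                    # sequence of digits 0 to 9
    ("BINARY_OP", r"[<>+\-*]"),               # <, >, +, -, *
    ("PUNCT", r"[{}();=]"),                   # braces, parentheses, ";" and "="
    ("SKIP", r"\s+"),                         # whitespace is discarded
    ("ERROR", r"."),                          # anything else is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (kind, lexeme) pairs for the given source text."""
    tokens = []
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        # Match the whole word first, then check whether it is reserved,
        # so that a word such as "classes" stays a single identifier.
        if kind == "IDENTIFIER" and lexeme in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, lexeme))
    return tokens


for token in tokenize("if(items<10) pay=1000; else pay=items * 10;"):
    print(token)
```

Looking keywords up after matching a full identifier is one simple way to obtain longest-match behaviour; a generated analyzer would typically get the same effect from rule ordering in the lexical grammar.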