
Lab Session 1 - Lexical Analyzer

This document discusses constructing and using a lexical analyzer. It begins by introducing regular expressions that will be used to build a lexical analyzer. Next, it outlines the steps to construct a lexical analyzer from the regular expressions using a tool called LexLab. These steps include loading a grammar, generating an NFA and DFA, and viewing the transition tables. The document then demonstrates how to use the constructed lexical analyzer to scan input strings and view the resulting symbol table. Finally, it discusses constructing a lexical analyzer for a subset of Java called simpleJava and provides the EBNF grammar for simpleJava.

Uploaded by

Nasasira Julius

Lab Session 1: Constructing and Using a Lexical Analyzer

1 Introduction
In this lab session, we look at how to generate a lexical analyzer using LexLab, view the
generated NFA and DFA, and then use the generated analyzer to scan some input strings and
view the output symbol table.

To do this we shall use the following simple abstract example, which contains three regular
expressions: r1 = a, r2 = abb, r3 = a*b+.
Next, we shall construct a lexical analyzer for simpleJava, a small subset of the Java language
and a more realistic example, and then use the constructed lexical analyzer to scan some
realistic input strings.

2 Constructing a Lexical Analyzer from Regular Expressions

To proceed with the construction of a lexical analyzer, you need to define a lexical grammar from
the three regular expressions, i.e., r1 = a, r2 = abb, r3 = a*b+. Obtain a copy of the grammar
from the file “regExp.txt” in the LexLab\dist folder. To construct the lexical analyzer:
1) Double-click the executable jar file “Scanning.jar” in the LexLab folder. A simple
GUI opens; see Fig 1 for a sample.
2) Load the grammar into the grammar text editor. For this example the grammar is
already loaded.
3) Parse the grammar by clicking the Commit button.
4) Generate the NFA by clicking the Grammar2NFA button.
5) Generate the DFA by clicking the NFA2DFA button.
Note: The NFA and the corresponding DFA transition tables have now been generated and can be
viewed on the NFA and DFA tab sheets. The lexical analyzer for our regular expressions is
now constructed.
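
To make the NFA2DFA step less of a black box, here is a minimal Python sketch of the standard subset construction applied to a hand-built NFA for r1, r2 and r3. The state numbers, the NFA layout and every function name are assumptions made for illustration; LexLab's internal representation may well differ.

```python
from collections import deque

EPS = ""  # label used for epsilon moves in this sketch

# Hand-built NFA for r1 = a, r2 = abb, r3 = a*b+ (state numbers are illustrative).
NFA = {
    0: {EPS: {1, 3, 7}},       # new start state branching to the three rules
    1: {"a": {2}},             # r1 = a
    3: {"a": {4}},             # r2 = abb
    4: {"b": {5}},
    5: {"b": {6}},
    7: {"a": {7}, "b": {8}},   # r3 = a*b+
    8: {"b": {8}},
}
ACCEPTING = {2: "r1", 6: "r2", 8: "r3"}
PRIORITY = ["r1", "r2", "r3"]  # order in which the rules are listed


def eps_closure(states):
    """All NFA states reachable from `states` using epsilon moves only."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in NFA.get(s, {}).get(EPS, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, symbol):
    """NFA states reachable from `states` on one input symbol."""
    return {t for s in states for t in NFA.get(s, {}).get(symbol, ())}


def nfa_to_dfa(alphabet="ab"):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start = eps_closure({0})
    table, queue = {}, deque([start])
    while queue:
        current = queue.popleft()
        if current in table:
            continue
        table[current] = {}
        for symbol in alphabet:
            target = eps_closure(move(current, symbol))
            if target:  # an empty set means "no transition"
                table[current][symbol] = target
                queue.append(target)
    return start, table


def accepted_rule(dfa_state):
    """Rule reported by a DFA state; the first-listed rule wins a conflict."""
    rules = {ACCEPTING[s] for s in dfa_state if s in ACCEPTING}
    return min(rules, key=PRIORITY.index) if rules else None
```

Under these assumptions, `nfa_to_dfa()` produces six DFA states, and the state `{6, 8}` accepts both r2 and r3 but reports r2 because r2 is listed first.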

3 Scanning with the Lexical Analyzer


Follow these steps:
1) Enter the string “aabbba” in the text editor and click ScanText. The lexical analyzer reads
the string and produces a list of symbols in the Symbol Table tab sheet. The table contains the
following fields:
∙ SymCode - a unique code of a symbol.
∙ SymName - name of the symbol.
∙ SymValue - value of the symbol.
∙ SymStart - start position of a symbol in the input string.
∙ SymLength - length of the value of the symbol.
From the string “aabbba” the table shows that the lexical analyzer has recognized two symbols,
r3 = “aabbb” and r1 = “a”, because the rule is to take the longest match. The last symbol,
“endmarker”, is a special symbol that signals the end of input to the analyzer.

2) Change the text editor value to the string “abb”. After scanning, note that the symbol
list has changed: the lexical analyzer has recognized one symbol, r2 = “abb”.
The grammar we defined relies on the conflict resolution rules which we discussed in class.
The string “abb” matches both r2 and r3 with the same length, but it is reported as a symbol
for r2, since r2 is listed first in the grammar.
3) Change the text editor value again, to the string “ababcdb”. The symbol list changes again,
and this time the lexical analyzer reports an error: the substring “cd” of length 2, starting
at position 4 in the input string.
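
The three scans above can be reproduced with a minimal Python sketch of a longest-match scanner. It uses Python's `re` module instead of the DFA that LexLab generates, and the function name `scan`, the tuple layout (SymName, SymValue, SymStart, SymLength) and the zero-length endmarker entry are all assumptions made for illustration, not LexLab's actual output format.

```python
import re

# Rules in the order they are listed in the grammar; earlier rules win ties.
RULES = [
    ("r1", re.compile(r"a")),
    ("r2", re.compile(r"abb")),
    ("r3", re.compile(r"a*b+")),
]


def scan(text):
    """Return a list of (name, value, start, length) tuples for `text`."""
    symbols = []
    pos = 0
    error_start = None  # start of the current run of unmatched characters
    while pos < len(text):
        # Try every rule at the current position.
        matches = [(name, m.group()) for name, rx in RULES
                   if (m := rx.match(text, pos))]
        if not matches:
            # No rule applies: extend (or open) an error run and skip one char.
            if error_start is None:
                error_start = pos
            pos += 1
            continue
        if error_start is not None:
            # Close the pending error run before emitting a real symbol.
            symbols.append(("error", text[error_start:pos],
                            error_start, pos - error_start))
            error_start = None
        # Longest match wins; max() keeps the first rule on equal length.
        name, value = max(matches, key=lambda nv: len(nv[1]))
        symbols.append((name, value, pos, len(value)))
        pos += len(value)
    if error_start is not None:
        symbols.append(("error", text[error_start:],
                        error_start, len(text) - error_start))
    # Assumed representation of the special end-of-input symbol.
    symbols.append(("endmarker", "", len(text), 0))
    return symbols
```

With this sketch, `scan("aabbba")` yields r3 = "aabbb" followed by r1 = "a", `scan("abb")` yields the single symbol r2 = "abb", and `scan("ababcdb")` reports an error entry for "cd" with start 4 and length 2 (positions counted from 0).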

4 Constructing a Lexical Analyzer for simpleJava


To proceed with the construction, follow the same steps as presented in Section 2 above. Obtain a
copy of the lexical grammar for simpleJava from the file “simpleJava.txt”. After completing the
process in Section 2, the grammar, NFA and DFA transition tables will be updated and the
lexical analyzer for simpleJava is ready for use.
Note: The transition tables grow with the set of tokens of a language and can become extremely
large. As a result, they may not be interesting to look at.

Now, edit the text editor value to the following Java program text:

class pay{
    int items;
    int pay;

    void computePay(){
        if(items<10)
            pay=1000;
        else
            pay=items * 10;
    }
}

When the Java program is scanned, the symbol table is updated immediately with a long list of
symbols that have been recognized.

Note: Practice by editing the above program; try to include both valid and invalid tokens. Note
down what you observe.

Fig 1: GUI of LexLab showing a sample lexical grammar (top-left), input
string (bottom-left) and the generated list of symbols (right).

SimpleJava EBNF
ClassDeclaration = "class" Identifier "{" VarDeclaration* MethodDeclaration* "}"
VarDeclaration = Type Identifier ";"
MethodDeclaration = Type Identifier "(" ")" "{" Statement* "}"
Type = "int" | "boolean"
Statement = "{" Statement* "}"
          | Identifier "=" Expression ";"
          | "if" "(" Expression ")" Statement "else" Statement
Expression = Expression ("<" | ">" | "+" | "-" | "*") Expression
           | "true"
           | "false"
           | Identifier
           | Number

Lexical Aspects
An identifier is a sequence of letters (lower and upper case) and digits, starting with a
letter. A number is a sequence of digits 0 to 9.
A binary operator is one of the following: <, >, +, -, *
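
The lexical aspects above can be turned into a minimal Python tokenizer sketch. The token class names, the keyword set (taken from the quoted terminals in the EBNF) and the function name `tokenize` are assumptions made for illustration; the actual contents of “simpleJava.txt” may define the tokens differently. Note that a word such as `void`, which is not among the EBNF terminals, comes out as an Identifier in this sketch.

```python
import re

# Keywords assumed from the quoted terminals of the EBNF above.
KEYWORDS = {"class", "int", "boolean", "if", "else", "true", "false"}

TOKEN_RE = re.compile(r"""
    (?P<Identifier>[A-Za-z][A-Za-z0-9]*)   # letters and digits, starting with a letter
  | (?P<Number>[0-9]+)                     # a sequence of digits 0 to 9
  | (?P<Operator>[<>+\-*])                 # the binary operators <, >, +, -, *
  | (?P<Punctuation>[{}();=])              # remaining terminals used by the EBNF
  | (?P<Whitespace>\s+)                    # skipped, not reported
  | (?P<Error>.)                           # any other single character
""", re.VERBOSE)


def tokenize(source):
    """Return a list of (kind, value, start) tuples for simpleJava text."""
    tokens = []
    for m in TOKEN_RE.finditer(source):
        kind, value = m.lastgroup, m.group()
        if kind == "Whitespace":
            continue
        # Keywords look like identifiers, so they are reclassified afterwards.
        if kind == "Identifier" and value in KEYWORDS:
            kind = "Keyword"
        tokens.append((kind, value, m.start()))
    return tokens
```

For example, `tokenize("if(items<10)")` produces a Keyword `if`, a Punctuation `(`, an Identifier `items`, an Operator `<`, a Number `10` and a Punctuation `)`, while a character outside the language, such as `@`, is reported with kind `Error`.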
