Unit-2 Lexical Analysis

The document outlines the key concepts of lexical analysis in compiler design, including the interaction between the scanner and parser, the definitions of tokens, patterns, and lexemes, and input buffering techniques. It also covers regular expressions, their operations, and examples of various regular expressions. The content is structured to provide a comprehensive understanding of how lexical analyzers function and the theoretical foundations behind them.

Uploaded by

devanshisoni2004
Copyright
© All Rights Reserved

9/13/2023

Compiler Design (CD)


GTU # 2170701

Unit – 2
Lexical Analyzer

Topics to be covered
• Interaction of scanner & parser
• Token, Pattern & Lexemes
• Input buffering
• Specification of tokens
• Regular expression & Regular definition
• Transition diagram
• Finite automata
• Regular expression to NFA using Thompson's rule
• Conversion from NFA to DFA using subset construction method
• DFA optimization
• Conversion from regular expression to DFA using Syntax Tree method


Interaction of scanner & parser


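Figure: the parser sends a “Get next token” request to the lexical analyzer; the lexical analyzer reads the source program and returns the next token to the parser. Both use the symbol table.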

• Upon receiving a “Get next token” command from the parser, the lexical analyzer reads input characters until it can identify the next token.
• The lexical analyzer also strips out comments and white space (blanks, tabs, and newline characters) from the source program.
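A minimal sketch of this pull-style interaction in Python, assuming a toy language whose tokens are identifiers, integer constants and single-character operators, and whose comments run from // to the end of a line (that comment syntax is an assumption, not something the slide specifies). The names Lexer, get_next_token and parse are hypothetical.

```python
class Lexer:
    """Toy scanner: returns one (token, lexeme) pair per get_next_token() call."""

    def __init__(self, source: str) -> None:
        self.src = source
        self.pos = 0

    def _skip_blanks_and_comments(self) -> None:
        while self.pos < len(self.src):
            if self.src[self.pos] in " \t\n":
                self.pos += 1                                   # strip white space
            elif self.src.startswith("//", self.pos):
                end = self.src.find("\n", self.pos)
                self.pos = len(self.src) if end == -1 else end  # strip comment
            else:
                break

    def get_next_token(self):
        """Return (token, lexeme), or None at the end of the input."""
        self._skip_blanks_and_comments()
        if self.pos >= len(self.src):
            return None
        start = self.pos
        if self.src[start].isalpha():                 # letter followed by letters/digits
            while self.pos < len(self.src) and self.src[self.pos].isalnum():
                self.pos += 1
            return ("id", self.src[start:self.pos])
        if self.src[start].isdigit():                 # non-empty sequence of digits
            while self.pos < len(self.src) and self.src[self.pos].isdigit():
                self.pos += 1
            return ("num", self.src[start:self.pos])
        self.pos += 1                                 # any other single character
        return ("op", self.src[start])


def parse(source: str):
    """Stand-in parser: it just keeps asking the lexer for the next token."""
    lexer = Lexer(source)
    tokens = []
    while (tok := lexer.get_next_token()) is not None:
        tokens.append(tok)
    return tokens

print(parse("total = sum + 45  // running total\n"))
# [('id', 'total'), ('op', '='), ('id', 'sum'), ('op', '+'), ('num', '45')]
```

The parser never sees blanks or the comment: the scanner consumes them before returning each token.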


Token, Pattern & Lexemes


Token
A sequence of characters having a collective meaning is known as a token.
Categories of tokens:
1. Identifier
2. Keyword
3. Operator
4. Special symbol
5. Constant

Pattern
The set of rules associated with a token is called a pattern.
Example: “non-empty sequence of digits”, “letter followed by letters and digits”

Lexemes
The sequence of characters in a source program matched by the pattern for a token is called a lexeme.
Example: Rate, DIET, count, Flag

Example: Token, Pattern & Lexemes


Example: total = sum + 45
Tokens:
total → Identifier1
= → Operator1
sum → Identifier2
+ → Operator2
45 → Constant1

Lexemes:
Lexemes of identifier: total, sum
Lexemes of operator: =, +
Lexemes of constant: 45
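The split into tokens, patterns and lexemes can be sketched with Python's re module: each token category gets one pattern, and every matched substring is a lexeme of that token. The pattern table below (including the keyword list) is a hypothetical example, not a definitive set of rules.

```python
import re

# Hypothetical pattern table: one regular expression per token category.
# Order matters: the keyword pattern is tried before the identifier pattern.
PATTERNS = [
    ("Keyword", r"\b(?:int|float|if|else|while)\b"),
    ("Identifier", r"[A-Za-z][A-Za-z0-9]*"),   # letter followed by letters and digits
    ("Constant", r"[0-9]+"),                    # non-empty sequence of digits
    ("Operator", r"[=+\-*/]"),
    ("Special symbol", r"[;,(){}]"),
]
MASTER = re.compile("|".join(f"(?P<t{k}>{p})" for k, (_, p) in enumerate(PATTERNS)))

def classify(source: str):
    """Return (lexeme, token category) pairs; characters that match no
    pattern (such as blanks) are simply skipped in this sketch."""
    return [(m.group(), PATTERNS[int(m.lastgroup[1:])][0])
            for m in MASTER.finditer(source)]

def lexemes_by_token(pairs):
    groups = {}
    for lexeme, category in pairs:
        groups.setdefault(category, []).append(lexeme)
    return groups

pairs = classify("total = sum + 45")
print(lexemes_by_token(pairs))
# {'Identifier': ['total', 'sum'], 'Operator': ['=', '+'], 'Constant': ['45']}
```

A real scanner would report a lexical error for unmatched characters instead of skipping them.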


Input buffering
• Reading every character directly from secondary memory while identifying tokens would be time-consuming and costly. So, the input string is first read into a buffer and then scanned by the lexical analyzer.
• The lexical analyzer scans the input string from left to right, one character at a time, to identify tokens. It uses two pointers to scan tokens:
   ◦ Begin pointer (bp): points to the beginning of the lexeme being read.
   ◦ Forward pointer (fp): moves ahead to search for the end of the lexeme.


Input buffering
• Initially, both pointers point to the first character of the input string.
• The forward pointer moves ahead to search for the end of the lexeme.
• As soon as a blank space is encountered, it indicates the end of the lexeme. For example, if the input begins with int followed by a blank, the lexeme “int” is identified as soon as fp reaches that blank.
• When fp encounters white space, it ignores it and moves ahead.
• Then both the begin pointer (bp) and the forward pointer (fp) are set to the start of the next token.
• In this scheme, only one buffer is used to store the input string. The problem is that a very long lexeme can cross the buffer boundary; to scan the rest of the lexeme, the buffer has to be refilled, which overwrites the beginning of the lexeme.
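The bp/fp scheme above can be sketched in Python as follows, under the same simplification the slide uses: every lexeme ends at white space. The function name scan_lexemes is hypothetical, and the whole input is assumed to fit in one buffer (a Python string).

```python
def scan_lexemes(buffer: str):
    """Split buffer into lexemes using a begin pointer (bp) and a forward pointer (fp)."""
    lexemes = []
    n = len(buffer)
    bp = fp = 0                          # initially both point to the first character
    while True:
        while fp < n and buffer[fp].isspace():
            fp += 1                      # fp skips over white space
        if fp >= n:
            break
        bp = fp                          # bp marks the beginning of the lexeme
        while fp < n and not buffer[fp].isspace():
            fp += 1                      # fp searches for the end of the lexeme
        lexemes.append(buffer[bp:fp])    # characters bp .. fp-1 form the lexeme
    return lexemes

print(scan_lexemes("int i , j ;"))       # ['int', 'i', ',', 'j', ';']
```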

Input buffering
• There are mainly two techniques for input buffering:
1. Buffer pairs
2. Sentinels

Buffer Pair

• The lexical analyzer scans the input string from left to right, one character at a time.
• The buffer is divided into two N-character halves, where N is the number of characters in one disk block.

Figure: a buffer pair holding the input E = M * C * * 2 followed by eof, spread across the two halves.


Buffer pairs
Figure: buffer pair holding E = M * C * * 2 eof, with pointer lexeme_beginning at the start of the current lexeme and pointer forward scanning ahead.

• Pointer lexeme_beginning marks the beginning of the current lexeme.
• Pointer forward scans ahead until a pattern match is found.
• Once the next lexeme is determined, forward is set to the character at its right end.
• lexeme_beginning is then set to the character immediately after the lexeme just found.
• If forward is at the end of the first buffer half, the second half is filled with N new input characters.
• If forward is at the end of the second buffer half, the first half is filled with N new input characters.

Buffer pairs
Figure: buffer pair holding E = M * C * * 2 eof, showing forward advancing while lexeme_beginning stays fixed.

Code to advance the forward pointer:
    if forward at end of first half then begin
        reload second half;
        forward := forward + 1;
    end
    else if forward at end of second half then begin
        reload first half;
        move forward to beginning of first half;
    end
    else forward := forward + 1;
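The pseudocode above can be simulated in Python as a sketch. Assumptions: the input comes from a text stream read N characters at a time, positions past the end of the input hold None (standing in for eof), and lexeme_beginning is not modelled; the class just delivers characters. BufferPair and read_all are hypothetical names.

```python
import io

class BufferPair:
    """Two N-character halves refilled from a stream; None plays the role of eof."""

    def __init__(self, stream, n):
        self.stream, self.n = stream, n
        self.buf = [None] * (2 * n)
        self._reload(0)                      # fill the first half before scanning
        self.forward = 0

    def _reload(self, half):
        chunk = self.stream.read(self.n)
        for k in range(self.n):
            self.buf[half * self.n + k] = chunk[k] if k < len(chunk) else None

    def current(self):
        return self.buf[self.forward]

    def advance(self):
        # Two tests on every advance: end of first half? end of second half?
        if self.forward == self.n - 1:           # at end of first half
            self._reload(1)                      # reload second half
            self.forward += 1
        elif self.forward == 2 * self.n - 1:     # at end of second half
            self._reload(0)                      # reload first half
            self.forward = 0                     # move to beginning of first half
        else:
            self.forward += 1


def read_all(text, n):
    """Drive the buffer pair over text and return every character it delivers."""
    bp = BufferPair(io.StringIO(text), n)
    out = []
    while bp.current() is not None:
        out.append(bp.current())
        bp.advance()
    return "".join(out)

print(read_all("E = M * C ** 2", 4))    # E = M * C ** 2
```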


Sentinels

Figure: buffer pair with sentinels. The first half holds E = M * followed by an eof sentinel; the second half holds C * * 2 and the eof that ends the input, followed by the eof sentinel. Pointers lexeme_beginning and forward are as before.

• With buffer pairs, each time we advance forward we must check that we have not moved off one half of the buffer; if we have, the other half must be reloaded.
• Thus, for each character read, we make two tests: one for the end of the buffer half, and one to determine which character was read.
• We can combine the buffer-end test with the test for the current character.
• We can reduce the two tests to one if we extend each buffer half to hold a sentinel character at its end.
• The sentinel is a special character that cannot be part of the source program; a natural choice is the character eof.

Sentinels
Figure: sentinel buffer holding E = M * eof | C * * 2 eof ... eof, showing forward advancing while lexeme_beginning stays fixed.

    forward := forward + 1;
    if forward↑ = eof then begin
        if forward at end of first half then begin
            reload second half;
            forward := forward + 1;
        end
        else if forward at end of second half then begin
            reload first half;
            move forward to beginning of first half;
        end
        else /* eof inside a buffer half marks the end of the input */
            terminate lexical analysis;
    end
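A Python sketch of the sentinel scheme, assuming the character "\0" as a stand-in sentinel that never appears in the source text (real scanners choose their own marker). In the common case advance() makes only one test, whether the current character is the sentinel; the buffer-end tests run only after a sentinel is seen. SentinelBuffer and read_all_sentinel are hypothetical names.

```python
import io

EOF = "\0"   # hypothetical sentinel, assumed never to occur in the source text

class SentinelBuffer:
    """Buffer pair in which each N-character half is followed by an EOF sentinel.
    Layout: first half [0, n), sentinel at n, second half [n+1, 2n+1), sentinel at 2n+1."""

    def __init__(self, stream, n):
        self.stream, self.n = stream, n
        self.buf = [EOF] * (2 * n + 2)
        self._reload(0)
        self.forward = -1                    # advance() moves forward before reading

    def _reload(self, half):
        chunk = self.stream.read(self.n)
        start = half * (self.n + 1)
        for k in range(self.n):
            self.buf[start + k] = chunk[k] if k < len(chunk) else EOF

    def advance(self):
        """Return the next input character, or None at the end of the input."""
        self.forward += 1
        if self.buf[self.forward] == EOF:         # usually the only test needed
            if self.forward == self.n:            # sentinel at end of first half
                self._reload(1)
                self.forward += 1
            elif self.forward == 2 * self.n + 1:  # sentinel at end of second half
                self._reload(0)
                self.forward = 0
            else:                                 # eof inside a half: end of input
                return None
            if self.buf[self.forward] == EOF:     # the freshly loaded half is empty
                return None
        return self.buf[self.forward]


def read_all_sentinel(text, n):
    sb = SentinelBuffer(io.StringIO(text), n)
    out = []
    while (ch := sb.advance()) is not None:
        out.append(ch)
    return "".join(out)

print(read_all_sentinel("E = M * C ** 2", 4))    # E = M * C ** 2
```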


Regular expression
• A regular expression is a sequence of characters that defines a pattern.

Notational shorthands

1. One or more instances: +

2. Zero or more instances: *

3. Zero or one instance: ?

4. Alphabet: Σ
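Python's re module happens to use the same three shorthand operators, so they can be tried out directly. This is a sketch: re.fullmatch is used so that the whole string, not just a prefix, must belong to the language of the pattern.

```python
import re

def matches(pattern: str, s: str) -> bool:
    """True if the whole string s is in the language of pattern."""
    return re.fullmatch(pattern, s) is not None

print(matches(r"a+", ""), matches(r"a+", "aaa"))          # False True
print(matches(r"a*", ""), matches(r"a*", "aaa"))          # True True
print(matches(r"ab?c", "ac"), matches(r"ab?c", "abbc"))   # True False

# r+ is shorthand for r r*: both describe the same strings
print(all(matches(r"a+", w) == matches(r"aa*", w) for w in ["", "a", "aa", "aaa"]))  # True
```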


Operations on languages
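Operation: Definition
• Union of L and M, written L ∪ M: $L \cup M = \{\, s \mid s \text{ is in } L \text{ or } s \text{ is in } M \,\}$
• Concatenation of L and M, written LM: $LM = \{\, st \mid s \text{ is in } L \text{ and } t \text{ is in } M \,\}$
• Kleene closure of L, written L*: $L^{*}$ denotes “zero or more concatenations of” L, that is, $L^{*} = \bigcup_{i \ge 0} L^{i}$ with $L^{0} = \{\epsilon\}$
• Positive closure of L, written L+: $L^{+}$ denotes “one or more concatenations of” L, that is, $L^{+} = \bigcup_{i \ge 1} L^{i}$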
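For finite languages the four operations can be computed directly on Python sets of strings, with ε written as the empty string "". Because L* and L+ are infinite whenever L contains a non-empty string, this sketch truncates them at a chosen power k; the function names are hypothetical.

```python
def union(L, M):
    """L ∪ M = {s | s is in L or s is in M}"""
    return set(L) | set(M)

def concat(L, M):
    """LM = {st | s is in L and t is in M}"""
    return {s + t for s in L for t in M}

def power(L, i):
    """L^i: i-fold concatenation of L, with L^0 = {ε}."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result

def kleene_upto(L, k):
    """Truncated L*: L^0 ∪ L^1 ∪ ... ∪ L^k."""
    return set().union(*(power(L, i) for i in range(k + 1)))

def positive_upto(L, k):
    """Truncated L+: L^1 ∪ ... ∪ L^k."""
    return set().union(*(power(L, i) for i in range(1, k + 1)))

L, M = {"a", "b"}, {"0", "1"}
print(sorted(union(L, M)))               # ['0', '1', 'a', 'b']
print(sorted(concat(L, M)))              # ['a0', 'a1', 'b0', 'b1']
print(sorted(kleene_upto({"a"}, 3)))     # ['', 'a', 'aa', 'aaa']
print(sorted(positive_upto({"a"}, 3)))   # ['a', 'aa', 'aaa']
```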

Regular expression examples


1. 0 or 1
Strings: 0, 1    R.E. = 0 | 1

2. 0 or 11 or 111
Strings: 0, 11, 111    R.E. = 0 | 11 | 111

3. String having zero or more a.
Strings: ε, a, aa, aaa, aaaa, …    R.E. = a*

4. String having one or more a.
Strings: a, aa, aaa, aaaa, …    R.E. = a+

5. Regular expression over Σ = {a, b, c} that represents all strings of length 3.
Strings: abc, bca, bbb, cab, aba, …    R.E. = (a | b | c)(a | b | c)(a | b | c)

6. All non-empty binary strings
Strings: 0, 11, 101, 10101, 1111, …    R.E. = (0 | 1)+


Regular expression examples


7. 0 or more occurrences of either a or b or both
Strings: ε, a, aa, abab, bab, …    R.E. = (a | b)*

8. 1 or more occurrences of either a or b or both
Strings: a, aa, abab, bab, bbbaaa, …    R.E. = (a | b)+

9. Binary no. ends with 0
Strings: 0, 10, 100, 1010, 11110, …    R.E. = (0 | 1)*0

10. Binary no. ends with 1
Strings: 1, 101, 1001, 10101, …    R.E. = (0 | 1)*1

11. Binary no. starts and ends with 1
Strings: 11, 101, 1001, 10101, …    R.E. = 1(0 | 1)*1

12. String starts and ends with the same character
Strings: 00, 101, aba, baab, …
R.E. = 1(0 | 1)*1 | 0(0 | 1)*0 over {0, 1}, or a(a | b)*a | b(a | b)*b over {a, b}

Regular expression examples


13. All strings of a and b starting with a
Strings: a, ab, aab, abb, …    R.E. = a(a | b)*

14. String of 0 and 1 ends with 00
Strings: 00, 100, 000, 1000, 1100, …    R.E. = (0 | 1)*00

15. String ends with abb
Strings: abb, babb, ababb, …    R.E. = (a | b)*abb

16. String starts with 1 and ends with 0
Strings: 10, 100, 110, 1000, 1100, …    R.E. = 1(0 | 1)*0

17. All binary strings with at least 3 characters, where the 3rd character is zero
Strings: 000, 100, 1100, 1001, …    R.E. = (0 | 1)(0 | 1)0(0 | 1)*

18. Language consisting of exactly two b’s over the set Σ = {a, b}
Strings: bb, bab, aabb, abba, …    R.E. = a*ba*ba*


Regular expression examples


19. The language over Σ = {a, b} such that the 3rd character from the right end of the string is always a.
Strings: aaa, aba, aaba, abb, …    R.E. = (a | b)*a(a | b)(a | b)

20. Any no. of a followed by any no. of b followed by any no. of c
Strings: ε, abc, aabbcc, aabc, abb, …    R.E. = a*b*c*

21. String should contain at least three 1s
Strings: 111, 01101, 0101110, …    R.E. = (0|1)*1(0|1)*1(0|1)*1(0|1)*

22. String should contain exactly two 1s
Strings: 11, 0101, 1100, 010010, 100100, …    R.E. = 0*10*10*

23. Length of string should be at least 1 and at most 3
Strings: 0, 1, 11, 01, 111, 010, 100, …    R.E. = (0|1) | (0|1)(0|1) | (0|1)(0|1)(0|1)

24. No. of zeros should be a multiple of 3
Strings: 000, 010101, 110100, 000000, 100010010, …    R.E. = 1*(01*01*01*)*

Regular expression examples

25. The language over Σ = {a, b, c} where the number of a's is a multiple of 3
Strings: aaa, baaa, bacaba, aaaaaa, …    R.E. = (b|c)*(a(b|c)*a(b|c)*a(b|c)*)*

26. Even no. of 0s
Strings: 00, 0101, 0000, 100100, …    R.E. = 1*(01*01*)*

27. String should have odd length
Strings: 0, 010, 110, 000, 10010, …    R.E. = (0|1)((0|1)(0|1))*

28. String should have even length
Strings: 00, 0101, 0000, 100100, …    R.E. = ((0|1)(0|1))*

29. String starts with 0 and has odd length
Strings: 0, 010, 000, 00010, …    R.E. = 0((0|1)(0|1))*

30. String starts with 1 and has even length
Strings: 10, 1100, 1000, 100100, …    R.E. = 1(0|1)((0|1)(0|1))*

31. All strings that begin or end with 00 or 11
Strings: 00101, 10100, 110, 01011, …    R.E. = (00|11)(0|1)* | (0|1)*(00|11)
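One way to gain confidence in answers like these is a brute-force check: enumerate every string up to some length and compare the regular expression against a direct test of the English description. The sketch below does this for a few of the examples above (third character zero, exactly two b's, third character from the right is a, exactly two 1s, odd length, begins or ends with 00 or 11), using Python's re syntax; agreement up to length 8 is evidence, not a proof.

```python
import re
from itertools import product

def strings_over(alphabet, max_len):
    """All strings over alphabet of length 0 .. max_len."""
    return ["".join(p) for n in range(max_len + 1) for p in product(alphabet, repeat=n)]

def agrees(regex, predicate, strings):
    """True if regex accepts exactly the strings that satisfy predicate."""
    return all((re.fullmatch(regex, s) is not None) == predicate(s) for s in strings)

BIN, AB = strings_over("01", 8), strings_over("ab", 8)
CASES = [
    # (R.E. from the examples, English description as a predicate, test strings)
    (r"(0|1)(0|1)0(0|1)*", lambda s: len(s) >= 3 and s[2] == "0", BIN),
    (r"a*ba*ba*", lambda s: s.count("b") == 2, AB),
    (r"(a|b)*a(a|b)(a|b)", lambda s: len(s) >= 3 and s[-3] == "a", AB),
    (r"0*10*10*", lambda s: s.count("1") == 2, BIN),
    (r"(0|1)((0|1)(0|1))*", lambda s: len(s) % 2 == 1, BIN),
    (r"(00|11)(0|1)*|(0|1)*(00|11)",
     lambda s: s[:2] in ("00", "11") or s[-2:] in ("00", "11"), BIN),
]

for regex, predicate, strings in CASES:
    print(regex, agrees(regex, predicate, strings))   # each line should end with True
```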


Regular definition
• A regular definition gives names to certain regular expressions and uses those names in other regular expressions.
• A regular definition is a sequence of definitions of the form:
d1 → r1
d2 → r2
……
dn → rn
where each di is a distinct name and each ri is a regular expression over the symbols of Σ and the previously defined names d1, …, di−1.
• Example: Regular definition for identifier
letter → A | B | C | … | Z | a | b | … | z
digit → 0 | 1 | … | 9
id → letter (letter | digit)*
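Because each name refers only to earlier names, a regular definition can be expanded in order by plain substitution. A Python sketch of the identifier definition, where the character classes [A-Za-z] and [0-9] are used as shorthand for the long alternations A | B | … | z and 0 | 1 | … | 9:

```python
import re

letter = r"[A-Za-z]"                          # letter → A | B | … | Z | a | b | … | z
digit = r"[0-9]"                              # digit  → 0 | 1 | … | 9
ident = rf"{letter}(?:{letter}|{digit})*"     # id     → letter (letter | digit)*

def is_id(s: str) -> bool:
    return re.fullmatch(ident, s) is not None

for lexeme in ["count", "Flag", "x25", "2x", "rate_1"]:
    print(lexeme, is_id(lexeme))
# count True / Flag True / x25 True / 2x False / rate_1 False
```

The underscore in rate_1 is rejected because this definition of letter does not include it.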

Transition Diagram
• A stylized flowchart is called a transition diagram.

Figure: legend of the transition-diagram symbols for a state, a transition, the start state, and a final state.


Finite Automata
• Finite automata are recognizers.
• An FA simply says “Yes” or “No” about each possible input string.
• A finite automaton is a mathematical model consisting of:
1. A set of states S
2. A set of input symbols Σ
3. A transition function move
4. An initial state S0
5. A set of final (accepting) states F
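The five components map directly onto Python data, and "recognizing" is just following move from the initial state and checking the final state. As an assumed example, this sketch uses the optimized DFA for (a|b)*abb that this unit derives later (states A, B, D, E; accepting state E).

```python
S = {"A", "B", "D", "E"}                 # set of states
SIGMA = {"a", "b"}                       # input symbols
MOVE = {                                 # transition function move(state, symbol)
    ("A", "a"): "B", ("A", "b"): "A",
    ("B", "a"): "B", ("B", "b"): "D",
    ("D", "a"): "B", ("D", "b"): "E",
    ("E", "a"): "B", ("E", "b"): "A",
}
S0 = "A"                                 # initial state
F = {"E"}                                # final (accepting) states

def accepts(w: str) -> bool:
    """Answer "Yes" (True) or "No" (False) for the input string w."""
    state = S0
    for ch in w:
        if ch not in SIGMA:
            return False                 # symbol outside the alphabet: reject
        state = MOVE[(state, ch)]
    return state in F

print([w for w in ["abb", "aabb", "babb", "ab", "abba", ""] if accepts(w)])
# ['abb', 'aabb', 'babb']
```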


Types of finite automata


• Types of finite automata are:
• Deterministic finite automata (DFA): for each state, there is exactly one edge leaving it for each input symbol.
• Nondeterministic finite automata (NFA): there are no restrictions on the edges leaving a state. There can be several edges with the same symbol as label, and some edges can be labeled with ε.

Figure: a DFA and an NFA over {a, b}, each with states 1, 2, 3, 4 and the path 1 --a--> 2 --b--> 3 --b--> 4, both recognizing strings that end in abb.

Regular expression to DFA


• A regular expression can be converted into a DFA using either of the following two methods:
1. Subset construction method:
First, an NFA with ε-transitions (ε-NFA) is constructed using Thompson’s construction.
The ε-NFA is then converted into a DFA using the subset construction method.

2. Syntax tree method

Figure: R.E. to DFA either by 1) R.E. to NFA (Thompson’s rule) followed by 2) NFA to DFA (subset construction), or directly by the syntax tree method.


Regular expression to NFA using Thompson's rule


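1. For ε, construct the NFA:
   start → i --ε--> f

2. For a in Σ, construct the NFA:
   start → i --a--> f

3. For the regular expression st, place N(s) before N(t): i is the start state of N(s), f is the final state of N(t), and the final state of N(s) is merged with the start state of N(t).
   start → i → N(s) → N(t) → f
   Ex: ab
   1 --a--> 2 --b--> 3

Regular expression to NFA using Thompson's rule

4. For the regular expression s|t, add a new start state i with ε-transitions to the start states of N(s) and N(t), and ε-transitions from the final states of N(s) and N(t) to a new final state f.
   Ex: (a|b)
   1 --ε--> 2 --a--> 3 --ε--> 6
   1 --ε--> 4 --b--> 5 --ε--> 6

5. For the regular expression s*, add a new start state i and a new final state f, with ε-transitions i → start of N(s), final of N(s) → f, final of N(s) → start of N(s) (repeat), and i → f (zero occurrences).
   Ex: a*
   1 --ε--> 2 --a--> 3 --ε--> 4, with 3 --ε--> 2 and 1 --ε--> 4

Regular expression to NFA using Thompson's rule

• a*b
   1 --ε--> 2 --a--> 3 --ε--> 4 --b--> 5, with 3 --ε--> 2 and 1 --ε--> 4

• b*ab
   1 --ε--> 2 --b--> 3 --ε--> 4 --a--> 5 --b--> 6, with 3 --ε--> 2 and 1 --ε--> 4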
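The five rules can be sketched as Python functions that build NFA fragments bottom-up, plus a simulator that tracks the set of reachable states. Assumptions: None labels ε-edges, and a global counter numbers states in creation order, so the numbers differ from the slides' 1 to 6. No regular-expression parser is included; fragments are combined by hand.

```python
from itertools import count

EPS = None              # label used for ε-edges in this sketch
_new_state = count(1)   # hypothetical global state counter

class NFA:
    """Thompson fragment: one start state, one final state, (from, label, to) edges."""
    def __init__(self, start, final, edges):
        self.start, self.final, self.edges = start, final, edges

def symbol(a):
    """Rules 1 and 2: i --a--> f (pass EPS for the ε rule)."""
    i, f = next(_new_state), next(_new_state)
    return NFA(i, f, [(i, a, f)])

def concat(s, t):
    """Rule 3: N(s) then N(t), merging final(N(s)) with start(N(t))."""
    def merge(q):
        return s.final if q == t.start else q
    edges = s.edges + [(merge(x), a, merge(y)) for x, a, y in t.edges]
    return NFA(s.start, t.final, edges)

def union(s, t):
    """Rule 4: new i and f joined to N(s) and N(t) by ε-edges."""
    i, f = next(_new_state), next(_new_state)
    edges = s.edges + t.edges + [(i, EPS, s.start), (i, EPS, t.start),
                                 (s.final, EPS, f), (t.final, EPS, f)]
    return NFA(i, f, edges)

def star(s):
    """Rule 5: ε-edges to enter, leave, repeat, or skip N(s)."""
    i, f = next(_new_state), next(_new_state)
    edges = s.edges + [(i, EPS, s.start), (s.final, EPS, f),
                       (s.final, EPS, s.start), (i, EPS, f)]
    return NFA(i, f, edges)

def eps_closure(states, edges):
    closure, stack = set(states), list(states)
    while stack:
        u = stack.pop()
        for x, a, y in edges:
            if x == u and a is EPS and y not in closure:
                closure.add(y)
                stack.append(y)
    return closure

def accepts(nfa, w):
    current = eps_closure({nfa.start}, nfa.edges)
    for ch in w:
        moved = {y for x, a, y in nfa.edges if x in current and a == ch}
        current = eps_closure(moved, nfa.edges)
    return nfa.final in current

b_star_ab = concat(concat(star(symbol("b")), symbol("a")), symbol("b"))
print([w for w in ["ab", "bab", "bbab", "b", "abb"] if accepts(b_star_ab, w)])
# ['ab', 'bab', 'bbab']
```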


Subset construction algorithm


Input: An NFA 𝑁.
Output: A DFA D accepting the same language.
Method: The algorithm constructs a transition table Dtran for D. We use the following operations:

Operation        Description
ε-closure(s)     Set of NFA states reachable from NFA state s on ε-transitions alone.
ε-closure(T)     Set of NFA states reachable from some NFA state s in T on ε-transitions alone.
move(T, a)       Set of NFA states to which there is a transition on input symbol a from some NFA state s in T.


Subset construction algorithm


initially, ε-closure(s0) is the only state in Dstates, and it is unmarked;
while there is an unmarked state T in Dstates do begin
    mark T;
    for each input symbol a do begin
        U := ε-closure(move(T, a));
        if U is not in Dstates then
            add U as an unmarked state to Dstates;
        Dtran[T, a] := U
    end
end
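The algorithm can be sketched in Python on the ε-NFA for (a|b)*abb used in the worked example that follows (states 0 to 10, accepting state 10). The edge list is transcribed from that example; "ε" is simply a string label chosen for this sketch. DFA states are frozensets of NFA states, and a FIFO list plays the role of the unmarked states.

```python
EPS = "ε"
NFA_EDGES = [
    (0, EPS, 1), (0, EPS, 7), (1, EPS, 2), (1, EPS, 4),
    (2, "a", 3), (4, "b", 5), (3, EPS, 6), (5, EPS, 6),
    (6, EPS, 1), (6, EPS, 7), (7, "a", 8), (8, "b", 9), (9, "b", 10),
]
START, ACCEPT, ALPHABET = 0, 10, ["a", "b"]

def eps_closure(T):
    """States reachable from some state in T on ε-transitions alone."""
    closure, stack = set(T), list(T)
    while stack:
        u = stack.pop()
        for x, label, y in NFA_EDGES:
            if x == u and label == EPS and y not in closure:
                closure.add(y)
                stack.append(y)
    return frozenset(closure)

def move(T, a):
    """States reachable on input symbol a from some state in T."""
    return {y for x, label, y in NFA_EDGES if x in T and label == a}

def subset_construction():
    start = eps_closure({START})
    dstates, unmarked, dtran = [start], [start], {}
    while unmarked:
        T = unmarked.pop(0)                   # mark T
        for a in ALPHABET:
            U = eps_closure(move(T, a))
            if U not in dstates:
                dstates.append(U)             # add U as an unmarked state
                unmarked.append(U)
            dtran[(T, a)] = U
    return dstates, dtran

dstates, dtran = subset_construction()
name = {S: chr(ord("A") + k) for k, S in enumerate(dstates)}
for S in dstates:
    row = [name[dtran[(S, a)]] for a in ALPHABET]
    print(name[S], sorted(S), *row, "(accepting)" if ACCEPT in S else "")
```

The printed rows reproduce the transition table for states A to E in the worked example.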

Conversion from NFA to DFA

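Example: (a|b)*abb
Figure: Thompson NFA for (a|b)*abb with states 0 to 10 (start state 0, accepting state 10):
0 --ε--> 1, 0 --ε--> 7
1 --ε--> 2, 1 --ε--> 4
2 --a--> 3, 4 --b--> 5
3 --ε--> 6, 5 --ε--> 6
6 --ε--> 1, 6 --ε--> 7
7 --a--> 8 --b--> 9 --b--> 10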


Conversion from NFA to DFA

ε-closure(0) = {0, 1, 7, 2, 4}
= {0, 1, 2, 4, 7} ---- A

Conversion from NFA to DFA

A = {0, 1, 2, 4, 7}
Move(A, a) = {3, 8}
ε-closure(Move(A, a)) = {3, 6, 7, 1, 2, 4, 8}
= {1, 2, 3, 4, 6, 7, 8} ---- B
So Dtran[A, a] = B.


Conversion from NFA to DFA

A = {0, 1, 2, 4, 7}
Move(A, b) = {5}
ε-closure(Move(A, b)) = {5, 6, 7, 1, 2, 4}
= {1, 2, 4, 5, 6, 7} ---- C
So Dtran[A, b] = C.

Conversion from NFA to DFA

B = {1, 2, 3, 4, 6, 7, 8}
Move(B, a) = {3, 8}
ε-closure(Move(B, a)) = {3, 6, 7, 1, 2, 4, 8}
= {1, 2, 3, 4, 6, 7, 8} ---- B
So Dtran[B, a] = B.


Conversion from NFA to DFA

B = {1, 2, 3, 4, 6, 7, 8}
Move(B, b) = {5, 9}
ε-closure(Move(B, b)) = {5, 6, 7, 1, 2, 4, 9}
= {1, 2, 4, 5, 6, 7, 9} ---- D
So Dtran[B, b] = D.

Conversion from NFA to DFA

C = {1, 2, 4, 5, 6, 7}
Move(C, a) = {3, 8}
ε-closure(Move(C, a)) = {3, 6, 7, 1, 2, 4, 8}
= {1, 2, 3, 4, 6, 7, 8} ---- B
So Dtran[C, a] = B.


Conversion from NFA to DFA

C = {1, 2, 4, 5, 6, 7}
Move(C, b) = {5}
ε-closure(Move(C, b)) = {5, 6, 7, 1, 2, 4}
= {1, 2, 4, 5, 6, 7} ---- C
So Dtran[C, b] = C.

Conversion from NFA to DFA

D = {1, 2, 4, 5, 6, 7, 9}
Move(D, a) = {3, 8}
ε-closure(Move(D, a)) = {3, 6, 7, 1, 2, 4, 8}
= {1, 2, 3, 4, 6, 7, 8} ---- B
So Dtran[D, a] = B.


Conversion from NFA to DFA

D = {1, 2, 4, 5, 6, 7, 9}
Move(D, b) = {5, 10}
ε-closure(Move(D, b)) = {5, 6, 7, 1, 2, 4, 10}
= {1, 2, 4, 5, 6, 7, 10} ---- E
So Dtran[D, b] = E.

Conversion from NFA to DFA

E = {1, 2, 4, 5, 6, 7, 10}
Move(E, a) = {3, 8}
ε-closure(Move(E, a)) = {3, 6, 7, 1, 2, 4, 8}
= {1, 2, 3, 4, 6, 7, 8} ---- B
So Dtran[E, a] = B.


Conversion from NFA to DFA

E = {1, 2, 4, 5, 6, 7, 10}
Move(E, b) = {5}
ε-closure(Move(E, b)) = {5, 6, 7, 1, 2, 4}
= {1, 2, 4, 5, 6, 7} ---- C
So Dtran[E, b] = C.


Conversion from NFA to DFA

Transition Table
States                  a   b
A = {0,1,2,4,7}         B   C
B = {1,2,3,4,6,7,8}     B   D
C = {1,2,4,5,6,7}       B   C
D = {1,2,4,5,6,7,9}     B   E
E = {1,2,4,5,6,7,10}    B   C

Figure: the resulting DFA with start state A and the transitions listed in the table; E is its accepting state.

Note:
• The accepting state in the NFA is 10.
• 10 is an element of E.
• So, E is the accepting state in the DFA.

Exercise
• Convert the following regular expressions to DFAs using the subset construction method:
1. (a+b)* a b b (a+b)* (Nov-2011)
2. a a* (b | c) a* c# (Dec-2012)
3. a+ (c |d) b* f # (May-2015)
4. a (b | c)* a* c# (Nov-2016)
5. (a | b)* a b# (April-2017)
6. (a | b)* a b* a (May-2019, Dec-2021_NEW)
7. a a* a b* c # (Aug-2021)
8. (a/b) c a* c# (Jun-2023)


DFA optimization
1. Construct an initial partition Π of the set of states with two groups: the accepting states F and the non-accepting states S − F.
2. Apply the repartition procedure below to Π to construct a new partition Πnew.
3. If Πnew = Π, let Πfinal = Π and continue with step (4). Otherwise, repeat step (2) with Π = Πnew.

Repartition procedure:
    for each group G of Π do begin
        partition G into subgroups such that two states s and t
        of G are in the same subgroup if and only if for all
        input symbols a, states s and t have transitions on a
        to states in the same group of Π;
        replace G in Πnew by the set of all subgroups formed
    end


DFA optimization
4. Choose one state in each group of the partition Πfinal as the representative for that group. The representatives will be the states of M′. Let s be a representative state, and suppose that on input a there is a transition of M from s to t. Let r be the representative of t’s group. Then M′ has a transition from s to r on a. Let the start state of M′ be the representative of the group containing the start state s0 of M, and let the accepting states of M′ be the representatives that are in F.
5. If M′ has a dead state d, then remove d from M′. Also remove any state not reachable from the start state.

DFA optimization

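Transition table of the DFA to be minimized:
States   a   b
A        B   C
B        B   D
C        B   C
D        B   E
E        B   C

• Initial partition: {A, B, C, D, E} is split into the nonaccepting states {A, B, C, D} and the accepting states {E}.
• {A, B, C, D} is split into {A, B, C} and {D}, because on b, D goes to E (accepting group) while A, B and C go to nonaccepting states.
• {A, B, C} is split into {A, C} and {B}, because on b, B goes to D while A and C go to C.
• Now no more splitting is possible; the final partition is {A, C}, {B}, {D}, {E}.
• If we choose A as the representative for group (A, C), then we obtain the reduced transition table:

Optimized Transition Table
States   a   b
A        B   A
B        B   D
D        B   E
E        B   A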
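Steps 1 to 4 can be sketched in Python on the same transition table. Assumptions: the DFA is complete (every state has a move on every symbol), the alphabetically smallest state of each group is taken as its representative, and step 5 (removing dead or unreachable states) is omitted because this example has none. The function names are hypothetical.

```python
DTRAN = {
    "A": {"a": "B", "b": "C"},
    "B": {"a": "B", "b": "D"},
    "C": {"a": "B", "b": "C"},
    "D": {"a": "B", "b": "E"},
    "E": {"a": "B", "b": "C"},
}
ACCEPTING = {"E"}
ALPHABET = ["a", "b"]

def minimize(dtran, accepting, alphabet):
    """Partition refinement: start from {F, S - F} and split groups until stable."""
    states = set(dtran)
    partition = [g for g in (states & accepting, states - accepting) if g]
    while True:
        group_of = {s: k for k, g in enumerate(partition) for s in g}
        new_partition = []
        for g in partition:
            subgroups = {}
            for s in sorted(g):
                # s and t stay together iff every symbol leads to the same group
                key = tuple(group_of[dtran[s][a]] for a in alphabet)
                subgroups.setdefault(key, set()).add(s)
            new_partition.extend(subgroups.values())
        if len(new_partition) == len(partition):    # no group was split
            return partition
        partition = new_partition

def reduced_table(dtran, partition, alphabet):
    """Keep one representative per group and redirect transitions to representatives."""
    rep = {s: min(g) for g in partition for s in g}
    return {r: {a: rep[dtran[r][a]] for a in alphabet} for r in sorted(set(rep.values()))}

final = minimize(DTRAN, ACCEPTING, ALPHABET)
print(sorted(sorted(g) for g in final))       # [['A', 'C'], ['B'], ['D'], ['E']]
for state, row in reduced_table(DTRAN, final, ALPHABET).items():
    print(state, row["a"], row["b"])
```

Because refinement only ever splits groups, comparing the number of groups is enough to detect that Πnew = Π.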


Rules to compute nullable, firstpos, lastpos


• nullable(n): true if the subtree at node n generates a language that includes the empty string.
• firstpos(n): the set of positions that can match the first symbol of a string generated by the subtree at node n.
• lastpos(n): the set of positions that can match the last symbol of a string generated by the subtree at node n.
• followpos(i): the set of positions that can follow position i in the tree.


Rules to compute nullable, firstpos, lastpos

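Node n: a leaf labeled ε
    nullable(n) = true; firstpos(n) = ∅; lastpos(n) = ∅
Node n: a leaf with position i
    nullable(n) = false; firstpos(n) = {i}; lastpos(n) = {i}
Node n: an or-node n = c1 | c2
    nullable(n) = nullable(c1) or nullable(c2)
    firstpos(n) = firstpos(c1) ∪ firstpos(c2)
    lastpos(n) = lastpos(c1) ∪ lastpos(c2)
Node n: a cat-node n = c1 . c2
    nullable(n) = nullable(c1) and nullable(c2)
    firstpos(n) = if nullable(c1) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1)
    lastpos(n) = if nullable(c2) then lastpos(c1) ∪ lastpos(c2) else lastpos(c2)
Node n: a star-node n = c1*
    nullable(n) = true; firstpos(n) = firstpos(c1); lastpos(n) = lastpos(c1)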

Rules to compute followpos


1. If n is a concatenation node with left child c1 and right child c2, and i is a position in lastpos(c1), then all positions in firstpos(c2) are in followpos(i).

2. If n is a star node and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i).
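The nullable, firstpos and lastpos table and the two followpos rules can be sketched as one bottom-up pass over the syntax tree. This Python sketch builds the tree for (a|b)*abb# by hand (no parser), with positions 1 to 6 as on the following slides; the Node class and helper names are hypothetical.

```python
from collections import defaultdict

class Node:
    """Syntax-tree node; kind is "eps", "leaf", "or", "cat" or "star"."""
    def __init__(self, kind, c1=None, c2=None, symbol=None, pos=None):
        self.kind, self.c1, self.c2 = kind, c1, c2
        self.symbol, self.pos = symbol, pos

def leaf(symbol, pos):
    return Node("leaf", symbol=symbol, pos=pos)

def alt(c1, c2):
    return Node("or", c1, c2)

def cat(c1, c2):
    return Node("cat", c1, c2)

def star(c1):
    return Node("star", c1)

def annotate(n, followpos):
    """Set nullable, firstpos, lastpos bottom-up and apply followpos rules 1 and 2."""
    for child in (n.c1, n.c2):
        if child is not None:
            annotate(child, followpos)
    c1, c2 = n.c1, n.c2
    if n.kind == "eps":
        n.nullable, n.firstpos, n.lastpos = True, set(), set()
    elif n.kind == "leaf":
        n.nullable, n.firstpos, n.lastpos = False, {n.pos}, {n.pos}
    elif n.kind == "or":
        n.nullable = c1.nullable or c2.nullable
        n.firstpos = c1.firstpos | c2.firstpos
        n.lastpos = c1.lastpos | c2.lastpos
    elif n.kind == "cat":
        n.nullable = c1.nullable and c2.nullable
        n.firstpos = (c1.firstpos | c2.firstpos) if c1.nullable else set(c1.firstpos)
        n.lastpos = (c1.lastpos | c2.lastpos) if c2.nullable else set(c2.lastpos)
        for i in c1.lastpos:                 # rule 1: concatenation node
            followpos[i] |= c2.firstpos
    else:                                    # star node
        n.nullable = True
        n.firstpos, n.lastpos = set(c1.firstpos), set(c1.lastpos)
        for i in n.lastpos:                  # rule 2: star node
            followpos[i] |= n.firstpos

# Syntax tree of (a|b)*abb# with positions 1..6
root = cat(cat(cat(cat(star(alt(leaf("a", 1), leaf("b", 2))), leaf("a", 3)),
                   leaf("b", 4)), leaf("b", 5)), leaf("#", 6))
followpos = defaultdict(set)
annotate(root, followpos)
print(sorted(root.firstpos), sorted(root.lastpos))    # [1, 2, 3] [6]
print({i: sorted(followpos[i]) for i in sorted(followpos)})
# {1: [1, 2, 3], 2: [1, 2, 3], 3: [4], 4: [5], 5: [6]}
```

Position 6 (the endmarker #) never receives a followpos entry, so it is simply absent from the dictionary.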


Conversion from regular expression to DFA

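(a|b)*abb#

Step 1: Construct the syntax tree
Figure: syntax tree of (a|b)*abb#. The root is a cat-node whose right child is the leaf # (position 6). Going down the left spine, the cat-nodes have right children b (position 5), b (position 4) and a (position 3). The leftmost subtree is a star-node over the or-node with leaves a (position 1) and b (position 2).

Step 2: Nullable nodes
Here, the * node is the only nullable node.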

Conversion from regular expression to DFA

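Step 3: Calculate firstpos
Rules used: a leaf with position i has firstpos = {i}; an or-node has firstpos(c1) ∪ firstpos(c2); a star-node has firstpos(c1); a cat-node has firstpos(c1) ∪ firstpos(c2) if nullable(c1), else firstpos(c1).
• Leaves a(1), b(2), a(3), b(4), b(5), #(6): {1}, {2}, {3}, {4}, {5}, {6}
• or-node (a|b): {1,2}
• star-node (a|b)*: {1,2}
• cat-node (a|b)*a: {1,2,3}, because the star-node is nullable
• cat-nodes (a|b)*ab, (a|b)*abb and the root (a|b)*abb#: {1,2,3}, because their left children are not nullable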

Conversion from regular expression to DFA

Step 3: Calculate lastpos
Rules used: a leaf with position i has lastpos = {i}; an or-node has lastpos(c1) ∪ lastpos(c2); a star-node has lastpos(c1); a cat-node has lastpos(c1) ∪ lastpos(c2) if nullable(c2), else lastpos(c2).
• Leaves a(1), b(2), a(3), b(4), b(5), #(6): {1}, {2}, {3}, {4}, {5}, {6}
• or-node (a|b): {1,2}
• star-node (a|b)*: {1,2}
• cat-nodes (a|b)*a, (a|b)*ab, (a|b)*abb and the root (a|b)*abb#: {3}, {4}, {5}, {6}, because none of their right children is nullable

Conversion from regular expression to DFA

Step 4: Calculate followpos
Rule 1 at the root cat-node: c1 = (a|b)*abb, c2 = # (position 6)
i = lastpos(c1) = {5}
firstpos(c2) = {6}
followpos(5) = {6}

Conversion from regular expression to DFA

Step 4: Calculate followpos
Rule 1 at the cat-node with c1 = (a|b)*ab, c2 = b (position 5)
i = lastpos(c1) = {4}
firstpos(c2) = {5}
followpos(4) = {5}

Conversion from regular expression to DFA

Step 4: Calculate followpos
Rule 1 at the cat-node with c1 = (a|b)*a, c2 = b (position 4)
i = lastpos(c1) = {3}
firstpos(c2) = {4}
followpos(3) = {4}

Conversion from regular expression to DFA

Step 4: Calculate followpos
Rule 1 at the cat-node with c1 = (a|b)*, c2 = a (position 3)
i = lastpos(c1) = {1,2}
firstpos(c2) = {3}
followpos(1) = {3}
followpos(2) = {3}

Conversion from regular expression to DFA

Step 4: Calculate followpos
Rule 2 at the star-node n = (a|b)*
i = lastpos(n) = {1,2}
firstpos(n) = {1,2}
followpos(1) = {1,2} ∪ {3} = {1,2,3}
followpos(2) = {1,2} ∪ {3} = {1,2,3}

Position   followpos
5          {6}
4          {5}
3          {4}
2          {1,2,3}
1          {1,2,3}

Conversion from regular expression to DFA


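Initial state = firstpos of root = {1,2,3} ----- A

For a DFA state T and input symbol a, the next state is the union of followpos(p) over all positions p in T that are labeled a. Positions 1 and 3 are labeled a; positions 2, 4 and 5 are labeled b; position 6 is the endmarker #.

State A
δ({1,2,3}, a) = followpos(1) ∪ followpos(3) = {1,2,3} ∪ {4} = {1,2,3,4} ----- B
δ({1,2,3}, b) = followpos(2) = {1,2,3} ----- A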

Conversion from regular expression to DFA


State B
δ({1,2,3,4}, a) = followpos(1) ∪ followpos(3) = {1,2,3} ∪ {4} = {1,2,3,4} ----- B
δ({1,2,3,4}, b) = followpos(2) ∪ followpos(4) = {1,2,3} ∪ {5} = {1,2,3,5} ----- C

State C
δ({1,2,3,5}, a) = followpos(1) ∪ followpos(3) = {1,2,3} ∪ {4} = {1,2,3,4} ----- B
δ({1,2,3,5}, b) = followpos(2) ∪ followpos(5) = {1,2,3} ∪ {6} = {1,2,3,6} ----- D

Conversion from regular expression to DFA


State D
δ({1,2,3,6}, a) = followpos(1) ∪ followpos(3) = {1,2,3} ∪ {4} = {1,2,3,4} ----- B
δ({1,2,3,6}, b) = followpos(2) = {1,2,3} ----- A

States          a   b
A = {1,2,3}     B   A
B = {1,2,3,4}   B   C
C = {1,2,3,5}   B   D
D = {1,2,3,6}   B   A

Conversion from regular expression to DFA

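Figure: DFA for (a|b)*abb# with start state A: A --a--> B, A --b--> A, B --a--> B, B --b--> C, C --a--> B, C --b--> D, D --a--> B, D --b--> A. D is the accepting state because it contains position 6, the position of #.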
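The state-by-state construction above can be sketched in Python, taking the followpos table and the symbol at each position as input (both transcribed from the slides). A state T is marked accepting when it contains the position of #. If no position in T carries symbol a, this sketch simply records no transition instead of creating a dead state; that case does not arise for (a|b)*abb#.

```python
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
SYMBOL_AT = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
END_POS, ALPHABET = 6, ["a", "b"]

def dfa_from_followpos(root_firstpos):
    start = frozenset(root_firstpos)
    dstates, unmarked, dtran = [start], [start], {}
    while unmarked:
        T = unmarked.pop(0)
        for a in ALPHABET:
            # next state = union of followpos(p) for the positions p in T labeled a
            U = frozenset().union(*(FOLLOWPOS[p] for p in T if SYMBOL_AT[p] == a))
            if not U:
                continue                     # no move on a from T
            if U not in dstates:
                dstates.append(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    accepting = [S for S in dstates if END_POS in S]
    return dstates, dtran, accepting

dstates, dtran, accepting = dfa_from_followpos({1, 2, 3})
name = {S: chr(ord("A") + k) for k, S in enumerate(dstates)}
for S in dstates:
    print(name[S], sorted(S), *(name[dtran[(S, a)]] for a in ALPHABET))
print("accepting:", [name[S] for S in accepting])    # accepting: ['D']
```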


Conversion from regular expression to DFA


Construct a DFA for each of the following regular expressions using the syntax tree method:
1. a* b* a (a | b)* b* a# (Dec-2012)
2. ( a* | b* )* (Dec-2015)
3. a+ b* (c | d) f # (May-2015)
4. (a | b | c)* d* (a* | b) a c+# (May-2016)
5. a* b* a (a | b) b* a# (Nov-2016, Nov-2019)
6. ( a | b)* a (Nov-2016, May-2017, April-2018, May-2019)
7. ( a | ε )* a b ( a | b )* # (May-2018)
8. a (a | b )* a b (Dec-2018)
9. (a | b | c)* a (b | c)* # (May-2019)
10. (a | b)* a b b* (Jan-2023_NEW)
