DFA-Based Pattern Matcher Optimization

The document discusses three algorithms for optimizing pattern matchers constructed from regular expressions: (1) an algorithm that constructs a DFA directly from a regular expression without an intermediate NFA, potentially resulting in fewer states; (2) an algorithm that minimizes the number of states of any DFA by combining states with the same future behavior, running in O(n log n) time; and (3) an algorithm that produces more compact DFA transition tables than the standard two-dimensional table representation.

Uploaded by

dhiraj751075

Optimization of DFA-Based Pattern Matchers


Introduction: In this section we present three algorithms that have been used to
implement and optimize pattern matchers constructed from regular expressions.

1. The first algorithm is useful in a Lex compiler, because it constructs a DFA directly
from a regular expression, without constructing an intermediate NFA. The resulting
DFA also may have fewer states than the DFA constructed via an NFA.

2. The second algorithm minimizes the number of states of any DFA, by combining
states that have the same future behavior. The algorithm itself is quite efficient, running
in time O(n log n), where n is the number of states of the DFA.

3. The third algorithm produces more compact representations of transition tables
than the standard, two-dimensional table.
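To make the idea behind the second algorithm concrete, here is a minimal Python sketch of state merging by repeated partition refinement. It is a simple quadratic-style refinement, not the O(n log n) algorithm mentioned above, and the function name `minimize` and the five-state example DFA (one that accepts strings over {a, b} ending in abb) are assumptions chosen for illustration.

```python
def minimize(states, alphabet, delta, accepting):
    """Group the states of a total DFA into blocks with the same future behavior."""
    # Start by separating accepting from non-accepting states.
    partition = [b for b in (set(accepting), set(states) - set(accepting)) if b]
    while True:
        block_of = {s: i for i, block in enumerate(partition) for s in block}
        refined = {}
        for s in states:
            # Two states stay together only if they are in the same block and
            # every input symbol sends them to the same block.
            key = (block_of[s], tuple(block_of[delta[s][a]] for a in alphabet))
            refined.setdefault(key, set()).add(s)
        new_partition = list(refined.values())
        # Refinement only ever splits blocks, so an unchanged count means stable.
        if len(new_partition) == len(partition):
            return new_partition
        partition = new_partition


# Hypothetical example DFA (assumed, not taken from the text above).
states = ["A", "B", "C", "D", "E"]
alphabet = ["a", "b"]
delta = {
    "A": {"a": "B", "b": "C"},
    "B": {"a": "B", "b": "D"},
    "C": {"a": "B", "b": "C"},
    "D": {"a": "B", "b": "E"},
    "E": {"a": "B", "b": "C"},
}
blocks = minimize(states, alphabet, delta, accepting={"E"})
merged = sorted(sorted(block) for block in blocks)
print(merged)
```

In this example states A and C end up in the same block, so the minimized DFA would need four states instead of five.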

Important States of an NFA: To begin our discussion of how to go directly from a
regular expression to a DFA, we must first dissect the NFA construction of Algorithm
3.23 and consider the roles played by various states. We call a state of an NFA
important if it has a non-ε out-transition. Notice that the subset construction (Algorithm
3.20) uses only the important states in a set T when it computes ε-closure(move(T,
a)), the set of states reachable from T on input a. That is, the set of states move(s, a)
is nonempty only if state s is important. During the subset construction, two sets of
NFA states can be identified (treated as if they were the same set) if they:

1. Have the same important states, and

2. Either both have accepting states or neither does.
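The two conditions above can be sketched in Python as follows. The NFA fragment, the names `is_important` and `signature`, and the use of `None` as the ε label are all assumptions made for this illustration; they do not come from Algorithm 3.20 or 3.23.

```python
EPS = None  # label standing in for ε in this sketch

# Hypothetical NFA fragment: state -> list of (label, target) pairs.
nfa = {
    0: [(EPS, 1), (EPS, 3)],
    1: [("a", 2)],
    2: [(EPS, 3)],
    3: [("b", 4)],
    4: [],
}
accepting = {4}


def is_important(state):
    """A state is important if it has at least one non-ε out-transition."""
    return any(label is not EPS for label, _ in nfa[state])


def signature(states):
    """Key under which two sets of NFA states can be identified."""
    important = frozenset(s for s in states if is_important(s))
    has_accepting = any(s in accepting for s in states)
    return important, has_accepting


important_states = sorted(s for s in nfa if is_important(s))
same = signature({0, 1, 3}) == signature({1, 2, 3})
print(important_states, same)
```

The sets {0, 1, 3} and {1, 2, 3} differ only in states whose out-transitions are all ε, and neither contains an accepting state, so they receive the same signature and could be treated as one DFA state.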

When the NFA is constructed from a regular expression, we can say more about the
important states. The only important states are those introduced as initial states in the
basis part for a particular symbol position in the regular expression. That is, each
important state corresponds to a particular operand in the regular expression.

The constructed NFA has only one accepting state, but this state, having no
out-transitions, is not an important state. By concatenating a unique right end marker # to
a regular expression r, we give the accepting state for r a transition on #, making it an
important state of the NFA for (r)#. In other words, by using the augmented regular
expression (r)#, we can forget about accepting states as the subset construction
proceeds; when the construction is complete, any state with a transition on # must be
an accepting state.
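As a small illustration of the end-marker idea, the Python sketch below decides acceptance purely by looking for the important state that carries #. The numbering of important states, the assumed expression (a|b)*abb#, and the four DFA state sets are hypothetical inputs supplied for this example, not the output of any construction shown here.

```python
END = "#"

# Assumed: symbol read by each important NFA state for (a|b)*abb#.
symbol_at = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: END}

# Assumed: DFA states, each a set of important NFA states.
dfa_states = {
    "A": {1, 2, 3},
    "B": {1, 2, 3, 4},
    "C": {1, 2, 3, 5},
    "D": {1, 2, 3, 6},
}


def is_accepting(important_states):
    """A DFA state accepts exactly when it has a transition on the end marker."""
    return any(symbol_at[s] == END for s in important_states)


accepting = sorted(name for name, members in dfa_states.items() if is_accepting(members))
print(accepting)
```

No separate bookkeeping of accepting NFA states is needed: the check reduces to asking whether the set contains the state that reads #.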

The important states of the NFA correspond directly to the positions in the regular
expression that hold symbols of the alphabet. It is useful, as we shall see, to present
the regular expression by its syntax tree, where the leaves correspond to operands
and the interior nodes correspond to operators. An interior node is called a cat-node,
or-node, or star-node if it is labeled by the concatenation operator (dot), union operator
|, or star operator *, respectively.

Example: Figure 3.56 shows the syntax tree for the regular expression of our running
example. Cat-nodes are represented by circles.

Leaves in a syntax tree are labeled by ε or by an alphabet symbol. To each leaf not
labeled ε, we attach a unique integer. We refer to this integer as the position of the
leaf and also as a position of its symbol. Note that a symbol can have several positions;
for instance, a has positions 1 and 3 in Fig. 3.56. The positions in the syntax tree
correspond to the important states of the constructed NFA.
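The numbering of leaves can be sketched in Python as below. The `Node` class, the helper names, and the choice of (a|b)*abb# as the expression are assumptions for illustration; the figure itself is not reproduced here, but with this expression the symbol a does receive positions 1 and 3, consistent with the remark above.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Node:
    kind: str                         # "leaf", "cat", "or", or "star"
    symbol: Optional[str] = None      # leaf label; None stands for ε
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    position: Optional[int] = None    # set only on non-ε leaves


def leaf(symbol):
    return Node("leaf", symbol=symbol)


def cat(left, right):
    return Node("cat", left=left, right=right)


def alt(left, right):
    return Node("or", left=left, right=right)


def star(child):
    return Node("star", left=child)


def number_leaves(root):
    """Attach 1, 2, 3, ... to the non-ε leaves, left to right."""
    counter = count(1)
    positions = {}

    def visit(node):
        if node is None:
            return
        if node.kind == "leaf":
            if node.symbol is not None:   # ε leaves get no position
                node.position = next(counter)
                positions[node.position] = node.symbol
            return
        visit(node.left)
        visit(node.right)

    visit(root)
    return positions


# Assumed running example: (a|b)*abb#
tree = cat(cat(cat(cat(star(alt(leaf("a"), leaf("b"))), leaf("a")), leaf("b")), leaf("b")), leaf("#"))
positions = number_leaves(tree)
positions_of_a = [p for p, sym in positions.items() if sym == "a"]
print(positions, positions_of_a)
```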

Example: Figure 3.57 shows the NFA for the same regular expression as Fig. 3.56,
with the important states numbered and other states represented by letters. The
numbered states in the NFA and the positions in the syntax tree correspond in a way
we shall soon see.
