
Regular Expression to DFA

Last Updated : 04 Oct, 2024

The main function of regular expressions is to define patterns for matching strings. Automata theory recognizes these patterns in a structured way through finite automata. A common way to construct a Deterministic Finite Automaton (DFA) from a given regular expression is to first construct a Nondeterministic Finite Automaton (NFA) and then convert the NFA into an equivalent DFA using the subset construction. However, this two-step procedure can be avoided by constructing the DFA directly from the regular expression.

What is DFA?

A DFA is a type of finite automaton in which, for every state and every input symbol, there is exactly one transition to a next state. DFAs do not have ε-transitions (transitions that consume no input). Because of this determinism, DFAs are an efficient model for pattern recognition: at every step, the next state is completely determined by the current state and the current input symbol.

Construction of DFA

In order to construct a DFA directly from a regular expression, we need to follow the steps listed below:

Example: Suppose the given regular expression is r = (a|b)*abb.

1. First, we construct the augmented regular expression for the given expression. By concatenating a unique right-end marker '#' to the regular expression r, we give the accepting state of r a transition on '#', making it an important state of the NFA for r#.

So, r' = (a|b)*abb#

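2. Then we construct the syntax tree for r#. Every leaf labeled with an input symbol (including #) is assigned a unique integer, its position, numbered from left to right. For (a|b)*abb# the positions are a = 1, b = 2, a = 3, b = 4, b = 5, # = 6.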

Syntax tree for (a|b)*abb#
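As a sketch of step 2, the following Python builds the syntax tree for r# with a small recursive-descent parser and numbers the leaves from left to right. The classes Node and Parser and their field names are hypothetical illustrations, not part of any library. The parser assumes single-character symbols, supports only |, *, and parentheses (no ε literal), and treats concatenation as left-associative, which yields the tree for (a|b)*abb# described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    kind: str                       # 'leaf', 'or', 'cat', or 'star'
    symbol: Optional[str] = None    # input symbol of a leaf ('#' is the end marker)
    pos: Optional[int] = None       # position number of a leaf
    left: Optional["Node"] = None   # c1 (the only child of a star-node)
    right: Optional["Node"] = None  # c2


class Parser:
    """Recursive-descent parser; precedence from low to high: |, concatenation, *."""

    def __init__(self, text: str):
        self.text = text
        self.i = 0
        self.leaves: List[Node] = []  # leaves in left-to-right order

    def peek(self) -> Optional[str]:
        return self.text[self.i] if self.i < len(self.text) else None

    def parse(self) -> Node:
        node = self.alternation()
        if self.peek() is not None:
            raise ValueError(f"unexpected {self.peek()!r} at index {self.i}")
        return node

    def alternation(self) -> Node:
        node = self.concatenation()
        while self.peek() == "|":
            self.i += 1
            node = Node("or", left=node, right=self.concatenation())
        return node

    def concatenation(self) -> Node:
        node = self.starred()
        # Keep concatenating until '|', ')' or end of input (left-associative).
        while self.peek() is not None and self.peek() not in "|)":
            node = Node("cat", left=node, right=self.starred())
        return node

    def starred(self) -> Node:
        node = self.atom()
        while self.peek() == "*":
            self.i += 1
            node = Node("star", left=node)
        return node

    def atom(self) -> Node:
        ch = self.peek()
        if ch == "(":
            self.i += 1
            node = self.alternation()
            if self.peek() != ")":
                raise ValueError("missing ')'")
            self.i += 1
            return node
        if ch is None or ch in "|)*":
            raise ValueError(f"unexpected {ch!r} at index {self.i}")
        self.i += 1
        # Leaves are created in text order, so positions are numbered left to right.
        leaf = Node("leaf", symbol=ch, pos=len(self.leaves) + 1)
        self.leaves.append(leaf)
        return leaf


parser = Parser("(a|b)*abb#")
root = parser.parse()
print([(leaf.pos, leaf.symbol) for leaf in parser.leaves])
# [(1, 'a'), (2, 'b'), (3, 'a'), (4, 'b'), (5, 'b'), (6, '#')]
```

The root is a cat-node whose right child is the # leaf; following left children down four cat-nodes reaches the star-node over (a|b).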

3. Next, we evaluate four functions: nullable, firstpos, lastpos, and followpos.

  1. nullable(n) is true for a syntax tree node n if and only if the regular expression represented by n has ε in its language.
  2. firstpos(n) gives the set of positions that can match the first symbol of a string generated by the subexpression rooted at n.
  3. lastpos(n) gives the set of positions that can match the last symbol of a string generated by the subexpression rooted at n.
  4. followpos(p), for a position p, is the set of positions q such that some string in L(r#) can be matched so that one of its symbols corresponds to position p and the very next symbol corresponds to position q.

We refer to an interior node as a cat-node, or-node, or star-node if it is labeled by a concatenation, | or * operator, respectively.

Rules for Computing nullable, firstpos, and lastpos

| Node n | nullable(n) | firstpos(n) | lastpos(n) |
|---|---|---|---|
| n is a leaf labeled ε | true | ∅ | ∅ |
| n is a leaf labeled with position i | false | { i } | { i } |
| n is an or-node with left child c1 and right child c2 | nullable(c1) or nullable(c2) | firstpos(c1) ∪ firstpos(c2) | lastpos(c1) ∪ lastpos(c2) |
| n is a cat-node with left child c1 and right child c2 | nullable(c1) and nullable(c2) | if nullable(c1) then firstpos(c1) ∪ firstpos(c2), else firstpos(c1) | if nullable(c2) then lastpos(c1) ∪ lastpos(c2), else lastpos(c2) |
| n is a star-node with child c1 | true | firstpos(c1) | lastpos(c1) |
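The table translates almost row for row into a post-order traversal. The sketch below assumes a hypothetical Node class (kinds 'eps', 'leaf', 'or', 'cat', 'star') and hypothetical helpers leaf, alt, cat, and star, and builds the tree for (a|b)*abb# by hand with positions a = 1, b = 2, a = 3, b = 4, b = 5, # = 6.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class Node:
    kind: str                       # 'eps', 'leaf', 'or', 'cat', or 'star'
    pos: Optional[int] = None       # position of a symbol leaf
    c1: Optional["Node"] = None     # left child (only child of a star-node)
    c2: Optional["Node"] = None     # right child
    nullable: bool = False
    firstpos: FrozenSet[int] = frozenset()
    lastpos: FrozenSet[int] = frozenset()


def leaf(pos: int) -> Node:
    return Node("leaf", pos=pos)


def alt(c1: Node, c2: Node) -> Node:
    return Node("or", c1=c1, c2=c2)


def cat(c1: Node, c2: Node) -> Node:
    return Node("cat", c1=c1, c2=c2)


def star(c1: Node) -> Node:
    return Node("star", c1=c1)


def annotate(n: Node) -> None:
    """Fill in nullable, firstpos and lastpos bottom up, one branch per table row."""
    for child in (n.c1, n.c2):
        if child is not None:
            annotate(child)  # children first (post-order)
    if n.kind == "eps":
        n.nullable, n.firstpos, n.lastpos = True, frozenset(), frozenset()
    elif n.kind == "leaf":
        n.nullable, n.firstpos, n.lastpos = False, frozenset({n.pos}), frozenset({n.pos})
    elif n.kind == "or":
        n.nullable = n.c1.nullable or n.c2.nullable
        n.firstpos = n.c1.firstpos | n.c2.firstpos
        n.lastpos = n.c1.lastpos | n.c2.lastpos
    elif n.kind == "cat":
        n.nullable = n.c1.nullable and n.c2.nullable
        n.firstpos = (n.c1.firstpos | n.c2.firstpos) if n.c1.nullable else n.c1.firstpos
        n.lastpos = (n.c1.lastpos | n.c2.lastpos) if n.c2.nullable else n.c2.lastpos
    elif n.kind == "star":
        n.nullable, n.firstpos, n.lastpos = True, n.c1.firstpos, n.c1.lastpos
    else:
        raise ValueError(f"unknown node kind {n.kind!r}")


# (a|b)*abb# with positions a=1, b=2, a=3, b=4, b=5, #=6
star_node = star(alt(leaf(1), leaf(2)))
root = cat(cat(cat(cat(star_node, leaf(3)), leaf(4)), leaf(5)), leaf(6))
annotate(root)
print(root.nullable, sorted(root.firstpos), sorted(root.lastpos))  # False [1, 2, 3] [6]
```

Because the star-node is nullable, the cat-node joining it with position 3 takes the union of both firstpos sets, which is how {1, 2, 3} propagates up to the root.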

Rules for computing followpos:

  1. If n is a cat-node with left child c1 and right child c2, then for every position i in lastpos(c1), all positions in firstpos(c2) are in followpos(i).
  2. If n is a star-node, then for every position i in lastpos(n), all positions in firstpos(n) are in followpos(i).

Now that we have seen the rules for computing nullable, firstpos, and lastpos, we calculate their values for the nodes of the syntax tree of the given regular expression (a|b)*abb#.
firstpos and lastpos for nodes in syntax tree for (a|b)*abb#

Applying the two followpos rules at every cat-node and star-node of the syntax tree, working bottom up, gives followpos for each position:

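| Position | Symbol | followpos |
|---|---|---|
| 1 | a | {1, 2, 3} |
| 2 | b | {1, 2, 3} |
| 3 | a | {4} |
| 4 | b | {5} |
| 5 | b | {6} |
| 6 | # | ∅ |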
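Both followpos rules can be applied during the same bottom-up pass that computes nullable, firstpos, and lastpos. This sketch assumes the tree is encoded as nested tuples ('leaf', i), ('or', c1, c2), ('cat', c1, c2), and ('star', c1), does not support ε-leaves, and uses a hypothetical helper name followpos_table. Running it on (a|b)*abb# reproduces the followpos table.

```python
from collections import defaultdict


def followpos_table(tree):
    """Return {position: set of following positions} for a tuple-encoded syntax tree."""
    follow = defaultdict(set)

    def visit(n):
        # Returns (nullable, firstpos, lastpos) for n, recording followpos on the way.
        kind = n[0]
        if kind == "leaf":
            follow[n[1]]  # touch the entry so every position appears, even if empty
            return False, {n[1]}, {n[1]}
        if kind == "star":
            _, f, l = visit(n[1])
            for i in l:  # rule 2: firstpos(n) is in followpos(i) for i in lastpos(n)
                follow[i] |= f
            return True, f, l
        n1, f1, l1 = visit(n[1])
        n2, f2, l2 = visit(n[2])
        if kind == "or":
            return n1 or n2, f1 | f2, l1 | l2
        if kind == "cat":
            for i in l1:  # rule 1: firstpos(c2) is in followpos(i) for i in lastpos(c1)
                follow[i] |= f2
            return n1 and n2, (f1 | f2) if n1 else f1, (l1 | l2) if n2 else l2
        raise ValueError(f"unknown node kind {kind!r}")

    visit(tree)
    return dict(follow)


# (a|b)*abb# with positions a=1, b=2, a=3, b=4, b=5, #=6
tree = ("cat",
        ("cat",
         ("cat",
          ("cat", ("star", ("or", ("leaf", 1), ("leaf", 2))), ("leaf", 3)),
          ("leaf", 4)),
         ("leaf", 5)),
        ("leaf", 6))

for p, f in sorted(followpos_table(tree).items()):
    print(p, sorted(f))
# 1 [1, 2, 3]
# 2 [1, 2, 3]
# 3 [4]
# 4 [5]
# 5 [6]
# 6 []
```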

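4. Now we construct Dstates, the set of states of the DFA D, and Dtran, the transition table of D. The start state of D is firstpos(root), and the accepting states are all those that contain the position of the end marker # (position 6 in our example). For each state S and input symbol a, the next state U is the union of followpos(p) over all positions p in S that correspond to a; we set Dtran[S, a] := U and add U to Dstates if it has not been seen before.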

In our example, firstpos(root) = {1, 2, 3}. Let this state be A and consider the input symbol a. Positions 1 and 3 correspond to a, so B = followpos(1) ∪ followpos(3) = {1, 2, 3, 4}. Since this set has not been seen before, we add it to Dstates and set Dtran[A, a] := B.

For input b, only position 2 in A corresponds to b, so we take followpos(2) = {1, 2, 3}. This set is A itself, so we do not add a new state to Dstates; we only add the transition Dtran[A, b] := A.

Continuing this way with the remaining states, we obtain the following transition table.

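| State | Positions | Input a | Input b |
|---|---|---|---|
| ⇢ A | {1, 2, 3} | B | A |
| B | {1, 2, 3, 4} | B | C |
| C | {1, 2, 3, 5} | B | D |
| D | {1, 2, 3, 6} | B | A |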

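Here, A is the start state, and D = {1, 2, 3, 6} is the only accepting state, since it is the only state that contains position 6, the position of the end marker #.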
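Step 4 is a worklist algorithm over sets of positions. The following sketch takes the followpos table, the symbol at each position, and firstpos(root) = {1, 2, 3}, and rebuilds Dtran. build_dfa is a hypothetical helper; states are frozensets of positions named A, B, C, ... in discovery order (so this naming assumes at most 26 states), and a symbol with no matching position simply gets no transition instead of an explicit dead state.

```python
from collections import deque


def build_dfa(start, followpos, symbol_at, alphabet, end_pos):
    """Build Dstates/Dtran from followpos; returns (names, dtran, start_name, accepting)."""
    start = frozenset(start)
    names = {start: "A"}    # set of positions -> state name, in discovery order
    dtran = {}
    queue = deque([start])  # states of Dstates not yet processed
    while queue:
        S = queue.popleft()
        for a in alphabet:
            # U = union of followpos(p) for every position p in S whose symbol is a
            U = frozenset(q for p in S if symbol_at[p] == a for q in followpos[p])
            if not U:
                continue    # no transition on a (dead state omitted)
            if U not in names:
                names[U] = chr(ord("A") + len(names))
                queue.append(U)
            dtran[names[S], a] = names[U]
    accepting = {name for st, name in names.items() if end_pos in st}
    return names, dtran, names[start], accepting


# Data for (a|b)*abb#: followpos table and the symbol at each position
followpos = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
symbol_at = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}

names, dtran, start, accepting = build_dfa({1, 2, 3}, followpos, symbol_at, "ab", end_pos=6)
for state in sorted(set(names.values())):
    print(state, dtran[state, "a"], dtran[state, "b"])
# A B A
# B B C
# C B D
# D B A
print(start, accepting)  # A {'D'}
```

The end marker # is deliberately left out of the alphabet: it only serves to mark which sets of positions are accepting.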

5. Finally, we draw the DFA for the above transition table.

The final DFA is:

DFA for (a|b)*abb
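To check the result, the DFA can be run directly from its transition table: each input symbol selects exactly one next state, and a string is accepted if the run ends in D. The sketch below also compares it with Python's re.fullmatch on every string over {a, b} up to a given length; accepts and agrees_with_re are hypothetical names used only for illustration.

```python
import re
from itertools import product

# Transition table of the final DFA for (a|b)*abb (start state A, accepting state D).
DTRAN = {
    ("A", "a"): "B", ("A", "b"): "A",
    ("B", "a"): "B", ("B", "b"): "C",
    ("C", "a"): "B", ("C", "b"): "D",
    ("D", "a"): "B", ("D", "b"): "A",
}
START, ACCEPTING = "A", {"D"}


def accepts(word: str) -> bool:
    """Run the DFA on word; reject any symbol outside the alphabet {a, b}."""
    state = START
    for ch in word:
        if (state, ch) not in DTRAN:
            return False
        state = DTRAN[state, ch]  # exactly one next state, so no backtracking
    return state in ACCEPTING


def agrees_with_re(max_len: int) -> bool:
    """Compare the DFA with re.fullmatch on every string over {a, b} of length <= max_len."""
    pattern = re.compile(r"(a|b)*abb")
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            if accepts(word) != bool(pattern.fullmatch(word)):
                return False
    return True


print(accepts("abb"), accepts("babb"), accepts("abba"))  # True True False
print(agrees_with_re(8))  # True
```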

Conclusion

Constructing a DFA from a regular expression is one of the fundamental processes in automata theory, and it connects formal languages to practical applications such as lexical analysis in compilers. The direct construction skips the intermediate NFA, so the process is shorter, and it still yields a deterministic automaton. Understanding how DFAs work not only deepens knowledge of formal languages but also helps in implementing efficient pattern recognition and parsing algorithms in many computer science applications.

