Chapter 2: Regular Expressions
Chapter 2
Topics
1) Regular Expressions (RE)
2) FA to RE conversion and vice-versa
3) How to prove whether a given language is regular or not?
4) Closure properties of regular languages
4) Closure properties of regular languages
RE’s: Introduction
Regular expressions are an algebraic way to describe
languages.
They describe exactly the regular languages.
If E is a regular expression, then L(E) is the language it
defines.
In computer science and formal language theory, a regular
expression (sometimes called a rational expression) is a sequence
of characters that defines a search pattern. Usually this pattern
is then used by string-searching algorithms for "find" or
"find and replace" operations on strings.
RE’s: Introduction…..
Regular expressions are a compact way to represent regular
languages (not every language is regular).
The language defined by a regular expression is always a language
accepted by some finite automaton.
Basis 1: If a is any symbol, then a is a RE, and L(a) = {a}.
Note: {a} is the language containing one string, and that
string is of length 1.
Basis 2: ε is a RE, and L(ε) = {ε}.
Basis 3: ∅ is a RE, and L(∅) = ∅.
RE’s: Introduction…
The set of regular expressions is defined by the following rules:
(i) Every letter of ∑ is a regular expression; the null string ε
itself is a regular expression.
(ii) If r1 and r2 are regular expressions, then
(a) (r1) (b) r1r2
(c) r1+r2 (d) r1*
(e) r1^+ (one or more repetitions of r1)
are also regular expressions.
(iii) Nothing else is a regular expression.
Regular Expressions vs. Finite Automata
Regular expressions offer a declarative way to express the pattern of any
string we want to accept
E.g., 01*+ 10*
Automata => more machine-like
< input: string , output: [accept/reject] >
Regular expressions => more program syntax-like
Unix environments heavily use regular expressions
E.g., bash shell, grep, vi & other editors, sed
Perl scripting – good for string processing
Lexical analyzers such as Lex or Flex
Regular Expressions
[Figure: regular expressions (syntactical expressions) = finite automata (DFA, NFA, ε-NFA; automata/machines); both describe the regular languages]
Language Operators
Union of two languages:
L U M = all strings that are either in L or M
Concatenation of two languages:
L . M = all strings that are of the form xy
s.t., x ∈ L and y ∈ M
The dot operator is usually omitted
i.e., LM is same as L.M
“i” here refers to how many strings to concatenate from the parent language L
to produce strings in the language L^i
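The language operators above can be tried out on small finite languages. The sketch below is only an illustration: the function names union, concat and power and the sample languages are assumptions of this example, not notation from the slides.

```python
def union(L, M):
    """L U M: all strings that are in L or in M."""
    return set(L) | set(M)

def concat(L, M):
    """L . M: all strings xy with x in L and y in M."""
    return {x + y for x in L for y in M}

def power(L, i):
    """L^i: concatenate i strings taken from L (L^0 is {""}, i.e. {epsilon})."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result

L = {"0", "01"}
M = {"1", "10"}
print(sorted(union(L, M)))
print(sorted(concat(L, M)))
print(sorted(power(L, 2)))
```

The Kleene closure L* is the union of L^0, L^1, L^2, … and is infinite whenever L contains a non-empty string, so it can only be enumerated up to a length bound.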
Building Regular Expressions
(i) The constants ϵ (null string) and ɸ (empty set) are regular
expressions; they denote the languages {ϵ} and ɸ, respectively.
That is, L(ϵ) = {ϵ}, and L(ɸ) = ɸ.
(ii) If a is any symbol, then a is a regular expression. This
expression denotes the language {a}. That is, L(a) = {a}.
(iii) A variable, usually capitalized (such as L), represents
any language.
Building Regular Expressions
Let E be a regular expression and the language represented
by E is L(E)
Then:
L((E)) = L(E), i.e., parentheses do not change the language
L(E*) = (L(E))*
identity Rules for RE
Two regular expressions P and Q are equivalent (denoted as P = Q) if and
only if P represents the same set of strings as Q does.
To show the equivalence of two regular expressions, we use some
identities of regular expressions.
Let P, Q and R be regular expressions; then the identity rules are as follows
(ε is the null string, Φ is the empty set) −
εR = Rε = R
ε* = ε
Φ* = ε
ΦR = RΦ = Φ
Φ + R = R
R + R = R
RR* = R*R = R^+
(R*)* = R*
ε + RR* = R*
(P+Q)R = PR + QR
(P+Q)* = (P*Q*)* = (P* + Q*)*
R*(ε+R) = (ε+R)R* = R*
(R+ε)* = R*
ε + R* = R*
(PQ)*P = P(QP)*
R*R + R = R*R
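A quick way to gain confidence in an identity is to compare both sides on all short strings. The sketch below does this with Python's re module for the sample choice P = a, Q = b, R = ab (an assumption of this example). Python writes union as | instead of +, and the helper names language and same_up_to are hypothetical. Agreement up to a length bound is evidence, not a proof.

```python
import re
from itertools import product

def language(pattern, alphabet="ab", max_len=6):
    """All strings over `alphabet` of length <= max_len that match `pattern` entirely."""
    rx = re.compile(pattern)
    words = ("".join(t) for n in range(max_len + 1)
             for t in product(alphabet, repeat=n))
    return {w for w in words if rx.fullmatch(w)}

def same_up_to(p1, p2, **kw):
    """True if both patterns accept exactly the same strings up to the length bound."""
    return language(p1, **kw) == language(p2, **kw)

# (PQ)*P = P(QP)*
print(same_up_to(r"(ab)*a", r"a(ba)*"))
# RR* = R*R
print(same_up_to(r"ab(ab)*", r"(ab)*ab"))
# epsilon + RR* = R*   ("(?:)" is the empty pattern, standing in for epsilon)
print(same_up_to(r"(?:)|ab(ab)*", r"(ab)*"))
# (P+Q)R = PR + QR
print(same_up_to(r"(a|b)ab", r"aab|bab"))
# A non-identity for contrast: (PQ)* is not P*Q*
print(same_up_to(r"(ab)*", r"a*b*"))
```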
Example: how to use these regular expression
properties and language operators?
L = { w | w is a binary string which does not
contain two consecutive 0s or two
consecutive 1s anywhere }
E.g., w = 01010101 is in L, while w = 10010 is not in L
Goal: Build a regular expression for L
Four cases for w:
Case A: w starts with 0 and |w| is even
Case B: w starts with 1 and |w| is even
Case C: w starts with 0 and |w| is odd
Case D: w starts with 1 and |w| is odd
Regular expressions for the four cases:
Case A: (01)*
Case B: (10)*
Case C: 0(10)*
Case D: 1(01)*
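The four cases can be checked by brute force. The sketch below assumes the final expression is the union of the four cases, (01)* + (10)* + 0(10)* + 1(01)*, and also tries the shorter form (ε+1)(01)*(ε+0); both are written in Python re syntax, where | is union and ? means "ε or the symbol". The helper names are hypothetical.

```python
import re
from itertools import product

FOUR_CASES = re.compile(r"(01)*|(10)*|0(10)*|1(01)*")
COMPACT = re.compile(r"1?(01)*0?")  # (epsilon+1)(01)*(epsilon+0)

def alternates(w):
    """Direct definition of L: no two consecutive symbols are equal."""
    return all(w[i] != w[i + 1] for i in range(len(w) - 1))

def first_disagreement(rx, max_len=10):
    """Return a binary string on which rx and the definition disagree, else None."""
    for n in range(max_len + 1):
        for t in product("01", repeat=n):
            w = "".join(t)
            if bool(rx.fullmatch(w)) != alternates(w):
                return w
    return None

print(first_disagreement(FOUR_CASES))
print(first_disagreement(COMPACT))
```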
Examples
Write the regular expression for the language accepting all the strings
containing any number of a's and b's.
r.e. = (a + b)*
Write the regular expression for the language accepting all the
strings whose first symbol is “b” and whose last symbol is “a”,
over ∑ = {a, b}.
Solution: The language for the given example is
L = {ba, baa, baba, bbaa, baaaa, babbbba, … }
The regular expression for the above language is: R = b(a+b)*a
Precedence of Operators
Highest to lowest
* operator (star)
. (concatenation)
+ operator
Example:
01* + 1 = ( 0 . ((1)*) ) + 1
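Python's re module happens to use the same precedence order (star, then concatenation, then union), so the example can be tried directly. In this sketch union is written | rather than +, and the pattern names are assumptions of the example.

```python
import re

implicit = re.compile(r"01*|1")             # 01* + 1, relying on precedence
explicit = re.compile(r"(?:0(?:1*))|1")     # ( 0 . ((1)*) ) + 1, fully parenthesized
misread = re.compile(r"(?:01)*|1")          # what it would mean if concatenation bound tighter than star

samples = ["", "0", "1", "011", "0101"]
for s in samples:
    print(repr(s), bool(implicit.fullmatch(s)),
          bool(explicit.fullmatch(s)), bool(misread.fullmatch(s)))
```

The first two columns agree on every sample, while the third differs (for instance on 0101), which is why the parenthesization matters.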
Equivalence between regular expressions
and finite automata
Strategy (Kleene Theorem):
Theorem 1: DFA → Reg Ex
Theorem 2: Reg Ex → ε-NFA (convert the regular expression to an ε-NFA)
ε-NFA → NFA → DFA
DFA to RE construction
The two popular methods for converting a given DFA to its
regular expression are-
1. Arden’s Theorem
2. State Elimination Method
DFA to RE construction
Theorem 1: DFA → Reg Ex
Informally, trace all distinct paths (traversing cycles only once)
from the start state to each of the final states and enumerate all
the expressions along the way.
Example: q0 --0--> q1 --1--> q2, with a loop on 1 at q0, a loop on 0 at q1 and a loop on 0,1 at q2
Regular expression: 1* 00* 1 (0+1)*
Statement (Arden’s Theorem) −
• Let P and Q be two regular expressions.
• If P does not contain the null string, then
(I) R = Q + RP has a unique solution,
(II) and that solution is R = QP*
Conditions-
To use Arden’s Theorem, the following conditions must be satisfied-
• The transition diagram must not have any ∈ transitions.
• There must be only a single initial state.
Cont…
Proof −
R = Q + (Q + RP)P [After putting the value R = Q + RP]
= Q + QP + RPP
When we put the value of R recursively again and again, we get the
following equation −
R = Q + QP + QP^2 + QP^3 + …
R = Q (ϵ + P + P^2 + P^3 + …)
R = QP* [As P* represents (ϵ + P + P^2 + P^3 + …)]
Hence, proved.
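The claim that R = QP* satisfies R = Q + RP can be checked on finite samples by cutting every language off at a maximum string length. The sketch below is an illustration under that assumption; the helpers concat, star and arden_check are hypothetical names, and the check says nothing about uniqueness of the solution.

```python
def concat(L, M, n):
    """Concatenation of two languages, keeping only strings of length <= n."""
    return {x + y for x in L for y in M if len(x + y) <= n}

def star(P, n):
    """Kleene closure of P, truncated to strings of length <= n."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = concat(frontier, P, n) - result  # only genuinely new strings
        result |= frontier
    return result

def arden_check(Q, P, n=8):
    """Check that R = QP* satisfies R = Q + RP on all strings of length <= n."""
    R = concat(Q, star(P, n), n)
    q_short = {w for w in Q if len(w) <= n}
    return R == q_short | concat(R, P, n)

print(arden_check({"0"}, {"10"}))        # R = 0 + R(10), solution 0(10)*
print(arden_check({"b", "aa"}, {"a"}))   # R = (b + aa) + Ra, solution (b + aa)a*
```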
Assumptions for Applying Arden’s
Theorem
• The transition diagram must not have NULL
transitions
• It must have only one initial state:
Method
Step 1 − Create equations of the following form for all the states of the DFA
having n states with initial state q1.
q1 = q1R11 + q2R21 + … + qnRn1 + ϵ
q2 = q1R12 + q2R22 + … + qnRn2
…………………………………………………………….
…………………………………………………………….
Cont.…
= q1a + q2aa + ε (Substituting value of q3)
= q1a + q1b(b + ab)*aa + ε (Substituting value of q2)
= q1(a + b(b + ab)*aa) + ε
= ε (a+ b(b + ab)*aa)*
= (a + b(b + ab)*aa)*
Hence, the regular expression is (a + b(b + ab)*aa)*.
Example 2
Find regular expression for the following
DFA using Arden’s Theorem-
Solution-
Step-01:
Form an equation for each state-
A = ∈ + B.1 ……(1)
B = A.0 ……(2)
Step-02:
Bring final state in the form R = Q + RP.
Using (1) in (2), we get-
B = (∈ + B.1).0
B = ∈.0 + B.1.0
B = 0 + B.(1.0) ……(3)
Using Arden’s Theorem in (3), we get-
B = 0.(1.0)*
Thus, Regular Expression for the given
DFA = 0(10)*
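The result can be cross-checked by simulating the automaton. The figure is not reproduced in this text, so the transition table below is an assumption read off equations (1) and (2): A is the initial state, A goes to B on 0, B goes to A on 1, and B is the final state; missing transitions are treated as rejecting. The names DELTA, dfa_accepts and agrees are hypothetical.

```python
import re
from itertools import product

# Assumed transitions, inferred from A = eps + B.1 and B = A.0
DELTA = {("A", "0"): "B", ("B", "1"): "A"}

def dfa_accepts(w, start="A", finals=frozenset({"B"})):
    """Run the (partial) DFA on w; a missing transition means reject."""
    state = start
    for ch in w:
        state = DELTA.get((state, ch))
        if state is None:
            return False
    return state in finals

def agrees(max_len=10):
    """Compare the DFA with the expression 0(10)* on all binary strings up to max_len."""
    rx = re.compile(r"0(10)*")
    words = ("".join(t) for n in range(max_len + 1)
             for t in product("01", repeat=n))
    return all(dfa_accepts(w) == bool(rx.fullmatch(w)) for w in words)

print(agrees())
```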
Example 3
Find regular expression for the
following DFA using Arden’s
Theorem-
Solution-
Step-01:
Form an equation for each state-
q1 = ∈ ……(1)
q2 = q1.a ……(2)
q3 = q1.b + q2.a + q3.a …….(3)
Step-02:
Bring final state in the form R = Q + RP.
Using (1) in (2), we get-
q2 = ∈.a
q2 = a …….(4)
Using (1) and (4) in (3), we get-
q3 = q1.b + q2.a + q3.a
q3 = ∈.b + a.a + q3.a
q3 = (b + a.a) + q3.a …….(5)
Using Arden’s Theorem in (5), we get-
q3 = (b + a.a)a*
Thus, Regular Expression for the given
DFA = (b + aa)a*
Exercise
Construct the regular expression for the following FA
State Elimination Method-
This method involves the following steps in finding the regular
expression for any given DFA-
Step-01: Thumb Rule
The initial state of the DFA must not have any incoming edge.
• If there exists any incoming edge to the initial state, then create a new
initial state having no incoming edge to it, connected to the old initial
state by an ∈ transition.
Example-
State Elimination Method…..
Step-02: Thumb Rule
There must exist only one final state in the DFA.
• If there exist multiple final states in the DFA,
then convert all the final states into non-final
states and create a new single final state, reached
from each old final state by an ∈ transition.
Example-
State Elimination Method…..
Step-03: Thumb Rule
The final state of the DFA must not have any outgoing edge.
• If there exists any outgoing edge from the final state,
then create a new final state having no outgoing edge
from it.
Example-
State Elimination Method…..
Step-04:
Eliminate all the intermediate states one by one.
These states may be eliminated in any order.
In the end,
• Only an initial state going to the final state will be left.
• The cost of this transition is the required regular expression.
NOTE: The state elimination method can be applied to any finite automaton
(NFA, ∈-NFA, DFA, etc.)
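The four steps can be sketched in code. This is a minimal illustration, not a definitive implementation: it always adds a fresh initial state qi and a fresh final state qf (which covers Steps 01 to 03 in one go), keeps edge labels as strings in Python re syntax (| for union, the empty string for ∈), and the function name eliminate_states is hypothetical. The sample automaton is an assumption with A going to B on 0, B going to A on 1, A initial and B final.

```python
import re

def eliminate_states(states, start, finals, edges):
    """edges maps (source, target) -> label; returns a regex string or None.

    Assumes no state is already named "qi" or "qf", and that parallel edges
    between the same pair of states were merged into one label with "|".
    """
    R = dict(edges)
    qi, qf = "qi", "qf"
    R[(qi, start)] = ""              # new initial state, epsilon edge
    for f in finals:
        R[(f, qf)] = ""              # new single final state, epsilon edges
    for q in states:                 # eliminate the states one by one
        loop = R.pop((q, q), None)
        star = f"(?:{loop})*" if loop is not None else ""
        incoming = [(p, r) for (p, d), r in R.items() if d == q]
        outgoing = [(d, r) for (s, d), r in R.items() if s == q]
        for p, _ in incoming:
            del R[(p, q)]
        for d, _ in outgoing:
            del R[(q, d)]
        for p, r_in in incoming:     # reconnect every path that went through q
            for d, r_out in outgoing:
                path = f"(?:{r_in}){star}(?:{r_out})"
                R[(p, d)] = f"{R[(p, d)]}|{path}" if (p, d) in R else path
    return R.get((qi, qf))           # cost of the single remaining transition

rx = eliminate_states(["A", "B"], "A", {"B"}, {("A", "B"): "0", ("B", "A"): "1"})
print(rx)
print(bool(re.fullmatch(rx, "010")), bool(re.fullmatch(rx, "01")))
```

The returned string is heavily parenthesized; for the sample automaton it is equivalent to 0(10)*.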
Example 1
Find regular expression for the following FA-
Solution-
Step-01:
Initial state A has an incoming edge.
So, we create a new initial state qi.
The resulting FA is-
Step-02:
Final state B has an outgoing edge.
So, we create a new final state qf.
The resulting FA is-
Cont….
Step-03:
Now, we start eliminating the intermediate states.
First, let us eliminate state A.
There is a path going from state qi to state B via state A.
So, after eliminating state A, we put a direct path from state
qi to state B having cost ∈.0 = 0
There is a loop on state B using state A.
So, after eliminating state A, we put a direct loop on state B
having cost 1.0 = 10.
Cont…
Step-04:
Now, let us eliminate state B.
• There is a path going from state qi to state qf via state B.
• So, after eliminating state B, we put a direct path from state qi to state qf having
cost 0.(10)*.∈ = 0(10)*
Example 2
Find regular expression for the following DFA
Solution-
Step 01:
There exist multiple final states.
So, we convert them into a single final
state.
The resulting FA is
Cont…
Step-02:
Now, we start eliminating the intermediate states.
First, let us eliminate state q4.
There is a path going from state q2 to state qf via state q4.
So, after eliminating state q4 , we put a direct path from
state q2 to state qf having cost b.∈ = b.
Cont…
Step-03:
Now, let us eliminate state q3.
There is a path going from state q2 to state qf via state q3.
So, after eliminating state q3 , we put a direct path from
state q2 to state qf having cost c.∈ = c.
Cont…
Step-04:
Now, let us eliminate state q5.
There is a path going from state q2 to state qf via state q5.
So, after eliminating state q5 , we put a direct path from state q2 to state
qf having cost d.∈ = d.
Cont…
Step-05:
Now, let us eliminate state q2.
There is a path going from state q1 to state qf via state q2.
So, after eliminating state q2 , we put a direct path from state q1 to state
qf having cost a.(b+c+d).
Example 3
Example3:
Step-03:
Now, we start eliminating the intermediate states.
First, let us eliminate state q1.
There is a path going from state qi to state q2 via state q1 .
So, after eliminating state q1, we put a direct path from state qi to state q2 having
cost ∈.c*.a = c*a
There is a loop on state q2 using state q1 .
So, after eliminating state q1 , we put a direct loop on state q2 having cost b.c*.a =
bc*a
Eliminating state q1, we get-
Example3:
Step-04:
Exercises
Find regular expression for the following DFA-
RE to ε-NFA construction
(Thompson Construction)
Theorem 2: Reg Ex → ε-NFA
Example: (0+1)*01(0+1)*
[Figure: ε-NFA for (0+1)*01(0+1)*, assembled from sub-automata for (0+1)*, 01 and (0+1)*]
Thompson Construction Method
The algorithm works recursively by splitting an expression into its
constituent sub expressions, from which the NFA will be constructed
using a set of rules.
Following are the rules :
Cont…
1. The union expression s|t is converted to
[Figure: automaton for the union, with a start state q and a final state qf]
Case 3 − For a regular expression (a+b), we can construct the following FA −
[Figure: FA from start state q1 to final state qf with transitions on a and b]
Example:-
Cont…
Step 2: Since closure is to be taken next, we construct the automaton for
(a+b)* using the automaton for (a+b).
[Figure: ε-NFA for (a+b)*]
Cont…
Step 5: Now finally we can construct the automaton for a.(a+b)*.b.b
[Figure: ε-NFA for a.(a+b)*.b.b]
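The recursive construction can be sketched in code for the running example a.(a+b)*.b.b. This is one common way to write Thompson-style fragments, under stated assumptions: each fragment is a tuple (start, final, transitions), None stands for an ε move, and the function names symbol, concat, union, star, closure and accepts are hypothetical.

```python
from itertools import count

_ids = count()  # fresh state numbers

def symbol(a):
    s, f = next(_ids), next(_ids)
    return s, f, [(s, a, f)]

def concat(A, B):
    s1, f1, t1 = A
    s2, f2, t2 = B
    return s1, f2, t1 + t2 + [(f1, None, s2)]      # epsilon from A's final to B's start

def union(A, B):
    s, f = next(_ids), next(_ids)
    s1, f1, t1 = A
    s2, f2, t2 = B
    return s, f, t1 + t2 + [(s, None, s1), (s, None, s2),
                            (f1, None, f), (f2, None, f)]

def star(A):
    s, f = next(_ids), next(_ids)
    s1, f1, t1 = A
    return s, f, t1 + [(s, None, s1), (f1, None, f),
                       (s, None, f), (f1, None, s1)]  # skip edge and loop-back edge

def closure(states, trans):
    """Epsilon closure of a set of states."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for p, a, r in trans:
            if p == q and a is None and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(nfa, w):
    start, final, trans = nfa
    current = closure({start}, trans)
    for ch in w:
        moved = {r for p, a, r in trans if p in current and a == ch}
        current = closure(moved, trans)
    return final in current

# a.(a+b)*.b.b
nfa = concat(concat(concat(symbol("a"), star(union(symbol("a"), symbol("b")))),
                    symbol("b")), symbol("b"))
print(accepts(nfa, "abb"), accepts(nfa, "ababb"), accepts(nfa, "ab"))
```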
Algebraic Laws of Regular Expressions
Commutative: Distributive:
E+F = F+E E(F+G) = EF + EG
E+Φ = E Φ* =
E = E = E * =
Annihilator: E+ =EE*
ΦE = EΦ = Φ E? = +E
True or False?
Let R and S be two regular expressions. Then:
1. ((R*)*)* = R* ?
2. (R+S)* = R* + S* ?
The Pumping Lemma for Regular
Languages
What it is?
The Pumping Lemma is a property of all regular
languages.
How is it used?
A technique that is used to show that a given language is
not regular
Pumping Lemma for Regular Languages
Let L be a regular language
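Then there exists some constant N such that for every
string w in L with |w| ≥ N, we can break w into three parts, w = xyz, such that:
1) y ≠ ε
2) |xy| ≤ N
3) For all k ≥ 0, the string xy^kz is also in L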
Method to prove that a language
L is not regular
At first, we have to assume that L is regular.
So, the pumping lemma should hold for L, with some constant c.
Use the pumping lemma to obtain a contradiction −
Select w in L such that |w| ≥ c
Consider every split w = xyz such that |y| ≥ 1 and |xy| ≤ c
(we do not get to choose the split, so all of them must be handled)
For each such split, select k such that the resulting string xy^kz is not in L.
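The method can be exercised mechanically for one fixed constant. The sketch below enumerates every legal split of a chosen string and looks for a pumping exponent k that takes the string out of the language. It is only an illustration: a real proof must work for every possible constant, and the names splits and refute and the sample languages are assumptions of this example.

```python
def splits(w, c):
    """All splits w = xyz with |y| >= 1 and |xy| <= c."""
    for i in range(c):
        for j in range(i + 1, min(c, len(w)) + 1):
            yield w[:i], w[i:j], w[j:]

def refute(w, c, in_lang, ks=(0, 2)):
    """True if every legal split has some k in ks with x y^k z outside the language."""
    for x, y, z in splits(w, c):
        if all(in_lang(x + y * k + z) for k in ks):
            return False  # this split survives pumping for all tried k
    return True

def equal_zeros_ones(w):
    return w.count("0") == w.count("1")

def even_length(w):
    return len(w) % 2 == 0

c = 5
print(refute("0" * c + "1" * c, c, equal_zeros_ones))  # every split can be pumped out
print(refute("0" * 4, 2, even_length))                 # a regular language: some split survives
```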
Pumping Lemma: Proof
L is regular => it should have a DFA.
Set N := number of states in the DFA
Pumping Lemma: Proof…
=> We should be able to break w=xyz as follows:
x = a1a2..ai; y = ai+1ai+2..aj; z = aj+1aj+2..am
x’s path will be p0..pi
y’s path will be pi pi+1..pj (but pi = pj, implying a loop)
z’s path will be pj pj+1..pm
[Figure: p0 --x--> pi (= pj) --z--> pm, with a loop at pi labelled y^k (for k loops)]
Now consider another string wk = xy^kz, where k ≥ 0
Case k=0
DFA will reach the accept state pm
Case k>0
DFA will loop for y^k, and finally reach the accept state pm for z
In either case, wk ∈ L
This proves part (3) of the lemma
Pumping Lemma: Proof…
For part (1):
Since i<j, y ≠ ε
[Figure: p0 --x--> pi (= pj) --z--> pm, with a loop at pi labelled y^k (for k loops)]
For part (2):
By PHP (the pigeonhole principle), the repetition of states has to occur within
the first N symbols in w
==> |xy| ≤ N
Applications of Pumping Lemma
Using the Pumping Lemma
Note: We don’t have any control over N, except that it is positive.
We also don’t have any control over how to split w=xyz,
but xyz should respect the P/L conditions (1) and (2).
Cont…
F = {ww | w ∈ {0,1}*} is non-regular
proof:
Suppose F is regular
Let P be the pumping length given by the pumping
lemma
Let s = 0^P 1 0^P 1 ∈ F
Split s into 3 pieces, s = xyz
By the condition |xy| ≤ P in the lemma, xy lies within the first block of 0s
Thus y must consist of 0s only.
⇒ xyyz has more 0s before its first 1 than before its second 1, so xyyz ∉ F →←
Cont….
E = {0^i 1^j : i > j} is non-regular
proof:
Assume E is regular
Let P be the pumping length
Let s = 0^(P+1) 1^P ∈ E
Split s into 3 pieces, s = xyz
By pumping lemma: xy^i z ∈ E for any i ≥ 0
|y| > 0 and |xy| ≤ P, so y has 0s only. Taking i = 0 gives xz ∈ E.
But xz has #(0) ≤ #(1), so xz ∉ E →←
Cont…
D = {1^(n^2) : n ≥ 0} is not regular
proof:
Assume D is regular
Let P be the pumping length
Let s = 1^(P^2) ∈ D
Split s into 3 pieces, s = xyz ⇒ xy^i z ∈ D, i ≥ 0
Consider xy^i z ∈ D and xy^(i+1) z ∈ D
⇒ |xy^i z| and |xy^(i+1) z| are perfect squares for any i ≥ 0
If m = n^2, (n+1)^2 - n^2 = 2n+1 = 2√m + 1
Cont…
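Let m = |xy^i z|
|y| ≤ |s| = P^2
Let i = P^4
|y| = |xy^(i+1) z| - |xy^i z|
≤ P^2 = (P^4)^(1/2)
< 2(P^4)^(1/2) + 1
≤ 2(|xy^i z|)^(1/2) + 1
= 2√m + 1
So |xy^(i+1) z| lies strictly between the perfect square m and the next
perfect square, and cannot itself be a perfect square
→←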
Example of using the Pumping Lemma to prove that a
language is not regular
Let Leq = {w | w is a binary string with equal number of 1s
and 0s}
Your Claim: Leq is not regular
Proof:
By contradiction, let Leq be regular
(adv.) P/L constant should exist
Let N = that P/L constant
(you) Consider input w = 0^N 1^N
(your choice for the template string)
By pumping lemma, we should be able to break w=xyz,
such that:
1) y ≠ ε
2) |xy| ≤ N
3) For all k ≥ 0, the string xy^kz is also in L
Template string w = 0^N 1^N = 00…0 11…1 (N 0s followed by N 1s)
Proof…
Because |xy| ≤ N, xy should contain only 0s
(This, and because y ≠ ε, implies y = 0^+)
Therefore x can contain at most N-1 0s
Also, all the N 1s must be inside z
By (3), any string of the form xy^kz ∈ Leq for all k ≥ 0
Case k=0 (setting k=0 is referred to as “pumping down”):
xz has at most N-1 0s but has N 1s
Therefore, xy^0z ∉ Leq
This violates the P/L (a contradiction)
(N-1)^2 < N^2 - N ≤ #zeros(xy^0z) ≤ N^2 - 1 < N^2
⇒ xy^0z ∉ L
But the above will complete the proof ONLY IF N>1.
… (proof contd.. Next slide)
Example 3: Pumping Lemma
(proof contd…)
If the adversary picks N=1, then (N-1)^2 ≤ N^2 - N, and therefore
#zeros(xy^0z) could end up being a perfect square!
This means that pumping down (i.e., setting k=0) is not giving us the
proof!
So let's try pumping up next…
Case k=2:
#zeros(xy^2z) = #zeros(xyz) + #zeros(y)
N^2 + 1 ≤ #zeros(xy^2z) ≤ N^2 + N
N^2 < N^2 + 1 ≤ #zeros(xy^2z) ≤ N^2 + N < (N+1)^2
⇒ xy^2z ∉ L
(Notice that the above holds for all possible values of N>0.
Therefore, this completes the proof.)
Closure properties for Regular
Languages (RL)
Closure property (this is different from Kleene closure):
If a set of regular languages are combined using an
operator, then the resulting language is also regular
Regular languages are closed under, among other operators:
Reversal
Kleene closure
Concatenation
RLs are closed under union
if L and M are two RLs then:
they have corresponding regular expressions,
R and S respectively
(L U M) can be represented using the regular
expression R+S
Therefore, (L U M) is also regular
RLs are closed under
complementation
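If L is an RL over ∑, then its complement is L̄ = ∑* - L
To show L̄ is also regular, make the following construction on a DFA for L:
Convert every final state into non-final, and
every non-final state into a final state
[Figure: the DFA for L (states q0 … qi, final states … qFk) next to the DFA for L̄, with final and non-final states swapped]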
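The complement construction is short enough to try directly. The sketch below assumes a complete DFA (every state has a transition on every symbol), stored as a tuple (states, alphabet, delta, start, finals); the tuple layout, the function names and the sample DFA for "binary strings ending in 1" are assumptions of this example.

```python
def complement(dfa):
    """Swap final and non-final states of a complete DFA."""
    states, alphabet, delta, start, finals = dfa
    return states, alphabet, delta, start, states - finals

def accepts(dfa, w):
    states, alphabet, delta, start, finals = dfa
    q = start
    for ch in w:
        q = delta[(q, ch)]
    return q in finals

# Sample DFA: binary strings that end in 1
ends_in_1 = (
    {"s0", "s1"},
    {"0", "1"},
    {("s0", "0"): "s0", ("s0", "1"): "s1",
     ("s1", "0"): "s0", ("s1", "1"): "s1"},
    "s0",
    {"s1"},
)
not_ends_in_1 = complement(ends_in_1)
for w in ["", "0", "1", "10", "011"]:
    print(repr(w), accepts(ends_in_1, w), accepts(not_ends_in_1, w))
```

The swap only works on a DFA with all transitions present; on an NFA or a partial DFA, flipping the final states does not in general give the complement.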
DFA construction for L ∩ M
AL = DFA for L = {QL, ∑ , qL,FL, δL }
AM = DFA for M = {QM, ∑ , qM,FM, δM }
Build AL ∩ M = {QL × QM, ∑, (qL,qM), FL × FM, δ} such
that:
δ((p,q),a) = (δL(p,a), δM(q,a)), where p in QL, and q in
QM
This construction ensures that a string w will be
accepted if and only if w reaches an accepting
state in both input DFAs.
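The product construction can be written almost verbatim from the definition. The sketch below assumes both DFAs are complete and share the alphabet, and stores each as a tuple (states, alphabet, delta, start, finals); that layout, the function names and the two sample DFAs are assumptions of this example.

```python
from itertools import product

def intersect(A, B):
    """Product DFA: states are pairs, and both components move in lock step."""
    QA, sigma, dA, sA, FA = A
    QB, _, dB, sB, FB = B
    states = set(product(QA, QB))
    delta = {((p, q), a): (dA[(p, a)], dB[(q, a)])
             for (p, q) in states for a in sigma}
    finals = set(product(FA, FB))        # accepting in both input DFAs
    return states, sigma, delta, (sA, sB), finals

def accepts(dfa, w):
    states, sigma, delta, start, finals = dfa
    q = start
    for ch in w:
        q = delta[(q, ch)]
    return q in finals

# L: binary strings ending in 1
A_L = ({"s0", "s1"}, {"0", "1"},
       {("s0", "0"): "s0", ("s0", "1"): "s1",
        ("s1", "0"): "s0", ("s1", "1"): "s1"},
       "s0", {"s1"})
# M: binary strings with an even number of 0s
A_M = ({"even", "odd"}, {"0", "1"},
       {("even", "0"): "odd", ("even", "1"): "even",
        ("odd", "0"): "even", ("odd", "1"): "odd"},
       "even", {"even"})

both = intersect(A_L, A_M)
for w in ["1", "001", "01", "00", ""]:
    print(repr(w), accepts(both, w))
```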
DFA construction for L ∩ M
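[Figure: the DFA for L (states q0 … qi, qj …, final states qF1, qF2), the DFA for M (states p0 … pi, pj …, final states pF1, pF2), and the DFA for L ∩ M with start state (q0, p0), a transition from (qi, pi) to (qj, pj) on a, and final state (qF1, pF1)]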
RLs are closed under set
difference
We observe:
L - M = L ∩ M̄ (M̄ is the complement of M)
RLs are closed under intersection and closed under
complementation, so L - M is also regular
RLs are closed under reversal
Reversal of a string w is denoted by wR
E.g., w=00111, wR=11100
Reversal of a language:
LR = The language generated by reversing all
strings in L
ε-NFA Construction for L^R
New ε-NFA for L^R, built from the DFA for L:
[Figure: the DFA for L (states q0 … qi, qj …, final states qF1, qF2 … qFk) with a new start state q’0 added]
Make the old start state the only new final state
Here we are using two Machines for finding the Finite Automata Output
[Figure: machine with start state q1 and state q2; edges labelled 0/1 and 1/0]