Chapter Two: Regular Language
Example 1:
ab* denotes the strings that consist of an a followed by zero or
more b’s;
(ab)* denotes zero or more repetitions of ab.
Example 2:
For Σ = {a, b}, the expression
r = (a+b)*(a+bb) is regular.
It denotes the language L(r) = {a, bb, aa, abb, ba, bbb, ...}. So
L(r) is the set of all strings over {a, b} that end in either
an a or a bb.
Example 3:
r = (aa)*(bb)*b denotes the set of all strings with an even
number of a’s followed by an odd number of b’s, that is,
L(r) = {a^(2n) b^(2m+1) : n ≥ 0, m ≥ 0}
Example 4:
r = (0+1)*00(0+1)* denotes the set of all strings over {0, 1}
that contain at least one pair of consecutive 0’s.
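The four examples can be checked mechanically. The sketch below is only an illustration: it assumes the expressions are translated into the syntax of Python's re module (the union written + in the text becomes |), and the helper name language is hypothetical. It lists every string up to a small length that an expression matches.

```python
import re
from itertools import product

def language(pattern, alphabet, max_len):
    """List all strings over `alphabet` of length <= max_len matched entirely by `pattern`."""
    matched = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            s = "".join(letters)
            if re.fullmatch(pattern, s):
                matched.append(s)
    return matched

# Example 1: ab* versus (ab)*
ex1_a = language(r"ab*", "ab", 3)
ex1_b = language(r"(ab)*", "ab", 4)
# Example 2: strings ending in a or bb
ex2 = language(r"(a|b)*(a|bb)", "ab", 3)
# Example 3: even number of a's followed by an odd number of b's
ex3 = language(r"(aa)*(bb)*b", "ab", 3)
# Example 4: strings containing 00
ex4 = language(r"(0|1)*00(0|1)*", "01", 3)

for name, strings in [("ab*", ex1_a), ("(ab)*", ex1_b), ("Example 2", ex2),
                      ("Example 3", ex3), ("Example 4", ex4)]:
    print(name, strings)
```

Enumerating by length gives only a finite sample of each language, but it is enough to compare against the sets listed in the examples.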
Algebra of regular expressions
Identity laws
a. ε.R = R.ε = R
b. Ø + R = R + Ø = R
Idempotent laws
R+R=R
(R*)*=R*
Distributive laws
A.(B+C)=A.B+A.C
Associative laws
A.(B.C)=(A.B).C
A+(B+C)=(A+B)+C
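These laws can be spot-checked on small languages. The sketch below is an illustration under stated assumptions: languages are modelled as Python sets of strings, every operation is cut off at a length bound N so that R* stays finite, and the names concat and star as well as the sample languages are hypothetical. Passing the check on a few samples does not prove the laws.

```python
def concat(A, B, n):
    """Concatenation A.B, keeping only strings of length <= n."""
    return {x + y for x in A for y in B if len(x + y) <= n}

def star(A, n):
    """Kleene star A*, keeping only strings of length <= n."""
    result = {""}
    frontier = {""}
    while frontier:
        # Append one more factor from A and keep only strings not seen before.
        frontier = concat(frontier, A, n) - result
        result |= frontier
    return result

N = 4
EPS = {""}       # the language of ε
EMPTY = set()    # the language of Ø
R = {"a", "ab"}
A, B, C = {"a"}, {"b", "ba"}, {"", "bb"}

laws = {
    "identity (concatenation)": concat(EPS, R, N) == R == concat(R, EPS, N),
    "identity (union)": (EMPTY | R) == R == (R | EMPTY),
    "idempotent (union)": (R | R) == R,
    "idempotent (star)": star(star(R, N), N) == star(R, N),
    "distributive": concat(A, B | C, N) == concat(A, B, N) | concat(A, C, N),
    "associative (concatenation)": concat(A, concat(B, C, N), N) == concat(concat(A, B, N), C, N),
    "associative (union)": (A | (B | C)) == ((A | B) | C),
}
for name, holds in laws.items():
    print(name, holds)
```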
Regular Grammars
A language is said to be regular if it can be generated
by a regular grammar. The regular languages are exactly the
languages generated by type 3 grammars.
Linear grammars are either left linear or right linear:
Right Linear Grammars:
Rules of the forms
A → ε
A → a
A → aB
Left Linear Grammars:
Rules of the forms
A → ε
A → a
A → Ba
Transform the following right linear
grammar into an equivalent ε-NFA (an NFA with ε-moves).
S → aS | bA
A → cA | ε
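One common way to carry out this transformation is sketched below; it is not necessarily the construction intended by the exercise. The assumptions are: every non-terminal becomes a state, one extra accepting state (here called qf, a hypothetical name) is added, a rule A → aB becomes a transition from A to B on a, a rule A → a becomes a transition from A to qf on a, and a rule A → ε becomes an ε-move from A to qf. Symbols are assumed to be single characters.

```python
FINAL = "qf"  # hypothetical extra accepting state

def grammar_to_nfa(rules):
    """rules maps a non-terminal to its right-hand sides: "" (ε), "a", or "aB"."""
    delta = {}  # (state, symbol) -> set of states; symbol "" marks an ε-move
    for head, bodies in rules.items():
        for body in bodies:
            if body == "":
                symbol, target = "", FINAL          # A -> ε
            elif len(body) == 1:
                symbol, target = body, FINAL        # A -> a
            else:
                symbol, target = body[0], body[1]   # A -> aB
            delta.setdefault((head, symbol), set()).add(target)
    return delta

def eps_closure(states, delta):
    """All states reachable from `states` using ε-moves only."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def accepts(delta, start, word):
    current = eps_closure({start}, delta)
    for ch in word:
        moved = set()
        for q in current:
            moved |= delta.get((q, ch), set())
        current = eps_closure(moved, delta)
    return FINAL in current

rules = {"S": ["aS", "bA"], "A": ["cA", ""]}
nfa = grammar_to_nfa(rules)
for w in ["b", "aabcc", "", "a", "abca"]:
    print(repr(w), accepts(nfa, "S", w))
```

With these rules the resulting automaton accepts exactly the strings of the form a*bc*.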
Right linear grammar
In a rule A -> aB:
1. A is a single symbol (corresponding to a state) called a ‘non-terminal symbol’.
2. a is a terminal symbol; it corresponds to a lexical item.
3. B is a single non-terminal symbol.
Formal definition of Right Linear Grammars
A right linear grammar is a 4-tuple <T, N, S, R>, where:
1. N is a finite set of non-terminals
2. T is a finite set of terminals, including the empty string
3. S is the start symbol
4. R is a finite set of rewriting rules of the form A -> xB or A -> x, where A and B
stand for non-terminals and x stands for a terminal.
Formal example:
G1 = <T, N, S, R>, where T = {a, b}, N = {S, A, B}, and
R=
S -> aA
A -> aA
A -> bB
B -> bB
In a left regular grammar (also called
left linear grammar), all rules obey the forms
A → a - where A is a non-terminal in N and a is a terminal
in Σ
A → Ba - where A and B are in N and a is in Σ
A → ε - where A is in N and ε is the empty string.
An example of a left regular grammar G with N = {S, A},
Σ = {a, b, c}: P consists of the following rules
S → Sc
S → Ab
A → ε
A → Aa
and S is the start symbol. This grammar describes the
same language as the regular expression a*bc*.
A regular grammar is a left or right regular grammar.
Relation between regular languages and regular expressions
They are equivalent:
With every regular expression we can associate a
regular language.
Conversely, every regular language can be obtained
from a regular expression.
Examples:
- Regular expression = ab*c
- Regular language = {ac, abc, abbc, ...}
Let Σ be an alphabet. The regular expressions over Σ
are:
Ø Represents the empty set { }
ε Represents the set {ε}
a Represents the set {a}, for any symbol a in Σ
[Figures: diagrams for Ø, for ε, and for a]
Types of automata
There are four basic types of automata,
distinguished by the following
characteristics:
Finite state automata (FSA) - have no memory
- regular grammars
Pushdown automata - in addition to the tape, they use
a stack to read from and write to
- context-free grammars
Linear-bounded automata - read and write on a tape of finite
length in both directions
- context-sensitive grammars
Turing machines - read and write on an infinite tape
in both directions
- unrestricted (type 0) grammars
Finite Automata
A finite automaton is an abstract machine that can be used to
implement regular expressions (among other things).
It has a finite number of states and a finite amount of
memory (i.e., the current state).
It can be represented by a directed graph (state transition
diagram) or by a transition table.
Representation
An FSA may be represented as a directed graph; each
node (or vertex) represents a state, and the edges (or
arcs) connecting the nodes represent transitions.
Each state is labeled.
Each transition is labeled with a symbol from the
alphabet over which the regular language represented
by the FSA is defined, or with ε, the empty string.
Among the FSA’s states, there is a start state and at least
one final state (or accepting state).
Given an input string, an FSA will either accept or reject the
input.
If the FSA is in a final (or accepting) state after all input
symbols have been consumed, then the string is accepted
(or recognized).
Otherwise (including the case in which an input symbol
cannot be consumed), the string is rejected.
Informally, an FSA is a state diagram that comprehensively captures
all possible states and transitions that a machine can take
while responding to a stream or sequence of input symbols.
It is a recognizer for regular languages.
Deterministic Finite Accepters
The first type of automaton we study in detail is the finite
accepter that is deterministic in its operation. We start
with a precise formal definition of deterministic accepters. A
deterministic accepter has internal states, rules for
transitions from one state to another, some input, and ways
of making decisions.
Definition:
A DFA is defined by the quintuple
M = (Q, Σ, δ, q0, F)
Q - a finite set of states
Σ - a finite input alphabet
q0 - the initial (starting) state; q0 is in Q
F - a set of final (accepting) states, which is a subset of Q
δ - a transition function, which is a total function from Q × Σ
to Q
A deterministic finite accepter operates in the following manner. At
the initial time, it is assumed to be in the initial state q0, with its input
mechanism on the leftmost symbol of the input string. During each
move of the automaton, the input mechanism advances one position
to the right, so each move consumes one input symbol. When the end
of the string is reached, the string is accepted if the automaton is in one
of its final states. Otherwise the string is rejected. The input
mechanism can move only from left to right and reads exactly one
symbol on each step. The transitions from one internal state to another
are governed by the transition function δ. For example, if
δ(q0, a) = q1 and the DFA is in state q0 with current input symbol a,
the DFA will go into state q1.
The graph below represents the DFA
M = ({q0, q1, q2}, {0, 1}, δ, q0, {q1}),
where δ is given by
δ(q0, 0) = q0,
δ(q0, 1) = q1,
δ(q1, 0) = q0,
δ(q1, 1) = q2,
δ(q2, 0) = q2,
δ(q2, 1) = q1.
The string 01 is accepted. The DFA does not accept the
string 00, since after reading two consecutive 0’s it will be in
state q0, which is not final. By similar reasoning, we see that the
automaton will accept the strings 101, 0111, and 11001, but not 100 or
1100.
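The acceptance claims above can be reproduced by simulating M directly. In the following minimal sketch the transition function is stored as a Python dictionary keyed by (state, symbol); the function name run_dfa is a hypothetical helper, not part of the definition.

```python
def run_dfa(delta, start, finals, word):
    """Read `word` symbol by symbol and report whether the DFA ends in a final state."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]  # exactly one move per input symbol
    return state in finals

# The DFA M = ({q0, q1, q2}, {0, 1}, δ, q0, {q1}) from the text.
delta = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "q2",
    ("q2", "0"): "q2", ("q2", "1"): "q1",
}
start, finals = "q0", {"q1"}

for w in ["01", "00", "101", "0111", "11001", "100", "1100"]:
    print(w, "accepted" if run_dfa(delta, start, finals, w) else "rejected")
```

Because δ is a total function, the dictionary lookup never fails for strings over {0, 1}.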
The language accepted by a DFA M = (Q, Σ, δ, q0, F) is the set
of all strings on Σ accepted by M. In formal notation,
L(M) = {w ∈ Σ* : δ*(q0, w) ∈ F},
where δ*(q, w) is the state reached from q after reading all of w.
A DFA will process every string in Σ* and either accept it or not accept it.
Non-acceptance means that the DFA stops in a non-final state.
Theorem
Let M = (Q, Σ, δ, q0, F) be a deterministic finite accepter, and
let G_M be its associated transition graph. Then for every qi, qj
∈ Q and w ∈ Σ*, δ*(qi, w) = qj if and only if there is in G_M a walk
with label w from qi to qj.
The following automaton is an example of a trap state (or
dead state): a state is a dead state or trap state if it is not an
accepting state and has no outgoing transitions except to itself.
Nondeterministic Finite State Automata (NFA)
An NFA is an automaton whose states may have zero, one, or more outgoing
arrows for a given symbol. Example:
• An NFA is a five-tuple:
M = (Q, Σ, δ, q0, F)
• F = {q2}
δ:    0           1
q0    {q0, q1}    {}
q1    {}          {q1, q2}
q2    {q2}        {q2}
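An NFA can be simulated by tracking the set of all states it could currently be in. The sketch below applies this to the transition table above, assuming Q = {q0, q1, q2} and Σ = {0, 1} as the table suggests; run_nfa is a hypothetical helper name.

```python
def run_nfa(delta, start, finals, word):
    """Track every state the NFA could be in; accept if any of them is final at the end."""
    current = {start}
    for symbol in word:
        following = set()
        for state in current:
            following |= delta.get((state, symbol), set())
        current = following
    return bool(current & finals)

# Transition table of the example NFA; empty sets mean "no move".
delta = {
    ("q0", "0"): {"q0", "q1"}, ("q0", "1"): set(),
    ("q1", "0"): set(),        ("q1", "1"): {"q1", "q2"},
    ("q2", "0"): {"q2"},       ("q2", "1"): {"q2"},
}
start, finals = "q0", {"q2"}

for w in ["01", "001", "010", "0", "1", "00", ""]:
    print(repr(w), run_nfa(delta, start, finals, w))
```

If the set of current states becomes empty, no run of the NFA can continue and the string is rejected.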
An NFA for the language of all strings over {a,b} that contain ababb
Example
Transition table:
State    a    b
A        B    C
B        B    D
C        B    C
D        B    E
E        B    C
Transition table:
              Input
State         a    b
Start A       B    A
B             B    D
D             B    E
Accept E      B    A
The transition diagram for the optimized or minimized DFA
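The step from the five-state table to the four-state table can be reproduced by partition refinement. The sketch below is one standard approach, not necessarily the procedure used for the slides: it starts from the split into accepting and non-accepting states and keeps splitting blocks until no block changes. It assumes the input symbols are a and b, with start state A and accepting state E as marked in the minimized table; the function name minimize is hypothetical.

```python
def minimize(states, alphabet, delta, finals):
    """Return the blocks of mutually equivalent states (partition refinement)."""
    partition = [b for b in (set(finals), set(states) - set(finals)) if b]

    def block_of(q):
        for i, block in enumerate(partition):
            if q in block:
                return i

    while True:
        groups = {}
        for q in states:
            # States stay together only if they sit in the same block and all
            # of their successors sit in the same blocks.
            signature = (block_of(q),) + tuple(block_of(delta[(q, a)]) for a in alphabet)
            groups.setdefault(signature, set()).add(q)
        refined = list(groups.values())
        if len(refined) == len(partition):
            return refined
        partition = refined

states, alphabet = "ABCDE", "ab"
delta = {
    ("A", "a"): "B", ("A", "b"): "C",
    ("B", "a"): "B", ("B", "b"): "D",
    ("C", "a"): "B", ("C", "b"): "C",
    ("D", "a"): "B", ("D", "b"): "E",
    ("E", "a"): "B", ("E", "b"): "C",
}
blocks = minimize(states, alphabet, delta, finals={"E"})

# Name each block after its alphabetically first state and rebuild the table.
rep = {q: min(block) for block in blocks for q in block}
min_delta = {(rep[q], a): rep[delta[(q, a)]] for q in states for a in alphabet}
print(sorted(sorted(b) for b in blocks))
print(min_delta)
```

States A and C end up in the same block, which is why C disappears from the minimized table and every move into C is redirected to A.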
Closure properties of regular languages
We have already seen closure under union, concatenation, and Kleene star.
Now let us move on to complement.
Given a DFA for the language, all we do is make the accepting states
non-accepting and make the non-accepting states accepting.
In terms of the 5-tuple M = (Q, Σ, δ, q0, F), all we do is
replace F with Q - F.
Using this construction, we have a proof that the complement
of any regular language is another regular language.
Refer to the diagram below. The regular languages
are closed under complement.
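The complement construction amounts to a single set difference. As a minimal sketch, the code below applies it to the DFA M = ({q0, q1, q2}, {0, 1}, δ, q0, {q1}) defined earlier in the chapter; the helper names run_dfa and complement_finals are hypothetical. The construction relies on δ being total, so it applies to DFAs and not directly to NFAs.

```python
def run_dfa(delta, start, finals, word):
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in finals

def complement_finals(states, finals):
    """Replace F with Q - F; states, alphabet, δ and start state are unchanged."""
    return set(states) - set(finals)

states = {"q0", "q1", "q2"}
delta = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "q2",
    ("q2", "0"): "q2", ("q2", "1"): "q1",
}
start, finals = "q0", {"q1"}
co_finals = complement_finals(states, finals)

for w in ["", "01", "00", "1100"]:
    print(repr(w), run_dfa(delta, start, finals, w), run_dfa(delta, start, co_finals, w))
```

Every string is accepted by exactly one of the two automata.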
Intersection
Given a DFA for L and a DFA for M, we can take their cross product:
A_LM = (Q_L × Q_M, Σ, δ_LM, (q_L, q_M), F_L × F_M),
where δ_LM((p, q), a) = (δ_L(p, a), δ_M(q, a)).
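The product construction can be sketched as follows. The two component DFAs are hypothetical examples chosen for illustration (they do not come from the text): one accepts strings over {0, 1} with an even number of 0's, the other accepts strings ending in 1. Each DFA is written as a tuple (states, alphabet, δ, start, finals).

```python
from itertools import product as cartesian

def intersect(dfa_l, dfa_m):
    """Build the product DFA that runs both machines in lockstep."""
    states_l, sigma, delta_l, start_l, finals_l = dfa_l
    states_m, _, delta_m, start_m, finals_m = dfa_m
    states = set(cartesian(states_l, states_m))
    delta = {((p, q), a): (delta_l[(p, a)], delta_m[(q, a)])
             for (p, q) in states for a in sigma}
    finals = set(cartesian(finals_l, finals_m))  # both components must accept
    return states, sigma, delta, (start_l, start_m), finals

def run_dfa(dfa, word):
    _, _, delta, state, finals = dfa
    for symbol in word:
        state = delta[(state, symbol)]
    return state in finals

# Hypothetical DFA 1: even number of 0's ("e" = even so far, "o" = odd so far).
even_zeros = ({"e", "o"}, "01",
              {("e", "0"): "o", ("o", "0"): "e", ("e", "1"): "e", ("o", "1"): "o"},
              "e", {"e"})
# Hypothetical DFA 2: last symbol read is 1 ("y") or not ("n").
ends_in_one = ({"n", "y"}, "01",
               {("n", "0"): "n", ("y", "0"): "n", ("n", "1"): "y", ("y", "1"): "y"},
               "n", {"y"})

both = intersect(even_zeros, ends_in_one)
for w in ["001", "1", "01", "00", ""]:
    print(repr(w), run_dfa(both, w))
```

The product automaton accepts a string exactly when both component automata accept it.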
De Morgan’s law