Kuk B.Tech CSE Automata Theory
Automata Theory
B.Tech (CSE) Notes
Prepared by: TOPPERWORLD.IN
UNIT - I
Introduction to Finite Automata: Structural Representations, Automata and
Complexity, the Central Concepts of Automata Theory – Alphabets, Strings,
Languages, Problems.
Nondeterministic Finite Automata: Formal Definition, an application, Text
Search, Finite Automata with Epsilon-Transitions.
Deterministic Finite Automata: Definition of DFA, How a DFA Processes
Strings, The Language of a DFA, Conversion of NFA with ε-transitions to NFA
without ε-transitions, Conversion of NFA to DFA, Moore and Mealy machines.
Regular Expressions: Finite Automata and Regular Expressions, Applications of
Regular Expressions, Algebraic Laws for Regular Expressions, Conversion of Finite
Automata to Regular Expressions.
Pumping Lemma for Regular Languages, Statement of the pumping lemma,
Applications of the Pumping Lemma.
Closure Properties of Regular Languages: Closure properties of Regular
languages, Decision Properties of Regular Languages, Equivalence and
Minimization of Automata.
UNIT - II
Context-Free Grammars: Definition of Context-Free Grammars, Derivations
Using a Grammar,
Leftmost and Rightmost Derivations, the Language of a Grammar, Sentential Forms,
Parse Trees,
Applications of Context-Free Grammars, Ambiguity in Grammars and
Languages. Push
Down Automata: Definition of the Pushdown Automaton, the Languages of a PDA,
Equivalence of
PDA's and CFG's, Acceptance by final state, Acceptance by empty stack,
Deterministic Pushdown Automata. From CFG to PDA, From PDA to CFG
UNIT - III
Normal Forms for Context-Free Grammars: Eliminating useless symbols,
Eliminating ε-Productions, Chomsky Normal Form, Greibach Normal Form.
UNIT - IV
Types of Turing machine: Turing machines and halting
Undecidability: Undecidability, A Language that is Not Recursively Enumerable,
An Undecidable Problem That is RE, Undecidable Problems about Turing Machines,
Recursive languages, Properties of recursive languages, Post's Correspondence
Problem, Modified Post Correspondence problem, Other Undecidable Problems,
Counter machines.
The term "Automata" is derived from the Greek word "αὐτόματα" which means
"self-acting". An automaton (Automata in plural) is an abstract self-propelled
computing device which follows a predetermined sequence of operations automatically.
Structural Representations
The purpose of a logical framework such as LF is to provide a language for
defining logical systems suitable for use in a logic-independent proof development
environment. All inferential activity in an object logic (in particular, proof search) is to
be conducted in the logical framework via the representation of that logic in the
framework. An important tool for controlling search in an object logic, the need for which
is motivated by the difficulty of reasoning about large and complex systems, is the use of
structured theory presentations. In this paper a rudimentary language of structured theory
presentations is presented, and the use of this structure in proof search for an arbitrary
object logic is explored. The behaviour of structured theory presentations under
representation in a logical framework is studied, focusing on the problem of “lifting”
presentations from the object logic to the metalogic of the framework. The topic of
imposing structure on logic presentations, so that logical systems may themselves be
defined in a modular fashion, is also briefly considered.
Alphabet
An alphabet is a finite, non-empty set of symbols, denoted by ∑.
String
A string or word is a finite sequence of symbols chosen from ∑. The empty string
is denoted by λ (lambda).
Length of a String
The length of a string w, written |w|, is the number of symbols in w.
Powers of an alphabet
∑* = ∑0 ∪ ∑1 ∪ ∑2 ∪ …
∑+ = ∑1 ∪ ∑2 ∪ ∑3 ∪ …
• Definition − The set ∑+ is the infinite set of all possible strings of all possible
lengths over ∑, excluding λ.
• Representation − ∑+ = ∑1 ∪ ∑2 ∪ ∑3 ∪ …
∑+ = ∑* − { λ }
Language
A language is a set of strings over an alphabet ∑, i.e., any subset of ∑*.
Example −
1. If the language takes all possible strings of length 2 over ∑ = {a, b}, then L = {
aa, ab, ba, bb }.
Exercise: Prove that for any language L, and any m ≥ 0, n ≥ 0, LmLn = Lm+n.
– Kleene closure (or star closure): L* = L0 ∪ L1 ∪ L2 ∪ …
In NDFA, for a particular input symbol, the machine can move to any combination of
the states in the machine. In other words, the exact state to which the machine moves
cannot be determined. Hence, it is called Non-deterministic Automaton. As it has finite
number of states, the machine is called Nondeterministic Finite Machine or Non-
deterministic Finite Automaton.
(Here the power set of Q (2Q) has been taken because, in the case of an NDFA, a
transition from a state can occur to any combination of Q states.)
• q0 is the initial state from where any input is processed (q0 ∈ Q).
• δ is the transition function, a function from Q × ∑ to 2Q (but not to Q).
• Let M = (Q, Σ, δ, q0, F) be an NFA. Then the language accepted by M is the set:
L(M) = {w | w ∈ Σ* and δ*({q0}, w) contains at least one state in F}
Example
Consider a simple ON/OFF switch: the starting state is OFF and the collection of states is
{OFF, ON}. It has a single input PUSH that makes the transition from OFF to ON, and
from ON back to OFF. The switch is the simplest practical application of finite automata.
Acceptor (Recognizer)
An automaton that computes a Boolean function is called an acceptor. Each state of
an acceptor either accepts or rejects the inputs given to it.
Classifier
A classifier has more than two final states and it gives a single output when it
terminates.
Transducer
An automaton that produces outputs based on the current input and/or previous state is
called a transducer.
Transducers can be of two types −
• Mealy Machine − The output depends both on the current state and the current
input.
• Moore Machine − The output depends only on the current state.
Example
• Q = {a, b, c},
• ∑ = {0, 1},
• q0 = {a},
• F = {c}, and
• δ is given by the following transition table:
Present state | Next state for input 0 | Next state for input 1
a | a | b
b | c | a
c | b | c
DFA vs NDFA
The following table lists the differences between DFA and NDFA.
DFA | NDFA
Empty string transitions are not seen in DFA. | NDFA permits empty string transitions.
Backtracking is allowed in DFA. | In NDFA, backtracking is not always possible.
Requires more space. | Requires less space.
A string is accepted by a DFA/NDFA iff the DFA/NDFA starting at the initial state
ends in an accepting state (any of the final states) after reading the string wholly.
{S | S ∈ ∑* and δ*(q0, S) ∈ F}
Example
Let us consider the DFA shown in Figure 1.3. From the DFA, the acceptable strings can
be derived.
Automata Theory TOPPERWORLD.IN
Strings accepted by the above DFA: {0, 00, 11, 010, 101, 111, …}
• Note: A DFA M = (Q, Σ, δ, q0, F) partitions the set Σ* into two sets: L(M) and Σ* − L(M).
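To make this concrete, here is a minimal Python sketch (not part of the original notes) that runs the earlier example DFA with Q = {a, b, c}, q0 = a and F = {c} on a string; the dictionary encoding of δ and the function name are illustrative assumptions.

# Transition table of the earlier example DFA (states a, b, c; start a; final {c}).
delta = {
    ('a', '0'): 'a', ('a', '1'): 'b',
    ('b', '0'): 'c', ('b', '1'): 'a',
    ('c', '0'): 'b', ('c', '1'): 'c',
}
start, finals = 'a', {'c'}

def accepts(w):
    """Run the DFA on w; w is in L(M) iff the run ends in a final state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]   # deterministic: exactly one next state
    return state in finals

# Every string over {0, 1} falls into either L(M) or its complement Σ* - L(M).
print(accepts("10"), accepts("11"))      # True, False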
Eliminating ε Transitions
NFA with ε can be converted to NFA without ε, and this NFA without ε can be
converted to DFA. To do this, we will use a method, which can remove all the ε
transition from given NFA. The method will be:
1. Find out all the ε transitions from each state of Q. This gives
ε-closure(qi) for every qi ∈ Q.
2. Then the δ' transitions can be obtained. The δ' transitions mean an ε-closure on δ
moves.
3. Repeat Step-2 for each input symbol and each state of given NFA.
4. Using the resultant states, the transition table for equivalent NFA without ε
can be built.
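A small Python sketch of Step 1 (computing ε-closures); the dictionary representation of the ε-moves and the names used here are assumptions for illustration, not taken from the notes.

def epsilon_closure(state, eps_moves):
    """Return the set of states reachable from `state` using only ε-moves."""
    closure, stack = {state}, [state]
    while stack:
        q = stack.pop()
        for r in eps_moves.get(q, set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

# Hypothetical ε-moves matching the example below: q1 --ε--> q2, no other ε-moves.
eps_moves = {'q1': {'q2'}}
print(epsilon_closure('q1', eps_moves))   # {'q1', 'q2'}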
Example:
Convert the following NFA with ε to NFA without ε.
ε-closure(q0) = {q0}
ε-closure(q1) = {q1, q2}
ε-closure(q2) = {q2}
δ'(q0, a) = {q0, q1}
δ'(q0, b) = Ф
δ'(q1, a) = Ф
δ'(q1, b) = {q2}
δ'(q2, a) = Ф
δ'(q2, b) = {q2}
States | a | b
→q0 | {q0, q1} | Ф
*q1 | Ф | {q2}
*q2 | Ф | {q2}
States q1 and q2 become final states because the ε-closure of q1 and of q2 contains the final
state q2. The NFA can be shown by the following transition diagram:
Let M = (Q, ∑, δ, q0, F) be an NFA which accepts the language L(M). There is an
equivalent DFA, denoted M' = (Q', ∑', δ', q0', F'), such that L(M) = L(M').
Step 1: Initially, Q' = ϕ.
Step 2: Add q0 of the NFA to Q'. Then find the transitions from this start state.
Step 3: In Q', find the possible set of states for each input symbol. If this set of states is
not in Q', then add it to Q'.
Step 4: In DFA, the final state will be all the states which contain F(final states of NFA)
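The four steps amount to the usual subset construction. Below is a rough Python sketch, under the assumption that the NFA is encoded as a dictionary from (state, symbol) pairs to sets of states; the encoding and the function name are ours, not the notes'.

from collections import deque

def nfa_to_dfa(nfa_delta, start, finals, alphabet):
    """Return (dfa_delta, dfa_start, dfa_finals) with frozensets as DFA states."""
    dfa_start = frozenset({start})
    dfa_delta, dfa_finals = {}, set()
    queue, seen = deque([dfa_start]), {dfa_start}
    while queue:
        S = queue.popleft()
        if S & finals:                       # Step 4: contains a final state of the NFA
            dfa_finals.add(S)
        for a in alphabet:                   # Step 3: one move per input symbol
            T = frozenset(q2 for q in S for q2 in nfa_delta.get((q, a), set()))
            dfa_delta[(S, a)] = T
            if T not in seen:                # new set of states becomes a new DFA state
                seen.add(T)
                queue.append(T)
    return dfa_delta, dfa_start, dfa_finals

# Example 1's NFA, as read from its transition table:
nfa = {('q0', '0'): {'q0'}, ('q0', '1'): {'q1'},
       ('q1', '0'): {'q1'}, ('q1', '1'): {'q1', 'q2'},
       ('q2', '0'): {'q2'}, ('q2', '1'): {'q1', 'q2'}}
print(nfa_to_dfa(nfa, 'q0', {'q2'}, '01')[2])   # the final DFA states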
Example 1:
Convert the NFA to DFA.
Solution: For the given transition diagram we will first construct the transition table.
State | 0 | 1
→q0 | q0 | q1
q1 | q1 | {q1, q2}
*q2 | q2 | {q1, q2}
State | 0 | 1
[q1] | [q1] | [q1, q2]
*[q2] | [q2] | [q1, q2]
Example 2:
Convert the given NFA to DFA.
Solution: For the given transition diagram we will first construct the transition
table.
State | 0 | 1
*q1 | ϕ | {q0, q1}
Now we will obtain δ' transition for state q0.
δ'([q1], 0) = ϕ
δ'([q1], 1) = [q0, q1]
Similarly,
As in the given NFA, q1 is a final state, then in DFA wherever, q1 exists that
state becomes a final state. Hence in the DFA, final states are [q1] and [q0, q1].
Therefore set of final states F = {[q1], [q0, q1]}.
Suppose A = [q0], B = [q1] and C = [q0, q1]. The transition table can then be rewritten
in terms of the states A, B and C:
State | 0 | 1
NFA with ε moves: If an FA contains any ε transition (ε move), the finite automaton is
called an NFA with ε moves.
ε-closure: ε-closure for a given state A means a set of states which can be reached from
the state A with only ε(null) move including the state A itself.
Step 2: For each input symbol, find the states that can be reached from the present state,
i.e., the union of the transition values and their ε-closures for each state of the NFA present
in the current state of the DFA.
Step 3: If we found a new state, take it as current state and repeat step 2.
Step 4: Repeat Step 2 and Step 3 until there is no new state present in the transition table
of DFA.
Now,
For state C:
Example 2:
Convert the given NFA into its equivalent DFA.
As A = {q0, q1, q2} contains the final state q2, A is a final state. B = {q1, q2} contains
q2, so B is also a final state. C = {q2} contains q2, so C is also a final state.
Moore Machine
A Moore machine is a finite state machine in which the next state is decided by the
current state and the current input symbol. The output symbol at a given time
depends only on the present state of the machine. A Moore machine can be
described by a 6-tuple (Q, q0, ∑, O, δ, λ) where,
Example 1:
The state diagram for Moore Machine is
In the above Moore machine, the output is represented with each input state separated by
/. The output length for a Moore machine is greater than input by 1.
Input: 010
Output: 1110(1 for q0, 1 for q1, again 1 for q1, 0 for q2)
Example 2:
Design a Moore machine to generate 1's complement of a given binary number.
Solution: To generate 1's complement of a given binary number the simple logic is that if
the input is 0 then the output will be 1 and if the input is 1 then the output will be 0. That
means there are three states. One state is start state. The second state is for taking 0's as
input and produces output as 1. The third state is for taking 1's as input and producing
output as 0.
Input:       1   0   1   1
State:  q0   q2  q1  q2  q2
Output: 0    0   1   0   0
Thus we get 00100 as the 1's complement of 1011; neglecting the initial 0, the
output we get is 0100, which is the 1's complement of 1011. The
transition table is as follows:
Thus the Moore machine is M = (Q, q0, ∑, O, δ, λ), where Q = {q0, q1, q2}, ∑ = {0, 1},
O = {0, 1}; the transition table shows the δ and λ functions.
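A sketch in Python of this 1's-complement Moore machine. The transitions not visible in the example run above (from q0 and q1 on input 0) are assumptions read off the stated logic (input 0 goes to the state that outputs 1, input 1 to the state that outputs 0); the dictionary encoding is ours.

# λ attaches the output to the state itself (Moore machine).
delta = {('q0', '0'): 'q1', ('q0', '1'): 'q2',
         ('q1', '0'): 'q1', ('q1', '1'): 'q2',
         ('q2', '0'): 'q1', ('q2', '1'): 'q2'}
lam = {'q0': '0', 'q1': '1', 'q2': '0'}

def moore_run(w, start='q0'):
    state, out = start, lam[start]           # an output is emitted for the start state too
    for symbol in w:
        state = delta[(state, symbol)]
        out += lam[state]
    return out

print(moore_run("1011"))   # '00100'; dropping the leading 0 gives 0100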
Mealy Machine
A Mealy machine is a machine in which the output symbol depends upon the present
input symbol and the present state of the machine. In the Mealy machine, the output
is represented with each input symbol for each state, separated by /. The Mealy
machine can be described by a 6-tuple (Q, q0, ∑, O, δ, λ') where
Example 1:
Design a Mealy machine for a binary input sequence such that if the input has the substring 101 the
machine outputs A, if the input has the substring 110 it outputs B, and otherwise it outputs C.
Solution: For designing such a machine, we will check two conditions, and those are 101
and 110. If we get 101, the output will be A. If we recognize 110, the output will be B. For
other strings the output will be C.
Now we will insert the possibilities of 0's and 1's for each state. Thus the Mealy machine
becomes:
Example 2:
Design a Mealy machine that scans a sequence of inputs of 0 and 1 and generates output 'A' if
the input string terminates in 00, output 'B' if the string terminates in 11, and output 'C'
otherwise.
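One possible Mealy machine for Example 2, sketched in Python. The state names (s_start, s0, s1, each remembering the last symbol read) and the (state, input) → (next state, output) encoding are assumptions for illustration, not the machine drawn in the notes.

delta = {('s_start', '0'): ('s0', 'C'), ('s_start', '1'): ('s1', 'C'),
         ('s0', '0'): ('s0', 'A'), ('s0', '1'): ('s1', 'C'),
         ('s1', '0'): ('s0', 'C'), ('s1', '1'): ('s1', 'B')}

def mealy_run(w, start='s_start'):
    """In a Mealy machine the output is attached to (state, input) pairs."""
    state, out = start, ''
    for symbol in w:
        state, o = delta[(state, symbol)]
        out += o
    return out

print(mealy_run("0100"))   # 'CCCA' - the last output is A because the input ends in 00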
The following steps are used for converting Mealy machine to the Moore machine:
Step 1: For each state (Qi), calculate the number of different outputs that are available in
the transition table of the Mealy machine.
Step 2: Copy state Qi if all the outputs of Qi are the same. Break Qi into n states Qi1, …, Qin if
it has n distinct outputs, where n ≥ 2.
Step 3: If the output of initial state is 0, insert a new initial state at the starting which gives
1 output.
Example 1:
Convert the following Mealy machine into equivalent Moore machine.
Solution:
o For state q1, there is only one incident edge, with output 0. So, we don't need to split
this state in the Moore machine.
o For state q2, there are 2 incident edges, with outputs 0 and 1. So, we will split this state
into two states q20 (state with output 0) and q21 (state with output 1).
o For state q3, there are 2 incident edges, with outputs 0 and 1. So, we will split this state
into two states q30 (state with output 0) and q31 (state with output 1).
o For state q4, there is only one incident edge, with output 0. So, we don't need to split
this state in the Moore machine.
Example 2:
Convert the following Mealy machine into equivalent Moore machine.
Solution:
The state q1 has only one output. The states q2 and q3 each have both outputs 0 and 1, so we will
create two states for each of them. For q2, the two states will be q20 (with output 0) and
q21 (with output 1). Similarly, for q3 the two states will be q30 (with output 0) and q31 (with
output 1).
We cannot directly convert a Moore machine to its equivalent Mealy machine because the
output of the Moore machine is one symbol longer than that of the Mealy machine for the same
input. To convert a Moore machine to a Mealy machine, the state output symbols are distributed
onto the input symbol paths. We are going to use the following method to convert the Moore
machine to a Mealy machine.
Example 1:
Convert the following Moore machine into its equivalent Mealy machine.
Solution:
Q  | a  | b  | Output (λ)
q0 | q0 | q1 | 0
q1 | q0 | q1 | 1
Hence the transition table for the Mealy machine can be drawn as follows:
Note: The length of output sequence is 'n+1' in Moore machine and is 'n' in the
Mealy machine.
Example 2:
Convert the given Moore machine into its equivalent Mealy machine.
Solution:
Q  | a  | b  | Output (λ)
q0 | q1 | q0 | 0
q1 | q1 | q2 | 0
q2 | q1 | q0 | 1
Hence the transition table for the Mealy machine can be drawn as follows:
UNIT - II
Regular Expressions:
Let r and s be regular expressions that represent the sets R and S, respectively.
– r + s represents the set R ∪ S (precedence 3)
– rs represents the set RS (precedence 2)
– r* represents the set R* (highest precedence)
– (r) represents the set R (not an operator; provides precedence)
Identities:
1. Øu = uØ = Ø (multiply by 0)
2. εu = uε = u (multiply by 1)
3. Ø* = ε
4. ε* = ε
5. u + v = v + u
6. u + Ø = u
7. u + u = u
8. u* = (u*)*
9. u(v + w) = uv + uw
10. (u + v)* = (u + vu*)* = (u*v*)* = u*(vu*)* = (u*v)*u*
In an NFA with ε-transitions, an ε-transition is always possible, since epsilon (the empty
string) can be said to exist between any two input symbols. We can show that such epsilon
transitions are a notational convenience; for every FA with epsilon transitions there is a
corresponding FA without them.
In this article, we will see some popular regular expressions and how we can
convert them to finite automata.
• Even number of a’s: The regular expression for an even number of a’s is
(b|ab*ab*)*. We can construct a finite automaton as shown in Figure 1 (a code
sketch of this automaton appears after these examples).
The above automaton will accept all strings which have an even number of a’s.
For zero a’s, it will be in q0, which is a final state. For one ‘a’, it will go from
q0 to q1 and the string will not be accepted. For two a’s at any positions, it
will go from q0 to q1 for the 1st ‘a’ and from q1 to q0 for the second ‘a’. So, it will
accept all strings with an even number of a’s.
• String with ‘ab’ as substring : The regular expression for strings with ‘ab’
as substring is (a|b)*ab(a|b)*. We can construct finite automata as shown
in Figure 2.
• The above automaton will accept all strings which have ‘ab’ as a substring. The
automaton will remain in the initial state q0 for b’s. It will move to q1 after
reading ‘a’ and remain in the same state for all a’s afterwards. Then it will
move to q2 if ‘b’ is read. That means the string has read ‘ab’ as a substring once
it reaches q2.
• String with count of ‘a’ divisible by 3: The language of strings whose count of a’s
is divisible by 3 is {a3n | n ≥ 0}, described by the regular expression (aaa)*. We can
construct an automaton as shown in Figure 3.
The above automaton will accept all strings of the form a3n. The automaton will
remain in the initial state q0 for ɛ, which will be accepted. For the string ‘aaa’, it will
move from q0 to q1, then q1 to q2, and then q2 to q0. For every set of three
a’s, it will come back to q0, hence accepted. Otherwise, it will be in q1 or q2,
hence rejected.
• Binary numbers divisible by 3: The above automaton will accept all binary numbers
divisible by 3. For 1001, the automaton will go from q0 to q1, then q1 to q2, then q2 to q1,
and finally q1 to q0, hence accepted. For 0111, the automaton will go from q0 to q0, then
q0 to q1, then q1 to q0, and finally q0 to q1, hence rejected.
• String with regular expression (111 + 11111)*: The strings accepted by
this regular expression have counts of 1 equal to 0, 3, 5, 6 (111 twice), 8 (11111 once and
111 once), 9 (111 thrice), 10 (11111 twice), and every larger count.
The DFA corresponding to the given regular expression is given in Figure 5.
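As promised above, here is a small Python sketch of two of these automata written as transition dictionaries. The state names (q0/q1 and r0/r1/r2) and the helper function are assumptions for illustration; only the languages themselves come from the examples.

# Even number of a's: q0 = "even so far" (accepting), q1 = "odd so far".
even_a = {('q0', 'a'): 'q1', ('q0', 'b'): 'q0',
          ('q1', 'a'): 'q0', ('q1', 'b'): 'q1'}

# Binary numbers divisible by 3: state ri means "value read so far ≡ i (mod 3)".
div3 = {('r0', '0'): 'r0', ('r0', '1'): 'r1',
        ('r1', '0'): 'r2', ('r1', '1'): 'r0',
        ('r2', '0'): 'r1', ('r2', '1'): 'r2'}

def run(delta, start, finals, w):
    state = start
    for c in w:
        state = delta[(state, c)]
    return state in finals

print(run(even_a, 'q0', {'q0'}, "abab"))    # True: two a's
print(run(div3, 'r0', {'r0'}, "1001"))      # True: 1001 is 9, which is divisible by 3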
• Note: Throughout the following, keep in mind that a string is accepted by an NFA-ε if
there exists a path from the start state to a final state.
• Lemma 1: Let r be a regular expression. Then there exists an NFA-ε M such that L(M) =
L(r). Furthermore, M has exactly one final state with no transitions out of it.
• Proof: (by induction on the number of operators, denoted by OP(r), in r).
• Basis: OP(r) = 0
• Inductive Hypothesis: Suppose there exists a k ≥ 0 such that for any regular expression r
with 0 ≤ OP(r) ≤ k, there exists an NFA-ε M such that L(M) = L(r). Furthermore, suppose that M
has exactly one final state.
• Inductive Step: Let r be a regular expression with k + 1 operators (OP(r) = k + 1), where
k + 1 >= 1.
Case 1) r = r1 + r2
Since OP(r) = k +1, it follows that 0<= OP(r1), OP(r2) <= k. By the inductive hypothesis there
exist NFA-ε machines M1 and M2 such that L(M1) = L(r1) and L(M2) = L(r2). Furthermore,
both M1 and M2 have exactly one final state.
Case 2) r = r1r2
Since OP(r) = k+1, it follows that 0 <= OP(r1), OP(r2) <= k. By the inductive
hypothesis there exist NFA-ε machines M1 and M2 such that L(M1) = L(r1) and
L(M2) = L(r2). Furthermore, both M1 and M2 have exactly one final state.
Case 3) r = r1*
Since OP(r) = k+1, it follows that 0<= OP(r1) <= k. By the inductive hypothesis there exists an NFA-ε
machine M1 such that L(M1) = L(r1). Furthermore, M1 has exactly one final state.
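The three cases can be turned directly into code. Below is a compact Python sketch of this Thompson-style construction, keeping the lemma's invariant that every machine has one start state and exactly one final state with no transitions out of it. The dictionary representation (with '' playing the role of ε) and the function names are illustrative assumptions, not from the notes.

import itertools
_ids = itertools.count()

def _new():
    return next(_ids)          # fresh state id

def symbol_nfa(a):
    s, f = _new(), _new()
    return ({(s, a): {f}}, s, f)

def union(m1, m2):             # Case 1: r = r1 + r2
    (d1, s1, f1), (d2, s2, f2) = m1, m2
    s, f = _new(), _new()
    d = {**d1, **d2}
    d[(s, '')] = {s1, s2}      # new start branches by ε into either machine
    d[(f1, '')] = {f}
    d[(f2, '')] = {f}          # old finals ε-move to the single new final state
    return (d, s, f)

def concat(m1, m2):            # Case 2: r = r1 r2
    (d1, s1, f1), (d2, s2, f2) = m1, m2
    d = {**d1, **d2}
    d[(f1, '')] = {s2}         # glue M1's final state to M2's start state
    return (d, s1, f2)

def star(m1):                  # Case 3: r = r1*
    d1, s1, f1 = m1
    s, f = _new(), _new()
    d = dict(d1)
    d[(s, '')] = {s1, f}       # either enter the loop or accept ε immediately
    d[(f1, '')] = {s1, f}      # repeat the loop or leave it
    return (d, s, f)

# r = 0(0+1)* as in the example below:
m = concat(symbol_nfa('0'), star(union(symbol_nfa('0'), symbol_nfa('1'))))
print(len(m[0]), "transition entries in the constructed NFA-ε")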
Examples:
Problem: Give an NFA-ε for the regular expression r = 0(0+1)*.
Solution:
r = r1r2, where r1 = 0 and r2 = (0+1)*
r2 = r3*, where r3 = 0+1
r3 = r4 + r5, where r4 = 0 and r5 = 1
Transition graph:
1. Let M = (Q, Σ, δ, q1, F) be a DFA with state set Q = {q1, q2, …, qn},
and define: Ri,j = { x | x is in Σ* and δ(qi, x) = qj }.
Ri,j is the set of all strings that define a path in M from qi to qj.
Observations:
Basis: k = 0
R0i,j contains single symbols, one for each transition from qi to qj, and possibly ε if i = j.
r0i,j = Ø when there are no transitions from qi to qj (and i ≠ j).
Inductive Hypothesis:
Suppose that Rk-1i,j can be represented by the regular expression rk-1i,j for all
1 ≤ i, j ≤ n, and some k ≥ 1.
Inductive Step: Rki,j can be written as Rk-1i,j ∪ Rk-1i,k (Rk-1k,k)* Rk-1k,j, so it is
represented by the regular expression rki,j = rk-1i,j + rk-1i,k (rk-1k,k)* rk-1k,j.
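A small Python sketch of this recurrence, assuming the DFA's transitions are given as delta[(i, symbol)] = j with states numbered 1..n. Regular expressions are built as plain strings, so the result is correct but unsimplified; all names here are ours.

def dfa_to_regex(delta, n, start, finals, alphabet):
    # r[(i, j)] holds r^k_{i,j}; None stands for Ø.
    r = {}
    for i in range(1, n + 1):                       # basis: k = 0
        for j in range(1, n + 1):
            syms = [a for a in alphabet if delta.get((i, a)) == j]
            if i == j:
                syms.append('ε')
            r[(i, j)] = '+'.join(syms) if syms else None
    for k in range(1, n + 1):                       # inductive step, k = 1..n
        new = {}
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                through_k = None
                if r[(i, k)] and r[(k, j)]:
                    loop = f"({r[(k, k)]})*" if r[(k, k)] else ''
                    through_k = f"({r[(i, k)]}){loop}({r[(k, j)]})"
                parts = [p for p in (r[(i, j)], through_k) if p]
                new[(i, j)] = '+'.join(parts) if parts else None
        r = new
    exprs = [r[(start, f)] for f in finals if r[(start, f)]]
    return '+'.join(exprs) if exprs else 'Ø'

# Hypothetical two-state DFA over {a, b} accepting strings that end in b:
delta = {(1, 'a'): 1, (1, 'b'): 2, (2, 'a'): 1, (2, 'b'): 2}
print(dfa_to_regex(delta, 2, 1, {2}, 'ab'))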
Regular expressions are used in text search and, more generally, in string processing,
where the data need not be textual. Different regular expressions (and different finite
automata) may describe the same language.
Specifically, the pumping lemma says that for any regular language L there exists
a constant p such that any word w in L with length at least p can be split into
three substrings, w = xyz, where the middle portion y must not be empty, such
that the words xz, xyz, xyyz, xyyyz, … constructed by repeating y zero or more
times are still in L. This process of repetition is known as "pumping". Moreover,
the pumping lemma guarantees that the length of xy will be at most p, imposing a
limit on the ways in which w may be split. Finite languages vacuously satisfy the
pumping lemma by having p equal to the maximum string length in L plus one.
1. |y| ≥ 1,
2. |xy| ≤ p, and
3. xynz ∈ L for all n ≥ 0.
y is the substring that can be pumped (removed or repeated any number of times,
and the resulting string is always in L). (1) means the loop y to be pumped must
be of length at least one; (2) means the loop must occur within the first p
characters. |x| must be smaller than p (conclusion of (1) and (2)), but apart from
that, there is no restriction on x and z.
In simple words, for any regular language L, any sufficiently long word w (in L)
can be split into 3 parts, i.e. w = xyz, such that all the strings xynz for n ≥ 0 are also
in L.
The pumping lemma is often used to prove that a particular language is non-
regular: a proof by contradiction (of the language's regularity) may consist of
exhibiting a word (of the required length) in the language that lacks the property
outlined in the pumping lemma.
For example, the language L = {anbn : n ≥ 0} over the alphabet Σ = {a, b} can be
shown to be non-regular as follows. Let w, x, y, z, p, and i be as used in the formal
statement for the pumping lemma above. Let w in L be given by w = apbp. By the
pumping lemma, there must be some decomposition w = xyz with |xy| ≤ p and |y| ≥
1 such that xyiz is in L for every i ≥ 0. Using |xy| ≤ p, we know y only consists of
instances of a. Moreover, because |y| ≥ 1, it contains at least one instance of the
letter a. We now pump y up: xy2z has more instances of the letter a than the letter b,
since we have added some instances of a without adding instances of b. Therefore,
xy2z is not in L. We have reached a contradiction. Therefore, the assumption that L
is regular must be incorrect. Hence L is not regular.
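The exhaustive case check behind this argument can be illustrated in a few lines of Python (a demonstration for one concrete p, not a proof); the function name and the choice p = 5 are ours.

def in_L(s):
    n = len(s) // 2
    return s == 'a' * n + 'b' * n          # membership in L = { a^n b^n }

p = 5
w = 'a' * p + 'b' * p
for x_len in range(0, p):
    for y_len in range(1, p - x_len + 1):  # every split with |xy| <= p and |y| >= 1
        x, y, z = w[:x_len], w[x_len:x_len + y_len], w[x_len + y_len:]
        assert not in_L(x + y * 2 + z)     # y lies inside the a's, so pumping breaks membership
print("every allowed split of w = a^p b^p fails when pumped (i = 2)")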
The proof that the language of balanced (i.e., properly nested) parentheses is not
regular follows the same idea. Given p, there is a string of balanced parentheses
that begins with more than p left parentheses, so that y will consist entirely of left
parentheses. By repeating y, we can produce a string that does not contain the same
number of left and right parentheses, and so they cannot be balanced.
For every regular language there is a finite state automaton (FSA) that accepts the
language. The number of states in such an FSA are counted and that count is used
as the pumping length p. For a string of length at least p, let q0 be the start state
and let q1, ..., qp be the sequence of the next p states visited as the string is emitted.
Because the FSA has only p states, within this sequence of p + 1 visited states
there must be at least one state that is repeated. Write qs for such a state. The
transitions that take the machine from the first encounter of state qs to the second
encounter of state qs match some string. This string is called y in the lemma, and
since the machine will match a string without the y portion, or with the string y
repeated any number of times, the conditions of the lemma are satisfied.
The FSA accepts the string: abcd. Since this string has a length at least as large as
the number of states, which is four, the pigeonhole principle indicates that there
must be at least one repeated state among the start state and the next four visited
states. In this example, only q1 is a repeated state. Since the substring bc takes the
machine through transitions that start at state q1 and end at state q1, that portion
could be repeated and the FSA would still accept, giving the string abcbcd.
Alternatively, the bc portion could be removed and the FSA would still accept
giving the string ad. In terms of the pumping lemma, the string abcd is broken
into an x portion a, a y portion bc and a z portion d.
General version: for any regular language L there exists a constant p ≥ 1
such that every string uwv in L with |w| ≥ p can be written in the form uwv = uxyzv,
with |xy| ≤ p and |y| ≥ 1, such that uxyizv is in L for every i ≥ 0.
From this, the above standard version follows as a special case, with both u and v
being the empty string.
Since the general version imposes stricter requirements on the language, it can
be used to prove the non-regularity of many more languages, such as { ambncn :
m ≥ 1 and n ≥ 1 }.[6]
While the pumping lemma states that all regular languages satisfy the conditions
described above, the converse of this statement is not true: a language that
satisfies these conditions may still be non-regular. In other words, both the
original and the general version of the pumping lemma give a necessary but not
sufficient condition for a language to be regular.
In other words, L contains all strings over the alphabet {0,1,2,3} with a substring
of length 3 including a duplicate character, as well as all strings over this
alphabet where precisely 1/7 of the string's characters are 3's. This language is
not regular but can still be "pumped" with p = 5. Suppose some string s has
length at least 5. Then, since the alphabet has only four characters, at least two of
the first five characters in the string must be duplicates. They are separated by at
most three characters.
Myhill-Nerode theorem.
Not all languages are regular. For example, the language L = {anbn : n ≥ 0} is
not regular. Similarly, the language {ap : p is a prime number} is not regular. A
pertinent question therefore is how do we know if a language is not regular.
Question: Can we conclude that a language is not regular if no one could come
up with a DFA, NFA, ε-NFA, regular expression or regular grammar so far?
- No. Since, someone may very well come up with any of these in future.
We need a property that just holds for regular languages and so we can prove
that any language without that property is not regular. Let's recall some of the
properties.
• Since the states are finite, if the automaton has no loop, the language
would be finite.
- Any finite language is indeed a regular language since we can express the
language using the regular expression: S1 + S2 + ... + SN, where N is the
total number of strings accepted by the automaton.
- If the automaton has a loop, the language would be infinite, because we can loop
around any number of times and keep producing more and more strings.
- This property is called the pumping property (elaborated below).
Any finite automaton with a loop can be divided into three parts.
For example consider the following DFA. It accepts all strings that start with aba followed by
any number of baa's and finally ending with ba.
Investigating this further, we can say that any string w accepted by this DFA can be written as
w = x yn z, where x = aba, y = baa (the loop) and z = ba.
6. What is the shortest string accepted if there are more final states? Say q2 is
final: ab, of length 2.
7. What is the longest string accepted by the DFA without going through the
loop even once? ababa (= xz). So, any string of length > 5 accepted by DFA
must go through the loop at least once.
8. What is the longest string accepted by the DFA by going through the loop
exactly once? ababaaba (= xyz) of length 8. We call this pumping length.
More precisely, pumping length is an integer p denoting the length of the string
w such that w is obtained by going through the loop exactly once. In other words,
|w| = |xyz| = p.
1. |y| > 0
2. |xy| ≤ p
3. xynz ∈ L for all n ≥ 0
Before proving L is not regular using pumping property, let's see why we can't
come up with a DFA or regular expression for L.
It may be tempting to use the regular expression a*b* to describe L. No doubt, a*b* generates
these strings. However, it is not appropriate since it also generates other strings not in L, such
as a, b, and so on.
Let's try to come up with a DFA. Since it has to accept ε, the start state has to be final. The following
DFA can accept anbn for n ≤ 3, i.e. {ε, ab, aabb, aaabbb}.
We know that w can be broken into three terms xyz such that y ≠ ε and xyiz ∈ L.
Case 1: y consists of only a's. Then xy2z has more a's than b's and does not belong to L.
Case 2: y consists of only b's. Then xy2z has more b's than a's and does not belong to L.
Case 3: y contains both a's and b's. Then xy2z has a's and b's out of order and does not belong to L.
Since none of the 3 cases hold, the pumping property does not hold for L. And
therefore L is not regular.
|w| = 2p + 2 ≥ p
Therefore, pumping property does not hold for L. Hence, L is not regular.
We know that w can be broken into three terms xyz such that y ≠ ε and xyiz ∈ L.
|xyq+1z| = |xyz yq|
         = |xyz| + |yq|
         = q + q·|y|
         = q(1 + |y|), which is a composite number.
Exercises
Show that the following languages are not regular.
4. L = { anbm : n ≠ m }
5. L = { anbm : n > m }
6. L = { w : na(w) = nb(w) }
7. L = { ww : w ∈ {a,b}* }
8. L = { a^(n^2) : n > 0 } (strings of a's whose length is a perfect square)
IMPORTANT NOTE
"Kleene's theorem" redirects here. For his theorems for recursive functions, see
Kleene's recursion theorem.
Regular languages are very useful in input parsing and programming language
design.
• The empty language Ø, and the empty string language {ε} are regular
languages.
• For each a ∈ Σ (a belongs to Σ), the singleton language {a} is a regular
language.
• If A and B are regular languages, then A ∪ B (union), A • B
(concatenation), and A* (Kleene star) are regular languages.
• No other languages over Σ are regular.
See regular expression for its syntax and semantics. Note that the above cases
are in effect the defining rules of regular expression.
Examples
All finite languages are regular; in particular the empty string language {ε} = Ø*
is regular. Other typical examples include the language consisting of all strings
over the alphabet {a, b} which contain an even number of as, or the language
consisting of all strings of the form: several as followed by several bs.
A simple example of a language that is not regular is the set of strings { anbn | n
≥ 0 }.[4] Intuitively, it cannot be recognized with a finite automaton, since a
finite automaton has finite memory and it cannot remember the exact number of
a's. Techniques to prove this fact rigorously are given below.
Equivalent formalisms
Properties 9. and 10. are purely algebraic approaches to define regular languages;
a similar set of statements can be formulated for a monoid M⊂Σ*. In this case,
equivalence over M leads to the concept of a recognizable language.
Some authors use one of the above properties, different from "1.", as an alternative
definition of regular languages.
Some of the equivalences above, particularly those among the first four
formalisms, are called Kleene's theorem in textbooks. Precisely which one (or
which subset) is called such varies between authors. One textbook calls the
equivalence of regular expressions and NFAs ("1." and "2." above) "Kleene's
theorem".[6] Another textbook calls the equivalence of regular expressions and
DFAs ("1." and "3." above) "Kleene's theorem".[7] Two other textbooks first
prove the expressive equivalence of NFAs and DFAs ("2." and "3.") and then
state "Kleene's theorem" as the equivalence between regular expressions and
finite automata (the latter said to describe "recognizable languages").[2][8] A
linguistically oriented text first equates regular grammars ("4." above) with
DFAs and NFAs, calls the languages generated by (any of) these "regular", after
which it introduces regular expressions which it terms to describe "rational
languages", and finally states "Kleene's theorem" as the coincidence of regular
and rational languages.[9] Other authors simply define "rational expression" and
"regular expressions" as synonymous and do the same with "rational languages"
and "regular languages".[1][2]
Closure properties
The regular languages are closed under various operations; that is, if the
languages K and L are regular, so is the result of operations such as union, intersection,
complement, concatenation and Kleene star.
Complexity results
To locate the regular languages in the Chomsky hierarchy, one notices that
every regular language is context-free. The converse is not true: for
example the language consisting of all strings having the same number of
a's as b's is context-free but not regular. To prove that a language such as
this is not regular, one often uses the Myhill–Nerode theorem or the
pumping lemma among other methods.[23]
Let sL(n) denote the number of words of length n in L. The ordinary generating function
for L is the formal power series SL(z) = Σn≥0 sL(n) zn.
The number of words of length 2n in the Dyck language is equal to the Catalan number Cn, which is
not of the form p(n)λn, witnessing the non-regularity of the Dyck language. Care must
be taken since some of the eigenvalues could have the same magnitude. For
example, the number of words of length n in the language of all even-length binary
words is not of this form, but the numbers of words of even or of odd length
are; the corresponding eigenvalues are 2 and −2. In general, for every
regular language there exists a constant λ such that for all n, the number of
words of length n is asymptotically of this form. The zeta function of a language L is
ζL(z) = exp( Σn≥1 sL(n) zn / n ).
The zeta function of a regular language is not in general rational, but that of a
cyclic language is.
Generalizations
The notion of a regular language has been generalized to infinite words (see ω-
automata) and to trees (see tree automaton).
Rational set generalizes the notion (of regular/rational language) to monoids that
are not necessarily free. Likewise, the notion of a recognizable language (by a
finite automaton) has namesake as recognizable set over a monoid that is not
necessarily free. Howard Straubing notes in relation to these facts that “The term
"regular language" is a bit unfortunate. Papers influenced by Eilenberg's
monograph[35] often use either the term "recognizable language", which refers to
the behavior of automata, or "rational language", which refers to important
analogies between regular expressions and rational power series. (In fact,
Eilenberg defines rational and recognizable subsets of arbitrary monoids; the two
notions do not, in general, coincide.) This terminology, while better motivated,
never really caught on, and "regular language" is used almost universally.”[36]
Input − DFA
Step 3 − We will try to mark the state pairs, with green colored check mark,
transitively. If we input 1 to state ‘a’ and ‘f’, it will go to state ‘c’ and ‘f’
respectively. (c, f) is already marked, hence we will mark pair (a, f). Now, we
input 1 to state ‘b’ and ‘f’; it will go to state ‘d’ and ‘f’ respectively. (d, f) is
already marked, hence we will mark pair (b, f).
After step 3, we have got state combinations {a, b} {c, d} {c, e} {d, e} that
are unmarked.
So the final minimized DFA will contain three states {f}, {a, b} and {c, d, e}
If X and Y are two states in a DFA, we can combine these two states into {X, Y} if they are not
distinguishable. Two states are distinguishable, if there is at least one string S, such that one of δ
(X, S) and δ (Y, S) is accepting and another is not accepting. Hence, a DFA is minimal if and
only if all the states are distinguishable.
Algorithm 3
Step 1 − All the states Q are divided in two partitions − final states and non-final states and are
denoted by P0. All the states in a partition are 0th equivalent. Take a counter k and initialize it
with 0.
Step 2 − Increment k by 1. For each partition in Pk-1, divide its states into two partitions if
they are k-distinguishable. Two states X and Y within a partition are k-distinguishable if there
is an input S such that δ(X, S) and δ(Y, S) are (k-1)-distinguishable.
Step 3 − If Pk ≠ Pk-1, repeat Step 2; otherwise, go to Step 4.
Step 4 − Combine the kth equivalent sets and make them the new states of the reduced DFA.
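A Python sketch of this partition-refinement algorithm, run on the example DFA below; the dictionary encoding of δ and the function name are illustrative assumptions.

def minimize(states, alphabet, delta, finals):
    # P0: final states vs non-final states
    partition = [set(finals), set(states) - set(finals)]
    while True:
        def block(q):                        # index of the block containing state q
            return next(i for i, b in enumerate(partition) if q in b)
        new_partition = []
        for b in partition:
            groups = {}
            for q in b:                      # states are k-distinguishable if their successors
                key = tuple(block(delta[(q, a)]) for a in alphabet)   # land in different blocks
                groups.setdefault(key, set()).add(q)
            new_partition.extend(groups.values())
        if len(new_partition) == len(partition):   # Pk == Pk-1: stop refining
            return new_partition
        partition = new_partition

delta = {('a','0'):'b',('a','1'):'c',('b','0'):'a',('b','1'):'d',
         ('c','0'):'e',('c','1'):'f',('d','0'):'e',('d','1'):'f',
         ('e','0'):'e',('e','1'):'f',('f','0'):'f',('f','1'):'f'}
print(minimize('abcdef', '01', delta, {'c', 'd', 'e'}))   # [{c,d,e}, {a,b}, {f}]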
Example
q δ(q,0) δ(q,1)
a b c
b a d
c e f
d e f
e e f
f f f
• P0 = {(c,d,e), (a,b,f)}
• P1 = {(c,d,e), (a,b),(f)}
• P2 = {(c,d,e), (a,b),(f)}
Hence, P1 = P2.
There are three states in the reduced DFA. The reduced DFA is as follows −
Q | δ(q,0) | δ(q,1)
(a, b) | (a, b) | (c, d, e)
(c, d, e) | (c, d, e) | (f)
(f) | (f) | (f)
Example
Representation Technique
Top-down Approach − Starts with the start symbol S and goes down to the tree leaves using
productions.
Bottom-up Approach − Starts from the tree leaves and proceeds upward to the root, which is the
start symbol S.
The derivation or the yield of a parse tree is the final string obtained by
concatenating the labels of the leaves of the tree from left to right, ignoring the
Nulls. However, if all the leaves are Null, derivation is Null.
Example
A partial derivation tree is a sub-tree of a derivation tree/parse tree such that, for every node,
either all of its children are in the sub-tree or none of them are.
Example
Example
Context-Free Grammars
A context-free grammar (CFG) is a set of recursive rewriting rules (or productions)
used to generate patterns of strings.
• a set of terminal symbols, which are the characters of the alphabet that
appear in the strings generated by the grammar.
• a set of nonterminal symbols, which are placeholders for patterns of terminal
symbols that can be generated by the nonterminal symbols.
• a set of productions, which are rules for replacing (or rewriting) nonterminal
symbols (on the left side of the production) in a string with other
nonterminal or terminal symbols (on the right side of the production).
• a start symbol, which is a special nonterminal symbol that appears in the
initial string generated by the grammar.
The first rule (or production) states that an <expression> can be rewritten as (or
replaced by) a number. In other words, a number is a valid expression.
The remaining rules say that the sum, difference, product, or division of two
<expression>s is also an expression.
In our grammar for arithmetic expressions, the start symbol is <expression>, so our
initial string is:
<expression>
Using rule 5 we can choose to replace this nonterminal, producing the string:
<expression> * <expression>
We now have two nonterminals to replace. We can apply rule 3 to the first
nonterminal, producing the string:
<expression> + <expression> * <expression>
We can apply rule two to the first nonterminal in this string to produce:
(<expression>) + <expression> * <expression>
A CFG may have a production for a nonterminal in which the right hand side is the
empty string (which we denote by epsilon). The effect of this production is to
remove the nonterminal from the string being generated.
P --> ( P )
P --> P P
P --> epsilon
We begin with the string P. We can replace P with epsilon, in which case we have
generated the empty string (which does have balanced parentheses). Alternatively,
we can generate a string of balanced parentheses within a pair of balanced
parentheses, which must result in a string of balanced parentheses. Alternatively,
we can concatenate two strings of balanced parentheses, which again must result in
a string of balanced parentheses.
P --> ( P ) | P P | epsilon
We use the notational shorthand '|', which can be read as "or", to represent multiple
rewriting rules within a single line.
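To illustrate, the language of this grammar (P → (P) | PP | ε) coincides with the usual "counter" characterization of balanced parentheses, which a few lines of Python can check; the function name and the use of a counter rather than the grammar itself are our choices.

def balanced(s):
    """Membership check for strings over { '(' , ')' } only."""
    depth = 0
    for ch in s:
        depth += 1 if ch == '(' else -1    # '(' opens, ')' closes
        if depth < 0:                      # a ')' appeared with no matching '('
            return False
    return depth == 0                      # everything opened was closed

for w in ["", "()", "(())()", ")(", "(()"]:
    print(repr(w), balanced(w))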
CFG Examples
A CFG describing strings of letters with the word "main" somewhere in the string:
There are several ways to generate the (possibly infinite) set of strings generated by
a grammar. We will show a technique based on the number of productions used to
generate the string.
1. Applying at most one production (starting with the start symbol) we can generate
{wcd<S>, b<L>e, s}. Only one of these strings consists entirely of terminal
symbols, so the set of terminal strings we can generate using at most one
production is {s}.
2. Applying at most two productions, we can generate all the strings we can
generate with one production, plus any additional strings we can generate with
an additional production.
The set of terminal strings we can generate with at most two productions is
therefore {s, wcds}.
The set of terminal strings we can generate with at most three productions is
therefore {s, wcds, wcdwcds, bse}.
We can repeat this process for an arbitrary number of steps N, and find all the
strings the grammar can generate by applying N productions.
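A rough Python sketch of this generation process: repeatedly apply productions for a bounded number of rounds and collect the strings made of terminals only. The grammar used below (S → wcdS | bLe | s, L → S) is a stand-in chosen to mimic the strings {s, wcds, wcdwcds, bse, …} mentioned above; it is an assumption, since the notes' original grammar is not shown here.

def generate(rules, start, max_steps):
    """Nonterminals are uppercase letters; terminals are everything else."""
    current, terminal_strings = {start}, set()
    for _ in range(max_steps):
        nxt = set()
        for s in current:
            i = next((k for k, ch in enumerate(s) if ch.isupper()), None)
            if i is None:
                terminal_strings.add(s)        # no nonterminals left: a terminal string
                continue
            for rhs in rules[s[i]]:            # expand the leftmost nonterminal every way
                nxt.add(s[:i] + rhs + s[i + 1:])
        current = nxt
    return terminal_strings

rules = {'S': ['wcdS', 'bLe', 's'], 'L': ['S']}
print(sorted(generate(rules, 'S', 4)))         # ['bse', 's', 'wcds', 'wcdwcds']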
CFGs vs Regular Expressions
Every language that can be described by a regular expression can also be described by a
CFG, but not vice versa. As a corollary, CFGs are strictly more powerful than DFAs and NDFAs.
In context-free grammars, all rules are one-to-one, one-to-many, or one-to-none. These
rules can be applied regardless of context. The left-hand side of a production rule is always a
single nonterminal symbol, and nonterminal symbols do not appear in the strings of the
resulting formal language, which contains only terminal symbols.
Context-free grammars arise in linguistics where they are used to describe the
structure of sentences and words in a natural language, and they were in fact
invented by the linguist Noam Chomsky for this purpose, but have not really lived
up to their original expectation. By contrast, in computer science, as the use of
recursively-defined concepts increased, they were used more and more. In an early
application, grammars are used to describe the structure of programming
languages. In a newer application, they are used in an essential part of the
Extensible Markup Language (XML) called the Document Type Definition.[2]
In linguistics, some authors use the term phrase structure grammar to refer to
context-free grammars, whereby phrase-structure grammars are distinct from
dependency grammars. In computer science, a popular notation for context-free
grammars is Backus–Naur form, or BNF.
Since the time of Pāṇini, at least, linguists have described the grammars of
languages in terms of their block structure, and described how sentences are
recursively built up from smaller phrases, and eventually individual words or word
elements. An essential property of these block structures is that logical units never
overlap. For example, the sentence: "John, whose blue car was in the garage, walked
to the grocery store." can be logically parenthesized as follows: (John, ((whose blue
car) (was (in the garage))), (walked (to (the grocery store)))).
A context-free grammar provides a simple and mathematically precise mechanism
for describing the methods by which phrases in some natural language are built
from smaller blocks, capturing the "block structure" of sentences in a natural way.
Its simplicity makes the formalism amenable to rigorous mathematical study.
Important features of natural language syntax such as agreement and reference are
not part of the context-free grammar, but the basic recursive structure of sentences,
the way in which clauses nest inside other clauses, and the way in which lists of
adjectives and adverbs are swallowed by nouns and verbs, is described exactly. The
formalism of context-free grammars was developed in the mid-1950s by Noam
Chomsky,[3] and also their classification as a special type of formal grammar
(which he called phrase-structure grammars).[4] What Chomsky called a phrase
structure grammar is also known now as a constituency grammar, whereby
constituency grammars stand in contrast to dependency grammars. In Chomsky's
generative grammar framework, the syntax of natural language was described by
context-free rules combined with transformation rules.
Formal definitions
A context-free grammar G is defined by the 4-tuple G = (V, Σ, R, S), where
1. V is a finite set; each element is called a nonterminal character or a
variable. Each variable represents a different type of phrase or clause in the
sentence. Variables are also sometimes called syntactic categories. Each
variable defines a sub-language of the language defined by G.
2. Σ is a finite set of terminals, disjoint from V, which make up the actual
content of the sentence. The set of terminals is the alphabet of the language
defined by the grammar G.
3. R is a finite set of production rules, each of the form A → α with A ∈ V and
α ∈ (V ∪ Σ)*.
4. S ∈ V is the start symbol, used to represent the whole sentence.
Rule application
For any strings u, v ∈ (V ∪ Σ)*, we say u directly yields v, written u ⇒ v, if there is a rule
(α → β) ∈ R and strings u1, u2 ∈ (V ∪ Σ)* such that u = u1αu2 and v = u1βu2.
Thus, v is the result of applying the rule (α → β) to u.
For any strings u, v we say u yields v, written u ⇒* v, if there is a sequence
u = u1 ⇒ u2 ⇒ … ⇒ uk = v for some k ≥ 1. If at least one rule has been applied, the relation
u ⇒+ v holds. In other words, ⇒* and ⇒+ are the reflexive transitive closure (allowing a
word to yield itself) and the transitive closure (requiring at least one step) of ⇒, respectively.
Context-free language
The language of a grammar G = (V, Σ, R, S) is the set L(G) = { w ∈ Σ* : S ⇒* w } of all
terminal strings derivable from the start symbol.
Proper CFGs
A context-free grammar is said to be proper if it has:
• no unreachable symbols
• no unproductive symbols
• no ε-productions
• no cycles
Example
S → aSa,
S → bSb,
S → ε,
Examples
Well-formed parentheses
S → SS
S → (S)
S → ()
The first rule allows the S symbol to multiply; the second rule allows the S symbol
to become enclosed by matching parentheses; and the third rule terminates the
recursion.
S → SS
S → ()
S → (S)
S → []
S → [S]
with terminal symbols ( ), [ ] and nonterminal S.
([ [ [ ()() [ ][ ] ] ]([ ]) ])
A regular grammar
Every regular grammar is context-free, but not all context-free grammars are
regular. The following context-free grammar, however, is also regular.
S→a
S → aS
S → bS
The terminals here are a and b, while the only nonterminal is S. The language
described is the set of all nonempty strings over {a, b} that end in a.
This grammar is regular: no rule has more than one nonterminal in its right-hand
side, and each of these nonterminals is at the same end of the right-hand side.
Using pipe symbols, the grammar above can be described more tersely as follows:
S → a | aS | bS
Matching pairs
S → aSb
S → ab
This grammar generates the language { anbn : n ≥ 1 }, which is not regular (according to the
pumping lemma for regular languages).
The special character ε stands for the empty string. By changing the above
grammar to
S → aSb | ε
we obtain a grammar generating the language { anbn : n ≥ 0 } instead. This differs only in that it
contains the empty string while the original grammar did not.
Algebraic expressions
1. S→x
2. S→y
3. S→z
4. S→S+S
5. S→S-S
6. S→S*S
7. S→S/S
8. S→(S)
This grammar can generate the string ( x + y ) * x - z * y / ( x + x ) as follows:
Note that many choices were made along the way as to which rewrite was going to be
performed next. These choices look quite arbitrary. As a matter of fact, they are, in
the sense that the string finally generated is always the same. For example, the
second and third rewrites could have been performed in the opposite order.
Also, many choices were made on which rule to apply to each selected S. Changing
the choices made and not only the order they were made in usually affects which
terminal string comes out at the end.
Let's look at this in more detail. Consider the parse tree of this derivation:
Starting at the top, step by step, an S in the tree is expanded, until no more
unexpanded Ses (nonterminals) remain. Picking a different order of expansion will
produce a different derivation, but the same parse tree. The parse tree will only
change if we pick a different rule to apply at some position in the tree.
But can a different parse tree still produce the same terminal string, which is ( x + y ) *
x - z * y / ( x + x ) in this case? Yes, for this particular grammar, this is possible.
Grammars with this property are called ambiguous.
For example, x + y * z can be produced with these two different parse trees:
Further examples
Example 1
A context-free grammar for the language consisting of all strings over {a,b}
containing an unequal number of a's and b's:
S →U|V
U → TaU | TaT | UaT
Here, the nonterminal T can generate all strings with the same number of a's as b's,
the nonterminal U generates all strings with more a's than b's and the nonterminal
V generates all strings with fewer a's than b's. Omitting the third alternative in the
rule for U and V doesn't restrict the grammar's language.
Example 2
Another example of a non-regular language is { b^n a^m b^(2n) : n ≥ 0, m ≥ 0 }. It is context-free
as it can be generated by the following context-free grammar:
S → bSbb | A
A → aA | ε
Other examples
The formation rules for the terms and formulas of formal logic fit the definition of
context-free grammar, except that the set of symbols may be infinite and there may
be more than one start symbol.
Consider the grammar with the production rules
(1) S → S + S
(2) S → 1
(3) S → a
The string 1 + 1 + a can be derived from the start symbol S as follows:
S
→ S + S      (rule 1 on the first S)
→ S + S + S  (rule 1 on the second S)
→ S + 1 + S  (rule 2 on the second S)
→ S + 1 + a  (rule 3 on the third S)
→ 1 + 1 + a  (rule 2 on the first S)
Another derivation of the same string, which rewrites the leftmost nonterminal at every
step (a leftmost derivation), is:
S
→ S + S      (rule 1 on the first S)
→ 1 + S      (rule 2 on the first S)
→ 1 + S + S  (rule 1 on the first S)
→ 1 + 1 + S  (rule 2 on the first S)
→ 1 + 1 + a  (rule 3 on the first S)
The latter derivation can be summarized as
S → S + S (1)
→ 1 + S (2)
→ 1 + S + S (1)
→ 1 + 1 + S (2)
→ 1 + 1 + a (3)
{ { 1 }S + { { 1 }S + { a }S }S }S
This tree is called a parse tree or "concrete syntax tree" of the string, by contrast
with the abstract syntax tree. In this case the presented leftmost and the rightmost
derivations define the same parse tree; however, there is another (rightmost)
derivation of the same string
S → S + S (1)
→ S + a (3)
→ S + S + a (1)
→ S + 1 + a (2)
→ 1 + 1 + a (2)
and this derivation corresponds to a different parse tree.
Note however that both parse trees can be obtained by both leftmost and rightmost
derivations. For example, the last tree can be obtained with the leftmost derivation
as follows:
S → S + S (1)
→ S + S + S (1)
→ 1 + S + S (2)
→ 1 + 1 + S (2)
→ 1 + 1 + a (3)
If a string in the language of the grammar has more than one parsing tree, then the
grammar is said to be an ambiguous grammar. Such grammars are usually hard to
parse because the parser cannot always decide which grammar rule it has to apply.
Usually, ambiguity is a feature of the grammar, not the language, and an
unambiguous grammar can be found that generates the same context-free language.
However, there are certain languages that can only be generated by ambiguous
grammars; such languages are called inherently ambiguous languages.
The leftmost derivation corresponding to the left parse tree is
A→A+A→a+A→a+A−A→a+a−A→a+a−a
The rightmost derivation corresponding to the left parse tree is
A→A+A→A+A−A→A+A−a→A+a−a→a+a−a
The leftmost derivation corresponding to the right parse tree is
A→A−A→A+A−A→a+A−A→a+a−A→a+a−a
The rightmost derivation corresponding to the right parse tree is
A→A−A→A−a→A+A−a→A+a−a→a+a−a
At any stage during a parse, when we have derived some sentential form (that is
not yet a sentence) we will potentially have two choices to make:
The first decision here is relatively easy to solve: we will be reading the input
string from left to right, so it is our own interest to derive the leftmost terminal of
the resulting sentence as soon as possible. Thus, in a top-down parse we always
choose the leftmost non-terminal in a sentential form to apply a production rule to -
this is called a leftmost derivation.
If we were doing a bottom-up parse then the situation would be reversed, and we
would want to apply the production rules in reverse to the leftmost symbols;
thus we are performing a rightmost derivation in reverse.
Note that this has no effect on the parse tree; we still get:
THE LANGUAGE OF A GRAMMAR:
A formal grammar is a set of rules for rewriting strings, along with a "start symbol"
from which rewriting starts. In formal language theory, a grammar
(when the context is not given, often called a formal grammar for clarity) is a set
of production rules for strings in a formal language. The rules describe how to form
strings from the language's alphabet that are valid according to the language's syntax.
A grammar does not describe the meaning of the strings or what can be done with
them in whatever context—only their form.
Formal language theory, the discipline that studies formal grammars and languages,
is a branch of applied mathematics. Its applications are found in theoretical computer
science, theoretical linguistics, formal semantics, mathematical logic, and other
areas.
Introductory example
A grammar mainly consists of a set of rules for transforming strings. (If it only
consisted of these rules, it would be a semi-Thue system.) To generate a string in the
language, one begins with a string consisting of only a single start symbol. The
production rules are then applied in any order, until a string that contains neither the
start symbol nor designated nonterminal symbols is produced. A production rule is
applied to a string by replacing one occurrence of the production rule's left-hand side
in the string by that production rule's right-hand side (cf. the operation of the
theoretical Turing machine). The language formed by the grammar consists of all
distinct strings that can be generated in this manner. Any particular sequence of
production rules on the start symbol yields a distinct string in the language. If there
are essentially different ways of generating the same single string, the grammar is
said to be ambiguous.
For example, assume the alphabet consists of a and b, the start symbol is S,
and we have the following production rules:
1. S→aSb.
2. S→ba.
then we start with S, and can choose a rule to apply to it. If we choose rule 1, we
obtain the string aSb. If we then choose rule 1 again, we replace S with aSb and
obtain the string aaSbb. If we now choose rule 2, we replace S with ba and obtain
the string aababb, and are done. We can write this series of choices more briefly,
using symbols: S ⇒ aSb ⇒ aaSbb ⇒ aababb. The language of the grammar is then the
infinite set { a^n ba b^n : n ≥ 0 }, where a^n is a repeated n times (and n in particular
represents the number of times production rule 1 has been applied).
Sentential Form
A sentential form is any string derivable from the start symbol. Thus, in the
derivation of a + a * a , E + T * F and E + F * a and F + a * a are all
sentential forms as are E and a + a * a themselves.
Sentence
A sentence is a sentential form consisting only of terminals such as a + a *
a. A sentence can be derived using the following algorithm:
Algorithm
Derive String
Parse trees concretely reflect the syntax of the input language, making
them distinct from the abstract syntax trees used in computer programming. Unlike
Reed-Kellogg sentence diagrams used for teaching grammar, parse trees do not use
distinct symbol shapes for different types of constituents.
Parse trees are usually constructed based on either the constituency relation of
constituency grammars (phrase structure grammars) or the dependency relation of
dependency grammars. Parse trees may be generated for sentences in natural
languages (see natural language processing), as well as during processing of
computer languages, such as programming languages.
• NP for noun phrase. The first (leftmost) NP, a single noun "John",
serves as the subject of the sentence. The second one is the object of
the sentence.
• VP for verb phrase, which serves as the predicate.
• N for noun
Each node in the tree is either a root node, a branch node, or a leaf node.[2] A root
node is a node that doesn't have any branches on top of it. Within a sentence, there
is only ever one root node. A branch node is a mother node that connects to two or
more daughter nodes. A leaf node, however, is a terminal node that does not
dominate other nodes in the tree. S is the root node, NP and VP are branch nodes,
and John (N), hit (V), the (D), and ball (N) are all leaf nodes. The leaves are the
lexical tokens of the sentence.[3] A mother node is one that has at least one other node
linked by a branch under it. In the example, S is a parent of both NP and VP. A
daughter node is one that has at least one node directly above it to which it is linked
by a branch of a tree. From the example, hit is a daughter node of V. The terms
parent and child are also sometimes used.
This parse tree lacks the phrasal categories (S, VP, and NP) seen in the constituency-based counterpart above. Like the constituency-based tree, however, it acknowledges constituent structure: any complete sub-tree of the tree is a constituent. Thus this dependency-based parse tree acknowledges the subject noun John and the object noun phrase the ball as constituents, just like the constituency-based parse tree does.
A production rule A → α replaces the nonterminal A with the string α. There can be multiple replacement rules for any given nonterminal. For example,
A → α | β
means that A can be replaced with either α or β. In context-free grammars, all rules are one-to-one, one-to-many, or one-to-none. These rules can be applied regardless of context. The left-hand side of a production rule is always a single nonterminal symbol. This means that the symbol does not appear in the resulting formal language. So in our case, our language contains the letters a and b but not S. Rules can also be applied in reverse to check whether a string is grammatically correct according to the grammar. Here is an example context-free grammar that describes all two-letter strings containing the letters a and b:
S → XX, X → a, X → b
If we start with the nonterminal symbol S, then we can use the rule S → XX to turn S into XX. We can then apply one of the two later rules. For example, if we apply X → a to the first X we get aX. If we then apply X → b to the second X we get ab. Since both a and b are terminal symbols, and in context-free grammars terminal symbols never appear on the left-hand side of a production rule, there are no more rules that can be applied. This same process can be used, applying the last two rules in different orders, to get all possible two-letter strings of our simple context-free grammar: aa, ab, ba and bb.
Context-free grammars arise in linguistics where they are used to describe the
structure of sentences and words in a natural language, and they were in fact
invented by the linguist Noam Chomsky for this purpose, but they have not entirely lived
up to their original expectation. By contrast, in computer science, as the use of
recursively defined concepts increased, they were used more and more. In an early
application, grammars were used to describe the structure of programming
languages. In a newer application, they are used in an essential part of the
Extensible Markup Language (XML) called the Document Type Definition.[2]
In linguistics, some authors use the term phrase structure grammar to refer to
context-free grammars, whereby phrase-structure grammars are distinct from
dependency grammars. In computer science, a popular notation for context-free
grammars is Backus–Naur form, or BNF.
Ambiguity in Grammars and Languages:
If a context-free grammar G has more than one derivation tree for some string w ∈ L(G), it is called an ambiguous grammar. Equivalently, there exist multiple right-most or left-most derivations for some string generated from that grammar.
Problem
Check whether the grammar G with production rules X → X+X | X*X | X | a is ambiguous or not.
Solution
Let’s find out the derivation tree for the string "a+a*a". It has two leftmost
derivations.
Parse tree 1 and Parse tree 2 (the two distinct leftmost derivation trees for "a+a*a") are not reproduced here; since two such trees exist, G is ambiguous.
Pushdown automata are used in theories about what can be computed by machines.
They are more capable than finite-state machines but less capable than Turing
machines. Deterministic pushdown automata can recognize all deterministic
context-free languages while nondeterministic ones can recognize all context-free
languages, with the former often used in parser design.
The term "pushdown" refers to the fact that the stack can be regarded as being
"pushed down" like a tray dispenser at a cafeteria, since the operations never work
on elements other than the top element. A stack automaton, by contrast, does
allow access to and operations on deeper elements. Stack automata can recognize a
strictly larger set of languages than pushdown automata. A nested stack automaton
allows full access, and also allows stacked values to be entire sub-stacks rather
than just single finite symbols.
Informal description
A diagram of a pushdown automaton
A finite state machine just looks at the input signal and the current state: it has no
stack to work with. It chooses a new state, the result of following the transition. A
pushdown automaton (PDA) differs from a finite state machine in two ways:
1. It can use the top of the stack to decide which transition to take.
2. It can manipulate the stack as part of performing a transition.
A pushdown automaton reads a given input string from left to right. In each step, it
chooses a transition by indexing a table by input symbol, current state, and the
symbol at the top of the stack. A pushdown automaton can also manipulate the
stack, as part of performing a transition. The manipulation can be to push a
particular symbol to the top of the stack, or to pop off the top of the stack. The
automaton can alternatively ignore the stack, and leave it as it is.
Put together: Given an input symbol, current state, and stack symbol, the
automaton can follow a transition to another state, and optionally manipulate (push
or pop) the stack.
If, in every situation, at most one such transition action is possible, then the
automaton is called a deterministic pushdown automaton (DPDA). In general, if
several actions are possible, then the automaton is called a general, or
nondeterministic, PDA. A given input string may drive a nondeterministic
pushdown automaton to one of several configuration sequences; if one of them
leads to an accepting configuration after reading the complete input string, the
latter is said to belong to the language accepted by the automaton.
Definition of the Pushdown Automaton: Formal
definition
We use standard formal language notation: Σ* denotes the set of strings over an alphabet Σ, and ε denotes the empty string.
Here δ(q, a, A) contains all possible actions in state q with A on top of the stack, while reading a on the input. One writes, for example, (q, β) ∈ δ(p, a, A) to say that the automaton in state p, reading a with A on top of the stack, may move to state q and replace A by β. Note that finiteness of δ in this definition is essential.
Computations
Example
The following is the formal description of the PDA which recognizes the language
{ 0^n 1^n : n ≥ 0 } by final state:
PDA for { 0^n 1^n : n ≥ 0 } (by final state), where
• states: Q = {p, q, r}
• input alphabet: Σ = {0, 1}
• stack alphabet: Γ = {A, Z}
• start state: p
• start stack symbol: Z
• accepting states: {r}
The transition relation consists of the following six instructions:
(p, 0, Z, p, AZ), (p, 0, A, p, AA), (p, ε, Z, q, Z), (p, ε, A, q, A), (q, 1, A, q, ε), and (q, ε, Z, r, Z). In words,
the first two instructions say that in state p any time the symbol 0 is read, one A is
pushed onto the stack. Pushing symbol A on top of another A is formalized as
replacing top A by AA (and similarly for pushing symbol A on top of a Z).The third
and fourth instructions say that, at any moment the automaton may move from
state p to state q.The fifth instruction says that in state q, for each symbol 1 read,
one A is popped.
Finally, the sixth instruction says that the machine may move from state q to
accepting state r only when the stack consists of a single Z.There seems to be no
generally used representation for PDAs. Here we have depicted an instruction (p, a, A, q, α) by
an edge from state p to state q labelled by a, A/α (read a; replace A by α).
The following illustrates how the above PDA computes on different input strings.
The subscript M from the step symbol is here omitted.
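As an illustration only, the following C++ sketch (not from the original text) hard-codes the six instructions listed above and searches the PDA's configurations recursively; the function name accepts and the stack encoding (top of the stack at the end of the string, 'e' standing for ε in the comments) are our own choices.

#include <string>
#include <iostream>

// Configuration search for the example PDA accepting { 0^n 1^n : n >= 0 } by final state r,
// with the input required to be fully consumed.  States p, q, r; stack alphabet {A, Z}.
bool accepts(char state, const std::string& in, size_t i, std::string stack) {
    char top = stack.empty() ? '\0' : stack.back();
    if (state == 'r') return i == in.size();                // accept in r once all input is read
    if (state == 'p') {
        // (p,0,Z,p,AZ) and (p,0,A,p,AA): push one A per 0 read
        if (i < in.size() && in[i] == '0' && (top == 'Z' || top == 'A'))
            if (accepts('p', in, i + 1, stack + "A")) return true;
        // (p,e,Z,q,Z) and (p,e,A,q,A): may move to q at any moment
        if (accepts('q', in, i, stack)) return true;
    }
    if (state == 'q') {
        // (q,1,A,q,e): pop one A per 1 read
        if (i < in.size() && in[i] == '1' && top == 'A') {
            std::string popped = stack;
            popped.pop_back();
            if (accepts('q', in, i + 1, popped)) return true;
        }
        // (q,e,Z,r,Z): may enter the accepting state when only Z remains on the stack
        if (top == 'Z' && accepts('r', in, i, stack)) return true;
    }
    return false;
}

int main() {
    std::cout << accepts('p', "0011", 0, "Z") << ' '   // 1 (accepted)
              << accepts('p', "",     0, "Z") << ' '   // 1 (n = 0)
              << accepts('p', "010",  0, "Z") << '\n'; // 0 (rejected)
}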
Description
A pushdown automaton (PDA) is a finite state machine which has an additional
stack storage. The transitions a machine makes are based not only on the input and
current state, but also on the stack. The formal definition (in our textbook) is that a
PDA is this:
M = (K, Σ, Γ, Δ, s, F) where
K = finite state set, Σ = input alphabet, Γ = stack alphabet, s ∈ K is the start state, F ⊆ K is the set of final states, and Δ, the transition relation, is a finite subset of (K × (Σ ∪ {ε}) × Γ*) × (K × Γ*).
We have to have the finite qualifier because the full subset would be infinite by virtue of the Γ* components. The meaning of the transition relation is that, for σ ∈ Σ ∪ {ε}, if ((p,σ,α),(q,β)) ∈ Δ, then M, in state p with α on top of the stack, may read σ from the input, pop α, push β, and move to state q.
The language accepted by a PDA M, L(M), is the set of all accepted strings.
The empty stack is our key new requirement relative to finite state machines. The
examples that we generate have very few states; in general, there is so much more
control from using the stack memory. Acceptance by empty stack only or final
state only is addressed in problems 3.3.3 and 3.3.4.
Graphical Representation and ε-transition
The book does not indicate so, but there is a graphical representation of PDAs. A
transition
((p,x,α),(q,β)) where x = ε or x ∈ Σ would be depicted like this (respectively):
or
• Δ is a relation
• there are ε-transitions in terms of the input
• there are ε-transitions in terms of the stack contents
The true PDA ε-transition, in the sense of being equivalent to the NFA ε-transition,
is ((p, ε, ε), (q, ε)),
because it consults neither the input, nor the stack and will leave the previous
configuration intact.
Palindrome examples
These are examples 3.3.1 and 3.3.2 in the textbook. The first is this:
The machine pushes a's and b's in state s, makes a transition to f when it sees the
middle marker, c, and then matches input symbols with those on the stack and pops
the stack symbol. Nonaccepting string examples are these:
ε in state s
ab in state s with non-empty stack
abcab in state f with unconsumed input and non-empty stack
abcb in state f with non-empty stack
abcbab in state f with unconsumed input and empty stack
Observe that this PDA is deterministic in the sense that there are no choices in
transitions.
The second example is:
{ x ∈ {a,b}* : x = ww^R for some w ∈ {a,b}* }
This PDA is identical to the previous one except for the ε-transition
Nevertheless, there is a significant difference in that this PDA must guess when to
stop pushing symbols, jump to the final state and start matching off of the stack.
Therefore this machine is decidedly non-deterministic. In a general programming
model (like Turing Machines), we have the luxury of preprocessing the string to
determine its length and thereby knowing when the middle is coming.
and
The idea in both of these machines is to stack the a's and match off the b's. The first
one is nondeterministic in the sense that it could prematurely guess that the a's are
done and start matching off b's. The second version is deterministic in that the first
b acts as a trigger to start matching off. Note that we have to make both states final
in the second version in order to accept ε.
A transition whose input component is x = σ ∈ Σ or x = ε and whose stack component is ε
means to make the move without consulting the stack; it says nothing about whether the
stack is empty or not.
Nevertheless, one can maintain knowledge of an empty stack by using a dedicated
stack symbol, c, representing the "stack bottom" with these properties:
• it is pushed onto an empty stack by a transition from the start state with no
other outgoing or incoming transitions
• it is never removed except by a transition into the final state, which has no other outgoing
transitions
Behavior of PDA
The three groups of loop transitions in state q represent these respective functions:
• stack an a when there is no b on top of the stack to cancel (the top is the bottom marker c or another a),
• stack a b when there is no a on top to cancel,
• cancel: when the input letter is the opposite of the letter on top of the stack, pop it.
For example if we have seen 5 b's and 3 a's in any order, then the stack should be
"bbc". The transition to the final state represents the only non-determinism in the
PDA in that it must guess when the input is empty in order to pop off the stack
bottom.
DPDA/DCFL
The textbook defines DPDAs (Deterministic PDAs) and DCFLs (Deterministic
CFLs) in the introductory part of section 3.7. According to the textbook's
definition, a DPDA is a PDA in which no state p has two different outgoing
transitions
((p,x,α),(q,β)) and ((p,x′,α′),(q′,β′))
which are compatible in the sense that both could be applied. A DCFL is basically
a language which is accepted by a DPDA, but we need to qualify this further.
We want to argue that the language L = { w ∈ {a,b}* : #a(w) = #b(w) } is
deterministic context free in the sense there is DPDA which accepts it.
In the above PDA, the only non-determinism is the issue of guessing the end of
input; however this form of non-determinism is considered artificial. When one
considers whether a language L supports a DPDA or not, a dedicated end-of-input
symbol is always added to strings in the language.
Formally, a language L over Σ is deterministic context free, or L is a DCFL , if
L$ is accepted by a DPDA M
where $ is a dedicated symbol not belonging to Σ. The significance is that we can
make intelligent usage of the knowledge of the end of input to decide what to do
about the stack. In our case, we would simply replace the transition into the final
state by:
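A rough C++ sketch of this behaviour (assuming the stack-bottom symbol c and the end marker $ described above; the function name accepts is ours, not part of any formal definition) could look like this:

#include <string>
#include <iostream>

// Sketch of the DPDA for L$ with L = { w in {a,b}* : #a(w) = #b(w) }.
// The stack bottom 'c' is pushed first; excess a's or b's are stacked and cancelled
// against the opposite letter; the $ marker triggers acceptance iff only 'c' remains.
bool accepts(const std::string& w_with_marker) {
    std::string stack = "c";                       // dedicated stack-bottom symbol
    for (char ch : w_with_marker) {
        char top = stack.back();
        if (ch == '$') return stack == "c";        // end marker: accept iff the counts matched
        if (ch != 'a' && ch != 'b') return false;  // not in the input alphabet
        char other = (ch == 'a') ? 'b' : 'a';
        if (top == other) stack.pop_back();        // cancel one excess opposite letter
        else stack.push_back(ch);                  // otherwise stack the excess letter
    }
    return false;                                  // no end marker read: reject
}

int main() {
    std::cout << accepts("abba$") << ' ' << accepts("aab$") << '\n';   // prints: 1 0
}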
a*b* examples
Two common variations on a's followed by b's. When they're equal, no stack
bottom is necessary. When they're unequal, you have to be prepared to recognize
that the stacked a's have been completely matched or not.
a. { a^n b^n : n ≥ 0 }
b. { a^m b^n : 0 ≤ m < n }
Let's look at a few sample runs of (b). The idea is that you cannot enter the final
state with an "a" still on the stack. Once you get to the final state, you can consume
remaining b's and end marker.
We can start from state 1 with the stack bottom pushed on:
success: abb
state   input    stack
1       abb$     c
1       bb$      ac
2       b$       ac
2       $        c
3       ε        ε
success: abbbb
state   input    stack
1       abbbb$   c
1       bbbb$    ac
2       bbb$     ac
2       bb$      c
3       b$       ε
3       $        ε
3       ε        ε
fail: ab
state   input    stack
1       ab$      c
1       b$       ac
2       $        ac
fail: ba
state   input    stack
1       ba$      c
2       a$       c
Observe that a string like abbba also fails due to the inability to consume the very last
a.
L(G) = L(P)
In the next two topics, we will discuss how to convert from PDA to CFG and vice
versa.
Step 1 − Convert the productions of the CFG into GNF.
Step 2 − The PDA will have only one state, {q}.
Step 3 − The start symbol of the CFG will be the start symbol in the PDA.
Step 4 − All non-terminals of the CFG will be the stack symbols of the PDA and
all the terminals of the CFG will be the input symbols of the PDA.
Step 5 − For each production of the form A → aX, where a is a terminal and X is a
(possibly empty) string of non-terminals, add a transition δ(q, a, A) = {(q, X)}.
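As a small illustration of Step 5 (not from the original text), the following C++ sketch builds the transition table from a hypothetical GNF grammar; the production list and variable names are made up for the example.

#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>
#include <iostream>

// For every GNF production A -> aX (a a terminal, X a string of non-terminals),
// add the PDA move delta(q, a, A) = { (q, X) }.  Non-terminals are upper-case,
// terminals lower-case; the single state q is implicit.
int main() {
    // Example GNF productions, written as (head, body) pairs.  (Hypothetical grammar: S -> aSB | b, B -> b.)
    std::vector<std::pair<char, std::string>> productions = {
        {'S', "aSB"}, {'S', "b"}, {'B', "b"}
    };
    // delta[(input symbol, stack top)] = set of strings pushed in place of the popped top.
    std::map<std::pair<char, char>, std::set<std::string>> delta;
    for (const auto& p : productions) {
        char a = p.second[0];                     // leading terminal of the GNF body
        std::string rest = p.second.substr(1);    // remaining non-terminals (may be empty)
        delta[{a, p.first}].insert(rest);
    }
    for (const auto& entry : delta)
        for (const auto& push : entry.second)
            std::cout << "delta(q, " << entry.first.first << ", " << entry.first.second
                      << ") contains (q, \"" << push << "\")\n";
}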
Problem
Construct a PDA from the CFG with productions S → XS | ε and X → aXb | Xb | ab, where S is the start symbol and the terminals are a and b.
Solution
Using the general CFG-to-PDA construction (for each production A → α add (q, α) to δ(q, ε, A), and for each terminal a add δ(q, a, a) = {(q, ε)}), the equivalent single-state PDA P = ({q}, {a, b}, {a, b, S, X}, δ, q, S) has the transitions:
δ(q, ε, S) = {(q, XS), (q, ε)}
δ(q, ε, X) = {(q, aXb), (q, Xb), (q, ab)}
δ(q, a, a) = {(q, ε)}
δ(q, b, b) = {(q, ε)}
Input − A PDA, P = (Q, ∑, S, δ, q0, I, F). Output − Equivalent CFG, G = (V, T, P, S) such that the non-terminals of the grammar G will be {Xwx | w, x ∈ Q} and the start symbol will be Xq0,F.
• Top-Down Parser − Top-down parsing starts from the top with the start-
symbol and derives a string using a parse tree.
• Bottom-Up Parser − Bottom-up parsing starts from the bottom with the
string and comes to the start symbol using a parse tree.
• Pop the non-terminal on the left hand side of the production at the top of the
stack and push its right-hand side string.
• If the top symbol of the stack matches with the input symbol being read, pop
it.
• Push the start symbol ‘S’ into the stack.
• If the input string is fully read and the stack is empty, go to the final state
‘F’.
Example
Design a top-down parser for the expression "x+y*z" for the grammar G with the
following production rules −
Solution
(y*z, X*YI) (y*z, y*YI) (*z,*YI) (z, YI) (z, zI) (ε, I)
Design of a Bottom-Up Parser
For bottom-up parsing, a PDA has the following four types of transitions −
• Push the current input symbol into the stack.
• Replace the right-hand side of a production at the top of the stack with its left-hand side.
• If the top of the stack element matches with the current input symbol, pop it.
• If the input string is fully read and only the start symbol 'S' remains in the stack, pop it and go to the final state 'F'.
Example
Design a bottom-up parser for the expression "x+y*z" for the grammar G with the
following production rules −
Solution
(y*z, +SI) (*z, y+SI) (*z, Y+SI) (*z, X+SI) (z, *X+SI)
Machine transitions are based on the current state and input symbol, and also the
current topmost symbol of the stack. Symbols lower in the stack are not visible and
have no immediate effect. Machine actions include pushing, popping, or replacing
the stack top. A deterministic pushdown automaton has at most one legal transition
for the same combination of input symbol, state, and top stack symbol. This is
where it differs from the nondeterministic pushdown automaton.
Formal definition
A (not necessarily deterministic) PDA can be defined as a 7-tuple M = (Q, Σ, Γ, q0, Z0, F, δ), where Q is a finite set of states, Σ is the input alphabet, Γ is the stack alphabet, q0 ∈ Q is the start state, Z0 ∈ Γ is the initial stack symbol, F ⊆ Q is the set of accepting states, and δ is the transition function.
There are two possible acceptance criteria: acceptance by empty stack and
acceptance by final state. The two are not equivalent for the deterministic pushdown
automaton (although they are for the non-deterministic pushdown automaton). The
languages accepted by empty stack are those languages that are accepted by final
state and are prefix-free: no word in the language is the prefix of another word in the
language.
The usual acceptance criterion is final state, and it is this acceptance criterion
which is used to define the deterministic context-free languages.
Languages recognized
If L is a language accepted by a PDA, it can also be accepted by a DPDA if and only
if there is a single computation from the initial configuration until an accepting one
for all strings belonging to L. If L can be accepted by a PDA it is a context-free language,
and if it can be accepted by a DPDA it is a deterministic context-free language.
Not all context-free languages are deterministic. This makes the DPDA a strictly
weaker device than the PDA. For example, the language of even-length palindromes
on the alphabet of 0 and 1 has the context-free grammar S → 0S0 | 1S1 | ε. An
arbitrary string of this language cannot be parsed without reading all its letters first
which means that a pushdown automaton has to try alternative state transitions to
accommodate for the different possible lengths of a semi-parsed string.
Restricting the DPDA to a single state reduces the class of languages accepted to
the LL(1) languages. In the case of a PDA, this restriction has no effect on the class
of languages accepted.
UNIT-IV
A context-free grammar is in Chomsky Normal Form if every production has one of the forms
A → BC
or
A → a
where A, B, and C are variables and a is a terminal. Any context-free grammar that
does not contain ε can be put into Chomsky Normal Form.
(Most textbook authors also allow the production S → ε so long as S does not
appear on the right-hand side of any production.)
Chomsky Normal Form is particularly useful for programs that have to manipulate
grammars.
Grammars in Greibach Normal Form are typically ugly and much longer than the
cfg from which they were derived. Greibach Normal Form is useful for proving the
equivalence of cfgs and npdas. When we discuss converting a cfg to an npda, or
vice versa, we will use Greibach Normal Form.
Formal statement
Proof idea: If s is sufficiently long, its derivation tree w.r.t. a Chomsky normal form
grammar must contain some nonterminal N twice on some tree path (upper picture).
Repeating the derivation part N ⇒ ... ⇒ vNx n times obtains a derivation for uv^n wx^n y
(lower left and right picture for n = 0 and 2, respectively).
If a language L is context-free, then there exists some integer p ≥ 1 (the pumping length)
such that every string s in L with |s| ≥ p can be written as s = uvwxy with substrings
u, v, w, x and y, such that
1. |vwx| ≤ p,
2. |vx| ≥ 1, and
3. uv^n wx^n y ∈ L for all n ≥ 0.
The property is a property of all strings in the language that are of length at least p,
where p is a constant—called the pumping length—that varies between context-
free languages.
The pumping lemma states that s can be split into five substrings, s = uvwxy, where
vx is nonempty and the length of vwx is at most p, such that repeating v and x any
(and the same) number of times in s produces a string that is still in the language (it
is possible and often useful to repeat zero times, which removes v and x from the
string). This process of "pumping up" additional copies of v and x is what gives the
pumping lemma its name.
Finite languages (which are regular and hence context-free) obey the pumping
lemma trivially by having p equal to the maximum string length in L plus one. As
there are no strings of this length the pumping lemma is not violated.
Take s = a^p b^p c^p and write s = uvwxy with |vwx| ≤ p and |vx| ≥ 1. Since |vwx| ≤ p,
the substring vwx cannot contain all three letters, so one of the following cases holds:
1. vwx = a^j for some j ≤ p.
2. vwx = a^j b^k for some j and k with j+k ≤ p.
3. vwx = b^j for some j ≤ p.
4. vwx = b^j c^k for some j and k with j+k ≤ p.
5. vwx = c^j for some j ≤ p.
For each case, it is easily verified that uv^i wx^i y does not contain equal numbers of
each letter for any i ≠ 1. Thus, uv^2 wx^2 y does not have the form a^i b^i c^i. This contradicts
the definition of L. Therefore, our initial assumption that L is context free must be
false.
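For small p, this case analysis can even be checked mechanically. The following C++ sketch (ours, not part of the proof) enumerates every admissible split of s = a^p b^p c^p and confirms that no pumped string uv^2wx^2y stays in L:

#include <string>
#include <iostream>

// Membership test for L = { a^n b^n c^n : n >= 1 }.
bool inL(const std::string& s) {
    size_t i = 0, a = 0, b = 0, c = 0;
    while (i < s.size() && s[i] == 'a') { ++a; ++i; }
    while (i < s.size() && s[i] == 'b') { ++b; ++i; }
    while (i < s.size() && s[i] == 'c') { ++c; ++i; }
    return i == s.size() && a == b && b == c && a >= 1;
}

// For every split s = uvwxy with |vwx| <= p and |vx| >= 1, check whether u v^2 w x^2 y is in L.
int main() {
    const int p = 4;
    std::string s = std::string(p, 'a') + std::string(p, 'b') + std::string(p, 'c');
    bool counterexample = false;
    for (size_t vStart = 0; vStart < s.size(); ++vStart)
        for (size_t vLen = 0; vStart + vLen <= s.size(); ++vLen)
            for (size_t wLen = 0; vStart + vLen + wLen <= s.size(); ++wLen)
                for (size_t xLen = 0; vStart + vLen + wLen + xLen <= s.size(); ++xLen) {
                    if (vLen + xLen < 1 || vLen + wLen + xLen > (size_t)p) continue;
                    std::string u = s.substr(0, vStart);
                    std::string v = s.substr(vStart, vLen);
                    std::string w = s.substr(vStart + vLen, wLen);
                    std::string x = s.substr(vStart + vLen + wLen, xLen);
                    std::string y = s.substr(vStart + vLen + wLen + xLen);
                    if (inL(u + v + v + w + x + x + y)) counterexample = true;
                }
    std::cout << (counterexample ? "some split can be pumped" : "no split of s can be pumped") << '\n';
}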
While the pumping lemma is often a useful tool to prove that a given language is not
context-free, it does not give a complete characterization of the context-free
languages. If a language does not satisfy the condition given by the pumping lemma,
we have established that it is not context-free.
On the other hand, there are languages that are not context-free, but still satisfy the
condition given by the pumping lemma, for example L = { b^j c^k d^l | j, k, l ∈ ℕ } ∪
{ a^i b^j c^j d^j | i, j ∈ ℕ, i ≥ 1 }: for s = b^j c^k d^l with e.g. j ≥ 1 choose vwx to consist only of b's,
for s = a^i b^j c^j d^j choose vwx to consist only of a's; in both cases all pumped strings are
still in L.
Now suppose, for contradiction, that the context-free languages are closed under complement.
Then if L1 and L2 are context-free languages, so are their complements ¬L1 and ¬L2. Since we
have proved closure under union, ¬L1 ∪ ¬L2 must also be context-free, and, by our assumption,
so must its complement ¬(¬L1 ∪ ¬L2).
However, by De Morgan's laws (for sets), ¬(¬L1 ∪ ¬L2) = L1 ∩ L2, so this intersection must also
be a context-free language.
Since our choice of L1 and L2 was arbitrary, we have contradicted the non-closure of
context-free languages under intersection, and have thus proved the lemma (that they are not
closed under complement).
Decision Properties
Now we consider some important questions for which algorithms exist to answer
the question.
Is a given string w in the CFL? If we are given the CFL as a PDA, we can answer this simply by executing the
PDA.
It may not be obvious that our procedures for finding derivations will always
terminate. We have seen that, when finding a derivation, we have choices as to
which variable to replace and which production to use in the replacement.
However,
• We have previously noted that if a string is in the language, then it will have
a leftmost derivation. So we can systematically always choose to replace the
leftmost variable.
• That leaves the choice of production. We can systematically try all available
choices, in a form of backtracking.
Actually, we can do better than that. The CYK algorithm can parse a string w for a
Chomsky Normal Form grammar in O(|w|^3) time.
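As an illustration, here is a compact C++ sketch of CYK for one particular CNF grammar (S → AB | AX, X → SB, A → a, B → b, which generates a^n b^n for n ≥ 1); the grammar and the function name cyk are chosen for the example and are not from the original notes.

#include <string>
#include <vector>
#include <set>
#include <iostream>

// table[i][j] = set of variables deriving the substring of length j+1 starting at position i.
bool cyk(const std::string& w) {
    size_t n = w.size();
    if (n == 0) return false;
    std::vector<std::vector<std::set<char>>> table(n, std::vector<std::set<char>>(n));
    for (size_t i = 0; i < n; ++i) {                      // length-1 substrings: terminal rules
        if (w[i] == 'a') table[i][0].insert('A');
        if (w[i] == 'b') table[i][0].insert('B');
    }
    const char rules[3][3] = { {'A','B','S'}, {'A','X','S'}, {'S','B','X'} };   // (left, right, head)
    for (size_t len = 2; len <= n; ++len)                 // longer substrings, O(|w|^3) overall
        for (size_t i = 0; i + len <= n; ++i)
            for (size_t split = 1; split < len; ++split)
                for (const auto& r : rules)
                    if (table[i][split - 1].count(r[0]) &&
                        table[i + split][len - split - 1].count(r[1]))
                        table[i][len - 1].insert(r[2]);
    return table[0][n - 1].count('S') > 0;                // accept iff the start symbol derives w
}

int main() {
    std::cout << cyk("aabb") << ' ' << cyk("aab") << '\n';   // prints: 1 0
}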
Is a CFL empty?
We have already seen how to detect whether a variable is generating, i.e. whether it can derive some string of terminals.
We apply that test and determine whether the grammar's start symbol is generating; the language is empty if and only if it is not.
Closure Properties
Substitution
Given a CFG G, if we replace each terminal symbol by a set of strings that is itself
a CFL, the result is still a CFL.
All I really have to do is change 'a' in the first grammar to a variable, make that
variable the start symbol of the grammar for the substituted language (here the ww^R grammar), and add that grammar's productions.
It's pretty obvious that the result is still a CFG, so the resulting language is still a CFL.
CFLs are closed under reversal. (No surprise, given the stack-nature of PDAs.)
A CFG is in Chomsky Normal Form if the Productions are in the following forms
−
• A → a
• A → BC
• S → ε
where A, B, and C are non-terminals and a is a terminal.
Step 1 − If the start symbol S occurs on some right-hand side, create a new start symbol S0 and a new production S0 → S.
Step 2 − Remove Null productions. (Using the Null production removal algorithm discussed earlier)
Step 3 − Remove unit productions. (Using the Unit production removal algorithm discussed earlier)
Step 4 − Replace each production A → B1…Bn, where n > 2, with A → B1C where C → B2…Bn. Repeat this for all productions having more than two variables on the right-hand side.
Step 5 − If the right-hand side of any production is of the form A → aB, where a is a terminal and A, B are non-terminals, replace it with A → XB and X → a. Repeat this for every such production.
Problem
S → ASA | aB, A → B | S, B → b | ε
Solution
(1) Since S appears on the R.H.S., we add a new start symbol S0 and the production S0 → S; the production set becomes −
S0 → S, S → ASA | aB, A → B | S, B → b | ε
(2) Now we remove the null productions B → ε and A → ε (A is nullable because A → B and B → ε). After removing B → ε, the set becomes −
S0 → S, S → ASA | aB | a, A → B | S | ε, B → b
After removing A → ε, the set becomes −
S0 → S, S → ASA | aB | a | AS | SA | S, A → B | S, B → b
(3) Now we remove the unit productions. After removing S → S, the set becomes −
S0 → S, S → ASA | aB | a | AS | SA, A → B | S, B → b
After removing S0 → S, the production set becomes −
S0 → ASA | aB | a | AS | SA, S → ASA | aB | a | AS | SA, A → B | S, B → b
After removing A → B, the set becomes −
S0 → ASA | aB | a | AS | SA, S → ASA | aB | a | AS | SA, A → S | b, B → b
After removing A → S, the set becomes −
S0 → ASA | aB | a | AS | SA, S → ASA | aB | a | AS | SA, A → b | ASA | aB | a | AS | SA, B → b
(4) Now we find the productions with more than two variables on the R.H.S. Applying step 4, we replace SA in ASA by a new variable X with X → SA, and get −
S0 → AX | aB | a | AS | SA
S → AX | aB | a | AS | SA
A → b | AX | aB | a | AS | SA
B → b
X → SA
(5) Finally, applying step 5, the terminal a in aB is replaced by a new variable Y with Y → a, giving the final production set, which is in CNF −
S0 → AX | YB | a | AS | SA
S → AX | YB | a | AS | SA
A → b | AX | YB | a | AS | SA
B → b
X → SA
Y → a
Definition
A Turing Machine (TM) is a mathematical model which consists of an infinite
length tape divided into cells on which input is given. It consists of a head which
reads the input tape. A state register stores the state of the Turing machine. After
reading an input symbol, the symbol is replaced with another symbol, the machine's
internal state is changed, and the head moves one cell to the right or left. If the TM
reaches the final state, the input string is accepted, otherwise it is rejected.
The following table shows a comparison of how a Turing machine differs from
Finite Automaton and Pushdown Automaton.
(The transition table is not reproduced here; its columns are: Tape alphabet symbol, Present State 'q0', Present State 'q1', Present State 'q2'.)
Here the transition 1Rq1 implies that the write symbol is 1, the tape moves right,
and the next state is q1. Similarly, the transition 1Lq2 implies that the write symbol
is 1, the tape moves left, and the next state is q2.
S(n) = O(n)
A Turing machine has a finite number of states in its CPU. However, the
number of states is not always small (like 6). Like a Pentium chip we can
store values in it -- as long as there are only finite number of them. For
example all real computers have registers but there are only a fixed number
of them, AND each register can only hold one of a fixed (and finite) number
of bits.
It is a pity that we don't have a high-level language for TMs; if we did, we could
write statements like x = tape.head(), perhaps.
Suppose that we wanted to code (in C++) a TM with states 'q0', 'q1', ... plus
storage 'x'. We might write code like this:
q0: x = tape.head(); tape.move_right(); goto q1;            // remember the scanned symbol in the "CPU" and move right
q1: if (tape.head() == B) return FAILS;                     // reached a blank: reject
    if (tape.head() == x) { tape.move_right(); goto q1; }   // keep moving right while the symbol matches
    ...
Example 8.6
I found the idea of storage in states very confusing. Could you go over
this kind of problem? And is it beneficial to convey the transition function in
table format, as was done in sections 8.1 and 8.2, for this example, and what would the table be in this case?
A typical transition table when data is being stored has entries like this
State     Read   Next     Write   Move
(q0,B)    1      (q1,1)   1       R
(q0,B)    0      (q1,0)   0       R
We might interpret this as part of a step that stores the symbol read from the
tape in the CPU. Note: the book uses "[]" where I use "()".
Tabulating the 6 states × 3 symbols on page 331 is not difficult and may
help.
But suppose you have a CPU that has 6 binary flags stored in it plus 3
control states. Now you have 3*64 states to write down...
Tabulating all possible states is not very helpful for machines with structured
or compound states.... unless you allow variables and conditions to appear in
the state descriptions AND expressions in the body of the table.
The other thing missing from the table is the documentation that explains
what each transition means.
Key idea: Γ is the Cartesian product of a finite number of finite sets.
(Cartesian product works like a struct in C/C++)
For example: computer tape storage is multi-track tape with something like 8
or 9 bits in each cell.
(*,1)/(B,0) →
(*,0)/(B,1) →
(B,0)/(B,0) →
(B,1)/(B,1) →
Suppose that all of the above are on a loop from q to q in a diagram; then we have
these entries in the table:
State   Read    Next   Write   Move
q       (*,1)   q      (B,0)   R
q       (*,0)   q      (B,1)   R
q       (B,1)   q      (B,1)   R
q       (B,0)   q      (B,0)   R
Both representations have the effect of taking the marked (*) bits, flipping them, and clearing the mark.
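To make the "struct" analogy concrete, here is a tiny C++ sketch (ours, not from the original notes) that models a two-track cell as a struct and applies the flip-and-unmark transition above:

#include <iostream>

// A two-track tape cell as a struct, mirroring "Gamma is a Cartesian product":
// track 1 holds a mark ('*' or 'B'), track 2 holds a bit ('0' or '1').
struct Cell {
    char mark;   // '*' = marked, 'B' = blank on the mark track
    char bit;    // '0' or '1' on the data track
};

// One loop transition of the machine sketched in the table above:
// a marked bit is flipped and unmarked; an unmarked cell is left unchanged.
Cell step(Cell c) {
    if (c.mark == '*') return { 'B', c.bit == '1' ? '0' : '1' };   // (*,x) -> (B, not x)
    return c;                                                      // (B,x) -> (B, x)
}

int main() {
    Cell tape[] = { {'*','1'}, {'B','0'}, {'*','0'} };
    for (Cell& c : tape) c = step(c);
    for (const Cell& c : tape) std::cout << '(' << c.mark << ',' << c.bit << ") ";
    std::cout << '\n';   // prints: (B,0) (B,0) (B,1)
}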
Exercise: complete the δ descriptions that fit with the above.
Note the book is using "[]" in place of "()" for tuples. Other books use "<>"!
Error on page 332
The description of Σ does not mention the encoding of "c" as [B,c].
- Multiple Tracks
In example 8.7, I had a problem understanding the transition functions, and the
duty or assignment of each part of the function.
Can you explain, what each part of the transition function is supposed to
mean or do?
Subroutines 333
Subroutines
Would you please explain a little bit more on figure 8.14 and figure 8.15, on
how the subroutine Copy works?
State   Read 1   Read 2   Write 1   Move 1   Write 2   Move 2
q0      a        b        c         L        d         R
...
This means that in state q0, if the first head sees an "a" and the second head sees a "b",
then it writes "c" on tape 1 and moves that head left, and it also writes "d" on tape 2
and moves that head right. Notice that there are two symbols read, two written, and two moves.
The transition would be something like:
ab/cd<-->
- NTM
How does a nondeterministic Turing machine differ from a deterministic
Turing machine?
The machine has choices in some states and with some symbols: there
is at least one pair of state × head symbol with two (or more) possible
transitions. This spawns extra (hypothetical) threads. The diagram has
many arrows leaving a state with the same symbol on it.
One way to handle this is to draw a tree. This can take a lot of paper but
works well for humans. For example, here is a tree of IDs generated by the
NTM in the exercises for this section:
In class we traced several of the IDs that could come from the NTM in the
dreaded Exercise 8.4.4.
4. Two-dimensional tape
We assume that each cell of the one-way infinite tape can contain any
natural number.
We also have a fixed number of registers, and each can contain any
natural number.
In summary, we have:
Theorem:
Learning goals
• Be able to describe the types of Turing machines and state the related
theorems.
Semi-infinite Tapes’ TM
⚫ A TM with a semi-infinite tape means that there are no cells to the left of the initial head position.
⚫ A TM with a semi-infinite tape simulates a TM with an infinite tape by using a two-track tape.
A Turing Machine with a semi-infinite tape has a left end but no right end. The left
end is limited with an end marker.
It is a two-track tape −
• Upper track − It represents the cells to the right of the initial head position.
• Lower track − It represents the cells to the left of the initial head position, in reverse order.
The input string is initially written on the tape in contiguous tape cells.
The machine starts from the initial state q0 and the head scans from the left end
marker ‘End’. In each step, it reads the symbol on the tape under its head. It writes
a new symbol on that tape cell and then it moves the head either left or right by
one tape cell. A transition function determines the actions to be taken.
It has two special states called accept state and reject state. If at any point of time
it enters the accept state, the input is accepted, and if it enters the reject state, the
input is rejected by the TM. For some inputs, it may continue to run forever without
accepting or rejecting.
Note − Turing machines with semi-infinite tape are equivalent to standard Turing
machines.
Here,
Memory information ≤ c × Input information
The computation is restricted to the constant bounded area. The input alphabet
contains two special symbols which serve as left end markers and right end
markers which mean the transitions neither move to the left of the left end marker
nor to the right of the right end marker of the tape.
A linear bounded automaton can be defined as an 8-tuple (Q, X, ∑, q0, ML, MR, δ, F)
where Q is a finite set of states, X is the tape alphabet, ∑ is the input alphabet, q0 is the initial state, ML is the left end marker, MR is the right end marker (MR ≠ ML), δ is the transition function, and F is the set of final states.
For a decidable language, for each input string, the TM halts either at the accept or
the reject state as depicted in the following diagram −
Example 1
Find out whether the following problem is decidable or not −
Solution
Prime numbers = {2, 3, 5, 7, 11, 13, …………..}
Divide the number 'm' by all the numbers between '2' and '√m', starting from '2'.
If any of these numbers produces a remainder of zero, the machine goes to the "Rejected
state"; otherwise it goes to the "Accepted state". So here the answer is always
'Yes' or 'No', and the problem is decidable.
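The same trial-division decider can be written down directly; the following C++ sketch (ours, with the function name isPrime chosen for the example) mirrors the steps described above:

#include <iostream>

// Trial division: divide m by every d from 2 up to sqrt(m); a zero remainder rejects.
bool isPrime(unsigned long long m) {
    if (m < 2) return false;
    for (unsigned long long d = 2; d * d <= m; ++d)
        if (m % d == 0) return false;   // "Rejected state"
    return true;                        // "Accepted state"
}

int main() {
    for (int m : {2, 3, 4, 11, 15})
        std::cout << m << (isPrime(m) ? " is prime\n" : " is not prime\n");
}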
Example 2
Given a regular language L and string w, how can we check if w ∈ L?
Solution − Simulate the DFA that accepts L on the string w; since a DFA always halts after reading its input, we always get a 'Yes' or 'No' answer, so the problem is decidable.
Note −
The Turing machine was invented in 1936 by Alan Turing, who called it an a-machine
(automatic machine). With this model, Turing was able to answer two questions in
the negative: (1) does a machine exist that can determine whether any arbitrary
machine on its tape is "circular" (e.g., freezes, or fails to continue its computational
task); similarly, (2) does a machine exist that can determine whether any arbitrary
machine on its tape ever prints a given symbol? Thus by providing a mathematical
description of a very simple device capable of arbitrary computations, he was able
to prove properties of computation in general—and in particular, the
uncomputability of the Entscheidungsproblem ("decision problem").
Assuming a black box, the Turing machine cannot know whether it will eventually
enumerate any one specific string of the subset with a given program. This is due
to the fact that the halting problem is unsolvable, which has major implications for
the theoretical limits of computing.
A Turing machine that is able to simulate any other Turing machine is called a
universal Turing machine (UTM, or simply a universal machine). A more
mathematically oriented definition with a similar "universal" nature was introduced
by Alonzo Church, whose work on lambda calculus intertwined with Turing's in a
formal theory of computation known as the Church– Turing thesis. The thesis
states that Turing machines indeed capture the informal notion of effective
methods in logic and mathematics, and provide a precise definition of an algorithm
or "mechanical procedure". Studying their abstract properties yields many insights
into computer science and complexity theory.
Physical description
In his 1948 essay, "Intelligent Machinery", Turing wrote that his machine consisted
of:
Informal description
For visualizations of Turing machines, see Turing machine gallery.
The head is always over a particular square of the tape; only a finite stretch of
squares is shown. The instruction to be performed (q4) is shown over the scanned
square. (Drawing after Kleene (1952) p. 375.)
Here, the internal state (q1) is shown inside the head, and the illustration describes
the tape as being infinite and pre-filled with "0", the symbol serving as blank. The
system's full state (its complete configuration) consists of the internal state, any
non-blank symbols on the tape (in this illustration "11B"), and the position of the
head relative to those symbols including blanks, i.e. "011B". (Drawing after
Minsky (1967) p. 121.)
• A tape divided into cells, one next to the other. Each cell contains a symbol
from some finite alphabet. The alphabet contains a special blank symbol
(here written as '0') and one or more other symbols. The tape is assumed to
be arbitrarily extendable to the left and to the right, i.e., the Turing machine
is always supplied with as much tape as it needs for its computation. Cells
that have not been written before are assumed to be filled with the blank
symbol. In some models the tape has a left end marked with a special
symbol; the tape extends or is indefinitely extensible to the right.
• A head that can read and write symbols on the tape and move the tape left
and right one (and only one) cell at a time. In some models the head moves
and the tape is stationary.
• A state register that stores the state of the Turing machine, one of finitely
many. Among these is the special start state with which the state register is
initialized. These states, writes Turing, replace the "state of mind" a person
performing computations would ordinarily be in.
• A finite table[19] of instructions[20] that, given the state(qi) the machine is
currently in and the symbol(aj) it is reading on the tape (symbol currently
under the head), tells the machine to do the following in sequence (for the 5-
tuple models):
In the 4-tuple models, erasing or writing a symbol (aj1) and moving the head left or
right (dk) are specified as separate instructions. Specifically, the table tells the
machine to (ia) erase or write a symbol or (ib) move the head left or right, and then
(ii) assume the same or a new state as prescribed, but not both actions (ia) and (ib)
in the same instruction. In some models, if there is no entry in the table for the
current combination of symbol and state then the machine will halt; other models
require all entries to be filled.
Note that every part of the machine (i.e. its state, symbol-collections, and used tape
at any given time) and its actions (such as printing, erasing and tape motion) is
finite, discrete and distinguishable; it is the unlimited amount of tape and runtime
that gives it an unbounded amount of storage space.
Formal definition
Following Hopcroft and Ullman (1979, p. 148), a (one-tape) Turing machine can be
formally defined as a 7-tuple M = (Q, Γ, b, Σ, δ, q0, F) where
• Q is a finite, non-empty set of states;
• Γ is a finite, non-empty set of tape alphabet symbols;
• b ∈ Γ is the blank symbol (the only symbol allowed to occur on the tape infinitely often at any step during the computation);
• Σ ⊆ Γ \ {b} is the set of input symbols, that is, the set of symbols allowed to appear in the initial tape contents;
• δ : (Q \ F) × Γ → Q × Γ × {L, R} is the transition function;
• q0 ∈ Q is the initial state;
• F ⊆ Q is the set of final states or accepting states. The initial tape contents is said to be accepted by M if M eventually halts in a state from F.
The 7-tuple for the 3-state busy beaver looks like this (see more about this busy
beaver at Turing machine examples):
• Q = {A, B, C, HALT} (states);
• Γ = {0, 1} (tape alphabet symbols);
• b = 0 (blank symbol);
• Σ = {1} (input symbols);
• q0 = A (initial state);
• F = {HALT} (final states);
• δ = see the state table below (transition function).
Tape symbol   Current state A (Write, Move, Next)   Current state B (Write, Move, Next)   Current state C (Write, Move, Next)
0             1, R, B                               1, L, A                               1, L, B
1             1, L, C                               1, R, B                               1, R, HALT
For instance,
• There will need to be many decisions on what the symbols actually look like,
and a failproof way of reading and writing symbols indefinitely.
• The shift left and shift right operations may shift the tape head across the
tape, but when actually building a Turing machine it is more practical to
make the tape slide back and forth under the head instead.
• The tape can be finite, and automatically extended with blanks as needed
(which is closest to the mathematical definition), but it is more common to
think of it as stretching infinitely at both ends and being pre-filled with
blanks except on the explicitly given finite fragment the tape head is on.
(This is, of course, not implementable in practice.) The tape cannot be fixed
in length, since that would not correspond to the given definition and would
seriously limit the range of computations the machine can perform to those
of a linear bounded automaton.
Alternative definitions
Other authors (Minsky (1967) p. 119, Hopcroft and Ullman (1979) p. 158, Stone
(1972) p. 9) adopt a different convention, with new state qm listed immediately
after the scanned symbol Sj:
For the remainder of this article "definition 1" (the Turing/Davis convention) will
be used.
Example: state table for the 3-state 2-symbol busy beaver reduced to 5-tuples
Current state   Scanned symbol   Print symbol   Move tape   Final (i.e. next) state   5-tuple
A               0                1              R           B                         (A, 0, 1, R, B)
A               1                1              L           C                         (A, 1, 1, L, C)
B               0                1              L           A                         (B, 0, 1, L, A)
B               1                1              R           B                         (B, 1, 1, R, B)
C               0                1              L           B                         (C, 0, 1, L, B)
C               1                1              N           H                         (C, 1, 1, N, H)
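For concreteness, the following C++ sketch (not from the original notes) simulates exactly this 5-tuple table starting from a blank tape; with these rules it halts after 13 steps with six 1s written on the tape.

#include <iostream>
#include <map>
#include <utility>

// The tape is a map from cell position to symbol; unwritten cells read as 0 (blank).
struct Rule { int write; int move; char next; };   // move: +1 = R, -1 = L, 0 = N

int main() {
    std::map<std::pair<char, int>, Rule> delta = {
        {{'A', 0}, {1, +1, 'B'}}, {{'A', 1}, {1, -1, 'C'}},
        {{'B', 0}, {1, -1, 'A'}}, {{'B', 1}, {1, +1, 'B'}},
        {{'C', 0}, {1, -1, 'B'}}, {{'C', 1}, {1,  0, 'H'}}   // (C, 1, 1, N, H): write 1, no move, halt
    };
    std::map<int, int> tape;
    int pos = 0, steps = 0;
    char state = 'A';
    while (state != 'H') {
        Rule r = delta.at({state, tape[pos]});
        tape[pos] = r.write;
        pos += r.move;
        state = r.next;
        ++steps;
    }
    int ones = 0;
    for (const auto& cell : tape) ones += cell.second;
    std::cout << "halted after " << steps << " steps with " << ones << " ones on the tape\n";
}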
In the following table, Turing's original model allowed only the first three lines that
he called N1, N2, N3 (cf. Turing in The Undecidable, p. 126). He allowed for
erasure of the "scanned square" by naming a 0th symbol S0 = "erase" or "blank",
etc. However, he did not allow for nonprinting, so every instruction-line includes
"print symbol Sk" or "erase" (cf. footnote 12 in Post (1947), The Undecidable, p.
300). The abbreviations are Turing's (The Undecidable, p. 119). Subsequent to
Turing's original paper in 1936–1937, machine-models have allowed all nine
possible types of five-tuples:
Current m- Final m-
Tape Print- Tape- 5-tuple
configuration symbol operation motion configuration 5-tuple comments 4-
tuple
(Turing state) (Turing state)
(qi, Sj,
N1 qi Sj Print(Sk) Left L qm Sk, L, "blank" = S1=S , etc.
0,
qm) 1
(qi, Sj,
N2 qi Sj Print(Sk) Right R qm Sk, R, "blank" = S1=S , etc.
0,
qm) 1
(qi, Sj,
N3 qi Sj Print(Sk) None N qm Sk, N, "blank" = S0, (qi, Sj,
qm) 1=S1, etc. Sk, qm)
(qi, Sj,
4 qi Sj None N Left L qm N, L, (L, qqi, Smj),
qm)
(qi, Sj,
5 qi Sj None N Right R qm N, R, R, q(qi, Smj),
qm)
(qi, Sj,
6 qi Sj None N None N qm N, N, Direct "jump" (qi, Sj,
qm) N, qm)
(qi, Sj,
7 qi Sj Erase Left L qm E, L,
qm)
(qi, Sj,
8 qi Sj Erase Right R qm E, R,
qm)
(qi, Sj,
9 qi Sj Erase None N qm E, N, E, q(qi, Smj),
qm)
Any Turing table (list of instructions) can be constructed from the above nine 5-
tuples. For technical reasons, the three non-printing or "N" instructions (4, 5, 6)
can usually be dispensed with. For examples see Turing machine examples.
Less frequently the use of 4-tuples are encountered: these represent a further
atomization of the Turing instructions (cf. Post (1947), Boolos & Jeffrey (1974,
1999), Davis-Sigal-Weyuker (1994)); also see more at Post–Turing machine.
The "state"
The word "state" used in context of Turing machines can be a source of confusion,
as it can mean two things. Most commentators after Turing have used "state" to
mean the name/designator of the current instruction to be performed—i.e. the
contents of the state register. But Turing (1936) made a strong distinction between
a record of what he called the machine's "m-configuration", and the machine's (or
person's) "state of progress" through the computation - the current state of the total
system. What Turing called "the state formula" includes both the current
instruction and all the symbols on the tape:
Thus the state of progress of the computation at any stage is completely determined
by the note of instructions and the symbols on the tape. That is, the state of the
system may be described by a single expression (sequence of symbols) consisting
of the symbols on the tape followed by Δ (which we suppose not to appear
elsewhere) and then by the note of instructions. This expression is called the 'state
formula'.
Earlier in his paper Turing carried this even further: he gives an example where he
placed a symbol of the current "m-configuration"—the instruction's label—beneath
the scanned square, together with all the symbols on the tape (The Undecidable, p.
121); this he calls "the complete configuration" (The Undecidable, p. 118). To
print the "complete configuration" on one line, he places the state-label/m-
configuration to the left of the scanned symbol.
A variant of this is seen in Kleene (1952) where Kleene shows how to write the
Gödel number of a machine's "situation": he places the "m-configuration" symbol
q4 over the scanned square in roughly the center of the 6 non-blank squares on the
tape (see the Turing-tape figure in this article) and puts it to the right of the
scanned square. But Kleene refers to "q4" itself as "the machine state" (Kleene, p.
374-375). Hopcroft and Ullman call this composite the "instantaneous description"
and follow the Turing convention of putting the "current state" (instruction-label,
mconfiguration) to the left of the scanned symbol (p. 149).
Example: total state of 3-state 2-symbol busy beaver after 3 "moves" (taken
from example "run" in the figure below):
1A1
This means: after three moves the tape has ... 000110000 ... on it, the head is
scanning the rightmost 1, and the state is A. Blanks (in this case represented by
"0"s) can be part of the total state as shown here: B01; the tape has a single 1 on it,
but the head is scanning the 0 ("blank") to its left and the state is B.
Turing's biographer Andrew Hodges (1983: 107) has noted and discussed this
confusion.
Usually large tables are better left as tables (Booth, p. 74). They are more readily
simulated by computer in tabular form (Booth, p. 74). However, certain concepts—
e.g. machines with "reset" states and machines with repeating patterns (cf. Hill and
Peterson p. 244ff)—can be more readily seen when viewed as a drawing.
The evolution of the busy-beaver's computation starts at the top and proceeds to the
bottom.
The reader should again be cautioned that such diagrams represent a snapshot of
their table frozen in time, not the course ("trajectory") of a computation through
time and space. While every time the busy beaver machine "runs" it will always
follow the same state-trajectory, this is not true for the "copy" machine that can be
provided with variable input "parameters".
The diagram "Progress of the computation" shows the three-state busy beaver's
"state"
(instruction) progress through its computation from start to finish. On the far right
is the Turing "complete configuration" (Kleene "situation", Hopcroft–Ullman
"instantaneous description") at each step. If the machine were to be stopped and
cleared to blank both the "state register" and entire tape, these "configurations"
could be used to rekindle a computation anywhere in its progress (cf. Turing
(1936) The Undecidable, pp. 139–140).
Many machines that might be thought to have more computational capability than
a simple universal Turing machine can be shown to have no more power (Hopcroft
and Ullman p. 159, cf. Minsky (1967)). They might compute faster, perhaps, or use
less memory, or their instruction set might be smaller, but they cannot compute
more powerfully (i.e. more mathematical functions). (Recall that the Church–
Turing thesis hypothesizes this to be true for any kind of machine: that anything
that can be "computed" can be computed by some Turing machine.)
At the other extreme, some very simple models turn out to be Turing-equivalent,
i.e. to have the same computational power as the Turing machine model.
Common equivalent models are the multi-tape Turing machine, multi-track Turing
machine, machines with input and output, and the non-deterministic Turing
machine (NDTM) as opposed to the deterministic Turing machine (DTM) for
which the action table has at most one entry for each combination of symbol and
state.
Read-only, right-moving Turing machines are equivalent to NDFAs (as well as
DFAs by conversion using the NDFA to DFA conversion algorithm).
For practical and didactical intentions the equivalent register machine can be used
as a usual assembly programming language.
...whose motion is only partially determined by the configuration ... When such a
machine reaches one of these ambiguous configurations, it cannot go on until some
arbitrary choice has been made by an external operator. This would be the case if
we were using machines to deal with axiomatic systems.
This is indeed the technique by which a deterministic (i.e. a-) Turing machine can
be used to mimic the action of a nondeterministic Turing machine; Turing solved
the matter in a footnote and appears to dismiss it from further consideration.
This finding is now taken for granted, but at the time (1936) it was considered
astonishing. The model of computation that Turing called his "universal
machine"—"U" for short—is considered by some (cf. Davis (2000)) to have been
the fundamental theoretical breakthrough that led to the notion of the stored-
program computer.
Turing's paper ... contains, in essence, the invention of the modern computer and
some of the programming techniques that accompanied it.
It is often said that Turing machines, unlike simpler automata, are as powerful
as real machines, and are able to execute any operation that a real program can.
What is neglected in this statement is that, because a real machine can only have a
finite number of configurations, this "real machine" is really nothing but a linear
bounded automaton. On the other hand, Turing machines are equivalent to
machines that have an unlimited amount of storage space for their computations.
However, Turing machines are not intended to model computers, but rather they
are intended to model computation itself. Historically, computers, which compute
only on their (fixed) internal storage, were developed only later.
There are a number of ways to explain why Turing machines are useful models of
real computers:
1. Anything a real computer can compute, a Turing machine can also compute.
For example: "A Turing machine can simulate any type of subroutine found
in programming languages, including recursive procedures and any of the
known parameter-passing mechanisms" (Hopcroft and Ullman p. 157). A
large enough FSA can also model any real computer, disregarding IO. Thus,
a statement about the limitations of Turing machines will also apply to real
computers.
2. The difference lies only with the ability of a Turing machine to manipulate
an unbounded amount of data. However, given a finite amount of time, a
Turing machine (like a real machine) can only manipulate a finite amount of
data.
3. Like a Turing machine, a real machine can have its storage space enlarged as
needed, by acquiring more disks or other storage media. If the supply of
these runs short, the Turing machine may become less useful as a model. But
the fact is that neither Turing machines nor real machines need astronomical
amounts of storage space in order to perform useful computation. The
processing time required is usually much more of a problem.
4. Descriptions of real machine programs using simpler abstract models are
often much more complex than descriptions using Turing machines. For
example, a Turing machine describing an algorithm may have a few hundred
states, while the equivalent deterministic finite automaton
(DFA) on a given real machine has quadrillions. This makes the DFA
representation infeasible to analyze.
5. Turing machines describe algorithms independent of how much memory
they use. There is a limit to the memory possessed by any current machine,
but this limit can rise arbitrarily in time. Turing machines allow us to make
statements about algorithms which will (theoretically) hold forever,
regardless of advances in conventional computing machine architecture.
6. Turing machines simplify the statement of algorithms. Algorithms running
on Turing-equivalent abstract machines are usually more general than their
counterparts running on real machines, because they have arbitrary-precision
data types available and never have to deal with unexpected conditions
(including, but not limited to, running out of memory).
Concurrency
Another limitation of Turing machines is that they do not model concurrency well.
For example, there is a bound on the size of integer that can be computed by an
always-halting nondeterministic Turing machine starting on a blank tape. (See
article on unbounded nondeterminism.) By contrast, there are always-halting
concurrent systems with no inputs that can compute an integer of unbounded size.
(A process can be created with local storage that is initialized with a count of 0 that
concurrently sends itself both a stop and a go message. When it receives a go
message, it increments its count by 1 and sends itself a go message. When it
receives a stop message, it stops with an unbounded number in its local storage.)
Interaction
In the early days of computing, computer use was typically limited to batch
processing, i.e., noninteractive tasks, each producing output data from given input
data. Computability theory, which studies computability of functions from inputs
to outputs, and for which Turing machines were invented, reflects this practice.
Since the 1970s, interactive use of computers became much more common. In
principle, it is possible to model this by having an external agent read from the tape
and write to it at the same time as a Turing machine, but this rarely matches how
interaction actually happens; therefore, when describing interactivity, alternatives
such as I/O automata are usually preferred.
UNIT-V
Undecidable problem
Background
A decision problem is any arbitrary yes-or-no question on an infinite set of inputs.
Because of this, it is traditional to define the decision problem equivalently as the
set of inputs for which the problem returns yes. These inputs can be natural numbers,
but also other values of some other kind, such as strings of a formal language. Using
some encoding, such as a Gödel numbering, the strings can be encoded as natural
numbers. Thus, a decision problem informally phrased in terms of a formal language
is also equivalent to a set of natural numbers. To keep the formal definition simple,
it is phrased in terms of subsets of the natural numbers.
Alan Turing proved in 1936 that a general algorithm running on a Turing machine
that solves the halting problem for all possible program-input pairs necessarily
cannot exist. Hence, the halting problem is undecidable for Turing machines.
We have shown that the powerset of an infinite set is not enumerable -- that it has
more than ℵ0 subsets. Each of these subsets represents a language. Therefore,
there must be languages that are not computable by a Turing machine.
Now let's also suppose that the complement of L, -L = {w : w ∉ L}, is recursively
enumerable. That means there is some other Turing machine T2 that, given any
string of -L, halts and accepts that string.
Clearly, any string (over the appropriate alphabet Σ) belongs to either L or -L.
Hence, any string will cause either T1 or T2 (or both) to halt. We construct a new
Turing machine that emulates both T1 and T2, alternating moves between them.
When either one stops, we can tell (by whether it accepted or rejected the string) to
which language the string belongs. Thus, we have constructed a Turing machine
that, for each input, halts with an answer whether or not the string belongs to L.
Therefore L and -L are recursive languages.
We have just proved the following theorem: If a language and its complement are
both recursively enumerable, then both are recursive.
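The construction can be sketched in code. In the following C++ fragment the two semi-deciders are modelled, purely hypothetically, as functions that simulate T1 and T2 on w for a bounded number of steps and report whether they have accepted yet; the names Status and decide are ours.

#include <string>
#include <functional>
#include <iostream>

enum class Status { Accepted, StillRunning };

// Alternate between T1 (which semi-decides L) and T2 (which semi-decides -L),
// giving each an ever larger step budget.  Since every w is in L or -L, one of the
// two eventually accepts, so this loop always terminates with an answer.
bool decide(const std::function<Status(const std::string&, int)>& t1,
            const std::function<Status(const std::string&, int)>& t2,
            const std::string& w) {
    for (int steps = 1; ; ++steps) {
        if (t1(w, steps) == Status::Accepted) return true;    // w belongs to L
        if (t2(w, steps) == Status::Accepted) return false;   // w belongs to -L
    }
}

int main() {
    // Toy stand-ins: L = strings of even length; the step budget is ignored here.
    auto t1 = [](const std::string& w, int) { return w.size() % 2 == 0 ? Status::Accepted : Status::StillRunning; };
    auto t2 = [](const std::string& w, int) { return w.size() % 2 == 1 ? Status::Accepted : Status::StillRunning; };
    std::cout << decide(t1, t2, "abab") << ' ' << decide(t1, t2, "aba") << '\n';   // prints: 1 0
}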
We have shown how to enumerate strings for a given alphabet, w1, w2, w3, .... We
have also shown how to enumerate Turing machines, T1, T2, T3, .... (Recall that
each Turing machine defines a recursively enumerable language.) Consider the
language
L = {wi : wi ∈ L(Ti)}
A little thought will show that L is itself recursively enumerable. But now consider
its complement:
-L = {wi : wi ∉ L(Ti)}
Suppose -L were recursively enumerable. Then -L = L(Tk) for some Turing machine Tk. Consider the string wk:
• If wk belongs to L then (by the way we have defined L) Tk accepts this string.
But Tk accepts only strings that do not belong to L, so we have a contradiction.
• If wk does not belong to L, then it belongs to -L and is accepted by Tk. But
since Tk accepts wk, wk must belong to L. Again, a contradiction.
Undecidable Problems
(A table of string pairs (xi, yi), i = 1, 2, 3, …, presumably a Post Correspondence Problem instance, is not reproduced here.)
• EXAMPLE #2: SAT, the satisfiability problem: test whether a given Boolean formula is satisfiable. All sets of inputs must be tried systematically (brute force) until a satisfying case is discovered.
• EXAMPLE #3: Integer partition: can you partition n integers into two subsets such that the sums of the subsets are equal? As the size of the integers (i.e. the size of n) grows linearly, the size of the computations required to check all subsets and their respective sums grows exponentially. This is because, once again, we are forced to use the brute-force method to test the subsets of each division and their sums. (A sketch of this brute-force search appears after this list.)
• EXAMPLE #4: Graph coloring: how many colors do you need to color a graph such that no two adjacent vertices are of the same color?
• EXAMPLE #5: Bin packing: how many bins of a given size do you need to hold n items of variable size? Again, the best known algorithm for this problem involves going through all subsets of n items, seeing how they fit into the bins, and backtracking to test for better fits among subsets until all possible subsets have been tested to achieve the proper answer. Once again, brute force; once again, exponential.
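The integer-partition example above also illustrates the gap between finding and checking a solution. In the C++ sketch below (ours; the function names verify and partitionExists are made up), checking one proposed subset takes a single pass over the numbers, while finding one by brute force tries all 2^n subsets.

#include <vector>
#include <iostream>

// Verifying a proposed certificate is cheap: given a subset (as a bitmask), one pass
// over the numbers checks whether the two sums are equal.
bool verify(const std::vector<int>& a, unsigned mask) {
    long long left = 0, right = 0;
    for (size_t i = 0; i < a.size(); ++i)
        (mask >> i & 1 ? left : right) += a[i];
    return left == right;
}

// Finding a certificate by brute force tries all 2^n subsets -- exponential in n.
bool partitionExists(const std::vector<int>& a) {
    for (unsigned mask = 0; mask < (1u << a.size()); ++mask)
        if (verify(a, mask)) return true;
    return false;
}

int main() {
    std::cout << partitionExists({3, 1, 1, 2, 2, 1}) << ' '    // 1: e.g. {3,2} vs {1,1,2,1}
              << partitionExists({2, 3, 4}) << '\n';           // 0: total sum is odd
}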
P versus NP problem
The P versus NP problem is a major unsolved problem in computer science. It asks
whether every problem whose solution can be quickly verified (technically, verified
in polynomial time) can also be solved quickly (again, in polynomial time).
The underlying issues were first discussed in the 1950s, in letters from John Forbes
Nash Jr. to the National Security Agency, and from Kurt Gödel to John von
Neumann. The precise statement of the P versus NP problem was introduced in
1971 by Stephen Cook in his seminal paper "The complexity of theorem proving
procedures"[2] and is considered by many to be the most important open problem
in the field.[3] It is one of the seven Millennium Prize Problems selected by the Clay
Mathematics Institute to carry a US$1,000,000 prize for the first correct solution.
The informal term quickly, used above, means the existence of an algorithm solving
the task that runs in polynomial time, such that the time to complete the task varies
as a polynomial function on the size of the input to the algorithm (as opposed to,
say, exponential time). The general class of questions for which some algorithm
can provide an answer in polynomial time is called "class P" or just "P". For some
questions, there is no known way to find an answer quickly, but if one is provided
with information showing what the answer is, it is possible to verify the answer
quickly. The class of questions for which an answer can be verified in polynomial
time is called NP, which stands for "nondeterministic polynomial time".[Note 1]
Consider Sudoku, an example of a problem that is easy to verify, but whose answer
may be difficult to compute. Given a partially filled-in Sudoku grid, of any size, is
there at least one legal solution? A proposed solution is easily verified, and the
time to check a solution grows slowly (polynomially) as the grid gets bigger.
However, all known algorithms for finding solutions take, for difficult examples,
time that grows exponentially as the grid gets bigger. So Sudoku is in NP (quickly
checkable) but does not seem to be in P (quickly solvable). Thousands of other
problems seem similar, fast to check but slow to solve. Researchers have shown
that a fast solution to any one of these problems could be used to build a quick
solution to all the others, a property called NP-completeness. Decades of
searching have not yielded a fast solution to any of these problems, so most
scientists suspect that none of these problems can be solved quickly. However, this
has never been proven.
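To make the "quickly checkable" part concrete, here is a minimal sketch of a polynomial-time verifier for a completed Sudoku grid; it checks rows, columns, and boxes (checking agreement with the original clues is omitted for brevity). The function name and grid representation are assumptions made for illustration.

```python
# Verifying a completed n x n Sudoku grid (n = box * box) takes only
# polynomial time: each row, column, and box must contain 1..n exactly once.
def verify_sudoku(grid, box=3):
    n = box * box
    want = set(range(1, n + 1))
    rows_ok = all(set(row) == want for row in grid)
    cols_ok = all({grid[r][c] for r in range(n)} == want for c in range(n))
    boxes_ok = all(
        {grid[box * br + r][box * bc + c]
         for r in range(box) for c in range(box)} == want
        for br in range(box) for bc in range(box)
    )
    return rows_ok and cols_ok and boxes_ok

# A valid 4 x 4 (box = 2) solution:
solution = [
    [1, 2, 3, 4],
    [3, 4, 1, 2],
    [2, 1, 4, 3],
    [4, 3, 2, 1],
]
print(verify_sudoku(solution, box=2))  # True
```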
An answer to the P = NP question would determine whether problems that can be
verified in polynomial time, like Sudoku, can also be solved in polynomial time. If
it turned out that P ≠ NP, it would mean that there are problems in NP that are
harder to compute than to verify: they could not be solved in polynomial time, but
the answer could be verified in polynomial time.
Context
The relation between the complexity classes P and NP is studied in computational
complexity theory, the part of the theory of computation dealing with the resources
required during computation to solve a given problem. The most common
resources are time (how many steps it takes to solve a problem) and space (how
much memory it takes to solve a problem).
In such analysis, a model of the computer for which time must be analyzed is
required. Typically such models assume that the computer is deterministic (given
the computer's present state and any inputs, there is only one possible action that
the computer might take) and sequential (it performs actions one after the other).
In this theory, the class P consists of all those decision problems (defined below)
that can be solved on a deterministic sequential machine in an amount of time that
is polynomial in the size of the input; the class NP consists of all those decision
problems whose positive solutions can be verified in polynomial time given the
right information, or equivalently, whose solution can be found in polynomial time
on a non-deterministic machine.[7] Clearly, P ⊆ NP. Arguably the biggest open
question in theoretical computer science concerns the relationship between those
two classes:
Is P equal to NP?
In a 2002 poll of 100 researchers, 61 believed the answer to be no, 9 believed the
answer is yes, and 22 were unsure; 8 believed the question may be independent of
the currently accepted axioms and therefore impossible to prove or disprove.[8]
In 2012, 10 years later, the same poll was repeated. The number of researchers who
answered was 151: 126 (83%) believed the answer to be no, 12 (9%) believed the
answer is yes, 5 (3%) believed the question may be independent of the currently
accepted axioms and therefore impossible to prove or disprove, 8 (5%) said either
don't know or don't care or don't want the answer to be yes nor the problem to be
resolved.[9]
NP-completeness
[Figure: Euler diagram for P, NP, NP-complete, and NP-hard sets of problems (excluding the empty language and its complement, which belong to P but are not NP-complete).]
Main article: NP-completeness
NP-hard problems are those at least as hard as NP problems, i.e., all NP problems
can be reduced (in polynomial time) to them. NP-hard problems need not be in NP,
i.e., they need not have solutions verifiable in polynomial time.
For instance, the Boolean satisfiability problem is NP-complete by the Cook–Levin
theorem, so any instance of any problem in NP can be transformed mechanically
into an instance of the Boolean satisfiability problem in polynomial time. The
Boolean satisfiability problem is one of many such NP-complete problems. If any
NP-complete problem is in P, then it would follow that P = NP. However, many
important problems have been shown to be NP-complete, and no fast algorithm for
any of them is known.
Based on the definition alone it is not obvious that NP-complete problems exist;
however, a trivial and contrived NP-complete problem can be formulated as
follows: given a description of a Turing machine M guaranteed to halt in
polynomial time, does there exist a polynomial-size input that M will accept?[10] It
is in NP because (given an input) it is simple to check whether M accepts the input
by simulating M; it is NP-complete because the verifier for any particular instance
of a problem in NP can be encoded as a polynomial-time machine M that takes the
solution to be verified as input. Then the question of whether the instance is a yes
or no instance is determined by whether a valid input exists.
The first natural problem proven to be NP-complete was the Boolean satisfiability
problem. As noted above, this is the Cook–Levin theorem; its proof that
satisfiability is NP-complete contains technical details about Turing machines as
they relate to the definition of NP. However, after this problem was proved to be
NP-complete, proof by reduction provided a simpler way to show that many other
problems are also NP-complete, including the Sudoku problem discussed earlier. In this case, the proof shows that a solution of Sudoku in polynomial time could also be used to complete Latin squares in polynomial time.[11] This in turn gives a solution to the problem of partitioning tripartite graphs into triangles,[12] which could then be used to find solutions for 3-SAT,[13] which then provides a solution for general Boolean satisfiability. So a polynomial-time solution to Sudoku leads, by a series of
mechanical transformations, to a polynomial time solution of satisfiability, which
in turn can be used to solve any other NP-complete problem in polynomial time.
Using transformations like this, a vast class of seemingly unrelated problems are
all reducible to one another, and are in a sense "the same problem".
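Since the Boolean satisfiability problem anchors these reductions, it is worth seeing how cheaply a claimed solution can be checked. The sketch below verifies a candidate truth assignment against a CNF formula in time proportional to the formula's size; the clause representation (signed integer literals, in the spirit of the common DIMACS convention) is an assumption made here for illustration.

```python
# Polynomial-time verifier for Boolean satisfiability (SAT).
# A CNF formula is a list of clauses; each clause is a list of integer
# literals: k means variable k, -k means its negation.
def verify_sat(clauses, assignment):
    """Return True iff `assignment` (dict: variable -> bool) satisfies every clause."""
    def literal_true(lit):
        value = assignment[abs(lit)]
        return value if lit > 0 else not value
    return all(any(literal_true(lit) for lit in clause) for clause in clauses)

# (x1 OR NOT x2) AND (x2 OR x3)
clauses = [[1, -2], [2, 3]]
print(verify_sat(clauses, {1: True, 2: False, 3: True}))    # True
print(verify_sat(clauses, {1: False, 2: False, 3: False}))  # False
```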
Harder problems
See also: Complexity class
Although it is unknown whether P = NP, problems outside of P are known. A
number of succinct problems (problems that operate not on normal input, but on a
computational description of the input) are known to be EXPTIME-complete.
Because it can be shown that P ≠ EXPTIME, these problems are outside P, and so
require more than polynomial time. In fact, by the time hierarchy theorem, they
cannot be solved in significantly less than exponential time. Examples include
finding a perfect strategy for chess (on an N × N board) and some other board games. The problem of deciding the truth of a statement in Presburger arithmetic requires even more time: Fischer and Rabin proved in 1974 that every algorithm that decides the truth of Presburger statements of length n has a runtime of at least 2^(2^(cn)) for some constant c. Hence, the
problem is known to need more than exponential run time. Even more difficult are
the undecidable problems, such as the halting problem. They cannot be completely
solved by any algorithm, in the sense that for any particular algorithm there is at
least one input for which that algorithm will not produce the right answer; it will
either produce the wrong answer, finish without giving a conclusive answer, or
otherwise run forever without producing any answer at all.
It is also possible to consider questions other than decision problems. One such
class, consisting of counting problems, is called #P: whereas an NP problem asks
"Are there any solutions?", the corresponding #P problem asks "How many
solutions are there?" Clearly, a #P problem must be at least as hard as the
corresponding NP problem, since a count of solutions immediately tells if at least
one solution exists, if the count is greater than zero. Surprisingly, some #P
problems that are believed to be difficult correspond to easy (for example linear-
time) P problems.[17] For these problems, it is very easy to tell whether solutions
exist, but thought to be very hard to tell how many. Many of these problems are
#P-complete, and hence among the hardest problems in #P, since a polynomial
time solution to any of them would allow a polynomial time solution to all other #P
problems.
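A standard illustration of this phenomenon: deciding whether a formula in disjunctive normal form (DNF) is satisfiable is easy, because a DNF formula is satisfiable exactly when some term is free of contradictory literals, yet counting its satisfying assignments is #P-complete. The sketch below uses an assumed representation (signed integers for literals) chosen purely for illustration.

```python
# Deciding satisfiability of a DNF formula takes only linear time,
# while counting its satisfying assignments is #P-complete.
# A formula is a list of terms; each term is a list of literals,
# e.g. 3 means x3 and -3 means NOT x3.
def dnf_satisfiable(terms):
    for term in terms:
        # A term can be satisfied unless it contains a literal and its negation.
        if not any(-lit in term for lit in term):
            return True
    return False

# (x1 AND NOT x2) OR (x2 AND NOT x2)
print(dnf_satisfiable([[1, -2], [2, -2]]))  # True: the first term is consistent
```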
A related example is integer factorization: no classical algorithm is known that runs in polynomial time to factor an n-bit integer. However, the best known quantum algorithm for this problem, Shor's algorithm, does run in polynomial time, although this does not indicate where the problem lies with respect to non-quantum complexity classes.
All of the above discussion has assumed that P means "easy" and "not in P" means
"hard", an assumption known as Cobham's thesis. It is a common and reasonably
accurate assumption in complexity theory; however, it has some caveats.
First, it is not always true in practice. A theoretical polynomial algorithm may have
extremely large constant factors or exponents thus rendering it impractical. On the
other hand, even if a problem is shown to be NP-complete, and even if P ≠ NP,
there may still be effective approaches to tackling the problem in practice. There
are algorithms for many NP-complete problems, such as the knapsack problem, the
traveling salesman problem and the Boolean satisfiability problem, that can solve
to optimality many real-world instances in reasonable time. The empirical average-
case complexity (time vs. problem size) of such algorithms can be surprisingly
low. An example is the simplex algorithm in linear programming, which works
surprisingly well in practice; despite having exponential worst-case time
complexity it runs on par with the best known polynomial-time algorithms.[23]
Second, there are types of computations which do not conform to the Turing
machine model on which P and NP are defined, such as quantum computation and
randomized algorithms.
Reasons to believe P ≠ NP
According to polls,[8][24] most computer scientists believe that P ≠ NP. A key reason
for this belief is that after decades of studying these problems no one has been able
to find a polynomial-time algorithm for any of more than 3000 important known
NP-complete problems (see List of NP-complete problems). These algorithms were
sought long before the concept of NP-completeness was even defined (Karp's 21
NP-complete problems, among the first found, were all well-known existing
problems at the time they were shown to be NP-complete). Furthermore, the result
P = NP would imply many other startling results that are currently believed to be
false, such as NP = co-NP and P = PH.
It is also intuitively argued that the existence of problems that are hard to solve but
for which the solutions are easy to verify matches real-world experience.
If P = NP, then the world would be a profoundly different place than we usually
assume it to be. There would be no special value in "creative leaps," no
fundamental gap between solving a problem and recognizing the solution once it's
found.
Consequences of solution
One of the reasons the problem attracts so much attention is the consequences of
the answer. Either direction of resolution would advance theory enormously, and
perhaps have huge practical consequences as well.
P = NP
A proof that P = NP could have stunning practical consequences if the proof leads
to efficient methods for solving some of the important problems in NP. It is also
possible that a proof would not lead directly to efficient methods, perhaps if the
proof is non-constructive, or the size of the bounding polynomial is too big to be
efficient in practice. The consequences, both positive and negative, arise since various NP-complete problems are fundamental in many fields. An important negative consequence would be that most public-key cryptography could be broken, since its security rests on the assumed hardness of certain problems in NP, such as integer factorization. On the other hand, there are enormous positive consequences that would follow from rendering tractable many currently intractable problems. For
instance, many problems in operations research are NP-complete, such as some
types of integer programming and the travelling salesman problem. Efficient
solutions to these problems would have enormous implications for logistics. Many
other important problems, such as some problems in protein structure prediction,
are also NP-complete;[30] if these problems were efficiently solvable it could spur
considerable advances in life sciences and biotechnology.
But such changes may pale in significance compared to the revolution an efficient
method for solving NP-complete problems would cause in mathematics itself.
Gödel, in his early thoughts on computational complexity, noted that a mechanical
method that could solve any problem would revolutionize mathematics: [31][32]
If there really were a machine with φ(n) ∼ k ⋅ n (or even ∼ k ⋅ n²), this would have
consequences of the greatest importance. Namely, it would obviously mean that in
spite of the undecidability of the Entscheidungsproblem, the mental work of a
mathematician concerning Yes-or-No questions could be completely replaced by a
machine. After all, one would simply have to choose the natural number n so large
that when the machine does not deliver a result, it makes no sense to think more
about the problem.
Research mathematicians spend their careers trying to prove theorems, and some
proofs have taken decades or even centuries to find after problems have been
stated—for instance, Fermat's Last Theorem took over three centuries to prove. A
method that is guaranteed to find proofs to theorems, should one exist of a
"reasonable" size, would essentially end this struggle.
Donald Knuth has stated that he has come to believe that P = NP, but is reserved
about the impact of a possible proof:[34]
[...] I don't believe that the equality P = NP will turn out to be helpful even if it is
proved, because such a proof will almost surely be nonconstructive.
P ≠ NP
A proof that showed that P ≠ NP would lack the practical computational benefits of
a proof that P = NP, but would nevertheless represent a very significant advance in
computational complexity theory and provide guidance for future research. It
would allow one to show in a formal way that many common problems cannot be
solved efficiently, so that the attention of researchers can be focused on partial
solutions or solutions to other problems. Due to widespread belief in P ≠ NP, much
of this focusing of research has already taken place.
Also P ≠ NP still leaves open the average-case complexity of hard problems in NP.
For example, it is possible that SAT requires exponential time in the worst case,
but that almost all randomly selected instances of it are efficiently solvable. Russell
Impagliazzo has described five hypothetical "worlds" that could result from
different possible resolutions to the average-case complexity question. These range
from "Algorithmica", where P = NP and problems like SAT can be solved
efficiently in all instances, to "Cryptomania", where P ≠ NP and generating hard
instances of problems outside P is easy, with three intermediate possibilities
reflecting different possible distributions of difficulty over instances of NP-hard
problems. The "world" where P ≠ NP but all problems in NP are tractable in the
average case is called "Heuristica" in the paper. A Princeton University workshop
in 2009 studied the status of the five worlds.
As additional evidence for the difficulty of the problem, essentially all known
proof techniques in computational complexity theory fall into one of the following
classifications, each of which is known to be insufficient to prove that P ≠ NP:
Classification: Relativizing proofs
Definition: Imagine a world where every algorithm is allowed to make queries to some fixed subroutine called an oracle (a black box which can answer a fixed set of questions in constant time; for example, a black box that solves any given travelling salesman problem in one step), and the running time of the oracle is not counted against the running time of the algorithm. Most proofs (especially classical ones) apply uniformly in a world with oracles regardless of what the oracle does. These proofs are called relativizing. In 1975, Baker, Gill, and Solovay showed that P = NP with respect to some oracles, while P ≠ NP for other oracles.[38] Since relativizing proofs can only prove statements that are uniformly true with respect to all possible oracles, this showed that relativizing techniques cannot resolve P = NP.
These barriers are another reason why NP-complete problems are useful: if a
polynomial-time algorithm can be demonstrated for an NP-complete problem, this
would solve the P = NP problem in a way not excluded by the above results.
These barriers have also led some computer scientists to suggest that the P versus
NP problem may be independent of standard axiom systems like ZFC (cannot be
proved or disproved within them). The interpretation of an independence result
could be that either no polynomial-time algorithm exists for any NP-complete
problem, and such a proof cannot be constructed in (e.g.) ZFC, or that polynomial-
time algorithms for NP-complete problems may exist, but it is impossible to prove
in ZFC that such algorithms are correct. However, if it can be shown, using
techniques of the sort that are currently known to be applicable, that the problem
cannot be decided even with much weaker assumptions extending the Peano
axioms (PA) for integer arithmetic, then there would necessarily exist nearly-
polynomial-time algorithms for every problem in NP. Therefore, if one believes (as
most complexity theorists do) that not all problems in NP have efficient
algorithms, it would follow that proofs of independence using those techniques
cannot be possible. Additionally, this result implies that proving independence
from PA or ZFC using currently known techniques is no easier than proving the
existence of efficient algorithms for all problems in NP.
Claimed solutions
While the P versus NP problem is generally considered unsolved, many amateur
and some professional researchers have claimed solutions. Gerhard J. Woeginger
has a comprehensive list. As of 2018, this list contained 62 purported proofs of P =
NP, 50 of P ≠ NP, 2 proofs the problem is unprovable, and one proof that it is
undecidable. An August 2010 claim of proof that P ≠ NP, by Vinay Deolalikar, a
researcher at HP Labs, received heavy Internet and press attention after being
initially described as "seem[ing] to be a relatively serious attempt" by two leading
specialists. The proof has been reviewed publicly by academics, and Neil
Immerman, an expert in the field, has pointed out two possibly fatal errors in the
proof. In September 2010, Deolalikar was reported to be working on a detailed
expansion of his attempted proof. However, opinions expressed by several notable
theoretical computer scientists indicate that the attempted proof is neither correct
nor a significant advancement in the understanding of the problem. This
assessment prompted a May 2013 article in The New Yorker to call the proof attempt
"thoroughly discredited".
Logical characterizations
The P = NP problem can be restated in terms of the expressibility of certain classes of logical statements, as a result of work in descriptive complexity.
Consider all languages of finite structures with a fixed signature including a linear
order relation. Then, all such languages in P can be expressed in first-order logic
with the addition of a suitable least fixed-point combinator. Effectively, this, in
combination with the order, allows the definition of recursive functions. As long as
the signature contains at least one predicate or function in addition to the
distinguished order relation, so that the amount of space taken to store such finite
structures is actually polynomial in the number of elements in the structure, this
precisely characterizes P.
Polynomial-time algorithms
No algorithm for any NP-complete problem is known to run in polynomial time.
However, there are algorithms known for NP-complete problems with the property
that if P = NP, then the algorithm runs in polynomial time on accepting instances
(although with enormous constants, making the algorithm impractical). However, these algorithms do not qualify as polynomial time because their running time on rejecting instances is not polynomial. The following algorithm, due to Levin (without any citation), is such an example. It correctly accepts the NP-complete language SUBSET-SUM, and it runs in polynomial time on inputs that are in SUBSET-SUM if and only if P = NP:
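These notes do not reproduce the algorithm itself, so the following is only a reconstruction of the standard universal-search idea being referred to: dovetail over all programs, run each candidate for a bounded number of steps, and accept as soon as any of them outputs a verifiable certificate (a non-empty subset summing to zero). The helper run_program is a hypothetical bounded interpreter assumed for the sketch, not a real library call.

```python
# Sketch of Levin-style universal search for SUBSET-SUM (does some
# non-empty subset of s sum to zero?). Accepts in polynomial time on
# "yes" instances if and only if P = NP; may run forever on "no" instances.
from itertools import count

def is_certificate(candidate, s):
    """A valid certificate is a non-empty list of elements of s that sums to 0."""
    return (isinstance(candidate, list)
            and len(candidate) > 0
            and all(x in s for x in candidate)
            and sum(candidate) == 0)

def universal_subset_sum(s):
    for k in count(1):                       # ever-growing step budget
        for m in range(1, k + 1):            # dovetail over programs 1..k
            # run_program is a HYPOTHETICAL bounded interpreter: it runs
            # "program number m" on input s for at most k steps and returns
            # the program's output, or None if it has not halted.
            output = run_program(m, s, k)
            if output is not None and is_certificate(output, s):
                return "yes"
```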
If, and only if, P = NP, then this is a polynomial-time algorithm accepting an NP-
complete language. "Accepting" means it gives "yes" answers in polynomial time,
but is allowed to run forever when the answer is "no" (also known as a semi-
algorithm).
Formal definitions
P and NP
A language L is in P if, and only if, there exists a deterministic Turing machine M that decides L and a constant k such that M halts within O(n^k) steps on every input of length n, where O refers to the big O notation.
L is in NP if, and only if, there exists a binary relation R over strings and a positive integer k such that the following two conditions are satisfied:
1. for every string x, x belongs to L if, and only if, there exists a string y with (x, y) in R and the length of y bounded by O(|x|^k); and
2. the relation R is decidable in polynomial time.
Such a string y is called a certificate (or witness) for x, and a machine deciding R is a polynomial-time verifier for L.
Example
Let COMPOSITE = {x : x = pq for some integers p, q > 1}. COMPOSITE is in NP: a certificate for x is a pair of nontrivial factors (p, q), and checking the certificate requires only a single multiplication, which takes polynomial time.
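A minimal verifier sketch for the COMPOSITE example: the certificate is a pair of nontrivial factors, and checking it costs a single multiplication. The function name below is an illustrative assumption.

```python
# Polynomial-time verifier for COMPOSITE: the certificate for
# "x is composite" is a pair (p, q) of nontrivial factors of x.
def verify_composite(x, certificate):
    p, q = certificate
    return p > 1 and q > 1 and p * q == x

print(verify_composite(91, (7, 13)))  # True: 91 = 7 * 13
print(verify_composite(97, (1, 97)))  # False: not a nontrivial factorization
```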
NP-completeness
Main article: NP-completeness
L is NP-complete if, and only if, the following two conditions are satisfied:
1. L is in NP; and
2. every language in NP is polynomial-time reducible to L (that is, every problem in NP can be transformed into an instance of L in polynomial time).
Although any given solution to an NP-complete problem can be verified quickly (in
polynomial time), there is no known efficient way to locate a solution in the first
place; the most notable characteristic of NP-complete problems is that no fast
solution to them is known. That is, the time required to solve the problem using any
currently known algorithm increases very quickly as the size of the problem grows.
As a consequence, determining whether it is possible to solve these problems
quickly, called the P versus NP problem, is one of the principal unsolved problems
in computer science today.
Overview
NP-complete problems are in NP, the set of all decision problems whose solutions
can be verified in polynomial time; NP may be equivalently defined as the set of
decision problems that can be solved in polynomial time on a non-deterministic
Turing machine. A problem p in NP is NP-complete if every other problem in NP
can be transformed (or reduced) into p in polynomial time.
NP-complete problems are studied because the ability to quickly verify solutions to
a problem (NP) seems to correlate with the ability to quickly solve that problem
(P). It is not known whether every problem in NP can be quickly solved—this is
called the P versus NP problem. But if any NP-complete problem can be solved
quickly, then every problem in NP can, because the definition of an NP-complete
problem states that every problem in NP must be quickly reducible to every NP-
complete problem (that is, it can be reduced in polynomial time). Because of this, it
is often said that NP-complete problems are harder or more difficult than NP
problems in general.
Formal definition
See also: formal definition for NP-completeness
A decision problem C is NP-complete if:
1. C is in NP, and
2. every problem in NP is reducible to C in polynomial time.
C can be shown to be in NP by demonstrating that a candidate solution to C can be verified in polynomial time. A problem satisfying condition 2 alone, whether or not it also satisfies condition 1, is said to be NP-hard. A consequence of this definition is that if we had a polynomial-time algorithm (on a universal Turing machine, or any other Turing-equivalent abstract machine) for C, we could solve all problems in NP in polynomial time.
Background
The concept of NP-completeness was introduced in 1971 (see Cook–Levin
theorem), though the term NP-complete was introduced later. At the 1971 STOC conference, there was a fierce debate among computer scientists about whether
NP-complete problems could be solved in polynomial time on a deterministic
Turing machine. John Hopcroft brought everyone at the conference to a consensus
that the question of whether NP-complete problems are solvable in polynomial
time should be put off to be solved at some later date, since nobody had any formal
proofs for their claims one way or the other. This is known as the question of
whether P=NP.
The Cook–Levin theorem states that the Boolean satisfiability problem is NP-
complete (a simpler, but still highly technical proof of this is available). In 1972,
Richard Karp proved that several other problems were also NP-complete (see
Karp's 21 NP-complete problems); thus there is a class of NP-complete problems
(besides the Boolean satisfiability problem). Since the original results, thousands of
other problems have been shown to be NP-complete by reductions from other
problems previously shown to be NP-complete; many of these problems are
collected in Garey and Johnson's 1979 book Computers and Intractability: A Guide
to the Theory of NP-Completeness.[3] For more details refer to Introduction to the
Design and Analysis of Algorithms by Anany Levitin.
NP-complete problems
The easiest way to prove that some new problem is NP-complete is first to prove
that it is in NP, and then to reduce some known NP-complete problem to it.
Therefore, it is useful to know a variety of NP-complete problems. Well-known problems that are NP-complete when expressed as decision problems include Boolean satisfiability (SAT), the knapsack problem, the travelling salesman problem, subset sum, vertex cover, and graph colouring.
Such problems are often arranged in a diagram of typical reductions, with an arrow from one problem to another indicating the direction of the reduction used to prove NP-completeness. Such a diagram is misleading as a description of the mathematical relationship between these problems, since there exists a polynomial-time reduction between any two NP-complete problems; it merely records where demonstrating a particular polynomial-time reduction has been easiest.
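As a concrete and particularly simple illustration of such a reduction: a graph has an independent set of size k exactly when its complement graph has a clique of size k, and complementing the edge set takes only polynomial time. The representation below (vertices 0..n-1, edges as pairs) is an assumption chosen for illustration.

```python
# Polynomial-time reduction from Independent Set to Clique:
# G has an independent set of size k  iff  the complement of G has a clique of size k.
from itertools import combinations

def complement(n, edges):
    """Return the edge set of the complement of an n-vertex graph."""
    edges = {frozenset(e) for e in edges}
    return {frozenset(e) for e in combinations(range(n), 2)} - edges

# Independent-set instance: path graph 0-1-2-3, k = 2
n, edges, k = 4, [(0, 1), (1, 2), (2, 3)], 2
comp_edges = complement(n, edges)                     # equivalent clique instance
print(sorted(tuple(sorted(e)) for e in comp_edges))   # [(0, 2), (0, 3), (1, 3)]
# e.g. {0, 2} is an independent set in the path and a clique in the complement.
```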
Another type of reduction that is also often used to define NP-completeness is the
logarithmic-space many-one reduction, which is a many-one reduction that can be
computed with only a logarithmic amount of space. Since every computation that
can be done in logarithmic space can also be done in polynomial time it follows
that if there is a logarithmic-space many-one reduction then there is also a
polynomial-time many-one reduction. This type of reduction is more refined than
the more usual polynomial-time many-one reductions and it allows us to
distinguish more classes such as P-complete. Whether under these types of
reductions the definition of NP-complete changes is still an open problem. All
currently known NP-complete problems are NP-complete under log space
reductions. All currently known NP-complete problems remain NP-complete even
under much weaker reductions.[4] It is known, however, that
AC0 reductions define a strictly smaller class than polynomial-time reductions.[5]
Naming
According to Donald Knuth, the name "NP-complete" was popularized by Alfred
Aho, John
Hopcroft and Jeffrey Ullman in their celebrated textbook "The Design and
Analysis of Computer
Algorithms". He reports that they introduced the change in the galley proofs for the
book (from "polynomially-complete"), in accordance with the results of a poll he
had conducted of the theoretical computer science community.[6] Other suggestions
made in the poll[7] included "Herculean", "formidable", Steiglitz's "hard-boiled" in
honor of Cook, and Shen Lin's acronym
"PET", which stood for "probably exponential time", but depending on which way
the P versus NP problem went, could stand for "provably exponential time" or
"previously exponential time".[8]
Common misconceptions
The following misconceptions are frequent.
"NP-complete problems are the most difficult known problems." Since NP-
complete problems are in NP, their running time is at most exponential. However,
some problems provably require more time, for example Presburger arithmetic.