
Chapter 1. Preliminaries

In this chapter, we review some basic concepts required for understanding the contents of this book.

Sets, Relations, and Functions

Sets
A set is a collection of well-defined objects. Usually the elements of
a set have common properties. For example, all the students who
enroll for a course ‘Computability’ make up a set. Formally,

Definition 1.1

A set is an unordered collection of objects.

Note that the definition of a set is intuitive in nature and was stated by the German mathematician Cantor in 1895. The objects in a set are called the elements or members of the set.

Example 1.1. 

The set E of even positive integers less than 20 can be expressed by:

E = {2, 4, 6, 8, 10, 12, 14, 16, 18}

or

E = {x | x is even and 0 < x < 20}

A set is finite if it contains a finite number of elements and is infinite otherwise. The empty set has no elements and is denoted by φ. The cardinality of a set A is the number of elements in A and is denoted by #A or |A|. For example, if A = {2, 4, 6, 8}, then #A = 4. Note that #φ = 0. If a set is infinite, one cannot list all its elements; such a set can be specified by providing a property that all members of the set must satisfy.

For example, A = {x | x is a positive integer divisible by 3}. The general format of such a specification is A = {x | P(x)}, where P(x) is a property that x must satisfy.

Sets can be represented either in set-builder form or by explicitly listing their elements. Sets can also be defined using an inductive definition. An inductive definition of a set has three components:

1. The basis or basis clause, which lists some elements in the set (the basic building blocks).
2. The induction or inductive clause, which states how to produce new elements from the existing elements of the set.
3. The extremal clause, which asserts that no object is a member of the set unless its being so follows from a finite number of applications of the basis and inductive clauses.
Example 1.2. 

Let W denote the set of well-formed strings of parentheses. It can be defined inductively as follows:

Basis clause: [ ] ∊ W

Inductive clause: if x, y ∊ W, then xy ∊ W and [x] ∊ W

Extremal clause: No object is a member of W unless its being so follows from a finite number of applications of the basis and the inductive clauses.
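The inductive definition gives a direct membership test: a non-empty string over the brackets belongs to W exactly when its brackets are balanced. A minimal sketch in Python (the function name in_W is ours):

    def in_W(s: str) -> bool:
        """Test membership in W, the well-formed strings of [ and ]."""
        depth = 0
        for ch in s:
            if ch == '[':
                depth += 1
            elif ch == ']':
                depth -= 1
                if depth < 0:          # a ']' with no matching '[' before it
                    return False
            else:
                return False           # only brackets are allowed
        return depth == 0 and len(s) > 0   # ε is not in W: the basis is [ ]

    print(in_W("[]"), in_W("[][]"), in_W("[[]]"))   # True True True
    print(in_W("]["))                               # False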

A set A is a subset of a set B, denoted A ⊆ B, if each element of A is also an element of B. We also say that A is included in B. If A is a subset of B and A is not the same as B, then we say that A is a proper subset of B, denoted A ⊂ B. φ is a subset of every set. Two sets A and B are equal if every element of A is also an element of B and vice versa. We denote equal sets by A = B.

Two sets can be combined to form a third set by various set operations: union, intersection, and difference. The union of two sets consists of the elements belonging to either of the two sets (possibly to both). Union is denoted by the symbol ∪, so that A ∪ B = {x | x ∊ A or x ∊ B}.

The intersection of two sets is the collection of all elements common to the two sets and is denoted by ∩. For two sets A, B, A ∩ B = {x | x ∊ A and x ∊ B}.

The difference of two sets A and B, denoted by A − B, is the set of all elements that are in the set A but not in the set B. For the sets A and B, A − B = {x | x ∊ A and x ∉ B}.

Let U be the universal set and A ⊆ U. The complement of A with respect to U is defined as Ā = U − A.

The power set of a set S is the set of all subsets of S and is denoted by 2^S. If S = {a, b, c}, then

2^S = {φ, {a}, {b}, {c}, {a, b}, {b, c}, {a, c}, {a, b, c}}.

Sequences and Tuples


A sequence of objects is a list of objects in some order. For example, the sequence 7, 14, 21 would be written as (7, 14, 21). In a set the order does not matter, but in a sequence it does. Also, repetition is not permitted in a set but is allowed in a sequence. For example, (7, 7, 14, 21, 21) is a sequence, whereas {7, 14, 21} is a set. Like sets, sequences may be finite or infinite. Finite sequences are called tuples. A sequence of k elements is called a k-tuple. For example, (2, 4) is a 2-tuple and (7, 14, 21) is a 3-tuple.

If A and B are two sets, the Cartesian product or cross product of A and B, written A × B, is the set of all pairs whose first component is a member of A and whose second component is a member of B. For example, if

A = {0, 1}, B = {x, y, z}, then A × B = {(0, x), (0, y), (0, z), (1, x), (1, y), (1, z)}.

One can also write the Cartesian product of k sets A1, A2, ..., Ak in a similar fashion.
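For finite sets, the cross product can be computed directly; a quick illustration in Python using the standard itertools.product:

    from itertools import product

    A = {0, 1}
    B = {'x', 'y', 'z'}

    # A x B: all pairs with the first component from A, the second from B
    print(sorted(product(A, B)))
    # [(0, 'x'), (0, 'y'), (0, 'z'), (1, 'x'), (1, 'y'), (1, 'z')]

    # product generalizes to k sets A1 x A2 x ... x Ak
    print(len(list(product(A, B, A))))   # |A| * |B| * |A| = 12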

Relations and Functions


A binary relation on two sets A and B is a subset of A × B. For example, if A = {1, 3, 9} and B = {x, y}, then {(1, x), (3, y), (9, x)} is a binary relation on the two sets. Relations on k sets A1, A2, ..., Ak can be defined similarly.

Another basic concept on sets is the function. A function is an object that sets up an input-output relationship: a function takes an input and produces the required output. For a function f with input x and output y, we write f(x) = y. We also say that f maps x to y. For example, addition is a function which produces the sum of the numbers that are input. The set of possible inputs to a function is called its domain. The outputs of a function come from a set called its range. Let D be the domain and R be the range of a function f. We denote this description of a function by f: D → R. For example, if f is a function with domain Z and range Z, we write f: Z → Z. A function that uses all the elements of the range is said to be onto (surjective). A function f: D → R is said to be one-to-one or 1–1 (injective) if for any two distinct elements a, b ∊ D, f(a) ≠ f(b). A function which is both one-to-one and onto is called a bijective function.
A binary relation K ⊆ A × B has an inverse K−1 ⊆ B × A, defined by (b, a) ∊ K−1 if and only if (a, b) ∊ K. For example,

K = {(x, y) | x ∊ S, y ∊ T, x is the student of y}

K−1 = {(y, x) | x ∊ S, y ∊ T, y is the teacher of x}

Note that the inverse of a function need not be a function. A function f: A → B may not have an inverse if there is some element b ∊ B such that f(a) = b for no a ∊ A. But every bijective f possesses an inverse f−1, and f−1(f(a)) = a for each a ∊ A.

A binary relation R is an equivalence relation if R satisfies the following conditions:

1. R is reflexive, i.e., for every x, (x, x) ∊ R.
2. R is symmetric, i.e., for every x and y, (x, y) ∊ R implies (y, x) ∊ R.
3. R is transitive, i.e., for every x, y and z, (x, y) ∊ R and (y, z) ∊ R implies (x, z) ∊ R.

An equivalence relation on a set A partitions A into equivalence classes. The number of equivalence classes is called the index or rank of the equivalence relation. The index can be finite or infinite.

Example 1.3. 

Let N be the set of non-negative integers. The relation ≡ is defined as follows: a ≡ b if and only if a and b leave the same remainder when divided by 3. This can easily be seen to be an equivalence relation of index 3.

An equivalence relation induces a partition on the underlying set. In the above example, the set of non-negative integers is partitioned into three equivalence classes:

E11 = {0, 3, 6, ...}
E12 = {1, 4, 7, ...}
E13 = {2, 5, 8, ...}

An equivalence relation E2 is a refinement of an equivalence relation E1 if every equivalence class of E2 is contained in an equivalence class of E1. For example, let E1 denote the mod 3 equivalence relation defined in Example 1.3. Let E2 be an equivalence relation on the set of non-negative integers such that aE2b if a and b leave the same remainder when divided by 6. In this case there are 6 equivalence classes:

E21 = {0, 6, 12, ...}
E22 = {1, 7, 13, ...}
E23 = {2, 8, 14, ...}
E24 = {3, 9, 15, ...}
E25 = {4, 10, 16, ...}
E26 = {5, 11, 17, ...}

It can be seen that every E2j, 1 ≤ j ≤ 6, is completely included in some E1k, 1 ≤ k ≤ 3. Hence, E2 is a refinement of E1.
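The refinement can also be checked mechanically on a finite initial segment of the non-negative integers; a small Python sketch (the helper name classes is ours):

    # Check on 0..59 that the mod-6 relation refines the mod-3 relation:
    # every mod-6 class must sit inside a single mod-3 class.
    N = range(60)

    def classes(mod):
        """Partition 0..59 into classes of 'same remainder mod `mod`'."""
        out = {}
        for n in N:
            out.setdefault(n % mod, set()).add(n)
        return list(out.values())

    E1, E2 = classes(3), classes(6)
    print(all(any(c2 <= c1 for c1 in E1) for c2 in E2))   # True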

Ordered Pairs
Ordered pairs of natural numbers can be represented by single natural numbers. That is, we are interested not only in encoding ordered pairs into natural numbers, but also in decoding natural numbers back into ordered pairs. That is, we want a 1–1 mapping from N^2 onto N. One of the simplest bijections from N^2 to N is the following.

Cantor numbering scheme: Let π2 : N^2 → N be such that

π2(x, y) = (x + y)(x + y + 1)/2 + x

π2(x, y) is called the Cantor number of the ordered pair (x, y). For example, the Cantor number of (2, 4) is 23. The following table lists the Cantor numbers of some pairs.

x\y    0    1    2    3    4
 0     0    1    3    6   10
 1     2    4    7   11   16
 2     5    8   12   17   23
 3     9   13   18    –    –
 4    14   19    –    –    –
 5    20   26    –    –    –

This bijection is required in some computer science applications. The method can also be used to enumerate ordered triples, since π3(x1, x2, x3) can be viewed as π2(π2(x1, x2), x3), and likewise to enumerate higher-order tuples.
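A direct transcription of the scheme into Python (function names are ours):

    def pi2(x: int, y: int) -> int:
        """Cantor number of the pair (x, y): (x+y)(x+y+1)/2 + x."""
        return (x + y) * (x + y + 1) // 2 + x

    def pi3(x1: int, x2: int, x3: int) -> int:
        """Triples are numbered by nesting: pi2(pi2(x1, x2), x3)."""
        return pi2(pi2(x1, x2), x3)

    print(pi2(2, 4), pi2(5, 1))   # 23 26, as in the table

    # pi2 is 1-1: Cantor numbers of distinct pairs never collide.
    seen = {pi2(x, y) for x in range(50) for y in range(50)}
    print(len(seen) == 50 * 50)   # True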

Closures
Closure is an important relationship among sets and is a general tool for dealing with sets and relations of many kinds.

Let R be a binary relation on a set A. Then the reflexive (symmetric, transitive) closure of R is a relation R′ such that:

1. R′ is reflexive (symmetric, transitive);
2. R′ ⊇ R;
3. if R″ is a reflexive (symmetric, transitive) relation containing R, then R″ ⊇ R′.

The reflexive, transitive closure of R is denoted by R*. The reflexive, transitive closure of a binary relation R is only one of several possible closures. R+ is the transitive closure of R, which need not be reflexive unless R itself happens to be reflexive.
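For a finite relation, each closure can be computed as a fixed point: keep adding the pairs the defining condition demands until nothing changes. A Python sketch (function names are ours):

    def reflexive_closure(R, A):
        """Smallest reflexive relation on A containing R."""
        return R | {(a, a) for a in A}

    def symmetric_closure(R):
        """Smallest symmetric relation containing R."""
        return R | {(b, a) for (a, b) in R}

    def transitive_closure(R):
        """R+: add (a, c) whenever (a, b) and (b, c) are present."""
        closure = set(R)
        while True:
            new = {(a, c) for (a, b) in closure
                          for (b2, c) in closure if b == b2}
            if new <= closure:
                return closure
            closure |= new

    A = {1, 2, 3}
    R = {(1, 2), (2, 3)}
    print(transitive_closure(R))                        # adds (1, 3)
    # R*: the reflexive, transitive closure
    print(reflexive_closure(transitive_closure(R), A))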

Finite and Infinite Sets

A finite set contains a finite number of elements; the size of the set is its basic property. A set which is not finite is said to be infinite. Two sets A and B have an equal number of elements, or are said to be equinumerous, if there is a one-to-one, onto function f: A → B. In general, a set is finite if it is equinumerous with the set {1, 2, 3, ..., n} for some natural number n. A set is said to be countably infinite if it is equinumerous with N, the set of natural numbers. A set which is not countable is said to be uncountable.

Methods of Proof
Here we introduce three basic methods of proof: (i) mathematical induction, (ii) the pigeonhole principle, and (iii) diagonalization.

Mathematical Induction
Let A be a set of natural numbers such that:

1. 0 ∊ A
2. for each natural number n, if {0, 1, 2, ..., n} ⊆ A, then n + 1 ∊ A.

Then A = N. In particular, induction is used to prove assertions of the form "For all n ∊ N, the property P is valid," as follows:

1. In the basis step, one shows that P(0) is true, i.e., the property is true for 0.
2. That P holds for n is the assumption.
3. Then one proves the validity of P for n + 1.
Example 1.4.

Consider the property P(n): 0 + 1 + 2 + ... + n = n(n + 1)/2, for all n ≥ 0. To see the validity of P for all n ≥ 0, we employ mathematical induction.

1. P is true for n = 0, as the left-hand side and the right-hand side of P are both 0.
2. Assume P to be true for some n ≥ 0, i.e., 0 + 1 + 2 + ... + n = n(n + 1)/2.
3. Consider 0 + 1 + 2 + ... + n + (n + 1) = n(n + 1)/2 + (n + 1) = (n + 1)(n + 2)/2, which is P(n + 1).

Sometimes a property P(n) may hold only for all n ≥ t. In this case, for the basis clause one must prove P(t).

Strong Mathematical Induction

Another form of proof by induction over the natural numbers is called strong induction. Suppose we want to prove that P(n) is true for all n ≥ t. In the induction step, we assume that P(j) is true for all j, t ≤ j < k, and using this, we prove P(k). In ordinary induction (called weak induction), in the induction step we assume P(k − 1) to prove P(k). There are some instances where the result can be proved easily using strong induction; in some cases it is not possible to use weak induction and one has to use strong induction.

Let us give an example.

Example 1.5.

P(n): the sum of the interior angles of an n-sided convex polygon is (2n − 4)π/2.

Basis: P(3): the interior angles of a triangle sum up to 180° = (2 · 3 − 4)π/2.

Induction step: Let the result be true for all n up to k, 3 ≤ n ≤ k.

To prove P(k + 1), the sum of the interior angles of a (k + 1)-sided convex polygon is to be computed. Let the vertices of the polygon be numbered 1, 2, ..., k + 1. To compute the sum, join the vertex numbered 1 with a vertex numbered j (j ≠ 2, k + 1). Now the interior angle sum is the sum of the interior angles of the two convex polygons '1' and '2' so obtained. Polygon '1' has j sides and polygon '2' has k + 3 − j sides. The sum of the interior angles of polygon '1' is (2j − 4)π/2 and that of polygon '2' is (2(k + 3 − j) − 4)π/2.

Hence, the sum of the interior angles of the (k + 1)-sided polygon is (2j − 4)π/2 + (2(k + 3 − j) − 4)π/2 = (2(k + 1) − 4)π/2.

Pigeonhole Principle
If A and B are two non-empty finite sets with #A > #B, then there exists no 1–1 function from A to B. That is, if we attempt to pair the elements of A with the elements of B, sooner or later we must pair more than one element of A with the same element of B.

Example 1.6. 

Among any group of 367 people, there must be at least two with
the same birthday, because there are only 366 possible birthdays.

Diagonalization Principle
Let R be a binary relation on a set A and let D = {a|a ∊ A, and (a, a)
∉ R}. For each a ∊ A, let Ra = {b|b ∊ A, and (a, b) ∊ R}. Then D is
distinct from Ra for all a ∊ A.

Example 1.7.

Let A = {a, b, c, d} and R = {(a, b), (a, d), (b, b), (b, c), (c, c), (d, b)}. R can be represented as a square array as below:

      a   b   c   d
  a       X       X
  b       X   X
  c           X
  d       X

Ra = {b, d}, Rb = {b, c}, Rc = {c}, Rd = {b}, D = {a, d}

Clearly, D ≠ Ra, Rb, Rc, Rd.
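The computation in this example can be replayed mechanically; a short Python check:

    A = ['a', 'b', 'c', 'd']
    R = {('a','b'), ('a','d'), ('b','b'), ('b','c'), ('c','c'), ('d','b')}

    # D collects the elements not related to themselves; the principle
    # says D differs from every "row" R_a of the relation.
    D = {a for a in A if (a, a) not in R}
    rows = {a: {b for b in A if (a, b) in R} for a in A}

    print(D)                                      # {'a', 'd'}
    print(all(D != Ra for Ra in rows.values()))   # True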

Remark. The diagonalization principle holds for infinite sets as well.

Graphs
An undirected graph, or simply a graph, is a set of points with lines connecting some of the points. The points are called nodes or vertices; the lines are called edges.

Example 1.8.

The number of edges at a particular vertex is the degree of the vertex. In Figure 1.1, the degree of vertex 1 is 3. No more than one edge is allowed between any two vertices. We label the vertices for convenience, and call the graph a labeled graph.

Figure 1.1. Examples of undirected graphs

An induced subgraph H of a graph G is a graph whose nodes are a subset of the nodes of G and whose edges are the edges of G on the corresponding nodes. A path in a graph is a sequence of nodes connected by edges. A simple path is a path that does not repeat any node. A graph is connected if any two nodes have a path between them. A path is a cycle if it starts and ends in the same node. A simple cycle is one that does not repeat any node except the first and the last. A tree is a connected graph that has no simple cycle. The nodes of degree 1 in a tree are called the leaves of the tree.

A directed graph has directed lines between the nodes. The number of arrows pointing from a particular node is the outdegree of that node, and the number of arrows pointing to a particular node is its indegree.

Example 1.9. 

In the following directed graph (Figure 1.2), the indegree of the node labeled 2 is 3 and its outdegree is 1.

Figure 1.2. An example of a directed graph

Definition 1.2

1. An undirected graph is connected if every pair of vertices is connected by a path. A path in a graph is a contiguous sequence of its vertices.

2. In any graph G, a path forms a cycle if its starting vertex and ending vertex are the same.

3. A connected, acyclic, undirected graph is a tree.
Example 1.10. 

Consider the following graphs:

Graphs (i) and (ii) are connected.

1 2 3 4 6 is a path in Graph (i).

Graph (i) contains a cycle 1 2 3 4 5 1.

Graph (ii) is a tree.

The following observations can be made about trees. Let G = (V, E) be an undirected graph. Then the following statements are equivalent:

1. G is a tree.
2. Any two vertices in G are connected by a unique simple path.
3. G is connected, but if any edge is removed from E, the resulting graph is disconnected.
4. G is connected and |E| = |V| − 1.
5. G is acyclic and |E| = |V| − 1.
6. G is acyclic, but if any edge is added to E, the resulting graph contains a cycle.

All the above properties can be proved; this is left as an exercise.

Definition 1.3

A rooted tree is a tree in which one of the vertices is distinguished from the others. Such a vertex is called the root of the tree.

A rooted tree can also be viewed as a directed graph in which only one node, the root, has indegree 0, and all other nodes have indegree 1. A node of outdegree 0 is a leaf.
Languages: Basic Concepts
The basic data structure, or input to grammars and automata, is the string. Strings are defined over an alphabet, which is a finite set. The alphabet may vary depending upon the application. Elements of an alphabet are called symbols. Usually we denote the basic alphabet set by Σ or T. The following are a few examples of alphabet sets:

Σ1 = {a, b}
Σ2 = {0, 1, 2}
Σ3 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

A string or word is a finite sequence of symbols from an alphabet, usually written as concatenated symbols not separated by gaps or commas. For example, if Σ = {a, b}, then abbab is a string or word over Σ. If w is a string over an alphabet Σ, then the length of w, written len(w) or |w|, is the number of symbols it contains. If |w| = 0, then w is called the empty string, denoted either λ or ε.

For any word w, wε = εw = w. For any string w = a1 ... an of length n, the reverse of w, written wR, is the string anan−1 ... a1, where each symbol ai belongs to the basic alphabet Σ. A string z that appears consecutively within another string w is called a substring or subword of w. For example, aab is a substring of baabb.

The set of all strings over an alphabet Σ is denoted by Σ*, which includes the empty string ε. For example, for Σ = {0, 1}, Σ* = {ε, 0, 1, 00, 01, 10, 11, ...}. Note that Σ* is a countably infinite set. Also, Σ^n denotes the set of all strings over Σ whose length is n. Hence, Σ* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ Σ^3 ∪ ... and Σ+ = Σ^1 ∪ Σ^2 ∪ Σ^3 ∪ .... Subsets of Σ* are called languages.

For example, if Σ = {a, b}, the following are languages over Σ:

L1 = {ε, a, b}
L2 = {ab, aabb, aaabbb, ...}
L3 = {w ∊ Σ* | the number of a's and the number of b's in w are equal}

In the above example, L1 is finite, while L2 and L3 are infinite languages. φ denotes the empty language.

We have the following inductive definitions of Σ+ and Σ*, where Σ is any basic alphabet set.

Definition 1.4

Let Σ be any alphabet set. Σ+ is the set of non-empty strings over Σ defined as follows:

1. Basis: If a ∊ Σ, then a ∊ Σ+.
2. Induction: If α ∊ Σ+ and a ∊ Σ, then αa and aα are in Σ+.
3. No other element belongs to Σ+.

Clearly, the set Σ+ contains all strings of length n, n ≥ 1.

Example 1.11.

Let Σ = {0, 1, 2}. Then

Σ+ = {0, 1, 2, 00, 01, 02, 10, 11, 12, 20, 21, 22, ...}

If we wish to include ε as well, we modify the above definition as given below.

Definition 1.5

Let Σ be any alphabet set. Σ* is defined as follows:

1. Basis: ε ∊ Σ*.
2. Induction: If α ∊ Σ* and a ∊ Σ, then aα, αa ∊ Σ*.
3. No other element is in Σ*.

Since languages are sets, one can define the set-theoretic operations of union, intersection, difference, and complement in the usual fashion.

The following operations are also defined for languages.

If x = a1 ... an and y = b1 ... bm, the concatenation of x and y is defined as xy = a1 ... an b1 ... bm. The catenation (or concatenation) of two languages L1 and L2 is defined by L1L2 = {w1w2 | w1 ∊ L1 and w2 ∊ L2}. Note that concatenation of languages is associative because concatenation of strings is associative. Also, L^0 = {ε}, Lφ = φL = φ, and Lε = εL = L.

The concatenation closure (Kleene closure) of a language L, in symbols L*, is defined to be the union of all powers of L:

L* = L^0 ∪ L^1 ∪ L^2 ∪ ...

The right quotient of a language L1 by a language L2 and the right derivative of a language L with respect to a word y are the following sets, respectively:

{x | xy ∊ L1 for some y ∊ L2} and {x | xy ∊ L}.

Similarly, the left quotient of a language L1 by a language L2 is defined by

L2/L1 = {z | yz ∊ L1 for some y ∊ L2}.

The left derivative of a language L with respect to a word y is the set {z | yz ∊ L}.

The mirror image (or reversal) of a language is the collection of the mirror images of its words: mir(L) = {mir(w) | w ∊ L}, or LR = {wR | w ∊ L}.
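For finite languages, these operations are short set comprehensions over sets of strings; a Python sketch (function names are ours, and only a finite slice of the infinite closure L* is computed):

    from itertools import product

    def concat(L1, L2):
        """L1 L2 = {w1 w2 | w1 in L1, w2 in L2}."""
        return {w1 + w2 for w1, w2 in product(L1, L2)}

    def star_upto(L, k):
        """Finite slice of L*: the union of L^0, L^1, ..., L^k."""
        result, power = {''}, {''}
        for _ in range(k):
            power = concat(power, L)
            result |= power
        return result

    def left_quotient(L2, L1):
        """{z | yz in L1 for some y in L2}: left quotient of L1 by L2."""
        return {w[len(y):] for w in L1 for y in L2 if w.startswith(y)}

    def mir(L):
        """Mirror image: reverse every word."""
        return {w[::-1] for w in L}

    L = {'ab', 'ba'}
    print(concat(L, L))                        # {'abab','abba','baab','baba'}
    print(star_upto({'a'}, 3))                 # {'', 'a', 'aa', 'aaa'}
    print(left_quotient({'a'}, {'ab', 'ba'}))  # {'b'}
    print(mir(L))                              # {'ba', 'ab'}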
The operations of substitution and homomorphism are defined as follows.

For each symbol a of an alphabet Σ, let σ(a) be a language over an alphabet Σa. Also, let σ(ε) = {ε} and σ(αβ) = σ(α)σ(β) for α, β ∊ Σ+. Such a mapping σ from Σ* to 2^(V*), where V is the union of the alphabets Σa, is called a substitution. For a language L over Σ, we define:

σ(L) = {α | α ∊ σ(β) for some β ∊ L}.

A substitution is ε-free if and only if none of the languages σ(a) contains ε. A family of languages is closed under substitution if and only if, whenever L is in the family and σ is a substitution such that each σ(a) is in the family, σ(L) is also in the family.

A substitution σ such that each σ(a) consists of a single word wa is called a homomorphism. It is called an ε-free homomorphism if no wa is ε.

Algebraically, one can see that Σ* is a free semigroup with ε as its identity. The homomorphism defined above agrees with the customary definition of a homomorphism of one semigroup into another.

Inverse homomorphism can be defined as follows:

h−1(w) = {x | h(x) = w}

h−1(L) = {x | h(x) is in L}

It should be noted that h(h−1(L)) need not be equal to L. In general, h(h−1(L)) ⊆ L and h−1(h(L)) ⊇ L.
Asymptotic Behavior of Functions
The time taken to execute an algorithm depends upon the machine on which it is implemented and also on the algorithm itself. Hence, the efficiency of an algorithm is measured by the amount of time it takes and the space it needs for execution on the machine. Comparison of algorithms has thus become an important topic, and we give here a mathematical basis for comparing algorithms.

A complexity function f is a function of n, the size of the problem or the parameter on which the problem depends. That is, f(n) is either a measure of the time required to execute an algorithm on a problem of size n or a measure of the memory space required. If f(n) describes the measure of time, then f(n) is called the time-complexity function; if f(n) describes the measure of space, it is called the space-complexity function.

We have the following important definitions concerning complexity functions.

Definition 1.6

1. Let f and g be two functions from N to R. Then g asymptotically dominates f (g is an asymptotic upper bound for f, or f is asymptotically dominated by g) if there exist k ≥ 0 and c ≥ 0 such that f(n) ≤ cg(n) for all n ≥ k.

2. The set of all functions which are asymptotically dominated by a given function g is denoted by O(g), read as 'big-oh' of g. If f ∊ O(g), then f is said to be in O(g).
Example 1.12.

1. Let f(n) = n and g(n) = n^2. Then clearly f ∊ O(g), as n ≤ 1 · n^2, whereas g ∉ O(f).

2. O(1) ⊂ O(log n) ⊂ O(n) ⊂ O(n log n) ⊂ O(n^2) ⊂ O(c^n) ⊂ O(n!), for any constant c > 1.
Definition 1.7

1. Let f and g be two functions from N to R. Then g is an asymptotically tight bound for f if there exist positive constants c1, c2, and k such that 0 ≤ c1g(n) ≤ f(n) ≤ c2g(n) for all n ≥ k.

2. The set of all functions for which g is an asymptotically tight bound is denoted by θ(g).
Example 1.13.

If f(n) = an^3 + bn^2 + cn + d, where a, b, c, d are constants and a > 0, then f(n) ∊ θ(n^3).

Definition 1.8

1. Let f and g be any two functions from N to R. Then g is said to be an asymptotic lower bound for f if there exist positive constants c and k such that 0 ≤ cg(n) ≤ f(n) for all n ≥ k.

2. The set of all functions for which g is an asymptotic lower bound is denoted by Ω(g).
Example 1.14.

f(n) = an^2 + bn + c, where a, b, c are constants and a > 0, belongs to Ω(n^2).
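These definitions concern eventual behavior, so no finite test can prove them; still, candidate witnesses c and k can be checked numerically, which catches mistakes quickly. A Python sketch under that caveat (names are ours):

    def dominates(g, f, c, k, upto=10_000):
        """Evidence (not proof) for f in O(g): f(n) <= c*g(n) on k..upto."""
        return all(f(n) <= c * g(n) for n in range(k, upto))

    f = lambda n: 3 * n**2 + 5 * n + 7     # f(n) = 3n^2 + 5n + 7
    g = lambda n: n**2

    print(dominates(g, f, c=4, k=8))   # True: 3n^2+5n+7 <= 4n^2 once n >= 8
    print(dominates(f, g, c=1, k=1))   # True: g <= f as well, so f is in θ(n^2)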

Problems and Solutions

Problem 1. It is known that at the university 60 percent of the professors play tennis, 50 percent play bridge, 70 percent jog, 20 percent play tennis and bridge, 30 percent play tennis and jog, and 40 percent play bridge and jog. If someone claimed that 20 percent of the professors jog and play bridge and tennis, would you believe this claim? Why?

Solution. Let T denote the set of professors who play tennis, B the set of those who play bridge, and J the set of those who jog. Given that |T| = 60, |B| = 50, |J| = 70, |T ∩ B| = 20, |T ∩ J| = 30, |B ∩ J| = 40. By inclusion-exclusion,

|T ∩ B ∩ J| = |T ∪ B ∪ J| − |T| − |B| − |J| + |T ∩ B| + |T ∩ J| + |B ∩ J|
            = 100 − 60 − 50 − 70 + 20 + 30 + 40 = 10.

The given claim is wrong.

Problem 2. Use mathematical induction to prove that for all positive integers n, n(n^2 + 5) is an integer multiple of 6.

Solution. Basis: n = 1: 1(1 + 5) = 6 is an integer multiple of 6.
Hypothesis: Assume that the result is true for n.
Induction step: We need to prove that it is true for n + 1. Consider

(n + 1)((n + 1)^2 + 5) = n(n^2 + 5) + 3n(n + 1) + 6.

We know that n(n^2 + 5) is divisible by 6, 3n(n + 1) is divisible by 6 since n(n + 1) is even, and 6 is divisible by 6. Clearly, then, the whole expression is divisible by 6. Hence for all n, n(n^2 + 5) is an integer multiple of 6.

Problem 3. Suppose that we have a system of currency that has $3 and $5 bills. Show that a debt of $n can be paid with only $3 and $5 bills for each integer n ≥ 8. Do the same problem and n ≥ 9.

Solution. Basis: n = 8: clearly it can be paid with one $3 bill and one $5 bill.
Hypothesis: Assume that a debt of $n can be paid with $3 and $5 bills.
Induction step: Consider a debt of $(n + 1). Let n = 3k1 + 5k2.

1. If k1 ≥ 3, then n + 1 = (k1 − 3) · 3 + (k2 + 2) · 5.
2. If k1 = 2, then n + 1 = 4 · 3 + (k2 − 1) · 5.
3. If k1 = 1, then n + 1 = 3 · 3 + (k2 − 1) · 5.
4. If k1 = 0, then n + 1 = 2 · 3 + (k2 − 1) · 5.

(Note that k2 ≥ 1 in cases 2, 3, and 4, as we need to prove the result only for n ≥ 8.)

Hence n + 1 = 3k3 + 5k4, where k3 and k4 are non-negative integers. Hence the result.

Problem 4. Let A be a set with n distinct elements. How many different binary relations on A are there?

1. How many of them are reflexive?
2. How many of them are symmetric?
3. How many of them are reflexive and symmetric?
4. How many of them are total ordering relations?

Solution. There are n^2 elements in the cross product A × A. Since a relation on A is a subset of this cross product, the number of different binary relations on A is 2^(n^2).

1. There are 2^(n^2 − n) reflexive relations.
2. There are 2^(n(n+1)/2) symmetric relations.
3. There are 2^(n(n−1)/2) relations which are both reflexive and symmetric.
4. There are n! total ordering relations.

Problem 5. Let R be a symmetric and transitive relation on a set A. Show that if for every a in A there exists some b in A such that (a, b) is in R, then R is an equivalence relation.

Solution. Given that R is a symmetric and transitive relation on A, to prove that R is an equivalence relation we need only prove that R is reflexive. By hypothesis, for every a there exists b such that (a, b) ∊ R. Since R is symmetric, (a, b) ∊ R implies (b, a) ∊ R. Since R is transitive, (a, b) ∊ R and (b, a) ∊ R together imply (a, a) ∊ R. Thus for all a ∊ A, (a, a) ∊ R, which proves that R is reflexive. Therefore, R is an equivalence relation.

Exercises
1. Out of a total of 140 students, 60 are wearing hats to class, 51 are wearing scarves, and ... are wearing both hats and scarves. Of the 54 students who are wearing sweaters, 26 are wearing hats, ... are wearing scarves, and 12 are wearing both hats and scarves. Everyone wearing neither a hat nor a scarf is wearing gloves.

1. How many students are wearing gloves?
2. How many students not wearing a sweater are wearing hats but not scarves?
3. How many students not wearing a sweater are wearing neither a hat nor a scarf?

2. Among 100 students, 32 study mathematics, 20 study physics, 45 study biology, 15 study mathematics and biology, 7 study mathematics and physics, 10 study physics and biology, and 32 study none of the three subjects.

1. Find the number of students studying all three subjects.
2. Find the number of students studying exactly one of the three subjects.

3. At a family group meeting of 30 women, 17 are descended from George, 16 are descended from John, and 5 are not descended from George or John. How many of the 30 women are descended from both George and John?

4. 80 children went to a park where they can ride on three games, namely A, B, and C. It is known that ... of them have taken all three rides, and 55 of them have taken at least two of the three rides. Each ride costs $0.50, and the total receipt of the park was $70. Determine the number of children who have taken none of the rides.

5. Use induction to prove the following: If an = 5an−1 − 6an−2 for n ≥ 2, a0 = 12, and a1 = 29, then an = 5(3^n) + 7(2^n).

6. Use induction to prove that for each integer n ≥ 5, 2^n > n^2.

7.
1. Show that ...
2. Show that ...
3. Show that ...

8. Show that for any integer n, 11^(n+2) + 12^(2n+1) is divisible by 133.

9. For each of the following, check whether R is reflexive, symmetric, antisymmetric, transitive, and/or an equivalence relation.

1. R = {(a, b) | a − b is an odd positive integer}.
2. R = {(a, b) | a = b^2, where a, b ∊ I+}.
3. Let P be the set of all people. Let R be a binary relation on P such that (a, b) is in R if ...
4. Let R be a binary relation on the set of all strings of 0's and 1's, such that R = {(a, b) | a and b are strings that have the same number of 0's}.

10. Let A be a set with n elements.

1. Prove that there are 2^n unary relations on A.
2. Prove that there are 2^(n^2) binary relations on A.
3. How many ternary relations are there on A?

11. Let R1 and R2 be arbitrary relations on A. Prove or disprove the following assertions:

1. If R1 and R2 are reflexive, then R1R2 is reflexive.
2. If R1 and R2 are irreflexive, then R1R2 is irreflexive.
3. If R1 and R2 are symmetric, then R1R2 is symmetric.
4. If R1 and R2 are antisymmetric, then R1R2 is antisymmetric.
5. If R1 and R2 are transitive, then R1R2 is transitive.

12. Find a set A with n elements and a relation R on A such that R^1, R^2, ..., R^n are all distinct, thus attaining the bound ...

13. Let R1 and R2 be equivalence relations on a set A. Then R1 ∩ R2 is an equivalence relation. Is R1 ∪ R2 an equivalence relation?

14. Prove that the universal relation on any set A is an equivalence relation. What is its rank?

15. Suppose A = {a, b, c, d} and π1 is the following partition of A: π1 = {{a, b, c}, {d}}.

1. List the ordered pairs of the equivalence relation induced by π1.
2. Do the same for the partitions π2 = {{a}, {b}, {c}, {d}} and π3 = {{a, b, c, d}}.

16. Name five situations (games, activities, real-life problems, etc.) that can be represented by graphs. Explain what the vertices and the edges denote.

17. Prove that in a group of n people there are two who have the same number of acquaintances.

18. Let A = {ε, a}, B = {ab}. List the elements of the following sets.

1. A^2
2. B^3
3. AB
4. A+
5. B*

19. Under what conditions is the length function which maps Σ* to N a bijection?

20. Let A and B be finite sets, and suppose |A| = m, |B| = n. State the relationship which must hold between m and n for each of the following to be true.

1. There exists an injection from A to B.
2. There exists a surjection from A to B.
3. There exists a bijection from A to B.

Chapter 2. Grammars
The idea of a grammar for a language has been known in India since the time of Panini (about 5th century B.C.). Panini gave a grammar for the Sanskrit language. His work on Sanskrit had about 4000 rules (sutras). From this it is clear that the concept of recursion was known to the Indians and was used by them from very early times.

In 1959, Noam Chomsky tried to give a mathematical definition for grammar. The motivation was to give a formal definition of grammar for English sentences. He defined four types of grammars, viz., type 0, type 1, type 2, and type 3. At the same time, the programming language ALGOL was being considered. It was a block-structured language, and a grammar was required which could describe all syntactically correct programs. The definition given was called Backus Normal Form or Backus-Naur Form (BNF). This definition was found to be the same as the definition given by Chomsky for type 2 grammars.

Consider the following two sentences in English and their parse trees (Figures 2.1 and 2.2).

Figure 2.1. Parse tree for an English sentence

Figure 2.2. Parse tree for another English sentence

We can see that the internal nodes of the parse trees are syntactic categories like article, noun, noun phrase, verb phrase, etc. (Figures 2.1 and 2.2). The leaves of the trees are the words of these two sentences:

‘The man ate the fruit’

‘Venice is a beautiful city.’

The rules of the grammar for a sentence can be written as follows:

S – sentence

NP – noun phrase

VP – verb phrase

Art – article

<S> → <NP1><VP>

<NP1> → <Art><Noun1>

<Art> → the

<Noun1> → man

<VP> → <Verb><NP2>

<Verb> → ate

<NP2> → <Art><Noun2>

<Noun2> → fruit

The sentence can be derived as follows:

<S> ⇒ <NP1><VP>

  ⇒ <Art><Noun1><VP>

  ⇒ The <Noun1><VP>
  ⇒ The man <VP>

  ⇒ The man <Verb><NP2>

  ⇒ The man ate <NP2>

  ⇒ The man ate <Art><Noun2>

  ⇒ The man ate the <Noun2>

  ⇒ The man ate the fruit

The rules of the grammar for the second sentence are as follows:

<S> → <NP1><VP>

<NP1> → <PropN>

<PropN> → Venice

<VP> → <Verb><NP2>

<Verb> → is

<NP2> → <Art><adj><N2>

<Art> → a

<adj> → beautiful

<N2> → city

We find that the grammar has syntactic categories such as <verb phrase>, which are rewritten further, and it has words occurring at the leaves, which cannot be rewritten further. We shall call these non-terminals and terminals, respectively. The derivation always starts with the start or sentence symbol, and there are rules by which the non-terminals are rewritten.
Definitions and Classification of Grammars
We now formally define the four types of grammars.

Definition 2.1

A phrase-structure grammar or a type 0 grammar is a 4-tuple G = (N, T, P, S), where N is a finite set of non-terminal symbols called the non-terminal alphabet, T is a finite set of terminal symbols called the terminal alphabet, S ∊ N is the start symbol, and P is a set of productions (also called production rules or simply rules) of the form u → v, where u ∊ (N ∪ T)*N(N ∪ T)* and v ∊ (N ∪ T)*.

The left-hand side of a rule is thus a string over the total alphabet N ∪ T containing at least one non-terminal, and the right-hand side is a string over the total alphabet.

Derivations are defined as follows:

If αuβ is a string in (N ∪ T)* and u → v is a rule in P, from αuβ we get αvβ by replacing u by v. This is denoted αuβ ⇒ αvβ, where ⇒ is read as 'directly derives.'

If α1 ⇒ α2, α2 ⇒ α3, ..., αn−1 ⇒ αn, the derivation is denoted as α1 ⇒ α2 ⇒ ... ⇒ αn or α1 ⇒* αn, where ⇒* is the reflexive, transitive closure of ⇒.

Definition 2.2

The language generated by a grammar G = (N, T, P, S) is the set of terminal strings derivable in the grammar from the start symbol.

Example 2.1.

Consider the grammar G = (N, T, P, S), where N = {S, A}, T = {a, b, c}, and the production rules in P are:

1. S → aSc
2. S → aAc
3. A → b
A typical derivation in the grammar is:

S ⇒ aSc

  ⇒ aaScc

  ⇒ aaaAccc

  ⇒ aaabccc

The language generated is:

L(G) = {a^n b c^n | n ≥ 1}

Rule 1 generates an equal number of a's and c's by repeated application. When we apply rule 2, one more a and one more c are generated. The derivation terminates by applying rule 3.
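Because each rule of this grammar can be applied at only one place in the sentential form, a derivation can be simulated by plain string rewriting; a Python sketch (the function name derive is ours):

    def derive(n: int) -> str:
        """Derive a^n b c^n in the grammar S -> aSc | aAc, A -> b."""
        s = 'S'
        for _ in range(n - 1):
            s = s.replace('S', 'aSc')   # rule 1, applied n-1 times
        s = s.replace('S', 'aAc')       # rule 2, applied once
        return s.replace('A', 'b')      # rule 3 terminates the derivation

    print(derive(3))   # aaabccc
    print(all(derive(n) == 'a'*n + 'b' + 'c'*n for n in range(1, 10)))   # True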

By putting restrictions on the form of the production rules, we get type 1 grammars, type 2 grammars, and type 3 grammars.

Definition 2.3

If the rules are of the form αAβ → αγβ, where α, β ∊ (N ∪ T)*, A ∊ N, and γ ∊ (N ∪ T)+, the grammar is called a context-sensitive grammar (CSG).
Definition 2.4

If in every rule u → v we have |u| ≤ |v|, the grammar is called a length-increasing grammar.

It can be shown that Definitions 2.3 and 2.4 are equivalent, in the sense that the class of languages generated is the same in both cases. These types of grammars are called type 1 grammars, and the languages generated are called type 1 languages or context-sensitive languages (CSL).

NOTE

It should be noted that, by definition, ε cannot be in any CSL. To make an exception and include ε in a CSL, we can allow a rule S → ε (S the start symbol) and make sure that S does not appear on the right-hand side of any production. We give below an example of a type 1 grammar and its language.

Example 2.2. 

Let G = (N, T, P, S) where N = {S, B}, T = {a, b, c}, P has the
following rules:

1. S → aSBc

2. S → abc

3. cB → Bc

4. bB → bb

The above rules satisfy the condition that the length of the right-hand side is greater than or equal to the length of the left-hand side. Hence, the grammar is length-increasing, or type 1.
Let us consider the language generated; the rule being used is indicated in parentheses at each step.

S ⇒ abc (rule 2)

Similarly,

S ⇒ aSBc (rule 1) ⇒ aabcBc (rule 2) ⇒ aabBcc (rule 3) ⇒ aabbcc (rule 4)

In general, any string of the form a^n b^n c^n will be generated:

S ⇒* a^(n−1) S (Bc)^(n−1) (by applying rule 1 (n − 1) times)
  ⇒ a^n b c (Bc)^(n−1) (rule 2 once)
  ⇒* a^n b B^(n−1) c^n (by applying rule 3 n(n − 1)/2 times)
  ⇒* a^n b^n c^n (by applying rule 4 (n − 1) times)

Hence, L(G) = {a^n b^n c^n | n ≥ 1}. This is a type 1 language.
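Since no rule of a length-increasing grammar shortens the sentential form, every derivation of a word stays within that word's length, so membership can be tested by exhaustive rewriting. A small breadth-first Python sketch (names are ours; practical only for short strings):

    RULES = [('S', 'aSBc'), ('S', 'abc'), ('cB', 'Bc'), ('bB', 'bb')]

    def generates(target: str) -> bool:
        """Search all sentential forms no longer than the target."""
        frontier, seen = {'S'}, {'S'}
        while frontier:
            nxt = set()
            for form in frontier:
                for lhs, rhs in RULES:
                    i = form.find(lhs)
                    while i != -1:               # try every occurrence of lhs
                        new = form[:i] + rhs + form[i + len(lhs):]
                        if len(new) <= len(target) and new not in seen:
                            seen.add(new)
                            nxt.add(new)
                        i = form.find(lhs, i + 1)
            frontier = nxt
        return target in seen

    print(generates('abc'), generates('aabbcc'))   # True True
    print(generates('aabcc'))                      # False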

Definition 2.5

If in a grammar the production rules are of the form A → α, where A ∊ N and α ∊ (N ∪ T)*, the grammar is called a type 2 grammar or context-free grammar (CFG). The language generated is called a type 2 language or context-free language (CFL). Example 2.1 gives a CFG and its language.

Definition 2.6

If the rules are of the form A → αB or A → β, where A, B ∊ N and α, β ∊ T*, the grammar is called a right-linear grammar or type 3 grammar, and the language generated is called a type 3 language or regular set. We can even put the restriction that the rules be of the form A → aB, A → b, where A, B ∊ N, a ∊ T, b ∊ T ∪ {ε}. This is possible because a rule A → a1...akB can be split into A → a1B1, B1 → a2B2, ..., Bk−1 → akB by introducing new non-terminals B1, ..., Bk−1.

Example 2.3. 

Let G = (N, T, P, S), where N = {S}, T = {a, b}, and P consists of the following rules:

1. S → aS
2. S → bS
3. S → ε
This grammar generates all strings in T*.

For example, the string abbaab is generated as follows:

S ⇒ aS (rule 1)

  ⇒ abS (rule 2)
  ⇒ abbS (rule 2)

  ⇒ abbaS (rule 1)

  ⇒ abbaaS (rule 1)

  ⇒ abbaabS (rule 2)

  ⇒ abbaab (rule 3)

Derivation Trees
We have considered the definition of a grammar and derivation.
Each derivation can be represented by a tree called a derivation
tree (sometimes called a parse tree).

A derivation tree for the derivation considered in Example 2.1 will be of the form given in Figure 2.3.

Figure 2.3. Derivation tree for a^3 b c^3

A derivation tree for the derivation of aaaaaa considered in Example 2.3 will have the form given in Figure 2.4.

Figure 2.4. Derivation tree for a^6

Derivation trees are considered for type 2 and type 3 grammars only. In the first section, we also considered some English sentences and their parse trees.

Example 2.4. 

Consider the following CFG G = (N, T, P, S), N = {S, A, B}, T = {a, b}. P consists of the following productions:

1. S → aB
2. B → b
3. B → bS
4. B → aBB
5. S → bA
6. A → a
7. A → aS
8. A → bAA

The derivation tree for aaabbb is as follows (Figure 2.5):

Figure 2.5. Derivation tree for a^3 b^3

The derivation tree has the structure of a rooted directed tree. Each node has a label. The labels of internal nodes are non-terminals, and the labels of leaves are terminal symbols. The labels of the leaves of the tree, read from left to right, give the string generated, which is called the result or yield of the tree.

If A → B1 ... Bm is a rule, then when it is applied, the node with label A has sons with labels B1, ..., Bm in that order. A is the father of B1, ..., Bm, and B1, ..., Bm are the sons of A. The words ancestor and descendant are used in a manner similar to that for directed rooted trees. The root has as its label the start symbol. A non-terminal node with label A, together with all its descendants, is called the subtree rooted at A. Let us consider an example.

The language generated by the grammar in Example 2.4 consists of the strings having an equal number of a's and b's.

Proof. It can be proved by induction.

Induction hypothesis:

1. S ⇒* w if and only if w has an equal number of a's and b's.
2. A ⇒* w if and only if w has one more a than it has b's.
3. B ⇒* w if and only if w has one more b than it has a's.
Basis

The minimum length of a string derivable from S is 2 (ab or ba): S ⇒ aB ⇒ ab and S ⇒ bA ⇒ ba. Each such string has one a and one b.

The minimum length of a string derivable from A is one, and the string is a (A ⇒ a); it has one a and no b's. Similarly, the minimum length of a string derivable from B is one, and the string is b (B ⇒ b). The next longer string derivable from A or B is of length 3. So the result holds for n = 1 and 2, n denoting the length of the string.

Induction

Assume that the induction hypotheses hold for strings of length up to k; we show that they hold for strings of length k + 1. Consider S ⇒* w with |w| = k + 1. It should be shown that w has an equal number of a's and b's. The first step of the derivation of w is:

S ⇒ aB or S ⇒ bA

and the derivation is:

S ⇒ aB ⇒* aw1 = w or S ⇒ bA ⇒* bw2 = w

where |w1| = |w2| = k. By the inductive hypothesis, w1 has one more b than a's and w2 has one more a than b's, and so w = aw1 (or bw2) has an equal number of a's and b's.

Conversely, if w has an equal number of a's and b's, we should prove that it is derivable from S. In this case, either w = aw1 or w = bw2, where w1 has one more b than it has a's and w2 has one more a than it has b's. By the inductive hypothesis, B ⇒* w1 and A ⇒* w2. Using the first or the fifth rule, we get a derivation:

S ⇒ aB ⇒* aw1 = w (or) S ⇒ bA ⇒* bw2 = w.

Similarly, consider A ⇒* w with |w| = k + 1. It should be proved that w has one more a than it has b's. If |w| ≥ 2, the derivation begins either with A ⇒ aS or with A ⇒ bAA. In the former case, we have a derivation A ⇒ aS ⇒* aw1 = w, where w1 has an equal number of a's and b's by the inductive hypothesis. Hence, w = aw1 has one more a than it has b's. In the latter case,

A ⇒ bAA ⇒* bw3w4 = w

where w3 and w4 are derivable from A and have length less than k. So each of them has one more a than b's. Therefore, w = bw3w4 has one more a than b's.

Conversely, if w has one more a than it has b's, then w is derivable from A. w begins either with a or with b. If w begins with a, then w = aw1, where w1 has an equal number of a's and b's. By the inductive hypothesis, S ⇒* w1. Using the rule A → aS, we have A ⇒ aS ⇒* aw1 = w. On the other hand, if w begins with b, w can be written in the form bw3w4, where w3 and w4 each have one more a than b's. (This way of writing (decomposition) need not be unique.) Hence, using A → bAA, we have A ⇒ bAA ⇒* bw3w4 = w. Hence, w is derivable from A. A similar argument shows that B ⇒* w if and only if w has one more b than it has a's.

Consider the derivation of the string aaabbb:

S ⇒ aB ⇒ aaBB ⇒ aaaBBB ⇒ aaabBB ⇒ aaabbB ⇒ aaabbb

The derivation tree is given in Figure 2.5.

In the above derivation, the leftmost non-terminal in the sentential form is always replaced. Such a derivation is called a leftmost derivation. If the rightmost non-terminal in a sentential form is always replaced, the derivation is called a rightmost derivation. There can also be derivations which are neither leftmost nor rightmost. For example,

S ⇒ aB ⇒ aaBB ⇒ aaaBBB ⇒ aaaBbB ⇒ aaabbB ⇒ aaabbb

is a derivation which is neither leftmost nor rightmost, while

S ⇒ aB ⇒ aaBB ⇒ aaBb ⇒ aaaBBb ⇒ aaaBbb ⇒ aaabbb

represents a rightmost derivation. All these derivations are represented by the same tree, given in Figure 2.5. Hence, we find that when a string is generated in a grammar, there can be many derivations (leftmost, rightmost, and arbitrary) represented by the same derivation tree. Thus, the correspondence between derivation trees and derivations is not one-to-one. But we can easily see that the correspondence between leftmost derivations and derivation trees is a bijection.

The sequence of rules applied in the leftmost derivation in the above example is 144222, which gives a 'left parse' for the string aaabbb. In general, the sequence of rules applied in a leftmost derivation is called a 'left parse' for the string generated. The reversal of the sequence of rules applied in a rightmost derivation is called a 'right parse.' In the above example, the sequence of rules applied in the rightmost derivation is 142422; the right parse is 224241.

Example 2.5. 

Consider the grammar G = ({S}, {a, b}, {S → SaSbS, S → SbSaS, S → ε}, S). The language generated by this grammar is the same as the language generated by the grammar in Example 2.4, except that ε, the empty string, is also generated here.

Proof. It is not difficult to see that any string generated by this grammar has an equal number of a's and b's: each application of rule 1 or rule 2 generates one a and one b, and rule 3 generates no symbol. The proof of the converse is slightly more involved.

Consider a string w having an equal number of a's and b's. We use induction.

Basis

|w| = 0: S ⇒ ε.

|w| = 2: w is either ab or ba; then S ⇒ SaSbS ⇒* ab or S ⇒ SbSaS ⇒* ba, applying S → ε to the remaining occurrences of S.

Induction

Assume that the result holds for strings of length up to k − 1. We prove that the result holds for strings of length k. Draw a graph where the x-axis represents the length of the prefixes of the given string and the y-axis represents the number of a's minus the number of b's. For the string aabbabba, the graph will appear as given in Figure 2.6.
Figure 2.6. The graph for the string aabbabba referred to in Example 2.5

For a given string w with an equal number of a's and b's, there are three possibilities:

1. The string begins with a and ends with b.
2. The string begins with b and ends with a.
3. The other two cases (begins with a and ends with a, or begins with b and ends with b).

In the first case, w = aw1b and w1 has an equal number of a's and b's. So we have S ⇒ SaSbS ⇒* aSb ⇒* aw1b = w, as S ⇒* w1 by the inductive hypothesis. A similar argument holds for case 2. In case 3, the graph mentioned above will cross the x-axis.

Consider w = w1w2, where w1 and w2 have an equal number of a's and b's. Let us say w1 begins with a and corresponds to the portion where the graph touches the x-axis for the first time, so that w1 = aub for some u with an equal number of a's and b's. In the above example, w1 = aabb and w2 = abba. In this case, we can have a derivation as follows:

S ⇒ SaSbS ⇒ aSbS ⇒* aubw2 = w

where S ⇒* u and S ⇒* w2 follow from the inductive hypothesis.

The difference between Example 2.5 and Example 2.4 is that in the latter, ε is not generated, whereas it is generated in Example 2.5.

We recall the definitions of CSG and length-increasing grammar from Definition 2.3 and Definition 2.4, respectively.

Theorem 2.1

Every CSL is length-increasing and conversely.

Proof. That every CSL is length-increasing can be seen from the definitions. That every length-increasing language is context-sensitive can be seen from the following construction.

Let L be a length-increasing language generated by the length-increasing grammar G = (N, T, P, S). Without loss of generality, one can assume that the productions in P are of the form X → a, X → X1 ... Xm, or X1 ... Xm → Y1 ... Yn, 2 ≤ m ≤ n, where X, X1, ..., Xm, Y1, ..., Yn ∊ N and a ∊ T. Productions in P which are already context-sensitive are not modified. Hence, consider a production of the form:

X1 ... Xm → Y1 ... Yn, 2 ≤ m ≤ n

(which is not context-sensitive). It is replaced by the following set of context-sensitive productions:

X1 ... Xm → Z1X2 ... Xm

Z1X2 ... Xm → Z1Z2X3 ... Xm

  ⋮  

Z1Z2 ... Zm−1Xm → Z1Z2 ... ZmYm+1 ... Yn

Z1Z2 ... ZmYm+1 ... Yn → Y1Z2 ... ZmYm+1 ... Yn

  ⋮  

Y1Y2 ... Ym−1 ZmYm+1 ... Yn → Y1Y2 ... YmYm+1 ... Yn

where Zk, 1 ≤ k ≤ m are new non-terminals.

Each production that is not context-sensitive is replaced by a set of context-sensitive productions as above. Application of this set of rules has the same effect as applying X1 ... Xm → Y1 ... Yn. Hence, the new grammar G′ thus obtained is a context-sensitive grammar equivalent to G.

Example 2.6.

Let L = {a^n b^m c^n d^m | n, m ≥ 1}.

The type 1 grammar generating this CSL is given by G = (N, T, P, S) with N = {S, A, B, X, Y}, T = {a, b, c, d}, and P consisting of the following rules:
S → aAB|aB

A → aAX|aX

B → bBd |bYd

Xb → bX

XY → Yc

Y → c.

Sample Derivations

S ⇒ aB ⇒ abYd ⇒ abcd.

S ⇒ aB ⇒ abBd ⇒ abbYdd ⇒ ab^2 c d^2.

S ⇒ aAB ⇒ aaXB ⇒ aaXbYd
        ⇒ aabXYd
        ⇒ aabYcd
        ⇒ a^2 b c^2 d.

S ⇒ aAB ⇒ aaAXB
        ⇒ aaaXXB
        ⇒ aaaXXbYd
        ⇒ aaaXbXYd
        ⇒ aaabXXYd
        ⇒ aaabXYcd
        ⇒ aaabYccd
        ⇒ aaabcccd.

Example 2.7.

Let L = {ww | w ∊ {0, 1}+}. This CSL is generated by the type 1 grammar G = (N, T, P, S), where N = {S, X0, X1, L0, L1, R0, R1}, T = {0, 1}, and P consists of the following rules:

1. S → 0SX0, S → 1SX1.
2. S → L0R0, S → L1R1.
3. R0X0 → X0R0, R0X1 → X1R0, R1X0 → X0R1, R1X1 → X1R1.
4. R0 → 0, R1 → 1.
5. L0X0 → L0R0, L0X1 → L0R1, L1X0 → L1R0, L1X1 → L1R1.
6. L0 → 0, L1 → 1.
Sample derivations:

1. S ⇒ 0SX0 ⇒ 01SX1X0
     ⇒ 01L0R0X1X0
     ⇒ 01L0X1R0X0
     ⇒ 01L0X1X0R0
     ⇒ 01L0R1X0R0
     ⇒ 01L0X0R1R0
     ⇒ 01L0R0R1R0
     ⇒* 010010.

2. S ⇒ 0SX0 ⇒ 00SX0X0
     ⇒ 001SX1X0X0
     ⇒ 001L0R0X1X0X0
     ⇒ 001L0X1R0X0X0
     ⇒ 001L0X1X0R0X0
     ⇒ 001L0X1X0X0R0
     ⇒ 001L0R1X0X0R0
     ⇒ 001L0X0R1X0R0
     ⇒ 001L0R0R1X0R0
     ⇒ 001L0R0X0R1R0
     ⇒ 001L0X0R0R1R0
     ⇒ 001L0R0R0R1R0
     ⇒* 00100010.

Example 2.8.

Consider the length-increasing grammar given in Example 2.2. All rules except rule 3 are context-sensitive; the rule cB → Bc is not. The following grammar is a CSG equivalent to the above grammar:

S → aSBC

S → abc

C → c

CB → DB

DB → DC

DC → BC

bB → bb.

Ambiguity
Ambiguity in CFLs is an important concept with applications to compilers. Generally, when a grammar is written for an expression or a programming language, we expect it to be unambiguous, so that during compilation a unique code is generated. Consider the following English statement: "They are flying planes." It can be parsed in two different ways.

In Figure 2.7, 'They' refers to the planes, and in Figure 2.8, 'They' refers to the persons on the plane. The ambiguity arises because we are able to obtain two different parse trees for the same sentence.
Figure 2.7. Parse tree 1

Figure 2.8. Parse tree 2

Now, we define ambiguity formally.

Definition 2.7

Let G = (N, T, P, S) be a CFG. A word w in L(G) is said to be ambiguously derivable in G if it has two or more different derivation trees in G.

Since the correspondence between derivation trees and leftmost derivations is a bijection, an equivalent definition can be given in terms of leftmost derivations.

Definition 2.8
Let G = (N, T, P, S) be a CFG. A word w in L(G) is said to be
ambiguously derivable in G, if it has two or more different leftmost
derivations in G.

Definition 2.9

A CFG G is said to be ambiguous if there is a word w in L(G) which is ambiguously derivable. Otherwise, it is unambiguous.

Example 2.9.

Consider the CFG G with rules: 1. S → aSb and 2. S → ab, where S is the non-terminal and a, b are terminal symbols. L(G) = {a^n b^n | n ≥ 1}. Each a^n b^n has a unique derivation tree, as given in Figure 2.12.

There is a unique leftmost derivation, where rule 1 is used (n − 1) times and rule 2 is used once at the end. Hence, this grammar is unambiguous.

Example 2.10.

Consider the grammar G with rules: 1. S → SS and 2. S → a, where S is the non-terminal and a is the terminal symbol. L(G) = {a^n | n ≥ 1}. This grammar is ambiguous, as a^3 has two different derivation trees (Figure 2.9).

Figure 2.9. Two derivation trees for a^3


It should be noted that even though the grammar in Example 2.10 is ambiguous, the language {a^n | n ≥ 1} can be generated unambiguously.

Example 2.11.

{a^n | n ≥ 1} can be generated by: 1. S → aS and 2. S → a. Each string has a unique derivation tree, as shown in Figure 2.10; there is a unique leftmost derivation for a^n, using rule 1 (n − 1) times and rule 2 once at the end.

Figure 2.10. Unique derivation tree for a^n

We now come to the concept of inherent ambiguity in CFLs. A CFL L may have several grammars G1, G2, .... If all of G1, G2, ... are ambiguous, then L is said to be inherently ambiguous.

Definition 2.10

A CFL L is said to be inherently ambiguous if all the grammars generating it are ambiguous, or in other words, if there is no unambiguous grammar generating it.

Example 2.12.

L = {a^n b^m c^p | n, m, p ≥ 1, n = m or m = p}.

This can be viewed as L = L1 ∪ L2, where

L1 = {a^n b^n c^p | n, p ≥ 1}
L2 = {a^n b^m c^m | n, m ≥ 1}.

L1 and L2 can be generated individually by unambiguous grammars, but any grammar generating L1 ∪ L2 will be ambiguous, since strings of the form a^n b^n c^n will have two different derivation trees, one corresponding to L1 and another corresponding to L2. Hence, L is inherently ambiguous.

It should be noted that the above argument is informal; giving a rigorous proof of the result is a slightly lengthy process.

Examples of some other inherently ambiguous CFLs are:

1. L = {a^n b^n c^m d^m | n, m ≥ 1} ∪ {a^n b^m c^m d^n | n, m ≥ 1}.
2. L = {a^n b^m c^n d^p | n, m, p ≥ 1} ∪ {a^n b^m c^q d^m | n, m, q ≥ 1}.
3. L = {a^i b^j c^k | i, j ≥ 1, k ≤ i} ∪ {a^i b^j c^k | i, j ≥ 1, k ≤ j}.
4. L = {a^i b^i c^j d^j e^k | i, j, k ≥ 1} ∪ {a^i b^j c^j d^k e^k | i, j, k ≥ 1}.
Now the question arises: how do we find out whether a grammar is ambiguous or not, or whether a language is inherently ambiguous or not? Though in particular cases it may be easy to find out, the general problems are undecidable. We shall see the proofs of these theorems in later chapters.

Theorem 2.2

It is undecidable to determine whether a given CFG G is ambiguous or not.

This means we cannot have an algorithm for this problem.

Theorem 2.3

It is undecidable in general to determine whether a CFL L is inherently ambiguous or not.

But for a particular subclass it is decidable.

Definition 2.11

A CFL L is bounded if there exist strings w1, ..., wk such that L ⊆ w1*w2*...wk*.

Theorem 2.4

There exists an algorithm to find out whether a given bounded CFL is inherently ambiguous or not.

Next we consider the concept of degree of ambiguity.

Consider the grammar G having rules: 1. S → SS and 2. S → a. We have seen earlier that a^3 has two different derivation trees; a^4 has 5 and a^5 has 14. As the length of the string increases, the number of derivation trees also increases. In this case, we say that the degree of ambiguity is not bounded, or infinite. On the other hand, consider the following grammar G = (N, T, P, S) generating {a^n b^m c^p | n, m, p ≥ 1, n = m or m = p}, where N = {S, A, B, C, D}, T = {a, b, c}, and P has the rules:

S → AB

S → CD
A → aAb

A → ab

B → cB

B → c

C → aC

C → a

D → bDc

D → bc.

Any string of the form a^n b^n c^p has a derivation starting with S → AB. Any string of the form a^n b^m c^m has a derivation starting with S → CD. A string of the form a^n b^n c^n will have two different leftmost derivations, one starting with S → AB and another with S → CD. So any string has one derivation tree or at most two different derivation trees, and no string has more than two. Here, we say that the degree of ambiguity is 2.

Definition 2.12

Let G = (N, T, P, S) be a CFG. Then the degree of ambiguity of G is the maximum number of derivation trees a string w ∊ L(G) can have in G.

We can also use the idea of power series to find the number of different derivation trees a string can have. Consider the grammar with rules S → SS, S → a, and write the equation S = SS + a. The initial solution is S1 = a.

Using this in the equation for S on the right-hand side:

S2 = S1S1 + a
   = aa + a.

S3 = S2S2 + a
   = (aa + a)(aa + a) + a
   = a^4 + 2a^3 + a^2 + a.

S4 = S3S3 + a
   = (a^4 + 2a^3 + a^2 + a)^2 + a
   = a^8 + 4a^7 + 6a^6 + 6a^5 + 5a^4 + 2a^3 + a^2 + a.

We can proceed like this using Si = Si−1Si−1 + a. In Si, for strings of length up to i, the coefficient of the string gives the number of different derivation trees it can have in G. For example, in S4 the coefficient of a^4 is 5, and a^4 has 5 different derivation trees in G. The coefficient of a^3 is 2, and the number of different derivation trees for a^3 in G is 2.
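The iteration Si = Si−1Si−1 + a can be carried out mechanically by treating each Si as a vector of coefficients indexed by string length; a Python sketch (names are ours; the counts produced are the Catalan numbers):

    def iterate(times: int, max_len: int):
        """Coefficients of S_times for S -> SS | a; index = length of a^n."""
        s = [0] * (max_len + 1)
        s[1] = 1                                # initial solution S1 = a
        for _ in range(times - 1):
            sq = [0] * (max_len + 1)
            for i in range(1, max_len + 1):     # polynomial square, truncated
                for j in range(1, max_len + 1 - i):
                    sq[i + j] += s[i] * s[j]
            sq[1] += 1                          # ... + a
            s = sq
        return s

    coeffs = iterate(6, 6)
    print(coeffs[3], coeffs[4], coeffs[5])   # 2 5 14: trees for a^3, a^4, a^5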

In arithmetic expressions, it is better to avoid ambiguity. The grammar

E → E + E, E → E * E
E → id

will generate id + id * id, which has two different derivation trees (Figure 2.11).

Figure 2.11. Two parse trees for id + id * id


We would like to have the first one rather than the second (as * has higher precedence than +); evaluation according to the second one will give a wrong answer. Hence, in many cases it is advantageous to have unambiguous grammars.

Simplification of CFGs
Context-free grammars can be put into simple forms. The simplification is done on the productions and symbols of the grammar, and the simplified grammars are equivalent to the original grammars we started with. Such simplifications of CFGs are important, as CFGs have wide applications in compilers.

A given CFG may have rules and symbols which do not ultimately contribute to its language; one can therefore modify the grammar by removing such rules and symbols. Another issue is the presence of ε-rules among the productions: one may want an ε-free set of context-free rules whenever the underlying CFL does not contain the empty string. We also give a simplified CFG that has no unit rules, which are rules of the form A → B, where A and B are non-terminal symbols.

Removing Useless Productions


We now introduce the definition of useful and useless symbols in
any CFG. We give an algorithm to obtain an equivalent CFG that
has only useful symbols.

Definition 2.13

Let G = (N, T, P, S) be a CFG. A variable X in N is said to be useful if and only if there is at least one string α ∊ L(G) such that S ⇒* α1Xα2 ⇒* α, where α1, α2 ∊ (N ∪ T)*, i.e., X is useful because it appears in at least one derivation from S of a word α in L(G). Otherwise, X is useless, and so is every production involving X.

One can understand the ‘useful symbol’ concept in two steps.

1. For a symbol X ∊ N to be useful, it should occur in some derivation starting from S, i.e., S ⇒* α1Xα2.

2. Also, X has to derive a string α ∊ T*, i.e., X ⇒* α.

These two conditions are necessary, but they are not sufficient: the two conditions may be satisfied and still α1 or α2 may contain a non-terminal from which a terminal string cannot be derived. So, the usefulness of a symbol has to be tested in two steps, as above.
Lemma 2.1

Let G = (N, T, P, S) be a CFG such that L(G) ≠ φ. Then there exists an equivalent CFG G′ = (N′, T′, P′, S) that does not contain any useless symbols or productions.

Proof. The CFG G′ is obtained by the following elimination


procedures:

First, eliminate all those symbols X such that X does not derive any string in T*. Let G2 = (N2, T, P2, S) be the grammar thus modified. As L(G) ≠ φ, S will not be eliminated. The following algorithm identifies the symbols that need not be eliminated.

Algorithm GENERATING
1. Let GEN = T.
2. If A → α ∊ P and every symbol of α belongs to GEN, then add A to GEN. Repeat this step until no new symbol can be added.
3. Remove from N all those symbols that are not in the set GEN, and remove all the rules using them.
4. Let the resultant grammar be G2 = (N2, T, P2, S).

Now eliminate all symbols in the grammar G2 that do not occur in any derivation from S, i.e., for which there is no derivation S ⇒* α1Xα2.

Algorithm REACHABLE
1. Let REACH = {S}.
2. If A ∊ REACH and A → α ∊ P, then add every symbol of α to REACH. Repeat this step until no new symbol can be added.

The above algorithm terminates, as any grammar has only a finite set of rules. It collects all those symbols which are reachable from S through derivations.

Now in G2 remove all those symbols that are not in REACH and


also productions involving them. Hence, one gets the modified
grammar G′ = (N′, T′, P′, S), a new CFG without useless symbols
and productions using them.
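The two elimination passes can be sketched in Python as follows. This is an illustrative implementation only; it assumes a grammar is represented as a dict mapping each non-terminal to a list of right-hand sides (each a list of symbols), with terminals being the symbols that never occur as keys.

def generating(grammar, terminals):
    gen = set(terminals)
    changed = True
    while changed:                        # repeat step 2 to a fixpoint
        changed = False
        for a, rhss in grammar.items():
            if a not in gen and any(all(s in gen for s in rhs) for rhs in rhss):
                gen.add(a)
                changed = True
    return gen

def reachable(grammar, start):
    reach, stack = {start}, [start]
    while stack:                          # collect symbols reachable from S
        a = stack.pop()
        for rhs in grammar.get(a, []):
            for s in rhs:
                if s not in reach:
                    reach.add(s)
                    stack.append(s)
    return reach

# The grammar of the worked example below: S -> Sa | A | C, A -> a, B -> bb, C -> aC
g = {'S': [['S', 'a'], ['A'], ['C']], 'A': [['a']],
     'B': [['b', 'b']], 'C': [['a', 'C']]}
gen = generating(g, {'a', 'b'})           # {'a', 'b', 'S', 'A', 'B'}
g2 = {a: [r for r in rhss if all(s in gen for s in r)]
      for a, rhss in g.items() if a in gen}
print(sorted(reachable(g2, 'S')))         # ['A', 'S', 'a']

Note that REACHABLE is run on the reduced grammar G2, exactly as the procedure prescribes.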

The equivalence of G with G′ can easily be seen since only symbols


and productions leading to derivation of terminal strings
from S are present in G′. Hence, L(G) = L(G′).

Theorem 2.5

Given a CFG G = (N, T, P, S). Procedure I of Lemma 2.1 is


executed to get G2 = (N2, T, P2, S) and procedure II of Lemma 
2.1 is executed to get G′ = (N′, T′, P′, S). Then G′ contains no
useless symbols.

Proof. Suppose G′ contains a symbol X (say) which is useless. It is easily seen that N′ ⊆ N2, T′ ⊆ T, P′ ⊆ P2. Since X survives the execution of II, S ⇒* α1Xα2 with α1, α2 ∊ (N′ ∪ T′)*. Every symbol of N′ is also in N2. Since G2 is obtained from G by the execution of I, it is possible to get a terminal string from every symbol of N2, and hence from every symbol of N′: X ⇒* w, α1 ⇒* w1, and α2 ⇒* w2, with w, w1, w2 ∊ T*. Thus, S ⇒* α1Xα2 ⇒* w1ww2 ∊ T*.

Clearly, X is not useless as supposed. Hence, G′ contains only useful symbols.

Example 2.12. 

Let G = (N, T, P, S) be a CFG with

N = {S, A, B, C}
T = {a, b} and

P = {S → Sa|A|C, A → a, B → bb, C → aC}

First, the GEN set will be {S, A, B, a, b}. Then G2 = (N2, T, P2, S), where

N2 = {S, A, B}

P2 = {S → Sa | A, A → a, B → bb}

In G2, REACH set will be {S, A, a}. Hence G′ = (N′, T′, P′, S) where

N′ = {S, A}

T′ = {a}

P′ = {S → Sa | A, A → a}

L(G) = L(G′) = {an|n ≥1}.

ε-rule Elimination Method


The next simplification of a CFG is to remove ε-rules when the language is ε-free.

Definition 2.14

Any production of the form A → ε is called an ε-rule. If A ⇒* ε, then we call A a nullable symbol.

Theorem 2.6

Let G = (N, T, P, S) be a CFG such that ε ∉ L(G). Then there


exists a CFG without ε-rules generating L(G).
Proof. Before modifying the grammar one has to identify the set of
nullable symbols of G. This is done by the following procedure.

Algorithm NULL
1. Let NULL := φ.
2. If A → ε ∊ P, then A ∊ NULL.
3. If A → B1 ... Bt ∊ P and each Bi is in NULL, then A ∊ NULL. Repeat this step until no new symbol can be added.

Run algorithm NULL for G and get the NULL set.

The modification of G to get G′ = (N, T, P′, S) with respect


to NULL is given below.

If A → A1 ... At ∊ P, t ≥ 1, and n (n ≤ t) of these Ai's are in NULL, then P′ will contain 2^n versions of this rule, in which the nullable variables are either present or absent in all possible combinations; if n = t, the combination in which all of them are absent (i.e., A → ε) is excluded. The grammar G′ = (N, T, P′, S) thus obtained is ε-free.
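A Python sketch of the whole ε-rule elimination, under the same assumed grammar representation as before, with an ε-rule encoded as an empty right-hand side:

from itertools import product

def nullable(grammar):
    null = set()
    changed = True
    while changed:
        changed = False
        for a, rhss in grammar.items():
            if a not in null and any(all(s in null for s in rhs) for rhs in rhss):
                null.add(a)
                changed = True
    return null

def eliminate_eps(grammar):
    null = nullable(grammar)
    new = {}
    for a, rhss in grammar.items():
        out = []
        for rhs in rhss:
            # each nullable symbol is either kept (True) or dropped (False)
            options = [[True, False] if s in null else [True] for s in rhs]
            for keep in product(*options):
                r = [s for s, k in zip(rhs, keep) if k]
                if r and r not in out:          # drop A -> ε and duplicates
                    out.append(r)
        new[a] = out
    return new

# Example 2.13: S -> AB0C, A -> BC, B -> 1 | ε, C -> D | ε, D -> ε
g = {'S': [['A', 'B', '0', 'C']], 'A': [['B', 'C']],
     'B': [['1'], []], 'C': [['D'], []], 'D': [[]]}
print(sorted(nullable(g)))          # ['A', 'B', 'C', 'D']
print(len(eliminate_eps(g)['S']))   # 8 rules for S, matching P′ below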

To prove that a word w ∊ L(G) if and only if w ∊ L(G′): as G and G′ do not differ in N and T, one can equivalently show that, for A ∊ N and w ≠ ε,

A ⇒* w in G if and only if A ⇒* w in G′.

Clearly, w ≠ ε. If A ⇒ w in one step in G, then A → w is in P and, since w ≠ ε, also in P′; then A ⇒ w in G′. Assume the result to be true for all derivations from A of fewer than n steps, and let A derive w in n steps, i.e., A ⇒ Y1Y2 ... Yk ⇒* w.

Let w = w1 ... wk, where Yj ⇒* wj, and let X1, X2, ..., Xm be those Yj, in order, such that wj ≠ ε. Clearly, m ≥ 1, as w ≠ ε. Hence, A → X1 ... Xm is a rule in G′.

We can see that X1 ... Xm ⇒* w, as the omitted Yj derive only ε. Since each derivation Yj ⇒* wj takes fewer than n steps, by induction each Xi derives its portion wi of w in G′. Hence, A ⇒ X1 ... Xm ⇒* w in G′.

Conversely, if A ⇒* w in G′, we show that A ⇒* w in G also. Again, the proof is by induction on the number of derivation steps.

Basis
If A ⇒ w in one step in G′, then A → w ∊ P′. By the construction of G′, one can see that there exists a rule A → α in G such that α and w differ only in nullable symbols. Hence, A ⇒ α ⇒* w in G, where in α ⇒* w only ε-rules are used.

Induction
Assume the result for all shorter derivations, and let A ⇒ X1 ... Xm ⇒* w in G′. The corresponding derivation in G begins A ⇒ Y1 ... Yk, where the Yj other than the Xi are nullable and are made to derive ε. Each Xi derives its portion wi of w in G′ in fewer steps, so by the induction hypothesis Xi ⇒* wi in G. Hence, A ⇒* w in G. Hence, L(G) = L(G′).

Example 2.13. 

Consider G = (N, T, P, S) where N = {S, A, B, C, D}, T = {0, 1} and

P = {S → AB0C, A → BC, B → 1|ε, C → D|ε, D → ε}. The set NULL = {A, B, C, D}. Then

P′ = {S → AB0C|AB0|A0C|B0C|A0|B0|0C|0, A → BC|B|C, B → 1, C → D}.
Procedure to Eliminate Unit Rules
Definition 2.15

Any rule of the form X → Y, X, Y ∊ N is called a unit rule. Note that
A → a with a ∊ T is not a unit rule.

In the simplification of CFGs, another important step is to make the given CFG unit-rule free, i.e., to eliminate unit rules. This is essential for applications in compilers: as the compiler spends time on each rule used in parsing by generating semantic routines, having unnecessary unit rules will increase compile time.

Lemma 2.2

Let G be a CFG, then there exists a CFG G′ without unit rules such
that L(G) = L(G′).

Proof. Let G = (N, T, P, S) be a CFG. First, find the set of pairs of non-terminals (A, B) in G such that A ⇒* B by unit rules. Such a pair is called a unit-pair.

Algorithm UNIT-PAIR
1. (A, A) ∊ UNIT-PAIR, for every variable A ∊ N.
2. If (A, B) ∊ UNIT-PAIR and B → C ∊ P, then (A, C) ∊ UNIT-PAIR. Repeat this step until no new pair can be added.

Now construct G′ = (N, T, P′, S) as follows. Remove all unit productions. For every unit-pair (X, Y), if Y → α ∊ P is a non-unit rule, add X → α to P′. Thus, G′ has no unit rules.
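A Python sketch of the unit-pair computation and the construction of P′, under the same assumed grammar representation (a unit rule is a right-hand side consisting of a single non-terminal):

def unit_pairs(grammar, nonterminals):
    pairs = {(a, a) for a in nonterminals}
    changed = True
    while changed:                        # transitive closure over unit rules
        changed = False
        for (a, b) in list(pairs):
            for rhs in grammar.get(b, []):
                if (len(rhs) == 1 and rhs[0] in nonterminals
                        and (a, rhs[0]) not in pairs):
                    pairs.add((a, rhs[0]))
                    changed = True
    return pairs

def remove_unit_rules(grammar, nonterminals):
    pairs = unit_pairs(grammar, nonterminals)
    new = {a: [] for a in grammar}
    for (a, b) in sorted(pairs):
        for rhs in grammar.get(b, []):
            if not (len(rhs) == 1 and rhs[0] in nonterminals) and rhs not in new[a]:
                new[a].append(rhs)        # copy only non-unit rules
    return new

# Example 2.14: X -> aX | Y | b, Y -> bK | K | b, K -> a
g = {'X': [['a', 'X'], ['Y'], ['b']],
     'Y': [['b', 'K'], ['K'], ['b']],
     'K': [['a']]}
print(remove_unit_rules(g, {'X', 'Y', 'K'}))
# X gets aX, b, bK, a; Y gets bK, b, a; K keeps a (order may differ)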
Let us consider leftmost derivations in G and G′.

If w ∊ L(G′), then S ⇒* w in G′. Each rule of G′ used in this derivation is either a rule of G or stands for a sequence of unit rules of G followed by a non-unit rule of G. Hence, there exists a derivation of w from S by G in which zero or more applications of unit rules are used; its length may be different from that of the derivation in G′.

If w ∊ L(G), then one can consider a leftmost derivation S ⇒* w by G. Any maximal sequence of unit productions applied in this derivation must be followed by a non-unit production. Since this is a leftmost derivation, for each such sequence there exists one rule in G′ doing the same job. Hence, one can see the simulation of a derivation of G with G′. Hence, L(G) = L(G′).

Example 2.14. 

Let G = (N, T, P, X) be a CFG, where N = {X, Y, K}, T = {a, b} and

P = {X → aX|Y|b, Y → bK|K|b, K → a}

UNIT-PAIR = {(X, X), (Y, Y), (K, K), (X, Y), (Y, K), (X, K)}

Then G′ = (N, T, P′, X), where P′ = {X → aX|bK|b|a, Y → bK|b|a, K → a}.

Remark. Since the removal of ε-rules can introduce unit productions, to get a simplified CFG generating L(G) − {ε}, the following steps have to be used in the order given.

1. Remove ε-rules;
2. Remove unit rules; and
3. Remove useless symbols:
(i) Remove symbols not deriving terminal strings;
(ii) Remove symbols not reachable from S.
Example 2.15. 

It is essential that steps 3(i) and 3(ii) be executed in that order. If step 3(ii) is executed first and then step 3(i), we may not get the required reduced grammar.

Consider the CFG G = (N, T, P, S), where N = {S, A, B, C}, T = {a, b} and P = {S → ABC|a, A → a, C → b}. Here,

L(G) = {a}

Applying step 3(ii) first removes nothing. Then applying step 3(i) removes B and S → ABC, leaving S → a, A → a, C → b. Though A and C do not contribute to L(G), they are not removed.

On the other hand, applying step 3(i) first removes B and S → ABC. Applying step 3(ii) afterwards removes A, C, A → a, and C → b. Hence, S → a is the only rule left, which is the required result.

Normal Forms
We have seen in Section 2.3 how a given CFG can be simplified. In this section, we see different normal forms of CFGs, i.e., ways of expressing the rules of a CFG in a particular form. These normal form grammars are easy to handle and are useful in proving results. The most popular normal forms are the Weak Chomsky Normal Form (WCNF), Chomsky Normal Form (CNF), Strong Chomsky Normal Form (SCNF), and Greibach Normal Form (GNF).

Weak Chomsky Normal Form


Definition 2.16
Let G = (N, T, P, S) be a CFG. If each rule in P is of the form A → Δ, A → a, or A → ε, where A ∊ N, Δ ∊ N+, a ∊ T, then G is said to be in WCNF.

Example 2.16. 

Let G = (N, T, P, S) be a CFG, where N = {S, A, B}, T ={a, b}


and P = {S → ASB |AB, A → a, B → b}. G is in WCNF.

Theorem 2.7

For any CFG G = (N, T, P, S) there exists a CFG G′ in WCNF such
that L(G) = L(G′).

Proof. Let G = (N, T, P, S) be a CFG. One can construct an equivalent CFG in WCNF as follows. Let G′ = (N′, T, P′, S), where N′ = N ∪ {Aa | a ∊ T}, none of the Aa's belonging to N.

P′ = {A → Δ | A → α ∊ P, where Δ is obtained from α by replacing every occurrence of each terminal a by Aa} ∪ {Aa → a | a ∊ T}.

Clearly, each such Δ ∊ N′+, so P′ has the required form and G′ is in WCNF. That G and G′ are equivalent can be seen easily.

Chomsky Normal Form


Definition 2.17

Let ε ∉ L(G) and G = (N, T, P, S) be a CFG. G is said to be in


CNF, if all its productions are of the form A → BC or A → a, A, B,
C ∊ N, a ∊ T.

Example 2.17. 

The following CFGs are in CNF.


1. G1 = (N, T, P, S), where N = {S, A, B, C}, T = {0, 1} and

  P = {S → AB |AC |SS, C → SA, A → 0, B → 1}.

2. G2 = (N, T, P, S) where N = {S, A, B, C}, T = {a, b} and

  P = {S → AS |SB, A → AB |a, B → b}.

Remark. No CFG in CNF can generate ε. If ε is to be added to L(G), then a new start symbol S′ is to be taken and S′ → ε should be added. For every rule S → α, S′ → α should also be added; this makes sure that the new start symbol does not appear on the right-hand side of any production.

Theorem 2.8

Given CFG G, there exists an equivalent CFG G′in CNF.

Proof. Let G = (N, T, P, S) be a CFG without ε-rules, unit rules, and useless symbols, with ε ∉ L(G). Modify G to G′ = (N′, T, P′, S) in WCNF. Let A → Δ ∊ P′. If |Δ| = 2, such rules need not be modified. If |Δ| ≥ 3, the modification is as below:

If A → Δ = A1A2A3, the new set of equivalent rules will be:

A → A1B1

B1 → A2A3.

Similarly, if A → A1 A2 ... An ∊ P, it is replaced by

A → A1B1

B1 → A2B2
  ⋮  

Bn−2 → An−1An.

Let P″ be the collection of modified rules and G″ = (N″, T, P″, S) be


the modified grammar which is clearly in CNF. Also L(G) = L(G″).
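The rule-splitting step is easily programmed. Below is a minimal Python sketch that introduces fresh non-terminals B1, B2, ... to break every rule A → A1A2 ... An with n ≥ 3 into binary rules; it assumes the fresh names do not clash with existing symbols.

def split_long_rules(grammar):
    new, counter = {}, 0
    for a, rhss in grammar.items():
        for rhs in rhss:
            head = a
            while len(rhs) > 2:               # peel off one symbol at a time
                counter += 1
                b = "B" + str(counter)        # fresh non-terminal
                new.setdefault(head, []).append([rhs[0], b])
                head, rhs = b, rhs[1:]
            new.setdefault(head, []).append(rhs)
    return new

# The S-rules of Example 2.18: S -> SAB | AB | SBC
print(split_long_rules({'S': [['S', 'A', 'B'], ['A', 'B'], ['S', 'B', 'C']]}))
# {'S': [['S','B1'], ['A','B'], ['S','B2']], 'B1': [['A','B']], 'B2': [['B','C']]}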

Example 2.18. 

Let G = (N, T, P, S) be a CFG where N = {S, A, B},

T = {a, b}

P = {S → SAB |AB |SBC, A → AB |a, B → BAB |b, C → b}.

Clearly, G is not in CNF but is in WCNF. Hence, the rules in P are modified as below:

For S → SAB, the equivalent rules are S → SB1, B1 → AB.

For S → SBC, the equivalent rules are S → SB2, B2 → BC.

For B → BAB, the equivalent rules are B → BB3, B3 → AB.

Hence, G″ = (N″, T, P″, S) with N″ = {S, A, B, C, B1, B2, B3}, T = {a, b} and

P″ = {S → SB1|SB2|AB, B1 → AB, B2 → BC, B3 → AB,


A → AB|a, B → BB3|b, C → b}.

Clearly, G″ is in CNF.

Strong Chomsky Normal Form


Definition 2.18
A CFG G = (N, T, P, S) is said to be in SCNF when the rules in P are only of the forms A → a or A → BC, where A, B, C ∊ N, a ∊ T, subject to the following conditions:

1. if A → BC ∊ P, then B ≠ C;
2. if A → BC ∊ P, then for each rule X → DE ∊ P, we have E ≠ B and D ≠ C.
Theorem 2.9

For every CFG G = (N, T, P, S) there exists an equivalent CFG in


SCNF.

Proof. Let G = (N, T, P, S) be a CFG in CNF. One can construct an


equivalent CFG, G′ = (N′, T, P′, S′) in SCNF as below.

N′ = {S′} ∪ {AL, AR | A ∊ N}

T = T

P′ = {AL → BLCR, AR → BLCR|A → BC ∊ P}
∪ {S′ → XLYR|S → XY ∊ P}
∪ {S′ → a|S → a ∊ P}
∪ {AL→ a, AR → a|A → a ∊ P, a ∊ T}.

Clearly, L(G) = L(G′) and G′ is in SCNF.


Example 2.19. 

Let G = (N, T, P, S) be a CFG where N = {S, A, B}, T = {0, 1}


and P = {S → AB|0, B → BA|1, A → AB|0}. Then G′ = (N′, T, P′, S′) in SCNF will have

N′ = {S′, SL, SR, AL, AR, BL, BR}

T = {0, 1}

P′ = {S′ → ALBR|0, SL → ALBR|0,
SR → ALBR|0, AL →ALBR|0,
AR → ALBR|0, BL → BLAR|1,
BR → BLAR|1}.

Greibach Normal Form


Definition 2.19

Let ε ∉ L(G) and G = (N, T, P, S) be a CFG. G is said to be in


GNF if each rule in P rewrites a variable into a word in TN*, i.e., each rule is of the form A → aα, a ∊ T, α ∊ N*.

Before we proceed to construct a CFG in GNF, we establish the following two techniques.

TECHNIQUE 1
For any CFG G = (N, T, P, S) with an A-production A → α1Bα2 ∊ P and B-productions B → y1|y2| ... |yn, there exists an equivalent CFG with new A-productions

A → α1y1α2 | α1y2α2 | ... | α1ynα2,

i.e., G′ = (N, T, P′, S) with

P′ = (P − {A → α1Bα2}) ∪ {A → α1yiα2 | 1 ≤ i ≤ n}.

Why is L(G′) = L(G)? A → α1Bα2 is a production in P. Whenever this production is used, the B it introduces must eventually be rewritten by some B-rule. Hence, in G, A ⇒ α1Bα2 ⇒ α1yiα2, whereas in G′, A ⇒ α1yiα2 directly. Conversely, if A ⇒ α1yiα2 in G′, then the rules A → α1Bα2 and B → yi achieve the same in G.

TECHNIQUE 2
Let G = (N, T, P, S) be a CFG with A-productions

A → Ax1|Ax2| ... |Axn|y1| ... |ym, where y1, ..., ym do not start with A.

Let G′ = (N ∪ {Z}, T, P′, S), where P′ is defined to include the following set of rules:

1. A → y1|y2| ... |ym;
2. A → y1Z|y2Z| ... |ymZ;
3. Z → x1|x2| ... |xn;
4. Z → x1Z|x2Z| ... |xnZ;
5. the remaining rules of P, excluding the original A-productions.

Then G′ and G are equivalent: left recursion is removed by introducing right recursion. If

A ⇒ Axin ⇒ Axin−1xin ⇒ ... ⇒ Axi1xi2 ... xin ⇒ yjxi1 ... xin

is a derivation in G, the corresponding derivation in G′ is

A ⇒ yjZ ⇒ yjxi1Z ⇒ yjxi1xi2Z ⇒ ... ⇒ yjxi1xi2 ... xin−1Z ⇒ yjxi1 ... xin−1xin.
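A small Python sketch of Technique 2, operating on the right-hand sides of the A-productions (each a list of symbols) and returning the new A-rules and Z-rules; the name of the fresh non-terminal Z is assumed not to clash with existing symbols.

def remove_left_recursion(a_rules, nt, z):
    # a_rules: right-hand sides of the A-productions; nt: the symbol A; z: fresh Z
    xs = [r[1:] for r in a_rules if r and r[0] == nt]   # from A -> A x
    ys = [r for r in a_rules if not r or r[0] != nt]    # from A -> y
    new_a = ys + [y + [z] for y in ys]                  # A -> y | yZ
    new_z = xs + [x + [z] for x in xs]                  # Z -> x | xZ
    return new_a, new_z

# A -> Ax1 | Ax2 | y1 | y2, with x1, x2, y1, y2 treated as single symbols
a, z = remove_left_recursion([['A', 'x1'], ['A', 'x2'], ['y1'], ['y2']], 'A', 'Z')
print(a)   # [['y1'], ['y2'], ['y1', 'Z'], ['y2', 'Z']]
print(z)   # [['x1'], ['x2'], ['x1', 'Z'], ['x2', 'Z']]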

Theorem 2.10
For every CFL L with ε ∉ L, there exists a CFG G in GNF such that L(G) = L.

Proof. Let G = (N, T, P, S) be a CFG without ε-rules and in WCNF. A CFG in GNF is constructed in five steps.

1. Rename all the variables as {A1, A2, ..., An}, with S = A1.

2. Obtain productions either in the form Ai → aw or Ai → Ajw with j > i; the construction is by induction on i.

Basis

For the A1-productions, using Technique 1 and Technique 2, one can get equivalent A1-rules which are clearly either of the form A1 → aw or A1 → Ajw, j > 1. The A1-rules already in the required form are retained.

Induction

Consider the Ai+1-productions, assuming that all Aj-productions, 1 ≤ j ≤ i, have been put in the required form. Productions already in the required form are not modified. Hence, consider Ai+1 → Alw, where l is the least index among such symbols occurring on the right-hand sides of the Ai+1-rules. If l > i + 1, one need not modify the rule. Otherwise, apply the induction hypothesis to the Al-rules, l ≤ i, and by Technique 1 convert these rules to the required form; the rules Ai+1 → Ajw, j ≤ i + 1, then reach the form Ai+1 → Alw, l ≥ i + 1, or Ai+1 → aw, a ∊ T. Rules of the form Ai+1 → Ai+1w can be modified by Technique 2.

3. Convert all An-rules to the form An → aw using Technique 2. Now all the An-rules are in GNF form.

4. The modification in Step 3 for the An-rules is propagated to the An−1-rules to convert them to GNF form, and so on, up to the A1-rules. At the end of this step, all Ai-rules are in GNF.

5. For the Z-rules introduced by applications of Technique 2, the required form is reached via substitution of Ai-rules (Technique 1).

Hence, the resulting grammar G′ is in GNF.

L(G) = L(G′) because during the conversion only Techniques 1 and


2 are used which do not affect the language generated.

Remark. (i) If G is in CNF, then any A-rule is of the form A → a, A → BC, or A → AD. By Technique 1 or 2, as in Step 2, new equivalent A-rules can be obtained, which are of the form A → aw or A → w′, where w ∊ N*, w′ ∊ N+, a ∊ T. Steps 3–5 modify these rules to A → aw or Z → bw′, where a, b ∊ T, w, w′ ∊ N*.

(ii) When ε ∊ L, the previous construction works for L − {ε} and


subsequently ε rules can be added to the modified grammar
without affecting the other rules as mentioned earlier.

Example 2.20. 

Consider a CFG G = (N,T,P,S) where N = {S}, T = {a, b} and

P = {S → SS, S → aSb, S → ab}.

The equivalent CFG in WCNF will be G′ = (N′, T, P′, S), where N′ = {S, A, B}, T = {a, b} and

P′ = {S → ASB, S → AB, A → a, B → b, S → SS}.

The conversion of this to GNF is shown below:

1. Rename: S = A1, A = A2, B = A3. The rules now become:

1. A1 → A1A1
2. A1 → A2A1A3
3. A1 → A2A3
4. A2 → a
5. A3 → b.

2. Convert the Ai-rules so that they are of the form Ai → aw or Ai → Ajw, j > i. By Technique 2, rules 1, 2, and 3 can be replaced by the following six rules:

7. A1 → A2A1A3
8. A1 → A2A3
9. A1 → A2A1A3Z
10. A1 → A2A3Z
11. Z → A1Z
12. Z → A1.

The A2- and A3-rules are already in the required form.

3. This step is not necessary here, as the A3-rules are already in GNF.

4. Convert the Ai-rules into GNF, starting from An and going up to A1. A3 → b and A2 → a are in GNF. Substituting A2 → a (Technique 1), the A1-rules become:

7′. A1 → aA1A3
8′. A1 → aA3
9′. A1 → aA1A3Z
10′. A1 → aA3Z.

5. Convert the Z-rules into GNF. Rule 11 is replaced by the four rules:

13. Z → aA1A3Z
14. Z → aA3Z
15. Z → aA1A3ZZ
16. Z → aA3ZZ

and rule 12 is replaced by the four rules:

13′. Z → aA1A3
14′. Z → aA3
15′. Z → aA1A3Z
16′. Z → aA3Z.

Note that 15′ and 16′ are repetitions of 13 and 14. So we end up with:

A1 → aA1A3
A1 → aA3
A1 → aA1A3Z
A1 → aA3Z
A2 → a
A3 → b
13. Z → aA1A3Z
14. Z → aA3Z
15. Z → aA1A3ZZ
16. Z → aA3ZZ
13′. Z → aA1A3
14′. Z → aA3.

Useless symbols and rules can be removed afterwards (e.g., A2 → a).

Problems and Solutions


Problem 1. Give CFGs for the following:

a. {anbn | n ≥ 1}

Solution. G = (N, T, P, S),

N = {S}, T = {a, b}; P consists of the following rules:

1. S → aSb
2. S → ab

Whenever rule 1 is applied, one ‘a’ is generated on the left and one ‘b’ on the right. The derivation terminates by using rule 2. A sample derivation and derivation tree are given below.
Figure 2.12. A sample derivation tree

S ⇒ aSb

  ⇒ aaSbb

  ⇒ aaaSbbb

  ⇒ aaaaSbbbb

  ⇒ aaaaabbbbb

b. {anbmcn|n, m ≥ 1}

Solution. The grammar is G = (N, T, P, S), N = {S, A}, T = {a, b, c}; P is given as follows:

1. S → aSc
2. S → aAc
3. A → bA
4. A → b

Rules 1 and 2 generate an equal number of a's and c's; rule 2 makes sure at least one a and one c are generated. Rules 3 and 4 generate the b's in the middle; rule 4 makes sure at least one b is generated. It is to be noted that the a's and c's are generated first and the b's afterwards.
c. {anbncm|n, m ≥ 1}

Solution. G = (N, T, P, S), N = {S, A, B}, T = {a, b, c};

P is given as follows:

1. S → AB
2. A → aAb
3. A → ab
4. B → cB
5. B → c

Rules 2 and 3 generate an equal number of a's and b's. Rules 4 and 5 generate the c's. Rule 1 puts the two parts together, so that an equal number of a's and b's is followed by a string of c's.

In the following solutions, only the rules are given. Capital letters stand for non-terminals and small letters stand for terminals.

d. {anbncmdm|n, m ≥ 1}

Solution.
1. S → AB
2. A → aAb
3. A → ab
4. B → cBd
5. B → cd

e. {anbmcmdn|n, m ≥ 1}

Solution.
1. S → aSd
2. S → aAd
3. A → bAc
4. A → bc

f. {anbm|n, m ≥ 1, n > m}

Solution.
1. S → aSb
2. S → aAb
3. A → aA
4. A → a

g. {anbm|n, m ≥ 1,n ≠ m}

Solution. {anbm | n, m ≥ 1, n ≠ m} = {anbm | n, m ≥ 1, n > m} ∪ {anbm | n, m ≥ 1, m > n}. Rules are:

1. S → aSb
2. S → aAb
3. A → aA
4. A → a
5. S → aBb
6. B → bB
7. B → b

Rules 1, 2, 3, and 4 generate more a's than b's. Rules 1, 5, 6, and 7 generate more b's than a's.

h. {wcwR|w ∊ {a,b}*}

Solution. Rules are:

1. S → aSa
2. S → bSb
3. S → c

i. {w|w ∊ {(,)}+, w is a well-formed string of parenthesis}

Solution. Rules are:

1. S → SS
2. S → (S)
3. S → ( )
The following grammar generates the same language plus the empty string:

1. S → S(S)S
2. S → ε

Problem 2. Give regular grammars for the following:

a. {an | n ≥ 1}

Solution. G = (N, T, P, S), N = {S}, T = {a}.

P has the following rules:

1. S → aS
2. S → a

b. {anbm|n, m ≥ 1}

Solution. We give below the rules only. Capital letters stand for non-terminals.

1. S → aS
2. S → aA
3. A → bA
4. A → b

c. {a2n|n ≥ 1}

Solution.
1. S → aA
2. A → aS
3. A → a

d. {(ab)n|n ≥ 1}

Solution.
1. S → aA
2. A → bS
3. A → b

e. {anbmcp|n, m, p ≥ 1}

Solution.
1. S → aS
2. S → aA
3. A → bA
4. A → bB
5. B → cB
6. B → c

f. {(abc)n|n ≥ 1}

Solution.
1. S → aA
2. A → bB
3. B → cS
4. B → c

Exercises
1. Find a CFG for the language over {0, 1} consisting of those strings in which the ratio of the number of 1's to the number of 0's is three to two.

2. Define CFGs that generate the following languages.

1. The set of odd-length strings in {0, 1}* with middle symbol 1.
2. {aibjck | j > i + k}.

3. Prove that the following CFGs do not generate

L = {x ∊ {0, 1}* | #0(x) = #1(x)}.

1. S → S01S | S10S | ε
2. S → 0S1 | 1S0 | 01S | S01 | S10 | ε

4. Show that the CFG S → SS |a| b is ambiguous.

5. Convert the following to CNF.



S → ABA


A → aA | ε


B → bB | ε

6. Which of the following grammars are ambiguous? Are the languages generated inherently ambiguous?

1. S → ε, S → aSb, S → SS
2. S → ε, S → aSb, S → bSa, S → SS
3. S → bS, S → Sb, S → ε
4. S → SaSa, S → b
5. S → Sb, S → aSb, S → Sa, S → a
6. S → a, S → aaS, S → aaaS
7. S → A, S → aSb, S → bS, A → Aa, A → a
8. S → AA, A → AAA, A → bA, A → Ab, A → a

7. Consider L2 where L = {wwR | w ∊ {a, b}*}. Give an argument that L2 is inherently ambiguous.

8. Give an unambiguous grammar for



L = {w | w ∊ {a, b}+, w has equal number of a’s and b’s}

9. Consider the grammars over the terminal alphabet Σ = {a, b} with rules

1. S → wSS, S → a, S → b
2. S → aSS, S → bSS, S → w

where w is some string over Σ. Show that each of these grammars is always ambiguous, whatever w may be.

10. Consider the grammar S → aS|aSbS|ε. Prove that the grammar generates all and only the strings of a's and b's such that every prefix has at least as many a's as b's. Show that the grammar is ambiguous and give an equivalent unambiguous grammar.

11. Consider the following grammar: E → +EE | *EE | −EE | x | y. Show that the grammar is unambiguous. What is the language generated?
12. Which of the following CFLs do you think are inherently ambiguous? Give arguments.

1. {aibjckdl | i, j, k, l ≥ 1, i ≠ j and k ≠ l}
2. {aibjckdl | i, j, k, l ≥ 1, i = k or j = l}
3. {aibjckdl | i, j, k, l ≥ 1, i = j and k = l, or i = l and j = k}
4. {aibjckdl | i, j, k, l ≥ 1, i ≠ j or k ≠ l}

13. Remove the useless symbols from the following CFGs.

1. S → ABB, S → CAC, A → a, B → Bc, B → ABB, C → bB, C → a
2. S → aSASb, S → Saa, S → AA, A → caA, A → Ac, B → bca, A → ε

14. Consider the following CFGs. Construct equivalent CFGs without ε-productions.

1. S → bEf, E → bEc, E → GGc, G → b, G → KL, K → cKd, K → ε, L → dLe, L → ε
2. S → eSe, S → GH, G → cGb, G → ε, H → JHd, H → ε, J → bJ, J → f

15. Remove unit productions from the following grammar:

1. S → cBA
2. S → B
3. A → cB
4. A → AbbS
5. B → aaa

16. Find a CFG with six productions (including ε-productions) equivalent to the following grammar:

S → b | bHF | bH | bF


H → bHc | bc


F → dFe | de | G


G → dG |d

17. Given two grammars G and G′



G


a. S → EFG


b. E → bEc


c. E → ε


d. F → cFd


e. F → ε


f. G → dGb


g. G → ε


G′


1′ S → EQ


2′ Q → FG


(b)–(g) as in G.

Give an algorithm for converting a derivation in G′ of a terminal string into a derivation in G of the same terminal string.

18. Let G and G′ be CFGs where



G


S → bSc


S → bc


S → bSSc


G′


S → bJ


J → bJc


J → c


J → bJbJc

1. Give an algorithm for converting a derivation in G of a terminal string into a derivation in G′ of the same terminal string.
2. Give an algorithm for converting a derivation in G′ of a terminal string into a derivation in G.

19. Suppose G is a CFG, w ∊ L(G), and |w| = n. How long is a derivation of w in G if
1. G is in CNF,
2. G is in GNF?

20. Show that every CFL without ε is generated by a CFG whose productions are of the forms A → a, A → aB, and A → aBC.

21. An operator CFG is an ε-free context-free grammar such that no production has two adjacent non-terminals on its right-hand side. That is, an ε-free CFG G = (N, T, P, S) is an operator CFG if for all p ∊ P, rhs(p) ∉ (N ∪ T)*NN(N ∪ T)*. Prove that for every CFG G there exists an operator CFG G′ such that L(G) = L(G′).

22. Find CNF and GNF grammars equivalent to the following grammars:

1. S → S ∧ S, S → S ∨ S, S → ¬S, S → (S), S → p, S → q
2. E → E + E, E → E * E, E → (E), E → a
23. A grammar G = (N, T, P, S) is said to be self-embedding if there exists a non-terminal A such that A ⇒* αAβ, α, β ∊ (N ∪ T)+. Prove that if a grammar is non-self-embedding, it generates a regular set.

24. A CFG G = (N, T, P, S) is said to be invertible if A → α and B → α implies A = B. Prove that for every CFG G there is an invertible CFG G′ such that L(G) = L(G′).

25. For each CFG G = (N, T, P, S) there is an invertible CFG G′ = (N′, T, P′, S′) such that L(G) = L(G′). Moreover,
1. A → ε is in P′ if and only if ε ∊ L(G) and A = S′;
2. S′ does not appear on the right-hand side of any rule in P′.

26. Let L be a CFL. Show that L can be generated by a CFG G = (N, T, P, S) with the rules in the following forms:

S → ε

A → a, a ∊ T

A → B, B ∊ N − {S}

A → αBCβ, α, β ∊ ((N ∪ T) − {S})*, B, C ∊ (N ∪ T) − {S}, and either B ∊ T or C ∊ T.

Chapter 3. Finite State Automata


A language is a subset of the set of strings over an alphabet. In Chapter 2, we have seen how a language can be generated by a grammar. A language can also be recognized by a machine; such a device is called a recognition device. Hence, a language can have a representation in terms of a generative device, a grammar, as well as in terms of a recognition device, which is called an acceptor. The simplest machine or recognition device is the finite state automaton, which we discuss in this chapter and in the next two chapters. Apart from these two types of representation, there are other representations like regular expressions, which are also discussed in this chapter.

Consider a man watching TV in his room. The TV is in the ‘on’ state. When it is switched off, the TV goes to the ‘off’ state. When it is switched on, it again goes to the ‘on’ state. This can be represented by the following figure.

Figure 3.1 is called a state diagram.

Figure 3.1. An example of FSA

Consider another example: a set of processes currently running on a single processor. A scheduling algorithm allots them time on the processor. The states in which a process can be are ‘wait,’ ‘run,’ ‘start,’ and ‘end.’ The connections between them are brought out by the following figure (Figure 3.2).

Figure 3.2. Another example of FSA

When the scheduler allots time, the process goes from the ‘start’ state to the ‘run’ state. When the allotted time is finished, it goes from the ‘run’ state to the ‘wait’ state. When the job is finished, it goes to the ‘end’ state.

As another example, consider a binary serial adder. At any time it gets two binary inputs x1 and x2. The adder can be in one of the states ‘carry’ or ‘no carry.’ The four possibilities for the inputs x1x2 are 00, 01, 10, and 11. Initially, the adder is in the ‘no carry’ state. The working of the serial adder can be represented by the following figure (Figure 3.3).

Figure 3.3. A binary serial adder

An edge labeled x1x2/x3 from p to q denotes that when the adder is in state p and gets input x1x2, it goes to state q and outputs x3. In general, the input and output on a transition from p to q are denoted by i/o. Suppose the two binary numbers to be added are 100101 and 100111:

Time:  6 5 4 3 2 1
       1 0 0 1 0 1
       1 0 0 1 1 1

The input at time t = 1 is 11; the output is 0, and the machine goes to the ‘carry’ state. At time t = 2, the input is 01; the output is 0 and the machine remains in the ‘carry’ state. At time t = 3, it gets 11, outputs 1, and remains in the ‘carry’ state. At time t = 4, the input is 00; the machine outputs 1 and goes to the ‘no carry’ state. At time t = 5, the input is 00; the output is 0 and the machine remains in the ‘no carry’ state. At time t = 6, the input is 11; the machine outputs 0 and goes to the ‘carry’ state. The input stops here. At time t = 7, there is no input (this is taken as 00) and the output is 1.

It should be noted that at time t = 1, 3, 6, the input is 11, but the


output is 0 at t = 1, 6 and is 1 at time t = 3. At time t = 4, 5, the
input is 00, but the output is 1 at time t = 4 and 0 at t = 5. So, it is
seen that the output depends both on the input and the state.
The figures we have seen are called state diagrams. The nodes are the states; they are represented as circles with labels written inside them. The initial state is marked with an arrow pointing to it. When considered as recognition devices, the final states are represented as double circles. A transition from one state to another is represented by a directed edge.

The above figure (Figure 3.4) indicates that in state q when the


machine gets input i, it goes to state p and outputs o.

Figure 3.4. Representation of a transition

Let us consider one more example of a state diagram, given in Figure 3.5. The input and output alphabets are {0, 1}. For the input 011010011, the output is 001101001 and the machine ends in state q1. It can be seen that the first output is 0 and, afterwards, the output is the symbol read at the previous instant. It can also be noted that the machine goes to q1 after reading a 1 and goes to q0 after reading a 0. It should also be noted that on any transition leaving state q0 it outputs a 0, and on any transition leaving state q1 it outputs a 1. This machine is called a one-moment delay machine.

Figure 3.5. One-moment delay machine

One can try to construct two-moment delay machine and three-


moment delay machines. The idea for such construction is that for
two-moment delay machine last two inputs must be remembered
and for three-moment delay machine, last three inputs should be
remembered.

As another example, let us construct a finite state machine which reads a binary string and outputs a 0 if the number of 1's it has read is even, and a 1 if the number of 1's is odd. Such a machine is called a parity checker. The state diagram in Figure 3.6 represents such a machine.

Figure 3.6. Parity checker

Deterministic Finite State


Automaton (DFSA)
So far we have considered the finite state machine as an
input/output device. We can also look at the finite state automaton
(FSA) as accepting languages, i.e., sets of strings. We can look at
the FSA with an input tape, a tape head, and a finite control (which
denotes the state) (Figure 3.7).

Figure 3.7. FSA as a recognizer

The input string is placed on the input tape. Each cell contains one
symbol. The tape head initially points to the leftmost symbol. At
any stage, depending on the state and the symbol read, the
automaton changes its state and moves its tape head one cell to the
right. The string on the input tape is accepted, if the automaton
goes to one of the designated final states, after reading the whole of
the input. The formal definition is as follows.

Definition 3.1

A DFSA is a 5-tuple

M = (K, Σ, δ, q0, F), where


K is a finite set of states


Σ is a finite set of input symbols


q0 in K is the start state or initial state


F ⊆ K is set of final states


δ, the transition function, is a mapping from K × Σ to K.

δ(q, a) = p means that if the automaton is in state q and reads the symbol a, it goes to state p in the next instant, moving the pointer one cell to the right. δ is extended as δ̂ to K × Σ* as follows:

δ̂(q, ε) = q for all q in K

δ̂(q, xa) = δ(δ̂(q, x), a), x ∊ Σ*, q ∊ K, a ∊ Σ.

Since δ̂(q, a) = δ(q, a), without any confusion we can use δ for δ̂ also. The language accepted by the automaton is defined as:

T(M) = {w | w ∊ Σ*, δ(q0, w) ∊ F}.
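A minimal Python sketch of a DFSA and its extended transition function. The transition table below is an assumed rendering of a DFSA for {anbm | n, m ≥ 1} in the spirit of Figure 3.8, with the dead state D left implicit (missing entries reject).

def accepts(delta, q0, finals, w):
    q = q0
    for a in w:                        # compute delta-hat(q0, w) symbol by symbol
        if (q, a) not in delta:
            return False               # implicit dead state
        q = delta[(q, a)]
    return q in finals

delta = {('q0', 'a'): 'q1', ('q1', 'a'): 'q1',
         ('q1', 'b'): 'q2', ('q2', 'b'): 'q2'}
print(accepts(delta, 'q0', {'q2'}, 'aaabb'))   # True
print(accepts(delta, 'q0', {'q2'}, 'abab'))    # False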


Example 3.1. 

Let a DFSA have state set {q0, q1, q2, D}; q0 is the initial state; q2 is


the only final state. The state diagram of the DFSA is in Figure 3.8.

Figure 3.8. DFSA accepting {an bm | n, m ≥ 1}

The behavior of the machine on the string aaabb can be represented as follows (reading the transitions off Figure 3.8):

δ(q0, aaabb) = δ(q1, aabb) = δ(q1, abb) = δ(q1, bb) = δ(q2, b) = q2

After reading aaabb, the automaton reaches a final state. It is easy to see that

T(M) = {anbm | n, m ≥ 1}

There is a reason for naming the fourth state as D. Once the control
goes to D, it cannot accept the string, as from D the automaton
cannot go to a final state. On further reading, for any symbol the
state remains as D. Such a state is called a dead state or a sink
state.

Non-deterministic Finite State


Automaton (NFSA)
In Section 3.1, we have seen what is meant by a DFSA. If the machine is in a particular state and reads an input symbol, the next state is uniquely determined. In contrast, we can also have a non-deterministic FSA (NFSA), where the machine has a choice of moving into one of several states.

Consider the following state diagram of a NFSA (Figure 3.9).


Figure 3.9. An example of an NFSA

If the machine is in q0 state and reads ‘a,’ then it has the choice of
going to q0 or q1.

If the machine is in q1 and reads ‘b,’ it has the choice of going


to q1 or q2. Hence, the machine is non-deterministic. It is not
difficult to see that the language accepted is {anbm|n, m ≥ 1}. On a
string aaabb, the transition can be looked at as follows:

Starting on the input tape with aaabb in state q0, after reading a,


the automaton can be in q0 or q1. From q0, after reading the
second a, it can be in state q0 or q1. From q1, it cannot read a.
From Figure 3.10, it can be seen that after reading aaabb, the automaton can be in state q1 or q2. If there is a sequence of moves which takes the automaton to a final state, then the input is accepted. In this example, one such sequence is:

Equation 3.1. q0 -a→ q0 -a→ q0 -a→ q1 -b→ q1 -b→ q2

Figure 3.10. Transition sequences for aaabb

Now, we give the formal definition of NFSA.

Definition 3.2

A NFSA is a 5-tuple M = (K, Σ, δ, q0, F), where K, Σ, δ, q0, F are as


given for a DFSA, and δ, the transition function, is a mapping from K × Σ into finite subsets of K.

The mappings are of the form δ(q, a) = {p1, ..., pr}, which means that if the automaton is in state q and reads ‘a,’ then it can go to any one of the states p1, ..., pr. δ is extended as δ̂ to K × Σ* as follows:

δ̂(q, ε) = {q} for all q in K

δ̂(q, xa) = ∪{δ(p, a) | p ∊ δ̂(q, x)}.

If P is a subset of K, δ(P, a) = ∪q∊P δ(q, a).

Since δ(q, a) and δ̂(q, a) are equal for a ∊ Σ, we can use the same symbol δ for δ̂ also.

The set of strings accepted by the automaton is denoted by T(M):

T(M) = {w | w ∊ Σ*, δ(q0, w) contains a state from F}.

The automaton can be represented by a state table also. For


example, the state diagram given in Figure 3.9 can be represented
as the state table given in Figure 3.11.

Figure 3.11. State table
There is a row corresponding to each state and a column
corresponding to each input symbol. The initial state is marked with
an arrow and the final state with a circle. In the case of NFSA,
each cell contains a subset of states and in the case of DFSA each
cell contains a single state.

Example 3.2. 

The state diagram of an NFSA which accepts binary strings which


have at least one pair ‘00’ or one pair ‘11’ is in Figure 3.12.

Figure 3.12. State diagram for Example 3.2

Example 3.3. 

The state diagram of an NFSA which accepts binary strings which


end with ‘00’ or ‘11’ is in Figure 3.13.
Figure 3.13. State diagram for Example 3.3

By making the automaton non-deterministic do we get any


additional power? The answer is negative. It is seen that NFSA are
equivalent to deterministic automata in language accepting power.

Theorem 3.1

If L is accepted by an NFSA, then L is accepted by a DFSA.

Proof. Let L be accepted by a NFSA M = (K, Σ, δ, q0, F). Then, we


construct a DFSA M′ = (K′, Σ, δ′, q′0, F′) as follows: K′ = P(K), the power set of K. Corresponding to each subset of K, we have a state in K′. q′0 corresponds to the subset containing q0 alone. F′ consists of states corresponding to subsets having at least one state from F. We define δ′ as follows:

δ′([q1, ... ,qk], a) = [r1, r2, ... , rs] if and only if

δ({q1, ... ,qk}, a) = {r1, r2, ... , rs}.

We show that T(M) = T(M′).


We prove this by induction on the length of the string. We show that:

δ′(q′0, x) = [p1, ..., pr] if and only if δ(q0, x) = {p1, ..., pr}.

Basis
|x| = 0: δ′(q′0, ε) = q′0 = [q0], and δ(q0, ε) = {q0}.

Induction
Assume that the result is true for strings x of length up to m. We have to prove it for strings of length m + 1. Let |x| = m and a ∊ Σ. By the induction hypothesis,

δ′(q′0, x) = [p1, ..., pr] if and only if δ(q0, x) = {p1, ..., pr},

where P = {p1, ..., pr}. Suppose

δ({p1, ..., pr}, a) = {s1, ..., sm}.

By our construction,

δ′([p1, ..., pr], a) = [s1, ..., sm], and hence δ′(q′0, xa) = [s1, ..., sm] if and only if δ(q0, xa) = {s1, ..., sm}.


In M′, any state representing a subset having a state from F is in F′.
So, if a string w is accepted in M, there is a sequence of states
which takes M to a final state f and M′ simulating M will be in a
state representing a subset containing f. Thus, L(M) = L(M′).
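In practice one builds only the subsets reachable from [q0], which is exactly what Example 3.4 does. A Python sketch of this subset construction, assuming δ is a dict mapping (state, symbol) pairs to sets of states:

def subset_construction(delta, sigma, q0, finals):
    start = frozenset({q0})
    states, trans, todo = {start}, {}, [start]
    while todo:
        s = todo.pop()
        for a in sigma:
            # union of delta(p, a) over all p in the subset s
            t = frozenset(q for p in s for q in delta.get((p, a), set()))
            trans[(s, a)] = t
            if t not in states:
                states.add(t)
                todo.append(t)
    new_finals = {s for s in states if s & finals}
    return states, trans, start, new_finals

# The NFSA of Figure 3.9: q0 -a-> {q0, q1}, q1 -b-> {q1, q2}
delta = {('q0', 'a'): {'q0', 'q1'}, ('q1', 'b'): {'q1', 'q2'}}
states, trans, start, finals = subset_construction(delta, 'ab', 'q0', {'q2'})
print(sorted(sorted(s) for s in states))
# [[], ['q0'], ['q0', 'q1'], ['q1', 'q2']] -- the states of Figure 3.14 plus the dead state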

Example 3.4. 

Let us construct the DFSA for the NFSA given by the table
in Figure 3.11. We construct the table for DFSA.

Equation 3.2. 

The state diagram is given in Figure 3.14.

Figure 3.14. DFSA for the NFSA in Figure 3.9

Other states [q1], [q2], [q0, q2], [q0, q1, q2] are not accessible from
[q0] and hence the transitions involving them are not shown.

Note the similarity between the Figure 3.8 and Figure 3.14.

NFSA with ε-transitions
Having defined NFSA, we now try to include ε-transition to NFSA.
We allow the automaton to change state without reading any
symbol. This is represented by an ε-transition (Figure 3.15).

Figure 3.15. ε-transition

Example 3.5. 
In Figure 3.16, q0 is the initial state and q3 is the final state. There is an ε-transition from q0 to q1 and another from q2 to q3. It is not difficult to see that the set of strings which take the automaton from q0 to q3 can be represented by {anbmcpdq | m ≥ 1, n, p, q ≥ 0}.

Figure 3.16. An example of NFSA with ε-transitions

By adding ε-transition do we get an additional power? The answer


is negative. For any NFSA with ε-transition, we can construct an
NFSA without ε-transitions.

Definition 3.3

An NFSA with ε-transition is a 5-tuple M = (K, Σ, δ, q0, F), where K,


Σ, q0, F are as defined for NFSA and δ is a mapping from K × (Σ ∪
{ε}) into finite subsets of K.

δ can be extended as δ̂ to K × Σ* as follows. First, we define the ε-closure of a state q: it is the set of states which can be reached from q by reading ε only. Of course, the ε-closure of a state includes the state itself. Then δ̂(q, ε) = ε-closure(q), and

for w in Σ* and a in Σ, δ̂(q, wa) = ε-closure(P), where P = {p | for some r in δ̂(q, w), p is in δ(r, a)}.

Extending δ and δ̂ to a set of states R, we get δ(R, a) = ∪q∊R δ(q, a) and δ̂(R, w) = ∪q∊R δ̂(q, w). The language accepted is defined as:

T(M) = {w | w ∊ Σ*, δ̂(q0, w) contains a state from F}.
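The ε-closure is a simple graph search. A Python sketch, with ε encoded as the empty-string symbol and the ε-transitions of Figure 3.16 assumed to be q0 → q1 and q2 → q3:

def eps_closure(delta, states):
    # delta maps (state, symbol) to a set of states; '' stands for ε
    closure, todo = set(states), list(states)
    while todo:
        q = todo.pop()
        for p in delta.get((q, ''), set()):
            if p not in closure:
                closure.add(p)
                todo.append(p)
    return closure

delta = {('q0', ''): {'q1'}, ('q2', ''): {'q3'}}
print(sorted(eps_closure(delta, {'q0'})))   # ['q0', 'q1'], as in Example 3.6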

Theorem 3.2

Let L be accepted by a NFSA with ε-moves. Then L can be


accepted by a NFSA without ε-moves.

Proof. Let L be accepted by an NFSA with ε-moves, M = (K, Σ, δ, q0, F). Then, we construct an NFSA M′ = (K, Σ, δ′, q0, F′) without ε-moves accepting L as follows. Define δ′(q, a) = δ̂(q, a) for all q ∊ K, a ∊ Σ, and

F′ = F ∪ {q0}, if the ε-closure of q0 contains a state from F;

F′ = F, otherwise.

We should show T(M) = T(M′).

We wish to show by induction on the length of the string x that δ′(q0, x) = δ̂(q0, x).

We start the basis with |x| = 1 because for |x| = 0, i.e., x = ε, this may not hold: we have δ′(q0, ε) = {q0}, while δ̂(q0, ε) is the ε-closure of q0, which may include other states.

Basis
|x| = 1. Then x is a symbol of Σ, say a, and δ′(q0, a) = δ̂(q0, a) by our definition of δ′.

Induction
|x| > 1. Then x = ya for some y ∊ Σ* and a ∊ Σ.

Then δ′(q0, ya) = δ′(δ′(q0, y), a).

By the inductive hypothesis, δ′(q0, y) = δ̂(q0, y). Let δ̂(q0, y) = P.

Therefore δ′(q0, ya) = δ′(P, a) = δ̂(P, a) = δ̂(q0, ya).

It should be noted that δ′(q0, x) contains a state in F′ if and only if δ̂(q0, x) contains a state in F.

For ε, this is clear from the definition. For x = ya, if δ̂(q0, x) contains a state from F, then surely δ′(q0, x) contains the same state, which is in F′. Conversely, if δ′(q0, x) contains a state from F′ other than q0, then δ̂(q0, x) contains this state of F. The only case where a problem can arise is when δ′(q0, x) contains q0 and q0 is not in F; q0 was put into F′ because the ε-closure of q0 contains a state of F. In this case, since q0 ∊ δ′(q0, x) = δ̂(q0, x) and δ̂(q0, x) is closed under ε-moves, δ̂(q0, x) contains the ε-closure of q0 and hence a state of F.

Example 3.6. 

Consider the ε-NFSA of Example 3.5. By our construction, we get


the NFSA without ε-moves given in Figure 3.17.
Figure 3.17. NFSA without ε-moves for Example 3.5


ε-closure of (q0) = {q0, q1}


ε-closure of (q1) = {q1}


ε-closure of (q2) = {q2, q3}


ε-closure of (q3) = {q3}

It is not difficult to see that the language accepted by the above
NFSA = {anbmcpdq|m ≥ 1, n, p, q ≥ 0}.

Regular Expressions
Regular expressions are another way of specifying regular sets.

Definition 3.4

Let Σ be an alphabet. For each a in Σ, a is a regular expression


representing the regular set {a}. φ is a regular expression
representing the empty set. ε is a regular expression representing
the set {ε}.
If r1 and r2 are regular expressions representing the regular sets R1 and R2 respectively, then r1 + r2 is a regular expression representing R1 ∪ R2, r1r2 is a regular expression representing R1R2, and r1* is a regular expression representing R1*.

Any expression obtained from φ, ε, a(a ∊ Σ), using the above


operations and parentheses where required, is a regular
expression.

Example 3.7. 

(ab)*abcd represents the regular set:

{(ab)ncd|n ≥ 1}

Example 3.8. 

(a + b)*(c + d)(a + b)* is a regular expression representing the


regular set:

{w1cw2|w1, w2 are strings of a′s and b′s including ε} ∪

{w1dw2|w1, w2 are strings of a′s and b′s including ε}

Now, we shall see how to construct an NFSA with ε-moves from a


regular expression and also how to get a regular expression from a
DFSA.

Theorem 3.3

If r is a regular expression representing a regular set, we can


construct an NFSA with ε-moves to accept r.

Proof. r is obtained from a (a ∊ Σ), ε, and φ by a finite number of applications of +, ·, and * (· is usually left out).
For ε, φ and a we can construct NFSA with ε-moves as given
in Figure 3.18.

Figure 3.18. NFSA for ε, φ, and a

Let r1 represent the regular set R1 and R1 is accepted by the


NFSA M1 with ε-transitions (Figure 3.19).

Figure 3.19. M1 accepting R1

Without loss of generality, we can assume that each such NFSA


with ε-moves has only one final state.

R2 is similarly accepted by an NFSA M2 with ε-transition (Figure


3.20).

Figure 3.20. M2 accepting R2

Now, we can easily see that R1 ∪ R2 (represented by r1 + r2) is


accepted by the NFSA given in Figure 3.21.
Figure 3.21. NFSA for R1 ∪ R2

For this NFSA, q0 is the ‘start’ state and qf is the ‘end’ state.

R1 R2 represented by r1r2 is accepted by the NFSA with ε-moves


given in Figure 3.22.

Figure 3.22. NFSA for R1R2

For this NFSA with ε-moves, q01 is the initial state and qf2 is the


final state.

R1*, represented by r1*, is accepted by the NFSA with ε-moves given in Figure 3.23.

Figure 3.23. NFSA for R1*

For this NFSA with ε-moves, q0 is the initial state and qf is the final state. It can be seen that R1* contains strings of the form x1x2 ... xk, each xi ∊ R1. To accept such a string, the control goes from q0 to q01 and then, after reading x1 and reaching qf1, goes back to q01 by an ε-transition. From q01, it again reads x2 and goes to qf1. This can be repeated a number (k) of times, and finally the control goes from qf1 to qf by an ε-transition. ε is accepted by going from q0 to qf directly by an ε-transition.

Thus, we have seen that for a given regular expression one can
construct an equivalent NFSA with ε-transitions.
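The constructions of Figures 3.18–3.23 can be sketched in Python as follows. This is an illustrative implementation: regular expressions are assumed to be given as nested tuples, each sub-automaton gets fresh integer states with one start and one final state (as in the proof), and '' stands for ε.

def build(r, trans, counter):
    # trans maps (state, symbol) to a set of states; returns (start, final)
    def new():
        counter[0] += 1
        return counter[0]
    def add(p, a, q):
        trans.setdefault((p, a), set()).add(q)
    s, f = new(), new()
    op = r[0]
    if op == 'sym':
        add(s, r[1], f)
    elif op == 'eps':
        add(s, '', f)
    elif op == 'phi':
        pass                                  # no transitions: accepts nothing
    elif op == '+':                           # union, Figure 3.21
        s1, f1 = build(r[1], trans, counter)
        s2, f2 = build(r[2], trans, counter)
        add(s, '', s1); add(s, '', s2); add(f1, '', f); add(f2, '', f)
    elif op == '.':                           # concatenation, Figure 3.22
        s1, f1 = build(r[1], trans, counter)
        s2, f2 = build(r[2], trans, counter)
        add(s, '', s1); add(f1, '', s2); add(f2, '', f)
    elif op == '*':                           # star, Figure 3.23
        s1, f1 = build(r[1], trans, counter)
        add(s, '', s1); add(f1, '', s1); add(f1, '', f); add(s, '', f)
    return s, f

# aa*bb*, as in Example 3.9
trans, counter = {}, [0]
regex = ('.', ('.', ('sym', 'a'), ('*', ('sym', 'a'))),
              ('.', ('sym', 'b'), ('*', ('sym', 'b'))))
start, final = build(regex, trans, counter)
print(start, final, len(trans))   # start state, final state, transition count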

We know that we can construct an equivalent NFSA without ε-


transitions from this, and can construct a DFSA. (Figure 3.24).

Figure 3.24. Transformation from regular expression to DFSA

Example 3.9. 

Consider a regular expression aa*bb*. a and b are accepted by


NFSA with ε-moves given in Figure 3.25.

Figure 3.25. NFSA for a and b
a* and b* are accepted by NFSA with ε-moves given in Figure 3.26.

Figure 3.26. NFSA for a* and b*

aa*bb* will be accepted by NFSA with ε-moves given in Figure


3.27.

Figure 3.27. NFSA for aa*bb*

But we have already seen that a simple NFSA can be drawn easily
for this as in Figure 3.28.

Figure 3.28. Simple NFSA for aa*bb*

Next, we see how, for a given DFSA, the corresponding regular expression can be found (Figure 3.29).

Figure 3.29. Transformation from DFSA to regular expression

This along with the diagram in Figure 3.24 brings out the


equivalence between DFSA, NFSA, NFSA with ε-moves and regular
expressions.
Definition 3.5

Let L ⊆ Σ* be a language and x a string in Σ*. Then the derivative of L with respect to x is defined as:

Lx = {y ∊ Σ* | xy ∊ L}.

It is sometimes denoted as ∂xL.

Theorem 3.4

If L is a regular set, Lx is regular for any x.

Proof. Consider a DFSA accepting L. Let this FSA be M = (K,


Σ, δ, q0, F). Start from q0 and read x to go to state qx ∊ K.

Then M′ = (K, Σ, δ, qx, F) accepts Lx. This can be seen easily as


below.

δ(q0, x) = qx,

δ(q0, xy) ∊ F ⇔ xy ∊ L,

δ(q0, xy) = δ(qx, y)

δ(qx, y) ∊ F ⇔ y ∊ Lx,

∴ M′ accepts Lx.
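The proof is directly executable: the derivative Lx is accepted by the same DFSA with the start state moved to δ(q0, x). A one-function Python sketch, reusing the assumed transition table of the earlier DFSA sketch (assumed total on the states visited):

def derivative_start(delta, q0, x):
    q = q0
    for a in x:
        q = delta[(q, a)]
    return q           # M' = (K, Σ, δ, q_x, F) then accepts L_x

delta = {('q0', 'a'): 'q1', ('q1', 'a'): 'q1',
         ('q1', 'b'): 'q2', ('q2', 'b'): 'q2'}
print(derivative_start(delta, 'q0', 'aa'))
# 'q1': from q1 the machine accepts {a^n b^m | n >= 0, m >= 1}, which is L_aa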

Lemma 3.1
Let Σ be an alphabet. The equation X = AX ∪ B where A, B ⊆
Σ* has a unique solution X = A*B if ε ∉ A.

Proof. Let

Equation 3.8. A* = ε ∪ A ∪ A2 ∪ ... ∪ Ak ∪ ...

Since ε ∉ A, any string in Ak will have minimum length k. To show X = A*B, let w ∊ X and |w| = n. Substituting X = AX ∪ B into itself repeatedly, we have:

Equation 3.9. X = An+1X ∪ (ε ∪ A ∪ A2 ∪ ... ∪ An)B

Since any string in An+1X will have minimum length n + 1, w will belong to one of AkB, k ≤ n. Hence, w ∊ A*B.

On the other hand, let w ∊ A*B. To prove w ∊ X.

Since |w| = n, w ∊ AkB for some k ≤ n. Therefore from Equation


(3.9) w ∊ X.

Hence, we find that the unique solution for X = AX + B is X = A*B.


NOTE

If ε ∊ A, the solution will not be unique. Any A*C,


where C ⊇ B, will be a solution.

Next we give an algorithm to find the regular expression


corresponding to a DFSA.

Algorithm
Let M = (K, Σ, δ, q0, F) be the DFSA.

Σ = {a1, a2, ..., ak}, K = {q0, q1, ..., qn−1}.

1. Write an equation for each state in K:

q = a1qi1 + a2qi2 + ... + akqik, if q is not a final state and δ(q, aj) = qij, 1 ≤ j ≤ k;

q = a1qi1 + a2qi2 + ... + akqik + λ, if q is a final state and δ(q, aj) = qij, 1 ≤ j ≤ k.

2. Take the n equations in the n variables qi and solve for q0 using Lemma 3.1 and substitution.

3. The solution for q0 gives the desired regular expression.
Let us execute this algorithm for the following DFSA given
in Figure 3.30.
Figure 3.30. DFSA for aa*bb*

Step 1

Equation 3.10. q0 = aq1 + bD

Equation 3.11. q1 = aq1 + bq2

Equation 3.12. q2 = aD + bq2 + λ

Equation 3.13. D = (a + b)D + φ

Step 2

From Equation (3.13), solving for D using Lemma 3.1:

Equation 3.14. D = (a + b)*φ = φ

Using Equation (3.14) in Equations (3.10) and (3.12):

Equation 3.15. q0 = aq1

Equation 3.16. q1 = aq1 + bq2

Equation 3.17. q2 = bq2 + λ

Note that we have got rid of one equation and one variable. In Equation (3.17), using Lemma 3.1, we get:

Equation 3.18. q2 = b*λ = b*

Now using Equation (3.18) in Equation (3.16):

Equation 3.19. q1 = aq1 + bb*

We now have Equations (3.15) and (3.19). Again, we eliminated one equation and one variable.

Using Lemma 3.1 in Equation (3.19), we obtain:

Equation 3.20. q1 = a*bb*

Using Equation (3.20) in Equation (3.15), we obtain:

Equation 3.21. q0 = aa*bb*

This is the regular expression corresponding to the given FSA.
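The elimination can even be mimicked symbolically. A toy Python sketch (regular expressions are handled as plain strings, and no simplification such as b*λ = b* is attempted):

def arden(a, b):
    # unique solution of X = aX + b is a*b (Lemma 3.1), provided λ is not in a
    return "(" + a + ")*(" + b + ")"

q2 = arden("b", "λ")            # from q2 = b q2 + λ
q1 = arden("a", "b" + q2)       # from q1 = a q1 + b q2
q0 = "a" + q1                   # from q0 = a q1
print(q0)                       # a(a)*(b(b)*(λ)), i.e., aa*bb* after simplification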


Next, we see how we are justified in writing the equations.

Let q be the state of the DFSA for which we are writing the equation:

Equation 3.22. q = a1qi1 + a2qi2 + ... + akqik + Y, where Y = λ or φ.

Let L be the regular set accepted by the given DFSA. Let x be a string such that, starting from q0, after reading x, the state q is reached. Therefore, q represents Lx, the derivative of L with respect to x. From q, after reading aj, the state qij is reached. Hence:

Equation 3.23. Lx = a1Lxa1 + a2Lxa2 + ... + akLxak + Y

Here ajLxaj represents the set of strings in Lx beginning with aj, so Equation (3.23) partitions Lx into the strings beginning with a1, those beginning with a2, and so on. If Lx contains ε, then Y = λ; otherwise Y = φ. It should be noted that when Lx contains ε, q is a final state and so x ∊ L.

It should also be noted that, considering each state as a variable qj, we have n equations in n variables. Using Lemma 3.1 and substitution, each time one equation is removed, one variable is eliminated. The solution for q0 is Lε = L. This gives the required regular expression.

In this chapter, we have considered the definition


of DFSA, NFSA, NFSA with ε-moves and regular expressions, and
shown the equivalence among them.

Problems and Solutions


Problem 1. Construct DFSA for the following sets of strings.

a. The set of strings over {a, b} having an even number of a's and an odd number of b's.

Solution. It should be noted that the set of strings over Σ = {a, b} can be divided into four classes:

1. those having an even number of a's and an even number of b's;
2. those having an even number of a's and an odd number of b's;
3. those having an odd number of a's and an even number of b's;
4. those having an odd number of a's and an odd number of b's.

Strings in class (i) take the machine to q0, class (ii) to q3, class (iii) to q1, and class (iv) to q2 (Figure 3.31).

Figure 3.31. Solution to Problem 1.a

b. The set of strings over {a, b} whose length is divisible by 4 (Figure 3.32).

Figure 3.32. Solution to Problem 1.b

c. The set of strings over {a, b, c} having bca as substring (Figure 3.33).

Figure 3.33. Solution to Problem 1.c

d. The set of strings over {a, b, c} in which the substring abc occurs an even number of times (possibly zero) (Figure 3.34).
Figure 3.34. Solution to Problem 1.d

Problem 2. Describe the language accepted by the following DFSA in Figure 3.35.
a.

Figure 3.35. State diagram for Problem 2.a

Solution. L = {(ab)n|n ≥ 0}.

b. Describe the language accepted by the following DFSA in Figure 3.36.

Figure 3.36. State diagram for Problem 2.b

Solution. L = the set of strings in {a, b}* whose length is divisible by 3.

c. Describe the language accepted by the following DFSA in Figure 3.37.


Figure 3.37. State diagram for Problem 2.c

Solution. L = {anbmcp|n, m, p ≥ 1}.

d. Describe the language accepted by the following DFSA in Figure 3.38.

Figure 3.38. State diagram for Problem 2.d

Solution. Set of strings over {a, b} containing aaa as a substring.

Problem 3. For the following NFSA find equivalent DFSA (Figure 3.39).
a.

Figure 3.39. State table for Problem 3.a

The state table can be represented by the state diagram as given below (Figure 3.40).

Figure 3.40. State diagram for Problem 3.a


Solution.

Figure 3.41. Solution to Problem 3.a

b. For the following NFSA find equivalent DFSA (Figure 3.42).

Figure 3.42. State table for Problem 3.b

The state table can be represented by the state diagram given in Figure 3.13.

Solution.

Figure 3.43. Solution to Problem 3.b

c. For the following NFSA find equivalent DFSA (Figure 3.44).


Figure 3.44. State table for Problem 3.c

The state table can be represented by the state diagram given in Figure 3.12.

Figure 3.45. Solution to Problem 3.c

d. For the following NFSA find equivalent DFSA (Figure 3.46).

Figure 3.46. State table for Problem 3.d

The state table can be represented by the state diagram given below (Figure 3.47).

Figure 3.47. State diagram for Problem 3.d


Solution.

Figure 3.48. Solution to Problem 3.d

Problem 4. Write a regular expression for each of the following languages over the alphabet {a, b}.

1. The set of strings containing ab as a substring.
2. The set of strings having at most one pair of consecutive a's and at most one pair of consecutive b's.
3. The set of strings whose length is divisible by 6.
4. The set of strings whose 5th last symbol (5th symbol from the end) is b.

Solution.

1. A = (a + b)*ab(a + b)*

2. One pair of a's but no pair of b's: B1 = (b + ε)(ab)*aa(ba)*(b + ε). One pair of b's but no pair of a's: B2 = (a + ε)(ba)*bb(ab)*(a + ε). Let B = B1 + B2.

aa occurring before bb: C = (b + ε)(ab)*aa(ba)*bb(ab)*(a + ε)

bb occurring before aa: D = (a + ε)(ba)*bb(ab)*aa(ba)*(b + ε)

No pair of a's and no pair of b's: E = (b + ε)(ab)*(a + ε)

The required regular expression is B + C + D + E.

3. [(a + b)6]*

4. (a + b)*b(a + b)4

Exercises
1. The following are the state diagrams of two DFSAs, M1 and M2. Answer the following questions about these machines (Figure 3.49).

Figure 3.49. State diagrams for Exercise 1

1. What is the start state of M1?
2. What are the accepting states of M1?
3. What is the start state of M2?
4. What sequence of states does M1 go through on input 0000111?
5. Does M1 accept 00001111?
6. Does M2 accept ε?

2. Construct DFSA to accept the following languages.

1. {a, b}+
2. {a, b}*
3. {(ab)n | n ≥ 1}
4. {anbmcp | n, m, p ≥ 1}
5. {w | w ∊ {a, b}*, w has an even number of a's and an even number of b's}

3. Find a DFSA for each of the following languages.

1. L = {w | w is a binary string containing both substrings 010 and 101}.
2. For any fixed integer k ≥ 0, L = {0iw1i | w is a binary string and 0 ≤ i ≤ k}.
4. Suppose we restrict DFSAs so that they have at most one accepting state. Can any regular language be recognized by this restricted form of DFSA? Justify your answer.

5. Design deterministic finite automata for each of the following sets:

1. the set of all strings in {1, 2, 3}* containing 231 as a substring;
2. the set of strings x ∊ {0, 1}* such that #0(x) is even and #1(x) is a multiple of three;
3. the set of strings in (a)* whose length is divisible by either 2 or 7.

6. Let Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Consider the base-10 numbers formed by strings over Σ: 15 represents fifteen, 408 represents four hundred and eight, and so on. Let L = {x ∊ Σ* | the number represented by x is exactly divisible by 7} = {ε, 0, 00, 000, ..., 7, 07, 007, ..., 14, 21, 28, ...}. Draw a DFSA that accepts L.

7. Let Σ = {a, b}. Consider the language consisting of all words that have neither consecutive a's nor consecutive b's. Draw a DFSA that accepts this language.

8. Let Σ = {a, b, c}. Let L = {x ∊ {a, b, c}* | |x| ≡ 0 mod 5}. Draw a DFSA that accepts L.

9. Let Σ = {a, b, c}. Consider the language consisting of words that begin and end with
Draw a DFSA that accepts this language.

10. Let Σ = {a, b, c}.

1. Draw a DFSA that rejects all words for which the last two letters match.
2. Draw a DFSA that rejects all words for which the first two letters match.

11. Suppose we alter the definition of a DFSA so that once the automaton leaves its start state, it can never return to its start state. For such a DFSA, if an input w causes it to take a transition from a state p to the start state q0, where p ≠ q0, then the DFSA immediately halts and rejects w. Call this new kind of DFSA a no-return-DFSA. Prove that the class of languages recognized by no-return-DFSAs is the class of regular languages.
12. Construct an NFSA for each of the languages given in Exercises 1, 3, 5, 6, 7, 8, and 9.

13. Given a non-deterministic finite automaton M without ε-transitions, show that it is possible to construct an NFSA with ε-transitions M′ such that:

1. M′ has exactly one start state and one final state;
2. L(M) = L(M′).

14. What is the language accepted by the NFSA M (Figure 3.50)?

Figure 3.50. State diagram for Exercise 14

15. Let Σ = {a, b}. Find an NFSA for each of the following languages:

1. {x | x contains an even number of a's};
2. {x | x contains an odd number of b's};
3. {x | x contains an even number of a's and an odd number of b's};
4. {x | x contains an even number of a's or an odd number of b's}.

16. Suppose we alter the definition of an NFSA so that we now identify two types of states: the good states G ⊆ Q and the bad states B ⊆ Q, where G ∩ B = φ. (Note that a state may be neither good nor bad, but no state is both good and bad.) The automaton accepts input w if, considering all computations on w, some computation ends in G and no computation ends in B. Call this kind of NFSA a good-bad NFSA.
Prove that the class of languages recognized by good-bad NFSAs is the class of regular languages.
17. In the questions below, given a language L, we describe how to form a new language from L. Prove that if L is regular then the new languages are regular, by constructing NFSAs for the new languages.

1. skip(L) = {xy | xcy ∊ L, where x and y are strings and c is a letter}
2. suffix(L) = {y | xy ∊ L, x, y are strings}

18. Convert the following two NFSA to equivalent DFSA using the subset construction method (Figure 3.51).

Figure 3.51. State diagrams for Exercise 18

19. Prove the following identities for regular expressions r, s, and t. Here r = s means L(r) = L(s).

1. r + s = s + r
2. φ* = ε
3. (r*s*)* = (r + s)*
4. (rs)*r = r(sr)*

20. Construct a NFSA accepting the language denoted by each of the following regular expressions. Convert the
NFSA to an equivalent DFSA.

1. a*ba*ab*
2. a*bb*(a + b)ab*
3. b((aab* + a4)b)*a

21. Given the following DFSAs, construct equivalent regular expressions (Figure 3.52).

Figure 3.52. State diagrams for Exercise 21

22. Give an NFSA with four states equivalent to the regular expression (01 + 011 + 0111)*. Convert this
automaton to an equivalent DFSA using subset construction.

23. Give regular expressions that describe each of the following languages, which are over the alphabet {0,
1}. Explain how you constructed your regular expressions.

1. {w|w contains substrings 010 and 101}
2. {w|w does not contain substring 0110}
3. {w|w has an even number of 0's and an even number of 1's}
4. {w|w has the same number of occurrences of 10 and 01}

Note that in (a) and (d) occurrences can overlap.

24. For each of the following languages, give two strings that are members and two strings that are not
members – a total of four strings for each part. Let Σ = {a, b} in all parts.

1. a*b*
2. a(ba)*b
3. a* ∪ b*
4. (aaa)*
5. Σ*aΣ*bΣ*aΣ*
6. aba ∪ bab
7. (Σ ∪ a)b
8. (a ∪ ba ∪ bb)Σ*

25. Let Σ = {a, b}. Give (if possible) a regular expression that describes the set of all eve

26. Give examples of sets that demonstrate the following inequalities. Here, r1, r2, r3 are regular expressions.

1. r1 + ∊ ≠ r1
2. r1 · r2 ≠ r2 · r1
3. r1 · r1 ≠ r1
4. r1 + (r2 · r3) ≠ (r1 + r2) · (r1 + r3)
5. (r1 · r2)* ≠ (r1* · r2*)*

27. Find examples of sets that show the following expressions may be equal under some conditions.
Here, r1, r2, r3 are regular expressions.

1. r1 + ∊ = r1
2. r1 · r2 = r2 · r1 (even if r1 ≠ r2)
3. r1 · r1 = r1
4. r1 + (r2 · r3) = (r1 + r2) · (r1 + r3) (even if r1 ≠ r2 and r3 = r1)

28. Solve the following language equations for X1, X2, and X3 by eliminating X3 and then X2.
Solve for X1 and then back-substitute to find X2 and X3.
X1 = φ + φX1 + (0 + 1)X2 + φX3

X2 = ∊ + 0X1 + 1X2 + φX3

X3 = φ + φX1 + (0 + 1) X2 + φX3.

Chapter 4. Finite State Automata:


Characterization, Properties, and
Decidability
In this chapter, the equivalence between right linear grammars and
finite state automata (FSA) is proved. The pumping lemma for
regular sets is considered, and is used to give a method for showing
a language not to be regular. We also consider some basic closure
properties and decidability results.

FSA and Regular Grammars


Theorem 4.1

If a language L is accepted by a finite non-deterministic automaton,


then L can be accepted by a right linear grammar and conversely.

Proof. Let L be a language accepted by a finite non-deterministic


automaton M = (K, Σ, δ, q0, F) where K = {q0, . . ., qn}. If w ∊ L,
then w is obtained by the concatenation of symbols corresponding
to different transitions starting from q0 and ending at a final state.
Hence, each transition made by M while reading a symbol of w
must correspond to a production of a right linear
grammar G. The construction is as shown below:

G = ({S0, S1, . . ., Sn}, Σ, P, S0)

where productions in P are

1. Si → aSj if δ(qi, a) contains qj for qj ∉ F
2. Si → aSj and Si → a if δ(qi, a) contains qj, qj ∊ F.

To prove L(G) = L = L(M).

From the construction of P, one can see that Si ⇒ aSj if and

only if δ(qi, a) contains qj, and Si ⇒ a if and only if δ(qi, a)
contains a state of F. Hence, S0 ⇒ a1S1 ⇒ a1a2S2 ⇒ . . . ⇒ a1 . . . an if and only
if δ(q0, a1) contains q1, δ(q1, a2) contains q2, . . ., δ(qn-1, an)
contains qn where qn ∊ F.

Hence, w ∊ L(G) if and only if w ∊ L(M).

Let G = (N, T, P, S) be a right linear grammar. An equivalent non-


deterministic finite state automaton (NFSA) with ε-moves is
constructed as below:

Let M = (K, T, δ, [S], {[ε]}), where

K = {[α]|α is S or suffix of some right-hand side of a production in P,


the suffix need not be proper}.

The transition function δ is defined as follows:

1. δ([A], ε) = {[α]|A → α ∊ P}
2. If a ∊ T and [aα] ∊ K, then δ([aα], a) = {[α]}.

Clearly, [α] ∊ δ([S], w) if and only if S ⇒* xA ⇒ xyα,
where A → yα ∊ P and xy = w.
If w ∊ L(M), then α = ε. M accepts w if and only
if S ⇒* w, i.e., w ∊ L(G). Hence the converse follows.
Example 4.1. 

Let G = ({S, A}, {0, 1}, {S → 0A, A → 10A/ε}, S) be a regular


grammar. The corresponding NFSA will be (Figure 4.1):

Figure 4.1. NFSA for Example 4.1

Clearly, L(G) = L(M) = 0(10)*.

Example 4.2. 

Consider the finite automaton M = (K, Σ, δ, A, F) where K =


{A, B, C, D}


Σ = {0, 1}


F = {B}


δ: {δ(A, 0) = B, δ(B, 0) = D, δ(C, 0) = B, δ(A, 1) = D, δ(B, 1)
= C, δ(C, 1) = D, δ(D, 0) = D, δ(D, 1) = D}.

The corresponding regular grammar will be as follows:

G = ({A, B, C, D}, {0, 1}, P, A) where


P = {A → 0B|1D|0, B → 0D|1C, C → 0B|1D|0, D → 0D|1D}.


Clearly, L(M) = L(G) = 0(10)*.
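The construction above is mechanical enough to run directly. The following Python sketch (our own dictionary encoding of the automaton, not the text's notation) emits the productions Si → aSj and Si → a of Theorem 4.1:

# A sketch of the Theorem 4.1 construction: FSA -> right linear grammar.
def fsa_to_right_linear(delta, finals):
    """delta maps (state, symbol) -> state; returns a list of productions."""
    productions = []
    for (qi, a), qj in delta.items():
        productions.append((f"S_{qi}", f"{a}S_{qj}"))      # Si -> a Sj
        if qj in finals:
            productions.append((f"S_{qi}", a))             # Si -> a, qj final
    return productions

# Example 4.2: states A..D with F = {B}, accepting 0(10)*.
delta = {("A", "0"): "B", ("A", "1"): "D", ("B", "0"): "D", ("B", "1"): "C",
         ("C", "0"): "B", ("C", "1"): "D", ("D", "0"): "D", ("D", "1"): "D"}
for lhs, rhs in fsa_to_right_linear(delta, {"B"}):
    print(lhs, "->", rhs)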

Pumping Lemma for Regular Sets


A language is said to be regular, if it is accepted either by a finite
automaton or it has a regular grammar generating it. In order to
prove that a language is not regular the most commonly used
technique is “pumping lemma.” The lemma gives a pumping
property that a sufficiently long word has a subword (non-empty)
that can be pumped. But the fact that a language satisfies pumping
lemma does not mean that it is regular.

Theorem 4.2 (pumping lemma)

Let L be a regular language over T. Then, there exists a constant k


depending on L such that for each w ∊ L with |w| ≥ k, there exists
x, y, z ∊ T* such that w = xyz and

1. |xy| ≤ k

2. |y| ≥ 1

3. xyiz ∊ L for all i ≥ 0

k is no more than the number of states in the minimum state


automaton accepting L.

Proof. Let M = (K, Σ, δ, q1, F) be a deterministic finite state


automaton (DFSA) accepting L. Let K = {q1, . . ., qk}.

Let w = a1 . . . am ∊ L where ai ∊ Σ, 1 ≤ i ≤ m, m ≥ k.

Let the transitions on w be as shown below:

q1a1 . . . am ⊢ a1q2a2 . . . am ⊢ . . . ⊢a1 . . . amqm+1

where qj ∊ K, 1 ≤ j ≤ m + 1. Here a1 . . . ai-1qiai . . . am means the


FSA is in qi state after reading a1 . . . ai-1 and the input head is
pointing to ai. Clearly, in the above transitions, m + 1 states are
visited, but M has only k states. Hence, there exists qi, qj such
that qi = qj. Hence for

q1a1 . . . am ⊢ a1q2a2 . . . am ⊢ . . . ⊢ (a1 . . . ai-1qiai . . . am ... ⊢ a1 . . . aj-1qiaj . . . am) ⊢


⊢ a1 . . . amqm+1
where the transitions between the brackets start and end at qi,
processing a string αt for t ≥ 0. Hence, if x = a1 . . . ai-
1, y = ai . . . aj, z = aj+1 . . . am, xytz ∊ L ∀t ≥ 0 where |xy| ≤ k,
since qi is the first state identified to repeat in the transition and |y|
≥ 1. Hence the lemma.

The potential application of this lemma is that it can be used to


show that some languages are non-regular. We illustrate this with
the following example.

Example 4.3. 

Let L = {anbn|n ≥ 1}. If L is regular, then by the above lemma there
exists a constant k satisfying the pumping lemma conditions.
Choose w = akbk. Clearly |w| ≥ k. Then w = xyz, |xy| ≤ k and |y| ≥
1. If |x| = p, |y| = q, |z| = r, then p + q + r = 2k and p + q ≤ k.
Hence, xy consists of only a's and, since |y| ≥ 1, xz ∉ L, as the number
of a's in xz is less than k while the number of b's is k. Hence, the pumping
lemma fails for i = 0, although xyiz must be in L for all i ≥ 0. Hence L is not regular.

Example 4.4. 

Show that L = {aibj| i, j ≥ 1, i ≠ j} is not regular.

When we have to use pumping lemma, we have to be careful in


choosing the proper word. Suppose L is regular. Then pumping
lemma holds. Let n be the constant of the pumping lemma. Now
we have to choose a proper string w to show a contradiction.
Consider w = an!b(n+1)!. This string belongs to L. w can be written
as uvx where u and v contain a's only, |uv| ≤ n, and v = aq with 1 ≤ q ≤ n.
We have to choose i properly so that uvix is of the form arbr.
Take i = 1 + ((n + 1)! − n!)/q; since q ≤ n, q divides (n + 1)! − n! = n · n!,
so i is an integer. So pump v, i times to get a(n+1)!b(n+1)!. We come to the
conclusion a(n+1)!b(n+1)! ∊ L, which is a contradiction. Therefore L is
not regular.

In the next chapter, we see that there is a simpler proof for this
using Myhill-Nerode theorem.

Remarks on Pumping Lemma


One can see that not only regular languages satisfy the pumping
lemma property, but also some non-regular languages do so.
Hence, the converse of pumping lemma is not true. We can see it in
the following example. We use some closure properties of regular
languages in Example 4.5, which are proved in the section on closure properties below.

Example 4.5. 

Let L = {anbn|n ≥ 1}. We know L is non-regular. Consider L# =


(#+L) ∪ {a, b}* where # ∉ {a, b}. L# satisfies all the properties of
pumping lemma with k = 1. For any w ∊ #+L, let x = λ, y = # and for
any w ∊ Σ*, x = λ and y is the first letter of w. However, L# is not
regular, which can be seen as below. Let h be a homomorphism
defined as below: h(a) = a for each a ∊ Σ and h(#) = λ.
Then L = h(L# ∩ #+Σ*). Clearly #+Σ* is regular. If L# is regular,
then L# ∩ #+Σ* is regular as regular languages are closed under
intersection. Also, regular languages are closed under
homomorphism and hence L is regular which is a contradiction.
Hence L# is non-regular. Hence, one requires a stronger version of
pumping lemma so that the converse holds (Yu, 1997).

It should also be noted that it is not true that for arbitrarily long
strings z in a regular set L, z can be written in the form uviw for
large i.

Example 4.6. 

Consider an initial string ab over the alphabet Σ = {a, b}. Use


rules a → ab, b → ba to replace a’s and b’s at every step.
Step 0: w0 = ab

Step 1: w1 = abba

Step 2: w2 = abbabaab

⋮  

In step n, we get a string of length 2n+1 (2 raised to the power n + 1). It is known that wn has no


substring repeated three times consecutively. {wn} are called cube-
free words. So, simple regular languages like {a, b}* contain
arbitrarily long words which cannot be written in the form uviw for
large i.
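The rewriting process itself is a two-line program. A minimal Python sketch (ours, not the text's) that generates w0, w1, w2, ... by applying a → ab, b → ba at every position:

# Generate the cube-free words of Example 4.6.
def next_word(w):
    return "".join("ab" if c == "a" else "ba" for c in w)

w = "ab"                       # step 0
for n in range(4):
    print(f"step {n}: {w}")
    w = next_word(w)           # the length doubles at each step
# step 0: ab
# step 1: abba
# step 2: abbabaab
# step 3: abbabaabbaababba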

Closure Properties
Theorem 4.3

The family of regular languages is closed under the following


operations: (1) union, (2) intersection, (3) complementation, (4) catenation,
(5) star, and (6) reversal.

Proof. The six closure properties will be proved below either


through finite automata or regular grammars; these two formalisms have been
shown to be equivalent in Theorem 4.1.

1. Union: Let L1 and L2 be two regular languages generated by two
right linear grammars G1 = (N1, T1, P1, S1) and G2 =
(N2, T2, P2, S2), say. Without loss of generality,
let N1 ∩ N2 = φ. L1 ∪ L2 is generated by the right linear
grammar G′ =
(N1 ∪ N2 ∪ {S}, T1 ∪ T2, P1 ∪ P2 ∪ {S → S1, S → S2}, S). L(G′)
= L(G1) ∪ L(G2) because the new start symbol of G′ is S, from which
we reach S1 or S2 using the rules S → S1, S → S2. After this step,
one can use only rules from P1 or P2, hence deriving words
in L1 or L2 or in both.

2. Intersection: Let L1, L2 be any two regular languages accepted by
two DFSA's M1 = (K1, Σ1, δ1, q1, F1) and M2 = (K2, Σ2, δ2, q2, F2).
Then, the DFSA M constructed as below accepts L1 ∩ L2. Let M =
(K, Σ, δ, q0, F) where K = K1 × K2, q0 = (q1, q2), F = F1 × F2, δ: K ×
Σ → K is defined by δ((p1, p2), a) = (δ1(p1, a), δ2(p2, a)).

One can see that for each input
word w, M runs M1 and M2 in parallel, starting from q1, q2,
respectively. Having finished reading the input, M accepts only if
both M1, M2 accept. Hence, L(M) = L(M1) ∩ L(M2). (A sketch of
this product construction, in code, appears after this list.)

3. Complementation: Let L1 be a regular language accepted
by a DFSA M = (K, Σ, δ, q0, F). Then, clearly the complement of L1 is
accepted by the DFSA Mc = (K, Σ, δ, q0, K − F).

4. Concatenation: We prove this property using the concept of
regular grammar. Let L1, L2, G1, and G2 be defined as in the
proof of closure under union in this theorem. Then, the type 3
grammar G constructed as below satisfies the requirement
that L(G) = L(G1) . L(G2). G = (N1 ∪ N2, T1 ∪ T2, P2 ∪ P, S1)
where P = {A → aB|A → aB ∊ P1} ∪ {A → aS2|A → a ∊ P1}.
Clearly, L(G) = L(G1) . L(G2), because any derivation in G1 of a
word w ∊ L1 from S1 corresponds in G to a derivation S1 ⇒* wS2.
Hence, if S2 ⇒* v by G2, then S1 ⇒* wv by G.

5. Catenation closure: Here also we prove the closure using regular
grammars. Let L1 be a regular language generated by G1 =
(N1, T1, P1, S1). Then, the type 3 grammar G = (N1 ∪ {S0}, T1,
{S0 → ε, S0 → S1} ∪ {A → aS1|A → a ∊ P1} ∪ P1, S0)
satisfies the requirement. Clearly, G generates L1*.

6. Reversal: The proof is given using the NFSA model. Let L be a
language accepted by a NFSA with ε-transitions which has exactly
one final state.

(Exercise: For any NFSA, there exists an equivalent NFSA with ε-
transitions having exactly one final state.) Let it be M = (K, Σ, δ, q0,
{qf}). Then, the reversal automaton M′ = (K, Σ, δ′, qf, {q0}) where δ′
is defined as: δ′(q, a) contains p if δ(p, a) contains q, for
any p, q ∊ K, a ∊ Σ ∪ {ε}. One can see that if w ∊ L(M)
then wR ∊ L(M′), as in the modified automaton M′ each transition
takes a backward movement on w.
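The product construction used for intersection above runs exactly as stated. Below is a minimal Python sketch (our own dictionary encoding, not from the text); the complementation case is the variant that replaces F1 × F2 by all remaining pairs:

# Product DFSA for L(M1) ∩ L(M2); delta maps (state, symbol) -> state.
def product_dfsa(d1, q1, F1, d2, q2, F2, sigma, K1, K2):
    K = [(p, q) for p in K1 for q in K2]
    delta = {((p, q), a): (d1[(p, a)], d2[(q, a)]) for (p, q) in K for a in sigma}
    F = {(p, q) for p in F1 for q in F2}        # both components accept
    return delta, (q1, q2), F

def accepts(delta, q0, F, w):
    q = q0
    for a in w:
        q = delta[(q, a)]
    return q in F

# M1: even number of a's; M2: odd number of b's (over {a, b}).
d1 = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}
d2 = {(0, "b"): 1, (1, "b"): 0, (0, "a"): 0, (1, "a"): 1}
delta, q0, F = product_dfsa(d1, 0, {0}, d2, 0, {1}, "ab", [0, 1], [0, 1])
print(accepts(delta, q0, F, "aab"))   # True: two a's (even), one b (odd)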
We prove that regular languages are also closed under
homomorphism and right quotient.

Theorem 4.4
Regular languages are closed under homomorphism.

Proof. Let r be the regular expression for the regular


language L and h be the homomorphism. h(r) is an expression
obtained by substituting h(a) for each symbol a in r. Clearly, h(a) is
a regular expression. Hence, h(r) is a regular expression. For
every w ∊ L(r) the corresponding h(w) will be in L(h(r)) and
conversely.

Theorem 4.5

Let L1 and L2 be two regular languages. Then L1/L2 = {x|xy ∊ L1 for some y ∊ L2} is also regular.

Proof. Let L1 be accepted by a DFSA M1 = (K1, Σ1, δ1, q1, F1).

Let Mi = (K1, Σ1, δ1, qi, F1) be a DFSA with qi as its initial state, for
each qi ∊ K1. Construct an automaton Ai that accepts L2 ∩ L(Mi). If
there is a successful path from the initial state of this automaton Ai
to its final states, then L2 ∩ L(Mi) is not empty. If so, add qi to a set F′.

Let M = (K1, Σ1, δ1, q1, F′). One can see that L(M) = L1/L2, for

if x ∊ L1/L2, then for some y ∊ L2, δ1(q1, xy) ∊ F1.
Hence, δ1(q1, x) = q for some state q with δ1(q, y) ∊ F1, so q ∊ F′ and x ∊ L(M).

Conversely, if x ∊ L(M), then δ1(q1, x) = q ∊ F′. By construction

there exists a y ∊ L2 such that δ1(q, y) ∊ F1,
implying xy ∊ L1 and x ∊ L1/L2. Hence the proof.

Theorem 4.6
Let h: Σ1* → Σ2* be a homomorphism. If L′ ⊆ Σ2* is regular,
then L = h−1(L′) will be regular.

Proof. Let M = (K, Σ2, δ, q0, F) be a DFSA such that L(M) = L′. We


construct a new FSA M′ for h-1 (L′) = L from M as below:

Let M′ = (K, Σ1, δ′, q0, F) be such that K, q0, F are as in M.

The construction of the transition function δ′ is defined as:

δ′(q, a) = δ(q, h(a))   for a ∊ Σ1.

Here h(a) is a string over Σ2, and δ(q, h(a)) denotes the state
reached in M from q on reading the string h(a).

We show by induction on |x| that δ′(q, x) = δ(q, h(x)) for every x ∊ Σ1*. If x = ε,

δ′(q, ε) = q = δ(q, h(ε)) = δ(q, ε).

If x ≠ ε, let x = x′ a, then

δ′(q, x′ a) = δ′(δ′(q, x′), a)
  = δ′(δ(q, h(x′)),a)

  = δ(δ(q, h(x′)), h(a))

  = δ(q, h(x′)h(a))

  = δ (q, h(x′ a)).

Hence, one can see that L(M′) = h−1(L(M)) for any input x ∊ Σ1*. i.e.,

x ∊ L(M′)   iff δ′(q0, x) ∊ F

    iff δ(q0, h(x)) ∊ F

    iff h(x) ∊ L(M)

    iff x ∊ h−1(L(M)).
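As a sketch (ours, reusing the dictionary encoding of the earlier examples), the whole construction is the single rule δ′(q, a) = δ(q, h(a)) applied entry by entry:

# Inverse homomorphism: build M' over Sigma1 from M over Sigma2.
def run(delta, q, w):
    for c in w:                 # extended transition function
        q = delta[(q, c)]
    return q

def inverse_hom(delta2, states, sigma1, h):
    # delta'(q, a) = state reached in M from q on the string h(a)
    return {(q, a): run(delta2, q, h[a]) for q in states for a in sigma1}

# M: even number of 1's over {0, 1}; h(a) = "1", h(b) = "10".
delta2 = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}
dprime = inverse_hom(delta2, [0, 1], "ab", {"a": "1", "b": "10"})
print(run(dprime, 0, "ab") in {0})   # True: h(ab) = "110" has two 1's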

Any family of languages which is closed under the six basic


operations of union, concatenation, Kleene closure, ε-free
homomorphism, intersection with regular sets, and inverse
homomorphism is called an abstract family of languages (AFL).
The family of regular sets is an AFL. This is seen from the above
closure properties. If a family is closed under union, concatenation,
Kleene closure, arbitrary homomorphism, intersection with regular
sets, and inverse homomorphism, it is called a full AFL. If a family
of languages is closed under intersection with regular set, inverse
homomorphism and ε-free homomorphism, it is called a trio. If a
family of languages is closed under all homomorphisms, as well as
inverse homomorphism and intersection with a regular set, then it
is said to be a full trio. The family of regular sets is a full trio and a
full AFL.

Decidability Theorems
In this section, we address the basic decidability issues for regular
languages. They are membership problem, emptiness problem,
and equivalence problems.

Theorem 4.7

Given a regular language L over T and w ∊ T*, there exists an


algorithm for determining whether or not w is in L.

Proof. Let L be accepted by a DFSA M (say). Then, for input w one


can see whether w is accepted by M or not. The complexity of this
algorithm is O(n) where |w| = n. Hence, membership problem for
regular sets can be solved in linear time.
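In code (our encoding), the membership test is a single pass over w:

# Membership for a regular language given by a DFSA: O(|w|) time.
def member(delta, q0, F, w):
    q = q0
    for a in w:
        q = delta.get((q, a))
        if q is None:           # missing transition: reject
            return False
    return q in F

# DFSA for strings over {a, b} ending in b.
delta = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}
print(member(delta, 0, {1}, "aab"))   # True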

Theorem 4.8

There exists an algorithm for determining whether a regular


language L is empty, finite or infinite.

Proof. Let M be a DFSA accepting L. In the state diagram


representation of M with inaccessible states from the initial state
removed, one has to check whether there is a simple directed path
from the initial state of M to a final state. If so, L is not empty.
Consider a DFSA M′ accepting L, where inaccessible states from
the initial state are removed and also states from which a final state
cannot be reached are removed.

If in the graph of the state diagram of the DFSA, there are no


cycles, then L is finite. Otherwise L is infinite.

One can see that the automaton accepts some sentence of length less
than n (where n is the number of states of the DFSA) if and only
if L(M) is non-empty. One can prove this statement using the pumping
lemma: a shortest word w in L satisfies |w| < n, for if w were the shortest and |w|
≥ n, then w = xyz and xz would be a shorter word that belongs to L.

Also, L is infinite if and only if the automaton M accepts at least one


word of length l where n ≤ l < 2n. One can prove this by using
pumping lemma. If w ∊ L(M), |w| ≥ n and |w| < 2n, directly from
pumping lemma, L is infinite. Conversely, if L is infinite, we show
that there should be a word in L whose length is l where n ≤ l < 2n.
If there is no word whose length is l, where n ≤ l < 2n, let w be the
word whose length is at least 2n, but as short as any word in L(M)
whose length is greater than or equal to 2n. Then by pumping
lemma, w = w1w2w3 where 1 ≤ |w2| ≤ n and w1w3 ∊ L(M). Hence,
either w was not the shortest word of length 2n or more, or |w1w3| is
between n and 2n − 1, which is a contradiction.
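Both tests reduce to graph questions on the state diagram, as the proof indicates. A Python sketch (ours): emptiness is reachability of a final state, and infiniteness is a cycle among the useful states:

# Emptiness/finiteness for a DFSA given as an adjacency map state -> set of states.
def reachable(adj, starts):
    seen, stack = set(starts), list(starts)
    while stack:
        for q in adj.get(stack.pop(), ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def is_empty(adj, q0, F):
    return not (reachable(adj, {q0}) & set(F))

def is_infinite(adj, q0, F):
    radj = {}
    for p, qs in adj.items():          # reverse edges
        for q in qs:
            radj.setdefault(q, set()).add(p)
    useful = reachable(adj, {q0}) & reachable(radj, set(F))
    color = {}
    def has_cycle(u):                  # DFS restricted to useful states
        color[u] = "grey"
        for v in adj.get(u, ()):
            if v in useful and (color.get(v) == "grey"
                                or (v not in color and has_cycle(v))):
                return True
        color[u] = "black"
        return False
    return any(has_cycle(u) for u in useful if u not in color)

adj = {0: {0, 1}, 1: set()}            # q0 loops to itself and reaches final q1
print(is_empty(adj, 0, {1}), is_infinite(adj, 0, {1}))   # False True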

Theorem 4.9

For any two regular languages L1 and L2, there exists an algorithm


to determine whether or not L1 = L2.

Proof. Consider L = (L1 − L2) ∪ (L2 − L1), the symmetric difference
of L1 and L2. Clearly, L is regular by
closure properties of regular languages. Hence, there exists
a DFSA M which accepts L. Now, by the previous theorem one can
determine whether L is empty or not. L is empty if and only
if L1 = L2. Hence the theorem.
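Operationally (our sketch), one need not build the symmetric difference explicitly: explore the product of the two DFSAs and look for a pair where exactly one component is final:

# Equivalence test for two DFSAs over the same alphabet.
def equivalent(d1, q1, F1, d2, q2, F2, sigma):
    start = (q1, q2)
    seen, stack = {start}, [start]
    while stack:
        p, q = stack.pop()
        if (p in F1) != (q in F2):     # a distinguishing string reaches here
            return False
        for a in sigma:
            nxt = (d1[(p, a)], d2[(q, a)])
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True

# Two DFSAs for "even number of a's", with different state names.
d1 = {(0, "a"): 1, (1, "a"): 0}
d2 = {("e", "a"): "o", ("o", "a"): "e"}
print(equivalent(d1, 0, {0}, d2, "e", {"e"}, "a"))   # True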

Problems and Solutions


1. Let Σ be an alphabet. Define IΣ to be the collection of all infinite languages over Σ. Note
that IΣ does not include any finite language over Σ. Prove or give counter examples to the
following:

1. IΣ is closed under union
2. IΣ is closed under intersection

Solution.
1. IΣ is closed under union. Let L1 and L2 be in IΣ. L1 and L2 are infinite sets and L1 ∪ L2 = {x|
x ∊ L1 or x ∊ L2}. L1 ∪ L2 includes L1 and also L2. Hence L1 ∪ L2 is infinite.
2. IΣ is not closed under intersection. Consider Σ = {a}.
L1 = {a2n|n ≥ 1} is an infinite set.
L2 = {ap|p is a prime} is an infinite set.
L1, L2 ∊ IΣ.
L1 ∩ L2 = {a2}, which is a finite set and hence is not in IΣ.

2. Construct regular grammar equivalent to the following NFSA (Figure 4.2).

Figure 4.2. State diagram for Problem 2

Solution. Let G = (N, T, P, S0) where N = {S0, S1, S2}, T = {0, 1}. P consists of the following rules:

S0 → 0S1, S1 → 0S2|0S0|0, S2 → 1S1|1S2|1.

3. Construct an equivalent NFSA for the following grammar: S → abS|a.

Solution. See Figure 4.3.

Figure 4.3. Solution to Problem 3

Exercises
1. Let Σ be an alphabet. Define IΣ to be the collection of all infinite languages over Σ. Note that IΣ does not
include any finite language over Σ. Prove or give counter examples to the following:

1. IΣ is closed under complementation
2. IΣ is closed under concatenation
3. IΣ is closed under Kleene closure

2. If a collection of languages is closed under intersection, does it mean that it is closed under union? Prove
or give counter example.

3. If L is accepted by a NFSA, is it necessarily true that all subsets of L are accepted by some NFSA? Prove or
give counter examples.

4. Let NΣ denote the collection of languages such that no L ∊ NΣ is accepted by any NFSA. Prove or give
counter examples to the following:

1. NΣ is closed under union
2. NΣ is closed under catenation
3. NΣ is closed under Kleene closure

5. We have shown that the union of two regular languages is regular. Is the union of a countable collection of regular
languages always regular? Justify your answer.

6. Let M be a DFSA accepting L1 and G be a regular grammar generating L2. Using only M and G, show
that L1 ∩ L2 is regular.

7. Let P = {x| |x| is prime} and let I(L) be defined by I(L) = L ∩ P. Let DΣ denote the collection of
languages recognized by a DFSA.

1. Show that DΣ is not closed under I
2. Prove or disprove: NΣ is closed under I

8. Given any alphabet Σ and a DFSA M, show that it is decidable whether M accepts every string over Σ.

9. Given any alphabet Σ and regular expressions r1 and r2 over Σ, show that it is decidable
whether r1 and r2 describe any common strings.

10. Given any alphabet Σ and a regular expression r1 over Σ, show that it is decidable whether there is a
DFSA with less than 31 states that accepts the language described by r1.

11. Give a regular grammar for:

1. (a + b)c*(d + (ab)*)
2. (a + b)*a(a + b)*

12. Construct a regular grammar equivalent to each of the following NFSA (Figure 4.4).
Figure 4.4. State diagrams for Exercise 12

13. Construct an equivalent NFSA for each of the following grammars:

1. S → abS1
   S1 → abS1|S2
   S2 → a
2. S → abA
   A → baB
   B → aA|bb

Chapter 5. Finite State Automata


with Output and Minimization
In this chapter, we consider Myhill-Nerode theorem, minimization
of deterministic finite state automaton (DFSA) and finite state
automaton (FSA) with output.

Myhill-Nerode Theorem
In an earlier chapter, we considered the examples of serial adder,
parity machine, acceptor accepting {anbm|n,m ≥ 1}, and many more
such examples. Consider the serial adder. After getting some input,
the machine can be in ‘carry’ state or ‘no carry’ state. It does not
matter what exactly the earlier input was. It is only necessary to
know whether it has produced a carry or not. Hence, the FSA need
not distinguish between each and every input. It distinguishes
between classes of inputs. In the above case, the whole set of inputs
can be partitioned into two classes – one that produces a carry and
another that does not produce a carry. Similarly, in the case of
parity checker, the machine distinguishes between two classes of
input strings: those containing odd number of 1’s and those
containing even number of 1’s. Thus, the FSA distinguishes
between classes of input strings. These classes are also finite.
Hence, we say that the FSA has finite amount of memory.

Theorem 5.1 (Myhill-Nerode)

The following three statements are equivalent.

1. L ⊆ Σ* is accepted by a DFSA.

2. L is the union of some of the equivalence classes of a right
invariant equivalence relation of finite index on Σ*.

3. Let the equivalence relation RL be defined over Σ* as follows: xRLy if
and only if, for all z ∊ Σ*, xz is in L exactly when yz is in L. Then
RL is of finite index.

Proof. We shall prove (1) ⇒ (2), (2) ⇒ (3), and (3) ⇒ (1).

(1) ⇒ (2)

Let L be accepted by a FSA M = (K, Σ, δ, q0, F). Define a


relation RM on Σ* such that xRMy if δ(q0, x) = δ(q0,y). RM is an
equivalence relation, as seen below.

∀x: xRMx, since δ(q0, x) = δ(q0, x);

∀x, y: xRMy ⇒ yRMx, since δ(q0, x) = δ(q0, y) means δ(q0, y)
= δ(q0, x);

∀x, y, z: xRMy and yRMz ⇒ xRMz,

for if δ(q0, x) = δ(q0, y) and δ(q0, y) = δ(q0, z), then δ(q0, x)
= δ(q0, z).

So RM divides Σ* into equivalence classes. The set of strings which


take the machine from q0 to a particular state qi are in one
equivalence class. The number of equivalence classes is therefore
equivalent to the number of states of M, assuming every state is
reachable from q0. (If a state is not reachable from q0, it can be
removed without affecting the language accepted). It can be easily
seen that this equivalence relation RM is right invariant, i.e., if

xRM y, xzRM yz ∀z ∊ Σ*.



δ(q0, x) = δ (q0, y) if xRM y,


δ(q0, xz) = δ(δ (q0, x), z) = δ(δ (q0, y), z) = δ(q0, yz).


Therefore xzRM yz.

L is the union of those equivalence classes of RM which


correspond to final states of M.

(2) ⇒ (3)

Assume statement (2) of the theorem and let E be the equivalence


relation considered. Let RL be defined as in the statement of the
theorem. We see that xEy ⇒ xRL y.

If xEy, then xzEyz for each z ∊ Σ*. xz and yz are in the same


equivalence class of E. Hence, xz and yz are both in L or both not
in L as L is the union of some of the equivalence classes of E.
Hence xRL y.

Hence, any equivalence class of E is completely contained in an


equivalence class of RL. Therefore, E is a refinement of RL and so
the index of RL is less than or equal to the index of E and hence
finite.
(3) ⇒ (1)

First, we show RL is right invariant. xRLy if ∀z in Σ*, xz is

in L exactly when yz is in L; we can also write this in the following
way: xRLy if for all w, z in Σ*, xwz is in L exactly when ywz is in L.

If this holds, xwRLyw for all w.

Therefore, RL is right invariant.

Let [x] denote the equivalence class of RL to which x belongs.

Construct a DFSA ML = (K′, Σ, δ′, q′0, F′) as follows: K′ contains one

state corresponding to each equivalence class of RL. [ε]
corresponds to q′0. F′ consists of those states [x] with x ∊ L. δ′ is
defined as follows: δ′([x], a) = [xa]. This definition is consistent
as RL is right invariant: suppose x and y belong to the same
equivalence class of RL. Then, xa and ya will belong to the same
equivalence class of RL, so

δ′([x], a) = [xa] = [ya] = δ′([y], a).

If x ∊ L, [x] is a final state in ML, i.e., [x] ∊ F′. This automaton ML
accepts L.

Example 5.1. 

Consider the FSA M given in Figure 5.1.


Figure 5.1. FSA accepting b*a(a + b)*

The language accepted consists of strings of a’s and b’s having at


least one a. M divides {a, b}* into three equivalence classes.

1. H1, the set of strings which take M from q0 to q0, i.e., b*.
2. H2, the set of strings which take M from q0 to q1, i.e., the set of
strings which have an odd number of a's.
3. H3, the set of strings which take M from q0 to q2, i.e., the set of
strings which have an even number (at least two) of a's.

L = H2 ∪ H3, as can be seen.

1. Let x ∊ H1 and y ∊ H2. Then, xb ∊ H1 and yb ∊ H2.
Then, xb ∉ L and yb ∊ L. Therefore x is not related to y under RL.
2. Let x ∊ H1 and y ∊ H3. Then, xb ∊ H1 and so xb ∉ L; yb ∊ H3 and
so yb ∊ L. Therefore x is not related to y under RL.
3. Let x ∊ H2 and y ∊ H3. Take any string z: xz belongs to
either H2 or H3 and so is in L; yz belongs to either H2 or H3 and so
is in L. Therefore xRLy.

So, if we construct M′ as in the proof of the theorem, we have one
state corresponding to H1 and one state corresponding
to L = H2 ∪ H3.

Figure 5.2 is the automaton we get as M′. We see that, it


accepts L(M). Both M and M′ are DFSA accepting the same
language. But M′ has minimum number of states and is called the
minimum state automaton.

Figure 5.2. Minimum state automaton for the example in Figure 5.1

Theorem 5.2

The minimum state automaton accepting a regular set L is unique


up to an isomorphism and is given by M′ in the proof of Theorem 
5.1.

Proof. In the proof of Theorem 5.1, we started with M, found


equivalence classes for RM, RL, and constructed M′. The number of
states of M is equal to the index of RM and the number of states
of M′ is equal to the index of RL. Since RM is a refinement of RL,
the number of states of M′ is less than or equal to the number of
states of M. If M and M′ have the same number of states, then we
can find a mapping h: K → K′ (which identifies each state of K with
a state of K′) such that if h(q) = q′ then for a ∊ Σ,

h(δ(q, a)) = δ′(q′, a).


This is achieved by defining h as follows: h(q0) = q′0, and if q ∊ K,
then there exists a string x such that δ(q0, x) = q; set h(q) = q′
where δ′(q′0, x) = q′. This definition of h is consistent. This can be
seen as follows: Let δ(q, a) = p and δ′(q′, a) = p′. Then δ(q0, xa) = p and δ′
(q′0, xa) = p′ and hence h(p) = p′.

Minimization of DFSA
In this section, we shall see how we can find the minimum state
automaton corresponding to a DFSA.

Let M = (K, Σ, δ, q0, F) be a DFSA. Let R be an equivalence relation


on K such that pRq, if and only if for each input string x, δ(p, x)
∊ F if and only if δ(q, x) ∊ F. This essentially means that
if p and q are equivalent, then either δ(p, x) and δ(q, x) both are
in F or both are not in F for any string x. p is distinguishable
from q, if there exists a string x such that one of δ(q, x), δ(p, x) is
in F and the other is not. x is called the distinguishing string for the
pair < p, q >.

If p and q are equivalent δ(p, a) and δ(q, a) will be equivalent for


any a. If δ(p, a) = r and δ(q, a) = s and r and s are distinguishable
by x, then p and q are distinguishable by ax.

Algorithm to Find Minimum DFSA


We get a partition of the set of states of K as follows:

Step 1. Consider the set of states in K. Divide them into two
blocks F and K — F. (Any state in F is distinguishable from a state
in K — F by ε.) Repeat the following step till no more split is
possible.

Step 2. Consider the set of states in a block. Consider the a-


successors of them for a ∊ Σ. If they belong to different blocks, split
this block into two or more blocks depending on the a-successors
of the states.

For example, suppose a block has {q1, ..., qk} with δ(q1, a) = p1, δ(q2, a)

= p2, ..., δ(qk, a) = pk, where p1, ..., pi belong to one block, pi +
1, ..., pj belong to another block, and pj + 1, ..., pk belong to a third
block. Then split {q1, ..., qk} into {q1, ..., qi}, {qi+1, ..., qj}, {qj+1, ..., qk}.

Step 3. For each block Bi, consider a state bi.

Construct M′ = (K′, Σ, δ′, q′0, F′) where K′ = {bi|Bi is a block of the


partition obtained in Step 2}.

q′0 corresponds to the block containing q0. δ′(bi, a) = bj if there

exists qi ∊ Bi and qj ∊ Bj such that δ(qi, a) = qj. F′ consists of states
corresponding to the blocks containing states in F.
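The block-splitting loop of Steps 1 and 2 translates directly into code. A Python sketch (our own encoding; Hopcroft's asymptotically faster algorithm exists, but this follows the steps above literally):

# Partition refinement for DFSA minimization; delta maps (state, symbol) -> state.
def minimize_partition(states, sigma, delta, F):
    blocks = [set(F), set(states) - set(F)]         # Step 1
    changed = True
    while changed:                                   # Step 2
        changed = False
        for b in blocks:
            for a in sigma:
                groups = {}                          # split b by a-successor block
                for q in b:
                    i = next(k for k, c in enumerate(blocks) if delta[(q, a)] in c)
                    groups.setdefault(i, set()).add(q)
                if len(groups) > 1:
                    blocks.remove(b)
                    blocks.extend(groups.values())
                    changed = True
                    break
            if changed:
                break
    return blocks

# The DFSA of Example 5.2 below (substring bcc), states p0..p5, finals p3, p4, p5.
delta = {(0,"b"):1,(0,"c"):0,(1,"b"):1,(1,"c"):2,(2,"b"):1,(2,"c"):3,
         (3,"b"):4,(3,"c"):3,(4,"b"):4,(4,"c"):5,(5,"b"):4,(5,"c"):3}
print(minimize_partition(range(6), "bc", delta, {3, 4, 5}))
# [{3, 4, 5}, {0}, {1}, {2}] up to ordering, matching Example 5.2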

Example 5.2. 

Consider the following FSA M over Σ = {b, c} accepting strings


which have bcc as substrings. A non-deterministic automaton for
this will be (Figure 5.3),

Figure 5.3. NFSA accepting strings containing bcc as a substring

Converting to DFSA we get M′ as in Figure 5.4.

Figure 5.4. The DFSA obtained for the NFSA in Figure 5.3 by subset


construction

where p0 = [q0], p1 = [q0, q1], p2 = [q0, q2], p3 = [q0, q3], p4 =

[q0, q1, q3], p5 = [q0, q2, q3]. We now find the minimum state automaton
for M′.

Step 1. Divide the set of states into two blocks: {p0, p1, p2} and {p3, p4, p5}.

In {p0, p1, p2}, the b successors are all in one block; the c successors

of p0, p1 are in one block, while the c successor of p2 is in another block.
Therefore, {p0, p1, p2} is split as {p0, p1} and {p2}. The b and c successors
of p3, p4, p5 are in the same block. Now, the partition is {p0, p1},
{p2}, {p3, p4, p5}. Consider {p0, p1}: the b successors of p0, p1 are in the same
block, but the c successors of p0 and p1 are p0 and p2, and they are
in different blocks. Therefore, {p0, p1} is split into {p0} and {p1}. Now the
partition is {p0}, {p1}, {p2}, and {p3, p4, p5}. No further split is possible. The
minimum state automaton is given in Figure 5.5.

Figure 5.5. The minimum state DFSA for the DFSA in Figure 5.4

The minimization procedure cannot be applied to NFSA. For


example, consider the NFSA (Figure 5.6),

Figure 5.6. NFSA accepting (0 + 1)*0

The language accepted is represented by the regular expression (0


+ 1)*0, for which the NFSA in Figure 5.7 will suffice. But, if we try to
use the minimization procedure, the states will be initially split into
the non-final and final blocks. q0 and q2 are not equivalent, as δ(q0, 0) contains a final
state while δ(q2, 0) does not. So, they have to be split, and the FSA
in the first figure cannot be minimized using the minimization
procedure.
Figure 5.7. Another NFSA accepting (0 + 1)*0

Myhill-Nerode theorem can also be used to show that certain sets


are not regular.

Example 5.3. 

We know L = {anbn|n ≥ 1} is not regular. Suppose L is regular. Then


by Myhill-Nerode theorem, L is the union of some of the
equivalence classes of a right invariant relation ≈ of finite index over {a, b}*.
Consider a, a2, a3, .... Since the number of equivalence classes is
finite, for some m and n, m ≠ n, am and an must be in the same
equivalence class. We write this as am ≈ an. Since ≈ is right
invariant ambm ≈ anbm. i.e., ambm and anbm are in the same
equivalence class. L either contains one equivalence class
completely or does not contain that class. Hence, since ambm ∊ L,
L should contain this class completely and hence anbm ∊ L which is
a contradiction. Therefore L is not regular.

Finite Automata with Output


Earlier we considered finite state automata with output like the
serial adder. We now consider them formally. The output function
can be defined in two ways. If it depends on the state alone, the
machine is called a Moore machine. If it depends on both the states
and the input, it is called a Mealy machine.

Definition 5.1

Let M = (K, Σ, Δ, δ, λ, q0) be a Moore FSA. Here:

K is a finite set of states



Σ is a finite set of input alphabet


Δ is a finite set of output alphabet


δ, the transition function, is a mapping : K × Σ → K


λ, the output function, is a mapping : K → Δ

q0 in K is the initial state. Since, we are interested in the output for
an input, we do not specify any final state.

Given an input a1a2 ... an, the machine starts in state q0, outputs


b0, reads a1 and goes to q1 and outputs b1, reads a2 and outputs
b2 by going to q2... and so on.

Input: a1a2 ... an



States: q0q1q2 ... qn


Output: b0b1b2 ... bn

So the output is b0b1 ... bn. It should be noted that for an input of


length n, we get an output of length n + 1.

Example 5.4. 

Figure 5.8 represents a Moore machine with input alphabet {0, 1}


and output alphabet {y, n}. Taking a sample string:


Input: 100110110


States: q0q1q2q4q4q4q3q2q0q0


Output: ynnnnnnnyy

Figure 5.8. A Moore machine

Output is y when the binary string read so far represents a number


divisible by 5.

Definition 5.2

Let M = (K, Σ, Δ, δ, λ, q0) be a Mealy machine. Here K, Σ,


Δ, δ, q0 are as defined for a Moore machine. But λ, the output
function, is a mapping K × Σ → Δ.

For an input:

Input: a1a2 ... an


States: q0q1q2 ... qn



Output: b1 ... bn

The output is of length n, if the input is of length n.

Example 5.5. 

Consider the following Mealy machine.

The machine outputs y if the last two input symbols read are 01, and n
otherwise (Figure 5.9). For a sample input 001101001, the states
are q0q1q1q2q0q1q2q1q1q2 and the output is nnynnynny.

Figure 5.9. A Mealy machine

The Serial adder is an example of a Mealy machine. Since for an


input of length n the output is of length n + 1 in the case of a Moore
machine and of length n in the case of a Mealy machine, we
cannot really speak of equivalence. But if we ignore the output at
time 0 (the initial output) of the Moore machine, we can talk of
equivalence.

Theorem 5.3

Let M be a Moore machine. Then we can construct a Mealy


machine M′ such that for an input a1a2 ... an if M outputs b0b1... bn,
M′ outputs b1 ... bn.

Proof. Let M = (K, Σ, Δ, δ, λ, q0). Then M′ = (K, Σ, Δ, δ, λ′, q0)


where λ′(q, a) = λ(δ(q, a)). It can be easily seen that M and M′ are
equivalent.
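In code (our encoding), the conversion is one dictionary comprehension over the transition table:

# Moore -> Mealy: lambda'(q, a) = lambda(delta(q, a)).
def moore_to_mealy(delta, lam):
    """delta: (state, symbol) -> state; lam: state -> output symbol."""
    return {(q, a): lam[delta[(q, a)]] for (q, a) in delta}

# Toy Moore machine: outputs y iff the number of 1's read so far is even.
delta = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}
lam = {0: "y", 1: "n"}
lam_mealy = moore_to_mealy(delta, lam)
print(lam_mealy[(0, "1")])   # n: reading 1 from state 0 lands in state 1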
Example 5.6. 

Converting the Moore machine in Figure 5.8 into Mealy machine


we get, the machine in Figure 5.10.

Figure 5.10. Mealy machine equivalent to the Moore machine in Figure


5.8

Theorem 5.4

Let M′ be a Mealy machine. Then, we can construct an equivalent

Moore machine M in the sense that if a1a2 ... an is the input for
which M′ outputs b1b2 ... bn, then M for the same input outputs
b0b1 ... bn, where b0 is chosen arbitrarily from Δ.

Proof. Let M′ = (K′, Σ, Δ, δ′, λ′, q′0) be a Mealy machine. The Moore

machine equivalent to M′ can be constructed as follows: M = (K, Σ,
Δ, δ, λ, q0) where K = K′ × Δ. Each state in K is an ordered pair [p,
b], p ∊ K′, b ∊ Δ. δ is defined as follows: δ([q, b], a) = [δ′(q, a), λ′(q,
a)] and λ([q, b]) = b. q0 is taken as any one [q′0, b0], b0 ∊ Δ.

It can be seen that if on input a1a2 ... an the state sequence of M′

is q′0q1 ... qn and the output sequence is b1b2 ... bn, then in M the
machine goes through the sequence of states [q′0, b0][q1, b1]
[q2, b2] ... [qn, bn] and emits b0b1b2 ... bn. (The first output b0 should be
ignored.)
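A sketch (ours) of the K′ × Δ construction, followed by a run showing the Moore outputs after the ignored b0:

# Mealy -> Moore: states become (state, last output) pairs.
def mealy_to_moore(delta, lam, sigma, states, outputs, q0, b0):
    K = [(q, b) for q in states for b in outputs]
    dM = {((q, b), a): (delta[(q, a)], lam[(q, a)]) for (q, b) in K for a in sigma}
    lamM = {(q, b): b for (q, b) in K}     # output depends on the state alone
    return dM, lamM, (q0, b0)              # b0 is arbitrary and ignored

# Toy Mealy machine: outputs y exactly when the symbol read is 1.
delta = {(0, "0"): 0, (0, "1"): 0}
lam = {(0, "0"): "n", (0, "1"): "y"}
dM, lamM, q = mealy_to_moore(delta, lam, "01", [0], ["n", "y"], 0, "n")
out = []
for a in "101":
    q = dM[(q, a)]
    out.append(lamM[q])
print("".join(out))   # yny, matching the Mealy machine's output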
Example 5.7. 

Converting the example in Figure 5.9 to Moore machine, we get the


machine in Figure 5.11.

Figure 5.11. Moore machine equivalent to the Mealy machine in Figure


5.9

The output for each state is shown within a square.

Problems and Solutions


1. Show that {aibj|i, j ≥ 1, i ≠ j} is not regular.

Solution. As in Example 5.3:

am ≈ an, m ≠ n

ambn ≈ anbn

ambn ∊ L but anbn ∉ L.

Therefore, we have a contradiction and the language is not regular.

2. Which of the following languages are regular sets? Prove your answer.

1. L1 = {xxR|x ∊ {0, 1}+}
2. L2 = {xwxR|x, w ∊ {0, 1}+}
3. L3 = {xxRw|x, w ∊ {0, 1}+}

Solution.
1. L1 is not regular.
Suppose L1 is regular.
Let n be the constant of the pumping lemma. Consider 0n110n. The pump will occur in the
first n 0's. So, by the lemma we get strings 0m110n, m ≠ n, in L1, which is a contradiction.

2. L2 is regular.
L2 can be represented by the regular expression
0(0 + 1)+0 + 1(0 + 1)+1.

3. L3 is not regular. Suppose L3 is regular. Note that (01)n(10)n0 ∊ L3 with x = (01)n, w = 0.
By Myhill-Nerode theorem, since the number of equivalence classes is finite,
(01), (01)2, (01)3, ... all cannot belong to different equivalence classes. For some m and
n, (01)m and (01)n will belong to the same equivalence class. We write this as (01)m ≈ (01)n.
Let m < n. Because of the right invariance,

(01)m(10)m0 ≈ (01)n(10)m0.

Since (01)m(10)m0 ∊ L3, we conclude (01)n(10)m0 ∊ L3. But (01)n(10)m0 is not of the form xxRw.
Hence, we arrive at a contradiction. Therefore, L3 is not regular.

3. Construct a Mealy machine with Σ = Δ = {0, 1}. The output is 1 whenever the last four symbols
read are 1111. Overlapping sequences are accepted. Output is 0 otherwise.
Solution
.

Figure 5.12. Solution to Problem 3

4. Use Myhill-Nerode theorem to show that the following language is not regular: {0i1j|gcd(i, j) = 1}.

Solution. L = {0i1j|gcd(i, j) = 1} is not regular. Suppose L is regular. Consider the set of primes
{p1, p2, ...}. This is an infinite set. Consider the set of strings 0p1, 0p2, 0p3, .... By Myhill-Nerode
theorem, all of them cannot be in different equivalence classes. For some pi and pj, pi ≠ pj, 0pi and 0pj must
be in the same equivalence class:

0pi ≈ 0pj

0pi1pj ≈ 0pj1pj

0pi1pj ∊ L whereas 0pj1pj ∉ L.

Hence, we have a contradiction. L is not regular.

5. Minimize the following DFSA (final states 3 and 4). Indicate clearly which equivalence class corresponds to each state
of the new automaton.

    a b
→ 1 6 3
  2 5 6
  3 4 5
  4 3 2
  5 2 1
  6 1 4

Solution. Splitting into non-final and final states: {1, 2, 5, 6}, {3, 4}.
Considering b successors of 1, 2, 5, 6, the block {1, 2, 5, 6} is split into {1, 6}, {2, 5}. Further split is not possible. The
minimum state automaton is:

    a b
→ A A B   A corresponds to {1, 6}.
  B B C   B corresponds to {3, 4}.
  C C A   C corresponds to {2, 5}.

Exercises
1. Construct a Moore machine with input alphabet Σ = {0, 1}. The output after reading a string x is the
remainder when the number whose binary representation is x is divided by 3. Hence,
Δ = {0, 1, 2}. Leading 0's in the binary representation of x are allowed.

2. Construct a Mealy machine with Σ = Δ = {0, 1}. The output is 1 whenever the last five symbols read
contain exactly three 1's and the 4th and 5th last symbols read are 11. After each such substring
with two 1's, analysis of the next string will not start until the end of this substring; only then,
at the end, is 0 or 1 output. For example, for input 11110100, output is 00000000. For input ...,
output is 00001000.

3. Find a Mealy machine with Σ = Δ = {0, 1} satisfying the following condition: the output is 1 if
the input at time t is the same as the input at time t − 2, and 0 otherwise.

4. Find a Mealy machine with Σ = Δ = {0, 1} satisfying the following condition: For every input
subsequence x3i+1x3i+2x3i+3, the output is x3i+3 if this substring consists of 2 or 3 1's.

5. Find a Mealy machine with Σ = {a, b, c, d, e} and Δ = {0, 1} satisfying the following condition: the
output is 1 if the last three symbols read are in alphabetical order, i.e., abc, bcd or cde, and 0 otherwise.

6. Use Myhill-Nerode theorem to show the following languages are not regular.

1. {anbcn|n ≥ 1}
2. {anbncn|n ≥ 1}
3. {anbn2|n ≥ 1}
4. {ww|w ∊ {0, 1}+}
5. {anbm|n ≤ m, n, m ≥ 1}

7. Minimize the following DFSA. Indicate clearly which equivalence class corresponds to each state of the
new automaton.

    a b
→ 1 2 3
  2 5 6
  3 1 4
  4 6 3
  5 2 1
  6 5 4

8. Given the following DFSAs, construct minimum state DFSAs equivalent to them (Figure 5.13).
Figure 5.13. State diagrams of DFSA in Exercise 8


9. Given two DFSAs, M1 and M2 (Figure 5.14), prove whether or not they are equivalent.

Figure 5.14. State diagrams of DFSA in Exercise 9

10. Using pumping lemma or Myhill-Nerode theorem, show that the following languages are not regular:

1. L1 = {www|w ∊ {a, b}*}
2. L2 = {a2n|n ≥ 0}
3. L3 = {w|w has equal number of 0's and 1's}
4. L4 = {x ∊ {a, b, c}*|length of x is a square}.

11. Given a language L accepted by a DFSA, M1 the minimal DFSA accepting L, and another DFSA M2 for
which L(M2) = L, prove that the number of non-final states in the minimal machine M1 is less than
or equal to the number of non-final states in M2.

Chapter 6. Variants of Finite
Automata
We have seen that DFSA, NFSA, and NFSA with ε-moves have the
same power. They accept the family of regular sets. In this chapter,
we consider two generalized versions of FSA. While one of them
accepts only regular sets, the other accepts sets, which are not
regular. We also have a discussion on probabilistic finite automata
and weighted finite automata. These two models have applications
in image analysis.

Two-Way Finite Automata


A two-way deterministic finite automaton (2DFSA) is a
quintuple M = (K, Σ, δ, q0, F) where K, Σ, q0, F are as in DFSA
and δ is a mapping from K × Σ into K × {L, R}.

The input tape head can move in both directions. The machine
starts on the leftmost symbol of the input in the initial state. At any
time, depending on the state and the symbol read, the automaton
changes its state, and moves its tape head left or right as specified
by the move. If the automaton moves off at the right end of the
input tape in a final state, the input is accepted. The input can be
rejected in three ways:

1. moving off the right end of the input tape in a non-final state,
2. moving off the left end of the tape, and
3. getting into a loop.
An instantaneous description (ID) of the automaton is a string in
Σ*KΣ*. If wqx is an ID, the input is wx and currently the
automaton is reading the first symbol of x in state q. If a1 ... ai −
1qaiai + 1 ... an is an ID and δ(q, ai) = (p, R), then the next ID is:

a1 ... ai−1 ai pai+1 ... an.

If δ(q, ai) = (p, L), then the next ID is:


a1 ... ai−2 pai−1aiai + 1 ... an.

If, from IDi the automaton goes to IDi + 1 in one move, this is
represented as IDi ⊢ IDi + 1. ⊢* is the reflexive, transitive closure of
⊢.

Definition 6.1

The language accepted by a 2DFSA

M = (K, Σ, δ, q0, F) is defined as:

T(M) = {w|w ∊ Σ*, q0w ⊢* wqf for some qf ∊ F}

Example 6.1. 

Consider the following 2DFSA:

M = ({q0, q1, q2, q3}, {a, b}, δ, q0, {q2}) where δ contains the

transitions δ(q0, a) = (q1, R), δ(q1, b) = (q3, R), δ(q3, b) = (q2, L),
and δ(q2, b) = (q2, R), all other transitions being undefined.

The string abb is accepted as follows; the sequence of IDs is:

q0abb ⊢ aq1bb ⊢ abq3b ⊢ aq2bb ⊢ abq2b ⊢ abbq2

It can be seen that any string abn, n ≥ 2, will be accepted.

ab will not be accepted.

Any string beginning with aa will not be accepted.

A string of the form abna ... cannot be accepted. Hence:

T(M) = {abn|n ≥ 2}
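A small simulator (our own sketch, using the transition entries listed above) reproduces the ID sequence and the acceptance behavior:

# Simulate a 2DFSA; accept iff it moves off the right end in a final state.
def run_2dfsa(delta, q0, finals, w, max_steps=1000):
    q, i = q0, 0
    for _ in range(max_steps):     # step bound guards against looping rejection
        if i < 0:
            return False            # fell off the left end
        if i >= len(w):
            return q in finals      # fell off the right end
        move = delta.get((q, w[i]))
        if move is None:
            return False            # undefined transition
        q, d = move
        i += 1 if d == "R" else -1
    return False

delta = {("q0", "a"): ("q1", "R"), ("q1", "b"): ("q3", "R"),
         ("q3", "b"): ("q2", "L"), ("q2", "b"): ("q2", "R")}
print([run_2dfsa(delta, "q0", {"q2"}, w) for w in ["ab", "abb", "abbb", "aab"]])
# [False, True, True, False]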

We have represented in the previous example a useful picture of


the behavior of a 2DFSA. This consists of the input, the path
followed by the head, and the state; each time the boundary
between two tape squares is crossed, with the assumption that the
control enters its new state prior to moving the head. The sequence
of states below each boundary between tape cells is termed a
crossing sequence. It should be noted that if a 2DFSA accepts its
input, elements of crossing sequence may not have a repeated state
with the head moving in the same direction, otherwise, the
automaton being deterministic would be in a loop and thus, could
never move off the right end of the tape. Another point about
crossing sequences is that the first time a boundary is crossed, the
head must be moving right. Subsequent crossings must be in
alternate directions. Thus, odd-numbered elements of a crossing
sequence represent right moves and even-numbered elements
represent left moves. If the input is accepted, it follows that, all
crossing sequences are of odd length.

A crossing sequence q1, q2, ..., qk is said to be valid if it is of odd length and

no two odd-numbered elements and no two even-numbered elements
are identical. If the 2DFSA has n states, the maximum length of a valid
crossing sequence is therefore 2n − 1. We show that any set
accepted by a 2DFSA is regular by constructing a NFSA whose
states correspond to valid crossing sequences of the 2DFSA.

To proceed further we define the following. Suppose we are given


an isolated tape cell holding a symbol a and are also given valid
crossing sequences

q1, q2, ..., qk and p1, p2, ..., pl at the left and right boundaries of the square.

We can test whether the two sequences are locally compatible as

follows: If the tape head moves left from the square holding a in
state qi, restart the automaton on the square in state qi+1. Similarly,
whenever the tape head moves right from the square in
state pj, restart the automaton on the cell in state pj+1. This way, we
can test the two crossing sequences for local consistency.

Definition 6.2

We define right matching and left matching pairs of crossing


sequences recursively using (i) – (v) below.
If q1, ..., qk and p1, ..., pl are consistent crossing sequences on the left and right

boundary of a cell containing a, then q1, ..., qk right matches p1, ..., pl if initially the

cell containing a is reached moving right. If it reaches the cell

initially moving left, then q1, ..., qk left matches p1, ..., pl. The five conditions are:

1. the null sequence left and right matches the null sequence.

2. If q3, ..., qk right matches p1, ..., pl and δ(q1, a) = (q2, L), then q1, q2, ..., qk right matches p1, ..., pl.

3. If q2, ..., qk left matches p2, ..., pl and δ(q1, a) = (p1, R), then q1, ..., qk right matches p1, ..., pl.

4. If q1, ..., qk left matches p3, ..., pl and δ(p1, a) = (p2, R), then q1, ..., qk left matches p1, p2, ..., pl.

5. If q2, ..., qk right matches p2, ..., pl and δ(p1, a) = (q1, L), then q1, ..., qk left matches p1, ..., pl.

We find that 2DFSA accept only regular sets.

Theorem 6.1

If L is accepted by a 2DFSA, then L is a regular set.

Proof 1. Let M = (K, Σ, δ, q0, F) be a 2DFSA. We construct an


NFSA M′ which accepts T(M).

M′ = (K′, Σ, δ′, q0′, F′)

where:

1. K′ contains states which correspond to valid crossing sequences of M.

2. q0′ = [q0], the crossing sequence of length one containing q0.

3. F′ is the set of all crossing sequences of length one containing a
state in F, i.e., of the form [qf], qf ∊ F.

4. δ′(sc, a) = {sd|sc, sd are states corresponding to valid crossing
sequences c and d, respectively, and c right matches d on a}.

We now show T(M) = T(M′).

1. T(M) ⊆ T(M′).

Let w be in T(M). Consider the crossing sequences generated by
an accepting computation of M on w. Each crossing sequence right
matches the one at the next boundary, so M′ can guess the proper
crossing sequences (among other guesses) and accept.

2. T(M′) ⊆ T(M).

Let w be in T(M′). M′ has a sequence of
states s0, s1, ..., sn accepting w. Let c0, c1, ..., cn be the crossing
sequences of M corresponding to s0, ..., sn, respectively,
where w = a1...an.

δ′(si−1, ai) contains si, where each ci−1 right matches ci on ai.

We can construct an accepting computation of M on input w. We

prove by induction on i that M′ on reading a1...ai can enter

state si corresponding to crossing sequence ci = [q1, q2, ..., qk] only if:

1. M started in state q0 on a1 ... ai will first move right from position i in
state q1, and

2. for j = 2, 4, ..., if M is started at position i in state qj, M will eventually
move right from position i in state qj+1. This means k is odd.

Basis
(i = 0). As s0 = [q0], (1) is satisfied since M begins its computation
by moving right from position 0 in state q0. Since M never moves
off at the left end boundary, j cannot be 2 or more. Condition (2) is
vacuously satisfied.
Induction
Assume that the hypothesis is true for i − 1. M′ is in

state si−1 corresponding to ci−1 = [q1, ..., qk] after reading a1 ... ai−1, and in

state si corresponding to ci = [p1, ..., pl] after reading a1 ... ai; k and l are

odd. ci−1 right matches ci on ai. It follows that there must be some

odd j such that in state qj on input ai, M moves right. Let j1 be the
smallest such j. By the definition of “right matches,” it follows
that δ(qj1, ai) = (p1, R). This proves condition (1). Also by the

definition of “right matches” (rule (iii)), [qj1+1, ..., qk] left matches [p2, ..., pl].

If δ(pj, ai) = (pj+1, R) for all even j, then condition (2) is satisfied. In
the case that for some smallest even j2, δ(pj2, ai) = (q, L), then, by

the definition of “left matches” (rule (v)), q must be qj1+1 and [qj1+2, ..., qk]

right matches [pj2+1, ..., pl] on ai. Now, the same argument can be

repeated for the remaining portions of the two sequences till condition (2) is

satisfied (which will eventually happen).

Now the induction hypothesis is established for all i. Noting the


fact that sn corresponding to crossing sequence cn is of the form
[qf] for some qf ∊ F, we realize that M accepts a1... an.

Example 6.2. 
Let us consider Example 6.1, where [q0], [q1], [q2], [q3] are
some of the valid crossing sequences, among others. Let the
corresponding NFSA be M′ = (K′, {a, b}, δ′, [q0], {[q2]}), where K′
corresponds to all valid crossing sequences. The empty sequence right
matches and left matches the empty sequence.

[q0] right matches [q1] on a

[q1] right matches [q3] on b

[q2] right matches [q2] on b, from the table. Since δ(q2, b) = (q2, R),

the empty sequence left matches [q2, q2] on b (rule (iv)). Since δ(q1, b) =

(q3, R), [q1] right matches [q3, q2, q2] on b (rule (iii)).

Since δ(q3, b) = (q2, L), [q3, q2, q2] right matches [q2] on b (rule (ii)).

Though we can consider sequences of length up to seven, in this

example these sequences are enough. We get a NFSA as follows:

That 2DFSA accept only regular sets can be proved in another
way, using Myhill-Nerode theorem, without actually constructing
the crossing sequences or the NFSA.

Proof 2. Let M = (K, Σ, δ, q0, F) be a 2DFSA. To show T(M) is


regular, consider the relation RT(M) defined in (iii) of Myhill-
Nerode theorem. For two strings x1 and x2, x1 RT(M)x2 if x1x is
in T(M) exactly when x2x is in T(M) for any x ∊ Σ*. It is clearly seen
that RT(M) is right invariant. If we prove RT(M) is of finite index,
then by the third part of the proof of Myhill-Nerode theorem, it will
follow that T(M) is regular.

Let K = {q0,..., qn−1} and p be a symbol not in K. For each

string w = a1...ak, define a function τw: K ∪ {p} → K ∪ {p} as
follows: For each state q in K, let τw(q) = q′ if M, when started on
the rightmost symbol of w in state q, ultimately moves off the tape
at the right end, going into the state q′. Let τw(q) be p otherwise
(i.e., M started on w does not move off the right end of the tape).
Let τw(p) = q′ if M, started on the leftmost symbol of w in state q0,
ultimately moves off the right end of the tape in state q′.
Otherwise τw(p) = p (i.e., M started on w at the leftmost symbol in
state q0 does not move off the right end of the tape). Now, we can
verify that x1RT(M)x2 if τx1 = τx2. Consider the strings x1x and x2x.

Suppose M is started on x1x and moves into the first symbol of x in


state r. Then M started on x2x will move into the first symbol
of x in state r. This is because τx1 = τx2. The behavior of M on
the x portion is identical for both strings x1x and x2x.
When M crosses the boundary between x1 and x from right to left
in state s, say, it will do so for x2x also. Now,
since τx1 = τx2, M crosses the boundary from left
to right in the same state in both cases.

Thus, the crossing sequence at the boundary between x1 and x is


the same as the crossing sequence at the boundary
between x2 and x. Thus, if x1x is accepted, so will be x2x and vice
versa. The number of equivalence classes generated by RT(M) is at
most the number of different functions τw. Since there are at most
(n + 1)n+1 different functions possible for τw (as τw: K ∪ {p} → K ∪
{p}), the number of such functions, and hence the number of
equivalence classes of RT(M), is finite. Hence, T(M) is regular.

This proves T(M) is regular without actually showing the NFSA


accepting it.
Multihead Finite State Automata
One-way multihead finite automata have been studied in some
detail in the literature (Rosenberg, 1996). They have more
accepting power than FSA.

Definition 6.3

A one-way multihead non-deterministic finite


automaton (k − NFSA) is a device M = (k, K, Σ, δ, q0, $, F), where
k ≥ 1 is the number of input heads, K is the finite set of states, Σ is
the finite set of symbols called the alphabet, q0 ∊ K is the initial
state, $ (not in Σ) is the right end-marker for the inputs, F ⊆ K is
the set of final states, and δ is a mapping from K × (Σ ∪ {$})k into
the set of subsets of K × {0, 1}k.

An input to M is a string a1a2 ... an of symbols in Σ delimited on the


right end by the special marker $. We can think of a1a2 ... an$ as
written on a tape (with each symbol occupying one tape square)
and M’s heads moving left to right on the tape. A move of M is
described as follows: Let M be in the state q with
heads H1, H2,..., Hk scanning symbols a1,..., ak (in Σ ∪ {$}),
respectively. Suppose δ(q, a1,..., ak) contains (p, d1,..., dk).
Then M may move each Hi, di squares to the right and enter
state p. An instantaneous description ID of M on input a1a2... an$
(ai ∊ Σ) is given by a (k + 2)-tuple (q, a1a2 ... an$, α1, α2, ..., αk),
where q is the state of M and αi is the distance (i.e., the number of
squares) of head Hi from a1: (Thus, 1 ≤ αi ≤ n+1, where αi = 1, n +1
correspond to Hi being on a1, $, respectively.) We define the
relation ⊢ between ID’s as follows: Write
(q, a1a2 ... an$, α1, α2, ..., αk)⊢ (q′, a1a2 ... an$, α′1, α′2, ..., α′k) if
from the first ID, M can enter the second ID by a single move. The
reflexive-transitive closure of ⊢ will be denoted by ⊢*.

A string a1a2 ... an is accepted (or recognized) by M if:


(q0, a1a2 ... an$, 1, ..., 1) ⊢* (q, a1a2 ... an$, n+1, ..., n +1)

for some accepting state q in F. Without loss of generality, we shall


assume that δ(q, $, ..., $) = φ, i.e., accepting states are halting
states.

We say a k − NFSA is a simple one-way multihead finite automaton,

denoted by k − SNFSA, if at each transition only one head reads a
symbol and moves rightwards, i.e., δ is a mapping
from K × (Σ ∪ {$})k into the set of all subsets of K × {0}m × {1} ×
{0}n where m + n + 1 = k, 0 ≤ m, n ≤ k − 1. We can easily show the
equivalence of k − NFSA and k − SNFSA.

Example 6.3. 

Consider the following 2-head FSA accepting L = {anbn|n ≥ 1}.


(Hence, we find non-regular sets are accepted.)

M = (2, K, Σ, δ, q0, $, F)

where K = {q0, q1, q2}, Σ = {a, b}, F = {q2}.

δ(q0, a, a) = {(q0, 1, 0)}

δ(q0, b, a) = {(q1, 1, 1)}

δ(q1, b, a) = {(q1, 1, 1)}

δ(q1, $, b) = {(q2, 0, 1)}

δ(q2, $, b) = {(q2, 0, 1)}

The sequence of IDs leading to the acceptance of aaabbb is given


below:

(q0, aaabbb$, 1, 1)
⊢(q0, aaabbb$, 2, 1)

⊢(q0, aaabbb$, 3, 1)

⊢(q0, aaabbb$, 4, 1)

⊢(q1, aaabbb$, 5, 2)

⊢(q1, aaabbb$, 6, 3)

⊢(q1, aaabbb$, 7, 4)

⊢(q2, aaabbb$, 7, 5)

⊢(q2, aaabbb$, 7, 6)

⊢(q2, aaabbb$, 7, 7)
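The run above can be checked mechanically. A sketch (ours) that simulates this particular machine, whose transition relation happens to be deterministic:

# Simulate the 2-head FSA of Example 6.3; "$" is the right end-marker.
delta = {("q0", "a", "a"): ("q0", 1, 0), ("q0", "b", "a"): ("q1", 1, 1),
         ("q1", "b", "a"): ("q1", 1, 1), ("q1", "$", "b"): ("q2", 0, 1),
         ("q2", "$", "b"): ("q2", 0, 1)}

def accepts(w, finals={"q2"}):
    tape = w + "$"
    q, h1, h2 = "q0", 0, 0
    while (q, tape[h1], tape[h2]) in delta:
        q, d1, d2 = delta[(q, tape[h1], tape[h2])]
        h1, h2 = h1 + d1, h2 + d2
    # accepted iff both heads rest on $ in a final state
    return q in finals and tape[h1] == tape[h2] == "$"

print(accepts("aaabbb"), accepts("aabbb"))   # True False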

It can be seen that {anbncn|n ≥ 1}, {wcw|w ∊ {a, b}*} can be


accepted by multihead FSA. But CFLs like the Dyck set cannot be
accepted by multihead FSA.

Probabilistic Finite Automata


Let s = (x1, ..., xn) be an n-dimensional row stochastic vector, n ≥ 1.
Then, s(i) = xi, 1 ≤ i ≤ n, 0 ≤ xi ≤ 1, and Σxi = 1.

Definition 6.4

A finite probabilistic automaton over a finite alphabet V is an


ordered triple PA = (S, s0, M), where S = {s1, s2, ..., sn} is a finite
set with n ≥ 1 elements (the set of internal states), s0 is an n-
dimensional stochastic row vector (the initial distribution) and M is
a mapping of V into the set of n-dimensional stochastic matrices.
For x ∊ V, the (i, j)th entry in the matrix M(x) is denoted by
pj(si, x) and referred to as the transition probability of PA entering
the state sj from the state si on reading the input x.
As an example, consider the automaton PA1 = ({s1, s2}, (1,
0), M) over the alphabet {x, y}, where:

M(x) = (0 1; 1 0),   M(y) = (1/2 1/2; 1/2 1/2).

The initial distribution indicates that s1 is the initial state. From the
matrices M, we see that the state s1 changes to s2 with a probability
of 1/2 on reading y. This can be indicated in a diagram, with the
states being the nodes and the arcs having labels in the form x(p),
where x is the symbol, while p is the probability of transition from
one node to the other on scanning the symbol x. (Figure 6.1)

Figure 6.1. State diagram of a probabilistic FSA

For a finite probabilistic automaton, we extend the domain

of M from V to V* as follows:


M(ε) = E, the n × n identity matrix

M(wx) = M(w)M(x)

Now, for a word w, the (i, j)th entry of M(w) denotes the probability
that the automaton moves to state sj if it were initially in
state si.

Definition 6.5

Let PA = ({s1, ..., sn}, s0, M) be a finite probabilistic automaton over


V and w ∊ V*. The stochastic row vector s0M(w) is termed the
distribution of states caused by the word w, and is denoted by
PA(w).

We notice that for a word w, the ith entry of PA(w) is the


probability that the automaton is in state si starting from the initial
distribution s0.

Definition 6.6

Let ξ be an n-dimensional column vector, each component of which
equals either 0 or 1, and let PA be as in the previous definition. Let η
be a real number such that 0 ≤ η ≤ 1. The language accepted in PA
by ξ with the cut-point η is defined by:

L(PA, ξ, η) = {w ∊ V* | PA(w)·ξ > η}

A language L is stochastic if L = L(PA, ξ, η) for some PA, ξ and η;
such a language is called η-stochastic for that η.

Clearly, there is a one-to-one correspondence between the
subsets S1 of the set {s1, ..., sn} and the column vectors ξ: to each
subset S1 there corresponds the column vector ξ whose ith
component equals 1 if and only if si ∊ S1. The language accepted
by the column vector ξ in PA with cut-point η consists of all
words w such that the sum of the components of the distribution PA(w)
corresponding to the states of S1 is greater than η.

For the example given earlier (Figure 6.1),

PA(x^n) = (1, 0)(M(x))^n = (1, 0) if n is even
        = (0, 1) if n is odd

PA(w) = (1/2, 1/2) if w contains at least one y

Thus, if we take ξ = (0, 1)^T, i.e., s2 is a final state and s1 is not,
then for any cut-point 0 ≤ η < 1/2 the language accepted
is Σ* − (x^2)*, and for any cut-point 1/2 ≤ η < 1 the language
accepted is x(x^2)*.
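
These computations are easy to reproduce numerically. The following Python sketch (ours) uses the matrices M(x) and M(y) reconstructed above and evaluates PA(w) and the cut-point test:

import numpy as np

M = {'x': np.array([[0.0, 1.0], [1.0, 0.0]]),
     'y': np.array([[0.5, 0.5], [0.5, 0.5]])}
s0 = np.array([1.0, 0.0])    # initial distribution: s1
xi = np.array([0.0, 1.0])    # column vector xi: s2 is final

def PA(w):
    """Distribution of states caused by the word w."""
    d = s0
    for c in w:
        d = d @ M[c]
    return d

def accepted(w, eta):
    return PA(w) @ xi > eta

print(PA('xx'))               # [1. 0.]: x^2 is never accepted
print(accepted('x', 0.25))    # True:  x(x^2)* has probability 1
print(accepted('xyx', 0.25))  # True:  a word with y has probability 1/2
print(accepted('xyx', 0.5))   # False: cut-point 1/2 keeps only x(x^2)*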

A DFSA can be considered as a special case of a finite probabilistic
automaton. If the state set is {q0, ..., qn−1} with q0 as the initial state,
the initial distribution s0 is (1, 0, ..., 0). For a DFSA M = (K,
Σ, δ, q0, F), a PA over the alphabet Σ can be constructed as
follows: PA = (K, s0, M), with Mx(i, j) = 1 if δ(qi, x) = qj and 0
otherwise, for x ∊ Σ.

Hence, the language accepted by the PA with cut-point η, for any
0 ≤ η < 1, is the same as the language accepted by the DFSA M.

Hence, we have the following theorem.

Theorem 6.2

Every regular language is stochastic. Further, every regular


language is η-stochastic for every 0 ≤ η < 1.

Consider a PA over {a, b} with K = {q1, q2, q3}, initial
distribution π = (1, 0, 0) and final vector S = (1, 0, 0)^T.
The strings accepted with cut-point η = 0.25 are a^2 and b^{2n}a^2.

The converse of the above theorem is not true (Salomaa, 1969).

Theorem 6.3

Every 0-stochastic language is regular.

Proof. Let L = L(PA, ξ, 0), where PA = (S, s0, M) is a finite
probabilistic automaton over Σ and ξ is an n-dimensional column
vector whose components are 0's and 1's. We construct a set of
NFSAs M1, ..., Mk such that L(PA, ξ, 0) = T(M1) ∪ T(M2) ∪ ...
∪ T(Mk). Suppose S = {q1, ..., qn} and s0 = (a1, ..., an).
Let ai1, ..., aik be the non-zero components of s0.

For each aij, 1 ≤ j ≤ k, an NFSA Mj is constructed as follows:

Mj = (S, Σ, δ, qij, F)

where F corresponds to those components of ξ which are 1's, and
δ(qp, a) contains qr if the (p, r)th entry of M(a) is greater than 0.
Thus Mj is an NFSA with initial state qij and final states
corresponding to the components of ξ which are 1's. The δ mapping is
the same for all the Mj's (only the initial state differs): every
transition of the PA with non-zero probability is kept in the NFSA.

It is straightforward to see that L(PA, ξ, 0) = T(M1) ∪ ... ∪ T(Mk).
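
The construction is short enough to state in code. The sketch below (ours, an illustration of the proof idea) extracts the shared transition relation of the Mj's from the matrices of a PA, together with the final states and the list of initial states, one per non-zero component of s0:

import numpy as np

def zero_cutpoint_nfsas(s0, M, xi):
    """The common delta and finals of M1, ..., Mk, plus their initial
    states (Theorem 6.3): keep every transition of non-zero probability."""
    n = len(s0)
    delta = {(i, a): {j for j in range(n) if Ma[i, j] > 0}
             for a, Ma in M.items() for i in range(n)}
    finals = {i for i in range(n) if xi[i] == 1}
    initials = [i for i in range(n) if s0[i] > 0]
    return delta, finals, initials

# Tiny example: two states and one input symbol.
s0 = np.array([0.5, 0.5])
M = {'a': np.array([[1.0, 0.0], [0.5, 0.5]])}
xi = np.array([0, 1])
print(zero_cutpoint_nfsas(s0, M, xi))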

Similar to regular grammars, regular grammars with probabilities
can be defined; they are the grammar counterparts of stochastic finite
automata.
Stochastic finite automata have applications in syntactic pattern
recognition. Consider a probabilistic automaton accepting the
quadrilaterals given below (Figure 6.2).

Figure 6.2. Squares and rhombuses generated by PA

Figure (a) is a square while (b), (c), and (d) are rhombuses. The
square is accepted with higher probability than the rhombuses.

Weighted Finite Automata and


Digital Images
In this section we consider a variant of finite automata called the
weighted finite automaton (WFA). We give some basic definitions and
notation for WFA and the representation of digital images using WFA.

A digital image of finite resolution m × n consists of m × n pixels


each of which is assigned a value corresponding to its color or
grayness. In this section, we consider only square images of
resolution 2n × 2n.

The 2^n × 2^n pixels can be considered to form a bounded square in a
two-dimensional space with orthogonal x and y axes. Thus, the
location of each of the 2^n × 2^n pixels can be specified by a tuple
(x, y) representing its x and y co-ordinates. Hereafter we will call
this tuple the address of the pixel. The address tuple (x, y) is
such that x ∊ [0, 2^n − 1] and y ∊ [0, 2^n − 1]. Hence, we can specify
the x (respectively y) co-ordinate as an n-bit binary number.

In our representation, the address of any pixel at (x, y) is specified


as a string w ∊ Σ^n, where Σ = {0, 1, 2, 3}. If the n-bit representation
of the x co-ordinate is xn−1xn−2 ... x1x0 and the n-bit representation
of the y co-ordinate is yn−1yn−2 ... y1y0, then the address
string w = an−1an−2 ... a1a0 is such that ai ∊ Σ and ai = 2xi + yi,
∀i ∊ [0, n − 1]. The addresses of the pixels of a 4 × 4 image are as shown
in Figure 6.3.

Figure 6.3. Addressing the subsquares

Another way of getting the same address is to consider a unit


square whose subsquares are addressed using 0, 1, 2, 3 as shown
in Figure 6.3 and the address of the subsubsquare as a
concatenation of the address of the subsquare and the
subsubsquare. For example, the address of the darkened square
in Figure 6.3 would be 3021. Its co-ordinates are (10, 9) equal to
(1010, 1001) in binary. Putting ai = 2xi + yi we get 3021.
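
The address computation is plain bit-interleaving. A small Python sketch (ours) converts co-ordinates to address strings and back, reproducing the address 3021 of the darkened square:

def address(x, y, n):
    """Address of pixel (x, y) in a 2^n x 2^n image: digit a_i = 2*x_i + y_i,
    written from a_{n-1} down to a_0."""
    return ''.join(str(2 * ((x >> i) & 1) + ((y >> i) & 1))
                   for i in range(n - 1, -1, -1))

def coords(w):
    """Inverse: recover (x, y) from an address string over {0, 1, 2, 3}."""
    x = y = 0
    for c in w:
        d = int(c)
        x, y = (x << 1) | (d >> 1), (y << 1) | (d & 1)
    return x, y

print(address(10, 9, 4))   # '3021'
print(coords('3021'))      # (10, 9)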
Finite Automata and Black-White Images
We give here the representation of black-white images using finite
automata (Culik and Kari, 1997).

In order to represent a black-white square image of resolution
2^n × 2^n using a finite state automaton, we specify a Boolean
function f: Σ^n → {0, 1}, Σ = {0, 1, 2, 3}, such that f(w) = 1 if the
pixel addressed by w is black and f(w) = 0 if the pixel addressed
by w is white.

Definition 6.7

The FSA representing a 2^n × 2^n resolution black-white image is a
non-deterministic FSA, A = (K, Σ, δ, I, F), where

K is a finite set of states.

Σ is the alphabet {0, 1, 2, 3}.

δ is a transition function as defined for a non-deterministic FSA.

I ⊆ K is a set of initial states. This can be equivalently represented
as a single initial state q0 with ε-transitions to all states in I.

F ⊆ K is a set of final states.


The language recognized by A, L(A) = {w|w ∊ Σn, f(w) = 1} i.e., the
language recognized by the FSA consists of the addresses of the
black pixels.

For example, consider the image shown in Figure 6.4. The


addresses of the black squares form a language L = {0, 3}*1{0, 1, 2,
3}*. Thus, the FSA representing this image is an FSA, which
accepts the language L as shown in Figure 6.4.

Figure 6.4. Finite state automaton for a triangle

Definition 6.8

Another way to represent the same non-deterministic FSA of m
states is as:

1. a row vector IA ∊ {0, 1}^{1×m}, called the initial distribution
(IA(q) = 1 if q is an initial state, 0 otherwise);

2. a column vector FA ∊ {0, 1}^{m×1}, called the final distribution
(FA(q) = 1 if q is a final state, 0 otherwise);

3. a matrix Wa ∊ {0, 1}^{m×m} for each a ∊ Σ, called the transition
matrix (Wa(p, q) = 1 if q ∊ δ(p, a), 0 otherwise).

This FSA, A, defines the function f: Σ^n → {0, 1} by:

f(a1a2 ... an) = IA · Wa1 · Wa2 · ... · Wan · FA

where the operation '·' indicates binary (Boolean) matrix multiplication.
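
For instance, the triangle image of Figure 6.4 needs only two states: "no 1 read yet" and "a 1 has been read". The following Python sketch (our own encoding of L = {0, 3}*1{0, 1, 2, 3}*, assumed to match the figure) evaluates f through the matrix formula:

import numpy as np

I = np.array([1, 0])    # state 0 is initial
F = np.array([0, 1])    # state 1 is final
W = {'0': np.array([[1, 0], [0, 1]]),
     '1': np.array([[0, 1], [0, 1]]),
     '2': np.array([[0, 0], [0, 1]]),
     '3': np.array([[1, 0], [0, 1]])}

def f(w):
    """Boolean product I . W_a1 . ... . W_an . F: 1 iff pixel w is black."""
    v = I
    for c in w:
        v = (v @ W[c]) > 0          # binary (Boolean) multiplication
    return int(v @ F > 0)

print(f('301'))   # 1: the address contains a 1 after only 0s and 3s
print(f('330'))   # 0: no 1 is ever read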

We will describe an FSA using diagrams, where states are circles


and the transitions are arcs labeled with the alphabet. The initial
and final distributions are written inside the circles for each state
(see Figure 6.4).

Next we define a finite state transducer, FST, which represents a


transformation such as rotation, translation, etc., on a black-white
image. A finite state transducer represents a transformation from
an alphabet Σ1 to another alphabet Σ2.
Definition 6.9

An m-state finite state transducer (FST) from an alphabet Σ1 to an
alphabet Σ2 is specified by:

1. a row vector I ∊ {0, 1}^{1×m}, called the initial distribution;

2. a column vector F ∊ {0, 1}^{m×1}, called the final distribution; and

3. binary matrices Wa,b ∊ {0, 1}^{m×m} for all a ∊ Σ1 and b ∊ Σ2,
called the transition matrices.
Here, we consider only FSTs from Σ to Σ, Σ = {0, 1, 2, 3}. In order
to obtain a transformation of an image, we apply the
corresponding FST to the FSA representing the image to get a new
FSA. The application of an n-state FST to an m-state FSA A =
(IA, FA, W^A_a, a ∊ Σ) produces an nm-state FSA B = (IB, FB, W^B_b, b ∊
Σ) as follows:

IB = I ⊗ IA

FB = F ⊗ FA

W^B_b = Σ_{a∊Σ} W_{a,b} ⊗ W^A_a, ∀b ∊ Σ

where the operation ⊗ is the ordinary tensor product of matrices:
if T and Q are two matrices of sizes s × t and p × q, then T ⊗ Q is
the sp × tq block matrix whose (i, j)th block is T(i, j) · Q.
We represent the FST in a diagram similar to an FSA. The values in


the circles denote the initial and final distributions and the
transitions are labeled as ‘a/b’ as shown in Figure 6.5.

Figure 6.5. Example of a finite state transducer

Weighted Finite Automata and Gray-Scale


Images
In this subsection, we present the basic definitions related to the
WFA and give the representation of gray-scale images using the
WFA.

Definition 6.10

A weighted finite automaton M = (K, Σ, W, I, F) is specified by:

1. K, a finite set of states;

2. Σ, a finite alphabet;

3. W, the set of weight matrices Wα: K × K → ℜ for all α ∊ Σ ∪ {ε},
giving the weights of the edges labeled α;

4. I: K → (−∞, ∞), the initial distribution;

5. F: K → (−∞, ∞), the final distribution.
Here Wα is an n × n matrix, where n = |K|. I is considered to be
a 1 × n row vector and F is considered to be an n × 1 column
vector. When representing the WFAs as a figure, we follow a
format similar to FSAs. Each state is represented by a node in a
graph. The initial distribution and final distribution of each state is
written as a tuple inside the state. A transition labeled α is drawn
as a directed arc from state p to q if Wα (p, q) ≠ 0. The weight of
the edge is written in brackets on the directed arc. For an example
of WFA, see Figure 6.6. We use the notation Iq(Fq) to refer to the
initial(final) distribution of state q. Wα (p, q) refers to the weight of
the transition from p to q. Wα (p) refers to the pth row vector of the
weight matrix Wα. It gives the weights of all the transitions from
state p labeled α in a vector form. Also Wx refers to the
product Wα1 · Wα2 ... Wαk where x = α1α2 ... αk.
Figure 6.6. Example: WFA computing linear grayness function

Definition 6.11

A WFA is said to be deterministic if its underlying FSA is


deterministic.

Definition 6.12

A WFA is said to be ε-free if the weight matrix Wε = 0 where 0 is


the zero matrix of order n × n.

Hereafter, whenever we use the term WFA, we refer to an ε-free


WFA only unless otherwise specified.

A WFA M as in Definition 6.10 defines a function f: Σ* → ℜ, where


for all x ∊ Σ* and x = α1α2 ... αk,

f(x) = I· Wα1 · Wα2 · ... · Wαk · F

where the operation ‘·’ is matrix multiplication.

Definition 6.13

A path P of length k is defined as a


tuple (q0 q1 ... qk, α1 α2 ... αk) where qi ∊ K, 0 ≤ i ≤ k and αi ∊ Σ, 1
≤ i ≤ k such that αi denotes the label of the edge traversed while
moving from qi−1 to qi.

Definition 6.14

The weight of a path P is defined as:


W(P) = Iq0 · Wα1(q0, q1) · Wα2(q1, q2) · ... · Wαk(qk−1, qk) · Fqk

The function f: Σ* → ℜ represented by a WFA M can then be
equivalently defined as:

f(x) = Σ W(P), the sum being taken over all paths P labeled x.

Definition 6.15

A function f: Σ* → ℜ is said to be average preserving if

f(w) = (1/m) Σ_{a∊Σ} f(wa)

for all w ∊ Σ*, where m = |Σ|.

Definition 6.16

A WFA M is said to be average preserving if the function that it


represents is average preserving.

The general condition to check whether a WFA is average
preserving is as follows: a WFA M is average preserving if and
only if

(Σ_{α∊Σ} Wα) · F = m · F

where m = |Σ|.
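
In code, this check is a single line of linear algebra. A minimal Python sketch (ours), with the four weight matrices held in a list:

import numpy as np

def is_average_preserving(W_list, F, tol=1e-9):
    """A WFA is ap iff (W_0 + W_1 + W_2 + W_3) . F equals |Sigma| . F."""
    return np.allclose(sum(W_list) @ F, len(W_list) * F, atol=tol)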
We also consider the following definitions:

Definition 6.17

A WFA is said to be i-normal if the initial distribution of every state


is 0 or 1 i.e., Iqi = 0 or Iqi = 1 for all qi ∊ K.

Definition 6.18

A WFA is said to be f-normal if the final distribution of every state


is 0 or 1 i.e. Fqi = 0 or Fqi = 1 for all qi ∊ K.

Definition 6.19

A WFA is said to be I-normal if there is only one state with non-


zero initial distribution.

Definition 6.20

A WFA is said to be F-normal if there is only one state with non-


zero final distribution.

REPRESENTATION OF GRAY-SCALE IMAGES


A gray-scale digital image of finite resolution consists of 2^m ×
2^m pixels, where each pixel takes a real grayness value (in practice
an integer in the range 0, 1, ..., 255). By a multi-resolution image, we
mean a collection of compatible 2^n × 2^n resolution images for n ≥ 0.
As for black and white images, we assign a word x ∊ Σ^k, Σ = {0, 1, 2, 3},
to address each pixel; a word x of length less than k addresses a
sub-square, as in black and white images. Then, we can define a finite
resolution image as a function fI: Σ^k → ℜ, where fI(x) gives the value
of the pixel at address x. A multi-resolution image is a function
fI: Σ* → ℜ. For the resolutions to be compatible, the function fI must
be average preserving, i.e.,

fI(w) = (1/4)(fI(w0) + fI(w1) + fI(w2) + fI(w3)) for all w ∊ Σ*.
A WFA M is said to represent a multi-resolution image if the


function fM represented by M is the same as the function fI of the
image.

Example 6.4. 

Consider the 2-state WFA shown in Figure 6.6, with I = (1, 0) and the
final distribution F and weight matrices W0, W1, W2, W3 as given
there. Then we can calculate the values of pixels as follows: f(03) is
the sum of the weights of all paths labeled 03, and similarly
for f(123). The images computed by this WFA are shown for
resolutions 2 × 2, 4 × 4 and 128 × 128 in Figure 6.6.
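
Since the matrices of Figure 6.6 do not reproduce well here, the sketch below (ours) uses one standard choice of weights for the linear grayness function, an assumption on our part; the weights inferred in Example 6.5 below are an equivalent variant. State 0 carries the ramp and state 1 the constant function 1.

import numpy as np

I = np.array([1.0, 0.0])
F = np.array([0.5, 1.0])
W = {'0': np.array([[0.5, 0.00], [0.0, 1.0]]),
     '1': np.array([[0.5, 0.25], [0.0, 1.0]]),
     '2': np.array([[0.5, 0.25], [0.0, 1.0]]),
     '3': np.array([[0.5, 0.50], [0.0, 1.0]])}

def f(x):
    v = I
    for c in x:
        v = v @ W[c]
    return float(v @ F)

print(f(''))     # 0.5    : average grayness of the whole image
print(f('03'))   # 0.375  : grayness of the sub-square addressed 03
print(f('123'))  # 0.5625 : grayness of the sub-square addressed 123
print(np.allclose(sum(W.values()) @ F, 4 * F))   # True: the WFA is ap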

The RGB format of color images is such that the image in pixel
form contains the red, green, and blue values of the pixels.
Analogous to gray-scale images, we can use the WFA to represent a
color image by extracting the red, green, and blue pixel values and
storing the information in a WFA.

Inferencing and De-Inferencing


We described how every digital gray-scale multi-resolution image
can be represented by an average-preserving WFA. Below, algorithms
are given both for converting a WFA into a digital image
(de-inferencing) and for inferring a WFA representing a given
digital image. The WFA consists of a 1 × n row vector I, an n × 1
column vector F, and n × n weight matrices W0, W1, W2, and W3.

DE-INFERENCING
Assume we are given a WFA M = (I, F, W0, W1, W2, W3) and we
want to construct a finite resolution approximation of the multi-
resolution image represented by M. Let the image to be constructed
have resolution 2^k × 2^k. Then, for all x ∊ Σ^k, we have to
compute f(x) = I · Wx · F. The algorithm below computes φp(x) for
all p ∊ Q and all x ∊ Σ^i, 0 ≤ i ≤ k. Here, φp is the image of state p.

ALGORITHM: DE_INFER_WFA

Input: WFA M = (I, F, W0, W1, W2, W3).

Output: f(x), for all x ∊ Σ^k.

    begin
Step 1: Set φp(ε) ← Fp for all p ∊ Q.
Step 2: for i = 1, 2, ..., k do
      begin
Step 3:   for all p ∊ Q, x ∊ Σ^{i−1} and α ∊ Σ compute
          φp(αx) ← Σ_{q∊Q} Wα(p, q) · φq(x)
      end for
Step 4: for each x ∊ Σ^k, compute
        f(x) ← Σ_{p∊Q} Ip · φp(x)
Step 5: stop.
    end
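
A direct numpy transcription of DE_INFER_WFA (our sketch, following the steps above) keeps the vectors φ(x) in a dictionary keyed by address strings:

import numpy as np

def de_infer_wfa(I, F, W, k):
    """Regenerate the 2^k x 2^k image from a WFA (I, F, W0..W3).
    Returns {address x of length k: f(x)}."""
    Ivec = np.asarray(I, dtype=float)
    phi = {'': np.asarray(F, dtype=float)}               # Step 1
    for i in range(1, k + 1):                            # Steps 2-3
        phi = {a + x: W[a] @ v
               for x, v in phi.items() for a in '0123'}
    return {x: float(Ivec @ v) for x, v in phi.items()}  # Step 4

# With the assumed linear-grayness WFA of the previous sketch:
I = [1.0, 0.0]; F = [0.5, 1.0]
W = {'0': np.array([[0.5, 0.0], [0.0, 1.0]]),
     '1': np.array([[0.5, 0.25], [0.0, 1.0]]),
     '2': np.array([[0.5, 0.25], [0.0, 1.0]]),
     '3': np.array([[0.5, 0.5], [0.0, 1.0]])}
print(de_infer_wfa(I, F, W, 2)['03'])   # 0.375 = I . W0 . W3 . F

Each pass over the dictionary reuses the vectors of the previous length, which is exactly what makes the algorithm O(n^2 4^k) rather than exponential in k per pixel.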

The time complexity of the above algorithm is O(n^2 4^k), where n is
the number of states in the WFA and 4^k = 2^k · 2^k is the number of
pixels in the image. We know that f(x) can be computed either by
summing the weights of all the paths labeled x or by
computing I · Wx · F. Enumerating all paths of length k is far more
expensive, since an n-state WFA can have up to n^k paths for each
label; hence we prefer the matrix multiplication.

INFERENCING
Let I be the digital gray-scale image of finite resolution 2k × 2k. In
(Culik and Kari, 1994), an iterative algorithm is proposed to obtain
the WFA M representing the image I. It is also shown in (Culik and
Kari, 1994), that the WFA so obtained is a minimum state WFA.
The inference algorithm is given below. In the algorithm
Infer_WFA, N is the index of the last state created, i is the index of
the first unprocessed state, φp is the image represented by
state p, fx represents the sub-image at the sub-square labeled x,
while favg(x) represents the average pixel value of the sub-image at
the sub-square labeled x, and γ: Q → Σ* is a mapping of states to
sub-squares.
ALGORITHM: INFER_WFA

Input: Image I of size 2^k × 2^k.

Output: WFA M representing image I.

    begin
Step 1: Set N ← 0, i ← 0, Fq0 ← favg(ε), γ(q0) ← ε.
Step 2: Process qi, i.e., for x = γ(qi) and each α ∊ {0, 1, 2, 3} do
      begin
Step 3:   if there are c0, c1, ..., cN such that
          fxα = c0φ0 + c1φ1 + ... + cNφN, where φj = fγ(qj) for 0 ≤ j ≤ N,
          then set Wα(qi, qj) ← cj for 0 ≤ j ≤ N
Step 4:   else set γ(qN+1) ← xα, FqN+1 ← favg(xα), Wα(qi, qN+1) ← 1
          and N ← N + 1
      end for
Step 5: Set i ← i + 1; if i ≤ N, goto Step 2.
Step 6: Set Iq0 ← 1 and Iqj ← 0 for all 1 ≤ j ≤ N.
    end
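
The following Python sketch (ours; a simplified rendering of Infer_WFA under the assumption that the image is a 2^k × 2^k numpy array) uses least squares for the expressibility test of Step 3 and block averaging to compare images of different resolutions:

import numpy as np

def quadrant(im, a):
    # Digit a = 2*x_bit + y_bit; the orientation is an assumed convention.
    h = im.shape[0] // 2
    x, y = a >> 1, a & 1
    return im[x * h:(x + 1) * h, y * h:(y + 1) * h]

def block_avg(im, size):
    f = im.shape[0] // size
    return im.reshape(size, f, size, f).mean(axis=(1, 3))

def infer_wfa(image, tol=1e-9):
    states = [image]                  # phi_j, stored as sub-images
    F = [image.mean()]                # F_qj = f_avg(gamma(qj))
    trans = []                        # (alpha, i, j, weight)
    i = 0
    while i < len(states):            # Steps 2-5
        if states[i].shape[0] > 1:    # pixel-level states have no quadrants
            for a in range(4):
                t = quadrant(states[i], a)
                A = np.column_stack([block_avg(s, t.shape[0]).ravel()
                                     for s in states])
                c, *_ = np.linalg.lstsq(A, t.ravel(), rcond=None)
                if np.allclose(A @ c, t.ravel(), atol=tol):     # Step 3
                    trans += [(a, i, j, cj) for j, cj in enumerate(c)
                              if abs(cj) > tol]
                else:                                            # Step 4
                    trans.append((a, i, len(states), 1.0))
                    states.append(t)
                    F.append(t.mean())
        i += 1
    I = np.zeros(len(states)); I[0] = 1.0                        # Step 6
    return I, np.array(F), trans

# The 8 x 8 linear ramp is inferred with exactly two states:
n = 8
xs, ys = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')
I, F, trans = infer_wfa((xs + ys + 1) / (2.0 * n))
print(len(F))   # 2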

Example 6.5. 

Consider the linearly sloping ap-function f introduced in Example


6.4. Let us apply the inference algorithm to find a minimal ap-WFA
generating f.

First, the state q0 is assigned to the square ε and we define Fq0 = favg(ε).

Consider then the four sub-squares 0, 1, 2, 3. The image in the sub-
square 1 can be expressed as (1/2)f (it is obtained from the original
image by halving the grayness), so we
define W1(q0, q0) = 0.5.

The image in sub-square 0 cannot be expressed as a linear
combination of f, so we have to use a second state q1.

Define W0(q0, q1) = 1 and Fq1 = favg(0), the average grayness of sub-
square 0. Let f1 denote the image in sub-square 0 of f.

The image in sub-square 3 is the same as in sub-square 0, so
that W3(q0, q1) = 1. In quadrant 2, we have an image which can
be expressed as the linear combination 2f1 − (1/2)f. We define
W2(q0, q0) = −0.5 and W2(q0, q1) = 2.

The outgoing transitions from state q0 are now ready.

Consider then the images in the squares 00, 01, 02 and 03. They
can be expressed as linear combinations of f and f1, which gives the
outgoing transitions of q1. This gives us the ap-WFA shown in
Figure 6.7. The initial distribution is (1, 0).

Figure 6.7. Example: Inference of WFA

It is shown in Culik and Kari (1997) that in Algorithm Infer_WFA,
the states are linearly independent, i.e., no φi, 1 ≤ i ≤ n, can be
expressed as a linear combination of the other states:

c1φ1 + c2φ2 + ... + cnφn = 0

implies that ci = 0 for all 1 ≤ i ≤ n. Hence, the WFA obtained by
Algorithm Infer_WFA is a minimum-state WFA.

Now consider Step 3 of Algorithm Infer_WFA. This step asks for
c0, c1, ..., cn such that fxα = c0φ0 + c1φ1 + c2φ2 + ··· + cnφn,
i.e., it expresses the sub-image of the sub-square xα as a linear
combination of the images represented by the states created so
far. Let the size of the sub-image be 2^k × 2^k. Then, the above
equation can be re-stated as follows:

Ixα = c0 · I0 + c1 · I1 + ··· + cn · In

where Ii is the 2^k × 2^k image represented by the
state qi and Ixα is the 2^k × 2^k sub-image at the sub-square
addressed by xα. Written pixel by pixel, the equations become:

c0I0(r, s) + c1I1(r, s) + ··· + cnIn(r, s) = Ixα(r, s), for each pixel
position (r, s), 1 ≤ r, s ≤ 2^k.

This is nothing but a set of linear equations. It can be rewritten in
matrix form as A · C = B, where A is a 4^k × (n + 1) matrix whose
ith column contains the 2^k · 2^k = 4^k pixels of the
image Ii represented by state qi, C is an (n + 1) × 1 column vector of
the coefficients, and B is a 4^k × 1 vector containing the pixels of the
image Ixα. Thus, Step 3 reduces to solving a set of linear equations.

One well-known method of attacking this problem is the
Gaussian elimination technique. But for this technique to apply
directly, the matrix A should have full rank, i.e., rank(A) =
min(4^k, n + 1), whereas in general rank(A) can be strictly smaller.
Hence, this method cannot be used in our case.

Another standard method for solving a set of linear equations is
singular value decomposition (SVD). This method not only gives the
solution when it exists, but in the case of a non-existing solution it
gives the least mean square approximate solution: the
computed coefficients are such that ||B − AC|| is minimum,
where ||·|| denotes the Euclidean norm.

APPROXIMATE REPRESENTATION
In image processing applications, it is not always required that
images be represented exactly. By introducing an error parameter
into the above algorithm, the number of states required for the
representation can be reduced. While solving the linear equations in
Step 3 of Algorithm Infer_WFA, we get a solution with least mean
square error; we accept this solution whenever the error does not
exceed a positive quantity δ. This way we can represent the image
approximately with a smaller automaton.

COMPRESSION
In a gray-scale image, eight bits are required per pixel; if the
resolution is 512 × 512, then 512 × 512 bytes are required to store the
image. If the number of states of a WFA is small, it can be stored
in less file space than the image, and we get a compressed
representation of the image. It is not difficult to see that any WFA
can be made f-normal; further, it can be made I-normal, since we
need not bother about representing images of size 0. It was also
observed that in most cases the weight matrices obtained are
sparse. Hence, it is enough to store only the weights on the edges
in a file. This helps to keep an image in compressed form.

OBSERVATION
The inference algorithm was applied to four images of size 256 × 256;
the results are given in Figure 6.8, which shows the compression
obtained with error factors of 0%, 5%, 10%, and 15%. The
reconstructed images are also shown in Figure 6.8. It can be
observed that while an error of up to 10% does not disturb the
picture much, an error of 15% distorts it quite badly. It was also
observed that the number of states obtained depends on the
regularity of the image itself.

Figure 6.8. Some pictures and their reconstructed images

The image can perhaps be further compressed by a smarter way of
storing the WFA in a file. Currently, four bytes are needed per edge
to store the weight (type float). On an average, an n-state WFA
has about 2n^2 edges, so the number of bytes used to store
an n-state WFA is 4(2n^2) = 8n^2. In order to obtain a compression of,
say, 50% for an image of size 2^k × 2^k, we need 8n^2 ≤ (1/2) · 4^k.
For a 256 × 256 image, n should be less than 64 in order to obtain
any good compression.

Transformations on Digital Images


Next we give the basic definitions related to the weighted finite
transducer (WFT) and show how transformations on the WFA
representing gray-scale and color images can be carried out using
WFTs. We also show how 3-dimensional objects can be
represented using an FSA and how transformations such as
scaling, translation, and rotation can be performed on a 3-
dimensional object using the finite state automaton representing
the object together with a finite state transducer.

WEIGHTED FINITE TRANSDUCERS


Almost every transformation of an image involves moving (scaling)
pixels or changing the grayness (color) values of pixels.
Such image transformations can be specified by WFTs.

Definition 6.21

An n-state weighted finite transducer M from the alphabet Σ = {0, 1,
2, 3} into the alphabet Σ is specified by:

1. weight matrices Wa,b ∊ ℜ^{n×n} for all a ∊ Σ ∪ {ε} and b ∊ Σ ∪ {ε};

2. a row vector I ∊ ℜ^{1×n}, called the initial distribution; and

3. a column vector F ∊ ℜ^{n×1}, called the final distribution.

The WFT M is called ε-free if the weight
matrices Wε,ε, Wa,ε and Wε,b are zero matrices for all a ∊ Σ and b ∊
Σ.

Definition 6.22

The WFT M defines a function fM: Σ* × Σ* → ℜ, called the weighted
relation between Σ* and Σ*, by:

fM(u, υ) = I · Wu,υ · F, for all u ∊ Σ*, υ ∊ Σ*

where

Equation 6.1. 

Wu,υ = Σ Wa1,b1 · Wa2,b2 · ... · Wak,bk

if the sum converges. (If the sum does not converge,
fM(u, υ) remains undefined.)

In Equation (6.1), the sum is taken over all decompositions
u = a1a2 ... ak and υ = b1b2 ... bk into symbols ai ∊ Σ ∪ {ε}
and bi ∊ Σ ∪ {ε}, respectively.

In the special case of ε-free transducers,

fM(a1a2 ... ak, b1b2 ... bk) = I · Wa1,b1 · Wa2,b2 · ... · Wak,bk · F,

for a1a2 ... ak ∊ Σ^k, b1b2 ... bk ∊ Σ^k, and fM(u, υ) = 0 if |u| ≠ |υ|.

We recall that a WFA defines a multi-resolution function f: Σ* →
ℜ, where for all x ∊ Σ* with x = α1α2 ... αk,

f(x) = I · Wα1 · Wα2 · ... · Wαk · F

where the operation '·' is matrix multiplication; equivalently, f(x)
is the sum of the weights of all paths labeled x.

Definition 6.23

Let ρ: Σ* × Σ* → ℜ be a weighted relation and f: Σ* → ℜ a multi-
resolution function represented by a WFA. The application of ρ to f
is the multi-resolution function g = ρ(f): Σ* → ℜ over Σ defined by:

Equation 6.2. 

g(υ) = Σ_{u∊Σ*} ρ(u, υ) · f(u)

provided the sum converges. The application M(f) of a WFT M to f is
defined as the application of the weighted relation fM to f, i.e., M(f)
= fM(f).

Equation (6.2) defines an application of a WFT to an image in the


pixel form. When the image is available in the WFA-compressed
form, we can apply a WFT directly to it and compute the re-
generated image again from the transformed WFA.
Here, we define the application of an ε-free n-state WFT to a
WFA. The application of an ε-free n-state WFT M to an m-state
WFA Γ over the alphabet Σ, specified by initial distribution IΓ, final
distribution FΓ and weight matrices W^Γ_α, α ∊ Σ, is the mn-state WFA
Γ′ = M(Γ) over the alphabet Σ with initial distribution IΓ′ = I ⊗ IΓ,
final distribution FΓ′ = F ⊗ FΓ and weight matrices

W^Γ′_b = Σ_{α∊Σ} Wα,b ⊗ W^Γ_α, for all b ∊ Σ.

Here, ⊗ denotes the ordinary tensor product of matrices (called
also the Kronecker product or direct product), defined as follows:
let T and Q be matrices of sizes s × t and p × q, respectively. Then,
their tensor product T ⊗ Q is the matrix of size sp × tq obtained by
replacing each entry T(i, j) of T with the block T(i, j) · Q.

Clearly fΓ′ = M(fΓ), i.e., the multi-resolution function defined by Γ′


is the same as the application of the WFT M to the multi-resolution
function computed by the WFA Γ.
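
In numpy, the tensor product is np.kron, and the application of an ε-free WFT to a WFA is a direct transcription of the formulas above. A minimal sketch (ours), with weight matrices in dictionaries keyed by symbols:

import numpy as np

def apply_wft(I_T, F_T, W_T, I_G, F_G, W_G, sigma='0123'):
    """Gamma' = M(Gamma): the mn-state WFA of the transformed image."""
    I_new = np.kron(I_T, I_G)
    F_new = np.kron(F_T, F_G)
    W_new = {b: sum(np.kron(W_T[(a, b)], W_G[a]) for a in sigma)
             for b in sigma}
    return I_new, F_new, W_new

Feeding in the translation transducer of the next subsection and, say, the 2-state linear-grayness WFA gives a 4-state WFA of the shifted image.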

We note that every WFT M is a linear operator from ℜ^{Σ*} to ℜ^{Σ*}. In
other words,

M(r1f1 + r2f2) = r1M(f1) + r2M(f2),

for all r1, r2 ∊ ℜ and f1, f2: Σ* → ℜ. More generally, any weighted
relation acts as a linear operator.

We now give procedures for transformations such as scaling,
translation, and rotation on gray-scale and color images. We give
constructions for scaling a gray-scale image up by a factor of 4,
scaling it down by a factor of 4, and translating it by 2 units, by 4
units, and by 1/2 and 1/4 of the side of the square image. We also
illustrate these transformations on gray-scale and color images
with examples.
SCALING
We define how the operation of scaling can be performed on the
WFA representing a gray-scale image. Consider a gray-scale image
of resolution 2^n × 2^n. We consider scaling with respect to the
center of the image, so the co-ordinate axes are shifted to the
center of the image; the new co-ordinates are x′ = x −
2^{n−1} and y′ = y − 2^{n−1}. The operation of scaling by a factor k
then maps the pixel at (x′, y′) to (kx′, ky′).

The operation of scaling up by a factor of 4 and scaling down by a


factor of 1/4 is illustrated in Figure 6.9. It is seen that the sub-
square addressed as 033 in the original gray-scale image becomes
the bigger sub-square 0 in the scaled version of the gray-scale
image. Similarly, the sub-squares 122, 211, and 300 in the original
image are scaled up to form the sub-squares 1, 2 and 3 in the scaled
version of the gray-scale image. The WFA for the scaled version
can be obtained from the WFA representing the original gray-scale
image by introducing a new initial state q′0 which, on reading a 0
(respectively 1, 2, 3), makes a transition to the states reachable from
the initial states of the original WFA by a path labeled 033
(respectively 122, 211, 300). The formal construction can be given
in a straightforward manner.
Figure 6.9. Scaling of a gray-scale image.

Color images in RGB format are stored in 24 bits per pixel, with
one byte each for the values of the three primary colors, namely
red, green, and blue. In representing a color image using
a WFA, we use three functions, one for each of the
three primary colors. So, the scaling transformation for color
images is the same as for gray-scale images, except
that the WFA corresponding to the color image defines three
functions, one for each primary color. We illustrate the
scaling up of the color image by a factor of 4 and the scaling down
by a factor of 1/4 with an example in Figure 6.10.

(See color plate on page 417)

Figure 6.10. Scaling of the color image

TRANSLATION
We define how translations can be performed on WFA
representing the gray-scale images using the WFT.

Let Γ be a WFA representing a gray-scale image of resolution 2^n ×
2^n. Suppose that we want to translate the image from left to right
along the x-axis by one pixel; the image wraps around on
translation, i.e., the pixels in the (2^n − 1)th column are moved to the
0th column. This is equivalent to adding 1 (modulo 2^n) to the x
co-ordinate of each pixel. For example, if the n-bit x co-ordinate of
the pixel is w01^r, then the x co-ordinate of this pixel after the
translation would be w10^r, 0 ≤ r ≤ n − 1; the all-ones co-ordinate
1^n wraps around to 0^n. The y co-ordinate of the pixel remains
unchanged. The WFT for this translation is given in Figure 6.11.
The WFA Γ′ representing the translated image can be obtained
from the WFA Γ by applying the WFT to Γ.

Figure 6.11. WFT for translation
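
Since Figure 6.11 does not reproduce here, the following is our reconstruction of a two-state transducer for this translation, under the stated addressing convention a = 2x + y (so changing the x-bit while keeping y maps 0 ↔ 2 and 1 ↔ 3). State 0 copies the prefix w; state 1 handles the 01^r → 10^r flip; letting state 1 also be initial covers the wrap-around word 1^n.

import numpy as np

S = '0123'
I_T = np.array([1.0, 1.0])      # state 1 initial too: handles 1^n -> 0^n
F_T = np.array([0.0, 1.0])      # must end in state 1: some bit was flipped
W_T = {(a, b): np.zeros((2, 2)) for a in S for b in S}
for a in S:
    W_T[(a, a)][0, 0] = 1.0     # copy w: x-bit and y-bit unchanged
for a, b in (('0', '2'), ('1', '3')):
    W_T[(a, b)][0, 1] = 1.0     # the single 0 -> 1 flip of the x-bit
for a, b in (('2', '0'), ('3', '1')):
    W_T[(a, b)][1, 1] = 1.0     # trailing x-bits 1 -> 0

def f_M(u, v):
    if len(u) != len(v):
        return 0.0
    m = I_T
    for a, b in zip(u, v):
        m = m @ W_T[(a, b)]
    return float(m @ F_T)

print(f_M('02', '20'))   # 1.0: the pixel with x = 1 moves to x = 2
print(f_M('02', '02'))   # 0.0: no pixel stays in place
print(f_M('22', '00'))   # 1.0: wrap-around, x = 3 moves to x = 0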

As mentioned earlier, the color images are represented by a WFA


using three functions, one each for the three primary colors,
namely red, green, and blue. So, in order to perform a translation
on the color image, we have to apply the corresponding WFT on all
the three functions represented by the WFA. We illustrate the
operation of translation on color images with the following
example in Figure 6.12.
(See color plate on page 417)

Figure 6.12. Translation of color image by 1/2 and 1/4 of square

Rotation can also be done by a WFT, though the rotated image may
be scaled down by a factor.

REPRESENTATION OF 3-DIMENSIONAL OBJECTS


We show how a 3-dimensional object can be represented as an FSA
using the alphabet Σ = {0, 1, 2, 3, 4, 5, 6, 7}. The construction for
obtaining the images of the projections of the 3-dimensional object
onto the three co-ordinate planes and the construction to obtain
the 3-dimensional object from its projections are given in
Ramasubramanian and Krithivasan (2000). We see how
transformations such as scaling, translation, and rotation can be
performed on 3-dimensional objects using the finite state
transducers and the FSA representing the object.

A solid object is considered to be a 3-dimensional array. Hence,
any point in the solid can be addressed as a 3-tuple (x, y, z). In
order to represent the 3-dimensional solid object as an FSA, we
extend the alphabet Σ of the FSA to Σ = {0, 1, 2, 3, 4, 5, 6, 7}. Now,
any string w ∊ Σ^n gives the address of a point
in the 3-dimensional space of size 2^n × 2^n × 2^n enclosing the solid
object. If the bit representation of the x co-ordinate
is xn−1xn−2 ... x1x0, of the y co-ordinate yn−1yn−2 ... y1y0, and of
the z co-ordinate zn−1zn−2 ... z1z0, then the address of the point is
the string w = an−1an−2 ... a1a0 such that

ai = 4xi + 2yi + zi, ∀i ∊ [0, n − 1]

Figure 6.13 shows how the 8 sub-cubes of a cube are addressed.


The figure also gives the FSA which generates the right-angled
prism.
Figure 6.13. Addressing scheme and an example automaton

Note that whenever we say size of the object, we refer to the 3-


dimensional space enclosed by the 3-dimensional object.

Operations like scaling, translation, and rotation can be performed


on these 3-dimensional objects using finite state transducers. If the
object is solid and convex, we can find its projections on the three
co-ordinate planes in a systematic manner. In the other direction, if
we are given the projections of an object on the three planes, we can
construct the solid object from them. The details are beyond the
scope of this book.

Problems and Solutions


1. Consider the following 2DFSA. Give two strings in T(M) and two strings not in T(M), showing the
moves of M.

What is T(M)?

Solution. Two strings accepted: (i) abaab, (ii) aab.

Two strings rejected: any string containing bb, for example abb and bba.

T(M) consists of the strings in which two b's do not occur consecutively.

2. Consider the PA over the alphabet {a, b} with K = {q1, q2}, with initial distribution π and
matrices M(a), M(b) as given, along with the final vector S.

Find πM(x)·S, where x ∊ {a^2, b^2, ab, a^2b, b^3a^2}.

Solution.
πM(a^2)·S = 0.9375
πM(b^2)·S = 0.25
πM(ab)·S = 0.1250
πM(a^2b)·S = 0.0313
πM(b^3a^2)·S = 0.9609

3. Consider the PA over the alphabet {a, b} with K = {q1, ..., q6}, with initial distribution π and
matrices M(a), M(b) as given, and S = (0, 0, 0, 1, 0, 0).

Find πM(x)·S, where x ∊ {ab, a^2b^2, bab, b^2ab^3, b^4ab}. Also, find the language accepted with cut-
point η = 0.25.

Solution.
πM(ab)·S = 0
πM(a^2b^2)·S = 0
πM(bab)·S = 0.0625
πM(b^2ab^3)·S = 0.0098
πM(b^4ab)·S = 0.1855
The language accepted with cut-point η = 0.25 is {b^n a | n ≥ 1}.

4. Construct an FSA over the alphabet {0, 1, 2, 3} to represent the given black and white image.

Solution.

Exercises
1. Consider the following 2DFSA. Give two strings in T(M) and two strings not in T(M), showing the moves
of M.
2. Show that adding to a 2DFSA the capability to keep its head stationary (while changing state) does
not give additional power.

3. Let M be a 2DFSA. T(M) is the set of strings which make M move off the right end of the tape in a final state; we
know T(M) is a regular set. Let Tnf(M) be the set of strings which make M move off the right end in a
non-final state; let Tleft(M) be the set of strings which make M move off at the left end; let Tloop(M)
be the set of strings which make M loop. Show that Tnf(M), Tleft(M), Tloop(M) are regular sets.

4. Construct multi-head automata to accept:
1. {wcw | w ∊ {a, b}*}
2. {a^n b^n c^n | n ≥ 1}

5. Σ* is a semigroup. An equivalence relation E on Σ* is right invariant if xEy ⇒ xzEyz for all z ∊ Σ*.
Similarly, we can define left invariance. An equivalence relation is a congruence relation if it is both right
invariant and left invariant.
Prove that the following three statements about a language L ⊆ Σ* are equivalent:
1. L is a regular set.
2. L is the union of some of the equivalence classes of a congruence relation of finite index.
3. The congruence relation ≡ defined as follows is of finite index: x ≡ y if and only if, for all strings z and w, zxw is
in L exactly when zyw is in L.

6. Construct FSA over the alphabet {0, 1, 2, 3} to represent the given black and white images.

Chapter 7. Pushdown Automata
In the earlier chapters, we considered the simplest type of automaton,
namely, the FSA. We have seen that an FSA has a finite amount of memory and
hence cannot accept type 2 languages like {anbn|n ≥ 1}. In this chapter, we
consider a class of automata, the pushdown automata, which accept exactly
the class of context-free (type 2) languages. The pushdown automaton is a
finite automaton with an additional tape, which behaves like a stack. We
consider two ways of acceptance and show the equivalence between them. The
equivalence between context-free grammars (CFG) and pushdown automata is
also proved.
The Pushdown Automaton
Let us consider the following language over the alphabet Σ = {a, b, c}: L =
{anbmcn|n, m ≥ 1}. To accept this, we have an automaton which has a finite
control and an input tape which contains the input. Apart from these, there is
an additional pushdown tape which is like a stack of plates placed on a spring:
only the topmost plate is visible, and plates can be removed from the top and
added at the top only. In the following example, we have a red plate and a number of
blue plates. The machine is initially in state q0, and initially a red plate is on the
stack. When it reads an a, it adds a blue plate to the stack and remains in state q0.
When it sees a b, it changes to q1. In q1, it reads b's without manipulating the
stack. When it reads a c, it goes to state q2 and removes a blue plate.
In state q2, it proceeds to read c's, and whenever it reads a c it removes a blue
plate. Finally, in state q2, without reading any input, it removes the red plate.
The working of the automaton can be summarized by the following table.

State | Top plate | Input a | Input b | Input c
q0 | red | add blue plate, remain in q0 | — | —
q0 | blue | add blue plate, remain in q0 | go to q1 | —
q1 | red | — | — | —
q1 | blue | — | remain in q1 | go to q2, remove the plate
q2 | red | (without waiting for input, remove the red plate)
q2 | blue | — | — | remain in q2, remove the plate

Let us see how the automaton treats the input aabbcc.

1. Initially, it is in q0, about to read an a, and the top plate is red.

2. When it reads the first a in q0, it adds a blue plate to the stack.

3. When it reads the second a, another blue plate is added.

4. In q0, when it reads a b, it goes to q1.

5. In q1, it reads the next b without manipulating the stack.

6. In state q1, when it reads a c, it removes a blue plate and goes to q2.

7. In state q2, when it reads a c, it removes a blue plate.

8. Now the whole input has been read. The automaton is in q2 and the top plate is
red; without looking for the next input, it removes the red plate.

9. The whole input has been read and the stack has been emptied.

The string is accepted by the automaton if the whole input has been read and
the stack has been emptied. This automaton can accept in a similar manner
any string of the form anbmcn, n,m ≥ 1.

Now, let us consider the formal definition of a pushdown automaton.

Definition 7.1

A pushdown automaton (PDA) M = (K, Σ, Γ, δ, q0, Z0, F) is a 7-tuple where,


K is a finite set of states


Σ is a finite set of input symbols


Γ is a finite set of pushdown symbols


q0 in K is the initial state


Z0 in Γ is the initial pushdown symbol


F ⊆ K is the set of final states

δ is the mapping from K × (Σ ∪ {ε}) × Γ into finite subsets of K × Γ*.

δ(q, a, z) contains (p, γ), where p, q ∊ K, a ∊ Σ ∪ {ε}, z ∊ Γ and γ ∊ Γ*, which means
that when the automaton is in state q, reading a (reading nothing if
a = ε), with the top pushdown symbol z, it can go to state p and replace z in
the pushdown store by the string γ. If γ = z1 ... zn, then z1 becomes the new top
symbol of the pushdown store. It should be noted that the pushdown
automaton is basically non-deterministic in nature.

Definition 7.2

An instantaneous description (ID) of a PDA is a 3-tuple (q,w,α) where q denotes


the current state, w is the portion of the input yet to be read and α denotes the
contents of the pushdown store. w ∊ Σ*, α ∊ Γ* and q ∊ K. By convention, the
leftmost symbol of α is the top symbol of the stack.

If (q, aa1a2 ... an, zz1 ... zn) is an ID and δ(q, a, z) contains (p, B1 ... Bm), then


the next ID is (p, a1 ... an, B1 ... Bm z1 ... zn), a ∊ Σ ∪ {ε}.

This is denoted by (q, aa1 ... an, zz1 ... zn) ⊢ (p, a1 ... an, B1 ... Bmz1 ... zn). ⊢* is


the reflexive transitive closure of ⊢. The set of strings accepted by the
PDA M by emptying the pushdown store is denoted as Null(M) or N(M).

N(M) = {w|w ∊ Σ*, (q0, w, Z0) ⊢* (q, ε, ε) for some q ∊ K}

This means that any string w on the input tape will be accepted by the
PDA M by the empty store, if M started in q0 with its input head pointing to
the leftmost symbol of w and Z0 on its pushdown store, will read the whole
of w and go to some state q and the pushdown store will be emptied. This is
called acceptance by empty store. When acceptance by empty store is
considered, F is taken as the empty set.

There is another way of acceptance called acceptance by final state. Here,


when M is started in q0 with w on the input tape and input tape head pointing
to the leftmost symbol of w and with Z0 on the pushdown store, finally after
some moves, reads the whole input and reaches one of the final states. The
pushdown store need not be emptied in this case. The language accepted by
the pushdown automaton by final state is denoted as T(M).
T(M) = {w|w ∊ Σ*, (q0, w, Z0) ⊢* (qf, ε, γ) for some qf ∊ F and γ ∊ Γ*}.

Example 7.1. 

Let us formally define the pushdown automaton for accepting {anbmcn | n, m ≥
1} described informally earlier.

M = (K, Σ, Γ, δ, q0, R, φ) where K = {q0, q1, q2}, Σ = {a, b, c}, Γ = {B, R}


and δ is given by:

δ(q0, a, R) = {(q0, BR)}

δ(q0, a, B) = {(q0, BB)}

δ(q0, b, B) = {(q1, B)}

δ(q1, b, B) = {(q1, B)}

δ(q1, c, B) = {(q2, ε)}

δ(q2, c, B) = {(q2, ε)}

δ(q2, ε, R) = {(q2, ε)}

The sequence of ID’s on input aabbbcc is given by:


(q0, aabbbcc, R) ⊢ (q0, abbbcc, BR) ⊢ (q0, bbbcc, BBR)

  ⊢ (q1, bbcc, BBR) ⊢ (q1, bcc, BBR) ⊢ (q1, cc, BBR)

  ⊢ (q2, c, BR) ⊢ (q2, ε, R) ⊢ (q2, ε, ε)
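
Acceptance by empty store can be checked mechanically. Below is a small Python sketch (ours) that runs the PDA of Example 7.1 by breadth-first search over IDs; the search also covers non-deterministic machines:

from collections import deque

DELTA = {  # (state, input symbol or '', top) -> set of (state, pushed string)
    ('q0', 'a', 'R'): {('q0', 'BR')},
    ('q0', 'a', 'B'): {('q0', 'BB')},
    ('q0', 'b', 'B'): {('q1', 'B')},
    ('q1', 'b', 'B'): {('q1', 'B')},
    ('q1', 'c', 'B'): {('q2', '')},
    ('q2', 'c', 'B'): {('q2', '')},
    ('q2', '', 'R'): {('q2', '')},
}

def accepts_by_empty_store(w, start='q0', z0='R'):
    seen, queue = set(), deque([(start, w, z0)])
    while queue:
        q, rest, stack = queue.popleft()
        if not rest and not stack:
            return True                      # input consumed, store empty
        if not stack or (q, rest, stack) in seen:
            continue
        seen.add((q, rest, stack))
        top = stack[0]
        if rest:                             # moves reading a true symbol
            for p, gamma in DELTA.get((q, rest[0], top), ()):
                queue.append((p, rest[1:], gamma + stack[1:]))
        for p, gamma in DELTA.get((q, '', top), ()):    # epsilon-moves
            queue.append((p, rest, gamma + stack[1:]))
    return False

print(accepts_by_empty_store('aabbbcc'))   # True
print(accepts_by_empty_store('aabbc'))     # False: too few c's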

It can be seen that the above PDA is deterministic. The general definition of a
PDA is non-deterministic. In order for a PDA to be deterministic, two conditions
have to be satisfied: at any instance, the automaton should not have a choice
between reading a true input symbol or ε, and the next move should be uniquely
determined. These conditions may be stated formally as follows:

In a deterministic PDA (DPDA),

1. For all q in K and Z in Γ: if δ(q, ε, Z) is non-empty, then δ(q, a, Z) is empty for all a ∊ Σ.

2. For all q in K, a in Σ ∪ {ε} and Z in Γ: δ(q, a, Z) contains at most one element.

In the following sections, we shall show the equivalence of the two modes of
acceptance and also the equivalence of non-deterministic PDA with CFG.
Non-deterministic PDA and DPDA are not equivalent. There are languages
which can be accepted by non-deterministic PDA but not by DPDA. For
example, consider the language {wwR|w ∊ {a, b}*}. Let us informally describe
a PDA accepting a string say abbabbabba. The pushdown store initially
contains a red plate (say). When a is read, a blue plate is added and when
a b is read a green plate is added. This happens, when the PDA reads the first
half of the input. During the second half, if a (a, blue plate) combination
occurs, the blue plate is removed and if a (b, green plate) combination occurs,
the green plate is removed. Finally, after the whole input is read, the red plate
can be removed. Now the question is, how does the automaton know when the
second half begins. Whenever bb or aa occurs in the input, when the
automaton looks at the second b or a, it should consider both possibilities –
whether it will be the continuation of the first half or the starting of second
half. This language cannot be accepted by any DPDA. It can be seen that any
inherently ambiguous context-free language cannot be accepted by DPDA.
Equivalence between Acceptance
by Empty Store and Acceptance by
Final State
In this section, we show that acceptance by empty store and acceptance by
final state are equivalent.

Theorem 7.1

L is accepted by a PDA M1 by empty store, if and only if L is accepted by a PDA


M2 by final state.

Proof. (i) Let L be accepted by a PDA M2 = (K, Σ, Γ, δ2, q0, Z0, F) by final state.

Then construct M1 as follows:

M1 = (K ∪ {q0′, qe}, Σ, Γ ∪ {X0}, δ1, q0′, X0, φ)

We add two more states, q0′ and qe, and one more pushdown symbol X0; q0′ is the new initial state
and X0 is the new initial pushdown symbol. qe is the erasing state.

δ mappings are defined as follows:

1. δ1(q0′, ε, X0) contains (q0, Z0X0)

2. δ1(q, a, Z) includes δ2(q, a, Z) for all q ∊ K, a ∊ Σ ∪ {ε}, Z ∊ Γ

3. δ1(qf, ε, Z) contains (qe, ε) for qf ∊ F and Z ∊ Γ ∪ {X0}

4. δ1(qe, ε, Z) contains (qe, ε) for Z ∊ Γ ∪ {X0}

The first move makes M1 go to the initial ID of M2 (except for the X0 at the bottom of the
pushdown store). Using the second set of mappings, M1 simulates M2.
When M2 reaches a final state, M1 goes, by mapping 3, to the erasing
state qe, and using the set of mappings 4, the entire pushdown store is erased.

If w is an input accepted by M2, we have (q0, w, Z0) ⊢* (qf, ε, γ) for some qf ∊ F and γ ∊ Γ*.

This can happen in M1 also. M1 accepts w as follows:

Equation 7.1. 

(q0′, w, X0) ⊢ (q0, w, Z0X0) ⊢* (qf, ε, γX0) ⊢* (qe, ε, ε)

Hence, if w is accepted by M2, it will be accepted by M1. On the other hand,
if M1 is presented with an input, the first move it can make is by mapping 1,
and once it goes to state qe, it can only erase the pushdown store and has to
remain in qe. Hence, mapping 1 must be used in the beginning and
mappings 3 and 4 at the end. Therefore, mapping 2 will be used in between, and
the sequence of moves will be as in Equation (7.1).

Hence, (q0, w, Z0) ⊢* (qf, ε, γ) in M2, which means
w will be accepted by M2.

(ii) Next, we prove that if L is accepted by M1 by empty store, it will be accepted
by M2 by final state. Let M1 = (K, Σ, Γ, δ1, q0, Z0, φ). Then, M2 is constructed as
follows:

M2 = (K ∪ {q0′, qf}, Σ, Γ ∪ {X0}, δ2, q0′, X0, {qf})

Two more states q0′ and qf are added to the set of states K. q0′ becomes the
new initial state and qf becomes the only final state. One more pushdown
symbol X0 is added which becomes the new initial pushdown symbol.
The δ mappings are defined as follows:

1. δ2(q0′, ε, X0) contains (q0, Z0X0)

2. δ2(q, a, Z) includes all elements of δ1(q, a, Z) for q ∊ K, a ∊ Σ ∪ {ε}, Z ∊ Γ

3. δ2(q, ε, X0) contains (qf, X0) for each q ∊ K

Mapping 1 makes M2 go to the initial ID of M1 (except for the X0 in the


pushdown store). Then using mapping 2, M2 simulates M1. When M1 accepts by
emptying the pushdown store, M2 has X0 left on the pushdown store. Using
mapping 3, M2 goes to the final state qf.

The moves of M2 in accepting an input w can be described as follows:

(q0′, w, X0) ⊢ (q0, w, Z0 X0) ⊢* (q, ε, X0) ⊢ (qf, ε, X0)

It is not difficult to see that w is accepted by M2, if and only if w is accepted
by M1.

It should be noted that X0 is added in the first part for the following
reason: M2 may reject an input w by emptying the store and reaching a non-final
state. If X0 were not there, M1, while simulating M2, would empty the store and
accept the input w. In the second part, X0 is added because, for M2 to make the
last move and reach a final state, a symbol on the pushdown store is required.
Thus, we have proved the equivalence of acceptance by empty store and
acceptance by final state in the case of non-deterministic PDA.

Remark. The above theorem is not true in the case of DPDA. Any regular set
can be accepted by a DPDA by final state. The DPDA for a regular set R will
behave like a DFSA for R, except that throughout the sequence of moves, the
pushdown store will contain Z0 without any change. But even simple regular
languages like {0}* cannot be accepted by a DPDA by empty store. Suppose we
want to accept {0}* by a PDA by empty store. If the initial state is q0 and the
initial pushdown symbol is Z0, there should be an ε-move for the (q0, Z0)
combination, since ε must be accepted. To read a '0', δ(q0, 0, Z0) cannot be empty. That is, both δ(q0, ε, Z0)
and δ(q0, 0, Z0) are non-empty, and the machine cannot be deterministic.
Equivalence of CFG and PDA
In this section, we show the equivalence of CFG and non -deterministic PDA.

Theorem 7.2

If L is generated by a CFG, then L is accepted by a non-deterministic PDA by


empty store.

Proof. Let L be generated by G = (N, T, P, S). Then M =
({q}, T, N ∪ T, δ, q, S, φ) can be constructed to accept L(G). δ is defined as
follows.

If A → α is a rule in P, δ(q, ε, A) contains (q, α). For each a in T, δ(q, a, a)
contains (q, ε).

To see that L(G) = N(M), we note that M simulates a leftmost derivation in G.

Suppose

Equation 7.2. 

S = α0 ⇒ α1 ⇒ α2 ⇒ ... ⇒ αn = w

is a leftmost derivation in G. M uses an ε-move if the top pushdown symbol is a
non-terminal, and reads a true input symbol if the top pushdown symbol is the
same as that input symbol, popping it off.

(i) We show that if w ∊ L(G), then w ∊ N(M).

Consider the derivation in Equation (7.2). Suppose αi can be written
as xiγi, where xi is the terminal prefix string and γi begins with a non-terminal.
Write w = xix′i, so that x′i is the portion of the input still to be read.
We show, by induction on i, the number of steps of the derivation in G, that:

(q, w, S) ⊢* (q, x′i, γi)

Basis
i = 0: x0 = ε, x′0 = w, γ0 = S.

Induction
Suppose the result holds up to i, i.e., (q, w, S) ⊢* (q, x′i, γi).

To prove it for i + 1:

αi = xiγi, αi+1 = xi+1γi+1

and αi ⇒ αi+1. The first symbol of γi is a non-terminal A, say; we use a rule A → η to get
αi ⇒ αi+1.

Now different possibilities arise.

1. η has the form yiBγ̄, where yi ∊ T* and B ∊ N.

2. η ∊ T* and γi = ABηi, where B ∊ N, ηi ∊ (N ∪ T)*.

3. η ∊ T* and γi = Ay′Cη′i, where y′ ∊ T*, C ∊ N and η′i ∊ (N ∪ T)*.

Case (i): In the first case, writing γi = Aβ, we have γi+1 = Bγ̄β
and x′i = yix′i+1. So, we get:

(q, yix′i+1, Aβ) ⊢ (q, yix′i+1, yiBγ̄β) ⊢* (q, x′i+1, Bγ̄β)

i.e., the PDA uses one ε-move to replace A by yiBγ̄ on the stack and matches
the symbols of yi with the symbols on the top of the stack, popping them off,
till B becomes the top of the stack. On the input tape, yi has been read.

Case (ii): γi = ABηi. Here, x′i will be of the form ηx′i+1.

(q, ηx′i+1, ABηi) ⊢ (q, ηx′i+1, ηBηi) ⊢* (q, x′i+1, Bηi)

The symbols of η are matched with the top of the stack and popped off.

Case (iii): γi = Ay′Cη′i, where A, C ∊ N, y′ ∊ T*, η′i ∊ (N ∪ T)*.

Here, x′i will be of the form ηy′x′i+1.

(q, ηy′x′i+1, Ay′Cη′i) ⊢ (q, ηy′x′i+1, ηy′Cη′i) ⊢* (q, x′i+1, Cη′i)

Finally, γn = ε and x′n = ε.

∴ (q, w, S) ⊢* (q, ε, ε)

∴ w is accepted by M by empty store.

(ii) If w ∊ N(M), then w ∊ L(G).

We prove a slightly more general result: if (q, x, A) ⊢* (q, ε, ε), then A ⇒* x
(here A is a non-terminal or a terminal on the stack).

This can be proved by induction on i, the number of moves of the PDA.

Basis
If (q, x, A) ⊢ (q, ε, ε) in one move, then δ(q, x, A) contains (q, ε). By our
construction, either A is a non-terminal, x = ε and A → ε is a rule in P, so
that A ⇒ ε; or A is a terminal, x = A, and A ⇒* x trivially.

Induction
Suppose the result is true up to i − 1 steps.

Let (q, x, A) ⊢* (q, ε, ε) in i steps, with A a non-terminal. The first step must use a mapping
of the form δ(q, ε, A) contains (q, B1 ... Bm). So (q, x, A) ⊢ (q, x, B1 ... Bm)
⊢* (q, ε, ε). Now x can be written in the form x1x2 ... xm. After the whole of x is
read, the stack is emptied (Figure 7.1).

Figure 7.1. Stack while reading x1x2 ... xm

Let x1 be the portion (prefix) of x read at the end of which B2 becomes the top
of the stack (x1 can be ε).

∴ (q, x1, B1) ⊢* (q, ε, ε).

This must have happened in less than i steps.

In a similar manner, we can see that (q, xj, Bj) ⊢* (q, ε, ε) for 2 ≤ j ≤ m. By the
induction hypothesis, Bj ⇒* xj for 1 ≤ j ≤ m.

Since (q, x, A) ⊢ (q, x, B1 ... Bm), there should be a mapping δ(q, ε, A) contains
(q, B1 ... Bm), which must have come from the rule A → B1 ... Bm.

Hence, we have:

A ⇒ B1B2 ... Bm ⇒* x1B2 ... Bm ⇒* x1x2 ... Bm ⇒* x1x2 ... xm = x

Now we need only note that if x is in N(M), then (q, x, S) ⊢* (q, ε, ε), and by what
we have just proved, S ⇒* x. Hence, x ∊ L(G).

Having proved this result, let us illustrate with an example.

Example 7.2. 

Let G = ({S}, {a, b}, {S → aSb, S → ab}, S) be a CFG.

The corresponding PDA M = ({q}, {a, b}, {S, a, b}, δ, q, S, φ) is constructed.

δ is defined as follows:

1. δ(q, ε, S) contains (q, aSb)

2. δ(q, ε, S) contains (q, ab)

3. δ(q, a, a) contains (q, ε)

4. δ(q, b, b) contains (q, ε)

Consider the leftmost derivation of aaabbb. (Since the grammar considered is
linear (a CFG G is said to be linear if the right-hand side of every rule
contains at most one non-terminal), any sentential form has only one
non-terminal, and so every derivation is leftmost.)

S ⇒ aSb ⇒ aaSbb ⇒ aaabbb

aaabbb is accepted by M in the following manner:

1. Initially, the ID is (q, aaabbb, S).

2. Using an ε-move (mapping 1), the machine goes to (q, aaabbb, aSb).

3. a and a are matched using mapping 3, and we get (q, aabbb, Sb).

4. Using an ε-move (mapping 1), the next ID becomes (q, aabbb, aSbb).

5. a and a are matched. The ID becomes (q, abbb, Sbb).

6. Using mapping 2, the next ID becomes (q, abbb, abbb).

In the next four moves, abbb on the input tape is matched with abbb on the stack,
and the symbols on the stack are removed. After reading aaabbb on the input
tape, the stack is emptied.
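
The single-state PDA of Theorem 7.2 is also easy to animate in code. The following Python sketch (ours) keeps the sentential form on the stack and explores the non-deterministic choices by backtracking; the grammar is the one above, S → aSb | ab:

def cfg_pda_accepts(w, rules, start='S'):
    """Stack holds a sentential form: expand a non-terminal by an
    epsilon-move, or match and pop a terminal against the input."""
    def run(rest, stack):
        if not stack:
            return not rest
        top, below = stack[0], stack[1:]
        if top in rules:                        # epsilon-move: expand
            return any(run(rest, rhs + below) for rhs in rules[top])
        return bool(rest) and rest[0] == top and run(rest[1:], below)
    return run(w, start)

RULES = {'S': ['aSb', 'ab']}
print(cfg_pda_accepts('aaabbb', RULES))   # True
print(cfg_pda_accepts('aaabb', RULES))    # False

(The backtracking terminates here because every expansion of S pushes a terminal that must immediately be matched; for arbitrary grammars a bound on the stack size would be needed.)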

We can also give a different proof for Theorem 7.2.

Proof. Let us assume that L does not contain ε and L = L(G), where G is in
Greibach normal form (GNF): G = (N, T, P, S), where the rules in P are of the
form A → aα, A ∊ N, a ∊ T, α ∊ N*. Then M can be constructed such that N(M)
= L(G): M = ({q}, T, N, δ, q, S, φ), where δ is defined as follows: if A → aα is a
rule, δ(q, a, A) contains (q, α). M simulates a leftmost derivation in G, and the
equivalence L(G) = N(M) can be proved using induction. If ε ∊ L, then we can
have a grammar G in GNF with an additional rule S → ε, where S will not appear
on the right-hand side of any production. In this case, M can have one ε-move,
defined by δ(q, ε, S) contains (q, ε), which will enable it to accept ε.

Now, we shall construct a CFG G, given a PDA M such that

L(G) = N(M).

Theorem 7.3

If L is accepted by a PDA, then L can be generated by a CFG.


Proof. Let L be accepted by a PDA M = (K, Σ, Γ, δ, q0, Z0, φ) by empty store.

Construct a CFG G = (N, T, P, S) as follows:

N = {[q, Z, p]|q, p ∊ K, Z ∊ Γ} ∪ {S}.

P is defined as follows: S→ [q0, Z0, q] ∊ P for each q in K.

If δ(q, a, A) contains (p, B1 ... Bm) (a ∊ Σ ∪ {ε}) is a mapping, then P includes


rules of the form:

[q, A, qm] → a[p, B1, q1][q1, B2, q2] ... [qm − 1, Bm, qm] qi ∊ K, 1 ≤ i ≤ m

If δ (q, a, A) contains (p, ε) then P includes [q, A, p] → a

Now, we show that L(G) = N(M)(= L).

It should be noted that the variables and productions in the grammar are defined
in such a way that the moves of the PDA are simulated by a leftmost derivation
in G.

We prove that:

[q, A, p] ⇒* x if and only if (q, x, A) ⊢* (p, ε, ε)

That is, [q, A, p] derives x if and only if the PDA, on reading x, goes from
state q to state p while the stack, initially with A on the top, ends with A
removed from the stack (in between, the stack can grow and come down). See Figure 7.1.

This is proved by induction on the number of moves of M.

(i) If (q, x, A) ⊢* (p, ε, ε), then [q, A, p] ⇒* x.

Basis
If (q, x, A) ⊢ (p, ε, ε) x = a or ε, where a ∊ Σ and there should be a
mapping δ(q, x, A) contains (p, ε).

In this case, by our construction [q, A, p] → x is in P.

Hence, [q, A, p] ⇒ x.

Induction
Suppose the result holds up to n − 1 steps.

Let (q, x, A) ⊢* (p, ε, ε) in n steps.

Now, we can write x = ax′, a ∊ Σ ∪ {ε}, and the first move is (q, ax′, A) ⊢ (q1, x
′, B1 ... Bm).

This must have come from a mapping δ(q, a, A) contains (q1, B1 ... Bm), and
there is a corresponding rule:

Equation 7.3. 

[q, A, qm+1] → a[q1, B1, q2][q2, B2, q3] ... [qm, Bm, qm+1], for all choices of q2, ..., qm+1 ∊ K

The stack contains A initially, and A is replaced by B1 ... Bm. Now, the string x′
can be written as x1x2 ... xm such that the PDA completes
reading x1 when B2 becomes the top of the stack, completes
reading x2 when B3 becomes the top of the stack, and so on.

The situation is described in Figure 7.1.

Therefore, (qi, xi, Bi) ⊢* (qi+1, ε, ε) for suitable states qi, and each of these
happens in less than n steps.

So, by the induction hypothesis:

Equation 7.4. 

[qi, Bi, qi+1] ⇒* xi, 1 ≤ i ≤ m

Putting qm+1 = p in Equation (7.3), we get:

[q, A, p] ⇒ a[q1, B1, q2] ... [qm, Bm, p] ⇒* ax1x2 ... xm = x, by Equation (7.4)

Therefore, [q, A, p] ⇒* x in G.

(ii) If [q, A, p] ⇒* x in G, then (q, x, A) ⊢* (p, ε, ε).

The proof is by induction on the number of steps in the derivation in G.

Basis
If [q, A, p] ⇒ x, then x = a or ε, where a ∊ Σ, and [q, A, p] → x is a rule in P. This
must have come from the mapping δ(q, x, A) contains (p, ε), and hence (q, x,
A) ⊢ (p, ε, ε).

Induction
Suppose the hypothesis holds up to (n − 1) steps, and suppose [q, A, p] ⇒* x
in n steps. The first rule applied in the derivation must be of the form:

Equation 7.5. 

[q, A, p] → a[q1, B1, q2][q2, B2, q3] ... [qm, Bm, qm+1], with qm+1 = p

and x can be written in the form x = ax1 ... xm

such that [qi, Bi, qi+1] ⇒* xi.

Each of these derivations must have taken less than n steps, and so by the induction
hypothesis:

Equation 7.6. 

(qi, xi, Bi) ⊢* (qi+1, ε, ε), 1 ≤ i ≤ m

Equation (7.5) must have come from a mapping δ(q, a, A) contains
(q1, B1 ... Bm). Therefore,

(q, ax1 ... xm, A) ⊢ (q1, x1 ... xm, B1 ... Bm)

  ⊢* (q2, x2 ... xm, B2 ... Bm)

  ⊢* (q3, x3 ... xm, B3 ... Bm)

  ⋮

  ⊢* (qm − 1, xm − 1xm, Bm − 1Bm)

  ⊢* (qm, xm, Bm)

  ⊢* (p, ε, ε)

Hence, (q, x, A) ⊢* (p, ε, ε). Having proved that [q, A, p] ⇒* x if and only
if (q, x, A) ⊢* (p, ε, ε), we can easily see that S ⇒* w if and only
if (q0, w, Z0) ⊢* (p, ε, ε) for some p ∊ K.

This means w is generated by G if and only if w is accepted by M by empty
store.

Hence, L(G) = N(M).

Let us illustrate the construction with an example.

Example 7.3. 

Construct a CFG to generate N(M) where

M = ({p, q}, {0, 1}, {X, Z0}, δ, q, Z0, φ)


where δ is defined as follows:

1. δ(q, 1, Z0) = {(q, XZ0)}

2. δ(q, 1, X) = {(q, XX)}

3. δ(q, 0, X) = {(p, X)}

4. δ(q, ε, Z0) = {(q, ε)}

5. δ(p, 1, X) = {(p, ε)}

6. δ(p, 0, Z0) = {(q, Z0)}
It can be seen that:

N(M) = {1^n 0 1^n 0 | n ≥ 1}*

The machine, while reading 1^n, adds X's to the stack and, when it reads a 0,
changes to state p. In state p, it reads 1^n again, removing the X's from the stack.
When it reads the next 0, it goes back to q, keeping Z0 on the stack. It can then
remove Z0 using mapping 4, or repeat the above process several times. Initially
also, Z0 can be removed using mapping 4 without reading any input; hence ε
will also be accepted.

G = (N, T, P, S) is constructed as follows:

T = Σ

N = {[q, Z0, q], [q, X, q], [q, Z0, p], [q, X, p], [p, Z0, q], [p, X, q], [p, Z0, p], [p, X, p]} ∪{S}

Initial rules are:

r1. S → [q, Z0, q]
r2. S → [q, Z0, p]

Next, we write the rules for the mappings.

Corresponding to mapping 1, we have the rules:

r3. [q, Z0, q] → 1[q, X, q] [q, Z0, q]

r4. [q, Z0, q] → 1[q, X, p] [p, Z0, q]

r5. [q, Z0, p] → 1[q, X, q] [q, Z0, p]

r6. [q, Z0, p] → 1[q, X, p] [p, Z0, p]

Corresponding to mapping 2, we have the rules:

r7. [q, X, q] → 1[q, X, q] [q, X, p]

r8. [q, X, q] → 1[q, X, p] [p, X, q]

r9. [q, X, p] → 1[q, X, q] [q, X, p]

r10. [q, X, p] → 1[q, X, p] [p, X, p]

Corresponding to mapping 3, we have the rules:

r11. [q, X, q] → 0[p, X, q]

r12. [q, X, p] → 0[p, X, p]

Corresponding to mapping 4, we have the rule:

r13. [q, Z0, q] → ε

Corresponding to mapping 5, we have the rule:

r14. [p, X, p] → 1

Corresponding to mapping 6, we have the rule:


r15. [p, Z0, q] → 0[q, Z0, q]

r16. [p, Z0, p] → 0[q, Z0, p]

So, we have ended up with 16 rules. Let us see whether we can remove some
useless non-terminals and rules here.

There is no rule with [p, X, q] on the left-hand side, so the rules involving it,
r8 and r11, can be removed. Once r8 and r11 are removed, the only rule with [q, X,
q] on the left-hand side is r7, which creates another [q, X, q] whenever applied,
so such a derivation can never terminate. Hence, the rules involving [q, X, q],
namely r3, r5, r7, and r9, can be removed. Now, we are left with
rules r1, r2, r4, r6, r10, r12, r13, r14, r15, r16. If we start with r2, r6 can be
applied, and [q, Z0, p] introduces [p, Z0, p] into the sentential form. Then r16 can
be applied, which reintroduces [q, Z0, p], so the derivation will not
terminate. Hence, [q, Z0, p] and the rules involving it, r2, r6, and r16, can be
removed. So, we end up with
rules r1, r4, r10, r12, r13, r14, r15. Using non-terminals:

A for [q, Z0, q]

B for [q, X, p]

C for [p, Z0, q]

D for [p, X, p]

the rules can be written as:

S → A

A → 1BC

B → 1BD

B → 0D

A → ε

D → 1
C → 0A

It can be easily checked that this grammar generates {1n 0 1n 0 | n ≥ 1}*.
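The construction can also be corroborated mechanically. The following is a minimal sketch (our own illustration, not part of the text): a breadth-first simulation of a nondeterministic PDA accepting by empty stack, applied to the machine M of Example 7.3. Here Z stands for Z0, the empty string stands for ε, and the table format and function name are our own conventions.

from collections import deque

def accepts_by_empty_stack(delta, start, bottom, w, fuel=100000):
    # A configuration is (state, remaining input, stack); stack[0] is the top.
    # 'delta' maps (state, input symbol or "", stack top) to a list of
    # (next state, string pushed in place of the top).
    queue, seen = deque([(start, w, bottom)]), set()
    while queue and fuel:
        fuel -= 1
        q, x, st = queue.popleft()
        if x == "" and st == "":
            return True                      # input consumed, stack emptied
        if st == "" or (q, x, st) in seen:
            continue                         # dead or repeated configuration
        seen.add((q, x, st))
        top, rest = st[0], st[1:]
        for p, push in delta.get((q, "", top), []):        # ε-moves
            queue.append((p, x, push + rest))
        if x:
            for p, push in delta.get((q, x[0], top), []):  # reading moves
                queue.append((p, x[1:], push + rest))
    return False

# The six mappings of Example 7.3:
delta = {
    ("q", "1", "Z"): [("q", "XZ")],   # mapping 1
    ("q", "1", "X"): [("q", "XX")],   # mapping 2
    ("q", "0", "X"): [("p", "X")],    # mapping 3
    ("q", "",  "Z"): [("q", "")],     # mapping 4
    ("p", "1", "X"): [("p", "")],     # mapping 5
    ("p", "0", "Z"): [("q", "Z")],    # mapping 6
}
for s in ["", "1010", "110110", "1101101010", "10", "1100"]:
    print(repr(s), accepts_by_empty_stack(delta, "q", "Z", s))
# the first four (members of {1^n 0 1^n 0}*) print True; "10" and "1100" print False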

Problems and Solutions


1. Let L = {aibjck | i, j, k ≥ 1 and i + j = k}.
(i) Find a PDA (which accepts via final state) that recognizes L.
(ii) Find a PDA (which accepts via empty stack) that recognizes L.

Solution. Hint: push a's and b's into the stack and match them with each c, clearing the stack.
The PDA M = ({q0, q1, q2, q3}, {a, b, c}, {a, b, $}, δ, q0, $, {q3}).
Transitions are:

δ(q0, a, $) contains (q0, a$)

δ(q0, a, a) contains (q0, aa)

δ(q0, b, a) contains (q1, ba)

δ(q1, b, b) contains (q1, bb)

δ(q1, c, b) contains (q2, ε)

δ(q2, c, a) contains (q2, ε)

δ(q2, c, b) contains (q2, ε)

δ(q2, ε, $) contains (q3, ε)

The machine M above answers both parts: with q3 as the final state it accepts L by final state, and, with all the transitions remaining the same, it accepts L by empty stack as well. Note that this is a DPDA.
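With the empty-stack simulator sketched after Example 7.3 (again our own illustration; '$' is the bottom-of-stack marker), the machine can be checked on a few strings:

delta1 = {
    ("q0", "a", "$"): [("q0", "a$")],
    ("q0", "a", "a"): [("q0", "aa")],
    ("q0", "b", "a"): [("q1", "ba")],
    ("q1", "b", "b"): [("q1", "bb")],
    ("q1", "c", "b"): [("q2", "")],
    ("q2", "c", "a"): [("q2", "")],
    ("q2", "c", "b"): [("q2", "")],
    ("q2", "",  "$"): [("q3", "")],
}
for s in ["abcc", "aabbbccccc", "abc", "aabcc"]:
    print(repr(s), accepts_by_empty_stack(delta1, "q0", "$", s))
# abcc and aabbbccccc (i + j = k) print True; abc and aabcc print False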

2. Find a PDA for L = {x ∊ {a, b, c}* | |x|a + |x|b = |x|c}.

Solution. This example is slightly different from the previous one, as occurrences of a's, b's, and c's can interleave.
The transitions are:

δ(q0, a, $) contains (q0, a$)



δ(q0, b, $) contains (q0, b$)



δ(q0, c, $) contains (q0, c$)



δ(q0, a, a) contains (q0, aa)



δ(q0, a, b) contains (q0, ab)



δ(q0, b, a) contains (q0, ba)



δ(q0, b, b) contains (q0, bb)



δ(q0, c, c) contains (q0, cc)



δ(q0, a, c) contains (q0, ε)



δ(q0, b, c) contains (q0, ε)



δ(q0, c, a) contains (q0, ε)



δ(q0, c, b) contains (q0, ε)



δ(q0, ε, $) contains (qf, ε)



qf is the final state.


3. Give an example of a finite language that cannot be recognized by any one-state PDA
that accepts via final state.

Solution. L = {abc}.

Reason: the single state of such a PDA must be a final state if anything is to be accepted; but then, if abc is accepted in state q0, the machine is also in the final state q0 after reading the prefixes ε, a, and ab, so these are accepted as well.

4. Let L = {anbncmdm | n, m ≥ 1}. Find a PDA that accepts L.

Solution. M = ({q0, q1, q2, q3, q4}, {a, b, c, d}, {a, c, $}, δ, q0, $, {q4}).
Transitions are:

δ(q0, a, $) contains (q0, a$)



δ(q0, a, a) contains (q0, aa)



δ(q0, b, a) contains (q1, ε)



δ(q1, b, a) contains (q1, ε)



δ(q1, c, $) contains (q2, c$)




δ(q2, c, c) contains (q2, cc)

δ(q2, d, c) contains (q3, ε)



δ(q3, d, c) contains (q3, ε)



δ(q3, ε, $) contains (q4, ε)


This machine accepts L by empty stack. Taking q4 as the final state L is accepted by
final state also.

5. Design a PDA to accept each of the following languages.

(a) The set of all strings of balanced parentheses, i.e., each left parenthesis has a matching right parenthesis and pairs of matching parentheses are properly nested.

(b) The set of all non-palindromes over {a, b}.

Solution.
(a) M = ({q0, q1}, {(, )}, {X, $}, δ, q0, $, φ)

δ is given by:

δ(q0, (, $) contains (q0, X$)



δ(q0, (, X) contains (q0, X X)



δ(q0, ), X) contains (q0, ε)



δ(q0, ε, $) contains (q1, ε)


The acceptance is by empty stack.


(b) M = ({q0, q1, q2}, {a, b}, {0, 1, $}, δ, q0, $, φ)

(Note that one-symbol strings are palindromes and must not be accepted, so no move is provided for them.)

First half: Push 0 when a is seen and 1 when b is seen.


δ(q0, a,$) contains (q0, 0$)


δ(q0, a, 0) contains (q0, 00)
δ(q0, a, 1) contains (q0, 01)
δ(q0, b, $) contains (q0, 1$)
δ(q0, b, 0) contains (q0, 10)
δ(q0, b, 1) contains (q0, 11)

Guessing the middle of the string:

Change state. For strings with odd length, the middle symbol is read without changing the stack:
δ(q0, a, 0) contains (q1, 0)
δ(q0, a, 1) contains (q1, 1)
δ(q0, b, 0) contains (q1, 0)
δ(q0, b, 1) contains (q1, 1)

For strings with even length: if the innermost pair matches, pop 0 when a is seen and pop 1 when b is seen.
δ(q0, a, 0) contains (q1, ε)
δ(q0, b, 1) contains (q1, ε)
If the innermost pair itself mismatches, pop the symbol and go directly to q2:
δ(q0, a, 1) contains (q2, ε)
δ(q0, b, 0) contains (q2, ε)

Keep popping for (a, 0) or (b, 1) combination


δ(q1, a, 0) contains (q1, ε)


δ(q1, b, 1) contains (q1, ε)

If 0 is not found when a is seen or 1 is not found when b is seen, then pop the symbol
and change state to q2. Then continue to clear out all the symbols including $.

δ(q1, a, 1) contains (q2, ε)


δ(q1, b, 0) contains (q2, ε)
δ(q2, a, 0) contains (q2, ε)
δ(q2, b, 0) contains (q2, ε)
δ(q2, a, 1) contains (q2, ε)
δ(q2, b, 1) contains (q2, ε)

δ(q2, ε, $) contains (q2, ε)
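Using the empty-stack simulator sketched after Example 7.3 (our own illustration; the two mismatch-at-the-middle moves discussed above are included), the machine can be tested against a few sample strings:

delta_np = {
    ("q0", "a", "$"): [("q0", "0$")],
    ("q0", "b", "$"): [("q0", "1$")],
    # push / odd-middle skip / even-middle match-pop or mismatch-to-q2
    ("q0", "a", "0"): [("q0", "00"), ("q1", "0"), ("q1", "")],
    ("q0", "a", "1"): [("q0", "01"), ("q1", "1"), ("q2", "")],
    ("q0", "b", "0"): [("q0", "10"), ("q1", "0"), ("q2", "")],
    ("q0", "b", "1"): [("q0", "11"), ("q1", "1"), ("q1", "")],
    ("q1", "a", "0"): [("q1", "")],
    ("q1", "b", "1"): [("q1", "")],
    ("q1", "a", "1"): [("q2", "")],
    ("q1", "b", "0"): [("q2", "")],
    ("q2", "a", "0"): [("q2", "")], ("q2", "a", "1"): [("q2", "")],
    ("q2", "b", "0"): [("q2", "")], ("q2", "b", "1"): [("q2", "")],
    ("q2", "",  "$"): [("q2", "")],
}
for s in ["ab", "aabb", "abab", "aab", "a", "aba", "abba"]:
    print(repr(s), accepts_by_empty_stack(delta_np, "q0", "$", s))
# the non-palindromes ab, aabb, abab, aab print True;
# the palindromes a, aba, abba print False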


6. Find a PDA for

L = {x0y1 | x, y ∊ {0, 1}*, |x| = |y|}

Solution. M = ({q0, q1, q2}, {0, 1}, {X, $}, δ, q0, $, φ), where δ is given by:
δ(q0, 0, $) contains (q0, X$)
δ(q0, 0, X) contains (q0, X X)
δ(q0, 1, $) contains (q0, X$)
δ(q0, 1, X) contains (q0, X X)
δ(q0, 0, $) contains (q1, $)
δ(q0, 0, X) contains (q1, X)
δ(q1, 0, X) contains (q1, ε)
δ(q1, 1, X) contains (q1, ε)
δ(q1, 1, $) contains (q2, ε)
Acceptance is by empty stack. We can also look at it as acceptance by final state by
taking q2 as the final state.

Exercises
1. Design a PDA which accepts strings of the form 1*0n1n and one which accepts strings which
contain twice as many zeros as ones.

2. Find PDAs which accept the sets of strings composed of zeros and ones which are:
1. not of the form ww
2. of the form 0n1n or 0n12n

3. Show that the language {0m 1n|m ≤ n ≤ 2m} is context-free by giving a PDA that accepts it.

4. Let G = (N, Σ, P, S) be a CFG with P = {S → aSA|aAA|b, A → bBBB, B → b}. Construct a PDA which accepts L(G) by final state.

5. Let M be a PDA such that M = (K, Σ, Γ, δ, q0, Z, F) where K = {q0, q1, q2, q3}, Σ = {a, b}, Γ
= {A, Z}, F = φ
δ: δ (q0, a, Z) = (q0, AZ)
δ(q0, b, A) = (q1, ε)
δ(q0, a, A) = (q3, ε)
δ(q1, ε, Z) = (q2, ε)
δ(q3, ε, Z) = (q0, AZ)
Construct a CFG accepting L(M).

6. Construct a DPDA for the following languages.


1.
{anbn|n ≥ 1}
2.
3.
{anbmcn + m|n, m > 0}
4.

7. Show that if L is a CFL and ε ∉ L, then there is a PDA M accepting L by final state such
that M has at most two states and makes no ε-moves.

8. A pushdown transducer (PDT) M is an 8-tuple (K, Σ, Γ, Δ, δ, q0, Z0, F), where all the


symbols have the same meaning as for a PDA except that Δ is a finite output alphabet
and δ is a mapping from K × (Σ ∪ {ε}) × Γ to finite subsets of K × Γ* × Δ*.
Any configuration of M will be a 4-tuple (q, x, a, y) where q, x, a are as in any PDA and y is
the output string at this point of time. The output happens on a separate tape. It never goes
back on this tape for any change. The machine may write a string as part of each instruction.
Design an automaton that changes infix arithmetic expression to postfix expression.

9. The PDAs defined in this chapter move one way only on the input tape. By allowing
two-way movement on the input tape, with end markers on the tape, one can
define a two-way PDA or 2PDA. Show that the following languages are recognized by a
2PDA.
1. {anbncn | n ≥ 1}
2. {ww | w ∊ {0, 1}*}

10. For a PDA M, let there exist a constant k such that M can never have more than k symbols on its pushdown stack at any time. Show that L(M) is a regular language.

11. Find a grammar generating L(M), where M = ({q0, q1, q2}, {a, b}, {Z0, A}, δ, q0, Z0, {q2}) and δ is given by:

δ(q0, a, Z0) = {(q1, AZ0)}

δ(q0, a, A) = {(q1, AA)}

δ(q1, a, A) = {(q0, AA)}

δ(q1, ε, A) = {(q2, AA)}

δ(q2, b, A) = {(q2, ε)}

12. A PDA M = (Q, Σ, Γ, δ, q0, Z0, F) is said to be a single-turn PDA if
whenever (q0, w, Z0) ⊢* (q1, w1, γ1) ⊢* (q2, w2, γ2) ⊢* (q3, w3, γ3) and |γ2| < |γ1|, then |γ3| ≤ |γ2|. That is, once the stack starts to decrease in height, it never increases in height.
Show that a language is generated by a linear CFG if and only if it is accepted by a single-turn PDA.


Chapter 8. Context-Free
Grammars–Properties and Parsing
The pumping lemma for regular sets presented in an earlier chapter
was used to prove that some languages are non-regular. Now, we
give another pumping lemma, for context-free languages (CFL),
whose application will be to show that some languages are not
context-free. The idea behind this lemma is that longer strings in a
CFL have substrings which can be pumped to get an infinite number
of strings in the language.

Pumping Lemma for CFL


Theorem 8.1

Let L be a CFL. Then there exists a number k (pumping length)
such that if w is a string in L of length at least k, then w can be
written as w = uυxyz satisfying the following conditions:

1. |υy| > 0

2. |υxy| ≤ k

3. For each i ≥ 0, uυixyiz ∊ L

Proof. Let G be a context-free grammar (CFG) in Chomsky normal
form (CNF) generating L. Let n be the number of non-terminals
of G. Take k = 2^n. Let s be a string in L such that |s| ≥ k. Any parse
tree in G for s must be of depth at least n + 1. This can be seen as
follows:

If the parse tree has depth n, it has no path of length greater
than n; then the maximum length of the word derived is 2^(n−1). This
statement can be proved by induction. If n = 1, the tree is a root with
a single leaf, so the word derived has length 1 = 2^0. Assuming that
the result holds up to depth i − 1, consider a tree T of depth i. No path in
this tree is of length greater than i, and since G is in CNF, the root
has at most two subtrees T1 and T2.

T1 and T2 have depth at most i − 1, and the maximum length of the word
derivable in each is 2^(i−2), so the maximum length of the string
derivable in T is 2^(i−2) + 2^(i−2) = 2^(i−1).

Choose a parse tree for s that has the least number of nodes.
Consider the longest path in this tree. This path is of length at least
n + 1. Then, there must be at least n + 1 occurrences of non-terminals
along this path. Consider the nodes in this path starting
from the leaf node and going up towards the root. By the pigeon-hole
principle, some non-terminal occurring on this path must repeat.
Consider the first pair of occurrences of the repeating non-terminal A (say)
while reading along the path from bottom to top.
In Figure 8.1, the repetition of A thus identified allows us to replace
the subtree under the second occurrence of the non-terminal A with
the subtree under the first occurrence of A. The legal parse trees
are given in Figure 8.1.
Figure 8.1. Derivation trees showing pumping property

We divide s as uυxyz as in Figure 8.1(i). Each occurrence of A has


a subtree, under it generating a substring of s. The occurrence
of A near the root of the tree generates the string ‘υxy’ where the
second occurrence of A produces x. Both the occurrences
of A produce substrings of s. Hence, one can replace the
occurrence of A that produces x by a parse tree that
produces υxy as shown in Figure 8.1(ii). Hence, strings of the
form uυixyiz, for i > 0, are generated. One can replace the subtree
rooted at A which produces ‘υxy’ by a subtree which produces x, as
in Figure 8.1(iii). Hence, the string ‘uxz’ is generated. In essence,

we have S ⇒* uAz, A ⇒* υAy, and A ⇒* x.

Hence, A ⇒* υixyi for each i ≥ 0.

Therefore, we have S ⇒* uυixyiz for each i ≥ 0.
Both υ and y simultaneously cannot be empty as we consider the
grammar in CNF. The lower A will occur in the left or right subtree.
If it occurs in the left subtree, y cannot be ε and if it occurs in the
right subtree, υ cannot be ε.

The length of υxy is at most k, because the first occurrence
of A generates υxy and the next occurrence generates x. The
number of non-terminal occurrences between these two
occurrences of A is less than n + 1. Hence, the length of υxy is at most
2^n (= k). Hence the proof.

One can use pumping lemma for showing that some languages are
not context-free. The method of proof will be similar to that of
application of pumping lemma for regular sets.

Example 8.1. 

Show that L = {anbncn|n ≥ 0} is not context-free.

Proof. Suppose L is context-free. Let p be the pumping length.


Choose s = apbpcp. Clearly, |s| > p. Then, s can be pumped and all
the pumped strings must be in L. But we show that they are not.
That is, we show that s can never be divided as uυxyz such
that uυixyiz is in L for all i ≥ 0.

υ and y are not both empty. If υ or y contains more
than one type of symbol, then uυ2xy2z is not of the
form a*b*c*. If υ and y each contain only one type of symbol,
then uυ2xy2z cannot contain equal numbers of a's, b's, and c's,
or uxz has unequal numbers of a's, b's, and c's. Thus, a
contradiction arises.

Hence, L is not a CFL.

Closure Properties of CFL


In this section, we investigate the closure of CFLs under some
operations like union, intersection, difference, substitution,
homomorphism, and inverse homomorphism etc. The first result
that we will prove is closure under substitution, using which we
establish closure under union, catenation, catenation closure,
catenation +, and homomorphism.

Theorem 8.2

Let L be a CFL over T, and let σ be a substitution on T such that
σ(a) is a CFL for each a in T. Then σ(L) is a CFL.

Proof. Let G = (N, T, P, S) be a CFG generating L. Since σ(a) is a


CFL, let Ga = (Na, Ta, Pa, Sa) be a CFG generating σ(a) for
each a ∊ T. Without loss of
generality, Na ∩ Nb = φ and Na ∩ N = φ for a ≠ b, a, b ∊T. We now
construct a CFG G′= (N′, T′, P′, S′) which generates σ (L) as
follows:

1. N′ is the union of the Na, a ∊ T, and N.

2. T′ is the union of the Ta, a ∊ T, and S′ = S.

3. P′ consists of:

all productions in Pa for a ∊ T;

all productions in P, but with each terminal a occurring in any rule of P replaced

Any derivation tree of G′ will typically look as in the following figure
(Figure 8.2).
Figure 8.2. A derivation tree showing a string obtained by substitution

Here ab ... k is a string of L and xaxb ... xk is a string of σ(L), where
xa ∊ σ(a), xb ∊ σ(b), ..., xk ∊ σ(k). To
understand the working of G′ producing σ(L), we have the following
discussion:

A string w is in L(G′) if and only if w is in σ(L). Suppose w is in σ(L).

Then, there is some string x = a1 ... ak in L and strings xi in σ(ai), 1
≤ i ≤ k, such that w = x1 ... xk. Clearly, from the construction of G′,
Sa1 ... Sak is generated (since a1 ... ak ∊ L). From each Sai, the
string xi ∊ σ(ai) is generated. This becomes clear from the above
picture of the derivation tree. Since G′ includes the productions
of each Gai, x1 ... xk belongs to L(G′).

Conversely, suppose w ∊ L(G′); we understand this direction with the
help of the parse tree constructed above. The start symbols
of G and G′ are both S, and the non-terminal sets N and the Na are all disjoint.
Hence, in any derivation of w from S in G′, the (modified) productions of G
first derive a sentential form Sa1 ... Sak, where a1a2 ... ak ∊ L(G), and
each Sai then derives a string xi ∊ σ(ai) using the productions of Gai.
Thus, whenever w has a parse tree in G′, one can identify a
string a1a2 ... ak in L(G) and strings xi in σ(ai) such that w = x1 ... xk.
Since w is formed by substituting the strings xi for the ai,
we conclude w ∊ σ(L).
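The construction lends itself to a direct implementation. Below is a minimal sketch (our own illustration, under an assumed representation: a grammar is a dict mapping each non-terminal to a list of right-hand sides, each a tuple of symbols, the non-terminals being exactly the dict keys; name disjointness is forced by tagging):

def substitute(g, sub):
    # 'sub' maps each terminal a to a pair (g_a, S_a); the non-terminals
    # of g_a are tagged with "@a" so that all name sets are disjoint.
    new = {}
    for a, (ga, sa) in sub.items():
        for A, rhss in ga.items():
            new[A + "@" + a] = [tuple(s + "@" + a if s in ga else s
                                      for s in rhs) for rhs in rhss]
    for A, rhss in g.items():            # replace each terminal a by S_a
        new[A] = [tuple(sub[s][1] + "@" + s if s in sub else s
                        for s in rhs) for rhs in rhss]
    return new

g  = {"S": [("a", "S", "b"), ("a", "b")]}     # L = {a^n b^n | n >= 1}
ga = {"T": [("c", "T"), ("c",)]}              # sigma(a) = c^+
gb = {"U": [("d",)]}                          # sigma(b) = {d}
print(substitute(g, {"a": (ga, "T"), "b": (gb, "U")}))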

Remark One can use the substitution theorem of CFLs to prove


the closure of CFLs under other operations.

Theorem 8.3
CFLs are closed under union, catenation, catenation
closure (*), catenation +, and homomorphism.

Proof

1. Union: Let L1 and L2 be two CFLs. Let L = {1, 2} and σ(1) = L1, σ(2) = L2. Clearly, σ(L) = σ(1) ∪ σ(2) = L1 ∪ L2 is a CFL by the above theorem.

2. Catenation: Let L1 and L2 be two CFLs. Let L = {12}, σ(1) = L1, and σ(2) = L2. Clearly, σ(L) = σ(1)σ(2) = L1L2 is a CFL as in the above case.

3. Catenation closure (*): Let L1 be a CFL. Let L = {1}* and σ(1) = L1. Clearly, L1* = σ(L) is a CFL.

4. Catenation +: Let L1 be a CFL. Let L = {1}+ and σ(1) = L1. Clearly, L1+ = σ(L) is a CFL.

5. Homomorphism: This follows as homomorphism is a particular case of substitution. (A mechanical sketch of the union and catenation constructions is given below.)

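In the same dict representation as the substitution sketch above (again our own illustration), union and catenation amount to renaming the non-terminals apart and adding a fresh start symbol:

def rename(g, start, tag):
    h = {A + tag: [tuple(s + tag if s in g else s for s in rhs)
                   for rhs in g[A]] for A in g}
    return h, start + tag

def union(g1, s1, g2, s2):
    h1, t1 = rename(g1, s1, "_1")
    h2, t2 = rename(g2, s2, "_2")
    return {**h1, **h2, "S": [(t1,), (t2,)]}, "S"   # S -> S1 | S2

def catenation(g1, s1, g2, s2):
    h1, t1 = rename(g1, s1, "_1")
    h2, t2 = rename(g2, s2, "_2")
    return {**h1, **h2, "S": [(t1, t2)]}, "S"       # S -> S1 S2

g1 = {"S": [("a", "S", "b"), ("a", "b")]}           # {a^n b^n}
g2 = {"S": [("c", "S", "d"), ("c", "d")]}           # {c^m d^m}
print(union(g1, "S", g2, "S")[0]["S"])              # [('S_1',), ('S_2',)]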
Theorem 8.4

If L is a CFL then LR is CFL.

Proof. Let L = L(G), where G = (N, T, P, S) is a CFG
generating L. Let GR = (N, T, PR, S) be a new grammar constructed
with PR = {A → αR | A → α ∊ P}. Clearly, one can show by induction
on the length of the derivation that L(GR) = LR.
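In the dict representation used above (our own illustration), GR is obtained by reversing every right-hand side:

def reverse_grammar(g):
    return {A: [tuple(reversed(rhs)) for rhs in g[A]] for A in g}

print(reverse_grammar({"S": [("a", "S", "b"), ("a", "b")]}))
# {'S': [('b', 'S', 'a'), ('b', 'a')]}  -- generates {b^n a^n} = L^R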

Theorem 8.5

CFLs are not closed under intersection and complementation.

Proof. Let L1 = {anbncm|n, m ≥ 1} and L2 = {ambncn|n, m ≥ 1}.

Clearly, L1 and L2 are CFLs. (Exercise: Give CFG’s for L1 and L2).

L1 ∩ L2 = {anbncn|n ≥ 1} which has been shown to be non-context-


free. Hence, CFLs are not closed under ∩.

For non-closure under complementation: if CFLs were closed under
complementation, then for any two CFLs L1 and L2, L1 ∩ L2 =
(L1c ∪ L2c)c would be a CFL, i.e., CFLs would be closed under
intersection, which is a contradiction.

Remark. CFLs are not closed under difference, i.e., L1 − L2 need
not be a CFL for all CFLs L1 and L2. If L1 − L2 were a CFL for all
CFLs L1 and L2, then taking L1 = Σ* we would get that
Σ* − L2 is a CFL for every CFL L2, which we know is not true.

Even though intersection of two CFLs need not be context-free, an


attempt is made to look for the closure under intersection of a CFL
with a regular language. This is a weaker claim.

Theorem 8.6

If L is a CFL and R is a regular language, then L ∩ R is a CFL.

Proof. Let M = (K, Σ, Γ, δ, q0, Z0, F) be a pushdown automaton
(PDA) such that T(M) = L, and let A = (Q, Σ, δA, p0, FA) be a DFSA
such that T(A) = R. A new PDA M′ is constructed by
combining M and A such that the new automaton simulates the
actions of M and A on an input in parallel. Hence, the new PDA M′ will
be M′ = (K × Q, Σ, Γ, δ′, [q0, p0], Z0, F × FA), where δ′
is defined as follows: δ′([p, q], a, X) contains ([r, s], γ)
whenever δA(q, a) = s and δ(p, a, X) contains (r, γ).

Clearly, for each move of the PDA M′, there exists a move by the
PDA M and a move by A. The input a may be in Σ or a = ε.
When a is in Σ, s = δA(q, a), and when a = ε, s = q, i.e., A does
not change its state while M′ makes a transition on ε.

To prove L(M′) = L ∩ R, we can show that ([q0, p0], w, Z0) ⊢* ([q, s], ε, γ) in M′ if
and only if (q0, w, Z0) ⊢* (q, ε, γ) in M and δA(p0, w) = s (extending δA to strings). The
proof is by induction on the number of derivation steps and is
similar to that of the closure of regular languages with respect to
intersection. If q ∊ F and s ∊ FA, then w belongs to both L and R.
Therefore, M′ accepts L ∩ R.
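The definition of δ′ is mechanical. A sketch (our own illustration, in the PDA-table format of the simulator after Example 7.3, with the DFSA given as a dict (state, symbol) → state; acceptance here would be by final state F × FA, so the empty-stack simulator does not apply directly):

def product_delta(delta, dfa, dfa_states):
    dprime = {}
    for (p, a, X), moves in delta.items():
        for q in dfa_states:
            s = q if a == "" else dfa.get((q, a))   # ε-moves keep the DFSA state
            if s is None:
                continue
            dprime.setdefault(((p, q), a, X), []).extend(
                ((r, s), push) for r, push in moves)
    return dprime

# Example: pair Example 7.3's machine with a DFSA tracking the parity of 0's.
dfa = {("e", "0"): "o", ("o", "0"): "e", ("e", "1"): "e", ("o", "1"): "o"}
dp = product_delta(delta, dfa, ["e", "o"])   # 'delta' as in the earlier sketch
print(len(dp), "product transitions")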

Remark. Since L ∩ R is context-free when L is a CFL and R is
regular, and since the complement R̄ of a regular set is regular,
L − R = L ∩ R̄ is also a CFL.
Theorem 8.7

Family of CFLs is closed under inverse homomorphism.

Proof. Let L be a CFL over Σ′. Let h be a homomorphism from Σ to


Σ′. To prove h−1 (L) is CFL. We give the proof using PDA which
accepts h−1(L). Let M = (K, Σ′, Γ, δ, q0, Z0, F) be a PDA such
that T(M) = L. We now construct a PDA M′ such that T(M′) = h−1(L).
Let M′ = (K′, Σ, Γ, δ′, (q0, ε), Z0, F × {ε}), where

1. K′ = {(q, x) | q ∊ K, x is a suffix (need not be proper) of some string h(a) for a ∊ Σ}.

2. δ′ is defined to simulate δ and the action of h:

(a) δ′((q, ε), a, X) = {((q, h(a)), X)} for all a ∊ Σ, all states q ∊ K, X ∊ Γ. Here a cannot be ε.

(b) If δ(q, b, X) contains (p, γ), where b ∊ Σ′ or b = ε, then δ′((q, bx), ε, X) contains ((p, x), γ).

3. The accepting states are of the form (f, ε), where f ∊ F.

From the construction of M′, one can see that, in the new PDA M′:

1. a buffer is added to the finite state set;

2. whenever the buffer is empty, it can be filled with h(a), where a is the next input symbol, keeping the stack and the state of M intact;

3. whenever the buffer is not empty, the non-ε-moves of M can be performed on the frontmost element of the buffer, thereby removing that element from the buffer;

4. the ε-moves of M can always be performed without affecting the buffer.
Hence, we can see that (q0, h(w), Z0) ⊢* (p, ε, γ) in M if and only
if ((q0, ε), w, Z0) ⊢* ((p, ε), ε, γ) in M′. The proof that T(M′) = h−1(L) follows
from the above argument. The whole action of M′, having M as its
part, can be depicted pictorially.

Hence the theorem.

Since the family of CFL is closed under the six basic operations of
union, concatenation, Kleene closure, arbitrary homomorphism,
intersection with regular sets, and inverse homomorphism, the
family of CFL is a full abstract family of languages (AFL). It is also
a full trio.

Decidability Results for CFL


The three decision problems that we studied for regular languages
are emptiness, finiteness, and membership problems. The same
can be studied for CFLs also. The discussion of the results under
this section is based on either the representation of a CFL as in a
PDA form or in simplified CFG form. Hence, we will be using CFG
in CNF or a PDA which accepts by empty stack or final state.

Theorem 8.8
Given a CFL L, there exists an algorithm to test whether L is
empty, finite or infinite.

Proof. To test whether L is empty, one can check whether the start
symbol S of the CFG G = (N, T, P, S) which generates L is a useful
symbol. If S is useful, then L ≠ φ.

To test whether the given CFL L is finite or infinite, we use the
pumping lemma for CFL. If L contains a word of length at least k,
for the constant k (pumping length), then pumping up yields
infinitely many strings, so L is infinite. Conversely, if L is infinite,
it contains arbitrarily long words, and by pumping down it must
contain a word w with k ≤ |w| < 2k. Hence, it suffices to test
whether L contains a word of length between k and 2k − 1, which
is a finite check.

Membership Problem for CFL


Given a CFL L and a string w, one wishes to have a procedure to
test whether w is in L or not. One can approach the problem
directly using derivation trees: if |w| = n and the
grammar G generating L is in CNF, then any derivation tree
for w has exactly 2n − 1 nodes labeled by variables of G. The
number of possible trees for w and node labelings leads to an
exponential-time algorithm. There is a more efficient algorithm
which uses the idea of “dynamic programming.” The algorithm is
known as the CYK (Cocke, Younger, and Kasami) algorithm, which is
an O(n3) algorithm.

CYK Algorithm
We fill a triangular table where the horizontal axis corresponds to
the positions of an input string w = a1a2 ... an. The entry Xij is the
set of variables A such that A ⇒* aiai+1 ... aj. The triangular table
is filled row-wise, in upward fashion. For example, if w = a1a2a3a4a5,
the table will look like:

X15
X14 X25
X13 X24 X35
X12 X23 X34 X45
X11 X22 X33 X44 X55
a1  a2  a3  a4  a5

Note that, by the definition of Xij, the bottom row corresponds to
substrings of length one and the top row corresponds to the whole
string of length n, if |w| = n. The computation of the table is as below.

First row (from bottom): since the substring beginning and ending
at position i is just ai, Xii is simply the set of variables A for which
A → ai is a production. We assume that the given CFG
generating L is in CNF.

To compute an entry Xij, which lies in the (j − i + 1)th row, we first fill all the
entries in the rows below; hence, we know all the variables which
derive the shorter substrings of aiai+1 ... aj. Clearly, we take j − i > 0. Any
derivation of the form A ⇒* aiai+1 ... aj must begin with a step
A ⇒ BC, where B derives a prefix of aiai+1 ... aj and C derives
the corresponding suffix, i.e., B ⇒* ai ... ak, i ≤ k < j, and
C ⇒* ak+1 ... aj. Hence, we place A in Xij if, for
some k, i ≤ k < j, there is a
production A → BC with B ∊ Xik and C ∊ Xk+1,j.
Since the entries Xik and Xk+1,j are already known for all k,
i ≤ k < j, Xij can be computed.

The algorithm terminates once an entry X1n is filled where n is the


length of the input. Hence we have the following theorem.

Theorem 8.8
The algorithm described above correctly computes Xij for all i and j.
Hence w ∊ L(G), for a CFL L = L(G) if and only if S ∊ X1n.

Example 8.2. 

Consider the CFG G with the following productions:


S0 → AB|SA


S → AB|SA|a


A → AB|SA|a|b


B → SA

We shall test the membership of aba in L(G) using the CYK algorithm.
The table produced on application of the CYK algorithm is:

X13 = {S0, S, A, B}
X12 = {S0, S, A, B}   X23 = φ
X11 = {S, A}          X22 = {A}   X33 = {S, A}
      a                    b            a

Since X13 contains S0, aba is in L(G).
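The algorithm is short to implement. A sketch (our own illustration; the CNF grammar is assumed to be given by two tables: 'unit' maps a terminal to the set of variables A with A → a, and 'binary' maps a pair (B, C) to the set of variables A with A → BC):

def cyk(w, unit, binary, start):
    n = len(w)                       # assumes w is non-empty
    X = {}
    for i in range(1, n + 1):        # bottom row: X_ii
        X[i, i] = set(unit.get(w[i - 1], set()))
    for length in range(2, n + 1):   # fill upwards, row by row
        for i in range(1, n - length + 2):
            j = i + length - 1
            X[i, j] = set()
            for k in range(i, j):    # split into a_i..a_k and a_k+1..a_j
                for B in X[i, k]:
                    for C in X[k + 1, j]:
                        X[i, j] |= binary.get((B, C), set())
    return X, start in X[1, n]

# The grammar of Example 8.2:
unit = {"a": {"S", "A"}, "b": {"A"}}
binary = {("A", "B"): {"S0", "S", "A"}, ("S", "A"): {"S0", "S", "A", "B"}}
X, ok = cyk("aba", unit, binary, "S0")
print(ok, sorted(X[1, 3]))           # True ['A', 'B', 'S', 'S0']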

SubFamilies of CFL
In this section, we consider the special cases of CFLs.

Definition 8.1
A CFG G = (N, T, P, S) is said to be linear if all rules are of the
form A → x By or A → x, x, y ∊ T*, A, B ∊ N. i.e., the right-hand
side consists of at most one non-terminal.

Example 8.3. 

G = ({S}, {a, b}, P, S) where P = {S → aSb, S → ab} is a linear CFG


generating L = {anbn | n ≥ 1}.

Definition 8.2

For an integer k ≥ 2, a CFG, G = (N, T, P, S) is termed k-linear if


and only if each production in P is one of the three forms,
A → xBy, A → x, or S → α, where α contains at most k non-
terminals and S does not appear on the right-hand side of any
production, x, y ∊ T*.

A CFL is k-linear if and only if it is generated by a k-linear


grammar.

Example 8.4. 

G = ({S, X, Y}, {a, b, c, d, e}, P, S) where P =


{S → XcY, X → aXb, X → ab, Y → dYe, Y → de} generates
{anbncdmem|n, m ≥ 1}. This is a 2-linear grammar.

Definition 8.3

A grammar G is metalinear if and only if there is an integer k such


that G is k-linear. A language is metalinear if and only if it is
generated by a metalinear grammar.

Definition 8.4
A minimal linear grammar is a linear grammar with the initial letter
S as the only non-terminal and with S → a, for some terminal
symbol a, as the only production with no non-terminal on the right
side. Furthermore, it is assumed that a does not occur in any other
production.

Example 8.5. 

G = ({S}, {a, b}, {S → aSa, S → b}) is a minimal linear grammar


generating {anban | n ≥ 0}.

Definition 8.5

An even linear grammar is a linear grammar where all productions


with a non-terminal Y on the right-hand side are of the form
X → uYυ where |u| = |υ|.

Definition 8.6

A linear grammar G = (N, T, P, S) is deterministic linear if and only


if all productions in P are of the two forms

X → aYυ or X → a, where a ∊ T, υ ∊ T*,

and furthermore, for any X ∊ N and a ∊ T, there is at most one


production having ‘a’ as the first symbol on the right-hand side.

Definition 8.7

A CFG G = (N, T, P, S) is sequential if and only if an ordering on


symbols of N can be imposed {X1,..., Xn} where S = X1 and for all
rules Xi → α in P, we have α ∊ (T ∪ {Xj | i ≤ j ≤ n})*.
Example 8.6. 

G = ({X1, X2}, {a, b}, P, X1) where P =


{X1 → X2X1,X1 → ε, X2 → aX2b, X2 → ab} is sequential
generating L* where L = {anbn | n ≥ 1}.

Definition 8.8

The family of languages accepted by deterministic PDA are called


deterministic CFL (DCFL).

Definition 8.9

A PDA M = (K, Σ, Γ, δ, q0, Z0, F) is called a k-turn PDA if and only
if the stack increases and decreases (makes a turn) at most k
times. If it makes just one turn, it is called a one-turn PDA. When k
is finite, it is called a finite-turn PDA. It should be noted that for some
CFLs the number of turns of the PDA cannot be bounded.

We state some results without giving proofs.

Theorem 8.9

The family of languages accepted by one-turn PDA is the same as


the family of linear languages.

Theorem 8.10

The class of regular sets forms a subclass of even linear


languages.

Definition 8.10
A CFG G = (N, T, P, S) is said to be ultralinear (sometimes called
non-terminal bounded) if and only if there exists an integer k such
that any sentential form α with S ⇒* α contains at most k non-terminals
(whether leftmost, rightmost, or any derivation is
considered). A language is ultralinear (non-terminal bounded) if
and only if it is generated by an ultralinear grammar.

Theorem 8.11

The family of ultralinear languages is the same as the family of


languages accepted by finite-turn PDA.

For Example, consider the CFL:

L = {w | w ∊ {a, b}+, w has equal number of a’s and b’s}.

For accepting arbitrarily long strings, the number of turns of the


PDA cannot be bounded by some k.

Definition 8.11

Let G = (N, T, P, S) be a CFG. For a sentential form α,


let #N(α) denote the number of non-terminals in α. Let D be a
derivation of a word w in G.

D: S = α0 ⇒ α1 ⇒ α2 ⇒ ... ⇒ αr = w

The index of D is defined as ind(D) = max{#N(αi) | 0 ≤ i ≤ r}.


For a word w in L(G), there may be several derivations, leftmost,
rightmost etc. Also if G is ambiguous, w may have more than one
leftmost derivation.

For a word w ∊ L(G), we define ind(w, G) = min{ind(D)},
where D ranges over all derivations of w in G. The index of


G, ind(G), is the smallest natural number u such that for all
w ∊ L(G), ind(w, G) ≤ u. If no such u exists, G is said to be of
infinite index. Finally, the index of a CFL L is defined as ind(L) =
minG ind(G) where G ranges over all the CFGs generating L.

We say that a CFL L is of finite index if ind(L) is finite.


The family of CFL with finite index is denoted as FI. Sometimes,
this family is also called the family of derivation-bounded
languages.

Example 8.7. 

Let G = ({X1, X2}, {a, b}, P, X1) where

P = {X1 → X2X1, X1 → ε, X2 → aX2b, X2 → ab}

is of index 2. The language consists of strings of the


form an1bn1an2bn2 ... anrbnr. In a leftmost derivation, the maximum
number of non-terminals that can occur in a sentential form is 2, whereas in a
rightmost derivation it is r, which keeps increasing with r. This
grammar is not a non-terminal-bounded grammar, but it is of finite
index.

Example 8.8. 

L = Dyck set = the set of well-formed strings of parentheses is generated by
{S → SS, S → aSb, S → ab} (with a = “(” and b = “)”). Here, we find that as the
length of the string increases and the level of nesting increases, the
number of non-terminals in a sentential form keeps increasing and
cannot be bounded. This CFG is not of finite index, and in fact L is not of finite
index.

Definition 8.12

A CFG G = (N, T, P, S) is termed non-expansive if there is no non-terminal
A ∊ N such that A ⇒* α, where α contains two occurrences of
A. Otherwise, G is expansive. The family of languages generated
by non-expansive grammars is denoted by NE.

Theorem 8.12

NE = FI.

Parikh Mapping and Parikh’s


Theorem
We present in this section a result which connects CFLs to semi-
linear sets.

Semi-linear Sets
Let N denote the set of non-negative integers. For each integer n ≥
1, let

Nn = N × N × ... × N (n times)

If x = (x1, ..., xn), and y = (y1, ..., yn) ∊ Nn, they are called vectors of
order n or n-vectors. We can talk about linear dependence, linear
independence of such vectors.

x + y = (x1 + y1, x2 + y2, ..., xn + yn)

cx = (cx1, cx2, ..., cxn) where c is a constant c ∊ N.


Definition 8.13

Given subsets C and P of Nn, let L(C; P) denote the set of all


elements in Nn which can be represented in the form c0 + x1 + ...
+ xm for some c0 in C and some (possibly empty) sequence
x1, ..., xm of elements of P. C is called the set of constants and P
the set of periods of L(C; P).

Thus, L(C; P) is the set of all elements x in Nn of the
form x = c0 + k1x1 + ... + kmxm with c0 in C, x1, ..., xm in P, and k1, ..., km in
N. If C consists of exactly one element c0, C = {c0}, then we write
L(C; P) as L(c0; P).

Definition 8.14

L ⊆ Nn is said to be a linear set if L = L(c0; P) where c0 ∊ Nn and P


is a finite set of elements from Nn. P = {P1, ..., Ps}. In this case,
L(c0; P) can be written as L(c0; P1, ..., Ps).

Example 8.9. 

Let c0 = (0, 0, 1), P1 = (2, 0, 0), P2 = (0, 2, 0).

Then L((0, 0,1); (2, 0, 0), (0, 2, 0)) is a linear set. It has elements of
the form (0, 0,1) + k1 (2, 0, 0) + k2(0, 2, 0) i.e., (2k1, 2k2, 1)
where k1, k2 ∊ N.

A linear set can be represented in more than one way.


If L(c0; P1, ..., Ps) is a linear set, it can also be represented
by L(c0; P1, ..., Ps, P1 + P2) etc. The only condition on P is that it
should be finite.

Definition 8.15
A subset of Nn is said to be semi-linear if it is a finite union of linear
sets.

Let L1 = L((0, 0, 1); (2, 0, 0), (0, 2, 0)) be a linear set having
elements of the form (2k1, 2k2, 1), and let L2 = L((0, 0, 2); (1, 1, 0)) be
another linear set having elements of the form (k3, k3, 2), where k1, k2, k3 ∊ N.

Then L1 ∪ L2 is a semi-linear set having elements of the form (2k1,


2k2, 1) or (k3, k3, 2).

Parikh Mapping
Let Σ = {ai |1 ≤ i ≤ n} be an alphabet. That is, we are considering
an ordering among the elements of Σ.

Definition 8.16

Let Σ = {ai | 1 ≤ i ≤ n}. The mapping ψ: Σ* → Nn is defined by ψ(z) =
(na1(z), na2(z), ..., nan(z)), where nai(z) denotes the number of
occurrences of ai in z.

Thus, ψ(ε) = (0, ..., 0) and ψ(uυ) = ψ(u) + ψ(υ). ψ is called the
Parikh mapping. If L is a language, ψ(L) = {ψ(w) | w ∊ L}.

Example 8.10. 

Let Σ = {a, b, c}

z = aabbaaccb
Then ψ(z) = (4, 3, 2).
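Computing ψ and testing membership in a small linear set is straightforward; a brute-force sketch (our own illustration, adequate only for small examples):

from itertools import product

def parikh(z, alphabet):
    return tuple(z.count(a) for a in alphabet)

def in_linear_set(v, c0, periods, bound=60):
    # search for coefficients k1, ..., km with v = c0 + k1*P1 + ... + km*Pm
    for ks in product(range(bound), repeat=len(periods)):
        cand = tuple(c + sum(k * p[i] for k, p in zip(ks, periods))
                     for i, c in enumerate(c0))
        if cand == v:
            return True
    return False

print(parikh("aabbaaccb", "abc"))                              # (4, 3, 2)
print(in_linear_set(parikh("aaabbb", "ab"), (1, 1), [(1, 1)])) # True (Example 8.11)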

The definition of the Parikh mapping was given by Parikh in 1961, who
also proved a result connecting CFLs to semi-linear sets.

Theorem 8.13

If L is a CFL, then ψ(L) is semi-linear.

Proof. Without loss of generality, let ε ∉ L. Let G = (N, T, P, S) be


an ε-free grammar generating L. (If we add ε to L, we need to add
only a linear set containing (0, ..., 0) alone). Again without loss of
generality, we assume G has no unit-productions.

Let N′ be a subset of N containing S, and let t = #(N′). Consider the
subset L′ of L which consists of strings w in L having the property
that, in some derivation tree of S ⇒* w, the non-terminals which occur as
node names are exactly the elements of N′. Since L is the union of
all such L′, and there are only a finite number of subsets of N and
hence only a finite number of such L′, it is enough if we show that ψ(L′)
is semi-linear. From that it will follow that ψ(L) is semi-linear.

For every element X in N′, we define two sets AX and BX. A
word α in (N ∪ T)* is in AX if the following conditions are satisfied:

(1) α contains exactly one occurrence of X.

(2) α contains no other non-terminal.

(3) There is a derivation tree of X ⇒* α, whose non-terminal node
names are in N′, such that no non-terminal occurs more than t + 2
times in any path of the tree.

A word w is in BX if the following two conditions are satisfied:

(4) w is in T*.

(5) There is a derivation tree of X ⇒* w such that each non-terminal
in N′, and no other non-terminal, is a node name in the tree. Also,
no non-terminal occurs more than t + 2 times in any path of the
tree.

Since there are only a finite number of trees satisfying conditions


(3) and (5), AX and BX are finite sets.

Extend ψ from T* → Nn to (N ∪ T)* → Nn by setting

ψ(X) = (0, ..., 0)

for each non-terminal X, terminals being counted as before.

For example, if N′ = {X, Y}, T = {a, b, c},

ψ(aabXcYbb) = (2, 3, 1).

For each variable X in N′, let PX = {ψ(α) | α in AX}, and let

PB = {ψ(w) | w in BS} = {u1, ..., uh}.

For each j, 1 ≤ j ≤ h, let

Cj = L(uj; P), where P = ∪X∊N′ PX.

Each Cj is linear.

To show that ψ(L′) is semi-linear, it is enough to show that:

ψ(L′) = C1 ∪ C2 ∪ ... ∪ Ch.

First, we show ψ(L′) ⊆ ∪j Cj.

Let w be an element of L′ and τ (see Figure 8.3) a derivation tree
of S ⇒* w such that the node names in τ which are non-terminals are
exactly the elements in N′. Suppose no non-terminal occurs more
than t + 2 times in any path; then w is in BS, so that ψ(w) is some uj and lies in Cj.

Figure 8.3. Cutting and rejoining of trees

Suppose that some non-terminal occurs more than t + 2 times in
some path. For each node r in τ, let τr be the subtree of τ generated
by r. There exists a subtree τβ0 in τ containing t + 3
nodes β0, β1, ..., βt+2 with the same node name X, each βi+1 a descendant
of βi, such that no proper subtree of τβ0 contains more than t + 2
nodes with the same label. For each i ≥ 1, let Ni be the
set of those non-terminals which are node names in τβi but not
in τβi+1. Let p ≥ 1 be the smallest integer such that:

(6) Np is empty, i.e., τβp and τβp+1 contain exactly the same non-terminal node names.

If (6) were not true for any p, 1 ≤ p ≤ t + 1, then τ would contain at
least t + 1 different node labels which are non-terminals, a
contradiction. Thus, the existence of p is proved. Let τ′ be the tree
obtained from τ by deleting τβp−1 and replacing it with τβp. By (6), the
node names which are non-terminals in τ′ are exactly the non-terminals
in N′. Also, τ′ has fewer nodes than τ. Let U be the part
of τβp−1 from which τβp has been deleted, but with the node βp remaining.
Then, U is the derivation tree of a derivation X ⇒* w′,
where w′ contains one occurrence of X, i.e., w′ is in AX. This
process of cutting off subtrees can be repeated for different nodes
in τ till a tree with root S remains whose result ŵ satisfies S ⇒* ŵ and ψ(ŵ) = uj for
some uj (that is, ŵ is in BS). Whenever this cutting and replacing process takes
place, we have removed a portion of τ corresponding to a
derivation X ⇒* α′, where α′ contains exactly one X and no other
non-terminal.

Thus, α′ ∊ AX. Thus, if w is the result of the derivation S ⇒* w for
which τ is the derivation tree, then ψ(w) ∊ Cj. This is because each
cutting and replacing contributes one ψ(α′), α′ ∊ AX (an element of P), there is a finite
number of cuttings and replacings, and the final reduced tree contributes
uj. Thus, ψ(w) ∊ Cj. Hence, we get ψ(L′) ⊆ ∪j Cj. Now, we have to
show ∪j Cj ⊆ ψ(L′).

Let υ be in some Cj. Suppose υ = uj. Then, υ = ψ(w) for
some w in BS, and so w is in L′. It is enough if we show that if υ is
in Cj ∩ ψ(L′), then υ + ψ(α) is in ψ(L′) for each α in AX, X in N′. It will
then follow, adding one period at a time, that every element of Cj is in ψ(L′).

Thus, Cj ⊆ ψ(L′) and the result follows.
Now, to show that if υ is in Cj ∩ ψ(L′) then υ + ψ(α) is in ψ(L′) for each α ∊ AX, let
there be a word w ∊ T* such that S ⇒* w and ψ(w) = υ, and let τ be
the corresponding derivation tree. The non-terminals which are
node names in τ are exactly the elements in N′.

By the definition of AX, there is a derivation tree τ′ of X ⇒* α for the
word α such that conditions (1), (2), and (3) are satisfied. Let β1 be
a node in τ whose node name is X; β1 exists, since the node
names in τ are exactly the non-terminals in N′. Let τ be cut at β1.

This gives rise to two portions T1 and T2. Attach T1 to τ′ at its root, and
attach T2 to τ′ at the node β with label X.

Let τ″ be the tree thus obtained. Then, the result of τ″ is a word w″
in L′ such that ψ(w″) = υ + ψ(α). Thus, υ + ψ(α) is in ψ(L′).

Hence, ∪j Cj ⊆ ψ(L′), and therefore ψ(L′) = C1 ∪ ... ∪ Ch is semi-linear.

Since L is the union of a finite number of such L′ and each ψ(L′) is


semi-linear, ψ(L) is semi-linear.
Example 8.11. 

L = {anbn|n ≥ 1}

ψ(L) = L(C; P), where C = {(1, 1)} and P = {(1, 1)}:

ψ(anbn) = (n, n),

which can be represented as:

(1, 1) + (n − 1)(1, 1).

We can also see this from the grammar for L: S → aSb, S → ab. The
constant comes from S → ab, (1, 1) being ψ(ab).

The period comes from S → aSb, as S ⇒ aSb and ψ(aSb) = (1, 1).

Example 8.12. 

The Parikh mapping for L = {wcwR|w ∊ {a, b}*} is L(C; P1, P2)


where

C = (0, 0, 1)

P1 = (2, 0, 0)

P2 = (0, 2, 0)

The grammar is S → aSa, S → bSb, S → c.

Constant comes from S → c.

Period P1 comes from S → aSa.

Period P2 comes from S → bSb.

Application of Parikh’s Theorem


Parikh Theorem can be used to show certain languages are not
CFL.

Example 8.13. 

L = {an² | n ≥ 1} is not a CFL.

The Parikh mapping of L is {(n²) | n ≥ 1}, which cannot be expressed as a
semi-linear set: a semi-linear subset of N is ultimately periodic, whereas the
gaps between consecutive squares grow without bound.

If the Parikh mapping of a language L is not semi-linear, we can


conclude that L is not context-free. But if the Parikh mapping
of L is semi-linear, we cannot conclude L is a CFL. For example,
consider {anbncn|n ≥ 1}. The Parikh mapping can be expressed
as L(c; p) where c = (1, 1, 1) and p = (1, 1, 1). But, we know {anbncn|
n ≥ 1} is not a CFL. Thus, the converse to Parikh’s theorem is not
true.

In an earlier chapter, we have seen that if a CFL is a bounded CFL,


whether it is inherently ambiguous or not, is decidable. The
algorithm which does this, is based on Parikh’s theorem. We also
get the results similar to the one given below as a consequence of
Parikh’s theorem.

Example 8.14. 

Let L = {w|w ∊ {a, b}*, |w| = n2 for an integer n}. Then, L is not


context-free.

If L is context-free, by closure under homomorphism, h(L) is


context-free for any homomorphism h. Define h from {a, b}* → c*
as follows:

h(a) = h(b) = c.

Then, h (L) = {cn2 | n ≥ 1} which has been shown to be non-context-


free by Parikh’s theorem. Hence, L cannot be context-free.

Self-embedding Property
In this section, we consider the self-embedding property which
makes CFL more powerful than regular sets. Pumping lemma for
CFL makes use of this property. By this property, it is possible to
pump equally on both sides of a substring which is lacking in
regular sets.

Definition 8.17

Let G = (N, T, P, S) be a CFG. A non-terminal A ∊ N is said to be
self-embedding if A ⇒* xAy, where x, y ∊ (N ∪ T)+. A grammar G is
self-embedding if it has a self-embedding non-terminal.

A CFG is non-self-embedding if none of its non-terminals are self-


embedding. Any right linear grammar is non-self-embedding, as
the non-terminal occurs as the rightmost symbol in any sentential
form. Hence, a regular set is generated by a non-self-embedding
grammar. We have the following result:

Theorem 8.14

If a CFG G is non-self-embedding, then L(G) is regular.

Proof. Let G = (N, T, P, S) be a non-self-embedding CFG. Without
loss of generality, we can assume that ε ∉ L(G) and G is in GNF.
(While converting a CFG to Greibach normal form (GNF), the self-embedding
or non-self-embedding property is not affected.)
Let k be the number of non-terminals in G and l be the maximum
length of the right-hand side of any production in G. Let w ∊ L(G)
and consider a leftmost derivation of w in G. Every sentential form
is of the form xα, where x ∊ T* and α ∊ N*. The length of α can be
at most kl. This can be seen as follows. Suppose there is a
sentential form xα with |α| > kl, and consider the corresponding
derivation tree, which is of the form given in the figure.
Consider the path from S to X, the leftmost non-terminal in α, and
the nodes on this path where non-terminals are introduced
to the right of the path. Since the maximum number of nodes
introduced on the right at each step is l − 1, there must be more than k such
nodes, as |α| > kl. So two of these nodes have the same label,
say A, and we get A ⇒* x′Aβ, x′ ∊ T+, β ∊ N+. Hence, A is self-embedding,
contradicting the assumption that G is non-self-embedding. Hence,
the maximum number of non-terminals which can occur in any
sentential form in a leftmost derivation in G is kl.

Construct a right-linear grammar G′ = (N′, T, P′, S′) such that L(G′)
= L(G), where

N′ = {[α] | α ∊ N+, |α| ≤ kl},

S′ = [S].

P′ consists of rules of the following form.

If A → aB1 ... Bm is in P, then


[Aβ] → a[B1... Bmβ] is in P′ for all possible β ∊ N* such that | Aβ|
≤ kl,

|B1... Bmβ| ≤ kl. So, if there is a derivation in G.

S ⇒ a1α1 ⇒ a1a2α2 ⇒ ... ⇒ a1 ... an−1αn−1 ⇒ a1 ... an

there is a derivation in G′ of the form:

[S] ⇒ a1[α1] ⇒ a1a2[α2] ⇒ ... ⇒ a1 ... an−1[αn−1] ⇒ a1 ... an

and vice versa. Hence, L(G) = L(G′) and L(G) is regular.

Theorem 8.15

Every CFL over a one-letter alphabet is regular. Thus, a set {ai|


i ∊ A} is a CFL if and only if A is ultimately periodic.

Proof. Let L ⊆ a* be a CFL. By the pumping lemma for CFL, there
exists an integer k such that each word w in L with |w| > k can be
written as uυxyz such that |υxy| ≤ k, |υy| > 0,
and uυixyiz ∊ L for all i ≥ 0. Since w is in a*, all of u, υ, x, y, z are
in a*. So uxz(υy)i is in L for all i ≥ 0. Let υy = aj, 1 ≤ j ≤ k. So uxz(aj)i is
in L for all i ≥ 0, i.e., w(aj)* ⊆ L. Let n = k(k − 1) ... 1 = k!. Then w(an)m is in L
for every m ≥ 0, because w(an)m can be written as w(aj)i for i = mn/j
(note that j divides n), 1 ≤ j ≤ k. Thus, w(an)* ⊆ L for each word w in L such that |w| > k.

For each i, 1 ≤ i ≤ n, let Ai = ak+i(an)* ∩ L. If Ai ≠ φ, let wi be the
word in Ai of minimum length. If Ai = φ, let wi be undefined.
Then each w in L with |w| > k belongs to wi(an)* for some i. Let B be the set
of strings in L of length ≤ k. Then L = B ∪ ∪i wi(an)*. B is a finite set,
represented by u1 + ... + ur (say). Then L is represented by (u1 + ...
+ ur) + (w1 + ... + wn)(an)*. Therefore, L is regular.

Example 8.15. 

As seen earlier, it immediately follows that {an² | n ≥ 1}, {a2^n | n ≥ 0},
and {ap | p is a prime} are not regular, and hence they are not context-free.

Homomorphic Characterization
Earlier, we saw that the family of CFL is a full AFL. For an AFL F, if
there exists a language L0 ∊ F such that any language L in F can be
obtained from L0 by means of some of these six operations,
then L0 is called a generator for the AFL F. Any regular set is a
generator for the family of regular sets: let R be any regular set;
any other regular set R′ can be obtained as (Σ* ∪ R) ∩ R′. Next, we show
that the Dyck set is a generator for the family of CFL.

Definition 8.18

Consider the CFG G = ({S}, T, P, S), where T = {a1, ..., an, a1′, ..., an′}
and P = {S → SS, S → ε} ∪ {S → aiSai′ | 1 ≤ i ≤ n}, n ≥ 1. The language L(G) is
called the Dyck set over T and is usually denoted by Dn.

Example 8.16. 

Let G = ({S}, T, {S → SS, S → aSa′, S → bSb′, S → ε}, S), T = {a, b, a′, b′}.


The set D2 consists of matching nested parentheses if:

a stands for (,

b stands for [,

a′ stands for ),

b′ stands for ].

The set Dn has the following properties:

1. For w1, w2 ∊ Dn, w1w2 ∊ Dn.

2. For w1 ∊ Dn, aiw1ai′ ∊ Dn, for i = 1, 2, ..., n.

3. Each word w ≠ ε in Dn is of the form w = aiw1ai′w2, for
some w1, w2 ∊ Dn and some i ∊ {1, 2, ..., n}.

4. If aiwai′ ∊ Dn, then w ∊ Dn.
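Membership in Dn can be checked with a stack, mirroring property 3; a sketch (our own illustration) over the two bracket pairs of Example 8.16:

def in_dyck(word, pairs):
    # 'pairs' maps each opening letter to its matching closing letter
    stack = []
    for c in word:
        if c in pairs:
            stack.append(c)
        elif not stack or pairs[stack.pop()] != c:
            return False
    return not stack

pairs = {"(": ")", "[": "]"}        # a = (, b = [, a' = ), b' = ]
print(in_dyck("([])()", pairs))     # True
print(in_dyck("([)]", pairs))       # False: crossed, not nested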
One can see that all the above properties are satisfied for Dn and
this can be proved from the definition of Dyck set. We have the
following result for CFLs.

Theorem 8.16

Every CFL can be represented as a homomorphic image of the


intersection of a regular language and a Dyck language.

Proof. Let L be a CFL. Without loss of generality, let ε ∉ L. (If ε ∊ L,
then L − {ε} can be expressed in the form h(D ∩ R), where R is a
regular language and D is a Dyck language; then L = h(D ∩ (R ∪ {ε})),
where h(ε) = ε.) Hence, we can assume that L is ε-free and is
generated by a CFG G = (N, T, P, S) in CNF. Let T =
{a1, ..., at}, and let P contain productions of the form Ai → BiCi, i = 1,
2, ..., m, together with terminal rules of the form A → ai, A ∊ N, ai ∊ T.
Let Dt+m be the Dyck language
over T′ = {a1, ..., at+m, a1′, ..., at+m′}. Let R be the
regular language generated by the right-linear grammar

G′ = (N, T′, P′, S),

where P′ consists of the rules:

Ai → at+iBi, for i = 1, 2, ..., m,
A → akak′, whenever A → ak is in P,
A → akak′a′t+iCi, whenever A → ak is in P and Ai → BiCi is in P.
Having defined the Dyck set Dt+m and R, we define the
homomorphism h as follows, so that h(Dt+m ∩ R) is L.

h is a homomorphism from T′* to T* given by:

h(ai) = ai, for i = 1, ..., t,

h(ai) = ε, for i = t + 1, t + 2, ..., t + m,

h(aj′) = ε, for j = 1, ..., t + m.

Now to prove L = h(Dt+m ∩ R).

(i) L ⊆ h(Dt+m ∩ R).

To prove this, let us prove a more general result: if A ⇒* w,
where A ∊ N, w ∊ T*, is a derivation in G, then there is a word α
in Dt+m such that A ⇒* α in G′ and w = h(α).

The proof is by induction on the number of derivation steps.

For k = 1, w ∊ T means that A → w is a rule in P.

By the construction of G′, for this rule there is a rule A → ww′ in P′, and
hence ww′ ∊ Dt+m and w = h(ww′).

Assume the result to be true for all derivations of at most n steps, n ≥ 1, and let

Ai = w0 ⇒ w1 ⇒ w2 ⇒ ... ⇒ wn+1 = w, for Ai ∊ N, wn+1 = w ∊ T+, be a
derivation according to G. Since G is in Chomsky normal form,
clearly w1 will be BiCi, 1 ≤ i ≤ m. Clearly, w = α1α2, where Bi ⇒* α1
and Ci ⇒* α2, each in fewer than n + 1 steps. Then, by our inductive
assumption, there are words w1′, w2′ in Dt+m such that:

α1 = h(w1′), α2 = h(w2′), Bi ⇒* w1′ and Ci ⇒* w2′ in G′.

Hence, using the rule A → aa′a′t+iCi in place of the final rule A → aa′ in
the G′-derivation of w1′,

Ai ⇒ at+iBi ⇒* at+iw1′a′t+iw2′ in G′.

Now, choose α = at+iw1′a′t+iw2′; then α ∊ Dt+m and h(α) = h(w1′)h(w2′)
= α1α2 = wn+1 = w.

Hence, for any derivation S ⇒* w in G, there exists a word α ∊
Dt+m with S ⇒* α in G′ (so that α ∊ R as well) such that h(α) = w.

(ii) To show h(Dt+m ∩ R) ⊆ L, we proceed as before, by simulating the
derivations of G′ by derivations in G. Let

A = α0 ⇒ α1 ⇒ ... ⇒ αn = α,

where A ∊ N, α ∊ Dt+m, be a derivation according to G′.

Then A ⇒* h(α) in G.

For n = 1, α = aa′ for some a ∊ T, and A → aa′ is the rule which is
used.
Hence, h(α) = a, and A ⇒ a by the corresponding terminal rule
of G, A → a.

Assume the result to be true for all k ≤ n in the above type of
derivations. Consider

Equation 8.1. A = α0 ⇒ α1 ⇒ ... ⇒ αn+1 = α,

where A ∊ N, α ∊ Dt+m, a derivation of G′. Since n ≥ 1, n + 1 ≥ 2,
the rules of G′ of the forms A → aa′, a ∊ T, and A → aa′a′t+iCi are
not used in the beginning of the derivation: such applications would
produce words that cannot be in Dt+m. Hence, for some j, 1
≤ j ≤ m, the rule Aj → at+jBj is used in the beginning of the
derivation, i.e., A = Aj and α1 = at+jBj. Since α ∊ Dt+m begins with at+j,
α must be of the form at+jα′a′t+jα″, where α′, α″ ∊ Dt+m; that is,

Equation 8.2. Bj ⇒* α′a′t+jα″ in G′.

The step of the derivation (8.2) that produces the matching
letter a′t+j must use a rule B → aa′a′t+jCj, a ∊ T, B ∊ N, which
corresponds to the rules B → a and Aj → BjCj of P; for B → a in P
the corresponding terminating rule in P′ is B → aa′. Using this rule in
place of B → aa′a′t+jCj, we obtain Bj ⇒* α′ in G′ in at most n steps;
likewise, Cj ⇒* α″ in G′ in at most n steps. By the induction
hypothesis, Bj ⇒* h(α′) and Cj ⇒* h(α″) in G. Hence, for the derivation
of Equation (8.1), the actual details are:

Aj ⇒ BjCj ⇒* h(α′)Cj ⇒* h(α′)h(α″) = h(α) in G.

Hence the result.

Problems and Solutions


1. Prove that the following languages are not context-free.
(a) L1 = {ap | p is a prime}.
(b) L2 = {a, b}* − {anbn² | n ≥ 0}.

Solution.
(a) L1 = {ap | p is a prime}.
Suppose L1 is context-free.
Then, by the pumping lemma, there exists k such that every w ∊ L1 with |w| ≥ k can be written as w = uυxyz with |υy| ≥ 1 and uυixyiz ∊ L1 for all i ≥ 0. Consider some prime p > k; ap ∊ L1 and

ap = uυxyz.

Now u, υ, x, y, z ∊ a*. Therefore, by the pumping lemma:

uxz(υy)i ∊ L1 for all i ≥ 0.

Let |υy| = r, r ≥ 1. Then

uxz(ar)i ∊ L1 for all i ≥ 0,

i.e.,

ap+r(i−1) ∊ L1 for all i ≥ 0.

Choose i such that p + r(i − 1) is not a prime: select i − 1 = p, so that i = p + 1 and
ap+rp ∊ L1.
But ap+rp = ap(r+1), and p(r + 1) is not a prime since r + 1 ≥ 2. So as ∊ L1 where s is not a prime.
This is a contradiction.
Therefore, L1 is not context-free.
(b) L2 = {a, b}* − {anbn² | n ≥ 0}.
Suppose L2 is context-free.
Since the family of context-free languages is closed under intersection with regular sets, L2 ∩ a*b* is context-free.
This is L3 = {anbm | m ≠ n²}.
We shall show L3 is not context-free. If L3 is context-free, then by the pumping lemma there is a constant k which satisfies the conditions of the pumping lemma. Choose z = anbm with n ≥ k and m ≠ n².
Write z = uυxyz′ where |υxy| ≤ k, |υy| ≥ 1, and uυixyiz′ ∊ L3 for all i ≥ 0. If υ or y contains both a's and b's, then by pumping we get a string which is not of the form aibj. So consider υ = ap ∊ a* and y = bq ∊ b* (the cases where υ and y lie in the same block are handled similarly).
Choose m = (n − p)² + q (which differs from n², so z ∊ L3). Then, pumping down (i = 0), an−pbm−q = an−pb(n−p)², which is not in L3.
This is a contradiction.
L3 is not context-free, and hence L2 is not context-free.

2. Which of the following sets are context-free and which are not? Justify your answer.
a. L1 = {anbmck | n, m, k ≥ 1 and 2n = 3k, or 5n = 7m}.

Solution. The following CFG generates L1:

S → A | BD
A → a3Ac2 | a3Cc2
C → bC | b
B → a7Bb5 | a7b5
D → cD | c

b. L2 = {aibjckdl | i, j, k, l ≥ 1, i = l, j = k}.

Solution. L2 is a CFL generated by:

S → aSd
S → aAd
A → bAc
A → bc

c. L3 = {x ∊ {a, b, c}* | #ax = #bx = #cx}.

Solution. L3 is not context-free.
Suppose L3 is context-free. Then, since the family of CFL is closed under intersection with regular sets, L3 ∩ a*b*c* would be context-free.
But this intersection is {anbncn | n ≥ 0}, which we have shown is not context-free.
Hence, L3 is not context-free.

d. L4 = {ambn | n, m ≥ 0, 5m − 3n = 24}.

Solution. It can be seen that:

a6b2 ∊ L4

a9b7 ∊ L4

a12b12 ∊ L4

a15b17 ∊ L4

In general, the solutions of 5m − 3n = 24 with m, n ≥ 0 are m = 6 + 3i, n = 2 + 5i, i ≥ 0. The grammar S → a3Sb5, S → a6b2 generates L4, so L4 is context-free.

e. L5 = {ambn | n ≠ m}.

Solution. Worked out earlier in Chapter 2.

3. Let NL2 be the set of non-context-free languages. Determine whether or not:

a. NL2 is closed under union.

Solution. No.
L1 = {ap | p is a prime}, L2 = {ap | p is not a prime}.
L1 ∪ L2 = {an | n ≥ 1} is a CFL, whereas L1 and L2 are not.

b. NL2 is closed under complementation.

Solution. No.
L = {ww | w ∊ {a, b}*} is not a CFL, but its complement, the set of strings in {a, b}* not of the form ww, is a CFL.

c. NL2 is closed under intersection.

Solution. No.
L1 = {ap | p is a prime}, L2 = {a2^n | n ≥ 0}.
L1 and L2 are not CFL. L1 ∩ L2 = {a2} is a singleton and hence a CFL.

d. NL2 is closed under catenation.

Solution. No.
L1 = {ap | p is a prime} = {a2, a3, a5, a7, a11, ...}.
L2 = {ap | p ≥ 1 is not a prime} = {a, a4, a6, a8, a9, a10, a12, ...}.
L1L2 = {a3, a4, a6, a7, a8, a9, a10, a11, ...}
     = a* − {ε, a, a2, a5}, which is a CFL.

e. NL2 is closed under Kleene closure.

Solution. No.
L = {ap | p is a prime} is not a CFL.
L* = {ε} ∪ {an | n ≥ 2} is a CFL (indeed regular), since every integer n ≥ 2 is a sum of primes.

4. Is the language {xmyn | m, n ∊ N, m ≤ n ≤ 2m} context-free? Justify your answer.

Solution. Yes. It is generated by:

S → xSy
S → xSyy
S → ε

5. Is the union of a collection of context-free languages always context-free? Justify your answer.

Solution. A finite union of CFLs is a CFL.
Let L1, L2, L3, ..., Lk be CFLs, let Li be generated by Gi = (Ni, T, Pi, Si) with Ni ∩ Nj = φ for i ≠ j, and let S be a new symbol. Then G = (N1 ∪ ... ∪ Nk ∪ {S}, T, P1 ∪ ... ∪ Pk ∪ {S → S1 | S2 | ... | Sk}, S) generates L1 ∪ L2 ∪ ... ∪ Lk.
An infinite union of CFLs need not be a CFL: for each prime p, let Lp = {ap}. Each Lp is finite and hence a CFL, but ∪p Lp = {ap | p is a prime} is not a CFL.

6. Consider the following context-free grammar:

S → AA | AS | b
A → SA | AS | a

For the strings abaab, bab, and bbb, construct the CYK parsing table. Are these strings in L(G)?

Solution. For bab, the table is:

X13 = {S, A}
X12 = {A}   X23 = {S, A}
X11 = {S}   X22 = {A}   X33 = {S}
     b           a           b

Since S ∊ X13, bab is in L(G). For bbb, X11 = X22 = X33 = {S} while X12 = X23 = X13 = φ (there is no production with right-hand side SS), so bbb is not in L(G). For abaab, a similar computation gives X15 = {S, A}; since S ∊ X15, abaab is in L(G).
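The three tables can be reproduced with the cyk() sketch given after Example 8.2 (our own illustration):

unit = {"b": {"S"}, "a": {"A"}}
binary = {("A", "A"): {"S"}, ("A", "S"): {"S", "A"}, ("S", "A"): {"A"}}
for w in ["abaab", "bab", "bbb"]:
    X, ok = cyk(w, unit, binary, "S")
    print(w, ok)
# abaab True, bab True, bbb False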

Exercises
1. Consider L = {y ∊ {0, 1}*| |y|0 = |y|1}. Prove or disprove that L is context-free.

2. Let G be the following grammar:


S → CF|DE|AB

D → BA|0

E → SD

A → AA|1

B → BB|0

C → FS

F → AB|1
Use the CYK algorithm to determine which of the following strings are in L(G):
10110, 0111010, 0110110.
For each of the above strings, present the final table, state if the string is in L(G), and, if so, give a derivation.

3. Let G be defined by S → AS | b, A → SA | a. Construct CYK parsing tables for:

1. bbaab
2. ababab
3. aabba

Are these strings in L(G)?

4. For the grammar S → SS | AA | b, A → AS | AA | a,
give CYK parsing tables for aabb and bbaba.
Are these strings in L(G)?

5. For the CFG

S → AB | BC
A → BA | a
B → CC | b
C → AB | a

construct CYK parsing tables for:

1. ababb
2. bbbaaa

6. Let DCFL be the collection of languages accepted by a deterministic PDA. Give examples showing that, even if L1 and L2 are in DCFL:

L1 · L2 need not be in DCFL;

L1 − L2 need not be in DCFL.
7. Given a DPDA M show that it is decidable whether L(M) is a regular set.

8. Let L be in DCFL and R a regular set. Show that it is decidable whether R is contained in L.

9. Using a cardinality argument, show that there must be languages that are not context-free.

10. Let LIN be the family of linear languages. The pumping lemma for linear languages can be stated as follows.
Let L ⊆ Σ* be in LIN. Then, there exists a constant p > 0 such that every word z in L with |z| ≥ p can be
expressed as z = uxwyυ for some u, υ, x, y, w ∊ Σ* such that:

1. |uxyυ| < p
2. |xy| ≥ 1
3. for all i ≥ 0, uxiwyiυ ∊ L

(a) Prove the LIN language pumping lemma.
(b) Using the LIN language pumping lemma, prove that the following languages are not linear:

1. {aibicjdj | i, j ≥ 1}
2. {x | x is in {a, b}* and #a(x) = #b(x)}
3. {aibici | i ≥ 1}

11. Prove or disprove the following claims:

1. The family of deterministic context-free languages (DCF) is closed under complementation.
2. The family of DCF is closed under union.
3. The family of DCF is closed under regular intersection.
4. The family of DCF is closed under reversal.
5. The family of LIN is closed under union.
6. The family of LIN is closed under intersection.
7. The family of LIN is closed under reversal.

12. A language L consists of all words w over the alphabet {a, b, c, d} which satisfy each of the following conditions:

1. #a(w) + #b(w) = 2(#c(w) + #d(w));
2. aaa is a subword of w but abc is not a subword of w;
3. the third letter of w is not c.

Prove that L is context-free.

13. Compare the family of minimal linear languages with the family of regular languages. Characterize the languages belonging to the intersection of these two families.

14. Prove that there is a linear language which is not generated by any deterministic linear grammar.

15. A Parikh mapping ψ depends on the enumeration of the basic alphabet; another enumeration gives a different mapping ψ′. Prove that if ψ(L) is semi-linear for some ψ, then ψ′(L) is semi-linear for every ψ′.

16. Consider languages over a fixed alphabet T with at least two letters. Prove that, for each n ≥ 1, there is a CFL Ln which is not generated by any type-2 grammar containing fewer than n non-terminals.

17. Consider the grammar G determined by the productions:

X0 → adX1da | aX0a | aca,
X1 → bX1b | bdX0db.

Prove that L(G) is not sequential. This shows that not all linear languages are sequential. Give an example of a sequential language which is not metalinear.

18. Let G be a CFG with the productions S → AB, A → a, B → AB | b. Run the CYK algorithm on the string aab.

19. (a) Modify the CYK algorithm to count the number of parse trees of a given string and to test whether this number is non-zero.
(b) Test your algorithm of part (a) on the following grammar:

S → ST | a
T → BS
B → +

and the string a + a + a.

20. Use closure under union to show that the following languages are CFL.

1. {ambn | m ≠ n}
2. {a, b}* − {anbn | n ≥ 0}
3. {w ∊ {a, b}* | w = wR}
4. {ambncpdq | n = q or m ≤ p or m + n = p + q}

21. Prove the following stronger version of the pumping lemma.

Let G be a CFG. Then, there are numbers K and k such that any string w ∊ L(G) with |w| ≥ K can be written as w = uυxyz with |υxy| ≤ k, in such a way that either υ or y is non-empty and uυnxynz ∊ L(G) for every n ≥ 0.

22. Show that the class of DCFL is not closed under homomorphism.

Chapter 9. Turing Machines
We have seen earlier that FSA have a finite amount of memory and
hence cannot do certain things. For example, no FSA can accept
{anbn | n ≥ 1}. Though it is possible to have an FSA for adding two
arbitrarily long binary numbers, we cannot have an FSA which can
multiply two arbitrarily long binary numbers. Hence the question
arises: what happens when we remove this restriction of finite
memory? What problems can be solved by a mechanical process
with unlimited memory? By a mechanical process is meant a
procedure so completely specified that a machine can carry it out.
Are there processes that can be precisely described yet still cannot
be realized by a machine (computer)?

Programming – the job of specifying the procedure that a computer is to carry out – amounts to determining in advance
everything that a computer will do. In this sense, a computer’s
program can serve as a precise description of the process the
machine can carry out, and in this same sense it is meaningful to
say that anything that can be done by a computer can be precisely
described.

Often, one also hears a kind of converse statement, to the effect that
“any procedure which can be precisely described can be
programmed to be performed by a computer.” This statement is
a consequence of the work of Alan M. Turing and is called
Turing's thesis, the Church–Turing thesis, or Turing's hypothesis. A.
M. Turing put forth his concept of a computing machine in his
famous 1936 paper, which was an integral part of the formalism of
the theory of computability. These machines are called Turing
machines (TM) and are still the subject of research in computer
science.

As an explication of the concept of algorithm, the concept of the TM is particularly helpful in offering a clear and realistic demarcation of
what constitutes a single step of execution. The problem of how to
separate the steps from one another in a step-by-step procedure is
clearly specified when it is put into the form of a TM table.

To start with, we may specify an effective procedure as follows: an effective procedure is a set of rules which tell us, from moment to
moment, precisely how to behave. With this in mind, Turing’s
thesis may be stated as follows: Any process which could naturally
be called an effective procedure can be realized by a TM.

One cannot expect to prove Turing's thesis, since the term
'naturally' relates to human dispositions rather than to any
precisely defined quality of a process. Support must come from
intuitive arguments. Hence, it is called Turing's hypothesis or the
Church–Turing thesis, and not Turing's theorem.

Perhaps the strongest argument in favor of Turing's thesis is the
fact that, over the years, all other noteworthy attempts to give
precise yet intuitively satisfactory definitions of "effective
procedure" have turned out to be equivalent to Turing's concept.
Some of these are Church's "effective calculability," Post's
canonical systems, and Kleene's general recursive functions.

Turing also stated the halting problem for TMs and showed that it
is undecidable. The concept of 'undecidability' was one of the
major breakthroughs in mathematics (theoretical computer
science) in the first half of the twentieth century. After this, many
problems, like Hilbert's 10th problem, were shown to be
undecidable, and the long search for algorithms for these
problems was given up. But it also became clear that statements like
Fermat's last theorem were decidable, even though at that time
nobody could prove or disprove them; this conjecture has since
been proved to be true.

To this day, the TM is taken as the model of computation. Whenever a
new model of computation (like DNA computing or membrane
computing) is defined, it is the practice to show that the new
model is as capable as the TM; this establishes the power of the new
model of computation.

In this chapter, we see the basic definition of TMs, together with
examples and some techniques for TM construction. In the next three
chapters, we will learn more about this model and related
concepts.

Turing Machine as an Acceptor


The TM can be considered as an accepting device, accepting sets of
strings. Later, we shall see that TMs accept the family of languages
generated by type-0 grammars. The set accepted by a TM is called a
recursively enumerable set.
When we consider the TM as an accepting device, we usually
consider a one-way infinite tape. In the next chapter, we shall see
that by having a two-way infinite tape, the power does not change.
The TM consists of a one-way infinite read/write tape and a finite
control (Figure 9.1).

Figure 9.1. Initial configuration

The input a1 ... an is placed at the left end of the tape. The rest of
the cells contain the blank symbol  . Initially, the tape head points
to the leftmost cell in the initial state q0. At any time, the tape head
will point to a cell and the machine will be in a particular state.
Suppose the machine is in state q and pointing to a cell containing
the symbol a, then depending upon the δ mapping (transition
function) of the TM it will change state to p and write a
symbol X replacing a and move its tape head one cell to the left or
to the right. The TM is not allowed to move off the left end of the
tape. When it reaches a final state, it accepts the input. Now, we will
consider the formal definition.

Definition 9.1

A TM M = (K, Σ, Γ, δ, q0, F) is a 6-tuple, where

K is a finite set of states;


Σ is a finite set of input symbols;


Γ is a finite set of tape symbols, Σ ⊆ Γ,   ∊ Γ is the blank symbol;


q0 in K is the initial state;


F ⊆ K is the set of final states; and


δ is a mapping from K × Γ into K × Γ × {L, R}.

NOTE

1. The Turing machine mapping is defined in such a way that it is
deterministic. In the next chapter, we shall see that non-deterministic
TMs can also be defined. Though they have the same accepting power,
the number of steps may increase exponentially if a deterministic
TM simulates a non-deterministic TM.

2. In some formulations, the head is allowed to remain stationary,
i.e., δ: K × Γ → K × Γ × {L, S, R}. But we shall stick to
{L, R}, as remaining stationary can be achieved by two moves:
first moving right and then moving back left.

Next, we consider an instantaneous description (ID) of a TM.

Definition 9.2

An ID of a TM is a string of the form α1qα2, α1, α2 ∊ Γ*, q ∊ K.

This means that at that particular instant, α1α2 is the content of
the tape of the TM, q is the current state, and the tape head points
to the first symbol of α2. See Figure 9.2.

Figure 9.2. Contents of tape and head position for ID α1qα2

The relationship between IDs can be described as follows:

If X1 ... Xi−1qXiXi+1 ... Xm is an ID and δ(q, Xi) = (p, Y, R), then the next ID will be X1 ... Xi−1YpXi+1 ... Xm.

If δ(q, Xi) = (p, Y, L), then the next ID will be X1 ... Xi−2pXi−1YXi+1 ... Xm. We denote this as:

X1 ... Xi−1qXiXi+1 ... Xm ⊢ X1 ... Xi−2pXi−1YXi+1 ... Xm.


q0X1 ... Xm is the initial ID. Initially, the tape head points to the
leftmost cell containing the input. If qX1 ... Xm is an ID and δ(q, X1)
= (p, Y, L), the machine halts; i.e., moving off the left end of the tape is
not allowed. If X1 ... Xmq is an ID, q is reading the leftmost blank
symbol. If δ(q,  ) = (p, Y, R), the next ID will be X1 ... XmYp. If δ(q,  )
= (p, Y, L), the next ID will be X1 ... Xm−1pXmY. ⊢* is the reflexive,
transitive closure of ⊢; i.e., ID0 ⊢ ID1 ⊢ ... ⊢ IDn is denoted as
ID0 ⊢* IDn, n ≥ 0. An input will be accepted if the TM reaches a
final state.

Definition 9.3

A string w is accepted by the TM M = (K, Σ, Γ, δ, q0, F) if q0w ⊢* α1qfα2 for some α1, α2 ∊ Γ*, qf ∊ F. The language accepted by the TM M is denoted as:

T(M) = {w|w ∊ Σ*, q0w ⊢* α1qfα2 for some α1, α2 ∊ Γ*, qf ∊ F}

NOTE

1. It should be noted that, by definition, it is not necessary for the
TM to read the whole input. If w1w2 is the input and the TM
reaches a final state after reading w1, then w1w2 will be accepted;
for that matter, w1w′ will be accepted for any w′. Usually, while
constructing a TM, we make sure that the whole of the input is
read.

2. Usually, we assume that after going to a final state, the TM
halts, i.e., it makes no more moves.

3. A string w will not be accepted by the TM if it reaches an
ID η1rη2, η1η2 ∊ Γ*, r ∊ K, from which it cannot make a next move
and r is not a final state, or if, while reading w, the TM
gets into a loop and is never able to halt.

Having given the formal definition of the TM as an acceptor, let us
consider some examples.
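Before doing so, it may help to see the definition in executable form. The following Python sketch is ours, not part of the formal development (the names run_tm and BLANK, and the dictionary encoding of δ, are assumptions); it simulates a deterministic TM on a one-way infinite tape, with '_' standing for the blank symbol and 'halt' for a halting move:

    # A minimal sketch of a simulator for Definition 9.1.  delta maps
    # (state, symbol) to (state, symbol, move), move in {'L', 'R', 'halt'}.
    BLANK = '_'

    def run_tm(delta, q0, finals, w, max_steps=10_000):
        """Run a deterministic TM on input w; return (accepted, tape)."""
        tape = list(w) if w else [BLANK]
        q, head = q0, 0
        for _ in range(max_steps):
            if q in finals:                  # reached a final state: accept
                return True, ''.join(tape).strip(BLANK)
            if head == len(tape):            # extend the one-way infinite tape
                tape.append(BLANK)
            move = delta.get((q, tape[head]))
            if move is None:                 # no next move: halt, reject
                return False, ''.join(tape).strip(BLANK)
            q, tape[head], d = move
            if d == 'R':
                head += 1
            elif d == 'L':
                if head == 0:                # moving off the left end: reject
                    return False, ''.join(tape)
                head -= 1
            # d == 'halt': the head stays; acceptance is checked at the top
        return False, ''.join(tape)          # step bound exceeded: looping

The triple (tape, q, head) maintained by the loop is exactly the ID α1qα2 of Definition 9.2.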

Example 9.1. 

Let us consider a TM for accepting {a^ib^jc^k | i, j, k ≥ 1, i = j + k}.

The informal description of the TM is as follows. Consider Figure 9.3, which shows the initial ID.

Figure 9.3. Initial configuration of the TM for Example 9.1

The machine starts by reading an 'a' and changing it to an X; it moves
right; when it sees a 'b,' it converts it into a Y and then starts
moving left. It matches a's with b's. After that, it
matches a's with c's. The machine accepts when the number of a's
is equal to the sum of the number of b's and the number of c's.

Formally, M = (K, Σ, Γ, δ, q0, F)

K = {q0, q1, q2, q3, q4, q5, q6, q7, q8} F = {q8}

Σ = {a, b, c}

Γ = {a, b, c, X, Y, Z,  }
δ is defined as follows:

δ(q0, a) = (q1, X, R)

In state q0, it reads an 'a,' changes it to an X, and moves right in q1.

δ(q1, a) = (q1, a, R)

In state q1, it moves right through the ‘a’s.

δ(q1, b) = (q2, Y, L)

When it sees a ‘b,’ it changes it into a Y.

δ(q2, a) = (q2, a, L)

δ(q2, Y) = (q2, Y, L)

In state q2, it moves left through the ‘a’s and Y’s.

δ(q2, X) = (q0, X, R)

When it sees an X, it moves right in q0 and the process repeats.

δ(q1, Y) = (q3, Y, R)

δ(q3, Y) = (q3, Y, R)

δ(q3, b) = (q2, Y, L)

After scanning the 'a's, it moves through the Y's till it sees a 'b'; then it converts it into a Y and moves left.

δ(q3, c) = (q4, Z, L)

When no more 'b's remain, it sees a 'c' in state q3, changes that into a Z, and starts moving left in state q4. The process repeats. After matching 'a's and 'b's, the TM tries to match 'a's and 'c's.
δ(q4, Y) = (q4, Y, L)

δ(q4, a) = (q4, a, L)

δ(q4, X) = (q0, X, R)

δ(q3, Z) = (q5, Z, R)

δ(q5, c) = (q4, Z, L)

δ(q5, Z) = (q5, Z, R)

δ(q4, Z) = (q4, Z, L)

When no more 'a's remain, it sees a Y in state q0, checks that all 'b's and 'c's have been matched, and reaches the final state q8.

δ(q0, Y) = (q6, Y, R)

δ(q6, Y) = (q6, Y, R)

δ(q6, Z) = (q7, Z, R)

δ(q7, Z) = (q7, Z, R)

δ(q7,  ) = (q8,  , halt)

Let us consider the moves of the TM on the input aaabcc.

The sequence of moves is given below:


The string aaabcc is accepted as q8 is the final state.

Let us see how the string aaabbcc will be rejected, by tracing the sequence of IDs on aaabbcc:

q0aaabbcc ⊢ Xq1aabbcc ⊢ Xaq1abbcc ⊢ Xaaq1bbcc ⊢ Xaq2aYbcc

  ⊢ Xq2aaYbcc ⊢ q2XaaYbcc ⊢ Xq0aaYbcc ⊢ XXq1aYbcc

  ⊢ XXaq1Ybcc ⊢ XXaYq3bcc ⊢ XXaq2YYcc ⊢ XXq2aYYcc

  ⊢ Xq2XaYYcc ⊢ XXq0aYYcc ⊢ XXXq1YYcc ⊢ XXXYq3Ycc

  ⊢ XXXYYq3cc ⊢ XXXYq4YZc⊢ XXXq4YYZc ⊢ XXq4XYYZc

  ⊢ XXXq0YYZc ⊢ XXXYq6YZc ⊢ XXXYYq6Zc ⊢ XXXYYZq7c

The machine halts without accepting as there is no move for (q7, c).

Let us see the sequence of moves for aaaabc:

q0aaaabc ⊢ Xq1aaabc ⊢ Xaq1aabc ⊢ Xaaq1abc ⊢ Xaaaq1bc

  ⊢ Xaaq2aYc ⊢ Xaq2aaYc ⊢ Xq2aaaYc ⊢ q2XaaaYc

  ⊢ Xq0aaaYc ⊢ XXq1aaYc ⊢ XXaq1aYc ⊢ XXaaq1Yc

  ⊢ XXaaYq3c ⊢ XXaaq4YZ⊢ XXaq4aYZ ⊢ XXq4aaYZ
  ⊢ Xq4XaaYZ ⊢ XXq0aaYZ ⊢ XXXq1aYZ ⊢ XXXaq1YZ

  ⊢ XXXaYq3Z ⊢ XXXaYZq5

The machine halts without accepting, as there is no further move possible and q5 is not an accepting state.

It can be seen that only strings of the form a^(i+j)b^ic^j, i, j ≥ 1, will be accepted.
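As a check, the δ mapping of Example 9.1 can be encoded for the run_tm sketch given earlier (the dictionary form is our encoding, not the book's notation):

    # Example 9.1 encoded for run_tm; '_' stands for the blank symbol.
    delta_91 = {
        ('q0','a'):('q1','X','R'), ('q1','a'):('q1','a','R'),
        ('q1','b'):('q2','Y','L'), ('q2','a'):('q2','a','L'),
        ('q2','Y'):('q2','Y','L'), ('q2','X'):('q0','X','R'),
        ('q1','Y'):('q3','Y','R'), ('q3','Y'):('q3','Y','R'),
        ('q3','b'):('q2','Y','L'), ('q3','c'):('q4','Z','L'),
        ('q4','Y'):('q4','Y','L'), ('q4','a'):('q4','a','L'),
        ('q4','X'):('q0','X','R'), ('q3','Z'):('q5','Z','R'),
        ('q5','c'):('q4','Z','L'), ('q5','Z'):('q5','Z','R'),
        ('q4','Z'):('q4','Z','L'), ('q0','Y'):('q6','Y','R'),
        ('q6','Y'):('q6','Y','R'), ('q6','Z'):('q7','Z','R'),
        ('q7','Z'):('q7','Z','R'), ('q7','_'):('q8','_','halt'),
    }
    assert run_tm(delta_91, 'q0', {'q8'}, 'aaabcc')[0]       # i = j + k
    assert not run_tm(delta_91, 'q0', {'q8'}, 'aaabbcc')[0]  # stuck in q7 on c
    assert not run_tm(delta_91, 'q0', {'q8'}, 'aaaabc')[0]   # stuck in q5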

Example 9.2. 

Construct a TM which will accept the set of strings over Σ = {a, b} beginning with an 'a' and ending with a 'b.'

Though this set can be accepted by an FSA, we shall give a TM accepting it.

M = (K, Σ, Γ, δ, q0, F) where

K = {q0, q1, q2, q3} F = {q3}

Σ = {a, b}    Γ = {a, b, X,  }

δ is defined as follows:

δ(q0, a) = (q1, X, R)

δ(q1, a) = (q1, X, R)

δ(q1, b) = (q2, X, R)

δ(q2, a) = (q1, X, R)

δ(q2, b) = (q2, X, R)

δ(q2,  ) = (q3,  , halt)

Let us see how the machine accepts abab.


It can be seen that after initially reading an 'a,' the machine goes to
state q1. Afterwards, if it sees an 'a' it goes to state q1; if it sees a 'b' it
goes to q2. Hence, when it sees the leftmost blank symbol, if it is in
state q2 it accepts, as this means that the last symbol read was a 'b.'
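This machine, too, can be encoded and tested with the run_tm sketch (our encoding):

    # Example 9.2: strings over {a, b} that begin with a and end with b.
    delta_92 = {
        ('q0','a'):('q1','X','R'), ('q1','a'):('q1','X','R'),
        ('q1','b'):('q2','X','R'), ('q2','a'):('q1','X','R'),
        ('q2','b'):('q2','X','R'), ('q2','_'):('q3','_','halt'),
    }
    assert run_tm(delta_92, 'q0', {'q3'}, 'abab')[0]
    assert not run_tm(delta_92, 'q0', {'q3'}, 'aba')[0]   # ends with a
    assert not run_tm(delta_92, 'q0', {'q3'}, 'ba')[0]    # no move on (q0, b)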

Example 9.3. 

Construct a TM which will accept strings over Σ = {(, ), #} which represent well-formed strings of parentheses placed between #'s. The input is of the form #w#, where w is a well-formed string of parentheses. The TM M = (K, Σ, Γ, δ, s, F) accepts this set, where

K = {s, q0, q1, q2, q3, q4, q5}    F = {q5}

Σ = {(, ), #}

Γ = {(, ), X, #,  }

δ is defined as follows:

δ(s, #) = (q0, #, R)

δ(q0, ( ) = (q0, (, R)


δ(q0, )) = (q1, X, L)

δ(q1, X) = (q1, X, L)

δ(q1, ( ) = (q0, X, R)

δ(q0, X) = (q0, X, R)

δ(q0, #) = (q2, #, R)

δ(q2,  ) = (q3,  , L)

δ(q3, #) = (q4, #, L)

δ(q4, X) = (q4, X, L)

δ(q4, #) = (q5, #, halt)

The sequence of moves on #((())())# is represented as follows:


The working of the TM can be described as follows. The head
moves right in state q0 till it finds a right parenthesis ')'. It changes
it into an X and moves left in q1 to the nearest unmatched left
parenthesis, changes that into an X too, and moves right again in q0.
This process repeats. When it sees the right # in q0, all right
parentheses have been matched. It moves right to check that there
are no more # symbols, and then moves left in state q4. If it reaches
the left # in state q4, no unmatched left parentheses remain; hence,
all left and right parentheses have been matched, and the machine
accepts by going to state q5. The machine will be on the left # in
state q1 if there is a right parenthesis with no matching left
parenthesis; it then halts without accepting. In state q4, if it sees a
left parenthesis, there is a left parenthesis with no matching right
parenthesis, and the machine halts without accepting.

Turing Machine as a Computing Device

In the last section, we viewed the TM as an acceptor. In this
section, we consider the TM as a computing device. It computes
functions which are known as partial-recursive functions. In this
section, we consider the tape of the TM as infinite in both
directions. We can make this assumption without loss of generality
as in the next chapter, we shall prove the equivalence of one-way
and two-way infinite tapes. The machine is started with some non-
blank portion on the tape, with the rest of the tape containing
blanks only. This is taken as the input. After making some moves,
when the machine halts, the non-blank portion of the tape is taken
as the output. For some inputs, the machine may get into a loop in
which case the output is not defined. Hence, the function
computed will be a partial function. While considering the TM as a
computing device we do not bother about the final states. Initial
tape head position has to be specified.

Example 9.4. (Unary to binary converter)

The input is a string of a's which is taken as the unary
representation of an integer; a^i represents the integer i. The output is
of the form b_iX^i, where b_i is a binary string which is the binary
representation of the integer i. The mapping for the TM which does
this is given below. The tape symbols are { , a, X, 0, 1}. The
machine has states q0, q1, and q2; q0 is the initial state and a right-moving
state, q1 is a left-moving state, and q2 is the halting state.

δ(q0, a) = (q1, X, L)

δ(q1, X) = (q1, X, L)

δ(q1,  ) = (q0, 1, R)

δ(q1, 0) = (q0, 1, R)

δ(q1, 1) = (q1, 0, L)

δ(q0, 0) = (q0, 0, R)

δ(q0, 1) = (q0, 1, R)

δ(q0, X) = (q0, X, R)

δ(q0,  ) = (q2,  , halt)

The machine works like a binary counter. When it has converted j 'a's into X's, it prints the binary number j to the left of the position where it started. Let us consider the working of the machine on aaaaa. It should output 101XXXXX.

The machine starts in state q0 on the leftmost a.
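Since this machine prints to the left of the cell on which it starts, simulating it requires the two-way infinite tape assumed in this section. The following Python sketch is ours (the dict-backed tape and the name run_tm2 are assumptions); it can be used to check the table above:

    from collections import defaultdict

    def run_tm2(delta, q0, w, start=0, max_steps=100_000):
        """Two-way infinite tape; run until no move, return the non-blank part."""
        tape = defaultdict(lambda: '_')
        tape.update(enumerate(w))
        q, head = q0, start
        for _ in range(max_steps):
            move = delta.get((q, tape[head]))
            if move is None:
                break                      # no next move: the machine halts
            q, tape[head], d = move
            if d == 'halt':
                break
            head += 1 if d == 'R' else -1
        cells = [i for i, s in tape.items() if s != '_']
        if not cells:
            return ''
        return ''.join(tape[i] for i in range(min(cells), max(cells) + 1))

    # Example 9.4's table encoded for run_tm2; '_' is the blank.
    delta_94 = {
        ('q0','a'):('q1','X','L'), ('q1','X'):('q1','X','L'),
        ('q1','_'):('q0','1','R'), ('q1','0'):('q0','1','R'),
        ('q1','1'):('q1','0','L'), ('q0','0'):('q0','0','R'),
        ('q0','1'):('q0','1','R'), ('q0','X'):('q0','X','R'),
        ('q0','_'):('q2','_','halt'),
    }
    print(run_tm2(delta_94, 'q0', 'aaaaa'))   # -> 101XXXXX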


Example 9.5. (Copy machine)

Given an input #w#, where w is a string of a's and b's, the machine makes a copy of w and halts with #w#w#. The machine starts in state q0, the initial state, on the leftmost symbol of w.
It reads an 'a,' changes it into an X, and moves right in
state qa. When it sees the first blank symbol, it prints an 'a' and
moves left in state q1. If it sees a 'b' in q0, it changes it into
a Y and moves right in state qb. When it sees the first blank symbol,
it prints a 'b' and moves left in state q1. In state q1, it moves left till
it sees an 'X' or a 'Y,' and the process repeats. When no more 'a' or 'b'
remains to be copied, the machine goes to q2, prints a # after the
copy it has made, and moves left in q3. In q3, it moves left till the #
symbol. Then, moving left, it converts the 'X's and 'Y's into 'a's and
'b's, respectively, and halts when it sees the leftmost #
symbol. qa and qb are used to remember the symbol the machine
has read.

The state set is {q0, qa, qb, q1, q2, q3, q4, q5}.

Tape symbols are {#, a, b, X, Y,  }

δ mappings are given by:

δ(q0, a) = (qa, X, R)

δ(qa, a) = (qa, a, R)

δ(q0, b) = (qb, Y, R)

δ(qb, a) = (qb, a, R)

δ(qa, b) = (qa, b, R)

δ(qa, #) = (qa, #, R)

δ(qa,  ) = (q1, a, L)

δ(qb, b) = (qb, b, R)

δ(qb, #) = (qb, #, R)

δ(qb,  ) = (q1, b, L)

δ(q1, a) = (q1, a, L)
δ(q1, b) = (q1, b, L)

δ(q1, #) = (q1, #, L)

δ(q1, X) = (q0, X, R)

δ(q1, Y) = (q0, Y, R)

δ(q0, #) = (q2, #, R)

δ(q2, a) = (q2, a, R)

δ(q2, b) = (q2, b, R)

δ(q2,  ) = (q3, #, L)

δ(q3, a) = (q3, a, L)

δ(q3, b) = (q3, b, L)

δ(q3, #) = (q4, #, L)

δ(q4, X) = (q4, a, L)

δ(q4, Y) = (q4, b, L)

δ(q4, #) = (q5, #, halt)

The sequence of moves on input #abb# can be described as follows:

#q0abb# ⊢ #Xqabb# ⊢* #Xbb#qa ⊢ #Xbbq1#a ⊢* #q1Xbb#a

  ⊢ #Xq0bb#a ⊢ #XYqbb#a ⊢* #XYb#aqb ⊢ #XYb#q1ab

  ⊢* #XYq0b#ab ⊢ #XYYqb#ab ⊢* #XYY#abqb ⊢* #XYY#aq1bb

  ⊢* #XYYq0#abb ⊢ #XYY#q2abb ⊢* #XYY#abbq2 ⊢ #XYY#abq3b#

  ⊢* #XYYq3#abb# ⊢ #XYq4Y#abb# ⊢* q4#abb#abb#

  ⊢ q5#abb#abb#.
Example 9.6. 

To find the reversal of a string over {a, b}.

Input is #w#, w ∊ {a, b}+.

Output is #wR#.

The set of states is {q0, q1, q2, q3, q4, qa, qb}.

The tape symbols are {#, a, b, X,  }.

Initial tape head position is on the last symbol of w.

δ is given by:

δ(q0, a) = (qa, X, R)

δ(q0, b) = (qb, X, R)

δ(qa, #) = (qa, #, R)

δ(qb, #) = (qb, #, R)

δ(qa, a) = (qa, a, R)

δ(qa, b) = (qa, b, R)

δ(qb, a) = (qb, a, R)

δ(qb, b) = (qb, b, R)

δ(qa, X) = (qa, X, R)

δ(qb, X) = (qb, X, R)

δ(qa,  ) = (q1, a, L)

δ(qb,  ) = (q1, b, L)

δ(q1, a) = (q1, a, L)

δ(q1, b) = (q1, b, L)
δ(q1, #) = (q0, #, L)

δ(q0, X) = (q0, X, L)

δ(q0, #) = (q2,  , R)

δ(q2, X) = (q2,  , R)

δ(q2, #) = (q2, #, R)

δ(q2, a) = (q2, a, R)

δ(q2, b) = (q2, b, R)

δ(q2,  ) = (q3, #, R)

δ(q3,  ) = (q4,  , halt)

If the input is #w#, the machine proceeds to copy the symbols of w one by one after the second #, in reverse order, changing the original symbols into X's.

When it realizes that it has finished copying the symbols in reverse order, it erases the first # and the X's, moves right, and, after the sequence of a's and b's, prints a # and halts.

It should be noted that the reversed string appears after the second #. It would be an interesting exercise to reverse the string in place, i.e., if #w# is the input, the machine halts with #w^R# in the same location.

Example 9.7. 
Given two integers i and j, i > j, compute the quotient and remainder when i is divided by j.

The input is #a^i#b^j#, with the tape head positioned on the leftmost 'b' in the initial state q0.

The output is #X^i#b^j#c^k#d^l#, where k is the quotient and l is the remainder when i is divided by j. The TM which does this is described as follows:

The TM converts the b's into Y's and the a's into X's one by one. When it sees no more b's, it prints a 'c' after the #, meaning that j has been subtracted from i once.

This repeats as many times as possible; each time a 'c' is printed.
Finally, when the number of a's which remain to be converted to X's
is less than j, the TM, while trying to convert an 'a' into an X, will not
find one. At this stage, it will have converted (i mod j) + 1 b's
into Y's. The TM prints a # after the c's and then prints (i mod j) + 1 d's. It
does this by changing a Y back into a 'b' and printing a 'd' after the rightmost
# and the d's printed so far. When all the Y's have been converted back into b's, we have (i
mod j) + 1 d's after the rightmost #. The TM erases the last d, prints
a # there, and halts. The set of states is {q0, ..., q21}. The tape
symbols are { , #, a, b, c, d, X, Y}.

The mappings are given by:

δ(q0, b) = (q1, Y, L)

changes ‘b’ to Y and moves left.

δ(q1, Y) = (q1, Y, L)
δ(q1, #) = (q2, #, L)

δ(q2, a) = (q2, a, L)

moves left.

δ(q2, #) = (q3, #, R)

δ(q2, X) = (q3, X, R)

when the leftmost # or an X is seen, the head starts moving right.

δ(q3, a) = (q4, X, R)

one ‘a’ is changed into X

δ(q4, a) = (q4, a, R)

δ(q4, #) = (q5, #, R)

δ(q5, Y) = (q5, Y, R)

moves right

δ(q5, b) = (q1, Y, L)

process starts repeating

δ(q5, #) = (q6, #, R)

all ‘b’s have been converted to Y’s

δ(q6, c) = (q6, c, R)

δ(q6,  ) = (q7, c, L)

one ‘c’ is printed

δ(q7, c) = (q7, c, L)
δ(q7, #) = (q8, #, L)

moves left

δ(q8, Y) = (q8, b, L)

Y’s are changed back to ‘b’s

δ(q8, #) = (q0, #, R)

process starts repeating

δ(q3, #) = (q9, #, R)

all ‘a’s have been changed. Now the number of ‘c’s represents the
quotient. Y’s represent the remainder.

δ(q9, Y) = (q9, Y, R)

δ(q9, b) = (q9, b, R)

δ(q9, #) = (q10, #, R)

δ(q10, c) = (q10, c, R)

moves right

δ(q10,  ) = (q11, #, L)

# is printed after the ‘c’s

δ(q11, c) = (q11, c, L)

δ(q11, #) = (q12, #, L)

δ(q12, b) = (q12, b, L)

δ(q12, Y) = (q13, b, R)
δ(q13, b) = (q13, b, R)

δ(q13, #) = (q14, #, R)

δ(q14, c) = (q14, c, R)

δ(q14, #) = (q15, #, R)

δ(q15, d) = (q15, d, R)

δ(q15,  ) = (q16, d, L)

δ(q16, d) = (q16, d, L)

δ(q16, #) = (q11, #, L)

Y’s are copied as ‘d’s

δ(q12, #) = (q17, #, R)

after all Y’s have been copied as ‘d’s the process starts finishing

δ(q17, b) = (q17, b, R)

δ(q17, #) = (q18, #, R)

δ(q18, c) = (q18, c, R)

δ(q18, #) = (q19, #, R)

δ(q19, d) = (q19, d, R)

δ(q19,  ) = (q20,  , L)

δ(q20, d) = (q21, #, halt)

The moves of a TM can be represented as a state diagram: an arc from state p to state q labeled X/Y, R means that the TM, when in state p and reading X, prints Y over X, goes to state q, and moves right.

The state diagram for Example 9.5 (the copy machine) can be drawn accordingly.

Techniques for Turing Machine Construction

Designing a TM to solve a problem is an interesting task, somewhat
similar to programming. Given a problem, different
TMs can be constructed to solve it, but we would like to have a TM
which does it in a simple and efficient manner. Just as we learn some
techniques of programming to deal with alternatives, loops, etc., it is
helpful to understand some techniques of TM construction, which
will help in designing simple and efficient TMs. It should be noted
that we are using the word 'efficient' in an intuitive manner here;
later, in Chapter 12, we shall deal with it formally. Next, we
consider some techniques.

Considering the state as a tuple


In Example 9.5 (the copy machine), we considered a TM which makes a copy of a
given string over Σ = {a, b}. After reading an 'a,' the machine
remembers it by going to qa, and after reading a 'b,' it goes to qb. In
general, we can represent the state as [q, x], where x ∊ Σ, denoting
that the machine has read an 'x.'
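In a simulator such as the run_tm sketch given earlier, nothing special is needed for this: a state carrying a remembered symbol is simply a tuple used as part of the dictionary key. A fragment of the copy machine's table built this way (our encoding):

    # The pair state [q, x] of the text is just a Python tuple ('q', x);
    # dictionary keys may be arbitrary hashable values.
    delta = {}
    for x in 'ab':
        mark = 'X' if x == 'a' else 'Y'
        delta[('q0', x)] = (('q', x), mark, 'R')        # read x, remember it
        for y in 'ab':
            delta[(('q', x), y)] = (('q', x), y, 'R')   # carry x while moving right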

Considering the tape symbol as a tuple


Sometimes, we may want to mark some symbols without
destroying them or do some computation without destroying the
input. In such cases, it is advisable to have multiple tracks on the
tape. This is equivalent to considering the tape symbol as a tuple.

There is only one tape head. Picture a tape with three tracks: the
head points to one cell, which contains, say, A on the first
track, B on the second track, and C on the third track. The tape
symbol is taken as the 3-tuple [A, B, C]. Computation can be done
on one track by manipulating the respective component of the tape
symbol. This is very useful in checking off symbols.
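In code, this convention is again immediate (a small sketch, ours):

    # One cell of a three-track tape is a 3-tuple; the single head reads
    # and writes all three tracks of the scanned cell at once.
    cell = ('A', 'B', 'C')            # tracks 1, 2, 3 of the scanned cell
    cell = (cell[0], cell[1], '√')    # compute on track 3, keep tracks 1-2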

Checking off symbols


We use one track of the tape to mark that some symbols have been
read without changing them.

Example 9.8. 

Consider a TM for accepting

w#w#w, w ∊ {a, b}*

A tape having two tracks is considered.

The first track contains the input. When the TM reads the first a in
state q0, it stores it in its memory (by taking the state as a pair),
checks off the 'a' by printing a √ in the second track below it, moves
right, and checks whether the first symbol after the # is an 'a.' If
so, it marks it by putting a √ in the second track, moves right, and
again checks that the first symbol after the second # is an 'a,'
marking it likewise. It then moves left and repeats the process with
the leftmost unmarked symbol of each block. When all the symbols in the first
block match those in the second and third blocks, the machine halts,
accepting the string.

The mappings can be defined as follows:

δ(q0, [a,  ]) = ([q, a], [a, √], R)

δ(q0, [b,  ]) = ([q, b], [b, √], R)

The machine reads the leftmost symbol, marks it, and remembers whether it is an 'a' or a 'b' by storing it in the state:

δ([q, a], [a,  ]) = ([q, a], [a,  ], R)

δ([q, a], [b,  ]) = ([q, a], [b,  ], R)

δ([q, b], [a,  ]) = ([q, b], [a,  ], R)

δ([q, b], [b,  ]) = ([q, b], [b,  ], R)

The head passes through symbols in the first block to the right:

δ([q, a], [#,  ]) = ([p, a], [#,  ], R)

δ([q, b], [#,  ]) = ([p, b], [#,  ], R)

When the head encounters a # in the first track, the first component of the state is changed to p.

δ([p, a], [a, √]) = ([p, a], [a, √], R)

δ([p, a], [b, √]) = ([p, a], [b, √], R)

δ([p, b], [a, √]) = ([p, b], [a, √], R)


δ([p, b], [b, √]) = ([p, b], [b, √], R)

δ([p, a], [a,  ]) = ([r, a], [a, √], R)

δ([p, b], [b,  ]) = ([r, b], [b, √], R)

When it encounters the first unchecked symbol, it marks it by putting a √ in the second track and changes the first component of the state to r.

δ([r, a], [a,  ]) = ([r, a], [a,  ], R)

δ([r, b], [a,  ]) = ([r, b], [a,  ], R)

δ([r, a], [b,  ]) = ([r, a], [b,  ], R)

δ([r, b], [b,  ]) = ([r, b], [b,   ], R)

The head moves through the second block without changing symbols when the first component of the state is r:

δ([r, a], [#,  ]) = ([s, a], [#,  ], R)

δ([r, b], [#,   ]) = ([s, b], [#,  ], R)

When it encounters a # in the first track, it moves right into the third block, changing the first component of the state to s.

δ([s, a], [a, √]) = ([s, a], [a, √], R)

δ([s, a], [b, √]) = ([s, a], [b, √], R)

δ([s, b], [a, √]) = ([s, b], [a, √], R)

δ([s, b], [b, √]) = ([s, b], [b, √], R)

It moves right looking for the unchecked symbol:

δ([s, b], [b,  ]) = (t, [b, √], L)


δ([s, a], [a,  ]) = (t, [a, √], L)

When it encounters an unchecked symbol in the third block, it marks it by putting a √ in the second track and starts moving left.

δ(t, [a, √]) = (t, [a, √], L)

δ(t, [b, √]) = (t, [b, √], L)

δ(t, [#,  ]) = (t′, [#,  ], L)

It moves into the second block in state t′:

δ(t′, [a,  ]) = (t′, [a,  ], L)

δ(t′, [b,  ]) = (t′, [b,  ], L)

δ(t′, [a, √]) = (t′, [a, √], L)

δ(t′, [b, √]) = (t′, [b, √], L)

It moves left in the second block.

δ(t′, [#,  ]) = (t″, [#,  ], L)

It moves left into the first block in state t″.

δ(t″, [a,  ]) = (t″, [a,  ], L)

δ(t″, [b,  ]) = (t″, [b,  ], L)

It moves left in the first block through unchecked symbols.

When it encounters a checked symbol, it moves right in state q0, and the whole process repeats.

δ(t″, [a, √]) = (q0, [a, √], R)

δ(t″, [b, √]) = (q0, [b, √], R)


This way the machine checks for same symbols in the first, second,
and third blocks. When the machine encounters a # in the first
track in state q0, it means it has checked all symbols in the first
block. Now, it has to check that there are no more symbols in the
second and third block.

δ(q0, [#,  ]) = (q1, [#,  ], R)

δ(q1, [a, √]) = (q1, [a, √], R)

δ(q1, [b, √]) = (q1, [b, √], R)

If it encounters an unchecked symbol, it halts without accepting:

δ(q1, [a,  ]) = (qn, [a,  ], R)

δ(q1, [b,  ]) = (qn, [b,  ], R)

If it finds that all symbols in the second block are checked, it moves to the third block in state q2:

δ(q1, [#,  ]) = (q2, [#,  ], R)

In the third block, it checks whether all symbols have already been
checked. If so, it halts in accepting state qy. Otherwise, halts in
non-accepting state qn.

δ(q2, [a, √]) = (q2, [a, √], R)

δ(q2, [b, √]) = (q2, [b, √], R)

δ(q2, [a,  ]) = (qn, [a,  ], R)

δ(q2, [b,  ]) = (qn, [b,  ], R)

δ(q2, [ ,  ]) = (qy, [ ,  ], R)

If the input has more symbols in the first block than in the second block, the machine will be in the second block in state [p, a] or [p, b] when it encounters [#,  ]; it then halts, rejecting the input:
δ([p, a], [#,  ]) = (qn, [#,  ], R)

δ([p, b], [#,  ]) = (qn, [#,  ], R)

If the input has equally many symbols in the first and second blocks but fewer symbols in the third block, the machine encounters [ ,  ] in state [s, a] or [s, b] and halts without accepting:

δ([s, a], [ ,  ]) = (qn, [ ,  ], R)

δ([s, b], [ ,  ]) = (qn, [ ,  ], R)

Thus, we find that having two tracks and using the second track to
check off symbols is a useful technique.
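The table above is long but very regular, and in a simulator it would be generated by loops rather than written out. The following fragment (our encoding; only the rightward marking sweep is shown, not the return sweep) builds part of Example 9.8's table, with tape symbols as (input, mark) pairs:

    # Part of Example 9.8's table, generated programmatically.  Tape
    # symbols are pairs (input symbol, mark), mark in {' ', '√'}.
    delta = {}
    for x in 'ab':
        qx, px, rx, sx = ('q', x), ('p', x), ('r', x), ('s', x)
        delta[('q0', (x, ' '))] = (qx, (x, '√'), 'R')     # check off leftmost x
        for y in 'ab':
            delta[(qx, (y, ' '))] = (qx, (y, ' '), 'R')   # cross block 1
            delta[(rx, (y, ' '))] = (rx, (y, ' '), 'R')   # cross block 2
            delta[(px, (y, '√'))] = (px, (y, '√'), 'R')   # skip checked symbols
            delta[(sx, (y, '√'))] = (sx, (y, '√'), 'R')
        delta[(qx, ('#', ' '))] = (px, ('#', ' '), 'R')   # enter block 2
        delta[(px, (x, ' '))] = (rx, (x, '√'), 'R')       # match in block 2
        delta[(rx, ('#', ' '))] = (sx, ('#', ' '), 'R')   # enter block 3
        delta[(sx, (x, ' '))] = ('t', (x, '√'), 'L')      # match in block 3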

When we consider a single-tape multi-track TM, we really take the tape symbol as a tuple. This need not be considered as a variation of the TM.

Shifting over
Sometimes, we may have to shift symbols on the tape to the right
or left to make room for some symbols to be written. Suppose the
contents of the tape are a1 ... ai−1Aai+1 ... an at some instant, and A has
to be replaced by, say, abcd. Then ai+1 ... an have to be shifted three
cells to the right, and abcd can then be printed in the space created.
We can use the state as a tuple to store some information and shift
symbols. Suppose the head is reading ai+1 in state q and the
shifting process has to start. The TM reads ai+1, goes to the
state [q, –, –, ai+1], and prints X over ai+1.

The ID a1 ... ai−1Aqai+1ai+2 ... an changes to a1 ... ai−1AX[q, –, –, ai+1]ai+2 ... an.
Next, the TM reads ai+2, storing it in the fourth component and shifting ai+1 from the fourth component to the third component:

δ([q, –, –, ai+1], ai+2) = ([q, –, ai+1, ai+2], X, R)

Similarly, δ([q, –, ai+1, ai+2], ai+3) = ([q, ai+1, ai+2, ai+3], X, R)


When it reads ai+4, it deposits ai+1 in that cell:

δ([q, ai+1, ai+2, ai+3], ai+4) = ([q, ai+2, ai+3, ai+4], ai+1, R)

In general:

δ([q, aj, aj+1, aj+2], aj+3) = ([q, aj+1, aj+2, aj+3], aj, R), i + 1 ≤ j ≤ n,

where an+1, an+2, an+3 are the blank symbol  . Finally, it starts moving left: δ([q, an,  ,  ],  ) = (q′, an, L).

In q′, it moves left till it finds AXXX and replaces it by abcd. A similar method can be used for shifting symbols to the left. Thus, storing some information in components of the state and cyclically moving the components is the essence of the shifting-over technique.
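The buffer-cycling idea is easy to express directly. In the following Python sketch (ours), the list buf plays the role of the three extra state components:

    # Shift tape[i:] three cells to the right, leaving X X X in the gap,
    # by cycling a three-slot buffer the way the state tuple does.
    def shift_right3(tape, i, blank='_'):
        buf, j = [], i
        while j < len(tape) or any(s != blank for s in buf):
            buf.append(tape[j] if j < len(tape) else blank)
            out = buf.pop(0) if len(buf) > 3 else 'X'   # first 3 cells get X
            if j < len(tape):
                tape[j] = out
            else:
                tape.append(out)
            j += 1
        while tape and tape[-1] == blank:
            tape.pop()                                  # drop trailing blanks
        return tape

    t = list('xyAuv')
    shift_right3(t, 3)            # -> x y A X X X u v
    t[2:6] = list('abcd')         # now replace A X X X by a b c d
    print(''.join(t))             # -> xyabcduv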

Subroutines
Just as a computer program has a main procedure and
subroutines, a TM can also be programmed to have a main TM
and TMs which serve as subroutines. Suppose we have to
make n copies of a word w: the input is #w#, and the output
is #w# followed by n copies of w.

In this case, we can write the mappings for a TM Msub which, when started on #w#x, ends up with #w#xw. The main TM will call Msub n times. Similarly, for multiplying two unary numbers m and n, n has to be copied m times. We can write a sub-TM for copying, and the main TM will call it m times.

In order that a TM M1 may use another TM M2 as a subroutine, the
states of M1 and M2 have to be disjoint. When M1 wants to
call M2, control passes from a state of M1 to the initial state
of M2; when the subroutine finishes, control passes from the
halting state of M2 to some state of M1. Note that
a subroutine TM can itself call another TM as its subroutine. This technique
helps to construct a TM in a top-down manner, dividing the work
into tasks, writing a TM for each task, and combining them.
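The bookkeeping for disjoint state sets can also be sketched in code (the function embed and the tags 'M' and 'S' are our own conventions, not the book's):

    # Compose two TM tables: tag states to make them disjoint, bridge the
    # caller's call state to the subroutine's start state, and bridge the
    # subroutine's halting state back to the caller's return state.
    def embed(main, sub, call, ret, sub_start, sub_halt):
        delta = {}
        for (q, a), (p, X, d) in main.items():
            delta[(('M', q), a)] = (('M', p), X, d)
        for (q, a), (p, X, d) in sub.items():
            delta[(('S', q), a)] = (('S', p), X, d)
        for key, (p, X, d) in list(delta.items()):
            if p == ('M', call):            # entering the call state starts M2
                delta[key] = (('S', sub_start), X, d)
            elif p == ('S', sub_halt):      # leaving M2 returns control to M1
                delta[key] = (('M', ret), X, d)
        return delta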

In this chapter, we have considered the definition of a TM and some techniques for TM construction. In the next three chapters, we shall study more about TMs and computability.

Problems and Solutions


1. Consider the following TM M′ with transitions as follows:
δ(q0, 1) = (q1, 0, R)

δ(q1, 1) = (q1, 1, R)

δ(q1, 0) = (q2, 1, R)

δ(q2, 0) = (q3, 0, L)

δ(q3, 0) = (q0, 0, R)

δ(q3, 1) = (q3, 1, L)


q0 is the initial state, and 0 is taken as the blank symbol.
1. Trace the sequence of moves when the machine is started on the leftmost 1 of the tape pattern ... 0 1 1 1 1 0 0 0 1 1 0 ....
2. What happens when it is started:
   1. on the leftmost 1 of a pattern with several blocks of 1's?
   2. on the leftmost 1 of a pattern with a single block of 1's?
   3. on the third 1 of the first block?
   4. on a 1 in the interior of a single block of 1's?

Solution

a.
  q0 1 1 1 1 0 0 0 1 1

  0 q1 1 1 1 0 0 0 1 1

  0 1 q1 1 1 0 0 0 1 1

  0 1 1 q1 1 0 0 0 1 1

  0 1 1 1 q1 0 0 0 1 1

  0 1 1 1 1 q2 0 0 1 1

  0 1 1 1 q3 1 0 0 1 1

  0 1 1 q3 1 1 0 0 1 1

  0 1 q3 1 1 1 0 0 1 1

  0 q3 1 1 1 1 0 0 1 1

  q3 0 1 1 1 1 0 0 1 1

  0 q0 1 1 1 1 0 0 1 1

  0 0 q1 1 1 1 0 0 1 1

  0 0 1 q1 1 1 0 0 1 1

  0 0 1 1 q1 1 0 0 1 1

  0 0 1 1 1 q1 0 0 1 1
  0 0 1 1 1 1 q2 0 1 1

  0 0 1 1 1 q3 1 0 1 1

  0 0 1 1 q3 1 1 0 1 1

  0 0 1 q3 1 1 1 0 1 1

  0 0 q3 1 1 1 1 0 1 1

  0 q3 0 1 1 1 1 0 1 1

  0 0 q0 1 1 1 1 0 1 1

  0 0 0 q1 1 1 1 0 1 1

  0 0 0 1 q1 1 1 0 1 1

  0 0 0 1 1 q1 1 0 1 1

  0 0 0 1 1 1 q1 0 1 1

  0 0 0 1 1 1 1 q2 1 1

  No move for (q2, 1); the machine halts with output ... 0 0 0 1 1 1 1 1 1 0 ... (the two blocks of 1's are now adjacent).

The first block of 1's is shifted step by step to the right till it becomes adjacent to the second block of 1's.

b.
1. The first block of 1's is shifted to the right till it is adjacent to the second block, at which point the machine halts; the third block of 1's is not affected.
2. When there is only one block of 1's, it gets shifted one cell to the right and the process repeats. The machine never stops, as there is no second block of 1's.
3. Since the machine starts on the third 1 of the first block, the portion of the first block from this point on is shifted to the right till it becomes adjacent to the second block.
4. There is only one block of 1's. The portion of the block from the initial position on is shifted one cell to the right, and this process repeats and never stops, as there is no second block of 1's. The portion of the block of 1's to the left of the initial tape head position is unaffected.

2. Construct a TM with three characters 0, 1, and # which locates a '1' under the following conditions: there is only one # on the tape, and somewhere to the right of it is a '1'; the rest of the tape is blank. The head starts at or to the left of the #. When the TM halts, the tape is unchanged and the head stops at the '1.' Zero is taken as the blank symbol.

Solution The transition table is as follows; here q3 is the (halting) final state:
  0 1 #

q0 (q0, 0, R) (q0, 1, R) (q1, #, R)

q1 (q1, 0, R) (q2, 1, R) –

q2 (q3, 0, L) – –

q3 – – –

3. Construct a TM over the alphabet {0, 1, #}, where 0 indicates the blank, which takes a string of 1's and #'s and transfers the rightmost symbol to the left-hand end. Thus, ...0000#1#1#1000... becomes ...0001#1#1#000.... The head is initially at the leftmost non-blank symbol.

Solution The machine mainly has to move to the right-hand end, read the rightmost character to identify whether it is a 1 or a #, erase it, carry it to the leftmost end, write it there, and halt (q5 is the halting state). The transitions are:
  0 1 #

q0 (q1, 0, L) (q0, 1, R) (q0, #, R)

q1 (q4, 0, R) (q2, 0, L) (q3, 0, L)

q2 (q4, 1, L) (q2, 1, L) (q2, #, L)

q3 (q4, #, L) (q3, 1, L) (q3, #, L)

q4 (q5, 0, R) – –

4. Design a TM with one track, one head, and three characters 0, 1, # to compute each of the following functions. Input and output are to be in binary form, as follows: the binary string representing n is enclosed between #'s on the left and right of it, so that, for example, seven is represented as #111#.   is the blank symbol.
   1. f(n) = n + 1
   2. g(n) = 2n.
Input is #n#; output is #n+1# in (a) and #2n# in (b).

Solution

1. The function to be computed is f(n) = n + 1. The input is #n#, with the head on the rightmost #; the output is #n+1#. The transition table is given below (the last column is for the blank symbol):

          0           1           #           blank
    q0    –           –           (q1, #, L)  –
    q1    (q4, 1, L)  (q2, 0, L)  (q3, 1, L)  –
    q2    (q4, 1, L)  (q2, 0, L)  (q3, 1, L)  –
    q3    –           –           –           (q4, #, L)
    q4    –           –           –           –

2. The function to be computed is g(n) = 2n. The input is #n#, with the head on the first binary digit; the output is #2n#.

          0           1           #           blank
    q0    (q0, 0, R)  (q0, 1, R)  (q1, 0, R)  –
    q1    –           –           –           (q2, #, L)
    q2    –           –           –           –

A check of table (a) by simulation is sketched below.
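Table (a) can be run through the run_tm2 sketch given after Example 9.4 (the encoding is ours; the head is assumed to start on the rightmost #, and '_' is the blank):

    # Solution 4(a) encoded for run_tm2.
    delta_inc = {
        ('q0', '#'): ('q1', '#', 'L'),
        ('q1', '0'): ('q4', '1', 'L'), ('q1', '1'): ('q2', '0', 'L'),
        ('q1', '#'): ('q3', '1', 'L'),
        ('q2', '0'): ('q4', '1', 'L'), ('q2', '1'): ('q2', '0', 'L'),
        ('q2', '#'): ('q3', '1', 'L'),
        ('q3', '_'): ('q4', '#', 'L'),
    }
    print(run_tm2(delta_inc, 'q0', '#111#', start=4))   # -> #1000#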

Exercises
1. Draw a state diagram for a TM accepting each of the following languages:
   1. {x ∊ {0, 1}* | #1(x) = 2#0(x) + 1}
   2. the language of all non-palindromes over {a, b}.

2. Consider the TM whose state transition is given below:


δ(q0, 1) = (q1, 0, R)

δ(q1, 0) = (q2, 1, R)

δ(q2, 1) = (q3, 0, L)

δ(q3, 0) = (q0, 1, R)

δ(q0, 0) = (q0, 0, R)

δ(q1, 1) = (q1, 1, R)

δ(q2, 0) = (q2, 0, R)


δ(q3, 1) = (q3, 1, L)
Here, q0 is the start state and q2 is a final state.
1. For each of the following initial tape configurations, determine the final tape pattern that the machine produces, and indicate the final head position (the head starts on the leftmost 1):
   1. ...01110111110...
   2. ...01110110...
2. What effect will the machine have on an arbitrary initial pattern of the form ...01^m01^n0..., where m and n are positive integers? Explain briefly how the machine works. What is the final position of the reading head?
3. Show how to modify the given transitions so that the machine will always halt at its starting square.

3. Consider the TM described by the following state diagram.

1. Determine the behavior of the machine for each of the following initial configurations:
   1. ...000000000000...
   2. ...00000100101100...
   3. ...01100010000000...
2. Describe as clearly and concisely as you can the initial tape configurations for which the machine will eventually halt.

4. Design a TM for the following job: when started anywhere on a tape that is blank except for a single 1, the machine eventually halts with that 1 under its reading head. The remainder of the tape must be blank when the machine halts.

5. Design a TM that behaves as follows:

When presented with a tape containing an arbitrary string of 1's and 2's (preceded and followed by blanks) and made to scan the first symbol of the string, the machine is to reverse the string in place. For example, presented with the tape pattern
...00121121200...
the machine should eventually produce the tape pattern
...00212112100...
and halt. The final pattern is to occupy the same region of the tape as the original. A solution using between six and nine states is reasonable.

6. Construct a TM to carry out each of the following operations:
   1. A left shift of its input by one cell.
   2. A cyclic left shift of its input by one cell.
   3. Let c be in Σ (the input alphabet). If the input word is x = x1cx2, where x1 is in (Σ − {c})*, then the output should be x2.
   4. A duplication of its input; i.e., if the input is w, the output should be ww.
   5. Let 1 be in Σ. If the input is x followed by 1^i for some i ≥ 0, then the output should be x shifted left by i cells; thus the input x alone is unchanged on output, while that for x followed by a single 1 is given by the solution to (a) above.
   6. Let 1 be in Σ and the input be x followed by 1^i for some i ≥ 0. Then output x^i.

7. Consider the following TM transitions:


Here, q0 is the initial state, and q2 is the final state.
δ(q0, a) = (q1, b, R)

δ(q0, b) = (q3,  , R)

δ(q0,  ) = (q2,  , L)

δ(q1, a) = (q1, b, R)

δ(q1, b) = (q1, a, R)

δ(q1,  ) = (q2,  , L)

δ(q3, a) = (q4,  , R)

δ(q3, b) = (q3, b, R)

δ(q4, a) = (q1, b, R)
1. Give four words accepted by the TM, together with their configuration sequences.
2. Give four words that are not accepted by the TM, and in each case explain why not.

8. Design a TM that, when started on any tape pattern of the form
...01^n01^x00... (n > 0, x ≥ 0)
eventually halts with the pattern
...01^n01^(x+n)00...
on its tape. The new pattern is to start in the same square as the given pattern.

9. Using the TM of problem (8) as a submachine, design a new TM that behaves as follows: given any pattern of the form
...01^m01^n00... (m, n > 0)
the machine is eventually to halt with the pattern
...01^(mn)0...
on its tape. The location of this final pattern may be chosen so as to make the design of the machine as simple as possible.

10. For each of the following languages, construct a TM that recognizes the language:
   1. {xyx | x and y are in {a, b}* and |x| > 1}
   2. {a^ib^ic^jd^j | i ≠ j}
   3. {aba^2b^2a^3b^3 ... a^nb^n | n ≥ 0}

11. Consider the TM with input alphabet {a, b} and start state q0, with the following transitions:
δ(q0,  ) = (q1,  , R)

δ(q1, a) = (q1, a, R)

δ(q1, b) = (q1, b, R)

δ(q1,  ) = (q2,  , L)

δ(q2, a) = (q3,  , R)

δ(q2, b) = (q5,  , R)

δ(q2,  ) = (q2,  , N)

δ(q3,  ) = (q4, a, R)

δ(q4, a) = (q4, a, R)
δ(q4, b) = (q4, b, R)

δ(q4,  ) = (q7, a, L)

δ(q5,  ) = (q6, b, R)

δ(q6, a) = (q6, a, R)

δ(q6, b) = (q6, b, R)

δ(q6,  ) = (q7, b, L)

δ(q7, a) = (q7, a, L)

δ(q7, b) = (q7, b, L)

δ(q7,  ) = (q2,  , L)


1. What is the final configuration if the input is  ab ?
2. What is the final configuration if the input is  baa ?
3. Describe what the TM does for an arbitrary input string in {a, b}*.

12. Construct a TM to accept the language {a^ib^j | i < j}.

13. Construct a TM to accept the language


{w ∊ {a, b}*|w contains the same number of a′s and b′s}

14. Construct a TM to accept the language {w ∊ {a, b}* | w = w^R}.

15. Construct a TM to compute each of the following functions; let the input x be represented in unary:
   1. f(x) = x + 2
   2. f(x) = 2x
   3. f(x) = x mod 2.

16. Give informal arguments which explain why TMs are more powerful than PDAs.

Chapter 9. Turing Machines
We have seen earlier that FSA have finite amount of memory and
hence cannot do certain things. For example, no FSA can accept
{anbn|n ≥ 1}. Though, it is possible to have FSA for adding two
arbitrarily long binary numbers, we cannot have an FSA which can
multiply two arbitrarily long binary numbers. Hence the question
arises. What happens when we leave this restriction of finite
memory? What problems can be solved by mechanical process
with unlimited memory? By mechanical process, it is meant a
procedure so completely specified that a machine can carry it out.
Are there processes that can be precisely described yet still cannot
be realized by a machine (computer)?

Programming – the job of specifying the procedure that a


computer is to carry out – amounts to determining in advance,
everything that a computer will do. In this sense, a computer’s
program can serve as a precise description of the process the
machine can carry out, and in this same sense it is meaningful to
say that anything that can be done by a computer can be precisely
described.

Often, one also hears a kind of converse statement to the effect that
“any procedure which can be precisely described, can be
programmed to be performed by a computer.” This statement is
the consequence of the work of Alan M. Turing and is called
Turing’s thesis or Church – Turing thesis or Turing hypothesis. A.
M. Turing put forth his concept of computing machine in his
famous 1936 paper, which was an integral part of the formalism of
the theory of computability. These machines are called Turing
machines (TM) and are still the subject of research in computer
science.

As an explication of the concept of algorithm, the concept of TM is


particularly helpful in offering a clear and realistic demarcation of
what constitutes a single step of execution. The problem of how to
separate the steps from one another in a step-by-step procedure is
clearly specified when it is put into the form of a TM table.

To start with, we may specify an effective procedure as follows: An


effective procedure is a set of rules which tell us, from moment to
moment, precisely how to behave. With this in mind, Turing’s
thesis may be stated as follows: Any process which could naturally
be called an effective procedure can be realized by a TM.

One cannot expect to prove Turing’s thesis, since the term


‘naturally’ relates rather to human dispositions than to any
precisely defined quality of a process. Support must come from
intuitive arguments. Hence, it is called Turing’s hypothesis or
Church—Turing thesis and not as Turing’s theorem.

Perhaps, the strongest argument in favor of Turing’s thesis is the


fact that, over the years, all other noteworthy attempts to give
precise yet intuitively satisfactory definitions of “effective
procedure” have turned out to be equivalent to Turing’s concept.
Some of these are Church’s “effective calculability”, Post’s
canonical systems, and Kleene’s general recursive function.

Turing also stated the halting problem for TMs and showed that it
is undecidable. The concept of ‘undecidability’ was one of the
major breakthroughs in mathematics (theoretical computer
science) in the first half of twentieth century. After this, many
problems like the Hilbert’s 10th problem have been seen to be
undecidable and the long search for an algorithm for these
problems was given up. But, it also became clear that problems like
Fermat’s last theorem were decidable, even though at that time
nobody could prove or disprove it. Recently, in the last decade, this
conjecture has been proved to be true.

Till today, TM is taken as the model of computation. Whenever a


new model of computation (like DNA computing, membrane
computing) is defined, it is the practice to show that this new
model is as capable as the TM. This proves the power of the new
model of computation.

In this chapter, we see the basic definition of TMs with examples


and some techniques for TM construction. In the next three
chapters, we will learn more about this model and related
concepts.

Turing Machine as an Acceptor


The TM can be considered as an accepting device accepting sets of
strings. Later, we shall see that TM accept the family of languages
generated by type 0 grammars. The set accepted by a TM is called a
recursively enumerable set.

When we consider the TM as an accepting device, we usually


consider a one-way infinite tape. In the next chapter, we shall see
that by having a two-way infinite tape, the power does not change.
The TM consists of a one-way infinite read/write tape and a finite
control (Figure 9.1).

Figure 9.1. Initial configuration

The input a1 ... an is placed at the left end of the tape. The rest of
the cells contain the blank symbol  . Initially, the tape head points
to the leftmost cell in the initial state q0. At any time, the tape head
will point to a cell and the machine will be in a particular state.
Suppose the machine is in state q and pointing to a cell containing
the symbol a, then depending upon the δ mapping (transition
function) of the TM it will change state to p and write a
symbol X replacing a and move its tape head one cell to the left or
to the right. The TM is not allowed to move-off the left end of the
tape. When it reaches a final state it accepts the input. Now, we will
consider the formal definition.

Definition 9.1
A TM M = (K, Σ, Γ, δ, q0, F) is a 6-tuple, where

K is a finite set of states;


Σ is a finite set of input symbols;


Γ is a finite set of tape symbols, Σ ⊆ Γ,   ∊ Γ is the blank symbol;


q0 in K is the initial state;


F ⊆ K is the set of final states; and


δ is a mapping from K × Γ into K × Γ × {L, R}.


NOTE

1.

Turing machine mapping is defined in such a way that it is


deterministic. In the next chapter, we see that non-
deterministic TM can also be defined. Though they have the
same power as far as accepting power is concerned, the
number of steps may exponentially increase if a deterministic
TM tries to simulate a non-deterministic TM.

2.
3.

In some formulations the head remaining stationary is


allowed. i.e., δ: K × Γ → K × Γ × {L, S, R}. But we shall stick to
{L, R} as remaining stationary can be achieved by two moves,
first moving right and then moving back left.

4.

Next, we consider an instantaneous description (ID) of a TM.

Definition 9.2

An ID of a TM is a string of the form α1qα2, α1, α2 ∊ Γ*, q ∊ K.

This means that at that particular instance α1α2 is the content of


the tape of the TM. q is the current state and the tape head points
to the first symbol of α2. See Figure 9.2.
Figure 9.2. Contents of tape and head position for ID α1qα2

The relationship between IDs can be described as follows:

If X1 ... Xi − 1qXiXi + 1 ... Xm is an ID and δ(q, Xi) = (p, Y, R) then the


next ID will be X1...Xi − 1YpXi + 1 ... Xm.

If δ(q, Xj) = (p, Y, L) then the next ID will be X1... Xi − 2pXi − 1 YXi +


1 ... Xm. We denote this as:

X1 ... Xi − 1qXiXi + 1 ... Xm ⊢ X1 ... Xi − 2pXi − 1 Y Xi + 1 ... Xm.

q0X1 ... Xm is the initial ID. Initially, the tape head points to the
leftmost cell containing the input. If qX1 ... Xm is an ID and δ(q, X1)
= (p, Y, L), machine halts. i.e., moving off the left end of the tape is
not allowed. If X1 ... Xmq is an ID, q is reading the leftmost blank
symbol. If δ(q,  ) = (p, Y, R) next ID will be X1 ... Xm Yp. If δ(q,  )
= (p, Y, L) next IΔ will be X1... Xm − 1pXmY. ⊢* is the reflexive,
transitive closure of ⊢. i.e., ID0 ⊢ ID1 ⊢ ... ⊢ IDn is denoted as
ID0 ⊢* IDn, n ≥ 0. An input will be accepted if the TM reaches a
final state.

Definition 9.3

A string w is accepted by the TM, M = (K, Σ, Γ, δ, q0, F) if


q0w ⊢* α1qfα2 for some α1, α2 ∊ Γ*, qf ∊ F. The language accepted
by the TM M is denoted as:

T(M) = {w|w ∊ Σ*, q0w ⊢* α1qfα2 for some α1, α2 ∊ Γ*, qf ∊ F}

NOTE
1.

It should be noted that, by definition, it is not necessary for the


TM to read the whole input. If w1w2 is the input and the TM
reaches a final state after reading w1, w1w2 will be accepted;
for that matter any string w1wj will be accepted. Usually, while
constructing a TM we make sure that the whole of the input is
read.

2.
3.

Usually, we assume that after going to a final state, the TM


halts, i.e., it makes no more moves.

4.
5.

A string w will not be accepted by the TM, if it reaches an


ID η1rη2 from which it cannot make a next move; η1η2 ∊
Γ*, r ∊ K and r is not a final state or while reading w, the TM
gets into a loop and is never able to halt.

6.

Having given the formal definition of TM as an acceptor, let us


consider some examples.

Example 9.1. 

Let us consider a TM for accepting {aibjck|i, j, k ≥ 1, i = j + k}.

The informal description of the TM is as follows. Consider Figure


9.3 which shows the initial ID.
Figure 9.3. Initial configuration of the TM for Example 9.1

The machine starts reading a ‘a’ and changing it to a X; it moves


right; when it sees a ‘b,’ it converts it into a Y and then starts
moving left. It matches a’s and b’s. After that, it
matches a’s with c’s. The machine accepts when the number of a’s
is equal to the sum of the number of b’s and the number of c’s.

Formally, M = (K, Σ, Γ, δ, q0, F)

K = {q0, q1, q2, q3, q4, q5, q6, q7, q8} F = {q8}

Σ = {a, b, c}

Γ = {a, b, c, X, Y, Z,  }

δ is defined as follows:

δ(q0, a) = (q1, X, R)

In state q0, it reads a ‘a’ and changes it to X and moves right in q1.

δ(q1, a) = (q1, a, R)

In state q1, it moves right through the ‘a’s.

δ(q1, b) = (q2, Y, L)

When it sees a ‘b,’ it changes it into a Y.

δ(q2, a) = (q2, a, L)

δ(q2, Y) = (q2, Y, L)

In state q2, it moves left through the ‘a’s and Y’s.

δ(q2, X) = (q0, X, R)
When it sees a X, it moves right in q0 and the process repeats.

δ(q1, Y) = (q3, Y, R)

δ(q3, Y) = (q3, Y, R)

δ(q3, b) = (q2, Y, L)

After scanning the ‘a’s, it moves through the Y’s till it sees


a ‘b,’ then it converts it into a Y and moves left.

δ(q3, c) = (q4, Z, L)

When no more ‘b’s remain it sees a ‘c’ in state q3, changes that into


Z and starts moving left in state q4. The process repeats. After
matching ‘a’s and ‘b’s, the TM tries to match ‘a’s and ‘c’s.

δ(q4, Y) = (q4, Y, L)

δ(q4, a) = (q4, a, L)

δ(q4, X) = (q0, X, R)

δ(q3, Z) = (q5, Z, R)

δ(q5, c) = (q4, Z, L)

δ(q5, Z) = (q5, Z, R)

δ(q4, Z) = (q4, Z, L)

When no more ‘a’s remain it sees a Y in state q0 checks that all ‘b’s


and ‘c’s have been matched and reaches the final state q8.

δ(q0, Y) = (q6, Y, R)

δ(q6, Y) = (q6, Y, R)

δ(q6, Z) = (q7, Z, R)
δ(q7, Z) = (q7, Z, R)

δ(q7  ) = (q8,  , halt)

Let us consider the move of the TM on an input aaabcc.

The sequence of moves is given below:


The string aaabcc is accepted as q8 is the final state.

Let us see how a string aaabbcc will be rejected. Let us trace the
sequence of IDs on aaabbcc

q0aaabbcc ⊢ Xq1aabbcc ⊢ Xaq1abbcc ⊢ Xaaq1bbcc ⊢ Xaq2aYbcc

  ⊢ Xq2aaYbcc ⊢ q2XaaYbcc ⊢ Xq0aaYbcc ⊢ XXq1aYbcc

  ⊢ XXaq1Ybcc ⊢ XXaYq3bcc ⊢ XXaYYcc ⊢ XXq2aYYcc

  ⊢ Xq2XaYYcc ⊢ XXq0aYYcc ⊢ XXXq1YYcc ⊢ XXXYq3Ycc

  ⊢ XXXYYq3cc ⊢ XXXYq4YZc⊢ XXXq4YYZc ⊢ XXq4XYYZc

  ⊢ XXXq0YYZc ⊢ XXXYq6YZc ⊢ XXXYYq6Zc ⊢ XXXYYZq7c

The machine halts without accepting as there is no move for (q7, c).

Let us see the sequence of moves for aaaabc:

q0aaabc ⊢ Xq1aaabc ⊢ Xaq1aabc ⊢ Xaaq1abc ⊢ Xaaaq1bc

  ⊢ Xaaq2aYc ⊢ Xaq2aaYc ⊢ Xq2aaaYc ⊢ q2XaaaYc

  ⊢ Xq0aaaYc ⊢ XXq1aaYc ⊢ XXaq1aYc ⊢ XXaaq1Yc

  ⊢ XXaaYq3c ⊢ XXaaq4YZ⊢ XXaq4aYZ ⊢ XXq4aaYZ
  ⊢ Xq4XaaYZ ⊢ XXq0aaYZ ⊢ XXXq1aYZ ⊢ XXXaq1YZ

  ⊢ XXXaYq3Z ⊢ XXXaYZq5

The machine halts without accepting as there is no further move


possible and q5 is not an accepting state.

It can be seen that only strings of the form ai + jbicj will be accepted.

Example 9.2. 

Construct a TM which will accept the set of strings over Σ = {a, b}


beginning with a ‘a’ and ending with a ‘b.’

Though this set can be accepted by a FSA, we shall give


a TM accepting it.

M = (K, Σ, Γ, δ, q0, F) where

K = {q0, q1, q2, q3} F = {q3}

Σ = {a, b} Γ {a, b, X,  }

δ is defined as follows:

δ(q0, a) = (q1, X, R)

δ(q1, a) = (q1, X, R)

δ(q1, b) = (q2, X, R)

δ(q2, a) = (q1, X, R)

δ(q2, b) = (q2, X, R)

δ(q2,  ) = (q3,  , halt)

Let us see how the machine accepts abab.


It can be seen that after initially reading ‘a,’ the machine goes to
state q1. Afterwards, if it sees a ‘a’ it goes to state q1; if it sees a ‘b’ it
goes to q2. Hence, when it sees the leftmost blank symbol, if it is in
state q2 it accepts as this means that the last symbol read is a ‘b.’

Example 9.3. 

Construct a TM which will accept strings over Σ = {(,), #}, which


represent well-formed strings of parentheses placed between
#’s. Input is of the form #w# where w is a well-formed string of
parentheses. The TM M = (K, Σ, Γ, δ,  , F) accepts this set, where

K = { , q0, q1, q2, q3, q4, q5} F =

Σ = {(,), #}  

Γ = {(,), X, #,  }  

δ is defined as follows:

δ( , #) = (q0, #, R)

δ(q0, ( ) = (q0, (, R)


δ(q0, )) = (q1, X, L)

δ(q1, X) = (q1, X, L)

δ(q1, ( ) = (q0, X, R)

δ(q0, X) = (q0, X, R)

δ(q0, #) = (q2, #, R)

δ(q2,  ) = (q3,  , L)

δ(q3, #) = (q4, #, L)

δ(q4, X) = (q4, X, L)

δ(q4, #) = (q5, #, halt)

The sequence of moves on #((())())# is represented as follows:


The working of the TM can be described as follows. The head
moves right in state q0 till it finds a right parentheses’).’ It changes
it into a X and moves right in q0. This process repeats. When it sees
the right # in q0, it means that all right parentheses have been
read. It moves right to see there are no more # symbols. Then it
moves left in state q4. If it reaches the left # in state q4, it means
there are no more left parentheses left. Hence, all left and right
parentheses have been matched. The machine accepts by going to
state q5. It will be on the left # in state q1, if there is a right
parenthesis for which there is no left parenthesis and will halt
without accepting. In state q3 if it sees left parenthesis, it means
there is a left parenthesis for which there is no matching right
parenthesis. The machine will halt without accepting.

Turing Machine as a Computing


Device
In the last section, we have viewed TM as an acceptor. In this
section, we consider the TM as a computing device. It computes
functions which are known as partial-recursive functions. In this
section, we consider the tape of the TM as infinite in both
directions. We can make this assumption without loss of generality
as in the next chapter, we shall prove the equivalence of one-way
and two-way infinite tapes. The machine is started with some non-
blank portion on the tape, with the rest of the tape containing
blanks only. This is taken as the input. After making some moves,
when the machine halts, the non-blank portion of the tape is taken
as the output. For some inputs, the machine may get into a loop in
which case the output is not defined. Hence, the function
computed will be a partial function. While considering the TM as a
computing device we do not bother about the final states. Initial
tape head position has to be specified.

Example 9.4. (Unary to binary converter)

The input is a string of a’s which is taken as the unary


representation of an integer; ai represents integer i. The output is
of the form biXi where bi is a binary string which is the binary
representation of integer i. The mapping for the TM which does
this is given below. The tape symbols are { , a, X, 0, 1}. The
machine has two states q0 and q1. q0 is the initial state and a right
moving state. q1 is a left moving state.

δ(q0, a) = (q1, X, L)

δ(q1, X) = (q1, X, L)

δ(q1,  ) = (q0, 1, R)

δ(q1, 0) = (q1, 1, R)


δ(q1, 1) = (q1, 0, L)

δ(q0, X) = (q0, X, R)

δ(q0,  ) = (q2,  , halt)

The machine works like a binary counter. When it has


converted j ‘a’s into X’s, it prints binary number j to the left of the
position where it started. Let us consider the working of the
machine on aaaaa. It should output 101X X X X X.

The machine starts in state q0 on the leftmost a.


Example 9.5. (Copy machine)

Given an input #w#, where w is a string of a’s and b’s, the machine


makes a copy of w and halts with #w#w#. The machine starts in state q0, the initial state, on the leftmost symbol of w. It reads an 'a', changes it into an X, and moves right in state qa. When it sees the first blank symbol, it prints an 'a' and moves left in state q1. If it sees a 'b' in q0, it changes it into a Y and moves right in state qb. When it sees the first blank symbol, it prints a 'b' and moves left in state q1. In state q1, it moves left till it sees an 'X' or a 'Y', and the process repeats. When no more 'a' or 'b' remains to be copied, the machine goes to q2, prints a # after the copy it has made, and moves left in q3. In q3, it moves left till the # symbol. Then, moving left in q4, it converts the X's and Y's into 'a's and 'b's, respectively, and halts when it sees the leftmost # symbol. qa and qb are used to remember the symbol the machine has read.

The state set is {q0, qa, qb, q1, q2, q3, q4, q5}.

Tape symbols are {#, a, b, X, Y,  }

δ mappings are given by:

δ(q0, a) = (qa, X, R)

δ(qa, a) = (qa, a, R)

δ(q0, b) = (qb, Y, R)

δ(qb, a) = (qb, a, R)

δ(qa, b) = (qa, b, R)

δ(qa, #) = (qa, #, R)

δ(qa,  ) = (q1, a, L)

δ(qb, b) = (qb, b, R)

δ(qb, #) = (qb, #, R)

δ(qb,  ) = (q1, b, L)

δ(q1, a) = (q1, a, L)
δ(q1, b) = (q1, b, L)

δ(q1, #) = (q1, #, L)

δ(q1, X) = (q0, X, R)

δ(q1, Y) = (q0, Y, R)

δ(q0, #) = (q2, #, R)

δ(q2, a) = (q2, a, R)

δ(q2, b) = (q2, b, R)

δ(q2,  ) = (q3, #, L)

δ(q3, a) = (q3, a, L)

δ(q3, b) = (q3, b, L)

δ(q3, #) = (q4, #, L)

δ(q4, X) = (q4, a, L)

δ(q4, Y) = (q4, b, L)

δ(q4, #) = (q5, #, halt)

The sequence of moves on input #abb# is as follows:

#q0abb# ⊢ #Xqabb# ⊢* #Xbb#qa ⊢ #Xbbq1#a ⊢* #q1Xbb#a

  ⊢ #Xq0bb#a ⊢ #XYqbb#a ⊢* #XYb#aqb ⊢ #XYb#q1ab

  ⊢* #XYq0b#ab ⊢ #XYYqb#ab ⊢* #XYY#abqb ⊢ #XYY#aq1bb

  ⊢* #XYYq0#abb ⊢ #XYY#q2abb ⊢* #XYY#abbq2 ⊢ #XYY#abq3b#

  ⊢* #XYYq3#abb# ⊢ #XYq4Y#abb# ⊢* q4#abb#abb#

  ⊢ q5#abb#abb#.
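The ID sequence above can likewise be checked with the run_tm sketch given earlier (again an illustration, not part of the original solution):

    d = {("q0", "a"): ("qa", "X", "R"), ("q0", "b"): ("qb", "Y", "R"),
         ("qa", "a"): ("qa", "a", "R"), ("qa", "b"): ("qa", "b", "R"),
         ("qa", "#"): ("qa", "#", "R"), ("qa", B):   ("q1", "a", "L"),
         ("qb", "a"): ("qb", "a", "R"), ("qb", "b"): ("qb", "b", "R"),
         ("qb", "#"): ("qb", "#", "R"), ("qb", B):   ("q1", "b", "L"),
         ("q1", "a"): ("q1", "a", "L"), ("q1", "b"): ("q1", "b", "L"),
         ("q1", "#"): ("q1", "#", "L"),
         ("q1", "X"): ("q0", "X", "R"), ("q1", "Y"): ("q0", "Y", "R"),
         ("q0", "#"): ("q2", "#", "R"),
         ("q2", "a"): ("q2", "a", "R"), ("q2", "b"): ("q2", "b", "R"),
         ("q2", B):   ("q3", "#", "L"),
         ("q3", "a"): ("q3", "a", "L"), ("q3", "b"): ("q3", "b", "L"),
         ("q3", "#"): ("q4", "#", "L"),
         ("q4", "X"): ("q4", "a", "L"), ("q4", "Y"): ("q4", "b", "L"),
         ("q4", "#"): ("q5", "#", "halt")}

    _, out = run_tm(d, "q0", "#abb#", head=1)   # head on the leftmost symbol of w
    print(out)  # #abb#abb#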
Example 9.6. 

To find the reversal of a string over {a, b}.

Input is #w#, w ∊ {a, b}+.

Output is #wR#.

The set of states is {q0, q1, q2, q3, q4, qa, qb}.

The tape symbols are {#, a, b, X, ␣}.

Initial tape head position is on the last symbol of w

δ is given by:

δ(q0, a) = (qa, X, R)

δ(q0, b) = (qb, X, R)

δ(qa, #) = (qa, #, R)

δ(qb, #) = (qb, #, R)

δ(qa, a) = (qa, a, R)

δ(qa, b) = (qa, b, R)

δ(qb, a) = (qb, a, R)

δ(qb, b) = (qb, b, R)

δ(qa, X) = (qa, X, R)

δ(qb, X) = (qb, X, R)

δ(qa,  ) = (q1, a, L)

δ(qb,  ) = (q1, b, L)

δ(q1, a) = (q1, a, L)

δ(q1, b) = (q1, b, L)
δ(q1, #) = (q0, #, L)

δ(q0, X) = (q0, X, L)

δ(q0, #) = (q2,  , R)

δ(q2, X) = (q2,  , R)

δ(q2, #) = (q2, #, R)

δ(q2, a) = (q2, a, R)

δ(q2, b) = (q2, b, R)

δ(q2,  ) = (q3, #, R)

δ(q3,  ) = (q4,  , halt)

If the input is #w#, the machine proceeds to copy the symbols one by one after the second #, in the reverse order, changing the original symbols into X's.

When it realizes that it has finished copying the symbols in the reverse order, it erases the first # and the X's, moves right past the sequence of a's and b's, prints a # and halts.

It should be noted that the reversed string appears after the second #. It would be an interesting exercise to reverse the string in place, i.e., given the input #w#, halt with #wR# in the same location.

Example 9.7. 
Given two integers i and j, i > j, to compute the quotient and remainder when i is divided by j.

The input is #a^i#b^j#, with the tape head positioned on the leftmost 'b' in the initial state q0.

The output is #X^i#b^j#c^k#d^l#, where k is the quotient and l is the remainder when i is divided by j. The TM which does this is described as follows:

The TM converts the b's into Y's and the a's into X's one by one. When it sees no more b's, it prints a 'c' after the #, meaning that j has been subtracted from i once.

This repeats as many times as possible; each time, a 'c' is printed. Finally, when the number of a's which remain to be converted into X's is less than j, the TM, while trying to convert an 'a' into an 'X', will not find an 'a'. At this stage, it will have converted (i mod j + 1) b's into Y's. The TM prints a # after the c's and then prints (i mod j + 1) d's. It does this by changing a Y into a 'b' and printing a 'd' after the rightmost # and the d's printed so far. When all the Y's have been converted into b's, we have (i mod j + 1) d's after the rightmost #. The TM changes the last d into a # and halts, leaving (i mod j) d's. The set of states is {q0, ..., q21}. The tape symbols are {␣, #, a, b, c, d, X, Y}.

The mappings are given by:

δ(q0, b) = (q1, Y, L)

changes ‘b’ to Y and moves left.

δ(q1, Y) = (q1, Y, L)
δ(q1, #) = (q2, #, L)

δ(q2, a) = (q2, a, L)

moves left.

δ(q2, #) = (q3, #, R)

δ(q2, X) = (q3, X, R)

when the leftmost # or an X is seen, the head starts moving right.

δ(q3, a) = (q4, X, R)

one ‘a’ is changed into X

δ(q4, a) = (q4, a, R)

δ(q4, #) = (q5, #, R)

δ(q5, Y) = (q5, Y, R)

moves right

δ(q5, b) = (q1, Y, L)

process starts repeating

δ(q5, #) = (q6, #, R)

all ‘b’s have been converted to Y’s

δ(q6, c) = (q6, c, R)

δ(q6,  ) = (q7, c, L)

one ‘c’ is printed

δ(q7, c) = (q7, c, L)
δ(q7, #) = (q8, #, L)

moves left

δ(q8, Y) = (q8, b, L)

Y’s are changed back to ‘b’s

δ(q8, #) = (q0, #, R)

process starts repeating

δ(q3, #) = (q9, #, R)

all ‘a’s have been changed. Now the number of ‘c’s represents the
quotient. Y’s represent the remainder.

δ(q9, Y) = (q9, Y, R)

δ(q9, b) = (q9, b, R)

δ(q9, #) = (q10, #, R)

δ(q10, c) = (q10, c, R)

moves right

δ(q10,  ) = (q11, #, L)

# is printed after the ‘c’s

δ(q11, c) = (q11, c, L)

δ(q11, #) = (q12, #, L)

δ(q12, b) = (q12, b, L)

δ(q12, Y) = (q13, b, R)
δ(q13, b) = (q13, b, R)

δ(q13, #) = (q14, #, R)

δ(q14, c) = (q14, c, R)

δ(q14, #) = (q15, #, R)

δ(q15, d) = (q15, d, R)

δ(q15,  ) = (q16, d, L)

δ(q16, d) = (q16, d, L)

δ(q16, #) = (q11, #, L)

Y’s are copied as ‘d’s

δ(q12, #) = (q17, #, R)

after all Y’s have been copied as ‘d’s the process starts finishing

δ(q17, b) = (q17, b, R)

δ(q17, #) = (q18, #, R)

δ(q18, c) = (q18, c, R)

δ(q18, #) = (q19, #, R)

δ(q19, d) = (q19, d, R)

δ(q19,  ) = (q20,  , L)

δ(q20, d) = (q21, #, halt)
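Stripped of the tape mechanics, the machine performs repeated subtraction. The following plain mirror of its strategy is an illustration only, not the TM itself:

    def tm_divide(i, j):
        """Quotient and remainder the way the divider TM produces them."""
        k = 0
        while i >= j:
            i, k = i - j, k + 1   # one full pass: j b's become Y's, one c printed
        return k, i               # i is now i mod j, the number of d's left

    print(tm_divide(7, 3))  # (2, 1): the tape ends as #XXXXXXX#bbb#cc#d#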

The moves of a TM can also be represented by a state diagram: an arc from state p to state q labeled X/Y, R means that the TM, when in state p and reading X, prints Y over X, goes to state q, and moves right.

The state diagram for Example 9.5 (the copy machine) can be drawn in the same way; the diagram itself is omitted here.

Techniques for Turing Machine


Construction
Designing a TM to solve a problem is an interesting task; it is somewhat similar to programming. Given a problem, different TMs can be constructed to solve it, but we would like to have a TM which does it in a simple and efficient manner. Just as we learn programming techniques to deal with alternatives, loops, etc., it is helpful to understand some techniques of TM construction which help in designing simple and efficient TMs. It should be noted that we are using the word 'efficient' in an intuitive manner here; later, in Chapter 12, we shall deal with it formally. Next, we consider some techniques.

Considering the state as a tuple


In Example 9.5, we considered a TM which makes a copy of a given string over Σ = {a, b}. After reading an 'a', the machine remembers it by going to qa; after reading a 'b', it goes to qb. In general, we can represent such a state as [q, x], x ∊ Σ, denoting that the machine has read an 'x'. A small sketch of this idea in code follows.
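In a simulator, such a state is literally a tuple. As a sketch (Python; the dictionary layout is an assumption, not the book's notation), one transition schema generates all the remembering states at once:

    SIGMA = "ab"
    delta = {}
    for x in SIGMA:
        # the tuple state ("q", x) plays the role of qx: it remembers x
        delta[("q0", x)] = (("q", x), "X", "R")
        for y in SIGMA:
            # while remembering x, pass over any symbol y moving right
            delta[(("q", x), y)] = (("q", x), y, "R")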

Considering the tape symbol as a tuple


Sometimes, we may want to mark some symbols without
destroying them or do some computation without destroying the
input. In such cases, it is advisable to have multiple tracks on the
tape. This is equivalent to considering the tape symbol as a tuple.

There is only one tape head. In the above figure, there are three
tracks. The head is pointing to a cell which contains A on the first
track, B on the second track, and C on the third track. The tape
symbol is taken as a 3-tuple [A, B, C]. Some computation can be done
in one track by manipulating the respective component of the tape
symbol. This is very useful in checking off symbols.

Checking off symbols


We use one track of the tape to mark that some symbols have been
read without changing them.

Example 9.8. 

Consider a TM for accepting

w#w#w, w ∊ {a, b}*

A tape having two tracks is considered.

The first track contains the input. When the TM reads the first 'a' in state q0, it stores it in its memory (by taking the state as a pair), checks off the 'a' by printing a √ in the second track below it, moves right, and checks that the first unmarked symbol after the # is an 'a'. If so, it marks it by putting a √ in the second track, moves right, and again checks that the first unmarked symbol after the next # is an 'a', marking it likewise. It then moves left and repeats the process with the leftmost unmarked symbol of each block. When all the symbols in the first block match those in the second and third blocks, the machine halts, accepting the string.

The mappings can be defined as follows:

δ(q0, [a,  ]) = ([q, a], [a, √], R)

δ(q0, [b,  ]) = ([q, b], [b, √], R)

The machine reads the leftmost symbol, marks it and remembers


whether it is a ‘a’ or ‘b’ by storing it in the state:

δ([q, a], [a,  ]) = ([q, a], [a,  ], R)

δ([q, a], [b,  ]) = ([q, a], [b,  ], R)

δ([q, b], [a,  ]) = ([q, b], [a,  ], R)

δ([q, b], [b,  ]) = ([q, b], [b,  ], R)

The head passes through symbols in the first block to the right:

δ([q, a], [#,  ]) = ([p, a], [#,  ], R)

δ([q, b], [#,  ]) = ([p, b], [#,  ], R)

When the head encounters a # in the first track, the first


component of the state is changed to p.

δ([p, a], [a, √]) = ([p, a], [a, √], R)

δ([p, a], [b, √]) = ([p, a], [b, √], R)

δ([p, b], [a, √]) = ([p, b], [a, √], R)


δ([p, b], [b, √]) = ([p, b], [b, √], R)

δ([p, a], [a,  ]) = ([r, a], [a, √], R)

δ([p, b], [b,  ]) = ([r, b], [b, √], R)

When it encounters a first unchecked symbol it marks it by putting


a √ in the second track and changes the first component of the
state to r.

δ([r, a], [a,  ]) = ([r, a], [a,  ], R)

δ([r, b], [a,  ]) = ([r, b], [a,  ], R)

δ([r, a], [b,  ]) = ([r, a], [b,  ], R)

δ([r, b], [b,  ]) = ([r, b], [b,   ], R)

The head moves through the second block without changing


symbols, when the first component of the state is r:

δ([r, a], [#,  ]) = ([s, a], [#,  ], R)

δ([r, b], [#,   ]) = ([s, b], [#,  ], R)

When it encounters a # in the first track it moves right into the


third block changing the first component of the state to s.

δ([s, a], [a, √]) = ([s, a], [a, √], R)

δ([s, a], [b, √]) = ([s, a], [b, √], R)

δ([s, b], [a, √]) = ([s, b], [a, √], R)

δ([s, b], [b, √]) = ([s, b], [b, √], R)

It moves right looking for the unchecked symbol:

δ([s, b], [b,  ]) = (t, [b, √], L)


δ([s, a], [a,  ]) = (t, [a, √], L)

When it encounters an unchecked symbol in the third block it


marks it by putting a √ in the second track and starts moving left.

δ(t, [a, √]) = (t, [a, √], L)

δ(t, [b, √]) = (t, [b, √], L)

δ(t, [#,  ]) = (t′, [#,  ], L)

It moves into the second block in state t′:

δ(t′, [a,  ]) = (t′, [a,  ], L)

δ(t′, [b,  ]) = (t′, [b,  ], L)

δ(t′, [a, √]) = (t′, [a, √], L)

δ(t′, [b, √]) = (t′, [b, √], L)

It moves left in the second block.

δ(t′, [#,  ]) = (t″, [#,  ], L)

It moves left into the first block in state t″.

δ(t″, [a,  ]) = (t″, [a,  ], L)

δ(t″, [b,  ]) = (t″, [b,  ], L)

It moves left in the first block through unchecked symbols.

When it encounters a checked symbol, it moves right in


state q0 and the whole process repeats.

δ(t″, [a, √]) = (q0, [a, √], R)

δ(t″, [b, √]) = (q0, [b, √], R)


This way the machine checks for same symbols in the first, second,
and third blocks. When the machine encounters a # in the first
track in state q0, it means it has checked all symbols in the first
block. Now, it has to check that there are no more symbols in the
second and third block.

δ(q0, [#,  ]) = (q1, [#,  ], R)

δ(q1, [a, √]) = (q1, [a, √], R)

δ(q1, [b, √]) = (q1, [b, √], R)

If it encounters an unchecked symbol, it halts without accepting:

δ(q1, [a,  ]) = (qn, [a,  ], R)

δ(q1, [b,  ]) = (qn, [b,  ], R)

If it finds all symbols are checked in the second block, it moves to


the third block in state q2:

δ(q1, [#,  ]) = (q2, [#,  ], R)

In the third block, it checks whether all symbols have already been
checked. If so, it halts in accepting state qy. Otherwise, halts in
non-accepting state qn.

δ(q2, [a, √]) = (q2, [a, √], R)

δ(q2, [b, √]) = (q2, [b, √], R)

δ(q2, [a,  ]) = (qn, [a,  ], R)

δ(q2, [b,  ]) = (qn, [b,  ], R)

δ(q2, [ ,  ]) = (qy, [ ,  ], R)

If the input has more symbols in the first block (than second
block), it moves in the second block in state [p, a] or [p, b] and
encounters [#,  ]. Then it halts rejecting the input:
δ([p, a], [#,  ]) = (qn, [#,  ], R)

δ([p, b], [#,  ]) = (qn, [#,  ], R)

If the input has equal symbols in the first and second block but less
symbols in the third block, the machine encounters [ ,  ] in state
[s, b] or [s, a] and halts without accepting:

δ([s, a], [ ,  ]) = (qn, [ ,  ], R)

δ([s, b], [ ,  ]) = (qn, [ ,  ], R)

Thus, we find that having two tracks and using the second track to
check off symbols is a useful technique.
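In conventional terms, the two-track machine simply checks that the three blocks are equal. A plain-language mirror (an illustration only, not the TM):

    def accepts(s):
        """Mirror of Example 9.8: s should be of the form w#w#w, w over {a, b}."""
        blocks = s.split("#")
        return (len(blocks) == 3
                and all(set(b) <= set("ab") for b in blocks)
                and blocks[0] == blocks[1] == blocks[2])

    assert accepts("ab#ab#ab") and not accepts("ab#ab#ba")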

When we consider a single tape multi-track TM, we really take the


tape symbol as a tuple. This need not be considered as a variation
of TM.

Shifting over
Sometimes, we may have to shift symbols on the tape to the right or left to make room for some symbols to be written. Suppose the contents of the tape are a1 ... ai−1 A ai+1 ... an at some instant, and A has to be replaced by, say, abcd. Then ai+1 ... an have to be shifted three cells to the right, and in the space so created abcd can be printed. We can use the state as a tuple to store some information and shift symbols. Suppose the head is reading ai+1 in state q and the shifting process has to start. The TM reads ai+1, goes to the state [q, –, –, ai+1], and prints X over ai+1.

The ID a1 ... ai−1 A q ai+1 ai+2 ... an thus changes to a1 ... ai−1 A X [q, –, –, ai+1] ai+2 ... an.

Next, the TM reads ai+2, storing it in the fourth component and shifting ai+1 from the fourth component to the third component.

δ([q, –, –, ai + 1], ai + 2) = ([q, –, ai + 1, ai + 2], X, R)

Similarly, δ([q, –, ai + 1, ai + 2], ai + 3) = ([q, ai + 1, ai + 2, ai + 3], X, R)


When it reads ai + 4, it deposits ai + 1 in that cell:

δ([q, ai + 1, ai + 2, ai + 3], ai + 4) = ([q, ai + 2, ai + 3, ai + 4], ai + 1, R)

In general:

δ([q, aj, aj + 1, aj + 2], aj + 3) = ([q, aj + 1, aj + 2, aj + 3], aj, R), i + 1 ≤ j ≤ n

where an + 1, an + 2, an + 3 are taken to be the blank symbol ␣. Finally, it starts moving left: δ([q, an, ␣, ␣], ␣) = (q′, an, L).

In q′, it moves left till it finds A X X X and replaces it by abcd. A similar method can be used for shifting symbols to the left. Thus, storing some information in some components of the state and cyclically moving the components helps in the technique of shifting symbols; a sketch follows.
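The cyclic movement of the state components can be pictured directly in code. A sketch (the names BLANK and shift_right3 are assumptions here) that shifts a suffix three cells right in a single left-to-right pass:

    BLANK = " "

    def shift_right3(tape, i):
        """Shift tape[i:] three cells to the right in one pass, keeping at
        most three symbols 'in the state', as the TM above does."""
        tape = list(tape) + [BLANK] * 3   # room for the displaced suffix
        buf = []                          # the three components of the state
        for pos in range(i, len(tape)):
            buf.append(tape[pos])
            # the first three cells become X; afterwards deposit the oldest symbol
            tape[pos] = "X" if len(buf) <= 3 else buf.pop(0)
        return "".join(tape)

    print(shift_right3("zAuvw", 2))   # zAXXXuvw: A X X X can now become a b c d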

Subroutines
Just as a computer program has a main procedure and subroutines, the TM can also be programmed to have a main TM and TMs which serve as subroutines. Suppose we have to make n copies of a word w: the input is #w# and the output is #w#ww ... w (n copies of w).

In this case, we can write the mappings for a TM Msub which, when started on #w#x, ends up with #w#xw. The main TM will call this Msub n times. Similarly, for multiplying two unary numbers m and n, n has to be copied m times. We can write a sub-TM for copying, and the main TM will call it m times.

In order that a TM M1 may use another TM M2 as a subroutine, the states of M1 and M2 have to be disjoint. When M1 wants to call M2, the control passes from a state of M1 to the initial state of M2; when the subroutine finishes, control returns from the halting state of M2 to some state of M1. Note that a subroutine TM can itself call another TM as its subroutine. This technique helps to construct a TM in a top-down manner, dividing the work into tasks, writing a TM for each task, and combining them, as the sketch below illustrates.
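At the level of transition tables, 'calling' is just a wiring of states. One illustrative way to do it (plug_in and the argument names are assumptions; the state sets must already be disjoint):

    def plug_in(main_delta, sub_delta, call_state, return_state, sub_start, sub_halt):
        """Merge a main TM's table with a sub-TM's: entering call_state means
        entering the sub-TM's start state, and the sub-TM's halting state
        hands control back to return_state in the main TM."""
        wired = {}
        for (q, s), (p, w, m) in {**main_delta, **sub_delta}.items():
            if p == call_state:
                p = sub_start       # the call
            elif p == sub_halt:
                p = return_state    # the return
            wired[(q, s)] = (p, w, m)
        return wired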

In this chapter, we have considered the definition of a TM and


some techniques for TM construction. In the next three chapters,
we shall study more about TMs and computability.

Problems and Solutions


1. Consider the following TM M′ with transitions as follows:
δ(q0, 1) = (q1, 0, R)

δ(q1, 1) = (q1, 1, R)

δ(q1, 0) = (q2, 1, R)

δ(q2, 0) = (q3, 0, L)

δ(q3, 0) = (q0, 0, R)

δ(q3, 1) = (q3, 1, L)


q0 is the initial state and 0 is taken as blank symbol.
(a) Trace the sequence of moves when the machine is started on the tape pattern shown in the solution below.

(b) What happens when it is started on each of the patterns considered in the solution?
Solution
a.    
.
  q0 1 1 1 1 0 0 0 1 1

  0 q1 1 1 1 0 0 0 1 1

  0 1 q1 1 1 0 0 0 1 1

  0 1 1 q1 1 0 0 0 1 1

  0 1 1 1 q1 0 0 0 1 1

  0 1 1 1 1 q2 0 0 1 1

  0 1 1 1 q3 1 0 0 1 1

  0 1 1 q3 1 1 0 0 1 1

  0 1 q3 1 1 1 0 0 1 1

  0 q3 1 1 1 1 0 0 1 1

  q3 0 1 1 1 1 0 0 1 1

  0 q0 1 1 1 1 0 0 1 1

  0 0 q1 1 1 1 0 0 1 1

  0 0 1 q1 1 1 0 0 1 1

  0 0 1 1 q1 1 0 0 1 1

  0 0 1 1 1 q1 0 0 1 1
  0 0 1 1 1 1 q2 0 1 1

  0 0 1 1 1 q3 1 0 1 1

  0 0 1 1 q3 1 1 0 1 1

  0 0 1 q3 1 1 1 0 1 1

  0 0 q3 1 1 1 1 0 1 1

  0 q3 0 1 1 1 1 0 1 1

  0 0 q0 1 1 1 1 0 1 1

  0 0 0 q1 1 1 1 0 1 1

  0 0 0 1 q1 1 1 0 1 1

  0 0 0 1 1 q1 1 0 1 1

  0 0 0 1 1 1 q1 0 1 1

  0 0 0 1 1 1 1 q2 1 1

No move for (q2, 1); the machine halts with the output shown in the last ID.

The first block of 1's is shifted step by step to the right till it becomes adjacent to the second block of 1's.

(b)

(i) It will halt when the first block of 1's has been shifted to the right till it is adjacent to the second block; the third block of 1's is not affected.

(ii) When there is only one block of 1's, it gets shifted one cell to the right, and this process repeats. It never stops, as there is no second block of 1's.

(iii) Since the machine starts with the third 1 in the first block, the portion of the first block from this point onwards is shifted to the right till it becomes adjacent to the second block.

(iv) There is only one block of 1's. The portion of the block from the initial position onwards is shifted one cell to the right, and this process repeats and never stops, as there is no second block of 1's. The portion of the block of 1's to the left of the initial tape head position remains unaffected.

2. Construct a TM with three characters 0, 1, and # which locates a '1' under the following conditions: there is only one # on the tape, and somewhere to the right of it is a '1'; the rest of the tape is blank. The head starts at or to the left of the #. When the TM halts, the tape is unchanged and the head stops at the '1.' Zero is taken as the blank symbol.

Solution The transition table is as follows. Here q3 is the (halt or) final state:
.
  0 1 #

q0 (q0, 0, R) (q0, 1, R) (q1, #, R)

q1 (q1, 0, R) (q2, 1, R) –

q2 (q3, 0, L) – –

q3 – – –

3. Construct a TM over an alphabet {0, 1, #}, where 0 indicates a blank, which takes an arbitrary string of 1's and #'s and transfers the rightmost symbol to the left-hand end. Thus, ...000#1#1#100... becomes ...0001#1#1#000.... The head is initially at the leftmost non-blank symbol.

Solution. The machine mainly has to move to the right-hand end and read the character there to identify whether it is a 1 or a #; then it moves that symbol to the leftmost end and halts. The transitions are:
  0 1 #

q0 (q1, 0, L) (q0, 1, R) (q0, #, R)

q1 (q4, 0, R) (q2, 0, L) (q3, 0, L)

q2 (q4, 1, L) (q2, 1, L) (q2, #, L)

q3 (q4, #, L) (q3, 1, L) (q3, #, L)

q4 (q5, 0, R) – –

4. Design a TM with one track, one head, and three characters 0, 1, # to compute the following functions. Input and output are to be in binary form, as follows: zero is represented as #0#, and seven, for instance, is represented as #111#; that is, the binary string representing n is enclosed by a # on its left and right. ␣ is the blank symbol.

(a) f(n) = n + 1

(b) g(n) = 2n

Input is #n#; output is #n + 1# in (a) and #2n# in (b).

Solution.

(a) The function to be computed is f(n) = n + 1. Input is #n#; output is #n + 1#. The head starts on the rightmost #. The transition table is given below (the columns are the scanned symbols):

       0           1           #           ␣
q0     –           –           (q1, #, L)  –
q1     (q4, 1, L)  (q2, 0, L)  (q3, 1, L)  –
q2     (q4, 1, L)  (q2, 0, L)  (q3, 1, L)  –
q3     –           –           –           (q4, #, L)
q4     –           –           –           –

(b) The function to be computed is g(n) = 2n. Input is #n#; output is #2n#. The head starts on the first symbol of n.

       0           1           #           ␣
q0     (q0, 0, R)  (q0, 1, R)  (q1, 0, R)  –
q1     –           –           –           (q2, #, L)
q2     –           –           –           –
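Solution (a) is the usual ripple-carry increment read from the low-order end. As an arithmetic cross-check (an illustration, not the TM):

    def increment(bits):                  # bits written MSB first, e.g. "111"
        out, carry = [], 1
        for b in reversed(bits):
            s = int(b) + carry
            out.append(str(s % 2))
            carry = s // 2
        if carry:
            out.append("1")
        return "".join(reversed(out))

    assert increment("111") == "1000"     # matches #111# -> #1000#
    assert increment("101") == "110"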

Exercises
1. Draw a state diagram for a TM accepting each of the following languages:

(a) {x ∊ {0, 1}* | #1(x) = 2#0(x) + 1}

(b) The language of all non-palindromes over {a, b}.

2. Consider the TM whose state transition is given below:


δ(q0, 1) = (q1, 0, R)

δ(q1, 0) = (q2, 1, R)

δ(q2, 1) = (q3, 0, L)

δ(q3, 0) = (q0, 1, R)

δ(q0, 0) = (q0, 0, R)

δ(q1, 1) = (q1, 1, R)

δ(q2, 0) = (q2, 0, R)


δ(q3, 1) = (q3, 1, L)
Here, q0 is the start state and q2 is a final state.

(a) For each of the following initial tape configurations, determine the final tape pattern that the machine produces, and indicate the final head position:

(i) ...01110111110...

(ii) ...01110110...

Here 'B' means the head is presently reading that symbol 'B.'

(b) What effect will the machine have on an arbitrary initial pattern of the form ...01^m01^n0..., where m and n are positive integers? Explain briefly how the machine works. What is the final position of the reading head?

(c) Show how to modify the given transitions so that the machine will always halt at its final state.

3. Consider the TM described by the following state diagram.

(a) Determine the behavior of the machine for each of the following initial configurations:

(i) ...000000000000...

(ii) ...00000100101100...

(iii) ...01100010000000...

(b) Describe as clearly and concisely as you can the initial tape configurations for which the machine will eventually halt.

4. Design a TM for the following job. When started anywhere on a tape that is blank except for a single 1, the machine eventually halts with that 1 under its reading head. The remainder of the tape is to be blank when the machine halts.

5. Design a TM that behaves as follows:


When presented with a tape containing an arbitrary string of 1's and 2's (preceded and followed by blanks) and made to scan the first symbol in the string, the machine is to reverse the string in place. For example, presented with the tape pattern

...00121121200...

the machine should eventually produce the tape pattern

...00212112100...

and halt as indicated. The final pattern is to occupy the same region of the tape as the original one. A solution using between six and nine states is reasonable.

6. Construct a TM to carry out the following operations:


(a) A left shift of its input by one cell.

(b) A cyclic left shift of its input by one cell.

(c) Let c be in Σ (the input alphabet). If the input word is x = x1cx2, where x1 is in (Σ – {c})*, then x2 is to be the output.

(d) A duplication of its input, i.e., if the input is w, the output should be ww.

(e) Let 1 be in Σ. If the input is x1^i for some i ≥ 0, then the output should be x shifted left by i cells; the input x (i = 0) is unchanged on output, while x1 (i = 1) is handled by the solution to (a) above.

(f) Let 1 be in Σ and the input be x1^i for some i ≥ 0. Then output x^i.

7. Consider the following TM transitions, where q0 is the initial state and q2 is the final state:
δ(q0, a) = (q1, b, R)

δ(q0, b) = (q3,  , R)

δ(q0,  ) = (q2,  , L)

δ(q1, a) = (q1, b, R)

δ(q1, b) = (q1, a, R)

δ(q1,  ) = (q2,  , L)

δ(q3, a) = (q4,  , R)

δ(q3, b) = (q3, b, R)

δ(q4, a) = (q1, b, R)
(a) Give four words accepted by the TM together with their configuration sequences.

(b) Give four words that are not accepted by the TM, and in each case explain why not.

8. Design a TM that, when started on any tape pattern of the form

...01^n01^x00... (n > 0, x ≥ 0)

eventually halts with the pattern

...01^n01^(x + n)00...

on its tape. The new pattern is to start in the same square as the given pattern.

9. Using the TM of problem (8) as a submachine, design a new TM that behaves as follows: given any pattern of the form

...01^m01^n00... (m, n > 0)

the machine is eventually to halt with the pattern

...01^(mn)0...

on its tape. The location of this final pattern may be chosen so as to make the design of the machine as simple as possible.

10. For each of the following languages, construct a TM that recognizes the language:

(a) {xyx | x and y are in {a, b}* and |x| > 1}

(b) {a^i b^i c^j d^j | i ≠ j}

(c) {a b a^2 b^2 a^3 b^3 ... a^n b^n | n ≥ 0}

11. Consider the TM with input alphabet {a, b} and start state q0, with the following transitions:
δ(q0,  ) = (q1,  , R)

δ(q1, a) = (q1, a, R)

δ(q1, b) = (q1, b, R)

δ(q1,  ) = (q2,  , L)

δ(q2, a) = (q3,  , R)

δ(q2, b) = (q5,  , R)

δ(q2,  ) = (q2,  , N)

δ(q3,  ) = (q4, a, R)

δ(q4, a) = (q4, a, R)
δ(q4, b) = (q4, b, R)

δ(q4,  ) = (q7, a, L)

δ(q5,  ) = (q6, b, R)

δ(q6, a) = (q6, a, R)

δ(q6, b) = (q6, b, R)

δ(q6,  ) = (q7, b, L)

δ(q7, a) = (q7, a, L)

δ(q7, b) = (q7, b, L)

δ(q7,  ) = (q2,  , L)


(a) What is the final configuration if the input is ab?

(b) What is the final configuration if the input is baa?

(c) Describe what the TM does for an arbitrary input string in {a, b}*.

12. Construct a TM to accept the language {aibj|i < j}.

13. Construct a TM to accept the language


{w ∊ {a, b}*|w contains the same number of a′s and b′s}

14. Construct a TM to accept the language {w ∊ {a, b}*|w = wR}.

15. Construct a TM to compute the following functions, with the input x represented in unary:

(a) f(x) = x + 2

(b) f(x) = 2x

(c) f(x) = x mod 2.

16. Give informal arguments which explain why TMs are more powerful than PDAs.

Chapter 10. Variations of Turing Machines

In Chapter 9, we defined the computability model called the "Turing Machine" (TM). This model is one of the most beautiful, simple, and useful abstract models, and we had an elaborate discussion of it through various examples. One can think of variations of the basic model in many ways; for example, one can work with two tapes instead of a single tape. Such models are obtained by adding extra components and power to the control and hence appear to be more powerful than the basic model, but they are not. One can also consider restricted versions of the basic model. In this chapter, we consider such variants and discuss their computing capabilities. We note that the power is not increased by adding extra components, and not decreased by considering the restricted versions.

Generalized Versions
In this section, we consider the following generalized versions of
the basic model and show that they are equivalent to the basic
model as far as accepting power is concerned. The variants are:

1. Turing machines with two-way infinite tapes

2. Multi-tape TMs

3. Multi-head TMs

4. Non-deterministic TMs

5. Turing machines with 2-dimensional tapes

Two-Way Infinite Tape TM


A two-way infinite tape Turing machine (TTM) is a TM with its input tape infinite in both directions, the other components being the same as those of the basic model. We observe from the following theorem that the power of the TTM is in no way superior to that of the basic TM.

That a one-way TM M0 can be simulated by a two-way TM MD is easy to see: MD puts a # to the left of the leftmost non-blank symbol, moves its head right, and simulates M0. If MD reads the # again, it halts rejecting the input, as this means that M0 has tried to move off the left end of its tape.

Theorem 10.1

For any TTM MD = (K, Σ, Γ, δ, q0, F), there exists an equivalent TM M0.

Proof. The tape of MD at any instant is of the form

... a−2 a−1 a0 a1 a2 ... an


where a0 is the symbol in the cell scanned by MD initially. M0 can represent this situation by two tracks:

  a0   a1    a2   ...
  #    a−1   a−2  ...

When MD is to the right of a0, the simulation is done on the upper track; when MD is to the left of a0, the simulation is done in M0 on the lower track.

The initial configuration would be:

  a0   a1   ...  an
  #    ␣    ...  ␣

M0 = (K′, Σ′, Γ′, δ′, q0′, F′) where

K′ = {q0′} ∪ (K × {1, 2})

Σ′ = Σ × {␣}

Γ′ = Γ × (Γ ∪ {#})

F′ = {[q, 1], [q, 2] | q ∊ F}

δ′ is defined as follows. If δ(q, a) = (q′, c, L/R) and the head of MD is to the right of a0, we have

δ′([q, 1], [a, b]) = ([q′, 1], [c, b], L/R);

the simulation is done on the upper track 1. If MD is to the left of the initial position, the simulation is done on the lower track: if δ(q, b) = (q′, c, L/R), then

δ′([q, 2], [a, b]) = ([q′, 2], [a, c], R/L).

The initial move will be

δ′(q0′, [a0, #]) = ([q, 1], [A, #], R) if δ(q0, a0) = (q, A, R);

δ′(q0′, [a0, #]) = ([q, 2], [A, #], R) if δ(q0, a0) = (q, A, L).

When reading the leftmost cell, M0 behaves as follows:

if δ(q, a) = (p, A, R), then δ′([q, 1/2], [a, #]) = ([p, 1], [A, #], R);

if δ(q, a) = (p, A, L), then δ′([q, 1/2], [a, #]) = ([p, 2], [A, #], R).

While simulating a move when MD is to the left of the initial position, M0 works on the lower track, always moving in the direction opposite to that of MD. If MD reaches an accepting state qf, M0 reaches [qf, 1] or [qf, 2] and accepts the input.
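The folding of the two-way tape into two tracks is easy to state in code. A sketch (the names fold and BLANK are assumptions):

    from itertools import zip_longest

    BLANK = " "

    def fold(tape, origin):
        """Fold a two-way tape, given as a Python list whose index `origin`
        holds a0, into the one-way two-track tape used in the proof."""
        upper = tape[origin:]                  # a0  a1  a2 ...
        lower = ["#"] + tape[:origin][::-1]    # #   a-1 a-2 ...
        return list(zip_longest(upper, lower, fillvalue=BLANK))

    print(fold(["x", "y", "a", "b", "c"], 2))
    # [('a', '#'), ('b', 'y'), ('c', 'x')]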

Multi-tape TM
Definition 10.1

A multi-tape TM is a TM, with n tapes each having a separate tape


head. The move of this TM will depend on the state and symbol
scanned by each head. In each tape, a symbol is printed on the
cell scanned and each head moves left or right independently
depending on the move.

Suppose we have a 3-tape TM (Figure 10.1).

Figure 10.1. TM with three tapes

The symbols scanned by the heads are A, B and C respectively.


Then, the mapping will be of the form:

δ(q, A, B, C) = (q′, (A′, L), (B′, R), (C′, R))

The state changes to q′.


In the first tape, A′ is printed over A and the tape head moves left.
In the second tape, B′ is printed over B and the tape head moves
right while in the third tape C′ is printed over C, the tape head
moving right.

Theorem 10.2

A multi-tape TM can be simulated by a single tape TM.

Proof. Let M = (K, Σ, Γ, δ, q0, F) be a k-tape TM. It can be


simulated by a single-tape TM M′ having 2k tracks. Odd-numbered tracks contain the contents of M's tapes.

Table 10.2. TM with one tape having six tracks

track 1:   ...  A  ...
track 2:   ...  X  ...
track 3:   ...     ...  B
track 4:   ...     ...  X
track 5:   C   ...     ...
track 6:   X   ...     ...

Even-numbered tracks contain blanks, except in one position where the marker X appears; this specifies the position of the head on the corresponding tape of M. The situation of Figure 10.1 is represented by a 6-track TM as shown in Table 10.2.

To simulate a move of the multi-tape TM M, the single tape TM M′


makes two sweeps, one from left to right and another from right to
left. It starts on the leftmost cell, which contains a X in one of the
even tracks. While moving from left to right, when it encounters
a X, it stores the symbol above it in its finite control. It keeps a
counter as one of the component of the state to check whether it
has read the symbols from all the tapes. After the left to right move
is over, depending on the move of M determined by the symbols
read, the single tape TM M′ makes a right to left sweep changing
the corresponding symbols on the odd tracks and positioning
the X's properly in the even-numbered tracks. To simulate one move of M, M′ takes (it may be slightly more, depending on the left or right shift of an X) roughly a number of steps equal to twice the distance between the leftmost and the rightmost cells containing an X in an even-numbered track. When M′ starts, all X's are in one cell; after i moves, the distance between the leftmost and rightmost X can be at most 2i. Hence, to simulate n moves of M, M′ roughly takes Σi=1..n 4i = 2n(n + 1), i.e., O(n²), steps. If M reaches a final state, M′ accepts and halts.
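The 2k-track representation can be built mechanically. A sketch (encode_tracks and BLANK are assumed names):

    BLANK = " "

    def encode_tracks(tapes, heads):
        """One composite cell per position: for each of the k tapes, an odd
        track holding its contents and an even track holding the marker X
        at (and only at) the head position."""
        width = max(len(t) for t in tapes)
        cells = []
        for pos in range(width):
            cell = []
            for tape, head in zip(tapes, heads):
                cell.append(tape[pos] if pos < len(tape) else BLANK)
                cell.append("X" if pos == head else BLANK)
            cells.append(tuple(cell))
        return cells

    # three tapes with their heads on A, B and C, as in Figure 10.1
    print(encode_tracks(["xA", "Byy", "zzC"], [1, 0, 2]))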

An off-line TM is a multi-tape TM with a read only input tape. In


this tape, input w is placed with end markers as ¢w$ and symbols
cannot be rewritten on this tape.

Multi-head TM
Definition 10.2

A multi-head TM is a single tape TM having k heads reading


symbols on the same tape. In one step, all the heads sense the
scanned symbols and move or write independently.

Theorem 10.3

A multi-head TM M can be simulated by a single head TM M′.

Proof. Let M have k heads. Then, M′ will have k + 1 tracks on a


single tape. One track will contain the contents of the tape of M and
the other tracks are used to mark the head positions. One move
of M is simulated by M′ by making a left to right sweep followed by
a right to left sweep. The simulation is similar to the one given
in Theorem 10.2. One fact about which one has to be careful here
is the time when two heads scan the same symbol and try to
change it differently. In this case, some priority among heads has
to be used.

Non-deterministic Turing Machine (NTM)


One can define a NTM by the following transition function:

δ: K × Γ → 2^(K × Γ × {L, R})

where δ is a mapping from K × Γ to the power set of K × Γ × {L, R}.


The computation of any input will be in several directions and the
input is accepted if there is at least one sequence of moves, which
accepts it.

A NTM is N = (K, Σ, Γ, δ, q0, F), where K, Σ, Γ, q0, F are as in the definition of the TM and δ is defined as above.

Theorem 10.4

Every NTM can be simulated by a deterministic TM (basic model).

Proof. The computation of a NTM is a tree whose branches


correspond to different possibilities for the machine movement on
the given input. If some branch of the computation leads to the
accept state, then, the machine accepts the input. The root of the
tree would be the start configuration and each node is a possible
continuation from the root node.

The computation of a NTM N on any input w is represented as a tree. Each branch is a branch of the nondeterminism; each node is a configuration of N, and the root is the start configuration. One has to traverse the whole tree in a breadth-first manner to search for a successful path. One cannot proceed by depth-first search, as the tracing may follow an infinite branch while missing the accepting configurations of other branches.

Using a multi-tape TM one can simulate N on a given input. For


each path of the tree, simulation is done on a separate tape. The
paths are considered one-by-one in the increasing order of depth
and among paths of equal length, the paths are considered from
left to right.

Let us see how the implementation works on a DTM with three tapes. There is an input tape containing the input, which is never altered. The second tape is a simulation tape, which contains a copy of N's tape contents on some branch of its non-deterministic computation. The third tape keeps track of the location of the DTM in the NTM's computation tree. The three tapes may be called the input tape, the simulation tape, and the address tape.

Suppose every node in the tree has at most b children, and let every node have an address, which is a string over the alphabet Σb = {1, 2, ..., b} (say). To obtain the node with address 145, start at the root, go to its child numbered 1, then to that node's 4th child, and then to that node's 5th child. Ignore addresses that are meaningless. Then, in a breadth-first manner, check the nodes (configurations) in the canonical order ε, 1, 2, 3, ..., b, 11, 12, 13, ..., 1b, 21, 22, ..., 2b, ..., 111, 112, ... (as long as they exist). The DTM on input w = a1 ... an works as follows. Place w on the input tape, the other tapes being empty. Copy the contents of the input tape to the simulation tape. Then simulate one non-deterministic branch of the NTM on the simulation tape, consulting the address tape for the choice at each step. Accept if an accepting configuration is reached; otherwise, abort this branch of the simulation. The abortion takes place for one of the following reasons:

the symbols on the address tape are all used;

a rejecting configuration is encountered; or

the non-deterministic choice is not a valid choice.

Once the present branch is aborted, replace the string on the


address tape with the next string in canonical order. Simulate this
branch of the NTM as before.
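Instead of an explicit address tape, a deterministic simulator can keep a queue of configurations; the breadth-first order is what matters. A sketch (ntm_accepts, B, and the step bound are choices made here; an equivalent, not identical, scheme):

    from collections import deque

    B = " "

    def ntm_accepts(delta, start, finals, tape, limit=100_000):
        """BFS over the computation tree of an NTM whose delta[(q, sym)]
        is a set of (state, write, move) choices; move is 'L' or 'R'."""
        first = (start, tuple(tape) or (B,), 0)
        queue, seen = deque([first]), {first}
        while queue and limit:
            limit -= 1
            q, t, h = queue.popleft()
            if q in finals:
                return True
            for p, w, m in delta.get((q, t[h]), ()):
                nt, nh = list(t), h + (1 if m == "R" else -1)
                nt[h] = w
                if nh < 0:
                    continue            # fell off the left end of the tape
                if nh == len(nt):
                    nt.append(B)        # grow with a blank on the right
                cfg = (p, tuple(nt), nh)
                if cfg not in seen:     # never revisit a configuration
                    seen.add(cfg)
                    queue.append(cfg)
        return False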

Two-Dimensional TM
The TM can have a 2-dimensional tape. When the head is scanning a symbol, it can move left, right, up, or down. If the smallest rectangle containing the non-blank portion is m × n, it has m rows and n columns. A 1-dimensional TM which tries to simulate this 2-dimensional TM will have two tapes. On one tape, the m rows of n symbols each are represented as m blocks of size n each, separated by markers; the second tape is used as a scratch tape. When the 2-dimensional TM's head moves left or right, this is simulated within a block of the 1-dimensional TM. When the 2-dimensional TM's head moves up or down, the 1-dimensional TM's head moves to the previous block or the next block; to find the correct position within that block, the second tape is used. If m or n increases, the number of blocks or the size of the blocks is increased.
Restricted Turing Machines
In this section, we discuss some more variations of TMs, which take the form of restrictions. For example, we have seen the off-line TM, which has a restriction on its input tape. One can view the development of the TM from finite state automata in a hierarchical way: when viewed as language recognizers, the FSA and PDA models cannot recognize some languages which a TM can. One can view the tape of the basic TM as input tape, output tape, and processing space. For example, a PDA is equivalent to a NTM with an input tape resembling the input tape of the PDA and a storage tape resembling the stack of the PDA; the processing of the NTM can then simulate the processing of the PDA. There are some standard ways in which a TM can be restricted, leading to multi-stack TMs, counter machines, etc., without losing accepting power.

Definition 10.3

A deterministic TM with a read-only input tape and two storage tapes is called a deterministic two-stack Turing machine (DTSTM). When the head of the DTSTM tries to move left on a storage tape, a blank symbol ␣ is printed.

One can easily simulate a TM with a DTSTM. At any point of time, the symbol being scanned by the head of the TM is kept on the top of one stack, with the symbols to its left on this stack below it; symbols closer to the present head position are closer to the top of the stack. The same is done for the symbols to the right of the present head position by placing them on the second stack. Hence, clearly, a move of the TM has a corresponding action on the input and stacks of the DTSTM, and the simulation can be done.

Theorem 10.5

There exists a DTSTM that simulates the basic TM on any given


input.
The next variant is a 'counter' machine. A counter machine can store a finite number of integers, each counter storing one number. A counter can be increased or decreased, but its head cannot cross the 'stack' or 'counter' symbol 'Z.' In other words, it is a machine with stacks having only two stack symbols, Z and ␣ (blank). Every stack has Z as its initial symbol, and a stack may hold a string of the form Z␣^i, i ≥ 0, indicating that the stack holds the integer i. This count can be increased or decreased by moving the stack head up or down. A counter machine with two stacks is illustrated in the figure (omitted here).

Theorem 10.6

For a basic TM, there exists an equivalent 4-counter TM.

Proof. The equivalence is shown between a 2-stack TM (DTSTM)


and a 4-counter machine. We have already seen that a DTSTM
and basic TM are equivalent.

We now see how to simulate each stack with two counters.


Let X1, X2, ..., Xt−1 be the (t − 1) stack symbols. Each stack content can be uniquely represented by an integer in base t. Suppose Xi1 Xi2 ... Xin is the present stack content of the DTSTM, with Xin on the top. Then the integer count will be

k = in + t·in−1 + t²·in−2 + ... + t^(n−1)·i1.

For example, if the number of stack symbols used is 3 (so t = 4), the integer for the stack content X2X3X1X2 will be:

k = 2 + 4·1 + 4²·3 + 4³·2 = 182.

Suppose Xr is to be pushed onto the top of the stack; then the new integer count has to be kt + r. The first counter contains k and the second counter contains zero at this point. To get kt + r in the second counter, the counter machine repeatedly moves the first counter head down by one cell and the head of the second counter up by t cells; thus, when the first counter head reaches 'Z,' the second counter contains kt. Now add r to the second counter to get kt + r.

If it is a pop move of the stack, Xin is to be removed; then k has to be reduced to ⌊k/t⌋, the integer part of k/t. Now the adjustment is to decrement the count of the first counter in steps of t and increment the second counter by one each time; this repeats till the first counter becomes zero.

By the above exercise, one can identify the top stack symbol from the two counters thus designed: k mod t is the index in, and hence Xin is the top symbol of the stack.
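In code, a whole stack becomes one base-t integer (push and pop are names chosen here); the computation below also confirms the value 182 for the stack X2X3X1X2:

    def push(k, r, t):
        """Push symbol X_r onto the stack whose content is the integer k."""
        return k * t + r

    def pop(k, t):
        """Pop: k % t is the index of the top symbol, k // t the rest."""
        return k % t, k // t

    k, t = 0, 4
    for r in (2, 3, 1, 2):     # push X2, X3, X1, X2 bottom-to-top
        k = push(k, r, t)
    print(k)                   # 182 = 2 + 4*1 + 16*3 + 64*2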

The above theorem can be improved and hence, we have the


following theorems.

Theorem 10.7
For a basic TM, there exists an equivalent 3-counter TM.

Proof. The idea of the simulation is similar to the previous theorem.


Instead of having two counters for adjusting the two counters that
correspond to two stacks, one common counter is used to adjust
the operation of pop, push or change of the stack symbols.

Theorem 10.8

For a basic TM, there exists an equivalent 2-counter TM.

Proof. The simulation builds on the previous theorem: one has to simulate the 3 counters using 2 counters. Let i, j, and k be the numbers in the three counters. We represent these by the single integer m = 2^i 3^j 5^k, placed in one counter.

To increment i, one has to multiply m by 2; this can be done using the second counter, as we have done earlier. Similarly, increments of j and k can be done using the second counter. Any of i, j, k is zero exactly when m is not divisible by 2, 3, or 5, respectively. For example, to test whether j = 0, copy m to the second counter and, while copying, store in the finite control whether m is divisible by 3; if it is not divisible by 3, then j = 0.

Finally, to decrement i, j, or k, divide m by 2, 3, or 5, respectively. This exercise is also similar to the previous one, except that the machine halts whenever m is not divisible by the constant by which we are dividing.
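The bookkeeping on m = 2^i 3^j 5^k is ordinary arithmetic. A sketch (the names are assumptions here):

    def enc(i, j, k):
        return 2**i * 3**j * 5**k

    def inc_i(m):       # multiply by 2, done with the help of the second counter
        return m * 2

    def dec_i(m):       # divide by 2; only valid when i > 0
        return m // 2

    def i_is_zero(m):   # i = 0 exactly when m is odd
        return m % 2 != 0

    m = enc(3, 0, 2)                 # i=3, j=0, k=2, so m = 8 * 25 = 200
    assert not i_is_zero(m) and dec_i(m) == enc(2, 0, 2)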

A TM can also be restricted by reducing the tape alphabet or by reducing the number of states. For instance, there exists a TM with three states and one tape, but an arbitrarily large tape alphabet, that recognizes recursively enumerable languages. One can likewise aim at reducing the number of tape symbols while achieving universality; the following is one such characterization.

Theorem 10.9

There exists a TM with one tape and tape alphabet {0, 1,  } to
recognize any recursively enumerable language L over {0, 1}.

Proof. Let M = (K, {0, 1}, Γ, δ, q0, F) be a TM recognizing L; the tape alphabet Γ can be anything. Our aim is to construct an equivalent TM with tape alphabet {0, 1, ␣}. For that, we encode each symbol of Γ: if Γ has t symbols, we use binary codes of k bits each, where 2^(k−1) < t ≤ 2^k.

We now design the TM M′ with tape alphabet {0, 1, ␣}. The tape of M′ consists of the coded symbols of Γ for the input over {0, 1}. One move of M is simulated by k moves of M′ as follows. The tape head of M′ is initially at the leftmost symbol of the coded input. M′ scans the next k − 1 symbols to its right to decide on the change of state, the symbol to overwrite, and the left or right move, as per M. M′ stores in its finite control the state of M and the position of its own head within the current block, which is a number between 0 and k − 1; hence M′ can clearly indicate at the end of a block of moves whether one move of M has been completed. Once the finite control indicates '0' as the head position, it is time for the change on the tape and of the state, as per M's instruction. If the state thus changed is an accepting state of M, M′ accepts.

One observation is that on the tape of M′ there has to be a code for the blank symbol of M; this is essential for simulating the blanks of M by M′. A second observation is that any tape symbol of M is directly coded in terms of 0s and 1s: when a string w is placed on the tape of M, the codes of the symbols of w are concatenated and placed on the tape of M′.

One can also see from this result that even if the language L is not over {0, 1}, the above construction still works, because the input of L over some other alphabet can be coded and placed as input for a TM with tape alphabet {0, 1, ␣}.
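The k-bit coding of the tape alphabet is mechanical. A sketch (code_table and encode are assumed names):

    from math import ceil, log2

    def code_table(gamma):
        """Assign each of the t symbols in gamma a k-bit binary code,
        where 2^(k-1) < t <= 2^k."""
        k = max(1, ceil(log2(len(gamma))))
        return {s: format(n, "0%db" % k) for n, s in enumerate(gamma)}, k

    def encode(w, table):
        return "".join(table[s] for s in w)

    table, k = code_table(["0", "1", "X", "Y", " "])   # t = 5 symbols, so k = 3
    print(k, encode("0XY", table))                     # 3 000010011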

One can also construct a multi-tape TM that uses only the two symbols 0 and 1 as tape alphabet to simulate any TM. One keeps the input tape fixed with the input; a second tape, with the tape symbols coded in binary, simulates the moves of the original TM. The newly constructed TM must have positions on the tape to indicate the present head position, and cells to indicate that the binary representation of the symbol under scan has already been copied. Each ID is copied on a third tape after the simulation of one move. If we take the input alphabet to be {0, 1}, we can start with the second tape (the first tape is not necessary). On the third tape, IDs are copied one by one without erasing. Thus, we have the following result.

Theorem 10.10

Any TM can be simulated by an offline TM having one storage tape with two symbols 0 and 1, where 0 indicates the blank. A blank (0) can be retained as 0 or replaced by 1, but a '1' cannot be rewritten as a '0'.

Turing Machines as Enumerators


The languages accepted by the TM are recursively enumerable sets.
One can think of a TM as generating a language. An enumerator is
a TM variant which generates a recursively enumerable language.
One can think of such a TM to have its output on a printer (an
output tape). That is, the output strings are printed by the output
device printer. Thus, every string that is processed freshly, is added
to the list, thereby printing it also.

The enumerator machine has an input tape, which is blank


initially, an output device which may be a printer. If such a
machine does not halt, it may perform printing of a list of strings
infinitely. Let M be an enumerator machine and G(M) be the list of
strings appearing on output tape. We have the following theorem
for M.

Theorem 10.11

A language L is recursively enumerable if and only if there exists


an enumerator M such that G(M) = L.

Proof. Let M be an enumerator such that L = G(M). We show that there exists a TM M′ recognizing L. Let w be an input for M′; perform the following two steps on w:

1. Run M and compare each output string of M with w.

2. If w appears on the output tape of M, accept w.

That is, M′ accepts exactly those strings that appear on the output tape of M. Hence L(M′) = G(M) = L.

Conversely, let M′ be a TM such that L(M′) = L. We construct an enumerator M that prints every string of L as follows. Let Σ be the alphabet of L and let w1, w2, w3, w4, ... be all possible strings over Σ. The enumerator machine M does the following:

1. Repeat the following for i = 1, 2, 3, ...

2. Run M′ for i steps on each input w1, w2, ..., wi.

3. If M′ accepts a string wj within those i steps, print wj.

Clearly, if a string w is accepted by M′, it will eventually be output by the enumerator M. In this procedure, one can see that a string may be printed repeatedly. It is straightforward to see that G(M) = L.
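The stage-i loop in the converse direction is classic dovetailing. A sketch (canonical and enumerate_L are assumed names; accepts_in_steps(w, i) is assumed to answer whether the recognizer accepts w within i steps):

    from itertools import count, islice, product

    def canonical(sigma):
        """All strings over sigma in canonical order: by length, then
        lexicographically."""
        for n in count(0):
            for tup in product(sigma, repeat=n):
                yield "".join(tup)

    def enumerate_L(accepts_in_steps, sigma):
        for i in count(1):                          # stage i = 1, 2, 3, ...
            for w in islice(canonical(sigma), i):   # w1, ..., wi
                if accepts_in_steps(w, i):
                    print(w)    # the same w is printed at every later stage too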

If L is a recursive set (a set accepted by a TM which halts on all


inputs), then there exists an enumerator M for L which will print the
strings in L in canonical order.

Equivalence Between Turing Machines and Type 0 Languages

In this section, we prove the equivalence between type 0 grammars and TMs: any recursively enumerable set can be generated by a type 0 grammar, and it can also be recognized by a TM.

Theorem 10.12

If L is the language generated by an unrestricted grammar G = (N,


T, P, S), then L is recognized by a TM.

Proof. For G, we construct a TM M with two tapes such that on one tape we place the input w, and the other tape is used to carry out a derivation using P. Each time a rule from P is applied, the two tapes are compared for acceptance of w. Initially, w is placed on the first tape and M places S on the second tape. M nondeterministically selects a rule S → α from P and replaces S by α on the second tape. Now the tapes are compared; if they agree, w is accepted. Otherwise, from the present string α, M chooses a position i nondeterministically such that some left-hand side β of a production occurs in α at position i, chooses a rule β → γ again nondeterministically, and applies it to α, replacing that occurrence of β by γ. Now let the present tape content be α1: if α1 = w, then accept w; otherwise, continue the procedure.

Theorem 10.13

If L is accepted by a TM M, then there exists an unrestricted


grammar generating L.

Proof. Let L be accepted by a TM M = (K, Σ, Γ, δ, q0, F). Then G is constructed as follows. Let G = (N, Σ, P, S1), where N = ((Σ ∪ {ε}) × Γ) ∪ {S1, S2, S3}. P consists of the following rules:

1. S1 → q0S2

2. S2 → (a, a)S2 for each a ∊ Σ. That is, G produces two copies of each input symbol.

3. S2 → S3

4. S3 → (ε, ␣)S3

5. S3 → ε

6. q(a, X) → (a, Y)p if δ(q, X) = (p, Y, R), for every a ∊ Σ ∪ {ε}, each q ∊ K and X, Y ∊ Γ. This rule simulates the action of M on the second components of the pairs.

7. (b, Z)q(a, X) → p(b, Z)(a, Y) if δ(q, X) = (p, Y, L), for each a, b ∊ Σ ∪ {ε}, each q ∊ K and X, Y, Z ∊ Γ. This rule does the same job as rule 6 for left moves.

8. (a, X)q → qaq, q(a, X) → qaq, and q → ε for each a ∊ Σ ∪ {ε}, X ∊ Γ and q ∊ F. These rules bring out w from the pairs when the second components have been properly accepted by M.

Hence, we see from the rules that the constructed grammar "nondeterministically" generates two copies of w in Σ* using rules (1) and (2), and simulates M through rules (6) and (7); rule (8) brings out w if it is accepted by M. The equivalence (if w ∊ L(G) then w is accepted by M, and conversely) can be proved by induction on the number of derivation steps and on the number of moves of the TM. Hence the theorem.

Linear-Bounded Automata
A linear-bounded automaton (LBA) is a NTM with a bounded, finite input tape; that is, the input is placed between two special symbols ¢ and $:

¢ a1 a2 ... an $

All other actions of a TM are allowed, except that the read/write head can fall off neither to the left of ¢ nor to the right of $, and ¢ and $ are not altered. One can say that this is a restricted version of the TM.

Definition 10.4

An LBA is an 8-tuple M = (K, Σ, Γ, δ, q0, ¢, $, F), where K, Σ, Γ, q0, F and δ are as in any TM. The language recognized by M is L(M) = {w | w ∊ Σ* and q0¢w$ ⊢* αpβ for some p ∊ F}.

One can show that the family of languages accepted by LBAs is exactly the family of CSLs.

Theorem 10.14

If L is a CSL, then L is accepted by a LBA.

Proof. For L, one can construct an LBA with a 2-track tape. The simulation is done as in Theorem 10.12: we place w on the first track and produce sentential forms of the grammar on the second track, every time comparing them with the contents of the first track. If w = ε, the LBA halts without accepting.

Theorem 10.15

If L is recognized by a LBA, then L is generated by a context-


sensitive grammar.

Proof. Let M = (K, Σ, Γ, δ, q0, ¢, $, F) be an LBA such that L(M) = L. Then one can construct a CSG G = (N, Σ, P, S1) as below. N consists of nonterminals of the form (a, β), where a ∊ Σ and β is of the form x or qx or q¢x or x$ or qx$, with q ∊ K, x ∊ Γ.

P consists of the following productions:

1. S1 → (a, q0¢a)S2

2. S1 → (a, q0¢a$)

3. S2 → (a, a)S2

4. S2 → (a, a$), for all a ∊ Σ.

The above four rules generate a sequence of pairs whose first components form a terminal string a1a2 ... at and whose second components form the initial ID of the LBA. The moves of the LBA are simulated on the second components by the following rules.

5. If δ(q, X) = (p, Y, R), we have rules of the form (a, qX)(b, Z) → (a, Y)(b, pZ) and (a, q¢X)(b, Z) → (a, ¢Y)(b, pZ), where a, b ∊ Σ, p, q ∊ K and X, Y, Z ∊ Γ.

6. If δ(q, X) = (p, Y, L), we have rules of the form (b, Z)(a, qX) → (b, pZ)(a, Y) and (b, Z)(a, qX$) → (b, pZ)(a, Y$), where a, b ∊ Σ, p, q ∊ K and X, Y, Z ∊ Γ.

7. (a, qβ) → a if q is final, for all a ∊ Σ and all possible β.

8. (a, α)b → ab and b(a, α) → ba, for any a, b ∊ Σ and all possible α.

Clearly, all the productions are context-sensitive. The simulation leads to a stage where the first components emerge as the generated string, provided the second components, which represent the LBA's ID, contain a final state.

ε will not be generated by the grammar whether or not it is in T(M).

We have already seen that ε ∉ L, if L is context-sensitive by


definition. To include ε, we must have a new start symbol S′ and
include S′ → ε, making sure S′ does not appear on the right-hand
side of any production by adding S′ → α where S → α is a rule in the
original CSG with S as the start symbol.

Gödel Numbering
In the construction of counter automata, we have used the concept
of Gödel numbering. Let us consider this topic more formally.

Perhaps the most famous and most important of the several deep
theorems about mathematical logic proved by Kurt Gödel (1931)
was his incompleteness theorem: for any sound logical axiomatic
system that is sufficiently rich to contain the theory of numbers,
there must be number-theoretic statements that can neither be
proved nor disproved in the system. Gödel’s theorem is one of the
most significant discoveries made in the twentieth century, since it
places a limit on the efficacy of mathematical reasoning itself.
In proving his result, Gödel used a numbering scheme which is now called Gödel numbering.

Gödel Numbering of Sequences of Positive


Integers
The first class of objects to which we shall assign Gödel numbers is
the class of finite sequences of positive integers. Let us consider the
primes in the increasing order of magnitude. Prime(0) is 2,
Prime(1) is 3, Prime(2) is 5 and so on.
Definition 10.5

The Gödel number of the finite sequence of positive integers i1, i2, ..., in is

2^i1 · 3^i2 · 5^i3 · ... · (Prime(n − 1))^in

Example 10.1. 

The Gödel number of the sequence 2, 1, 3, 1, 2, 1 is

2² · 3¹ · 5³ · 7¹ · 11² · 13¹ = 16,516,500.

From the Gödel number, the sequence can be recovered. For example, if the Gödel number is 4200, the sequence is 3, 1, 2, 1. This is obtained as follows. Divide 4200 by 2 as many times as possible: 4200 = 2³ · 525, so the first number in the sequence is 3. Divide 525 by 3 as many times as possible: 525 = 3 · 175, and 175 is not divisible by 3, so the second number in the sequence is 1. Divide 175 by 5 as many times as possible: 175 = 5² · 7, so the third number in the sequence is 2; and the last number is 1, as 7 is divisible by 7 once. So the sequence is 3, 1, 2, 1.
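Both directions are short to program (godel and ungodel are assumed names; the decoding assumes a genuine Gödel number of a sequence of positive integers):

    def primes():
        found, n = [], 2
        while True:
            if all(n % p for p in found):
                found.append(n)
                yield n
            n += 1

    def godel(seq):
        g = 1
        for i, p in zip(seq, primes()):
            g *= p ** i
        return g

    def ungodel(g):
        seq, ps = [], primes()
        while g > 1:
            p, e = next(ps), 0
            while g % p == 0:
                g, e = g // p, e + 1
            seq.append(e)
        return seq

    assert godel([2, 1, 3, 1, 2, 1]) == 16_516_500
    assert ungodel(4200) == [3, 1, 2, 1]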

Gödel Number of Strings


Once we have a method of finding the Gödel number of a sequence
of positive integers, it is not difficult to find a method of assigning
Gödel numbers to any written piece of English text. First, assign a
positive integer to every distinguishable character including
blanks, small letters, capital letters, and each punctuation sign.
One possible way is as follows: blank 1, small a through z, 2
through 27; capital A through Z, 28 through 53; period 54; comma
55; and so on. The Gödel number of any piece of text is simply the
Gödel number of the corresponding sequence of integers. For
example, the Gödel number of the word 'book' is 2^3 · 3^16 · 5^16 · 7^12.
Gödel Number of Undirected Graphs
An undirected graph consists of nodes and edges. The procedure of
assigning to a graph a suitable Gödel numbering begins with an
assignment of a distinct prime number to each node of the graph.
This assignment is made arbitrarily. We then note that once we
have the information about how many nodes there are and which
nodes are connected to which by edges, we have complete
information about the graph. Supposing the nodes have been numbered P0, ..., Pn (each Pi a distinct prime), we take a Gödel number of the graph to be the number

2^K0 · 3^K1 · ... · (Prime(n))^Kn,

where, for each i, Ki is the product of all those Pj^xj such that the node numbered Pi is connected by xj edges to the node numbered Pj; thus Ki = 1 if the node numbered Pi is not connected to any other node.

Consider a graph (the figure is omitted) whose five nodes are assigned the prime numbers 2, 3, 5, 7, and 11, with two edges between the nodes numbered 2 and 3, single edges joining node 2 with node 5, node 2 with node 7, and node 5 with node 7, and the node numbered 11 isolated. The Gödel numbering for this graph is

2^K0 · 3^K1 · 5^K2 · 7^K3 · 11^K4

where

K0 = 3² · 5 · 7

K1 = 2²

K2 = 2 · 7

K3 = 2 · 5

K4 = 1

i.e., the Gödel number is given by

2^(3²·5·7) · 3^(2²) · 5^(2·7) · 7^(2·5) · 11^1

From the Gödel numbering, the graph can be obtained. This idea
can be extended to labeled graphs, directed graphs, suitably.

Problems and Solutions


1. For the following two-way infinite TM, construct an equivalent one-way TM.

Solution.
2. Construct a TM M with a 2-dimensional tape. M starts with the initial ID

␣ ␣ ␣ ... ␣ ␣ ␣
X X X ... X X X
␣ ␣ ␣ ... ␣ ␣ ␣

i.e., a row of n X's surrounded by blanks. It has to halt with the final ID

␣ X ... X ␣
X X ... X X
␣ X ... X ␣

i.e., above and below the row of n X's, a row of (n − 2) X's is printed, centrally placed.

Solution.
K = {q0, ..., q11}

Γ = {X, Y,  }
δ is given by:
δ(q0, X) = (q1, Y, R)

δ(q1, X) = (q2, Y, U)

δ(q2,  ) = (q3, X, D)

δ(q3, Y) = (q4, Y, D)

δ(q4,  ) = (q5, X, U)

δ(q5, Y) = (q1, Y, R)

δ(q1,  ) = (q6,  , L)

δ(q6, Y) = (q7, Y, U)
δ(q7, X) = (q8,  , D)

δ(q8, Y) = (q9, Y, D)

δ(q9, X) = (q10,  , U)

δ(q10, Y) = (q10, X, L)

δ(q10,  ) = (q11,  , halt)

Exercises
1. Consider the following TM with two-way infinite tape: M = ({q0, q1, q2, q3}, {0, 1}, {0, 1}, δ, q0, {q3})
where δ is given by:
δ(q0, 0) = (q1, 1, R)

δ(q1, 1) = (q2, 0, L)

δ(q2, 0) = (q0, 1, R)

δ(q1, 0) = (q3, 0, R)


Construct an equivalent one-way infinite tape TM.

2. It is desired to design a TM that copies patterns of 1's and 2's in accordance with the following format (the format figure is omitted).

(a) First, design the machine as a 2-track machine. Assume that the given pattern is initially on the top track and that the bottom track is initially blank. Let the machine use the bottom track to mark each symbol in the given pattern with an X as it is copied. Give the state diagram of the machine.

(b) Now, convert the 2-track machine of (a) into an equivalent one-track machine by assigning single symbols to ordered pairs of upper- and lower-track symbols. Choose this assignment so that the one-track machine meets the format specified at the beginning of the problem. How many tape symbols does your machine use? What are their roles?

3. It is desired to design a 2-tape TM that behaves as follows. The machine’s first tape is inscribed
with a pattern of the form

and the second tape is left blank. The machine is to determine which of the given blocks is the
longest and is to halt with a copy of that block on its second tape. The original pattern is to remain
unchanged on the first tape. The machine is to halt scanning the 0 to the right of the given pattern on its
first tape and the 0 to the left of the block formed on the second tape.
Design an appropriate machine, using the symbol alphabet {0, 1} for each tape. Describe the machine
graphically, using the same conventions used for ordinary TM, except that: (1) the symbols
written on the machine’s first and second tapes are to be represented by symbol pairs as in
the case of two-track machines; and (2) each state is to be labeled with a pair of directions of the
form (D1, D2) to indicate the directions that the machine is to move on its first and second tapes on leaving
that state. Each of D1 and D2 may be either L, R, or −, where the symbol − indicates that the machine should
not shift the tape head in question.

4. Let M be any TM that operates on a doubly infinite tape. Show that there exists another Turing
machine M′ that duplicates each of M’s computations in at most half as many steps as M, provided the
initial and final tape patterns are properly encoded.
Describe a typical step in M′’s computation. Hint: Let the squares of M’s tape be represented on the squares
of M′’s tape according to the following scheme.

M’s tape:     −5  −4  −3  −2  −1   0   1   2   3   4

             −11  −9  −7  −5  −3  −1   1   3   5   7
M′’s tape:   −10  −8  −6  −4  −2   0   2   4   6   8
              −9  −7  −5  −3  −1   1   3   5   7   9

5. Construct a TM with two-dimensional tape which gives the following output for the given input.
Output is an array of X’s surrounded by blanks.
6. Give a type 0 grammar for the TM:

1. given in Exercise 1.

2. given in Exercise 7, Chapter 9.

 
Chapter 10. Variations of Turing
Machines
In Chapter 9, we defined the computability model called “Turing
Machine” (TM). This model is one of the most beautiful, simple,
and an useful abstract model. We had elaborate discussion of this
model through various examples. One can think of variations of the
basic model in many ways. For example, one can work with two
tapes, instead of a single tape. These models are got by adding
extra components and power to the control and hence, appear to
be more powerful than the basic model, but they are not. We could
also consider some restricted version of the basic model. In this
chapter we are considering such variants and discuss their
computing capabilities. We note that the power is not increased by
adding extra components and not decreased by considering the
restricted versions.

Generalized Versions
In this section, we consider the following generalized versions of
the basic model and show that they are equivalent to the basic
model as far as accepting power is concerned. The variants are:

1.
Turing machines with two-way infinite tapes
2.
3.
Multitape TM
4.
5.
Multihead TM
6.
7.
Non-deterministic TMs
8.
9.
Turing machines with 2-dimensional tapes
10.

Two-Way Infinite Tape TM


A two-way infinite tape Turing machine (TTM) is a TM with its
input tape infinite in both directions, the other components being
the same as that of the basic model. We observe from the following
theorem that the power of TTM is no way superior of that of the
basic TM.

That a one-way TM M0 can be simulated by a two-way TM MD can


be seen easily. MD puts a # to the left of the leftmost nonblank and
moves its head right and simulates M0. If MD reads # again, it halts
rejecting the input as this means M0 tries to move off the left end
of the tape.

Theorem 10.1

For any TTM MD = (K, Σ, Γ, δ, q0, F) there exists an equivalent TM


MD.

Proof. The tape of the MD at any instance is of the form

... a−2 a−1 a0 a1 a2 ... an

where a0 is the symbol in the cell scanned by MD initially. M0 can


represent this situation by two tracks:

         

  a0 a1 a2 ...

  # a−1 a−2 ...


When MD is to the right of a0, the simulation is done on the upper
track.

When MD is to the left of a0, the simulation is done in M0 on the


lower track.

The initial configuration would be:

         

  a0 a1 ... an

M0 = (K′, Σ′, Γ′, δ′,  , F′) where

K′ = { }∪ (K × {1, 2})

Σ′ = Σ×{ }

Γ′ = Γ × (Γ ∪{#})

F′ = {[q, 1], [q, 2]|q ∊ F}

δ′ is defined as follows:

If δ (q, a)= (q′, c, L/R) and if the head of MD is to the right of a0 we
have: δ([q, 1], [a, b]) = ([q′, 1], [c, b], L/R); simulation is done on the
upper track 1.

If MD is to the left of the initial position, simulation is done on the


lower track.
If δ(q, b) = (q′, c, L/R)

δ′ ([q, 2], [a, b]) = ([q′, 2], [a, c], R/L)

The initial move will be:

if δ(q0, a0) = (q, A, R).

if δ(q0, a0) = (q, A, L).

When reading the leftmost symbol MD behaves as follows:

If δ(q, a) = (p, A, R)

δ′([q, 1/2], [a, #]) = ([p, 1], [A, #], R).

If δ(q, a) = (p, A, L)

δ′([q, 1/2], [a, #]) = ([p, 2], [A, #], R).

while simulating a move when MD is to the left of the initial


position, M0 does it in the lower track always moving in a direction
opposite to that of MD. If MD reaches an accepting
state qf, M0 reaches [qf, 1] or [qf, 2] and accepts the input.

Multi-tape TM
Definition 10.1

A multi-tape TM is a TM, with n tapes each having a separate tape


head. The move of this TM will depend on the state and symbol
scanned by each head. In each tape, a symbol is printed on the
cell scanned and each head moves left or right independently
depending on the move.

Suppose we have a 3-tape TM (Figure 10.1).

Figure 10.1. TM with three tapes

The symbols scanned by the heads are A, B and C respectively.


Then, the mapping will be of the form:

δ(q, A, B, C) = (q′, (A′, L), (B′, R), (C′, R))

The state changes to q.

In the first tape, A′ is printed over A and the tape head moves left.
In the second tape, B′ is printed over B and the tape head moves
right while in the third tape C′ is printed over C, the tape head
moving right.

Theorem 10.2
A multi-tape TM can be simulated by a single tape TM.

Proof. Let M = (K, Σ, Γ, δ, q0, F) be a k-tape TM. It can be


simulated by a single tape TM M′ having 2k tracks. Odd numbered
tracks contain the contents of M’s tapes.

Table 10.2. TM with one tape having six tracks

    ... A ...  

    ... X ...  

    ...   ... B

    ...   ... X

  C ...   ...  

  X ...   ...  

Even-numbered tracks contain the blank symbols excepts in one


position where it has a marker X. This specifies the position of the
head in the corresponding M’s tape. The situation in Figure 10.1 is
represented by a 6-track TM as given in Figure 10.2.

To simulate a move of the multi-tape TM M, the single tape TM M′


makes two sweeps, one from left to right and another from right to
left. It starts on the leftmost cell, which contains a X in one of the
even tracks. While moving from left to right, when it encounters
a X, it stores the symbol above it in its finite control. It keeps a
counter as one of the component of the state to check whether it
has read the symbols from all the tapes. After the left to right move
is over, depending on the move of M determined by the symbols
read, the single tape TM M′ makes a right to left sweep changing
the corresponding symbols on the odd tracks and positioning
the X’s properly in the even-numbered tracks. To simulate one
move of M, M′ roughly takes (it may be slightly more depending on
the left or right shift of X) a number of steps equal to twice the
distance between the leftmost and the rightmost cells containing
a X in an even-numbered track. When M′ starts all X’s will be in
one cell. After i moves the distance between the leftmost and
rightmost X can be at most 2i. Hence, to simulate n moves of M, M′

roughly takes   steps. If M reaches a final


state, M′ accepts and halts.

An off-line TM is a multi-tape TM with a read only input tape. In


this tape, input w is placed with end markers as ¢w$ and symbols
cannot be rewritten on this tape.

Multi-head TM
Definition 10.2

A multi-head TM is a single tape TM having k heads reading


symbols on the same tape. In one step, all the heads sense the
scanned symbols and move or write independently.

Theorem 10.3

A multi-head TM M can be simulated by a single head TM M′.

Proof. Let M have k heads. Then, M′ will have k + 1 tracks on a


single tape. One track will contain the contents of the tape of M and
the other tracks are used to mark the head positions. One move
of M is simulated by M′ by making a left to right sweep followed by
a right to left sweep. The simulation is similar to the one given
in Theorem 10.2. One fact about which one has to be careful here
is the time when two heads scan the same symbol and try to
change it differently. In this case, some priority among heads has
to be used.
Non-deterministic Turing Machine (NTM)
One can define a NTM by the following transition function:

δ: K × Γ →  (K × Γ × {L, R})

where δ is a mapping from K × Γ to the power set of K × Γ × {L, R}.


The computation of any input will be in several directions and the
input is accepted if there is at least one sequence of moves, which
accepts it.

A NTM is N = (K, Σ, Γ, δ, q0, F) where K, Σ, Γ, q0, F as in the


definition of TM. δ is defined as above.

Theorem 10.4

Every NTM can be simulated by a deterministic TM (basic model).

Proof. The computation of a NTM is a tree whose branches


correspond to different possibilities for the machine movement on
the given input. If some branch of the computation leads to the
accept state, then, the machine accepts the input. The root of the
tree would be the start configuration and each node is a possible
continuation from the root node.

Computation of a NTM ‘N’ on any input w is represented as a tree.


Each branch is a branch of nondeterminism. Each node is a
configuration of N. Root will be the start configuration. One has to
traverse the whole tree in a ‘breath-first’ manner to search for a
successful path. One cannot proceed by ‘depth-first’ search as the
tracing may lead to an infinite branch while missing the accepting
configurations of some other branches.

Using a multi-tape TM one can simulate N on a given input. For


each path of the tree, simulation is done on a separate tape. The
paths are considered one-by-one in the increasing order of depth
and among paths of equal length, the paths are considered from
left to right.

Let us see how the implementation works on a DTM with the tapes.
There is an input tape containing input which is never altered.
Second tape will be a simulation tape which contains a copy of N’s
tape content on some branch of its non-deterministic computation.
The third tape keeps track of the location of the DTM in NTM’s
computation tree. The three tapes may be called as input tape,
simulation tape, and address tape.

Suppose every node in the tree has at most b children. Let every


node in the tree has address, which is a string over the alphabet
Σb = {1, 2,..., b} (say). To obtain a node with address 145, start at
the root going to its child numbered 1, move to its 4th child and
then move to the nodes that corresponds to its 5th child. Ignore
addresses that are meaningless. Then, in a breath-first manner
check the nodes (configurations) in canonical order as ε, 1, 2,
3,..., b, 11, 12, 13,...,1b, 21, 22,...,2b,...,111, 112, ... (if they exist).
Then, DTM on input w = a1 ... an works as follows. Place w on the
input tape and the others are empty. Copy the contents of the input
tape to simulation tape. Then, simulate NTM’s one non-
deterministic branch on the simulation tape. On each choice,
consult the address tape for the next move. Accept if the accepting
configuration is reached. Otherwise abort this branch of simulation.
The abortion will take place for the following reasons.

symbols on address tape are all used;


rejecting configurations encountered; and



non-deterministic choice is not a valid choice.

Once the present branch is aborted, replace the string on the


address tape with the next string in canonical order. Simulate this
branch of the NTM as before.

Two-Dimensional TM
The TM can have 2-dimensional tapes. When the head is scanning
a symbol, it can move left, right, up or down. The smallest
rectangle containing the non-blank portion is m × n, then it
has m rows and n columns. A 1-dimensional TM, which tries to
simulate this 2-dimensional TM will have two tapes. On one tape,
this m rows of n symbols each will be represented as m blocks of
size n each separated by markers. The second tape is used as
scratch tape. When the 2-dimensional TM’s head moves left or
right, it is simulated in a block of the 1-dimensional TM. When the
2-dimensional TM’s head moves up or down, the 1-dimensional
TM’s head moves to the previous block or the next block. To move
to the correct position in that block, the second tape is used.
If m or n increases, number of blocks or the size of the blocks is
increased.

Restricted Turing Machines


In this section we can discuss some more variations of TMs, which
are in the form of restrictions. For example, we have an offline TM
that has a restriction on the input tape. One can view the
development of TM from finite state automata in a hierarchical
way. When viewed as language recognizers FSA, PDA models could
not recognize some languages, whereas a TM can do so. One can
view the tape of the basic TM as input tape, output tape, and
processing space. For example, a PDA is equivalent to a NTM with
input tape, resembling the input tape of the PDA, the storage tape
resembling the stack of the PDA. Now the processing of NTM can
simulate the processing of the PDA. Hence, any computing can be
simulated by a TM. But there are some standard ways a TM can be
restricted leading to multi-stack TMs, counter machines, etc,
without losing out on accepting power.

Definition 10.3

A deterministic TM with read only input and two storage tape is


called a deterministic two stack Turing machine (DTSTM). When
the head of the DTSTM tries to move left on the tapes, a blank
symbol   will be printed.

One can easily simulate a TM with a DTSTM. At any point of time,


one can see the symbol being scanned by the head of the TM,
placed on the top of one stack, the symbols to its left on this stack
below the symbols scanned, placing the symbols closer to the
present head position, closer to the top of the stack. Similar
exercise is done for the symbols to the right of the present head
position by placing them in the second stack. Hence, clearly a move
of the TM has a corresponding action on the input and stacks of
the DTSTM and the simulation can be done.

Theorem 10.5

There exists a DTSTM that simulates the basic TM on any given


input.

The next variant is a ‘counter’ machine. A ‘counter’ machine can


store finite number of integers each counter storing a number. The
counters can be either increased or decreased and cannot cross a
‘stack’ or ‘counter’ symbol ‘Z.’ In other words, it is a machine with
stacks having only two stack symbols Z and   (blank). Every stack
will have Z as its initial symbol. A stack may hold a string of the
form  , i ≥ 0, indicating that the stack holds an integer i in it. This
stack can be increased or decreased by moving the stack head up
or down. A counter machine with 2-stacks is illustrated in the
following figure.

Theorem 10.6

For a basic TM, there exists an equivalent 4-counter TM.

Proof. The equivalence is shown between a 2-stack TM (DTSTM)


and a 4-counter machine. We have already seen that a DTSTM
and basic TM are equivalent.

We now see how to simulate each stack with two counters.


Let X1, X2, ..., Xt−1 be the (t−1) stack symbols. Each stack content
can be uniquely represented by an integer in base ‘t.’
Suppose Xi1 Xi2 ... Xin is the present stack content with Xin on the
top in DTSTM. Then, the integer count will be

k = in + tin−1 + t2in−2 + ... + tn−1i1.

For example, if the number of stack symbols used is 3, and an


integer for the stack content X2X3X1X2 will be:
k = 2 + 4 + 42.3 + 43.2 = 172.

Suppose Xr is to be put on the top of the stack, then the new


integer counter has to be kt + r. The first counter contains k and the
second counter contains zero at this point. To get kt + r in the
second counter, the counter machine has to move the first counter
head to the left by one cell and move the head of the second
counter t cells to the right. Thus, when the first counter head
reaches ‘Z’ the second counter contains kt. Now add r to the
second counter to get kt + r.

If it is a clear move of the stack, Xik is to be cleared. Then k has to


be reduced to  , the integer part of k/t. Now, the adjustment is
decrementing the count of the first counter in steps and increment
the second counter by one. This repeats till the first counter
becomes zero.

Now by the above exercise, one is able to identify the stack symbol
on the stack from the two counters thus designed. That
is k mod t is the index in and hence, Xin is the top symbol of the
stack.

The above theorem can be improved and hence, we have the


following theorems.

Theorem 10.7

For a basic TM, there exists an equivalent 3-counter TM.

Proof. The idea of the simulation is similar to the previous theorem.


Instead of having two counters for adjusting the two counters that
correspond to two stacks, one common counter is used to adjust
the operation of pop, push or change of the stack symbols.
Theorem 10.8

For a basic TM, there exists an equivalent 2-counter TM.

Proof. The simulation is now done using the previous theorem.


One has to simulate the 3-counters by using 2-counters to get an
equivalent result. Let i, j, and k be the numbers in the three
counters. We have to represent these by a unique integer. Let m =
2i3j5k be the integer. Put this in one counter.

To increment say i, one has to multiply m by 2. This can be done


using the second counter as we have done earlier. Similarly
for j and k increment can be done using the second counter. Any
of i, j, k will be zero whenever m is not divisible by 2, 3 or 5
respectively. For example, to say whether j = 0, copy m to the
second counter and while copying store in finite control
whether m is divisible by 2, 3 or 5. If it is not divisible by 3, then j =
0.

Finally to decrease i, j, k, divide m by 2, 3, 5, respectively. This


exercise is also similar to the previous one except that the machine
will halt whenever m is not divisible by a constant by which we are
dividing.

A TM can be restricted by reducing the tape alphabet or by


reducing the number of states. One can also aim at reducing the
number of tape symbols to achieve universality. There exists a TM
with three states, one tape but any amount of tape symbols to
recognize recursively enumerable languages. Following is one such
characterization.

Theorem 10.9

There exists a TM with one tape and tape alphabet {0, 1,  } to
recognize any recursively enumerable language L over {0, 1}.
Proof. Let M = (K, {0, 1}, Γ, δ, q0, F) be a TM recognizing L. Now
the tape alphabet Γ can be anything. Our aim is to construct an
equivalent TM with Γ = {0, 1,  }. For that we encode each symbol of
Γ. Suppose Γ has ‘t’ symbols. We use binary codes to code each
symbol of Γ by ‘k’ bits where 2k − 1 < t < 2k.

We now have to design another TM M′ with Γ = {0, 1,  }. The tape
of M′ will consists of coded symbols of Γ and the input over {0, 1}.
The simulation of one move of M by k moves of M′ is as follows.
The tape head of M′ is initially at the leftmost symbol of the coded
input. M′ has to scan the next k − 1 symbols to its right to make a
decision of change of state or overwrite or move left or right as
per M. The TM M′ stores in its finite control the state of M and the
head position of M′ which is a number between 0 to k−1. Hence, M′
clearly indicates at the end of a block of moves whether one move
of M is made or not. Once the finite control indicates ‘0’ as head
position, it means this is time for the change on the tape, state as
per M’s instruction. If the thus changed state is an accepting state
of M, M′ accepts.

One observation is that on the tape of M′, there has to be code for
the blank symbol of M. This is essential for simulating the blank
of M by M′. Second observation is that any tape symbol of M is
directly coded in terms of 0s and 1s and as any string w is placed
on the tape of M, the codes for each symbol of w is concatenated
and placed on the tape of M′.

One can see from the previous result that even if the input for L is
not over {0, 1}, the above TM construction will work, because the
input of L over some other alphabet will be now coded and placed
as input for a TM with tape alphabet {0, 1,  }.

Also one can see that one can construct a multi-tape TM that uses
only two symbols 0 and 1 as tape alphabet to simulate any TM. One
has to keep the input tape fixed with the input. There will be a
second tape with tape symbols coded as binary symbols. This
simulates moves of the original TM. The newly constructed TM
must have positions on the tape to indicate the present head
position, cells to indicate that the binary representation of the
symbol under scan is already copied. Each ID is copied on the third
tape after simulation of one move. If we take the input alphabet {0,
1}, we start with the second tape (first tape is not necessary). In the
third tape, IDs are copied one by one without erasing. Thus, we
have the following result.

Theorem 10.10

Any TM can be simulated by an offline TM having one storage tape


with two symbols 0 and 1 where 0 indicates blank. A blank (0) can
be retained as 0 or replaced by 1 but a ‘1’ cannot be rewritten at a
cell with ‘0’.

Turing Machines as Enumerators


The languages accepted by the TM are recursively enumerable sets.
One can think of a TM as generating a language. An enumerator is
a TM variant which generates a recursively enumerable language.
One can think of such a TM to have its output on a printer (an
output tape). That is, the output strings are printed by the output
device printer. Thus, every string that is processed freshly, is added
to the list, thereby printing it also.

The enumerator machine has an input tape, which is blank


initially, an output device which may be a printer. If such a
machine does not halt, it may perform printing of a list of strings
infinitely. Let M be an enumerator machine and G(M) be the list of
strings appearing on output tape. We have the following theorem
for M.

Theorem 10.11

A language L is recursively enumerable if and only if there exists


an enumerator M such that G(M) = L.
Proof. Let M be an enumerator such that L = G(M). To show that
there exists a TM   recognizing L. Let w be an input for  .
Perform the following two steps on w.

1.

Run M and compare each output string of M with w.

2.
3.

If w appears on the output tape of M, accept w.

4.

That is,   accepts only those strings that appear on the output
tape of M. Hence  .

Conversely let   be a TM such that  . We construct an


enumerator M that prints every string of L as follows.

Let Σ be the alphabet of L and w1, w2, w3, w4, ... be all possible


strings over Σ.

The enumerator machine M will do the following for any input from


Σ*.

1.

Repeat the following for i = 1, 2, 3, ...


2.
3.

Run   for i steps on each input w1, w2, ..., wi.

4.
5.

If   accepts any string wi print the corresponding string wi.

6.

Clearly, if any string w is accepted by  , it will be output by the


enumerator M. In the above procedure, one can see that there will
be repeated printing of a string w. It is straightforward to
see  .

If L is a recursive set (a set accepted by a TM which halts on all


inputs), then there exists an enumerator M for L which will print the
strings in L in canonical order.

Equivalence Between Turing


Machines and Type 0 Languages
In this section, we prove the equivalence between Type-0
grammars and TM. That is, any recursively enumerable set can be
generated by a type 0 grammar and it can also be recognized by a
TM.

Theorem 10.12

If L is the language generated by an unrestricted grammar G = (N,


T, P, S), then L is recognized by a TM.
Proof. For G, we construct a TM M with two tapes such that on one
tape we put the input w and the other tape is used to
derive w using P. Each time a rule from P is applied, compare the
two tapes for acceptance or rejection of w. Initially put w on one
tape. Then, M initially places S on the second tape.
Nondeterministically select a rule S → α from P, replace S by α on
the second tape. Now compare the tapes, if they agree
accept w. Otherwise, from the present string α, choose a
location ‘i’ nondeterministically such that β is a subword occurring
in α from position i. Choose a rule β → γ again
nondeterministically. Apply to α, by inserting γ at the position of β.
Now, let the present tape content be α1. If α1 = w, then
accept w, otherwise continue the procedure.

Theorem 10.13

If L is accepted by a TM M, then there exists an unrestricted


grammar generating L.

Proof. Let L be accepted by a TM M = (K, Σ, Γ, q0, δ, F).


Then, G is constructed as follows. Let G = (N, Σ, P, S1) where N =
((Σ ∪ {ε} × Γ) ∪ {S1, S2, S3}). P consists of the following rules:

1.

S1 → q0S2

2.
3.

S2 → (a, a)S2 for each a ∊ T.

4.
That is, G produces every time two copies of symbols from Σ.

5.
6.

S2 → S3

7.
8.

S3 → (ε, ε)S3

9.
10.

S3 → ε

11.
12.

q(a, X) → (a, Y)p

13.

if δ(q, X) = (p, Y, R) for every a in Σ ∪ {ε}, each q ∊ Q, X, Y ∊ Γ.


This rule simulates the action of M on the second component of the
symbols (α, β).

14.
15.

(b, Z)q(a, X) → p(b, Z)(a, Y)

16.
if δ(q, X) = (p, Y, L) for each a, b ∊ Σ ∪ {ε}, each q ∊ Q, X, Y, Z ∊
Γ. This rule does the same job as rule 6.

17.
18.

[a, X]q → qaq, q[a, X] → qaq and q → ε for each a ∊ Σ ∪ {ε}, X ∊ Γ


and q ∊ F.

19.

These rules bring out w from the pair if the second component of


the input pair is properly accepted by M.

Hence, we see from the rules that the constructed grammar


“nondeterministically” generates two copies of w in Σ* using rules
(1) and (2) and simulates M through the rules 6 and 7. Rule 8
brings out w if it is accepted by M. The equivalence that
if w ∊ L(G), then w ∊ L(M) and conversely can be proved by
induction on the number of derivation steps and on the number of
moves of the TM. Hence the theorem.

Linear-Bounded Automata
A linear-bounded automata (LBA) is a NTM with a bounded,
finite-input tape. That is, input is placed between two special
symbols ¢ and $.

¢ a1 a2 ... an

But, all the other actions of a TM are allowed except that the
read/write head cannot fall off on left of ¢ and right of $. Also, ¢
and $ are not altered. One can say that this is a restricted version
of TM.

Definition 10.4
A LBA is a 8-tuple M = (K, Σ, Γ, δ, q0, ¢, $, F) where K,Σ, Γ, q0, F
and δ are as in any TM. The language recognized by M is L(M) =
{w|w ∊ Σ* and   for some p ∊ F}.

One can show that the family of languages accepted by a LBA is


exactly CSL.

Theorem 10.14

If L is a CSL, then L is accepted by a LBA.

Proof. For L, one can construct a LBA with 2-track tape. The


simulation is done as in Theorem 10.13 where we place w on the
first track and produce sentential forms on the second track, every
time comparing with contents on the first track. If w = ε, the LBA
halts without accepting.

Theorem 10.15

If L is recognized by a LBA, then L is generated by a context-


sensitive grammar.

Proof. Let M = (K, Σ, Γ, q0, δ, ¢, $, F) be a LBA such that L(M) = L.


Then, one can construct a CSG, G = (N, Σ, P, S1) as below.

N consists of nonterminals of the form (a, β) where a ∊ Σ and β is


of the form x or qx or q¢x or x$ or qx$ where q ∊ K, x ∊ Γ.

P consists of the following productions:

1.
S1 → (a, q0¢a)S2;

2.
3.

S1 → (a, q0¢a$);

4.
5.

S2 → (a, a)S2; and

6.
7.

S2 → (a, a$) for all a ∊ Σ.

8.

The above four rules generate a sequence of pairs whose first


components form a terminal string a1a2 ... at and the second
components form the LBA initial ID.

9.

The moves of the LBA are simulated by the following rules in the
second component.

10.
11.

If δ(q, X) = (p, Y, R) we have rules of the form (a, qX)(b, Z) →


(a, Y)(b, pZ)(a, q¢X)(b, Z) → (a, ¢Y)(b, pZ) where a, b ∊
Σ, p, q ∊ K, X, Y, Z ∊ Γ.
12.
13.

If δ(q, X) = (p, Y, L) we have rules of the form (b, Z)(a, qX) →


(b, pZ)(a, Y) (b, Z)(a, qX$) → (b, pZ)(a, Y$) where a, b ∊
Σ, p, q ∊ K, X, Y, Z ∊ Γ.

14.
15.

(a, qβ) → a if q is final, for all a ∊ Σ.

16.
17.

(a, α)b → ab

18.

b(a, α) → ba for any a ∊ Σ and all possible α.

19.

Clearly, all the productions are context-sensitive. The simulation


leads to a stage where the first components emerge as the string
generated if the second components representing LBA ID has a
final state.

ε will not be generated by the grammar whether or not it is in T(M).

We have already seen that ε ∉ L, if L is context-sensitive by


definition. To include ε, we must have a new start symbol S′ and
include S′ → ε, making sure S′ does not appear on the right-hand
side of any production by adding S′ → α where S → α is a rule in the
original CSG with S as the start symbol.

Gödel Numbering
In the construction of counter automata, we have used the concept
of Gödel numbering. Let us consider this topic more formally.

Perhaps the most famous and most important of the several deep
theorems about mathematical logic proved by Kurt Gödel (1931)
was his incompleteness theorem: for any sound logical axiomatic
system that is sufficiently rich to contain the theory of numbers,
there must be number-theoretic statements that can neither be
proved nor disproved in the system. Gödel’s theorem is one of the
most significant discoveries made in the twentieth century, since it
places a limit on the efficacy of mathematical reasoning itself.
During proving his result, Gödel used a numbering scheme which
is called Gödel numbering.

Gödel Numbering of Sequences of Positive


Integers
The first class of objects to which we shall assign Gödel numbers is
the class of finite sequences of positive integers. Let us consider the
primes in the increasing order of magnitude. Prime(0) is 2,
Prime(1) is 3, Prime(2) is 5 and so on.

Definition 10.5

The Gödel number of the finite sequence of positive integers is i 1,


i2, ..., in is

2i1 * 3 i2 * 5 i3 * ... * (Prime (n − 1))in

Example 10.1. 

The Gödel number of the sequence 2, 1, 3, 1, 2, 1 is


22 * 31 * 53 * 71 * 112 * 131 = 16, 516, 500.

From the Gödel number, the sequence can be got back. For
example, if the Gödel number is 4200, the sequence is 3, 1, 2, 1.
This is obtained as follows. Divide 4200 by 2 as many times as
possible  . So the first number in the sequence is
3. Divide 525 by 3 as many times as possible.   is not
divisible by 3. Hence the second number in the sequence is 1.
Divide 175 by 5 as many times as possible.  . So third
number in the sequence is 2 and the last number is 1 as 7 is
divisible by 7 once. So the sequence is 3, 1, 2, 1.

Gödel Number of Strings


Once we have a method of finding the Gödel number of a sequence
of positive integers, it is not difficult to find a method of assigning
Gödel numbers to any written piece of English text. First, assign a
positive integer to every distinguishable character including
blanks, small letters, capital letters, and each punctuation sign.
One possible way is as follows: blank 1, small a through z, 2
through 27; capital A through Z, 28 through 53; period 54; comma
55; and so on. The Gödel number of any piece of text is simply the
Gödel number of the corresponding sequence of integers. For
example, the Gödel number of the word ‘book’ is 23 * 316 * 516 * 712.

Gödel Number of Undirected Graphs


An undirected graph consists of nodes and edges. The procedure of
assigning to a graph a suitable Gödel numbering begins with an
assignment of a distinct prime number to each node of the graph.
This assignment is made arbitrarily. We then note that once we
have the information about how many nodes there are and which
nodes are connected to which by edges, we have complete
information about the graph. Supposing the nodes have been
numbered P0, ..., Pn, we take a Gödel number of the graph to be the
number  , where, for each i, Ki is the product of all
those   such that the node numbered Pi is connected by xj edges
to the node numbered Pj; Thus Ki = 1 if the node numbered Pi is
not connected to any other node.

Consider the following graph, where the nodes are assigned prime
numbers

The Gödel numbering for this graph is

2K0 * 3K1 * 5K2 * 7K3 * 11K5

where

K0 = 32 * 5 * 7

K1 = 22

K2 = 2*7

K4 = 2 * 5 and

K5 = 1

i.e., the Gödel number is given by

232*5*7 * 322 * 52*7 * 72*5 * 111

From the Gödel numbering, the graph can be obtained. This idea
can be extended to labeled graphs, directed graphs, suitably.

Problems and Solutions


1. For the following two-way infinite TM, construct an equivalent one-way TM.

Solution.

2. Construct a TM M with a 2-dimensional tape. M starts with initial ID:


...

X X X ... X X X

...
i.e., a row of n X’s surrounded by blanks. It has to halt with the final ID.
...

X ... X
X   ...   X

X ... X

...
i.e., above and below the row of n X’s, a row of (n−2) X’s is printed, centrally

Solution.
K = {q0, ..., q11}

Γ = {X, Y,  }
δ is given by:
δ(q0, X) = (q1, Y, R)

δ(ql, X) = (q2, Y, U)

δ(q2,  ) = (q3, X, D)

δ(q3, Y) = (q4, Y, D)

δ(q4,  ) = (q5, X, U)

δ(q5, Y) = (q1, Y, R)

δ(q1,  ) = (q6,  , L)

δ(q6, Y) = (q7, Y, U)

δ(q7, X) = (q8,  , D)

δ(q8, Y) = (q9, Y, D)

δ(q9, X) = (q10,  , U)

δ(q10, Y) = (q10, X, L)

δ(q10,  ) = (q11,  , halt)


Exercises
1 Consider the following TM with two-way infinite tape.
. M = ({q0, q1, q2, q3}, {0, 1}, {0, 1}, δ, q0, {q3})
where δ is given by:
δ(q0, 0) = (q1, 1, R)

δ(q1, 1) = (q2, 0, L)

δ(q2, 0) = (q0, 1, R)

δ(q1, 0) = (q3, 0, R)


Construct an equivalent one-way infinite tape TM.

2 It is desired to design a TM that copies patterns of 1’s and 2’s in accordance with the f
.

1.
First, design the machine as a 2-track machine. Assume that the given pattern is initial
track and that the bottom track is initially blank. Let the machine use the bottom track
in the given pattern with an X as it is copied. Give the state diagram of the machine.
2.
3.
Now, convert the 2-track machine of (a) into an equivalent one-track machine by assig
to ordered pairs of upper and lower-track symbols. Choose this assignment so that the
meets the format specified at the beginning of the problem. How many tape symbols d
machine use? What are their roles?
4.

3 It is desired to design a 2-tape TM that behaves as follows. The machine’s first tape is
. with a pattern of the form

and the second tape is left blank. The machine is to determine which of the given bloc
longest and is to halt with a copy of that block on its second tape. The original pattern
unchanged on the first tape. The machine is to halt scanning the 0 to the right of the gi
first tape and the 0 to the left of the block formed on the second tape.
Design an appropriate machine, using the symbol alphabet {0, 1} for each tape. Descr
graphically, using the same conventions used for ordinary TM, except that: (1) the sym
written on the machine’s first and second tapes are to be represented by symbol pairs o
the case of two-track machines; and (2) each state is to be labeled with a pair of direct
form   to indicate the directions that the machine is to move on its first and second ta
that state. Each of D1 and D2 may be either L, R, or –, where the symbol − indicates th
not shift the tape head in question.

4 Let M be any TM that operates on a doubly infinite tape. Show that there exists anothe
. machine   that duplicates each of M’s computations in at most half as many steps as 
initial and final tape patterns are properly encoded.
Describe a typical step in  ’s computation. Hint: Let the squares of M’s tape be repre
of  ’s tape according to the following scheme.
                       

M’s tape:   −5 −4 −3 −2 −1 0 1 2 3 4

                       

    −11 −9 −7 −5 −3 −1 1 3 5 7

’s tape:   −10 −8 −6 −4 −2 0 2 4 6 8

    −9 −7 −5 −3 −1 1 3 5 7 9

5 Construct a TM with two-dimensional tape which gives the following output for the g
. 1.

2.
Output is an array of X’s surrounded by blanks.
3.
4.
5.

6 Give type 0 grammar for the TM,


. 1.
given in Exercise 1.
2.
3.
given in Exercise 7, Chapter 9.
4.

Chapter 10. Variations of Turing
Machines
In Chapter 9, we defined the computability model called “Turing
Machine” (TM). This model is one of the most beautiful, simple,
and an useful abstract model. We had elaborate discussion of this
model through various examples. One can think of variations of the
basic model in many ways. For example, one can work with two
tapes, instead of a single tape. These models are got by adding
extra components and power to the control and hence, appear to
be more powerful than the basic model, but they are not. We could
also consider some restricted version of the basic model. In this
chapter we are considering such variants and discuss their
computing capabilities. We note that the power is not increased by
adding extra components and not decreased by considering the
restricted versions.

Generalized Versions
In this section, we consider the following generalized versions of
the basic model and show that they are equivalent to the basic
model as far as accepting power is concerned. The variants are:

1.
Turing machines with two-way infinite tapes
2.
3.
Multitape TM
4.
5.
Multihead TM
6.
7.
Non-deterministic TMs
8.
9.
Turing machines with 2-dimensional tapes
10.

Two-Way Infinite Tape TM


A two-way infinite tape Turing machine (TTM) is a TM with its
input tape infinite in both directions, the other components being
the same as that of the basic model. We observe from the following
theorem that the power of TTM is no way superior of that of the
basic TM.

That a one-way TM M0 can be simulated by a two-way TM MD can


be seen easily. MD puts a # to the left of the leftmost nonblank and
moves its head right and simulates M0. If MD reads # again, it halts
rejecting the input as this means M0 tries to move off the left end
of the tape.

Theorem 10.1

For any TTM MD = (K, Σ, Γ, δ, q0, F) there exists an equivalent TM


MD.

Proof. The tape of the MD at any instance is of the form

... a−2 a−1 a0 a1 a2 ... an

where a0 is the symbol in the cell scanned by MD initially. M0 can


represent this situation by two tracks:

         

  a0 a1 a2 ...

  # a−1 a−2 ...

When MD is to the right of a0, the simulation is done on the upper


track.

When MD is to the left of a0, the simulation is done in M0 on the


lower track.

The initial configuration would be:

         
  a0 a1 ... an

M0 = (K′, Σ′, Γ′, δ′,  , F′) where

K′ = { }∪ (K × {1, 2})

Σ′ = Σ×{ }

Γ′ = Γ × (Γ ∪{#})

F′ = {[q, 1], [q, 2]|q ∊ F}

δ′ is defined as follows:

If δ (q, a)= (q′, c, L/R) and if the head of MD is to the right of a0 we
have: δ([q, 1], [a, b]) = ([q′, 1], [c, b], L/R); simulation is done on the
upper track 1.

If MD is to the left of the initial position, simulation is done on the


lower track.

If δ(q, b) = (q′, c, L/R)

δ′ ([q, 2], [a, b]) = ([q′, 2], [a, c], R/L)

The initial move will be:

if δ(q0, a0) = (q, A, R).
if δ(q0, a0) = (q, A, L).

When reading the leftmost symbol MD behaves as follows:

If δ(q, a) = (p, A, R)

δ′([q, 1/2], [a, #]) = ([p, 1], [A, #], R).

If δ(q, a) = (p, A, L)

δ′([q, 1/2], [a, #]) = ([p, 2], [A, #], R).

while simulating a move when MD is to the left of the initial


position, M0 does it in the lower track always moving in a direction
opposite to that of MD. If MD reaches an accepting
state qf, M0 reaches [qf, 1] or [qf, 2] and accepts the input.

Multi-tape TM
Definition 10.1

A multi-tape TM is a TM, with n tapes each having a separate tape


head. The move of this TM will depend on the state and symbol
scanned by each head. In each tape, a symbol is printed on the
cell scanned and each head moves left or right independently
depending on the move.

Suppose we have a 3-tape TM (Figure 10.1).


Figure 10.1. TM with three tapes

The symbols scanned by the heads are A, B and C respectively.


Then, the mapping will be of the form:

δ(q, A, B, C) = (q′, (A′, L), (B′, R), (C′, R))

The state changes to q.

In the first tape, A′ is printed over A and the tape head moves left.
In the second tape, B′ is printed over B and the tape head moves
right while in the third tape C′ is printed over C, the tape head
moving right.

Theorem 10.2

A multi-tape TM can be simulated by a single tape TM.

Proof. Let M = (K, Σ, Γ, δ, q0, F) be a k-tape TM. It can be


simulated by a single tape TM M′ having 2k tracks. Odd numbered
tracks contain the contents of M’s tapes.

Table 10.2. TM with one tape having six tracks

    ... A ...  

    ... X ...  

    ...   ... B
    ...   ... X

  C ...   ...  

  X ...   ...  

Even-numbered tracks contain the blank symbols excepts in one


position where it has a marker X. This specifies the position of the
head in the corresponding M’s tape. The situation in Figure 10.1 is
represented by a 6-track TM as given in Figure 10.2.

To simulate a move of the multi-tape TM M, the single tape TM M′


makes two sweeps, one from left to right and another from right to
left. It starts on the leftmost cell, which contains a X in one of the
even tracks. While moving from left to right, when it encounters
a X, it stores the symbol above it in its finite control. It keeps a
counter as one of the component of the state to check whether it
has read the symbols from all the tapes. After the left to right move
is over, depending on the move of M determined by the symbols
read, the single tape TM M′ makes a right to left sweep changing
the corresponding symbols on the odd tracks and positioning
the X’s properly in the even-numbered tracks. To simulate one
move of M, M′ roughly takes (it may be slightly more depending on
the left or right shift of X) a number of steps equal to twice the
distance between the leftmost and the rightmost cells containing
a X in an even-numbered track. When M′ starts all X’s will be in
one cell. After i moves the distance between the leftmost and
rightmost X can be at most 2i. Hence, to simulate n moves of M, M′

roughly takes   steps. If M reaches a final


state, M′ accepts and halts.

An off-line TM is a multi-tape TM with a read only input tape. In


this tape, input w is placed with end markers as ¢w$ and symbols
cannot be rewritten on this tape.
Multi-head TM
Definition 10.2

A multi-head TM is a single tape TM having k heads reading


symbols on the same tape. In one step, all the heads sense the
scanned symbols and move or write independently.

Theorem 10.3

A multi-head TM M can be simulated by a single head TM M′.

Proof. Let M have k heads. Then, M′ will have k + 1 tracks on a


single tape. One track will contain the contents of the tape of M and
the other tracks are used to mark the head positions. One move
of M is simulated by M′ by making a left to right sweep followed by
a right to left sweep. The simulation is similar to the one given
in Theorem 10.2. One fact about which one has to be careful here
is the time when two heads scan the same symbol and try to
change it differently. In this case, some priority among heads has
to be used.

Non-deterministic Turing Machine (NTM)


One can define a NTM by the following transition function:

δ: K × Γ →  (K × Γ × {L, R})

where δ is a mapping from K × Γ to the power set of K × Γ × {L, R}.


The computation of any input will be in several directions and the
input is accepted if there is at least one sequence of moves, which
accepts it.

A NTM is N = (K, Σ, Γ, δ, q0, F) where K, Σ, Γ, q0, F as in the


definition of TM. δ is defined as above.
Theorem 10.4

Every NTM can be simulated by a deterministic TM (basic model).

Proof. The computation of a NTM is a tree whose branches


correspond to different possibilities for the machine movement on
the given input. If some branch of the computation leads to the
accept state, then, the machine accepts the input. The root of the
tree would be the start configuration and each node is a possible
continuation from the root node.

Computation of a NTM ‘N’ on any input w is represented as a tree.


Each branch is a branch of nondeterminism. Each node is a
configuration of N. Root will be the start configuration. One has to
traverse the whole tree in a ‘breath-first’ manner to search for a
successful path. One cannot proceed by ‘depth-first’ search as the
tracing may lead to an infinite branch while missing the accepting
configurations of some other branches.

Using a multi-tape TM one can simulate N on a given input. For


each path of the tree, simulation is done on a separate tape. The
paths are considered one-by-one in the increasing order of depth
and among paths of equal length, the paths are considered from
left to right.

Let us see how the implementation works on a DTM with the tapes.
There is an input tape containing input which is never altered.
Second tape will be a simulation tape which contains a copy of N’s
tape content on some branch of its non-deterministic computation.
The third tape keeps track of the location of the DTM in NTM’s
computation tree. The three tapes may be called as input tape,
simulation tape, and address tape.
Suppose every node in the tree has at most b children. Let every
node in the tree has address, which is a string over the alphabet
Σb = {1, 2,..., b} (say). To obtain a node with address 145, start at
the root going to its child numbered 1, move to its 4th child and
then move to the nodes that corresponds to its 5th child. Ignore
addresses that are meaningless. Then, in a breath-first manner
check the nodes (configurations) in canonical order as ε, 1, 2,
3,..., b, 11, 12, 13,...,1b, 21, 22,...,2b,...,111, 112, ... (if they exist).
Then, DTM on input w = a1 ... an works as follows. Place w on the
input tape and the others are empty. Copy the contents of the input
tape to simulation tape. Then, simulate NTM’s one non-
deterministic branch on the simulation tape. On each choice,
consult the address tape for the next move. Accept if the accepting
configuration is reached. Otherwise abort this branch of simulation.
The abortion will take place for the following reasons.

symbols on address tape are all used;


rejecting configurations encountered; and


non-deterministic choice is not a valid choice.

Once the present branch is aborted, replace the string on the


address tape with the next string in canonical order. Simulate this
branch of the NTM as before.
Two-Dimensional TM
The TM can have 2-dimensional tapes. When the head is scanning
a symbol, it can move left, right, up or down. The smallest
rectangle containing the non-blank portion is m × n, then it
has m rows and n columns. A 1-dimensional TM, which tries to
simulate this 2-dimensional TM will have two tapes. On one tape,
this m rows of n symbols each will be represented as m blocks of
size n each separated by markers. The second tape is used as
scratch tape. When the 2-dimensional TM’s head moves left or
right, it is simulated in a block of the 1-dimensional TM. When the
2-dimensional TM’s head moves up or down, the 1-dimensional
TM’s head moves to the previous block or the next block. To move
to the correct position in that block, the second tape is used.
If m or n increases, number of blocks or the size of the blocks is
increased.

Restricted Turing Machines


In this section we can discuss some more variations of TMs, which
are in the form of restrictions. For example, we have an offline TM
that has a restriction on the input tape. One can view the
development of TM from finite state automata in a hierarchical
way. When viewed as language recognizers FSA, PDA models could
not recognize some languages, whereas a TM can do so. One can
view the tape of the basic TM as input tape, output tape, and
processing space. For example, a PDA is equivalent to a NTM with
input tape, resembling the input tape of the PDA, the storage tape
resembling the stack of the PDA. Now the processing of NTM can
simulate the processing of the PDA. Hence, any computing can be
simulated by a TM. But there are some standard ways a TM can be
restricted leading to multi-stack TMs, counter machines, etc,
without losing out on accepting power.

Definition 10.3

A deterministic TM with read only input and two storage tape is


called a deterministic two stack Turing machine (DTSTM). When
the head of the DTSTM tries to move left on the tapes, a blank
symbol   will be printed.

One can easily simulate a TM with a DTSTM. At any point of time,


one can see the symbol being scanned by the head of the TM,
placed on the top of one stack, the symbols to its left on this stack
below the symbols scanned, placing the symbols closer to the
present head position, closer to the top of the stack. Similar
exercise is done for the symbols to the right of the present head
position by placing them in the second stack. Hence, clearly a move
of the TM has a corresponding action on the input and stacks of
the DTSTM and the simulation can be done.

Theorem 10.5

There exists a DTSTM that simulates the basic TM on any given


input.

The next variant is a ‘counter’ machine. A ‘counter’ machine can


store finite number of integers each counter storing a number. The
counters can be either increased or decreased and cannot cross a
‘stack’ or ‘counter’ symbol ‘Z.’ In other words, it is a machine with
stacks having only two stack symbols Z and   (blank). Every stack
will have Z as its initial symbol. A stack may hold a string of the
form  , i ≥ 0, indicating that the stack holds an integer i in it. This
stack can be increased or decreased by moving the stack head up
or down. A counter machine with 2-stacks is illustrated in the
following figure.

Theorem 10.6

For a basic TM, there exists an equivalent 4-counter TM.


Proof. The equivalence is shown between a 2-stack TM (DTSTM)
and a 4-counter machine. We have already seen that a DTSTM
and basic TM are equivalent.

We now see how to simulate each stack with two counters.


Let X1, X2, ..., Xt−1 be the (t−1) stack symbols. Each stack content
can be uniquely represented by an integer in base ‘t.’
Suppose Xi1 Xi2 ... Xin is the present stack content with Xin on the
top in DTSTM. Then, the integer count will be

k = in + tin−1 + t2in−2 + ... + tn−1i1.

For example, if the number of stack symbols used is 3, and an


integer for the stack content X2X3X1X2 will be:

k = 2 + 4 + 42.3 + 43.2 = 172.

Suppose Xr is to be put on the top of the stack, then the new


integer counter has to be kt + r. The first counter contains k and the
second counter contains zero at this point. To get kt + r in the
second counter, the counter machine has to move the first counter
head to the left by one cell and move the head of the second
counter t cells to the right. Thus, when the first counter head
reaches ‘Z’ the second counter contains kt. Now add r to the
second counter to get kt + r.

If it is a clear move of the stack, Xik is to be cleared. Then k has to


be reduced to  , the integer part of k/t. Now, the adjustment is
decrementing the count of the first counter in steps and increment
the second counter by one. This repeats till the first counter
becomes zero.

Now by the above exercise, one is able to identify the stack symbol
on the stack from the two counters thus designed. That
is k mod t is the index in and hence, Xin is the top symbol of the
stack.

The above theorem can be improved and hence, we have the


following theorems.

Theorem 10.7

For a basic TM, there exists an equivalent 3-counter TM.

Proof. The idea of the simulation is similar to the previous theorem.


Instead of having two counters for adjusting the two counters that
correspond to two stacks, one common counter is used to adjust
the operation of pop, push or change of the stack symbols.

Theorem 10.8

For a basic TM, there exists an equivalent 2-counter TM.


Proof. The simulation is now done using the previous theorem.
One has to simulate the 3-counters by using 2-counters to get an
equivalent result. Let i, j, and k be the numbers in the three
counters. We have to represent these by a unique integer. Let m =
2i3j5k be the integer. Put this in one counter.

To increment say i, one has to multiply m by 2. This can be done


using the second counter as we have done earlier. Similarly
for j and k increment can be done using the second counter. Any
of i, j, k will be zero whenever m is not divisible by 2, 3 or 5
respectively. For example, to say whether j = 0, copy m to the
second counter and while copying store in finite control
whether m is divisible by 2, 3 or 5. If it is not divisible by 3, then j =
0.

Finally to decrease i, j, k, divide m by 2, 3, 5, respectively. This


exercise is also similar to the previous one except that the machine
will halt whenever m is not divisible by a constant by which we are
dividing.

A TM can be restricted by reducing the tape alphabet or by


reducing the number of states. One can also aim at reducing the
number of tape symbols to achieve universality. There exists a TM
with three states, one tape but any amount of tape symbols to
recognize recursively enumerable languages. Following is one such
characterization.

Theorem 10.9

There exists a TM with one tape and tape alphabet {0, 1,  } to
recognize any recursively enumerable language L over {0, 1}.

Proof. Let M = (K, {0, 1}, Γ, δ, q0, F) be a TM recognizing L. Now


the tape alphabet Γ can be anything. Our aim is to construct an
equivalent TM with Γ = {0, 1,  }. For that we encode each symbol of
Γ. Suppose Γ has ‘t’ symbols. We use binary codes to code each
symbol of Γ by ‘k’ bits where 2k − 1 < t < 2k.

We now have to design another TM M′ with Γ = {0, 1,  }. The tape
of M′ will consists of coded symbols of Γ and the input over {0, 1}.
The simulation of one move of M by k moves of M′ is as follows.
The tape head of M′ is initially at the leftmost symbol of the coded
input. M′ has to scan the next k − 1 symbols to its right to make a
decision of change of state or overwrite or move left or right as
per M. The TM M′ stores in its finite control the state of M and the
head position of M′ which is a number between 0 to k−1. Hence, M′
clearly indicates at the end of a block of moves whether one move
of M is made or not. Once the finite control indicates ‘0’ as head
position, it means this is time for the change on the tape, state as
per M’s instruction. If the thus changed state is an accepting state
of M, M′ accepts.

One observation is that on the tape of M′, there has to be code for
the blank symbol of M. This is essential for simulating the blank
of M by M′. Second observation is that any tape symbol of M is
directly coded in terms of 0s and 1s and as any string w is placed
on the tape of M, the codes for each symbol of w is concatenated
and placed on the tape of M′.

One can see from the previous result that even if the input for L is
not over {0, 1}, the above TM construction will work, because the
input of L over some other alphabet will be now coded and placed
as input for a TM with tape alphabet {0, 1,  }.

Also one can see that one can construct a multi-tape TM that uses
only two symbols 0 and 1 as tape alphabet to simulate any TM. One
has to keep the input tape fixed with the input. There will be a
second tape with tape symbols coded as binary symbols. This
simulates moves of the original TM. The newly constructed TM
must have positions on the tape to indicate the present head
position, cells to indicate that the binary representation of the
symbol under scan is already copied. Each ID is copied on the third
tape after simulation of one move. If we take the input alphabet {0,
1}, we start with the second tape (first tape is not necessary). In the
third tape, IDs are copied one by one without erasing. Thus, we
have the following result.

Theorem 10.10

Any TM can be simulated by an offline TM having one storage tape


with two symbols 0 and 1 where 0 indicates blank. A blank (0) can
be retained as 0 or replaced by 1 but a ‘1’ cannot be rewritten at a
cell with ‘0’.

Turing Machines as Enumerators

The languages accepted by TMs are the recursively enumerable sets. One can also think of a TM as generating a language. An enumerator is a TM variant which generates a recursively enumerable language. One can think of such a TM as having its output on a printer (an output tape): the output strings are printed by the output device. Every string that is freshly produced is added to the list by printing it.

The enumerator machine has a work tape, which is blank initially, and an output device, which may be a printer. If such a machine does not halt, it may go on printing an infinite list of strings. Let M be an enumerator machine and let G(M) be the set of strings appearing on its output tape. We have the following theorem for M.

Theorem 10.11

A language L is recursively enumerable if and only if there exists an enumerator M such that G(M) = L.

Proof. Let M be an enumerator such that L = G(M). We show that there exists a TM M′ recognizing L. Let w be an input for M′. Perform the following two steps on w:

1. Run M and compare each output string of M with w.

2. If w appears on the output tape of M, accept w.

That is, M′ accepts exactly those strings that appear on the output tape of M. Hence, L(M′) = G(M) = L.

Conversely, let M′ be a TM such that L(M′) = L. We construct an enumerator M that prints every string of L as follows.

Let Σ be the alphabet of L and let w1, w2, w3, w4, ... be an enumeration of all strings over Σ.

The enumerator machine M does the following:

1. Repeat the following for i = 1, 2, 3, ...

2. Run M′ for i steps on each of the inputs w1, w2, ..., wi.

3. If M′ accepts a string wj within these i steps, print wj.

Clearly, if a string w is accepted by M′, it will eventually be output by the enumerator M. In the above procedure, one can see that a string may be printed repeatedly. It is straightforward to see that G(M) = L(M′) = L.

If L is a recursive set (a set accepted by a TM which halts on all inputs), then there exists an enumerator M for L which prints the strings of L in canonical order.
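The dovetailing in the proof can be sketched in Python. This is schematic only: tm_accepts(w, steps) stands in for a hypothetical bounded simulator of M′, and strings is a generator producing w1, w2, ... in canonical order.

def enumerator(strings, tm_accepts):
    """Print G(M): every string accepted by M', possibly repeatedly.
    strings              -- generator yielding w1, w2, ... in canonical order
    tm_accepts(w, steps) -- bounded simulation of M' (assumed)"""
    seen = []
    i = 0
    while True:                       # stage i = 1, 2, 3, ...
        i += 1
        seen.append(next(strings))    # seen now holds w1, ..., wi
        for w in seen:
            if tm_accepts(w, i):      # run M' for i steps on each wj
                print(w)              # a string may be printed repeatedly

Since every accepted string is accepted within some finite number of steps s, it is printed at every stage i that is at least both s and its index, so the output set is exactly L(M′).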

Equivalence Between Turing Machines and Type 0 Languages

In this section, we prove the equivalence between type 0 grammars and TMs: every language generated by a type 0 grammar is recognized by a TM, and every recursively enumerable set is generated by a type 0 grammar.

Theorem 10.12

If L is the language generated by an unrestricted grammar G = (N, T, P, S), then L is recognized by a TM.

Proof. For G, we construct a two-tape TM M: on one tape we keep the input w, and the other tape is used to derive a string of L using P. Each time a rule from P has been applied, the two tapes are compared to decide acceptance of w. Initially, M places w on the first tape and S on the second tape. M nondeterministically selects a rule S → α from P and replaces S by α on the second tape. Now it compares the tapes; if they agree, it accepts w. Otherwise, from the present string α, M nondeterministically chooses a position i such that some left-hand side β of a production occurs in α starting at position i, nondeterministically chooses a rule β → γ, and applies it to α by replacing that occurrence of β with γ. Let the resulting tape content be α1. If α1 = w, then M accepts w; otherwise the procedure continues.
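The nondeterministic derivation search can be rendered deterministically by dovetailing over all derivations. The following Python sketch (an illustration, not the text's construction) semi-decides membership for an unrestricted grammar given as a list of (beta, gamma) productions; a step bound and a length prune are added so that the sketch always terminates, whereas the real procedure may run forever when w is not in L(G).

from collections import deque

def derives(productions, start, w, max_steps=100000):
    """Breadth-first search over sentential forms of an unrestricted grammar.
    productions -- list of (beta, gamma) pairs of strings
    Returns True if w is derived within max_steps expansions."""
    queue = deque([start])
    seen = {start}
    steps = 0
    while queue and steps < max_steps:
        alpha = queue.popleft()
        steps += 1
        if alpha == w:
            return True
        for beta, gamma in productions:
            i = alpha.find(beta)
            while i != -1:                      # every occurrence of beta
                new = alpha[:i] + gamma + alpha[i + len(beta):]
                if new not in seen and len(new) <= 2 * len(w) + 2:
                    seen.add(new)               # length prune: a heuristic only
                    queue.append(new)
                i = alpha.find(beta, i + 1)
    return False

# Example: S -> aSb | ab generates {a^n b^n}.
print(derives([("S", "aSb"), ("S", "ab")], "S", "aaabbb"))   # True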

Theorem 10.13

If L is accepted by a TM M, then there exists an unrestricted grammar generating L.

Proof. Let L be accepted by a TM M = (K, Σ, Γ, q0, δ, F). Then a grammar G is constructed as follows. Let G = (N, Σ, P, S1) where N = ((Σ ∪ {ε}) × Γ) ∪ {S1, S2, S3}. P consists of the following rules:

1. S1 → q0S2

2. S2 → (a, a)S2 for each a ∊ Σ.

That is, G produces two copies of each symbol from Σ.

3. S2 → S3

4. S3 → (ε, ε)S3

5. S3 → ε

6. q(a, X) → (a, Y)p if δ(q, X) = (p, Y, R), for every a ∊ Σ ∪ {ε}, each q ∊ K and all X, Y ∊ Γ.

This rule simulates the action of M on the second components of the pairs (α, β).

7. (b, Z)q(a, X) → p(b, Z)(a, Y) if δ(q, X) = (p, Y, L), for each a, b ∊ Σ ∪ {ε}, each q ∊ K and all X, Y, Z ∊ Γ.

This rule does the same job as rule 6 for left moves.

8. (a, X)q → qaq, q(a, X) → qaq and q → ε for each a ∊ Σ ∪ {ε}, X ∊ Γ and q ∊ F.

These rules bring out w from the pairs if the second components of the pairs are properly accepted by M.

Hence, we see from the rules that the constructed grammar nondeterministically generates two copies of a string w in Σ* using rules (1) and (2), simulates M on the second copies through rules 6 and 7, and brings out w through rule 8 if it is accepted by M. The equivalence, namely that w ∊ L(G) if and only if w ∊ L(M), can be proved by induction on the number of derivation steps and on the number of moves of the TM. Hence the theorem.

Linear-Bounded Automata

A linear-bounded automaton (LBA) is an NTM with a bounded, finite input tape. That is, the input is placed between two special symbols ¢ and $:

¢ a1 a2 ... an $

All other actions of a TM are allowed, except that the read/write head cannot fall off to the left of ¢ or to the right of $, and ¢ and $ cannot be altered. One can say that this is a restricted version of a TM.

Definition 10.4

An LBA is an 8-tuple M = (K, Σ, Γ, δ, q0, ¢, $, F) where K, Σ, Γ, q0, F and δ are as in any TM. The language recognized by M is L(M) = {w | w ∊ Σ* and q0¢w$ ⊢* αpβ for some p ∊ F}.

One can show that the family of languages accepted by LBAs is exactly the family of context-sensitive languages (CSLs).

Theorem 10.14

If L is a CSL, then L is accepted by an LBA.

Proof. For L, one can construct an LBA with a 2-track tape. The simulation is done as in Theorem 10.12: we place w on the first track and produce sentential forms on the second track, each time comparing them with the contents of the first track. Since the productions of a CSG are non-contracting, a sentential form never needs more space than w itself, so the bounded tape suffices. If w = ε, the LBA halts without accepting.

Theorem 10.15

If L is recognized by an LBA, then L is generated by a context-sensitive grammar.

Proof. Let M = (K, Σ, Γ, q0, δ, ¢, $, F) be an LBA such that L(M) = L. Then one can construct a CSG G = (N, Σ, P, S1) as below.

N consists of nonterminals of the form (a, β) where a ∊ Σ and β is of the form x, qx, q¢x, x$ or qx$, where q ∊ K and x ∊ Γ.

P consists of the following productions:

1. S1 → (a, q0¢a)S2;

2. S1 → (a, q0¢a$);

3. S2 → (a, a)S2; and

4. S2 → (a, a$) for all a ∊ Σ.

The above four rules generate a sequence of pairs whose first components form a terminal string a1a2 ... at and whose second components form the initial ID of the LBA.

The moves of the LBA are simulated in the second components by the following rules:

5. If δ(q, X) = (p, Y, R), we have rules of the form (a, qX)(b, Z) → (a, Y)(b, pZ) and (a, q¢X)(b, Z) → (a, ¢Y)(b, pZ), where a, b ∊ Σ, p, q ∊ K and X, Y, Z ∊ Γ.

6. If δ(q, X) = (p, Y, L), we have rules of the form (b, Z)(a, qX) → (b, pZ)(a, Y) and (b, Z)(a, qX$) → (b, pZ)(a, Y$), where a, b ∊ Σ, p, q ∊ K and X, Y, Z ∊ Γ.

7. (a, qβ) → a if q is final, for all a ∊ Σ and all possible β.

8. (a, α)b → ab and b(a, α) → ba for any a, b ∊ Σ and all possible α.

Clearly, all the productions are context-sensitive: none of them shortens the sentential form. The simulation leads to a stage where the first components emerge as the generated string, provided the second components, which represent the LBA ID, contain a final state.

ε will not be generated by the grammar, whether or not it is in T(M). We have already seen that ε ∉ L if L is context-sensitive, by definition. To include ε, we take a new start symbol S′, add S′ → ε together with S′ → α for every rule S → α of the original CSG with start symbol S, and make sure S′ does not appear on the right-hand side of any production.

Gödel Numbering

In the construction of counter automata, we used the concept of Gödel numbering. Let us consider this topic more formally.

Perhaps the most famous and most important of the several deep theorems about mathematical logic proved by Kurt Gödel (1931) is his incompleteness theorem: for any sound logical axiomatic system that is sufficiently rich to contain the theory of numbers, there must be number-theoretic statements that can neither be proved nor disproved in the system. Gödel's theorem is one of the most significant discoveries of the twentieth century, since it places a limit on the efficacy of mathematical reasoning itself. In proving his result, Gödel used a numbering scheme which is now called Gödel numbering.

Gödel Numbering of Sequences of Positive Integers

The first class of objects to which we shall assign Gödel numbers is the class of finite sequences of positive integers. Consider the primes in increasing order of magnitude: Prime(0) is 2, Prime(1) is 3, Prime(2) is 5, and so on.

Definition 10.5

The Gödel number of the finite sequence of positive integers i1, i2, ..., in is

2^i1 * 3^i2 * 5^i3 * ... * Prime(n − 1)^in

Example 10.1.

The Gödel number of the sequence 2, 1, 3, 1, 2, 1 is

2^2 * 3^1 * 5^3 * 7^1 * 11^2 * 13^1 = 16,516,500.

From the Gödel number, the sequence can be recovered. For example, if the Gödel number is 4200, the sequence is 3, 1, 2, 1. This is obtained as follows. Divide 4200 by 2 as many times as possible: 4200 = 2^3 * 525, so the first number in the sequence is 3. Divide 525 by 3 as many times as possible: 525 = 3 * 175, and 175 is not divisible by 3, so the second number in the sequence is 1. Divide 175 by 5 as many times as possible: 175 = 5^2 * 7, so the third number in the sequence is 2, and the last number is 1, as 7 is divisible by 7 exactly once. So the sequence is 3, 1, 2, 1.
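A small Python sketch of this encoding and decoding, added here for illustration; sympy.prime(n) (the nth prime, with prime(1) = 2) is used purely for convenience, and any prime generator would do.

from sympy import prime   # prime(1) = 2, prime(2) = 3, ...

def godel_encode(seq):
    """Gödel number of a sequence of positive integers."""
    n = 1
    for idx, exp in enumerate(seq):
        n *= prime(idx + 1) ** exp
    return n

def godel_decode(n):
    """Recover the sequence by dividing out each prime in turn."""
    seq, idx = [], 1
    while n > 1:
        p, e = prime(idx), 0
        while n % p == 0:
            n //= p
            e += 1
        seq.append(e)
        idx += 1
    return seq

print(godel_encode([2, 1, 3, 1, 2, 1]))   # 16516500
print(godel_decode(4200))                 # [3, 1, 2, 1]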

Gödel Numbers of Strings

Once we have a method of finding the Gödel number of a sequence of positive integers, it is not difficult to assign Gödel numbers to any written piece of English text. First, assign a positive integer to every distinguishable character, including blanks, small letters, capital letters, and each punctuation sign. One possible way is as follows: blank 1; small a through z, 2 through 27; capital A through Z, 28 through 53; period 54; comma 55; and so on. The Gödel number of any piece of text is then simply the Gödel number of the corresponding sequence of integers. For example, with this assignment b, o and k receive the numbers 3, 16 and 12, so the Gödel number of the word ‘book’ is 2^3 * 3^16 * 5^16 * 7^12.

Gödel Numbers of Undirected Graphs

An undirected graph consists of nodes and edges. The procedure for assigning a suitable Gödel number to a graph begins with an assignment of a distinct prime number to each node; this assignment is made arbitrarily. We then note that once we know how many nodes there are and which nodes are connected to which by how many edges, we have complete information about the graph. Supposing the nodes have been numbered P0, ..., Pn, we take the Gödel number of the graph to be

2^K0 * 3^K1 * ... * Prime(n)^Kn

where, for each i, Ki is the product of all those Pj^xj such that the node numbered Pi is connected by xj edges to the node numbered Pj. Thus, Ki = 1 if the node numbered Pi is not connected to any other node.

Consider a graph on five nodes assigned the prime numbers P0 = 2, P1 = 3, P2 = 5, P3 = 7 and P4 = 11 (in the figure, omitted here, P0 is joined to P1 by two edges and to P2 and P3 by one edge each, P2 is joined to P3, and P4 is isolated).

The Gödel number of this graph is

2^K0 * 3^K1 * 5^K2 * 7^K3 * 11^K4

where

K0 = 3^2 * 5 * 7

K1 = 2^2

K2 = 2 * 7

K3 = 2 * 5 and

K4 = 1

i.e., the Gödel number is

2^(3^2 * 5 * 7) * 3^(2^2) * 5^(2 * 7) * 7^(2 * 5) * 11^1

From the Gödel number, the graph can be reconstructed. This idea can be extended suitably to labeled graphs and directed graphs.

Problems and Solutions

1. For the following two-way infinite TM (figure omitted), construct an equivalent one-way infinite tape TM.

Solution. (figure omitted)

2. Construct a TM M with a 2-dimensional tape. M starts with the initial ID

...
X X X ... X X X
...

i.e., a row of n X's surrounded by blanks. It has to halt with the final ID

...
 X ... X
X X ... X X
 X ... X
...

i.e., above and below the row of n X's, a row of (n − 2) X's is printed, centrally placed.

Solution.
K = {q0, ..., q11}

Γ = {X, Y, B}
δ is given by:
δ(q0, X) = (q1, Y, R)

δ(q1, X) = (q2, Y, U)

δ(q2, B) = (q3, X, D)

δ(q3, Y) = (q4, Y, D)

δ(q4, B) = (q5, X, U)

δ(q5, Y) = (q1, Y, R)

δ(q1, B) = (q6, B, L)

δ(q6, Y) = (q7, Y, U)

δ(q7, X) = (q8, B, D)

δ(q8, Y) = (q9, Y, D)

δ(q9, X) = (q10, B, U)

δ(q10, Y) = (q10, X, L)

δ(q10, B) = (q11, B, halt)

(Here U and D denote moves up and down on the two-dimensional tape, and B denotes the blank.)


Exercises
1. Consider the following TM with two-way infinite tape:
M = ({q0, q1, q2, q3}, {0, 1}, {0, 1}, δ, q0, {q3})
where δ is given by:
δ(q0, 0) = (q1, 1, R)

δ(q1, 1) = (q2, 0, L)

δ(q2, 0) = (q0, 1, R)

δ(q1, 0) = (q3, 0, R)

Construct an equivalent one-way infinite tape TM.

2. It is desired to design a TM that copies patterns of 1's and 2's in accordance with the format shown below (figure omitted).

1. First, design the machine as a 2-track machine. Assume that the given pattern is initially on the top track and that the bottom track is initially blank. Let the machine use the bottom track to mark each symbol in the given pattern with an X as it is copied. Give the state diagram of the machine.

2. Now, convert the 2-track machine of (a) into an equivalent one-track machine by assigning single symbols to ordered pairs of upper- and lower-track symbols. Choose this assignment so that the machine meets the format specified at the beginning of the problem. How many tape symbols does your machine use? What are their roles?

3. It is desired to design a 2-tape TM that behaves as follows. The machine's first tape is presented with a pattern of the form shown below (figure omitted), and the second tape is left blank. The machine is to determine which of the given blocks of 1's is longest and is to halt with a copy of that block on its second tape. The original pattern is to remain unchanged on the first tape. The machine is to halt scanning the 0 to the right of the given pattern on its first tape and the 0 to the left of the block formed on the second tape.
Design an appropriate machine, using the symbol alphabet {0, 1} for each tape. Describe it graphically, using the same conventions used for ordinary TMs, except that: (1) the symbols read and written on the machine's first and second tapes are to be represented by symbol pairs, as in the case of two-track machines; and (2) each state is to be labeled with a pair of directions of the form (D1, D2) to indicate the directions that the machine is to move on its first and second tapes upon leaving that state. Each of D1 and D2 may be either L, R, or –, where the symbol – indicates that the machine should not shift the tape head in question.

4. Let M be any TM that operates on a doubly infinite tape. Show that there exists another machine M̂ that duplicates each of M's computations in at most half as many steps as M, provided that initial and final tape patterns are properly encoded.
Describe a typical step in M̂'s computation. Hint: let the squares of M's tape be represented on the squares of M̂'s tape according to the following scheme, in which each square of M̂'s tape holds a block of three consecutive squares of M's tape, adjacent blocks overlapping in one square:

M's tape:   ... −5 −4 −3 −2 −1 0 1 2 3 4 ...

M̂'s tape:  ... −11 −9 −7 −5 −3 −1 1 3 5 7 ...
            ... −10 −8 −6 −4 −2 0 2 4 6 8 ...
            ...  −9 −7 −5 −3 −1 1 3 5 7 9 ...

5. Construct a TM with two-dimensional tape which gives the following output for the given input (input and output figures omitted): the output is an array of X's surrounded by blanks.

6. Give a type 0 grammar for the TM
1. given in Exercise 1;
2. given in Exercise 7, Chapter 9.

 
Chapter 11. Universal Turing Machine and Decidability

In this chapter, we consider the universal Turing machine (TM), the halting problem, and the concept of undecidability.
Encoding and Enumeration of Turing Machines

The TM is specified by a 6-tuple M = (K, Σ, Γ, δ, q0, F) (refer Definition 9.1). B in Γ is a special symbol denoting the blank. The TM can be encoded as a binary string. Without loss of generality, we can take the state set as {q1, q2, ..., qs} where q1 is the initial state and q2 is the final state. The tape symbol set can be taken as {B, 0, 1}. A move of the TM is represented as:

δ(qi, Xj) = (qk, Xr, dl)

Note that we are considering deterministic TMs. This move means that the TM, while reading Xj in state qi, goes to state qk, prints Xr over Xj and moves left or right as specified by dl. The move can be represented as the binary string 0^i 1 0^j 1 0^k 1 0^r 1 0^l. Note that 1 ≤ i, k ≤ s; 1 ≤ j, r ≤ 3; l = 1 or 2. Here 0^i denotes the state qi; 0^j denotes the tape symbol Xj (j = 1 denotes B, j = 2 denotes 0, j = 3 denotes 1); 0^k denotes the state qk and 0^r denotes the symbol Xr. If l = 1, the move is to the left; if l = 2, the move is to the right. Thus, each move of the TM can be represented as a binary string. Suppose the moves of the TM are m1, m2, ..., mn, encoded by the binary strings dm1, dm2, ..., dmn. The binary string

111 dm1 11 dm2 11 ... 11 dmn 111

specifies an encoding of the TM: the moves are separated from one another by two 1's, and the encoding begins and ends with three 1's. Note that any permutation of m1, ..., mn also represents the same TM; hence, different encodings may represent the same TM.
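As an illustration (not part of the text), the following Python sketch encodes a move table of this form. States and symbols are given as indices i ≥ 1, with symbol indices 1, 2, 3 for B, 0, 1 and direction indices 1, 2 for L, R.

def encode_move(i, j, k, r, l):
    """Encode delta(q_i, X_j) = (q_k, X_r, d_l) as 0^i 1 0^j 1 0^k 1 0^r 1 0^l."""
    return '1'.join('0' * n for n in (i, j, k, r, l))

def encode_tm(moves):
    """Encode a list of moves: 111 dm1 11 dm2 11 ... 11 dmn 111."""
    return '111' + '11'.join(encode_move(*m) for m in moves) + '111'

# Example: delta(q1, 0) = (q2, 1, R) is the tuple (1, 2, 2, 3, 2).
print(encode_tm([(1, 2, 2, 3, 2)]))   # 11101001001000100111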

Enumeration of TMs

The binary encoding of a TM can be viewed as the binary representation of an integer p; we then say that the integer p represents the TM Tp. Thus, we have an enumeration T1, T2, T3, ..., Tp, ... of TMs. Some of these representations may not be proper encodings; for example, strings not beginning with three 1's are not proper encodings. We take each of them as representing a TM with no moves, which therefore accepts the empty set φ.

Recursive and Recursively Enumerable Sets

We have seen that the language accepted by a TM is a type 0 language. It is also called a recursively enumerable (RE) set. A TM accepts a string by going to a final state and halting. It can reject a string by halting in a non-final state or by getting into a loop: a TM started on an input may get into a loop and never halt.

We have also seen that a TM corresponds to an effective procedure. An effective procedure is one in which each step is completely specified. An effective procedure need not halt. For example,

i = 1
while true do
begin
    print i
    i = i + 1
end

is an effective procedure which will keep on printing integers one by one and will never halt. An algorithm is an effective procedure which always halts and gives an answer. Hence, an algorithm corresponds to a TM which halts on all inputs. While the set of strings accepted by a TM is called an RE set, the set of strings accepted by a TM which halts on all inputs is called a recursive set.

Theorem 11.1

The complement of a recursive set is recursive.

Proof. Let L be a recursive set accepted by a TM M which halts on all inputs. The complement L̄ is accepted by a TM M̄ which simulates M and interchanges the accepting and rejecting outcomes: whenever M halts and accepts, M̄ halts without accepting, and whenever M halts without accepting, M̄ halts and accepts. Since M halts on all inputs, so does M̄. Hence, L̄ is recursive.

Theorem 11.2

The union of two recursive sets is recursive.

Proof. Let L1 and L2 be recursive sets accepted by TMs M1 and M2, respectively, each halting on all inputs. L1 ∪ L2 is accepted by a TM M which first runs M1 on the input: if M1 accepts, M accepts; if M1 halts and rejects, M runs M2 and accepts exactly when M2 accepts. Since M1 and M2 halt on all inputs, M halts on all inputs. Hence, L1 ∪ L2 is recursive.
Theorem 11.3

If L1 and L2 are recursively enumerable sets, then L1 ∪ L2 is recursively enumerable.

Proof. Let L1 and L2 be accepted by TMs M1 and M2 (which need not always halt), respectively. L1 ∪ L2 is accepted by a TM which simulates M1 and M2 in parallel, executing one move of each alternately, and accepts as soon as either of them accepts. (Running M1 to completion first would not do, since M1 may loop on a string that M2 accepts.)
Theorem 11.4

If L and L̄ are both recursively enumerable, then L is recursive.

Proof. Let L be accepted by M and L̄ be accepted by M̄. Given w, w is accepted by exactly one of M and M̄. L can be accepted by a TM M′ which halts on all inputs as follows: M′ simulates M and M̄ in parallel on w; one of the two must eventually accept. If M accepts, M′ halts and accepts; if M̄ accepts, M′ halts and rejects.
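The parallel simulation can be rendered in Python as dovetailing two step-bounded semi-deciders. This is a schematic sketch; m_accepts_within(w, n) and mbar_accepts_within(w, n) are assumed bounded simulators for M and M̄.

def decide(w, m_accepts_within, mbar_accepts_within):
    """Decide membership of w in L, given semi-deciders for L and its
    complement: simulate both, one step budget at a time."""
    n = 1
    while True:
        if m_accepts_within(w, n):      # M accepts w within n steps
            return True
        if mbar_accepts_within(w, n):   # the complement machine accepts
            return False
        n += 1                          # exactly one branch must fire eventually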

We have seen that the set of all TMs over the tape alphabet {B, 0, 1} can be enumerated, and the set of strings over {0, 1} can be enumerated. So, we can talk about the ith TM and the jth string over {0, 1}. Consider an infinite Boolean matrix whose entry in the jth row and ith column is 1 if wi is accepted by Tj and 0 if wi is not accepted by Tj. Consider the diagonal elements of this infinite Boolean matrix, and take those wi which correspond to 0 entries on the diagonal. Define:

Ld (the diagonal language) = {wi | wi is not accepted by Ti}

Theorem 11.5

Ld is not recursively enumerable.

Proof. Ld = {wi | wi is not accepted by Ti}. Suppose Ld is recursively enumerable. Then, there exists a TM Tj accepting it. If wj is accepted by Tj, then by the definition of Ld, wj ∉ Ld; but since Tj accepts Ld, wj ∊ Ld. If wj is not accepted by Tj, then wj ∊ Ld by definition, yet Tj, which accepts Ld, does not accept wj. Either way we have a contradiction. Therefore, Ld is not RE.

Theorem 11.6

There exists a language which is not recursively enumerable.

We have shown that Ld is not RE. Later, we show that its complement L̄d is RE but not recursive.

It should be noted that for a language L and its complement L̄, three possibilities exist:

1. L and L̄ are both recursive.

2. Neither L nor L̄ is RE.

3. One of them (say L) is RE but not recursive, and the other (L̄) is not RE.

The fourth conceivable situation, in which L and L̄ are both RE but L is not recursive, cannot arise, as seen in Theorem 11.4.

Universal Turing Machine

A universal TM is a TM which can simulate any other TM, including itself. Let us denote a universal TM by U. Without loss of generality, we consider TMs with tape alphabet {B, 0, 1}. The encoding of such a TM T is a binary string dT. U has three tapes. The first tape is presented with dT t, i.e., the encoding of T followed by the input t to T. It will be of the form:

111 ... 11 ... ... 111 t

Note that t is a string over {0, 1}. The second tape initially has a 0. Without loss of generality, we can assume that T has states {q1, ..., qk} where q1 is the initial state and q2 is the final state. The second tape contains the information about the current state: if at any instant T is supposed to be in state qi, U, while simulating T, will have 0^i on tape 2. Tape 3 is used for simulating T's tape. Pictorially, U can be represented as a three-tape machine with this division of labor (figure omitted).

U first checks whether the prefix of the input is a proper encoding of a TM, i.e., it checks whether the portion between the first 111 and the second 111 is divided into blocks by 11 and that the code between one 11 and the next is of the form 0^i 1 0^j 1 0^k 1 0^r 1 0^l with 1 ≤ j, r ≤ 3 and l = 1 or 2. If so, it proceeds; otherwise, it halts rejecting the input.

If the string between the first 111 and the second 111 is a proper encoding of a TM, U copies the portion of the input after the second 111 onto the third tape and positions the head of the third tape on the leftmost non-blank symbol. The second tape initially contains 0, denoting that the state is q1 initially. At any instant, the second tape contains 0^i, denoting that the current state is qi. The third tape head is positioned on the symbol being read (Xj, say). Then, U stores i and j in its memory and scans the encoding of T on the first tape, looking for a block of the form 0^i 1 0^j 1 0^k 1 0^r 1 0^l. If it does not find one, U halts without accepting. If it finds one, it replaces 0^i on the second tape by 0^k, rewrites the symbol Xj scanned by the third tape head as Xr, and moves the third tape head left or right depending on whether l = 1 or 2. It should be noted that U may take more than one step to simulate one step of T. If at any time T reaches the final state q2, the content of tape 2 becomes 00 and U halts accepting the input. If T halts on t without accepting, U also halts without accepting. If T, when started on t, gets into a loop, U also gets into a loop. The language accepted by U is Lu, and it consists of the strings of the form dT t where T accepts t.

Even though U has used three tapes, we can have a universal TM with a single tape simulating U. It is known that there is a universal TM with a single tape, 5 states, and 7 symbols.
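The inner loop of U can be sketched in Python as a table lookup over the decoded moves. This is an illustration on top of the encoding above, not the text's three-tape construction: the machine is a dict from (state, symbol) to (state, symbol, direction), with state 2 final, symbols 1 = B, 2 = 0, 3 = 1, and directions 1 = L, 2 = R.

def universal(delta, t, max_steps=10**6):
    """Simulate a decoded TM on input t (a list of symbol indices)."""
    tape = dict(enumerate(t))       # sparse tape, blank (1) elsewhere
    state, head = 1, 0
    for _ in range(max_steps):      # bound added so the sketch terminates
        if state == 2:
            return True             # T reached its final state: accept
        j = tape.get(head, 1)
        if (state, j) not in delta:
            return False            # no move: T halts without accepting
        state, tape[head], move = delta[(state, j)]
        head += -1 if move == 1 else 1
    return None                     # undetermined within the step bound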

Earlier, we have seen that Ld is not RE. Now, we show that L̄d is RE but not recursive.

Theorem 11.7

L̄d is recursively enumerable but not recursive.

Proof. It is easy to see that L̄d is not recursive: if it were recursive, its complement Ld would be recursive, but we have seen that Ld is not even RE. Hence, L̄d is not recursive.

That L̄d is RE can be seen as follows. L̄d can be accepted by a multi-tape TM M̄ as follows: M̄ has w on its input tape. On a second tape, it generates strings w1, w2, ... in canonical order, each time comparing with the input, and stops this process when it finds w = wi, the ith string in the enumeration. Then, it generates the encoding of Ti on another tape. Next, it calls U as a subroutine with dTiwi as input. There are three possibilities: (i) U halts without accepting; in this case, M̄ halts without accepting wi. (ii) U halts accepting dTiwi; this means Ti accepts wi, and hence M̄ halts and accepts wi. (iii) U gets into a loop; in this case, M̄ never comes out of the subroutine and does not halt.

It is easy to see that M̄ accepts a string w = wi if and only if wi is accepted by Ti, i.e., if and only if wi ∊ L̄d. Since we are able to have a TM for L̄d, it is RE.

Theorem 11.8

Lu is recursively enumerable but not recursive.

Proof. Since we have the TM U accepting Lu, it is RE.

Next, we show that it is not recursive. Suppose Lu were recursive; then there would be a TM H for Lu which halts on all inputs, and we show that L̄d would then become recursive. We can have a TM M for L̄d which halts on all inputs, working as follows. Given w as input, it generates strings w1, w2, ... until it finds wi = w, i.e., the given string is the ith string in the enumeration. Then it calls H with dTiwi. H halts, either accepting dTiwi or rejecting it. If H accepts, M accepts and halts; if H rejects, M rejects and halts. Thus, M accepts L̄d and always halts, so we would have to conclude that L̄d is recursive. We have seen earlier (Theorem 11.7) that L̄d is not recursive. Hence, we arrive at a contradiction. Therefore, Lu is not recursive.

By the property of a language L and its complement L̄ discussed earlier, we note that:

Theorem 11.9

L̄u is not recursively enumerable.

The Halting Problem

The halting problem for TMs was shown to be undecidable by Turing. This was a major breakthrough, and many problems were shown to be undecidable after this.

The halting problem for TMs can be stated as follows: given a TM in an arbitrary configuration, will it eventually halt? We shall show that there cannot exist an algorithm which takes as input a TM in a configuration and tells whether it will halt or not. Thus, we say that the halting problem is recursively unsolvable, or undecidable. It should be noted that this does not mean that for a particular TM in a specified configuration we cannot tell whether it will halt or not; for a particular instance we may well be able to answer. We shall see subsequently what a problem and an instance of a problem mean.
Theorem 11.10

The halting problem for Turing machines is recursively undecidable.

Proof 1. The proof is by contradiction. Suppose the halting problem is decidable. Then, there is an algorithm to solve it and hence a corresponding TM, which we call ‘Halt’.

The TM ‘Halt’ takes as input an encoding dT of a TM T together with an input t. It always halts, saying ‘yes’ if T halts on t and ‘no’ if T does not halt on t.

Now, let us modify the TM ‘Halt’ a little to obtain a TM ‘copyhalt’ as follows. The input to ‘copyhalt’ is the encoding dT of a TM; note that dT is a binary string. Given dT, ‘copyhalt’ makes a copy of dT, obtaining dTdT, and calls ‘Halt’ as a subroutine on this input. It halts and says ‘yes’ if T halts on dT, and halts and says ‘no’ if T does not halt on dT. If it is possible to have ‘Halt’, it is possible to have ‘copyhalt’.

Now, modify ‘copyhalt’ a little to get a TM ‘contradict’: we add two states at the ‘yes’ exit of ‘copyhalt’ which make the machine oscillate forever. If it is possible to have ‘copyhalt’, it is possible to have ‘contradict’.

Now, what happens if we give the encoding of ‘contradict’ as input to ‘contradict’ itself? There are two possibilities:

1. ‘contradict’ halts on dcontradict. In this case, ‘copyhalt’ takes the ‘yes’ exit, so ‘contradict’ gets into a loop and never halts.

2. ‘contradict’ does not halt on dcontradict. In this case, ‘copyhalt’ takes the ‘no’ exit, so ‘contradict’ halts.

In both cases we arrive at a contradiction. Therefore, our assumption that a TM ‘Halt’ exists is not correct; i.e., there cannot exist an algorithm which solves the halting problem for TMs.
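The chain Halt → copyhalt → contradict can be mirrored in Python. This is purely illustrative: halts(p, x), the assumed total decider, cannot actually exist, which is the point of the proof.

def halts(program_source, input_data):
    """Assumed total decider for the halting problem; no such function
    can exist (that is what the theorem shows)."""
    raise NotImplementedError

def contradict(program_source):
    """copyhalt feeds the program its own text; the 'yes' exit oscillates."""
    if halts(program_source, program_source):   # 'yes' exit of copyhalt
        while True:                              # two added states: loop forever
            pass
    return 'halted'                              # 'no' exit: halt

# Running contradict on its own source is contradictory: if it halts,
# halts(...) returned True, so it loops; if it loops, halts(...) returned
# False, so it halts. Hence halts() cannot be realized.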

We can also present a slightly different argument.

Proof 2. We show that if the halting problem were decidable, we could have a TM M accepting Ld. M is constructed as follows. Given w as input, M finds the i such that w = wi, the ith string in the enumeration. Then, it generates the encoding of the ith TM Ti and calls ‘Halt’ (which solves the halting problem) with Ti and wi as input. ‘Halt’ comes out with one of the following answers:

1. Ti does not halt on wi. In this case, Ti does not accept wi; therefore, M halts and accepts w = wi.

2. Ti halts on wi. In this case, M calls U as a subroutine with Ti and wi and finds out whether Ti accepts wi or not; here U is guaranteed to halt, since Ti halts on wi. If Ti accepts wi, M halts and rejects wi. If Ti does not accept wi, M halts and accepts wi.

Thus, M accepts Ld and always halts, and we would have to conclude that Ld is recursive. But we know that Ld is not even RE. Hence, the assumption that ‘Halt’ exists is not correct; in other words, the halting problem is undecidable.

Problems, Instances, and Languages

A decision problem is a problem for which the answer is ‘yes’ or ‘no.’ For example, the satisfiability problem asks whether a given Boolean expression is satisfiable or not; we call this problem SAT. Similarly, AMB is the ambiguity problem for CFGs, i.e., finding out whether a given CFG is ambiguous or not. A particular CFG is an instance of the problem: if this particular CFG G is ambiguous, it is a ‘yes’ instance of the problem; if it is not ambiguous, it is a ‘no’ instance. Similarly, a particular Boolean expression is an instance of the SAT problem; if it is satisfiable, it is a ‘yes’ instance, and if not, a ‘no’ instance. In contrast to these problems, problems like finding a Hamiltonian circuit in a graph are called optimization problems. Whether a graph has a Hamiltonian circuit or not is a decision problem; finding one is an optimization problem. We shall consider this concept further in the next chapter.

Any instance of a problem can be encoded as a string. A Boolean expression can be looked at as a string: for example, (x1 + x2)(x̄1 + x̄2) can be written as (x1 + x2)(¬x1 + ¬x2), a string over the alphabet {(, ), x, +, ¬, 0, 1, ..., 9}. A CFG G = (N, T, P, S) can be looked at as a string over N ∪ T ∪ {(, ), {, }, →, ,}. For example, the grammar ({S}, {a, b}, P, S), where P consists of S → aSb and S → ab, can be written as the string ({S}, {a, b}, {S → aSb, S → ab}, S). It is to be noted that integers are generally represented in decimal or binary, not in unary. A decision problem can thus be re-formulated as the problem of recognizing the language consisting of the ‘yes’ instances of the problem.

Lu is the language for the problem ‘Does M accept w?’: it consists of the strings dMw where dM is the encoding of M and M accepts w.
Rice's Theorem

Let us recall that a language accepted by a TM is called a recursively enumerable (RE) language. Without loss of generality, we consider TMs with input alphabet {0, 1} and tape alphabet {B, 0, 1}.

Let F be a set of RE languages, each language being over the alphabet {0, 1}. F is said to be a property of the RE languages. A language L has the property F if L ∊ F. For example, F may denote the set {L | L is recursive}. F is a trivial property if F is empty or if F contains all RE languages. Let LF denote the set {dM | L(M) is in F}, where M is a TM, L(M) is the language accepted by it, and dM is the encoding of M.

Theorem 11.11 (Rice's Theorem (for recursive index sets))

Any non-trivial property F of the recursively enumerable languages is undecidable.

Proof. In essence, we want to show that LF is not recursive.

Without loss of generality, we assume that φ is not in F; otherwise, we consider the family RE − F, where RE is the family of RE languages.

Since F is a non-trivial property, F must contain at least one language. Fix a specific language L ∊ F, and let ML be a TM accepting L. Suppose the property F were decidable; then LF would be recursive, so there would be a TM MF which halts on all inputs and accepts LF. We use ML and MF to show that Lu is recursive, as follows.

Given a TM M and an input w, construct a TM M′ as follows. It is straightforward to have an algorithm A which constructs M′ from M, w and ML. M′ works as follows: it ignores its own input x and simulates M on w. If M accepts w, it then starts ML on x, and x is accepted exactly when x ∊ L. So, if M accepts w, then L(M′) = L; if M does not accept w, then M′ accepts no string at all, i.e., L(M′) = φ. Now, we know that φ ∉ F and L ∊ F. If LF is recursive, there is the TM MF accepting it and halting on all inputs. Give the encoding of M′ as input to MF.
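Schematically, M′ is just a wrapper around M, as in the following Python sketch. This is an illustration only; simulate(M, w), a hypothetical unbounded simulator, and ML_accepts, a semi-decider for the fixed language L, are assumptions.

def make_M_prime(M, w, simulate, ML_accepts):
    """Machine M' of Rice's theorem: L(M') = L if M accepts w,
    and L(M') is the empty set otherwise."""
    def M_prime(x):
        simulate(M, w)         # ignores x; loops forever if M loops on w
        return ML_accepts(x)   # reached only when M accepts w
    return M_prime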

If MF comes out of its ‘yes’ exit, then L(M′) is in F, i.e., L(M′) = L, which can happen only if M accepts w. If it comes out of the ‘no’ exit, then L(M′) is not in F, i.e., L(M′) = φ, which happens only if M does not accept w.

So, depending on whether MF takes the ‘yes’ exit or the ‘no’ exit, we can say whether M accepts w or not; i.e., we are able to have a TM for Lu which halts on all inputs, namely the algorithm A followed by MF. Hence, Lu would be recursive. But we have already seen that Lu is not recursive. This contradiction is due to the assumption of the TM MF which halts on all inputs. Hence, MF cannot exist. Therefore, F is undecidable.

We state the following results, which follow from the above theorem.

Theorem 11.12

The following properties of recursively enumerable sets are not decidable:

1. emptiness (Is L empty?)

2. finiteness (Is L finite?)

3. regularity (Is L regular?)

4. context-freedom (Is L context-free?)

5. nonemptiness (Is L nonempty?)

6. recursiveness (Is L recursive?)

7. nonrecursiveness (Is L nonrecursive?)

We found that if F is a non-trivial property, LF is not recursive. Under what conditions LF is RE is given by another theorem of Rice, which we state without proof.

Theorem 11.13 (Rice's theorem for recursively enumerable index sets)

LF is recursively enumerable if and only if F satisfies the following three conditions:

1. If L is in F and L ⊆ L′ for some recursively enumerable language L′, then L′ is in F (the containment property).

2. If L is an infinite language in F, then there is a finite subset of L which is in F.

3. The set of finite languages in F is enumerable.

Because of the above theorem, we have the following results.

Theorem 11.14

The following properties of recursively enumerable languages are not recursively enumerable:

1. L = φ

2. L = Σ*

3. L is recursive

4. L is nonrecursive

5. L has only one string

6. L is a regular set
Theorem 11.15

The following properties of recursively enumerable languages are recursively enumerable:

1. L ≠ φ

2. L contains at least eight members

3. w is in L, for some fixed w (the membership problem)

Reduction of Problems to Show Undecidability

In the last section, we have shown that any non-trivial property of RE sets is undecidable. This theorem helps to prove many problems undecidable. Consider the problem: ‘Will a TM started on a blank tape halt?’ For machines that halt exactly when they accept, this is the question ‘Is ε ∊ T(M)?’, and hence it is undecidable by Rice's theorem. But for many problems about TMs themselves, there is no simple theorem which can be applied; each problem has to be looked at separately.

Example 11.1.

Consider the question: ‘Does machine T ever print the symbol S0 when started on tape t?’ This is undecidable: there does not exist a TM which takes dT t as input and says whether the symbol S0 will be printed or not, for all (T, t).

For, take each machine T which does not ordinarily use the symbol S0 and alter it so that before halting it ‘prints S0 and halts.’ Then the ‘printing problem’ for the new machine is the same as the halting problem for the old machine. Since any machine that does use S0 can first be converted to one that does not (by renaming S0) and then altered as above, a solution of the printing problem would give a solution to all halting problems, and we know that this cannot be done.

Example 11.2.

It is decidable whether a single-tape TM started on a blank tape scans any cell three or more times. If the machine does not scan any cell three or more times, the crossing sequence at the boundary of any cell has length at most two. (A crossing sequence is the sequence of states q1, q2, q3, ..., qk in which the machine successively crosses that boundary: if it enters a cell in q1 from one direction, it leaves the cell after some time in q2 in the reverse direction, and so on.) There are only a finite number of crossing sequences of length at most two. Considering these possibilities, either the tape head stays within a bounded region or it gets into a periodic repeating pattern; either way, it is possible to determine whether some cell is scanned three times or not.

After the halting problem for TMs was proved to be undecidable, many problems for which researchers had been trying to find algorithms were shown to be undecidable. To prove a new problem undecidable, a known undecidable problem has to be reduced to it. Let Pnew be the problem to be shown undecidable and let Pknown be the known undecidable problem. The argument to be given is that if Pnew were decidable, then Pknown would be decidable; since we know Pknown is undecidable, Pnew has to be undecidable. This process we term reducing Pknown to Pnew.

Many problems can be proved undecidable by reducing the halting problem to them. But many problems related to formal languages are proved undecidable by reducing a problem known as PCP to them. PCP is a well-known undecidable problem, which we consider in the next section.

Post's Correspondence Problem

Let Σ be an alphabet having at least two elements.

Definition 11.1

Given two ordered sets of strings w1, ..., wk and x1, ..., xk over Σ, is it possible to find integers i1, ..., in, 1 ≤ ij ≤ k, such that wi1 ... win = xi1 ... xin? This is called Post's Correspondence Problem (PCP). If such integers i1, ..., in can be found for an instance of the problem, that instance of PCP has a solution; if no such integers can be found, that instance has no solution.
Example 11.3.

Consider the sets of strings:

     w      x
1.   bbab   a
2.   ab     abbb
3.   baa    aa
4.   b      bbb

2, 1, 4, 3 is a solution for this instance of PCP, as

w2w1w4w3 = x2x1x4x3 = abbbabbbaa

Example 11.4.

Consider the following sets:

     w      x
1.   aab    aba
2.   bb     bba
3.   aaaa   b

If this instance has a solution, it has to begin with 2, since the strings of the third pair differ in their first letters and the strings of the first pair differ in their second letters. Beginning the solution with 2, we have:

w: bb
x: bba

To proceed, we can select only the first pair, since the second and third pairs would make the w and x strings disagree. Selecting 1, we continue (the partial solution is now 2, 1):

w: bbaab
x: bbaaba

w matches x up to the fifth letter, but x has six symbols. To continue, by a similar argument, we again have to select the first pair. Now the strings are:

w: bbaabaab
x: bbaabaaba

Again, we get the same situation. To keep the strings matching, we must always select the first pair, and we always get w and x where |x| = |w| + 1 and w matches x except for the last symbol. At no point can we get x = w, and hence this instance of PCP has no solution.
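Although PCP is undecidable in general, a solution, when one exists, can be found by exhaustive search; it is only the non-existence of solutions that cannot be detected in general. A small Python sketch, illustrative only, with a depth bound added so that it always terminates:

from itertools import product

def pcp_search(ws, xs, max_len=8):
    """Try all index sequences up to max_len; return a solution (1-based)
    or None if none exists within the bound (which proves nothing)."""
    k = len(ws)
    for n in range(1, max_len + 1):
        for seq in product(range(k), repeat=n):
            w = ''.join(ws[i] for i in seq)
            x = ''.join(xs[i] for i in seq)
            if w == x:
                return [i + 1 for i in seq]
    return None

print(pcp_search(["bbab", "ab", "baa", "b"], ["a", "abbb", "aa", "bbb"]))
# [2, 1, 4, 3]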

Note that we may be able to look at a particular instance of a problem and give an answer. When we say a problem is undecidable, we mean that there is no single algorithm which takes an arbitrary instance of the problem and decides it.

Modified PCP (MPCP)

We consider a slightly different variation of PCP.

Definition 11.2

Given two ordered sets of strings w1, ..., wk and x1, ..., xk, can we find a sequence of integers i1, i2, ..., in, 1 ≤ ij ≤ k, such that

w1wi1wi2 ... win = x1xi1 ... xin?

Note that in MPCP we are fixing the first pair.

Theorem 11.16

If PCP were decidable, then MPCP would be decidable.

Proof. We reduce MPCP to PCP. Given an instance of MPCP over Σ, we construct an instance of PCP over Σ ∪ {#, $} as follows.

Let w1, ..., wk and x1, ..., xk be the sets for MPCP. For PCP, we take two sets y0, y1, ..., yk, yk+1 and z0, z1, ..., zk, zk+1: yi is obtained from wi by putting a # after each symbol of wi; zi is obtained from xi by putting a # before each symbol of xi; y0 = #y1 and z0 = z1; yk+1 = $ and zk+1 = #$.

Example 11.5.

Let us consider the following instance of MPCP:

     w      x
1.   ab     abbb
2.   bbab   a
3.   baa    aa
4.   b      bbb

This has the solution 1, 2, 4, 3.

The corresponding PCP instance we construct is as follows:

     y           z
0.   #a#b#       #a#b#b#b
1.   a#b#        #a#b#b#b
2.   b#b#a#b#    #a
3.   b#a#a#      #a#a
4.   b#          #b#b#b
5.   $           #$

If the instance of PCP we have constructed has a solution, it has to begin with the 0th pair and end with the (k + 1)th pair, because only the 0th pair has strings with the same first symbol, and only the (k + 1)th pair has strings with the same last symbol.

It is straightforward to see that if 1, i1, ..., in is a solution of the MPCP, then 0, i1, ..., in, k + 1 is a solution of the PCP, and vice versa. In the example considered above, 0, 2, 4, 3, 5 is a solution of the PCP, generating the string

#a#b#b#b#a#b#b#b#a#a#$

Thus, if there were an algorithm to solve PCP, any instance of MPCP could be converted to an instance of PCP by the above method and solved. Hence, if PCP were decidable, MPCP would be decidable; in other words, MPCP can be reduced to PCP.
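The construction is mechanical, as the following Python sketch (illustrative) shows; it reproduces the table of Example 11.5.

def mpcp_to_pcp(ws, xs):
    """Build the PCP instance (ys, zs) from an MPCP instance (ws, xs).
    ys[0]/zs[0] is the forced starting pair; the last pair ($, #$)
    closes a solution."""
    ys = [''.join(c + '#' for c in w) for w in ws]   # '#' after each symbol
    zs = [''.join('#' + c for c in x) for x in xs]   # '#' before each symbol
    ys = ['#' + ys[0]] + ys + ['$']
    zs = [zs[0]] + zs + ['#$']
    return ys, zs

ys, zs = mpcp_to_pcp(['ab', 'bbab', 'baa', 'b'], ['abbb', 'a', 'aa', 'bbb'])
print(ys[0], zs[0])   # #a#b# #a#b#b#b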

Next, we prove that if MPCP were decidable, then Lu would become recursive; i.e., the problem of whether a TM M accepts a string w (which we know is not decidable) would become decidable.

Theorem 11.17

If MPCP were decidable, then the problem of whether a Turing machine accepts a string would become decidable.

Proof. Given a TM M and an input w, we construct an instance of MPCP as follows.

Let M be given by (K, Σ, Γ, δ, q0, F) where K = {q0, ..., qr−1}. Without loss of generality, we can assume that when M reaches a final state it halts and accepts. The MPCP we construct is over the alphabet K ∪ Γ ∪ {¢}. In each pair below, the left string belongs to the first (w) list and the right string to the second (x) list.

The first pair is:

¢    ¢q0w¢

We know that q0w is the initial ID. The other pairs are grouped as follows.

Group I: copying pairs.

X    X    for each X ∊ Γ
¢    ¢

Group II: pairs for the moves.

If δ(q, a) contains (p, A, R), we have the pair:

qa    Ap

If δ(q, a) contains (p, A, L), we have the set of pairs:

Xqa    pXA    for each X ∊ Γ

(if there are m symbols in Γ, we have m pairs).

If δ(q, B) = (p, C, R), we have:

q¢    Cp¢

If δ(q, B) = (p, C, L), we have the set of pairs:

Xq¢    pXC¢    for each X ∊ Γ

Group III: consuming pairs. If q is a final state, we have:

aqb    q
aq     q
qb     q    for all a, b ∊ Γ

Group IV: the ending pair.

q¢¢    ¢

We show that the instance of MPCP we have constructed has a solution if and only if M accepts w.

If M accepts w, then there is a sequence of IDs q0w ⊢ ··· ⊢ α1qfα2, where qf is a final state. The instance of MPCP we have constructed generates these IDs successively and finally, by consuming symbols, makes the two ordered sets produce the same string.

The initial strings are:

w-string: ¢
x-string: ¢q0w¢

To proceed with the solution, we must reproduce q0w¢ in the w-string; while doing so, the x-string acquires the next ID. At any instant, the partial solution is of the form:

w-string: ¢ID0¢ID1¢ ··· ¢IDn−1¢
x-string: ¢ID0¢ID1¢ ··· ¢IDn−1¢IDn¢

Once the final state is reached, pairs from Group III are used to consume the tape symbols, and the ending pair of Group IV makes the two strings equal. Thus, the constructed instance of MPCP has a solution if and only if M accepts w.

Theorem 11.18

PCP is not decidable.

Proof. Given M and w, by the above method we can construct an instance of MPCP. If MPCP were decidable, we could find out whether this instance of MPCP has a solution: if it has a solution, M accepts w; if not, M does not accept w. Thus, the problem of whether M accepts w would become decidable. But we know that this is not decidable (Lu is not recursive). Hence, MPCP is not decidable.

By Theorem 11.16, if PCP were decidable, MPCP would be decidable. But we have just proved that MPCP is not decidable. Therefore, PCP is not decidable.

We illustrate the construction given in Theorem 11.17 with an example.

Example 11.6.

Consider the following TM with tape alphabet {0, 1, B} and state set {q0, q1, q2, q3, q4}, where q0 is the initial state and q4 is the final state. The mappings are given by:

        0            1            B
q0   (q1, 0, R)      –            –
q1   (q2, 0, R)      –            –
q2   (q3, 1, R)   (q3, 1, L)   (q3, 1, R)
q3   (q2, 1, L)   (q4, 1, R)   (q2, 1, L)
q4      –            –            –

On input 00, the sequence of IDs is:

q000 ⊢ 0q10 ⊢ 00q2 ⊢ 001q3 ⊢ 00q211 ⊢ 0q3011 ⊢ q20111 ⊢ 1q3111 ⊢ 11q411

The two sets of strings of the MPCP to be constructed are given by (w on the left, x on the right):

First pair                        1.  ¢        ¢q000¢
Group I:                          2.  0        0
                                  3.  1        1
                                  4.  ¢        ¢
Group II:
for δ(q0, 0) = (q1, 0, R)         5.  q00      0q1
for δ(q1, 0) = (q2, 0, R)         6.  q10      0q2
for δ(q2, 0) = (q3, 1, R)         7.  q20      1q3
for δ(q2, 1) = (q3, 1, L)         8.  0q21     q301
                                  9.  1q21     q311
for δ(q2, B) = (q3, 1, R)        10.  q2¢      1q3¢
for δ(q3, 0) = (q2, 1, L)        11.  0q30     q201
                                 12.  1q30     q211
for δ(q3, 1) = (q4, 1, R)        13.  q31      1q4
for δ(q3, B) = (q2, 1, L)        14.  0q3¢     q201¢
                                 15.  1q3¢     q211¢
Group III:                       16.  0q40     q4
                                 17.  1q41     q4
                                 18.  0q41     q4
                                 19.  1q40     q4
                                 20.  1q4      q4
                                 21.  0q4      q4
                                 22.  q41      q4
                                 23.  q40      q4
Last pair                        24.  q4¢¢     ¢

The w and x strings are constructed step by step as follows:

w: ¢
x: ¢q000¢

using 5, 2, 4:

w: ¢q000¢
x: ¢q000¢0q10¢

using 2, 6, 4:

w: ¢q000¢0q10¢
x: ¢q000¢0q10¢00q2¢

using 2, 2, 10:

w: ¢q000¢0q10¢00q2¢
x: ¢q000¢0q10¢00q2¢001q3¢

using 2, 2, 15:

w: ¢q000¢0q10¢00q2¢001q3¢
x: ¢q000¢0q10¢00q2¢001q3¢00q211¢

using 2, 8, 3, 4:

w: ¢q000¢0q10¢00q2¢001q3¢00q211¢
x: ¢q000¢0q10¢00q2¢001q3¢00q211¢0q3011¢

using 11, 3, 3, 4:

w: ¢q000¢0q10¢00q2¢001q3¢00q211¢0q3011¢
x: ¢q000¢0q10¢00q2¢001q3¢00q211¢0q3011¢q20111¢

using 7, 3, 3, 3, 4:

w: ¢q000¢0q10¢00q2¢001q3¢00q211¢0q3011¢q20111¢
x: ¢q000¢0q10¢00q2¢001q3¢00q211¢0q3011¢q20111¢1q3111¢

using 3, 13, 3, 3, 4:

w: ¢q000¢0q10¢00q2¢001q3¢00q211¢0q3011¢q20111¢1q3111¢
x: ¢q000¢0q10¢00q2¢001q3¢00q211¢0q3011¢q20111¢1q3111¢11q411¢

Now, we start consuming symbols.

using 3, 17, 3, 4:

w: ¢q000¢0q10¢00q2¢001q3¢00q211¢0q3011¢q20111¢1q3111¢11q411¢
x: ¢q000¢0q10¢00q2¢001q3¢00q211¢0q3011¢q20111¢1q3111¢11q411¢1q41¢

using 17, 4:

w: ¢q000¢0q10¢00q2¢001q3¢00q211¢0q3011¢q20111¢1q3111¢11q411¢1q41¢
x: ¢q000¢0q10¢00q2¢001q3¢00q211¢0q3011¢q20111¢1q3111¢11q411¢1q41¢q4¢

using 24:

w: ¢q000¢0q10¢00q2¢001q3¢00q211¢0q3011¢q20111¢1q3111¢11q411¢1q41¢q4¢¢
x: ¢q000¢0q10¢00q2¢001q3¢00q211¢0q3011¢q20111¢1q3111¢11q411¢1q41¢q4¢¢

We get the same string in w and x.

Note that successive IDs appear between ¢'s, and once the final state is reached, symbols are consumed to make both strings equal.

We can make use of PCP to prove many problems in formal language theory undecidable.

Theorem 11.19

The ambiguity problem for CFGs is undecidable.

Proof. We reduce PCP to the ambiguity problem of CFGs. Given an instance M of PCP, we construct a CFG G such that M has a solution if and only if G is ambiguous.

Let the two sets of strings of M be W = {w1, ..., wk} and X = {x1, x2, ..., xk} over the alphabet Σ. Let Σ′ = {a1, ..., ak} with Σ′ ∩ Σ = φ. Construct a CFG G with nonterminal alphabet {S, SA, SB} and terminal alphabet Σ ∪ Σ′. The productions of G are:

S → SA

S → SB

SA → wiSAai

SB → xiSBai

SA → wiai

SB → xiai,    1 ≤ i ≤ k

S is the start symbol.

If M has a solution i1, ..., in, then wi1wi2 ... win = xi1 ... xin. Then the string wi1 ... win ain ain−1 ... ai2 ai1 has two derivations in G, one starting with the rule S → SA and another starting with the rule S → SB, and G is ambiguous.

On the other hand, suppose G is ambiguous. Any string derived starting with S → SA has only one derivation tree, as the symbols from Σ′ dictate which rules have to be applied; a similar argument holds for derivations starting with S → SB. So if G is ambiguous, there must be a string with two derivation trees, one starting with S → SA and the other with S → SB. In this case, the string derived is wi1 wi2 ... win ain ... ai1 = xi1 xi2 ... xin ain ... ai1, and deleting the suffix ain ... ai1 we get:

wi1 ... win = xi1 ... xin,

i.e., i1, ..., in is a solution for M. Hence, M has a solution if and only if G is ambiguous. If there were an algorithm for the ambiguity problem, then given an instance of PCP, we could construct the corresponding CFG and use the algorithm to find out whether it is ambiguous, thereby finding out whether the given instance of PCP has a solution. PCP would then become decidable. But we know that PCP is not decidable. Hence, we have to conclude that the ambiguity problem for CFGs is not decidable.

Computable Functions

In an earlier chapter, we looked at a TM as a computing device that computes a function. The TM M starts with input 1^n; it halts in an accepting state ha with tape contents 1^f(n) if f is defined at n, and it halts in a non-accepting state or goes into a loop if f is not defined at n. The partial function f is then computed by M, and a function which can be computed by a TM is called a Turing computable function. Here, we restrict ourselves to functions whose domains and ranges are tuples of integers. The focus on numerical functions is not as restrictive as it might sound, because any function from strings to strings can be described by encoding both the arguments and the values of the function as numbers.

Primitive Recursive Functions

Definition 11.3 (Initial Functions)

The initial functions are the following:

1. Constant functions: for each k ≥ 0 and each a ≥ 0, the constant function C_a^k : ℕ^k → ℕ is defined by the formula C_a^k(X) = a for every X ∊ ℕ^k. In the case k = 0, we may identify the function C_a^0 with the number a.

2. The successor function s : ℕ → ℕ, defined by the formula s(x) = x + 1.

3. Projection functions: for each k ≥ 1 and each i with 1 ≤ i ≤ k, the projection function p_i^k : ℕ^k → ℕ is defined by the formula p_i^k(x1, ..., xk) = xi.
Definition 11.4 (Composition)

Suppose f is a partial function from ℕ^k to ℕ and, for each i with 1 ≤ i ≤ k, gi is a partial function from ℕ^m to ℕ. The partial function obtained from f and g1, g2, ..., gk by composition is the partial function h from ℕ^m to ℕ defined by the formula:

h(X) = f(g1(X), g2(X), ..., gk(X))    for every X ∊ ℕ^m

Definition 11.5 (The Primitive Recursion Operation)

Suppose n ≥ 0, and g and h are functions of n and n + 2 variables, respectively. The function obtained from g and h by the operation of primitive recursion is the function f : ℕ^(n+1) → ℕ defined by the formulas:

f(X, 0) = g(X)

f(X, k + 1) = h(X, k, f(X, k))

for every X ∊ ℕ^n and every k ≥ 0.

Definition 11.6 (Primitive Recursive Functions)

The set PR of primitive recursive functions is defined as follows:

1. All initial functions are elements of PR.

2. For any k ≥ 0 and m ≥ 0, if f : ℕ^k → ℕ and g1, g2, ..., gk : ℕ^m → ℕ are elements of PR, then the function f(g1, g2, ..., gk) obtained from f and g1, g2, ..., gk by composition is an element of PR.

3. If g : ℕ^n → ℕ and h : ℕ^(n+2) → ℕ are primitive recursive, then the function f : ℕ^(n+1) → ℕ obtained from them by primitive recursion (as in Definition 11.5) is in PR.

4. Only the functions obtained from the initial functions by finitely many applications of composition and primitive recursion are in PR.
Example 11.7.

Let us consider the sum and product functions:

sum : ℕ^2 → ℕ,      sum(x, y) = x + y

product : ℕ^2 → ℕ,  product(x, y) = x * y

(* denotes multiplication here).

It can be seen that sum is primitive recursive, as it can be defined by primitive recursion as follows:

sum(x, 0) = x

sum(x, y + 1) = s(sum(x, y))

It can also be seen that multiplication is primitive recursive, as it can be defined as follows:

product(x, 0) = 0

product(x, y + 1) = sum(x, product(x, y))
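These schemes translate directly into code. A Python sketch, added here for illustration, of the primitive recursion operator and the two functions above:

def primitive_recursion(g, h):
    """Return f with f(X, 0) = g(X) and f(X, k+1) = h(X, k, f(X, k))."""
    def f(*args):
        *X, k = args
        value = g(*X)
        for i in range(k):
            value = h(*X, i, value)
        return value
    return f

# sum(x, 0) = x;      sum(x, y+1) = s(sum(x, y))
add = primitive_recursion(lambda x: x, lambda x, i, acc: acc + 1)
# product(x, 0) = 0;  product(x, y+1) = sum(x, product(x, y))
mul = primitive_recursion(lambda x: 0, lambda x, i, acc: add(x, acc))

print(add(3, 4), mul(3, 4))   # 7 12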

It is straightforward to see that:

Theorem 11.20

Every primitive recursive function is a computable total function.

Primitive Recursive Predicates

An arbitrary function may be defined by cases:

f(X) = fi(X) if Pi(X) is true (i = 1, ..., k),

where P1, P2, ..., Pk are conditions. A condition P depending on the variable X ∊ ℕ^n makes P(X) either true or false; it is called a predicate, and when it has n arguments, it is called an n-place predicate.

The associated characteristic function χP : ℕ^n → {0, 1} is defined by:

χP(X) = 1 if P(X) is true, and χP(X) = 0 otherwise.

If χP is a primitive recursive function, P is called a primitive recursive predicate.

Theorem 11.21

Suppose f1, f2, ..., fk are primitive recursive functions from ℕ^n to ℕ, P1, P2, ..., Pk are primitive recursive n-place predicates, and for every X ∊ ℕ^n, exactly one of the conditions P1(X), ..., Pk(X) is true. Then, the function f : ℕ^n → ℕ defined by

f(X) = fi(X), where Pi(X) is the condition that is true,

is primitive recursive.

Definition 11.7 (Bounded Minimalization)

For an (n + 1)-place predicate P, the bounded minimalization of P is the function mP : ℕ^(n+1) → ℕ defined by:

mP(X, k) = the least y ≤ k such that P(X, y) is true, and k + 1 if there is no such y.

The symbol μ is often used for the minimalization operator, and we sometimes write:

mP(X, k) = μy ≤ k [P(X, y)]

An important special case is that in which P(X, y) is (f(X, y) = 0) for some f : ℕ^(n+1) → ℕ. In this case, mP is written mf and referred to as the bounded minimalization of f.

Theorem 11.22

If P is a primitive recursive (n + 1)-place predicate, its bounded minimalization mP is a primitive recursive function.

There are functions which can be precisely described but are not primitive recursive. Ackermann's function is defined, for m, n ∊ ℕ, by:

A(n, 0) = n + 1

A(0, m + 1) = A(1, m)

A(n + 1, m + 1) = A(A(n, m + 1), m)

This function is total and computable, but it is not primitive recursive: it grows faster than any primitive recursive function.

Definition 11.8 (Unbounded Minimalization)

If P is an (n + 1)-place predicate, the unbounded minimalization of P is the partial function MP : ℕ^n → ℕ defined by:

MP(X) = min{y | P(X, y) is true}

MP(X) is undefined at any X ∊ ℕ^n for which there is no y satisfying P(X, y).

The notation μy[P(X, y)] is also used for MP(X). In the special case in which P(X, y) is (f(X, y) = 0), we write MP = Mf and refer to this function as the unbounded minimalization of f.
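The two operators differ exactly in their halting behavior, as this Python sketch (illustrative only) makes explicit:

def bounded_min(P, X, k):
    """mP(X, k): least y <= k with P(X, y), and k + 1 if there is none.
    Always terminates, which is why it preserves totality."""
    for y in range(k + 1):
        if P(X, y):
            return y
    return k + 1

def unbounded_min(P, X):
    """MP(X) = mu y [P(X, y)]: may loop forever when no such y exists,
    which is why mu-recursive functions are partial."""
    y = 0
    while not P(X, y):
        y += 1
    return y

print(bounded_min(lambda X, y: y * y >= X, 10, 2))   # 3 = k + 1 (no y <= 2 works)
print(unbounded_min(lambda X, y: y * y >= X, 10))    # 4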

Definition 11.9 (μ-Recursive Functions)

The set M of μ-recursive, or simply recursive, partial functions is defined as follows:

1. Every initial function is an element of M.

2. Every function obtained from elements of M by composition or primitive recursion is an element of M.

3. For every n ≥ 0 and every total function f : ℕ^(n+1) → ℕ in M, the function Mf : ℕ^n → ℕ defined by

Mf(X) = μy[f(X, y) = 0]

is an element of M.

4. No other functions are in the set M.
Theorem 11.23

All μ-recursive partial functions are computable.

The equivalence between μ-recursive functions and Turing computable functions can be shown; the proof is fairly lengthy and is beyond the scope of this book.

We next show that there exist functions which can be specified precisely but are not μ-recursive.

The Busy Beaver Problem

Consider a TM with a two-way infinite tape, an input alphabet equal to {1}, and a tape alphabet equal to {1, b}. If it halts when given the blank tape as input, how many 1's can there be on the tape? Since 1 is the only non-blank symbol, this is the same as asking how many non-blank cells there can be.

This is called the busy beaver game because of the similarity of 1's to twigs, and of the activity of the machine to the industrious activity of beavers.

The maximum number of non-blank cells that can be obtained by such an n-state TM is denoted by Σ(n), and an n-state machine that produces Σ(n) non-blank cells is called a busy beaver. Σ(1) = 1, Σ(2) = 4, Σ(3) = 6 and Σ(4) = 13. One-state and two-state busy beavers achieving these values exist (figures omitted). Σ(5) is at least 1915, and Σ(n) grows very fast.

We next show that Σ(n) is not a Turing computable function.

Let us define this function slightly differently. Let σ : ℕ → ℕ be defined as follows: σ(0) = 0. For n > 0, σ(n) is obtained by considering TMs having n non-halting states and tape alphabet {1, 0} (0 is taken as the blank). We can take the set of non-halting states as {q0, ..., qn−1} with q0 as the initial state. Since the number of non-halting states and the number of tape symbols are finite, there are only a finite number of TMs of this type. We restrict our attention to those which halt on input 1^n; σ(n) denotes the largest number of 1's that any of these machines can leave on the tape when it halts.

We show that σ cannot be computed by a TM. Suppose σ is Turing computable. Then it is possible to find a TM Tσ with tape alphabet {1, 0} that computes it; this is possible as we have seen in Theorem 10.10.

Let T = TσT1, where T1 is a TM with tape alphabet {0, 1} that moves its tape head to the first square to the right of its starting position containing a 0, writes a 1 there and halts. T first simulates Tσ, and when Tσ halts, T continues by simulating T1; thus, on input 1^m, T halts with output 1^(σ(m)+1). Let m be the number of states of T. By the definition of σ, no TM with m states and tape alphabet {0, 1} can end up with more than σ(m) 1's on the tape if it halts on input 1^m. But T is a machine of this type that halts with output 1^(σ(m)+1). This is a contradiction. Hence, σ is not Turing computable.

Problems and Solutions

1. Which of the following properties of RE sets are themselves RE?

1. L contains at least two strings.

2. L is infinite.

Solution.

1. This is RE. Let S = {L1, L2, ...}, where each L in S contains at least two strings, and let LS = {< M > | T(M) = L and L ∊ S}. LS will be accepted by a TM MS which halts accepting < M > if < M > is in LS and does not accept < M > if < M > is not in LS. MS takes as input < M > and uses a pair generator: for each pair (i, j) generated, it simulates M on string xi for j steps. It also keeps a counter, which initially contains 0. If M accepts xi in j steps (counting each string only once), it increases the counter by 1. If the counter reads 2, MS halts accepting < M >. Otherwise, it keeps on generating pair after pair and keeps testing. Since the property is nontrivial, LS is not recursive, but as shown, it is RE.

2. This is not RE. Let S = {L1, L2, ...}, where each L in S is infinite, and let LS = {< M > | T(M) = L and L ∊ S}. If LS were RE, it would satisfy the three conditions of Theorem 11.13. But one of the conditions is violated: an infinite language has no finite sublanguage that is itself infinite. Hence, LS is not RE.

2. Which of the following decision problems are decidable? Let Ti denote the ith TM.

1. Given x, determine whether x ∊ S, where the set S is defined inductively as follows: if u = 0 or u ∊ S, then u² + 1, 3u + 2, and u! are all members of S.

2. Given x1, x2, and x3, determine whether f(x1) = π2(x2, x3), where f is a fixed nontotal computable function (π2 is Cantor numbering).

Solution.

1. This is decidable. Construct a TM T, halting on all inputs, which stops saying 'yes' if x ∊ S or stops saying 'no' if x ∉ S. T has one input tape which initially contains x. It has six more tapes: Tu, T1, T2, T3, T0, Tlist.

Tu initially contains nothing (blank, meaning 0).
T1 initially contains 1.
T2 initially contains 2.
T3 initially contains 1.
Tlist contains, in increasing order, a list of numbers separated by #'s; initially it contains just one #.
T0 is the output tape, which is initially blank.

When u is in Tu, T computes u² + 1 on T1, 3u + 2 on T2, and u! on T3; it then compares x with the contents of T1, T2, and T3. If there is a match, T outputs 'yes' in T0 and halts. If there is no match, the contents of T1, T2, and T3 are added to the list in Tlist in the proper places. The next number from Tlist is taken and placed in Tu, and the process repeats. The process stops, with output 'yes,' if a match between the input and the contents of T1, T2, or T3 is found at any time. If the contents of T1, T2, and T3 all become greater than x, the process stops, outputting 'no.' This is possible as the three functions computed are monotonically increasing. Hence, the problem is decidable.

2. This problem is undecidable. f is a nontotal function and π2 is a total function. Any algorithm to solve the problem with inputs x1, x2, and x3 would have two subroutines, one computing f(x1) and another computing π2(x2, x3). Since f is nontotal, for some x1 the algorithm will not come out of the subroutine computing f(x1) and will not halt. Hence, the problem is undecidable.

3. Fermat's last theorem, until recently one of the most famous unproved statements in mathematics, asserts that there are no integer solutions (x, y, z, n) to the equation x^n + y^n = z^n such that x, y, z ≥ 1 and n > 2. Show how a solution to the halting problem would allow you to determine the truth or falsity of the statement.

Solution. Suppose the halting problem is decidable. Construct a TM T which will settle Fermat's last theorem as follows:

TM T has four tapes. On tape T1, it systematically generates ordered quadruples (x, y, z, n) with x ≥ 1, y ≥ 1, z ≥ 1, n ≥ 3. The initial one is (1, 1, 1, 3).
On T2, it computes x^n + y^n.
On T3, it computes z^n.
It compares the contents of T2 and T3 and, if they match, outputs 'yes' on T4 and halts. If there is no match, the next quadruple is generated on T1 and the process repeats. If there is no solution for Fermat's last theorem, T will not halt and will go on forever. Now, if 'Halt' is the algorithm solving the halting problem, give T and the first quadruple as input. If 'Halt' says 'yes,' then Fermat's last theorem has a solution. If 'Halt' says 'no,' then Fermat's last theorem has no solution.
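A sketch of T's search loop in Python (our own illustration; enumerating by increasing x + y + z + n is one of many possible dovetailing orders): the procedure halts exactly when a counterexample exists, so deciding whether it halts settles the theorem.

from itertools import count

def fermat_search():
    # Generate quadruples (x, y, z, n), x, y, z >= 1, n >= 3, in order of
    # increasing total s = x + y + z + n, so each one is eventually tried.
    for s in count(6):  # the first quadruple (1, 1, 1, 3) has total 6
        for x in range(1, s):
            for y in range(1, s - x):
                for z in range(1, s - x - y):
                    n = s - x - y - z
                    if n >= 3 and x**n + y**n == z**n:
                        return (x, y, z, n)

# fermat_search() halts iff Fermat's last theorem is false; asking
# 'Halt' whether it halts would settle the theorem.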

Exercises

1. Suppose the tape alphabets of all TMs are selected from some infinite set of symbols a1, a2, .... Show how each TM may be encoded as a binary string.

2. Which of the following properties of RE sets are themselves RE?

1. L is a context-free language.

2. L = L^R.

3. Which of the following decision problems are decidable? Let Ti denote the ith TM.

1. Given i and n, determine whether Ti visits more than n squares after being started on blank tape.

2. Given i, determine whether Ti ever writes the same symbol on two consecutive moves, when started on blank tape.

3. Given i, determine whether there exists a TM with fewer states that computes the same function as Ti.

4. Show that the following functions are primitive recursive.

1. exponentiation

2. factorial function

3. predecessor function

4. proper subtraction

5. absolute difference |x − y|

6. sign function

7. comparison function

5. For each decision problem given, determine whether it is solvable or unsolvable, and prove your answer.

1. Given a TM T, does it ever reach a state other than its initial state when it starts with a blank tape?

2. Given a TM T and a non-halting state q of T, does T ever enter state q when it begins with a blank tape?

3. Given a TM T and a non-halting state q of T, is there an input string x that would cause T eventually to enter state q?

4. Given a TM T, does it accept the string ε in an even number of moves?

5. Given a TM T, is there a string which it accepts in an even number of moves?

6. Given a TM T and a string w, does T loop forever on input w?

7. Given a TM T, are there any input strings on which T loops forever?

8. Given a TM T and a string w, does T reject input w?

9. Given a TM T, are there any input strings rejected by T?

10. Given a TM T, does T halt within ten moves on every string?

11. Given a TM T, is there a string on which T halts within ten moves?

12. Given TMs T1 and T2, is L(T1) ⊆ L(T2) or L(T2) ⊆ L(T1)?

6. Which of the following problems about CFGs and their languages are decidable, and which are undecidable? Prove your answers.

1. the problem of determining whether an arbitrary string belongs to the language generated by a given CFG.

2. the problem of determining whether a CFG generates a non-empty language.

3. the problem of determining whether the languages generated by two CFGs have any strings in common.

4. the problem of determining whether two CFGs generate the same language.

5. given two CFGs G1 and G2, determine whether L(G1) ⊆ L(G2).

6. given a CFG G and a regular set R, determine whether L(G) = R.

7. given a CFG G and a regular set R, determine whether R ⊆ L(G).

8. given a CFG G, determine whether the complement of L(G) is a CFL.

9. given two CFGs G1 and G2, determine whether L(G1) ∩ L(G2) is a CFL.

Chapter 12. Time and Space Complexity

In the earlier chapters, we considered the Turing machine (TM) and its acceptance power. By the Church–Turing hypothesis, we realize that whatever can be done by a computer can be achieved by a TM. Also, while considering the variations of TMs, we found that even though all of them have the same accepting power, the number of steps can increase greatly in some cases. We also noted earlier that a procedure corresponds to a TM and an algorithm corresponds to a TM which halts on all inputs. When we study algorithms for problems, we are also interested in finding efficient algorithms. Hence, it becomes essential to study the time and tape (space) complexity of TMs. In this chapter, we study some of these results.

The RAM Model


The standard model of computation which represents the action of a modern computer is the family of random access machines (RAMs). They are also sometimes referred to as register machines. There are several variations of this model; we consider the following one. It consists of a central processor and an unbounded number of registers. The processor carries out instructions from a given limited set on these registers. Each register can store an arbitrarily large integer. The program is not stored in these registers; it immediately follows that a RAM program cannot modify itself. In this way, it differs from the stored-program concept. Also, at any time, a RAM program can refer to only a fixed number of registers, even though it has an unbounded number of registers. When the machine starts, the input data is stored in a few registers. When it stops, the result of the computation is stored in a few registers. Let us consider the simplest RAM model, which uses only four instructions, viz., increment, decrement, jump on zero, and halt. To add two numbers stored in R1 and R2 and store the result in R1, the following program can be executed.

loop: Jump on zero, R2, done
Decrement R2
Increment R1
Jump on zero, R0, loop (R0 always holds 0, so this jump is unconditional)
done: Halt.
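A small interpreter for this four-instruction RAM can make the model concrete (a Python sketch with our own instruction encoding; R0 is assumed to stay 0 so that 'jump on zero R0' acts as an unconditional jump):

def run_ram(program, registers):
    # program: a list of tuples; supported instructions are
    # ('inc', r), ('dec', r), ('jz', r, target) and ('halt',).
    pc = 0
    while True:
        instr = program[pc]
        if instr[0] == 'halt':
            return registers
        if instr[0] == 'inc':
            registers[instr[1]] += 1
        elif instr[0] == 'dec':
            registers[instr[1]] -= 1
        elif instr[0] == 'jz' and registers[instr[1]] == 0:
            pc = instr[2]       # jump on zero
            continue
        pc += 1

add = [('jz', 'R2', 4),         # loop: Jump on zero, R2, done
       ('dec', 'R2'),           # Decrement R2
       ('inc', 'R1'),           # Increment R1
       ('jz', 'R0', 0),         # Jump on zero, R0, loop (R0 stays 0)
       ('halt',)]               # done: Halt
print(run_ram(add, {'R1': 3, 'R2': 4, 'R0': 0}))  # R1 ends up as 7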

It may look as though the RAM model is more efficient compared to TMs when operations like addition or subtraction are considered, while for string operations like concatenating two strings x and y, the TM may look more efficient. In fact, we shall see that each can be simulated by the other, where the time could at worst be cubed and the order of the space is the same. We would also like to allow addition and subtraction as instructions, though including them complicates things.

Assume that we assign unit cost to the four instructions mentioned earlier. Increment is the one instruction where space may increase. It should be noted that, for charging for space, we can use either the maximum number of bits used over all the registers or that in any one register; as any program uses only a fixed number of registers, the two measures differ by at most a constant factor. We would like the space measure not to exceed the time measure. Since an increment instruction can increase the space by at most one bit, we realize:

Space = O(input size + time)

We also assume that register transfers can be done at unit cost, and that addition and subtraction can be done at unit cost. Under these assumptions also, the relationship between space and time is preserved.

Allowing multiplication at unit cost would make the space increase exponentially with time, since repeated squaring doubles the number of bits at each step. Hence, we shall use a RAM model without multiplication.

Here, we are not using the indexing capability or indirect addressing used in computers. But if these techniques are allowed, it can be proved that the relationship between SPACE and TIME becomes:

SPACE = O(TIME²)

Comparison of RAM and TM

When we talk about computation, which model should we consider, the RAM or the TM? How do SPACE and TIME get affected when we consider one model rather than the other? We prove below that our two models are equivalent in terms of computability and that the choice of model causes only a polynomial change in the complexity measures. The proof consists simply of simulating each machine by the other. This also establishes the equivalence of the two models from the point of view of computability.

Simulation of RAM by a TM

At any time, only a finite number of registers of the RAM are used. Let the registers be numbered 1, 2, ..., m, containing u1, u2, ..., um. Then, the simulating TM has on its tape #1 * u1#2 * u2#3 * u3 ... #m * um#, where the block #i * ui# denotes the fact that the ith register of the RAM contains the integer ui. Note that i and ui can be represented in binary. Any operation like 'increment' or 'decrement' is transferred to the finite state control of the TM and executed. The change in register i from ui to υi is simulated by changing the block #i * ui# to #i * υi#. When this is done, shifting of symbols to the left or right is done if necessary.

Our simulation is efficient in terms of space: the space used by the TM is at most a constant multiple of the space used by the RAM,

SPACETM = Θ(SPACERAM)

But more time is required, as the TM has to scan to find the correct block and shift symbols if necessary. Nevertheless, for one step of the RAM, the TM scans the non-blank portion of the tape at most a constant number of times (say k). Hence, TIMETM = O(TIMERAM · SPACERAM).

Simulation of TM by RAM

Simulating a TM with a RAM requires representing the state of the TM, as well as the contents of its tape and the position of the tape head, in registers. The state can be kept in one register, and the RAM program can decrement this register and use jump on zero to simulate a move of the TM. The symbol scanned by the tape head will be kept in one register. The symbols of the TM can be taken as X1, ..., Xk−1, and if the current symbol scanned is Xi, the integer i will be stored in that register. If in the next move the tape head moves left or right and scans Xj, say, then the contents of this register will be changed to the integer j. This can be done using the 'decrement' and 'increment' operations of the RAM. The ID of the TM is given by α1qα2, or

X_i1 X_i2 ... X_i(j−1) q X_ij ... X_in,

where X_ij is the symbol scanned by the tape head. To store this, three registers R1, R2, and R3 can be used:

R1 contains i_j (the scanned symbol);

R2 contains i_(j−1) + k·i_(j−2) + k²·i_(j−3) + ... + k^(j−2)·i_1, the tape to the left of the head, encoded in base k with the symbol nearest the head least significant;

R3 contains i_(j+1) + k·i_(j+2) + k²·i_(j+3) + ... + k^(n−j−1)·i_n, the tape to the right of the head, encoded similarly.

If Xa is read, Xb is to be printed in the cell, and the move is to the left, the mapping is δ(q, Xa) = (q′, Xb, L). This is simulated by the RAM using the following instructions:

R3 ← k·R3 + b

R1 ← R2 mod k

R2 ← R2 ÷ k

In order to simulate a right move:

δ(q, Xa) = (q′, Xb, R)

the following steps are executed by RAM:

R1 ← b

R2 ← k·R2 + R1

R1 ← R3 mod k

R3 ← R3 ÷ k
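A sketch (Python, our own illustration) of the base-k arithmetic for both moves; note that each move needs only the mod, divide, multiply-by-k, and add operations discussed above:

def move_right(R1, R2, R3, b, k):
    # Print X_b in the scanned cell and move right: the printed symbol
    # joins the left part (R2), and the new scanned symbol is peeled
    # off the right part (R3). Returns the new (R1, R2, R3).
    return R3 % k, k * R2 + b, R3 // k

def move_left(R1, R2, R3, b, k):
    # Symmetric: the printed symbol joins the right part (R3), and the
    # new scanned symbol is peeled off the left part (R2).
    return R2 % k, R2 // k, k * R3 + b

# Tape ... X2 [X3] X1 ... with k = 4: R1 = 3, R2 = 2, R3 = 1.
print(move_right(3, 2, 1, 0, 4))  # prints X0, moves right: (1, 8, 0)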

Except for the division, all these operations can be carried out in constant time by the RAM model. The multiplication by k can be done with a constant number of additions. Division requires more time: it can be accomplished by building the quotient digit by digit, in time proportional to the square of the number of digits of the operand, or equivalently in time proportional to the square of the space used by the TM.

Hence, we have:

SPACERAM = Θ(SPACETM)

TIMERAM = O(TIMETM · SPACETM²)

We can conclude that any TM can be simulated on a RAM with at most a cubic increase in time and a linear increase in space.

Problems, Instances, and Languages

Recall the connection between problems, instances, and languages given in Section 11.4. A decision problem has 'yes' instances and 'no' instances. If the problem is decidable, a TM can be constructed to accept the 'yes' instances of the problem and reject the 'no' instances. Note that an instance of the problem is encoded suitably as a string.

When we talk about an algorithm, we talk about its efficiency, i.e., the amount of time it will take or the memory it will use as a function of the size of the input. When we talk about solving a problem with an algorithm, intuitively we think of the computer executing it, i.e., the RAM will be a suitable model. We call an algorithm efficient if it has polynomial-time complexity. Since the TM can simulate a RAM with the time complexity increasing by at most a cubic factor, we can equivalently think about the TMs which accept the 'yes' instances and their time complexity. Hence, we shall formulate our study in terms of the time and tape (space) complexity of TMs.

Time and Tape Complexity of Turing Machines

Space Complexity

Consider an offline TM M. It has a read-only input tape, on which the input is placed within the end markers # and $, and k storage tapes, infinite in one direction. If for every input word of length n, M scans at most S(n) cells on any storage tape, then M is said to be an S(n) space-bounded TM, or of space complexity S(n). The language recognized by M is also said to be of space complexity S(n). It should be noted that the symbols on the input tape cannot be rewritten, and only the cells used on the storage tapes count towards the space complexity. This way of looking at the TM helps us consider space bounds less than O(n), for example, log n. If the symbols on the input tape could be rewritten, the minimum space complexity would be n, as we need to look at the whole of the input.

Figure 12.1. Multi-tape TM with read only input tape

Time Complexity
For considering the time complexity, we look at the following
variation of the TM. The TM M has k two-way infinite tapes. One of
them initially contains the input. Symbols in all tapes can be
rewritten.

If for every input word of length n, M makes at most T(n) moves


before halting, then M is said to be a T(n) time-bounded TM or of
time complexity T(n). The language recognized by M is said to be
of time complexity T(n).

The two different models for time and space complexity are considered to make the proofs simpler. Note also that the time complexity would be different if we used a single-tape TM. In Chapter 10, we have seen that when we simulate a multi-tape TM by a single-tape TM, the number of steps may be squared. Consider the following example. We would like to have a TM accepting strings of the form wcwcw. On a single tape, we can have two tracks and check off the symbols, moving the tape head left and right several times. If |w| = n, the machine may make n left sweeps and n right sweeps. Hence, the time complexity will be O(n²). Instead, we can use three tapes, where one tape initially contains wcwcw. The tape head on this tape moves right reading w and copying it onto the second tape; when it encounters a c, it copies the portion between the first c and the second c (that is, the second w) onto the third tape. It then moves the second and third heads to their leftmost non-blank symbols and moves all the heads right, checking that the symbols read under the three heads are the same, i.e., that the input string is of the form wcwcw.

The length of the input is 3n + 2.

The number of steps is 2n + 2 + n + n = 4n + 2.

Hence, the time complexity is O(n).

Assumptions

It should be clear that every TM uses at least one cell on all inputs, so if S(n) is a space complexity measure, S(n) ≥ 1 for all n. A space complexity of S(n) really means max(1, ⌈S(n)⌉). Suppose the space complexity is log2 n; this does not make sense if the input is of size 1 or 0 (string ε), whereas max(1, ⌈S(n)⌉) does make sense.

Similarly, it is reasonable to assume that any time complexity measure T(n) is at least n + 1, as the whole of the input has to be read, plus one more step to see the first blank. Hence, a time complexity of T(n) really means max(n + 1, ⌈T(n)⌉).

Non-deterministic Time and Space


Complexity
The concept of time and tape complexity of deterministic TMs
which we just now considered, can be carried over to non-
deterministic TMs also. A non-deterministic TM is of time
complexity T(n), if no sequence of choices of moves of the machine
on input of size n, causes the machine to make more than T(n)
moves. It is of space complexity S(n), if no sequence of choices of
moves on input of size n, enables it to scan more than S(n) cells on
any tape.

Complexity Classes
The family of languages accepted by deterministic TMs of space
complexity S(n) is denoted by DS(S(n)). The family of languages
accepted by non-deterministic TMs with space complexity S(n) is
denoted by NS(S(n)). The family of languages accepted by
deterministic (non-deterministic) TMs of time complexity T(n) is
denoted by DT(T(n)) (NT(T(n))).

If we do not put a bound on the number of states and the number of tape symbols, some information can be stored in the state, and tape symbols can be taken as tuples holding the information of some (say k) consecutive cells. This may help to get a linear speed-up in S(n). We state some results without giving proofs; the proofs involve detailed constructions.

1. If L is accepted by an S(n) space-bounded TM with k storage tapes, then for any c > 0, L is accepted by a cS(n) space-bounded TM.

2. If a language L is accepted by a TM with k storage tapes with space complexity S(n), it is accepted by a TM with a single storage tape with the same space complexity S(n).

3. A linear speed-up as in 1 above can be obtained for time complexity also.

4. If L is accepted by a multi-tape TM with time complexity T(n), it is accepted by a single-tape TM with time complexity (T(n))².

Space Hierarchy
What complexity functions are considered as ‘well behaved’? For
this we define ‘space constructible,’ ‘fully space constructible,’ ‘time
constructible,’ ‘fully time constructible’ functions.

When we say ‘space complexity is S(n),’ we mean that on an input


of size n, the TM uses at most S(n) space. It need not use S(n)
space on all inputs of size n (can use less). S(n) is said to be space
constructible, if there is some TM M that is S(n) tape bounded, and
for each n, there is some input of length n on which M actually
uses S(n) tape cells. It should be noted that M need not use S(n)
space on all inputs of length n. If for all n, M uses exactly S(n) cells
on any input of length n, then we say S(n) is fully space
constructible. nk, 2n, n! (k an integer) are fully space constructible
functions.

A function T(n) is said to be time constructible if there exists a T(n)


time-bounded multi-tape TM M such that for each n there exists
some input of length n on which M actually makes T(n) moves. We
say that T(n) is fully time constructible if there is a TM, which
uses T(n) time on all inputs of length n. Most common functions
are fully time constructible.

Some Results on Complexity Classes

We state some results without proof.

1. If S2(n) is a fully space constructible function, inf n→∞ S1(n)/S2(n) = 0, and S1(n), S2(n) ≥ log2 n, then there is a language in DS(S2(n)) which is not in DS(S1(n)).

2. If T2(n) is a fully time constructible function and inf n→∞ T1(n) log T1(n)/T2(n) = 0, then there is a language in DT(T2(n)) but not in DT(T1(n)).

3. If L is in DT(f(n)), then L is in DS(f(n)).

4. Savitch's Theorem: If L is in NS(S(n)), then L is in DS(S²(n)), provided S(n) is fully space constructible and S(n) ≥ log2 n.

Polynomial Time and Space

Whenever a problem can be solved by a deterministic polynomial-time algorithm, we consider the algorithm efficient. When this cannot be done, we call such problems intractable.

The languages recognizable in deterministic polynomial time form a natural and important class, which we denote by P:

P = ∪i≥1 DT(n^i)

There are a number of well-known problems which do not appear to be in P but have efficient non-deterministic polynomial-time algorithms. The class of languages accepted in non-deterministic polynomial time is denoted by NP:

NP = ∪i≥1 NT(n^i)

A deterministic algorithm corresponds to finding a solution by searching the solution space sequentially, whereas a non-deterministic algorithm corresponds to guessing a solution and verifying it. Consider the problem of finding whether a graph has a clique of size k (a clique is a complete subgraph). A deterministic algorithm will take the subsets of the vertex set of size k one by one and check whether the induced subgraph on these k vertices forms a clique. If there are n vertices in the original graph, there are C(n, k) ways of selecting k vertices, and each of these subsets has to be checked one by one. On the other hand, a non-deterministic algorithm guesses a k-subset and verifies whether the induced subgraph on these k vertices is a complete graph. Thus, the difference between P and NP is analogous to the difference between efficiently finding a proof of a statement (such as "this graph has a Hamiltonian circuit") and efficiently verifying a proof (i.e., checking that a particular circuit is Hamiltonian). Intuitively, verifying looks easier, but we do not know for sure. Whether P = NP is still an open problem, though the evidence leads us to believe they are not equal.
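The search-versus-verify contrast can be made concrete with a sketch in Python (illustrative; the brute-force loop plays the role of the deterministic algorithm, while is_clique is the polynomial-time verification):

from itertools import combinations

def is_clique(adj, vertices):
    # Verification: every pair of chosen vertices must be an edge.
    # This takes only polynomial time.
    return all(v in adj[u] for u, v in combinations(vertices, 2))

def has_clique(adj, k):
    # Deterministic search: try all C(n, k) subsets, which is
    # exponential in general.
    return any(is_clique(adj, s) for s in combinations(adj, k))

adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
print(has_clique(adj, 3), has_clique(adj, 4))  # True False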

Other important classes are:

PSPACE = ∪i≥1 DS(n^i)

NSPACE = ∪i≥1 NS(n^i)

By Savitch's theorem, NS(n^i) ⊆ DS(n^2i), and hence PSPACE = NSPACE.

Obviously, P ⊆ NP ⊆ PSPACE, yet it is not known whether any of these containments is proper.

Because of the space hierarchy results, we have DS(log n) ≠ PSPACE. We also know that:

Equation 12.1.  DS(log n) ⊆ P ⊆ NP ⊆ PSPACE

Hence, at least one of the containments in Equation (12.1) is proper.

Intractable Problems

We have already seen the connection between problems and languages. The time complexity of an algorithm is measured in terms of the size of the input to the algorithm. Looking at it from the point of view of TMs and the languages accepted, we have seen that the time complexity is measured in terms of the length of the input string. Exponential-time algorithms are not considered "good" algorithms. Most exponential-time algorithms are merely variations on exhaustive search, whereas polynomial-time algorithms generally are made possible only through the gain of some deeper insight into the structure of the problem. Generally, people agree that a problem is not "well-solved" until a polynomial-time algorithm is known for it. Hence, we shall refer to a problem as intractable if it is so hard that no polynomial-time algorithm can possibly solve it.

There are two different causes for intractability. The first is that the problem is so difficult that an exponential amount of time is needed to find a solution. The second is that the solution itself is required to be so extensive that it cannot be described by an expression having length bounded by a polynomial function of the input length. For example, if the input is a sequence of m distinct integers i1, i2, ..., im and we want to output all possible permutations of them, the length of the output itself cannot be a polynomial function of the length of the input, and so the output cannot be computed in polynomial time. The existence of this sort of intractability is usually apparent from the problem definition. So, we shall consider the first type of intractability, i.e., we shall consider only problems for which the solution length is bounded by a polynomial function of the input length.

The earliest intractability results were by A. M. Turing, when he proved some problems to be undecidable. (We considered this concept in the last chapter.) After the undecidability of the halting problem of TMs was proved, a variety of other problems were proved to be undecidable, including Hilbert's 10th problem (solvability of polynomial equations in integers) and some problems of 'tiling the plane.'

The interest, hence, is in 'decidable' intractable problems. Some problems from automata theory have been proved to be intractable. For example, it has been shown that the problem of deciding whether an extended regular expression (allowing the Boolean operators ∩ and ¬ apart from concatenation, union (+), and star (*)) denotes the empty set requires exponential space. When theoreticians try to find powerful methods for proving problems intractable, they try to focus on learning more about the ways in which various problems are interrelated with respect to their difficulty.

The principal technique used for demonstrating that two problems


are related is that of “reducing” one to the other, by giving a
constructive transformation that maps any instance of the first
problem into an equivalent instance of the second. Such a
transformation provides the means for converting any algorithm
that solves the second problem into a corresponding algorithm for
solving the first problem. (We have seen this type of reduction in
the last chapter.) When we want to show a new problem to be
‘undecidable,’ we reduce a known undecidable problem to it.

In the study of NP-complete problems, we are trying to identify


problems which are hardest in the class NP. The foundations of the
theory of NP-completeness were laid in a paper by Stephen Cook
(1971) in which he introduced the idea of ‘polynomial-time
reducibility’ and proved the Boolean satisfiability problem is NP-
complete. Once this problem had been proved NP-complete, several other problems were proved NP-complete by reducing this problem to them. The satisfiability problem has the property that every
problem in NP can be polynomially reduced to it. If the
satisfiability problem can be solved with a polynomial-time
algorithm, then so can every problem in NP and if any problem
in NP is intractable, then the satisfiability problem must also be
intractable. Thus, in a sense, the satisfiability problem is the
‘hardest’ problem in NP.

The question of whether or not the NP-complete problems are


intractable is now considered to be one of the foremost open
questions in theoretical computer science. Though many believe
that NP-complete problems are intractable, no proof has been
given yet. Neither has it been disproved. Apart from the interest in
proving the connection between NP-completeness and
intractability, proving some problems to be NP-complete is
important by itself: once we know a problem to be NP-complete, one can stop trying to find efficient deterministic algorithms for the problem, and either consider approximate or randomized algorithms or concentrate on restricted versions of the problem which may have polynomial deterministic algorithms.

Noting the connection between decision problems and languages


(Section 11.4), we now formulate the study of NP-completeness in
terms of languages.

Reducibility and Complete Problems


Definition 12.1

We say that a language L′ is polynomial-time reducible to L if there


is a deterministic polynomial-time bounded TM that for each input x
produces an output y that is in L if and only if x is in L′.

Theorem 12.1

Let L′ be polynomial-time reducible to L. Then

1. L′ is in NP if L is in NP.

2. L′ is in P if L is in P.

Proof. The proofs of (1) and (2) are similar; we give the proof for (2). Assume that the reduction of L′ to L is done by a TM with time complexity P1(n), where P1 is a polynomial. Let L be accepted by a P2(n) time-bounded TM, where P2 is a polynomial. Then L′ can be accepted in polynomial time as follows. Given an input x of length n, produce y by the P1(n)-time reduction. As at most one symbol can be printed per move, it follows that |y| ≤ P1(n). Give y as input to the TM accepting L. In time P2(P1(n)), it will tell whether y is in L or not. Then, the total time to test whether x is in L′ is P1(n) + P2(P1(n)), which is a polynomial in n. Hence, L′ is in P.

Theorem 12.2

If L1 is polynomial-time reducible to L2 and L2 is polynomial-time reducible to L3, then L1 is polynomial-time reducible to L3.

Proof. Let Σ1, Σ2, and Σ3 be the alphabets of the languages L1, L2, and L3, respectively. Let f1: Σ1* → Σ2* be a polynomial-time transformation from L1 to L2, i.e., there is a DTM M1 which takes x1 as input and outputs x2 in time P1(|x1|), such that x1 is in L1 if and only if x2 is in L2. Similarly, let f2: Σ2* → Σ3* be a polynomial-time transformation from L2 to L3, i.e., there is a DTM M2 which takes x2 as input and outputs x3 in time P2(|x2|), such that x2 is in L2 if and only if x3 is in L3. Here, P1 and P2 are polynomials. Then, the function f: Σ1* → Σ3* defined by f(x) = f2(f1(x)) for all x ∊ Σ1* is the desired transformation from L1 to L3. A DTM (which is a combination of M1 and M2) first converts x1 to x2 in P1(|x1|) time, with |x2| ≤ P1(|x1|). Then, x2 is converted to x3 in P2(|x2|) ≤ P2(P1(|x1|)) time. It is easy to see that x3 is in L3 if and only if x1 is in L1, and this transformation is achieved in time P1(|x|) + P2(P1(|x|)), which is a polynomial.

Next we define log-space reducibility, though we shall not use this


concept further in this chapter.

Definition 12.2
A log-space transducer is an off-line TM that always halts,
having log n scratch storage and a write-only output tape on which
the head never moves left. We say that L′ is log-space reducible to
L if there is a log-space transducer that given an input x, produces
an output string y that is in L if and only if x is in L′.

We say that two languages L1 and L2 are polynomially equivalent


if L1 is polynomial-time reducible to L2 and vice versa.

By Theorem 12.2, we see that this is a legitimate equivalence relation. The relation 'polynomial-time reduces to' imposes a partial order on the resulting equivalence classes of languages. The class P forms the 'least' equivalence class under this partial order and can be viewed as consisting of the 'easiest' languages. The class of NP-complete languages (problems) will form another such equivalence class; it contains the 'hardest' languages in NP.

Definition 12.3

Let C be a class of languages.

1. We say that a language L is complete for C with respect to polynomial-time reductions if L is in C, and every language in C is polynomial-time reducible to L.

2. We say L is hard for C with respect to polynomial-time reductions if every language in C is polynomial-time reducible to L, but L is not necessarily in C.

We can also define complete and hard problems with respect to log-space reductions. The following definitions follow:

Definition 12.4

A language L is defined to be NP-complete if:

1. L ∊ NP

2. any language L′ ∊ NP is polynomial-time reducible to L.

We can also define this in terms of decision problems.

Definition 12.5

A decision problem ∏ is NP-complete if:

1. ∏ ∊ NP

2. any other decision problem ∏′ ∊ NP is polynomial-time reducible to ∏.

Thus, NP-complete problems can be identified as 'the hardest problems in NP.' If any single NP-complete problem can be solved in polynomial time, then all problems in NP can be solved in deterministic polynomial time. If any problem in NP is proved intractable, then so are all the NP-complete problems. If ∏ is an NP-complete problem (correspondingly, if L is an NP-complete language), then if P ≠ NP, L ∊ NP − P.

If P ≠ NP (which is still open), the following figure gives the


relationship between these sets.

How do we show that a given problem is NP-complete? In the last


chapter, we saw that to show a new problem to be undecidable, we
reduce a known undecidable problem to it. Similarly, to show a
new problem (language) to be NP-complete, we reduce
(polynomial-time) a known NP-complete problem (language) to it.

Theorem 12.3

If L1 and L2 are in NP, L1 is NP-complete, and L1 is polynomial-time reducible to L2, then L2 is NP-complete.

Proof. To show that L2 is NP-complete, two conditions have to be satisfied:

1. L2 is in NP (this is given), and

2. any L in NP is polynomial-time reducible to L2.

Any L in NP is polynomial-time reducible to L1, as L1 is NP-complete. It is given that L1 is polynomial-time reducible to L2. Hence, by Theorem 12.2, any L in NP is polynomial-time reducible to L2. Therefore, L2 is NP-complete.

Therefore, to show a problem ∏ to be NP-complete, we have to


show ∏ ∊ NP, and a known NP-complete problem ∏0 is
polynomial-time reducible to ∏.

Satisfiability-Cook's Theorem

A Boolean expression is an expression composed of variables, parentheses, and the operators ∧ (logical AND), ∨ (logical OR), and ¬ (negation). The order of precedence among these is ¬, ∧, ∨. Variables take on values 0 (false) and 1 (true); so do expressions. The truth table for AND and OR is given below, where E1 and E2 are expressions (¬E1 is 1 when E1 is 0, and 0 when E1 is 1):

E1   E2   E1 ∧ E2   E1 ∨ E2
0    0    0         0
0    1    0         1
1    0    0         1
1    1    1         1

Suppose E is a Boolean expression involving the variables {x1, ..., xm}. Each variable can be assigned the value 0 or 1. Hence, there are 2^m possible assignments to the variables. The satisfiability problem asks whether there is an assignment to the variables which makes the expression take the value 1 (true).

We may represent the satisfiability problem as a language Lsat as follows. Let the variables of some Boolean expression be x1, ..., xm for some m. Encode the variable xi as the symbol x followed by i written in binary. So, the alphabet for Lsat is {∧, ∨, ¬, (, ), x, 0, 1}. Each xi may take ⌈log m⌉ + 1 symbols. An expression:

Equation 12.2. 

will be coded as:

Equation 12.3. 

If an expression of the form of Equation (12.2) is of length n, its coded version will be of length at most O(n log n). Without our argument getting affected, we can work with expressions in the form of Equation (12.2), as the log n factor is not going to affect polynomial-time reducibility.

A Boolean expression is satisfiable if it evaluates to 1 for some assignment to the variables. It is unsatisfiable if it evaluates to 0 for all assignments to the variables. (x1 ∨ x2) is satisfiable, for the assignments x1x2 = 10, 01, 11; x1 ∧ ¬x1 is unsatisfiable, as it evaluates to 0 whether x1 = 0 or 1.

A Boolean expression is said to be in conjunctive normal form (CNF) if it is of the form E1 ∧ E2 ∧ ... ∧ Ek, where each Ei, called a clause, is of the form αi1 ∨ αi2 ∨ ... ∨ αiri (with ri literals in the ith clause), and each αij is a literal, that is, either x or ¬x for some variable x. (x1 ∨ x2) ∧ (x1 ∨ ¬x2 ∨ x3) is in CNF. The expression is said to be in 3-CNF if each clause has exactly three distinct literals.

We next state Cook’s theorem.

Theorem 12.4
The satisfiability problem is NP-complete.

Proof. The decision problem is: 'Is a given Boolean expression satisfiable?' Consider the language L0 consisting of all strings representing satisfiable Boolean expressions, i.e., the 'yes' instances of the problem. We claim that L0 is in NP. A non-deterministic algorithm to accept L0 guesses a satisfying assignment of 0's and 1's to the Boolean variables and evaluates the expression to verify that it evaluates to 1.

The evaluation can be done in time proportional to the length of the expression by a number of parsing algorithms. Even a naive method should not take more than O(n²) steps: in one pass, evaluate the innermost subexpressions; in the following passes, repeat the same process. Even in the worst case, this requires no more than n passes. Hence, there is a non-deterministic polynomial-time algorithm, and the non-deterministic TM accepting L0 will run in O(n²) time in the worst case.

The first condition for NP-completeness is satisfied. Next, we have


to show that any L in NP is polynomial-time reducible to L0.
Consider a language L in NP. Let M be a non-deterministic TM of
polynomial-time complexity that accepts L, and let w be the input
to M. From M and w, we can construct a Boolean
expression w0 such that w0 is satisfiable if and only if M accepts w.
We show that for each M, w0 can be constructed from w in
polynomial-time. The polynomial will depend on M.

Without loss of generality, assume M has a single tape, state set {q1, q2, ..., qs}, and tape alphabet {X1, ..., Xm}. q1 is taken as the initial state and q2 as the final state. X1 is taken as the blank symbol. Let p(n) be the time complexity of M.

Suppose M has an input w of length n. M accepts w in at most p(n) moves. Let ID0 ⊢ ID1 ⊢ ... ⊢ IDq be a sequence of moves accepting w; then q ≤ p(n), and in each ID the non-blank portion of the tape has no more than p(n) cells.

Now, we construct a Boolean expression w0 that “simulates” a


sequence of ID’s entered by M. Each assignment of true(1) and
false(0) to the variables of w0 represents at most one sequence of
ID’s of M. It may or may not be a legal sequence. The Boolean
expression w0 will take on the value true(1) if and only if the
assignment to the variables represents a valid sequence ID0 ⊢
ID1 ⊢ ... ⊢ IDq of ID’s leading to acceptance. i.e., w0 is satisfiable if
and only if M accepts w, q ≤ p(n).

In forming the Boolean expression, the following variables are


used. We also mention how we have to interpret them.

1. A〈i, j, t〉 – this is 1 if the ith cell of M's tape contains the jth tape symbol Xj at time t. We can see that 1 ≤ i ≤ p(n), 1 ≤ j ≤ m, 0 ≤ t ≤ p(n). Hence, there are O(p²(n)) such variables.

2. S〈k, t〉 – this is 1 if and only if M is in state qk at time t. Here, 1 ≤ k ≤ s, 0 ≤ t ≤ p(n). S〈1, 0〉 should be 1, as M is in the initial state q1 at time 0. There are O(p(n)) such variables.

3. H〈i, t〉 – this is 1 if and only if at time t the tape head is scanning tape cell i. Here, 1 ≤ i ≤ p(n), 0 ≤ t ≤ p(n). Therefore, there are O(p²(n)) such variables. H〈1, 0〉 should be 1, as the tape head of M starts on cell 1 initially.

Thus, w0 uses O(p²(n)) variables. If they are represented using binary numbers, each variable may require c log n symbols for some constant c; c will depend on p. This c log n factor will not affect the polynomial complexity, and hence, without loss of generality, we can assume that each Boolean variable can be represented by a single symbol.

From the Boolean variables mentioned above, we construct the expression w0 to represent a sequence of IDs.

It will be useful to use the following Boolean expression to simplify the notation:

Equation 12.4. 

ψ(x1, ..., xr) = (x1 ∨ x2 ∨ ... ∨ xr) ∧ (the conjunction of (¬xi ∨ ¬xj) over all pairs 1 ≤ i < j ≤ r)

ψ(x1, ..., xr) = 1 when exactly one of x1, ..., xr is true. If none of x1, ..., xr is true, the first factor x1 ∨ ... ∨ xr will be 0. If two or more of x1, ..., xr are 1, then at least one of the factors (¬xi ∨ ¬xj) will be 0 and ψ will be 0. There are r(r − 1)/2 factors of the form (¬xi ∨ ¬xj), and so there are r(r − 1)/2 + 1 factors in ψ. Hence, the length of ψ is O(r²).
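A sketch in Python (illustrative, with literals written as strings and ¬ as '-') that generates the clauses of ψ and confirms the quadratic count:

from itertools import combinations

def psi(xs):
    # psi(x1, ..., xr): 1 iff exactly one of the xs is 1. The first
    # clause forces "at least one"; each (-xi or -xj) forces "not both".
    clauses = [list(xs)]
    clauses += [["-" + a, "-" + b] for a, b in combinations(xs, 2)]
    return clauses

print(psi(["x1", "x2", "x3"]))
# [['x1', 'x2', 'x3'], ['-x1', '-x2'], ['-x1', '-x3'], ['-x2', '-x3']]
# 1 + r(r - 1)/2 clauses of bounded width: length O(r*r), as claimed.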
If M accepts w, then there is a valid sequence of IDs, ID0 ⊢ ... ⊢ IDq, q ≤ p(n). In each ID, the non-blank portion of the tape is at most p(n) cells.

To have uniformity, we assume that M runs up to time p(n), taking each ID after IDq to be identical to IDq. We also consider up to p(n) cells of the tape in each ID. The non-blank portion cannot exceed p(n) cells but can be fewer; in that case, we consider blank cells up to the p(n)th cell.

There are certain conditions that should be satisfied by the IDs. We put them together to get the expression w0. These conditions can be stated as follows:

1. The tape head scans exactly one cell in each ID.

2. In each ID, each tape cell contains exactly one tape symbol.

3. The machine is in one state at each instant of time; in other words, each ID has only one state.

4. In moving from one ID to the next, the content of the tape is modified only in the cell scanned by the tape head.

5. Depending on the state and the symbol read, a move of M defines (non-deterministically) the next state, the symbol to be printed in the cell scanned by the tape head, and the direction of the move of the tape head.

6. The first ID, ID0, is the initial ID.

7. The final ID should have the final state. (Note that we have taken q1 as the initial state and q2 as the final state.)

We now write Boolean expressions E1, E2, ..., E7 to denote these conditions.

1. The first condition is translated into the Boolean expression:

E1 = A0 A1 ... Ap(n), where At = ψ(H〈1, t〉, ..., H〈p(n), t〉)

At says that at time instant t, the head is scanning exactly one cell. The length of At is O(p²(n)), and hence the length of E1 is O(p³(n)); E1 can be written down in that time.

2. At time instant t, the ith cell contains exactly one symbol. This can be written as:

Bit = ψ(A〈i, 1, t〉, A〈i, 2, t〉, ..., A〈i, m, t〉), with E2 the product of the Bit over all i and t.

The length of Bit is O(m²), where m is a constant. Considering the ranges of i and t (1 ≤ i ≤ p(n), 0 ≤ t ≤ p(n)), E2 is of length O(p²(n)) and can be written down in that time.

3. The fact that M can be in only one state at any instant t can be written with the expression:

Ct = ψ(S〈1, t〉, S〈2, t〉, ..., S〈s, t〉), E3 = C0 C1 C2 ... Cp(n)

The length of Ct is O(s²), where s is a constant. Hence, the length of E3 is O(p(n)).

4. From one ID to the next, only the symbol scanned by the head can change:

Dijt = ((A〈i, j, t〉 ≡ A〈i, j, t + 1〉) ∨ H〈i, t〉), with E4 the product of the Dijt over all i, j, and t.

Dijt means that either the head is scanning the ith cell at time t, or the content of the ith cell at time t + 1 is the same as its content at time t. The length of Dijt is constant, and hence, considering the ranges of i, j, and t, the length of E4 is O(p²(n)).

5. The next condition says that the change from one ID to the next is effected by a move of M. This is represented by taking E5 to be the product, over all i, j, k, and t, of the expressions:

Eijkt = ¬A〈i, j, t〉 ∨ ¬H〈i, t〉 ∨ ¬S〈k, t〉 ∨ (the disjunction over l of (A〈i, jl, t + 1〉 ∧ S〈kl, t + 1〉 ∧ H〈il, t + 1〉))

Here, l ranges over all possible moves of M. (M is non-deterministic and may have many choices for the next move.)

This expression means that either the ith cell does not contain the jth symbol at time t, or the head is not scanning the ith cell at time t, or the state is not qk at time t, or else (the state is qk, the ith cell contains Xj, and the head scans the ith cell) the next move is one of the possible choices of δ(qk, Xj). If the lth choice of δ(qk, Xj) is (qkl, Xjl, dl), then il = i − 1 if dl = L, and il = i + 1 if dl = R.

The length of each Eijkt is constant, and hence the length of E5 is O(p²(n)).

6. In the initial ID, the state is q1, the head scans the first cell, and the first n cells contain the input, while the remaining cells contain the blank symbol. Remembering that X1 represents the blank symbol, and writing w = Xj1 Xj2 ... Xjn:

E6 = S〈1, 0〉 ∧ H〈1, 0〉 ∧ A〈1, j1, 0〉 ∧ ... ∧ A〈n, jn, 0〉 ∧ A〈n + 1, 1, 0〉 ∧ ... ∧ A〈p(n), 1, 0〉

E6 is of length O(p(n)).

7. In the final ID, the state is q2, which is given by E7 = S〈2, p(n)〉.

The Boolean expression w0 is E1 ∧ E2 ∧ ... ∧ E7. E1 is of length O(p³(n)), while the other Ei's are shorter. Hence, w0 is of length O(p³(n)). This holds if each variable is looked at as a single symbol; otherwise, there will be one more log n factor, w0 will be of length O(p³(n) log n), and hence O(n·p³(n)). The length of w0 is thus a polynomial function of the length of w, and w0 can be written down in time proportional to its length.

It is straightforward to see that, given an accepting sequence of IDs ID0 ⊢ ID1 ⊢ ... ⊢ IDp(n), we can assign values 0 and 1 to the variables such that w0 becomes 1. Conversely, if there is an assignment of values 0 and 1 to the variables which makes w0 equal to 1, it will represent a valid sequence of moves of M leading to the acceptance of w. Thus, w0 is satisfiable if and only if M accepts w.

We have taken an arbitrary non-deterministic polynomial-time TM M accepting L, so L is an arbitrary language in NP. From any string w, w0 can be written down in polynomial time. Thus, any L in NP is polynomial-time reducible to the satisfiability problem. Therefore, the satisfiability problem is NP-complete.

It is a known fact that any Boolean expression of bounded length can be converted to CNF, with the length increasing at most by a constant factor.

In Theorem 12.4, we considered the expressions E1, ..., E7. E1, E2, and E3 are in CNF. E6 and E7 are conjunctions of single literals and hence are trivially in CNF.

E4 is the conjunction of expressions of the form (x1 ≡ x2) ∨ x3. x1 ≡ x2 means (x1 ∧ x2) ∨ (¬x1 ∧ ¬x2), so (x1 ≡ x2) ∨ x3 can be written as (x1 ∧ x2) ∨ (¬x1 ∧ ¬x2) ∨ x3, which can be written as (x1 ∨ ¬x2 ∨ x3) ∧ (¬x1 ∨ x2 ∨ x3), which is in CNF.

E5 is the product of the Eijkt, where each Eijkt is of constant length and can be converted to CNF with the length increasing at most by a constant factor.

So, w0 can be converted to an expression w0′ in CNF, where the length may increase by a constant factor. Hence, w0′ can be written down in time polynomial in the length of w. Hence, we have the following theorem.

Theorem 12.5

The satisfiability problem for Boolean expressions in CNF is NP-complete (CNF-satisfiability is NP-complete).
Theorem 12.6

3-SAT is NP-complete.

Proof. A CNF expression is said to be in 3-SAT form if each of its clauses has exactly three literals. We have already seen, by the previous theorem, that a Boolean expression w0′ can be written in CNF such that M accepts w if and only if w0′ is satisfiable; the length of w0′ is a polynomial function of the length of w, and this polynomial depends on M. w0′ can be written down in time proportional to its length. Now, we show that from w0′ we can write an expression w0″, by introducing some more variables, such that w0″ is in 3-CNF and is satisfiable if and only if w0′ is satisfiable. The length of w0″ will be a constant multiple of the length of w0′.

We do this as follows:

Each clause in w0′ is of the form (x1 ∨ x2 ∨ ... ∨ xk). If k = 3, we leave the clause as it is. If k = 1, then we introduce new variables y1 and y2 and replace x1 by (x1 ∨ y1 ∨ y2) ∧ (x1 ∨ ¬y1 ∨ y2) ∧ (x1 ∨ y1 ∨ ¬y2) ∧ (x1 ∨ ¬y1 ∨ ¬y2). Whatever values we give to y1 and y2, one of the above four clauses will have both literals apart from x1 false, and to make that clause evaluate to 1, x1 has to be 1. If k = 2, x1 ∨ x2 is replaced by (x1 ∨ x2 ∨ y1) ∧ (x1 ∨ x2 ∨ ¬y1), introducing the new variable y1. One of the above two clauses will have the literal apart from x1 and x2 equal to 0, whether y1 = 0 or 1. Hence, to make the expression evaluate to 1, x1 ∨ x2 must evaluate to 1.

If k ≥ 4, x1 ∨ x2 ∨ ... ∨ xk is replaced by:

Equation 12.5. 

(x1 ∨ x2 ∨ y1) ∧ (x3 ∨ ¬y1 ∨ y2) ∧ (x4 ∨ ¬y2 ∨ y3) ∧ ... ∧ (xk−2 ∨ ¬yk−4 ∨ yk−3) ∧ (xk−1 ∨ xk ∨ ¬yk−3)

We have introduced new variables y1, ..., yk−3. We show that x1 ∨ x2 ∨ ... ∨ xk evaluates to 1 if and only if Equation (12.5) evaluates to 1 for some assignment of the new variables.

Suppose x1 ∨ ... ∨ xk evaluates to 1, say with xi = 1. Assign y1 = y2 = ... = yi−2 = 1 and yi−1 = yi = ... = yk−3 = 0. Then, each clause in Equation (12.5) becomes 1, and Equation (12.5) evaluates to 1.

Next, we show that if Equation (12.5) evaluates to 1 for some assignment of x1, ..., xk, y1, ..., yk−3, then x1 ∨ ... ∨ xk evaluates to 1. If x1 or x2 is 1, then x1 ∨ ... ∨ xk = 1; hence, consider the case when x1 = 0 and x2 = 0. In this case, y1 = 1, to make the first clause of Equation (12.5) evaluate to 1. Similarly, if xk−1 or xk is 1, x1 ∨ ... ∨ xk = 1; hence, consider the case when xk−1 = xk = 0. In this case, yk−3 should be 0, to make the last clause equal to 1. So, in the case when x1 = x2 = xk−1 = xk = 0, we have y1 = 1 and yk−3 = 0. So, for some i, yi = 1 and yi+1 = 0. Then, in order that the clause (xi+2 ∨ ¬yi ∨ yi+1) becomes 1, xi+2 must be 1. Hence, x1 ∨ ... ∨ xk evaluates to 1.

So, given a clause z = x1 ∨ ... ∨ xk in CNF, k ≥ 4, we can introduce new variables y1, ..., yk−3 and write an expression z′ in 3-CNF such that if there is an assignment of variables which makes z = 1, there is an assignment which makes z′ = 1, and vice versa. Also, the length of z′ is at most a constant multiple of that of z. Hence, we conclude that 3-SAT is NP-complete.
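The clause transformations used in this proof are mechanical enough to sketch in Python (illustrative code; literals are strings, '-' stands for ¬, and the fresh y variables are generated from a given prefix):

def neg(lit):
    # Flip a literal written as a string: 'x' <-> '-x'.
    return lit[1:] if lit.startswith("-") else "-" + lit

def to_3cnf(clause, fresh):
    # Rewrite one clause into 3-CNF, following the cases in the proof.
    k = len(clause)
    if k == 3:
        return [clause]
    if k == 1:
        x1, y1, y2 = clause[0], fresh + "1", fresh + "2"
        return [[x1, y1, y2], [x1, neg(y1), y2],
                [x1, y1, neg(y2)], [x1, neg(y1), neg(y2)]]
    if k == 2:
        x1, x2, y1 = clause[0], clause[1], fresh + "1"
        return [[x1, x2, y1], [x1, x2, neg(y1)]]
    # k >= 4: the chain of Equation (12.5), with k - 3 new variables.
    ys = [fresh + str(i) for i in range(1, k - 2)]
    out = [[clause[0], clause[1], ys[0]]]
    for i in range(k - 4):
        out.append([clause[i + 2], neg(ys[i]), ys[i + 1]])
    out.append([clause[k - 2], clause[k - 1], neg(ys[-1])])
    return out

print(to_3cnf(["x1", "x2", "x3", "x4", "x5"], "y"))
# [['x1','x2','y1'], ['x3','-y1','y2'], ['x4','x5','-y2']]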

When we want to show a new problem ∏ (with corresponding language L) to be NP-complete, we have to show that ∏ (L) is in NP and that some known NP-complete problem ∏0 (with language L0) is polynomial-time reducible to ∏ (L). Suppose L0 ⊆ Σ0* and L ⊆ Σ*. There must be a polynomial-time DTM which converts a string x in Σ0* to a string y in Σ* such that y ∊ L if and only if x ∊ L0.

We next show that the clique problem is NP-complete, by showing that CNF-satisfiability is polynomial-time reducible to the clique problem. A clique is a complete subgraph of a graph. The clique problem may be stated as follows: 'Does an undirected graph G have a clique of size k?' We have to represent the graph G as a string; this can be done by listing the edges of G. k is also an input. If dG is an encoding of G, then k#dG is an encoding of an instance of the clique problem.

Theorem 12.7

The clique problem is NP-complete.

Proof

1. The clique problem is in NP. We can have an NTM which non-deterministically selects k vertices of G and checks whether edges exist between every pair of these k vertices. It is straightforward to see that this checking can be done in polynomial time. Hence, the clique problem is in NP.

2. Next, we show that CNF-satisfiability is polynomial-time reducible to the clique problem. Given an expression in CNF with k clauses, we construct a graph which has a clique of size k if and only if the CNF expression is satisfiable. Let e = E1 ∧ ... ∧ Ek be a Boolean expression in CNF, where Ei is of the form (xi1 ∨ ... ∨ xiri) and each xij is a literal. Construct an undirected graph G = (V, E) whose vertices are represented by pairs of integers [i, j], where 1 ≤ i ≤ k and 1 ≤ j ≤ ri. The number of vertices of G is equal to the number of literals in e; each vertex of the graph corresponds to a literal of e. The edges of G are the pairs ([i, j], [l, m]) where i ≠ l and xij ≠ ¬xlm, i.e., xij and xlm are such that if one is a variable y, the other is not ¬y. If one is y and the other is ¬y, we cannot assign values independently to xij and xlm; to enable independent assignment of values to xij and xlm, we impose the condition xij ≠ ¬xlm.

The number of vertices of G is less than the length of e, and the number of edges is at most the square of that. Thus, G can be encoded as a string whose length is bounded by a polynomial in the length of e, and the encoding can be computed in time bounded by a polynomial in the length of e. We next show that G has a clique of size k if and only if e is satisfiable.

1. If e is satisfiable, then G has a clique of size k. If e is satisfiable, there is a literal in each clause which takes the value 1. Consider the subgraph of G whose vertices correspond to these literals. The k vertices have first components 1, ..., k (no two of them have the same first component). We see that these vertices [i, mi], 1 ≤ i ≤ k, form a clique. If not, there must be two vertices [i, mi], [j, mj], i ≠ j, such that there is no edge between them. This can happen only if the corresponding literals are negations of each other; then, if one of them is 1, the other is 0, and vice versa. But we have chosen these literals such that each is equal to 1. Hence, the two literals cannot be negations of each other, and there are edges between every pair of vertices of this set.

2. If G has a clique of size k, then e is satisfiable. Let [i, mi], 1 ≤ i ≤ k, form a clique. By our construction, the vertices will have such labels, no two of them having the same first component. The vertex [i, mi] corresponds to a literal in the ith clause. This literal may be a variable y or the negation ¬y of a variable. If it is a variable, assign the value 1 to that variable; if it is the negation of a variable, assign the value 0 to the variable. As there is an edge between every pair of the chosen vertices, no two of the corresponding literals are negations of each other, so we can consistently assign values to these variables. This makes the chosen literal in each clause equal to 1, so each clause evaluates to 1, and so does e. So e is satisfiable.

Thus, we have a polynomial-time reduction of the CNF-satisfiability problem to the clique problem. Therefore, the clique problem is NP-complete.

Example 12.1. 

Let us illustrate the construction in the above theorem with an example. Let e be (p1 ∨ p2 ∨ p3) ∧ (¬p1 ∨ ¬p3) ∧ (¬p2 ∨ ¬p3). The corresponding graph G will be:

p1 = 1, p2 = 1, p3 = 0 is an assignment which satisfies the given expression. Correspondingly, [1, 1], [2, 2], [3, 2] form a clique of size 3.
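A sketch in Python (illustrative) of the construction for this example: vertices are the pairs [i, j], edges join literals from different clauses that are not negations of each other, and size-3 cliques then read off satisfying choices:

from itertools import combinations

def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

clauses = [["p1", "p2", "p3"], ["-p1", "-p3"], ["-p2", "-p3"]]

vertices = [(i + 1, j + 1) for i, c in enumerate(clauses)
            for j in range(len(c))]

def literal(v):
    return clauses[v[0] - 1][v[1] - 1]

edges = {pair for pair in combinations(vertices, 2)
         if pair[0][0] != pair[1][0]
         and literal(pair[0]) != neg(literal(pair[1]))}

# Each clique of size 3 picks one mutually consistent literal per clause:
for trio in combinations(vertices, 3):
    if all(pair in edges for pair in combinations(trio, 2)):
        print(trio, [literal(v) for v in trio])
# ((1, 1), (2, 2), (3, 2)) with literals ['p1', '-p3', '-p3'] is among them.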
So far, thousands of problems have been proved to be NP-complete, and still more are being proved NP-complete. They come from different fields like automata theory, graph theory, set theory, and so on. We list a few of them (without proof).

Some NP-complete problems:

1. Vertex cover: Let G = (V, E) be a graph. A vertex cover of G is a subset S ⊆ V such that each edge of G is incident upon some vertex in S.
Problem: Does an undirected graph have a vertex cover of size k?

2. Hamiltonian circuit: A Hamiltonian circuit of G is a cycle of G containing every vertex of V.
Problem: Does an undirected graph have a Hamiltonian circuit?

3. Set cover: Given a family of sets S1, S2, ..., Sn and an integer k, does there exist a subfamily of k sets Si1, Si2, ..., Sik such that Si1 ∪ Si2 ∪ ... ∪ Sik = S1 ∪ S2 ∪ ... ∪ Sn?

4. Regular expression inequivalence: Given two regular expressions E1 and E2 over the alphabet Σ, do E1 and E2 represent different languages?

5. 3-Dimensional matching: Given a set M ⊆ W × X × Y, where W, X, and Y are disjoint sets having the same number q of elements, does M contain a subset M′ of q elements such that no two elements of M′ agree in any coordinate?

6. Integer programming: Given a finite set X of pairs (a, b), where a is an m-tuple of integers and b is an integer, together with an m-tuple c of integers and an integer B, is there an m-tuple y of integers such that a·y ≤ b for all (a, b) ∊ X and c·y ≥ B? (Here, if x = (x1, ..., xm) and y = (y1, ..., ym), then x·y = x1y1 + x2y2 + ... + xmym.)

Beyond NP-completeness

We next look at a broader hierarchy of problems. Let NPC denote the family of NP-complete problems.

Definition 12.6

A problem is NP-hard if every problem in NP is polynomial-time reducible to it. It is NP-easy if it is polynomial-time reducible to some problem in NP.

Definition 12.7

The class co−NP is composed of the complements of the problems in NP:

co−NP = {Σ* − L : L is over the alphabet Σ and L ∊ NP}

Let co-NPC denote the family of co-NP-complete problems. Similar to NP-complete problems, we can define co-NP-complete problems. The hierarchy of problems can be represented by the previous figure, in which NPC and co-NPC represent the sets of NP-complete and co-NP-complete problems, respectively.

It is believed that co−NP ≠ NP. This is a stronger conjecture than P ≠ NP: NP ≠ co−NP implies P ≠ NP, while we could have NP = co−NP and still have P ≠ NP.

We know that PSPACE = NSPACE by Savitch's theorem. Considering the hierarchy with respect to PSPACE, we have:

Definition 12.8

A language L is PSPACE-complete if

1. L is in PSPACE, and

2. any language in PSPACE is polynomial-time reducible to L.

We state a few results without proof.

Quantified Boolean formulas (QBF) are built from variables, the


operators ∧, ∨ and ¬, parentheses, and the quantifiers ∀ (for all)
and ∃ (there exists). When defining the QBFs inductively, we have
to define free variables and bound variables. Consider:


∀ x(P(x) ∧ Q(y))

Here, x is a bound variable, bound by ∀x, while y is a free variable. An expression with free variables, like R(y, z), is called a predicate. For example,


S(x, y, z): z = x + y

The variables can be bound by giving values to them or by using
quantifiers. If all variables are bound, the predicate becomes a
proposition and assumes value true (1) or false (0).

For example, if we assign 3 to x, 4 to y, and 2 to z, then S(3, 4, 2) asserts 2 = 3 + 4, which is false.
∀x ∀y ∃z S(x, y, z) is a proposition in which all variables are bound by quantifiers. It says that for all x and for all y (taking the underlying set to be the set of integers), there is a z such that z = x + y. This is true. The QBF problem is to determine whether a QBF with no free variables has the value true.

Theorem 12.8

QBF is in PSPACE and is a PSPACE-complete problem.
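The PSPACE upper bound can be seen from the obvious recursive evaluator, which needs space proportional to the quantifier depth rather than to the exponentially many assignments. A sketch (ours, not from the text), with formulas as nested tuples:

def eval_qbf(f, env=None):
    # f is ('var', name), ('not', g), ('and', g, h), ('or', g, h),
    # ('forall', name, g) or ('exists', name, g).
    env = env or {}
    op = f[0]
    if op == 'var':
        return env[f[1]]
    if op == 'not':
        return not eval_qbf(f[1], env)
    if op == 'and':
        return eval_qbf(f[1], env) and eval_qbf(f[2], env)
    if op == 'or':
        return eval_qbf(f[1], env) or eval_qbf(f[2], env)
    results = (eval_qbf(f[2], {**env, f[1]: b}) for b in (False, True))
    return all(results) if op == 'forall' else any(results)

# forall x exists z ((x and z) or (not x and not z)): true, choose z = x.
f = ('forall', 'x', ('exists', 'z',
     ('or', ('and', ('var', 'x'), ('var', 'z')),
            ('and', ('not', ('var', 'x')), ('not', ('var', 'z'))))))
print(eval_qbf(f))   # True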

The context-sensitive recognition problem is stated as: given a CSG G and a string w, is w in L(G)?

Theorem 12.9

The CSL recognition problem is PSPACE-complete.

Some problems have been proved to require an exponential amount of time for their solution.

Definition 12.9

An extended regular expression over an alphabet Σ is defined as follows:

1. ε, φ, and a, for a in Σ, are extended regular expressions denoting {ε}, the empty set, and {a}, respectively.

2. If R1 and R2 are extended regular expressions denoting the languages L1 and L2, respectively, then (R1 + R2), (R1 · R2), (R1*), (R1 ∩ R2), and (¬R1) are extended regular expressions denoting L1 ∪ L2, L1L2, L1*, L1 ∩ L2, and Σ* − L1, respectively.

3. Redundant pairs of parentheses may be deleted from extended regular expressions if we assume the operators have the following increasing order of precedence: +, ∩, ¬, · (concatenation), * (star).

Definition 12.10

Let us define the function g(m, n) by:

1. g(0, n) = n

2. g(m, n) = 2^g(m−1, n) for m > 0

Thus, g(1, n) = 2^n, g(2, n) = 2^(2^n), and in general g(m, n) is a tower of m 2's topped by n:

g(m, n) = 2^(2^(...^(2^n)))

A function f(n) is elementary if it is bounded above, for all but a finite set of n's, by g(m0, n) for some fixed m0.
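The recursion transcribes directly into code (a sketch; the values explode very quickly, so keep m small):

def g(m, n):
    # g(0, n) = n; g(m, n) = 2^g(m-1, n) for m > 0
    return n if m == 0 else 2 ** g(m - 1, n)

print(g(1, 3))   # 2^3 = 8
print(g(2, 3))   # 2^(2^3) = 256
print(g(3, 2))   # 2^(2^(2^2)) = 65536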

Theorem 12.10

Let S(n) be any elementary function. Then, there is no S(n) space-bounded (hence no S(n) time-bounded) deterministic TM to decide whether an extended regular expression denotes the empty set.

Many more complexity classes can be defined, and their study constitutes 'complexity theory.' Oracle computations have also been defined. Complexity theory is a field of study by itself and is beyond the scope of this book.

Problems and Solutions


1. Show that DT(2^(2n+2n)) properly includes DT(2^(2n)).

Solution. By result 2 mentioned in "Some Results on Complexity Classes," DT(2^(2n+2n)) properly includes DT(2^(2n)).

2. What is the relationship between:

a. DS(n^2) and DS(n^3)?

b. DT(2^n) and DT(3^n)?

c. NS(2^n) and DS(5^n)?

Solution.

a. By result 1 mentioned in "Some Results on Complexity Classes," DS(n^3) properly includes DS(n^2).

b. By result 2, DT(2^n) is properly included in DT(3^n).

c. NS(2^n) is included in DS((2^n)^2) = DS(4^n) (result 4). DS(4^n) is properly included in DS(5^n) (result 1). Therefore, NS(2^n) is properly included in DS(5^n).

3. Show that the following problem is NP-complete: One-in-Three-3SAT (1in3SAT) has the same description as 3SAT, except that a satisfying truth assignment must set exactly one literal to true in each clause.

Solution. One-in-Three-3SAT (1in3SAT) asks: is the given Boolean expression, in 3SAT form, satisfiable with the additional restriction that the truth assignment must set exactly one literal to true in each clause?
1in3SAT is in NP, as we can non-deterministically guess a truth assignment, evaluate the expression, and check the additional condition on the truth assignment in polynomial time.
We next reduce 3SAT to 1in3SAT. Let w0 be a Boolean expression in 3SAT form. We construct a Boolean expression w̄0 for the 1in3SAT problem as follows. To each clause

Equation 12.6. (x1 + x2 + x3)

of w0, introduce new variables a, b, c, d and convert this to:

Equation 12.7. (¬x1 + a + b)(b + x2 + c)(c + d + ¬x3)

If Equation (12.6) is not satisfied, then x1 = x2 = x3 = 0. In this case, looking at Equation (12.7) under the one-in-three condition, the first and third clauses force a = b = c = d = 0. Since x2 = 0, the clause (b + x2 + c) then has no true literal, so Equation (12.7) is not satisfiable; in other words, if Equation (12.7) is satisfiable, Equation (12.6) is satisfiable.
To prove the other way round, let Equation (12.6) be satisfiable.
If x2 = 1, then make b = 0 = c, a = x1, d = x3. Then, Equation (12.7) satisfies the additional restriction and is satisfiable.
If x2 = 0 and x1 = x3 = 1, make a = 1, b = 0, c = 1, d = 0.
If x1 = 1, x2 = 0, x3 = 0, make a = 0, b = 1, c = 0, d = 0.
If x1 = 0, x2 = 0, x3 = 1, make a = 0, b = 0, c = 1, d = 0.
In all cases, if Equation (12.6) is satisfiable, Equation (12.7) is satisfiable.
If w0 has k clauses and n variables, w̄0 has 3k clauses and n + 4k variables, and w0 can be converted to w̄0 in polynomial time.
Hence, the known NP-complete problem 3SAT is polynomial-time reducible to 1in3SAT, and 1in3SAT is NP-complete.
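The clause gadget is mechanical enough to write down in a few lines. A sketch (ours, not from the text), with literals encoded as (variable, sign) pairs and fresh variables named ('z', i), matching Equation (12.7) as reconstructed above:

def to_1in3(clauses):
    # Each 3SAT clause (l1 + l2 + l3) becomes, with fresh a, b, c, d:
    # (~l1 + a + b)(b + l2 + c)(c + d + ~l3).
    out, fresh = [], 0
    for (l1, l2, l3) in clauses:
        a, b, c, d = (('z', fresh + i) for i in range(4))
        fresh += 4
        neg = lambda lit: (lit[0], not lit[1])
        out += [[neg(l1), (a, True), (b, True)],
                [(b, True), l2, (c, True)],
                [(c, True), (d, True), neg(l3)]]
    return out

w0 = [[('x1', True), ('x2', True), ('x3', True)]]
print(to_1in3(w0))   # three 1in3 clauses over four fresh variables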

4. The positive 1in3SAT problem is the 1in3SAT problem where none of the literals are negated. Show that it is NP-complete.

Solution. The positive 3SAT problem is: given a Boolean expression in 3SAT form where none of the literals are negated, is the expression satisfiable? This is a trivial problem, as the assignment giving the value 1 to all the variables makes the expression satisfiable.
It is straightforward to see that the positive 1in3SAT problem is in NP. It can be shown to be NP-complete by reducing the 1in3SAT problem to it.
Let w0 = c1 ... ck be a Boolean expression in 3SAT form. From this, we construct an expression w in 3SAT form where none of the literals are negated, such that w0 is satisfiable with the 1in3 condition if and only if w is satisfiable with the 1in3 condition. Also, given w0, w can be constructed in polynomial time.
Let x1, ..., xn be the variables used in w0, and let there be k clauses in w0. In w, we use the variables x1, ..., xn and also variables y1, ..., yn and z1, ..., zk, z′1, ..., z′k, z″1, ..., z″k.
We now give the method of constructing w from w0. For each clause ci in w0, some clauses c′i are constructed (yj in essence refers to ¬xj).

1. If ci is of the form (xk + xl + xm), with no literal negated, it is kept as it is in c′i.

2. If ci is of the form (¬xk + xl + xm), with one literal negated, c′i consists of (yk + xl + xm)(yk + xk + zi)(xl + z′i + zi)(xm + z″i + zi). If (¬xk + xl + xm) is satisfied with the 1in3 condition, then c′i is satisfiable with the 1in3 condition and vice versa. It should be noted that one of yk or xk will be 0 and the other 1, and zi has to be 0. The values of z′i, z″i can be chosen to satisfy the 1in3 condition.

3. If ci is of the form (¬xk + ¬xl + xm), with two of the literals negated, c′i consists of (yk + yl + xm)(yk + xk + zi)(yl + xl + zi)(xm + z′i + zi). Again, it can be checked that if ci is satisfiable with the 1in3 condition then c′i is satisfiable with the 1in3 condition and vice versa. Note that one of yk, xk is 0 and the other 1, as also one of yl, xl, and zi = 0; z′i assumes a suitable value to satisfy the 1in3 condition.

4. If ci is of the form (¬xk + ¬xl + ¬xm), with all three literals negated, c′i consists of (yk + yl + ym)(yk + xk + zi)(yl + xl + zi)(ym + xm + zi). Again, it can be checked that if ci is satisfiable with the 1in3 condition, c′i is satisfiable with the 1in3 condition and vice versa. Also, zi = 0, and one of yj, xj (j = k, l, m) is 0 and the other 1.

None of the literals in w is negated, and it is straightforward to see that w can be constructed from w0 in polynomial time. Hence, positive 1in3SAT is NP-complete.

5. Exact cover by three-sets (X3C): Given a set with 3n elements for some natural number n, and a collection of subsets of the set, each of which contains exactly three elements, do there exist in the collection n subsets that together cover the set? Show that this problem is NP-complete.

Solution. Membership in NP is obvious: select a set of n three-sets and check whether they cover the set. To show X3C is NP-complete, we reduce positive 1in3SAT to it. Consider an instance of positive 1in3SAT with n variables and k clauses. This instance can be converted to an X3C instance with 18k elements and 15k three-sets such that the positive 1in3SAT instance is satisfiable if and only if the X3C instance has a 'yes' answer, i.e., an exact cover (here, by 6k three-sets) exists.
For each clause c = (x + y + z), we set up six elements xc, yc, zc, tc, f′c, f″c. The first three represent the three literals, while the other three will distinguish the true literal from the false literals. For each variable, we construct a component with two attaching points (one corresponding to true and the other to false) for each of its occurrences. See Figure 12.2.

Figure 12.2. The component for a variable x occurring 4 times

Let variable x occur nx times in the Boolean expression. We set up 4nx elements, of which 2nx will be used as attaching points, while the others will ensure consistency. Call the attaching points xt_i and xf_i for 1 ≤ i ≤ nx; call the other points p_i, 1 ≤ i ≤ 2nx. Now, we construct three-sets. The component associated with variable x has 2nx sets:

{p_(2i−1), p_(2i), xt_i} and {p_(2i), p_(2i+1), xf_i} for 1 ≤ i ≤ nx (subscripts of p taken modulo 2nx).

Suppose there are n variables x1, ..., xn, and xi occurs n_(xi) times. Then n_(x1) + n_(x2) + ... + n_(xn) = 3k, since there are k clauses of three literals each. So the number of three-sets formed in the variable components is 2(n_(x1) + ... + n_(xn)) = 6k.
Also, corresponding to each clause c = (x + y + z), we make nine three-sets, three for each literal. The first of these sets, if picked for the cover, indicates that the associated literal is set to true in the clause; for literal x in clause c, this set is {xc, tc, xt_i} for some attaching point xt_i. If one of the other two is picked, the associated literal is set to false; for literal x, they are {xc, f′c, xf_i} and {xc, f″c, xf_i}.
So for (x + y + z) we have the nine sets {xc, tc, xt_i}, {xc, f′c, xf_i}, {xc, f″c, xf_i}, {yc, tc, yt_j}, {yc, f′c, yf_j}, {yc, f″c, yf_j}, {zc, tc, zt_l}, {zc, f′c, zf_l}, {zc, f″c, zf_l} for some i, j, l.
So, totally we have 6k + 9k = 15k three-sets. The elements xc, yc, zc, tc, f′c, f″c over all clauses number 6k; the attaching points xt_i, xf_i over all variables number 2 · 3k = 6k; and the elements of the form p_i also number 2 · 3k = 6k. Hence, totally we have 18k elements.
Note that, for each variable x, the element p_1 can be covered only by one of the two three-sets {p_1, p_2, xt_1} or {p_(2nx), p_1, xf_(nx)}. If the first is chosen, then the second cannot be chosen, and the element p_3 must be covered by the only other three-set in which it appears, namely {p_3, p_4, xt_2}. Continuing this chain of reasoning, we see that the choice made for p_1 entirely determines the cover for all the p_i. In this cover,

1. all the xt_i are included and all the xf_i are excluded, or

2. all the xf_i are included and all the xt_i are excluded.

Thus, a covering of the components associated with the variables corresponds to a legal truth assignment, where the attaching points left uncovered take complementary values. Considering the components associated with the clauses, we note that three of the nine sets must be selected for the cover. Whichever set is selected to cover the tc element must include a true literal, thereby ensuring that at least one literal in the clause is true. The other two sets chosen cover f′c and f″c and thus contain one false literal each, ensuring that exactly one literal per clause is true. It can be seen that the construction of the sets can be done in polynomial time, given an instance of positive 1in3SAT.
Note that there are 18k elements and 15k three-sets. Out of the 15k three-sets, we choose 6k to get an exact cover. For each clause, 3 three-sets are chosen, amounting to 3k three-sets. For each variable x occurring nx times, nx three-sets are chosen from its component; the number of such three-sets is Σ nx = 3k. Hence, totally we select 3k + 3k = 6k three-sets. By the choice, they are disjoint and hence form an exact cover.

Exercises
1. The notion of a crossing sequence – the sequence of states in which the boundary between two adjacent cells is crossed – was introduced in Chapter 6 with regard to two-way FSA. The notion can be extended to Turing machines. Prove the following about crossing sequences.

a. The time taken by a single-tape TM M on input w is the sum of the lengths of the crossing sequences between each pair of adjacent cells of M's tape.

b. Suppose M is a single-tape TM which accepts an input after reading the whole input, entering an accepting state with the tape head in a cell to the right of the cells where the input was originally placed. Show that if M accepts input w1w2, and the crossing sequence between w1 and w2 is the same as that between x1 and x2 when M is given input x1x2, then M accepts x1w2.

2. Discuss how many steps a single-tape TM will require to accept:

a. {wcw^R | w ∊ {a, b}*}

b. {wcw | w ∊ {a, b}*}

3. Show that the following functions are fully time and space constructible:

a. n^2

b. 2^n

c. n!

4. Show that the following problems are NP-complete. Let G = (V, E) be an undirected graph.

a. A vertex cover of G is a subset S ⊆ V such that each edge of G is incident upon some vertex in S. Does an undirected graph have a vertex cover S of size k, |S| = k?

b. A Hamilton circuit is a cycle of G containing every vertex of V. Does an undirected graph have a Hamilton circuit?

c. G is k-colorable if there exists an assignment of the integers 1, 2, ..., k, called "colors," to the vertices of G such that no two adjacent vertices are assigned the same color. The chromatic number of G is the smallest integer k such that G is k-colorable. Is an undirected graph k-colorable?

Let G′ = (V′, E′) be a directed graph.

d. A feedback vertex set is a subset S′ ⊆ V′ such that every cycle of G′ contains a vertex in S′. Does a directed graph have a feedback vertex set with k members?

e. A feedback edge set is a subset F′ ⊆ E′ such that every cycle of G′ contains an edge in F′. Does a directed graph have a feedback edge set with k members?

f. A directed Hamilton circuit is a cycle containing every vertex of V′. Does a directed graph have a directed Hamilton circuit?
9.

5. Consider the following problems:

a. Not-all-equal 3SAT (NAE3SAT) has the same description as 3SAT, except that a satisfying truth assignment may not set all three literals of any clause to true. This constraint results in a symmetric problem: the complement of a satisfying truth assignment is also a satisfying truth assignment.

b. Maximum cut (MxC): Given an undirected graph and a positive integer bound, can the vertices be partitioned into two subsets such that the number of edges with endpoints in both subsets is at least the given bound?

c. Partition: Given a set of elements, each with a positive integer size, such that the sum of the sizes is an even number, can the set be partitioned into two subsets such that the sum of the sizes of the elements of one subset is equal to that of the other subset?

d. Art Gallery (AG): Given a simple polygon P of n vertices and a positive integer bound B, can at most B "guards" be placed at vertices of the polygon in such a way that every point in the interior of the polygon is visible to at least one guard? (A simple polygon is one with a well-defined interior; a point is visible to a guard if and only if the line segment joining the two does not intersect any edge of the polygon.)

Show that these problems are NP-complete.

Chapter 13. Recent Trends and


Applications

Regulated Re-writing
In a given grammar, re-writing at any step of a derivation can be done using any applicable rule at any desired place. That is, if A is a nonterminal occurring in a sentential form αAβ, and the rules include A → γ and A → δ, then either of these rules is applicable to the occurrence of A in αAβ. Hence, one encounters nondeterminism in rule application. One way of naturally restricting this nondeterminism is by regulating devices, which select only certain derivations as correct, in such a way that the obtained language has certain useful properties. For example, a very simple and natural control on regular rules may yield a non-regular language.

While defining the four types of grammars in Chapter 2, we put restrictions on the form of the production rules to go from type 0 to type 1, then to type 2 and type 3. In this chapter, we put restrictions on the manner of applying the rules and study the effect. There are several methods to control re-writing; some of the standard control strategies are discussed below.

Matrix Grammar
A matrix grammar is a quadruple G = (N, T, P, S), where N, T, and S are as in any Chomsky grammar. P is a finite set of sequences of the form:

m = [α1 → β1, α2 → β2, ..., αn → βn]

n ≥ 1, with αi ∊ (N ∪ T)+, βi ∊ (N ∪ T)*, 1 ≤ i ≤ n. Each such member m of P is called a 'matrix.'

G is a matrix grammar of type i, where i ∊ {0, 1, 2, 3}, if and only if


the grammar Gm = (N, T, m, S) is of type i for every m ∊ P.
Similarly, G is ε-free if each Gm is ε-free.

Definition 13.1

Let G = (N, T, P, S) be a matrix grammar. For any two strings u, v ∊ (N ∪ T)+, we write u ⇒G v (or u ⇒ v if there is no confusion about G) if and only if there are strings u0, u1, u2, ..., un in (N ∪ T)+ and a matrix m = [x1 → y1, ..., xn → yn] ∊ P such that u = u0, un = v, and

ui−1 = u′i−1xiu″i−1, ui = u′i−1yiu″i−1

for some u′i−1, u″i−1, for all 1 ≤ i ≤ n.

Clearly, any direct derivation step in a matrix grammar G corresponds to an n-step derivation by Gm = (N, T, m, S); that is, the rules in m are used in sequence to reach v. ⇒* is the reflexive, transitive closure of ⇒, and

L(G) = {w | w ∊ T*, S ⇒* w}

Definition 13.2

Let G = (N, T, P, S) be a matrix grammar, and let F be a subset of the rules occurring in the matrices of P. The rules in F may be passed over when they cannot be applied, whereas the rules not in F must be used. That is, for u, v ∊ (N ∪ T)+, u ⇒ v holds if and only if there are strings u0, u1, ..., un with u = u0, un = v, and a matrix m ∊ P with rules ri: xi → yi, 1 ≤ i ≤ n, such that for each i, either

ui−1 = u′i−1xiu″i−1, ui = u′i−1yiu″i−1

for some u′i−1, u″i−1, or else xi → yi ∊ F, xi is not a subword of ui−1, and ui = ui−1.

This restriction by F on derivations is denoted by ⇒ac, where 'ac' stands for the 'appearance checking' derivation mode. Then,

L(G, F) = {w | S ⇒*ac w, w ∊ T*}

Let M (Mac) denote the family of matrix languages without appearance checking (with appearance checking) of type 2 without ε-rules.

Let Mλ (Mλac) denote the family of matrix languages without appearance checking (with appearance checking) of type 2 with ε-rules.

Example 13.1. 

Let G = (N, T, P, S) be a matrix grammar where

N = {S, A, B, C, D}

T = {a, b, c, d}

P = {P1, P2, P3, P4}, where

P1: [S → ABCD]

P2: [A → aA, B → B, C → cC, D → D]

P3: [A → A, B → bB, C → C, D → dD]

P4: [A → a, B → b, C → c, D → d]
A sample derivation is:

S ⇒P1 ABCD ⇒P2 aABcCD ⇒P3 aAbBcCdD ⇒P4 aabbccdd

We can see that each application of matrix P2 produces an equal number of a's and c's, and each application of P3 produces an equal number of b's and d's. P4 terminates the derivation. Clearly, L(G) = {a^n b^m c^n d^m | n, m ≥ 1}.

The rules in each matrix are context-free, but the language


generated is context-sensitive and not context-free.
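Matrix derivations are easy to simulate. A minimal sketch (ours, not from the text), applying each rule of a matrix at the leftmost occurrence of its left-hand side, which is one possible strategy:

P1 = [('S', 'ABCD')]
P2 = [('A', 'aA'), ('B', 'B'), ('C', 'cC'), ('D', 'D')]
P3 = [('A', 'A'), ('B', 'bB'), ('C', 'C'), ('D', 'dD')]
P4 = [('A', 'a'), ('B', 'b'), ('C', 'c'), ('D', 'd')]

def apply_matrix(w, matrix):
    # Apply the rules of the matrix in sequence; fail if any is inapplicable.
    for lhs, rhs in matrix:
        i = w.find(lhs)
        if i < 0:
            return None
        w = w[:i] + rhs + w[i + 1:]
    return w

w = 'S'
for m in (P1, P2, P3, P4):
    w = apply_matrix(w, m)
print(w)   # aabbccdd, i.e., n = m = 2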

Example 13.2. 

Let G = (N, T, P, S) be a matrix grammar with

N = {S, A, B, C}

T = {a, b}

P = {P1, P2, P3, P4, P5},

where

P1: [S → ABC]

P2: [A → aA, B → aB, C → aC]

P3: [A → bA, B → bB, C → bC]

P4: [A → a, B → a, C → a]

P5: [A → b, B → b, C → b]

A sample derivation is:

S ⇒P1 ABC ⇒P2 aAaBaC ⇒P5 ababab

Clearly, L(G) = {www | w ∊ {a, b}+}.

PROGRAMMED GRAMMAR
A programmed grammar is a 4-tuple G = (N, T, P, S), where N, T, and S are as in any Chomsky grammar. Let R be a collection of re-writing rules over N ∪ T, with lab(R) the set of labels of the rules in R. σ and ϕ are mappings from lab(R) to 2^lab(R), and

P = {(r, σ(r), ϕ(r)) | r ∊ R}

Here, G is said to be type i, or ε-free if the rules in R are all type i,


where i = 0, 1, 2, 3 or ε-free, respectively.

Definition 13.3

For any x, y over (N ∪ T)*, we define derivation as below:

1. (u, r1) ⇒ (v, r2) if and only if u = u1xu2 and v = u1yu2, for u1, u2 over N ∪ T, (r1: x → y, σ(r1), ϕ(r1)) ∊ P, and r2 ∊ σ(r1), and

2. (u, r1) ⇒ac (v, r2) if and only if (u, r1) ⇒ (v, r2) holds, or else u = v, the rule r1: (x → y, σ(r1), ϕ(r1)) is not applicable to u (i.e., x is not a subword of u), and r2 ∊ ϕ(r1). Thus, only ⇒ac depends on ϕ.

Here, σ(r) is called the success field: when the rule with label r is used in a derivation step, the next rule is taken from σ(r). ϕ(r) is called the failure field: when the rule with label r cannot be applied, we move on to a rule with a label in ϕ(r). ⇒* and ⇒*ac are the reflexive, transitive closures of ⇒ and ⇒ac, respectively.

The language generated is defined as follows:

L(G, σ) = {w | w ∊ T*, (S, r1) ⇒* (w, r2) for some r1, r2 ∊ lab(R)}.

L(G, σ, ϕ) = {w | w ∊ T*, (S, r1) ⇒*ac (w, r2) for some r1, r2 ∊ lab(R)}.

Let P (Pac) denote the family of programmed languages without (with) appearance checking of type 2 without ε-rules.

Let Pλ (Pλac) denote the family of programmed languages without (with) appearance checking of type 2 with ε-rules.

Example 13.3. 

Let G = (N, T, P, S) be a programmed grammar with

N = {S, A, B, C, D}

T = {a, b, c, d}

P:
  r σ(r)

1. S → ABCD 2, 3, 6

2. A → aA 4

3. B → bB 5

4. C → cC 2, 3, 6

5. D → dD 2, 3, 6

6. A → a 7

7. B → b 8

8. C → c 9

9. D → d φ

Here, lab(R) = {1, 2, 3, 4, 5, 6, 7, 8, 9} and all the failure fields are empty.

A sample derivation is:

(S, 1) ⇒ (ABCD, 2) ⇒ (aABCD, 4) ⇒ (aABcCD, 6) ⇒ (aaBcCD, 7) ⇒ (aabcCD, 8) ⇒ (aabccD, 9), and applying rule 9 gives the terminal string aabccd.

L(G) = {a^n b^m c^n d^m | n, m ≥ 1}

Example 13.4. 

Let G = (N, T, P, S) be a programmed grammar with:

N = {S, A, B, C}

T = {a, b}

P:

  r σ
1. S → ABC 2, 5, 8, 11

2. A → aA 3

3. B → aB 4

4. C → aC 2, 5, 8, 11

5. A → bA 6

6. B → bB 7

7. C → bC 2, 5, 8, 11

8. A → a 9

9. B → a 10

10. C → a φ

11. A → b 12

12. B → b 13

13. C → b φ

L(G) = {www|w ∊ {a, b}+}.

RANDOM CONTEXT GRAMMAR


With each rule x → y, a random context grammar associates two sets of nonterminals X and Y, where X is called the permitting context and Y the forbidding context of the rule.

Definition 13.4

G = (N, T, P, S) is a random context grammar, where N, T, S are as in any Chomsky grammar and

P = {(x → y, X, Y) | x → y is a rule over N ∪ T; X, Y are subsets of N}.

We say u ⇒ v if and only if u = u′xu″, v = u′yu″ for some u′, u″ over N ∪ T and (x → y, X, Y) ∊ P such that all symbols of X appear in u′u″ and no symbol of Y appears in u′u″. ⇒* is the reflexive, transitive closure of ⇒.

L(G) = {w : S ⇒* w, w ∊ T*}.

As before, L is of type i whenever the rules x → y in P are of type i, i = 0, 1, 2, 3.

Example 13.5. 

Consider the random context grammar G = (N, T, P, S), where:

N = {S, A, B, D}

T = {a}

P = {(S → AA, φ, {B, D}),

(A → B, φ, {S, D}),

(B → S, φ, {A, D}),

(A → D, φ, {S, B}),

(D → a, φ, {S, A, B})}.

Some sample derivations are:

S ⇒ AA ⇒ DA ⇒ DD ⇒ aD ⇒ aa

S ⇒ AA ⇒ BA ⇒ BB ⇒ SB ⇒ SS ⇒ AAS ⇒ AAAA ⇒* a^4

L(G) = {a^(2^n) | n ≥ 1}.

TIME-VARYING GRAMMAR
Given a grammar G, one can think of applying a set of rules only during a particular period. That is, the entire set of rules in P is not available at every step of a derivation; only a subset of P is available at any time t, i.e., at the i-th step of a derivation.

Definition 13.5

A time-varying grammar of type i, 0 ≤ i ≤ 3, is an ordered pair (G, φ), where G = (N, T, P, S) is a type i grammar and φ is a mapping of the set of natural numbers into the set of subsets of P. (u, i) ⇒ (v, j) holds if and only if:

1. j = i + 1, and

2. there are words u1, u2, x, y over N ∪ T such that u = u1xu2, v = u1yu2, and x → y is a rule in φ(i).

Let ⇒* be the reflexive, transitive closure of ⇒, and define

L(G, φ) = {w | w ∊ T*, (S, 1) ⇒* (w, j) for some j ≥ 1}

A language L is time-varying of type i if and only if L = L(G, φ) for some time-varying grammar (G, φ) of type i.

Definition 13.6

Let (G, φ) be a time-varying grammar, and let F be a subset of the set of productions P. A relation ⇒F on the set of pairs (u, j), where u is a word over N ∪ T and j is a natural number, is defined as follows:

(u, j1) ⇒F (v, j2) holds if

(u, j1) ⇒ (v, j2) holds, or else

j2 = j1 + 1, u = v, and for no production x → y in F ∩ φ(j1) is x a subword of u.

⇒*F is the reflexive, transitive closure of ⇒F. The language generated by (G, φ) with appearance checking for the productions in F is then

L(G, φ, F) = {w | w ∊ T*, (S, 1) ⇒*F (w, j) for some j ≥ 1}

The families of languages of this form without appearance checking, when the rules are context-free (context-free and ε-free) and φ is a periodic function, are denoted by Iλ and I, respectively. With appearance checking, they are denoted by Iλac and Iac, respectively.

Example 13.6. 
Let (G, φ) be a periodically time-varying grammar with

G = (N, T, P, S), where

N = {S, X1, Y1, Z1, X2, Y2, Z2}

T = {a, b}

P = φ (1) ∪ φ (2) ∪ φ (3) ∪ φ (4) ∪ φ (5) ∪ φ (6) where

φ (1) = {S → a X1 aY1aZ1, S → bX1bY1bZ1, X1 → X1, Z2 → Z2}

φ (2) = {X1 → aX1, X1 → bX2, X2 → aX1, X2 → bX2, X1 → ε,X2 → ε}

φ (3) = {Y1 → aY1, Y1 → bY2, Y2 → aY1,Y2 → bY2, Y1 → ε, Y2 → ε}

φ (4) = {Z1 → aZ1, Z1 → bZ2, Z2 → aZ1, Z2 → bZ2, Z1 → ε, Z2 → ε}

φ (5) = {X2 → X2, Y1 → Y1}

φ (6) = {Y2 → Y2, Z1 → Z1}

Some sample derivations are:

(S, 1) ⇒ (aX1aY1aZ1, 2) ⇒ (aaY1aZ1, 3) ⇒ (aaaZ1, 4) ⇒ (aaa, 5)

(S, 1) ⇒ (bX1bY1bZ1, 2) ⇒ (baX1bY1bZ1, 3) ⇒ (baX1baY1bZ1, 4)

⇒ (baX1baY1baZ1, 5) ⇒ (baX1baY1baZ1, 6) ⇒ (baX1baY1baZ1, 7)

⇒ (baX1baY1baZ1, 8) ⇒ (babaY1baZ1, 9) ⇒ (bababaZ1, 10) ⇒ (bababa, 11)

Here, L(G, φ) = {www|w ∊ {a, b}+}.

Example 13.7. 

Let (G, φ) be a periodically time-varying grammar with

G = (N, T, P, S), where:

N = {A, B, C, D, S, A1, A2, B1, B2, C1, C2, D1, D2}


T = {a, b, c, d}

P = φ(1) ∪ φ(2) ∪ ... ∪ φ(8), where

φ (1) = {S → aAbBcCdD, D1 → D, A2 → A}

φ (2) = {A → aA1, A → A2, A → ε}

φ (3) = {B → B1, B → bB2, B → ε}

φ (4) = {C → cC1, C → C2, C → ε}

φ (5) = {D → D1, D → dD2, D → ε}

φ (6) = {A1 → A, B2 → B}

φ (7) = {B1 → B, C2 → C}

φ (8) = {C1 → C, D2 → D}

L(G, φ) = {anbmcndm|n, m ≥ 1}.

REGULAR CONTROL GRAMMARS


Let G be a grammar with production set P and lab(P) be the labels
of productions of P. To each derivation D, according to G, there
corresponds a string over lab(P) (the so-called control string).
Let C be a language over lab(P). We define a language L generated
by a grammar G such that every string of L has a derivation D with
a control string from C. Such a language is said to be a controlled
language.

Definition 13.7

Let G = (N, T, P, S) be a grammar, and let lab(P) be the set of labels of the productions in P. Let F be a subset of P. Let D be a derivation of G and K a word over lab(P). K is a control word of D if and only if one of the following conditions is satisfied:

1. for some strings u, v, u1, u2, x, y over N ∪ T, D: u ⇒ v and K = f, where u = u1xu2, v = u1yu2, and x → y has the label f.

2. for some u, x, y, D is a derivation of the word u alone and K = ε, or else K = f, where x → y has a label f ∊ F and x is not a subword of u.

3. for some u, v, w, K1, K2, D is a derivation u ⇒* v ⇒* w, where K = K1K2, the derivation u ⇒* v uses K1 as its control string, and v ⇒* w uses K2 as its control string.

Let C be a language over the alphabet lab(P). The language generated by G with control language C and appearance checking rules F is defined by:

L(G, C, F) = {w | w ∊ T*, S ⇒* w by a derivation whose control word is in C}

If F = φ, the language generated is without appearance checking and is denoted by L(G, C).

Whenever C is regular and G is of type i, where i = 0, 1, 2, 3, then


G is said to be a regular control grammar of type i.
Let L(i, j, k) denote a family of type i languages with type j control
with k = 0, 1. k = 0 denotes without appearance checking; k =
1 denotes with appearance checking.

Example 13.8. 

Let G = (N, T, P, S) be a regular control grammar where

N = {A, B, C, D, S}

T = {a, b, c, d}

P:

1. S → ABCD

2. A → aA

3. B → bB

4. C → cC

5. D → dD

6. A → a

7. B → b

8. C → c

9. D → d

Then, lab(P) = {1, 2, 3, 4, 5, 6, 7, 8, 9}

Let K = 1(24)*(35)*6789. Clearly, K is regular. Then,

L(G, K) = {a^n b^m c^n d^m | n, m ≥ 1}

Some sample derivations are:

For u = 124356789 ∊ K:
S ⇒ ABCD ⇒ aABCD ⇒ aABcCD ⇒ aAbBcCD ⇒ aAbBcCdD ⇒ aabBcCdD ⇒ aabbcCdD ⇒ aabbccdD ⇒ aabbccdd

For u = 124246789 ∊ K:
S ⇒ ABCD ⇒ aABCD ⇒ aABcCD ⇒ aaABcCD ⇒ aaABccCD ⇒ aaaBccCD ⇒ aaabccCD ⇒ aaabcccD ⇒ aaabcccd

Example 13.9. 

Let G = (N, T, P, S) be a grammar with:

N = {S, A, B, C}

T = {a, b}

P:

1. S → ABC

2. A → aA

3. B → aB

4. C → aC

5. A → bA

6. B → bB

7. C → bC

8. A → a

9. B → a

10. C → a

11. A → b

12. B → b
13. C → b

and lab(P) = {1, 2, ..., 13}.

Let K = 1(234 + 567)*(89(10) + (11)(12)(13)) be a regular control


on G.

L(G, K) = {www|w ∊ {a, b}+}

Theorem 13.1

The family of languages generated by type i grammars, with


regular control and with appearance checking is equal to Li, for i =
0, 3.

The following table consolidates the inclusion relation among the


families of regular control languages.

i L = L(i, 3, 0) L = L(i, 3, 1)

0 L = L0 L = L0

1 L = L1 L = L1

2 L2 ⊆ L L = L0

3 L = L3 L = L3

We state some results without proof.

Theorem 13.2

The family of languages generated (with appearance checking) by type i matrix grammars, the family of languages generated (with appearance checking) by type i periodically time-varying grammars, and the family of languages generated (with appearance checking) by type i programmed grammars are all equal to the family Li of type i languages, for i = 0, 1, 3.

As we have seen earlier, Mλac, Mac, Mλ, and M denote the families of context-free matrix languages with appearance checking and ε-rules, with appearance checking but without ε-rules, without appearance checking but with ε-rules, and without appearance checking and without ε-rules, respectively. Tλac, Tac, Tλ, and T denote the corresponding families of context-free periodically time-varying languages, and Pλac, Pac, Pλ, and P the corresponding families of programmed languages.

Theorem 13.3

M = T = P, Mλ = Tλ = Pλ, Mac = Tac = Pac, and Mλac = Tλac = Pλac.

INDIAN PARALLEL GRAMMARS


In the definitions of matrix, programmed, time-varying, regular control, and random context grammars, only one rule is applied at any step of a derivation. In this section, we consider parallel application of rules in a context-free grammar (CFG).

Definition 13.8
An Indian parallel grammar is a 4-tuple G = (N, T, P, S), where the components are as defined for a CFG. We say that x ⇒ y holds in G for strings x, y over N ∪ T if

x = x1Ax2A ... xnAxn+1, A ∊ N, xi ∊ ((N ∪ T) − {A})* for 1 ≤ i ≤ n + 1,

y = x1wx2w ... xnwxn+1, A → w ∊ P.

That is, if a sentential form x has n occurrences of the nonterminal A, and if A → w is to be used, it is applied to all the A's in x simultaneously. ⇒* is the reflexive, transitive closure of ⇒.

L(G) = {w | w ∊ T*, S ⇒* w}

Example 13.10. 

We consider the Indian parallel grammar:

G = ({S}, {a}, {S → SS, S → a}, S).

Some sample derivations are:

S ⇒ a,

S ⇒ SS ⇒ aa,

S ⇒ SS ⇒ SSSS ⇒ aaaa, and

L(G) = {a^(2^n) | n ≥ 0}.

It is clear from this example that some non-context-free languages


can be generated by Indian parallel grammars.
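Parallel application is easy to mimic. A minimal sketch (ours, not from the text): one derivation step rewrites every occurrence of the chosen nonterminal at once, which for this single-nonterminal grammar is exactly what str.replace does:

def ip_step(x, lhs, rhs):
    # Rewrite all occurrences of lhs simultaneously.
    return x.replace(lhs, rhs)

x = 'S'
x = ip_step(x, 'S', 'SS')   # S => SS
x = ip_step(x, 'S', 'SS')   # SS => SSSS
x = ip_step(x, 'S', 'a')    # SSSS => aaaa
print(x)                    # 'aaaa' -- the lengths reached are powers of 2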

The other way round, the question is: Can all context-free
languages (CFL) be generated by Indian parallel grammars? Since
the first attempt to solve this was made in (Siromoney and
Krithivasan, 1974), this type of grammar is called an Indian
parallel grammar. We state the following theorem without proof.

Theorem 13.4

The Dyck language (which is a CFL) cannot be generated by an


Indian parallel grammar.

Marcus Contextual Grammars


Marcus defined contextual grammars in 1969. The implicit motivation for these new generative devices lay in the concepts of descriptive linguistics. S. Marcus first introduced what are known as 'external contextual grammars (ECG),' and variants like 'internal contextual grammars (ICG)' and total contextual grammars were developed later. The power of contextual grammars has been compared with that of Chomskian grammars and some other grammars (Păun, 1997). Research on this branch of formal language theory is now well developed.

In a CFG, rules are always of the form A → α. The application of such a rule to a word containing A does not depend on the context: any word of the form w1Aw2 yields w1αw2 on one application of A → α. A CSG, in contrast, contains rules of the form w1Aw2 → w1αw2, which are sensitive to the contexts w1 and w2; one cannot apply such a rule to an arbitrary word containing A as in the context-free case. Thus, in the Chomskian models of re-writing, contexts may or may not play a role.

Grammars, then, are devices for computing a specified collection of strings, and there are various models for generating a set of strings. Contextual grammars are one such model: strings are adjoined to existing strings to generate longer strings.

Definitions and Examples


Here, we will introduce basic definitions of contextual grammars
and illustrate the models with examples.

Definition 13.9

A contextual grammar with choice is a construct:

G = (V, A, C, ψ),

where V is an alphabet, A is a finite language over V, C is a finite subset of V* × V*, and ψ: V* → 2^C.

The strings in A are called axioms, and the elements (u, v) of C are called contexts. ψ is a selection or choice mapping.

For x, y ∊ V*, we define:

x ⇒ex y, if and only if y = uxv for a context (u, v) ∊ ψ(x), and

x ⇒in y, if and only if x = x1x2x3 and y = x1ux2vx3 for some x1, x2, x3 ∊ V* and (u, v) ∊ ψ(x2).

x ⇒ex y means that the derivation of y from x is direct and external in G: the context (u, v) is adjoined externally to x to reach y, provided ψ(x) contains (u, v).

x ⇒in y means that the derivation of y from x is direct and internal in G: the context (u, v) is inserted as specified by ψ; that is, if ψ(x2) contains (u, v) and x = x1x2x3, then y = x1ux2vx3.

As with derivations in Chomsky grammars, ⇒*ex and ⇒*in are the reflexive, transitive closures of ⇒ex and ⇒in, respectively.

Lex(G) = {y ∊ V* | x ⇒*ex y for some x ∊ A} is the language generated by G in the external mode; here, G may be called an ECG. Lin(G) = {y ∊ V* | x ⇒*in y for some x ∊ A} is the language generated by G in the internal mode; here, G may be called an ICG.

Example 13.11. 

Consider the grammar G = ({a, b}, {a}, {(a, b)}, ψ), where ψ(a) = {(a, b)} = ψ(b) and ψ(x) = φ otherwise. Then, the languages generated by G are:

Lex(G) = {a, aab} and Lin(G) = {a, aab, aabab, aaabb, ...}.

Some sample derivations are:

a ⇒ex aab; a ⇒in aab ⇒in aabab; aab ⇒in aaabb.

Remark. In the above example, one can see that the same grammar has been made to act in two different derivation modes. Hence, the languages generated are different.

One can see that the re-writing in a contextual grammar depends on the selection mapping ψ: if (u, v) ∊ ψ(x), the string x may be re-written as uxv. Hence, a contextual grammar can also be represented as a 3-tuple G′ = (V, A, P), where V and A are as before and P is a finite set of pairs (S, C), where S ⊆ V*, S ≠ φ, and C is a finite subset of V* × V*, the contexts. The pair (S, C) is called a contextual production or rule. G′ is then said to be in modular form. Any contextual grammar has an equivalent modular form and vice versa. The grammar G′ = (V, A, P) is said to be the canonical modular variant of the contextual grammar G.
Definition 13.10

A total contextual grammar is a construct G = (V, A, C, ψ), where V, A, C are as in the above definition of a contextual grammar, and ψ is a mapping V* × V* × V* → 2^C.

The ⇒ relation is defined as below:

x ⇒ y iff x = x1x2x3 and y = x1ux2vx3, where (u, v) ∊ ψ(x1, x2, x3).

⇒* is the reflexive, transitive closure of ⇒.

An ECG is a total contextual grammar with ψ(x1, x2, x3) = φ for all x1, x2, x3 ∊ V* with x1x3 ≠ ε. If ψ(x1, x2, x3) = ψ(x′1, x2, x′3) for all x1, x′1, x3, x′3, x2 ∊ V* in a total contextual grammar, then G is an ICG.

Example 13.12. 

Let G = ({a, b}, {aba}, {(a, a)}, ψ) be a total contextual grammar, where ψ(x1, b, x3) = {(a, a)} for x1, x3 ∊ a+, and ψ(x1, x2, x3) = φ otherwise. Then, some sample derivations are:

a↑b↑a ⇒ aa↑b↑aa ⇒ aaabaaa.

Hence, L(G) = {a^n ba^n | n ≥ 1}.

Definition 13.11

A contextual grammar G = (V, A, C, ψ) is said to be without choice


if ψ(x) = C for all x ∊ V*.
In such a case, ψ can be ignored and the contextual grammar is
simply G = (V, A, C). ⇒ and ⇒* are then defined as below:

x ⇒ex y if and only if y = uxv for some (u, v) ∊ C

x ⇒in y if and only if x = x1x2x3 and y = x1ux2vx3 for some (u, v) ∊ C.

Example 13.13. 

Consider the contextual grammar G = ({a, b}, {ε}, {(a, b)}). Clearly,


G is a grammar without choice.

Lex(G) = {a^n b^n | n ≥ 0},

whereas Lin(G) = D{a,b}, the Dyck language over {a, b}. For G, in the external mode, the context (a, b) is always attached around a word w derived previously.

Hence, Lex(G) = {a^n b^n | n ≥ 0}.

For G, in the internal mode, (a, b) is inserted anywhere in the


word derived so far.

NOTATIONS
Let us represent the languages of contextual grammars in the
following way:

1. TC = the family of languages generated by total contextual grammars with choice.

2. ECC (EC) = the family of languages generated by contextual grammars in the external mode with choice (without choice).

3. ICC (IC) = the family of languages generated by contextual grammars in the internal mode with choice (without choice).

4. The families of finite, regular, linear, context-free, context-sensitive, and recursively enumerable languages are denoted by FIN, REG, LIN, CF, CS, RE, respectively.

Generative Capacity
In this section, we prove some basic results that show the
generative power of contextual grammars. In other words, we
obtain the relations between families of contextual languages with
families in the Chomskian hierarchy.

Theorem 13.5

Every finite language is a contextual language in EC.

Proof. Let L be a finite language over an alphabet V. Then, the contextual grammar (V, L, P) with P = φ generates L.

The converse of the above proposition is not true, as can be seen from some of the previous examples.

Theorem 13.6

There exists a contextual language in EC which is not regular.

Proof. Consider the language L = {a^n b^n | n ≥ 1}. The contextual grammar G = (V, A, P), where V = {a, b}, A = {ab}, P = {(φ, (a, b))}, generates L in the external mode. L is not regular, but it is a contextual language in EC.
Theorem 13.7

Every ECL is a CFL.

Proof. If L is a contextual language and L is finite, L is in REG, and hence in CF.

Suppose L is contextual and L is infinite. Then, there exists a contextual grammar G = (V, A, P) such that A ≠ φ, and P contains pairs of the form (Si, Ci), where Ci is a non-null context; that is, if Ci = (ui, vi), then at least one of ui, vi is not ε. Clearly, Si = φ. Let Ci = (ui, vi), 1 ≤ i ≤ n, be these contexts.

Let G′ = (N, V, P′, S) be the CFG with N = {S} such that:

1. for A = {x1, ..., xk}, P′ contains the rules:

S → x1
S → x2
...
S → xk

2. for Ci = (ui, vi), 1 ≤ i ≤ n, P′ contains:

S → u1Sv1
S → u2Sv2
...
S → unSvn

Clearly, G′ is context-free.

To prove L(G) ⊆ L(G′), let w ∊ L(G). Then, w is of the form

w = ui1^j1 ui2^j2 ... xl ... vi2^j2 vi1^j1

where the context (ui1, vi1) is applied j1 times, (ui2, vi2) is applied j2 times, and so on, with xl ∊ A. In G′, start with the rule S → ui1Svi1 and apply it j1 times, then apply S → ui2Svi2 j2 times, and so on, and finally apply S → xl once to get w. By retracing the argument from the CFG G′ back to G, one can see that L(G′) ⊆ L(G). Hence, L(G) = L(G′).

Remark. If L is a CFL and L is infinite, the set of integers representing the lengths of the strings in L contains an infinite arithmetic progression. The sequence {n^2}, n = 1, 2, ..., contains no subsequence which is an infinite arithmetic progression. Hence, {a^(n^2) | n ≥ 1} is not context-free, and hence not an external contextual language (not in EC).

The converse of the above theorem is not true.

Theorem 13.8

There exists a language L ∊ REG which is not in EC.

Proof. Consider L = {ab^m cab^n | n, m ≥ 1}. It is not difficult to see that L is in REG. But L is not in EC. Suppose there is an ECG G = (V, A, P) generating L in the external manner, with P = {(Si, Ci) | 1 ≤ i ≤ n} and Si = φ. Any word generated by G is of the form

w = uir^jr ... ui1^j1 X vi1^j1 ... vir^jr

where X ∊ A, j1, j2, ..., jr are positive integers, and the (uij, vij) are contexts. Since in any word w ∊ L, c occurs only once and a occurs only twice, these letters must come from the axiom, and the portion between the two a's belongs to X. Hence, X is of the form

X = Y ab^m caZ,

where Y = ε and Z is of the form b^l; that is, X = ab^m cab^l, m > 0. The adjoined contexts can therefore contain only b's, and since every word of L begins with a, each context must be of the form (ε, b^j); thus the value of m in a derived word is fixed by its axiom. Since m ranges over all positive integers, this implies that A is infinite, which cannot happen. Hence, L is not in EC.

The following follows from Theorem 13.8, since every regular language is context-free.

Theorem 13.9

There exists a CFL which is not in EC.

In order to compare the variants of contextual grammars among themselves, we need to understand the structure of the strings they generate. We therefore discuss some necessary conditions that a contextual language must satisfy; these properties will also help us compare the variants of contextual languages with one another.

Property 1. Let L be a language over V. L is said to have the external bounded step (EBS) property if there is a constant p such that for each x ∊ L with |x| > p, there is a y ∊ L such that x = uyv and 0 < |uv| ≤ p.

Example 13.14. 

Let L = {a^n b^n | n ≥ 1}. Here p = 2: if x = a^n b^n with n > 1, then x can be written as a(a^(n−1) b^(n−1))b, where y = a^(n−1) b^(n−1) ∊ L and uv = ab. Hence, L satisfies the EBS property.

Property 2. Let L be a language over V. L satisfies the internal bounded step (IBS) property if there is a constant p such that for each x ∊ L with |x| > p, there is a y ∊ L such that x = x1ux2vx3, y = x1x2x3, and 0 < |uv| ≤ p.

Example 13.15. 

Let L = {a^n b^m | n, m ≥ 1}. For x = a^n b^m = a^(n−2) a (ab) b b^(m−2), take u = a and v = b; then y = a^(n−2)(ab)b^(m−2) = a^(n−1) b^(m−1) ∊ L, and p = 2. Hence, L satisfies the IBS property.

Property 3. Let L be a language over V. L is said to satisfy


‘bounded length increase’ (BLI) property, if there is a
constant p such that for each x ∊ L, |x| > p, there is a y ∊ L with 0 <
|x|−|y| ≤ p.

Example 13.16. 

Let L = {a^n b^n c^n d^n e^n | n ≥ 1}. This language has the BLI property: for x = a^n b^n c^n d^n e^n, take y = a^(n−1) b^(n−1) c^(n−1) d^(n−1) e^(n−1), so that |x| − |y| = 5, and p = 5 here.

Remark. If L is a language satisfying both EBS and IBS property,


then it also satisfies BLI property. But by our previous example,
one can see that the converse need not be true.

Theorem 13.10

All contextual languages in ICC satisfy IBS property.

Proof. Let L ⊆ V* have a contextual grammar G = (V, A, P) with P = {(Si, Ci)}.

Let p′ = max{|x| : x ∊ A} and p″ = max{|uv| : (u, v) ∊ Ci for some i}, and choose p = max{p′, p″}.

For any z ∊ L(G) with |z| > p, z is not an axiom, so its derivation ends with the insertion of some context: z = x1ux2vx3 for some (u, v) ∊ Ci with x2 ∊ Si, (Si, Ci) ∊ P. Then y = x1x2x3 ∊ L(G) and 0 < |uv| ≤ p.

Hence, all languages in ICC satisfy the IBS property.

Example 13.17. 

Let L = {a^n b^n | n ≥ 1}. Suppose L is in IC. Then, there is a contextual grammar without choice mapping generating it, say G = ({a, b}, A, C), where C is a collection of contexts. Since A ⊂ L, suppose ab ∊ A. The elements of C must be of the form (a, b), so that inserting this into 'ab' in the internal mode gives a^2 b^2, and using the same context again gives a^3 b^3.

Since G is without choice, however, the context can also be inserted at wrong positions, giving words such as abab ∉ L. One could instead attach (a^2, b^2) ∊ C to get a^3 b^3, but then other words such as a^3 ba^2 b^3 ∉ L are also obtained. That is, it is not possible to restrict the splitting of a word as x1x2x3 so that x1x2x3 = a^n b^n always yields x1ax2bx3 = a^(n+1) b^(n+1). Hence, {a^n b^n | n ≥ 1} ∉ IC.

Example 13.18. 

Consider the contextual grammar G = ({a, b}, {ab}, P), where P = (S, C) with S = {ab} and C = {(a, b)}. The typical internal-mode derivations are:

ab ⇒ aabb ⇒ aaabbb ⇒ ...

(the context (a, b) is always inserted around the central substring ab). Hence, L = {a^n b^n | n ≥ 1} ∊ ICC.


Theorem 13.11

IC ⊊ ICC

Proof. Inclusion follows from the definitions. For proper inclusion, one can see that L = {a^n b^n | n ≥ 1} ∉ IC but L ∊ ICC.

Theorem 13.12

L = {a^n | n ≥ 1} ∪ {b^n | n ≥ 1} is not in EC.

Proof. Suppose L is in EC. Then, there exists a contextual grammar G = (V, A, C) without choice such that L(G) = L. Clearly, V = {a, b}. Since L is infinite, there has to be recursion through C in generating L. Since A is finite, without loss of generality we may assume A = {a, b}. To get a+ from a in the external mode, we need contexts like (ε, a) or (a, ε). But as G is a contextual grammar without choice, such contexts also apply to words in b+, and we obtain strings like ba, ab, aba, etc., that are not in L. A similar argument holds for b+. Hence, L cannot be in EC.

Example 13.19. 

Let G = (V, A, P) be a contextual grammar with V = {a, b} and A = {a, b}. Let P = {(a^i, (ε, a)), (b^i, (ε, b)) | i ≥ 1} be the choice mapping; that is, ψ(a^i) = {(ε, a)} and ψ(b^i) = {(ε, b)} for i ≥ 1.

Clearly, in the external mode, L(G) = a+ ∪ b+.

Some sample derivations are:

a ⇒ aa ⇒ aaa, b ⇒ bb ⇒ bbb.

Hence, L(G) ∊ ECC.

Theorem 13.13

EC ⊊ ECC.

Proof. Inclusion follows from the definitions. For proper inclusion, we have seen that L = a+ ∪ b+ ∊ ECC but is not in EC.

Theorem 13.14

L = {a^n b^n c^n | n ≥ 1} ∉ ICC.

Proof. Suppose L ∊ ICC. Then, there exists an ICG G with choice such that L(G) = L. Let G = (V, A, P) with V = {a, b, c}; without loss of generality, let A = {abc}. P must be constructed suitably to generate a^n b^n c^n: any word in L is of the form a^n b^n c^n, which has to be split as x1x2x3 so that one can identify the choice word x2, around which the context is inserted. Suppose w = a^n b^n c^n = x1x2x3 is such that x1ux2vx3 ∊ L and x1ux2vx3 = a^(n+1) b^(n+1) c^(n+1) for (u, v) ∊ ψ(x2). But since the selection depends only on x2, this implies x1u^i x2v^i x3 ∊ L for all i, which is not true for any choice mapping.

Example 13.20. 

Consider the TC grammar G = ({a, b, c}, A, P), where A = {abc} and P = (Si, Ci) with Si = (a^i, b^i, c^i) and Ci = {(ab, c)} for i ≥ 1; that is, ψ(a^i, b^i, c^i) = {(ab, c)}.

Some sample derivations are:

abc ⇒ aabbcc ⇒ aaabbbccc ⇒ ...

Clearly, L(G) = {a^n b^n c^n | n ≥ 1}.

Theorem 13.15
ICC ⊊ TC.

Proof. Inclusion follows from the definitions. For proper inclusion, we can see that L = {a^n b^n c^n | n ≥ 1} is in TC but not in ICC.

Theorem 13.16

L = {ab^n a | n ≥ 1} ∉ ECC.

Proof. Suppose L is in ECC. Then, there exists an ECG G = (V, A, P) with choice generating L. Clearly, V = {a, b}; without loss of generality, A = {aba}. Since L is infinite, there exists a word w ∊ L such that w = uw′v, where (u, v) ∊ ψ(w′) and w′ ∊ L. If w = ab^n a, then w′ must satisfy w′ ∊ L and uw′v = ab^n a, which means w′ is also of the form ab^m a. But the two a's of w are at its ends, so u and v can contain no a's and must both be ε, contradicting w ≠ w′. Hence, such a w cannot be generated, and L is not in ECC.

Example 13.21. 

Consider the TC grammar G = (V, A, P), where V = {a, b}, A = {aba}, and P contains ((a, b, a), (ε, b)) and ((a, b, b^i a), (ε, b)) for i ≥ 1. Then, L(G) = {ab^n a | n ≥ 1}.

Some sample derivations are:

a↑b↑a ⇒ a↑b↑ba⇒a↑b↑bba ⇒ abbba.

Theorem 13.17

ECC ⊊ TC.

Proof. Inclusion follows from the definitions. For proper inclusion, we see that ab+a = {ab^n a | n ≥ 1} is in TC but not in ECC.
Closure and Decidability Properties of
Contextual Languages
We study the closure properties of the contextual language families IC, EC, ICC, ECC, and TC. One observes that since contextual grammars do not use nonterminals, these families are not closed under many of the usual operations. Let us first state a positive result for TC without proof (Păun, 1997).

Theorem 13.18

TC is closed under substitution.

Using this theorem, one can show that TC is closed under union, catenation, Kleene +, finite substitution, and morphism. One can see that the languages L1 = {1, 2}, L2 = {12}, and L3 = 1+ are in TC. Hence, if σ is a substitution with σ(1) = L and σ(2) = L′, where L and L′ are arbitrary TC languages, then σ(L1) = L ∪ L′, σ(L2) = LL′, and σ(L3) = L+. By the above theorem, all of these are in TC. Closure under morphism follows from the definition of substitution.

Theorem 13.19

EC, IC, ICC are not closed under union and substitution.

Proof. Consider L1 = {a^n | n ≥ 1} and L2 = {b^n | n ≥ 1}. L1, L2 ∊ EC (IC). But we have seen previously that L1 ∪ L2 ∉ EC (IC).

For the non-closure of ICC, consider L3 = {a^n b^n | n ≥ 1} ∊ ICC. L1 ∪ L3 = {a^n | n ≥ 1} ∪ {a^n b^n | n ≥ 1} ∉ ICC. Since EC, IC, and ICC are not closed under union, for L0 = {1, 2} with σ(1) = L1 and σ(2) = L2, we get σ(L0) = L1 ∪ L2 ∉ EC (IC), and similarly for ICC.
One can construct examples to see that none of the families EC, IC, ICC, and ECC is closed under concatenation. The closure properties are listed in Table 13.1; we have given proofs for selected ones.

Table 13.1. 

  IC EC ICC

Union No No No

Intersection No No No

Complement No No No

Concatenation No No No

Kleene+ No No No

Morphisms No Yes No

Finite substitution No Yes No

Substitution No No No

Intersection with regular languages No No No

Inverse morphisms No No No

Shuffle No No No

Mirror image Yes Yes Yes

We consider some basic decidability results of contextual


languages.

For all contextual grammars G = (V, A, P), since L(G) includes A, we have L(G) ≠ φ if and only if A is not empty. Hence, we have:

Theorem 13.20
Emptiness problem is decidable for all contextual languages.

For any external/internal contextual grammar without choice G = (V, A, P), we have seen that L(G) is infinite if and only if A ≠ φ and some context (α, β) occurring in P satisfies the condition that at least one of α, β is not ε. Hence, we have:

Theorem 13.21

The finiteness problem is decidable for grammars corresponding to


languages of IC and EC.

Theorem 13.22

The membership problem is decidable for grammars


corresponding to languages of IC and EC.

Proof. Let G = (V, A, P) be an external/internal contextual grammar, the members of P being of the form (S, C) with contexts (u, v) ∊ C.

Let us construct the following sets recursively:

K0(G) = A

Ki(G) = {x ∊ V* | w ⇒ x for some w ∊ Ki−1(G)} ∪ Ki−1(G), i ≥ 1.

For a given x over V*, x ∊ L(G) if and only if x ∊ K|x|(G), where |x| is the length of x (each derivation step inserts a non-null context and so strictly increases the length). Such a computation can be done in a finite number of steps. Hence the result.
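The iteration is immediate to program. A sketch (ours, not from the text) for an internal contextual grammar without choice, keeping only words no longer than the target:

def member(x, axioms, contexts):
    # K grows as in the proof; words longer than x can never shrink back.
    K = set(a for a in axioms if len(a) <= len(x))
    for _ in range(len(x)):
        new = set()
        for w in K:
            for u, v in contexts:
                for i in range(len(w) + 1):
                    for j in range(i, len(w) + 1):
                        y = w[:i] + u + w[i:j] + v + w[j:]
                        if len(y) <= len(x):
                            new.add(y)
        K |= new
    return x in K

print(member('aabb', ['ab'], [('a', 'b')]))   # True
print(member('abab', ['ab'], [('a', 'b')]))   # True in the internal mode
print(member('ba', ['ab'], [('a', 'b')]))     # False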

One can see from the definitions of ECC, ICC, and TC that the grammars generating them have a choice mapping which may or may not be computable. Hence, we have:

Theorem 13.23
The membership problem is undecidable for grammars with
arbitrary selection mapping.

Lindenmayer Systems
L-systems were defined by Lindenmayer in an attempt to describe
the development of multi-cellular organisms. In the study of
developmental biology, the important changes that take place in
cells and tissues during development are considered. L-systems
provide a framework within which these aspects of development
can be expressed in a formal manner. L-systems also provide a way
to generate interesting classes of pictures by generating strings and
interpreting the symbols of the string as the moves of a cursor.

From the formal language theory point of view, L-systems differ


from the Chomsky grammars in the following three ways:

1. Parallel re-writing of symbols is done at every step. This is the major difference.

2. There is no distinction between nonterminals and terminals (in extended L-systems, such a distinction is introduced).

3. The starting point is a string called the axiom.
Definition 13.12

A 0L system is an ordered triple π = (V, w0, P) where V is an


alphabet, w0 a non-empty word over V which is called the axiom or
initial word; and P is a finite set of rules of the form a →
α, a ∊ V and α ∊ V*. Furthermore, for each a ∊ V, there is at least
one rule with a on the left-hand side (This is called the
completeness condition).
The binary relation ⇒ is defined as follows: if a1 ... an is a string over V and ai → wi are rules in P, then

a1 ... an ⇒ w1 ... wn

⇒* is the reflexive, transitive closure of ⇒. The language generated by the 0L system is:

L(π) = {w ∊ V* | w0 ⇒* w}

Definition 13.13

A 0L system π = (V, w0, P) is deterministic if for every a ∊ V, there is exactly one rule in P with a on the left-hand side. It is propagating (ε-free) if ε is not on the right-hand side of any production. The notations D0LS, P0LS, and DP0LS are used for these systems.

The languages generated by these systems are called D0L, P0L, and DP0L languages, respectively.

Example 13.22. 

Consider the following DP0L system:

π1 = ({a, b}, ab, {a → aa, b → bb})

The derivation steps are:

ab ⇒ aabb ⇒ aaaabbbb ⇒...

L(π1) = {a^(2^n) b^(2^n) | n ≥ 0}
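Parallel rewriting makes a D0L system a one-line interpreter. A sketch (ours, not from the text):

def d0l_step(w, rules):
    # Rewrite every symbol simultaneously using its unique rule.
    return ''.join(rules[ch] for ch in w)

rules = {'a': 'aa', 'b': 'bb'}   # the system pi_1 above
w = 'ab'
for _ in range(3):
    w = d0l_step(w, rules)
    print(w)                     # aabb, aaaabbbb, aaaaaaaabbbbbbbb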

Example 13.23. 
Consider the following DP0L system:

π2 = (Σ, 4, P),

where Σ = {0, 1, 2, ..., 9, (,)}

P has the rules:

0 → 10
1 → 32
2 → 3(4)
3 → 3
4 → 56
5 → 37
6 → 58
7 → 3(9)
8 → 50
9 → 39
( → (
) → )

The 10 steps in the derivation are given below:

1 4

2 56

3 3758

4 33(9)3750

5 33(39)33(9)3710
6 33(339)33(39)33(9)3210

7 33(3339)33(339)33(39)33(4)3210

8 33(33339)33(3339)33(339)33(56)33(4)3210

9 33(333339)33(33339)33(3339)33(3758)33(56)33(4)3210

10 33(3333339)33(333339)33(33339)33(33(9)3750)33(3758)33(56)33(4)3210

If the symbols are interpreted as cells, with the portion within ( ) representing a branch, we get a branching growth pattern at each step (the drawings of the patterns for steps 8, 9, and 10 are not reproduced here).

The following hierarchy can easily be seen from the definitions: DP0L ⊆ D0L ⊆ 0L and DP0L ⊆ P0L ⊆ 0L.
It can be seen that the family of 0L languages is not closed under any of the usual (AFL) operations: union, concatenation, Kleene closure, ε-free homomorphism, intersection with regular
sets, and inverse homomorphism. For example, even though {a}
and {aa} are 0L languages generated by 0L systems with
appropriate axioms and rule a → a, we find their union {a, aa} is
not a 0L language. If it is a 0L language, the 0L system generating
it will have as axiom either a or aa. If a is the axiom, to
generate aa, there must be a rule a → aa, in which case other
strings will also be generated. If aa is the axiom, to generate a, we
must have a → a and a → ε, in which case ε will also be generated.
In a similar manner, we can give examples for nonclosure under
other operations.

This also shows that there are finite languages which are not 0L.

Theorem 13.24

Every P0L language is context-sensitive.


Proof. Let π = (V, w0, P) be a P0L system. We construct a CSG to generate

{XwX | w ∊ L(π)}, where X ∉ V.

Let G = (N, T, P′, S) be the required type 1 grammar, where

T = V ∪ {X}

N = {X0} ∪ {<a, L>, <a, R> | a ∊ V} ∪ {<R, X>}

S = X0, and P′ consists of the following rules:

X0 → Xw0X

Xa → X<a, L> for a ∊ V

<a, L>b → wa<b, L> whenever a → wa ∊ P, b ∊ V

<a, L>X → wa<R, X> whenever a → wa ∊ P

a<R, X> → <a, R>X

b<a, R> → <b, R>a

X<b, R> → Xb

Initially, Xw0X is generated. In one left-to-right sweep, each symbol in w0 is replaced by one of its right-hand sides. When the marker L reaches the right X, it changes to R, and by making a right-to-left sweep it returns to the left X, after which the process repeats.
We have shown that if L ⊆ V* is a P0L language, then L′ is a type 1
language where L′ = {XwX| w ∊ L} and X ∉ V. It can be easily seen
that if L′ is a CSL, then so is L. This follows from a result known as
workspace theorem (Salomaa, 1973).

The following result can be proved even though the proof is slightly
more involved.

Theorem 13.25

Every 0L language is context-sensitive.

The development rules describing the growth of an organism need not always be the same. For example, the growth of a plant may be described by one set of rules in the daytime and by a different set of rules during the night; there may be different sets of rules for summer, for winter, etc. This idea is captured by tabled systems.

Definition 13.14

A tabled 0L (T0L) system is an ordered triple Tπ = (V, w0, P), where V is an alphabet, w0 is the axiom, and P is a finite set of tables. Each table contains rules of the form a → α for a ∊ V, and each table has at least one rule with a on the left-hand side for each a ∊ V (completeness condition). α ⇒ β is a derivation step if α = a1 ... an, β = β1 ... βn, and ai → βi ∊ t for a single table t ∊ P; i.e., in one step, only rules from the same table may be used. ⇒* is the reflexive, transitive closure of ⇒. The language generated is:

L(Tπ) = {w ∊ V* | w0 ⇒* w}

Example 13.24. 

Consider the T0L system:

Tπ = ({a}, a, {{a → a^2}, {a → a^3}}).

L(Tπ) = {a^i | i = 2^m 3^n for m, n ≥ 0}

A T0L system is deterministic, if each table has exactly one rule for


each a ∊ V. It is propagating if ε-rules are not allowed. We can
hence talk about DT0L systems, PT0L systems, DPT0L systems,
and the corresponding languages.

Extended System
The non-distinction between nonterminals and terminals affects the closure properties. From the formal-language-theoretic point of view, extended systems are therefore defined, which make the resulting families closed under many operations. Here, the system has two sets of symbols (terminals and nonterminals), or a total alphabet and a target alphabet.

Definition 13.15

An E0L system is defined as a 4-tuple G = (V, Σ, w0, P), where (V, w0, P) is a 0L system and Σ ⊆ V is the target alphabet. ⇒ and ⇒* are defined in the usual manner. The language generated is defined as:

L(G) = {w ∊ Σ* | w0 ⇒* w}

In a similar manner, ET0L systems can be defined by specifying


the target alphabet.

Example 13.25. 

Let G = ({S, a}, {a}, S, {S → a, S → aa, a → a}) be an E0L system. The language generated is {a, aa}, which is not a 0L language.

Systems with Interactions


Definition 13.16
A 2L system is an ordered 4-tuple H = (V, w0, P, $), where V and w0 are as in a 0L system, $ ∉ V is the input from the environment, and P is a finite set of rules of the form <a, b, c> → w, with b ∊ V, a, c ∊ V ∪ {$}, w ∊ V*. ⇒ is defined as follows:

if a1 ... an is a sentential form, then a1 ... an ⇒ α1 ... αn if (ai−1, ai, ai+1) → αi is in P for 2 ≤ i ≤ n − 1, and ($, a1, a2) → α1, (an−1, an, $) → αn ∊ P.

⇒* is the reflexive, transitive closure of ⇒; i.e., for re-writing a symbol, the left and right neighbors are also considered. The language generated is defined as L(H) = {w ∊ V* | w0 ⇒* w}. If only the right neighbor (or only the left neighbor) is considered, the system is called a 1L system. That is, a 2L system is a 1L system if and only if one of the following conditions holds:

1. for all a, b, c in V ∪ {$}, P contains (a, b, c) → α if and only if P contains (a, b, d) → α for all d ∊ V ∪ {$}, or

2. (a, b, c) → α is in P if and only if for all d ∊ V ∪ {$}, (d, b, c) → α is in P.

The corresponding languages are called 2L and 1L languages.

Example 13.26. 

Consider 2L system:
H2 = ({a, b}, a, P, $),

where P is given by:

($, a, $) → a^2 | a^3 | a^3b | ba^3

(x, b, y) → b | b^2 for x, y ∊ {a, b, $}

(a, a, a) → a

(a, a, b) → a

($, a, a) → a

L(H2) = {a, a^2, a^3} ∪ a^3b* ∪ b*a^3

The hierarchy of the language families generated can be represented as a figure (not reproduced here).

Deterministic 0L systems are of interest because they generate a sequence of strings, called a D0L sequence: each step produces a unique next string. The lengths of the strings in the sequence may define a well-known function, called the growth function. Consider the following examples.
Example 13.27. 

Consider the following 0L systems:

1. S = ({a}, a, {a → a2}), we have:

  L(S) = {a2n | n ≥ 0}.

The above language is a DP0L-language. The growth function


is f(n) = 2n.

2. S = ({a, b}, a, {a → b, b → ab}), the words in L(S) are:

  a, b, ab, bab, abbab, bababbab,....

The lengths of these words are the Fibonacci numbers.

3. S = ({a, b, c}, a, {a → abcc, b → bcc, c → c}), the words in L(S) are:

  a, abcc, abccbcccc, abccbccccbccccc,....

The lengths of these words are the squares of natural numbers.

4. S = ({a, b, c}, a, {a → abc, b → bc, c → c}), the words in L(S) are:

  a, abc, abcbcc, abcbccbccc, abcbccbcccbcccc,....

The lengths of these words are the triangular numbers.

5. S = ({a, b, c, d}, a, {a → abcd⁵, b → bcd⁵, c → cd⁶, d → d}), the words in L(S) are:

  a, abcd⁵, abcd⁵bcd⁵cd⁶d⁵, ....

The lengths of these words are the cubes of natural numbers.
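Since a D0L system rewrites deterministically, its sequence and growth function are easy to compute. The sketch below (ours, not from the text) reproduces the Fibonacci lengths of system 2 above:

    def d0l_sequence(axiom, rules, steps):
        # Generate the D0L sequence w0, w1, ... by deterministic parallel
        # rewriting; 'rules' maps each symbol to its unique right-hand side.
        seq = [axiom]
        for _ in range(steps):
            seq.append(''.join(rules[c] for c in seq[-1]))
        return seq

    seq = d0l_sequence('a', {'a': 'b', 'b': 'ab'}, 7)
    print(seq[:5])                  # ['a', 'b', 'ab', 'bab', 'abbab']
    print([len(w) for w in seq])    # [1, 1, 2, 3, 5, 8, 13, 21] -- Fibonacci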

Two D0L systems are growth equivalent if they have the same growth function. Let G = (V, w0, P) be a D0L system with V = {a1, ..., an}. Let π be the n-dimensional row vector which is the Parikh mapping of w0, let M(P) be the n × n matrix whose ith row is the Parikh mapping of αi, where ai → αi ∊ P, and let η be the n-dimensional column vector consisting of 1’s. Then the growth function is given by f(n) = πM^nη.

Example 13.28. 

Consider the D0L system G = ({a, b, c}, a, {a → abcc, b → bcc, c → c}).

The strings generated in a few steps and their lengths are given
below.

w0 a

step 1 abcc

step 2 abccbcccc

step 3 abccbccccbccccc

It is obvious that f(n) = (n + 1)².

Here, π = (1, 0, 0), and M(P) has rows (1, 1, 2), (0, 1, 2), and (0, 0, 1), the Parikh vectors of abcc, bcc, and c. Thus πMη = 4, πM²η = 9, and so on.
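The matrix formula f(n) = πM^nη can be checked mechanically. The following sketch (plain Python, no libraries; ours, not from the text) multiplies the Parikh row vector of Example 13.28 by powers of the growth matrix:

    def growth(pi, M, n):
        # f(n) = pi * M^n * eta, where eta is the all-ones column vector.
        v = list(pi)
        for _ in range(n):
            v = [sum(v[i] * M[i][j] for i in range(len(v)))
                 for j in range(len(M[0]))]
        return sum(v)

    # Rows of M(P) are the Parikh vectors of abcc, bcc, c (ordering a, b, c):
    M = [[1, 1, 2],
         [0, 1, 2],
         [0, 0, 1]]
    print([growth([1, 0, 0], M, n) for n in range(5)])   # [1, 4, 9, 16, 25]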

The growth function f(n) is termed malignant if and only if there is no polynomial p(n) such that f(n) ≤ p(n) for all n.

Applications
L-systems are used for a number of applications in computer imagery. They are used in the generation of fractals and plants, and for object modeling in three dimensions. Applications of L-systems can be extended to reproduce traditional art and to compose music.

Two-Dimensional Generation of Patterns


Here, the description of the pattern is captured as a string of
symbols. An L-system is used to generate this string. This string of
symbols is then viewed as commands controlling a LOGO-like
turtle. The basic commands used are move forward, make right
turn, make left turn etc. Line segments are drawn in various
directions specified by the symbols to generate the straight-line
pattern. Since most of the patterns have smooth curves, the
positions after each move of the turtle are taken as control points
for B-spline interpolation. We see that this approach is simple and
concise.

Fractals Generated by L-systems


Many fractals can be thought of as a sequence of primitive
elements. These primitive elements are line segments. Fractals can
be coded into strings. Strings that contain necessary information
about a geometric figure can be generated by L-systems. The
graphical interpretation of this string can be described based on
the motion of a LOGO-like turtle.

A state of the turtle is defined as a triplet (x, y, A), where the


Cartesian coordinates (x, y) represent the position of the turtle and
angle A, called the turtle’s heading, is interpreted as the direction
in which the turtle is facing. Given the step size d and the angle δ,
the turtle can move with respect to the following symbols.

f: Move forward a step of length d. The state of the turtle changes to (x′, y′, A), where x′ = x + d · cos(A) and y′ = y + d · sin(A). A line is drawn between the points (x, y) and (x′, y′).

F: Move forward as above, but without drawing the line.

+: Turn the turtle left by an angle δ. The next state of the turtle will be (x, y, A + δ). A positive angle is taken as anti-clockwise.

−: Turn the turtle as above, but in the clockwise direction.

Interpretation of a String
Let S be a string, (x0, y0, A0) the initial state of the turtle, and the step size d and angle increment δ the fixed parameters. The pattern drawn by the turtle corresponding to the string S is called the turtle interpretation of the string S.

Consider the following L-system.

Axiom: w: f + f + f + f

production: f → f + f − f − ff + f + f − f

The above L-system is for ‘Koch island.’

The images corresponding to the strings generated for derivation steps n = 0, 1, 2 are shown in Figure 13.1. The angle
increment δ is 90°. The step size d could be any positive number.
The size of the ‘Koch island’ depends on the step size and the
number of derivation steps.

Figure 13.1. Fractals generated by L-system


Koch constructions are a special case of L-systems. The initiator
corresponds to the axiom in the L-system. The generator is
represented by the single production.
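The turtle interpretation described above is easy to program. The following sketch (ours, with the command meanings as given in the text) computes the segments drawn for the Koch-island system:

    import math

    def turtle_segments(s, d=1.0, delta=90.0):
        # Interpret s with the commands above: 'f' draws a segment of length d,
        # 'F' moves without drawing, '+' turns left and '-' turns right by delta.
        x, y, a = 0.0, 0.0, 0.0
        segments = []
        for c in s:
            if c in 'fF':
                nx = x + d * math.cos(math.radians(a))
                ny = y + d * math.sin(math.radians(a))
                if c == 'f':
                    segments.append(((x, y), (nx, ny)))
                x, y = nx, ny
            elif c == '+':
                a += delta
            elif c == '-':
                a -= delta
        return segments

    word = 'f+f+f+f'                                  # axiom of the Koch island
    for _ in range(2):                                # n = 2 derivation steps
        word = word.replace('f', 'f+f-f-ff+f+f-f')
    print(len(turtle_segments(word)))                 # 256 segments

The successive turtle positions could then be fed to a B-spline routine for the smooth curves mentioned below.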

Interpolation
Consecutive positions of the turtle can be considered as control
points specifying a smooth interpolating curve. B-spline
interpolation is used for most of the kolam patterns (Figure 13.2).

Figure 13.2. Kolam patterns generated by L-system

Candies
Figure 13.2 shows the kolam pattern ‘Candies,’ which can be generated by the following L-system.

Axiom: (−D − − D)

Productions:

A → f + + ffff − − f − − ffff + + f + + ffff − − f

B → f − − ffff + + f + + ffff − − f − − ffff + + f


C → BfA − −BfA
D → CfC − − CfC

Angle increment = 45°

Generation of Plant Structure


For generating living organisms like plants, a three-dimensional turtle is used. A three-dimensional turtle differs in some respects from the two-dimensional one. It has additional parameters for width and color and is represented by a 6-tuple ⟨P, H, L, U, w, c⟩, where the position vector P represents the turtle’s position in Cartesian coordinates, and the vectors H, L, and U represent the turtle’s orientation. w represents the width of the line drawn by the turtle, and c corresponds to the color of the lines drawn.

For generating plant-like structures, some special L-systems are


used. They are bracketed 2L-systems and stochastic L-systems. In
bracketed L-systems, brackets are used to separate the branches
from the parent stem. In stochastic L-systems, probabilities are
attached to the rules which can capture the variations in plant
development.

Figure 13.3 gives a few images of plants formed by L-systems.


(See color plate on page 418)

Figure 13.3. Some pictures of plants generated by L-systems

Grammar Systems and Distributed Automata
With the need to solve different problems within a short time in an efficient manner, parallel and distributed computing have become essential. Study of such computations in the abstract sense, from the formal-language theory point of view, started with the development of grammar systems. In classical formal-language theory, a language is generated by a single grammar or accepted by a single automaton. These model a single processor; we can say the devices are centralized. Though multi-tape Turing machines (TMs) introduce parallelism in a small way, there is still only one finite control. The introduction of distributed
computing useful in analyzing computation in computer networks,
distributed databases etc., has led to the notions such as
distributed parallelism, concurrency, and communication. The
theory of grammar systems and the distributed automata are
formal models for distributed computing, where these notions
could be formally defined and analyzed.

Grammar Systems
A grammar system is a set of grammars working in unison,
according to a specified protocol, to generate one language. These
are studied to model distribution, to increase the generative power,
to decrease the complexity etc. The study of grammar systems
probes into the functioning of such systems under specific protocols, and the influence of various protocols on the properties of the system is considered.

Grammar system can be sequential or parallel. A co-operating


distributed (CD) grammar system is sequential. Here, all
grammars work on one sentential form. At any instant only one
grammar is active. This is called a ‘blackboard’ model. Suppose a
problem is to be solved in a class. The teacher asks one student to
start working on the problem on the blackboard. The student
writes a few steps, then goes back. Another student comes and
continues working on the problem. On his return, a third student
comes and continues. The process continues till the problem is
solved. Now the question arises: at what time does one student return and the next one start? There may be several ways of defining this. Correspondingly, in the CD grammar system, there are different modes of co-operation. The student may return when
he is not able to proceed further (terminating mode); he may
return at any time (*-mode); he may return after doing k-steps
(= k-mode); he may return after doing k steps or less (≤ k-mode);
he may return after k or more steps (≥ k-mode). The power and
properties of CD grammar systems under these different protocols
are studied.

A parallel communicating (PC) grammar system is basically a


parallel generation of strings by different grammars and putting
them together by communication. This is called the “classroom”
model. There is a problem which a group of students have to solve.
Each student has his own ‘notebook’ containing the description of
a particular subproblem of the given problem. Each student
operates only on his own notebook. The leader (master) has the
description of the whole problem and decides when the problem is
solved. The students communicate on request the contents of their
‘notebooks.’ The students may communicate with each other either
in each step (non-centralized mode) or when the leader requests it
(centralized mode). The whole problem is solved in this way.
Hence, in PC grammar system each grammar has its own
sentential form and communicates it with others on request.

CD Grammar Systems
Definition 13.17

A CD grammar system of degree n ≥ 1, is a construct:


GS = (N, T, S, P1, ..., Pn),

where N and T are disjoint alphabets (non-terminals and


terminals); S ∊ N is the start symbol and P1, ..., Pn are finite sets of
rewriting rules over N ∪ T. P1, ..., Pn are called components of the
system.

Another way of specifying a CD grammar system is:

GS = (N, T, S, G1, ..., Gn),

where Gi = (N, T, Pi, S), 1 ≤ i ≤ n.

Definition 13.18

Let GS = (N, T, S, P1, ..., Pn) be a CD grammar system. We now


define different protocols of co-operation:

1. Normal mode (* mode): The derivation ⇒*_Pi by the ith component is defined by x ⇒*_Pi y, without any restriction on the number of steps. The student works on the blackboard as long as he wants.

2. Terminating mode (t mode): For each i ∊ {1, ..., n}, the terminating derivation by the ith component, denoted by ⇒^t_Pi, is defined by x ⇒^t_Pi y if and only if x ⇒*_Pi y and there is no z ∊ (N ∪ T)* with y ⇒_Pi z.

3. = k mode: For each i ∊ {1, ..., n}, the k-steps derivation by the ith component, denoted by ⇒^{=k}_Pi, is defined by x ⇒^{=k}_Pi y if and only if there are x1, ..., xk+1 ∊ (N ∪ T)* such that x = x1, y = xk+1, and xj ⇒_Pi xj+1 for each j, 1 ≤ j ≤ k.

4. ≤ k mode: For each component Pi, the ≤ k-steps derivation by the ith component, denoted by ⇒^{≤k}_Pi, is defined by x ⇒^{≤k}_Pi y if and only if x ⇒^{=k′}_Pi y for some k′ ≤ k.

5. ≥ k mode: The ≥ k-steps derivation by the ith component, denoted by ⇒^{≥k}_Pi, is defined by x ⇒^{≥k}_Pi y if and only if x ⇒^{=k′}_Pi y for some k′ ≥ k.

Let D = {*, t} ∪ {≤ k, ≥ k, = k | k ≥ 1}.

Definition 13.19

The language generated by a CD grammar system GS = (N, T, S, P1, ..., Pn) in derivation mode f ∊ D is:

L_f(GS) = {w ∊ T* | S ⇒^f_{Pi1} w1 ⇒^f_{Pi2} w2 ... ⇒^f_{Pim} wm = w, m ≥ 1, 1 ≤ i1, ..., im ≤ n}

Example 13.29. 

1. Consider the following CD grammar system:

  GS1 = ({S, X, X′, Y, Y′}, {a, b, c}, S, P1, P2), where

  P1 = {S → S, S → XY, X′ → X, Y′ → Y}

  P2 = {X → aX′, Y → bY′c, X → a, Y → bc}

  If f = * mode, the first component derives S ⇒ XY; the second component can then derive aX′ (or a) from X and bY′c (or bc) from Y, and the control can switch to the first component or continue. In the first component, X′ can be changed to X, or Y′ can be changed to Y, or both. The derivation proceeds similarly. It is not difficult to see that the language generated is {a^m b^n c^n | m, n ≥ 1}. The same will be true for t mode, = 1 mode, ≥ 1 mode, and ≤ k mode for k ≥ 1.
  But, if we consider the = 2 mode, each component should execute two steps. In the first component, S ⇒ S ⇒ XY. In the second component, XY ⇒ aX′Y ⇒ aX′bY′c. Then, control goes back to the first one, where X′ and Y′ are changed to X and Y in two steps. The derivation proceeds in this manner, and it is easy to see that the language generated by GS1 in the = 2 mode is {a^n b^n c^n | n ≥ 1}. A similar argument holds for the ≥ 2-mode also, and the language generated is the same.
  At most two steps of derivation can be done in each component. Hence, in the case of = k or ≥ k modes where k ≥ 3, the language generated is the empty set.

2. GS2 = ({S, A}, {a}, S, P1, P2, P3)

  P1 = {S → AA}
  P2 = {A → S}

  P3 = {A → a}

  In the * mode, {a^n | n ≥ 2} is generated, as the control can switch from component to component at any time. A similar result holds for the ≥ 1 and ≤ k (k ≥ 1) modes. For the = 1, = k, ≥ k (k ≥ 2) modes, the language generated is empty, as S → AA can be used only once in P1 and A → a can be used once in P3 before the control must transfer.
  In the t mode, in P1, S ⇒ AA, and if the control goes to P3 from AA, aa is derived. If the control goes to P2 from AA, SS is derived. Now, the control has to go to P1 to proceed with the derivation SS ⇒* AAAA; if the control then goes to P2, S⁴ is derived, and if it goes to P3, a⁴ is derived. It is easy to see that the language generated in t mode is {a^{2^n} | n ≥ 1}.

3. GS3 = ({S, X1, X2}, {a, b}, S, P1, P2, P3), where

  P1 = {S → S, S → X1X1, X2 → X1}

  P2 = {X1 → aX2, X1 → a}

  P3 = {X1 → bX2, X1 → b}

  In * mode, = 1 mode, ≥ 1 mode, ≤ k mode (k ≥ 2), and t mode, the language generated will be {w ∊ {a, b}+ | |w| ≥ 2}. In = 2 mode, each component has to execute two steps, so the language generated is {ww | w ∊ {a, b}+}. A similar argument holds for ≥ 2 steps. For = k or ≥ k modes (k ≥ 3), the language generated is empty, as each component can use at most two steps before transferring the control. (A small simulation of the = 2 mode of GS1 is sketched below.)
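The effect of the = 2 mode in part 1 of this example can be checked by brute force. The sketch below is our own encoding (not from the text): W and Z stand for X′ and Y′, lower-case letters are terminals, and the search is cut off at a length bound.

    def one_step(s, rules):
        # All strings obtained by one sequential rewriting step.
        out = set()
        for lhs, rhs in rules:
            i = s.find(lhs)
            while i != -1:
                out.add(s[:i] + rhs + s[i + len(lhs):])
                i = s.find(lhs, i + 1)
        return out

    def k_steps(s, rules, k):
        # All strings obtained by exactly k steps inside one component.
        frontier = {s}
        for _ in range(k):
            frontier = {t for w in frontier for t in one_step(w, rules)}
        return frontier

    def cd_language(components, start, k, limit):
        # Terminal (lower-case) strings derivable in = k mode, up to 'limit'.
        seen, todo, lang = {start}, [start], set()
        while todo:
            w = todo.pop()
            for rules in components:
                for t in k_steps(w, rules, k):
                    if len(t) <= limit and t not in seen:
                        seen.add(t)
                        if t.islower():
                            lang.add(t)
                        else:
                            todo.append(t)
        return lang

    P1 = [('S', 'S'), ('S', 'XY'), ('W', 'X'), ('Z', 'Y')]
    P2 = [('X', 'aW'), ('Y', 'bZc'), ('X', 'a'), ('Y', 'bc')]
    print(sorted(cd_language([P1, P2], 'S', k=2, limit=9)))
    # ['aaabbbccc', 'aabbcc', 'abc'] -- exactly a^n b^n c^n, as claimed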

We state some results about the generative power without giving proofs. The proofs are fairly simple and can be tried as exercises. It can be easily seen that for CD grammar systems working in any of the modes defined, having regular, linear, context-sensitive, or type 0 components, respectively, the generative power does not change; i.e., they generate the families of regular, linear, context-sensitive, or recursively enumerable languages, respectively. But by the example given, we find that CD grammar systems with context-free components can generate context-sensitive languages. Let CDn(f) denote the family of languages generated by CD grammar systems with ε-free context-free components, the number of components being at most n. When the number of components is not limited, the family is denoted by CD∞(f). If ε-rules are allowed, the corresponding families are denoted by CD^ε_n(f) and CD^ε_∞(f), respectively.
Theorem 13.26

1. CD∞(f) = CF, for all f ∊ {= 1, ≥ 1, *} ∪ {≤ k | k ≥ 1}.

2. CF = CD1(f) ⊆ CD2(f) ⊆ CDr(f) ⊆ CD∞(f) ⊆ MAT, for all f ∊ {= k, ≥ k | k ≥ 2}, r ≥ 3, where MAT is the family of context-free matrix languages without ε-rules and appearance checking.

3. CDr(= k) ⊆ CDr(= sk), for all k, r, s ≥ 1.

4. CDr(≥ k) ⊆ CDr(≥ k + 1), for all r, k ≥ 1.

5. CD∞(≥) ⊆ CD∞(=).

6. CF = CD1(t) = CD2(t) ⊂ CD3(t) = CD∞(t) = ET0L, where ET0L is the family of languages defined by extended tabled 0L systems.

7. Except for the inclusion CD∞(f) ⊆ MAT, all the previous relations are also true for CD grammar systems with ε-rules; in that case CD^ε_∞(f) ⊆ MAT^ε, where MAT^ε is the family of context-free matrix languages without appearance checking, allowing ε-rules.

The proofs are straightforward (and not given here).

PC Grammar Systems
Definition 13.20

A PC grammar system of degree n, n ≥ 1, is an (n + 3)-tuple:

GP = (N, K, T, (S1, P1), ..., (Sn, Pn)),

where N is the non-terminal alphabet, T is the terminal alphabet, and K = {Q1, Q2, ..., Qn} is the set of query symbols; N, T, K are mutually disjoint. Pi is a finite set of rewriting rules over N ∪ K ∪ T, and Si ∊ N, for all 1 ≤ i ≤ n.

Let VGP = N ∪ K ∪ T.
The sets Pi, are called components of the system. The index i of
Qi points to the ith component Pi of GP.

An equivalent representation for GP is (N, K, T, G1, ... Gn), where


Gi = (N ∪ K, T, Si, Pi), 1 ≤ i ≤ n.

Definition 13.21

Given a PC grammar system

GP =(N, K, T, (S1, P1),..., (Sn, Pn)),

as above, for two n-tuples (x1, x2, ..., xn) and (y1, ..., yn), with xi, yi ∊ VGP*, 1 ≤ i ≤ n, where x1 ∉ T*, we write (x1, ..., xn) ⇒ (y1, ..., yn) if one of the following two cases holds.

1. For each i, 1 ≤ i ≤ n, |xi|K = 0 (i.e., there is no query symbol in xi), and either xi ⇒ yi by a rule in Pi or xi = yi ∊ T*.

2. There is an i, 1 ≤ i ≤ n, such that |xi|K > 0 (i.e., xi has query symbols). Let each such xi be written as xi = z1Q_{i1}z2Q_{i2} ... ztQ_{it}z_{t+1}, t ≥ 1, with zj ∊ (N ∪ T)*, 1 ≤ j ≤ t + 1. If |x_{ij}|K = 0 for all j, 1 ≤ j ≤ t, then yi = z1x_{i1}z2x_{i2} ... ztx_{it}z_{t+1}, and y_{ij} = S_{ij} (in returning mode) or y_{ij} = x_{ij} (in non-returning mode), 1 ≤ j ≤ t. If |x_{ij}|K ≠ 0 for some j, 1 ≤ j ≤ t, then yi = xi. For all i, 1 ≤ i ≤ n, such that yi is not specified above, we have yi = xi.

An n-tuple (x1, ..., xn) with xi ∊ VGP* is called an instantaneous description (ID) (of GP).

Thus, an ID (x1, ..., xn) directly gives rise to an ID (y1, ..., yn) in one of two ways. (a) Component-wise derivation: no query symbol appears in x1, ..., xn; then we have xi ⇒ yi using a rule in Pi if xi has a non-terminal, and yi = xi if xi ∊ T*. (b) Communication step: query symbols occur in some xi. Then, a communication step is performed. Each occurrence of Qj in xi is replaced by xj, provided xj does not contain query symbols. In essence, a component xi containing query symbols is modified only when all occurrences of query symbols in it refer to strings without occurrences of query symbols. In this communication step, xj replaces the query symbol Qj. After that, Gj resumes from its axiom (this is called returning mode) or continues from where it was (this is called non-returning mode). Communication has priority over rewriting: no rewriting is possible as long as a query symbol is present in any component. If some query symbols in a component cannot be replaced in a given communication step, it may be possible to replace them in the next step. When the first component (master) has a terminal string, the derivation stops. ⇒ is used to represent both rewriting and communication steps; ⇒* is the reflexive transitive closure of ⇒. We write ⇒*_r for returning mode and ⇒*_nr for non-returning mode.
Definition 13.22

The language generated by a PC grammar system GP

1. in returning mode is:

Lr(GP) = {w ∊ T* | (S1, ..., Sn) ⇒*_r (w, α2, ..., αn), αi ∊ VGP*}

2. in non-returning mode is:

Lnr(GP) = {w ∊ T* | (S1, ..., Sn) ⇒*_nr (w, α2, ..., αn), αi ∊ VGP*}

If a query symbol is present, rewriting is not possible in any


component. If circular query occurs, communication will not be
possible and the derivation halts without producing a string for the
language. Circular query means something like the following:
component i has query symbol Qj; component j has query symbol
Qk; and component k has query symbol Qi; this is an example of a
cycle query.

The first component is called the master and the language consists
of terminal strings derived there.

Generally, any component can introduce the query symbols. This is called a non-centralized system. If only the first component is allowed to introduce query symbols, it is called a centralized system. A PC grammar system is said to be regular, linear, context-free, or context-sensitive, depending on the type of rules used in the components.

There are a number of results on the hierarchy of PC grammar


systems. Csuhaj-Varju et al., (1994) and Dassow and Păun, (1997)
give a detailed description of these systems.

Example 13.30. 

1. Let GP1 = ({S1, S1′, S2, S3}, {Q1, Q2, Q3}, {a, b, c}, (S1, P1), (S2, P2), (S3, P3)), where P1 consists of the rules:

S1 → abc

S1 → a²b²c²

S1 → a³Q2

S1 → aS1′

S1′ → aS1′

S1′ → a³Q2

S2 → b²Q3

S3 → c

P2 = {S2 → bS2}

P3 = {S3 → cS3}

Lr(GP1) = Lnr(GP1) = {a^n b^n c^n | n ≥ 1}
This can be seen as follows:

(S1, S2, S3) ⇒ (abc, bS2, cS3); the derivation stops, and abc ∊ Lr(GP1), Lnr(GP1).

(S1, S2, S3) ⇒ (a²b²c², bS2, cS3); similarly, a²b²c² ∊ Lr(GP1), Lnr(GP1).
(S1, S2, S3) ⇒ (a³Q2, bS2, cS3) ⇒ (a³bS2, S2, cS3) ⇒ (a³b³Q3, bS2, c²S3) ⇒ (a³b³c²S3, bS2, S3) ⇒ (a³b³c³, b²S2, cS3) in returning mode

(S1, S2, S3) ⇒ (a³Q2, bS2, cS3) ⇒ (a³bS2, bS2, cS3) ⇒ (a³b³Q3, b²S2, c²S3) ⇒ (a³b³c²S3, b²S2, c²S3) ⇒ (a³b³c³, b³S2, c³S3) in non-returning mode

a³b³c³ ∊ Lr(GP1), Lnr(GP1)
(S1, S2, S3) ⇒ (aS1′, bS2, cS3) ⇒ (a⁴Q2, b²S2, c²S3) ⇒ (a⁴b²S2, S2, c²S3) ⇒ (a⁴b⁴Q3, bS2, c³S3) ⇒ (a⁴b⁴c³S3, bS2, S3) ⇒ (a⁴b⁴c⁴, b²S2, cS3) in returning mode

(S1, S2, S3) ⇒ (aS1′, bS2, cS3) ⇒ (a⁴Q2, b²S2, c²S3) ⇒ (a⁴b²S2, b²S2, c²S3) ⇒ (a⁴b⁴Q3, b³S2, c³S3) ⇒ (a⁴b⁴c³S3, b³S2, c³S3) ⇒ (a⁴b⁴c⁴, b⁴S2, c⁴S3) in non-returning mode

a⁴b⁴c⁴ ∊ Lr(GP1), Lnr(GP1).
2. Let GP2 = ({S1, S2}, {Q1, Q2}, {a, b}, (S1, P1), (S2, P2)), where

P1 = {S1 → S1, S1 → Q2Q2}

P2 = {S2 → aS2, S2 → bS2, S2 → a, S2 → b}

Lr(GP2) = Lnr(GP2) = {ww | w ∊ {a, b}+}
We see how abbabb is derived:

(S1, S2) ⇒ (S1, aS2) ⇒ (S1, abS2) ⇒ (Q2Q2, abb) ⇒ (abbabb, S2) in returning mode

(S1, S2) ⇒ (S1, aS2) ⇒ (S1, abS2) ⇒ (Q2Q2, abb) ⇒ (abbabb, abb) in non-returning mode

Hence, abbabb ∊ Lr(GP2), Lnr(GP2).
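The returning/non-returning distinction in this derivation is captured by the following small sketch (our own encoding, not from the text: '2' plays the role of the query symbol Q2 and T that of S2, so that names do not clash):

    def communicate(conf, returning=True, axioms=('S', 'T')):
        # Replace each query symbol '2' in the master by component 2's string;
        # in returning mode, component 2 is then reset to its axiom.
        master = conf[0].replace('2', conf[1])
        second = axioms[1] if returning else conf[1]
        return (master, second)

    # Rewriting steps of the derivation of abbabb: the master idles on S -> S
    # while component 2 builds w = abb; then the master queries twice.
    conf = ('S', 'T')
    conf = ('S', 'aT')     # S -> S,     T -> aT
    conf = ('S', 'abT')    # S -> S,     T -> bT
    conf = ('22', 'abb')   # S -> Q2Q2,  T -> b
    print(communicate(conf))                    # ('abbabb', 'T')
    print(communicate(conf, returning=False))   # ('abbabb', 'abb')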
This type of communication is called communication by request.
There is also another way of communication known as
communication by command (Dassow and Păun, 1997). A
restricted version of this communication by command has been found useful in characterizing the workload in computer networks (Arthi et al., 2001).

Distributed Automata[*]
Formal-language theory and automata theory are twins, in the
sense that the development of one triggers the other one, so as to
make them as close to each other as possible. Hence, we study co-
operative distributed automata here. We restrict ourselves to the
blackboard or sequential model here (Krithivasan et al., 1999),
though there is work on the parallel model also.
Distributed Nondeterministic FSA
Definition 13.23

An n-FSA is a 5-tuple Γ = (K, V, Δ, q0, F) where,

1. K is an n-tuple (K1, K2, ..., Kn), where each Ki is a set of states (of the ith component).

2. V is the finite input alphabet.

3. Δ is an n-tuple (δ1, δ2, ..., δn) of state transition functions, where each δi: Ki × (V ∪ {ε}) → 2^Kunion, 1 ≤ i ≤ n.

4. q0 ∊ Kunion is the initial state.

5. F ⊆ Kunion is the set of final (accepting) states.

Here, Kunion = ∪i Ki. This notation is followed throughout this section.

Each of the component FSAs of the n-FSA is of the form Mi = (Ki, V, δi), 1 ≤ i ≤ n. Note that here the Ki’s need not be disjoint. In this system, we can consider many modes of acceptance, depending upon the number of steps the system has to go through in each of the n components. The different modes of acceptance are t-mode, *-mode, ≤ k-mode, ≥ k-mode, and = k-mode. A description of each of the above modes of acceptance is as follows:

t-MODE ACCEPTANCE
Initially, the component which has the initial state begins the processing of the input string. Suppose that the system starts from component i. In component i, the system follows its transition function as any “stand-alone” FSA. The control is transferred from component i to component j only if the system arrives at a state q ∉ Ki with q ∊ Kj. The selection of j is nondeterministic if q belongs to more than one Kj. The process is repeated, and we accept the string if the system reaches any one of the final states after reading the whole input. It does not matter which component the system is in.

If, for some i (1 ≤ i ≤ n), Ki = Kunion, then there is no way the system can go out of the ith component. In this case, the ith component acts like a “sink.”

Definition 13.24

The ID of the n-FSA is given by a 3-tuple (q, w, i), where q ∊ Kunion, w ∊ V*, 1 ≤ i ≤ n.

In this ID of the n-FSA, q denotes the current state of the whole system, w the portion of the input string yet to be read, and i the index of the component in which the system currently is.

The transition between IDs is defined as follows:

1. (q, aw, i) ⊢ (q′, w, i) iff q′ ∊ δi(q, a), where q ∊ Ki, q′ ∊ Kunion, a ∊ V ∪ {ε}, w ∊ V*, 1 ≤ i ≤ n.

2. (q, w, i) ⊢ (q, w, j) iff q ∊ Kj − Ki.

Let ⊢* be the reflexive and transitive closure of ⊢.

Definition 13.25

The language accepted by the n-FSA Γ = (K, V, Δ, q0, F) working


in t-mode is defined as follows:

Lt(Γ) = {w ∊ V* | (q0, w, i) ⊢* (qf, ε, j) for some qf ∊ F, 1 ≤ i, j ≤ n, and q0 ∊ Ki}
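A t-mode n-FSA can be simulated directly by a breadth-first search over its IDs. The sketch below is our own encoding (not from the text) of the components as pairs (Ki, δi), following the two ID transitions just defined:

    def t_mode_accepts(word, comps, q0, finals):
        # comps[i] = (K_i, delta_i), with delta_i mapping (state, symbol) to a
        # set of states; '' stands for epsilon. Control leaves component i only
        # when the current state q is outside K_i, as in the t-mode definition.
        start = {(q0, 0, i) for i, (K, _) in enumerate(comps) if q0 in K}
        seen, todo = set(start), list(start)
        while todo:
            q, pos, i = todo.pop()
            if q in finals and pos == len(word):
                return True
            K, delta = comps[i]
            if q not in K:                         # control transfer
                nxt = {(q, pos, j) for j, (Kj, _) in enumerate(comps) if q in Kj}
            else:
                nxt = {(p, pos, i) for p in delta.get((q, ''), set())}
                if pos < len(word):
                    nxt |= {(p, pos + 1, i)
                            for p in delta.get((q, word[pos]), set())}
            for t in nxt - seen:
                seen.add(t)
                todo.append(t)
        return False

    # A 2-FSA: component 1 reads a's, then hands state 2 over to component 2.
    comps = [({0, 1}, {(0, 'a'): {0, 2}}), ({2}, {(2, 'b'): {2}})]
    print(t_mode_accepts('aabb', comps, 0, {2}))   # True
    print(t_mode_accepts('ba', comps, 0, {2}))     # False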

*-MODE ACCEPTANCE
Initially, the automaton which has the initial state begins the
processing of the input string. Suppose the system starts the
processing from the component i. Unlike the termination mode,
here there is no restriction. The automaton can transfer the control
to any of the component at any time if possible, i.e, if there is
some j, such that q ∊ Kj then the system can transfer the control to
the component j. The selection is done nondeterministically, if
there is more than one j.
The ID and the language accepted by the system in *-mode can be
defined analogously. The language accepted in *-mode is denoted
as L*(Γ).

= k-MODE (≤ k-MODE, ≥ k-MODE) ACCEPTANCE

Initially, the component which has the initial state begins the processing of the input string. Suppose the system starts the processing from component i. The system transfers the control to another component j only after the completion of exactly k (respectively, k′ ≤ k or k′ ≥ k) steps in component i; i.e., if there is a state q ∊ Kj, then the transition from component i to component j takes place only if the system has already completed k (respectively, k′ ≤ k or k′ ≥ k) steps in component i. If there is more than one choice for j, the selection is done nondeterministically.

The ID of an n-FSA in the above three modes of derivations and


the language generated by them are defined as follows,

Definition 13.26

The ID of the n-FSA is given by a 4-tuple (q, w, i, j), where


q ∊ Kunion, w ∊ V*, 1 ≤ i ≤ n, j is a non-negative integer.

In this ID of the n-FSA, q denotes the current state of the whole system, w the portion of the input string yet to be read, i the index of the component in which the system currently is, and j the number of steps for which the system has been in the ith component. The system accepts a string only if the n-FSA is in a final state in some component i after reading the string, provided it has completed k steps in component i in the case of = k-mode acceptance (some k′ ≤ k steps in the case of ≤ k-mode acceptance, or some k′ ≥ k steps in the case of ≥ k-mode acceptance). The languages accepted in the respective modes are denoted by L=k, L≤k, and L≥k.
Power of Acceptance of Different Modes
We find that, whatever may be the mode of co-operation, it does not increase the acceptance power if the n components are FSAs.

Theorem 13.27

For any n-FSA Γ working in t-mode, we have Lt(Γ) ∊ REG.

Proof. Let Γ = (K, V, Δ, q0, F) be an n-FSA working in t-mode,


where Δ = (δ1, δ2, ..., δn) and the components have states K1,
K2, ..., Kn. Consider the FSA M = (K′, V, δ, q′0, F′) where,

K′ = {[q, i]|q ∊ Kunion, 1 ≤ i ≤ n} ∪ {q′0}

F′ = {[qf, i]|qf ∊ F, 1 ≤ i ≤ n}

δ contains the following transitions:

1. [q0, i′] ∊ δ(q′0, ε) for each i′ such that q0 ∊ Ki′;

and, for each qk ∊ δi(qj, a), qj ∊ Ki, a ∊ V ∪ {ε}, 1 ≤ i ≤ n:

2. if qk ∊ Ki, then [qk, i] ∊ δ([qj, i], a);

3. if qk ∊ Kj′ − Ki, then [qk, j′] ∊ δ([qj, i], a), 1 ≤ j′ ≤ n.

This construction of FSA clearly shows that L(M) = L(Γ) and so L(Γ)


∊ REG.

Theorem 13.28

For any n-FSA Γ working in *-mode, we have L*(Γ) ∊ REG.

Proof. Let Γ = (K, V, Δ, q0, F) be an n-FSA working in *-mode,


where Δ = (δ1,δ2, ..., δn) and the components have states K1,
K2, ..., Kn. Consider the FSA M = (K′, V, δ, q′0, F′) where,

K′ = {[q, i] | q ∊ Kunion, 1 ≤ i ≤ n} ∪ {q′0}

F′ = {[qf, i] | qf ∊ F, 1 ≤ i ≤ n}

δ contains the following transitions:

1. [q0, i] ∊ δ(q′0, ε) such that q0 ∊ Ki, 1 ≤ i ≤ n;

2. for each qy ∊ δi(qs, a), qs ∊ Ki, a ∊ V ∪ {ε}, 1 ≤ i ≤ n: [qy, j] ∊ δ([qs, i], a), 1 ≤ j ≤ n and qy ∊ Kj.

This construction of FSA clearly shows that, L(M) = L(Γ) and


so L(Γ)∊ REG.

Theorem 13.29

For any n-FSA Γ, n ≥ 1 working in = k-mode, we have L=k ∊ REG.

Proof. Let Γ = (K, V, Δ, q0, F) be an n-FSA working in = k-mode


where Δ = (δ1, δ2, ..., δn) and the components have
states K1, K2, ..., Kn.

Consider the FSA M = (K′, V, δ, q′0, F′) where,

K′ = {[q, i, j] | q ∊ Kunion, 1 ≤ i ≤ n, 0 ≤ j ≤ k}

q′0 = [q0, i′, 0] such that q0 ∊ Ki′, 1 ≤ i′ ≤ n

F′ = {[qf, i, k] | qf ∊ F, 1 ≤ i ≤ n}

δ contains the following transitions: for each qy ∊ δi(qs, a), qs ∊ Ki, a ∊ V ∪ {ε}, 1 ≤ i ≤ n, 0 < j ≤ k:

1. if j − 1 < k, then [qy, i, j] ∊ δ([qs, i, j − 1], a);

2. if j = k, then [qs, j′, 0] ∊ δ([qs, i, k], ε), 1 ≤ j′ ≤ n and qs ∊ Kj′.


This construction of FSA clearly shows that, L(M) = L(Γ) and
so, L(Γ) ∊ REG.

Theorem 13.30

For any n-FSA Γ in ≤ k-mode, we have L≤k(Γ) ∊ REG.

Proof. Let Γ = (K, V, Δ, q0, F) be a n-FSA working in ≤ k-mode


where Δ = (δ1, δ2, ..., δn) and the component states K1, K2, ..., Kn.

Consider the FSA M = (K′, V, δ, q′0, F′) where,

K′ = {[q, i, j] | q ∊ Kunion, 1 ≤ i ≤ n, 0 ≤ j ≤ k}

q′0 = [q0, i′, 0] such that q0 ∊ Ki′, 1 ≤ i′ ≤ n

F′ = {[qf, i, k′] | qf ∊ F, 1 ≤ i ≤ n, 1 ≤ k′ ≤ k}

δ contains the following transitions and nothing more: for each qy ∊ δi(qs, a), qs ∊ Ki, a ∊ V ∪ {ε}, 1 ≤ i ≤ n, 0 < j ≤ k + 1:

1. if j − 1 < k, then:
   (a) [qy, i, j] ∊ δ([qs, i, j − 1], a), where qy ∊ Ki, 1 ≤ i ≤ n;
   (b) [qy, i″, 0] ∊ δ([qs, i, j − 1], a), where qy ∊ Ki″, 1 ≤ i″ ≤ n, i ≠ i″;

2. if j − 1 = k, then [qs, j′, 0] ∊ δ([qs, i, j − 1], ε), 1 ≤ j′ ≤ n and qs ∊ Kj′.

This construction of FSA clearly shows that, L(M) = L(Γ).


So L(Γ)∊ REG.

Theorem 13.31

For any n-FSA Γ in ≥ k-mode, we have L≥k(Γ)∊ REG.

Proof. Let Γ = (K, V, Δ, q0, F) be a n-FSA in ≥ k-mode, where Δ =


(δ1, δ2, ..., δn) and the component states K1, K2, ..., Kn.

Consider the FSA M = (K′, V, δ, q′0, F′) where,

K′ = {[q, i, j] | q ∊ Kunion, 1 ≤ i ≤ n, 0 ≤ j ≤ k} ∪ {[q, i] | q ∊ Kunion, 1 ≤ i ≤ n}

q′0 = [q0, i′, 0] such that q0 ∊ Ki′, 1 ≤ i′ ≤ n

F′ = {[qf, i] | qf ∊ F, 1 ≤ i ≤ n}

δ contains the following transitions and nothing more: for each qy ∊ δi(qs, a), qs ∊ Ki, a ∊ V ∪ {ε}, 1 ≤ i ≤ n, 0 < j ≤ k + 1:

1. if j − 1 < k, then [qy, i, j] ∊ δ([qs, i, j − 1], a);

2. if j − 1 = k, then:
   (a) [qy, i] ∊ δ([qs, i, j − 1], a), qy ∊ Ki;
   (b) [qy, j′, 0] ∊ δ([qs, i, j − 1], a), 1 ≤ j′ ≤ n, j′ ≠ i, and qy ∊ Kj′;

3. [qy, i] ∊ δ([qs, i], a), qy ∊ Ki;

4. [qy, j′, 0] ∊ δ([qs, i], a), 1 ≤ j′ ≤ n, j′ ≠ i, and qy ∊ Kj′.

This construction of FSA clearly shows that, L(M) = L(Γ). So L(Γ)


∊ REG.
Thus, we find that for n-FSA the different modes of acceptance are equivalent, and n-FSAs accept only regular sets. Basically, the model we have defined is nondeterministic in nature. Restricting the definition to deterministic n-FSAs will not decrease the power, as any regular set can be accepted by a 1-DFA.

Distributed Nondeterministic Pushdown Automata
Next we define distributed PDA and consider the language
accepted in different modes. We find that the power of distributed
PDA is greater than the power of a single “centralized” PDA.
Distributed PDA with different modes of acceptance have equal
power, equal to that of a TM. In the case of PDA, usually two types
of acceptance viz. acceptance by empty store and acceptance by
final state are considered. Initially, we consider only acceptance by
final state. Towards the end, we show the equivalence to
acceptance by empty store.

Definition 13.27

An n-PDA is a 7-tuple M = (K, V, Γ, Δ, q0, Z, F) where,

1. K is an n-tuple (K1, K2, ..., Kn), where each Ki is the finite set of states for component i; Kunion = ∪i Ki.

2. V is the finite input alphabet.

3. Γ is an n-tuple (Γ1, Γ2, ..., Γn), where each Γi is a finite set of stack symbols for component i.

4. Δ is an n-tuple (δ1, δ2, ..., δn) of state transition functions, where each δi : Ki × (V ∪ {ε}) × Γi → finite subsets of Kunion × Γi*, 1 ≤ i ≤ n.

5. q0 ∊ Kunion is the initial state.

6. Z is an n-tuple (Z1, Z2, ..., Zn), where each Zi ∊ Γi (1 ≤ i ≤ n) is the start symbol of the stack of the ith component.

7. F ⊆ Kunion is the set of final (accepting) states.

Each of the component PDAs of the n-PDA is of the form Mi = (Ki, V, Γi, δi, Zi), 1 ≤ i ≤ n. Here, the Ki’s need not be disjoint. As in the
case of distributed FSA, we can have several modes of
acceptance.

t-MODE ACCEPTANCE
Initially, the component which has the initial state begins the processing of the input string. Suppose the component i has the start state. The ith component starts the processing with its stack containing the start symbol Zi. The processing proceeds in component i as in a stand-alone PDA. Suppose in component i the system arrives at a state q where q ∉ Ki; then the system goes to the jth component (1 ≤ j ≤ n) provided q ∊ Kj. If there is
more than one choice for j, we choose any one of them
nondeterministically. After choosing a particular jth component,
the automaton remains in this component until it reaches a state
outside the domain of its transition function and the above
procedure is repeated. The string is accepted if the automaton
reaches any one of the final states. It does not matter which component the system is in, or whether the stacks of the components are empty or not. The presence of multiple stacks increases the accepting capacity of the whole system.

*-MODE ACCEPTANCE
Initially, the automaton which has the initial state begins the
processing of the input string. Suppose the system starts the
processing from the component i. Unlike the termination mode,
here there is no restriction. The automaton can transfer the control to any of the components at any time, if possible; i.e., if there is some j such that q ∊ Kj, then the system can transfer the control to component j. The selection is done nondeterministically if there is more than one such j. The stacks of the components are maintained separately.

= k-MODE (≤ k-MODE, ≥ k-MODE) ACCEPTANCE

Initially, the component which has the initial state begins the processing of the input string. Suppose the system starts the processing from component i. The system transfers the control to another component j only after the completion of exactly k (respectively, k′ ≤ k or k′ ≥ k) steps in component i; i.e., if there is a state q ∊ Kj, then the transition from component i to component j takes place only if the system has already completed k (respectively, k′ ≤ k or k′ ≥ k) steps in component i. If there is more than one choice for j, the selection is done nondeterministically.

Definition 13.28

The ID of the n-PDA working in t-mode is given by an (n + 3)-tuple (q, w, α1, α2, ..., αn, i), where q ∊ Kunion, w ∊ V*, αl ∊ Γl* for 1 ≤ l ≤ n, and 1 ≤ i ≤ n.

In this ID of the n-PDA, q denotes the current state of the whole system, w the portion of the input string yet to be read, i the index of the component in which the system currently is, and α1, α2, ..., αn the contents of the stacks of the components, respectively.

The transition between IDs in t-mode is defined as follows:

(q, aw, α1, α2, ..., Xαi, ..., αn, i) ⊢ (q′, w, α1, α2, ..., βαi, ..., αn, i)

if (q′, β) ∊ δi(q, a, X),

and

(q, aw, α1, α2, ..., Xαi, ..., αn, i) ⊢ (q′, w, α1, α2, ..., βαi, ..., αn, j)

if (q′, β) ∊ δi(q, a, X) and q′ ∊ Kj − Ki,

where q ∊ Ki, q′ ∊ Kunion, a ∊ V ∪ {ε}, w ∊ V*, 1 ≤ i, j ≤ n, αl ∊ Γl*, β ∊ Γi*, X ∊ Γi.

Let ⊢* be the reflexive and transitive closure of ⊢

Definition 13.29

The language accepted by the n-PDA M = (K, V, Γ, Δ, q0, Z, F) in


the t-mode is defined as follows:

Lt(M) = {w ∊ V* | (q0, w, Z1, Z2, ..., Zn, i′) ⊢* (qf, ε, α1, α2, ..., αn, i) for some qf ∊ F, 1 ≤ i, i′ ≤ n, αl ∊ Γl*, and q0 ∊ Ki′}

Definition 13.30

The ID of the n-PDA working in = k-mode is given by an (n + 4)-tuple (q, w, α1, α2, ..., αn, i, j), where q ∊ Kunion, w ∊ V*, αl ∊ Γl* for 1 ≤ l ≤ n, 1 ≤ i ≤ n, and 0 ≤ j ≤ k.

In this ID of the n-PDA, q denotes the current state of the whole system, w the portion of the input string yet to be read, i the index of the component in which the system currently is, α1, α2, ..., αn the contents of the stacks of the components, respectively, and j the number of steps completed in component i.

The transition between IDs in = k-mode is defined as follows:

(q, aw, α1, α2, ..., Xαi, ..., αn, i, j) ⊢ (q′, w, α1, α2, ..., βαi, ..., αn, i, j + 1)

if (q′, β) ∊ δi(q, a, X), j ≤ k − 1,

where q ∊ Ki, q′ ∊ Kunion, a ∊ V ∪ {ε}, w ∊ V*, 1 ≤ i ≤ n, αl ∊ Γl*, β ∊ Γi*, X ∊ Γi;

and, if q′ ∊ Kj′ with j′ ≠ i,

(q′, w, α1, α2, ..., Xαi, ..., αn, i, k) ⊢ (q′, w, α1, α2, ..., Xαi, ..., αn, j′, 0)

Let ⊢* be the reflexive and transitive closure of ⊢.

Definition 13.31

The language accepted by the n-PDA M = (K, V, Γ, Δ, q0, Z, F) in


the = k-mode is defined as follows:

L=k(M) = {w ∊ V* | (q0, w, Z1, Z2, ..., Zn, i′, 0) ⊢* (qf, ε, α1, α2, ..., αn, i, k) for some qf ∊ F, 1 ≤ i, i′ ≤ n, αl ∊ Γl*, and q0 ∊ Ki′}

The instantaneous description and the language accepted for the


other modes of acceptance are similarly defined.
Example 13.31. 

Consider the = 2-mode 2-PDA

M = ((K1, K2),V, (Γ1, Γ2), (δ1, δ2), {q0}, (Z1, Z2), {qf}),

where

K1 = {q0, q1, qp, qp′, qs, qz, qc, qc′}

K2 = {q1, q2, qc, qb, qf}

V = {a, b, c}

Z1 = Z

Z2 = Z

F = {qf}

Γ1 = {Z, a}

Γ2 = {Z, b}

δ1 and δ2 are defined as follows, with the assumption that X ∊ {Z,


a} and Y ∊ {Z, b}.

Equation 13.1. 

The above 2-PDA working in =2-mode accepts the following


language:

L = {a^n b^n c^n | n ≥ 1}

Explanation:

The first component starts the processing. When it uses the first two transitions, it should have read an a. Then, it switches the control to the second component, where it is in the state q1. After using the ε-transition to go to the state q2, it can read either an a or a b. Suppose it reads an a; then the system will be in the state qp. The state qp is used here to put the already read a onto the first component’s stack. This task is carried out by rules 13.5 and 13.6. Suppose, when in the second component, it reads a b; then the system will be in the state qs, which is used to see whether there is one a for each b. This task is carried out by rules 13.8, 13.9, and 13.10. Immediately after seeing that there are no more a’s in the first component’s stack, it realizes that the number of a’s is equal to the number of b’s, and it goes to the state qc, which is used to read c’s. This task is carried out by rule 13.9. After reading each and every c through rule 13.11, it will erase a b. When there is no b left in the stack, the system will be in the final state qf. If there are more b’s left in the second component’s stack, then the system will be in the state qc. Then, it uses the last two ε-rules in the first component and repeats the above procedure until it arrives at the state qf.

Example 13.32. 

Consider the t-mode 2-PDA:

M = ((K1, K2), V, (Γ1, Γ2), (δ1, δ2), {q0}, (Z1, Z2), {qf}),

where

K1 = {q0, qa, qb, qT}

K2 = {qa′, qb′, qs, qf, qe}

V = {a, b}

F = {qf}

Γ1 = {Z, a, b}

Γ2 = {Z, a, b}

Δ = (δ1, δ2),

where δ1 and δ2 are defined as follows, with the assumption that


X ∊ {Z1, a, b}, and Y ∊ {Z2, a, b}.
Equation 13.16. 

The above 2-PDA in t-mode accepts the following language L:

L = {ww | w ∊ {a, b}*}.

Explanation:

From q0, the first component either reads a or b and stores the


information that the first alphabet is a or b by entering the
state qa or qb, respectively. It also stacks the first alphabet already
read. This is done by rules 13.16 and 13.17. From qa or qb, it
reads b or a and stacks the read alphabet in the first-component
stack. This is done by rules 13.18, 13.19, 13.20, and 13.21. In qx(x ∊
{a, b}), if the first component reads x then it could be the start of
the string identical to the string read. So, in order to check this, the
stacked up alphabets in the first component are transferred to the second-component stack. This is done by the rules 13.22, 13.23,
13.24, and 13.25. After transferring the stack, the system will be in
the state qe. The second component in state qe erases the top
alphabet, since it has already checked that the first alphabet
matches. This is carried out by the rule 13.27. Rules 13.28, 13.29,
and 13.30 check whether the read alphabet matches with the top-
stack alphabet, i.e., second half of the input is the same as the first
half.

Acceptance Power of NPDA


As in the case of NFSA, we can show the equivalence between
different modes of co-operation in the case of NPDA also. We know
that a two-stack machine can simulate a TM. Hence, it is
straightforward to see that a 2-PDA is as powerful as a TM.

Acceptance by Empty Store


Definition 13.32

The language accepted by the n-PDA


M = (K, V, Γ, Δ, q0, Z, F) by “empty store acceptance” is defined as
follows:

N(M) = {w ∊ V*|(q0, w, Z1,Z2, ..., Zn, i′) ⊢* (q, ε, ε, ε, ..., ε (n


times), i) for some q ∊ Ki, 1 ≤ i, i′ ≤ n, and q0 ∊ Ki′}

Equivalence
The equivalence of acceptance by final state and empty store in a n-
PDA in t-mode is proved by the following theorems:

Theorem 13.32

If L is L(M2) for some n-PDA M2, then L is N(M1) for some n-PDA


M1, where the acceptance mode is t-mode both for M 1 and M2.

Proof. Let M2 = (K, V, Γ, Δ, q0, Z, F) be a n-PDA in t-mode, where


the acceptance is by final state. Let,

K = (K1, K2, ..., Kn)

Γ = (Γ1, Γ2,..., Γn)

Δ = (δ1, δ2, ..., δn)

Z = (Z1, Z2, ..., Zn)

The n-PDA M1 = (K′, V, Γ′, Δ′, q0, Z′, φ) in t-mode, where the acceptance is by empty store, is constructed as follows. Let K′ = (K1 ∪ {q1}, ..., Kn ∪ {qn}), where q1, ..., qn are new states, Γ′ = Γ, and Z′ = Z.

1. For each i, δ′i(q, a, X) includes all elements of δi(q, a, X), ∀ q ∊ Ki, a ∊ V ∪ {ε}, X ∊ Γi.

2. For all i (1 ≤ i ≤ n), if q ∊ F ∩ Ki, then δ′i(q, ε, X) contains (q1, X), X ∊ Γi.

3. For 1 ≤ i ≤ n − 1,
   (a) δ′i(qi, ε, X) contains (qi, ε), X ∊ Γi;
   (b) δ′i(qi, ε, X) contains (qi+1, ε), X ∊ Γi.

4. δ′n(qn, ε, X) contains (qn, ε), X ∊ Γn.

Whenever the system enters a final state, the string read by the system should be accepted, i.e., the stacks of the components should be emptied. For this, as soon as the system
enters the final state it has the possibility of going to the first
component through the state q1. When in the state q1, the system
empties the first-component stack and enters the second
component through the state q2 and the procedure is repeated. In
the state qn, the system empties the stack of the nth component. It
is straightforward to prove that L(M2) = N(M1).

Theorem 13.33

If L is N(M1) for some n-PDA M1 in t-mode, then L is L(M2) for some (n + 1)-PDA M2 in t-mode.

Proof. Let M1 = (K, V, Γ, Δ, q0, Z, φ) be a n-PDA in t-mode. Let,


K = (K1, K2, ..., Kn),

Γ = (Γ1, Γ2, ..., Γn),

Δ = (δ1, δ2, ..., δn),

Z = (Z1, Z2, ..., Zn).

The (n + 1)-PDA M2 = (K′, V, Γ′, Δ′, q0, Z′, {qf}), where

K′ = (K′1, K′2, ..., K′n, K′n+1),

Γ′ = (Γ′1, Γ′2, ..., Γ′n, Γ′n+1),

Δ′ = (δ′1, δ′2, ..., δ′n, δ′n+1) and

Z′ = (Z′1, Z′2, ..., Z′n, Z′n+1) is constructed as follows:

1. K′i = Ki ∪ {r′qi | q ∊ Ki} ∪ {qi}, 1 ≤ i ≤ n

2. K′n+1 = {rqi | q ∊ Kunion} ∪ {qf, qg}

3. Γ′i = Γi and Z′i = Zi, 1 ≤ i ≤ n

4. Γ′n+1 = {Zqi | q ∊ Kunion, 1 ≤ i ≤ n} ∪ {Zn+1} and Z′n+1 = Zn+1

5. δ′i includes all elements of δi(q, a, X), ∀ q ∊ Ki, a ∊ V, X ∊ Γi

6. For 1 ≤ i ≤ n, when its stack becomes empty in a state q ∊ Ki, component i enters the state rqi of component n + 1

7. δ′n+1(rqi, ε, Zn+1) contains (q1, ZqiZn+1)

8. In state qi, the system checks whether the stack of component i is empty; if so, it enters qi+1, 1 ≤ i ≤ n − 1, and from qn, with the stack of component n also empty, it enters qf

9. δ′i(qi, ε, X) contains (qg, X), 1 ≤ i ≤ n, X ∊ Γi (a non-empty stack has been detected)

10. δ′n+1(qg, ε, Zqi) contains (r′qi, ε)

11. δ′i(r′qi, ε, X) contains (q, X), 1 ≤ i ≤ n, X ∊ Γi

Whenever the system’s stacks are empty, the system enters the new final state qf included in the newly added component n + 1. For this, if the system in state q in component i sees that its stack is empty, the system enters the state rqi, which is only in the newly added state set K′n+1. In component n + 1, in state rqi, Zqi is put into its stack to store the information about the state and component to which the system has to return if some stacks turn out to be non-empty. After stacking Zqi, the system uses the states qj to see whether the stack of component j is empty or not. If it is empty, it goes to the next component to check the emptiness of the next stack. If not, it enters the state qg, which is in component n + 1, to retrieve the information about the state and component from which it has to continue the processing. This work is done by the state qg.

It is straightforward to prove that L(M2) = N(M1).

The equivalence of the “empty store acceptance” and the “final


state acceptance” in the other modes can be proved similarly.

Distributed k-turn PDA
In this section, we consider only *-mode acceptance.
Also, K = K1 = K2 = ··· = Kn. In this restricted version of the distributed PDA, the stack of each component PDA can perform at most k turns while accepting a string. An n-PDA in which the stack of each of the components can perform at most k turns is called a k-turn n-pushdown automaton (k-turn n-PDA). If n is not explicitly mentioned, then the system is called a k-turn distributed PDA and is denoted by the symbol k-turn *-PDA. The following results follow from this definition of the k-turn n-PDA and the fact that the family of languages accepted by k-turn PDAs is the family of non-terminal bounded languages. The proofs are omitted here.

Theorem 13.34

For any k-turn *-PDA M1, there exists a 1-turn *-PDA M2 such that


L(M1) = L(M2).

This theorem tells us that we can restrict our attention to 1-turn *-


PDA as far as analyzing accepting power is concerned.

Theorem 13.35

The family of languages accepted by 1-turn *-PDAs is closed under


the following operations:

1. Morphism

2. Intersection with regular sets

The following result is well known (Păun et al., 1998).


Theorem 13.36

Each language L ∊ RE, L ⊆ T*, can be written in the form L = prT(EQ(h1, h2) ∩ R), where R ⊆ V1* is a regular language, h1, h2 : V1* → V2* are two ε-free morphisms, and T ⊆ V1.

Here, EQ(h1, h2) means the following: for two morphisms h1, h2 : V1* → V2*, the set EQ(h1, h2) = {w ∊ V1* | h1(w) = h2(w)} is called the equality set of h1 and h2.

We say that a homomorphism h is a projection (associated to V1), and denote it by prV1, if h(a) = a for a ∊ V1 and h(a) = ε otherwise.

The following theorem shows that equality sets are accepted by 1-


turn 2-PDAs.

Theorem 13.37

For any two morphisms h1, h2 : V1* → V2*, there exists a 1-turn 2-PDA M such that L(M) = EQ(h1, h2).

This theorem, coupled with the characterization of RE by equality sets (Theorem 13.36) and the closure properties under morphisms and intersection with regular sets (Theorem 13.35), proves that the family of languages accepted by 1-turn 2-PDAs includes the whole of RE, where RE is the family of recursively enumerable languages (the family of languages accepted by Turing machines). Coupled with Church’s hypothesis, we have:
Theorem 13.38

For each L ∊ RE, there exists a 1-turn 2-PDA M such that L(M) = L
and conversely, for each 1-turn 2-PDA M, we have L(M) ∊ RE.

[*] The contents of this section originally appeared in “Distributed Processing


in Automata” by Kamala Krithivasan, M. Sakthi Balan and Prahlad
Harsha, IJFCS, Vol. 10, No. 4, 1999, pp 443–463. © World Scientific
Company. Reproduced with permission.

Chapter 14. New Models of
Computation

DNA Computing
Mathematical biology is a highly interdisciplinary area of research
that lies at the intersection of mathematics and biology. So far, in
this area, mathematical results have been used to solve biological
problems. The development of stochastic processes and statistical
methods are examples of such a development. In contrast, an
instance of the directed Hamilton path problem was solved solely
by manipulating DNA strands by Leonard Adleman. Hence, one can see that a biological technique was used to solve a mathematical problem, thus paving the way for a new line of research, ‘DNA computing.’
that in both, simple operations are applied to initial information to
obtain a result. Hence, the use of biology to solve a mathematical
problem was demonstrated by a mathematician with adequate
knowledge in biology, to bring together these two fields. Adleman thought that DNA strands can be used to encode information, while enzymes can be employed to simulate simple computations.

Molecules that play central roles in molecular biology and genetics are DNA, RNA, and the polypeptides. The recombinant behavior of double-stranded DNA molecules is made possible by the presence of restriction enzymes. Hence, DNA computing is a fast-growing research area concerned with the use of DNA molecules for the implementation of computational processes.

Even before the first experiment was performed by Adleman in


1994, in 1987 itself, Tom Head studied the recombinant behavior
of DNA strands from the formal-language theory point of view. Culik connected this behavior of DNA strands to the semigroup of dominoes.

The theory developed here does not require a concern for its
origins in molecular biology. We present only the details of
‘splicing rules’ and hence ‘splicing systems.’ We provide a new
generative device called ‘splicing system’ that allows a close
simulation of molecular recombination processes by corresponding
generative processes acting on strings. Here, first we start a brief
discussion on double stranded form of DNA and ‘splicing
operation.’

DNA molecules may be considered as strings over the alphabet


consisting of the four compound “two-level” symbols

A C G T
T G C A

where each two-level symbol consists of a base and its complement.

Any DNA strand, for example, will consist of double strands like
the one given below:

A A A A A G A T C A A A

T T T T T C T A G T T T

Here, A, G, C, and T stand for the deoxyribonucleotides adenine, guanine, cytosine, and thymine, respectively. A always bonds with T (double hydrogen bond) and C always bonds with G (triple hydrogen bond). One can look at (A, T) as complements, and also (C, G).

Let us first explain what a splicing rule is. Any splicing rule
recognizes critical subwords in the double stranded DNA molecule.
For example in
A A A A A A A, G A T C, A A A A

T T T T T T T, C T A G, T T T T

the strands between the commas indicate critical subwords in the molecule.

For example, when DpnI encounters a segment in any DNA


molecule having the 4-letter subword,

GATC

CTAG

it cuts (in both strands of) the DNA molecule between A and T.

The ligase enzyme has the potential to bind together pairs of these
molecules. For example if:

Molecule 1

A A A A A A A G A T C A A A A

T T T T T T T C T A G T T T T

Molecule 2

C C C C C T G G C C A C C

G G G G G A C C G G T G G

then if the enzymes DpnI, which cuts between A and T, and BalI, which cuts between G and C, act on molecules 1 and 2, we obtain:

A A A A A A A G A T C A A A
and
T T T T T T T C T A G T T T

and
C C C C C T G G C C A C C
and
G G G G G A C C G G T G G

The ligase enzyme is put in the solution containing the above ‘cut portions’ to recombine. Hence, one can see that a new molecule 3 is formed.

Molecule 3

A A A A A A A G A C C A C C

T T T T T T T C T G G T G G

Several possible molecules can be formed by the ligase. But we have given molecule 3, which is obtained from the recombination of the first cut portion of molecule 1 with the second cut portion of molecule 2. Molecule 3 thus cannot be cut further by DpnI and BalI.

Suppose u and υ represent critical subwords in molecule 1, and u′ and υ′ represent critical subwords in molecule 2, where uυ is the underlying subword in molecule 1 and u′υ′ is the underlying subword in molecule 2. Then, the action of the enzymes on molecule 1: xuυy (say) and molecule 2: x′u′υ′y′ results in

xu, υy, x′u′, and υ′y′.
Then, the presence of an appropriate ligase in the aqueous solution results in molecule 3, which is xuυ′y′. The whole operation defined above is known as the splicing operation.

Hereafter, we can refer to DNA strands as strings over the alphabet Σ = {A, G, C, T}. The splicing system has been conceived as a generative mechanism, thus paving the way for the study of the recombinant behavior of DNA molecules by means of formal language theory. The ‘splicing
system’ was proposed by Tom Head. The action of enzymes and a
ligase is represented by a set of splicing rules acting on the strings.
The language of all possible strings that may be generated by the
splicing process serves as a representation of the set of all
molecules that may be generated by the recombination process. In
the description so far, we have omitted the chemical aspects of
molecular interaction.

Splicing Operation
In this section, we consider splicing systems with uniterated and
iterated modes.

Let V be an alphabet. #, $ be two symbols not in V. A splicing rule


over V is a string of the form:

r = u1#u2 $u3#u4,

where ui ∊ V*, 1 ≤ i ≤ 4.

If x, y ∊ V* are such that x = x1u1u2x2 and y = y1u3u4y2, then (x, y) ⊢r z, where z = x1u1u4y2. We say that on applying the splicing rule r to x and y, one gets z. In x and y, u1u2 and u3u4 are called the sites of the splicing; x is called the first term and y the second term of the splicing operation. When understood from the context, we omit the specification of r and write ⊢ instead of ⊢r.

When we build computability models, in order to keep these models as close as possible to reality, we shall also consider the operation of the form (x, y) ⊢r (z, w), with z = x1u1u4y2 and w = y1u3u2x2, where r = u1#u2$u3#u4 is a splicing rule.

Example 14.1. 

Let V = {a, b, c}. Let r = a#b$c#a. Let x = a¹⁰b and y = ca⁵. Applying r to x and y,

(a¹⁰b, ca⁵) ⊢ a¹⁵.

In general, for x = a^m b, y = ca^n,

(a^m b, ca^n) ⊢ a^{m+n}.

This rule produces, for a suitable input of two integers m and n (represented in unary), m + n (in unary).
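The splicing operation itself is a short program. The following sketch (ours, not from the text) applies a rule u1#u2$u3#u4 at every matching pair of sites and reproduces Example 14.1:

    def splice(x, y, rule):
        # Apply u1#u2$u3#u4: whenever x = x1 u1 u2 x2 and y = y1 u3 u4 y2,
        # produce z = x1 u1 u4 y2.
        left, right = rule.split('$')
        u1, u2 = left.split('#')
        u3, u4 = right.split('#')
        out = set()
        for i in range(len(x) + 1):
            if x[:i].endswith(u1) and x[i:].startswith(u2):
                for j in range(len(y) + 1):
                    if y[:j].endswith(u3) and y[j:].startswith(u4):
                        out.add(x[:i] + y[j:])
        return out

    print(splice('a' * 10 + 'b', 'c' + 'a' * 5, 'a#b$c#a'))
    # prints {'aaaaaaaaaaaaaaa'}, i.e. a^15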

The following figure illustrates the result z by the application of the


splicing rule r to x and y.

An H-scheme is a pair σ = (V, R), where V is an alphabet and R ⊆ V*#V*$V*#V* is a set of splicing rules.

R can be infinite, and hence, depending on the nature of R (its place among the Chomskian classes of languages), we can classify σ. That is, if R belongs to REG, then we say that σ is of REG type, or is a REG H-scheme. For a language L ⊆ V*, σ(L) = {z ∊ V* | (x, y) ⊢r z for some x, y ∊ L, r ∊ R}.

Example 14.2. 

Suppose σ = ({a, b}, R), where R = {a#b$b#a}.

For L = {a^n b^n, b^n a^n | n ≥ 1}, σ(L) is computed as follows: for x = a^n b^n and y = b^m a^m, (x, y) ⊢ a^{n+m}, n, m ≥ 1.

Hence, σ(L) = {a^n | n ≥ 2}, which is regular.

Example 14.3. 

Suppose σ = ({a, b, c}, R), where

R = {a^n#b$b#c^n | n ≥ 1}

Let L1 = {a^n b | n ≥ 1}, L2 = {bc^n | n ≥ 1}.

For x = a^n b ∊ L1 and y = bc^m ∊ L2, (x, y) ⊢ a^n c^m.

Hence, σ(L1 ∪ L2) = {a^n c^m | n, m ≥ 1}.

Example 14.4. 

Let σ = ({a, b}, R), where

R = {w#$#w^R | w ∊ {a, b}+}

For x = w ∊ L1 = {w | w ∊ {a, b}+} and y = w^R ∊ L2 = {w^R | w ∊ {a, b}+},

(x, y) ⊢ ww^R.

Hence, σ(L1 ∪ L2) = {ww^R | w ∊ {a, b}+}.

Remarks

1. From the splicing scheme, one can see that a splicing rule is applicable to any two strings having the subwords that can be subjected to splicing. Hence, the site of cutting has to be present in the two strings. In such a case, the whole strings that are considered need not be over the alphabet of the splicing rule; rather, the alphabet of a splicing rule is a subset of the alphabet of the strings to be spliced.

2. The scheme defined above considers a language L of strings subject to splicing. That is, σ is defined as a unary operation on languages. One can also think of σ as a binary operation. That is:

σ(L1, L2) = {z ∊ V* | (x, y) ⊢r z, x, y ∊ L1, r ∊ L2}

Given any two families of languages F1 and F2 over V, we define

S(F1, F2) = {σ(L) | L ∊ F1, σ = (V, R), with R ∊ F2}

3. One can see that each set of splicing rules is a language over the basic alphabet and {#, $}. We say that F1 is closed under splicing of F2-type if S(F1, F2) ⊆ F1. Hence, we try to investigate the power of F2-type splicing by investigating S(F1, F2) for various F1. Here, F1 and F2 will belong to {FIN, REG, LIN, CF, CS, RE}. In all the results, the families of languages considered are supposed to contain at least the finite languages.
Lemma 14.1

If F1 ⊆ F′1 and F2 ⊆ F′2, then S(F1, F2) ⊆ S(F′1, F′2), for all F1, F′1, F2, F′2.

Proof is obvious from the definitions. For example, if F1 = REG, F′1 = CF, F2 = REG, F′2 = CF, then S(REG, REG) ⊆ S(CF, CF).

Lemma 14.2
If F1 is a family of languages closed under concatenation with
symbols, then F1 ⊆ S(F1, F2) for all F2.

Proof. Let L ⊆ V*, L ∊ F1, and let c ∉ V. Then L′ = {wc | w ∊ L} ∊ F1, and σ = (V ∪ {c}, {#c$c#}) gives σ(L′) = L, so that L ∊ S(F1, F2).

Lemma 14.3

If F1 is a family of languages closed under union, catenation with


symbols and FIN splicing, then F1 is closed under concatenation.

Proof. Let L1, L2 ∊ F1, L1, L2 ⊆ V*. Let C1, C2 ∉ V and σ = (V ∪


{C1, C2}, {#C1$C2#}). Clearly σ(L1C1 ∪ C2L2)
= L1L2. L1C1, C2L2 ∊ F1 as F1 is closed under concatenation with
symbols. L1C1 ∪ C2L2 ∊ F1 as F1 is closed under
union. σ(L1 C1 ∪ C2L2) ∊ F1 as F1 is closed under FIN splicing.
Therefore L1L2 ∊ F1.
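As a quick check of this construction, the following snippet (reusing the splice function from the earlier sketch; '1' and '2' are our own encodings of the markers C1 and C2) computes L1L2:

    L1, L2 = {'ab', 'a'}, {'c', 'cc'}
    marked = {w + '1' for w in L1} | {'2' + w for w in L2}
    print({z for x in marked for y in marked for z in splice(x, y, '#1$2#')})
    # {'abc', 'abcc', 'ac', 'acc'} = L1 L2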

Theorem 14.1

S(REG, REG) ⊆ REG.

Proof. Since REG languages are closed under concatenation and arbitrary gsm mappings, REG is closed under REG splicing.

Since REG languages are closed under concatenation with symbols, we have REG ⊆ S(REG, F2) for any F2 ∊ {FIN, REG, CF, CS, RE}, by Lemma 14.2. Hence, we have:

Theorem 14.2
S(REG, REG) = REG.

Table 14.1 gives the power of splicing on Chomskian languages. In the table, the rows are marked for F1 and the columns for F2 in S(F1, F2). The intersection of a row and a column indicates the power of S(F1, F2). If there are two entries F3, F4, it means F3 ⊆ S(F1, F2) ⊆ F4. All the assertions are proved in the literature (Păun et al., 1998).

Table 14.1. Families obtained by non-iterated splicing

S(F1, F2)   FIN       REG       LIN        CF         CS         RE

FIN         FIN       FIN       FIN        FIN        FIN        FIN
REG         REG       REG       REG, LIN   REG, CF    REG, RE    REG, RE
LIN         LIN, CF   LIN, CF   RE         RE         RE         RE
CF          CF        CF        RE         RE         RE         RE
CS          RE        RE        RE         RE         RE         RE
RE          RE        RE        RE         RE         RE         RE

Iterated Splicing
Let σ = (V, R) be a splicing scheme where R ⊆ V*#V*$V*#V*, one
can apply σ to a language L ⊆ V* iteratively. That is:

σ0(L) = L

σ1(L) = σ0(L)∪σ(L)

σ2(L) = σ1(L)∪σ(σ1(L))

  ⋮  
σi+1(L) = σi(L)∪σ(σi(L))

σ*(L) = ∪_{i ≥ 0} σi(L)

σ*(L) is the smallest language L′ which contains L and is closed under the splicing with respect to σ, i.e., σ(L′) ⊆ L′. Also, for two languages L1 and L2, we can define H(L1, L2) = σ*(L1), where σ = (V, R) with R = L2. That is, for any two families of languages F1 and F2,

H(F1, F2) = {σ*(L) | L ∊ F1 and σ = (V, R) with R ∊ F2}.

Example 14.5. 

Let σ = ({a, b}, R) be a splicing scheme. Let L = {aba, abba}, R =


{r1: a#b$b#ba, r2: b#a$b#ba}.

Consider:

(a|ba, ab|ba) ⊢r1 aba

Here, the vertical line '|' indicates the positions at which the
splicing rule r1 cuts the two strings.

Next consider:

(abb|a, ab|ba) ⊢r2 abbba

Iterating, we obtain:

L = {aba, abba}

σ0(L) = {aba, abba}

σ1(L) = {aba, abba, abbba}

σi(L) = {ab^n a | 1 ≤ n ≤ i + 2} for i ≥ 1

Hence σ*(L) = {ab^n a | n ≥ 1}.
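Reusing splice and sigma from the earlier sketch, the iteration above can be run directly; since σ*(L) is infinite here, we can only enumerate a bounded number of iterations (sigma_star is our own helper name):

def sigma_star(L, rules, max_iter=10):
    # sigma^{i+1}(L) = sigma^i(L) ∪ sigma(sigma^i(L)), iterated up to a bound.
    current = set(L)
    for _ in range(max_iter):
        nxt = current | sigma(current, rules)
        if nxt == current:        # fixed point: the closure is finite
            return current
        current = nxt
    return current                # finite approximation of an infinite closure

print(sorted(sigma_star({"aba", "abba"}, ["a#b$b#ba", "b#a$b#ba"], max_iter=2)))
# ['aba', 'abba', 'abbba', 'abbbba'], i.e. ab^n a for 1 <= n <= i + 2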

We now state a few simple lemmas.

Lemma 14.4
If F1 ⊆ F1′ and F2 ⊆ F2′, then H(F1, F2) ⊆ H(F1′, F2′) for all
families F1, F1′, F2, F2′.

Proof follows from the definitions.

Definition 14.1

Let σ = (V, R) be a H-scheme with R finite. The radius of σ is


defined as:

rad(σ) = max{|ui| | 1 ≤ i ≤ 4, for some u1#u2$u3#u4 ∊ R}.

For p ≥ 1, S(F, p) denotes the family of languages σ(L) for some
L ∊ F and σ an H-scheme with radius at most p.

In Example 14.5, rad(σ) = 2. An important observation about the
radius is that:

S(F, 1) ⊆ S(F, p) for all p ≥ 1.

Here, F ∊ {FIN, REG, CF, CS, RE}.

Lemma 14.5

F ⊆ H(F, 1) for all F ∊ {FIN, REG, CF, CS, RE}.

Proof. Let L ⊆ V*, L ∊ F, be given. Let c ∉ V and consider the
H-scheme σ = (V ∪ {c}, {#c$c#}), of radius one. We clearly have
σi(L) = L for all i ≥ 0, and hence σ*(L) = L; thus L ∊ H(F, 1).
Theorem 14.3

(The regularity preserving lemma) H(REG, FIN) ⊆ REG.

Proof is lengthy and is not included here.

Theorem 14.4

FIN ⊂ H(FIN, 1) ⊂ H(FIN, 2) ⊂ ... ⊂ H(FIN, FIN) ⊂ REG.

Proof. The inclusions follow from the definitions and from the
previous lemmas and theorem. For the properness of the inclusions,
consider the language:

Ln = {a^{2n}b^{2n}a^k b^{2n}a^{2n} | k ≥ 2n + 1}.

Consider the H-scheme σ = ({a, b}, {a^{n+1}#a^n$a^{n+1}#a^n}), with
rad(σ) = n + 1. One can verify that Ln = σ*(F) for a suitable finite
subset F of Ln; hence Ln ∊ H(FIN, n + 1).

On the other hand, Ln cannot be generated by any H-scheme σ1 =
(V, R) with rad(σ1) ≤ n. Suppose this were possible, that is,
σ1*(F) = Ln for some finite set F. Consider a rule r =
u1#u2$u3#u4 ∊ R which can be applied to strings x, y ∊ Ln; then
|u1u2| ≤ 2n, so if u1u2 ∊ a*, it is a subword of the prefix a^{2n}
of x = a^{2n}b^{2n}a^k b^{2n}a^{2n}, k ≥ 2n + 1, and similarly u3u4
can be matched in the suffix a^{2n} of y. For instance:

(a^n|a^n b^{2n}a^k b^{2n}a^{2n}, a^{2n}b^{2n}a^k b^{2n}a^n|a^n) ⊢ a^{2n},

and a^{2n} is not in Ln. There are also other ways of identifying
u1, u2, u3, u4 in x and y such that (x, y) ⊢ z and z is not of the
form a^{2n}b^{2n}a^k b^{2n}a^{2n}, k ≥ 2n + 1. In all cases we
arrive at strings not in Ln. Hence the result.

Lemma 14.6

Every language L ∊ RE, L ⊆ Σ*, can be written as L = Σ* ∩ L′ for
some L′ ∊ H(FIN, REG).

Proof. Let G = (N, Σ, P, S) be a type 0 grammar such that L(G) = L.
Let σ = (V, R) be an H-scheme with:

V = N ∪ Σ ∪ {X, X′, B, Y, Z} ∪ {Yα | α ∊ N ∪ Σ ∪ {B}}.

As the initial set of strings (axioms) available for splicing we take:

L0 = {XBSY, XZ, ZY} ∪ {ZυY | u → υ ∊ P} ∪ {ZYα, X′αZ | α ∊ N ∪ Σ ∪ {B}}.

R contains the following groups of rules:

1. Simulation rules:

Xw#uY$Z#υY, for u → υ ∊ P and w ∊ (N ∪ Σ ∪ {B})*.

This set of rules is used to simulate the rules in P; such a
splicing indicates the application of a rule u → υ ∊ P to the suffix
u of the current string.

2. Next, we have to simulate any sentential form w of G by a string
XBwY produced by σ, and conversely, if Xw1Bw2Y is produced by σ,
then w2w1 is a sentential form of G. For this we have the following
rotation rules, for α ∊ N ∪ Σ ∪ {B} and w ∊ (N ∪ Σ ∪ {B})*:

rule 1: Xw#αY$Z#Yα
rule 2: X′α#Z$X#wYα
rule 3: X′w#Yα$Z#Y
rule 4: X#Z$X′#wY

These rules move symbols from the right-hand end of the current
string to the left-hand end. The presence of B indicates the
beginning of the corresponding sentential form: if the current
string in σ is of the form β1w1Bw2β2, for some markers β1, β2 of
type X, X′, Y, Yα, α ∊ N ∪ Σ ∪ {B}, and w1, w2 ∊ (N ∪ Σ)*, then
w2w1 is a sentential form of G. The starting string is XBSY, where
S is the axiom of G.

Let us see how the above rules simulate the rotation of a sentential
form.

First,

(XwαY, ZYα) ⊢ XwYα

by rule 1; Yα indicates the erasing of α from the right-hand end of
w. To XwYα, the only applicable rule is rule 2, i.e.,

(X′αZ, XwYα) ⊢ X′αwYα.

This shifts α to the left of w. The only rule applicable now is
rule 3. Hence,

(X′αwYα, ZY) ⊢ X′αwY.

Now the application of rule 4 produces:

(XZ, X′αwY) ⊢ XαwY.

Hence, we started with XwαY and reached XαwY by the application of
the above set of rules. This can be repeated as long as we want.

3. The strings produced by σ that we accept must contain only
symbols from Σ. Hence, we have the following terminal rules, for
w ∊ Σ*:

rule 5: #ZY$XB#wY
rule 6: #Y$XZ#

Clearly, XB can be removed (by rule 5) only if the string spliced
with ZY ends with Y, the string between X and Y is over Σ, and the
symbol B is at the left-hand position; that is, the string must be
of the form XBwY, w ∊ Σ*. After removing XB, we can remove Y using
rule 6, i.e.,

(wY, XZ) ⊢ w.

Hence, for w ∊ L(G), w ∊ σ*(L0) ∩ Σ*. Clearly σ*(L0) ∩ Σ* ⊆ L(G),
and conversely, every w ∊ L(G) can be produced by σ in the way
described above, giving w ∊ σ*(L0) ∩ Σ*. Thus L(G) = σ*(L0) ∩ Σ*.

We state a result by means of Table 14.2, which gives the hierarchy
among the various families H(F1, F2), where F1, F2 ∊ {FIN, REG, LIN,
CF, CS, RE}.

Table 14.2. Generative power of H-systems

H(F1, F2)   FIN       REG       LIN       CF        CS
FIN         FIN, REG  FIN, RE   FIN, RE   FIN, RE   FIN, RE
REG         REG       REG, RE   REG, RE   REG, RE   REG, RE
LIN         LIN, CF   LIN, RE   LIN, RE   LIN, RE   LIN, RE
CF          CF        CF, RE    CF, RE    CF, RE    CF, RE
CS          CS, RE    CS, RE    CS, RE    CS, RE    CS, RE
RE          RE        RE        RE        RE        RE

At the intersection of the row marked F1 with the column marked F2,
there appears either the family H(F1, F2) or two families F3, F4
such that F3 ⊂ H(F1, F2) ⊂ F4.

Next, we define a system called an extended H-system (EH). An
EH-system is a quadruple γ = (V, T, A, R), where V is an alphabet,
T ⊆ V is the terminal alphabet, A ⊆ V* is its axiom set, and R is a
set of splicing rules. Here σ = (V, R) is the underlying splicing
scheme or H-scheme, and the iterates σi are applied to the axiom set
as before.

The language generated by γ is defined as L(γ) = σ*(A) ∩ T*.

For any two families of languages F1, F2, we denote by EH(F1, F2)
the family of languages L(γ) generated by EH-systems γ with A ∊ F1
and R ∊ F2.
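Continuing the sketch, the extended mechanism only adds an intersection with T* at the end; a minimal version, assuming the sigma_star helper from the earlier sketch:

def eh_language(A, rules, T, max_iter=10):
    # L(gamma) = sigma*(A) ∩ T*: keep only strings over the terminal alphabet.
    return {w for w in sigma_star(A, rules, max_iter)
              if all(ch in T for ch in w)}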

The generative power of EH-systems is given by Table 14.3.

Table 14.3. The generative power of extended H-systems

EH(F1, F2)  FIN       REG   LIN   CF
FIN         REG       RE    RE    RE
REG         REG       RE    RE    RE
LIN         LIN, CF   RE    RE    RE
CF          CF        RE    RE    RE
CS          RE        RE    RE    RE
RE          RE        RE    RE    RE

We state two important results without proof.

Theorem 14.5

EH(FIN, FIN) = REG

Theorem 14.6

EH(FIN, REG) = RE

Example 14.6. 

Let γ = (V, T, A, R) be an EH-system where

V = {B, a, b}, T = {a, b}, A = {a, b, aB, Bb},

R = {a#B$B#b, a#$#aB, a#B$#a, Bb#$#b, b#$B#b}.

L(γ) = {a^n | n ≥ 1} ∪ {b^m | m ≥ 1} ∪ {a^n b^m | n, m ≥ 1}.


We see that, from a splicing scheme, one can define a splicing
system with a well-defined collection of inputs. Such a splicing
system actually resembles a generative grammar: formally, it is a
system γ = (V, L, R), where (V, R) is a splicing scheme and L is the
initial language.

Definition 14.2

A simple H-system is a triple γ = (V, A, M) where V is the total


alphabet, A is a finite language over V and M ⊆ V. The elements of
A are called axioms and those of M are called markers. When
simple H-systems were introduced, four ternary relations on the
language V* were considered corresponding to splicing rules of the
form

a#$a#, #a$#a, a#$#a, #a$a#

where a is an arbitrary element of M. The rules listed above


correspond to splicing rules of the types (1, 3), (2, 4), (1, 4),
and (2, 3), respectively. Clearly, rules of types (1, 3) and (2, 4)
define the same operation: for x, y, z ∊ V* and a ∊ M, we obtain

(x, y) ⊢a z iff x = x1ax2, y = y1ay2, z = x1ay2

for the (1, 3) and (2, 4) types, whereas for the (1, 4) and the
(2, 3) types we have, respectively,

(x, y) ⊢a z iff x = x1ax2, y = y1ay2, z = x1aay2, and
(x, y) ⊢a z iff x = x1ax2, y = y1ay2, z = x1y2.
Similar to H-systems, we define iterated splicing for simple
H-systems: for a language L ⊆ V* and (i, j) ∊ {(1, 3), (2, 4),
(1, 4), (2, 3)}, we denote

σ0(i,j)(L) = L,   σk+1(i,j)(L) = σk(i,j)(L) ∪ σ(i,j)(σk(i,j)(L)), k ≥ 0,

and define

σ*(i,j)(L) = ∪k≥0 σk(i,j)(L).

The language generated by γ with splicing rules of type (i, j) is
defined as

L(i,j)(γ) = σ*(i,j)(A).

Note that SH-systems are of radius one.

Example 14.7. 

Let γ = ({a, b, c}, A, M = {a}) be a simple H-system. Let a#$a# be


the splicing rule. This is a (1, 3) SH-system. Let A =
{abc, aabc, aaabc}.

L(1,3)(γ) = {a^n bc | n ≥ 1}.
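Since the marker rules a#$a#, etc., are ordinary splicing rules, the earlier sketches apply unchanged; a bounded enumeration of Example 14.7, assuming the sigma_star helper defined earlier:

A = {"abc", "aabc", "aaabc"}
print(sorted(sigma_star(A, ["a#$a#"], max_iter=2)))
# a^n bc for growing n; each iteration roughly doubles the largest exponent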

Definition 14.3

An EH-system with permitting contexts is a quadruple γ = (V, T, A,


R) where V, T, A are the same as defined earlier and R is a finite
set of triples p = (r = u1#u2 $u3#u4, C1, C2), where C1, C2 ⊆ V and
r is a usual splicing rule.

In this case, (x, y) ⊢p z iff (x, y) ⊢r z, all symbols of C1 appear
in x, and all symbols of C2 appear in y.

An EH-system with forbidding contexts is a quadruple γ = (V, T, A,


R) where V, T, A are the same as defined earlier and R is a finite
set of triples p = (r = u1#u2 $u3#u4, C1, C2) where C1, C2  ⊆ V and r
is a usual splicing rule.

In this case, (x, y) ⊢p z iff (x, y) ⊢r z, no symbol of C1 appears
in x, and no symbol of C2 appears in y.

EH(FIN, p[k]) refers to the family of languages generated by
EH-systems with permitting contexts, a finite set of axioms, and
rules of radius at most k, for k ≥ 1. In a similar fashion, one can
define EH(FIN, f[k]) to be the family of languages generated by
EH-systems with forbidding contexts, a finite set of axioms, and
rules of radius at most k.

We state some results without proof.

Theorem 14.7

EH(FIN, p[2]) = RE.

Theorem 14.8

EH(FIN, p[1]) = CF.


There are several models of splicing systems; some of them are
studied by means of splicing on graphs, arrays, etc. In each model,
the power of splicing is explored from the formal language theory
point of view. Other aspects of splicing, like circular splicing and
self-assembly, are among the topics of recent research.

Membrane Computing
Membrane computing is a new computability technique inspired by
biochemistry. The model used for computation resembles a membrane
structure, and it is a highly parallel, distributed computing model.
Several cell-like membranes are recurrently placed inside a unique
membrane called the "skin" membrane. The structure may be viewed as
a Venn diagram without intersecting sets and with a unique superset;
the diagram as a whole is called a membrane structure. In the
regions delimited by the membranes, objects or data structures are
placed. These objects evolve by evolution rules. After evolution,
the newly developed data structure may remain in the same membrane
or may move from one compartment to another. Starting with a certain
number of objects in certain membranes, we let the system evolve. If
it halts, then the computation is finished; the result can be read
either in some membrane region or outside the skin membrane. If the
development of the system goes on forever, then the computation
fails to produce an output.
This method of computation was proposed by Păun in 1998, and hence
such a system is also called a P system. In the regions delimited by
membranes, objects are placed; the objects may be strings,
multisets, etc. These evolve according to rules placed in the
regions. By varying the evolving objects and the evolution rules,
several variants of P systems, or membrane systems, have been
developed. In any variant, the objects are assumed to evolve: each
object can be transformed into other objects, can pass through a
membrane, or can dissolve the membrane in which it is placed. A
priority relation between the rules is also permitted. The evolution
is done in parallel for all objects which can evolve.

A multiset over a set X is a mapping M: X → N ∪ {∞}.


For a ∊ X, M(a) is called the multiplicity of a in the multiset M.
The support of M is the set supp(M) = {a ∊ X | M(a) > 0}. We can
write a multiset M of finite support, supp(M) = {a1, ..., an}, in
the form {(a1, M(a1)), ..., (an, M(an))}. We can also represent this
multiset by the string w(M) = a1^{M(a1)} ... an^{M(an)}, as well as
by any permutation of w(M). Conversely, having a string w ∊ V*, we
can associate with it the multiset M(w): V → N ∪ {∞} defined by
M(w)(a) = |w|a, a ∊ V.

For two multisets M1, M2 over the same set X we say that M1 is


included in M2, and we write M1 ⊆ M2 if M1(a) ≤ M2(a) for
all a ∊ X. The union of M1, M2 is the multiset M1 ∪ M2 defined by
(M1 ∪ M2)(a) = M1(a) + M2(a). We define here the
difference M2 − M1 of two multisets only if M1 ⊆ M2 and this is the
multiset defined by (M2 − M1)(a) = M2(a) − M1(a).
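These operations map directly onto Python's collections.Counter; a small sketch (our own helper names), using the convention that a string represents a multiset:

from collections import Counter

def included(m1, m2):
    # M1 ⊆ M2 iff M1(a) <= M2(a) for all a
    return all(m2[a] >= n for a, n in m1.items())

def union(m1, m2):
    # (M1 ∪ M2)(a) = M1(a) + M2(a)
    return m1 + m2

def difference(m2, m1):
    # M2 − M1, defined only when M1 ⊆ M2
    assert included(m1, m2)
    return Counter({a: m2[a] - m1[a] for a in m2 if m2[a] > m1[a]})

M = Counter("aabc")       # M(a) = 2, M(b) = 1, M(c) = 1; supp(M) = {a, b, c}
print(included(Counter("ab"), M), difference(M, Counter("ab")))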

Operations with Strings and Languages


The Boolean operations (with languages) are denoted as usual: ∪
for union, ∩ for intersection, and c for complementation.

P Systems with Labeled Membranes


We start from the observation that life “computes” not only at the
genetic level, but also at the cellular level. At a more specific level
with respect to the models we are going to define, important to us
is the fact that the parts of a biological system are well delimited by
various types of membranes, in the broad sense of the
term, starting from the membranes, which delimit the various
intra-cell components, going to the cell membrane and then to the
skin of organisms, and ending with more or less virtual
“membranes,” which delimit, for instance, parts of an ecosystem.
In very practical terms, in biology and chemistry one knows
membranes which keep together certain chemicals and allow other
chemicals to pass, in a selective manner, sometimes only in one
direction (for instance, through protein channels placed in
membranes).

Formalizing the previous intuitive ideas, we now introduce the


basic structural ingredient of the computing devices we will define
later: membrane structures.

Let us consider first the language MS over the alphabet {[, ]},
whose strings are recurrently defined as follows:

1. [ ] ∊ MS;
2. if μ1, ..., μn ∊ MS, n ≥ 1, then [μ1 ... μn] ∊ MS;
3. nothing else is in MS.

Consider now the following relation over the elements of MS: x ~ y
if and only if we can write the two strings in the form x = μ1μ2μ3μ4,
y = μ1μ3μ2μ4, for μ2, μ3 ∊ MS (two pairs of parentheses which are
neighbors at the same level are interchanged, together with their
contents). We also denote by ~ the reflexive and transitive closure
of this relation; it is clearly an equivalence relation. We denote
by MS/~ the set of equivalence classes of MS with respect to this
relation. The elements of MS/~ are called membrane structures.

Each matching pair of parentheses [, ] appearing in a membrane
structure is called a membrane. The number of membranes in a
membrane structure μ is called the degree of μ, denoted by deg(μ),
while the external membrane of a membrane structure μ is called the
skin membrane of μ. A membrane which appears in μ in the form [ ]
(no other membrane appears inside the two parentheses) is called an
elementary membrane.

The depth of a membrane structure μ, denoted by dep(μ), is defined
recurrently as follows:

1. if μ = [ ], then dep(μ) = 1;
2. if μ = [μ1 ... μn], for some μ1, ..., μn ∊ MS, then dep(μ) =
max{dep(μi) | 1 ≤ i ≤ n} + 1.
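A membrane structure over {[, ]} is a balanced-bracket string, so deg(μ) and dep(μ) can be computed with a short recursive sketch (hypothetical helper names):

def parse(ms):
    # Parse a well-formed membrane structure into a nested list.
    stack = [[]]
    for ch in ms:
        if ch == "[":
            stack.append([])
        elif ch == "]":
            node = stack.pop()
            stack[-1].append(node)
    (tree,) = stack.pop()     # assumes a single outermost (skin) membrane
    return tree

def degree(tree):
    # deg: this membrane plus all membranes inside it
    return 1 + sum(degree(child) for child in tree)

def depth(tree):
    # dep([ ]) = 1; dep([m1 ... mn]) = max dep(mi) + 1
    return 1 if not tree else 1 + max(depth(child) for child in tree)

mu = "[[][[]]]"
print(degree(parse(mu)), depth(parse(mu)))    # 4 3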
A membrane structure can be represented in a natural way as a
Venn diagram. This makes clear that the order of membrane
structures of the same level in a larger membrane structure is
irrelevant; what matters is the topological structure, the
relationship between membranes.

The Venn diagram representation of a membrane structure μ also


makes clear the notion of a region in μ: any closed space delimited
by membranes is called a region of μ. It is clear that a membrane
structure of degree n contains n internal regions, one associated
with each membrane. By the outer region, we mean the whole
space outside the skin membrane. Figure 14.1 illustrates some of
the notions mentioned above.
Figure 14.1. A membrane structure

We now make one more step towards the definition of a computing


device, by adding objects to a membrane structure. Let U be a
finite set whose elements are called objects. Consider a membrane
structure μ of degree n, n ≥ 1, with the membranes labeled in a
one-to-one manner, for instance, with the numbers from 1 to n. In
this way, the regions of μ are also identified by the numbers from 1
to n. If a multiset Mi: U → N is associated with each region i of μ, 1
≤ i ≤ n, then we say that we have a super-cell (note that we do not
allow infinite multiplicities of objects in U).

Any multiset Mi mentioned above can be empty. In particular, all


of them can be empty, that is, any membrane structure is a super-
cell. On the other hand, each individual object can appear in
several regions, and in several copies.

Several notions defined for membrane structures are extended in


the natural way to super-cells: degree, depth, region etc.

The multiset corresponding to a region of a super-cell (in


particular, it can be an elementary membrane) is called the
contents of this region. The total multiplicities of the elements in
an elementary membrane m (the sum of their multiplicities) is
called the size of m and is denoted by size(m).

If a membrane m′ is placed in a membrane m such that m and m′


contribute to delimiting the same region (namely, the region
associated with m), then all objects placed in the region associated
with m are said to be adjacent to membrane m′ (so, they are
immediately “outside” membrane m′ and “inside” membrane m).

A super-cell can be described by a Venn diagram where both the


membranes and the objects are represented (in the case of objects,
taking care of their multiplicities).

We are now ready to introduce the subject of our investigation, a


computing mechanism essentially designed as a distributed
parallel machinery, having as the underlying structure a super-cell.
The basic additional feature is the possibility of objects to evolve,
according to certain rules. Another feature refers to the definition
of the input and the output (the result) of a computation.

A P system is a super-cell provided with evolution rules for its


objects and with a designated output region.

More formally, a P system of degree m, m ≥ 1, is a construct

Π = (V, T, C, μ, w1, ... , wm, (R1, ρ1), ... , (Rm, ρm), i0),

where:

1. V is an alphabet; its elements are called objects;
2. T ⊆ V (the output alphabet);
3. C ⊆ V, C ∩ T = φ (catalysts);
4. μ is a membrane structure consisting of m membranes, with the
membranes and regions labeled in a one-to-one manner with elements
of a given set; here we use the labels 1, 2, ..., m;
5. wi, 1 ≤ i ≤ m, are strings representing multisets over V
associated with the regions 1, 2, ..., m of μ;
6. Ri, 1 ≤ i ≤ m, are finite sets of evolution rules over V
associated with the regions 1, 2, ..., m of μ; ρi is a partial order
relation over Ri, 1 ≤ i ≤ m, specifying a priority relation among
the rules of Ri. An evolution rule is a pair (u, υ), which we
usually write in the form u → υ, where u is a string over V and
υ = υ′ or υ = υ′δ, where υ′ is a string over

{ahere, aout, ainj | a ∊ V, 1 ≤ j ≤ m},

and δ is a special symbol not in V. The length of u is called the
radius of the rule u → υ. (The strings u, υ are understood as
representations of multisets over V, in the natural sense.)
7. i0 specifies the output membrane of Π if it is a number between 1
and m; if it is equal to ∞, it indicates that the output is read in
the outer region.

When presenting the evolution rules, the indication "here" is in
general omitted. Recall that the multiset associated with a string
w is denoted by M(w).

If Π contains rules of radius greater than one, then we say that Π
is a system with co-operation; otherwise, it is a non-cooperative
system. A particular class of co-operative systems is that of
catalytic systems: the only rules of radius greater than one are of
the form ca → cυ, where c ∊ C, a ∊ V − C, and υ contains no
catalyst; moreover, no other evolution rule contains catalysts
(there is no rule of the form c → υ or a → υ1cυ2, for c ∊ C). A
system is said to be propagating if there is no rule which
diminishes the number of objects in the system (note that
diminishing can be done by "erasing" rules, but also by sending
objects out of the skin membrane).

Of course, any of the multisets M(w1), ..., M(wm) can be empty (that
is, any wi can be the empty string), and the same is valid for the
sets R1, ..., Rm and their associated priority relations ρi.

The components μ and w1, ..., wm of a P system define a super-cell.


Graphically, we will draw a P system by representing its underlying
super-cell, and also adding the rules to each region, together with
the corresponding priority relation. In this way, we can have a
complete picture of a P system, much easier to understand than a
symbolic description.

The (m + 1)-tuple (μ, w1, ..., wm) constitutes the initial
configuration of Π. In general, any sequence (μ′, w′i1, ..., w′ik),
with μ′ a membrane structure obtained by removing from μ all
membranes different from i1, ..., ik (of course, the skin membrane
is not removed), with w′j strings over V, 1 ≤ j ≤ k, and {i1, ...,
ik} ⊆ {1, 2, ..., m}, is called a configuration of Π.

It is important to note that the membranes preserve the initial


labeling in all subsequent configurations; in this way, the
correspondence between membranes, multisets of objects, and sets
of evolution rules is well specified by the subscripts of these
elements.

For two configurations C1 = (μ′, w′i1, ..., w′ik) and C2 = (μ″,
w″j1, ..., w″jl) of Π, we write C1 ⇒ C2, and we say that we have a
transition from C1 to C2, if we can pass from C1 to C2 by using the
evolution rules appearing in Ri1, ..., Rik in the following manner:

Consider a rule u → υ in a set Rit. We look at the region of μ′
associated with the membrane it. If the objects mentioned by u, with
the multiplicities specified by u, appear in w′it (that is, the
multiset M(u) is included in M(w′it)), then these objects can evolve
according to the rule u → υ. The rule can be used only if no rule of
higher priority is applicable at the same time. More precisely, we
examine the rules in decreasing order of their priority and assign
objects to them. A rule can be used only when there are copies of
the objects whose evolution it describes which are not "consumed" by
rules of higher priority, and moreover no rule of higher priority,
irrespective of which objects it involves, is applicable at the same
step. Therefore, all objects to which a rule can be applied must be
the subject of a rule application. All objects in u are "consumed"
by using the rule u → υ; that is, the multiset M(u) is subtracted
from M(w′it).

The result of using the rule is determined by υ. If an object appears


in υ in the form ahere, then it will remain in the same region it. If an
object appears in υ in the form aout, then a will exit the
membrane it and will become an element of the region
immediately outside it (thus, it will be adjacent to the
membrane it from which it was expelled). In this way, it is possible
that an object leaves the system. If it goes outside the skin of the
system, then it never comes back. If an object appears in the
form ainq, then a will be added to the multiset M(w′q), providing
that a is adjacent to the membrane q. If ainq appears in υ and
membrane q is not one of the membranes delimiting “from below”
the region it, then the application of the rule is not allowed.

If the symbol δ appears in v, then membrane it is removed (we


say dissolved) and at the same time the set of rules Rit (and its
associated priority relation) is removed. The multiset M(w′it) is
added (in the sense of multisets union) to the multiset associated
with the region which was immediately external to membrane it.
We do not allow the dissolving of the skin, because this means that
the super-cell is lost and we no longer have a correct configuration
of the system.

All these operations are performed in parallel, for all possible


applicable rules u → υ, for all occurrences of multisets u in the
region associated with the rules, for all regions at the same time.
No contradiction appears because of multiple membrane dissolving, or
because of the simultaneous appearance of symbols of the form aout
and δ. If at the same step we have aini outside a
membrane i and δ inside this membrane, then, because of the
simultaneity of performing these operations, again no
contradiction appears: we assume that a is introduced in
membrane i at the same time when it is dissolved, thus a will
remain in the region placed outside membrane i; that is, from the
point of view of a, the effect of aini in the region outside
membrane i and δ in membrane i is ahere.

A sequence of transitions between configurations of a


given P system Π is called a computation with respect to Π. A
computation is successful if and only if it halts, that is, there is no
rule applicable to the objects present in the last configuration. If
the system is provided with an output region i0 ≠ ∞, then the
membrane i0 is present as an elementary one in the last
configuration of the computation. Note that the output membrane
was not necessarily an elementary one in the initial configuration.
The result of a successful computation can be the total number of
objects present in the output membrane of a halting configuration,
or ψT(w), where w describes the multiset of objects from T present
in the output membrane in a halting configuration (ψT is the
Parikh mapping associated with T), or a language, as it will be
defined immediately. The set of vectors ψT(w) for w describing the
multiset of objects present in the output membrane of a system Π
in a halting configuration is denoted Ps(Π) (from “Parikh set”) and
we say that it is generated by Π in the internal mode. When we are
interested only in the number of objects present in the output
membrane in the halting configuration of Π, we denote by N(Π)
the set of numbers “generated” in this way.

When no output membrane is specified (i0 = ∞), we observe the


system from outside and collect the objects ejected from the skin
membrane, in the order they are ejected. Using these objects, we
form a string. When several objects are ejected at the same time,
any permutation of them is considered. In this way, a string or a
set of strings is associated with each computation, that is, a
language is associated with the system.

We denote by L(Π) the language computed by Π in the way


described above. We say that it is generated by Π in the external
mode.

We now illustrate the functioning of a P system with an example.


Example 14.8. 

Consider the P system of degree 3:

Π = (V, T, C, μ, w1, w2, w3, (R1, ρ1), (R2, ρ2), (R3, ρ3), 3),

V = {a, b, c, d, x},

T = {x},

C = φ,

μ = [1[2]2[3]3]1,

w1 = φ, R1 = {c → xin3, d → xin3}, ρ1 = φ,

w2 = ab^{m−2}c^n, R2 = {ab → a, a → δ, c → chere dout},

ρ2 = {ab → a > a → δ},

w3 = φ, R3 = φ, ρ3 = φ.

The initial configuration of the system is presented in Figure 14.2.

Figure 14.2. P system generating product mn

In this example, initially there are no objects in membranes 1 and
3; hence, no rule can be applied there. We start in membrane 2:
using the objects a, b, c present in membrane 2, the rules ab → a
and c → chere dout are applied; after m − 2 steps, ab^{m−2} is
reduced to a while c^n remains in membrane 2, and in each step n
copies of d are sent out to region 1, where each d becomes an x sent
into membrane 3. Then a → δ dissolves membrane 2; at this time
x^{n(m−2)} is available in membrane 3. When membrane 2 dissolves,
c^n d^n are left in membrane 1 (the c's still emit d's in the
dissolving step), and they are sent to membrane 3 as x^{2n} in the
next step. Hence N(Π) = {mn | n ≥ 1, m ≥ 2}.
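The computation above can be traced in code. A minimal sequential sketch, assuming the reconstructed rule set R1 = {c → xin3, d → xin3} from the text (run is our own function name; it serializes the parallel steps in a way that preserves the counts):

from collections import Counter

def run(m, n):
    reg1 = Counter()
    reg2 = Counter({"a": 1, "b": m - 2, "c": n})   # a b^{m-2} c^n in membrane 2
    reg3 = Counter()                               # membrane 3 collects x's
    alive = True                                   # membrane 2 not yet dissolved
    while True:
        moved = False
        k = reg1["c"] + reg1["d"]                  # region 1: c, d -> x sent into 3
        if k:
            reg3["x"] += k
            reg1["c"] = reg1["d"] = 0
            moved = True
        if alive:
            moved = True
            if reg2["b"] > 0:                      # ab -> a has priority over a -> delta
                reg2["b"] -= 1
            else:                                  # a -> delta dissolves membrane 2
                reg2["a"] -= 1
                alive = False
            reg1["d"] += reg2["c"]                 # c -> c_here d_out, for every c
            if not alive:
                reg1 += reg2                       # contents drop into region 1
                reg2 = Counter()
        if not moved:
            return reg3["x"]

print(run(4, 3))    # 12, i.e. m * n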

Rewriting P Systems
Now, we consider objects which can be described by finite strings
given over a finite alphabet, instead of objects of an atomic type.
The evolution of an object then corresponds to a transformation of
the string. The transformations take place in the form of rewriting
steps. Consequently, the evolution rules are given as rewriting
rules.

Assume that we have an alphabet V. A usual rewriting rule is a pair


(u, υ) of words over V. For x, y ∊ V* we
write x ⇒ y iff x = x1ux2 and y = x1υx2, for some strings x1, x2 ∊ V*.
The rules are also provided with indications on the target membrane
of the produced string. Here, the membrane dissolving action is not
considered, as it is not required for obtaining computational
completeness. We always use context-free rules of the form:

X → (υ, tar)

where X → υ is a context-free rule and tar ∊ {here, out, inj} (“tar”


comes from target, j is the label of a membrane), with the obvious
meaning: the string produced by using this rule will go to the
membrane indicated by tar.

The important fact now is that a string is a unique object, hence, it


passes through different membranes as a unique entity, its symbols
do not follow different itineraries, as it was possible for the objects
in a multiset. Of course, in the same region we can have several
strings at the same time, but it is irrelevant whether or not we
consider multiplicities of strings: each string follows its own “fate.”
That is why we do not speak here about multiplicities. In this
framework, the catalysts are meaningless. Now, we discuss some
basic variants of rewriting P systems.

P Systems with Sequential Rewriting


A sequential rewriting P system (or rewriting P system, for short)
(Păun, 1998) is a language-generating mechanism of the form:

Π = (V, T, μ, L1, ..., Ln, (R1, ρ1), ..., (Rn, ρn), i0),

where V is an alphabet, T ⊆ V is the output alphabet, μ is a


membrane structure consisting of n membranes labeled with 1,
2, ... ,n, L1, L2, ... , Ln are finite languages over V associated with the
regions 1, 2, ... , n of μ, R1, ... , Rn are finite sets of context-free
evolution rules and ρ1, ... , ρn are partial order relations
over R1, ... , Rn. i0 is the output membrane if 1
≤ i0 ≤ n, otherwise i0 = ∞. In the former case, the output is read
within the system, and in the latter case, the output is read in the
outer region.

The language generated by a system Π is denoted by L(Π) and it is


defined as explained in the previous chapter with the differences
specific to an evolution based on rewriting: we start from an initial
configuration of the system and proceed iteratively, by transition
steps performed using the rules in parallel, to all strings which can
be rewritten, obeying the priority relations, and collecting the
terminal strings generated in a designated membrane, the output
one.

Note that each string is processed by one rule only, the parallelism
refers here to processing simultaneously all available strings by all
applicable rules. If several rules can be applied to a string, maybe
in several places each, then we take only one rule and only one
possibility to apply it and consider the obtained string as the next
state of the object described by the string. It is important to have in
mind that the evolution of strings is not independent of each other,
but interrelated in two ways:

Case 1: There exists a priority relation among the rules


A rule r1 applicable to a string x can forbid the use of another
rule r2, for rewriting another string y, which is present at that time
in the same membrane. After applying the rule r1, if r1 is not
applicable to y or to the string x′ obtained from x by using r1, then
it is possible that the rule r2 can now be applied to y.

Case 2: There is no priority relation among the rules

Even without priorities, if some string x can be rewritten forever
in the same membrane or along an itinerary through several
membranes, then all strings are lost, because the computation never
stops, irrespective of the strings already collected in the output
membrane which cannot evolve further.

The family of all languages L(Π) generated by sequential rewriting
P systems with at most m membranes, using priorities and having
internal (external) output, characterizes RE; in fact, a system with
two membranes is sufficient to show this.

P Systems based on Sequential Rewriting


with Membranes of Variable Thickness
A P system with variable thickness is a variant of sequential
rewriting P systems with no priorities, obtained by allowing the
membranes to be permeable only under certain conditions. The
concentration difference between neighboring regions plays an
important role in this variant. This is biologically motivated by
the fact that in real cells, molecules can pass through membranes
mainly because of concentration differences in neighboring regions,
or by means of electrical charges (ions can be transported in spaces
of opposite polarization).

The control of membrane permeability is achieved as follows: besides
the action of dissolving a membrane (indicated by introducing the
symbol δ), we also use the action of making a membrane thicker
(indicated by the symbol τ). Initially, all membranes have thickness
one. If a rule X → υτ is applied in a membrane of thickness one,
then the membrane thickness becomes two. A membrane of thickness two
does not become thicker by further rules that introduce the symbol
τ, but no object can enter or exit it. If a rule X → υδ, which
introduces the symbol δ, is used in a membrane of thickness one,
then the membrane is dissolved; if the membrane had thickness two,
then it returns to thickness one. The following points should be
noted:

1. If rules which introduce both δ and τ are applied in a single
step in a membrane, then the membrane does not change its thickness.
2. If two or more rules involving τ are applied in a membrane in a
single step, the membrane becomes non-permeable.
3. If two or more rules involving δ are applied in a single step in
a non-permeable membrane or a membrane of thickness two or one, the
membrane dissolves.

The actions of the symbols δ, τ are illustrated in Figure 14.3.

Figure 14.3. The effect of actions δ, τ
A P system based on sequential rewriting, with external output
and membranes of variable thickness is a construct:

Π = (V, T, μ, L1, L2,..., Ln, R1, R2,..., Rn, ∞)

where V is the total alphabet of the system; T ⊆ V is the terminal


alphabet; μ is a membrane structure consisting of n membranes
labeled with 1, 2, ..., n; L1, ... , Ln are finite languages
over V, associated with the regions 1, 2, ... , n of μ and R1,
R2, ... , Rn are sets of evolution rules for the regions of μ. The rules
are of the following form:

X → (υ′, tar), X ∊ V, υ′ = υ or υδ or υτ, υ ∊ V*, tar ∊ {here, out, inj}

The meaning of these rules is obvious: if a rule X → (υτ, tar) is
applied in a membrane i, the string replaces X by υ and moves to the
membrane indicated by tar, increasing that membrane's thickness if
it is of thickness one, and making it non-permeable if it is of
thickness two.

Similarly, a rule X → (υδ, tar) applied in a membrane i reduces the
thickness of the target membrane if it is of thickness two, and
dissolves the target membrane if it is of thickness one. The effects
of δ, τ are thus carried to the target membrane along with the
string. It should be noted that this is the only variant where the
effects of the actions δ, τ are carried over to the target membrane;
in all the other variants, the effects of these actions pertain to
the membranes where the symbols δ, τ are introduced through rules.

The language generated by the system consists of all strings in T*
coming out of the skin membrane after a complete computation. The
way of rewriting the strings is the same as that of sequential
rewriting P systems. Here we do not have priorities, but we have the
membrane dissolving and thickening actions. It is known that these
two controls are sufficient for obtaining computational
completeness, with no priorities and no bound on the number of
membranes.

P Systems with Replicated Rewriting


A variant of rewriting P systems, which is capable of solving hard
problems is introduced in this section. For solving hard problems
with P systems which use rewriting rules, we need to replicate
strings, in order to get an exponential space in a linear time.
Hence, we consider rules which replicate the strings at the same
time when rewriting them.

A P system based on replicated rewriting is a construct

Π = (V, T, μ, L1, L2,..., Lm, R1, R2,..., Rm, ∞)

where V is the total alphabet; T ⊆ V is the terminal alphabet; μ is a


membrane structure consisting of m membranes labeled with 1,
2, ... ,m; L1, ... , Lm are finite languages over V, associated with the
regions of μ and R1, R2, ... , Rm are finite sets of developmental
rules of the form:

X → (υ1, tar1)‖(υ2, tar2)‖ ...‖(υn, tarn), n ≥ 1

where tari ∊ {here, out} ∪ {inj | 1 ≤ j ≤ m}, X ∊ V, υi ∊ V*, 1 ≤ i ≤ n.


If n > 1, then the rule is a replicated rewriting rule, otherwise, it is
just a rewriting rule.

When a rule X → (υ1, tar1)‖(υ2, tar2)‖ ... ‖(υn, tarn) is used to
rewrite a string x1Xx2, we obtain n strings x1υix2, which are sent
to the regions indicated by tari, 1 ≤ i ≤ n, respectively. When
tari = here, the string remains in the same region; when tari = out,
the string exits the current membrane; and when tari = inj, the
string is sent to membrane j, provided that membrane j is directly
inside the membrane where the rule is applied. If there is no such
membrane j, then the rule cannot be applied.
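A single replicated rewriting step is easy to sketch (replicate_rewrite is a hypothetical helper; a real system would also choose the occurrence of X nondeterministically, which we skip):

def replicate_rewrite(s, X, branches):
    # Apply X -> (v1, tar1) || ... || (vn, tarn) at the first occurrence of X.
    # Returns one (string, target) pair per branch.
    i = s.find(X)
    if i < 0:
        return []
    return [(s[:i] + v + s[i + len(X):], tar) for v, tar in branches]

print(replicate_rewrite("cXc", "X", [("aXa", "here"), ("b", "out")]))
# [('caXac', 'here'), ('cbc', 'out')] -- the string is doubled in one step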

A computation is defined as follows: we start from the strings
present in the initial configuration and proceed iteratively, by
transition steps performed by using the rules in parallel in each
membrane, to all strings which can be rewritten by the local rules;
the result is collected outside the system, at the end of halting
computations. We do not consider here further features such as
priorities among the rules or the possibility of dissolving
membranes.

The language generated by a replicated rewriting P system Π is
denoted L(Π), and it consists of all strings in T* sent out of the
system at the end of a halting computation.
Solving SAT and HPP
We see that the satisfiability of propositional formulas in
conjunctive normal form (the SAT problem) and the Hamiltonian path
problem (HPP) can be solved in linear time using replicated
rewriting P systems. The time is estimated here as the number of
steps the system works. This means we have a parallel time where
each unit is the time of a "biological" step in the system, the time
of using any rule, supposing that all rules take the same time to be
applied.

The SAT is probably the most known NP-complete problem. It asks


whether or not for a given formula in the conjunctive normal form
there is a truth-assignment of the variables for which the formula
assumes the value true. Such a formula is of the form:

γ = C1 ∧ C2 ∧ ... ∧ Cm,

where each Ci, 1 ≤ i ≤ m, is a clause, that is, a disjunction

Ci = y1 ∨ y2 ∨ ... ∨ yr,

with each yj being either a propositional variable, xs, or its
negation, ¬xs. (Thus, we use the variables x1, x2, ... and the three
connectives ∨, ∧, ¬: or, and, negation.)

For example, let us consider the propositional formula:

β = (x1 ∨ x2) ∧ (¬x1 ∨ ¬x2)

We have two variables, x1, x2, and two clauses. It is easy to see that
it is satisfiable: any of the following truth-assignments makes the
formula true:

(x1 = true, x2 = false), (x1 = false, x2 = true).
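For contrast with the linear-time membrane solution, here is the brute-force check a sequential program would perform, exponential in the number of variables (satisfiable is our own helper; a literal is encoded as +i for xi and -i for ¬xi):

from itertools import product

def satisfiable(clauses, num_vars):
    # Try all 2^num_vars truth-assignments.
    for assignment in product([False, True], repeat=num_vars):
        if all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assignment
    return None

# beta = (x1 ∨ x2) ∧ (¬x1 ∨ ¬x2)
print(satisfiable([[1, 2], [-1, -2]], 2))   # (False, True), so beta is satisfiable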

We give below two theorems without proof.

Theorem 14.9
The SAT problem can be solved by using replicated rewriting P
systems in a time linear in the number of variables and number of
clauses.

Theorem 14.10

The Hamiltonian path problem can be solved by P systems with


replicated rewriting in a time linear in the number of nodes of the
given graph.

P Systems with Active Membranes


A natural possibility is to let the number of membranes also
increase during a computation, for instance by division, as is well
known from biology. Actually, membranes in biochemistry are not at
all passive, like those in the models discussed above. For
example, the passing of a chemical compound through a
membrane is often done by a direct interaction with the membrane
itself (with the so-called protein channels or protein gates present
in the membrane). During this interaction, the chemical compound
passing through the membrane, as well as the membrane itself, can
be modified.

These observations were made use of, to form a new class


of P systems where the central role in the computation is played by
the membranes. Evolution rules were associated both with objects
and membranes, while the communication through membranes
was performed with the direct participation of the membranes;
moreover, the membranes could not only be dissolved, but they
also could multiply by division. An elementary membrane could be
divided by means of an interaction with an object from that
membrane. Each membrane was supposed to have an “electrical
polarization” (we will say charge), one of the three
possible: positive, negative, or neutral. If a membrane had two
immediately lower membranes of opposite polarizations,
one positive and one negative, then that membrane could also
divide in such a way that the two membranes of opposite charge
were separated; all membranes of neutral charge and all objects
were duplicated and a copy of each of them was introduced in each
of the two new membranes. The skin was never divided.

In this way, the number of membranes could grow, even


exponentially. By making use of this increased parallelism, one
could compute faster. It was proved that this is the case, indeed:
the SAT problem, one of the basic NP-complete problems, could be
solved in this framework in linear time. Moreover, the model was
shown to be computationally universal: any recursively
enumerable set of (vectors of) natural numbers could be generated
by these systems.

An important application of this class of P systems is that these


systems are capable of solving a real-world problem, viz., breaking
DES.

We now formally define this variant of P systems.

Let d ≥ 1 be a natural number. A P system with active membranes


and d-bounded membrane division (in short, we say a P system
with active membranes) is a construct

Π = (V,T,H, μ, w1, ..., wm, R),

where:

1. m ≥ 1;
2. V is an alphabet (the total alphabet of the system);
3. T ⊆ V (the terminal alphabet);
4. H is a finite set of labels for the membranes;
5. μ is a membrane structure consisting of m membranes, labeled (not
necessarily in a one-to-one manner) with elements of H; all
membranes in μ are supposed to be neutral;
6. w1, ..., wm are strings over V, describing the multisets of
objects placed in the m regions of μ;
7. R is a finite set of developmental rules, of the following forms:

(a) [h a → υ]h^α, for h ∊ H, a ∊ V, υ ∊ V*, α ∊ {+, −, 0} (object
evolution rules);

(b) a[h ]h^α1 → [h b]h^α2, where a, b ∊ V, h ∊ H, α1, α2 ∊ {+, −, 0}
(an object is introduced into the membrane);

(c) [h a]h^α1 → [h ]h^α2 b, for h ∊ H, α1, α2 ∊ {+, −, 0}, a, b ∊ V
(an object is sent out);

(d) [h a]h^α → b, for h ∊ H, α ∊ {+, −, 0}, a, b ∊ V (dissolving
rules);

(e) [h a]h^α → [h a1]h^α1 ... [h an]h^αn, for α, αi ∊ {+, −, 0},
a ∊ V, ai ∊ V*, i = 1, ..., n, h ∊ H, and n ≤ d (division rules for
elementary membranes);

(f) [h0 [h1 ]h1^α1 ... [hk ]hk^αk [hk+1 ]hk+1^αk+1 ... [hn ]hn^αn ]h0^α0
→ [h0 [h1 ]h1^β1 ... [hk ]hk^βk ]h0^β0 [h0 [hk+1 ]hk+1^γk+1 ...
[hn ]hn^γn ]h0^γ0, for k ≥ 1, n > k, hi ∊ H, 0 ≤ i ≤ n, n ≤ d, and
there exist i, j, 1 ≤ i, j ≤ n, such that αi, αj ∊ {+, −}; moreover,
β0, γ0, βj, γj ∊ {+, −, 0}, 1 ≤ j ≤ n (division of non-elementary
membranes).

Note that in all rules of types (a)-(e), the objects do not directly
interact. In rule (a), an object is transformed into a string of
objects, and in the rest of the rules, an object is replaced with
another one.

These rules are applied according to the following principles:

1. All the rules are applied in parallel: in a step, the rules of
type (a) are applied to all objects to which they can be applied,
and all other rules are applied to all membranes to which they can
be applied; an object can be used by only one rule,
non-deterministically chosen (there is no priority relation among
the rules), but any object which can evolve by a rule of any form
should evolve.
2. If a membrane is dissolved, then all the objects in its region
are left free in the region immediately above it. Because all rules
are associated with membranes, the rules of a dissolved membrane are
no longer available in subsequent steps. The skin membrane is never
dissolved.
3. Note that in a rule of type (f), at least two membranes on its
left-hand side must have opposite polarizations. All objects and
membranes not specified in a rule and which do not evolve are passed
unchanged to the next step. For instance, if a membrane with the
label h is divided by a rule of type (e) which involves an object a,
then all other objects in membrane h which do not evolve are
introduced in each of the resulting membranes h. Similarly, when
dividing a membrane h by means of a rule of type (f), the contents
of each inner membrane are reproduced unchanged in each copy,
provided that no rule is applied to their objects.
4. If at the same time a membrane h is divided by a rule of type (e)
and there are objects in this membrane which evolve by means of
rules of type (a), then in the new copies of the membrane we
introduce the result of the evolution. That is, we may suppose that
first the evolution rules of type (a) are used, changing the
objects, and then the division is produced, so that in the new
membranes with label h we introduce copies of the changed objects.
Of course, this process takes only one step. The same assertions
apply to division by means of a rule of type (f): we always assume
that the rules are applied "bottom-up", first applying the rules of
the innermost regions and then proceeding level by level until the
region of the skin membrane is reached.
5. The rules associated with a membrane h are used for all copies of
this membrane, irrespective of whether the membrane is an initial
one or is obtained by division. At one step, a membrane h can be the
subject of only one rule of types (b)-(f).
6. The skin membrane can never divide. Like any other membrane, the
skin membrane can be "electrically charged."
The membrane structure of the system at a given time, together with
all multisets of objects associated with the regions of this
membrane structure, is the configuration of the system at that time.
The (m + 1)-tuple (μ, w1, ..., wm) is the initial configuration. We
can pass from one configuration to another by using the rules from R
according to the principles given above; we say that we have a
(direct) transition among configurations. A sequence of transitions
which starts from the initial configuration is called a computation
with respect to Π. A computation is complete if it cannot be
continued: there is no rule which can be applied to the objects and
membranes in the last configuration.

Note that during a computation, the number of membranes (hence


the degree of the system) can increase and decrease but the labels
of these membranes are always among the labels of membranes
present in the initial configuration (by division we only produce
membranes with the same label as the label of the divided
membrane).

During a computation, objects can leave the skin membrane (by


means of rules of type (c)). The terminal symbols, which leave the
skin membrane are collected in the order of their expelling from
the system, so, a string is associated to a complete computation;
when several terminal symbols leave the system at the same time,
then any ordering of them is accepted (thus, with a complete
computation we possibly associate a set of strings, due to this
“local commutativity” of symbols which are observed outside the
system at the same time). In this way, a language is associated with
Π, denoted by L(Π), consisting of all strings which are associated
with all complete computations in Π.

Two facts are worth emphasizing: (1) the symbols not in T that
leave the skin membrane as well as all symbols from T, which
remain in the system at the end of a halting computation are not
considered in the generated strings and (2) if a computation goes
for ever, then it provides no output, it does not contribute to the
language L(Π).

Generally, to solve NP-complete problems (problems solvable
deterministically in exponential time), the following steps can be
used:

1. Start with a linear membrane structure; the central membrane will
be divided into an exponential number of copies.
2. In the central "parallel engine" one generates, making use of
membrane division, a "data pool" of exponential size; due to the
parallelism, this takes only linear time. In parallel with this
process, a "timer" is ticking, in general for synchronization
reasons.
3. After finishing the generation of the "data pool", one checks
whether or not any solution to the problem exists.
4. A message is sent out of the system at a precise moment, telling
whether or not the problem has a solution.

Solving SAT in Linear Time


We illustrate the usefulness of P systems with 2-bounded division
by solving the SAT problem in linear time. The time is estimated
here as the number of steps the system works. This means, we have
a parallel time where each unit is the time of a “biological” step in
the system, the time of using any rule, supposing that all rules take
the same time to be applied. Figure 14.4 illustrates the shape
of P system that solves a NP-complete problem.
Figure 14.4. The shape of P systems solving NP-complete problems

Theorem 14.11

The SAT problem can be solved by a P system with active membranes
and 2-bounded division in a time which is linear in the number of
variables and the number of clauses.

Tissue P Systems
The tissue P system, a variant of the P system, is mainly inspired
by two things: intercellular communication (of chemicals, energy,
information) by means of complex networks of protein channels, and
the way neurons co-operate, processing impulses in the complex net
established by synapses.

The computing model tissue P system, in short, tP system, consists


of several cells, related by protein channels. In order to preserve
the neural intuition, we will use the shorter and suggestive name of
synapses for these channels. Each cell has a state from a given
finite set and can process multi-sets of objects (chemical
compounds in the case of cells, impulses in the case of the brain),
represented by symbols from a given alphabet. The standard rules
are of the form sM → s′M′, where s, s′ are states and M, M′ are
multisets of symbols. Some of the elements of M′ may be marked
with “go,” and this means that they have to immediately leave the
cell and pass to the cells to which they have direct links through
synapses. This communication (transfer of symbol-objects) can be
done in a replicative manner (the same symbol is sent to all
adjacent cells), or in a non-replicative manner; in the second case
we can send all the symbols to only one neighboring cell, or we can
distribute them nondeterministically. One more choice appears in
using the rules sM → s′M′: we can apply such a rule to only one
occurrence of M (the min mode), or to all occurrences of M in a
parallel way (the par mode), or we can apply a maximal package of
rules of the form sMi → s′M′i, 1 ≤ i ≤ k, that is, involving the
same states s, s′, which can be applied to the current multiset (the
max mode).
By the combination of the three modes of processing objects and
the three modes of communication among cells, we get nine
possible behaviors of tP system.

This model starts from a given initial configuration (that is, initial
states of cells and initial multisets of symbol-objects placed in
them) and proceeds until reaching a halting configuration, where
no further rule can be applied, and then associates a result with
this configuration. Because of the non-deterministic behavior of
a tP system, starting from the given initial configuration we can get
arbitrarily many outputs. Output will be defined by sending
symbols out of the system. For this purpose, one cell will be
designated as the output cell, and in its rules sM → s′M′ the
symbols from M′ are marked with the indication “out”; such a
symbol will immediately leave the system, contributing to the
result of the computation.

We now formally define this computing model.

A tissue P system, of degree m, m ≥ 1, is a construct:

Π = (O,σ1, σ2, ..., σm, syn, iout),

where:


O is a finite non-empty alphabet (of objects);


syn ⊆ {1, ..., m} × {1, ..., m} (synapses among cells);


iout ∊ {1, ..., m} indicates the output cell;


σi, 1 ≤ i ≤ m, are cells, where each σi = (Qi, si,0, wi,0, Pi), with:


Qi is a finite set of states;


si,0 ∊ Qi is the initial state;


wi,0 is the initial multiset of objects;


Pi is a finite set of rules of the form sw → s′xygozout, where
s, s′ ∊ Qi, w, x ∊ O*, ygo ∊ (O × {go})*, and zout ∊ (O × {out})*,
with the restriction that zout = λ for all i ∊ {1, 2, ..., m}
different from iout.

A tP system is said to be co-operative if it contains at least one
rule sw → s′w′ such that |w| > 1, and non-cooperative in the
opposite case. The objects which appear in the left-hand multiset w
of a rule sw → s′w′ are sometimes called impulses, while those in w′
are also called excitations.

Remark 1. Note that rules of the forms s → s′, s → s′w′ are not


allowed.

Remark 2. Synapses of the form (i, i), 1 ≤ i ≤ m, are not allowed.

Any m-tuple of the form (s1 w1, ..., smwm), with si ∊ Qi and wi ∊ O*,


for all 1 ≤ i ≤ m, is called a configuration of Π; thus,
(s1,0w1,0, ..., sm,0wm,0) is the initial configuration of Π.

Using the rules from the sets Pi, 1 ≤ i ≤ m, we can define transitions
among the configurations of the system. For this purpose, we first
consider three modes of processing the impulse-objects and
three modes of transmitting excitation-objects from one cell to
another cell.

Let us denote Ogo = {(a, go) | a ∊ O}, Oout = {(a, out) | a ∊ O},
and Otot = O ∪ Ogo ∪ Oout. For s, s′ ∊ Qi, x ∊ O*, y ∊ O*tot, we
write:

sx ⇒min s′y iff sw → s′w′ ∊ Pi, w ⊆ x, and y = (x − w) ∪ w′;

sx ⇒par s′y iff sw → s′w′ ∊ Pi, w^k ⊆ x, w^{k+1} ⊈ x for some
k ≥ 1, and y = (x − w^k) ∪ w′^k;

sx ⇒max s′y iff sw1 → s′w′1, ..., swk → s′w′k ∊ Pi, k ≥ 1, such that
w1 ... wk ⊆ x, y = (x − w1 ... wk) ∪ w′1 ... w′k, and there is no
sw → s′w′ ∊ Pi such that w1 ... wk w ⊆ x.

Intuitively, in the min mode, only one occurrence of the left-hand
side symbol of a rule is processed (replaced by the string from the
right-hand side of the rule, at the same time changing the state of
the cell). In the par mode, a maximal change is performed with
respect to a chosen rule, in the sense that as many as possible
copies of the multiset from the left-hand side of the rule are
replaced by the corresponding number of copies of the multiset
from the right-hand side. In the max mode, the change is
performed with respect to all rules, which use the current state of
the cell and introduce the same new state after processing the
multisets.
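The min and par modes are easy to express over Counter multisets; a minimal sketch (our own helper names) that ignores states, the go/out markers, and the more involved max mode:

from collections import Counter

def times(w, k):
    return Counter({a: n * k for a, n in w.items()})

def max_k(w, x):
    # largest k with w^k ⊆ x (w assumed non-empty)
    return min(x[a] // n for a, n in w.items())

def step_min(x, w, w2):
    # min mode: replace exactly one occurrence of w by w2
    return x - w + w2 if max_k(w, x) >= 1 else None

def step_par(x, w, w2):
    # par mode: replace the maximal number k of occurrences of w by w2
    k = max_k(w, x)
    return x - times(w, k) + times(w2, k) if k >= 1 else None

x = Counter("aaab")
print(step_min(x, Counter("a"), Counter("bb")))   # Counter({'b': 3, 'a': 2})
print(step_par(x, Counter("a"), Counter("bb")))   # Counter({'b': 7})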

Now, remember that the multiset w′ from a rule sw → s′ w′


contains symbols from O, but also symbols of the form (a, go) (or,
in the case of the cell iout, of the form (a, out)). Such symbols will
be sent to the cells related by synapses to the cell σi where the
rule sw → s′ w′ is applied, according to the following three
transmitting modes, repl, one, and spread:


repl: each symbol a, for (a, go) appearing in w′, is sent to each of
the cells σj such that (i, j) ∊ syn;

one: all symbols a appearing in w′ in the form (a, go) are sent to
one of the cells σj such that (i, j) ∊ syn, nondeterministically
chosen;

spread: the symbols a appearing in w′ in the form (a, go) are
nondeterministically distributed among the cells σj such that
(i, j) ∊ syn.

If the system has at most two cells, the three modes of transmitting
the processed multisets from one cell to another coincide.

During any transition, some cells can do nothing: if no rule is


applicable to the available multisets in the current state, a cell
waits until new multisets are sent to it from its ancestor cells. It is
also worth noting that each transition lasts one time unit, and that
the work of the net is synchronized, the same clock marks the time
for all cells.

A sequence of transitions among configurations of the system Π is
called a computation of Π. A computation which ends in a
configuration where no rule in any cell can be applied is called a
halting computation. During a halting computation, the tP system Π
sends out, through the cell σiout, a multiset z of objects. We say
that the vector ψO(z), representing the multiplicities of the
objects in z, is computed by Π.

We denote by Nα,β(Π), for α ∊ {min, par, max}, β ∊ {repl, one,


spread}, the set of all natural numbers computed in this way by a
system Π, in the mode(α, β). The family of all sets Nα,β (Π),
computed by all co-operative tissue P systems with at most m ≥ 1
cells, each of them using at most r ≥ 1 states, is denoted
by NtPm,r (Coo, α, β); when non-cooperative systems are used, we
write NtPm,r (nCoo, α, β) for the corresponding family of sets of
numbers. When any of the parameters m, r is not bounded we
replace it with *.
Example 14.9. 

(Tissue P Systems as Generative Devices) Consider first a simple
tP system Π1 = (O, σ1, σ2, σ3, syn, iout), where:

O = {a},

σ1 = ({s}, s, a^2, {sa → s(a, go), sa → s(a, out)}),

σ2 = ({s}, s, λ, {sa → s(a, go)}),

σ3 = ({s}, s, λ, {sa → s(a, go)}),

syn = {(1, 2), (1, 3), (2, 1), (3, 1)},

iout = 1.

A tP system can be graphically represented, with ovals associated
with the cells (these ovals contain the initial state, the initial
multiset, and the set of rules, and are labeled 1, 2, ..., m), with
arrows indicating the synapses, and with an arrow leaving the output
cell. The system Π1 is presented in this way in Figure 14.5.

Figure 14.5. An example of a tP system

Nmin,repl(Π1) = {n | n ≥ 1},

Nmin,β(Π1) = {1, 2}, for β ∊ {one, spread},

Npar,repl(Π1) = {2^n | n ≥ 1},

Npar,β(Π1) = {2}, for β ∊ {one, spread},

Nmax,repl(Π1) = {n | n ≥ 1},

Nmax,β(Π1) = {1, 2}, for β ∊ {one, spread}.

Indeed, in the non-replicative modes of communication, no further
symbol is produced; hence, we only generate the numbers 1 and 2. In
the replicative case, the symbols produced by the rule sa → s(a, go)
from cell 1 are doubled by communication. When the rules are used in
the parallel mode, all symbols are processed at the same time by the
same rule, which means that all symbols present in the system are
doubled from one step to the next; therefore, the powers of 2 are
obtained. When the rules are used in the minimal mode, the symbols
are processed or sent out one by one; hence, all natural numbers can
be obtained. In the maximal mode, we can send copies of the symbol a
at the same time to cells 2 and 3 and outside the system; hence,
again any number of symbols can be sent out.

There are several variants of P systems in the literature; some of
them are splicing P systems, contextual P systems, tissue P systems,
etc. Issues like complexity and decidability have also been
discussed. One direction of research being investigated intensively
in recent times is spiking neural P systems. Details of the above
can be found at "http://ppage.psystems.eu." There are also
attempts to make practical use of this theoretical computing model.
