Essentials of Theoretical Computer Science

F. D. Lewis
University of Kentucky
CONTENTS
Title Page
Copyright Notice
Preface
COMPUTABILITY
Introduction
The NICE Programming Language
Turing Machines
A Smaller Programming Language
Equivalence of the Models
Machine Enhancement
The Theses of Church and Turing
Historical Notes and References
Problems
UNSOLVABILITY
Introduction
Arithmetization
Properties of the Enumeration
Universal Machines and Simulation
Solvability and the Halting Problem
Reducibility and Unsolvability
Enumerable and Recursive Sets
Historical Notes and References
Problems
COMPLEXITY
Introduction
Measures and Resource Bounds
Complexity Classes
Reducibilities and Completeness
The Classes P and NP
Intractable Problems
Historical Notes and References
Problems
AUTOMATA
Introduction
Finite Automata
Closure Properties
Nondeterministic Operation
Regular Sets and Expressions
Decision Problems for Finite Automata
Pushdown Automata
Unsolvable Problems for Pushdown Automata
Linear Bounded Automata
Historical Notes and References
Problems
LANGUAGES
Introduction
Grammars
Language Properties
Regular Languages
Context Free Languages
Context Free Language Properties
Parsing and Deterministic Languages
Summary
Historical Notes and References
Problems
COMPUTABILITY
Before examining the intrinsic nature of computation we must have a precise
idea of what computation means. In other words, we need to know what we're
talking about! To do this, we shall begin with intuitive notions of terms such as
calculation, computing procedure, and algorithm. Then we shall be able to
develop a precise, formal characterization of computation which captures all of
the modern aspects and concepts of this important activity.
Part of this definitional process shall involve developing models of
computation. They will be presented with emphasis upon their finite nature
and their computational techniques, that is, their methods of transforming
inputs into outputs. In closing, we shall compare our various models and
discuss their relative power.
The sections are entitled:
The NICE Programming Language
Turing Machines
A Smaller Programming Language
Equivalence of the Models
Machine Enhancement
The Theses of Church and Turing
Historical Notes and References
Problems
The NICE Programming Language
As computer scientists, we tend to believe that computation takes place inside
computers. Or maybe computations are the results from operating computing
machinery. So, when pressed to describe the limits of computation we might, in
the spirit of Archimedes and his lever, reply that anything can be computed
given a large enough computer! When pressed further as to how this is done we
probably would end up by saying that all we need in order to perform all
possible computations is the ability to execute programs which are written in
some marvelous, nonrestrictive programming language. A nice one!
Since we are going to study computation rather than engage in it, an actual
computer is not required, just the programs. These shall form our model of
computation. Thus an ideal programming language for computation seems
necessary. Let us call this the NICE language and define it without further
delay. Then we can go on to describing what is meant by computation.
We first need the raw materials of computation. Several familiar items come
immediately to mind. Numbers (integers as well as real or floating point) and
Boolean constants (true and false) will obviously be needed. Our programs shall
then employ variables that take these constants as values. And since it is a
good idea to tell folks exactly what we're up to at all times, we shall declare
these variables before we use them. For example:
var x, y, z: integer;
p, q: Boolean;
a, b, c: real;
Here several variables have been introduced and defined as to type. Since the
world is often nonscalar we always seem to want some data structures. Arrays
fill that need. They may be multidimensional and are declared as to type and
dimension. Here is an array declaration.
var d, e: array[ , ] of integer;
s: array[ ] of real;
h: array[ , , , ] of integer;
We note that s is a one-dimensional array while h has four dimensions. As is
usual in computer science, elements in arrays are referred to by their position.
Thus s[3] is the third element of the array named s.
So far, so good. We have placed syntactic restrictions upon variables and
specified our constants in a rather rigid (precise?) manner. But we have not
placed any bounds on the magnitude of numbers or array dimensions. This is
fine since we did not specify a particular computer for running the programs. In
fact, we are dealing with ideal, gigantic computers that can handle very large
values. So why limit ourselves? To enumerate:
a) Numbers can be of any magnitude.
b) Arrays may be declared to have as many dimensions as we wish.
c) Arrays have no limit on the number of elements they contain.
Our only restriction will be that at any instant during a computation everything
must be finite. That means no numbers or arrays of infinite length. Huge - yes,
but not infinite! In particular, this means that the infinite decimal expansion
0.333... for one third is not allowed yet several trillion 3's following a decimal
point is quite acceptable. We should also note that even though we have a
number type named real, these are not real numbers in the mathematical sense,
but floating point numbers.
On to the next step - expressions. They are built from variables, constants,
operators, and parentheses. Arithmetic expressions such as these:
x + y*(z + 17)
a[6] - (z*b[k, m+2])/3
may contain the operators for addition, subtraction, multiplication, and
division. Boolean expressions are formed from arithmetic expressions and
relational operators. For example:
x + 3 = z/y -17
a[n] > 23
Compound Boolean expressions also contain the logical connectives and, or, and
not. They look like this:
x - y > 3 and b[7] = z and v
(x = 3 or x = 5) and not z = 6
These expressions may be evaluated in any familiar manner (such as operator
precedence or merely from left to right). We do not care how they are evaluated,
as long as we maintain consistency throughout.
In every programming language computational directives appear as statements.
Our NICE language contains these also. To make things a little less wordy we
shall introduce some notation. Here is the master list:
E ar bi t r ar y expr essi ons
AE ar i t hmet i c expr essi ons
BE Bool ean expr essi ons
V var i abl es
S ar bi t r ar y st at ement s
N number s
Variables, statements, and numbers may be numbered (V6, S1, N9) in the
descriptions of some of the statements used in NICE programs that follow.
a) Assignment. Values of expressions are assigned to variables in statements
of the form: V = E.
b) Transfer. This statement takes the form goto N where N is an integer which
is used as a label. Labels precede statements. For example: 10: S.
c) Conditional. The syntax is: if BE then S1 else S2 where the else clause is
optional.
d) Blocks. These are groups of statements, separated by semicolons and
bracketed by begin and end. For example: begin S1; S2; ... ; Sn end
Figure 1 contains a fragment of code that utilizes every statement defined so
far. After executing the block, z has taken on the value of x factorial.
begin
z = 1;
10: z = z*x;
x = x - 1;
if not x = 0 then goto 10
end
Figure 1 - Factorial Computation
e) Repetition. The while and for statements cause repetition of other
statements and have the form:
while BE do S
for V = AE to AE do S
Steps in a for statement are assumed to be one unless downto (instead of to)
is employed; then the step is minus one, since we decrement rather than
increment. It is no surprise that repetition provides us with structured
ways to compute factorials. Two additional methods appear in figure 2.
begin
z = 1;
for n = 2 to x
do z = z*n
end
begin
z = 1;
n = 1;
while n < x do
begin
n = n + 1;
z = z*n
end
end
Figure 2 - Structured Programming Factorial Computation
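In modern terms, the second fragment is just a while loop. Here is a minimal
Python rendering of it for readers who wish to run the examples (the function
name factorial is ours, not part of the NICE language):

    def factorial(x):
        # Mirror of the while-loop fragment in figure 2.
        z = 1
        n = 1
        while n < x:
            n = n + 1
            z = z * n
        return z

    print(factorial(5))   # prints 120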
f) Computation by cases. The case statement is a multiway, generalized if
statement and is written:
case AE of N1: S1; N2: S2; ... ; Nk: Sk endcase
where the Nk are numerical constants. It works in a rather straightforward
manner. The expression is evaluated and if its value is one of the Nk, then
the corresponding statement Sk is executed. A simple table lookup is
provided in figure 3. (Note that the cases need not be in order nor must they
include all possible cases.)
case x - y/4 of:
15: z = y + 3;
0: z = w*6;
72: begin
x = 7; z = -2*z
end;
6: w = 4
endcase
Figure 3 - Table Lookup
g) Termination. A halt(V) statement brings the program to a stop with the
value of V as output. V may be a simple variable or an array.
Now that we know all about statements and their components it is time to
define programs. We shall say that a program consists of a heading, a
declaration section and a statement (which is usually a block). The heading
looks like:
program name(V1, V2, ... , Vn)
and contains the name of the program as well as its input parameters. These
parameters may be variables or arrays. Then come all of the declarations
followed by a statement. Figure 4 contains a complete program.
program expo(x, y)
var n, x, y, z: integer;
begin
z = 1;
for n = 1 to y do z = z*x;
halt(z)
end
Figure 4 - Exponentiation Program
The only thing remaining is to come to an agreement about exactly what
programs do. Let's accomplish this by examining several. It should be rather
obvious that the program of figure 4 raises x to the y-th power and outputs this
value. So we shall say that programs compute functions.
Our next program, in figure 5, is slightly different in that it does not return a
numerical value. Examine it.
program square(x)
var x, y: integer;
begin
y = 0;
while y*y < x do y = y + 1;
if y*y = x then halt(true) else halt(false)
end
Figure 5 - Boolean Function
This program does return an answer, so it does compute a function. But it is a
Boolean function since it returns either true or false. We depict this one as:
square(x) = true if x is a perfect square and false otherwise.
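Here is the same decision procedure as a hedged Python sketch (the name square
and the use of a Python Boolean in place of halt(true)/halt(false) are our own
conventions):

    def square(x):
        # Search for a y with y*y = x, exactly as the NICE program does.
        y = 0
        while y * y < x:
            y = y + 1
        return y * y == x   # plays the role of halt(true) / halt(false)

    print(square(16), square(15))   # prints True False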
Or, we could say that the program named square decides whether or not an
integer is a perfect square. In fact, we state that this program decides
membership for the set of squares.
Let us sum up all of the tasks we have determined that programs accomplish
when we execute them. We have found that they do the following two things.
a) compute functions
b) decide membership in sets
And, we noted that (b) is merely a special form of (a). That is all we know so far.
So far, so good. But, shouldn't there be more? That was rather simple. And,
also, if we look closely at our definition of what a program is, we find that we
can write some strange stuff. Consider the following rather silly program.
program nada(x)
var x, y: integer;
x = 6
Is it a program? Well, it has a heading, all of the variables are declared, and it
ends with a statement. So, it must be a program since it looks exactly like one.
But, it has no halt statement and thus can have no output. So, what does it do?
Well, not much that we can detect when we run it!
Let's try another in the same vein. Consider the well-known and elegant:
program loop(x)
var x: integer;
while x = x do x = 17
which does something, but alas, nothing too useful. In fact, programs which
either never execute a halt or do not even contain a halt statement are still programs, but
accomplish very little that is evident to an observer. We shall say that these
compute functions which are undefined (one might say that f(x) = ?) since we do
not know how else to precisely describe the results attained by running them.
Let us examine one that is sort of a cross between the two kinds of programs
we have seen thus far. This, our last strange example, sometimes halts,
sometimes loops and is included as figure 6.
program fu(x)
var n, x: integer;
begin
n = 0;
while not x = n do n = n + 2;
halt(x)
end
Fi gur e 6 - A Par t i al l y Def i ned Funct i on
This halts only for even, positive integers and computes the function described
as:
fu(x) = x if x is even and positive, otherwise undefined
When a program does not halt, we shall say it diverges. The function fu could
also be defined using mathematical notation as follows.
fu(x) =  x        if x is even and positive
         diverge  otherwise
Since it halts at times (and thus is defined on the even, positive integers) we will
compromise and maintain that it is partially defined.
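A Python rendering makes the partial nature of fu easy to see (the rendering
is ours; calling it with an odd or negative argument really does run forever):

    def fu(x):
        # Mirror of figure 6: n runs through 0, 2, 4, ... and meets x only
        # when x is an even, nonnegative integer; otherwise the loop diverges.
        n = 0
        while x != n:
            n = n + 2
        return x

    # fu(6) returns 6; fu(7) or fu(-4) would never return.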
To recap, we have agreed that computation takes place whenever programs are
run on some ideal machine and that programs are written according to the rules
of our NICE language.
An important note is needed here. We have depended heavily upon our
computing background for the definition of exactly what occurs when
computation takes place rather than dwell upon the semantics of the NICE
language. We could go into some detail of how statements in our language
modify the values of variables and so forth, but have agreed not to at this time.
So, given that we all understand program execution, we can state the following
two assertions as our definition of computation.
programs compute functions
any computable function can be computed by some program.
The functions we enjoy computing possess values for all of their inputs and
are called defined, but some functions are different. Those functions that never
have outputs are known as undefined functions. And finally, the functions that
possess output values for some inputs and none for others are the partially
defined functions.
At last we have fairly carefully defined computation. It is merely the process of
running NICE programs.
Turing Machines
Computation has been around a very long time. Computer programs, after all,
are a rather recent creation. So, we shall take what seems like a brief detour
back in time in order to examine another system or model of computation. We
shall not wander too far back though, merely to the mid 1930s. After all, one
could go back to Aristotle, who was possibly the first Western person to develop
formal computational systems and write about them.
Well before the advent of modern computing machinery, a British logician
named A. M. Turing (who later became a famous World War II codebreaker)
developed a computing system. In the 1930's, a little before the construction of
the first electrical computer, he and several other mathematicians (including
Church, Markov, and Post) independently considered the problem of
specifying a system in which computation could be defined and studied.
Turing focused upon human computation and thought about the way that
people compute things by hand. With this examination of human computation
he designed a system in which computation could be expressed and carried out.
He claimed that any nontrivial computation required:
a simple sequence of computing instructions,
scratch paper,
an implement for writing and erasing,
a reading device, and
the ability to remember which instruction is being carried out.
Turing then developed a mathematical description of a device possessing all of
the above attributes. Today, we would recognize the device that he defined as a
special purpose computer. In his honor it has been named the Turing machine.
The heart of this machine is a finite control box which is wired to execute a
specific list of instructions and thus is precisely a special purpose computer or
computer chip. The device records information on a scratch tape during
computation and has a two-way head that reads and writes on the tape as it
moves along. Such a machine might look like that pictured in figure 1.
[The figure shows a finite control box, about to execute instruction I42,
connected by a two-way tape head to a scratch tape whose squares hold
0 1 # 1 1 1 1 followed by blanks.]

Figure 1 - A Turing Machine
A finite control is a simple memory device that remembers which instruction
should be executed next. The tape, divided into squares (each of which may
hold a symbol), is provided so that the machine may record results and refer to
them during the computation. In order to have enough space to perform
computation, we shall say that the tape is arbitrarily long. By this we mean that
a machine never runs out of tape or reaches the right end of its tape. This does
NOT mean that the tape is infinite - just long enough to do what is needed. A
tape head that can move to the left and right as well as read and write connects
the finite control with the tape.
If we again examine figure 1, it is evident that the machine is about to execute
instruction I42 and is reading a 1 from the tape square that is fifth from the left
end of the tape. Note that we only show the portion of the tape that contains
non-blank symbols and use three dots (. . .) at the right end of our tapes to
indicate that the remainder is blank.
That is fine. But, what runs the machine? What exactly are these instructions
which govern its every move? A Turing machine instruction commands the
machine to perform the sequence of several simple steps indicated below.
a) read the tape square under the tape head,
b) write a symbol on the tape in that square,
c) move its tape head to the left or right, and
d) proceed to a new instruction
Steps (b) through (d) depend upon what symbol appeared on the tape square
being scanned before the instruction was executed.
An instruction shall be presented in a chart that enumerates outcomes for all of
the possible input combinations. Here is an example of an instruction for a
machine which uses the symbols 0, 1, #, and blank.
        symbol   symbol    head    next
        read     written   move    instruction
I93     0        1         left    next
        1        1         right   I17
        b        0         halt
        #        #         right   same
This instruction (I93) directs a machine to perform the actions described in the
fragment of NICE language code provided below.
case (symbol read) of:
0: begin
print a 1;
move one tape square left;
goto the next instruction (I94)
end;
1: begin
print a 1;
move right one square;
goto instruction I17
end;
blank: begin print a 0; halt end;
#: begin
print #;
move to the right;
goto this instruction (I93)
end
endcase
Now that we know about instructions, we need some conventions concerning
machine operation. Input strings are written on the tape prior to computation
and will always consist of the symbols 0, 1, and blank. Thus we may speak of
inputs as binary numbers when we wish. This may seem arbitrary, and it is.
But the reason for this is so that we can describe Turing machines more easily
later on. Besides, we shall discuss other input symbol alphabets in a later
section.
When several binary numbers are given to a machine they will be separated by
blanks (denoted as b). A sharp sign (#) always marks the left end of the tape at
the beginning of a computation. Usually a machine is never allowed to change
this marker. This is so that it can always tell when it is at the left end of its tape
and thus not fall off the tape unless it wishes to do so. Here is an input tape
with the triple <5, 14, 22> written upon it.
# 1 0 1 b 1 1 1 0 b 1 0 1 1 0 . . .
In order to depict this tape as a string we write: #101b1110b10110 and
obviously omit the blank fill on the right.
Like programs, Turing machines are designed by coding sequences of
instructions. So, let us design and examine an entire machine. The sequence of
instructions in figure 2 describes a Turing machine that receives a binary
number as input, adds one to it and then halts. Our strategy will be to begin at
the lowest order bit (on the right end of the tape) and travel left changing ones
to zeros until we reach a zero. This is then changed into a one.
One small problem arises. If the endmarker (#) is reached before a zero, then
we have an input of the form 111...11 (the number 2^n - 1) and must change it
to 1000...00 (or 2^n).
     sweep right to end of input
I1   0   0   right   same
     1   1   right   same
     #   #   right   same
     b   b   left    next

     change 1's to 0's on left sweep, then change 0 to 1
I2   0   1   halt
     1   0   left    same
     #   #   right   next

     input = 11...1, so sweep right printing 1000...0
     (print leading 1, add 0 to end)
I3   0   1   right   next
I4   0   0   right   same
     b   0   halt

Figure 2 - Successor Machine
In order to understand this computational process better, let us examine, or in
elementary programming terms, trace, a computation carried out by this Turing
machine. First, we provide it with the input 1011 on its tape, place its head on
the left endmarker (the #), and turn it loose.
Have a peek at figure 3. It is a sequence of snapshots of the machine in action.
One should note that in the last snapshot (step 9) the machine is not about to
execute an instruction. This is because it has halted.
Start)   (I1)  # 1 0 1 1
               ^
Step 1)  (I1)  # 1 0 1 1
                 ^
Step 2)  (I1)  # 1 0 1 1
                   ^
Step 3)  (I1)  # 1 0 1 1
                     ^
Step 4)  (I1)  # 1 0 1 1
                       ^
Step 5)  (I1)  # 1 0 1 1 b
                         ^
Step 6)  (I2)  # 1 0 1 1
                       ^
Step 7)  (I2)  # 1 0 1 0
                     ^
Step 8)  (I2)  # 1 0 0 0
                   ^
Step 9)        # 1 1 0 0

(The caret marks the square under the tape head.)

Figure 3 - Turing Machine Computation
Now that we have seen a Turing machine in action let us note some features, or
properties of this class of computational devices.
a) There are no space or time constraints.
b) They may use numbers (or strings) of any size.
c) Their operation is quite simple - they read, write, and move.
In fact, Turing machines are merely programs written in a very simple language.
Everything is finite and in general rather uncomplicated. So, there is not too
much to learn if we wish to use them as a computing device. Well, maybe we
should wait a bit before believing that!
For a moment we shall return to the previous machine and discuss its
efficiency. If it receives an input consisting only of ones (for example:
111111111), it must:
1) Go to the right end of the input,
2) Return to the left end marker, and
3) Go back to the right end of the input.
This means that it runs for a number of steps more than three times the length
of its input. While one might complain that this is fairly slow, the machine does
do the job! One might ask whether a more efficient machine could be designed to
accomplish this task. Try that as an amusing exercise.
Another thing we should note is that when we present the machine with a blank
tape it runs for a few steps and gets stuck on instruction I3 where no action is
indicated for the configuration it is in since it is reading a blank instead of a
zero. Thus it cannot figure out what to do. We say that this is an undefined
computation and we shall examine situations such as this quite carefully later.
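Readers who wish to experiment can simulate machines like this one directly.
The following Python sketch is our own encoding (the dictionary of quadruples,
the name run, and the letter b for blank are assumptions, not part of the
formal model); it reproduces both the successful computation on #1011 and the
stuck computation on a blank tape:

    # Each entry maps (instruction, symbol read) to
    # (symbol written, head move or 'halt', next instruction).
    SUCC = {
        ('I1', '0'): ('0', 'right', 'I1'), ('I1', '1'): ('1', 'right', 'I1'),
        ('I1', '#'): ('#', 'right', 'I1'), ('I1', 'b'): ('b', 'left',  'I2'),
        ('I2', '0'): ('1', 'halt',  None), ('I2', '1'): ('0', 'left',  'I2'),
        ('I2', '#'): ('#', 'right', 'I3'),
        ('I3', '0'): ('1', 'right', 'I4'),
        ('I4', '0'): ('0', 'right', 'I4'), ('I4', 'b'): ('0', 'halt',  None),
    }

    def run(machine, tape):
        tape, pos, instr = list(tape), 0, 'I1'
        while True:
            if pos == len(tape):
                tape.append('b')              # the tape never runs out on the right
            if (instr, tape[pos]) not in machine:
                return None                   # stuck: an undefined computation
            write, move, nxt = machine[(instr, tape[pos])]
            tape[pos] = write
            if move == 'halt':
                return ''.join(tape).rstrip('b')
            pos = pos + 1 if move == 'right' else pos - 1
            if pos < 0:
                return None                   # fell off the left end of the tape
            instr = nxt

    print(run(SUCC, '#1011'))   # prints #1100
    print(run(SUCC, '#'))       # prints None: stuck at I3 reading a blank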
Up to this point, our discussion of Turing machines has been quite intuitive and
informal. This is fine, but if we wish to study them as a model of computation
and perhaps even prove a few theorems about them we must be a bit more
precise and specify exactly what it is we are discussing. Let us begin.
A Turing machine instruction (we shall just call it an instruction) is a box
containing read-write-move-next quadruples. A labeled instruction is an
instruction with a label (such as I46) attached to it. Here is the entire machine.
Definition. A Turing machine is a finite sequence of labeled instructions
with labels numbered sequentially from I1.
Now we know precisely what Turing machines are. But we have yet to define
what they do. Let's begin with pictures and then describe them in our
definitions. Steps five and six of our previous computational example (figure 3)
were the machine configurations:
Step 5)  (I1)  # 1 0 1 1 b
                         ^
Step 6)  (I2)  # 1 0 1 1
                       ^
If we translate this picture into a string, we can discuss what is happening in
prose. We must do so if we wish to define precisely what Turing machines
accomplish. So, place the instruction to be executed next to the symbol being
read and we have an encoding of this change of configurations that looks like:
#1011(I1)b... ⇒ #101(I2)1b...
This provides the same information as the picture. It is almost as if we took a
snapshot of the machine during its computation. Omitting trailing blanks from
the description we now have the following computational step
#1011(I1) ⇒ #101(I2)1
Note that we shall always assume that there is an arbitrarily long sequence of
blanks to the right of any Turing machine configuration.
Definition. A Turing machine configuration is a string of the form x(In)y
or x where n is an integer and both x and y are (possibly empty) strings of
symbols used by the machine.
So far, so good. Now we need to describe how a machine goes from one
configuration to another. This is done, as we all know, by applying the
instruction mentioned in a configuration to that configuration, thus producing
another configuration. An example should clear up any problems with the
above verbosity. Consider the following instruction.
I17   0   1   right   next
      1   b   right   I3
      b   1   left    same
      #   #   halt
Now, observe how it transforms the following configurations.
a)  #1101(I17)01  ⇒  #11011(I18)1
b)  #110(I17)101  ⇒  #110b(I3)01
c)  #110100(I17)  ⇒  #11010(I17)01
d)  (I17)#110100  ⇒  #110100
Especially note what took place in (c) and (d). Case (c) finds the machine at the
beginning of the blank fill at the right end of the tape. So, it jots down a 1 and
moves to the left. In (d) the machine reads the endmarker and halts. This is
why the instruction disappeared from the configuration.
Definition. For Turing machine configurations Ci and Cj, Ci yields Cj
(written Ci ⇒ Cj) if and only if applying the instruction in Ci produces Cj.
In order to be able to discuss a sequence of computational steps or an entire
computation at once, we need additional notation.
Definition. If Ci and Cj are Turing machine configurations then Ci
eventually yields Cj (written Ci ⇒* Cj) if and only if there is a finite
sequence of configurations C1, C2, ... , Ck such that:

Ci = C1 ⇒ C2 ⇒ ... ⇒ Ck = Cj.
At the moment we should be fairly at ease with Turing machines and their
operation. The concept of computation taking place when a machine goes
through a sequence of configurations should also be comfortable.
Let us turn to something quite different. What about configurations which do
not yield other configurations? They deserve our attention also. These are
called terminal configurations (because they terminate a computation). For
example, given the instruction:
I3   0   1   halt
     1   b   right   next
     #   #   left    same
what happens when the machine gets into the following configurations?
a) (I3)#01101
b) #1001(I3)b10
c) #100110(I3)
d) #101011
Nothing happens - right? If we examine the configurations and the instruction
we find that the machine cannot continue for the following reasons (one for
each configuration).
a) The machine moves left and falls off of the tape.
b) The machine does not know what to do.
c) Same thing. A trailing blank is being scanned.
d) Our machine has halted.
Thus none of those configurations lead to others. Furthermore, any
computation or sequence of configurations containing configurations like them
must terminate immediately.
By the way, configuration (d) is a favored configuration called a halting
configuration because it was reached when the machine wished to halt. For
example, if our machine was in the configuration #10(I3)0011 then the next
configuration would be #101011 and no other configuration could follow. These
halting configurations will pop up later and be of very great interest to us.
We name individual machines so that we know exactly which machine we are
discussing at any time. We will often refer to them as M1, M2, M3, or Mi and
Mk. The notation Mi(x) means that Turing machine Mi has been presented with x
as its input. We shall use the name of a machine as the function it computes.

    If the Turing machine Mi is presented with x as its input and
    eventually halts (after computing for a while) with z written on its
    tape, we think of Mi as a function whose value is z for the input x.
Let us now examine a machine that expects the integers x and y separated by a
blank as input. It should have an initial configuration resembling #xby.
     erase x, find first symbol of y
I1   #   #   right   same
     0   b   right   same
     1   b   right   same
     b   b   right   next

     get next symbol of y - mark place with #
I2   0   #   left    next
     1   #   left    I5
     b   b   halt

     find right edge of output - write 0
I3   b   b   left    same
     #   #   right   next
     0   0   right   next
     1   1   right   next
I4   b   0   right   I7

     find right edge of output - write 1
I5   b   b   left    same
     #   #   right   next
     0   0   right   next
     1   1   right   next
I6   b   1   right   next

     find the # marker and resume copying
I7   b   b   right   same
     #   b   right   I2

Figure 4 - Selection Machine
The Turing machine in figure 4 is what we call a selection machine. These
receive several numbers (or strings) as input and select one of them as their
output. This one computes the function: M(x, y) = y and selects the second of
its inputs. This of course generalizes to any number of inputs, but let us not
get too carried away.
Looking carefully at this machine, it should be obvious that it:
1) erases x, and
2) copies y next to the endmarker (#).
But, what might happen if either x or y happens to be blank? Figure it out! Also
determine exactly how many steps this machine takes to erase x and copy y.
(The answer is about n^2 steps if x and y are each n bits in length.)
Here is another Turing machine.
     find the right end of the input
I1   0   0   right   same
     1   1   right   same
     #   #   right   same
     b   b   left    next

     is the low order bit 0 or 1?
I2   0   b   left    next
     1   b   left    I5
     #   #   right   I6

     erase input and print 1
I3   0   b   left    same
     1   b   left    same
     #   #   right   next
I4   b   1   halt

     erase input and print 0
I5   0   b   left    same
     1   b   left    same
     #   #   right   next
I6   b   0   halt

Figure 5 - Even Integer Acceptor
It comes from a very important family of functions, one which contains
functions that compute relations (or predicates) and membership in sets. These
are known as characteristic functions, or 0-1 functions because they only take
values of zero and one which denote false and true.
An example is the characteristic function for the set of even integers computed
by the Turing machine of figure 5. It may be described:
even(x) =  1   if x is even
           0   otherwise
This machine leaves a one upon its tape if the input ended with a zero (thus an
even number) and halts with a zero on its tape otherwise (for a blank or odd
integers). It should not be difficult to figure out how many steps it takes for an
input of length n.
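Since the machine merely inspects the low order bit, its characteristic
function has a short Python counterpart (a sketch; format(x, 'b') supplies the
binary string the machine would read):

    def even(x):
        # 1 if the binary representation of x ends in 0, else 0.
        return 1 if format(x, 'b').endswith('0') else 0

    print(even(6), even(7))   # prints 1 0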
Now for a quick recap and a few more formal definitions. We know that Turing
machines compute functions. Also we have agreed that if a machine receives x
as an input and halts with z written on its tape, or in our notation:
(I1)#x ⇒* #z
then we say that M(x) = z. When machines never halt (that is: run forever or
reach a non-halting terminal configuration) for some input x we claim that the
value of M(x) is undefined just as we did with programs. Since output and
halting are linked together, we shall precisely define halting.
Definition. A Turing machine halts if and only if it encounters a halt
instruction during computation and diverges otherwise.
So, we have machines that always provide output and some that do upon
occasion. Those that always halt compute what we shall denote the total
functions while the others merely compute partial functions.
We now relate functions with sets by discussing how Turing machines may
characterize the set by deciding which inputs are members of the set and which
are not.
Definition. The Turing machine M decides membership in the set A if
and only if for every x, if x ∈ A then M(x) = 1, otherwise M(x) = 0.
There just happens to be another method of computing membership in sets.
Suppose you only wanted to know about members in some set and did not care
at all about elements that were not in the set. Then you could build a machine
which halted when given a member of the set and diverged (ran forever or
entered a non-halting terminal configuration) otherwise. This is called
accepting the set.
Definition. The Turing machine M accepts the set A if and only if for all
x, M(x) halts for x in A and diverges otherwise.
This concept of acceptance may seem a trifle bizarre but it will turn out to be of
surprising importance in later chapters.
A Smaller Programming Language
At this point two rather different models or systems of computation have been
presented and discussed. One, programs written in the NICE programming
language, has a definite computer science flavor, while the other, Turing
machines, comes from mathematical logic. Several questions surface.
which system is better?
is one system more powerful than the other?
The programming language is of course more comfortable for us to work with
and we as computer scientists tend to believe that programs written in similar
languages can accomplish any computational task one might wish to perform.
Turing machine programs are rather awkward to design and there could be a
real question about whether they have the power and flexibility of a modern
programming language.
In fact, many questions about Turing machines and their power arise. Can they
deal with real numbers? arrays? Can they execute while statements? In order
to discover the answers to our questions we shall take what may seem like a
rather strange detour and examine the NICE programming language in some
detail. We will find that many of the features we hold dear in programming
languages are not necessary (convenient, yes, but not necessary) when our aim
is only to examine the power of a computing system.
To begin, what about numbers? Do we really need all of the numbers we have
in the NICE language? Maybe we could discard half of them.
Negative numbers could be represented as positive numbers in the following
manner. If we represent numbers using sign plus absolute value notation, then
with companion variables recording the signs of each of our original variables
we can keep track of all values that are used in computation. For example, if
the variable x is used, we shall introduce another named signx that will have the
value 1 if x is positive and 0 if x is negative. For example:
value     x     signx
  19      19      1
-239     239      0
Representing numbers in this fashion means that we need not deal with
negative numbers any longer. But, we shall need to exercise some caution while
doing arithmetic. Employing our new convention for negative numbers,
multiplication and division remain much the same although we need to be
aware of signs, but addition and subtraction become a bit more complicated.
For example, the assignment statement z = x + y becomes the following.
if signx = signy then
begin z = x + y; signz = signx end
else if x > y
then begin z = x - y; signz = signx end
else begin z = y - x; signz = signy end
This may seem a bit barbaric, but it does get the job done. Furthermore, it
allows us to state that we need only nonnegative numbers.
[NB. An interesting side effect of the above algorithm is that we now have two
different zeros. Zero can be positive or negative, exactly like some second-
generation computers. But this will not affect arithmetic as we shall see.]
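For concreteness, here is a Python sketch of sign-and-magnitude addition (the
function name and the tuple convention are ours; as above, sign 1 means
positive and sign 0 means negative):

    def add_signed(x, signx, y, signy):
        # Mirror of the NICE fragment for z = x + y.
        if signx == signy:
            return x + y, signx
        elif x > y:
            return x - y, signx
        else:
            return y - x, signy

    print(add_signed(19, 1, 239, 0))   # prints (220, 0), i.e. -220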
Now let us rid ourselves of real or floating point numbers. The standard
computer science method is to represent the number as an integer and specify
where the decimal point falls. Another companion for each variable (which we
shall call pointx) is now needed to specify how many digits lie behind the
decimal point. Here are three examples.
value      x      signx   pointx
537        537      1       0
0.0025     25       1       4
-6.723     6723     0       3
Multiplication remains rather straightforward, but if we wish to divide, add, or
subtract these numbers we need a scaling routine that will match the decimal
points. In order to do this for x and y, we must know which is the greater
number. If pointx is greater than pointy we scale y with the following code:
while pointy < pointx do
begin
y = y*10;
pointy = pointy + 1
end
and then go through the addition routine. Subtraction (x - y) can be
accomplished by changing the sign (of y) and adding.
As mentioned above, multiplication is rather simple because it is merely:
z = x*y;
pointz = pointx + pointy;
if signx = signy then signz = 1
else signz = 0;
After scaling, we can formulate division in a similar manner.
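A Python sketch of the scaled representation may help (the names are ours;
signs are handled as above and omitted here):

    def scale_to(x, pointx, target):
        # Multiply x by 10 until its decimal point position matches target.
        while pointx < target:
            x, pointx = x * 10, pointx + 1
        return x

    def add_scaled(x, pointx, y, pointy):
        # Match the decimal points, then add the integer parts.
        p = max(pointx, pointy)
        return scale_to(x, pointx, p) + scale_to(y, pointy, p), p

    def mul_scaled(x, pointx, y, pointy):
        # No scaling needed: the point positions simply add.
        return x * y, pointx + pointy

    print(add_scaled(537, 0, 25, 4))    # (5370025, 4), i.e. 537.0025
    print(mul_scaled(25, 4, 6723, 3))   # (168075, 7), i.e. 0.0168075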
Since numbers are never negative, a new sort of subtraction may be introduced.
It is called proper subtraction and it is defined as:
x - y = maximum(0, x - y).
Note that the result never goes below zero. This will be useful later.
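In Python, proper subtraction is one line (the name monus is a common nickname
for this operation, not the book's):

    def monus(x, y):
        # Proper subtraction: ordinary subtraction, but never below zero.
        return max(0, x - y)

    print(monus(7, 3), monus(3, 7))   # prints 4 0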
A quick recap is in order. None of our arithmetic operations lead below zero
and our only numbers are the nonnegative integers. If we wish to use negative
or real (floating point) numbers, we must now do what folks do at the machine
language level; fake them!
Now let's continue with our mutilation of the NICE language and destroy
expressions! Boolean expressions are easy to compute in other ways if we
think about it. We do not need E1 > E2 since it is the same as:

E1 ≥ E2 and not E1 = E2

Likewise for E1 < E2. With proper subtraction, the remaining simple Boolean
arithmetic expressions can be formulated arithmetically. Here is a table of
substitutions. Be sure to remember that we have changed to proper subtraction
and so a small number minus a large one is zero.

E1 ≤ E2          E1 - E2 = 0
E1 ≥ E2          E2 - E1 = 0
E1 = E2          (E1 - E2) + (E2 - E1) = 0
This makes the Boolean expressions found in while and if statements less
complex. We no longer need relational operators, since we can assign
these expressions to variables as above and then use those variables in the while
or if statements. Only the following two Boolean expressions are needed.
x = 0
not x = 0
Whenever a variable such as z takes on a value greater than zero, the (proper
subtraction) expression 1 - z turns its value into zero. Thus Boolean
expressions which employ logical connectives may be restated arithmetically.
For example, instead of asking if x is not equal to 0 (i.e. not x = 0), we just set z
to 1 - x and check to see if z is zero. The transformations necessary are
included in the chart below and are followed by checking z for zero.
not x = 0            z = 1 - x
x = 0 and y = 0      z = x + y
x = 0 or y = 0       z = x*y
Using these conversions, Boolean expressions other than those of the form x = 0
are no longer found in our programs.
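The following Python lines check these conversions on a concrete pair of
values (a sketch; 1 - x is computed with proper subtraction):

    x, y = 3, 0
    print(max(0, 1 - x) == 0)   # not x = 0       -> True,  since x is nonzero
    print((x + y) == 0)         # x = 0 and y = 0 -> False, since x is nonzero
    print((x * y) == 0)         # x = 0 or y = 0  -> True,  since y is zero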
Compound arithmetic expressions are not necessary either. We shall just break
them into sequences of statements that possess one operator per statement.
Now we no longer need compound expressions of any kind!
What next? Well, for our finale, let's remove all of the wonderful features from
the NICE language that took language designers years and years to develop.
a) Arrays. We merely encode the elements of an array into a simple variable
and use this. (This transformation appears as an exercise!)
b) Although while statements are among the most important features of
structured programming, a statement such as:
while x = 0 do S
(recall that only x = 0 exists as a Boolean expression now) is just the same
computationally as:
10: z = 1 - x;
if z = 0 then goto 20;
S;
goto 10
20: (* next statement *)
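Python has no goto, but the translation can be simulated with an explicit
program counter, which also shows why it is faithful (a sketch; the body S is
hypothetical and taken here to be x = x + 1):

    def while_translation(x):
        pc = 10
        while True:
            if pc == 10:
                z = max(0, 1 - x)            # z = 1 - x, proper subtraction
                pc = 11
            elif pc == 11:
                pc = 20 if z == 0 else 12    # if z = 0 then goto 20
            elif pc == 12:
                x = x + 1                    # S
                pc = 10                      # goto 10
            elif pc == 20:
                return x                     # (* next statement *)

    print(while_translation(0))   # prints 1: S ran once, then the loop exited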
c) The case statement is easily translated into a barbaric sequence of tests and
transfers. For example, consider the statement:

case E of: N1: S1; N2: S2; N3: S3 endcase

Suppose we have done some computation and set x, y, and z such that the
following statements hold true.

if x = 0 then E = N1
if y = 0 then E = N2
if z = 0 then E = N3

Now the following sequence is equivalent to the original case.

if x = 0 then goto 10;
if y = 0 then goto 20;
if z = 0 then goto 30;
goto 40;
10: begin S1; goto 40 end;
20: begin S2; goto 40 end;
30: begin S3; goto 40 end;
40: (* next statement *)
d) if-then-else and goto statements can be simplified in a manner quite similar
to our previous deletion of the case statement. Unconditional transfers
(such as goto 10) shall now be a simple if statement with a little
preprocessing. For example:
z = 0;
if z = 0 then goto 10;
And, with a little bit of organization we can remove any Boolean expressions
except x = 0 from if statements. Also, the else clauses may be discarded
after careful substitution.
e) Arithmetic. Let's savage it almost completely! Who needs multiplication when
we can compute z = x*y iteratively with:
z = 0;
for n = 1 to x do z = z + y;
Likewise addition can be discarded. The statement z = x + y can be replaced
by the following.
z = x;
for n = 1 to y do z = z + 1;
The removal of division and subtraction proceeds in much the same way. All
that remains of arithmetic is successor (x + 1) and predecessor (x - 1).
While we're at it let us drop simple assignments such as x = y by
substituting:
x = 0;
for i = 1 to y do x = x + 1;
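These translations are easy to test in Python (the function names are ours):

    def add(x, y):
        # z = x, then y successor steps.
        z = x
        for _ in range(y):
            z = z + 1
        return z

    def multiply(x, y):
        # z = 0, then add y to it x times.
        z = 0
        for _ in range(x):
            z = add(z, y)
        return z

    print(add(3, 4), multiply(3, 4))   # prints 7 12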
f) The for statement. Two steps are necessary to remove this last vestige of
civilization from our previously NICE language. In order to compute:
for i = m to n do S
we must initially figure out just how many times S is to be executed. We
would like to say that:
t = n - m + 1
but we cannot because we removed subtraction. We must resort to:
z = 0;
t = n;
t = t + 1;
k = m;
10: if k = 0 then goto 20;
t = t - 1;
k = k - 1;
if z = 0 then goto 10;
20: i = m;
(Yes, yes, we cheated by using k=m and i=m! But nobody wanted to see the
loops that set k and i to m. OK?) Now all we need do is to repeat S over and
over again t times. Here's how:
30: if t = 0 then goto 40;
S;
t = t - 1;
i = i + 1;
if z = 0 then goto 30;
40: (* next statement *);
Loops involving downto are similar.
Now it is time to pause and summarize what we have done. We have removed
most of the structured commands from the NICE language. Our deletion
strategy is recorded in the table of figure 1. Note that statements and structures
used in removing features are not themselves destroyed until later.
Category      Item Deleted               Items Used
Constants     negative numbers           extra variables, case
              floating point numbers     extra variables, while
Boolean       arithmetic operations      arithmetic
              logical connectives        arithmetic
Arrays        arrays                     arithmetic
Repetition    while                      goto, if-then-else
Selection     case, else                 if-then
Transfer      unconditional goto         if-then
Arithmetic    multiplication             addition, for
              addition                   successor, for
              division                   subtraction, for
              subtraction                predecessor, for
              simple assignment          successor, for, if-then
Iteration     for                        if-then, successor, predecessor

Figure 1 - Language Destruction
We have built a smaller programming language that seems equivalent to our
original NICE language. Let us call it the SMALL programming language and
now precisely define it.
In fact, we shall start from scratch. A variable is a string of lower case Roman
letters and if x is an arbitrary variable then an (unlabeled) statement takes one
of the following forms.
x = 0
x = x + 1
x = x - 1
if x = 0 then goto 10
halt(x)
In order to stifle individuality, we mandate that statements must have labels
that are just integers followed by colons and are attached to the left-hand sides
of statements. A title is again the program heading with its list of input
variables. As before, it looks like the following.
program name(x, y, z)
Definition. A program consists of a title followed by a sequence of
consecutively labeled instructions separated by semicolons.
An example of a program in our new, unimproved SMALL programming
language is the following bit of code. We know that it is a program because it
conforms to the syntax definitions outlined above. (It does addition, but we do
not know this yet since the semantics of our language have not been defined.)
program add(x, y);
1: z = 0;
2: if y = 0 then goto 6;
3: x = x + 1;
4: y = y - 1;
5: if z = 0 then goto 2;
6: halt(x)
On to semantics! We must now describe computation or the execution of
SMALL programs in the same way that we did for Turing machines. This shall be
carried out in an informal manner, but the formal definitions are quite similar
to those presented for Turing machine operations in the last section.
Computation, or running SMALL language programs causes the value of
variables to change throughout execution. In fact, this is all computation
entails. So, during computation we must show what happens to variables and
their values. A variable and its value can be represented by the pair:
<xi, vi>
If at every point during the execution of a program we know the environment,
or the contents of memory, we can easily depict a computation. Thus knowing
what instruction we are about to execute and the values of all the variables used
in the program tells us all we need know at any particular time about the
program currently executing.
Very nearly as we did for Turing machines, we define a configuration to be the
string such as:
k <x1, v1> <x2, v2> ... <xn, vn>
where k is an instruction number (of the instruction about to be executed), and
the variable-value pairs show the current values of all variables in the program.
The manner in which one configuration yields another should be rather obvious.
One merely applies the instruction mentioned in the configuration to the proper
part of the configuration, that is, the variable in the instruction. The only minor
bit of defining we need to do is for the halt instruction. As an example, let
instruction five be halt(z). Then if x, y, and z are all of the variables in the
program, we say that:
5 <x, 54><y, 23><z, 7> ⇒ 7
Note that a configuration may be either an integer followed by a sequence of
variable-value pairs or an integer. Also think about why a configuration is an
integer. This happens if and only if a program has halted. We may now reuse
the Turing machine system definition for eventually yielding and computation
has almost been completely defined.
Initially the following takes place when a SMALL program is executed.
a) input variables are set to their input values
b) all other variables are set to zero
c) execution begins with instruction number one
From this we know what an initial configuration looks like.
Halting configurations were defined above to be merely numbers. Terminal
configurations are defined in a manner almost exactly the same as for Turing
machines. We recall that this indicates that terminal configurations might
involve undefined variables and non-existent instructions.
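As a sketch of these semantics, here is a small Python interpreter for SMALL
programs; the tuple encoding of instructions and the name run_small are
assumptions of ours, and a configuration is represented by the pair (k, env):

    def run_small(program, inputs):
        env = dict(inputs)                 # input variables get their input values;
        get = lambda v: env.get(v, 0)      # all other variables start at zero
        k = 1                              # execution begins with instruction one
        while 1 <= k <= len(program):
            op = program[k - 1]
            if op[0] == 'zero':            # x = 0
                env[op[1]] = 0
            elif op[0] == 'succ':          # x = x + 1
                env[op[1]] = get(op[1]) + 1
            elif op[0] == 'pred':          # x = x - 1 (proper subtraction)
                env[op[1]] = max(0, get(op[1]) - 1)
            elif op[0] == 'halt':          # halt(x): the configuration
                return get(op[1])          # becomes just a number
            elif op[0] == 'ifgoto' and get(op[1]) == 0:
                k = op[2]
                continue
            k = k + 1
        return None                        # a non-halting terminal configuration

    # The add program from above, one tuple per labeled statement.
    add = [('zero', 'z'), ('ifgoto', 'y', 6), ('succ', 'x'),
           ('pred', 'y'), ('ifgoto', 'z', 2), ('halt', 'x')]
    print(run_small(add, {'x': 3, 'y': 4}))   # prints 7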
Since this is the stuff detected by compilers, here is a point to ponder. Are
there any more things that might pop up and stop a program?
We will now claim that programs compute functions and that all of the
remaining definitions are merely those we used in the section about Turing
machines. The formal statement of this is left as an exercise.
At this point we should believe that any program written in the NICE language
can be rewritten as a SMALL program. After all, we went through a lot of work
to produce the SMALL language! This leads to a characterization of the
computable functions.
Defin it ion . The computable functions are exactly those computed by
programs written in the SMALL programming language.
Equivalence of the Models
Our discussion of what comprises computation and how exactly it takes place
spawned three models of computation. There were two programming
languages (the NICE language and the SMALL language) and Turing Machines.
These came from the areas of mathematical logic and computer programming.
It might be very interesting to know if any relationships between these three systems
of computation exist and if so, what exactly they are. An obvious first question
to ask is whether they allow us to compute the same things. If so, then we can
use any of our three systems when demonstrating properties of computation
and know that the results hold for the other two. This would be rather helpful.
First though, we must define exactly what we mean by equivalent programs and
equivalent models of computation. We recall that both machines and programs
compute functions and then state the following.
Defin it ion . Two programs (or machines) are equivalent if and only if
they compute exactly the same function.
Defin it ion . Two models of computation are equivalent if and only if the
same exact class of functions can be computed in both systems.
Let us look a bit more at these rather official and precise statements. How do
we show that two systems permit computation of exactly the same functions?
If we were to show that Turing Machines are equivalent to NICE programs, we
should have to demonstrate:
For each NICE program there is an equivalent Turing machine
For each Turing machine there is an equivalent NICE program
This means that we must prove that for each machine M there is a program P
such that for all inputs x: M(x) = P(x) and vice versa.
A fairly straightforward equivalence occurs as a consequence of the language
destruction work we performed in painful detail earlier. We claim that our two
languages compute exactly the same functions and shall provide an argument
for this claim in the proof of theorem 1.
(In the sequel, we shall use short names for our classes of functions for the sake
of brevity. The three classes mentioned above shall be TM, NICE, and SMALL.)
Theorem 1. The following classes of functions are equivalent:
a) the computable functions,
b) functions computable by NICE programs, and
c) functions computable by SMALL programs.
Informal Proof. We know that the classes of computable functions and
those computed by SMALL programs are identical because we defined
them to be the same. Thus by definition, we know that:
computable = SMALL.
The next part is almost as easy. If we take a SMALL program and place
begin and end block delimiters around it, we have a NICE program since
all SMALL instructions are NICE too (in technical terms). This new
program still computes exactly the same function in exactly the same
manner. This allows us to state that:
computable = SMALL ⊆ NICE.
Our last task is not so trivial. We must show that for every NICE program,
there is an equivalent SMALL program. This will be done in an informal
but hopefully believable manner based upon the section on language
destruction.
Suppose we had some arbitrary NICE program and went through the step-
by-step transformations upon the statements of this program that turn it
into a SMALL program. If we have faith in our constructions, the new
SMALL program computes exactly the same function as the original NICE
program. Thus we have shown that
computable = SMALL = NICE
and this completes the proof.
That was really not so bad. Our next step will be a little more involved. We
must now show that Turing machines are equivalent to programs. The strategy
will be to show that SMALL programs can be converted into equivalent Turing
machines and that Turing machines in turn can be transformed into equivalent
NICE programs. That will give us the relationship:
SMALL ⊆ TM ⊆ NICE.
This relationship completes the equivalence we wish to show when put together
with the equivalence of NICE and SMALL programs shown in the last theorem.
Let us begin by transforming SMALL programs to Turing machines.
Taking an arbitrary SMALL program, we first reorganize it by renaming the
variables. The new variables will be named x1, x2, ... with the input variables
leading the list. An example of this is provided in figure 1.

program example(x, y)                program example(x1, x2)
1: w = 0;                            1: x3 = 0;
2: x = x + 1;                        2: x1 = x1 + 1;
3: y = y - 1;                        3: x2 = x2 - 1;
4: if y = 0 then goto 6;             4: if x2 = 0 then goto 6;
5: if w = 0 then goto 2;             5: if x3 = 0 then goto 2;
6: halt(x)                           6: halt(x1)

Figure 1 - Variable Renaming
Now we need to design a Turing machine that is equivalent to the SMALL
program. The variables used in the program are stored on segments of the
machine's tape. For the above example with three variables, the machine
should have a tape that looks like the one shown below.
# x1 b x2 b x3 . . .
Note that each variable occupies a sequence of squares and that variables are
separated by blank squares. If x1 = 1101 and x2 = 101 at the start of
computation, then the machine needs to set x3 to zero and create a tape like:

# 1 1 0 1 b 1 0 1 b 0 . . .
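Setting up such a tape is simple to express in Python (a sketch; the function
name and list-of-values interface are ours):

    def initial_tape(values):
        # e.g. [0b1101, 0b101, 0] -> '#1101b101b0'
        return '#' + 'b'.join(format(v, 'b') for v in values)

    print(initial_tape([0b1101, 0b101, 0]))   # prints #1101b101b0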
Now what remains is to design a Turing machine which will mimic the steps
taken by the program and thus compute exactly the same function as the
program in as close to the same manner as possible.
For this machine design we shall move to a general framework and consider
what happens when we transform any SMALL program into a Turing machine.
We first set up the tape. Then all of the instructions in the SMALL program are
translated into Turing machine instructions. A general schema for a Turing
machine equivalent to a SMALL program with m instructions follows.
Set up the Tape
Program Instruction 1
Program Instruction 2
p(x) =  x        if x is a prime integer
        diverge  otherwise
We know exactly how to design a program that performs this task. An
inefficient method might involve checking to see if any integer between 1 and
x divides x evenly. If none does, then the routine halts and presents x as the
output; otherwise it enters an infinite loop.
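That method is a short Python sketch (the name p comes from the definition
above; trial division and the busy loop are our own choices):

    def p(x):
        # Halts with x exactly when x is a prime integer...
        if x >= 2 and all(x % d != 0 for d in range(2, x)):
            return x
        while True:                # ...and diverges otherwise
            pass

    # p(7) returns 7; p(8) never returns.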
But consider the following Boolean function.
f(x, y) = true if lightning strikes at latitude x and longitude y
Is this computable? We feel that it is not since one must wait forever on the
spot and observe in order to determine the answer. We cannot think of any
effective and constructive way to compute this.
Other popular functions that many wish were constructive are:
w(n) = the nth number from now that will come up on a roulette wheel
h(n) = the horse that will win tomorrow's nth race
We can statistically try to predict the first, but have no effective way of
computing it. The latter is totally out of the question. If these functions were
effective and constructive then computer programmers would be millionaires!
With our definition of human computation in hand we shall state the first of the
two beliefs.
Church's Thesis: Every finitely specified, constructive computing
procedure can be carried out by a Turing machine.
By the way, Church did not state the thesis in terms of Turing machines. He
stated it in terms of the lambda calculus.
Note that anyone who subscribes to Church's thesis no longer needs to design
Turing machines! This is wonderful news. It means that we need not write
down lengthy sequences of machine instructions in order to show that
something is computable if we can state the algorithm in a more intuitive (but
constructive) manner. Part of the justification for this is that each and every
finitely specified, constructive algorithm anyone has ever thought up has turned
out to be Turing machine computable. We shall appeal to Church's thesis often
in the sequel so that we may omit coding actual machine descriptions.
Our second belief is credited to Turing and deals with systems of computation
rather than individual functions.
Turing's Thesis: Any formal system in which computation is defined as a
sequence of constructive steps is no more powerful than the Turing
machine model of computation.
In the section dealing with machine enhancement we saw some material which
could be thought of as evidence for Turing's thesis. And history has not
contradicted this thesis. Every formal system of the type specified above which
anyone has ever invented has been shown to be no more than equivalent to
Turing machines. Some of these include Church's lambda calculus, Post's Post
machines, Markov's processes, and Herbrand-Gödel-Kleene general recursive
functions.
So, it seems that we have achieved our goal and defined computation. Now it is
time to examine it.
NOTES
Turing machines were introduced by A. M. Turing in his classic paper:
A. M. TURING, "On computable numbers, with an application to the
Entscheidungsproblem," Proceedings, London Mathematical Society 2:42 (1936-
1937), 230-265. Errata appear in 2:43 (1937), 544-546.
Another machine model for computation was discovered independently by:
E. L. POST, "Finite combinatory processes. Formulation I," J ournal of Symbolic
Logic 1 (1936), 103-105.
Still additional computational models can be found in:
N. CHOMSKY, "Three models for the description of language," IRE Transactions
on Information Theory 2:3 (1956) 113-124.
A. CHURCH, "The Calculi of Lambda-Conversion," Annals of Mathematics Studies
6 (1941), Princeton University Press, Princeton, New Jersey.
S. C. KLEENE, "General recursive functions of natural numbers," Mathematische
Annalen 112:5 (1936) 727-742.
A. A. MARKOV, "Theory of Algorithms," Trudy Mathematicheskogo Instituta
imeni V. A. Steklova 42 (1954).
E. L. POST, "Formal reductions of the general combinatorial decision problem,"
American Journal of Mathematics 65 (1943) 197-215.
More information on enhanced Turing machines appears in many papers found
in the literature. Several titles are:
P. C. FISCHER, A. R. MEYER, and A. L. ROSENBERG, "Real-time simulation of
multihead tape units," Journal of the Association for Computing Machinery 19:4
(1972) 590-607.
J. HARTMANIS and R. E. STEARNS, "On the computational complexity of
algorithms," Transactions of the American Mathematical Society 117 (1965) 285-
306.
H. WANG, "A variant to Turing's theory of computing machines," Journal of the
Association for Computing Machinery 4:1 (1957) 63-92.
Church's Thesis was presented in:
A. CHURCH, "An unsolvable problem of elementary number theory," American
Journal of Mathematics 58 (1936) 345-363.
Other textbooks which contain material on Turing machines include:
M. DAVIS, Computability and Unsolvability. McGraw-Hill, New York, 1958.
J. E. HOPCROFT and J. D. ULLMAN, Introduction to Automata Theory,
Languages, and Computation. Addison-Wesley, Reading, Mass., 1979.
H. R. LEWIS and C. H. PAPADIMITRIOU, Elements of the Theory of Computation.
Prentice-Hall, Englewood Cliffs, N.J., 1981.
M. L. MINSKY, Computation: Finite and Infinite Machines. Prentice-Hall,
Englewood Cliffs, N.J., 1967.
The papers by Church, Kleene, Post, and Turing cited above have been
reprinted in the collection:
M. DAVIS, ed., The Undecidable. Raven Press, Hewlett, N.Y. 1965.
PROBLEMS
The NICE Programming Language
1. Define a model of computation that does not depend on computers or
programming.
2. We used floating point numbers instead of real numbers in our
programming language. Why?
3. Add the data types character and string to the NICE language. Describe the
operations that are needed to manipulate them.
4. Examine the following program:
program superexpo(x, y)
var m, n, w, x, y, z: integer;
begin
w = 1;
for m = 1 to y do
begin
z = 1;
for n = 1 to w do z = z*x;
w = z
end;
halt(z)
end
What are the values of the function it computes when y equals 1, 2, and 3?
Describe this function in general.
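For readers who wish to experiment with problem 4, here is a direct transliteration of the program into Python; as in NICE, variables are assumed to start at zero:

    def superexpo(x, y):
        # w and z play the same roles as in the NICE program above
        w, z = 1, 0
        for m in range(1, y + 1):
            z = 1
            for n in range(1, w + 1):
                z = z * x
            w = z
        return z    # stands in for halt(z)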
5. How many multiplications are performed by the programs named expo and
superexpo? (Express your answer in terms of x and y.)
6. We have seen that exponentiation can be programmed with the use of one
for-loop and that superexponentiation can be done with two for-loops.
What can be computed with three nested for-loops? How about four?
7. Suppose you are given a program named big(x) and you modify it by
replacing all halt(z) statements with the following statement pair.
z = z + 1
halt(z)
What does the new program compute? Would you believe anyone who told
you that they have written a program that computes numbers larger than
those computed by any other program?
8. Write a program which computes the function

   bar(x) =

10. Why is the "undefined" clause needed in the above definition of ∃xP(x, y)?
Turing Machines
1. What does the Turing machine of figure 2 that adds 1 to its input do when
given #000 as input? What about the inputs: #bbb, #b011, and #11b10?
2. Examine the following Turing machine:
I1  0  1  right  same
    1  0  right  same
    b  b  left   next
    #  #  right  same
I2  0  1  halt
    1  0  left   same
    #  #  halt
What does it do when presented with the inputs #1011, #1111, and #0000?
In general, what does this machine accomplish?
3. Design a Turing machine that subtracts 1 from its input.
4. Design a Turing machine that recognizes inputs that read the same
forwards and backwards. (The machine should halt with its output equal to
1 for inputs such as #101101 or #11011, and provide the output 0 for
#1101 or #001110.)
5. How many instructions does the machine of the previous exercise execute
on inputs which are n symbols in length?
6. Design a Turing machine which receives #xby (x and y are binary integers)
as input and computes x + y. (You may use the machines that add and
subtract 1 as subroutines.)
7. Write down the instructions for a Turing machine which determines
whether its input is zero. What happens when this machine is given a blank
tape as input?
8. How many instructions are executed by the Turing machines of the last two
problems on inputs of length n?
9. Design a Turing machine that computes the fu(x) function of the NICE
language section.
11. A 0-1 valued Turing machine is a machine that always provides outputs of
0 or 1. Since it halts for all inputs, it computes what is known as a total
function. Assume that M_i(x, y) is a 0-1 valued Turing machine and design a
machine which receives y as input and halts if and only if there is an x for
which the machine M_i(x, y) = 1.
A Smaller Programming Language
1. Describe the ways in which division must change after floating point
numbers have been replaced by triples of integers which denote their signs,
absolute values and decimal points.
2. Assume that you have programs for the following functions:
prime(i) = the i-th prime number
expo(x, y) = x raised to the y-th power.
A pair such as (x, y) can be uniquely encoded as:
expo(prime(1),x)*expo(prime(2),y)
and decoded by a suitable division routine. In a similar way, any single
dimensional array might be encoded. Describe the way in which an array
can be encoded as a single integer. Then write a select(a, k) function which
provides the k-th element of the array encoded as the integer a, and a
replace(a, k, x) program which sets the k-th element of a to the value of x.
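As a starting point, the pair encoding described in the problem can be sketched in Python; pair and unpair are names chosen here, and prime(1) = 2 and prime(2) = 3 are the first two primes:

    def pair(x, y):
        # (x, y) becomes expo(prime(1), x) * expo(prime(2), y)
        return 2 ** x * 3 ** y

    def unpair(n):
        # the "suitable division routine": count how often 2 and 3 divide n
        x = y = 0
        while n % 2 == 0:
            n //= 2
            x += 1
        while n % 3 == 0:
            n //= 3
            y += 1
        return x, y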
3. How might a two dimensional array be encoded as a single integer? What
about an n-dimensional array?
4. Arrays in many programming languages have declared bounds on each
dimension. Is this restriction needed here? How might the routines of the
last two problems be affected if they were to be written for arbitrarily large
arrays?
5. Define integer division. Show that division can be replaced by subtraction
in much the same way that multiplication was replaced by addition.
6. If we allow the predecessor operation (x = x - 1) to be included in a
programming language, then subtraction is not needed. Show this.
7. Suppose procedure calls had been part of our NICE programming language.
How might they have been eliminated? What about recursive calls?
8. Many modern programming languages include pointers as a data type. Do
they allow us to compute any functions that cannot be computed in our
simple language? Why?
9. Dynamic storage allocation is provided by some languages that let programs
call for new variables and structures during runtime. Is this necessary for
increased computational power?
10. The size of a program could be defined as the number of symbols in the
program. (In other words: the length of the program.) Consider two
programs (one in the extended language and one in the simplified language)
that compute the same function.
a) How might their sizes differ?
b) Compare their running times.
11. Let's consider programs in our SMALL language which have no input (and
thus need no titles). The only one line program that computes a defined
function is:
1: halt(x)
and this program outputs a zero if we assume that all variables are
initialized to zero. The two-line program that computes a larger value than
any other two-line program is obviously:
1: x = x + 1;
2: halt(x)
and this outputs a 1. We could go to three lines and find that an even larger
number (in fact: 2) can be computed and output. We shall now add a little
computational power by allowing statements of the form:
k: x = y
and ask some questions. What are the maximum values computed by 4 and
5 line programs of this kind. How about 6? 7? etc.? Do you see a pattern
emerging? How about a general formula?
(Years ago Tibor Rado thought up this famous problem, publishing it in the
Bell System Technical Journal. He called it the Busy Beaver Problem and
stated it as:
"How many 1's can an n-instruction Turing machine print
before halting if it begins with a blank tape as input?" )
12. Suppose that someone gives you a SMALL language program called beaver(x)
that computes the Busy Beaver function of the last exercise. You find that it
has exactly k+1 lines of code and ends with a halt(z) statement. After
removing the title and the halt, it can be embedded as lines k+13 through
2k+12 of the following code to make the program:
1: x = x + 1;
2: x = x + 1;
     ...
k+7: x = x + 1;
k+8: y = x;
k+9: y = y - 1;
k+10: x = x + 1;
k+11: if y = 0 then goto k+13;
k+12: if w = 0 then goto k+9;
k+13:
     ...     (lines k+13 through 2k+12 hold the program for beaver(x))
2k+12:
2k+13: z = z + 1;
2k+14: halt(z)
Now let's ask some questions about this new program.
a) What value does x possess just before line k+13 is executed?
b) What value is output by this program?
c) What is the value (in words) of beaver(x)?
d) What is the value of z (in words) at line 2k+12?
e) How many lines does this program have?
Comment on this!
Equivalence of the Models
1. Design a "blank-squeezing" Turing machine. That is, a machine which
converts #xbby to #xby.
2. Translate the program instruction x_i = x_k into Turing machine instructions.
3. If a SMALL program that has n variables executes k instructions, how large
can the values of these variables be? How much tape must a Turing
machine have to simulate the program?
4. Compare the running times of Turing machines and SMALL programs.
Assume that one instruction of either can be executed in one unit of time.
5. Translate the Turing machine instructions of problem 2 (x_i = x_k) into NICE
program instructions. Comment on mechanical translation procedures.
6. In the translation from Turing machines to programs an array was used to
hold the Turing machine tape. How might scalar variables be employed
instead? How would reading, writing and head movement take place?
7. Discuss size trade-offs between Turing machines and programs that
compute the same function.
Machine Enhancement
1. Turing machines have often been defined so that they can remain on a tape
square if desired. Add the command stay to the Turing machine moves and
show that this new device is equivalent to the ordinary Turing machine
model.
2. Post machines are very much like Turing machines. The major difference is
that a Post machine may write or move, but not both on the same
instruction. For example, the instruction:
0     left  next
1  0        same
b           halt
tells the machine to move if it reads a 0 and to write if it reads a 1. Show
that Post machines are equivalent to Turing machines.
3. Endmarkers on our Turing machine tapes were quite useful when we wished
not to fall off the left end of the tape during a computation. Show that they
are a bit of a luxury and that one can do without them.
4. How much more tape does a two symbol (0, 1, and blank) machine use when
it is simulating an n symbol machine? Can this extra space be reduced?
How?
5. Design a Turing machine that receives a binary number as input and
transforms it into encoded decimal form. (Use the encoding of the machine
enhancement section.)
6. Describe the process of changing an encoded decimal into the equivalent
binary number.
7. Show that Turing machines which use one symbol and a blank are
equivalent to ordinary Turing machines.
8. Describe instructions for multi-tape Turing machines. Specify input and
output conventions. Prove that these machines are equivalent to one tape
Turing machines.
9. Consider Turing machines that operate on two-dimensional surfaces that
look something like infinite chessboards. They now require two additional
moves (up and down) in order to take advantage of their new data structure.
Prove that these machines are equivalent to standard Turing machines.
10. Turing machines need not have only one head per tape. Define multiheaded
Turing machines. Demonstrate their equivalence to Turing machines that
have one head.
11. Consider the problem of recognizing strings which consist of n ones
followed by n zeros. How fast can this set be recognized with Turing
machines that have:
a) Two tapes with one head per tape.
b) One tape and one tape head.
c) One tape and two tape heads.
Describe your algorithms and comment on the time trade-offs that seem to
occur.
12. A wide-band Turing machine is able to scan several symbols at one time.
Define this class of machines and show that they are equivalent to standard
Turing machines.
13. Can wide-band Turing machines compute faster than standard Turing
machines? Discuss this.
UNSOLVABILITY
One of the rationales behind our study of computability was to find out exactly
what we meant by the term. To do this we looked carefully at several systems
of computation and briefly examined the things they could compute. From this
study we were able to define the classes of computable functions and
computable sets. Then we compared computation via Turing machines and
program execution. We found that they were equivalent. Then we examined
extensions to Turing machines and found that these added no computational
power. After a brief discussion of whether or not Turing machines can perform
every computational task we can describe, we came close to assuming that
Turing machines (and programs) can indeed compute everything.
Hardly anything is further from the truth! It is not too silly though, for until the
1930's most people (including some very clever mathematicians) felt that
everything was computable. In fact, they believed that all of the open problems
of mathematics would eventually be solved if someone ingenious enough came
along and developed a system in which the problems could be expressed and
either verified or refuted mechanically. But there are things which are not
computable and now we shall attempt to discover and examine a few of them.
Thus our next step in uncovering the nature of computation shall consist of
finding out what we cannot compute!
The sections are entitled:
Arithmetization
Properties of the Enumeration
Universal Machines and Simulation
Solvability and the Halting Problem
Reducibility and Unsolvability
Enumerable and Recursive Sets
Historical Notes and References
Problems
Arithmetization
Finding out what cannot be computed seems like a difficult task. In the very
least, we probably shall have to make statements such as:
No Turing machine can compute this!
To verify a claim such as that one, we might have to look at some machines to
see if any of them can do the computation in question. And, if we are to start
examining machines, we need to know exactly which ones we are talking about.
Earlier we discussed naming individual machines. In fact, we gave them
delightful names like M_1, M_2, and M_3 but neglected to indicate what any of
them really did during their computation. Now is the time to remedy this.
After this discussion, any time someone mentions a machine such as M_942 we
shall know exactly which machine is being mentioned.
Our first task is to make an official roster of Turing machines. It will be sort of
a Who's Who, except that not only the famous ones, but all of them will be
listed. This is called an enumeration and the process of forming this list has
historically been known as arithmetization. It consists of:
a) Encoding all of the Turing machines, and
b) Ordering them according to this encoding.
Now we shall reap a benefit from some of the hard work we undertook while
studying computability. We know that we need only consider the standard
Turing machines - that is, those which use a binary alphabet (0, 1, and b) on a
one-track tape. Since we know that these compute exactly the class of
computable functions, every result about this class of functions applies to these
machines. In addition, all results and observations concerning these machines
will be true for any other characterization of the computable functions. So, by
exploring the properties of the one track, one tape, binary alphabet Turing
machines, we shall also be looking at things that are true about programs.
Let us begin our task. If we take an instruction line such as:
0  1  left  same
lose the nice formatting, and just run the parts together, we get the string:
01leftsame
This is still understandable since we were careful to use only certain words in
our Turing machine instructions. We shall modify this a little by translating the
words according to the following chart.
left   right   halt   same   next
  l      r       h      s      n
This translation converts our previous instruction line to:
01ls
which is a little more succinct, but still understandable. In the same manner,
we can transform an entire instruction such as:
0  1  left   same
1  1  right  I35
b  0  halt
into three strings which we shall separate by dots and concatenate. The above
instruction becomes the following string of characters.
01ls.11rI10001.b0h
(Note that we are using binary rather than decimal numbers for instruction
labels. Instead of writing I35 we jotted down I10001.) This is not pretty, but it
is still understandable, and the instruction has been encoded!
Next, we encode entire machines. This merely involves concatenating the
instructions and separating them with double dots. The general format for a
machine with n instructions looks something like this:
..I1..I2..I3.. ... ..In..
As an example, let us take the machine:
I1  0  0  right  next
    1  1  right  same
    b  b  right  same
I2  0  0  right  same
    1  1  right  I1
    b  b  halt
(which, by the way, accepts all inputs which contain an even binary number) and
encode it as:
..00rn.11rs.bbrs..00rs.11rI1.bbh..
It is not too troublesome to interpret these encodings since we know what an
instruction line looks like and have carefully placed dots between all of the lines
and instructions. The next step is to turn these encodings into numbers. Then we
can deal with them in an arithmetical manner. (This is where the term
arithmetization comes in.) Assigning numbers to symbols is done according to
the following chart.
symbol   0  1  b  r  l  h  s  n  I  .
number   0  1  2  3  4  5  6  7  8  9
Now each Turing machine can be encoded as a decimal number. Our last
machine has become the rather large number:
9,900,379,113,692,236,990,036,911,381,922,599
and the smallest Turing machine, namely:
0  0  halt
(which accepts all strings beginning with zero) is number 9,900,599 in our nifty
new, numerical encoding.
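The whole procedure is mechanical, which is the heart of the arithmetization. Here is a sketch of it in Python; the characters l, r, h and the dot are the single-symbol codes assumed in the charts above:

    # Map the ten encoding characters to the digits 0 through 9.
    DIGITS = {c: str(i) for i, c in enumerate("01brlhsnI.")}

    def describe(machine):
        # machine is a list of instructions, each a list of line strings
        # such as "00rn" (read 0, write 0, move right, go to next).
        text = ".." + "..".join(".".join(inst) for inst in machine) + ".."
        return int("".join(DIGITS[c] for c in text))

    # The machine above that accepts inputs containing an even binary number:
    print(describe([["00rn", "11rs", "bbrs"], ["00rs", "11rI1", "bbh"]]))
    # prints 9900379113692236990036911381922599, the number in the text
    print(describe([["00h"]]))    # the smallest machine: 9900599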
These encodings for Turing machines will be referred to as machine descriptions
since that is what they really are. A nice attribute of these descriptions is that
we know what they look like. For example, they always begin with two nines
followed immediately by a couple of characters from {0, 1, 2}, and so forth.
Thus we can easily tell which integers are Turing machine descriptions and
which are not. We know immediately that 10,011,458,544 is not a machine and
991,236,902,369,223,699 has to be the rather reactionary machine which always
moves to the right.
Now we shall use Church's thesis for the first time. We can easily design an
algorithm for deciding whether or not an integer is a Turing machine
description. Thus we may claim that there is a Turing machine which decides
this. This makes the set of descriptions computable.
Two important facts emerged from our simple exercise in encoding.
Every Turing machine has a unique description.
The set of machine descriptions is a computable set.
(N.B. Here are two points to ponder. First, suppose two machines have the
same instructions, but with some of the instruction lines in different order. Are
they the same machine even though they have different descriptions? Our
statement above says that they are not the same. Is this OK? Next, imagine
what an arithmetization of NICE programs would have looked like!)
We still do not have our official roster of Turing machines, but we are almost
there. The list we need shall consist of all Turing machine descriptions in
numerical order. Composing this is easy, just go through all of the decimal
integers (0, 1, 2, ...) and discard every integer that is not a Turing machine
description. This straightforward (tedious, but straightforward) process
provides us with a list of machines. Now, we merely number them according to
where they appear on the list of machine descriptions. In other words:
M_1 = the first machine on the list
M_2 = the second machine on the list
and so forth. This list of machines is called the standard enumeration of
Turing machines. A machine's place on the list (i.e., the subscript) is called its
index. So, now we know exactly which machine we're talking about when we
mention M_239 or M_753. The sets accepted by these machines are also numbered in
the same manner. We call them W_1, W_2, and so on. More formally:

W_i = { x | M_i(x) halts }
We have now defined a standard enumeration or listing of all the Turing
machines: M_1, M_2, M_3, ... as well as a standard enumeration of all the
computable sets: W_1, W_2, W_3, ...
Let us now close by mentioning once again two very important facts about the
constructiveness of our standard enumeration.
Given the instructions for a Turing machine, we can
find this machine in our standard enumeration.
Given the index (in the enumeration) of a Turing
machine, we can produce its instructions.
This property of being able to switch between indices and machines combined
with Church's thesis allows us to convincingly claim that we can locate the
indices of the Turing machines which correspond to any algorithms or
computing procedures we use in theorems or examples.
Properties of the Enumeration
The ease with which we formed the official rosters of all Turing machines and
the sets that they accept belies its significance. Though it may not seem to be
very important, it will be a crucial tool as we begin to formulate and explore the
properties of the class of computable sets.
A basic question concerns the cardinality of the class of computable sets.
Cardinality means size. Thus a set containing exactly three objects has
cardinality three. Quite simple really. The cardinalities of the finite sets are:
0, 1, 2, 3, and so on.
Things are not so easy for infinite sets since they do not have cardinalities
corresponding to the integers. For this reason, mathematicians employ the
special symbol ℵ0 (the Hebrew letter aleph with subscript zero - pronounced
aleph naught) to represent the cardinality of the set of integers. Let us state
this as a definition.

Definition. The set of nonnegative integers has cardinality ℵ0.
Many sets have this cardinality. One is the set of even integers. In fact, if it is
possible to match up the members of two sets in a one-to-one manner with no
elements left over then we say that they have the same cardinality.
Definition. Two sets have the same cardinality if and only if there is a
one-to-one correspondence between them.
A one-to-one correspondence is merely a matching between the members of two
sets where each element is matched with exactly one element of the other set.
Here is an example of a one-to-one correspondence between the nonnegative
integers and the even nonnegative integers:
0  1  2  3  ...  k  ...
0  2  4  6  ...  2k ...

It is just a mapping from x to 2x. Since the correspondence contains all of the
nonnegative integers and all of the even nonnegative integers we state that
these sets have exactly the same size or cardinality, namely ℵ0.
(At this point we need some notation. The size or cardinality of the set A will
be written |A|. Thus: |{a, b}| = 2 and |set of integers| = ℵ0.)
Other examples of sets that have cardinality ℵ0 are all of the (positive and
negative) integers, the rational numbers, and the prime numbers. We
traditionally speak of these sets as being countable since they can be put in one-
to-one correspondence with the numbers we use in counting, namely the
nonnegative integers. But, most important of all to us at this moment is the
fact that our standard enumeration gives us exactly ℵ0 Turing machines. This
leads to a result involving the cardinality of the class of computable sets.

Theorem 1. There are exactly ℵ0 computable sets.
Proof. We must show two things in order to prove this theorem. First,
that there are no more than ℵ0 computable sets, and then that there are
at least ℵ0 computable sets.

The first part is easy. From our definition of computable, we know that
every computable set is accepted by some Turing machine. Thus

|Computable Sets| ≤ |Turing machines|.

Since we can place the Turing machines in one-to-one correspondence
with the integers (this is due to our standard enumeration: M_1, M_2, ...) we
know that there are exactly ℵ0 of them. This means that the cardinality
of the class of computable sets is no greater than ℵ0. That is:

|Computable Sets| ≤ |Turing machines| = ℵ0

Now we must show that there are, in fact, at least ℵ0 computable sets.
(After all - suppose that even though there are an infinite number of
Turing machines, many act the same and so the entire collection of
machines only accepts a finite number of different sets!) Consider the
sequence:

{0}, {1}, {10}, {11}, {100}, ...

of singleton sets of binary integers. There are exactly ℵ0 of these and
they are all computable since each one can be accepted by some Turing
machine. Thus:

ℵ0 = |Singleton Sets| ≤ |Computable Sets| ≤ |Turing machines| = ℵ0

and our theorem is proven.
We now know exactly how many computable sets exist. The next obvious
question is to inquire as to whether they exhaust the class of sets of integers.
Theorem 2. There are more than ℵ0 sets of integers.
Proof. There are at least ℵ0 sets of integers since there are exactly ℵ0
computable sets. We shall assume that there are exactly that many sets
and derive a contradiction. That will prove that our assumption is
incorrect and thus there must be more sets.

Our strategy uses a technique named diagonalization that was developed
by Cantor in the mid nineteenth century. We shall list all of the sets of
integers and then define a set that cannot be on the list.

If there are exactly ℵ0 sets of integers then they can be placed in one-to-
one correspondence with the integers. Thus there is an enumeration of all
the sets of integers. And, if we refer to this infinite list of sets as:

S_1, S_2, S_3, ...

we may be assured that every set of integers appears as some S_i on our
list. (Note that we did not explain how we derived the above roster of sets
- we just claimed that it exists in some mathematical wonderland!)

Since we assumed that the list of all sets exists, we are allowed to use it in
any logically responsible manner. We shall now define a set in terms of
this list of sets. The set we shall define will depend directly on the
members of the list. We shall call this set D and define membership in it
by stating that for each integer i:

i ∈ D if and only if i ∉ S_i
(Again, note that we did not even consider how one goes about computing
membership for D, just which integers are members. This is one of the
differences between theory and practice.)
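Although D was defined with no regard for computation, the definition itself is almost mechanical. A sketch in Python, where member(i, x) is a hypothetical predicate meaning "x is a member of S_i" for the claimed listing:

    def diagonal(member):
        # D contains i exactly when i is not in S_i, so D differs from
        # every set on the list in at least one place.
        return lambda i: not member(i, i)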
Now, what do we know about the set D? First of all, the set D is indeed a
set of integers. So, it must appear somewhere on our list (S_1, S_2, S_3, ...) of
sets of integers since the list contains all sets of integers. If the set D
appears as the d-th set on the list, then D = S_d. Now we ask the simple
question: Does the set D contain the integer d?

Watch this argument very closely. If d is a member of D then d must not
be a member of S_d since that was how we defined membership in D
above. But we know that D and S_d are precisely the same set! This means
that d cannot be a member of D since it is not a member of S_d. Thus d
must not be a member of D.

But wait a moment! If d is not a member of D then d has to be a member
of S_d, because that is how we defined the set D. Thus d is not a member
of D if and only if d is a member of D! We seem to have a small problem
here.
Let's have another look at what just happened. Here is a little chart
which illustrates the above argument.

Due to:                    We know:
Definition of D            (1) d ∈ D if and only if d ∉ S_d
S_d = D                    (2) d ∉ S_d if and only if d ∉ D
Statements (1) and (2)     (3) d ∈ D if and only if d ∉ D

We shall often use the symbol ⇔ to mean if and only if, and now use it
to state the above derivation as:

d ∈ D ⇔ d ∉ S_d ⇔ d ∉ D.
As we mentioned earlier, something must be very wrong! And it has to be
one of our assumptions since everything else was logically responsible
and thus correct. Going back through the proof we find that the only
assumption we made was our claim that

there are exactly ℵ0 sets of integers.

Therefore there must be more than ℵ0 sets of integers.
This result brings up an interesting topic in mathematics that was very
controversial during the mid nineteenth century. There seem to be several
kinds of infinity. We have just shown that there is an infinite class (sets of
integers) which has cardinality greater than another infinite class (the integers).
This larger cardinality is denoted 2^ℵ0.
[Figure - the universal Turing machine and its simulation tapes]
The squares that the universal machine is examining are shaded in blue. This
indicates that in the simulation, M_i is reading a 1 and about to execute
instruction I73.
In the simulation, the universal machine merely applies the instructions written
on the lower tape to the data on the top tape. More precisely, a simulation step
consists of:
a) Reading the symbol on the top tape.
b) Writing the appropriate symbol on the top tape.
c) Moving the input head (on the top tape).
d) Finding the next instruction on the description (bottom) tape.
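A sketch of one such step in Python may make this concrete; here the description tape is assumed to be held as a table from (instruction, symbol) pairs to (write, move, goto) triples, and a halt line would simply end the simulation:

    def u_step(program, tape, head, label):
        symbol = tape.get(head, 'b')                   # a) read the top tape
        write, move, goto = program[(label, symbol)]   # find the matching line
        tape[head] = write                             # b) write the new symbol
        head += {'left': -1, 'right': 1}[move]         # c) move the input head
        return head, goto                              # d) the next instruction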
Instead of supplying all of the minute details of how this simulation progresses,
we shall explain the process with an example of one step and resort once again
to our firm belief in Church's thesis to assert that there is indeed a universal
Turing machine M_u. We should easily be able to write down the instructions of
M_u, or at least write a program which emulates it.
In figure 2, our universal machine locates the beginning or read portion of the
instruction to be executed and finds the line of the instruction (on the
description tape) that corresponds to reading a 1 on the top tape. Moving one
square over on the description tape, it discovers that it should write a blank,
and so it does.
[Figure 2 - Universal Turing Machine: Reading and Writing (two panels: Read and Write)]
Next M_i should move and find the next instruction to execute. In figure 3, this is
done. The universal machine now moves its simulation tape head one square to
the right, finds that it should move the tape head of M_i to the left and does so.
Another square to the right is the command to execute the next instruction
(I74), so the universal machine moves over to that instruction and prepares to
execute it.
[Figure 3 - Universal Turing Machine: Moving and Goto (two panels: Move and Goto)]
Several of the details concerning the actual operation of the universal machine
have been omitted. Some of these will emerge later as exercises. So, we shall
assume that all of the necessary work has been accomplished and state the
famous Universal Turing Machine Theorem without formal proof.
Theorem 2. There is a universal Turing machine.
We do however need another note on the above theorem. The universal
machine we described above is a two-tape machine with a three-track tape and
as such is not included in our standard enumeration. But if we recall the results
about multi-tape and multi-track machines being equivalent to ordinary one-
tape machines, we know that an equivalent machine exists in our standard
enumeration. Thus, with heroic effort we could have built a binary alphabet,
one-tape, one-track universal Turing machine. This provides an important
corollary to the universal Turing machine theorem.
Corollary. There is a universal Turing machine in the standard
enumeration.
At various times previously we mentioned that the universal Turing machine
and s-m-n theorems were very important and useful. Here at last is an example
of how we can use them to prove a very basic closure property of the
computable sets.
Theorem 3. The class of computable sets is closed under intersection.
Proof. Given two arbitrary Turing machines M_a and M_b, we must show
that there is another machine M_k that accepts exactly what both of the
previous machines accept. That is, for all x:

M_k(x) halts if and only if both M_a(x) and M_b(x) halt,

or, if we recall that the set which M_a accepts is named W_a, another way to
state this is that:

W_k = W_a ∩ W_b.

The algorithm for this is quite simple. Just check to see if M_a(x) and M_b(x)
both halt. This can be done with the universal Turing machine. An
algorithm for this is:
Intersect(x, a, b)
    run M_u(a, x), and diverge if M_u(a, x) diverges
    if M_u(a, x) halts then run M_u(b, x)
    if M_u(b, x) halts then halt (accept)
Appealing once more to Church's thesis, we claim that there is a Turing
machine which carries out the above algorithm. Thus this machine exists
and has a place in our standard enumeration. We shall call this machine
M_int. And, in fact, for all x:

M_int(x, a, b) halts if and only if both M_a(x) and M_b(x) halt.

At this point we have a machine with three inputs (M_int) which halts on the
proper x's. But we need a machine with only one input which accepts the
correct set. If we recall that the s-m-n theorem states that there is a
function s(int, a, b) such that for any integers a and b:

M_s(int, a, b)(x) = M_int(x, a, b),
we merely need to look at the output of s(int, a, b) and then set the index:

k = s(int, a, b).

Thus we have designed a Turing machine M_k which satisfies our
requirements and accepts the intersection of the two computable sets W_a
and W_b.
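A sketch of the two constructions in Python may help; universal(i, x) is a hypothetical stand-in for M_u, returning only when M_i(x) halts:

    def intersect(x, a, b):
        universal(a, x)    # diverges here whenever M_a(x) diverges
        universal(b, x)    # diverges here whenever M_b(x) diverges
        return             # halt: x is accepted, so x is in W_a and W_b

    def s(machine, a, b):
        # The s-m-n step behaves like freezing the extra arguments,
        # yielding the one-input machine M_k with k = s(int, a, b).
        return lambda x: machine(x, a, b)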
Solvability and the Halting Problem
Our development period is over. Now it is time for some action. We have the
tools and materials and we need to get to work and discover some things that
are not computable. We know they are there and now it is time to find and
examine a few.
Our task in this section is to find some noncomputable problems. However we
must first discuss what exactly problems are. Many of our computational tasks
involve questions or decisions. We shall call these problems. For example, some
problems involving numbers are:
Is this integer a prime?
Does this equation have a root between 0 and 1?
Is this integer a perfect square?
Does this series converge?
Is this sequence of numbers sorted?
As computer scientists, we are very aware that not all problems involve
numbers. Many of the problems that we wish to solve deal with the programs
we write. Often we would like to know the answers to questions concerning our
methods, or our programs. Some of these problems or questions are:
Is this program correct?
How long will this program run?
Does this program contain an infinite loop?
Is this program more efficient than that one?
A brief side trip to set forth more definitions and concepts is in order. We must
describe some other things closely related to problems or questions. In fact,
often when we describe problems we state them in terms of relations or
predicates. For example, the predicate Prime(x) that indicates prime numbers
could be defined:
Prime(x) is true if and only if x is a prime number.
and this predicate could be used to define the set of primes:
PRIMES = {x | Prime(x) }.
Another way to link the set of primes with the predicate for being a prime is to
state:
x ∈ PRIMES if and only if Prime(x)
(N.B. Two comments on notation are necessary. We shall use iff to mean if and
only if and will often just mention a predicate as we did above rather than
stating that it is true.)
We now have several different terms for problems or questions. And we know
that they are closely related. Sets, predicates, and problems can be used to ask
the same question. Here are three equivalent questions:
Is x ∈ PRIMES?
Is Prime(x) true?
Is x a prime number?
When we can completely determine the answer to a problem, the value of a
predicate, or membership in a set for all instances of the problem, predicate, or
things that may be in the set; we say that the problem, predicate, or set is
decidable or solvable. In computational terms this means that there is a Turing
machine which can in every case determine the answer to the appropriate
question. The formal definition of solvability for problems follows.
Definition. A problem P is solvable if and only if there is a Turing
machine M_i such that for all x:

M_i(x) = 1 if the answer to P for x is yes
         0 if the answer is no
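For instance, the primality problem is solvable because a procedure like the following (a Python sketch using trial division) halts on every input with the correct answer:

    def prime(x):
        # A total decision procedure: 1 for "yes" and 0 for "no".
        return 1 if x >= 2 and all(x % d != 0 for d in range(2, x)) else 0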
If we can always solve a problem by carrying out a computation it is a solvable
problem. Many examples of solvable problems are quite familiar to us. In fact,
most of the problems we attempt to solve by executing computer programs are
solvable. Of course, this is good because it guarantees that if our programs are
correct, then they will provide us with solutions! We can determine whether
numbers are prime, find shortest paths in graphs, and many other things
because these are solvable problems. There are lots and lots of them. But there
must be some problems that are not solvable because we proved that there are
things which Turing machines (or programs) cannot do. Let us begin by
formulating and examining a historically famous one.
Suppose we took the Turing machine M_1 and ran it with its own index as input.
That is, we examined the computation of M_1(1). What happens? Well, in this
case we know the answer because we remember that M_1 was merely the
machine:

0  0  halt

and we know that it only halts when it receives an input that begins with a zero.
This is fine. But, how about M_2(2)? We could look at that also. This is easy; in
fact, there is almost nothing to it. Then we could go on to M_3(3). And so forth.
In general, let us take some arbitrary integer i and ask about the behavior of
M_i(i). And, let's not ask for much, we could put forth a very simple question:
does it halt?
Let us ponder this a while. Could we write a program or design a Turing
machine that receives i as input and determines whether or not M_i(i) halts? We
might design a machine like the universal Turing machine that first produced
the description of M_i and then simulated its operation on the input i. This
however, does not accomplish the task we set forth above. The reason is that
though we would always know if it halted, if it went into an infinite loop we
might just sit there and wait forever without knowing what was happening in
the computation.
Here is a theorem about this that is very reminiscent of the result where we
showed that there are more sets than computable sets.
Theorem 1. Whether or not a Turing machine halts when given its own
index as input is unsolvable.
Proof. We begin by assuming that we can decide whether or not a Turing
machine halts when given its own index as input. We assume that the
problem is solvable. This means that there is a Turing machine that can
solve this problem. Let's call this machine M_k and note that for all inputs i:

M_k(i) = 1 if M_i(i) halts
         0 if M_i(i) diverges
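To see why no such machine M_k can exist, consider the standard diagonal construction, sketched here in Python; halts_on_self is the hypothetical decider that M_k was assumed to implement:

    def contrary(i):
        # do the opposite of whatever M_i(i) does
        if halts_on_self(i):
            while True:    # diverge whenever M_i(i) halts
                pass
        return 0           # halt whenever M_i(i) diverges

If contrary itself appeared in the enumeration as some M_c, then contrary(c) would halt exactly when M_c(c) diverges - something no machine can do.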
M(x, i) = M_a(x) if M_i(i) halts
          diverge otherwise

which operates like M_a if i ∈ K and diverges on all inputs if i ∉ K.
Since all M does is to run M_i(i) and then, if it halts, turn control over to M_a,
M is indeed a Turing machine. Since it is, Church's thesis provides us with
an index for M, and as before, the s-m-n theorem provides a function g
such that for all i and x:

M_g(i)(x) = M(x, i).
So, M_g(i) is a Turing machine in our standard enumeration of Turing
machines. Now let us have a look at exactly what M_g(i) accepts. This set
is:

W_g(i) = W_a if i ∈ K
         ∅ otherwise

because M_g(i) acts exactly like M_a whenever M_i(i) halts.
This means that for all i:

i ∈ K iff M_i(i) halts                [definition of K]
      iff ∀x [M_g(i)(x) = M_a(x)]     [definition of M_g(i)]
      iff W_g(i) = W_a                [M_g(i) accepts W_g(i)]
      iff g(i) ∈ P                    [W_a has property P]
Thus we have reduced K to P. Applying the corollary of our last theorem
tells us that P is unsolvable and thus whether a computable set has any
nontrivial property is likewise an unsolvable problem.
This is a very surprising and disturbing result. It means that almost everything
interesting about the computable sets is unsolvable. Therefore we must be very
careful about what we attempt to compute.
To conclude the section we present a brief discussion about unsolvability and
computer science. Since Turing machines are equivalent to programs, we know
now that there are many questions that we cannot produce programs to answer.
This, of course, is because unsolvable problems are exactly those problems that
computer programs cannot solve.
Many of the above unsolvable problems can be stated in terms of programs and
computers. For example, the halting problem is:
Does this program contain an infinite loop?
and we have shown this to be unsolvable. This means that no compiler may
contain a routine that unfailingly predicts looping during program execution.
Several other unsolvable problems concerning programs (some of which are set
properties and some of which are not) appear on the following list:
a) Are these two programs equivalent?
b) Will this program halt for all inputs?
c) Does this program accept a certain set? (Correctness)
d) Will this program halt in n^2 steps for an input of length n?
e) Is there a shorter program that is equivalent to this one?
f) Is this program more efficient than that one?
In closing we shall be so bold as to state the sad fact that almost all of the
interesting questions about programs are unsolvable!
Enumerable and Recursive Sets
Unfortunately, it seems that very few of the general problems concerning the
nature of computation are solvable. Now is the time to take a closer look at
some of these problems and classify them with regard to a finer metric. To
do this we need more precision and formality. So, we shall bring forth a little
basic mathematical logic.
Examining several of the sets with unsolvable membership problems, we find
that while we cannot decide their membership problems, we are often able to
determine when an element is a member of some set. In other words, we know
that if a Turing machine accepts a particular set and halts for some input, then
that input is a member of the set. Thus the Turing machine halts for members
of the set and provides no information about inputs that are not members. An
example is K, the set of Turing machines that halt when given their own indices
as input. Recalling that

K = { i | M_i(i) halts } = { i | i ∈ W_i },

consider the machine M that can be constructed from the universal Turing
machine (M_u) as follows.

M(i) = M_u(i, i)
Another way to describe M (possibly more intuitively) is:

M(i) = halt if M_i(i) halts
       diverge otherwise
Partial recursive functions are those which do not give us answers for every
input. These are exactly the functions we try not to write programs for! This
brings up one small thing we have not mentioned explicitly about reducibilities.
We need to have answers for every input whenever a function is used to reduce
one set to another. This means that the reducibility functions need to be
recursive. The proper traditional definition of reducibility follows.
Definition. The set A is reducible to the set B (written A ≤ B) if and only if
there is a recursive function f such that for all x:

x ∈ A if and only if f(x) ∈ B.
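For example, the set of even numbers is reducible to the set of odd numbers via the function sketched below; since it is total, it qualifies as a reduction:

    def f(x):
        # x is even if and only if f(x) = x + 1 is odd
        return x + 1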
Another thing that functions are useful for doing is set enumeration (or listing).
Some examples of set enumeration functions are:
e(i) = 2i = the i-th even number
p(i) = the i-th prime number
m(i) = the i-th Turing machine encoding
These are recursive functions and we have mentioned them before. But, we
have not mentioned any general properties about functions and the
enumeration of the recursive and r.e. sets. Let us first define what exactly it
means for a function to enumerate a set.
Definition. The function f enumerates the set A (or A = range of f) if and
only if for all y:

a) If y ∈ A, then there is an x such that f(x) = y, and
b) If f(x) = y, then y ∈ A.
Note that partial recursive functions as well as (total) recursive functions can
enumerate sets. For example, the function:
k(i) = i if M_i(i) halts
       diverge otherwise
M(i) = halt if M_i(n) halts for some input n
       diverge otherwise
3. Whether or not an arbitrary Turing machine halts when given the integer 3
as input is obviously a subcase of the general halting problem (just like a
machine halting on its own index). Does this fact alone indicate that the
problem is unsolvable? Provide precise reasoning which indicates that this
is true or a counterexample which shows that it is wrong.
4. We showed that since the problem concerning a machine halting on its own
index is unsolvable, the general halting problem for Turing machines is
unsolvable. Does this imply that any superset of an unsolvable problem is
unsolvable? Provide a proof or a counterexample.
5. Given the Turing machine M_i, consider the machine:

   M(x) = M_i(i) if x is a blank
          diverge otherwise
If x is blank, when does the machine accept? If x is not a blank, what does
the machine accept?
6. Describe the reduction which took place in the last problem. Comment on
the unsolvability of the blank tape halting problem.
7. Prove that the membership problem for any finite set is solvable.
8. Define the following problems as predicates or membership problems for
sets and prove whether or not they are solvable.
a) If an arbitrary Turing machine halts for all even numbers.
b) Whether an arbitrary Turing machine has over 176 instructions.
c) If you will get an A in the next computer course you take.
9. Show that whether or not an arbitrary Turing machine ever executes a
particular one of its instructions is unsolvable. (This is the same as the
problem of detecting unreachable code in a program.)
10. Explain precisely what is meant when we say that a Turing machine has
entered an infinite loop. Prove that detecting this occurrence for arbitrary
Turing machines is unsolvable.
Reducibility and Unsolvability
1. Transform the following problems into set membership problems. Is set
membership in these sets solvable? Why?
a) If an arbitrary Turing machine ever writes a 1 on its tape.
b) If two arbitrary Turing machines ever accept the same input.
c) If an arbitrary Turing machine ever runs longer than 193 steps.
d) If an arbitrary Turing machine accepts at least ten inputs.
2. Suppose there was an enumeration of Turing machines where all of the
machines with even indices halted on every input and all of the odd
numbered machines did not. Is this possible? Comment on this.
3. Let M_i be a Turing machine which halts for all inputs. Let us further assume
that every output of M_i is the index of a Turing machine which halts for every
input. That is, if for some x, M_i(x) = k, then M_k halts for all inputs. Thus M_i
outputs a list of Turing machines which always halt. Prove that there is a
Turing machine which halts for all inputs that is not on this list.
4. Any general computational problem concerning Turing machines can be
stated for programs. Show that general problems which are unsolvable for
Turing machines are also unsolvable for programs.
5. Suppose that someone brings you a program which does some task which
you find important. If they claim that nobody can write a shorter program to
accomplish this task, should you believe them? Is there some general
method you can use to check on this?
6. The index set for a property is the set comprised of all indices of all Turing
machines which accept a set possessing the property. Another way to state
this is to say that for some set property P, the index set for P is:

{ i | W_i has property P }.

(Note that these are properties of sets and not properties of individual
machines. Thus, if two machines accept the same set, they are either both in
an index set or neither is a member.) Two examples of index sets are: the
set of all Turing machines which never halt, and the set of all Turing
machines which always halt. Two general facts about index sets are:

a) An index set either contains all of the indices of the Turing
machines which never halt or it contains none of them.
b) If an index set is nontrivial (this means that it has some members,
but not all of the integers), then it is infinite and so is its complement.

Show that the above two statements are indeed facts and intuitively explain
why K = { i | i ∈ W_i } is not an index set.
7. Only half of the proof for Rice's theorem (that only trivial set properties are
solvable) was provided in this section. Prove the remaining half by showing
that K is reducible to any index set which does not contain the empty set.
Enumerable and Recursive Sets
1. Are the following sets recursive? Are they recursively enumerable? Justify
your conjectures.

   a) { x | x is an even integer }
   b) { i | M_i halts for all inputs }
   c) { i | M_i halts only for prime integers }
   d) { i | M_i is not a Turing machine }
2. Prove that if the set A is not recursively enumerable and can be reduced to
the set B, then B cannot be recursively enumerable.
3. Show that the following sets are not recursively enumerable.
a) { i | W_i = ∅ }
   b) { i | W_i = all integers }
4. Show that if P(x, y) is a recursive predicate then the following is r.e.
{x | P(x, y) is true for some y }
5. A complete set is a recursively enumerable set to which every recursively
enumerable set can be reduced. Show that K is a complete set.
6. Prove that every index set which is recursively enumerable is a complete set.
7. Let f(x) be a recursive function. Is its range recursive? r.e.?
8. Is the image of a partial recursive function recursive? r.e.?
9. Prove that a set is recursive if and only if it is finite or can be enumerated in
strictly increasing order.
COMPLEXITY
Thus far we have examined the nature of computation by specifying exactly
what we mean by computable and then going a step further and becoming
acquainted with several things that are not computable. This was interesting
and somewhat useful since we now have a better idea about what is possible
and what tasks we should avoid. But we need to delve into issues closer to
actual, real-world computation. This brings up the issue of computational cost.
In order to examine this we shall develop a framework in which to classify
tasks by their difficulty and possibly identify things that require certain
amounts of various resources. In addition we shall discover properties of
computational problems which place them beyond our reach in a practical
sense. To do this we will examine decision properties for classes of recursive
sets and functions with an emphasis upon the difficulty of computation.
The sections include:
Measures and Resource Bounds
Complexity Classes
Reducibilities and Completeness
The Classes P and NP
Intractable Problems
Historical Notes and References
Problems
Measures and Resource Bounds
In order to answer questions concerning the complexity or difficulty of
computation, we must first arrive at some agreement on what exactly is meant
by the term complexity. One is tempted to confuse this with the complexity of
understanding just how some task is computed. That is, if it involves an
intricate or tedious computing procedure, then it is complex. But that is a trap
we shall leave for the more mathematical when they claim that a proof is
difficult. We shall base our notion of difficulty upon the very practical notion of
computational cost and in turn define this to be related to the amount of
resources used during computation. (Simply put: if something takes a long time
then it must be hard to do!)
Let us consider the resources used in computation. Most important are
those which seem to limit computation. In particular, we will examine time and
space constraints during computation. This is very much in line with computer
science practice since many problems are costly to us or placed beyond our
reach due to lack of time or space - even on modern computing equipment.
We shall return to Turing machines in order to examine computational
difficulty. This may seem rather arbitrary and artificial, but this choice is
reasonable since most natural models of computation are not too far apart in
the amounts of time and space used in computation for the same functions.
(For example, consider the space used by Turing machines and programs that
compute the same functions or decide membership in the same sets. They are
very similar indeed!) In addition, the simplicity of the Turing machine model
makes our study far less cumbersome.
All we need do is associate a time cost function and a space cost function with
each machine in order to indicate exactly how much of each resource is used
during computation. We shall begin with time.
Our machine model for time complexity will be the multitape Turing machine.
Part of the reason for this is tradition, and part can be explained by examining
the computations done by one-tape machines. For example, a two tape machine
can decide whether a string is made up of n zeros followed by n ones (in
shorthand we write this as 0
n
1
n
) in exactly 2n steps while a one tape machine
might require about nlogn steps for the same task. Arguments like this make
multitape machines an attractive, efficient model for computation. We shall
assume that we have a standard enumeration of these machines that we shall
denote:
M_1, M_2, ...
and define a time cost function for each machine.
Definition. The time function T_i(n) is the maximum number of steps taken
by multitape Turing machine M_i on any input of length n.
These multitape machines are able to solve certain computational problems
faster than their one-tape cousins. But intuitively, one might maintain that
using these machines is very much in line with computation via programs since
tapes are nearly the same as arrays and programs may have lots of arrays. Thus
we shall claim that this is a sensible model of computation to use for our
examination of complexity.
The tradeoffs gained from using many tapes instead of one or two are
presented without proof in the next two theorems. First though, we need some
additional mathematical notation.
Definition. Given two recursive functions f and g, f = O(g) (pronounced: f
is the order of g or f is big OH of g) if and only if there is a constant k such
that f(n) ≤ kg(n) for all but a finite number of n.
This means that smaller functions are the order of larger ones. For example, x^2
is the order of 2^n, or O(2^n). And, if two functions are the same up to a constant,
then they are of the same order. Intuitively this means that their graphs have
roughly the same shape when plotted. Examine the three functions in figure 1.
[Figure 1 - Three functions: s(x), f(x), and g(x), crossing at points x_1, x_2, and x_3]
After point x_1, function s(x) is always greater than f(x) and for values of x larger
than x_2, it exceeds g(x). Since s(x) grows much faster than f(x) and g(x) we know
that for any constant k, after some point s(x) > kf(x) and s(x) > kg(x). Due to
this, we say that f(x) = O(s(x)) and g(x) = O(s(x)). Similarly, at x_3, f(x) becomes
larger than g(x) and since it remains so, we note that g(x) = O(f(x)). (The folk
rule to remember here is that small, or slowly growing functions are the order
of larger and faster ones.)
Let us think about this concept. Consider a linear function (such as 6x) and a
quadratic (like x^2). They do not look the same (one is a line and the other is a
curve) so they are not of the same order. But 5x^3 - 2x^2 - 15 is O(x^3) because it is
obviously less than 6x^3. And log_2(n^2) is O(log_2 n). While we're on the subject of
logarithms, note that logs to different bases are of the same order. For
example:

log_e x = (log_e 2) log_2 x.

The constant here is log_e 2. This will prove useful soon.
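The identity is easy to check numerically; a short Python sketch:

    import math

    x = 1000.0
    print(math.log(x))                   # log_e x
    print(math.log(2) * math.log2(x))   # (log_e 2) * log_2 x, the same value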
Here are the theorems that were promised, indicating the relationship between
one-tape Turing machines and multitape machines.
Theorem 1. Any computation which can be accomplished in t(n) time on a
multitape Turing machine can be done in O(t(n)^2) time using a one-tape
Turing machine.
Theorem 2. Any computation that can be accomplished in t(n) time on a
multitape Turing machine can be done in O(t(n)·log_2(t(n))) time using a two-tape
Turing machine.
Now we need to turn our attention to the other resource that often concerns us,
namely space. Our model will be a little different.
Figure 2 - Space Complexity Model: machine M_i reads a read-only input tape of n squares (with # endmarkers) and writes on a work space of L_i(n) squares
We shall not allow the Turing machine to use its input tape for computation,
which means using a read-only input tape. For computation we shall give the
machine one (possibly multi-track) work tape. That looks like the picture in
figure 2.
The formal definition of space complexity now follows.
Definition. The space function L_i(n) is the maximum number of work tape
squares written upon by Turing machine M_i for any input of length n.
Several additional comments about our choice of machine model are in order
here. Allowing no work to be done on the input tape means that we may have
space complexities that are less than the input length. This is essential since
sets such as strings of the form 0^n1^n can be recognized in log_2 n space by
counting the 0's and 1's in binary and then comparing the totals. Note also that
since we are not concerned with the speed of computation, having several work
tapes is not essential.
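To make the counting idea concrete, here is a small Python sketch of ours (not the text's machine) that recognizes 0^n1^n with two counters; since a counter for an input of length n fits in about log_2 n bits, this mirrors a log_2 n space computation (we assume n is at least 1).

def zeros_then_ones(s: str) -> bool:
    """Recognize 0^n 1^n (n >= 1, our assumption) using two binary counters."""
    zeros = ones = 0                    # each fits in about log2(len(s)) bits
    i = 0
    while i < len(s) and s[i] == '0':   # count the leading zeros
        zeros += 1
        i += 1
    while i < len(s) and s[i] == '1':   # count the trailing ones
        ones += 1
        i += 1
    # accept iff the whole input was consumed and the totals match
    return i == len(s) and zeros > 0 and zeros == ones

assert zeros_then_ones("000111") and not zeros_then_ones("0011100")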
The machine models used in our examination of complexity have no
restrictions as to number of symbols or number of tracks used per tape. This is
intentional. Recalling the discussion of tracks and symbols when we studied
computability we note that they are interchangeable in that a k symbol
horizontal block of a tape can be written as a vertical block on k tracks. And k
tracks can be represented on one track by expanding the machine's alphabet.
Thus we may read lots of symbols at once by placing them vertically in tracks
instead of horizontally on a tape. We may also do lots of writes and moves on
these columns of symbols. Since a similar idea is used in the proof of the next
theorem, we shall present it with just the intuition used in a formal proof.
Theorem 3 (Linear Space Compression). Anything that can be computed
in s(n) space can also be computed in s(n)/k space for any constant k.
Proof sketch. Suppose that we had a tape with a workspace that was
twelve squares long (we are not counting the endmarker) like that
provided below.
# a b c d e f g h i j k l
Suppose further that we wished to perform the same computation that
we are about to do on that tape, but use a tape with a smaller workspace,
for example, one that was four squares long or a third of the size of the
original tape. Consider the tape shown below.
# a b c d
e f g h
i j k l
On this one, when we want to go from the square containing the 'd' to
that containing the 'e' we see the right marker and then return to the left
marker and switch to the middle track.
In this manner computations can always be performed upon tapes that
are a fraction of the size of the original. The general algorithm for doing
this is as follows for a machine M(x) that uses s(n) space for its
computation.
n = the length of the input x
Lay off exactly s(n)/k squares on the work tape
Set up k rows on the work tape
Perform the computation M(x) in this space
There are two constraints that were omitted from the theorem so that it
would be more readable. One is that s(n) must be at least log_2 n since we
are to count up to n on the work tape before computing s(n). The other is
that we must be able to lay out s(n)/k squares within that amount of tape.
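The packing itself is easy to picture in code. The sketch below (ours; the class name is illustrative) folds a tape of size squares into ceil(size/k) columns of k tracks, exactly as in the twelve-square example above: logical square i lands on track i div m, column i mod m, where m = ceil(size/k).

import math

class CompressedTape:
    """Fold a tape of `size` squares into ceil(size/k) columns of k tracks."""
    def __init__(self, size: int, k: int, blank: str = 'b'):
        self.m = math.ceil(size / k)                 # columns per track
        self.tracks = [[blank] * self.m for _ in range(k)]

    def read(self, i: int) -> str:
        return self.tracks[i // self.m][i % self.m]  # track, then column

    def write(self, i: int, symbol: str) -> None:
        self.tracks[i // self.m][i % self.m] = symbol

tape = CompressedTape(size=12, k=3)
for i, c in enumerate("abcdefghijkl"):
    tape.write(i, c)
assert tape.read(4) == 'e'   # 'e' sits on the middle track, first column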
We must mention another slightly picky point. In the last theorem there had to
have been some sort of lower bound on space. Our machines do need at least
one tape square for writing answers! So, when we talk of using s(n) space we
really mean max(1, s(n)) space.
In the same vein, when we speak of t(n) time, we mean max(n+1, t(n)) time since
in any nontrivial computation the machine must read its input and verify that it
has indeed been read.
This brings up a topic that we shall mention and then leave to the interested
reader. It is real time computation. By this, we mean that an answer must be
presented immediately upon finishing the input stream. (For example, strings
of the form 0^n1^n are real time recognizable on a 2-tape Turing machine.) In the
sequel whenever we speak of O(n) time we mean O(kn) time or linear time, not
real time. This is absolutely essential in our next theorem on linear time
speedup. But first, some more notation.
Definition. The expression inf_n f(n) denotes the limit of the greatest lower
bound of f(n), f(n+1), f(n+2), ... as n goes to infinity.
The primary use of this limit notation will be to compare time or space
functions. Whenever we can say that
inf_n f(n)/g(n) = ∞
we know that the function f grows faster than the function g by more than a
constant. Another way to say this is that for every constant k,
f(n) > kg(n)
for all but a finite number of n. In other words, the limit (as n goes to infinity)
of f(n) divided by g(n) cannot be bounded by a constant. Thus f is not O(g), but
larger. For example, f(x) and g(x) could be x^3 and x^2, or even n and log_2 n.
With that out of the way, we may present another version of the last theorem,
this time with regard to time. Note that we must use our new inf notation to
limit ourselves to machines that read their own input. This means that if a
machine does not run for at least on the order of n steps, then it is not reading
its input and thus not computing anything of interest to us.
Theorem 4 (Linear Time Speedup). Anything which can be computed in
t(n) time can also be computed in t(n)/k time for any constant k if

inf_n t(n)/n = ∞.
The proof of this theorem is left as an exercise since it is merely a more careful
version of the linear space compression theorem. All we do is read several
squares at once and then do several steps as one step. We note in passing
though that if proven carefully the theorem holds for O(n) time also. (Recall
that we mean linear time, not real time.)
So far, so good. A naive peek at the last two results might lead us to believe
that we can compute things faster and faster if we use little tricks like doing
several things at once! Of course we know that this is too good to be true. In
fact, practice bears this out to some extent. We can often speed up our
algorithms by a constant if we are clever. (And we did not need to do much
mathematics to learn that!) Also, we know that there are best algorithms for
much of the stuff we compute.
This brings up an interesting question. Does everything have a best or most
efficient algorithm? Or at least a best algorithm up to a constant? A rather
surprising result from abstract complexity theory tells us that there are some
problems that have none. And, in fact, there are problems in which
computation time can be cut by any amount one might wish.
Theorem 5 (Speedup). For any recursive function g(n), there is a set A
such that if the Turing machine M_i decides its membership then there is an
equivalent machine M_k such that for all but a finite number of n:
T_i(n) ≥ g(T_k(n)).
In other words, machine M_k runs as much faster than machine M_i as anyone
might wish. Yes, we said that correctly! We can have M_k and M_i computing the
same functions with T_i(n) more than [T_k(n)]^2. Or even

T_i(n) ≥ 2^(T_k(n)).
This is quite something! It is even rather strange if you think about it.
It means also that there are some problems that can never be computed in a
most efficient manner. A strict interpretation of this result leads to an
interesting yet rather bizarre corollary.
Cor ollar y . There are computational problems such that given any
program solving them on the world's fastest supercomputer, there is an
equivalent program for a cheap programmable pocket calculator that runs
faster!
A close look at the speedup theorem tells us that the corollary is indeed true
but must be read very carefully. Thoughts like this also lead to questions about
these problems that have no best solution. We shall leave this topic with the
reassuring comment that even though there are problems like that, they are so
strange that most likely none will ever arise in a practical application.
Complexity Classes
All of our computing devices now possess two additional attributes: time and
space bounds. We shall take advantage of this and classify all of the recursive
sets based upon their computational complexity. This allows us to examine
these collections with respect to resource limitations. This, in turn, might lead
to the discovery of special properties common to some groups of problems that
influence their complexity. In this manner we may learn more about the
intrinsic nature of computation. We begin, as usual, with the formal definitions.
Definition. The class of all sets computable in time t(n) for some recursive
function t(n), DTIME(t(n)) contains every set whose membership problem
can be decided by a Turing machine which halts within O(t(n)) steps on
any input of length n.
Definition. The class of all sets computable in space s(n) for some
recursive function s(n), DSPACE(s(n)) contains every set whose
membership problem can be decided by a Turing machine which uses at
most O(s(n)) tape squares on any input of length n.
We must note that we defined the classes with bounds of order t(n) and order
s(n) for a reason. This is because of the linear space compression and time
speedup theorems presented in the section on measures. Being able to use
order notation brings benefits along with it. We no longer have to mention
constants. We may just say n^2 time, rather than 3n^2 + 2n - 17 time. And the
bases of our logarithms need appear no longer. We may now speak of n log n
time or log n space. This is quite convenient!
Some of the automata theoretic classes examined in the study of automata fit
nicely into this scheme of complexity classes. The smallest space class,
DSPACE(1), is the class of regular sets and DSPACE(n) is the class of sets
accepted by deterministic linear bounded automata. And, remember that the
smallest time class, DTIME(n), is the class of sets decidable in linear time, not
real time.
Our first theorem, which follows almost immediately from the definitions of
complexity classes, assures us that we shall be able to find all of the recursive
sets within our new framework. It also provides the first characterization of the
class of recursive sets we have seen.
Theorem 1. The union of all the DTIME(t(n)) classes or all of the
DSPACE(s(n)) classes is exactly the class of recursive sets.
Proof. This is quite straightforward. From the original definitions, we
know that membership in any recursive set can be decided by some
Turing machine that halts for every input. The time and space functions
for these machines name the complexity classes that contain these sets.
In other words, if M_a decides membership in the recursive set A, then A is
obviously a member of DTIME(T_a(n)) and DSPACE(L_a(n)).
On the other hand, if a set is in some complexity class then there must be
some Turing machine that decides its membership within some recursive
time or space bound. Thus a machine which always halts decides
membership in the set. This makes all of the sets within a complexity
class recursive.
We now introduce another concept in computation: nondeterminism. It might
seem a bit strange at first, but examining it in the context of complexity is going
to provide us with some very important intuition concerning the complexity of
computation. We shall provide two definitions of this phenomenon.
The first is the historical definition. Early in the study of theoretical computing
machines, the following question was posed.
Suppose a Turing machine is allowed several choices of action for an
input-symbol pair. Does this increase its computational power?
Here is a simple example. Consider the following Turing machine instruction.
read  write  move   goto
 0      1    right  next
 1      1    left   same
 1      0    halt
 b      1    left   I75
When the machine reads a one, it may either print a one and move left or print a
zero and halt. This is a choice. And the machine may choose either of the
actions in the instruction. If it is possible to reach a halting configuration, then
the machine accepts.
We need to examine nondeterministic computation in more detail. Suppose that
the above instruction is I35 and the machine containing it is in configuration:
#0110(I35)110 reading a one. At this point the instruction allows it to make
either choice and thus enter one of two different configurations. This is
pictured below as figure 1.
               #0110(I35)110
              /             \
    #011(I35)0110         #0110010

Figure 1 - A Computational Choice
We could now think of a computation for one of these new machines, not as a
mystical, magic sequence of configurations, but as a tree of configurations that
the machine could pass through during its computation. Then we consider
paths through the computation tree as possible computations for the machine.
Then if there is a path from the initial configuration to a halting configuration,
we say that the machine halts. A more intuitive view of set acceptance may be
defined as follows.
Definition. A nondeterministic Turing machine accepts the input x if and
only if there is a path in its computation tree that leads from the initial
configuration to a halting configuration.
Here are the definitions of nondeterministic classes.
Definition. For a recursive function t(n), NTIME(t(n)) is the class of sets
whose members can be accepted by nondeterministic Turing machines
that halt within O(t(n)) steps for every input of length n.
Definition. For a recursive function s(n), NSPACE(s(n)) is the class of sets
whose members can be accepted by nondeterministic Turing machines
that use at most O(s(n)) tape squares for any input of length n.
(NB. We shall see NSPACE(n) in the context of formal languages. It is the family
of context sensitive languages or sets accepted by nondeterministic linear
bounded automata.)
Now that we have a new group of computational devices, the first question to
ask is whether or not they allow us to compute anything new. Our next
theorem assures us that we still have the recursive sets. It is given with a brief
proof sketch since the details will be covered in other results below.
Theorem 2. The union of all the NTIME(t(n)) classes or all of the
NSPACE(s(n)) classes is exactly the class of recursive sets.
Proof Sketch. There are two parts to this. First, since every deterministic
machine is trivially a nondeterministic machine, the recursive sets (the
union of the DTIME or DSPACE classes) are all contained in the union of
the NTIME or NSPACE classes.
Next we need to show that any set accepted by a nondeterministic Turing
machine has a decidable membership problem. Suppose that a set is
accepted by a t(n)-time nondeterministic machine. Now recall that the
machine accepts if and only if there is a path in its computation tree that
leads to a halting configuration. Thus all one needs to do is to generate
the computation tree to a depth of t(n) and check for halting
configurations.
Now let us examine nondeterministic acceptance from another viewpoint. A
path through the computation tree could be represented by a sequence of rows
in the instructions that the machine executes. Now consider the following
algorithm that receives an input and a path through the computation tree of
some nondeterministic Turing machine M_m.
Verify(m, x, p)
Pre: p[] = sequence of k rows in instructions to be
     executed by M_m as it processes input x
Post: halts if p is a path through the computation tree

i = 0;
config = (I1)#x;
while config is not a halting configuration and i < k do
    i = i + 1;
    if row p(i) in config's instruction can be executed by M_m
        then set config to the new configuration
        else loop forever
if config is a halting configuration then halt(true)
This algorithm verifies that p is indeed a path through the computation tree of
M_m and, if it leads to a halting configuration, the algorithm halts and accepts.
Otherwise it either loops forever or terminates without halting. In addition, the
algorithm is deterministic. There are no choices to follow during execution.
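For the curious, here is how Verify might look in Python (a sketch of ours, with the machine given as an explicit transition table; our encoding, not the text's notation, and returning False stands in for "loop forever").

def verify(delta, x, p):
    """Deterministically follow the choice sequence p on input x.
    delta maps (state, symbol) to a list of (write, move, next_state) rows;
    next_state 'halt' marks a halting configuration."""
    tape = dict(enumerate('#' + x))          # endmarker, then the input
    pos, state = 1, 'I1'                     # head on the first input symbol
    for row in p:                            # one chosen row per step
        choices = delta.get((state, tape.get(pos, 'b')), [])
        if row >= len(choices):              # p is not a legal path
            return False                     # stand-in for looping forever
        write, move, state = choices[row]
        tape[pos] = write
        pos += {'left': -1, 'right': +1}.get(move, 0)
        if state == 'halt':                  # reached a halting configuration
            return True
    return False

# A two-choice machine: on a 0, either move right or choose to halt.
delta = {('I1', '0'): [('0', 'right', 'I1'), ('0', None, 'halt')]}
assert verify(delta, '00', [0, 1])           # certificate: right once, then halt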
Now let us examine paths through the computation tree. Those that lead to
halting configurations show us that the input is a member of the set accepted
by the machine. We shall say that these paths certify that the input is a member
of the set and call the path a certificate of authenticity for the input. This
provides a clue to what nondeterministic operation is really about.
Certificates do not always have to be paths through computation trees.
Examine the following algorithm for accepting nonprimes (composite numbers)
in a nondeterministic manner.
NonPrime(x)
nondeterministically determine integers y and z;
if y*z = x then halt(true)
Here the certificate of authenticity is the pair <y, z> since it demonstrates that x
is not a prime number. We could write a completely deterministic algorithm
which when given the triple <x, y, z> as input, compares y*z to x and certifies
that x is not prime if x = yz.
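In Python the deterministic checker is a single comparison (our sketch; we also insist that the factors be nontrivial, since y = 1 and z = x would otherwise "certify" any x).

def verify_nonprime(x: int, y: int, z: int) -> bool:
    """The pair <y, z> certifies that x is composite when y*z = x,
    with 1 < y, z < x to rule out the trivial factorization."""
    return 1 < y < x and 1 < z < x and y * z == x

assert verify_nonprime(15, 3, 5)        # <3, 5> certifies 15 is not prime
assert not verify_nonprime(13, 2, 7)    # 2 * 7 != 13, so no certificate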
This leads to our second definition of nondeterministic operation. We say that
the following deterministic Turing machine M uses certificates to verify
membership in the set A.
M(x, c) halts if and only if c provides a proof of x ∈ A
The nondeterministic portion of the computation is finding the certificate and
we need not worry about that. Here are our definitions in terms of verification.
Definition. The class of all sets nondeterministically acceptable in time t(n)
for a recursive function t(n), NTIME(t(n)) contains all of the sets whose
members can be verified by a Turing machine in at most O(t(n)) steps for
any input of length n and certificate of length t(n).
Note that certificates must be shorter in length than t(n) for the machine to be
able to read them and use them to verify that the input is in the set.
We should also recall that nondeterministic Turing machines and machines
which verify from certificates do not decide membership in sets, but accept
them. This is an important point and we shall come back to it again.
At this point we sadly note that the above wonderfully intuitive definition of
nondeterministic acceptance by time-bounded machines does not extend as
easily to space since there seems to be no way to generate certificates in the
worktape space provided.
We mentioned earlier that there is an important distinction between the two
kinds of classes. In fact, important enough to repeat. Nondeterministic
machines accept sets, while deterministic machines decide membership in sets.
This is somewhat reminiscent of the difference between recursive and
recursively enumerable sets and there are some parallels. At present the
differences between the two kinds of classes are not well understood. In fact, it
is not known whether these methods of computation are equivalent. We do
know that
DSPACE(1) = NSPACE(1)
DSPACE(s(n)) ⊆ NSPACE(s(n))
DTIME(t(n)) ⊆ NTIME(t(n))
for every recursive s(n) and t(n). Whether DSPACE(s(n)) = NSPACE(s(n)) or
whether DTIME(t(n)) = NTIME(t(n)) remain famous open problems. The best that
anyone has achieved so far is the following result that is presented here without
proof.
Theorem 3. If s(n) ≥ log_2 n is a space function, then

NSPACE(s(n)) ⊆ DSPACE(s(n)^2).
Our next observation about complexity classes follows easily from the linear
space compression and speedup theorems. Since time and space use can be
made more efficient by a constant factor, we may state that:
DTIME(t(n)) = DTIME(kt(n))
NTIME(t(n)) = NTIME(kt(n))
DSPACE(s(n)) = DSPACE(ks(n))
NSPACE(s(n)) = NSPACE(ks(n))
for every recursive s(n) and t(n), and constant k. (Remember that t(n) means
max(n+1, t(n)) and that s(n) means max(1, s(n)) in each case.)
While we are comparing complexity classes it would be nice to talk about the
relationship between space and time. Unfortunately not much is known here
either. About all we can say is rather obvious. Since it takes one unit of time to
write upon one tape square we know that:
TIME(t(n)) ⊆ SPACE(t(n))
because a machine cannot use more than t(n) tape squares if it runs for t(n)
steps. Going the other way is not so tidy. We can count the maximum number
of steps a machine may go through before falling into an infinite loop on s(n)
tape and decide that for some constant c:
SPACE(s(n)) ⊆ TIME(2^(cs(n)))
for both deterministic and nondeterministic complexity classes. And, in fact,
this counting of steps is the subject of our very next theorem.
Theorem 4. If an s(n) tape bounded Turing machine halts on an input of
length n then it will halt within O(2^(cs(n))) steps for some constant c.
Proof Sketch. Consider a Turing machine that uses O(s(n)) tape. There is
an equivalent machine M_i that uses two worktape symbols and also needs
no more than O(s(n)) tape. This means that there is a constant k such that
M_i never uses more than ks(n) tape squares on inputs of length n.
We now recall that a machine configuration consists of:
a) the instruction being executed,
b) the position of the head on the input tape,
c) the position of the head on the work tape, and
d) a work tape configuration.
We also know that if a machine repeats a configuration then it will run
forever. So, we almost have our proof.
All we need do is count machine configurations. There are |M_i|
instructions, n+2 input tape squares, ks(n) work tape squares, and 2^(ks(n))
work tape configurations. Multiplying these together provides the
theorem's bound.
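The bound is just the product of those four counts. A tiny helper (ours, for illustration) makes the arithmetic concrete:

def config_bound(instructions: int, n: int, k: int, s_of_n: int) -> int:
    """Number of distinct configurations: |M_i| * (n+2) * ks(n) * 2^(ks(n)).
    A computation running longer than this must repeat a configuration."""
    work = k * s_of_n
    return instructions * (n + 2) * work * 2 ** work

# e.g. 10 instructions, input length 8, k = 1, s(n) = 3 (about log2 n):
print(config_bound(10, 8, 1, 3))   # 10 * 10 * 3 * 8 = 2400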
One result of this step counting is a result relating nondeterministic and
deterministic time. Unfortunately it is nowhere near as sharp as theorem 3, the
best relationship between deterministic and nondeterministic space. Part of the
reason is that our simulation techniques for time are not as good as those for
space.
Corollary. NTIME(t(n)) ⊆ DTIME(2^(ct(n)))

Proof. NTIME(t(n)) ⊆ NSPACE(t(n)) ⊆ DSPACE(t(n)^2) ⊆ DTIME(2^(ct(n)))
because of theorems 3 and 4. (We could have proven this from scratch by
simulating a nondeterministic machine in a deterministic manner, but the
temptation to use our last two results was just too great!)
Our first theorem in this section stated that the union of all the complexity
classes results in the collection of all of the recursive sets. An obvious question
is whether one class can provide the entire family of recursive sets. The next
result denies this.
Theorem 5. For any recursive function s(n), there is a recursive set that is
not a member of DSPACE(s(n)).
Proof. The technique we shall use is diagonalization over DSPACE(s(n)).
We shall examine every Turing machine that operates in s(n) space and
define a set that cannot be decided by any of them.
First, we must talk about the machines that operate in O(s(n)) space. For
each there is an equivalent machine which has one track and uses the
alphabet {0,1,b}. This binary alphabet, one track Turing machine also
operates in O(s(n)) space. (Recall the result on using a binary alphabet to
simulate machines with large alphabets that used blocks of standard size
to represent symbols.) Let's now take an enumeration M_1, M_2, ... of these
one track, binary machines and consider the following algorithm.
Examine(i, k, x)
Pre: n = length of x

lay out k*s(n) tape squares on the work tape;
run M_i(x) within the laid off tape area;
if M_i(x) rejects then accept else reject
This is merely a simulation of the binary Turing machine M_i on input x
using ks(n) tape. And, the simulation lasts until we know whether or not
the machine will halt. Theorem 4 tells us that we only need wait some
constant times 2^(cs(n)) steps. This is easy to count to on a track of a tape of
length ks(n). Thus the procedure above is recursive and acts differently
than M_i on input x if L_i(n) ≤ ks(n).
Our strategy is going to be to feed the Examine routine all combinations
of k and i in hopes that we shall eventually knock out all s(n) tape
bounded Turing machines.
Thus we need a sequence of pairs <i, k> such that each pair occurs in our
sequence infinitely often. Such sequences abound. One standard construction
replays longer and longer prefixes of the diagonal enumeration
<1,1>, <1,2>, <2,1>, <1,3>, <2,2>, <3,1>, ...
so that every pair reappears in each sufficiently long replay.
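A generator for such a sequence is short (a sketch of ours): replaying ever-longer prefixes of the diagonal enumeration guarantees that every pair <i, k> shows up in all sufficiently long replays, hence infinitely often.

from itertools import count

def diagonal():
    """The diagonal enumeration <1,1>, <1,2>, <2,1>, <1,3>, <2,2>, <3,1>, ..."""
    for total in count(2):               # i + k is constant along a diagonal
        for i in range(1, total):
            yield (i, total - i)

def pairs_infinitely_often():
    """Replay ever-longer prefixes, so each pair appears infinitely often."""
    for prefix_len in count(1):
        gen = diagonal()
        for _ in range(prefix_len):
            yield next(gen)

g = pairs_infinitely_often()
print([next(g) for _ in range(6)])   # [(1,1), (1,1), (1,2), (1,1), (1,2), (2,1)]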
For each input x we take the x-th pair in the sequence. The decision
procedure for the set we claim is not s(n) space computable is now:
Diagonal(x)
select the x-th pair <i, k> from the sequence;
Examine(i, k, x)
Two things need to be verified. First, we need to show that the above
decision procedure can be carried out by some Turing machine. We note
that M_i comes from an enumeration of two work tape symbol machines
and then appeal to Church's thesis for the actual machine construction
for the decision procedure. Next we need to prove that this procedure
cannot be carried out by an s(n) space bounded Turing machine.
Suppose that the Diagonal procedure is indeed s(n) space computable.
Then there is some two worktape symbol, s(n) space bounded Turing
machine M_j which computes the above Diagonal procedure. And there is a
constant k such that for all but a finite number of inputs, M_j uses no more
than ks(n) tape squares on inputs of length n. In particular, there is an x
such that <j, k> is the x-th pair in our sequence of pairs and the
computation of M_j(x) requires no more than ks(n) tape. (In fact there are
infinitely many of these x since the pair <j, k> appears infinitely often in
the sequence.) In this case

M_j(x) ≠ Examine(j, k, x) = Diagonal(x)

which is a contradiction. Thus M_j cannot be an s(n) bounded machine
and the set defined by our Diagonal procedure cannot be
a member of DSPACE(s(n)).
It should come as no surprise that the same result holds for nondeterministic
space as well as time classes. Thus we do have a hierarchy of classes since none
of them can hold all of the recursive sets. This seems in line with intuition
since we believe that we can compute bigger and better things with more
resources at our disposal.
Our next results explore the amount of space or time needed to compute new
things for classes with resource bounds that are tape or time functions. (Recall
that tape or time functions are bounds for actual Turing machines.) We
consider these resource bounds to be well behaved and note in passing that
there are strange functions about which are not tape or time functions.
Theorem 6 (Space Hierarchy). If r(n) and s(n) are both at least O(n), s(n)
is a space function, and inf_n r(n)/s(n) = 0, then

DSPACE(s(n)) - DSPACE(r(n)) ≠ ∅.
Proof Sketch. The proof is very similar to that of the last theorem. All
that is needed is to change the space laid off in the Examine routine to
s(n). Since s(n) grows faster than any constant times r(n), the
diagonalization proceeds as scheduled. One note. What makes this
simulation and diagonalization possible is that s(n) is a space function.
This allows us to lay off s(n) tape squares in s(n) space. Thus the
diagonalization does produce a decision procedure for a set which is s(n)
space decidable but not r(n) space decidable.
The major reason the simulation worked was that we were able to lay out s(n)
tape squares. This is because we could compute s(n) by taking the machine it
was a tape function for and run it on all inputs of length n to find the longest
one. This requires O(n) space. If s(n) is even more well behaved we can do
better.
Definition. A recursive function s(n) is efficiently space computable if
and only if it can be computed within s(n) space.
If s(n) is efficiently space computable, then the space hierarchy theorem is true
for s(n) down to O(log_2 n) because we can lay out the required space for the
simulation and keep track of which input symbol is being read.
Many functions are efficiently space computable, including such all-time
favorites as log_2 n and (log_2 n)^k. An exercise dealing with efficient space
computability will be to prove that all space functions that are at least O(n) are
efficiently space computable.
Combining the space hierarchy theorem with the linear space compression
theorem provides some good news at last. If two functions differ only by a
constant, then they bound the same class. But if one is larger than the other by
more than a constant then one class is properly contained in the other.
Sadly the result for time is not as sharp. We shall need one of our functions to
always be efficiently time computable and do our simulation with two tapes.
Here is the theorem.
Theorem 7 (Time Hierarchy). If r(n) and t(n) are both at least O(n), t(n) is
efficiently time computable, and

inf_n r(n)·log_2(r(n)) / t(n) = 0

then DTIME(t(n)) - DTIME(r(n)) ≠ ∅.
Reducibilities and Completeness
We now have a framework in which to study the recursive functions in relation
to one of the major considerations of computer science: computational
complexity. We know that there is a hierarchy of classes but we do not know
very much about the classes other than this. In this section we shall present a
tool for refining our rough framework of complexity classes.
Let us return to an old mathematical technique we saw in chapter two. There
we mapped problems into each other and used this to make statements
about unsolvability. We shall do the same at the subrecursive level except that
we shall use mappings to help us determine the complexity of problems. And
even use the same kind of mappings that were used before. Here is the
definition of reducibility one more time.
Definition. The set A is many-one reducible to the set B (written A ≤_m B)
if and only if there is a recursive function g(x) such that for all x: x ∈ A if
and only if g(x) ∈ B.
With the function g(x) we have mapped all of the members of A into the set B.
Integers not in A get mapped into B's complement. Symbolically:

g(A) ⊆ B and g(Ā) ⊆ B̄
Note that the mapping is into and that several members of A may be mapped
into the same element of B by g(x).
An important observation is that we have restated the membership problem for
A in terms of membership in B. This provides a new way to decide membership
in A. Let's have a look. If the Turing machine M_b decides membership in B and
M_g computes g(x) then membership in A can be decided by:

M_a(x) = M_b(M_g(x)).
(In other words, compute g(x) and then check to see if it is in B.)
Now for the complexity of all this. It is rather straightforward. Just the
combined complexity of computing g(x) and then membership in B. In fact, the
time and space complexities for deciding membership in A are:

L_a(n) = maximum[L_g(n), L_b(|g(x)|)]
T_a(n) = T_g(n) + T_b(|g(x)|)
Thinking a little about the length of g(x) and its computation
we easily figure out that the number of digits in g(x) is bounded by the
computation space and time used and thus:
|g(x)| ≤ L_g(n) ≤ T_g(n)
for x of length n. This makes the complexity of the decision problem for A:
L_a(n) ≤ maximum[L_g(n), L_b(L_g(n))]
T_a(n) ≤ T_g(n) + T_b(L_g(n))
An essential aside on space below O(n) is needed here. If L_b(n) is less than O(n)
and |g(x)| is O(n) or greater we can often compute symbols of g(x) one at a time
as needed and then feed them to M_b for its computation. Thus we need not
write down all of g(x), just the portion required at the moment by M_b. We do
however need to keep track of where the read head of M_b is on g(x). This will
use log_2|g(x)| space. If this is the case, then

L_a(n) ≤ maximum[L_g(n), L_b(|g(x)|), log_2|g(x)|]
Mapping functions which are almost straight character by character translations
are exactly what is needed, as are many log_2 n space translations.
What we would like to do is to be able to say something about the complexity of
deciding membership in A in terms of B's complexity and not have to worry
about the complexity of computing g(x). It would be perfect if A ≤_m B meant that
A is no more complex than B. This means that computing g(x) must be less
complex than the decision problem for B. In other words, the mapping g(x)
must preserve (not influence) complexity. In formal terms:
Definition. Let A ≤_m B via g(x). The recursive function g(x) is complexity
preserving with respect to space if and only if there is a Turing machine
M_b which decides membership in B and a Turing machine M_g which
computes g(x) such that:

maximum[L_g(n), L_b(L_g(n))] = O(L_b(n)).
Definition. Let A ≤_m B via g(x). The recursive function g(x) is complexity
preserving with respect to time if and only if there is a Turing machine M_b
which decides membership in B and a Turing machine M_g which computes
g(x) such that:

T_g(n) + T_b(L_g(n)) = O(T_b(n)).
These complexity preserving mappings now can be used to demonstrate the
relative complexities of decision problems for sets. In fact, we can often
pinpoint a set's complexity by the use of complexity preserving reducibilities.
The next two theorems explain this.
Theorem 1. If A ≤_m B via a complexity preserving mapping and B is in
DSPACE(s(n)) then A is in DSPACE(s(n)) also.
Proof. Follows immediately from the above discussion. That is, if M_a(x) =
M_b(g(x)) where g(x) is a complexity preserving mapping from A to B, then
L_a(n) = O(L_b(n)).
Theorem 2. If A ≤_m B via a complexity preserving mapping and the best
algorithm for deciding membership in A requires O(s(n)) space then the
decision problem for B cannot be computed in less than O(s(n)) space.
Proof. Suppose that the membership problem for B can be computed in
less than O(s(n)) space. Then by theorem 1, membership in A can be
computed in less. This is a contradiction.
Now, just what have we accomplished? We have provided a new method for
finding upper and lower bounds for the complexity of decision problems. If
A ≤_m B via a complexity preserving mapping, then the complexity of A's decision
problem is the lower bound for B's and the complexity of B's decision problem
is the upper bound for A's. Neat. And it is true for time too.
An easy example might be welcome. We can map the set of strings of the form
0^n#1^n into the set of strings of the form w#w (where w is a string of 0's and 1's).
The mapping is just a character by character transformation which maps both
0's and 1's into 0's and maps the marker (#) into itself. Thus we have shown
that 0^n#1^n is no more difficult to recognize than w#w.
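Concretely (a sketch of ours; for simplicity we feed it only strings that are already some 0's, then #, then some 1's, which is the shape the character-by-character map presumes), the mapping g and the composed decision procedure M_a(x) = M_b(g(x)) look like this:

def g(x: str) -> str:
    """Character-by-character map: 0 -> 0, 1 -> 0, # -> #."""
    return ''.join('#' if c == '#' else '0' for c in x)

def in_B(x: str) -> bool:
    """Decide B = { w#w : w a string of 0's and 1's }."""
    left, sep, right = x.partition('#')
    return sep == '#' and left == right

def in_A(x: str) -> bool:
    """Decide A = { 0^n # 1^n } by composing: M_a(x) = M_b(g(x))."""
    return in_B(g(x))

assert in_A('000#111') and not in_A('00#111')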
Now that we have provided techniques for establishing upper and lower
complexity bounds for sets in terms of other sets, let us try the same thing with
classes. In other words, why not bound the complexity for an entire class of
sets all at once?
Definition. The set A is hard for a class of sets if and only if every set in
the class is many-one reducible to A.
What we have is a way to put an upper bound on the complexity for an entire
class. For example if we could show that deciding w#w is hard for the context
free languages then we would know that they are all in DSPACE(log_2 n) or
DTIME(n). Too bad that we cannot! When we combine hardness with complexity
preserving reducibilities and theorem 1, out comes an easy corollary.
Corollary. If A ∈ DSPACE(s(n)) is hard for DSPACE(r(n)) via complexity
preserving mappings then DSPACE(r(n)) ⊆ DSPACE(s(n)).
And, to continue, what if a set is hard for the class which contains it? This
means that the hard set is indeed the most difficult set to recognize in the
class. We have a special name for that.
Definition. A set is complete for a class if and only if it is a member of the
class and hard for the class.
Think about this concept for a bit. Under complexity preserving mappings
complete sets are the most complex sets in the classes for which they are
complete. In a way they represent the class. If there are two classes with
complete sets, then comparing the complete sets tells us a lot about the classes.
So, why don't we name the classes according to their most complex sets?
Definition. The class of sets no more complex (with respect to space) than
A, DSPACE(A), contains all sets B such that for each Turing machine which
decides membership in A there is some machine which decides
membership in B in no more space.
Of course we can do the same for time and nondeterministic resources also. We
must note that we defined the complexity relationship between A and B very
carefully. This is because sets with speedup may be used to name or denote
classes as well as those with best algorithms. Thus we needed to state that for
each algorithm for A there is one for B which is no worse, so if the set B has
speedup, then it can still be in DSPACE(A) even if A has speedup too. Now for a
quick look at what happens when we name a class after its complete set.
Theorem 3. If the set A is DSPACE(s(n))-complete via complexity
preserving reducibilities then DSPACE(s(n)) = DSPACE(A).
Proof. Theorem 2 assures us that every set reducible to A is no more
difficult to decide membership in than A. Thus DSPACE(s(n)) ⊆
DSPACE(A). And, since A is a member of DSPACE(s(n)), all members
of DSPACE(A) must be also, since they require no more space than A
to decide membership in.
That was not unexpected. Seems like we set it up that way! We now go a bit
further with our next question. Just what kinds of complexity classes have
complete sets? Do all of them? Let us start by asking about the functions
which are resource bounds for classes with complete sets and go on from there.
Theorem 4. Every space complexity class which has a complete set via
complexity preserving reducibilities is some DSPACE(s(n)) where s(n) is a
space function.
Proof. Let DSPACE(r(n)) have the complete set A. Since A is a member of
DSPACE(r(n)), there must be a Turing machine M_a which decides
membership in A in O(r(n)) space. Thus L_a(n) = O(r(n)), which makes
DSPACE(L_a(n)) ⊆ DSPACE(r(n)). Theorem 1 assures us that DSPACE(r(n)) ⊆
DSPACE(L_a(n)) since all of the members of DSPACE(r(n)) are reducible
to A.
Theorem 5. If there is a best algorithm for deciding membership in the set
A then there is a space function s(n) such that DSPACE(A) = DSPACE(s(n)).
Proof. Almost trivial. The space function s(n) which we need to name the
complexity class is just the space function for the Turing machine which
computes the most efficient algorithm for deciding membership in A.
Now we know that space functions can be used to name some of our favorite
complexity classes: those with complete sets, and those named by sets with
best decision procedures. Let us turn to the classes named by sets which have
no best decision procedure. These, as we recall, were strange. They have
speedup and can be computed by sequences of algorithms which run faster and
faster.
Theorem 6. If the set S has speedup then there is no recursive function
s(n) such that DSPACE(s(n)) = DSPACE(S).
All of the results mentioned in the last series of theorems (3 - 6) are true for
time and both kinds of nondeterministic class. Well, except for theorem 6. For
time the speedup must be at least n·log_2 n. But that is not too bad.
We would like to have a theorem which states that all classes with space or time
functions as their resource bounds have complete sets. This is true for a few of
these classes, namely those with polynomial bounds. For now, let us leave this
section with the reassurance that complete sets can define classes and none of
them have speedup.
Compl exi t y Cl asses
All of our computing devices now possess two additional attributes: time and
space bounds. We shall take advantage of this and classify all of the recursive
sets based upon their computational complexity. This allows us to examine
these collections with respect to resource limitations. This, in turn, might lead
to the discovery of special properties common to some groups of problems that
influence their complexity. In this manner we may learn more about the
intrinsic nature of computation. We begin, as usual, with the formal definitions.
Defin it ion . The class of all sets computable in time t(n) for some recursive
function t(n), DTIME(t(n)) contains every set whose membership problem
can be decided by a Turing machine which halts within O(t(n)) steps on
any input of length n.
Defin it ion . The class of all sets computable in space s(n) for some
recursive function s(n), DSPACE(s(n)) contains every set whose
membership problem can be decided by a Turing machine which uses at
most O(s(n)) tape squares on any input of length n.
We must note that we defined the classes with bounds of order t(n) and order
s(n) for a reason. This is because of the linear space compression and time
speedup theorems presented in the section on measures. Being able to use
order notation brings benefits along with it. We no longer have to mention
constants. We may just say n
2
time, rather than 3n
2
+ 2n -17 time. And the
bases of our logarithms need appear no longer. We may now speak of nlogn
time or logn space. This is quite convenient!
Some of the automata theoretic classes examined in the study of automata fit
nicely into this scheme of complexity classes. The smallest space class,
DSPACE(1), is the class of regular sets and DSPACE(n) is the class of sets
accepted by deterministic linear bounded automata. And, remember that the
smallest time class, DTIME(n), is the class of sets decidable in linear time, not
real time.
Our first theorem, which follows almost immediately from the definitions of
complexity classes, assures us that we shall be able to find all of the recursive
sets within our new framework. It also provides the first characterization of the
class of recursive sets we have seen.
Th eor em 1. The union of all the DTIME(t(n)) classes or all of the
DSPACE(s(n)) classes is exactly the class of recursive sets.
Complexity Classes 2
Pr oof. This is quite stratighforward. From the original definitions, we
know that membership in any recursive set can be decided by some
Turing machine that halts for every input. The time and space functions
for these machines name the complexity classes that contain these sets.
In other words, if M
a
decides membership in the recursive set A, then A is
obviously a member of DTIME(T
a
(n)) and DSPACE(L
a
(n)).
On the other hand, if a set is in some complexity class then there must be
some Turing machine that decides its membership within some recursive
time or space bound. Thus a machine which always halts decides
membership in the set. This makes all of the sets within a complexity
class recursive.
We now introduce another concept in computation: nondeterminism. It might
seem a bit strange at first, but examining it in the context of complexity is going
to provide us with some very important intuition concerning the complexity of
computation. We shall provide two definitions of this phenomenon.
The first is the historical definition. Early in the study of theoretical computing
machines, the following question was posed.
Suppose a Turing machine is allowed several choices of action for an
input-symbol pair. Does this increase its computational power?
Here is a simple example. Consider the following Turing machine instruction.
0
1
1
b
1 r i ght next
1 l ef t same
0 hal t
1 l ef t I75
When the machine reads a one, it may either print a one and move left or print a
zero and halt. This is a choice. And the machine may choose either of the
actions in the instruction. If it is possible to reach a halting configuration, then
the machine accepts.
We need to examine nondeterministic computation in more detail. Suppose that
the above instruction is I35 and the machine containing it is in configuration:
#0110(I35)110 reading a one. At this point the instruction allows it to make
either choice and thus enter one of two different configurations. This is
pictured below as figure 1.
Complexity Classes 3
#0110(I 35)110
#011(I 35)0110
#0110010
Fi gur e 1 - A Comput at i onal Choi ce
We could now think of a computation for one of these new machines, not as a
mystical, magic sequence of configurations, but as a tree of configurations that
the machine could pass through during its computation. Then we consider
paths through the computation tree as possible computations for the machine.
Then if there is a path from the initial configuration to a halting configuration,
we say that the machine halts. A more intuitive view of set acceptance may be
defined as follows.
Defin it ion . A nondeterministic Turing machine accepts the input x if and
only if there is a path in its computation tree that leads from the initial
configuration to a halting configuration.
Here are the definitions of nondeterministic classes.
Defin it ion . For a recursive function t(n) is NTIME(t(n)) is the class of sets
whose members can be accepted by nondeterministic Turing machines
that halt within O(t(n)) steps for every input of length n.
Defin it ion . For a recursive function s(n), NSPACE(s(n)) is the class of sets
whose members can be accepted by nondeterministic Turing machines
that use at most O(s(n)) tape squares for any input of length n.
(NB. We shall see NSPACE(n) in the context of formal languages. It is the family
of context sensitive languages or sets accepted by nondeterministic linear
bounded automata.)
Now that we have a new group of computational devices, the first question to
ask is whether or not they allow us to compute anything new. Our next
theorem assures us that we still have the recursive sets. It is given with a brief
proof sketch since the details will be covered in other results below.
Th eor em 2. The union of all the NTIME(t(n)) classes or all of the
NSPACE(s(n)) classes is exactly the class of recursive sets.
Complexity Classes 4
Proof Sketch. There are two parts to this. First we maintain that since the
recursive sets are the union of the DTIME or DSPACE classes, then they
are all contained in the union of the NTIME or NSPACE classes.
Next we need to show that any set accepted by a nondeterministic Turing
machine has a decidable membership problem. Suppose that a set is
accepted by a t(n)-time nondeterministic machine. Now recall that the
machine accepts if and only if there is a path in its computation tree that
leads to a halting configuration. Thus all one needs to do is to generate
the computation tree to a depth of t(n) and check for halting
configurations.
Now let us examine nondeterministic acceptance from another viewpoint. A
path through the computation tree could be represented by a sequence of rows
in the instructions that the machine executes. Now consider the following
algorithm that receives an input and a path through the computation tree of
some nondeterministic Turing machine M
m
.
Verify(m, x, p)
Pre: p[] = sequence of rows in instructions to be
executed by M
m
as it processes input x
Post: halts if p is a path through the computation tree
i = 1;
config = (I1)#x;
while config is not a halting configuration and i k do
i = i + 1;
if row p(i) in configs instruction can be executed by M
m
then set config to the new configuration
else loop forever
if config is a halting configuration then halt(true)
This algorithm verifies that p is indeed a path through the computation tree of
M
m
and if it leads to a halting configuration, the algorithm halts and accepts.
Otherwise it either loops forever or terminates without halting. In addition, the
algorithm is deterministic. There are no choices to follow during execution.
Now let us examine paths through the computation tree. Those that lead to
halting configurations show us that the input is a member of the set accepted
by the machine. We shall say that these paths certify that the input is a member
of the set and call the path a certificate of authenticity for the input. This
provides a clue to what nondeterministic operation is really about.
Complexity Classes 5
Certificates do not always have to be paths through computation trees.
Examine the following algorithm for accepting nonprimes (composite numbers)
in a nondeterministic manner.
NonPrime(x)
nondeterministically determine integers y and z;
if y*z = x then halt(true)
Here the certificate of authenticity is the pair <y, z> since it demonstrates that x
is not a prime number. We could write a completely deterministic algorithm
which when given the triple <x, y, z> as input, compares y*z to x and certifies
that x is not prime if x = yz.
This leads to our second definition of nondeterministic operation. We say that
the following deterministic Turing machine M uses certificates to verify
membership in the set A.
M(x, c) halts if and only if c provides a proof of xA
The nondeterministic portion of the computation is finding the certificate and
we need not worry about that. Here are our definitions in terms of verification.
Defin it ion . The class of all sets nondeterministicly acceptable in time t(n)
for a recursive function t(n), NTIME(t(n)) contains all of the sets whose
members can be verified by a Turing machine in at most O(t(n)) steps for
any input of length n and certificate of length t(n).
Note that certificates must be shorter in length than t(n) for the machine to be
able to read them and use them to verify that the input is in the set.
We should also recall that nondeterministic Turing machines and machines
which verify from certificates do not decide membership in sets, but accept
them. This is an important point and we shall come back to it again.
At this point we sadly note that the above wonderfully intuitive definition of
nondeterministic acceptance by time-bounded machines does not extend as
easily to space since there seems to be no way to generate certificates in the
worktape space provided.
We mentioned earlier that there is an important distinction between the two
kinds of classes. In fact, important enough to repeat. Nondeterministic
machines accept sets, while deterministic machines decide membership in sets.
This is somewhat reminiscent of the difference between recursive and
recursively enumerable sets and there are some parallels. At present the
Complexity Classes 6
differences between the two kinds of classes is not well understood. In fact, it
is not known whether these methods of computation are equivalent. We do
know that
DSPACE(1) = NSPACE(1)
DSPACE(s(n)) NSPACE(s(n))
DTIME(t(n)) NTIME(t(n))
for every recursive s(n) and t(n). Whether DSPACE(s(n)) = NSPACE(s(n)) or
whether DTIME(t(n)) = NTIME(t(n)) remain famous open problems. The best that
anyone has achieved so far is the following result that is presented here without
proof.
Th eor em 3. If s(n) log
2
n is a space function, then
NSPACE(s(n)) DSPACE(s(n)
2
).
Our next observation about complexity classes follows easily from the linear
space compression and speedup theorems. Since time and space use can be
made more efficient by a constant factor, we may state that:
DTIME(t(n)) = DTIME(kt(n))
NTIME(t(n)) = NTIME(kt(n))
DSPACE(s(n)) = DSPACE(ks(n))
NSPACE(s(n)) = NSPACE(ks(n))
for every recursive s(n) and t(n), and constant k. (Remember that t(n) means
max(n+1, t(n)) and that s(n) means max(1, s(n)) in each case.)
While we are comparing complexity classes it would be nice to talk about the
relationship between space and time. Unfortunately not much is known here
either. About all we can say is rather obvious. Since it takes one unit of time to
write upon one tape square we know that:
TIME(t(n)) SPACE(t(n))
because a machine cannot use more than t(n) tape squares if it runs for t(n)
steps. Going the other way is not so tidy. We can count the maximum number
of steps a machine may go through before falling into an infinite loop on s(n)
tape and decide that for some constant c:
SPACE(s(n)) TIME(2
cs(n)
)
for both deterministic and nondeterministic complexity classes. And, in fact,
this counting of steps is the subject of our very next theorem.)
Complexity Classes 7
Th eor em 4. If an s(n) tape bounded Turing machine halts on an input of
length n then it will halt within O(2
cs(n)
)steps for some constant c.
Pr oof Sket ch . Consider a Turing machine that uses O(s(n)) tape. There is
an equivalent machine M
i
that uses two worktape symbols and also needs
no more than O(s(n)) tape. This means that there is a constant k such that
M
i
never uses more than ks(n) tape squares on inputs of length n.
We now recall that a machine configuration consists of:
a) the instruction being executed,
b) the position of the head on the input tape,
c) the position of the head on the work tape, and
d) a work tape configuration.
We also know that if a machine repeats a configuration then it will run
forever. So, we almost have our proof.
All we need do is count machine configurations. There are |M
i
|
instructions, n+2 input tape squares, ks(n) work tape squares, and 2
k!s(n)
work tape configurations. Multiplying these together provides the
theorem's bound.
One result of this step counting is a result relating nondeterministic and
deterministic time. Unfortunately it is nowhere near as sharp as theorem 3, the
best relationship between deterministic and nondeterministic space. Part of the
reason is that our simulation techniques for time are not as good as those for
space.
Cor ollar y . NTIME(t(n)) DTIME(2
ct(n)
)
Pr oof. NTIME(t(n)) NSPACE(t(n)) DSPACE(t(n)
2
) DTIME(2
ct(n)
)
because of theorems 3 and 4. (We could have proven this from scratch by
simulating a nondeterministic machine in a deterministic manner, but the
temptation to use our last two results was just too tempting!)
Our first theorem in this section stated that the union of all the complexity
classes results in the collection of all of the recursive sets. An obvious question
is whether one class can provide the entire family of recursive sets. The next
result denies this.
Th eor em 5. For any recursive function s(n), there is a recursive set that is
not a member of DSPACE(s(n)).
Complexity Classes 8
Pr oof. The technique we shall use is diagonalization over DSPACE(s(n)).
We shall examine every Turing machine that operates in s(n) space and
define a set that cannot be decided by any of them.
First, we must talk about the machines that operate in O(s(n)) space. For
each there is an equivalent machine which has one track and uses the
alphabet {0,1,b}. This binary alphabet, one track Turing machine also
operates in O(s(n)) space. (Recall the result on using a binary alphabet to
simulate machines with large alphabets that used blocks of standard size
to represent symbols.) Let's now take an enumeration M
"
, M
2
, ... of these
one track, binary machines and consider the following algorithm.
Examine(i, k, x)
Pre: n = length of x
lay out k*s(n) tape squares on the work tape;
run M
i
(x) within the laid off tape area;
if M
i
(x) rejects then accept else reject
This is merely a simulation of the binary Turing machine M
i
on input x
using ks(n) tape. And, the simulation lasts until we know whether or not
the machine will halt. Theorem 4 tells us that we only need wait some
constant times 2
c!s(n)
steps. This is easy to count to on a track of a tape of
length ks(n). Thus the procedure above is recursive and acts differently
than M
i
on input x if L
i
(n) ks(n).
Our strategy is going to be to feed the Examine routine all combinations
of k and i in hopes that we shall eventually knock out all s(n) tape
bounded Turing machines.
Thus we need a sequence of pairs <i, k> such that each pair occurs in our
sequence infinitely often. Such sequences abound. A standard is:
<1,1>, <1,2>, <2,1>, <1,3>, <2,2>, <3,1>, ...
For each input x we take the x
th
pair in the sequence. The decision
procedure for the set we claim is not s(n) space computable is now:
Diagonal(x)
select the x-th pair <i, k> from the sequence;
Examine(i, k, x)
Complexity Classes 9
Two things need to be verified. First, we need to show that the above
decision procedure can be carried out by some Turing machine. We note
that M_i comes from an enumeration of two work tape symbol machines
and then appeal to Church's thesis for the actual machine construction
for the decision procedure. Next we need to prove that this procedure
cannot be carried out by an s(n) space bounded Turing machine.

Suppose that the Diagonal procedure is indeed s(n) space computable.
Then there is some two work tape symbol, s(n) space bounded Turing
machine M_i which computes the above Diagonal procedure. And there is a
constant k such that for all but a finite number of inputs, M_i uses no more
than k·s(n) tape squares on inputs of length n. In particular, there is an x
such that <i, k> is the x-th pair in our sequence of pairs and the
computation of M_i(x) requires no more than k·s(n) tape. (In fact there are
infinitely many of these x since the pair <i, k> appears infinitely often in
the sequence.) In this case

    M_i(x) ≠ Examine(i, k, x) = Diagonal(x)

which is a contradiction. Thus M_i cannot be an s(n) bounded machine
and the set defined by our Diagonal procedure cannot be
a member of DSPACE(s(n)).
It should come as no surprise that the same result holds for nondeterministic
space as well as time classes. Thus we do have a hierarchy of classes since none
of them can hold all of the recursive sets. This seems in line with intuition
since we believe that we can compute bigger and better things with more
resources at our disposal.
Our next results explore the amount of space or time needed to compute new
things for classes with resource bounds that are tape or time functions. (Recall
that tape or time functions are bounds for actual Turing machines.) We
consider these resource bounds to be well behaved and note in passing that
there are strange functions about which are not tape or time functions.
Theorem 6 (Space Hierarchy). If r(n) and s(n) are both at least O(n), s(n)
is a space function, and inf_{n→∞} r(n)/s(n) = 0, then

    DSPACE(s(n)) - DSPACE(r(n)) ≠ ∅.
Proof Sketch. The proof is very similar to that of the last theorem. All
that is needed is to change the space laid off in the Examine routine to
s(n). Since s(n) grows faster than any constant times r(n), the
diagonalization proceeds as scheduled. One note. What makes this
simulation and diagonalization possible is that s(n) is a space function.
This allows us to lay off s(n) tape squares in s(n) space. Thus the
diagonalization does produce a decision procedure for a set which is s(n)
space decidable but not r(n) space decidable.
The major reason the simulation worked was that we were able to lay out s(n)
tape squares. This is because we could compute s(n) by taking the machine it
was a tape function for and running it on all inputs of length n to find the
longest one. This requires O(n) space. If s(n) is even more well behaved we can
do better.
Definition. A recursive function s(n) is efficiently space computable if
and only if it can be computed within s(n) space.
If s(n) is efficiently space computable, then the space hierarchy theorem is true
for s(n) down to O(log₂ n) because we can lay out the required space for the
simulation and keep track of which input symbol is being read.
Many functions are efficiently space computable, including such all-time
favorites as log₂ n and (log₂ n)^k. An exercise dealing with efficient space
computability will be to prove that all space functions that are at least O(n) are
efficiently space computable.
Combining the space hierarchy theorem with the linear space compression
theorem provides some good news at last. If two functions differ only by a
constant, then they bound the same class. But if one is larger than the other by
more than a constant then one class is properly contained in the other.
Sadly the result for time is not as sharp. We shall need one of our functions to
always be efficiently time computable and do our simulation with two tapes.
Here is the theorem.
Theorem 7 (Time Hierarchy). If r(n) and t(n) are both at least O(n), t(n) is
efficiently time computable, and

    inf_{n→∞} [r(n)·log₂ r(n)] / t(n) = 0

then DTIME(t(n)) - DTIME(r(n)) ≠ ∅.
NOTES

Hartmanis and Stearns begin the study of computational complexity
on Turing machines. The early papers on time and space as well as
multitape simulation and real-time computation are:
J. HARTMANIS and R. E. STEARNS. "On the computational complexity of
algorithms," Trans. AMS 117 (1965), 285-305.

J. HARTMANIS, P. M. LEWIS II, and R. E. STEARNS. "Hierarchies of memory
limited computations," Proc. 6th Annual IEEE Symp. on Switching Circuit
Theory and Logical Design (1965), 179-190.

P. M. LEWIS II, R. E. STEARNS, and J. HARTMANIS. "Memory bounds for
the recognition of context-free and context-sensitive languages," Proc. 6th
Annual IEEE Symp. on Switching Circuit Theory and Logical Design (1965),
191-202.

F. C. HENNIE and R. E. STEARNS. "Two-tape simulation of multitape Turing
machines," J. ACM 13:4 (1966), 533-546.

M. O. RABIN. "Real-time computation," Israel J. Math. 1 (1963), 203-211.
The speedup theorem as well as an axiomatic theory of complexity
came from:

M. BLUM. "A machine-independent theory of the complexity of recursive
functions," J. ACM 14:2 (1967), 322-336.
The theorem on the relationship between deterministic and
nondeterministic space classes is from:

W. J. SAVITCH. "Relationships between nondeterministic and
deterministic tape complexities," J. Comput. and System Sci. 4:2 (1970),
177-192.
Cobham was the first to mention the class P and the initial NP-complete
set was discovered by Cook. Karp quickly produced more
and the last reference is an encyclopedia of such sets.

A. COBHAM. "The intrinsic computational difficulty of functions," Proc.
1964 Congress for Logic, Mathematics, and the Philosophy of Science.
North Holland, 1964, 24-30.

S. A. COOK. "The complexity of theorem proving procedures," Proc. 3rd
Annual ACM Symp. on the Theory of Computation (1971), 151-158.

R. M. KARP. "Reducibility among combinatorial problems," Complexity of
Computer Computations, Plenum Press, NY, 1972, 85-104.

M. R. GAREY and D. S. JOHNSON. Computers and Intractability: A Guide
to the Theory of NP-Completeness, W. H. Freeman, San Francisco, 1979.
More material on complexity may be found in any of the general
theory of computation texts mentioned in the notes for the section
on computability.
PROBLEMS
The sets described below shall be used in many of the problems for this
chapter. As usual, w represents a string of 0's and 1's and the superscript
R stands for string reversal.
    A = 0^n 1^n        B = 0^n 1^n 0^n        C = 0^n 1^m 0^n 1^m

    D = w#w^R          E = w#w                F = ww
Measures and Resource Bounds
1. Present the least time consuming algorithms you are able for deciding
membership in sets A, D, and F above on multitape Turing machines.
Analyze the space required for these algorithms.
2. Develop the most space efficient algorithms you can for membership in the
sets A, D, and F above. How much time is needed for each algorithm?
3. The number of times a Turing machine reverses the direction of its tape
heads can be a measure of complexity too. Try to compute membership in the
sets A, D, and F above with the fewest tape head reversals.
Describe your algorithms.
4. Assume that one unit of ink is used to print a symbol on a Turing machine
tape square. How much ink is needed to decide membership in sets A, D,
and F above?
5. Try time as a complexity measure on one tape (rather than multitape)
Turing machines. Now how much time do algorithms for deciding A, D, and
F take?
6. Let T(n), S(n), R(n), and I(n) denote the time, space, reversals, and ink needed
for Turing machine M to process inputs of length n. Why must a machine
always use at least as much time as space? Thus S(n) ≤ T(n). What are the
relationships between the above four measures of complexity?
7. Define a measure of complexity which you feel properly reflects the actual
cost of computation. Determine the complexity of deciding membership in
the sets A, D, and F above.
8. Using multitape Turing machines as a computational model, precisely prove
that linear speedup in time is possible.
9. Let us use Turing machines with a read-only input tape and a single work
tape as our computational model. In addition, let us only use 0, 1, and
blank as tape symbols. Is there a difference in space complexity when
multitrack work tapes are used as opposed to one track work tapes? (That
is, if a task requires m tape squares on a k track machine, how much space
is needed on a one track model?)
10. Solve exercise 9 for time instead of space.
11. Show that the simulation of a k tape Turing machine's computation by a one
tape Turing machine can be done in the square of the original time.
12. If inf_{n→∞} f(n)/g(n) = 0, then what is the relationship between the functions f
and g? Does f = O(g)? Does g = O(f)?
13. If inf_{n→∞} f(n)/g(n) = k, for some constant k, then what is the relationship
between the functions f and g? Again, does f = O(g)? Does g = O(f)?
14. What are the time and space requirements for LL parsing (from the
presentation on languages)?
15. Using the speedup theorem, prove the bizarre corollary concerning
supercomputers and pocket calculators.
Complexity Classes
1. Show that n^2 and 2^n are time functions.
2. Demonstrate that (log₂ n)^2 is a space function.
3. Prove that for every recursive function f(n) there exists a time function t(n)
and a space function s(n) which exceed f(n) for all values of n.
4. Show that there are recursive functions which cannot be time or space
functions.
5. Verify that a set is a member of DSPACE(n) if its members can be
recognized within O(n) space for all but a finite number of n.
6. Prove that if a set can be accepted by some deterministic Turing machine,
then its membership problem can be decided in the same amount of space
or time.
7. Show that the family of DSPACE classes is closed under union, intersection,
and complement.
8. Prove that the family of NSPACE classes is closed under concatenation and
Kleene star.
9. Design a nondeterministic Turing machine which accepts members of the
above sets A and C. (That is, A ∪ C.) Is this quicker than doing it in a
deterministic manner?
10. Present a nondeterministic Turing machine which finds roots of
polynomials. (That is, given a sequence of integers a0, ..., an it figures out a
value of x such that a0 + a1·x + a2·x^2 + ... + an·x^n = 0, to within some
specified error bound ε.)
11. Describe how in a nondeterministic manner one could accept pairs of finite
automata which do not accept the same set.
12. Show that if a space function is at least O(n) then it is efficiently space
computable.
13. How much space is needed to compute log₂ n?
14. Prove the time hierarchy theorem using r(n)^2 rather than r(n)·log₂ r(n).
Reducibilities and Completeness
1. Show that the set A above is reducible to the sets D and F by mappings
which can be computed by finite automata.
2. Provide a mapping from set D above to set E. Analyze the space
requirements of the mapping.
3. Show that set A above is reducible to set C. Take a Turing machine M_c
which decides membership in C, combine it with the machine which maps A
into C, and produce a machine which decides membership in A. Further, do
this so that the space requirements are only O(log₂ n).
4. Precisely state and prove theorems 1 and 2 for time complexity.
5. Show that if the set A is a member of the class DTIME(B) then DTIME(A) is a
subclass of DTIME(B). Be sure to consider the cases where the sets A and B
have speedup.
6. Prove theorem 4 for nondeterministic time classes.
7. Prove that complexity classes named by sets with speedup cannot be named
by recursive functions. (If A has speedup then there is no recursive f such
that DSPACE(A) = DSPACE(f).)
The Classes P and NP
1. Estimate the time (in seconds, years, etc.) that it would take to solve a
problem which takes O(2^n) steps on a typical computer. Do this for various
values of n.
2. Prove that any set in P can be reduced to any other set in P via some
polynomial time mapping.
3. Verify theorem 2.
4. Show that the existence of a set in NP whose complement was not also in
NP would lead to a proof that P ≠ NP.
5. Demonstrate that the entire satisfiability problem is indeed in NP.

6. Present an algorithm for converting a Turing machine instruction into the
set of clauses needed in the proof that SAT is NP-complete.
Intractable Problems
1. Convert the set of clauses {(v1), (v2, v3), (v1, v2, v3)} into proper 3-SAT
format.
2. Convert the set of clauses {(v1, v2), (v1, v2, v3, v4)} into proper 3-SAT
format.
3. How much time does it take a nondeterministic Turing machine to
recognize a member of 3-SAT? Assume that there are n clauses.
4. Show precisely that 0-1INT is in NP.
5. What time bound is involved in recognizing cliques in graphs? Before
solving this, define the data structure used as input. Might different data
structures (adjacency matrix or edge list) make a difference in complexity?
6. Verify that the transformation from 3-SAT to CLIQUE can be done in
polynomial time.
7. Prove that the vertex cover problem is NP-complete.
8. Show how to construct in polynomial time the graph used in the proof that
COLOR is NP-complete.
9. Suppose you wanted to entertain n people. Unfortunately some of them do
not get along with each other. What is the complexity of deciding how
many parties you must have to entertain all of the people on your list?
AUTOMATA
Thus far we have been concerned with two major topics: the discovery of an
appropriate model for computation and an examination of the intrinsic
properties of computation in general. We found that since Turing machines are
equivalent to programs, they form an appropriate model for computation. But
we also discovered that in some sense they possessed far too much
computational power. Because of this we ran into great difficulty when we tried
to ask questions about them, and about computation in general. Whenever we
wished to know something nontrivial, unsolvability sprang forth. And even in
the solvable or recursive realm, we found intractability.
Now we shall attempt to overcome this lack of information about computation
by restricting the power of our computational model. We hope that this will
force some of the decision problems in which we are interested into the zone of
solvability.
The sections include:
Finite Automata
Closure Properties
Nondeterministic Operation
Regular Sets and Expressions
Decision Problems for Finite Automata
Pushdown Automata
Unsolvable Problems for Pushdown Automata
Linear Bounded Automata
Historical Notes and References
Problems
Finite Automata
Let us begin by removing almost all of the Turing machine's power! Maybe then
we shall have solvable decision problems and still be able to accomplish some
computational tasks. Also, we might be able to gain insight into the nature of
computation by examining what computational losses we incur with this loss of
power.
If we do not allow writing or two-way operation of the tape head, we have what
has been traditionally called a finite automaton. This machine is only allowed to
read its input tape and then, on the basis of what it has read and processed,
accept or reject the input. This restricted machine operates by:
a) Reading a symbol,
b) Transferring to a new instruction, and
c) Advancing the tape head one square to the right.
When it arrives at the end of its input it then accepts or rejects depending upon
what instruction is being executed.
This sounds very simple. It is merely a one-way, semi-literate Turing machine
that just decides membership problems for a living! Let us examine one. In
order to depict one, all we need to do is jot down Turing machine instructions
in one large table, leave out the write part (that was not a pun!), and add a note
which indicates whether the machine should accept. Here is an example:
    Instruction    Read    Goto    Accept?
        1           0      same      no
                    1      next
        2           0      same      yes
                    1      next
        3           0      same      no
                    1      same
Look closely at this machine. It stays on instruction one until it reads a one.
Then it goes to instruction two and accepts any input unless another one arrives
(the symbol 1 - not another input). If two or more ones appear in the input,
then it ends up executing instruction three and does not accept when the input
is over. And if no ones appear in the input the machine remains on instruction
one and does not accept. So, this machine accepts only inputs that contain
exactly one 1.
(N.B. Endmarkers are not needed since the machine just moves to the right.
Accepting happens when the machine finishes the input while executing an
instruction that calls for acceptance. Try this machine out on the inputs 000,
0100, 1000100, etc.)
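A table-driven sketch may help make this concrete. The Python fragment below
(ours, not the text's) encodes the instruction table above and runs it on an
input tape.

    # The instruction table: (instruction, symbol read) -> next instruction.
    NEXT = {(1, '0'): 1, (1, '1'): 2,
            (2, '0'): 2, (2, '1'): 3,
            (3, '0'): 3, (3, '1'): 3}
    ACCEPT = {2}          # instruction 2 is the only accepting one

    def runs(tape):
        # Start at instruction 1, read the tape left to right, and accept
        # if the run ends while executing an accepting instruction.
        instruction = 1
        for symbol in tape:
            instruction = NEXT[(instruction, symbol)]
        return instruction in ACCEPT

    # The suggested test inputs: exactly one 1 means acceptance.
    assert runs('0100') and not runs('000') and not runs('1000100')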
Traditionally these machines have not had instructions but states. (Recall A. M.
Turing's states of mind.) Another way to represent this same machine is to put
the next instruction or next state or goto portion under the input in a table like
that in figure 1.
There is another traditional method to describe finite automata which is
extremely intuitive. It is a picture called a state graph. The states of the finite
automaton appear as vertices of the graph while the transitions from state to
state under inputs are the graph edges. The state graph for the same machine
also appears in figure 1.
    [State graph: states 1, 2, 3 in a row; 0-loops on states 1 and 2, a
    0,1-loop on state 3, and 1-edges from state 1 to 2 and from 2 to 3.]

         Input
    State     0    1    Accept?
      1       1    2      no
      2       2    3      yes
      3       3    3      no

    Figure 1 - Finite Automaton Representations

(Note that the two circles that surround state two mean acceptance.)
Before continuing let's examine the computation of a finite automaton. Our first
example begins in state one and reads the input symbols in turn changing
states as necessary. Thus a computation can be characterized by a sequence of
states. (Recall that Turing machine configurations needed the state plus the
tape content. Since a finite automaton never writes, we always know what is on
the tape and need only look at a state as a configuration.) Here is the sequence
for the input 0001001.
    Input Read:    0  0  0  1  0  0  1
    States:      1  1  1  1  2  2  2  3
Our next example is an elevator controller. Let's imagine an elevator that serves
two floors. Inputs are calls to a floor either from inside the elevator or from the
floor itself. This makes three distinct inputs possible, namely:
0 - no calls
1 - call to floor one
2 - call to floor two
The elevator itself can be going up, going down, or halted at a floor. If it is on a
floor it could be waiting for a call or about to go to the other floor. This
provides us with the six states shown in figure 2 along with the state graph for
the elevator controller.
Figure 2 - Elevator Control
(States: W1 - waiting on first floor; U1 - about to go up; UP - going up;
DN - going down; W2 - waiting on second floor; D2 - about to go down.)
A state table for the elevator is provided below as table 1.
                           Input
    State               none    call to 1    call to 2
    W1 (wait on 1)       W1        W1           UP
    U1 (start up)        UP        U1           UP
    UP (going up)        W2        D2           W2
    DN (going down)      W1        W1           U1
    W2 (wait on 2)       W2        DN           W2
    D2 (start down)      DN        DN           D2

    Table 1 - Elevator Control
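Table 1 executes just as directly as the earlier machine. Here is a small Python
rendering of the controller (a sketch; the input names none, call1, and call2
stand for the three inputs listed above):

    # Table 1 as a dictionary: (state, input) -> next state.
    ELEVATOR = {
        ('W1', 'none'): 'W1', ('W1', 'call1'): 'W1', ('W1', 'call2'): 'UP',
        ('U1', 'none'): 'UP', ('U1', 'call1'): 'U1', ('U1', 'call2'): 'UP',
        ('UP', 'none'): 'W2', ('UP', 'call1'): 'D2', ('UP', 'call2'): 'W2',
        ('DN', 'none'): 'W1', ('DN', 'call1'): 'W1', ('DN', 'call2'): 'U1',
        ('W2', 'none'): 'W2', ('W2', 'call1'): 'DN', ('W2', 'call2'): 'W2',
        ('D2', 'none'): 'DN', ('D2', 'call1'): 'DN', ('D2', 'call2'): 'D2',
    }

    def control(state, calls):
        # Advance the controller through a sequence of calls.
        for call in calls:
            state = ELEVATOR[(state, call)]
        return state

    # Called from floor 1 to floor 2, then left alone: up we go, then wait.
    assert control('W1', ['call2', 'none']) == 'W2'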
Accepting and rejecting states are not included in the elevator design because
acceptance is not an issue. If we were to design a more sophisticated elevator,
it might have states that indicated:
a) power failure,
b) overloading, or
c) breakdown
In this case acceptance and rejection might make sense.
Let us make a few small notes about the design. If the elevator is about to move
(i.e. in state U1 or D2) and it is called to the floor it is presently on it will stay.
(This may be good - try it next time you are in an elevator.) And if it is moving
(up or down) and gets called back the other way, it remembers the call by going
to the U1 or D2 state upon arrival on the next floor. Of course the elevator does
not do things like open and close doors (these could be states too) since that
would have added complexity to the design. Speaking of complexity, imagine
having 100 floors.
That is our levity for this section. Now that we know what a finite automaton is,
we must (as usual) define it precisely.
Definition. A finite automaton M is a quintuple M = (S, I, δ, s0, F) where:

    S is a finite set (of states)
    I is a finite alphabet (of input symbols)
    δ: S × I → S (next state function)
    s0 ∈ S (the starting state)
    F ⊆ S (the accepting states).
We also need some additional notation. The next state function is called the
transition function and the accepting states are often called final states. The
entire machine is usually defined by presenting a state table or a state graph. In
this way, the states, alphabet, transition function, and final states are
constructively defined. The starting state is usually the lowest numbered state.
Our first example of a finite automaton is:
    M = ({s1, s2, s3}, {0,1}, δ, s1, {s2})

where the transition function δ is defined explicitly by either a state table or a
state graph.
At this point we must make a slight detour and examine a very important yet
seemingly insignificant input string called the empty string. It is a string
without any symbols in it and is denoted as ε. It is not a string of blanks. An
example might make this clear. Look between the brackets in the picture below.

    A Blank:        [ b ]
    Empty String:   [ ]
Let's look again at a computation by our first finite automaton. For the input
010, our machine begins in s1, reads a 0 and goes to δ(s1,0) = s1, then reads a 1
and goes to δ(s1,1) = s2, and ends up in δ(s2,0) = s2 after reading the final 0. All
of that can be put together as:

    δ(δ(δ(s1,0),1),0) = s2

We call this transition on strings δ* and define it as follows.
Definition. Let M = (S, I, δ, s0, F). For any input string x, input symbol a, and
state s_i, the transition function on strings δ* takes the values:

    δ*(s_i, ε) = s_i
    δ*(s_i, a) = δ(s_i, a)
    δ*(s_i, xa) = δ(δ*(s_i, x), a).
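In code, δ* is simply a left fold of δ over the input string. A minimal Python
sketch, using the first example machine's transition table:

    from functools import reduce

    def delta_star(delta, state, string):
        # Apply the transition function to each symbol in turn:
        # delta_star(d, s, '') = s and delta_star(d, s, xa) = d(delta_star(d, s, x), a).
        return reduce(lambda s, a: delta[(s, a)], string, state)

    # The first example machine, as a dictionary.
    delta = {('s1','0'): 's1', ('s1','1'): 's2', ('s2','0'): 's2',
             ('s2','1'): 's3', ('s3','0'): 's3', ('s3','1'): 's3'}
    assert delta_star(delta, 's1', '010') == 's2'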
That definition certainly was terse. But δ* is really just what one expects it to
be. It merely applies the transition function to the symbols in the string. Let's
look at this for the example in figure 3.
This machine has a set of states {s0, s1, s2, s3} and operates over the input
alphabet {a, b}. Its starting state is s0 and its set of final or accepting states is
F = {s2}. The transition function δ is fully described twice in figure 3: once in
figure 3a as a state table and once in figure 3b as a state graph.
         Input
    State     a    b    Accept?
      0       3    1      no
      1       3    2      no
      2       2    2      yes
      3       3    3      no

    Figure 3 - Finite Automaton
If the machine receives the input bbaa it goes through the sequence of states:

    s0, s1, s2, s2, s2

while when it gets an input such as abab it goes through the state transition:

    s0, s3, s3, s3, s3
Now we shall become a bit more abstract. When a finite automaton receives an
input string such as:

    x = x1 x2 ... xn

where the x_i are symbols from its input alphabet, it progresses through the
sequence:

    s_k1, s_k2, ..., s_k(n+1)

where the states in the sequence are defined as:

    s_k1 = s0
    s_k2 = δ(s_k1, x1) = δ*(s0, x1)
    s_k3 = δ(s_k2, x2) = δ*(s0, x1 x2)
      ...
    s_k(n+1) = δ(s_kn, xn) = δ*(s0, x1 x2 ... xn)
Getting back to a more intuitive reality, the following table provides an
assignment of values to the symbols used above for an input of bbaba to the
finite automaton of figure 3.

    i        1    2    3    4    5    6
    x_i      b    b    a    b    a
    s_ki    s0   s1   s2   s2   s2   s2
We have mentioned acceptance and rejection but have not talked too much
about it. This can be made precise also.
Definition. The set (of strings) accepted by the finite automaton M =
(S, I, δ, s0, F) is: T(M) = { x | δ*(s0, x) ∈ F }.

This set of accepted strings (named T(M) to mean Tapes of M) is merely all of
the strings for which M ended up in a final or accepting state after processing
the string. For our first example (figure 1) this was all strings of 0's and 1's that
contain exactly one 1. Our last example (figure 3) accepted the set of strings
over the alphabet {a, b} which began with exactly two b's.
Closure Properties
Removing power from Turing machines provided us with a new machine. We
also found a new class of sets. Now it is time to examine this class. Our first
questions concern operations on sets within the class.
Set complement is our first operation to examine. Since we are dealing with
strings (rather than numbers), we must redefine this operation. Thus the
definition of complement will be slightly different. This is to be expected because
even though 0100 and 100 are the same number, they are different strings.
Here is the definition. If a set contains strings over an alphabet, then its
complement contains all of the strings (over the alphabet) not in the set.
Theorem 1. If a set is accepted by a finite automaton then its complement
can be accepted by a finite automaton.

Proof. Let the finite automaton M = (S, I, δ, s0, F) accept the set T(M). We
must now show that there is a finite automaton which accepts the set of
strings over the alphabet I which M does not accept. Our strategy will be
to look at the operation of M, accepting when it rejects and rejecting
when M accepts. Since we know that strings which take M to a state of F
are accepted by M and those which take M into S-F are rejected, then our
course is fairly clear. Consider the machine:

    M' = (S, I, δ, s0, S-F).

It is exactly the same as M except for its final or accepting states. Thus it
should accept when M rejects. When we precisely examine this, we find
that for all strings x over the alphabet I:

    x ∉ T(M)  if and only if  δ*(s0, x) ∉ F
              if and only if  δ*(s0, x) ∈ S-F
              if and only if  x ∈ T(M')

and so T(M') is the complement of T(M).
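The construction in this proof is nearly a one-liner in code. A sketch, with
machines represented as (S, I, delta, s0, F) tuples mirroring the definition:

    def complement(machine):
        # Swap accepting and rejecting states:
        # (S, I, d, s0, F) becomes (S, I, d, s0, S - F).
        S, I, delta, s0, F = machine
        return (S, I, delta, s0, S - F)

The new machine differs from the old one only in its accepting states, exactly
as in the proof.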
The last proof contained an example of our first method of dealing with finite
automata: rearranging an existing machine. Now we shall employ another
strategy: combining two machines. This is actually going to be parallel
processing. Suppose we take the two machines whose state graphs are in
figure 1.
    [Figure 1 - Finite Automaton Examples: two state graphs, (a) a machine
    M1 with states s0, s1, s2, s3 and (b) a machine M2 with states q0, q1, q2.]
We have seen the machine (M1) of figure 1a before; it accepts all strings (over
{a, b}) which begin with two b's. The other machine (M2) accepts strings which
end with two b's. Let's try to combine them into one machine which accepts
strings which either begin or end with two b's.
Why not run both machines at the same time on an input? We could keep track
of what state each machine is in by placing pebbles upon the appropriate states
and then advancing them according to the transition functions of each machine.
Both machines begin in their starting states, as pictured in the state graphs of
figure 1, with pebbles on s0 and q0. If both machines now read the symbol b
on their input tapes, they move the pebbles to new states and assume
configurations
with pebbles on s1 and q1
. The pebbles have advanced according to the
transition functions of the machines. Now let's have them both read an a. At
this point, they both advance their pebbles to the next state and enter the
configurations
with pebbles on s3 and q0
.
With this picture in mind, let's trace the computations of both machines as they
process several input strings. Pay particular attention to the pairs of states the
machines go through. Our first string is bbabb, which will be accepted by both
machines.
    Input:          b    b    a    b    b
    M1's states:   s0   s1   s2   s2   s2   s2
    M2's states:   q0   q1   q2   q0   q1   q2
Now let us look at an input string neither will accept: babab.
    Input:          b    a    b    a    b
    M1's states:   s0   s1   s3   s3   s3   s3
    M2's states:   q0   q1   q0   q1   q0   q1
And finally, the string baabb which will be accepted by M2 but not M1.

    Input:          b    a    a    b    b
    M1's states:   s0   s1   s3   s3   s3   s3
    M2's states:   q0   q1   q0   q0   q1   q2
If we imagine a multiprocessing finite automaton with two processors (one for
M1 and one for M2), it would probably look just like the pictures above. Its
state could be a state pair (one from each machine) corresponding to the pebble
positions. Then, if a pebble ended up on an accepting state for either machine
(that is, either s2 or q2), our multiprocessing finite automaton would accept.
This is not difficult at all! All we need to do is to define a new class of
machines and we can accept several things at once. (Note that the new
machines accept unions of sets accepted by finite automata.) Or, if we think for
a bit, we find that we do not need to define a new class for this. The next result
shows that we already have this facility with finite automata. The proof of the
theorem demonstrates this by careful manipulation of the symbolic definition
of finite automata.
Theorem 2. The class of sets accepted by finite automata is closed under
union.

Proof Sketch. Let M1 = (S, I, δ, s0, F) and M2 = (Q, I, γ, q0, G) be two
arbitrary finite automata. To prove the theorem we must show that there
is another machine (M3) which accepts every string accepted by M1 or M2.

We shall try the multiprocessing pebble machine concept and work it into
the definition of finite automata. Thus the states of M3 are pairs of states
(one from M1 and one from M2). This works out nicely since the set of
pairs of states from S and Q is known as the cross product (written S×Q)
between S and Q. The starting state is obviously <s0, q0>. Thus:

    M3 = (S×Q, I, μ, <s0, q0>, H)

where μ and H will be described presently.

The transition function μ is just a combination of δ and γ, since it
simulates the advance of pebbles on the state graphs. It uses δ to change
the states in S and γ to change the states in Q. In other words, if a is a
symbol of I:

    μ(<s_i, q_i>, a) = <δ(s_i, a), γ(q_i, a)>.

Our final states are all of the pairs which contain a state from F or a state
from G. In cross product notation this is:

    H = (F×Q) ∪ (S×G).
We have now defined M3 and know that it is indeed a finite automaton
because it satisfies the definition of finite automata. We claim it does
accept T(M1) ∪ T(M2) since it mimics the operation of our intuitive
multiprocessing pebble machine. The remainder of the formal proof
(which we shall leave as an exercise) is merely an induction on the length
of input strings to show that for all strings x over the alphabet I:

    x ∈ T(M1) ∪ T(M2)  iff  δ*(s0, x) ∈ F or γ*(q0, x) ∈ G
                       iff  μ*(<s0, q0>, x) ∈ H.

Thus by construction we have shown that the class of sets accepted by
finite automata is closed under union.
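The cross product construction can also be sketched directly. The following
Python fragment (an illustration of the proof, not part of the original text)
builds M3 from two machines whose transition functions are dictionaries:

    def union_machine(m1, m2):
        # Build M3 = (S x Q, I, mu, <s0,q0>, H) with H = (F x Q) + (S x G).
        S, I, delta, s0, F = m1
        Q, _, gamma, q0, G = m2          # both machines share the alphabet I
        states = {(s, q) for s in S for q in Q}
        mu = {((s, q), a): (delta[(s, a)], gamma[(q, a)])
              for (s, q) in states for a in I}
        H = {(s, q) for (s, q) in states if s in F or q in G}
        return (states, I, mu, (s0, q0), H)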
By manipulating the notation we have shown that two finite automata can be
combined. Let's take a look at the result of applying this construction to the
machines of figure 1. (This is pictured in figure 2.)

    [Figure 2 - Union of M1 and M2: the state graph of the pair machine on
    its reachable state pairs, among them <s0,q0>, <s1,q1>, <s2,q0>,
    <s2,q2>, and <s3,q0>.]
Note that not all pairs of states are included in the state graph. (For example,
<s0, q1> and <s1, q2> are missing.) This is because it is impossible to get to
these states from <s0, q0>.
This is indeed a complicated machine! But, if we are a bit clever, we might
notice that if the machine enters a state pair containing s2, then it remains in
pairs containing s2. Thus we can combine all of these pairs and get the smaller
but equivalent machine of figure 3.
    [Figure 3 - Reduced Union Machine: the union machine with all pairs
    containing s2 merged into a single accepting state.]
This is very likely what anyone would design if they had to build a machine
from scratch rather than use the construction detailed in the theorem. Lest
anyone think that finding smaller, equivalent machines is always this simple, we
must admit that this was an elementary example and we did some prior
planning to provide such clean results.
The final Boolean operation is set intersection. We state this here and leave the
proof as an exercise in Boolean algebra identities (De Morgan's Laws) or
machine construction - as you wish!
Theorem 3. The class of sets accepted by finite automata is closed under
intersection.
Nondeterministic Operation
So far, every step taken by a finite automaton has been exactly determined by
the state of the machine and the symbol read. No choices have existed. This
mode of operation is called determinism and machines of this ilk are known as
deterministic finite automata. Finite automata need not be so unambiguous. We
could have defined them to have some choices of which state to advance to on
inputs. Examine figure 1; it provides the state graph of such a machine.
    [Figure 1 - Nondeterministic Finite Automaton: a state graph with states
    s0 through s4 in which some state-symbol pairs offer a choice of next
    states.]
If the input to this machine begins with the string ab, it will progress from s0 to
s1 to s2. Now, if the next input symbol is an a, it has a choice of whether to go
back to s1 or to move to s4. So, now what does it do? Well, it transfers to the
state which will eventually take it into a final state if possible. Simple as that!
But, how does it know when it has not seen the remainder of the input yet? Do
not worry about that - it always chooses the right state!
The above explanation of the operation of the machine of figure 1 is a slight bit
mystical. But we can (if we wish) define machines with choices in their
transition functions. And we can (if we wish) define acceptance to take place
whenever a correct sequence of states under the input will end up in a final
state. In other words, if the machine can get to a final state in some proper
manner, it will accept. Now for a formal definition.
Definition. A nondeterministic finite automaton is the quintuple
M = (S, I, δ, S0, F) where S, I, and F are as before but:

    S0 ⊆ S (a set of starting states), and
    δ(s, a) ⊆ S for each s ∈ S and a ∈ I.
Now instead of having a starting state and a transition function, we have a
starting state set and a set of transition states. More on this later. For now, note
that the only differences in the finite automaton definitions were that the
machine now has a choice of states to start in and a choice of transitions under
a state-symbol pair. A reduced version of the last nondeterministic machine is
presented in figure 2.
    [Figure 2 shows the state graph and the state table below.]

         Input
    State     a     b    Accept?
      0      1,3    2      yes
      1       3     0      no
      2       2     2      no
      3       3     2      yes

    Figure 2 - Reduced Nondeterministic Machine
(N.B. We must note that the transition indicator δ is not a function any more.
To be precise about this we must state that it has become a relation. In other
words, δ(s, a) is now a set containing the states the machine may enter from s
under a. If that is confusing then forget it and consider δ(s, a) to be a set.) Our
next step is to define acceptance for nondeterministic finite automata. We
could extend the transition indicator so that it will handle strings. Since it
provides a set of next states, we will just note the set of states the machine is in
after processing a string. Let's look at a picture. Think about the last machine
(the one in figure 2). Now imagine what states it might go through if it
processed all possible strings of length three. Now, look at figure 3.
    [Figure 3 - Computation Tree: every sequence of states the machine of
    figure 2 can pass through while processing the strings of length three,
    starting from s0.]
In processing the string abb, the machine ends up in s2, but it can get there in
two different ways. These are:

    s0, s1, s0, s2
    s0, s3, s2, s2

Likewise, the machine can end up in s3 after processing the string aaa. But since
the automaton is nondeterministic, it can have the option of ending up in
several states after processing an input. For example, after aba, the set of states
{s1, s2, s3} is reached.
In fact, a set of states is reached by the machine after processing an input. This
set depends upon the choices the automaton was offered along the way. And, if
a final state was in the set reached by the machine, we accept. In the above
example only the strings aba and aaa can be accepted because they were the
only strings which took the automaton into s3.
This gives a definition of δ* as the set of states reached by the automaton and
the tapes accepted as:

    T(M) = { x | δ*(S0, x) ∩ F ≠ ∅ }
On the other hand, instead of extending δ to strings as with the deterministic
case, we shall discuss sequences of state transitions under the state transition
indicator δ. The following formal definition of acceptance merely states that a
string is accepted by a nondeterministic finite automaton if there is a sequence
of states (or path through the computation tree) which leads from a starting
state to a final state under the input string. (Note that we do not discuss how a
machine might find this sequence - that does not matter! We just say that the
input is accepted if there exists such a sequence of states.)
Definition. The input x = x1 ... xn is accepted by the nondeterministic
finite automaton M = (S, I, δ, S0, F) if and only if there is a sequence of
states s_k1, s_k2, ..., s_k(n+1) where:

    a) s_k1 ∈ S0,
    b) for each i ≤ n: s_k(i+1) ∈ δ(s_ki, x_i), and
    c) s_k(n+1) ∈ F.
With this definition of acceptance in mind, we define the set of tapes accepted
by a nondeterministic finite automaton (denoted T(M) as before) as exactly
those inputs for which there is a sequence of states leading from a starting
state to an accepting state.
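Although no machine need ever find such a sequence, a program checking for
its existence is easy: carry along the whole set of states reachable so far. A
Python sketch, tried on the machine of figure 2:

    def nfa_accepts(delta, starts, finals, string):
        # Accept iff some path under delta leads from a starting state
        # to a final state while reading the string.
        current = set(starts)
        for a in string:
            current = {t for s in current for t in delta.get((s, a), set())}
        return bool(current & set(finals))

    # The machine of figure 2: delta(0,a) = {1,3}, and so on.
    delta = {(0,'a'): {1,3}, (0,'b'): {2}, (1,'a'): {3}, (1,'b'): {0},
             (2,'a'): {2}, (2,'b'): {2}, (3,'a'): {3}, (3,'b'): {2}}
    assert nfa_accepts(delta, {0}, {0, 3}, 'aaa')       # can reach s3
    assert not nfa_accepts(delta, {0}, {0, 3}, 'abb')   # ends only in s2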
Nondeterminism is useful because it allows us to define very simple machines
which perform certain operations. As an example, let's revisit the union closure
problem. As usual, suppose we have the two finite automata M1 = (S, I, δ, s0, F)
and M2 = (Q, I, γ, q0, G) and wish to build a machine which accepts the union of
the sets they accept. Our new union machine contains the states of both M1 and
M2 plus a new starting state. This new starting state leads into the states of
either M1 or M2 in a nondeterministic manner. That is, under the first input
symbol, we would advance to an appropriate state in M1 or an appropriate state
in M2. Formally, if I = {0,1}, the transition indicator of the union machine is μ,
and its starting state is s', then:

    μ(s', 0) = {δ(s0, 0), γ(q0, 0)}
    μ(s', 1) = {δ(s0, 1), γ(q0, 1)}

The rest of the transition relation μ is just a combination of δ and γ, and the
state graph for the new union machine might resemble:

    [the new start state s' branching into the state graph of M1 and the
    state graph of M2]
Thus the union machine is M = (S ∪ Q ∪ {s'}, I, μ, s', F ∪ G). Acceptance takes
place whenever there is a sequence to an accepting state through either the state
graph of M1 or that of M2.
Let's try another closure property and build a nondeterministic machine which
realizes it. This will be a string property called concatenation or juxtaposition.
For strings this is easy, the concatenation of x and y is xy. If we have the sets A
and B then the concatenation of them is defined as:
    AB = { xy | x ∈ A and y ∈ B }.
If A and B can be accepted by the deterministic finite automata M1 = (S, I, δ, s0,
F) and M2 = (Q, I, γ, q0, G) we need to try and construct a machine M3 which will
accept AB. Let us look at:

    M3 = (S ∪ Q, I, μ, {s0}, G)
and define μ so that M3 accepts AB. Our strategy will be to start out processing
as M1 might and then when M1 wishes to accept a part of the input, switch to
M2 and continue on. With any luck we will end up in G, the final state set of
M2. Nondeterminism will be used to make the change from M1 to M2. The
transition relation μ will operate just like δ on the state set S and like γ on the
state set Q except for the final states F ⊆ S. There it will include a transition to
the starting state q0 of M2. One might picture this as:

    [M1's state graph, with its final states F inside S, feeding into the
    starting state q0 of M2's state graph Q]
and define the transition relation precisely as:

    μ(s_i, a) = {δ(s_i, a)}          for s_i ∈ S with δ(s_i, a) ∉ F
    μ(s_i, a) = {δ(s_i, a), q0}      for s_i ∈ S with δ(s_i, a) ∈ F
    μ(q_i, a) = {γ(q_i, a)}          for q_i ∈ Q

(That is, whenever a transition would enter a final state of M1, the machine may
jump to q0 instead.)
By the definition of acceptance for nondeterministic machines, M3 will accept a
string if and only if there is a sequence of states (under the direction of μ) which
leads from s0 to a state in G. Suppose z is a member of AB. Then:

    z ∈ AB  iff  z = xy where x ∈ A and y ∈ B
            iff  x ∈ T(M1) and y ∈ T(M2)
            iff  δ*(s0, x) ∈ F and γ*(q0, y) ∈ G
This means that there is a sequence of states

    s_k1, s_k2, ..., s_kn

in S from s_k1 = s0 to s_kn ∈ F under δ and x. Also, there is a sequence of states
in Q

    q_k1, q_k2, ..., q_km

from q_k1 = q0 to q_km ∈ G under γ and y. Since μ is defined to be just like δ on
S and like γ on Q, these sequences of states in S and Q exist under the influence
of μ and x and of μ and y. We now note that instead of going to the last state s_kn
in the sequence in S, μ could instead have transferred control to q0. Thus there
is a sequence:

    s0, s_k2, ..., s_k(n-1), q0, q_k2, ..., q_km

under μ and z = xy which proceeds from s0 to G. We can now claim that

    T(M3) = AB = T(M1)T(M2)

noting that the only way to get from s0 to a state in G via μ is via a string in AB.
A final note on the machine M3 which was just constructed to accept AB. If A
contains the empty word then of course s0 is a final state and thus q0 also must
be a member of the starting state set. Also, note that if B contains the empty
word, then the final states of M3 must be F ∪ G rather than merely G.
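Here is the concatenation construction as a Python sketch (deterministic
machines in, a nondeterministic machine out; the empty-word adjustments of
the last paragraph are included). The result can be fed to an acceptance checker
like the one sketched earlier.

    def concat_machine(m1, m2):
        # Nondeterministic machine for T(M1)T(M2): run delta as usual, but
        # whenever a transition would enter F, allow a jump to q0 instead.
        S, I, delta, s0, F = m1
        Q, _, gamma, q0, G = m2
        mu = {}
        for s in S:
            for a in I:
                t = delta[(s, a)]
                mu[(s, a)] = {t, q0} if t in F else {t}
        for q in Q:
            for a in I:
                mu[(q, a)] = {gamma[(q, a)]}
        starts = {s0} | ({q0} if s0 in F else set())      # empty word in A
        finals = set(G) | (set(F) if q0 in G else set())  # empty word in B
        return (set(S) | set(Q), I, mu, starts, finals)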
Now we shall move along and look at multiple concatenation. Suppose we
concatenated the same set together several times. Putting this more formally
and in superscript notation, let:
    A^0 = {ε}
    A^1 = A
    A^2 = AA
    A^3 = AA^2 = AAA
and so forth. The general scheme for this is:
    A^0 = {ε}
    A^(i+1) = A^i A
To sum everything up, we consider the union of this infinite sequence of
concatenations and call it the Kleene closure of the set A. Here is the definition.

Definition. The Kleene Closure (written A*) of the set A is the union of A^k
over all integers k ≥ 0.

This operator is known as the Kleene Star Operator and so A* is pronounced A
star. One special case of this operator's use needs to be mentioned. If A is the
set {a, b} (in other words: an alphabet), then A* is the set of all strings over the
alphabet. This will be handy.
To accept the Kleene closure of a set accepted by a deterministic finite
automaton a construction similar to that used above for concatenation works
nicely. The strategy we shall use is to allow a reset to the starting state each
time a final state is entered. (That is, start over whenever a string from the
original set could have ended.) We shall present the construction below and
leave the proof of correctness as an exercise.
For a deterministic finite automaton M = (S, I, δ, s0, F) we must build a
(nondeterministic) machine which accepts [T(M)]*. As we mentioned, our
strategy will be to allow the transition relation to reset to s0 whenever a final
state is reached. First we shall introduce a new starting state s' and the
following transition relation μ for all a ∈ I:

    μ(s', a) = μ(s0, a)              (s' acts just like s0)
    μ(s_i, a) = {δ(s_i, a)}          for s_i ∈ S with δ(s_i, a) ∉ F
    μ(s_i, a) = {s0, δ(s_i, a)}      for s_i ∈ S with δ(s_i, a) ∈ F
The machine which accepts [T(M)]* is M' = (S ∪ {s'}, I, μ, {s'}, F ∪ {s'}).

Now that we have seen how useful nondeterministic operation can be in the
design of finite automata, it is time to ask whether they have more power than
their deterministic relatives.
Theorem 3. The class of sets accepted by finite automata is exactly the
same class as that accepted by nondeterministic finite automata.

Proof Sketch. Two things need to be shown. First, since deterministic
machines are also nondeterministic machines (which do not ever make
choices) then the sets they accept are a subclass of those accepted by
nondeterministic automata.
To show the other necessary relation we must demonstrate that every set
accepted by a nondeterministic machine can be accepted in a
deterministic manner. We shall bring forth again our pebble automaton
model.
Let M_n = (S, I, δ, S0, F) be an arbitrary nondeterministic finite automaton.
Consider its state graph. Now, place a pebble on each state in S0. Then,
process an input string under the transition relation δ. As δ calls for
transitions from state to state, move the pebbles accordingly. Under an
input a, with a pebble on s_i, we pick up the pebble from s_i and place
pebbles on every state in δ(s_i, a). This is done for all states which have
pebbles upon them. Indeed, this is parallel processing with a vengeance!
(Recall the computation tree of figure 3, we are just crawling over it with
pebbles - or processors.) After the input has been processed, we accept if
any of the final states have pebbles upon them.

Since we moved pebbles on all of the paths M_n could have taken from a
starting state to a final state (and we did not miss any!), we should accept
whenever M_n does. And, also, if we accept then there was indeed a path
from a starting state to a final state. Intuitively it all seems to work.
But, can we define a deterministic machine which does the job? Let's try.
First let us define some states which correspond to the pebbled states of
M_n. Consider:

    q1 means a pebble upon s0
    q2 means a pebble upon s1
    q3 means pebbles upon s0 and s1
    q4 means a pebble upon s2
    q5 means pebbles upon s0 and s2
      ...
    q_(2^(n+1)-1) means pebbles upon s0, s1, ..., s_n
This is our state set Q. The starting state is the q_k which means pebbles
on all of S0. Our set of final states (G) includes all q_i which have a pebble
on a state of F. All that is left is to define the transition function γ and we
have defined a deterministic finite automaton M_d = (Q, I, γ, q_k, G) which
should accept the same set as that accepted by M_n.
This is not difficult. Suppose q_i means pebbles upon all of the states in
s' ⊆ S. Now we take the union of δ(s_k, a) over all s_k ∈ s' and find the q_j
which means pebbles on all of these states. Then we define γ(q_i, a) = q_j.

The remainder of the proof involves an argument that M_n and M_d accept
the same set.
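The pebble argument is exactly the usual subset construction, and it is short in
code. This sketch builds only the pebble configurations actually reachable from
the start, rather than all 2^(n+1) - 1 of them:

    def subset_construction(delta, starts, finals, alphabet):
        # Convert a nondeterministic machine into a deterministic one whose
        # states are frozensets of pebbled states.
        start = frozenset(starts)
        states, gamma, todo = {start}, {}, [start]
        while todo:
            P = todo.pop()
            for a in alphabet:
                # Move every pebble in P under symbol a.
                R = frozenset(t for s in P for t in delta.get((s, a), set()))
                gamma[(P, a)] = R
                if R not in states:
                    states.add(R)
                    todo.append(R)
        G = {P for P in states if P & set(finals)}   # a pebble sits on F
        return (states, alphabet, gamma, start, G)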
With this theorem and our previous constructions firmly in mind, we now state
the following.
Corollary. The class of sets accepted by finite automata is closed under
concatenation and Kleene closure.
A cautionary remark must be made at this time. It is rather wonderful to be
able to use nondeterminism in our machine constructions. But, when these
machines are converted to deterministic finite automata via the above
construction, there is an exponential state explosion! A four state machine
would have fifteen states when converted to a deterministic machine. Imagine
what a 100 state machine would look like when converted! Fortunately most of
these machines can be reduced to a more manageable size.
Regular Sets and Expressions
Finite automata are important in science, mathematics, and engineering.
Engineers like them because they are superb models for circuits. (And, since the
advent of VLSI systems, sometimes finite automata are circuits!) Computer
scientists adore them because they adapt very nicely to algorithm design, for
example the lexical analysis portion of compiling and translation.
Mathematicians are intrigued by them too due to the fact that there are several
nifty mathematical characterizations of the sets they accept. This is partially
what this section is about.
We shall build expressions from the symbols 0, 1, +, and * using the operations
of union, concatenation, and Kleene closure. Several intuitive examples of our
notation are:

    a) 01 means a zero followed by a one (concatenation)
    b) 0+1 means either a zero or a one (union)
    c) 0* means ε + 0 + 00 + 000 + ... (Kleene closure)
With parentheses we can build larger expressions. And we can associate
meanings with our expressions. Here's how:
    Expression        Set Represented
    (0+1)*            all strings over {0,1}
    0*10*10*          strings containing exactly two ones
    (0+1)*11          strings which end with two ones
That is the intuitive approach to these new expressions or formulas.
Now for a precise, formal view. Several definitions should do the job.
Definition. 0, 1, ε, and ∅ are regular expressions.

Definition. If α and β are regular expressions, then so are
(αβ), (α + β), and (α)*.
OK, fine. Regular expressions are strings put together with zeros, ones,
epsilons, stars, plusses, and matched parentheses in certain ways. But why did
we do it? And what do they mean? We shall answer this with a list of what
various general regular expressions represent. First, let us define what some
specific regular expressions represent.
    a) 0 represents the set {0}
    b) 1 represents the set {1}
    c) ε represents the set {ε} (the empty string)
    d) ∅ represents the empty set

Now for some general cases. If α and β are regular expressions representing the
sets A and B, then:

    a) (αβ) represents the set AB
    b) (α + β) represents the set A ∪ B
    c) (α)* represents the set A*
The sets which can be represented by regular expressions are called regular
sets. When writing down regular expressions to represent regular sets we shall
often drop parentheses around concatenations. Some examples are 11(0 + 1)*
(the set of strings beginning with two ones), 0*1* (all strings which contain a
possibly empty sequence of zeros followed by a possibly null string of ones),
and the examples mentioned earlier. We also should note that {0,1} is not the
only alphabet for regular sets. Any finite alphabet may be used.
From our precise definitions of the regular expressions and the sets they
represent we can derive the following nice characterization of the regular sets.
Then, very quickly we shall relate them to finite automata.
Theorem 1. The class of regular sets is the smallest class containing the
sets {0}, {1}, {ε}, and ∅ which is closed under union, concatenation, and
Kleene closure.
See why the above characterization theorem is true? And why we left out the
proof? Anyway, that is all rather neat but, what exactly does it have to do with
finite automata?
Theorem 2. Every regular set can be accepted by a finite automaton.

Proof. The sets {0}, {1}, {ε}, and ∅ can all be accepted by finite
automata. The fact that the class of sets accepted by finite automata is
closed under union, concatenation, and Kleene closure completes the
proof.
Just from closure properties we know that we can build finite automata to
accept all of the regular sets. And this is indeed done using the constructions
from the theorems. For example, to build a machine accepting (a + b)a*b, we
design:

    M_a which accepts {a},
    M_b which accepts {b},
    M_(a+b) which accepts {a, b} (from M_a and M_b),
    M_(a*) which accepts a*,
    and so forth

until the desired machine has been built. This is easily done automatically, and
is not too bad after the final machine is reduced. But it would be nice to
have some algorithm for converting regular expressions directly to automata.
The following algorithm for this will be presented in intuitive terms in language
reminiscent of language parsing and translation.
Initially, we shall take a regular expression and break it into subexpressions. For
example, the regular expression (aa + b)*ab(bb)* can be broken into the three
subexpressions: (aa + b)*, ab, and (bb)*. (These can be broken down later on in
the same manner if necessary.) Then we number the symbols in the expression
so that we can distinguish between them later. Our three subexpressions now
are: (a1a2 + b1)*, a3b2, and (b3b4)*.
Symbols which lead an expression are important as are those which end the
expression. We group these in sets named FIRST and LAST. These sets for our
subexpressions are:

    Expression        FIRST        LAST
    (a1a2 + b1)*      a1, b1       a2, b1
    a3b2              a3           b2
    (b3b4)*           b3           b4

Note that since the first subexpression contained a union there were two
symbols in its FIRST set. The FIRST set for the entire expression is: {a1, a3, b1}.
The reason that a3 was in this set is that since the first subexpression was
starred, it could be skipped and thus the first symbol of the next subexpression
could be the first symbol for the entire expression. For similar reasons, the
LAST set for the whole expression is {b2, b4}.

Formal, precise rules do govern the construction of the FIRST and LAST sets.
We know that FIRST(a) = {a} and that we always build FIRST and LAST sets from
the bottom up. Here are the remaining rules for FIRST sets.
Definition. If α and β are regular expressions then:

    a) FIRST(α + β) = FIRST(α) ∪ FIRST(β)
    b) FIRST(α*) = FIRST(α) ∪ {ε}
    c) FIRST(αβ) = FIRST(α) if ε ∉ FIRST(α),
       and FIRST(α) ∪ FIRST(β) otherwise.
Examining these rules with care reveals that the above chart was not quite what
the rules call for since empty strings were omitted. The correct, complete chart
is:

    Expression        FIRST            LAST
    (a1a2 + b1)*      a1, b1, ε        a2, b1, ε
    a3b2              a3               b2
    (b3b4)*           b3, ε            b4, ε
Rules for the LAST sets are much the same in spirit and their formulation will
be left as an exercise.
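The FIRST rules run nicely on a small expression tree. A Python sketch (the
tree representation with tags '+', 'cat', and '*' is ours, and rule (c) is
implemented exactly as stated above):

    EPSILON = 'eps'

    def first(e):
        # FIRST of an expression tree: a symbol string, ('+', a, b),
        # ('cat', a, b), or ('*', a), following the rules of the definition.
        if isinstance(e, str):                 # a single numbered symbol
            return {e}
        if e[0] == '+':
            return first(e[1]) | first(e[2])
        if e[0] == '*':
            return first(e[1]) | {EPSILON}
        if e[0] == 'cat':
            f = first(e[1])
            # rule (c): add the right part's FIRST only if the left part
            # can be skipped (its FIRST contains the empty string)
            return f if EPSILON not in f else f | first(e[2])

    # (a1 a2 + b1)* has FIRST = {a1, b1, eps}, matching the chart.
    expr = ('*', ('+', ('cat', 'a1', 'a2'), 'b1'))
    assert first(expr) == {'a1', 'b1', EPSILON}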
One more notion is needed, the set of symbols which might follow each symbol
in any strings generated from the expression. We shall first provide an example
and explain in a moment.
    Symbol    a1    a2            a3    b1            b2    b3    b4
    FOLLOW    a2    a1, a3, b1    b2    a1, a3, b1    b3    b4    b3
Now, how did we do this? It is almost obvious if given a little thought. The
FOLLOW set for a symbol is all of the symbols which could come next. The
algorithm goes as follows. To find FOLLOW(a), we keep breaking the expression
into subexpressions until the symbol a is in the LAST set of a subexpression.
Then FOLLOW(a) is the FIRST set of the next subexpression. Here is an example.
Suppose that we have αβ as our expression and know that a ∈ LAST(α). Then
FOLLOW(a) = FIRST(β). In most cases, this is the way we compute FOLLOW
sets.
But, there are three exceptions that must be noted.

    1) If an expression of the form aγ* appears in α then we must also
       include the FIRST set of this starred subexpression γ.
    2) If α is of the form (γ)* then FOLLOW(a) also contains α's FIRST set.
    3) If the subexpression to the right of α has an ε in its FIRST set, then we
       keep on to the right unioning FIRST sets until we no longer find an ε
       in one.
Another example. Let's find the FOLLOW set for b1 in the regular expression
(a1 + b1a2*)*b2*(a3 + b3). First we break it down into subexpressions until b1
is in a LAST set. These are:

    (a1 + b1a2*)*        b2*        (a3 + b3)
Their FIRST and LAST sets are:
    Expression        FIRST          LAST
    (a1 + b1a2*)*     a1, b1, ε      a1, b1, a2, ε
    b2*               b2, ε          b2, ε
    (a3 + b3)         a3, b3         a3, b3
Since b1 is in the LAST set of a subexpression which is starred we place
that subexpression's FIRST set {a1, b1} into FOLLOW(b1). Since a2* came after
b1 and was starred we must include a2 also. We also place the FIRST set of the
next subexpression (b2*) in the FOLLOW set. Since that set contained an ε, we
must put the next FIRST set in also. Thus in this example, all of the FIRST sets
are combined and we have:

    FOLLOW(b1) = {a1, b1, a2, b2, a3, b3}
Several other FOLLOW sets are:

    FOLLOW(a1) = {a1, b1, b2, a3, b3}
    FOLLOW(b2) = {b2, a3, b3}
After computing all of these sets it is not hard to set up a finite automaton for
any regular expression. Begin with a state named s0. Connect it to states
denoting the FIRST sets of the expression. (By sets we mean: split the FIRST set
into two parts, one for each type of symbol.) Our first example
(a1a2 + b1)*a3b2(b3b4)* provides:

    [a graph fragment: s0 with an a-edge to a state labeled a1,3 and a
    b-edge to a state labeled b1]
Next, connect the states just generated to states denoting the FOLLOW sets of
all their symbols. Again, we have:

    [the graph grown by one level: states for the FOLLOW sets of the
    symbols in a1,3 and b1, including a state labeled b2]
Continue on until everything is connected. Any edges missing at this point
should be connected to a rejecting state named s_r. The states containing
symbols in the expression's LAST set are the accepting states. The complete
construction for our example (aa + b)*ab(bb)* is:

    [the complete state graph, with states s0, a1,3, b1, b2, b3, b4, and the
    rejecting state s_r]
This construction did indeed produce an equivalent finite automaton, and in
not too inefficient a manner. Though if we note that b2 and b4 are basically the
same, and that b1 and a2 are similar, we can easily streamline the automaton to:

    [the streamlined state graph, with b2 and b4 merged into a single state
    b2,4 and b1 merged with a2]
Our construction method provides the following for our final example:

    [state graph for the final example]

There is a very simple equivalent machine. Try to find it!
We now close this section with the equivalence theorem concerning finite
automata and regular sets. Half of it was proven earlier in the section, but the
translation of finite automata into regular expressions remains. This is not
included for two reasons. First, that it is very tedious, and secondly that nobody
ever actually does that translation for any practical reason! (It is an interesting
demonstration of a correctness proof which involves several levels of iteration
and should be looked up by the interested reader.)
Theorem 3. The regular sets are exactly those sets accepted by finite
automata.
Decision Problems for Finite Automata
Now we wish to examine decision problems for the sets accepted by finite
automata (or the regular sets). When we tried to decide things concerning the
r.e. sets we were disappointed because everything nontrivial seemed to be
unsolvable. Our hope in defining finite automata as a much weaker version of
Turing machines was to gain solvability at the expense of computational power.
Let us see if we have succeeded. Our first result indicates that we have.
Theorem 1. Membership is solvable for the regular sets.

Proof. This is very easy indeed. With a universal Turing machine we can
simulate any finite automaton. To decide whether it accepts an input we
need just watch it for a number of steps equal to the length of that input.
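Deciding membership this way amounts to simply running the automaton on
the input. Here is a minimal sketch in Python; the transition function delta,
the start state s0, and the set of final states are assumptions standing in for
whatever machine is at hand.

    def accepts(delta, s0, final, x):
        # delta: dictionary mapping (state, symbol) to the next state
        state = s0
        for symbol in x:
            state = delta[(state, symbol)]
        return state in final

    # A two-state machine accepting strings with an even number of a's.
    delta = {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 0, (1, 'b'): 1}
    print(accepts(delta, 0, {0}, 'abba'))   # True

The loop runs for exactly one step per input symbol, just as the proof
requires.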
So far, so good. In order to decide things a bit more intricate than membership
though, we need a very useful technical lemma which seems very strange
indeed until we begin to use it. It is one of the most important results in finite
automata theory and it provides us with a handle on the finiteness of finite
automata. It tells us that only the information stored in the states of a finite
automaton is available during computation. (This seems obvious, but the proof
of the following lemma points this out in a very powerful manner.)
Lemma (Pumping). Let M = (S, I, δ, s0, F) be a finite automaton. Then for
any string x accepted by M whose length is no less than the size of S, there
are strings u, v, and w (over the alphabet I) such that:
a) x = uvw,
b) v is not the empty string, and
c) for all k ≥ 0, uv^k w ∈ T(M).
Proof. Let x = x1 ... xn. Suppose that x is accepted by the finite
automaton M and has length n where n is not smaller than the size of S,
the state set of M. Then as M processes x, it goes through the sequence of
states:

    s_j1, ..., s_j(n+1)

where M is in state s_ji as it reads xi, and:
a) s_j1 = s0,
b) for each i ≤ n, δ(s_ji, xi) = s_j(i+1), and
c) s_j(n+1) ∈ F.

Since M has no more than n states (our initial assumption about the
length of x not being less than the number of M's states), at least one of
the states in the sequence must be repeated because there are n+1 states
in the sequence. We shall assume that this repeat occurs at s_ja and s_jb
where a < b.
Now let's consider the string x_a ... x_(b-1), the portion of the input which is
processed by M as it goes through the sequence of states s_ja, ..., s_jb. We
shall say that v is this substring of x and note that since a < b, v ≠ ε. Now
we shall assign the remaining characters of x to u and w. The following
picture illustrates this.

    u = x_1 ... x_(a-1),   v = x_a ... x_(b-1),   w = x_b ... x_n
It is obvious that uvw = x. And, when M processes x:

a) δ*(s_j1, u) = δ*(s0, u) = s_ja             [since s_j1 = s0]
b) δ*(s_ja, v) = s_jb = s_ja                  [since s_jb = s_ja]
c) δ*(s0, uv) = s_jb = s_ja                   [same reason]
d) δ*(s_jb, w) = δ*(s_ja, w) = s_j(n+1) ∈ F   [same again]
In other words, M enters and leaves the substring v in the same state. Now
we shall examine exactly what this means. If we were to omit v and
process uw, M would leave u in s_ja = s_jb and finish w in s_j(n+1) just as
before. Thus uw is in T(M). If we were to make M process uvvw then M
would leave uv in s_jb = s_ja, leave uvv in s_jb and finish w in the same
state as before. Thus uvvw ∈ T(M). In fact, no matter how many times we
add another v between u and w, M always leaves and enters each v in s_ja
and therefore finishes the entire input in the same final state. Thus for
any k ≥ 0, uv^k w ∈ T(M).
If we go back and examine our proof of the pumping lemma, we find that we
can prove something a little more powerful. In fact, something that will come
in handy in the future. Something which will make our lives much more
pleasing. Here it is.
Corollary. The substring v of x can be forced to reside in any portion of x
which is at least as long as the number of states of the automaton.

Proof. Just note that in any substring of x which is no shorter than the
number of states, we can find a repeated state while processing. This
provides a v and the proof proceeds as before.
This technical lemma (referred to as the pumping lemma from now on) is one of
the most useful results in theoretical work involving finite automata and the
regular sets. It is the major tool used to
a) detect non-regular sets, and
b) prove decision problems solvable for finite automata.
The usefulness of the pumping lemma comes from the fact that it dramatically
points out one of the major characteristics of finite automata, namely that they
have only a finite amount of memory. Another way to state this is to say that if
a finite automaton has n states, then it can only remember n different things! In
fact, if δ*(s0, x) = δ*(s0, y) then the machine with the transition function δ
cannot tell the difference between x and y. They look the same to the machine
since they induce the same last state in computations involving them. And, if a
finite automaton accepts a very long string, then chances are that this string
contained repetitive patterns.
Our first use of the pumping lemma will be to present a non-regular set. This is
the favorite example for computer science theorists and illustrates the method
almost always used to prove sets non-regular.
Theorem 2. The set of strings of the form {0^n 1^n} for any n ≥ 0 is not a
regular set.

Proof. Assume that the set of strings of the form 0^n 1^n is a regular set
and that the finite automaton M = (S, I, δ, s0, F) accepts it. Thus every
string of the form 0^k 1^k for k larger than the size of the state set S will be
accepted by M.
If we take one of these strings for some k > |S| then the pumping lemma
assures us that there are strings u, v, and w such that:

a) uvw = 0^k 1^k,
b) v ≠ ε, and
c) for all n ≥ 0, uv^n w ∈ T(M).

Invoking the corollary to the pumping lemma assures us that the
substring v can be in the midst of the 0's if we wish. Thus v = 0^m for
some (nonzero) m ≤ k. This makes uw = 0^(k-m) 1^k and the pumping lemma
states that uw ∈ T(M). Since uw is not of the form 0^n 1^n we have a
contradiction. Thus our assumption that the strings of the form 0^n 1^n can be
accepted by a finite automaton and form a regular set is incorrect.
As we mentioned earlier, almost all of the proofs of non-regularity involve the
same technique used in the proof of the last theorem. One merely needs to
examine the position of v in a long string contained in the set. Then either
remove it or repeat it several times. This will always produce an improper
string!
Next, we shall use the deflation aspect of the pumping lemma in order to show
that emptiness is solvable for regular sets.
Theorem 3. If a finite automaton accepts any strings, it will accept one of
length less than the size of its state set.

Proof. Let M be a finite automaton which accepts the string x, where
the length of x is no less than the size of M's state set. Assume further
that M accepts no strings shorter than x. (This is the opposite of our
theorem.)

Immediately the pumping lemma asserts that there are strings u, v, and w
such that uvw = x, v ≠ ε, and uw ∈ T(M). Since v ≠ ε, uw is shorter than
uvw = x. Thus M accepts shorter strings than x and the theorem follows.
Here is a sequence of corollaries which follow from the last theorem. In each
case the proof merely involves checking membership for all strings of length
less than the size of the state set for some finite automaton or the machine
which accepts the complement of the set it accepts. (Recall that the class of
regular sets, namely those accepted by finite automata is closed under
complement.)
Corollary (Emptiness Problem). Whether or not a finite automaton
accepts anything is solvable.

Corollary (Emptiness of Complement). Whether or not a finite automaton
rejects anything is solvable.

Corollary. Whether or not a finite automaton accepts everything is
solvable.
Another cautionary note about these decision problems is in order. It is quite
refreshing that many things are solvable for the regular sets and it is wonderful
that several solvable decision problems came immediately from one lemma and
a theorem. But, it is very expensive to try and solve these problems. If we need
to look at all of the input strings of length less than the size of the state set
(let's say that it is of size n) then we are looking at almost 2^n strings! Imagine
how long this takes when n is equal to 100 or so!
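The brute-force procedure behind these corollaries is easy to sketch. The
Python illustration below reuses the accepts function from the membership
sketch above (our own assumed helper, not anything from the text): it decides
emptiness by trying every string of length less than the number of states, and
finiteness by trying every string of length between the number of states and
twice that number, per the next theorem.

    from itertools import product

    def is_empty(delta, s0, final, alphabet, n_states):
        # Theorem 3: nonempty iff some string shorter than n_states is accepted
        for length in range(n_states):
            for x in product(alphabet, repeat=length):
                if accepts(delta, s0, final, x):
                    return False
        return True

    def is_finite(delta, s0, final, alphabet, n_states):
        # Theorem 4: infinite iff some accepted string has length in [n, 2n)
        for length in range(n_states, 2 * n_states):
            for x in product(alphabet, repeat=length):
                if accepts(delta, s0, final, x):
                    return False
        return True

Both loops examine on the order of |alphabet|^n strings, which is exactly the
exponential cost the text warns about.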
Flushed with success we shall attempt (and succeed) at another decision
problem which is unsolvable for the class of recursively enumerable sets.
Theorem 4. Whether a regular set is finite is solvable.

Proof Sketch. We know that if a finite automaton accepts any strings at
all then some will be of length less than the size of the state set. (This
does not help directly, but it gives a hint as to what we need for this
theorem.) The deletion aspect [uw ∈ T(M)] of the pumping lemma was
used to prove this. Let's use the inflation aspect [uv^n w ∈ T(M)] of the
lemma to look for an infinite set.

For starters, if we were to find a string accepted by a finite automaton M
which was longer than or equal to the size of its state set, we could use
the aforementioned inflation aspect of the pumping lemma to show that
machine M must accept an infinite number of strings. This means that:

    a finite automaton accepts only strings of length less than the
    size of its set of states, if and only if it accepts a finite set.

Thus, to solve the finiteness problem for M = (S, I, δ, s0, F), we need to
determine whether or not:

    T(M) - {strings of length < |S|} = ∅.

A question now arises as to how many inputs we must examine in order
to tell if M will accept an input longer than or equal to the size of its state
set. The answer is that we only need to consider input strings up to twice
the size of the state set. (The proof of this is left as an exercise, but it is
much the same as the proof of the emptiness problem.) This ends our
proof sketch.
The next decision problem is included because it demonstrates another
technique; using old problems to solve a new one. We see this in the areas of
unsolvability and complexity when we use reducibility to show problems
unsolvable or intractable. Here we show that the problem of set equivalence is
reducible to the emptiness problem and thus solvable.
Theorem 5. Whether two regular sets are identical is solvable.
Proof. Let's take two finite automata (M1 and M2) and examine a picture
of the sets they accept.

[Venn diagram of T(M1) and T(M2) omitted.]

If the intersection is the same as both sets then indeed they are identical.
Or, on the other hand, if the areas outside the intersection are empty then
both sets are identical. Let's examine these outside areas. They are
T(M1) ∩ ¬T(M2) and T(M2) ∩ ¬T(M1). Since the regular sets are closed
under intersection and complement, these areas are regular sets as well,
and T(M1) = T(M2) exactly when their union is empty. Applying the
emptiness test to a finite automaton accepting this union completes the
proof.

Regular Languages

[Derivation tree figure omitted.]

Figure 2 - Regular Grammar Derivation Tree
There is not much in the way of complicated structure here. It is exactly the
sort of thing that a finite automaton could handle. In fact, let us now turn the
grammar into a finite automaton.
In order to do this conversion we must analyze how a constant is assembled.
We know that the derivation begins with a C so we shall use that as a starting
state and examine what comes next. This is fairly clear. It must be either a
minus sign, a digit, or a dot. Here is a chart.
    Production    Action
    C → -I        look for a minus and an integer
    C → dI        look for a digit and an integer
    C → .F        look for a dot and a fraction
    C → d         look for a digit and quit

Our strategy will be to go from the state which refers to the part of the constant
we're building (i.e. the nonterminal) to the next part of the constant. This is a
machine fragment which covers the beginning productions of our grammar for
constants.
[Machine fragment: from state C, inputs - and d lead to state I, input d also
leads to the accepting state OK, and input . leads to state F.]
We should note that we are building a nondeterministic finite automaton. The
above fragment had a choice when it received a digit (d) as input. To continue,
we keep on connecting the states (nonterminals) of the machine according to
productions in the grammar just as we did before. When everything is
connected according to the productions, we end up with the state diagram of a
machine that looks like this:
[State diagram: states C, I, F, A, X, and OK, connected according to the
productions, with transitions on d, ., +, -, and E.]
At this point we claim that a derivation from our grammar corresponds to a
computation of the machine. Also we note that there are some missing
transitions in the state graph. These were the transitions for symbols which
were out of place or errors. For example: strings such as --112, 145.-67, 0.153E-
7+42 and the like. Since this is a nondeterministic machine, these are not
necessary but we shall invent a new state R which rejects all incorrect inputs
and put it in the state table below. The states H and R are the halting (we used
OK above) and rejecting states.
                            Inputs
    State    d       .      -      +      E      Accept?
    C        H, I    F      I      R      R      no
    I        H, I    F      R      R      R      no
    F        H, F    R      R      R      A      no
    A        R       R      X      X      R      no
    X        H, X    R      R      R      R      no
    H        R       R      R      R      R      yes
    R        R       R      R      R      R      no
From the above example it seems that we can transform type 3 or regular
grammars into finite automata in a rather sensible manner. (In fact, a far easier
way than changing regular expressions into finite automata.) It is now time to
formalize our method and prove a theorem.
Theorem 2. Every regular (type 3) language is a regular set.

Proof Sketch. Let G = (N, T, P, S) be a type 3 grammar. We must
construct a nondeterministic finite automaton which will accept the
language generated by G. As in our example, let's use the
nonterminals as states. Also, we shall add special halting and rejecting
states (H and R). Thus our machine is:

    M = (N ∪ {H, R}, T, δ, S, {H})

which accepts the strings in L(G).

The transition relation δ of M is defined as follows for all states (or
nonterminals) A and C as well as all input symbols (or terminals) b.

a) C ∈ δ(A, b) iff A → bC
b) H ∈ δ(A, b) iff A → b
c) R ∈ δ(A, b) iff it is not the case that A → bC or A → b
d) {R} = δ(R, b) = δ(H, b) for all b ∈ T

We need one more note. If there is a production of the form S → ε then S
must also be a final state.
This is merely a formalization of our previous example. The machine
begins with the starting symbol, and examines the symbols which appear
in the input. If they could be generated, then the machine moves on to
the next nonterminal (or state).
The remainder of the proof is the demonstration that derivations of the
grammar are equivalent to computations of the machine. This is merely
claimed here and the result follows.
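As a quick illustration, here is a Python sketch of the transition relation
built in the proof; the representation of productions as pairs is our own
assumption, nothing more.

    def grammar_to_nfa(productions, nonterminals, terminals):
        # productions: pairs (A, rhs) where rhs is 'bC' (terminal then
        # nonterminal) or just 'b' (a single terminal)
        states = set(nonterminals) | {'H', 'R'}
        delta = {(q, b): set() for q in states for b in terminals}
        for A, rhs in productions:
            if len(rhs) == 2:
                delta[(A, rhs[0])].add(rhs[1])   # rule a: A -> bC
            else:
                delta[(A, rhs[0])].add('H')      # rule b: A -> b
        for q in states:
            for b in terminals:
                if not delta[(q, b)]:
                    delta[(q, b)].add('R')       # rules c and d
        return delta

    # The grammar of figure 3 below: S -> 0S | 1A | 1, A -> 0A | 0
    prods = [('S', '0S'), ('S', '1A'), ('S', '1'), ('A', '0A'), ('A', '0')]
    delta = grammar_to_nfa(prods, {'S', 'A'}, {'0', '1'})
    print(delta[('S', '1')])   # {'A', 'H'}: a nondeterministic choice

Note how the choice in state S on input 1 is exactly the nondeterminism the
text mentions.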
Another example occurs in figure 3. It contains a grammar which generates
0*10* and the finite automaton which can be constructed from it using the
methods outlined in the theorem. Note that since there are choices in both
states A and S, this is a nondeterministic finite automaton.

[State diagram: state S loops on 0 and moves on 1 to either A or the accepting
state OK; A loops on 0 or moves on 0 to OK; all remaining transitions go to
the rejecting state NO.]

    S → 0S
    S → 1A
    S → 1
    A → 0A
    A → 0

Figure 3 - Automaton and Grammar for 0*10*
With this equivalence between the regular sets and the type 3 languages we now
know many more things about this class of languages. In particular we know
about some sets they cannot generate.
Theorem 3. The set of strings of the form a^n b^n cannot be generated by a
regular (type 3) grammar.

Corollary. The regular (type 3) languages are a proper subclass of the
context free (type 2) languages.
Context Free Languages

Now that we know a lot about the regular languages and have some idea just
why they were named that, let us proceed up the Chomsky hierarchy. We come
to the type 2 or context free languages. As we recall, they have productions of
the form A → α where α is a nonempty string of terminals and nonterminals
and A is a nonterminal.
Why not speculate about the context free languages before examining them with
care? We have automata-language pairings for all the other languages. And,
there is only one class of machines remaining. Furthermore, the set of strings
of the form a^n b^n is context free. So, it might be a good bet that the context free
languages have something to do with pushdown machines.

This section will be devoted to demonstrating the equivalence between context
free languages and the sets accepted by pushdown automata. Let's begin by
looking at a grammar for our favorite context free example: the set of strings of
the form a^n b^n.

    S → aSB
    S → aB
    B → b
It is somewhat amusing that this is almost a regular grammar. The extra symbol
on the right hand side of the first production is what makes it nonregular and
that seems to provide the extra power for context free languages.
Now examine the following nondeterministic pushdown machine which reads
a's and pushes B's on its stack and then checks the B's against b's.

    read    pop    push
    a       S      SB
    a       S      B
    b       B
It is obvious that our machine has only one state. It should also be easy to see
that if it is presented with a stack containing S at the beginning of the
computation then it will indeed accept strings of the form a^n b^n by empty stack
(recall that this means that it accepts if the stack is empty when the machine
reaches the end of its input string). There is a bit of business with the S on top
of the stack while we are reading a's. But this is what tells the machine that
we're on the first half of the input. When the S goes away then we cannot read
a's any more.
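Since the machine is nondeterministic (two instructions read an a and pop an
S), simulating it takes a little backtracking. Here is a short Python sketch of
such a simulator for one-state machines accepting by empty stack; the rule
format mirrors the read/pop/push chart above and is our own assumption.

    def accepts(rules, start_stack, s):
        # rules: triples (read, pop, push); the stack top is the list's end
        def run(i, stack):
            if i == len(s):
                return not stack          # accept by empty stack
            if not stack:
                return False              # input left but stack empty
            for read, pop, push in rules:
                if read == s[i] and stack[-1] == pop:
                    # push the string so its first symbol lands on top
                    if run(i + 1, stack[:-1] + list(reversed(push))):
                        return True
            return False
        return run(0, list(start_stack))

    rules = [('a', 'S', 'SB'), ('a', 'S', 'B'), ('b', 'B', '')]
    print(accepts(rules, 'S', 'aabb'))   # True
    print(accepts(rules, 'S', 'aab'))    # False

The recursion tries each applicable instruction in turn, which is exactly the
guessing a nondeterministic machine performs.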
If there are objections to having a symbol on the stack at the beginning of the
computation, we can design the following equivalent two state machine which
takes care of this.

    state    read    pop    push    goto
    1        a              SB      2
    1        a              B       2
    2        a       S      SB      2
    2        a       S      B       2
    2        b       B              2
See what we did? Just took all of the instructions which popped an S and
duplicated them as the first instruction.
Anyway, back to our primary objective. We've seen a grammar and a machine
which are equivalent. And, in fact, the translation was rather simple. The
following chart sums it up.

    Production        Machine (read, pop, push)
    A → bα            read b, pop A, push α
    A → b             read b, pop A, push nothing
So, if we could have grammars with the proper sorts of productions, (namely
right hand sides beginning with terminals), we could easily make pushdown
machines which would accept their languages. We shall do just this in a
sequence of grammar transformations which will allow us to construct
grammars with desirable formats. These transformations shall prove useful in
demonstrating the equivalence of context free languages and pushdown
automata as well as in applications such as parsing.
First we shall consider the simplest kind of production in captivity. This would
be one that has exactly one nonterminal on its right hand side. We shall claim
that we do not need productions of that kind in our grammars. Then we shall
examine some special forms of grammars.

Definition. A chain rule (or singleton production) is one of the form A → B
where both A and B are nonterminals.
Theorem 1. (Chain Rule Elimination). For each context free grammar
there is an equivalent one which contains no chain rules.
The proof of this is left as an exercise in substitution. It is important because it
is used in our next theorem concerning an important normal form for
grammars. This normal form (due to Chomsky) allows only two simple kinds of
productions. This makes the study of context free grammars much, much
simpler.
Theorem 2. (Chomsky Normal Form). Every context free language can be
generated by a grammar with productions of the form A → BC or A → b
(where A, B, and C are nonterminals and b is a terminal).
Proof. We begin by assuming that we have a grammar with no chain
rules. Thus we just have productions of the form A → b and productions
with right hand sides of length two or longer. We need just concentrate
on the latter.

The first step is to turn everything into nonterminal symbols. For a
production such as A → BaSb, we invent two new nonterminals (Xa and
Xb) and then jot down:

    A → B Xa S Xb
    Xa → a
    Xb → b

in our set of new productions (along with those of the form A → b which
we saved earlier).

Now the only offending productions have all nonterminal right hand
sides. We shall keep those which have length two right hand sides and
modify the others. For a production such as A → BCDE, we introduce
some new nonterminals (of the form Zi) and do a cascade such as:

    A → B Z1
    Z1 → C Z2
    Z2 → DE

Performing these two translations on the remaining productions which
are not in the proper form produces a Chomsky normal form grammar.
Here is the translation of a grammar for strings of the form a^n b^n into Chomsky
normal form. Two translation steps are performed. The first changes terminals
to Xi's and the second introduces the Zi cascades which make all productions
the proper length.

    Original     Rename            Cascade
    S → aSb      S → Xa S Xb       S → Xa Z1
                                   Z1 → S Xb
    S → ab       S → Xa Xb         S → Xa Xb
                 Xa → a            Xa → a
                 Xb → b            Xb → b
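The two translation steps are mechanical enough to automate. Below is a
small Python sketch under our own representation (productions as pairs whose
right hand sides are tuples of one-character symbols, lowercase for terminals);
it assumes chain rules are already gone.

    def to_cnf(productions):
        new = []
        # Step 1: rename terminals inside long right hand sides.
        renamed = []
        for lhs, rhs in productions:
            if len(rhs) == 1:
                new.append((lhs, rhs))            # keep A -> b
                continue
            fixed = []
            for sym in rhs:
                if sym.islower():                 # terminal: use X<sym>
                    new.append(('X' + sym, (sym,)))
                    fixed.append('X' + sym)
                else:
                    fixed.append(sym)
            renamed.append((lhs, tuple(fixed)))
        # Step 2: cascade right hand sides longer than two.
        count = 0
        for lhs, rhs in renamed:
            while len(rhs) > 2:
                count += 1
                z = 'Z' + str(count)
                new.append((lhs, (rhs[0], z)))
                lhs, rhs = z, rhs[1:]
            new.append((lhs, rhs))
        return sorted(set(new))

    print(to_cnf([('S', ('a', 'S', 'b')), ('S', ('a', 'b'))]))

Running it on the grammar above yields productions equivalent to the Cascade
column; the sorted set merely removes the duplicate Xa → a and Xb → b
entries.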
Chomsky normal form grammars are going to prove useful as starting places in
our examination of context free grammars. Since they are very simple in form,
we shall be able to easily modify them to get things we wish. In addition, we
can use them in proofs of context free properties since there are only two kinds
of productions.
Although this is still not what we are looking for, it is a start. What we really
desire is another normal form for productions named Greibach normal form.
In this, all of the productions are of the form A → bα where α is a (possibly
empty) string of nonterminals.

Starting with a Chomsky normal form grammar, we need only to modify the
productions of the form A → BC. This involves finding out what strings which
begin with terminals are generated by B. For example: suppose that B generates
the string bβ. Then we could use the production A → bβC. A translation
technique which helps do this is substitution. Here is a formalization of it.
Substitution. Consider a grammar which contains a production of the
form A → Bα where A and B are nonterminals and α is a string of
terminals and nonterminals. Looking at the remainder of the grammar
containing that production, suppose that:

    B → γ1 | γ2 | ... | γn

(where the γi are strings of terminals and nonterminals) is the collection
of all of the productions which have B as the left hand side. We may
replace the production A → Bα with:

    A → γ1α | γ2α | ... | γnα

Our next transformation removes left recursion from a grammar.

Theorem 3. (Left Recursion Removal). For each context free grammar
there is an equivalent one in which no production has the same
nonterminal on its left hand side and at the head of its right hand side.

Proof Sketch. Suppose that

    A → Aα1 | Aα2 | ... | Aαn

are all of the left recursive A productions, and that

    A → β1 | β2 | ... | βm

are the remaining A productions. Then A generates exactly the strings of
the form βj αi*. A new nonterminal A' comes forth to aid in this
endeavor. Thus for all αi and βj we produce:

    A → βj A'
    A' → αi
    A' → αi A'

and add them to our rapidly expanding grammar.

Since neither the βj nor the αi can begin with an A', none of the
productions in the above group are left recursive. Noticing that A does
now generate strings of the form βj αi* completes the proof.
Here is an example. Suppose we begin with the Chomsky normal form
grammar fragment:

    A → AB
    A → AC
    A → DA
    A → a

and divide it into the two groups indicated by the construction in the left
recursion removal theorem. We now have:

    A → AB        A → DA
    A → AC        A → a

The αi mentioned in the proof are B and C. And the βj are DA and a.
Now we retain the A productions and build the three new groups
mentioned in the proof to obtain:

    A → DA        A → DAA'       A' → B        A' → BA'
    A → a         A → aA'        A' → C        A' → CA'
That removes immediate left recursion. But we're not out of the woods quite
yet. There are more kinds of recursion. Consider the grammar:

    A → BD
    B → CC
    C → AB

In this case A can generate ABCD and recursion has once more caused a
difficulty. This must go. Cyclic recursion will be removed in the proof of our
next result, the ultimate normal form theorem. And, as a useful byproduct, this
next theorem provides us with the means to build pushdown machines from
grammars.

Theorem 4. (Greibach Normal Form). Every context free language can be
generated by a grammar with productions of the form A → aα where a is
a terminal and α is a (possibly empty) string of nonterminals.
Proof Sketch. Suppose we have a context free grammar which is in
Chomsky normal form. Let us rename the nonterminals so that they have
a nice ordering. In fact, we shall use the set {A1, A2, ..., An} for the
nonterminals in the grammar we are modifying in this construction.

We first change the productions of our grammar so that the rules with Ai
on the left hand side have either a terminal or some Ak where k > i at the
beginning of their right hand sides. (For example: A3 → A6B, but not
A2 → A1B.) To do this we start with A1 and keep on going until we reach
An, rearranging things as we go. Here's how.
Assume that we have done this for all of the productions which have A1
up through A(i-1) on their left hand sides. Now we take a production
involving Ai. Since we have a Chomsky normal form grammar, it will be
of the form Ai → b, or Ai → Ak B. We need only change the second type of
production if k ≤ i.

If k < i, then we apply the substitution translation outlined above until we
have productions of the form:

    Ai → aα, or
    Ai → Aj α where j ≥ i.

(Note that no more than i-1 substitution steps need be done.) At this
point we can use the left recursion removal technique if j = i and we have
achieved our first plateau.
Now let us see what we have. Some new nonterminals (Ai') surfaced
during left recursion removal and of course we have all of our old
terminals and nonterminals. But all of our productions are now of the
form:

    Ai → aα,  Ai → Ak α,  or  Ai' → α

where k > i and α is a string of old and new nonterminals. (This is true
because we began with a Chomsky normal form grammar and had no
terminal symbols on the inside. This is intuitive, but quite nontrivial to
show!)

An aside. We are in very good shape now because recursion can never
bother us again! All we need do is convince terminal symbols to appear
at the beginning of every production.
The rest is all downhill. We'll take care of the Ai → Ak α productions first.
Start with i = n-1. (The rules with An on the left must begin with terminals
since they do not begin with nonterminals with indices less than n and An
is the last nonterminal in our ordering.) Then go backwards using
substitution until the productions with A1 on the left are reached. Now all
of our productions which have an Ai on the left are in the form Ai → aα.

All that remain are the productions with the Ai' on the left. Since we
started with a Chomsky normal form grammar these must have one of
the Ai at the beginning of the right hand side. So, substitute. We're done.
That was rather quick and seems like a lot of work! It is. Let us put all of the
steps together and look at it again. Examine the algorithm presented in figure 1
as we process an easy example.
GreibachConversion(G)
Pre: G is a Chomsky Normal Form grammar
Post: G is an equivalent Greibach Normal Form grammar

Rename nonterminals as A1, A2, ..., An
for i := 1 to n do
    for each production with Ai on the left hand side do
        while Ai → Ak α and k < i: substitute for Ak
        if Ai → Ai α then remove left recursion
for i := n-1 downto 1 do
    for each production Ai → Ak α: substitute for Ak
for each Ai' → Ak α: substitute for Ak

Figure 1 - Greibach Normal Form Conversion
Consider the Chomsky normal form grammar:

    S → AB
    A → a | SA
    B → b | SB

We order the nonterminals S, A, and B. The first step is to get the right hand
sides in descending order. S → AB is fine, as is A → a. For A → SA we follow the
steps in the first for loop and transform this production as indicated below:

    Original      Substitute     Remove Recursion
    A → SA        A → ABA        A → aA'
                                 A' → BA
                                 A' → BAA'
To continue, B → b is what we want, but B → SB needs some substitution.

    Original      Substitute     Substitute
    B → SB        B → ABB        B → aBB
                                 B → aA'BB

Now we execute the remaining for loops in the algorithm and use substitution
to get a terminal symbol to lead the right hand side on all of our productions.
The right column shows the final Greibach normal form grammar.

    Original       Substitute
    B → aBB        B → aBB
    B → aA'BB      B → aA'BB
    A → a          A → a
    A → aA'        A → aA'
    S → AB         S → aB
                   S → aA'B
    A' → BA        A' → aBBA
                   A' → aA'BBA
    A' → BAA'      A' → aBBAA'
                   A' → aA'BBAA'
Granted, Greibach normal form grammars are not always a pretty sight. But,
they do come in handy when we wish to build a pushdown machine which will
accept the language they generate. If we recall the transformation strategy
outlined at the beginning of the section we see that these grammars are just
what we need. Let's do another example. The grammar:

    S → cAB
    A → a | aBS
    B → b | bSA

can be easily transformed into the one state, nondeterministic machine:

    read    pop    push
    c       S      AB
    a       A
    a       A      BS
    b       B
    b       B      SA
which starts with S on its stack and accepts an input string if its stack is empty
after reading the input.
It should be quite believable that Greibach normal form grammars can be easily
transformed into pushdown machines. The following chart depicts the
formalization for the general algorithm used in this transformation.

    Production        Machine (read, pop, push)
    A → a             read a, pop A, push nothing
    A → aα            read a, pop A, push α
Let's add a bit of ammunition to our belief that we can change grammars into
machines. Examine a leftmost derivation of the string cabcabb by the latest
grammar and compare it to the computation of the pushdown machine as it
recognizes the string.

    Grammar          Machine
    Derivation       Input Read    Stack
    S                              S
    cAB              c             AB
    caBSB            ca            BSB
    cabSB            cab           SB
    cabcABB          cabc          ABB
    cabcaBB          cabca         BB
    cabcabB          cabcab       B
    cabcabb          cabcabb
Note that the leftmost derivation of the string exactly corresponds to the input
read so far plus the content of the machine's stack. Whenever a pushdown
machine is constructed from a Greibach normal form grammar this happens.
Quite handy! This recognition technique is known as top-down parsing and is
the core of the proof of our next result.
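The chart above is one line of code per production. A Python sketch, in the
same rule format as the earlier simulator (our own assumed representation):

    def gnf_to_pda(productions):
        # A -> a alpha becomes the instruction (read a, pop A, push alpha)
        return [(rhs[0], lhs, rhs[1:]) for lhs, rhs in productions]

    prods = [('S', 'cAB'), ('A', 'a'), ('A', 'aBS'), ('B', 'b'), ('B', 'bSA')]
    rules = gnf_to_pda(prods)
    # these rules can be fed to the accepts() simulator sketched earlier:
    # accepts(rules, 'S', 'cabcabb') reports True

Concatenating the input read so far with the stack contents reproduces the
leftmost derivation, just as the table shows.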
Theorem 5. Every context free language can be accepted by a
nondeterministic pushdown automaton.

From examining the constructions which lead to the proof of our last theorem
we could come to the conclusion that one state pushdown machines are
equivalent to context free languages. But of course pushdown automata can
have many states. Using nondeterminism it is possible to turn a multistate
automaton into a one state machine. The trick is to make the machine guess
which state it will be in when it pops each symbol off of its stack. We shall close
this section by stating the last part of the equivalence and leave the proof to a
more advanced text on formal languages.

Theorem 6. Every set accepted by a pushdown automaton can be
accepted by a one state pushdown machine.

Theorem 7. The class of languages accepted by pushdown machines is the
class of context free languages.
Parsing and Deterministic Languages
We have noted the usefulness of finite automata (for circuit models, pattern
recognition, and lexical scanning) and linear bounded automata (as computer
models). But we have not discussed one of the most important areas of
computing, namely translation. Therefore, we shall now turn our attention to
the role of context free languages and pushdown automata in translation,
compiling, and parsing.
As seen in an early example in this chapter, assignment statements may be
generated by context free grammars. In fact, most of the syntax for
programming languages is context free. Thus, if we can construct grammars for
programming languages, we should be able to use pushdown automata to
recognize correct programs and aid in their translation into assembly language.
In order to recognize programs with correct syntax all we need to do is write
down a grammar that is able to generate all correct programs and then build a
pushdown machine that will recognize this language. Actually, we know exactly
how to build the machine. We convert the grammar to Greibach normal form
and jot down the one state pushdown machine that recognizes strings from the
language which are generated by the grammar. Unfortunately, there is one
small problem. Machines produced from Greibach normal form grammars are
often nondeterministic. This removes the utility from the process since we
cannot convert these machines into the kind of programs we are used to
writing. (After all, we normally do not do nondeterministic programming on
purpose.)
Here is a small aside. We should note that we know that the context free
languages are recursive and thus recognizable by programs which always halt.
So if we desired, we could have the program do some backtracking and go
through all possible derivations described by the grammar. But unfortunately
this makes our computation time somewhere between n^2 and n^3 steps when
recognizing strings of length n. And, we really do not want to use the entire
class of context free languages, only the deterministic ones.
This is exactly what we shall do. By resorting to the same strategy we used to
get away from unsolvability we can eliminate nondeterministic languages by
simplifying the grammars we design. Here is the first step.
Definition. A context free grammar is an s-grammar if and only if every
production's right hand side begins with a terminal symbol and this
terminal is different for any two productions with the same left-hand side.
This looks good. We can easily build machines for these grammars and can
count on them being deterministic. Let us try an example. Our old and trusted
friend:
    S → aSb
    S → ab
It is of course not in the necessary form, but with the operation shown in figure
1, we can fix that.
Given a set of productions of the form:

    A → αβ1 | αβ2 | ... | αβn

invent a new nonterminal Z and replace this set by
the collection:

    A → αZ
    Z → β1 | β2 | ... | βn

Figure 1 - Factoring
We shall now try this on the above grammar for a^n b^n. The α is of course just the
terminal symbol a while the β's are Sb and b. After factoring we come up with
the following grammar.

    S → aA
    A → Sb
    A → b

This is not quite what we want, but substituting for S will allow us to write
down the following equivalent s-grammar.

    S → aA
    A → aAb
    A → b
Designing machines for these grammars is something we have done before. For
a production of the form A → aα we just:

    read the a, pop the A, and push α onto the stack.
The machine accepts by empty stack at the end of the input. There is one
minor adjustment that must be made. We do not wish terminal symbols to be
pushed onto the stack. We can prevent this by presenting the s-grammar in
Greibach normal form by turning all terminals that do not begin the right-hand
side into nonterminals. Or, we could modify the pushdown machine so that it
just pops a terminal whenever it is on top of the stack and on the input string.
Thus in designing the pushdown machine, we always push the tail of a
production's right hand side and pop the nonterminal on the left whenever we
read the proper symbol. We also pop terminals when they appear on top of the
stack and under the reading head at the same time.
Let us continue. At times factoring produces productions which are not
compatible with s-grammars. For example, the fragment from a grammar for
arithmetic assignment statements

    E → T | T+E

becomes the following when we factor it.

    E → TZ
    Z → ε | +E

and we can substitute for the T until we get terminals leading all of the right-
hand sides of the productions. Our above example contains something we have
not considered, an epsilon rule (Z → ε). This was not exactly what we wished to
see. In fact it messes up our parser. After all, just how do we read nothing and
pop a Z? And when do we know to do it? The following definitions lead to an
answer to this question.
answer to this question.
Definition. The select set for the production A → aα (which is written
SELECT(A → aα)), where A is a nonterminal and α is a possibly empty
string of terminal and nonterminal symbols, is the set {a}.
Definition. A context free grammar is a q-grammar if and only if every
production's right hand side begins with a terminal symbol or is ε, and
whenever two productions possess the same left-hand side they have
different select sets.
Thus if we have productions with matching left-hand sides, they must behave
like s-grammars, and the select sets guarantee this. Select sets solve that
problem, but what exactly is the select set for an epsilon rule? It should be just
the terminal symbol we expect to see next. If we knew what should follow the A
in the epsilon rule A → ε then we would know when to pop the A without using
up an input symbol. The next definitions quickly provide the precise way to
start setting this up.
Definition. The follow set for the nonterminal A (written FOLLOW(A)) is
the set of all terminals a for which some string αAaβ can be derived from
the starting symbol S. (Where α and β are possibly empty strings of both
terminals and nonterminals.)

Definition. SELECT(A → ε) = FOLLOW(A).
That was not so difficult. We merely apply an epsilon rule whenever a symbol
that follows the left-hand side nonterminal comes along. This makes a lot of
sense. There is one small catch though. We must not advance past this symbol
on the input string at this time. Here is our first example in a new, elegant form
since it now contains an epsilon rule:

    S → ε | aSb

It is easy to see that the terminal symbol b must follow the S since S remains
between the a's and b's until it disappears via the epsilon rule. A machine
constructed from this grammar appears as figure 2.

    read    pop    push    advance?
    a       S      Sb      yes
    b       S              no
    b       b              yes

Figure 2 - A Parser for a q-grammar
Note that technically we are no longer designing pushdown machines. Actually
we have designed a top-down parser. The difference is that parsers can advance
along the input string (or not) as they desire. But we should note that they still
are really pushdown automata.
Since our last example was a bit different than a pushdown automaton, let us
examine the computation it goes through when presented with the string aabb.
In this example, the stack bottom is to the left, and the input symbols that
have already been read are shown to the left of the gap.

    Stack    Input      Action
    S        aabb       Apply S → aSb
    bS       a abb      Apply S → aSb
    bbS      aa bb      Apply S → ε
    bb       aa bb      Verify b
    b        aab b      Verify b
             aabb       Accept
Another way to look at this computation is to note that a leftmost derivation of
aabb took place. In fact, if we were to concatenate the input read and the stack
up to the point where the b's were verified, we would have the following
leftmost derivation.

    S ⇒ aSb ⇒ aaSbb ⇒ aabb

So, a top-down parser executes and verifies a leftmost derivation based on the
input and stack symbols it sees along the way. We shall explore this
phenomenon later.
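A parser like the one in figure 2 is only a few lines of Python. The sketch
below drives the machine from a table keyed on (lookahead, stack top); the
table format is our own assumption.

    def parse(table, start, s):
        stack = [start]                    # top of stack at the end
        i = 0
        while stack:
            look = s[i] if i < len(s) else None
            top = stack.pop()
            if (look, top) not in table:
                return False
            push, advance = table[(look, top)]
            stack.extend(reversed(push))   # first pushed symbol on top
            if advance:
                i += 1
        return i == len(s)

    # The q-grammar S -> epsilon | aSb as the table of figure 2.
    table = {('a', 'S'): ('Sb', True),   # read a, pop S, push Sb, advance
             ('b', 'S'): ('', False),    # apply S -> epsilon, do not advance
             ('b', 'b'): ('', True)}     # verify a b and advance
    print(parse(table, 'S', 'aabb'))     # True
    print(parse(table, 'S', 'aab'))      # False

Note how the epsilon rule is applied on lookahead b, the lone member of
FOLLOW(S), without consuming the input symbol.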
Let us now go one step farther. Why require a production to have a right hand
side beginning with a terminal? Why not allow productions of the form:

    E → TZ

that we got when we factored some productions for arithmetic
expressions earlier? This seems more intuitive and natural. Besides, it is no fun
transforming grammars into Greibach Normal Form. Things would be far easier
if we had a way of predicting when to apply the rule as we did for epsilon rules.
Modifying our select set definition a little helps since it allows select sets to
indicate when to apply productions.
Definition. The select set for the production A → α (which is written as
SELECT(A → α)), where A is a nonterminal and α is a string of terminal and
nonterminal symbols, is the set of all terminal symbols which begin strings
derived from α.
This tells us that if our pushdown machine has an E on its stack and is reading
a symbol from SELECT(E → TZ), then that is the production that should be
applied by pushing TZ on the stack. Here is a new class of deterministic context
free grammars based upon select sets.

Definition. A context free grammar is an LL(1) grammar if and only if
two productions with the same left-hand sides have different select sets.
Two items need to be cleared up. First, LL(1) means that the parser we build
from the grammar is processing the input string from left to right (the first L)
using leftmost derivations (the second L). This is exactly what the parsers we
have been designing have been doing. They create a leftmost derivation and
check it against the input string. In addition we look ahead one symbol. By that
we mean that the parser can examine the next symbol on its input string
without advancing past it. The parser in figure 2 is an example of this.
Next, we have neglected to fully define select sets for arbitrary productions.
Intuitively, this will be all of the terminal symbols first produced by the
production, which we call its FIRST set. If the production can lead to the empty
string, then the FOLLOW set of the left-hand side nonterminal must be included.
Here are the formal definitions.
Definition. FIRST(A → α) is the set of all terminal symbols a such that
some string of the form aβ can be derived from α.

Definition. SELECT(A → α) contains FIRST(A → α). If ε can be derived
from α then it also contains FOLLOW(A).
It is time for a theoretical aside. The proof of this next theorem is left as an
exercise in careful substitution.

Theorem 1. The class of languages generated by q-grammars is the same
as the class of LL(1) languages.
Our next step is to work out computing procedures for FIRST, FOLLOW, and
SELECT. We have defined them, but have not considered just how to find them.
We begin by finding out which nonterminals can generate the empty string
using the algorithm of figure 3. (In the following algorithms, capital Roman
letters are nonterminals, small Roman letters from the beginning of the
alphabet are terminals, x can be either, and Greek letters (except for ε) are
strings of both.)

NullFinder(G, E)
PRE: G = (N, T, S, P) is a context free grammar
POST: E = set of nonterminals which generate ε

delete all productions containing terminals from P
E := ∅
repeat
    for each epsilon rule A → ε in P
        add A to E
        delete all A productions from P
        delete occurrences of A in all productions
until no more epsilon rules remain in P

Figure 3 - Finding nullable nonterminals
Let's pause a moment and insure that the algorithm for finding nonterminals
which generate the null string is correct. We claim that it terminates because
one nonterminal is deleted every time the loop is executed. Correctness comes
from thinking about just how a nonterminal A could generate the null string ε.
There must be a production A → α where all of the nonterminals in α in turn
generate ε. And one of them must generate ε directly.
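The deletions in figure 3 can also be phrased as a fixpoint computation, which
is a little easier to code. A Python sketch (productions as pairs with tuple
right hand sides, single uppercase letters for nonterminals; all our own
conventions):

    def null_finder(productions):
        # keep only productions whose right hand sides are all nonterminals
        p = [(lhs, rhs) for lhs, rhs in productions
             if all(sym.isupper() for sym in rhs)]
        e = set()
        changed = True
        while changed:
            changed = False
            for lhs, rhs in p:
                # an empty rhs is an epsilon rule; otherwise every symbol
                # must already be known to generate epsilon
                if lhs not in e and all(sym in e for sym in rhs):
                    e.add(lhs)
                    changed = True
        return e

    prods = [('E', ('T', 'A')), ('A', ('+', 'E')), ('A', ()),
             ('T', ('F', 'B')), ('B', ('×', 'T')), ('B', ()),
             ('F', ('x',)), ('F', ('(', 'E', ')'))]
    print(null_finder(prods))   # {'A', 'B'}

It reports that A and B are the nullable nonterminals of the factored
expression grammar used below.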
Computing FIRST sets starts with a relation named BEGIN that is computed
according to the recipe:

    for all productions: A → αxβ
        if ε can be derived from α then x ∈ BEGIN(A)

(Note that x is either a terminal or nonterminal and α is a possibly empty string
of nonterminals.) Then FIRST(A) is merely the set of terminal symbols in the
reflexive, transitive closure of BEGIN(A). Now we have all of the terminals that
show up at the beginning of strings generated by each nonterminal. (For
completeness we should add that for a terminal symbol a, FIRST(a) = {a}.) And
at last we have:

    for all productions A → α
        if α = βxγ and ε can be derived from β
        then FIRST(A → α) contains FIRST(x)
Some more relations are needed to do FOLLOW. Let us begin. We need to
detect symbols that come after others and which end derivations.

    for all productions: B → αAγxβ
        if ε can be derived from γ then x ∈ AFTER(A)

    for all productions: A → αxβ
        if ε can be derived from β then x ∈ END(A)

So, AFTER(A) is all of the symbols that immediately follow A. If we set LAST(A)
to be the reflexive, transitive closure of END(A) then it is the set of symbols
which end strings generated from A.

Now, watch closely. FOLLOW(A) is the set of all terminal symbols a such that
there exist symbols B and x where:

    A ∈ LAST(B), x ∈ AFTER(B), and a ∈ FIRST(x)
Try to write that out in prose. It is the best way to be sure that it is correct and
to understand the computing procedure. For now, let us do an example. How
about one of the all time favorite grammars for arithmetic expressions (which
we factor on the right):
    E → T | T+E       E → TA
                      A → +E | ε
    T → F | F×T       T → FB
                      B → ×T | ε
    F → x | (E)       F → x | (E)

Here are all of the relations that we described above for the nonterminals of the
factored grammar.

              E                A                T              B              F
    BEGIN     T                +                F              ×              x, (
    FIRST     x, (             +                x, (           ×              x, (
    AFTER     )                                 A                             B
    END       T, A             E                F, B           T              x, )
    LAST      E,A,T,B,F,x,)    A,E,T,B,F,x,)    T,B,F,x,)      B,T,F,x,)      F,x,)
    FOLLOW    )                )                +, )           +, )           ×, +, )
Putting this all together we arrive at the following select sets and are able to
construct a parser for the grammar. The general rule for LL(1) parsers is to pop
a production's left-hand side and push the right hand side when reading a
symbol in the select set. Here is our example.

                   Select                 Parser
    Production     Set          read    pop    push    advance?
    E → TA         { x, ( }     x, (    E      TA      no
    A → +E         { + }        +       A      E       yes
    A → ε          { ) }        )       A              no
    T → FB         { x, ( }     x, (    T      FB      no
    B → ×T         { × }        ×       B      T       yes
    B → ε          { +, ) }     +, )    B              no
    F → x          { x }        x       F              yes
    F → (E)        { ( }        (       F      E)      yes
                                )       )              yes
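Here is a compact Python sketch that computes these sets with a single
fixpoint loop rather than the BEGIN/AFTER/LAST relations (the two routes
arrive at the same FIRST and FOLLOW sets; the representation reuses the
conventions of the null_finder sketch above):

    def first_follow(productions, nonterminals):
        nullable = null_finder(productions)
        first = {a: set() for a in nonterminals}
        follow = {a: set() for a in nonterminals}

        def first_of(alpha):
            # terminals that can begin strings derived from alpha
            out = set()
            for sym in alpha:
                if sym not in nonterminals:
                    out.add(sym)
                    return out
                out |= first[sym]
                if sym not in nullable:
                    return out
            return out

        changed = True
        while changed:
            changed = False
            for lhs, rhs in productions:
                before = len(first[lhs])
                first[lhs] |= first_of(rhs)
                if len(first[lhs]) != before:
                    changed = True
                trailer = set(follow[lhs])
                for sym in reversed(rhs):
                    if sym in nonterminals:
                        if not trailer <= follow[sym]:
                            follow[sym] |= trailer
                            changed = True
                        trailer = (trailer | first[sym]) if sym in nullable \
                                  else set(first[sym])
                    else:
                        trailer = {sym}
        return first, follow, nullable

Running it on the factored expression grammar reproduces the FIRST and
FOLLOW rows of the table, and the select set of a production A → α is then
first_of(α), plus FOLLOW(A) when every symbol of α is nullable.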
That wasn't so bad after all. We did a lot of work, but came up with a
deterministic parser for an LL(1) grammar that describes arithmetic expressions
for a programming language. One more note on top-down parsers though.
They're usually presented with instructions to either pop the top stack symbol
(for an epsilon rule) or to replace the top stack symbol with a string (for other
productions). No advance of the input is made in either case. It is always
implicitly assumed that when the top of the stack matches the input, the
parser pops the symbol and advances. This is a predict and verify kind of
operation based on leftmost derivations.
Often parsers are presented in a slightly different format. All of the match or
verify operations with terminals on the input and stack are understood and a
replace table is provided as the parser. This table shows what to place on the
stack when the top symbol and the next input symbol are given. Here is our
last example in this form.
    Stack                  Input Symbol
    Symbol    +      ×      x      (      )
    E                       TA     TA
    A         +E                          pop
    T                       FB     FB
    B         pop    ×T                   pop
    F                       x      (E)
See how the two formats describe the same parser? In the new form we just
examine the input and the stack and replace the stack symbol by the string in
the appropriate box of the parser table. By the way, the blank boxes depict
configurations that should not occur unless there is an error. In a real parser
one might have these boxes point to error messages. Quite useful.
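In this form the parser is a few lines of Python around a dictionary. One
wrinkle: the text leaves the end of the input implicit, so the sketch below also
applies the epsilon rules for A and B on an empty lookahead (an assumption of
ours, consistent with their FOLLOW sets).

    table = {('E', 'x'): 'TA', ('E', '('): 'TA',
             ('A', '+'): '+E', ('A', ')'): '', ('A', None): '',
             ('T', 'x'): 'FB', ('T', '('): 'FB',
             ('B', '+'): '', ('B', '×'): '×T',
             ('B', ')'): '', ('B', None): '',
             ('F', 'x'): 'x', ('F', '('): '(E)'}

    def ll1_parse(table, nonterminals, start, s):
        stack = [start]                  # top of stack at the end
        i = 0
        while stack:
            top = stack.pop()
            look = s[i] if i < len(s) else None
            if top in nonterminals:      # replace via the table
                if (top, look) not in table:
                    return False         # an empty (error) box
                stack.extend(reversed(table[(top, look)]))
            elif top == look:            # matching terminals: verify
                i += 1
            else:
                return False
        return i == len(s)

    nts = {'E', 'A', 'T', 'B', 'F'}
    print(ll1_parse(table, nts, 'E', 'x+x'))     # True
    print(ll1_parse(table, nts, 'E', '(x×x)'))   # True
    print(ll1_parse(table, nts, 'E', 'x+'))      # False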
At this point we have defined a subclass (in fact a proper one) of the
deterministic context free languages. And, we know how to build parsers for
their grammars and check to see if the grammars are in the proper form. We
even have some tools that can be used to place grammars in the correct form if
needed. These are:
omitting useless nonterminals,
omitting unreachable nonterminals,
factoring,
substitution, and
recursion removal.
A word about recursion. It should be obvious that left recursion (immediate or
cyclic) is not compatible with top-down parsing. The following result should
reinforce this.
Theorem 2. Every LL(1) language has a nonrecursive grammar.

Proof. Assume that there is an LL(1) language whose grammars all have
recursion. This means that in each LL(1) grammar for the language there
is a nonterminal A such that A can be derived from it. Let us further
assume that A is not a useless nonterminal.

This means that there is a sequence of nonterminals X1, X2, ..., Xk such
that for some sequence of strings α1, α2, ..., αk:

    A ⇒ X1 α1 ⇒ X2 α2 ⇒ ... ⇒ Xk αk ⇒ A α.
Take the nonterminals in the above sequence (A and the Xi). Now examine
the select sets for all productions where they form the left-hand side.
Each of these select sets must have FIRST(A) in it. (And, FIRST(A) is not
empty since A is not useless.) We claim that either all of these
nonterminals are useless or that at least one of them forms the left side
of more than one production. Since A is not useless, there must be
at least two productions with the same left side and overlapping select
sets. Thus the grammar is not LL(1).
So, if we wish to translate or parse something here is the recipe:
Develop a grammar for it,
Convert the grammar to LL(1) form, and
Construct the parser.
This always brings success since LL(1) grammars provide parsers which are
nonrecursive and deterministic.
Unfortunately this is not always possible. In fact, we have no way of knowing
whether a grammar can be placed in LL(1) form. This can be seen from the
following sequence of results that terminate with our old nemesis unsolvability.
(The second result, theorem 4, is left for an advanced treatment of formal
languages.)
Theorem 3. An arbitrary Turing machine's set of valid computations is a
context free language if and only if the machine halts for only a finite set
of inputs.

Proof. Left as an exercise in use of the pumping lemma.
Theorem 4. The class of deterministic context free languages is closed
under complement.

Theorem 5. It is unsolvable whether an arbitrary context free language is:
a) a regular set,
b) an s-language,
c) an LL(1) language, or
d) a deterministic context free language.
Proof. We shall reduce the finiteness problem for Turing machines to the
problem of deciding whether or not a context free language is in a
subclass that is closed under complement.

Take an arbitrary Turing machine Mi and construct the context free
language L_g(i) that is the set of invalid computations for the Turing
machine. Now examine the complement of L_g(i): the Turing machine's
valid computations. If the Turing machine halted for only a finite number
of inputs then the set of valid computations is a regular set (and thus also
a deterministic context free language, or LL(1) language, or s-language).
Both L_g(i) and its complement are in all of these subclasses of the context
free languages. That is:

    T(Mi) is finite  iff  {valid computations} is finite
                     iff  the complement of L_g(i) is regular
                     iff  L_g(i) is regular

Thus being able to decide whether an arbitrary context free language is in
one of these subclasses allows us to decide finiteness for recursively
enumerable sets.
Another problem with deterministic parsing occurs when we wish to know how
a statement was formed. If we examine the grammar:

    S → V = E
    E → E + E
    E → E × E
    E → V
    V → x | y | z

we find that it generates all of the assignment statements generated by the
grammar at the beginning of this chapter. But unfortunately it is ambiguous.
For example, the statement:

    x = y + z×x

can be generated by the two rightmost derivations in figure 4. Note that in the
tree on the left an expression is generated which must be evaluated with the
multiplication first as x = y + (z×x) while that on the right generates one which
would be evaluated as x = (y + z)×x.
[Two derivation trees for x = y + z×x: the left groups the expression as
x = y + (z×x), the right as x = (y + z)×x.]

Figure 4 - Ambiguous Rightmost Derivations
Thus, it is not clear whether we should add or multiply first when we execute
this statement. In our previous work with LL(1) grammars this was taken care of
for us - none of them could be ambiguous! But since some languages always
have ambiguous grammars (we call these languages inherently ambiguous) we
need to be careful.
A shortcoming of the class of LL(1) languages we should mention is that they do
not include all of the deterministic context free languages. We might think to
extend our lookahead set and use LL(k) parsers by looking into a window k
symbols wide on the input as we parse. But this leads to very large parser
tables. Besides, applications that require more than one symbol of lookahead
are scarce. And anyway, the LL(k) languages do not include all of the
deterministic context free languages.
In our quest for all of the deterministic context free languages, we shall turn
from top-down or predictive parsing to the reverse. Instead of predicting what
is to come and verifying it from the input we shall use a bottom-up approach.
This means that rather than beginning with the starting symbol and generating
an input string, we shall examine the string and attempt to work our way back
to the starting symbol. In other words, we shall reconstruct the parse. We will
process the input and decide how it was generated using our parser stack as a
notebook. Let us begin by examining a string generated by (you guessed it!) the
grammar:
    A → aB
    B → Ab
    B → b
The following chart provides an intuitive overview of this new approach to
parsing. Here the string aabb is parsed in a bottom-up manner. We shall
discuss the steps after presenting the chart.

    Step    Stack    Examine    Input    Maybe     Action
    0                           aabb
    1                a          abb      A → aB    push
    2       a        a          bb       A → aB    push
    3       aa       b          b        B → b     apply
    4       aa       B          b        A → aB    apply
    5       a        A          b        B → Ab    push
    6       aA       b                   B → Ab    apply
    7       a        B                   A → aB    apply
    8                A                             accept
In step 1 we moved the first input symbol (a) into the examination area and
guessed that we might be working on A → aB. But, since we had not seen a B we
put off making any decisions and pushed the a onto the stack to save it for
awhile. In step 2 we did the same. In step 3 we encountered a b. We knew
that it could come from applying the production B → b. So we substituted a B
for the b. (Remember that we are working backwards.) In step 4 we looked at
the stack and sure enough discovered an a. This meant that we could surely
apply A → aB, and so we did. (We moved the new A in to be examined after
getting rid of the B which was there as well as the a on the stack since we used
them to make the new A.) In step 5 we looked at the stack and the examination
area, could not decide what was going on (except something such as B → Ab
possibly), so we just put the A onto the stack. Then in step 6 we looked at the b
in the examination area and the A on the stack and used them both to make a B
via the production B → Ab. This B entered the examination area. Looking at
this B and the stacked a in step 7 we applied A → aB to them and placed an A in
the examination area. Since nothing was left to do in step 8 and we were
examining our starting symbol we accepted.

See what happened? We looked at our input string and whenever we could
figure out how a symbol was generated, we applied the production that did it.
We in essence worked our way up the derivation tree. And, we used the stack to
save parts of the tree to our left that we needed to tie in later. Since our
grammar was unambiguous and deterministic, we were able to do it.
Now let us do it all over with some new terminology and some mixing up of the
above columns. When we push an input symbol into the stack we shall call it a
shift. And when we apply a production we shall call it a reduce operation. We
shall shift our guesses onto the stack with input symbols. For example, if we
see an a and guess that we're seeing the results of applying the production
A → aB, we shift the pair (a,aB) onto the stack. After we reduce, we shall place a
guess pair on the stack with the nonterminal we just produced. Here we go.

    Step    Stack                      Input    Action
    0                                  aabb     shift(aB)
    1       (a,aB)                     abb      shift(aB)
    2       (a,aB)(a,aB)               bb       shift(b)
    3       (a,aB)(a,aB)(b,b)          b        reduce(B → b)
    4       (a,aB)(a,aB)(B,aB)         b        reduce(A → aB)
    5       (a,aB)(A,Ab)               b        shift(Ab)
    6       (a,aB)(A,Ab)(b,Ab)                  reduce(B → Ab)
    7       (a,aB)(B,aB)                        reduce(A → aB)
    8       (A, )                               accept
Our new parsing technique involves keeping notes on past input on the stack.
For instance, in step 5 we have an a (which might be part of an aB) at the
bottom of our stack, and an A (which we hope shall be part of an Ab) on top of
the stack. We then use these notes to try to work backwards to the starting
symbol. This is what happens when we do reduce operations. This is the
standard bottom-up approach we have always seen in computer science. Our
general method is to do a rightmost derivation except that we do it backwards!
Neat. What we did at each step was to examine the stack and see if we could do
a reduction by applying a production to the top elements of the stack. If so,
then we replaced the right hand side symbols (which were at the top of the
stack) with the left-hand side nonterminal.
After doing a reduction we put the new nonterminal on the stack along with a
guess of what was being built. We also did this when we shifted a terminal onto
the stack. Let us examine these guesses. We tried to make them as accurate as
possible by looking at the stack before pushing the (symbol, guess) pair. We
should also note that the pair (a,aB) means that we have placed the a on the
stack and think that maybe a B will come along. On the other hand, the pair
(b,Ab) indicates that the top two symbols on the stack are A and b, and, we have
seen the entire right hand side of a production. Thus we always keep track of
what is in the stack.
Now for another enhancement. We shall get rid of some duplication. Instead of
placing (a, aB) on the stack we shall just put a|B on it. This means that we have
seen the part of aB which comes before the vertical line - the symbol a. Putting
aB| on the stack means that we have a and B as our top two stack symbols. Here
is the same computation with our new stack symbols.

    Step    Stack              Input    Action
    0                          aabb     shift(a|B)
    1       a|B                abb      shift(a|B)
    2       a|B, a|B           bb       shift(b|)
    3       a|B, a|B, b|       b        reduce(B → b)
    4       a|B, a|B, aB|      b        reduce(A → aB)
    5       a|B, A|b           b        shift(Ab|)
    6       a|B, A|b, Ab|               reduce(B → Ab)
    7       a|B, aB|                    reduce(A → aB)
    8       A|                          accept
Let's pause a bit and examine these things we are placing on the stack. They are
often called states and do indicate the state of the input string we have read and
partially parsed. States are made up of items that are just productions with an
indicator that tells us how far on the right hand side we have progressed. The
set of items for the previous grammar is:
A → |aB     B → |Ab     B → |b
A → a|B     B → A|b     B → b|
A → aB|     B → Ab|
Recall what an item means. A → a|B means that we have seen an a and hope to
see a B and apply the production. Traditionally we also invent a new starting
symbol and add a production where it goes to the old starting symbol. In this
case this means adding the items:

S0 → |A     S0 → A|

to our collection of items.
There are lots and lots of items in a grammar. Some are almost the same. Now
it is time to group equivalent items together. We take a closure of an item and
get all of the equivalent ones. These closures shall form the stack symbols (or
states) of our parser. These are computed according to the following procedure.
Closure(I, CLOSURE(I))
PRE: I is an item
POST: CLOSURE(I) contains items equivalent to I

place I in CLOSURE(I)
for each (A → α|Bβ) in CLOSURE(I) and each production (B → γ):
    place (B → |γ) in CLOSURE(I)

Figure 5 - Closure Computation for Items
We should compute a few closures for the items in our grammar. The only time
we get past the first step above is when the vertical bar is to the left of a
nonterminal, such as in B → |Ab. Let's do that one. We place B → |Ab in
CLOSURE(B → |Ab) first. Then we look at all A productions and put A → |aB
in CLOSURE(B → |Ab) also. This gives us:

CLOSURE(B → |Ab) = {B → |Ab, A → |aB}

Some more closures are:

CLOSURE(S0 → |A) = {S0 → |A, A → |aB}
CLOSURE(S0 → A|) = {S0 → A|}
CLOSURE(A → a|B) = {A → a|B, B → |Ab, B → |b, A → |aB}
Thus the closure of an item is a collection of all items which represent the same
sequence of things placed upon the stack recently. These items in the set are
what we have seen on the input string and processed. The productions
represented are all those which might be applied soon. The last closure
presented above is particularly interesting since it tells us that we have seen an
a and should be about to see a B. Thus either Ab, b, or aB could be arriving
shortly. States will be built presently by combining closures of items.
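To make Figure 5 concrete, here is a small sketch of the closure computation in
Python. The encoding is our own, not the text's: an item is a triple (left-hand
side, right-hand side, marker position), a grammar is a list of productions, and
nonterminals are single capital letters (S0, the new starting symbol, never
appears on a right hand side, so it causes no trouble).

# A sketch of the closure computation of Figure 5, under our own encoding:
# an item is (lhs, rhs, dot) and nonterminals are single capital letters.

GRAMMAR = [("S0", "A"), ("A", "aB"), ("B", "Ab"), ("B", "b")]

def closure(items, grammar=GRAMMAR):
    # Grow the item set until nothing new can be added.
    result = set(items)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot) in list(result):
            # An item with the marker just left of a nonterminal pulls in
            # every production for that nonterminal, marker at the front.
            if dot < len(rhs) and rhs[dot].isupper():
                for (head, body) in grammar:
                    if head == rhs[dot] and (head, body, 0) not in result:
                        result.add((head, body, 0))
                        changed = True
    return result

# CLOSURE(A -> a|B) comes out as {A -> a|B, B -> |Ab, B -> |b, A -> |aB}.
print(closure({("A", "aB", 1)}))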
Let's return to our last table where we did a recognition of aabb. Note that in
step 2 a|B was on top of the stack and the next input was b. We then placed b|
on the stack. Traditionally, sets of items called states are placed upon the stack,
and so the process of putting the next state on the stack is referred to as a
GOTO. Thus from step 2 to step 3 in the recognition of aabb we execute:

GOTO(a|B, b) = b|.

In step 3 we reduced with the production B → b and got a B. We then placed aB|
on the stack. In our new terminology this is:

GOTO(a|B, B) = aB|.
It is time now to precisely define the GOTO operation. For a set of items (or
state) Q and symbol x this is:

GOTO(Q, x) = {CLOSURE(A → αx|β)} for all (A → α|xβ) ∈ Q

Check out the operations we looked at above and those in the previous
acceptance table. Several more examples are:

GOTO({S0 → |A, A → |aB}, A) = CLOSURE(S0 → A|)
                            = {S0 → A|}
GOTO({S0 → |A, A → |aB}, a) = CLOSURE(A → a|B)
                            = {A → a|B, B → |Ab, B → |b, A → |aB}
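Under the same illustrative encoding, GOTO is simply "advance the marker over
x wherever possible, then take the closure." A sketch, reusing the closure
function above:

def goto(state, x, grammar=GRAMMAR):
    # Advance the marker over x in every item that permits it, then close up.
    moved = {(lhs, rhs, dot + 1)
             for (lhs, rhs, dot) in state
             if dot < len(rhs) and rhs[dot] == x}
    return closure(moved, grammar)

# GOTO({S0 -> |A, A -> |aB}, a) reproduces CLOSURE(A -> a|B).
print(goto(closure({("S0", "A", 0)}), "a"))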
So, all we need do is add a new starting production (S0 → S) to a grammar and
execute the following state construction algorithm to generate all the states we
require for parsing.
Q0 = CLOSURE(S0 → |S)
i = 0
k = 1
repeat
    for each (A → α|xβ) in Qi:
        if GOTO(Qi, x) is a new state then
            Qk = GOTO(Qi, x)
            k = k + 1
    i = i + 1
until no new states are found

Figure 6 - State Construction Algorithm
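Here is a worklist rendering of this algorithm, again only a sketch built on the
closure and goto functions above; the frozensets are just our way of testing
whether a state has been seen before.

def build_states(grammar=GRAMMAR, start_item=("S0", "A", 0)):
    states = [frozenset(closure({start_item}, grammar))]
    i = 0
    while i < len(states):
        # Every symbol sitting just right of a marker in state i yields a GOTO.
        for x in sorted({rhs[dot] for (_, rhs, dot) in states[i]
                         if dot < len(rhs)}):
            q = frozenset(goto(states[i], x, grammar))
            if q and q not in states:
                states.append(q)
        i += 1
    return states

# For our grammar this prints the seven states Q0 through Q6 listed next.
for number, state in enumerate(build_states()):
    print(number, sorted(state))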
The seven states determined for our example grammar using the above
algorithm are:
Q0 = {S0 → |A, A → |aB}
Q1 = {S0 → A|}
Q2 = {A → a|B, B → |Ab, B → |b, A → |aB}
Q3 = {A → aB|}
Q4 = {B → A|b}
Q5 = {B → b|}
Q6 = {B → Ab|}
and the transitions formed by the GOTO(Q, x) operation are:

        A     B     a     b
Q0      Q1          Q2
Q2      Q4    Q3    Q2    Q5
Q4                        Q6
Note that not all state-symbol pairs are represented. We do know why there are
no states to go to from Q1, Q3, Q5, and Q6 - right? Because they are states that
contain items we shall reduce. After that we shall place another state on the
stack as part of the reduction process.
All that remains is to build a parser that will carry out the sample computation
we presented above. It is easy. Here are the rules for building the parser table.
(Recall that the stack symbols or states are along the left side while grammar
symbols lie across the top.)
On the row for state Qi:

a) if GOTO(Qi, a) = Qk then shift(Qk) under a
b) if (A → α|) ∈ Qi then reduce(A → α) under FOLLOW(A)
c) if (S0 → S|) ∈ Qi then accept under the endmarker

Figure 7 - Parser Table Construction
That is all there is to it. Quite simple. Another note - we shall attach a GOTO
table to the right side of our parser table so that we know what to place on the
stack after a reduction. The parser for our sample grammar is provided below.
The words shift and reduce have been omitted because they refer always to
states and productions respectively and there should be no problem telling
which is which. (Aliases for the states have been provided so that the table is
readable.)
State                        Input                       GOTO
Name   Alias        a       b         end         A      B
Q0     |aB          Q2                            Q1
Q1     A|                             accept
Q2     a|B          Q2      Q5                    Q4     Q3
Q3     aB|                  A → aB    A → aB
Q4     A|b                  Q6
Q5     b|                   B → b     B → b
Q6     Ab|                  B → Ab    B → Ab
We know intuitively how these parsers work, but need to specify some things
precisely. Shift operations merely push the indicated state on the stack. A
reduce operation has two parts. For a reduction of A → α where the length of α
is k, first pop k states off the stack. (These are the right hand side symbols for
the production.) Then if Qi is on top of the stack, push GOTO(Qi, A) onto the
stack. So, what we are doing is to examine the stack and push the proper state
depending upon what was at the top and what was about to be processed. And
last, begin with Q0 on the stack. Try out our last example and note that exactly
the same sequence of moves results.
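For readers who want to try this mechanically, here is a sketch of the driver
loop in Python, with the ACTION and GOTO entries copied by hand from the
parser table above. The tuple encoding and the "end" token standing for the
endmarker are our own conventions, not part of the table.

# The parser driver: shift pushes a state, reduce pops |alpha| states and
# pushes GOTO(top, A), and we begin with Q0 on the stack.

ACTION = {
    (0, "a"): ("shift", 2),
    (1, "end"): ("accept",),
    (2, "a"): ("shift", 2), (2, "b"): ("shift", 5),
    (3, "b"): ("reduce", "A", 2), (3, "end"): ("reduce", "A", 2),  # A -> aB
    (4, "b"): ("shift", 6),
    (5, "b"): ("reduce", "B", 1), (5, "end"): ("reduce", "B", 1),  # B -> b
    (6, "b"): ("reduce", "B", 2), (6, "end"): ("reduce", "B", 2),  # B -> Ab
}
GOTO_TABLE = {(0, "A"): 1, (2, "A"): 4, (2, "B"): 3}

def parse(word):
    stack = [0]                         # begin with Q0 on the stack
    tokens = list(word) + ["end"]
    pos = 0
    while True:
        act = ACTION.get((stack[-1], tokens[pos]))
        if act is None:
            return False                # empty table entry: reject
        if act[0] == "accept":
            return True
        if act[0] == "shift":
            stack.append(act[1])        # push the indicated state
            pos += 1
        else:
            _, lhs, k = act             # reduce A -> alpha with |alpha| = k:
            del stack[-k:]              # pop k states, then push GOTO(top, A)
            stack.append(GOTO_TABLE[(stack[-1], lhs)])

print(parse("aabb"))   # True, by exactly the moves of our earlier trace
print(parse("aab"))    # False, since aab is not of the form a^n b^n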
Now let us label what we have been doing. Since we have been processing the
input from left to right and doing rightmost derivations, this is called LR
parsing. And the following theorem ties the LR languages into our framework.
Theorem 6. The following classes are equivalent.
a) Deterministic context free languages.
b) LR(1) languages.
c) LR(0) languages with endmarkers.
d) Languages accepted by deterministic pushdown automata.
Summary
We have encountered five major classes of languages and machines in our
examination of computation. Now seems like a good time to sum up some of
the things we have discovered for all of these classes. This shall be done in a
series of charts.
The first sets forth these classes or families in descending order. Each is a
proper subclass of those above it. (Note that the last column provides a set in
the class which does not belong to the one below.)
Class                          Machine                    Language   Example
Recursively Enumerable Sets    Turing Machines            Type 0     K
Recursive Sets                                                       diagonal sets
Context Sensitive Languages    Linear Bounded Automata    Type 1     0^n 1^n 0^n
Context Free Languages         Pushdown Automata          Type 2     invalid TM computations
Deterministic Context Free     Deterministic Pushdown     LR(1)      a^n b^n
  Languages                      Automata
Regular Sets                   Finite Automata            Type 3
Next, we shall list the closure properties which were proven for each class or
mentioned in either the historical notes or exercises. Complement is indicated
by ¬, and concatenation is indicated by a dot.

Class       ¬     ∪     ∩     .     *
r.e.        no    yes   yes   yes   yes
recursive   yes   yes   yes   yes   yes
csl         yes   yes   yes   yes   yes
cfl         no    yes   no    yes   yes
dcfl        yes   no    no    no    no
regular     yes   yes   yes   yes   yes
Our last chart indicates the solvability or unsolvability of the decision problems
we have examined thus far. (S stands for solvable, U for unsolvable, and ? for
unknown.)
Class       x ∈ L   L = ∅   L finite   Li ∩ Lj = ∅   Li = Lj   L = Σ*   L cofinite
r.e.          U       U        U            U           U         U         U
recursive     S       U        U            U           U         U         U
csl           S       U        U            U           U         U         U
cfl           S       S        S            U           U         U         U
dcfl          S       S        S            U           ?         S         S
regular       S       S        S            S           S         S         S
Notes
It all began with Noam Chomsky. Soon, however, BNF (Backus Normal Form
or Backus-Naur Form) was invented to specify the syntax of programming
languages. The classics are:
J. W. BACKUS, "The syntax and semantics of the proposed international
algebraic language of the Zurich ACM-GAMM conference," Proceedings of the
International Conference on Information Processing (1959), UNESCO, 125-132.

N. CHOMSKY, "Three models for the description of languages," IRE Transactions
on Information Theory 2:3 (1956), 113-124.

P. NAUR et al., "Report on the algorithmic language ALGOL 60," Communications
of the Association for Computing Machinery 3:5 (1960), 299-314. Revised in 6:1
(1963), 1-17.
Relationships between classes of languages and automata were soon
investigated. In order of language type we have:
N. CHOMSKY, "On certain formal properties of grammars," Information and
Control 2:2 (1959), 137-167.
S. Y. KURODA, "Classes of languages and linear bounded automata," Information
and Control 7:2 (1964), 207-223.
P. S. LANDWEBER, "Three theorems on phrase structure grammars of type 1,"
Information and Control 6:2 (1963), 131-136.
N. CHOMSKY, "Context-free grammars and pushdown storage," Quarterly
Progress Report 65 (1962), 187-194, MIT Research Laboratory in Electronics,
Cambridge, Massachusetts.
J. EVEY, "Application of pushdown store machines," Proceedings of the 1963 Fall
Joint Computer Conference, 215-227, AFIPS Press, Montvale, New Jersey.
N. CHOMSKY and G. A. MILLER, "Finite state languages," Information and
Control 1:2 (1958), 91-112.
Normal forms for the context free languages are due to Chomsky (in the
1959 paper above) and:
S. A. GREIBACH, "A new normal form theorem for context-free phrase structure
grammars," Journal of the Association for Computing Machinery 12:1 (1965),
42-52.
Most of the closure properties and solvable decision problems for context
free languages were discovered by Bar-Hillel, Perles, and Shamir in the
paper cited in chapter 3. They also invented the pumping lemma. A
stronger form of this useful lemma is due to:
W. G. OGDEN, "A helpful result for proving inherent ambiguity," Mathematical
Systems Theory 2:3 (1968), 191-194.
The text by Hopcroft and Ullman is a good place to find material about
automata and formal languages, as is the book by Lewis and Papadimitriou.
(These were cited in chapter 1.) Several formal languages texts are:
S. GINSBURG, The Mathematical Theory of Context-free Languages, McGraw-Hill,
New York, 1966.
M. A. HARRISON, Introduction to Formal Language Theory, Addison-Wesley,
Reading, Massachusetts, 1978.
G. E. REVESZ, Introduction to Formal Languages, McGraw-Hill, New York, 1983.
A. SALOMAA, Formal Languages, Academic Press, New York, 1973.
Knuth was the first to explore LR(k) languages and their equivalence to
deterministic context free languages. The early LR and LL grammar and
parsing papers are:
D. E. KNUTH, "On the translation of languages from left to right," Information
and Control 8:6 (1965), 607-639.
A. J. KORENJAK, "A practical method for constructing LR(k) processors,"
Communications of the Association for Computing Machinery 12:11 (1969),
613-623.

F. L. DE REMER, "Generating parsers for BNF grammars," Proceedings of the 1969
Spring Joint Computer Conference, 793-799, AFIPS Press, Montvale, New Jersey.
and two books about compiler design are:
A. V. AHO and J. D. ULLMAN, Principles of Compiler Design, Addison-Wesley,
Reading, Massachusetts, 1977.

P. M. LEWIS II, D. J. ROSENKRANTZ, and R. E. STEARNS, Compiler Design Theory,
Addison-Wesley, Reading, Massachusetts, 1976.
PROBLEMS
Grammars
1. Construct a grammar that defines variables and arrays for a
programming language. Add arithmetic assignment statements. Then
include labels. Make sure to explain what the productions accomplish.
2. Provide a grammar for the while, for, and case statements of our NICE
programming language. Then define blocks. (Note that at this point you
have defined most of the syntax for our NICE language.)
3. What type of grammar is necessary in order to express the syntax of the
SMALL language? Justify your answer.
4. Build a grammar that generates Boolean expressions. Use this as part of a
grammar for conditional (or if-then-else) statements.
Language Properties
1. Design a grammar that generates strings of the form 0^n1^m0^n1^m. Provide a
convincing argument that it does what you intended. What type is it?
2. Furnish a grammar for binary numbers that are powers of two. What type
is it? Now supply one for strings of ones that are a power of two in
length. Explain your strategy.
3. Construct a grammar that generates strings of the form ww where w is a
string of zeros and ones. Again, please hint at the reasons for your
methods.
4. Show precisely that for each Type 0 language there is a Turing machine
that accepts the strings of that language.
5. Prove that the Type 1 or context sensitive languages are equivalent to the
sets accepted by linear bounded automata.
6. Prove that all types of languages are closed under the operation of string
reversal.
Regular Languages
1. Derive a regular expression for programming language constants such as
those defined in this chapter.
2. Construct a regular grammar for strings of the form 1*0*1.
3. What is the regular expression for the language generated by the regular
grammar:
S → 1A | 1
A → 0S | 0A | 0
4. Design a deterministic finite automaton that recognizes programming
language constants.
5. What is the equivalent regular grammar for the following finite
automaton?
         Input
State    0    1    Accept?
  0      0    2    no
  1      1    0    yes
  2      2    1    no
6. Prove that every regular set can be generated by some regular grammar.
7. Show that if epsilon rules of the form A → ε are allowed in regular
grammars, then only the regular sets are generated.
8. A left linear grammar is restricted to productions of the form A → Bc or
of the form A → c where as usual A and B are nonterminals and c is a
terminal symbol. Prove that these grammars generate the regular sets.
Context Free Languages
1. Construct a context free grammar which generates all strings of the form
a^n b* c^n.
2. A nonterminal can be reached from another if it appears in some string
generated by that nonterminal. Prove that context free grammars need
not contain any nonterminals that cannot be reached from the starting
symbol.
3. Show that productions of the form A → B (i.e. chain rules) need never
appear in context free grammars.
4. Produce a Chomsky Normal Form grammar for assignment statements.
5. Develop a Chomsky Normal form Grammar that generates all Boolean
expressions.
6. Express the following grammar in Chomsky Normal form.
S → =VE
E → +EE | -EE | V
V → a | b
7. Convert the grammar of the last problem to Greibach Normal form.
8. Place our favorite grammar (S → 1S0 | 10) in Greibach Normal form.
9. If you think about it, grammars are merely bunches of symbols arranged
according to certain rules. So, we should be able to generate grammars
with other grammars. Design three context free grammars which generate
grammars which are:
a. context free,
b. in Chomsky Normal form, and
c. in Greibach Normal form.
10. Prove that any set which can be accepted by a pushdown automaton is a
context free language.
11. Show that the context free languages can be accepted by deterministic
linear bounded automata. [Hint: use Greibach Normal form grammars.]
12. An epsilon move is one in which the tape head is not advanced. Show that
epsilon moves are unnecessary for pushdown automata.
Context Free Language Properties
1. Is the set of strings of the form 0^n1^m0^n1^m (for n ≥ 0) a context free
language? Justify your conjecture.
2. Show that the set of strings of prime length is not a context free
language.
3. We found that the context free languages are not closed under
intersection. But they are closed under a less restrictive property -
intersection with regular sets. Prove this.
4. Suppose that L is a context free language and that R is a regular set. Show
that L - R is context free. What about R - L?
5. Demonstrate that while the set of strings of the form w#w (where w is a
string of a's and b's) is not a context free language, its complement is
one.
6. Select a feature of your favorite programming language and show that its
syntax is not context free.
7. Precisely work out the algorithm for deciding the emptiness problem for
context free languages. Why is your algorithm correct?
8. Show that the grammars of problems 3 and 6 generate infinite languages.
Parsing and Deterministic Languages
1. Why is the following grammar not an s-grammar? Turn it into one and
explain each step as you do it.
S → AB
A → aA | aC
B → bC
C → Bc | c
2. Develop an s-grammar for strings of the form a^n b^n c^m d^m. Show why your
grammar is an s-grammar.
3. Rewrite the following grammar as a q-grammar. Explain your changes.
E → T | T+E
T → x | (E)
4. Construct a q-grammar for Boolean expressions.
5. Show that every LL(1)-language is indeed a q-language.
6. Discuss the differences between pushdown automata and top-down
parsers. Can parsers be made into pushdown automata?
7. Show that the class of languages accepted by deterministic pushdown
automata is closed under complement.
8. Construct the tables for an LR parser that recognizes arithmetic
assignment statements. (Modify the grammar provided in this chapter to
include endmarkers.)
9. Outline a general method for converting LR parsers to deterministic
pushdown automata (with epsilon moves).