
Introduction to Languages
and the Theory of
Computation
Third Edition

John C. Martin
North Dakota State University

Boston  Burr Ridge, IL  Dubuque, IA  Madison, WI  New York  San Francisco  St. Louis
Bangkok  Bogotá  Caracas  Kuala Lumpur  Lisbon  London  Madrid  Mexico City
Milan  Montreal  New Delhi  Santiago  Seoul  Singapore  Sydney  Taipei  Toronto

McGraw-Hill Higher Education
A Division of The McGraw-Hill Companies

INTRODUCTION TO LANGUAGES AND THE THEORY OF COMPUTATION


THIRD EDITION

Published by McGraw-Hill, a business unit of The McGraw-Hill Companies, Inc., 1221
Avenue of the Americas, New York, NY 10020. Copyright © 2003, 1997, 1991 by The
McGraw-Hill Companies, Inc. All rights reserved. No part of this publication may be
reproduced or distributed in any form or by any means, or stored in a database or retrieval
system, without the prior written consent of The McGraw-Hill Companies, Inc., including,
but not limited to, in any network or other electronic storage or transmission, or broadcast for
distance learning.

Some ancillaries, including electronic and print components, may not be available to
customers outside the United States.

This book is printed on acid-free paper.

International  1 2 3 4 5 6 7 8 9 0 QPF/QPF 0 9 8 7 6 5 4 3 2


Domestic  1 2 3 4 5 6 7 8 9 0 QPF/QPF 0 9 8 7 6 5 4 3

ISBN 0-07-232200-4
ISBN 0-07-119854-7 (ISE)

Publisher: Elizabeth A. Jones


Developmental editor: Melinda Dougharty
Executive marketing manager: John Wannemacher
Lead project manager: Peggy J. Selle
Lead production supervisor: Sandy Ludovissy
Lead media project manager: Audrey A. Reiter
Designer: K. Wayne Harms
Cover/interior designer: Rokusek Design
Cover image: Ryoichi Utsumi (IMA)/Photonica
Compositor: Techsetters
Typeface: 10/12 Times Roman
Printer: Quebecor World Fairfield, PA

Library of Congress Cataloging-in-Publication Data


Martin, John C.
Introduction to languages and the theory of computation / John Martin.—3rd ed.
p. cm.
Includes index.
ISBN 0-07-232200-4 — ISBN 0-07-119854-7
1. Sequential machine theory. 2. Computable functions. I. Title.

QA267.5.S4 M29 2003
511.3—dc21    2002070865
INTERNATIONAL EDITION ISBN 0-07-119854-7
Copyright © 2003. Exclusive rights by The McGraw-Hill Companies, Inc., for manufacture
and export. This book cannot be re-exported from the country to which it is sold by
McGraw-Hill. The International Edition is not available in North America.

www.mhhe.com
ABOUT THE AUTHOR

John C. Martin attended Rice University both as an undergraduate and as a graduate
student, receiving a B.A. in mathematics in 1966 and a Ph.D. in 1971. He taught for
two years at the University of Hawaii in Honolulu before joining the faculty of North
Dakota State University, where he is an associate professor of computer science.
CONTENTS

Preface ix
Introduction xi

PART I
Mathematical Notation and Techniques 1

CHAPTER 1
Basic Mathematical Objects 3
1.1 Sets 3
1.2 Logic 9
1.3 Functions 17
1.4 Relations 22
1.5 Languages 28
Exercises 32
More Challenging Problems 39

CHAPTER 2
Mathematical Induction and Recursive Definitions 43
2.1 Proofs 43
2.2 The Principle of Mathematical Induction 48
2.3 The Strong Principle of Mathematical Induction 55
2.4 Recursive Definitions 58
2.5 Structural Induction 66
Exercises 72
More Challenging Problems 77

PART II
Regular Languages and Finite Automata 83

CHAPTER 3
Regular Expressions and Finite Automata 85
3.1 Regular Languages and Regular Expressions 85
3.2 The Memory Required to Recognize a Language 90
3.3 Finite Automata 95
3.4 Distinguishing One String from Another 105
3.5 Unions, Intersections, and Complements 109
Exercises 112
More Challenging Problems 118

CHAPTER 4
Nondeterminism and Kleene's Theorem 123
4.1 Nondeterministic Finite Automata 123
4.2 Nondeterministic Finite Automata with Λ-Transitions 133
4.3 Kleene's Theorem 145
Exercises 156
More Challenging Problems 164

CHAPTER 5
Regular and Nonregular Languages 168
5.1 A Criterion for Regularity 168
5.2 Minimal Finite Automata 175
5.3 The Pumping Lemma for Regular Languages 180
5.4 Decision Problems 186
5.5 Regular Languages and Computers 189
Exercises 191
More Challenging Problems 196

PART III
Context-Free Languages and Pushdown Automata 201

CHAPTER 6
Context-Free Grammars 203
6.1 Examples and Definitions 203
6.2 More Examples 210
6.3 Regular Grammars 216
6.4 Derivation Trees and Ambiguity 220
6.5 An Unambiguous CFG for Algebraic Expressions 226
6.6 Simplified Forms and Normal Forms 232
Exercises 240
More Challenging Problems 247

CHAPTER 7
Pushdown Automata 251
7.1 Introduction by Way of an Example 251
7.2 The Definition of a Pushdown Automaton 255
7.3 Deterministic Pushdown Automata 260
7.4 A PDA Corresponding to a Given Context-Free Grammar 265
7.5 A Context-Free Grammar Corresponding to a Given PDA 273
7.6 Parsing 280
Exercises 290
More Challenging Problems 295

CHAPTER 8
Context-Free and Non-Context-Free Languages 297
8.1 The Pumping Lemma for Context-Free Languages 297
8.2 Intersections and Complements of Context-Free Languages 306
8.3 Decision Problems Involving Context-Free Languages 311
Exercises 312
More Challenging Problems 314

PART IV
Turing Machines and Their Languages 317

CHAPTER 9
Turing Machines 319
9.1 Definitions and Examples 319
9.2 Computing a Partial Function with a Turing Machine 328
9.3 Combining Turing Machines 332
9.4 Variations of Turing Machines: Multitape TMs 337
9.5 Nondeterministic Turing Machines 341
9.6 Universal Turing Machines 347
9.7 Models of Computation and the Church-Turing Thesis 352
Exercises 354
More Challenging Problems 361

CHAPTER 10
Recursively Enumerable Languages 365
10.1 Recursively Enumerable and Recursive 365
10.2 Enumerating a Language 368
10.3 More General Grammars 371
10.4 Context-Sensitive Languages and the Chomsky Hierarchy 380
10.5 Not All Languages Are Recursively Enumerable 387
Exercises 397
More Challenging Problems 401

PART V
Unsolvable Problems and Computable Functions 405

CHAPTER 11
Unsolvable Problems 407
11.1 A Nonrecursive Language and an Unsolvable Problem 407
11.2 Reducing One Problem to Another: The Halting Problem 411
11.3 Other Unsolvable Problems Involving TMs 416
11.4 Rice's Theorem and More Unsolvable Problems 419
11.5 Post's Correspondence Problem 422
11.6 Unsolvable Problems Involving Context-Free Languages 430
Exercises 436
More Challenging Problems 439

CHAPTER 12
Computable Functions 442
12.1 Primitive Recursive Functions 442
12.2 Primitive Recursive Predicates and Some Bounded Operations 451
12.3 Unbounded Minimalization and μ-Recursive Functions 459
12.4 Gödel Numbering 461
12.5 All Computable Functions Are μ-Recursive 465
12.6 Nonnumeric Functions, and Other Approaches to Computability 470
Exercises 474
More Challenging Problems 477

PART VI
Introduction to Computational Complexity 479

CHAPTER 13
Measuring and Classifying Complexity 481
13.1 Growth Rates of Functions 481
13.2 Time and Space Complexity of a Turing Machine 486
13.3 Complexity Classes 492
Exercises 497
More Challenging Problems 499

CHAPTER 14
Tractable and Intractable Problems 500
14.1 Tractable and Possibly Intractable Problems: P and NP 500
14.2 Polynomial-Time Reductions and NP-Completeness 506
14.3 Cook's Theorem 510
14.4 Some Other NP-Complete Problems 517
Exercises 522
More Challenging Problems 524

References 527
Bibliography 529
Index of Notation 531
Index 535
PREFACE

This book is an introduction to the theory of computation. It emphasizes
formal languages, models of computation, and computability, and it includes
an introduction to computational complexity and NP-completeness.
Most students studying these topics have already had experience in the practice
of computation. They have used a number of technologies related to computers; now
they can begin to acquire an appreciation of computer science as a coherent discipline.
The ideas are profound—and fun to think about—and the principles will not quickly
become obsolete. Finally, students can gain proficiency with mathematical tools and
formal methods, at the same time that they see how these techniques are applied to
computing.
I believe that the best way to present theoretical topics such as the ones in this
book is to take advantage of the clarity and precision of mathematical language—
provided the presentation is accessible to readers who are still learning to use this
language. The book attempts to introduce the necessary mathematical tools gently
and gradually, in the context in which they are used, and to provide discussion and
examples that make the language intelligible. The first two chapters present the
topics from discrete mathematics that come up later, including a detailed discussion
of mathematical induction. As a result, the text can be read by students without a
strong background in discrete math, and it should also be appropriate for students
whose skills in that area need to be consolidated and sharpened.
The organizational changes in the third edition are not as dramatic as those in the
second. One chapter was broken up and distributed among the remaining fourteen,
and sections of several chapters were reworked and rearranged. In addition to changes
in organization, there were plenty of opportunities throughout to rewrite, to correct
proofs and examples and make them easier to understand, to add examples, and to
replace examples by others that illustrate principles better. Some exercises have
been added, some others have been modified, and the exercises in each chapter have
been grouped into ordinary ones and more challenging ones. In the Turing machine
chapter, I have followed the advice of two reviewers in adopting a more standard and
more intuitive definition of halting.
Whether or not Part I is covered in detail, I recommend covering Section 1.5,
which introduces notation and terminology involving languages. It may also be
desirable to review mathematical induction, particularly the sections on recursive
definitions and structural induction and the examples having to do with formal lan-
guages. At North Dakota State, the text is used in a two-semester sequence required
of undergraduate computer science majors, and there is more than enough material
for both semesters. A one-semester course omitting most of Part I could cover regular
and context-free languages, and the corresponding automata, and at least some of the
theory of Turing machines and solvability. In addition, since most of Parts IV, V, and
VI are substantially independent of the first three parts, the text can also be used in a
course on Turing machines, computability, and complexity.
I am grateful to the many people who have helped me with all three editions of
this text. Particular thanks are due to Ting-Lu Huang, who pointed out an error in the
proof of Theorem 4.2 in the second edition, and to Jonathan Goldstine, who provided
several corrections to Chapters 7 and 8. I appreciate the thoughtful and detailed com-
ments of Bruce Wieand, North Carolina State University; Edward Ashcroft, Arizona
State University; Ding-Zhu Du, University of Minnesota; William D. Shoaff, Florida
Institute of Technology; and Sharon Tuttle, Humboldt State University, who reviewed
the second edition, and Ding-Zhu Du, University of Minnesota; Leonard M. Faltz,
Arizona State University; and Nilufer Onder, Michigan Tech, who reviewed a prelim-
inary version of this edition. Their help has resulted in a number of improvements,
including the modification in Chapter 9 mentioned earlier. Melinda Dougharty at
McGraw-Hill has been delightful to work with, and I also appreciate the support and
professionalism of Betsy Jones and Peggy Selle. Finally, thanks once again to my
wife Pippa for her help, both tangible and intangible.
John C. Martin
INTRODUCTION

In order to study the theory of computation, let us try to say what a computation
is. We might say that it consists of executing an algorithm: starting with some
input and following a step-by-step procedure that will produce a result. Exactly
what kinds of steps are allowed in an algorithm? One approach is to think about
the steps allowed in high-level languages that are used to program computers (C, for
example). Instead, however, we will think about the computers themselves. We will
say that a step will be permitted in a computation if it is an operation the computer
can make. In other words, a computation is simply a sequence of steps that can be
performed by a computer! We will be able to talk precisely about algorithms and
computations once we know precisely what kinds of computers we will study.
The computers will not be actual computers. In the first place, a theory based on
the specifications of an actual piece of hardware would not be very useful, because
it would have to be changed every time the hardware was changed or enhanced.
Even more importantly, actual computers are much too complicated; the idealized
computers we will study are simple. We will study several abstract machines, or
models of computation, which will be defined mathematically. Some of them are as
powerful in principle as real computers (or even more so, because they are not subject
to physical constraints on memory), while the simpler ones are less powerful. These
simpler machines are still worth studying, because they make it easier to introduce
some of the mathematical formalisms we will use in our theory and because the
computations they can carry out are performed by real-world counterparts in many
real-world situations.
We can understand the “languages” part of the subject by considering the idea of
a decision problem, a computational problem for which every specific instance can
be answered “yes” or “no.” A familiar numerical example is the problem: Given a
positive integer n, is it prime? The number n is encoded as a string of digits, and a
computation that solves the problem starts with this input string. We can think about
this as a language recognition problem: to take an arbitrary string of digits and deter-
mine whether it is one of the strings in the language of all strings representing primes.
In the same way, solving any decision problem can be thought of as recognizing a
certain language, the language of all strings representing instances of the problem for
which the answer is “yes.” Not all computational problems are decision problems,
and the more powerful of our models of computation will allow us to handle more
general kinds; however, even a more general problem can often be approached by
considering a comparable decision problem. For example, if f is a function, being
able to answer the question: given x and y, is y = f(x)? is tantamount to being
able to compute f(x) for an arbitrary x. The problem of language recognition will
be a unifying theme in our discussion of abstract models of computation. Comput-
ing machines of different types can recognize languages of different complexity, and


the various computation models will result in a corresponding hierarchy of language
types.
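The primality example can be made concrete. Below is a minimal Python sketch, not from the text, of the decision problem viewed as language recognition: the recognizer accepts exactly those digit strings that belong to the language of strings representing primes.

```python
# A decision problem as language recognition: accept an input string exactly
# when it belongs to the language of all digit strings representing primes.

def is_prime(n: int) -> bool:
    # Trial division; adequate for small inputs.
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def recognize(w: str) -> bool:
    # Answer "yes" (True) or "no" (False) for the instance encoded by w.
    return w.isdigit() and is_prime(int(w))

print(recognize("17"))   # True:  "17" is in the language
print(recognize("21"))   # False: 21 = 3 * 7
print(recognize("abc"))  # False: not a string of digits at all
```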
The simplest type of abstract machine we consider is a finite automaton, or finite-
state machine. The underlying principle is a very general one. Any system that is at
each moment in one of a finite number of discrete states, and moves among these states
in a predictable way in response to individual input signals, can be modeled by a finite
automaton. The languages these machines can recognize are the regular languages,
which can also be described as the ones obtained from one-element languages by
repeated applications of certain basic operations. Regular languages include some that
arise naturally as “pieces” of programming languages. The corresponding machines
in software form have been applied to various problems in compiler design and text
editing, among others.
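A finite automaton "in software form" can be sketched in a few lines. The following Python fragment is illustrative rather than taken from the text: a two-state machine, with an invented transition table, that recognizes the regular language of binary strings containing an even number of 1s.

```python
# A two-state finite automaton recognizing binary strings with an even
# number of 1s. The machine is at each moment in one of finitely many
# states and moves among them in response to individual input symbols.

TRANSITIONS = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}
START = "even"
ACCEPTING = {"even"}

def accepts(w: str) -> bool:
    state = START
    for symbol in w:
        state = TRANSITIONS[(state, symbol)]  # one move per input signal
    return state in ACCEPTING

print(accepts("1010"))  # True:  two 1s
print(accepts("0100"))  # False: one 1
```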
The most obvious limitation of a finite automaton is that, except for being able
to keep track of its current state, it has no memory. As you might expect, such a
machine can recognize only simple languages. Context-free languages allow richer
syntax than regular languages. They can be generated using context-free grammars,
and they can be recognized by computing devices called pushdown automata (a
pushdown automaton is a finite automaton with an auxiliary memory in the form
of a stack). Context-free grammars were used originally to model properties of
natural languages like English, which they can do only to a limited extent. They are
important in computer science because they can describe much of the syntax of high-
level programming languages and other related formal languages. The corresponding
machines, pushdown automata, provide a natural way to approach the problem of
parsing a statement in a high-level programming language: determining the syntax
of the statement by reconstructing the sequence of rules by which it is derived in the
context-free grammar.
Although the auxiliary memory makes a pushdown automaton a more powerful
computing device than a finite automaton, the stack organization imposes constraints
that keep the machine from being a general model of computation. A Turing machine,
named for the English mathematician who invented it, is an even more powerful
computer, and there is general agreement that such a machine is able to carry out
any “step-by-step procedure” whatsoever. The languages that can be recognized
by Turing machines are more general than context-free languages, and they can be
produced by more general grammars. Moreover, since a Turing machine can print
output strings as well as just answering yes or no, there is in principle nothing to stop
such a machine from performing any computation that a full-fledged computer can,
except that it is likely to do it more clumsily and less efficiently.
Nevertheless, there are limits to what a Turing machine can do; since we can
describe this abstract model precisely, we can formulate specific computational prob-
lems that it cannot solve. At this point we no longer have the option of just coming
up with a more powerful machine—there are no more powerful machines! The exis-
tence of these unsolvable problems means that the theory of computation is inevitably
about the limitations of computers as well as their capabilities.
Finally, although a Turing machine is clumsy in the way it carries out com-
putations, it is an effective yardstick for comparing the inherent complexity of one
computational problem to that of another. Some problems that are solvable in princi-
ple are not really solvable in practice, because their solution would require impossible
amounts of time and space. A simple criterion involving Turing machines is generally
used to distinguish the tractable problems from the intractable ones. Although the
criterion is simple, however, it is not always easy to decide which problems satisfy
it. In the last chapter we discuss an interesting class of problems, those for which no
one has found either a good algorithm or a convincing proof that none exists.
People have been able to compute for many thousands of years, but only very
recently have people made machines that can, and computation as a pervasive part
of our lives is an even more recent phenomenon. The theory of computation is
slightly older than the electronic computer, because some of the pioneers in the
field, Turing and others, were perceptive enough to anticipate the potential power of
computers; their work provided the conceptual model on which the modern digital
computer is based. The theory of computation has also drawn from other areas:
mathematics, philosophy, linguistics, biology, and electrical engineering, to name a
few. Remarkably, these elements fit together into a coherent, even elegant, theory,
which has the additional advantage that it is useful and provides insight into many
areas of computer science.
PART I
Mathematical Notation
and Techniques

This textbook starts by reviewing some of the most fundamental mathematical
ideas: sets, functions, relations, and basic principles of logic. Later in the book
we will study abstract “machines”; the components of an abstract machine are sets,
and the way the machine works is described by a function from one set to another. In
the last section of Chapter 1, we introduce languages, which are merely sets whose
elements are strings of symbols. The notation introduced in this section will be
useful later, as we study classes of languages and the corresponding types of abstract
machines.
Reasoning about mathematical objects involves the idea of a proof, and this is the
subject of Chapter 2. The emphasis is on one particular proof technique—the principle
of mathematical induction—which will be particularly useful to us in this book. A
closely related idea is that of an inductive, or recursive, definition. Definitions of this
type will make it easy to define languages and to establish properties of the languages
using mathematical induction.
CHAPTER 1

Basic Mathematical Objects

1.1 | SETS

A set is determined by its elements. An easy way to describe or specify a finite set is
to list all its elements. For example,

A = {11, 12, 21, 22}
When we enumerate a set this way, the order in which we write the elements is
irrelevant. The set A could just as well be written {11, 21, 22, 12}. Writing an
element more than once does not change the set: The sets {11, 21, 22, 11, 12, 21} and
{11, 21, 22, 12} are the same.
Even if a set is infinite, it may be possible to start listing the elements in a way
that makes it clear what they are. For example,
B = {3, 5, 7, 9, ...}
describes the set of odd integers greater than or equal to 3. However, although this
way of describing a set is common, it is not always foolproof. Does {3, 5, 7, ...}
represent the same set, or does it represent the set of odd primes, or perhaps the set
of integers bigger than 1 whose names contain the letter "e"?
A precise way of describing a set without listing the elements explicitly is to give
a property that characterizes the elements. For example, we might write
B = {x | x is an odd integer greater than 1}

or

A = {x | x is a two-digit integer, each of whose digits is 1 or 2}

The notation "{x |" at the beginning of both formulas is usually read "the set of all x
such that."
To say that x is an element of the set A, we write
x ∈ A

Using this notation we might describe the set C = {3, 5, 7, 9, 11} by writing

C = {x | x ∈ B and x ≤ 11}

A common way to shorten this slightly is to write

C = {x ∈ B | x ≤ 11}

which we read "the set of x in B such that x ≤ 11."


It is also customary to extend the notation in a different way. It would be rea-
sonable to describe the set

D = {x | there exist integers i and j, both ≥ 0, with x = 3i + 7j}

as "the set of numbers 3i + 7j, where i and j are nonnegative integers," and a
concise way to write this is

D = {3i + 7j | i, j are nonnegative integers}

Once we define N to be the set of nonnegative integers, or natural numbers, we can
describe D even more concisely by writing

D = {3i + 7j | i, j ∈ N}
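The three descriptions of D are easy to experiment with. Here is a small Python sketch; the cutoff of 25 is an arbitrary choice so that a portion of the (infinite) set can be listed.

```python
# The elements of D = {3i + 7j | i, j in N} that are at most 25.
D = {3 * i + 7 * j
     for i in range(26)
     for j in range(26)
     if 3 * i + 7 * j <= 25}

print(sorted(D))
# [0, 3, 6, 7, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
```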

For two sets A and B, we say that A is a subset of B, and write A ⊆ B, if every
element of A is an element of B. Because a set is determined by its elements, two
sets are equal if they have exactly the same elements, and this is the same as saying
that each is a subset of the other. When we want to prove that A = B, we will need
to show both statements: that A ⊆ B and that B ⊆ A.
The complement of a set A is the set A’ of everything that is not an element of
A. This makes sense only in the context of some “universal set” U containing all the
elements we are discussing.
A′ = {x ∈ U | x ∉ A}

Here the symbol ∉ means "is not an element of." If U is the set of integers, for
example, then {1, 2}’ is the set of integers other than 1 or 2. The set {1, 2}’ would be
different if U were the set of all real numbers or some other universe.
Two other important operations involving sets are union and intersection. The
union of A and B (sometimes referred to as "A union B") is the set

A ∪ B = {x | x ∈ A or x ∈ B}

and the intersection of A and B is

A ∩ B = {x | x ∈ A and x ∈ B}

For example,

{1, 2, 3, 4} ∪ {2, 4, 6, 8} = {1, 2, 3, 4, 6, 8}
{1, 2, 3, 4} ∩ {2, 4, 6, 8} = {2, 4}

We can define another useful set operation, set difference, in terms of intersections
and complements. The difference A - B is the set of everything in A but not in B.
In other words,

A - B = {x | x ∈ A and x ∉ B}
      = {x | x ∈ A} ∩ {x | x ∉ B}
      = A ∩ B′

For example,

{1, 2, 3, 4} - {2, 4, 6, 8} = {1, 3}


Note that the complement of any set A is the same as U - A, where U is the universe.
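As an aside, Python's built-in set type provides all four operations directly, which makes the definitions easy to try out; the universe U below is an arbitrary small choice.

```python
A = {1, 2, 3, 4}
B = {2, 4, 6, 8}
U = set(range(10))   # a small universal set, chosen only for illustration

print(A | B)         # union:        {1, 2, 3, 4, 6, 8}
print(A & B)         # intersection: {2, 4}
print(A - B)         # difference:   {1, 3}
print(U - A)         # complement of A, relative to the universe U
```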
Introducing a special set ∅, called the empty set, or the set containing no elements,
often makes it easier to write equations involving sets. For example, the formula
A ∩ B = ∅ expresses the fact that A and B are disjoint, or have no elements in
common. If C is a collection of subsets of a set, we say that the elements of C are
pairwise disjoint if for any two distinct elements A and B of C, A ∩ B = ∅. (A
note on terminology: "Distinct" just means "different." The phrase "three distinct
elements" means three elements, no two of which are equal. We do not say "one
distinct element.")
Once we have the union, intersection, difference, and complement operations,
we can define sets using arbitrarily complicated formulas. It will be useful to have
some basic rules for manipulating and simplifying such formulas. Here is a list of
some standard set identities. In all these rules, A, B, and C represent arbitrary sets.
As before, U represents the universe and ∅ is the empty set.
The commutative laws:

A ∪ B = B ∪ A                          (1.1)
A ∩ B = B ∩ A                          (1.2)

The associative laws:

A ∪ (B ∪ C) = (A ∪ B) ∪ C              (1.3)
A ∩ (B ∩ C) = (A ∩ B) ∩ C              (1.4)

The distributive laws:

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)        (1.5)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)        (1.6)

The idempotent laws:

A ∪ A = A                              (1.7)
A ∩ A = A                              (1.8)

The absorptive laws:

A ∪ (A ∩ B) = A                        (1.9)
A ∩ (A ∪ B) = A                        (1.10)

The De Morgan laws:

(A ∪ B)′ = A′ ∩ B′                     (1.11)
(A ∩ B)′ = A′ ∪ B′                     (1.12)

Other laws involving complements:

(A′)′ = A                              (1.13)
A ∩ A′ = ∅                             (1.14)
A ∪ A′ = U                             (1.15)

Other laws involving the empty set:

A ∪ ∅ = A                              (1.16)
A ∩ ∅ = ∅                              (1.17)

Other laws involving the universal set:

A ∪ U = U                              (1.18)
A ∩ U = A                              (1.19)

To illustrate how identities of this type might be proved, let us give a proof of
(1.12), the second De Morgan law. Since (1.12) asserts that two sets are equal, we
will show that each of the two sets is a subset of the other.

To show (A ∩ B)′ ⊆ A′ ∪ B′, we must show that every element of (A ∩ B)′ is an element
of A′ ∪ B′. Let x be an arbitrary element of (A ∩ B)′. Then by definition of complement,
x ∉ A ∩ B. By definition of intersection, x is not an element of both A and B; therefore,
either x ∉ A or x ∉ B. Thus, x ∈ A′ or x ∈ B′, and so x ∈ A′ ∪ B′.
To show A′ ∪ B′ ⊆ (A ∩ B)′, let x be any element of A′ ∪ B′. Then x ∈ A′ or x ∈ B′.
Therefore, either x ∉ A or x ∉ B. Thus, x is not an element of both A and B, and so
x ∉ A ∩ B. Therefore, x ∈ (A ∩ B)′.
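The identity can also be checked mechanically. The Python sketch below tests (1.12) for every pair of subsets of a small universe; of course this is a sanity check, not a substitute for the proof.

```python
# Exhaustive check of (A ∩ B)' = A' ∪ B' over all subsets of a small universe.
from itertools import combinations

U = {0, 1, 2, 3}

def subsets(s):
    # Generate every subset of s, from the empty set up to s itself.
    s = list(s)
    for r in range(len(s) + 1):
        for combo in combinations(s, r):
            yield set(combo)

assert all((U - (A & B)) == ((U - A) | (U - B))
           for A in subsets(U) for B in subsets(U))
print("the second De Morgan law holds for all subsets of", U)
```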

In order to visualize a set that is formed from primitive sets by using the set
operations, it is often helpful to draw a Venn diagram. The idea is to draw a large region
representing the universe and within that to draw schematic diagrams of the primitive
sets, overlapping so as to show one region for each membership combination. (This
may be difficult to do when there are more than three primitive sets; just as in our
list of identities above, however, three is usually enough.) If we shade the primitive
regions differently, the set we are interested in can be identified by the appropriate
combination of shadings.
In the case of two sets A and B, the basic Venn diagram is shown in Figure 1.1a.
The four disjoint regions of the picture, corresponding to the sets (A ∪ B)′, A - B,
B - A, and A ∩ B, are unshaded, shaded one way only, shaded the other way only,
and shaded both ways, respectively. Figure 1.2 shows an unshaded Venn diagram for
three sets. For practice, you might want to label each of the eight subregions, using
a formula involving A, B, C, and appropriate set operations.
Figure 1.1 | (a) A basic Venn diagram; (b), (c) The second De Morgan identity.

Figure 1.2 | A three-set Venn diagram.

Venn diagrams can be used to confirm the truth of set identities. For example,
Figure 1.1b illustrates the set (A ∩ B)′, the left side of identity (1.12). It is obtained
by shading all the regions that are not shaded twice in Figure 1.1a. On the other hand,
Figure 1.1c shows the two sets A′ and B′, shaded in different ways. The union of
these two sets (the right side of (1.12)) corresponds to the region in Figure 1.1c that
is shaded at least once, and this is indeed the region shown in Figure 1.1b.
The symmetric difference A ⊕ B of the two sets A and B is defined by the formula

A ⊕ B = (A - B) ∪ (B - A)

and corresponds to the region in Figure 1.1a that is shaded exactly once. In Exercise
1.5, you are asked to use Venn diagrams to verify that the symmetric difference
operation satisfies the associative law:

A ⊕ (B ⊕ C) = (A ⊕ B) ⊕ C
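The associative law can also be checked by brute force. In Python, the ^ operator on sets computes exactly the symmetric difference defined above; the three-element universe below is an arbitrary choice.

```python
# Brute-force check of A ⊕ (B ⊕ C) = (A ⊕ B) ⊕ C; set ^ is symmetric difference.
from itertools import combinations

U = [0, 1, 2]
subsets = [set(c) for r in range(len(U) + 1) for c in combinations(U, r)]

assert all((A ^ (B ^ C)) == ((A ^ B) ^ C)
           for A in subsets for B in subsets for C in subsets)
print("symmetric difference is associative on all subsets of", set(U))
```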

When Venn diagrams are used properly they can provide the basis for arguments
that are both simple and convincing. (In Exercise 1.48 you are asked to show the
associative property of symmetric difference without using Venn diagrams, and you
will probably decide that the diagrams simplify the argument considerably.) Never-
theless, a proof based on pictures may be misleading, because the pictures may not
show all the relevant properties of the sets involved. Because of the limitations of
Venn diagrams in reasoning about sets, it is also important to be able to work with
the identities directly. As an illustration of how these identities can be applied, let us
simplify the expression A ∪ (B - A):

A ∪ (B - A) = A ∪ (B ∩ A′)        (by definition of the - operation)
            = (A ∪ B) ∩ (A ∪ A′)  (by (1.5))
            = (A ∪ B) ∩ U         (by (1.15))
            = A ∪ B               (by (1.19))
Rules (1.3) and (1.4), the associative laws for union and intersection, allow us
in effect to apply these operations to more than two sets at once. According to (1.3),
the two sets A ∪ (B ∪ C) and (A ∪ B) ∪ C are equal, so that we might as well write
A ∪ B ∪ C for either one. Furthermore, it is easy to see that this set can be described
as follows:

A ∪ B ∪ C = {x | x ∈ A or x ∈ B or x ∈ C}
          = {x | x is an element of at least one of the sets A, B, and C}

More generally, if A₁, A₂, ... are sets, we can write

⋃_{i=1}^{n} Aᵢ

to mean the set {x | x ∈ Aᵢ for at least one i with 1 ≤ i ≤ n}; and

⋃_{i=1}^{∞} Aᵢ = {x | x ∈ Aᵢ for at least one i ≥ 1}

Using (1.4), the associative law for intersection, we may write

⋂_{i=1}^{n} Aᵢ = {x | x ∈ Aᵢ for every i with 1 ≤ i ≤ n}

and so forth. Still more generally, if P(i) is some condition involving i,

⋃_{P(i)} Aᵢ

means the set {x | x ∈ Aᵢ for at least one i satisfying P(i)}. In Chapter 4 we will
encounter a set with the slightly intimidating formula

⋃_{p ∈ δ*(q,x)} δ(p, a)

We do not need to know what the sets δ*(q, x) and δ(p, a) are in order to understand
that

⋃_{p ∈ δ*(q,x)} δ(p, a) = {x | x ∈ δ(p, a) for at least one element p of δ*(q, x)}

If δ*(q, x) were {r, s, t}, for example, this formula would give us δ(r, a) ∪ δ(s, a) ∪
δ(t, a).
The elements of a set can be sets themselves. For any set A, the set of all subsets
of A is referred to as the power set of A, which we shall write 2^A. The reason for this
terminology and this notation is that if A has n elements, then 2^A has 2^n elements.
To illustrate, suppose A = {1, 2, 3}. Then

2^A = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}

Notice that ∅ and A are both elements: The empty set is a subset of every set, and
every set is a subset of itself.
Here is one more important way to construct a set from given sets. For any two
sets A and B, we may consider a new set called the Cartesian product of A and B,
written A × B and read "A cross B." It is the set of all ordered pairs (a, b), where
a ∈ A and b ∈ B.

A × B = {(a, b) | a ∈ A and b ∈ B}

The word ordered means that the pair (a, b) is different from the pair (b, a), unless
a and b happen to be the same. If A has n elements and B has m elements, then the
set A × B has exactly nm elements. For example,

{a, b} × {b, c, d} = {(a, b), (a, c), (a, d), (b, b), (b, c), (b, d)}

More generally, the set of all "ordered n-tuples" (a₁, a₂, ..., aₙ), where aᵢ ∈ Aᵢ for
each i, is denoted by A₁ × A₂ × ⋯ × Aₙ.
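Both constructions are easy to reproduce with Python's itertools; the helper name power_set and the example sets are invented here to match the discussion above.

```python
from itertools import chain, combinations, product

def power_set(s):
    # All subsets of s, from the empty set up to s itself: 2**len(s) of them.
    s = list(s)
    return [set(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

A = {1, 2, 3}
print(len(power_set(A)))     # 8, which is 2**3

pairs = list(product(["a", "b"], ["b", "c", "d"]))
print(len(pairs))            # 6 ordered pairs, since 2 * 3 = 6
print(pairs[0], pairs[3])    # ('a', 'b') ('b', 'b') -- order within a pair matters
```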

1.2 | LOGIC

1.2.1 Propositions and Logical Connectives


Even at this early stage we have already made use of logical statements and logical
arguments (see the proof of identity (1.12) in Section 1.1). Before we go any further,
we will briefly discuss logical propositions and some notation that will be useful in
working with them.
A proposition is a declarative statement that is sufficiently objective, meaningful,
and precise to have a truth value (one of the values true and false). The following
statements satisfy these criteria and have truth values true, false, and true, respec-
tively:

Fourteen is an even integer.


Winnipeg is the largest seaport in Canada.
00;

Consider the statements

x² < 4.
a² + b² = 3.
He has never held public office.
These look like precise, objective statements, but as they stand they cannot be
said to have truth values. Essential information is missing: What are the values
of x, a, and b, and who is “he”? Each of these propositions involves one or more
free variables; for the proposition to have a truth value, each free variable must be
assigned a specific value from an appropriate domain, or universe. In the first case,
an appropriate domain for the free variable x is a set of numbers. If the domain is
chosen to be the set of natural numbers, for example, the values of x that make the
proposition true are those less than 2, and every other value of x will make it false. If
the domain is the set R of all real numbers, the proposition will be true for all choices
of x that are both greater than —2 and less than 2. The free variables in the second
statement apparently also represent numbers. For the domain of natural numbers
there are no values for a and b that would make the proposition true; for the domain
of real numbers, a = 1 and b = √2 is one of many choices that would work. An
appropriate domain for the third statement would be a set of people.
Just as we can combine numerical values by using algebraic operations like ad-
dition and subtraction, we can combine truth values, using logical connectives. A
compound proposition is one formed from simpler propositions using a logical con-
nective. When we add two numerical expressions, all we need to know in order to
determine the numerical value of the answer is the numerical value of the two expres-
sions. Similarly, when we combine logical expressions using a logical connective,
all we need to know (in order to determine the truth value of the result) is the truth
values of the original expressions. In other words, we can define a specific logical
connective by saying, for each combination of truth values of the expressions being
combined, what the truth value of the resulting expression is.
The three most common logical connectives are conjunction, disjunction, and
negation, the symbols for which are ∧, ∨, and ¬, respectively. These correspond to
the English words and, or, and not. The conjunction p ∧ q of p and q is read "p and
q," and the disjunction p ∨ q is read "p or q." The negation ¬p of p is read "not p."
These three connectives can be defined by the truth tables below.

p | q | p ∧ q        p | q | p ∨ q        p | ¬p
T | T |   T          T | T |   T          T |  F
T | F |   F          T | F |   T          F |  T
F | T |   F          F | T |   T
F | F |   F          F | F |   F
The entries in the truth tables are probably the ones you would expect, on the
basis of the familiar meanings of the English words and, or, and not. For example,

the proposition p ∧ q is true if p and q are both true and false otherwise. The
proposition p ∨ q is true if p is true, or if q is true, or if both p and q are true.
(In everyday conversation “or” sometimes means exclusive or. Someone who says
“Give me liberty or give me death!” probably expects only one of the two. However,
exclusive or is a different logical connective, not the one we have defined.) Finally,
the negation of a proposition is true precisely when the proposition is false.
Another important logical connective is the conditional. The proposition p → q
is commonly read "if p, then q." Although this connective also comes up in everyday
speech, it may not be obvious how it should be defined precisely. Consider someone
giving directions, who says “If you cross a railroad track, you’ve gone too far.” This
statement should be true if you cross a railroad track and you have in fact gone too far.
If you cross a railroad track and you have not gone too far, the statement should be
false. The less obvious cases are the ones in which you do not cross a railroad track.
Normally we do not consider the statement to be false in these cases; we assume that
the speaker chooses not to commit himself, and we still give him credit for being a
truthful person. Following this example, we define the truth table for the conditional
as follows.

p | q | p → q
T | T |   T
T | F |   F
F | T |   T
F | F |   T
If you are not convinced by the example that the last two entries of the truth table
are the most reasonable choices, consider the proposition

(x < 1) → (x < 2)

where the domain associated with the free variable x is the set of natural numbers.
You would probably agree that this proposition ought to be true, no matter what value
is substituted for x. However, by choosing x appropriately we can obtain all three
of the cases in which the truth-table value is true. If x is chosen to be 0, then both
x < 1 and x < 2 are true; if x = 1, the first is false and the second is true; and if
x > 1, both are false. Therefore, if this compound proposition is to be true for every
x, the truth table must be the one shown. The conditional proposition is taken to be
true except in the single case where it must certainly be false.
One slightly confusing aspect of the conditional proposition is that when it is
expressed in words, the word order is sometimes inverted. The statements "if p then
q" and "q if p" mean the same thing: The crucial point is that the "if" comes just
before p in both cases. Perhaps even more confusing, however, is another common
way of expressing the conditional. The proposition p → q is often read "p only if
q." It is important to understand that "if" and "only if" have different meanings. The
two statements "p if q" (q → p) and "p only if q" (p → q) are both conditional

statements, but with the order of p and q reversed. Each of these two statements is
said to be the converse of the other.
The proposition (p → q) ∧ (q → p) is abbreviated p ↔ q, and the connective
↔ is called the biconditional. According to the previous paragraph, p ↔ q might
be read "p only if q, and p if q." It is usually shortened to "p if and only if q." (It
might seem more accurate to say "only if and if"; however, we will see shortly that
it doesn't matter.) Another common way to read p ↔ q is to say "if p then q, and
conversely."
With a compound proposition, composed of propositional variables (like p and
q) and logical connectives, we can determine the truth value of the entire proposition
from the truth values of the propositional variables. This is a routine calculation based
on the truth tables of the connectives. We can take care of all the cases at once by
constructing the entire truth table for the compound proposition, considering one at a
time the connectives from which it is built. We illustrate the way this might be done
for the proposition (p ∨ q) ∧ ¬(p → q).

p | q | p ∨ q | p → q | ¬(p → q) | (p ∨ q) ∧ ¬(p → q)
T | T |   T   |   T   |    F     |         F
T | F |   T   |   F   |    T     |         T
F | T |   T   |   T   |    F     |         F
F | F |   F   |   T   |    F     |         F
The last column of the table, which is the desired result, is obtained by combining
the third and fifth columns using the ∧ operation. Another way of carrying out the
same calculation is to include a column of the table for each operation in the expression
and to fill in the columns in the order in which the operations might be carried out.
The table below illustrates this approach.

p | q | p ∨ q (1) | p → q (2) | ¬(p → q) (3) | (p ∨ q) ∧ ¬(p → q) (4)
T | T |     T     |     T     |      F       |           F
T | F |     T     |     F     |      T       |           T
F | T |     T     |     T     |      F       |           F
F | F |     F     |     T     |      F       |           F
The first two columns to be computed are those corresponding to the subexpres-
sions p ∨ q and p → q. Column 3 is obtained by negating column 2, and the final
result in column 4 is obtained by combining columns 1 and 3 using the ∧ operation.
A compound proposition is called a tautology if it is true in every case (that is,
for every combination of truth values of the simpler propositions from which it is

constructed). For example, p ∨ ¬p is a tautology, as you can see by constructing
its two-row truth table. A contradiction is the opposite, a proposition that is false in
every case. If P is a tautology, ¬P is a contradiction. Another example is p ∧ ¬p,
where p is any proposition. Although there are many different-looking formulas that
turn out to be tautologies, logically (that is, with regard to their truth values) they are
indistinguishable. The proposition true stands for any tautology: It is, by definition,
always true. Similarly, false stands for any contradiction.
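These definitions translate directly into a short program. The Python sketch below, with helper names of my own choosing, models a proposition as a boolean function and tests every case, exactly as a truth table does.

```python
from itertools import product

def implies(p, q):
    # The truth table of the conditional: false only when p is true and q false.
    return (not p) or q

def is_tautology(prop, nvars):
    # True exactly when prop holds for every assignment of truth values.
    return all(prop(*vals) for vals in product([True, False], repeat=nvars))

print(is_tautology(lambda p: p or not p, 1))    # True:  p ∨ ¬p
print(is_tautology(lambda p: p and not p, 1))   # False: p ∧ ¬p, a contradiction
```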

1.2.2 Logical Implication and Equivalence


Suppose now that P and Q are both compound propositions, built from propositional
variables and logical connectives. Then in each case—that is, for each possible
choice of the truth values of all these propositional variables—P and Q both have
truth values. If Q is true in each case in which P is true, then we say P logically
implies Q, and we write P ⇒ Q. If P and Q have the same truth value in each case
(in other words, if P and Q have exactly the same truth tables), we say P and Q are
logically equivalent, written P ⇔ Q. Saying that P ⇔ Q is the same as saying
that P ⇒ Q and Q ⇒ P. The significance of logical equivalence is that for any
proposition appearing in a formula, we may substitute any other logically equivalent
proposition, because the truth value of the overall expression remains unchanged.
The logical connective → and the relation ⇒ of logical implication are similar
(P → Q is read "if P then Q," and P ⇒ Q means that if P is true, Q must be true)
but not the same. P → Q is a proposition just as P and Q are. It has a truth value,
depending on the truth values of P and Q. P ⇒ Q is not a logical proposition in
the same sense. It is a higher-level statement, if you like: a "meta-statement," or an
assertion about the relationship between two propositions. If we know that P ⇒ Q,
then there can be no case in which P is true and Q is not. In other words (because of
the way we have defined the truth table for →), the statement P → Q must always
be true. Thus, the similarity between → and ⇒ can be accounted for by saying that
P ⇒ Q means P → Q is a tautology. In exactly the same way, P ⇔ Q means that
P ↔ Q is a tautology.
Logical implication and logical equivalence come up in proofs, which we shall
discuss in a little more detail in Chapter 2. For now, let us note a number of funda-
mental logical equivalences that will be useful to us.
One approach would be simply to list the formulas we are interested in and
establish each one by constructing its truth table. However, if we are careful we can
obtain a number of them almost immediately from the set identities in Section 1.1. You
may already have noticed a superficial resemblance between the logical connectives ∨
and ∧ and the set operations ∪ and ∩, respectively. In fact, the similarity goes deeper
than the shapes of the symbols, as we can see by considering a different approach
to defining the set operations. We can define A ∪ B, for example, by considering
the four cases determined by an element's possible membership in A and in B. In
three of the four cases, the element is in A ∪ B; the only case in which the element
is not in A ∪ B is that in which the element is in neither A nor B. This can be

summarized by a "membership table" for the ∪ operation. Let T denote membership
and F nonmembership.

A | B | A ∪ B
T | T |   T
T | F |   T
F | T |   T
F | F |   F

Obviously, this duplicates the truth-table definition of ∨. Similarly, the member-
ship table for ∩ corresponds to the truth table for ∧, and the membership table for the
complement of a set corresponds to the truth table for negation.
To see how we can apply this observation, consider an example. We can establish
the equality of the two sets (A ∪ B)′ and A′ ∩ B′ by showing that they have the same
membership table. The membership tables for these two sets, however, are obtained
from those of A and B in exactly the same way that the truth tables for ¬(a ∨ b) and
¬a ∧ ¬b are obtained from those of a and b. The conclusion is that the truth tables
for these two propositions are also identical, and the two propositions are logically
equivalent.
What we obtain from this discussion is a list of logical equivalences correspond-
ing to the set identities (1.1)-(1.19). The proposition corresponding to the empty set
∅ is the proposition false, and the one corresponding to the universe U is true. (Recall
the earlier discussion of the biconditional in which we mentioned that "only if and
if" is really the same as "if and only if." We can now see the reason: According to
the logical analogue of the commutative law for intersection in Equation (1.2), the
two propositions p ∧ q and q ∧ p are logically equivalent.)
There are many other useful logical equivalences. We list just a few that involve
the conditional connective:

(p → q) ⇔ (¬p ∨ q)
(p → q) ⇔ (¬q → ¬p)
¬(p → q) ⇔ (p ∧ ¬q)
The first of these formulas is a way of reducing a conditional proposition to one
involving only the three original connectives. The second asserts that the conditional
proposition p → q is logically equivalent to its contrapositive. This equivalence
occurs in everyday speech: “If I drive, I don’t drink” is equivalent to “If it is not true
that I don’t drink, then I don’t drive,” or, more simply, “If I drink, then I don’t drive.”
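Since P ⇔ Q means that the biconditional P ↔ Q is a tautology, the same truth-table test sketched earlier confirms all three equivalences; for booleans, Python's == plays the role of ↔. A self-contained sketch:

```python
from itertools import product

def implies(p, q):
    return (not p) or q

def is_tautology(prop, nvars):
    return all(prop(*vals) for vals in product([True, False], repeat=nvars))

# (p → q) ⇔ (¬p ∨ q)
print(is_tautology(lambda p, q: implies(p, q) == ((not p) or q), 2))          # True
# (p → q) ⇔ (¬q → ¬p), the contrapositive
print(is_tautology(lambda p, q: implies(p, q) == implies(not q, not p), 2))   # True
# ¬(p → q) ⇔ (p ∧ ¬q)
print(is_tautology(lambda p, q: (not implies(p, q)) == (p and not q), 2))     # True
```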

1.2.3 Logical Quantifiers and Quantified Statements


Let us return to propositions involving variables, which may be replaced by values
from some domain or universe. Consider first a proposition with a free variable, say
the proposition x² < 4 from Section 1.2.1, where the associated domain is the set of

natural numbers. If we now modify this proposition by attaching the phrase “there
exists an x such that” at the beginning, then the proposition has been changed from
a statement about x to a statement about the domain. The status of the variable x
changes as well: It is no longer a free variable. If we tried to substitute a specific
value from the universe for all occurrences of x in the proposition, we would obtain
a statement such as "there exists a 3 such that 3² < 4," which is nonsense. We say
that the statement "there exists an x such that x² < 4" is a quantified statement. The
phrase "there exists" is called the existential quantifier. The variable x is said to be
bound to the quantifier and is referred to as a bound variable. We write this statement
in the compact form

∃x (x² < 4)
The other quantifier is the universal quantifier, written ∀. The formula ∀x (x² < 4)
stands for the statement "for every x, x² < 4." If you are not familiar with this
notation, the way to remember it is that ∀ is an upside-down A, for "all," and ∃ is a
backwards E, for "exists."
If P(x) is any proposition involving the free variable x over some domain U, then
by definition, the quantified statement ∃x(P(x)) is true if there is at least one value of
x in U that makes the formula P(x) true, and false otherwise. Similarly, ∀x(P(x)) is
true precisely when P(x) is true no matter what element of U is substituted for x. If
P(x) is the formula x² < 4, then ∃x(P(x)) is true if the domain is the set of natural
numbers, because 0² < 4. It is false for the domain of positive even integers. The
quantified statement ∀x(P(x)) is false for both of these domains.
The notation for quantified propositions occasionally varies. For example, the
statement ∃x(x² < 4) is sometimes written ∃x : x² < 4. We have chosen to use the
parentheses in order to clarify the scope of the quantifier. In our example ∃x(x² < 4),
the scope of the quantifier is the statement x² < 4. If the quantified statement appears
within a larger formula, then any x outside this scope means something different. If
you have studied a block-structured programming language such as Pascal or C, you
may be reminded of the scope of a variable in a block of the program, and in fact
the situations are very similar. If a block in a C program contains a declaration of
an identifier A, then the scope of that declaration is limited to that block and its
subblocks. To refer to A outside the block is either an error or a reference to some
other identifier declared outside the block.
Paying attention to the scope of quantifiers is particularly important in proposi-
tions with more than one quantifier. Consider, for example, the two propositions

∀x(∃y((x - y)² < 4))

∃y(∀x((x - y)² < 4))
where the domain is assumed to be the set of real numbers. Superficially these are
similar. They both have the same quantifiers, associated with the same variables,
and the inequalities are the same. However, the first statement is true and the second
is not. They say different things, and we can see the difference by considering the
scope of the universal quantifier in each one. In the first case, the entire clause
∃y((x − y)² < 4) is within the scope of "∀x," and so this clause is to be interpreted

as a statement about the specific value x. In other words, we can interpret "∃y" as
"there exists a number y, which may depend on x." In this case the proposition is
true, since y could be chosen to be x, for example. In the second case, the existential
quantifier is outside the scope of "∀x," which means that for the statement to be true
there would have to be a single number y that satisfies the inequality no matter what
x is. This is not the case, because the inequality fails if x = y + 2.
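The difference between the two orderings can also be seen experimentally over a finite sample of real numbers. A sketch (illustrative only; the sample set is an assumption):

```python
domain = [n * 0.5 for n in range(-20, 21)]   # finite sample of reals from -10 to 10

# ∀x(∃y((x − y)² < 4)): for each x, the witness y may depend on x (y = x always works)
first = all(any((x - y) ** 2 < 4 for y in domain) for x in domain)

# ∃y(∀x((x − y)² < 4)): a single fixed y must work for every x
second = any(all((x - y) ** 2 < 4 for x in domain) for y in domain)

print(first, second)   # True False
```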
Although we will not be using the ∃ and ∀ notation very often after this chapter,
there are times when it is useful. One advantage of writing a quantified statement
using this notation is that it forces you to specify, and therefore to understand, the
scope of each quantifier.

EXAMPLE 1.1  Alternative Notation for Quantified Statements


Consider quantified statements over some domain U, and assume that A is a subset of U. Just
as {x ∈ A | P(x)} is sometimes written to mean "the set of elements x in A satisfying the
property P(x)," people also write ∃x ∈ A(P(x)) to mean "there exists an x in A such that
P(x)." Similarly, ∀x ∈ A(P(x)) means "for every x in A, P(x)." It is interesting to rewrite
both these statements in the original stricter notation. In the first case we want a statement of
the form ∃x(Q(x)). Since we want our statement to say not only that there is an x satisfying
the property P(x), but that there is such an x that is also an element of A, we may take Q(x)
to be the statement x ∈ A ∧ P(x). Our formula becomes

∃x(x ∈ A ∧ P(x))

In the second case, the form of the statement is to be ∀x(Q(x)). If we tried the same choice
for Q(x), we would be saying not only that every x satisfies P(x), but also that every x is an
element of A. This is not what we want. The condition that x be an element of A is supposed
to make the statement weaker, not stronger—we do not want to say that every x satisfies P(x),
only that every x in A does. To say it another way, every x satisfies P(x) if it is also an element
of A. A conditional statement is a reasonable choice for Q(x), and our statement becomes

∀x(x ∈ A → P(x))

Sometimes the quantifier notation is relaxed even further, in order to write propositions like

∀x > 0(P(x))

Just as in the previous formula, this could be rewritten in the form

∀x(x > 0 → P(x))

EXAMPLE 1.2  A Quantified Statement Saying p is Prime


Let us consider the statement "p is prime," involving the free variable p over the universe ℕ of
natural numbers. We will try to write this as a symbolic formula, using logical connectives and
quantifiers, standard relational operators on integers (>, =, and so on), and the multiplication
operation *. We take as our definition of a prime number a number greater than 1 whose only
divisors are itself and 1. The first question is how to express the fact that one number is a
divisor of another. The statement “k is a divisor of p” means that p is a multiple of k, or that
there is an integer m with p = m * k. Next, saying that the only divisors of p are p and 1 is

the same as saying that every divisor of p is either p or 1. Adapting the second part of the
previous example, we can restate this as “for every k, if k is a divisor of p, then k is either p
or 1.” Putting all these pieces together, we obtain for “p is prime” the proposition

(p > 1) ∧ ∀k(∃m(p = m * k) → (k = 1 ∨ k = p))
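Over a bounded domain the formula can be executed almost symbol for symbol. A sketch (illustrative; bounding k and m by p is an assumption that makes the quantifiers finite, and is harmless because no divisor of p exceeds p):

```python
def is_prime(p):
    # (p > 1) ∧ ∀k(∃m(p = m * k) → (k = 1 ∨ k = p))
    return p > 1 and all(
        k == 1 or k == p
        for k in range(1, p + 1)
        if any(p == m * k for m in range(1, p + 1))  # ∃m(p = m * k): k divides p
    )

print([n for n in range(2, 20) if is_prime(n)])   # [2, 3, 5, 7, 11, 13, 17, 19]
```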

Manipulating quantified statements often requires negating them. Let us consider


how to express the negation of a quantified statement as a quantified statement—that
is, to rewrite it so that the negation symbol does not appear before the quantifier.
Saying “it is not the case that for every x, P(x),” or “not every x satisfies P,” is the
same as saying that there is at least one x that does not satisfy P. The conclusion
is that ¬∀x(P(x)) is the same as ∃x(¬P(x)). Similarly, ¬∃x(P(x)) is the same as
∀x(¬P(x)). The rule is that to negate a quantified statement, we reverse the quantifier
(change existential to universal, and vice versa) and take the negation sign inside the
parentheses.
We can apply this rule one step at a time to statements containing several nested
quantifiers. Let s be the statement

∀x(∃y(∀z(P(x, y, z))))
We negate s as follows.

¬s = ¬(∀x(∃y(∀z(P(x, y, z)))))
   = ∃x(¬(∃y(∀z(P(x, y, z)))))
   = ∃x(∀y(¬(∀z(P(x, y, z)))))
   = ∃x(∀y(∃z(¬P(x, y, z))))
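For finite domains, this rule is just De Morgan's law applied to any and all, which the following sketch (illustrative; the particular P is an assumption) confirms:

```python
domain = range(5)

def P(x, y, z):
    return x + y > z

s = all(any(all(P(x, y, z) for z in domain) for y in domain) for x in domain)

# ¬s, with each quantifier reversed and the negation moved inside:
not_s = any(all(any(not P(x, y, z) for z in domain) for y in domain) for x in domain)

print(s, not_s)   # always opposite truth values; here False True
```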

1.3 Functions
Functions, along with sets, are among the most basic objects in mathematics. The
ones you are most familiar with are probably those that involve real numbers, with
formulas like x² and log x. The first of these formulas specifies for a real number
x another real number x². The second assigns a real number log x to a positive real
number x (log x makes sense only if x > 0). In general, a function assigns to each
element of one set a single element of another set. The first set is called the domain
of the function, the second set is the codomain, and if the function is f, the element
of the codomain that is assigned to an element x of the domain is denoted f(x).
Natural choices for the domains of the functions with formulas x² and log x
would probably be ℝ (the set of real numbers) and {x ∈ ℝ | x > 0}, respectively;
however, when we define a function f, we are free to choose the domain to be any
set we want, as long as the rule or formula we have in mind gives us a meaningful
value f(x) for every x in the set. The codomain of a function f is specified in order
to make it clear what kind of object f(x) will be; however, it can also be chosen
arbitrarily, provided that f(x) belongs to the codomain for every x in the domain.
(This apparent arbitrariness can sometimes be confusing, and we discuss it a little
more on page 19.)

We write

f : A → B

to indicate that f is a function with domain A and codomain B. Although in sim-


ple examples we often stick to numerical functions (functions whose domain and
codomain are both sets of numbers), the following examples include other kinds, and
many of the functions we consider in this book will not deal with numbers directly
at all.
Let H be the set of human beings, alive and dead, and let ℕ = {0, 1, 2, ...}
be the set of natural numbers. Since 2^H is the set of all subsets of H, the set of all
nonempty subsets is 2^H − {∅}.
1. f₁ : ℕ → ℕ, defined by the formula f₁(x) = x²
2. f₂ : H → H, defined by the rule f₂(x) = the mother of x
3. f₃ : H → ℕ, defined by the rule f₃(x) = the number of siblings of x
4. f₄ : 2^H − {∅} → H, defined by the rule f₄(x) = the tallest person in the set x

In the last example, we are making the assumption that no two humans are exactly
the same height, because otherwise there are sets x for which the phrase “the tallest
person in x” does not actually specify a function value.

1.3.1 One-to-one and Onto Functions


Suppose f : A → B and S ⊆ A. We write

f(S) = {f(x) | x ∈ S} = {y ∈ B | y = f(x) for at least one x ∈ S}


In other words, f(S) is the set of values that f associates to elements of the subset
S. This notation is potentially confusing, for now when we write f(X), we might
have in mind either a subset of B (the one corresponding to the subset X of A), or an
element of B (the one associated to the element X of A). Generally, the context will
make it clear which is intended.
The set f(A), the set of all elements of the codomain that are associated to
elements of A, is given a special name, the range of f. If f : A → B, then it is
true by the definition of a function from A to B that the range of f is a subset of B
(that is, f(A) ⊆ B). It may not be the entire set B, since there may be elements of
B that are not assigned to any elements of A. If f(A) = B (the range and codomain
of f are equal, or every element of the codomain is actually one of the values of the
function), the function f is said to be onto, or surjective, or a surjection.
The range of the function f₁ described previously is the set of all natural numbers
that are perfect squares. Since not all natural numbers have this property, f₁ is not
onto. Similarly, neither f₂ nor f₃ is onto. The range of f₂ is the set of all human
females with at least one child, and the range of f₃, assuming that we do not count
half-brothers or half-sisters as siblings, is probably {0, 1, ..., N} for some N less
than 50. (This is not certain; it may be, for example, that one pair of parents had 41
children but no pair of parents ever had 40.) The function f₄ is onto. To see this, we

must be able to find for each human being y at least one nonempty set S of humans
so that y is the tallest member of S. This is easy: The set {y} is such a set.
Again assuming that f : A → B, we say f is one-to-one, or injective, or an
injection, if no single element y of B can be f(x) for more than one x in A. In
other words, f is one-to-one if, whenever f(x₁) = f(x₂), then x₁ = x₂. To say
it yet another way, f is one-to-one if, whenever x₁ and x₂ are different elements of
A, then f(x₁) ≠ f(x₂). A bijection is a function that is both one-to-one and onto.
If f : A → B is a bijection, we sometimes say that A and B are in one-to-one
correspondence, or that there is a one-to-one correspondence between the two sets.
Of our four examples, only f₁ is one-to-one. The other three are not, because
there are two people having the same mother, there are two people with the same
number of siblings, and there are many nonempty sets of people with the same tallest
element.
The terminology “one-to-one,” although standard, is potentially confusing. It
does not mean that to one element of A there is assigned (only) one element of B.
This property goes without saying; it is part of what f : A → B means. Nor does it
mean that for one y ∈ B there is one x ∈ A with f(x) = y. If f is not onto, then
there is a y for which there is no such x. "One-to-one" means the opposite of "many-
to-one." One element of B can be associated with only (at most) one element of A.
To a large extent, whether f is one-to-one or onto depends on how we have
specified the domain and codomain. Consider these examples. Here ℝ denotes the
set of all real numbers and ℝ⁺ the set of all nonnegative real numbers.
1. f : ℝ → ℝ, defined by f(x) = x², is neither one-to-one nor onto. It is not
one-to-one, because f(−1) = f(1). It is not onto, since −1 cannot be f(x) for
any x.
2. f : ℝ → ℝ⁺, defined by f(x) = x², is onto but not one-to-one.
3. f : ℝ⁺ → ℝ, defined by f(x) = x², is one-to-one but not onto.
4. f : ℝ⁺ → ℝ⁺, defined by f(x) = x², is both one-to-one and onto.
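On finite stand-ins, both properties can be tested by brute force, which makes the dependence on the choice of domain and codomain easy to see. A sketch (illustrative; the finite sets below are assumptions playing the roles of ℝ and ℝ⁺):

```python
def is_one_to_one(f, domain):
    values = [f(x) for x in domain]
    return len(values) == len(set(values))          # no codomain element hit twice

def is_onto(f, domain, codomain):
    return {f(x) for x in domain} == set(codomain)  # range equals codomain

square = lambda x: x * x
R, R_plus = [-2, -1, 0, 1, 2], [0, 1, 2]   # finite stand-ins for ℝ and ℝ⁺
squares = [0, 1, 4]                        # the squares of R_plus

print(is_one_to_one(square, R), is_onto(square, R, squares))            # False True
print(is_one_to_one(square, R_plus), is_onto(square, R_plus, squares))  # True True
```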
In less formal discussions, you often come across phrases like "the function
f(x) = x²." Although such a phrase may be sufficient, depending on the context,
these four examples should make it clear that in order to specify a function f com-
pletely and discuss its properties, one must provide not only a rule by which the value
f (x) can be obtained from x, but also the domain and codomain of f.
On the other hand, we have already mentioned that the choice of domain and
codomain can be somewhat arbitrary. These examples seem to confirm that, and we
might want to consider a little more carefully whether this arbitrariness serves any
purpose. When people say "the function f(x) = x²" or "the function f(x) = log x,"
they might be assumed to be using the convention that the domain is the largest set for
which the formula makes sense (for x² the set ℝ and for log x the set {x ∈ ℝ | x > 0},
assuming in both cases that only real numbers are involved). In the case of the
codomain, it is not clear why we would choose ℝ instead of ℝ⁺ as the codomain
of the function with formula f(x) = x², since it is true that x² ≥ 0 for every real
number x. In fact, it might seem that a good choice of codomain for any function f

with domain A is simply the range f(A), the set of all values obtained by applying
the function to elements of the domain. (As we have already noticed, the range must
always be a subset of the codomain, and if it is chosen as the codomain then f is
automatically onto.)
There are, however, valid reasons for allowing ourselves to specify the domain
and codomain of a function as we wish. It might be appropriate because of the
circumstances to limit the domain of the function: If f(x) represents the weight of
an object x centimeters long, there is no reason to consider x < 0, even if the formula
for f(x) makes sense for these values. People do not always specify the codomain
of a function f with domain A to be the set f(A), because it may be difficult to
say exactly what set this is. In examples involving real numbers, for example, it is
tempting to write f : ℝ → ℝ at the outset (assuming that f(x) is a real number for
every real number x) and worry about exactly what subset of ℝ the range is only if it
is necessary. It may also be that the focus of the discussion is the two sets themselves,
rather than the function. We might, for example, want to ask whether two specific
sets A and B can be placed in one-to-one correspondence (in other words, whether
there is a function with domain A and codomain B that is a bijection). In any case,
even though it might occasionally seem unnecessary, we will try to specify both the
domain and the codomain of any function we discuss.

1.3.2 Compositions and Inverses of Functions


Suppose we have functions f : A → B and g : B → C. Then for any x ∈ A, f(x) is
in the domain of g, and it is therefore possible to talk about g(f(x)). More generally,
if f : A → B, g : B₁ → C, and the range of f is a subset of B₁, then g(f(x)) makes
sense for each x ∈ A. The function h : A → C defined by h(x) = g(f(x)) is called
the composition of g and f and is written h = g ∘ f. For example, the function h
from ℝ to ℝ defined by h(x) = sin(x²) is g ∘ f, where g(x) = sin x and f(x) = x².
The function f ∘ g, on the other hand, is given by the formula (sin x)². When you
compute g ∘ f(x), take the formula for g(x) and replace every occurrence of x by
the formula for f(x).
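That recipe is all there is to composition operationally. A sketch (illustrative, not from the text):

```python
import math

def compose(g, f):
    # (g ∘ f)(x) = g(f(x))
    return lambda x: g(f(x))

f = lambda x: x * x       # f(x) = x²
g = math.sin              # g(x) = sin x

h = compose(g, f)         # h(x) = sin(x²)
k = compose(f, g)         # k(x) = (sin x)²

print(h(2.0) == math.sin(4.0))        # True
print(k(2.0) == math.sin(2.0) ** 2)   # True
```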
You can verify by just tracing the definitions that if f : A → B, g : B → C,
and h : C → D, then the functions h ∘ (g ∘ f) and (h ∘ g) ∘ f from A to D are
equal, and they are computed as follows. For x ∈ A, first take f(x); then apply g
to that element of B to obtain g(f(x)); then apply h to that element of C to obtain
h(g(f(x))). We summarize this property of composition by saying composition is
associative; compare this to the set identities (1.3) and (1.4) in Section 1.1.
Let us show that if f : A → B and g : B → C are both one-to-one, then
so is the composition g ∘ f. To say that g ∘ f is one-to-one means that whenever
g ∘ f(x₁) = g ∘ f(x₂), then x₁ = x₂. But if g(f(x₁)) = g(f(x₂)), then since g is
one-to-one, f(x₁) = f(x₂). Therefore, since f is also one-to-one, x₁ = x₂.
Similarly, if f : A → B and g : B → C are both onto, g ∘ f is also onto. For
any z ∈ C, there is an element y ∈ B with g(y) = z, since g is onto, and there is an
element x ∈ A with f(x) = y, since f is onto. Therefore, for any z ∈ C, there is an
x ∈ A with g(f(x)) = g ∘ f(x) = z.

It follows by combining these two observations that if f and g are both bijections,
then the composition g ∘ f is also a bijection.
Suppose that f : A → B is a bijection (one-to-one and onto). Then for any
y ∈ B, there is at least one x ∈ A with f(x) = y, since f is onto; and for any y ∈ B,
there is at most one x ∈ A with f(x) = y, since f is one-to-one. Therefore, for any
y ∈ B, it makes sense to speak of the element x ∈ A for which f(x) = y, and we
denote this x by f⁻¹(y). We now have a function f⁻¹ from B to A: f⁻¹(y) is the
element x ∈ A for which f(x) = y.
Note that we obtain the formula f(f⁻¹(y)) = y immediately. For any x ∈ A,
f⁻¹(f(x)) is defined to be the element z ∈ A for which f(z) = f(x). Since x is
also such an element, and since there can be only one, z = x. Thus we also have the
formula f⁻¹(f(x)) = x. These two formulas summarize, and can be taken as the
defining property of, the inverse function f⁻¹ : B → A:

for every x ∈ A, f⁻¹(f(x)) = x
for every y ∈ B, f(f⁻¹(y)) = y
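For a bijection between finite sets, the inverse is obtained by simply reversing the ordered pairs, and the two defining formulas become checkable assertions. A sketch (illustrative; the particular f is an assumption):

```python
A = [0, 1, 2, 3]
f = {x: 2 * x + 1 for x in A}          # a bijection from A onto B = {1, 3, 5, 7}
B = list(f.values())

f_inv = {y: x for x, y in f.items()}   # reverse each ordered pair

assert all(f_inv[f[x]] == x for x in A)   # f⁻¹(f(x)) = x for every x in A
assert all(f[f_inv[y]] == y for y in B)   # f(f⁻¹(y)) = y for every y in B
```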
Another slightly different use of the f⁻¹ notation makes sense for any function
f, whether or not it is one-to-one or onto. If f : A → B and S is any subset of B, we write

f⁻¹(S) = {x ∈ A | f(x) ∈ S}

so that f⁻¹ associates a subset of A to each subset of B. The set f⁻¹(B) is simply
the domain A. Note also that if f happens to be a bijection, so that f⁻¹ is a function
from B to A, then the set f⁻¹(S) can be obtained as in the beginning of Section 1.3.1:

f⁻¹(S) = {f⁻¹(y) | y ∈ S}

This formula makes sense only if f is a bijection, because only in that case does f⁻¹
associate an element f⁻¹(y) of A to each element y of B.
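The preimage construction requires nothing of f, as a short sketch (illustrative) makes clear:

```python
A = [-2, -1, 0, 1, 2]
f = lambda x: x * x     # not one-to-one on A, and not onto most codomains

def preimage(f, A, S):
    # f⁻¹(S) = {x ∈ A | f(x) ∈ S}
    return {x for x in A if f(x) in S}

print(preimage(f, A, {1, 4}))   # {-2, -1, 1, 2}
print(preimage(f, A, {3}))      # set(): nothing in A maps into {3}
```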

1.3.3 Operations on a Set


We may view a function of two or more variables as one whose domain is (some
subset of) a Cartesian product of two or more sets. For example, we might consider
the function of the two variables x and y given by the formula

3x − xy

to be a function with domain ℝ × ℝ and codomain ℝ, since the formula makes sense,
and yields a real number, for any ordered pair (x, y) of real numbers.
We have seen several examples of this already in Section 1.1. If U is a set, we
may form the union of any two subsets A and B of U. The function f given by the
formula

f(A, B) = A ∪ B

may be viewed as a function with domain 2^U × 2^U (the set of ordered pairs of subsets
of U) and codomain 2^U.

For an even more familiar example, consider the operation of addition on the set
Z of integers. For any two integers x and y, x + y is an integer. We may therefore
view addition as being a function from Z × Z to Z.
Union is a binary operation on the set 2^U of subsets of U. Addition is a binary
operation on the set of integers. In general, a binary operation on a set S is a function
from S × S to S; in other words, it is a function that takes two elements of S and
produces an element of S. Other binary operations on 2^U are those of intersection
and set difference. Other binary arithmetic operations on Z are multiplication and
subtraction. In most of the familiar situations where a binary operation • on a set is
involved, it is common to use the "infix" notation x • y rather than the usual functional
notation •(x, y). For example, we write A ∪ B instead of ∪(A, B), and x + y instead
of +(x, y).
A unary operation on S is simply a function from S to S. For example, the com-
plement operation is a unary operation on 2^U, and negation is a unary operation on Z.
If • is an arbitrary binary operation on a set S, and T is a subset of S, we say that
T is closed under the operation • if T • T ⊆ T. In other words, T is closed under • if
the result of applying the operation to two elements of T is, in fact, an element of T.
Similarly, if u is a unary operation on S, and T ⊆ S, T is closed under u if u(T) ⊆ T.
For example, the set ℕ is closed under the operation of addition (the sum of two
natural numbers is a natural number), but not under the operation of subtraction (the
difference of two natural numbers is not always a natural number). The set of finite
subsets of ℝ is closed under all the operations union, intersection, and set difference,
since if A and B are finite sets of real numbers, all three of the sets A ∪ B, A ∩ B,
and A − B are finite sets of real numbers. The set of finite subsets of ℝ is not closed
under the complement operation, since the complement of a finite set of real numbers
is not finite. The significance of a subset T of S being closed under • is that we can
then think of • as an operation on T itself; that is, if • is a binary operation, there is
a function from T × T to T whose value at each pair (x, y) ∈ T × T is the same
as that of •. (It is tempting to say that the function is •, except that we identify two
functions only if they have the same domains and codomains.)
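When T is finite, closure under a binary operation can be checked exhaustively. A sketch (illustrative; the finite examples are assumptions mirroring the ones in the text):

```python
from itertools import combinations

U = {1, 2, 3}
subsets = [frozenset(s) for r in range(len(U) + 1) for s in combinations(U, r)]

def is_closed(T, op):
    # T is closed under op if op(x, y) lands back in T for every x, y in T
    return all(op(x, y) in T for x in T for y in T)

print(is_closed(subsets, frozenset.union))           # True: 2^U is closed under union
print(is_closed(set(range(6)), lambda x, y: x - y))  # False: 0 - 1 = -1 falls outside
```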
Note that a few paragraphs back, division was not included among the binary
arithmetic operations on Z. There are two reasons for this. First, not every pair
(x, y) of integers is included in the domain of the (real) division operation, since
division by 0 is not defined. Second, even if y ≠ 0, the quotient x/y may not be an
integer; that is, the set of nonzero integers, thought of as a subset of ℝ, is not closed
under the operation of division. One way around the second problem would be to
use a different operation, such as integer division (in which the value is the integer
quotient and the remainder is ignored). This leaves the problem of division by zero.
One approach would be to say that integer division is a binary operation on the set of
nonzero integers. Another approach, in which 0/x would still make sense whenever
x ≠ 0, would be to say that for any fixed nonzero x, the set of integers is closed under
the unary operation of integer division by x.
Most of the time, we will be interested in unary or binary operations on a set.
However, there are times when it will be useful to consider more general types. An
n-ary operation on a set S is a function from the n-fold Cartesian product S × S × ⋯ × S
to S. As we saw in the case of union and intersection, when we start with an associative
binary operation •, one for which x • (y • z) = (x • y) • z, there is a natural way to
obtain for each n > 2 an n-ary operation. This is what we are doing, for example,
when we consider the union of n sets instead of two.

1.4 Relations
A mathematical relation is a way of making more precise the intuitive idea of a
relationship between objects. Since a function will turn out to be a special type of
relation, we can start by giving a more precise definition of a function than the one
in Section 1.3, and then generalize it.
In calculus, when you draw the graph of a function f : ℝ → ℝ, you are
specifying a set of points, or ordered pairs: all the ordered pairs (x, y) for which
y = f(x). We might actually identify the function with its graph and say that the
function is this set of pairs. Saying that a function is a set of ordered pairs makes
it unnecessary to say that it is a "rule," or a "way of assigning," or any other such
phrase. In general, we may define a function f : A → B to be a subset f of A × B
so that for each a ∈ A, there is exactly one element b ∈ B for which (a, b) ∈ f. For
each a ∈ A, this element b is what we usually write as f(a).
A function from A to B is a restricted type of correspondence between elements
of the set A and elements of the set B, restricted in that to every a ∈ A there must
correspond one and only one b ∈ B. A bijection from A to B is even more restricted;
in the ordered-pair definition, for any a ∈ A there must be one and only one b ∈ B so
that the pair (a, b) belongs to the function, and for any b ∈ B there is one and only one
a ∈ A so that (a, b) belongs to the function. If we relax both these restrictions, then
an element of A can correspond to several elements of B, or possibly to none, and an
element of B can correspond to several elements of A, or possibly none. Although
such a correspondence is no longer necessarily a function, either from A to B or from
B to A, it still makes sense to describe the correspondence by specifying a subset of
A × B. This is how we can define a relation from A to B: It is simply a subset of A × B.
For an element a ∈ A, a corresponds to, or is related to, an element b ∈ B if the pair
(a, b) is in the subset. We will be interested primarily in the special case where A
and B are the same set, and in that case we refer to the relation as a relation on A.

Definition 1.1 A Relation on a Set
A relation R on a set A is a subset R of A × A. For a, b ∈ A, a is related to b via R if (a, b) ∈ R.

You are already familiar with many examples of relations on sets, even if you
have not seen this formal definition before. When you write a = b, where a and b
are elements of some set A, you are using the relation of equality. If we think of = as
a subset of A × A, then we can write (a, b) ∈ = instead of a = b. Of course, we are
more accustomed to the notation a = b. For this reason, in the case of an arbitrary
relation R on a set A, we often write aRb instead of (a, b) ∈ R. Both these notations
mean that a is related to b, or, if there is some doubt as to which relation is intended,
related to b via R.

The subset = of ℕ × ℕ, for example, is the set {(0, 0), (1, 1), (2, 2), ...}, con-
taining all pairs (i, i). The relation on ℕ specified by the subset

{(0, 1),
(0, 2), (1, 2),
(0, 3), (1, 3), (2, 3),
...}

of ℕ × ℕ is the relation <. Other familiar relations on ℕ include ≤, >, ≥, and ≠.
One that may not be quite as familiar is the "congruence mod n" relation. If n is a
fixed positive integer, we say a is congruent to b mod n, written a ≡ₙ b, if a − b is an
integer multiple of n. In the interest of precision we write this symbolically: a ≡ₙ b
means ∃k(a − b = k * n). Note that the domain of the quantifier is the set of integers:
The integer k in this formula can be negative or 0. To illustrate, let n = 3. The subset
≡₃ of ℕ × ℕ contains the ordered pairs (0, 0), (1, 1), (1, 4), (4, 1), (7, 10), (3, 6),
(8, 14), (76, 4), and every other pair (a, b) for which a − b is divisible by 3.
So far, all our examples of relations are well-known ones that have commonly
accepted names, such as = and <. These are not the only ones, however. We are
free to invent a relation on a set, either by specifying a condition that describes what
it means for two elements to be related, or simply by listing the ordered pairs we
want to be included. Consider the set A = {1, 2, 3, 4}. We might be interested in the
relation R₁ on A defined as follows: for any a and b in A, aR₁b if and only if |a − b|
is prime. The ordered pairs in R₁ are (1, 3), (3, 1), (2, 4), (4, 2), (1, 4), and (4, 1).
We might also wish to consider the relation R₂ = {(1, 1), (1, 4), (3, 4), (4, 2)}. In
this relation, 1 is related to both itself and 4, 3 is related to 4, 4 is related to 2, and
there are no other relationships. Even if there is no simple way to say what it means
for a to be related to b, other than to say that (a, b) is one of these four pairs, R₂ is a
perfectly acceptable relation on A.

1.4.1 Equivalence Relations and Equivalence Classes


The type of relation we will be particularly interested in is characterized by three
properties: reflexivity, symmetry, and transitivity.

Definition 1.2 Reflexive, Symmetric, and Transitive Relations
A relation R on a set A is reflexive if for every a ∈ A, aRa; it is symmetric if for every a and b in A, aRb implies bRa; and it is transitive if for every a, b, and c in A, aRb and bRc together imply aRc. A relation on A that is reflexive, symmetric, and transitive is called an equivalence relation on A.

It is useful to note one difference between the reflexive property and the other
two properties in Definition 1.2. In order for the relation R to be reflexive, every
element of A must be related to itself. In particular, there are a number of ordered
pairs that must be in the relation: those of the form (a, a). Another way to say this
is that R must contain as a subset the relation of equality on A. In the case of the
other two properties, the definition says only that if certain elements are related, then
certain others are related. If R is symmetric, for example, two elements a and b need
not be related; however, if one of the two pairs (a, b) and (b, a) is in the relation,
then the other is also.
The definitions of reflexive, symmetric, and transitive all start out "For every ...".
The negation of the quantified statement "For every ..." is a quantified statement of
the form "There exists ...". This means that to show a relation R on a set is not
reflexive, it is sufficient to show that there is at least one pair (x, x) not in R. To show
R is not symmetric, all we have to do is find one pair (a, b) so that aRb and not bRa.
And to show R is not transitive, we just need to find one choice of a, b, and c in the
set so that aRb and bRc but not aRc. In the last case, a, b, and c do not need to be all
different. For example, if there are elements a and b of the set A so that aRb, bRa,
and not aRa, then R is not transitive.
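On a finite relation, stored as a set of ordered pairs, all three properties (and their failures) can be checked directly. A sketch (illustrative), applied to the relation R₁ from the previous section:

```python
def is_reflexive(R, A):
    return all((a, a) in R for a in A)

def is_symmetric(R):
    return all((b, a) in R for (a, b) in R)

def is_transitive(R):
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

A = {1, 2, 3, 4}
R1 = {(1, 3), (3, 1), (2, 4), (4, 2), (1, 4), (4, 1)}   # aR1b iff |a - b| is prime

print(is_reflexive(R1, A), is_symmetric(R1), is_transitive(R1))   # False True False
```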
The relations ≤ and ≥ on the set ℕ are reflexive (for every a ∈ ℕ, a ≤ a and
a ≥ a) but not symmetric; for example, the statement 1 ≤ 3 is true but 3 ≤ 1 is not.
The < relation and the > relation are neither reflexive nor symmetric. The relation
≠ is neither reflexive nor transitive; for example, although 1 ≠ 3 and 3 ≠ 1, the
statement 1 ≠ 1, which would be required by transitivity, fails.
The simplest example of an equivalence relation on any set is the equality relation.
The three properties in Definition 1.2 are fundamental properties of equality: Any
element is equal to itself; if a is equal to b, then b is equal to a; and if a = b and
b = c, then a = c. It seems reasonable to require that any relation we refer to as
“equivalence” should also satisfy these properties, just because of the way we use
the word informally. (In Section 1.2 we have also used the word in a more precise
way, and in Exercise 1.33 you are asked to show that logical equivalence is in fact an
equivalence relation.) An equivalence relation can be thought of as a generalization
of the equality relation.
Let us show that for any fixed positive integer n, the congruence relation ≡ₙ on
the set ℕ is an equivalence relation. First, it is reflexive because for every a ∈ ℕ,
a − a = 0 * n and therefore a − a is a multiple of n. Second, it is symmetric
because for every a and b, if a ≡ₙ b, then for some k, a − b = k * n; it follows that
b − a = −(k * n) = (−k) * n, and therefore b ≡ₙ a. Finally, it is transitive. If a ≡ₙ b
and b ≡ₙ c, then for some integers k and m, a − b = kn and b − c = mn; therefore
a − c = (a − b) + (b − c) = (k + m)n, and a − c is a multiple of n.
An important general property of equivalence relations can be illustrated by this
example. For the sake of concreteness, we fix a value of n, say 4. The set of elements
of ℕ equivalent to 0 is the set of natural numbers i for which i − 0 is a multiple of
4: in other words, the set

{0, 4, 8, 12, ...}

The set of elements equivalent to 1 is

{1, 5, 9, 13, ...}

Similarly, the sets {x | x ≡₄ 2} and {x | x ≡₄ 3} are the sets

{2, 6, 10, 14, ...} and {3, 7, 11, 15, ...}

respectively. Note that every natural number is in one of these four subsets, and that the
four subsets are pairwise disjoint. These two properties are summarized by saying
that the four subsets form a partition of N’: A partition of a set A is a collection
of pairwise disjoint subsets of A whose union is A. The numbers equivalent to a
particular integer, say 9, are the numbers in the same subset as 9: 1, 5, 9, 13, and so
forth. The partition of VV is another way of describing the relation. To say that two
natural numbers are equivalent is simply to say that they are in the same subset. If
we are told what the partition is, then effectively we know what the relation is.
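Computing this partition for an initial segment of ℕ is a one-line grouping job. A sketch (illustrative):

```python
from collections import defaultdict

classes = defaultdict(list)
for i in range(20):
    classes[i % 4].append(i)    # i is congruent to j mod 4 exactly when i % 4 == j % 4

for r in sorted(classes):
    print(r, classes[r])
# 0 [0, 4, 8, 12, 16]
# 1 [1, 5, 9, 13, 17]
# 2 [2, 6, 10, 14, 18]
# 3 [3, 7, 11, 15, 19]
```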
Any partition of a set A determines an equivalence relation on A in exactly
this way. If C is a partition of A (that is, the sets in C are pairwise disjoint and
⋃_{S∈C} S = A), then we can define the relation E on A by saying
aEb if and only if a and b belong to the same element of C
We think of the elements of A as being distributed into a number of bins, the bins
being simply the elements of C, the subsets that form the partition. Every element
of A is in exactly one bin. The relation E is an equivalence relation, because the
following are true:

1. Every element a of A is in the same bin as itself.


2. For any a and b of A, if a and b are in the same bin, so are b and a.
3. For any a, b, and c of A, if a and b are in the same bin, and b and c are also,
then a and c are also.

The elements in the bin that contains a are precisely the elements of A (including a
itself) that are related to a via this relation E.
Now we can turn this around and show that any equivalence relation on a set can
be described in exactly this way. Suppose R is an equivalence relation on A. For any
element a of A, we denote by [a]_R, or simply by [a], the equivalence class containing
a:

[a]_R = {x ∈ A | xRa}

Note that the "equivalence class containing a" really does contain a. Because R is
reflexive, aRa, which means that a is one of the elements of [a]. Note also that since
R is symmetric, saying that x ∈ [a] is really the same as saying that a ∈ [x], or aRx.
The reason is that if x ∈ [a], then xRa, and then by the symmetry of R, aRx.
We have started with an equivalence relation on A, and we have obtained a
collection of bins—namely, the equivalence classes. Now we check that they do
actually form a partition of A. Of the two properties, the second is easier to check.
Saying that the union of the equivalence classes is A is the same as saying that every
element of A is in some equivalence class; however, as we have already noted, for any

a ∈ A, a ∈ [a]. The other condition is that any two distinct equivalence classes are
disjoint; in other words, for any a and b, if [a] ≠ [b], then [a] ∩ [b] = ∅. Let us show
the contrapositive statement, which amounts to the same thing: If [a] and [b] are not
disjoint, then [a] = [b]. We start by showing that if [a] ∩ [b] ≠ ∅, then [a] ⊆ [b].
The same argument will show that [b] ⊆ [a], and it will follow that [a] = [b].
If [a] ∩ [b] ≠ ∅, then there is an x that is an element of both [a] and [b]. As we
have noted above, if x ∈ [a], then aRx. Let y be any element of [a]. Then we have

yRa (by definition of [a])
aRx (because x ∈ [a] and therefore a ∈ [x])
xRb (by definition of [b])

Using the transitivity of R once, along with the first two statements, we may say that
yRx; using it again with yRx and xRb, we obtain yRb. What we have now shown
is that any element of [a] is an element of [b], so that [a] ⊆ [b].
We now have a partition of A into bins, or equivalence classes. If a and b are
elements satisfying aRb, then a ∈ [b] and b ∈ [b]; in other words, if two elements are
equivalent, they are in the same bin. On the other hand, if a and b are in the same bin,
then since b ∈ [b], we must have a ∈ [b], so that a and b are related. The conclusion
is that two elements are related if and only if they are in the same bin. Abstractly the
equivalence relation R on A is no different from the relation E described previously
in terms of the partition.
This discussion can be summarized by the following theorem.

Theorem 1.1 If R is an equivalence relation on a set A, the equivalence classes with respect to R form a partition of A, and two elements of A are equivalent if and only if they belong to the same equivalence class. Conversely, for any partition C of A, the relation in which two elements are related if and only if they belong to the same element of C is an equivalence relation on A whose equivalence classes are precisely the elements of C.

If R is an equivalence relation on a set A, then according to the definition, figuring


out what elements of A are related to (equivalent to) x means figuring out what the
equivalence class containing x is. Understanding the partition determined by the
equivalence classes is another way of understanding the relation.
One other comment about the terminology is in order. We call [a] the equivalence
class containing a, but we must be careful to remember that it may contain other
elements too. If it contains another element b, then we could refer to it just as
accurately as the equivalence class containing b: the statement b ∈ [a] is the same
as the statement bRa, and as we have seen, this is also the same as a ∈ [b].
To be a little more explicit, here are eight ways of saying the same thing:

aRb
a and b are in the same equivalence class
a ∈ [b]_R
b ∈ [a]_R
[a]_R = [b]_R
[a]_R ⊆ [b]_R
[b]_R ⊆ [a]_R
[a]_R ∩ [b]_R ≠ ∅
If we have a nonempty subset S of A, then in order to show it is an equivalence
class, we must show two things:

1. For any x and y in S, x and y are equivalent.
2. For any x ∈ S and any y ∉ S, x and y are not equivalent.

If we can do this, then for any a ∈ S, it follows from statement 1 that any element of
S is equivalent to a, so that S ⊆ [a]; and it follows from statement 2 that no element
of S′ can be in [a], so that [a] ⊆ S.
We have already calculated the equivalence class containing i in the case of the
equivalence relation ≡₄ on the set ℕ. It is the set of all natural numbers that differ
from i by a multiple of 4. For any positive integer n, the equivalence classes for the
equivalence relation ≡ₙ on ℕ are

[0] = {0, n, 2n, 3n, ...}
[1] = {1, n + 1, 2n + 1, ...}
[2] = {2, n + 2, 2n + 2, ...}
⋮
[n − 1] = {n − 1, 2n − 1, 3n − 1, ...}

These n equivalence classes are distinct, because no two of the integers 0, 1, ..., n − 1
differ by a multiple of n, and they are the only ones, because among them they clearly
account for all the nonnegative integers.

1.5 Languages
By a language we mean simply a set of strings involving symbols from some al-
phabet. This definition will allow familiar languages like natural languages and
high-level programming languages, and it will also allow random assortments of un-
related strings. It may be helpful before we go any further to consider how, or even
whether, it makes sense to view a language like English as simply a set of strings.
An English dictionary contains words: dumpling, inquisition, notational, put, etc.
However, writing a sequence of English words is not the same as writing English. It
makes somewhat more sense to say that English is a collection of legal sentences. We
can say that the sentence “The cat is in the hat” is an element of the English language
and that the string “dumpling dumpling put notational inquisition” is not. Taking
sentences to be the basic units may seem arbitrary (why not paragraphs?), except that
rules of English grammar usually deal specifically with constructing sentences. In

the case of a high-level programming language, such as C or Pascal, the most rea-
sonable way to think of the language as a set of strings is perhaps to take the strings
to be complete programs in the language. (We normally ask a compiler to check the
syntax of programs or subprograms, rather than individual statements.) Though we
sometimes speak of “words” in a language, we should therefore keep in mind that if
by a word we mean a string that is an element of the language, then a single word
incorporates the rules of syntax or grammar that characterize the language.
When we discuss a language, we begin by specifying the alphabet, which contains
all the legal symbols that can be used to form strings in the language. An alphabet is a
finite set of symbols. In the case of common languages like English, we would want
the alphabet to include the 26 letters, both uppercase and lowercase, as well as blanks
and various punctuation symbols. In the case of programming languages, we would
add the 10 numeric digits. Many examples in this book, however, involve a smaller
alphabet, sometimes containing only two symbols and occasionally containing only
one. Such alphabets make the examples easier to describe and can still accommodate
most of the features we will be interested in.
A string over an alphabet Σ is obtained by placing some of the elements of Σ
(possibly none) in order. The length of a string x over Σ is the number of symbols
in the string, and we will denote this number by |x|. Some of the strings over the
alphabet {a, b} are a, baa, aba, and aabba, and we have |a| = 1, |baa| = |aba| = 3,
and |aabba| = 5. Note that when we write a, we might be referring either to the
symbol in the alphabet or to the string of length 1; for our purposes it will usually not
be necessary to distinguish these. The null string (the string of length 0) is a string
over Σ, no matter what alphabet Σ is. We denote it by Λ. (To avoid confusion, we
will never allow the letter Λ to represent an element of Σ.)
For any alphabet Σ, the set of all strings over Σ is denoted by Σ*, and a language
over Σ is therefore a subset of Σ*. For Σ = {a, b}, we have

Σ* = {a, b}* = {Λ, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, ...}

A few examples of languages over Σ are

{Λ, a, aa, aab}
{x ∈ {a, b}* | |x| < 8}
{x ∈ {a, b}* | |x| is odd}
{x ∈ {a, b}* | n_a(x) = n_b(x)}
{x ∈ {a, b}* | |x| ≥ 2 and x begins and ends with b}

In the fourth example, n_a(x) and n_b(x) represent the number of a's and the number
of b's, respectively, in the string x.
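Each of these set descriptions is really a membership test, so each language can be represented by a predicate on strings. A sketch (illustrative) for the last two examples:

```python
def in_L4(x):
    # {x in {a, b}* | n_a(x) = n_b(x)}
    return x.count("a") == x.count("b")

def in_L5(x):
    # {x in {a, b}* | |x| >= 2 and x begins and ends with b}
    return len(x) >= 2 and x.startswith("b") and x.endswith("b")

print(in_L4("abba"), in_L4("aab"))   # True False
print(in_L5("bab"), in_L5("b"))      # True False
```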
Because languages are sets of strings, new languages can be constructed using set
operations. For any two languages over an alphabet Σ, their union, intersection, and
difference are also languages over Σ. When we speak of the complement of a language
over Σ, we take the universal set to be the language Σ*, so that L′ = Σ* − L. Note
that any two languages can be considered to be languages over a common alphabet:
if L₁ ⊆ Σ₁* and L₂ ⊆ Σ₂*, then L₁ and L₂ are both subsets of (Σ₁ ∪ Σ₂)*. This

creates the possibility of confusion, since now the complement of L₁ might be taken
to be either Σ₁* − L₁ or (Σ₁ ∪ Σ₂)* − L₁. However, it will usually be clear which
alphabet is referred to.
The concatenation operation on strings will also allow us to construct new lan-
guages. If x and y are elements of Σ*, the concatenation of x and y is the string xy
formed by writing the symbols of x and the symbols of y consecutively. If x = abb
and y = ba, xy = abbba and yx = baabb. For any string x, xΛ = Λx = x.
Clearly, concatenation is associative: For any strings x, y, and z, (xy)z = x(yz).
This allows us to concatenate several strings without specifying the order in which
the various concatenation operations are actually performed.
We say that a string x is a substring of another string y if there are strings w and
z, either or both of which may be null, so that y = wxz. The string car is a substring
of each of the strings descartes, vicar, carthage, and car, but not of the string charity.
A prefix of a string is an initial substring. For example, the prefixes of abaa are Λ
(which is a prefix of every string), a, ab, aba, and abaa. Similarly, a suffix is a final
substring.
Now that we have the concatenation operation, we can apply it to languages as
well as to strings. If L₁, L₂ ⊆ Σ*,

L₁L₂ = {xy | x ∈ L₁ and y ∈ L₂}

For example,

{hope, fear}{less, fully} = {hopeless, hopefully, fearless, fearfully}


Just as concatenating a string x with Λ produces x, concatenating any language L
with {Λ} produces L. In other words, L{Λ} = {Λ}L = L.
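The definition of L₁L₂ translates directly into a set comprehension, with the empty string playing the role of Λ. A sketch (illustrative):

```python
def concat(L1, L2):
    # L1L2 = {xy | x in L1 and y in L2}
    return {x + y for x in L1 for y in L2}

print(concat({"hope", "fear"}, {"less", "fully"}))
# {'hopeless', 'hopefully', 'fearless', 'fearfully'} (in some order)
print(concat({"abb", "ba"}, {""}) == {"abb", "ba"})   # True: {""} acts as the identity
```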
We use exponential notation to indicate the number of items being concatenated.
These can be individual symbols, strings, or languages. Thus, if Σ is an alphabet,
a ∈ Σ, x ∈ Σ*, and L ⊆ Σ*,

a^k = aa⋯a
x^k = xx⋯x
L^k = LL⋯L
Σ^k = ΣΣ⋯Σ

where in each case there are k factors altogether. An important special case is the one
in which k = 0:

a^0 = Λ    x^0 = Λ
Σ^0 = {Λ}    L^0 = {Λ}

These last four rules are analogous to ones from algebra, and you can understand
them the same way. For any real number x, x^0 is defined to be 1. One reason is that
we want the formula x^p x^q = x^(p+q) to hold for every p and q, and in particular we
want x^0 x^q to be x^q. This means that x^0 should be the unit of multiplication (the real
number u for which uy = y for every y), which is 1. The string that is the unit of
concatenation is Λ (because Λy = y for any string y), and the unit of concatenation

for languages is {Λ}. Therefore, these are the appropriate choices for x^0 and L^0,
respectively.
L^k means the set of all strings that can be obtained by concatenating k elements
of L. Next we define the set of all strings that can be obtained by concatenating any
number of elements of L:

L* = ⋃_{i=0}^{∞} L^i

The operation * in this formula is often called the Kleene star, after the mathematician
S. C. Kleene. This use of the * symbol is consistent with using Σ* to represent the set of
strings over Σ, because strings are simply concatenations of zero or more symbols.
Note that Λ is always an element of L*, no matter what L is, since L^0 = {Λ}.
Finally, we denote by L⁺ the set of all strings obtainable by concatenating one or
more elements of L:

L⁺ = ⋃_{i=1}^{∞} L^i

You can check that L⁺ = L*L = LL*. The two languages L* and L⁺ may in fact
be equal; see Exercise 1.38.
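Because L* is an infinite union, a program can only enumerate it up to a length bound. A sketch (illustrative) that collects every element of L* of length at most max_len:

```python
def star_up_to(L, max_len):
    # Close {""} under appending elements of L, discarding overlong strings.
    result = {""}                    # L^0 = {Lambda}; "" plays the role of Lambda
    frontier = {""}
    while frontier:
        frontier = {x + w for x in frontier for w in L
                    if len(x + w) <= max_len} - result
        result |= frontier
    return result

print(sorted(star_up_to({"ab", "bab"}, 5)))
# ['', 'ab', 'abab', 'abbab', 'bab', 'babab']
```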
Strings, by definition, are finite (have only a finite number of symbols). Most
interesting languages are infinite (contain an infinite number of strings). However,
in order to work with these languages we must be able to specify or describe them in
ways that are finite. At this point we may distinguish two possible approaches to this
problem, which can be illustrated by examples:

L, = {ab, bab}* U {b}{bb}*

tn = {xe {a,b} |ng(x) = np(x)}


In the case of L₁, we describe the language by saying how an arbitrary string in the lan-
guage can be constructed: either by concatenating an arbitrary number of strings, each
of which is either ab or bab, or by concatenating the string b with an arbitrary number
of copies of the string bb. We describe L₂, on the other hand, by specifying a property
that characterizes the strings in the language. In other words, we say how to recognize
a string in the language: Count the number of a's and the number of b's and compare
the two. There is not always a clear line separating these approaches. The definition

L₃ = {byb | y ∈ {a, b}*}


which is another way of describing the last language in our original group of five
examples, could be interpreted as a method of generating strings in L₃ (start with an
arbitrary string y and add b to each end) or as a method of recognizing elements of L₃
(check that the symbol on each end is b and that the total length is at least 2). Even a
definition that clearly belongs to the first category, like that of L₁, might immediately
suggest a way of determining whether an arbitrary string belongs to L₁; and there
might be obvious ways of generating all the strings in L₂, although the definition
given is clearly of the second type.

We will be studying, on the one hand, more and more general ways of generating
languages, beginning with ways similar to the ones used in the definition of L₁, and
on the other hand, corresponding methods of greater and greater sophistication for
recognizing strings in these languages. In the second approach, it will be useful to
think of the algorithm for recognizing the language as being embodied in an abstract
machine, and a precise description of the machine will effectively give us a precise
way of specifying which strings are in the language. Initially these abstract machines
will be fairly primitive, since it turns out that languages like L₁ can be recognized
easily. A language like L₂ will require a more powerful type of abstract machine to
recognize it, as well as a more general method of generating it. Before we are through,
we will study machines equivalent in power to the most sophisticated computer.

EXERCISES
1.1. Describe each of these infinite sets precisely, using a formula that does not
involve "...". If you wish, you can use ℕ, ℝ, Z, and other sets discussed in
the chapter.
a. {0, −1, 2, −3, 4, −5, ...}
b. {1/2, 1/4, 3/4, 1/8, 3/8, 5/8, 7/8, 1/16, 3/16, 5/16, 7/16, ...}
c. {10, 1100, 111000, 11110000, ...} (a subset of {0, 1}*)
d. {{0}, {1}, {2}, ...}
Cre Ols (0, Or hs 2) Ondo 73 ex. 3
Pole (ORL 2. SOF INO 354 251 ONT AOn ea
(OR? So oes}
1.2. Label each of the eight regions in Figure 1.2, using A, B, C, and appropriate
set operations.
1.3. Use Venn diagrams to verify each of the set identities (1.1)-(1.19).
1.4. Assume that A and B are sets. In each case, find a simpler expression
representing the given set. The easiest way is probably to use Venn
diagrams, but also practice manipulating the formulas using the set identities
(1.1)-(1.19).
ae AR}
ere AN
CROLL BY SA
d. (A— B)UB— AUN B)
e. (A′ ∩ B′)′
f. (A′ ∪ B′)′
g. AU(BN(A—(B A)))
bh A (Bb (A U( Ba Ay)
1.5. Show using Venn diagrams that the symmetric difference operation ⊕
satisfies the associative property A ⊕ (B ⊕ C) = (A ⊕ B) ⊕ C.
1.6. In each case, find a simpler statement (one not involving the symmetric
difference operation) equivalent to the given one. Assume in each case that
A and B are subsets of U.
a. A ⊕ B = A
b. A ⊕ B = A − B
c. A ⊕ B = A ∪ B
d. A ⊕ B = A ∩ B
e. A ⊕ B = A′
1.7. In each case, find an expression for the indicated set, involving A, B, C, and
the three operations ∪, ∩, and ′.
a. {x | x ∈ A or x ∈ B but not both}
b. {x | x is an element of exactly one of the three sets A, B, and C}
c. {x | x is an element of at most one of the three sets A, B, and C}
d. {x | x is an element of exactly two of the three sets A, B, and C}
e. {x | x is an element of at least one and at most two of the three sets A, B,
and C}
1.8. For each integer n, denote by Cₙ the set of all real numbers less than n, and
for each positive number n let Dₙ be the set of all real numbers less than
1/n. Express each of the following sets in a simpler form not involving
unions or intersections. (For example, the answer to (a) is C₁₀.) Since ∞ is
not a number, the expressions C∞ and D∞ do not make sense and should not
appear in your answer.
S Una Cn
Bene Ds
Gs Sty C.
d. Mia Ds
8 i sad Oe
be
Bh (hpeCn
ia Bocce 38
i Lae Cc,
j. (nace&n
1.9. One might think that an empty set of real numbers and an empty set of sets
are two different objects. Show that according to our definitions, there is
only one set containing no elements.
1.10. List the elements of 2^{2^∅}.
1.11. Denote by p, q, and r the statements a = 1, b = 0, and c = 3, respectively.
Write each of the following statements symbolically, using p, q, r, ∧, ∨, ¬,
and →.
a. Either a = 1 or b ≠ 0.
b. b = 0 but neither a = 1 nor c = 3. (Note: in logic, unlike English, "but"
and "and" are interchangeable.)
c. It is not the case that both a ≠ 1 and b = 0.
d. If a ≠ 1 then c = 3, but otherwise c ≠ 3.
e. b = 0 only if either a = 1 or c = 3.
f. If it is not the case that either a = 1 or b = 0, then only if c = 3 is a = 1.
1.12. Which of these statements are true?
a. If 1 + 1 = 2, then 2 + 2 = 4.
b. 1 + 1 = 3 only if 2 + 2 = 6.
c. (1 = 2 and 1 = 3) if and only if 1 = 3.
d. If 1 + 1 = 3 then 1 + 2 = 3.
e. If 1 = 2, then 2 = 3 and 2 = 4.
f. Only if 3 − 1 = 2 is 1 − 2 = 0.
1.13. In each case, construct a truth table for the statement and use the result to
find a simpler statement that is logically equivalent.
a. (p → q) ∧ (p → ¬q)
b. p ∨ (p → q)
c. p ∧ (p → q)
d. (p → q) ∧ (¬p → q)
e. p → (p ↔ q)
f. q ∧ (p → q)
1.14. A principle of classical logic is modus ponens, which asserts that the
proposition (p ∧ (p → q)) → q is a tautology, or that p ∧ (p → q)
logically implies q. Show that this result requires the truth table for the
conditional statement r → s to be defined exactly as we defined it in the two
cases where r is false.
1.15. Suppose m₁ and m₂ are integers representing months (1 ≤ mᵢ ≤ 12), and d₁
and d₂ are integers representing days (dᵢ is at least 1 and no larger than the
number of days in month mᵢ). For each i, the pair (mᵢ, dᵢ) can be thought of
as representing a date. We wish to write a logical proposition involving the
four integers that says (m₁, d₁) comes before (m₂, d₂) in the calendar.
a. Find such a proposition that is a disjunction of two propositions.
b. Find such a proposition that is a conjunction of two propositions.
1.16. Show that the statements p ∨ q ∨ r ∨ s and (¬p ∧ ¬q ∧ ¬r) → s are
equivalent.
1.17. In each case, say whether the statement is a tautology, a contradiction, or
neither.
Bie DV (p> p)
PAD p)
| ieee soe)8
Pp) i (op =p)
sr
Key
OES (Pp) Ap Sep)

1.18. Consider the statement “Everybody loves somebody sometime.” In order to


express this precisely, let L(x, y, t) be a proposition involving the three free
variables x, y, and t that expresses the fact that x loves y at time t. (Here x
and y are humans, and t is a time.) Using this notation, express the original
statement using quantifiers.
1.19. Let F(x, t) be the proposition: You can fool person x at time t. Using this
notation, write a quantified statement to formalize Abraham Lincoln’s
statement: “You can fool all the people some of the time, and you can fool
some of the people all the time, but you can not fool all the people all of the
time.” Give at least two different answers (not equivalent), representing
different possible interpretations of the statement.
1.20. In each case below, say whether the given statement is true for the universe
(0, 1) = {x ∈ ℝ | 0 < x < 1}, and say whether it is true for the universe
[0, 1] = {x ∈ ℝ | 0 ≤ x ≤ 1}.
a. ∀x(∃y(x > y))
b. ∀x(∃y(x ≥ y))
c. ∃y(∀x(x > y))
d. ∃y(∀x(x ≥ y))
1.21. Suppose A and B are finite sets, A has n elements, and f : A → B.
a. If f is one-to-one, what can you say about the number of elements of B?
b. If f is onto, what can you say about the number of elements of B?

1.22. In this problem, as usual, ℝ denotes the set of real numbers, ℝ⁺ the set of
nonnegative real numbers, ℕ the set of natural numbers (nonnegative
integers), and 2^ℝ the set of subsets of ℝ. [0, 1] denotes the set
{x ∈ ℝ | 0 ≤ x ≤ 1}. In each case, say whether the indicated function is
one-to-one, and say what its range is.
a. f_a : ℝ⁺ → ℝ⁺ defined by f_a(x) = x + a (where a is some fixed
element of ℝ⁺)
b. d : ℝ⁺ → ℝ⁺ defined by d(x) = 2x
c. t : ℕ → ℕ defined by t(x) = 2x
d. g : ℝ⁺ → ℕ defined by g(x) = ⌊x⌋ (the largest integer ≤ x)
e. p : ℝ⁺ → ℝ⁺ defined by p(x) = x + ⌊x⌋
f. i : 2^ℝ → 2^ℝ defined by i(A) = A ∩ [0, 1]
g. u : 2^ℝ → 2^ℝ defined by u(A) = A ∪ [0, 1]
h. m : ℝ⁺ → ℝ⁺ defined by m(x) = min(x, 2)
i. M : ℝ⁺ → ℝ⁺ defined by M(x) = max(x, 2)
j. s : ℝ⁺ → ℝ⁺ defined by s(x) = min(x, 2) + max(x, 2)
1.23. Suppose A and B are sets, f : A → B, and g : B → A. If f(g(y)) = y for
every y ∈ B, then f is a ____ function and g is a ____ function. Give
reasons for your answers.
1.24. Let A = {2, 3, 4, 6, 7, 12, 18} and B = {7, 8, 9, 10}.
a. Define f : A → B as follows: f(2) = 7; f(3) = 9; f(4) = 8;
f(6) = 8; f(7) = 10; f(12) = 9; f(18) = 7. Find a function g : B → A
so that for every y ∈ B, f(g(y)) = y. Is there more than one such g?
b. Define g : B → A as follows: g(7) = 6; g(8) = 7; g(9) = 2;
g(10) = 18. Find a function f : A → B so that for every y ∈ B,
f(g(y)) = y. Is there more than one such f?
1.25. Let f_a, d, t, g, i, and u be the functions defined in Exercise 1.22. In each
case, find a formula for the indicated function, and simplify it as much as
possible.
a. g ∘ d
b. t ∘ g
c. d ∘ f_a
d. f_a ∘ d
e. g ∘ f_a
f. u ∘ i
g. i ∘ u
1.26. In each case, show that f is a bijection and find a formula for f⁻¹.
a. f : ℝ → ℝ defined by f(x) = x
b. f : ℝ⁺ → {x ∈ ℝ | 0 < x ≤ 1} defined by f(x) = 1/(1 + x)
c. f : ℝ × ℝ → ℝ × ℝ defined by f(x, y) = (x + y, x − y)
1.27. Show that if f : A → B is a bijection, then f⁻¹ is also a bijection, and
(f⁻¹)⁻¹ = f.
1.28. In each case, a relation on the set {1, 2, 3} is given. Of the three properties,
reflexivity, symmetry, and transitivity, determine which ones the relation has.
Give reasons.
a R= 1(l 3) 1) 2)
b. R = {(1, 1), (2, 2), (3, 3), (1, 2)}
c. R = ∅
1.29. For each of the eight lines of the table below, construct a relation on {2, 3, 5}
that fits the description.

reflexive  symmetric  transitive
true       true       true
true       true       false
true       false      true
true       false      false
false      true       true
false      true       false
false      false      true
false      false      false

1.30. Three relations are given on the set of all nonempty subsets of ℕ. In each
case, say whether the relation is reflexive, whether it is symmetric, and
whether it is transitive.
a. R is defined by: ARB if and only if A ⊆ B.
b. R is defined by: ARB if and only if A ∩ B ≠ ∅.
c. R is defined by: ARB if and only if 1 ∈ A ∩ B.
1.31. How would your answer to Exercise 1.30 change if in each case R were the
indicated relation on the set of all subsets of ℕ?
1.32. Let R be a relation on a set S. Write three quantified statements (the domain
being S in each case), which say, respectively, that R is not reflexive, R is
not symmetric, and R is not transitive.
1.33. In each case, a set A is specified, and a relation R is defined on it. Show that
R is an equivalence relation.
a. A = 2^S, for some set S. An element X of A is related via R to an
element Y if there is a bijection from X to Y.
b. A is an arbitrary set, and it is assumed that for some other set B,
f : A → B is a function. For x, y ∈ A, xRy if f(x) = f(y).
c. Suppose U is the set {1, 2, ..., 10}. A is the set of all statements over
the universe U; that is, statements involving at most one free variable,
which can have as its value an element of U. (Included in A are the
statements false and true.) For two elements r and s of A, rRs if r and s
are logically equivalent.
d. A is the set ℝ, and for x, y ∈ A, xRy if x − y is an integer.
e. A is the set of all infinite sequences x = x₀x₁x₂⋯ of 0's and 1's. For
two such sequences x and y, xRy if there exists an integer k so that
xᵢ = yᵢ for every i ≥ k.
1.34. In Exercise 1.33a, if S has exactly 10 elements, how many equivalence
classes are there for the relation R? Describe them. What are the elements of
the equivalence class containing {a, b} (assuming a and b are two elements
of S)?
1.35. In Exercise 1.33, if A and B are both the set of real numbers, and f is the
function defined by f(x) = x², describe the equivalence classes.
1.36. In Exercise 1.33b, suppose A has n elements and B has m elements.
a. If f is one-to-one (and not necessarily onto), how many equivalence
classes are there?
b. If f is onto (and not necessarily one-to-one), how many equivalence
classes are there?
1.37. In Exercise 1.33c, how many equivalence classes are there? List some
elements in the equivalence class containing the statement
(x = 3) ∨ (x = 7). List some elements in the equivalence class containing
the statement true, and some in the equivalence class containing false.
1.38. Let L be a language. It is clear from the definitions that L⁺ ⊆ L*. Under
what circumstances are they equal?
1.39. a. Find a language L over {a, b} that is neither {Λ} nor {a, b}* and satisfies
L = L*.
b. Find an infinite language L over {a, b} for which L ≠ L*.
1.40. In each case, give an example of languages L₁ and L₂ satisfying L₁L₂ =
L₂L₁ as well as the additional conditions indicated.
a. Neither language is a subset of the other, and neither language is {Λ}.
b. L₁ is a proper nonempty subset of L₂ (proper means L₁ ≠ L₂), and
L₁ ≠ {Λ}.
1.41. Let L₁ and L₂ be subsets of {0, 1}*, and consider the two languages L₁* ∪ L₂*
and (L₁ ∪ L₂)*.
a. Which of the two is always a subset of the other? Why? Give an
example (i.e., say what L₁ and L₂ are) so that the opposite inclusion
does not hold.
b. If L₁ ⊆ L₂, then (L₁ ∪ L₂)* = L₂* = L₁* ∪ L₂*. Similarly if L₂ ⊆ L₁.
Give an example of languages L₁ and L₂ for which L₁ ⊄ L₂, L₂ ⊄ L₁,
and L₁* ∪ L₂* = (L₁ ∪ L₂)*.
1.42. Show that if A and B are languages over Σ and A ⊆ B, then A* ⊆ B*.
1.43. Show that for any language L, L* = (L*)* = (L⁺)* = (L*)⁺.
1.44. For a finite language L, denote by |L| the number of elements of L. (For
example, |{Λ, a, ababb}| = 3.) Is it always true that for finite languages A
and B, |AB| = |A||B|? Either prove the equality or find a counterexample.
1.45. List some elements of {a, ab}*. Can you describe a simple way to recognize
elements of this language? In other words, try to find a proposition p(x) so
that
a. {a, ab}* is precisely the set of strings x satisfying p(x); and
b. for any x, there is a simple procedure to test whether x satisfies p(x).
1.46. a. Consider the language L of all strings of a's and b's that do not end with
b and do not contain the substring bb. Find a finite language S so that
L = S*.
b. Show that there is no language S so that the language of all strings of a's
and b's that do not contain the substring bb is equal to S*.
1.47. Let L₁, L₂, and L₃ be languages over some alphabet Σ. In each part below,
two languages are given. Say what the relationship is between them. (Are
they always equal? If not, is one always a subset of the other?) Give reasons
for your answers, including counterexamples if appropriate.
a. L₁(L₂ ∩ L₃), L₁L₂ ∩ L₁L₃
b. L₁(L₂ ∪ L₃), L₁L₂ ∪ L₁L₃
c. L₁* ∩ L₂*, (L₁ ∩ L₂)*
d. L₁(L₂L₁)*, (L₁L₂)*L₁
MORE CHALLENGING PROBLEMS


1.48. Show the associative property of symmetric difference (see Exercise 1.5)
without using Venn diagrams.
1.49. Suppose that for a finite set S, |S| denotes the number of elements of S, and
let A, B, C, and D be finite sets.
a. Show that |A ∪ B| = |A| + |B| − |A ∩ B|.
b. Show that |A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| −
|B ∩ C| + |A ∩ B ∩ C|. (An element x can be in none, one, two, or three
of the sets A, B, and C. For each case, consider the contribution of x to
each of the terms on the right side of the formula.)
c. Find a formula for |A ∪ B ∪ C ∪ D|.
1.50. a. How many elements are there in the following set?

{∅, {∅}, {∅, {∅}}, {∅, {{∅, {∅}}, {∅, {∅}}}}}

b. Describe precisely the algorithm you used to answer part (a).
1.51. Simplify the given set as much as possible in each case. Assume that all the
numbers involved are real numbers.

a. ⋂_{r>0} {x | |x − a| < r}

b. ⋃_{r<1} {x | |x − a| < r}

1.52. Is it possible for two distinct, nonempty sets A and B to satisfy
A × B ⊆ B × A? Give either an example of sets A and B for which this is
true or a general reason why it is impossible.
1.53. Suppose that A and B are subsets of a universal set U.
a. What is the relationship between 2^{A∪B} and 2^A ∪ 2^B? (Under what
circumstances are they equal? If they are not equal, is one necessarily a
subset of the other, and if so, which one?) Give reasons for your answers.
b. What is the relationship between 2^{A∩B} and 2^A ∩ 2^B? Give reasons.
c. What is the relationship between 2^{A⊕B} and 2^A ⊕ 2^B? (The operator ⊕ is
symmetric difference, as in Exercise 1.5.) Give reasons.
d. What is the relationship between 2^{A′} and (2^A)′? (Both are subsets of
2^U.) Give reasons.
1.54. Find a statement logically equivalent to p ↔ q that is in the form of a
disjunction, and simplify it as much as possible. (One approach is to use the
last paragraph of Section 1.2.2, from which it follows that p ↔ q is
equivalent to (¬p ∨ q) ∧ (p ∨ ¬q), and then to use distributive laws.)
1.55. In each case, write a quantified statement, using the formal notation
discussed in the chapter, that expresses the given statement. In both cases the
set A is assumed to be a subset of the domain, not necessarily the entire
domain.
a. There is exactly one element x in the set A satisfying the condition
P—that is, for which the proposition P(x) holds.
b. There are at least two distinct elements in the set A satisfying the
condition P.
1.56. Below are four pairs of statements. In all cases, the universe for the
quantified statements is assumed to be the set N. We say one statement
logically implies the other if, for any choice of statements p(x) and q(x) for
which the first is true, the second is also true. In each case, say whether the
first statement logically implies the second, and whether the second logically
implies the first. In each case where the answer is no, give an example of
statements p(x) and q(x) to illustrate.
a. ∀x(p(x) ∨ q(x));  ∀x(p(x)) ∨ ∀x(q(x))
b. ∀x(p(x) ∧ q(x));  ∀x(p(x)) ∧ ∀x(q(x))
c. ∃x(p(x) ∨ q(x));  ∃x(p(x)) ∨ ∃x(q(x))
d. ∃x(p(x) ∧ q(x));  ∃x(p(x)) ∧ ∃x(q(x))
1.57. Suppose A and B are sets and f : A → B. Let S and T be subsets of A.
a. Is the set f(S ∪ T) a subset of f(S) ∪ f(T)? If so, give a proof; if not,
give a counterexample (i.e., specify sets A, B, S, and T and a function
f).
b. Is the set f(S) ∪ f(T) a subset of f(S ∪ T)? Give either a proof or a
counterexample.
c. Repeat part (a) with intersection instead of union.
d. Repeat part (b) with intersection instead of union.
e. In each of the first four parts where your answer is no, what extra
assumption on the function f would make the answer yes? Give reasons
for your answer.
1.58. Suppose n is a positive integer and X = {1, 2, ..., n}. Let A = 2^X; let B be
the set of all functions from X to {0, 1}; and let C = {0, 1}ⁿ = {0, 1} ×
{0, 1} × ··· × {0, 1} (where there are n factors).
a. Describe an explicit bijection f from A to B. (In other words, define a
function f : A → B, by saying for each subset S of X what function
from X to {0, 1} f(S) is, and show that f is a bijection.)
b. Describe an explicit bijection g from B to C. (In this case, you have to
say, for each function t : X → {0, 1}, what n-tuple g(t) is, and then
show g is a bijection.)
c. Describe an explicit bijection h from C to A. (Here you have to start
with an n-tuple N = (i₁, i₂, ..., iₙ) and say what set h(N) is, and then
show h is a bijection.)
1.59. Let E be the set {1, 2, 3, ...}, S the set of nonempty subsets of E, T the set
of nonempty proper subsets of E, and P the set of partitions of E into two
nonempty subsets.
a. Suppose f : T → P is defined by the formula f(A) = {A, E − A} (in
other words, for a nonempty proper subset A of E, f(A) is the partition of E
consisting of the two subsets A and E − A). Is f a bijection from T to
P? Why or why not?
b. Suppose g : S → P is defined by g(A) = {A, E − A}. Is g a bijection
from S to P? Why or why not?
1.60. Suppose U is a set, ∘ is a binary operation on U, and I₀ is a subset of U.
a. Let 𝒮 be the set of subsets of U that contain I₀ as a subset and are closed
under the operation ∘; let T = ⋂_{S∈𝒮} S. Show that I₀ ⊆ T and that T is
closed under ∘.
b. Show that the set T defined in part (a) is the smallest subset of U that
contains I₀ and is closed under ∘, in the sense that for any other such set
S, T ⊆ S.
1.61. Consider the following “proof” that any relation R on a set A which is both
symmetric and transitive must also be reflexive:
Let a be any element of A. Let b be any element of A for which
aRb. Then since R is symmetric, bRa. Now since R is transitive,
and since aRb and bRa, it follows that aRa. Therefore, R is
reflexive.
Your answer to Exercise 1.29 shows that this proof cannot be correct. What
is the first incorrect statement of the proof, and why is it incorrect?
1.62. Suppose A is a set having n elements.
a. How many relations are there on A?
b. How many reflexive relations are there on A?
c. How many symmetric relations are there on A?
d. How many relations are there on A that are both reflexive and
symmetric?
1.63. Suppose R is a relation on a nonempty set A.
a. Define Rˢ = R ∪ {(x, y) | yRx}. Show that Rˢ is symmetric and is the
smallest symmetric relation on A containing R (i.e., for any symmetric
relation R₁ with R ⊆ R₁, Rˢ ⊆ R₁).
b. Define Rᵗ to be the intersection of all transitive relations on A containing
R. Show that Rᵗ is transitive and is the smallest transitive relation on A
containing R.
c. Let Rᵘ = R ∪ {(x, y) | ∃z(xRz and zRy)}. Is Rᵘ equal to the set Rᵗ in
part (b)? Either prove that it is, or give an example in which it is not.
The relations Rˢ and Rᵗ are called the symmetric closure and transitive
closure of R, respectively.
1.64. Let R be the equivalence relation in Exercise 1.33a. Assuming that S is
finite, find a function f : 2^S → N so that for any x, y ∈ 2^S, xRy if and only
if f(x) = f(y).
1.65. Let n be a positive integer. Find a function f : N → N so that for any
x, y ∈ N, x ≡ₙ y if and only if f(x) = f(y).
1.66. Let A be any set, and let R be any equivalence relation on A. Find a set B
and a function f : A → B so that for any x, y ∈ A, xRy if and only if
f(x) = f(y).
1.67. Suppose R is an equivalence relation on a set A. A subset S ⊆ A is pairwise
inequivalent if no two distinct elements of S are equivalent. S is a maximal
pairwise inequivalent set if S is pairwise inequivalent and every element of
A is equivalent to some element of S. Show that a set S is a maximal
pairwise inequivalent set if and only if it contains exactly one element of
each equivalence class.
1.68. Suppose R₁ and R₂ are equivalence relations on a set A. As discussed in
Section 1.4, the equivalence classes of R₁ and R₂ form partitions P₁ and P₂,
respectively, of A. Show that R₁ ⊆ R₂ if and only if the partition P₁ is finer
than P₂ (i.e., every subset in the partition P₂ is the union of one or more
subsets in the partition P₁).
1.69. Suppose Σ is an alphabet. It is obviously possible for two distinct strings x
and y over Σ to satisfy the condition xy = yx, since this condition is always
satisfied if y = Λ. Is it possible under the additional restriction that x and y
are both nonnull? Either prove that this cannot happen, or describe precisely
the circumstances under which it can.
1.70. Show that there is no language L so that {aa, bb}*{ab, ba}* = L*.
1.71. Consider the language L = {x ∈ {0, 1}* | x = yy for some string y}. We
know that L = L{Λ} = {Λ}L (because any language L has this property). Is
there any other way to express L as the concatenation of two languages?
Prove your answer.
CHAPTER 2

Mathematical Induction and Recursive Definitions

2.1 | PROOFS
A proof of a statement is essentially just a convincing argument that the statement is
true. Ideally, however, a proof not only convinces but explains why the statement is
true, and also how it relates to other statements and how it fits into the overall theory. A
typical step in a proof is to derive some statement from (1) assumptions or hypotheses,
(2) statements that have already been derived, and (3) other generally accepted facts,
using general principles of logical reasoning. In a very careful, detailed proof, we
might allow no “generally accepted facts” other than certain axioms that we specify
initially, and we might restrict ourselves to certain specific rules of logical inference,
by which each step must be justified. Being this careful, however, may not be feasible
or worthwhile. We may take shortcuts (“It is obvious that ...” or “It is easy to show
that ...”) and concentrate on the main steps in the proof, assuming that a conscientious
or curious reader could fill in the low-level details.
Usually what we are trying to prove involves a statement of the form p → q. A
direct proof assumes that the statement p is true and uses this to show q is true.

EXAMPLE 2.1 | The Product of Two Odd Integers Is Odd
To prove: For any integers a and b, if a and b are odd, then ab is odd.

■ Proof
We start by saying more precisely what our assumption means. An integer n is odd if there
exists an integer x so that n = 2x + 1. Now let a and b be any odd integers. Then according
to this definition, there is an integer x so that a = 2x + 1, and there is an integer y so that


b = 2y + 1. We wish to show that there is an integer z so that ab = 2z + 1. Let us therefore
calculate ab:

ab = (2x + 1)(2y + 1)
   = 4xy + 2x + 2y + 1
   = 2(2xy + x + y) + 1

Since we have shown that there is a z, namely, 2xy + x + y, so that ab = 2z + 1, the proof is
complete.

This is an example of a constructive proof. We proved the statement “There exists
z such that ...” by constructing a specific value for z that works. A nonconstructive
proof shows that such a z must exist without providing any information about its
value. Such a proof would not explain, it would only convince. Although in some
situations this is the best we can do, people normally prefer a constructive proof if
one is possible. In some cases, the method of construction is interesting in its own
right. In these cases, the proof is even more valuable because it provides an algorithm
as well as an explanation.
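Because the proof is constructive, it can be checked mechanically. Here is a minimal sketch in Python (the function name and test range are our own illustration, not part of the text) that computes the proof's witness z = 2xy + x + y and confirms ab = 2z + 1:

    def odd_product_witness(a, b):
        # a = 2x + 1 and b = 2y + 1; the proof's witness is z = 2xy + x + y.
        x, y = (a - 1) // 2, (b - 1) // 2
        z = 2 * x * y + x + y
        assert a * b == 2 * z + 1   # ab is odd, with an explicit witness
        return z

    # Check the construction on a range of odd integers.
    for a in range(-9, 10, 2):
        for b in range(-9, 10, 2):
            odd_product_witness(a, b)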
Since the statement we proved in Example 2.1 is the quantified statement “For
any integers a and b,...,” it is important to understand that it is not sufficient to
give an example of a and b for which the statement is true. If we say “Let a = 45
and b = 11; then a = 2(22) + 1 and b = 2(5) + 1; therefore, ab = (2 * 22 + 1)
(2 * 5 + 1) = ... = 2 * 247 + 1,” we have proved nothing except that 45 * 11 is
odd. Finding a value of x so that the statement P(x) is true is a proof of the statement
“There exists x such that P(x).” Finding a value of x for which P (x) is false disproves
the statement “For every x, P(x)” (or, if you prefer, proves the statement “It is not
the case that for every x, P(x)”); this is called a proof by counterexample. To prove
“For every x, P(x),” however, requires that we give an argument in which there are
no restrictions on x. (Let us return briefly to the example with 45 and 11. It is not
totally unreasonable to claim that the argument beginning “Let a = 45 and b = 11”
is a proof of the quantified statement—after all, the algebraic steps involved are the
same as the ones we presented in our official proof. The crucial point, however, is that
there is nothing special about 45 and 11. Someone who offers this as a proof should
at least point out that the same argument would work in general. For an argument
this simple, such an observation may be convincing; even more convincing is an
argument involving a and b like the one we gave originally.)
The alternative to a direct proof is an indirect proof, and the simplest form of
indirect proof is a proof by contrapositive, using the logical equivalence of p → q
and ¬q → ¬p.

EXAMPLE 2.2 | A Proof by Contrapositive

To prove: For any positive integers i, j, and n, if i * j = n, then either i ≤ √n or j ≤ √n.
■ Proof
The statement we wish to prove is of the general form “For every x, if p(x), then q(x).” For
each x, the statement “If p(x) then q(x)” is logically equivalent to “If not q(x) then not p(x),”
and therefore (by a general principle of logical reasoning) the statement we want to prove is
equivalent to this: For any positive integers i, j, and n, if it is not the case that i ≤ √n or
j ≤ √n, then i * j ≠ n.
If it is not true that i ≤ √n or j ≤ √n, then i > √n and j > √n. A generally accepted
fact from mathematics is that if a and b are numbers with a > b, and c is a number > 0, then
ac > bc. Applying this to the inequality i > √n with c = j, we obtain i * j > √n * j. Since
n > 0, we know that √n > 0, and we may apply the same fact again to the inequality j > √n,
this time letting c = √n, to obtain j√n > √n√n = n. We now have i * j > j√n > n, and
it follows that i * j ≠ n.
The second paragraph in this proof illustrates the fact that a complete proof, with no details
left out, is usually not feasible. Even though the statement we are proving here is relatively
simple, and our proof includes more detail than might normally be included, there is still a lot
left out. Here are some of the details that were ignored:

1. ¬(p ∨ q) is logically equivalent to ¬p ∧ ¬q. Therefore, if it is not true that i ≤ √n or
j ≤ √n, then i ≤ √n is false and j ≤ √n is false.
2. For any two real numbers a and b, exactly one of the conditions a < b, a > b, and a = b
holds. (This is a generally accepted fact from mathematics.) Therefore, if i ≤ √n is false, then
i > √n, and similarly for j.
3. For any two real numbers a and b, a * b = b * a. Therefore, √n * j = j√n.
4. The > relation on the set of real numbers is transitive. Therefore, from the fact that
i * j > j√n and j√n > n, it follows that i * j > n.

Even if we include all these details, we have not stated explicitly the rules of inference
we have used to arrive at the final conclusion, and we have used a number of facts about real
numbers that could themselves be proved from more fundamental axioms. In presenting a
proof, one usually tries to strike a balance: enough left out to avoid having the minor details
obscure the main points and put the reader to sleep, and enough left in so that the reader will
be convinced.
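The statement proved in Example 2.2 is also easy to test exhaustively for small values of n. The following sketch (our own illustration; the bound 1000 is arbitrary) checks every factorization i * j = n, using the fact that for integers, i ≤ √n exactly when i² ≤ n:

    # For every factorization i * j = n, at least one factor is <= sqrt(n).
    for n in range(1, 1001):
        for i in range(1, n + 1):
            if n % i == 0:
                j = n // i
                assert i * i <= n or j * j <= n   # i <= sqrt(n) or j <= sqrt(n)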

A variation of proof by contrapositive is proof by contradiction. In its most


general form, proving a statement p by contradiction means showing that if it is not
true, some contradiction results. Formally, this means showing that the statement
¬p → false is true. It follows that the contrapositive statement true → p is true, and
this statement is logically equivalent to p. If we wish to prove the statement p → q
by contradiction, we assume that p → q is false. Because of the logical equivalence
of p → q and ¬p ∨ q, this means assuming that ¬(¬p ∨ q), or p ∧ ¬q, is true. From
this assumption we try to derive some statement that contradicts some statement we
know to be true—possibly p, or possibly some other statement.
EXAMPLE 2.3 | √2 Is Irrational

A real number x is rational if there are two integers m and n so that x = m/n. We present
one of the most famous examples of proof by contradiction: the proof, known to the ancient
Greeks, that √2 is irrational.
■ Proof
Suppose for the sake of contradiction that √2 is rational. Then there are integers m′ and n′
with √2 = m′/n′. By dividing both m′ and n′ by all the factors that are common to both, we
obtain √2 = m/n, for some integers m and n having no common factors. Since m/n = √2,
m = n√2. Squaring both sides of this equation, we obtain m² = 2n², and therefore m² is even
(divisible by 2). The result proved in Example 2.1 is that for any integers a and b, if a and b are
odd, then ab is odd. Since a conditional statement is logically equivalent to its contrapositive,
we may conclude that for any a and b, if ab is not odd, then either a is not odd or b is not odd.
However, an integer is not odd if and only if it is even (Exercise 2.21), and so for any a and b,
if ab is even, then a or b is even. If we apply this when a = b = m, we conclude that since
m² is even, m must be even. This means that for some k, m = 2k. Therefore, (2k)² = 2n².
Simplifying this and canceling 2 from both sides, we obtain 2k² = n². Therefore, n² is even.
The same argument that we have already used shows that n must be even, and so n = 2j for
some j. We have shown that m and n are both divisible by 2. This contradicts the previous
statement that m and n have no common factor. The assumption that √2 is rational therefore
leads to a contradiction, and the conclusion is that √2 is irrational.

EXAMPLE 2.4 | Another Proof by Contradiction

To prove: For any sets A, B, and C, if A ∩ B = ∅ and C ⊆ B, then A ∩ C = ∅.

■ Proof
Again we try a proof by contradiction. Suppose that A, B, and C are sets for which the
conditional statement is false. Then A ∩ B = ∅, C ⊆ B, and A ∩ C ≠ ∅. Therefore, there
exists x with x ∈ A ∩ C, so that x ∈ A and x ∈ C. Since C ⊆ B and x ∈ C, it follows
that x ∈ B. Therefore, x ∈ A ∩ B, which contradicts the assumption that A ∩ B = ∅. Since
the assumption that the conditional statement is false leads to a contradiction, the statement is
proved.

There is not always a clear line between a proof by contrapositive and one by
contradiction. Any proof by contrapositive that p → q is true can easily be refor-
mulated as a proof by contradiction. Instead of assuming that ¬q is true and trying
to show ¬p, assume that p and ¬q are true and derive ¬p; then the contradiction is
that p and ¬p are both true. In the last example it seemed slightly easier to argue
by contradiction, since we wanted to use the assumption that C ⊆ B. A proof by
contrapositive would assume that A ∩ C ≠ ∅ and would try to show that

¬((A ∩ B = ∅) ∧ (C ⊆ B))
This approach seems a little more complicated, just because the formula we are trying
to obtain is more complicated.
It is often convenient (or necessary) to use several different proof techniques
within a single proof. Although the overall proof in the following example is not a
proof by contradiction, this technique is used twice within the proof.

EXAMPLE 2.5 | There Must Be a Prime Between n and n!

For a positive integer n the number n! is defined to be the product n * (n − 1) * ··· * 2 * 1 of all
the positive integers less than or equal to n. To prove: For any integer n > 2, there is a prime
p satisfying n < p < n!.

■ Proof
Since n > 2, two distinct factors in n! are n and 2. Therefore, n! ≥ 2n = n + n > n + 1, and
thus n! − 1 > n. The number n! − 1 must have a factor p that is a prime. (See Example 1.2
for the definition of a prime. The fact that every integer greater than 1 has a prime factor is a
basic fact about positive integers, which we will prove in Example 2.11.) Since p is a divisor
of n! − 1, p ≤ n! − 1 < n!. This gives us one of the inequalities we need. To show the other
one, suppose for the sake of contradiction that p ≤ n. Then since p is one of the positive
integers less than or equal to n, p is a factor of n!. However, p cannot be a factor of both n! and
n! − 1; if it were, it would be a factor of 1, their difference, and this is impossible. Therefore,
the assumption that p ≤ n leads to a contradiction, and we may conclude that n < p < n!.

Another useful technique is to divide the proof into separate cases; this is
illustrated by the next example.

EXAMPLE 2.6 | Strings of Length 4 Contain Substrings yy

To prove: Every string x in {0, 1}* of length 4 contains a nonnull substring of the form yy.

■ Proof
We can show the result by considering two separate cases. If x contains two consecutive 0’s
or two consecutive 1’s, then the statement is true for a string y of length 1. In the other case,
any symbol that follows a 0 must be a 1, and vice versa, so that x must be either 0101 or 1010.
The statement is therefore true for a string y of length 2.
Even though the argument is simple, let us state more explicitly the logic on which it
depends. We want to show that some proposition P is true. The statement P is logically
equivalent to true → P. If we denote by p the statement that x contains two consecutive 0's
or two consecutive 1's, then p ∨ ¬p is true. This means true → P is logically equivalent to

(p ∨ ¬p) → P

which in turn is logically equivalent to

(p → P) ∧ (¬p → P)
This last statement is what we actually prove, by showing that each of the two separate condi-
tional statements is true.
In this proof, there was some choice as to which cases to consider. A less efficient approach
would have been to divide our two cases into four subcases: (i) x contains two consecutive 0’s;
(and so forth). An even more laborious proof would be to consider the 16 strings of length 4
individually, and to show that the result is true in each case. Any of these approaches is valid,
as long as our cases cover all the possibilities and we can complete the proof in each case.
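Since there are only 16 binary strings of length 4, a proof by cases like this one can be double-checked by brute force. A small Python sketch (the function name is our own, not the book's):

    from itertools import product

    def contains_square(x):
        # Does x contain a nonnull substring of the form yy?
        n = len(x)
        return any(x[i:i + k] == x[i + k:i + 2 * k]
                   for k in range(1, n // 2 + 1)
                   for i in range(n - 2 * k + 1))

    # Check all 16 binary strings of length 4, as in Example 2.6.
    assert all(contains_square(''.join(bits)) for bits in product('01', repeat=4))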

The examples in this section provide only a very brief introduction to proofs.
Learning to read proofs takes a lot of practice, and creating your own is even harder.
One thing that does help is to develop a critical attitude. Be skeptical. When you
read a step in a proof, ask yourself, “Am I convinced by this?” When you have
written a proof, read it over as if someone else had written it (it is best to read aloud
if circumstances permit), and as you read each step ask yourself the same question.

2.2 | THE PRINCIPLE OF MATHEMATICAL INDUCTION
Very often, we wish to prove that some statement involving a natural number n is true
for every sufficiently large value of n. The statement might be a numerical equality:

∑_{i=1}^{n} i = n(n + 1)/2

The number of subsets of {1, 2, ..., n} is 2ⁿ.

It might be an inequality:

n ≤ 2ⁿ

It might be some other assertion about n, or about a set with n elements, or a string
of length n:

There exist positive integers j and k so that n = 3j + 7k.


Every language with exactly n elements is regular.
If x ∈ {0, 1}*, |x| = n, and x = 0y1, then x contains the substring 01.

(The term regular is defined in Chapter 3.) In this section, we discuss a common
approach to proving statements of this type.
In both the last two examples, it might seem as though the explicit mention of n
makes the statement slightly more awkward. It would be simpler to say, “Every finite
language is regular,” and this statement is true; it would also be correct to let the last
statement begin, “For any x and y in {0, 1}*, if x = 0y1, ....” However, in both cases
the simpler statement is equivalent to the assertion that the original statement is true
for every nonnegative value of n, and formulating the statement so that it involves n
will allow us to apply the proof technique we are about to discuss.
EXAMPLE 2.7 | The Sum of the First n Positive Integers
We begin with the first example above, expressed without the summation notation:

0 + 1 + 2 + ··· + n = n(n + 1)/2

This formula is supposed to hold for every n ≥ 1; however, it makes sense to consider it for
n = 0 as well if we interpret the left side in that case to be the empty sum, which by definition
is 0. Let us therefore try to prove that the statement is true for every n ≥ 0.
How do we start? Unless we have any better ideas, we might very well begin by writing
out the formula for the first few values of n, to see if we can spot a pattern.

n = 0:  0 = 0(0 + 1)/2
n = 1:  0 + 1 = 1(1 + 1)/2
n = 2:  0 + 1 + 2 = 2(2 + 1)/2
n = 3:  0 + 1 + 2 + 3 = 3(3 + 1)/2
n = 4:  0 + 1 + 2 + 3 + 4 = 4(4 + 1)/2
As we are verifying these formulas, we probably realize after a few lines that in checking a
specific case, say n = 4, it is not necessary to do all the arithmetic on the left side: 0+ 1+2+
3 +4. We can take the left side of the previous formula, which we have already calculated,
and add 4. When we calculated 0 + 1 + 2 + 3, we obtained 3(3 + 1)/2. So our answer for
n=4is

3(3 + 1)/2 + 4 = 4(3/2 + 1) = 4(3 + 2)/2 = 4(4 + 1)/2

which is the one we wanted. Now that we have done this step, we can take care of n = 5 the
same way, by taking the sum we just obtained for n = 4 and adding 5:

4(4 + 1)/2 + 5 = 5(4/2 + 1) = 5(4 + 2)/2 = 5(5 + 1)/2


These two calculations are similar—in fact, this is the pattern we were looking for, and we can
probably see at this point that it will continue. Are we ready to write our proof?

■ Example 2.7. Proof Number 1

To show:

0 + 1 + 2 + ··· + n = n(n + 1)/2   for every n ≥ 0

n = 0:  0 = 0(0 + 1)/2
n = 1:  0 + 1 = 0(0 + 1)/2 + 1   (by using the result for n = 0)
             = 1(0/2 + 1)
             = 1(0 + 2)/2
             = 1(1 + 1)/2
n = 2:  0 + 1 + 2 = 1(1 + 1)/2 + 2   (by using the result for n = 1)
             = 2(1/2 + 1)
             = 2(1 + 2)/2
             = 2(2 + 1)/2
n = 3:  0 + 1 + 2 + 3 = 2(2 + 1)/2 + 3   (by using the result for n = 2)
             = 3(2/2 + 1)
             = 3(2 + 2)/2
             = 3(3 + 1)/2

Since this pattern continues indefinitely, the formula is true for every n ≥ 0.

Now let us criticize this proof. The conclusion, “the formula is true for every
n ≥ 0,” is supposed to follow from the fact that “this pattern continues indefinitely.”
The phrase “this pattern” refers to the calculation that we have done three times, to
derive the formula for n = 1 from n = 0, for n = 2 from n = 1, and for n = 3 from
n = 2. There are at least two clear deficiencies in the proof. One is that we have
not said explicitly what “this pattern” is. The second, which is more serious, is that
we have not made any attempt to justify the assertion that it continues indefinitely.
In this example, the pattern is obvious enough that people might accept the assertion
without much argument. However, it would be fair to say that the most important
statement in the proof is the one for which no reasons are given!
Our second version of the proof tries to correct both these problems at once: to
describe the pattern precisely by doing the calculation, not just for three particular
values of n but for an arbitrary value of n, and in the process, to demonstrate that the
pattern does not depend on the value of n and therefore does continue indefinitely.

■ Example 2.7. Proof Number 2

To show:

0 + 1 + 2 + ··· + n = n(n + 1)/2   for every n ≥ 0

n = 0:  0 = 0(0 + 1)/2
n = 1:  0 + 1 = 0(0 + 1)/2 + 1   (by using the result for n = 0)
             = 1(0/2 + 1)
             = 1(0 + 2)/2
             = 1(1 + 1)/2
n = 2:  0 + 1 + 2 = 1(1 + 1)/2 + 2   (by using the result for n = 1)
             = 2(1/2 + 1)
             = 2(1 + 2)/2
             = 2(2 + 1)/2
n = 3:  0 + 1 + 2 + 3 = 2(2 + 1)/2 + 3   (by using the result for n = 2)
             = 3(2/2 + 1)
             = 3(2 + 2)/2
             = 3(3 + 1)/2

In general, for any value of k ≥ 0, the formula for n = k + 1 can be derived from the one
for n = k as follows:
0 + 1 + 2 + ··· + (k + 1) = (0 + 1 + ··· + k) + (k + 1)
             = k(k + 1)/2 + (k + 1)   (from the result for n = k)
             = (k + 1)(k/2 + 1)
             = (k + 1)(k + 2)/2
             = (k + 1)((k + 1) + 1)/2

Therefore, the formula holds for every n ≥ 0.

We might now say that the proof has more than it needs. Presenting the calcu-
lations for three specific values of n originally made it easier for the reader to spot
the pattern; now, however, the pattern has been stated explicitly. To the extent that
the argument for these three specific cases is taken to be part of the proof, it obscures
the two essential parts of the proof: (1) checking the formula for the initial value
of n, n = 0, and (2) showing in general that once we have obtained the formula for
one value of n (n = k), we can derive it for the next value (n = k + 1). These two
facts together are what allow us to conclude that the formula holds for every n ≥ 0.
Neither by itself would be enough. (On one hand, the formula for n = 0, or even
for the first million values of n, might be true just by accident. On the other hand, it
would not help to know that we can always derive the formula for the case n = k + 1
from the one for the case n = k, if we could never get off the ground by showing that
it is actually true for some starting value of k.)
The principle that we have used in this example can now be formulated in general.

The Principle of Mathematical Induction

Suppose P(n) is a statement involving an integer n. In order to show that P(n) is
true for every n ≥ n₀, it is sufficient to show these two things:
1. P(n₀) is true.
2. For any k ≥ n₀, if P(k) is true, then P(k + 1) is true.
A proof by induction is an application of this principle. The two parts of such


a proof are called the basis step and the induction step. In the induction step, we
assume that k is a number ≥ n₀ and that the statement P(n) is true in the case n = k;
we call this assumption the induction hypothesis. Let us return to our example one
last time in order to illustrate the format of a proof by induction.

■ Example 2.7. Proof Number 3 (by induction)

Let P(n) be the statement

1 + 2 + 3 + ··· + n = n(n + 1)/2

To show that P(n) is true for every n ≥ 0.

Basis step. We must show that P(0) is true. P(0) is the statement 0 = 0(0 + 1)/2, and
this is obviously true.
Induction hypothesis.

k ≥ 0  and  1 + 2 + 3 + ··· + k = k(k + 1)/2

Statement to be shown in induction step.

1 + 2 + 3 + ··· + (k + 1) = (k + 1)((k + 1) + 1)/2

Proof of induction step.

1 + 2 + 3 + ··· + (k + 1) = (1 + 2 + 3 + ··· + k) + (k + 1)
             = k(k + 1)/2 + (k + 1)   (by the induction hypothesis)
             = (k + 1)(k/2 + 1)
             = (k + 1)(k + 2)/2
             = (k + 1)((k + 1) + 1)/2

Whether or not you follow this format exactly, it is advisable always to include
in your proof explicit statements of the following:
The general statement involving n that is to be proved.
The statement to which it reduces in the basis step (the general statement, but
with n₀ substituted for n).
The induction hypothesis (the general statement, with k substituted for n, and
preceded by “k ≥ n₀, and”).
The statement to be shown in the induction step (with k + 1 substituted for n).
The point during the induction step at which the induction hypothesis is used.
The advantage of formulating a general principle of induction is that it supplies
a general framework for proofs of this type. If you read in a journal article the phrase
“It can be shown by induction that ...,” even if the details are missing, you can
supply them. Although including these five items explicitly may seem laborious at
first, the advantage is that it can help you to clarify for yourself exactly what you are
trying to do in the proof. Very often, once you have gotten to this point, filling in the
remaining details is a straightforward process.
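A numerical spot-check of the identity proved in Example 2.7 is no substitute for the induction proof—it covers only finitely many values of n, and it is precisely the induction argument that handles them all—but it can catch an algebra slip. A one-loop sketch in Python (the bound 100 is an arbitrary choice):

    # Spot-check 0 + 1 + ... + n = n(n + 1)/2 for the first hundred values of n.
    for n in range(100):
        assert sum(range(n + 1)) == n * (n + 1) // 2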

EXAMPLE 2.8 | Strings of the Form 0y1 Must Contain the Substring 01
Let us prove the following statement: For any x ∈ {0, 1}*, if x begins with 0 and ends with 1
(i.e., x = 0y1 for some string y), then x must contain the substring 01.
You may wonder whether this statement requires an induction proof; let us begin with
an argument that does not involve induction, at least explicitly. If x = 0y1 for some string
y ∈ {0, 1}*, then x must contain at least one 1. The first 1 in x cannot occur at the beginning,
since x starts with 0; therefore, the first 1 must be immediately preceded by a 0, which means
that x contains the substring 01. It would be hard to imagine a proof much simpler than
this, and it seems convincing. It is interesting to observe, however, that this proof uses a fact
about natural numbers (every nonempty subset has a smallest element) that is equivalent to
the principle of mathematical induction. We will return to this statement later, when we have
a slightly modified version of the induction principle. See Example 2.12 and the discussion
before that example.
In any case, we are interested in illustrating the principle of induction at least as much
as in the result itself. Let us try to construct an induction proof. Our initial problem is that
mathematical induction is a way of proving statements of the form “For every n ≥ n₀, P(n),”
and our statement is not of this form. This is easy to fix, and the solution was suggested at
the beginning of this section. Consider the statement P(n): If |x| = n and x = 0y1 for some
string y ∈ {0, 1}*, then x contains the substring 01. In other words, we are introducing an
integer n into our statement, specifically in order to use induction. If we can prove that P(n) is
true for every n ≥ 2, it will follow that the original statement is true. (The integer we choose is
the length of the string, and we could describe the method of proof as induction on the length
of the string. There are other possible choices; see Exercise 2.6.)
In the basis step, we wish to prove the statement “If |x| = 2 and x = 0y1 for some
string y ∈ {0, 1}*, then x contains the substring 01.” This statement is true, because if |x| = 2
and x = 0y1, then y must be the null string Λ, and we may conclude that x = 01. Our
induction hypothesis will be the statement: k ≥ 2, and if |x| = k and x = 0y1 for some
string y ∈ {0, 1}*, then x contains the substring 01. In the induction step, we must show: if
|x| = k + 1 and x = 0y1 for some y ∈ {0, 1}*, then x contains the substring 01. (These three
statements are obtained from the original statement P(n) very simply: first, by substituting 2
for n; second, by substituting k for n, and adding the phrase “k ≥ 2, and” at the beginning;
third, by substituting k + 1 for n. These three steps are always the same, and the basis step
is often as easy to prove as it is here. Now the mechanical part is over, and we must actually
think about how to continue the proof!)
We have a string x of length k + 1, about which we want to prove something. We have an
induction hypothesis that tells us something about certain strings of length k, the ones that begin
with 0 and end with 1. In order to apply the induction hypothesis, we need a string of length k to
apply it to. We can get a string of length k from x by leaving out one symbol. Let us try deleting
the initial 0. (See Exercise 2.5.) The remainder, y1, is certainly a string of length k, and we
know that it ends in 1, but it may not begin with 0—and we can apply the induction hypothesis
only to strings that do. However, if y1 does not begin with 0, it must begin with 1, and in this
case x starts with the substring 01! If y1 does begin with 0, then the induction hypothesis tells
us that it must contain the substring 01, so that x = 0y1 must contain the substring too.
Now that we have figured out the crucial steps, we can afford to be a little more concise in
our official proof. We are trying to prove that for every n ≥ 2, P(n) is true, where P(n) is the
statement: If |x| = n and x = 0y1 for some string y ∈ {0, 1}*, then x contains the substring
01.

Basis step. We must show that the statement P(2) is true. P(2) says that if |x| = 2 and
x = 0y1 for some y ∈ {0, 1}*, then x contains the substring 01. P(2) is true, because if
|x| = 2 and x = 0y1 for some y, then x = 01.
Induction hypothesis. k ≥ 2 and P(k); in other words, if |x| = k and x = 0y1 for
some y ∈ {0, 1}*, then x contains the substring 01.
Statement to be shown in induction step. P(k + 1); that is, if |x| = k + 1 and
x = 0y1 for some y ∈ {0, 1}*, then x contains the substring 01.
Proof of induction step. Since |x| = k + 1 and x = 0y1, |y1| = k. If y begins with 1,
then x begins with the substring 01. If y begins with 0, then y1 begins with 0 and ends
with 1; by the induction hypothesis, y1 contains the substring 01, and therefore x does
also.
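Here too the statement can be confirmed by brute force for small lengths; the induction proof is what extends it to every n. A sketch in Python (the bound 12 is an arbitrary choice of ours):

    from itertools import product

    # Exhaustive check of Example 2.8 for all binary strings of length 2 to 12.
    for n in range(2, 13):
        for bits in product('01', repeat=n):
            x = ''.join(bits)
            if x.startswith('0') and x.endswith('1'):   # x = 0y1 for some y
                assert '01' in x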

EXAMPLE 2.9 | Verifying a Portion of a Program

The program fragment below is written in pseudocode. Lowercase letters represent constants,
uppercase letters represent variables, and the constant n is assumed to be nonnegative:

    Y = 1;
    for I = 1 to n do
        Y = Y * x;
    write(Y);

We would like to show that when this code is executed, the value printed out is xⁿ. We do this
in a slightly roundabout way, by introducing a new integer j, the number of iterations of the
loop that have been performed. Let P(j) be the statement that the value of Y after j iterations
is xʲ. The result we want will follow from the fact that P(j) is true for any j ≥ 0, and the fact
that “for I = 1 to n” results in n iterations of the loop.

Basis step. P(0) is the statement that after 0 iterations of the loop, Y has the value x⁰ = 1.
This is true because Y receives the initial value 1 and after 0 iterations of the loop its
value is unchanged.
Inductive hypothesis. k ≥ 0, and after k iterations of the loop the value of Y is xᵏ.
Statement to be proved in induction step. After k + 1 iterations of the loop, the value
of Y is xᵏ⁺¹.
Proof of induction step. The effect of the assignment statement Y = Y * x is to replace
the old value of Y by that value times x; therefore, the value of Y after any iteration is x
times the value before that iteration. Since x * xᵏ = xᵏ⁺¹, the proof is complete.

Although the program fragment in this example is very simple, the example
should suggest that the principle of mathematical induction can be a useful technique
for verifying the correctness of programs. For another example, see Exercise 2.56.
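The invariant P(j) of Example 2.9 can also be turned into a runtime assertion. The following Python rendering of the fragment is our own sketch (the book's pseudocode contains no assert statement); it checks that Y = xʲ holds after every iteration:

    def power(x, n):
        # The loop of Example 2.9, with the invariant P(j) made explicit.
        y = 1
        for j in range(1, n + 1):
            y = y * x
            assert y == x ** j   # after j iterations, Y = x^j
        return y

    assert power(3, 10) == 3 ** 10 and power(2, 0) == 1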
You may occasionally find the principle of mathematical induction in a disguised
form, which we could call the minimal counterexample principle. The last example
in this section illustrates this.

EXAMPLE 2.10 | A Proof Using the Minimal Counterexample Principle

To show: For every integer n ≥ 0, 5ⁿ − 2ⁿ is divisible by 3.
Just as in an ordinary induction proof, we begin by checking that P(n) is true for the
starting value of n. This is true here, since 5⁰ − 2⁰ = 1 − 1 = 0, and 0 is divisible by 3. Now
if it is not true that P(n) is true for every n ≥ 0, then there are values of n greater than or equal
to 0 for which P(n) is false, and therefore there must be a smallest such value, say n = k.
(See Example 2.12.) Since we have verified P(0), k must be at least 1. Therefore, k − 1 is at
least 0, and since k is the smallest value for which P fails, P(k − 1) is true. This means that
5ᵏ⁻¹ − 2ᵏ⁻¹ is a multiple of 3, say 3j. Then, however,

5ᵏ − 2ᵏ = 5 * 5ᵏ⁻¹ − 2 * 2ᵏ⁻¹ = 3 * 5ᵏ⁻¹ + 2 * (5ᵏ⁻¹ − 2ᵏ⁻¹) = 3 * 5ᵏ⁻¹ + 2 * 3j

This expression is divisible by 3. We have derived a contradiction, which allows us to conclude
that our original assumption is false. Therefore, P(n) is true for every n ≥ 0.
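As usual, the statement can be spot-checked numerically alongside the proof. A short Python sketch (the bound 50 is arbitrary):

    # Spot-check Example 2.10: 5^n - 2^n is divisible by 3 for each tested n.
    for n in range(50):
        assert (5 ** n - 2 ** n) % 3 == 0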

You can probably see the similarity between this proof and one that uses the
principle of mathematical induction. Although an induction proof has the advantage
that it does not involve proof by contradiction, both approaches are equally valid.
Not every statement involving an integer n is appropriate for mathematical induction.
Using this technique on the statement

(2a)ⁿ = 2ⁿaⁿ
would be silly because the proof of the induction step would not require the induction
hypothesis at all. The formula for n = k + 1, or for any other value, can be obtained
immediately by expanding the left side of the formula and using laws of exponents.
The proof would not be a real induction proof, and it would be misleading to classify
it as one.
A general rule of thumb is that if you are tempted to use a phrase like “Repeat this
process for each n,” or “Since this pattern continues indefinitely” in a proof, there
is a good chance that the proof can be made more precise by using mathematical
induction. When you encounter one of these phrases while reading a proof, it is very
likely a substitute for an induction argument. In this case, supplying the details of the
induction may help you to understand the proof better.

2.3 | THE STRONG PRINCIPLE OF MATHEMATICAL INDUCTION
Sometimes, as in our first example, a proof by mathematical induction is called for,
but the induction principle in Section 2.2 is not the most convenient tool.

EXAMPLE 2.11 | Integers Bigger Than 1 Have Prime Factorizations
Recall that a prime is a positive integer, 2 or bigger, that has no positive integer divisors except
itself and 1. Part of the fundamental theorem of arithmetic is that every integer greater than 1
can be factored into primes. More precisely, let P(n) be the statement that n is either prime or
the product of two or more primes; we will try to prove that P(n) is true for every n ≥ 2.
The basis step does not present any problems. P(2) is true, since 2 is a prime. If we
proceed as usual, then we take as the induction hypothesis the statement that k ≥ 2 and k is
either prime or the product of two or more primes. We would like to show that k + 1 is either
prime or the product of primes. If k + 1 happens to be prime, there is nothing left to prove.
Otherwise, by the definition of prime, k + 1 has some positive integer divisor other than itself
and 1. This means k + 1 = r * s for some positive integers r and s, neither of which is 1 or
k + 1. It follows that r and s must both be greater than 1 and less than k + 1.
In order to finish the induction step, we would like to show that r and s are both either
primes or products of primes; it would then follow, since k + 1 is the product of r and s, that
k + 1 is a product of two or more primes. Unfortunately, the only information our induction
hypothesis gives us is that k is a prime or a product of primes, and this tells us nothing about
r or s.
Consider, however, the following intuitive argument, in which we set about verifying the
statement P(n) one value of n at a time:

2 is a prime.
3 is a prime.
4 = 2 * 2, which is a product of primes since P(2) is known to be true.
5 is a prime.
6 = 2 * 3, which is a product of primes since P(2) and P(3) are known to be true.
7 is a prime.
8 = 2 * 4, which is a product of primes since P(2) and P(4) are known to be true.
9 = 3 * 3, which is a product of primes since P(3) is known to be true.
10 = 2 * 5, which is a product of primes since P(2) and P(5) are known to be true.
11 is a prime.
12 = 2 * 6, which is a product of primes since P(2) and P(6) are known to be true.

This seems as convincing as the intuitive argument given at the start of Example 2.7. Further-
more, we can describe explicitly the pattern illustrated by the first 11 steps: For each k ≥ 2,
either k + 1 is prime or it is the product of two numbers r and s for which the proposition P
has already been shown to hold.
The difference between the pattern appearing here and the one we saw in Example 2.7
is this: At each step in the earlier example we were able to obtain the truth of P(k + 1) by
knowing that P(k) was true, and here we need to know that P holds, not only for k but also
for all the values up to k. The following modified version of the induction principle will allow
our proof to proceed.

The Strong Principle of Mathematical Induction

Suppose P(n) is a statement involving an integer n. In order to show that P(n) is
true for every n ≥ n₀, it is sufficient to show these two things:
1. P(n₀) is true.
2. For any k ≥ n₀, if P(n) is true for every n satisfying n₀ ≤ n ≤ k, then
P(k + 1) is true.
To use this principle in a proof, we follow the same steps as before except for the
way we state the induction hypothesis. The statement here is that k is some integer
≥ n₀ and that all the statements P(n₀), P(n₀ + 1), ..., P(k) are true. With this
change, we can finish the proof we began earlier.

■ Example 2.11. Proof by induction

To show: P(n) is true for every n ≥ 2, where P(n) is the statement: n is either a prime or a
product of two or more primes.

Basis step. P(2) is the statement that 2 is either a prime or a product of two or more
primes. This is true because 2 is a prime.
Induction hypothesis. k ≥ 2, and for every n with 2 ≤ n ≤ k, n is either prime or a
product of two or more primes.
Statement to be shown in induction step. k + 1 is either prime or a product of two or
more primes.
Proof of induction step. We consider two cases. If k + 1 is prime, the statement
P(k + 1) is true. Otherwise, by definition of a prime, k + 1 = r * s, for some positive
integers r and s, neither of which is 1 or k + 1. It follows that 2 ≤ r ≤ k and 2 ≤ s ≤ k.
Therefore, by the induction hypothesis, both r and s are either prime or the product of
two or more primes. Therefore, their product k + 1 is the product of two or more primes,
and P(k + 1) is true.
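The structure of this strong-induction proof translates directly into a recursive algorithm: either n is prime, or we split n as r * s and recurse on both factors, just as the induction hypothesis is applied to both r and s. A Python sketch (our own, using naive trial division):

    def prime_factors(n):
        # Mirrors the strong-induction proof of Example 2.11: either n is
        # prime, or n = r * s with 2 <= r, s <= k, and we recurse on both
        # factors (the induction hypothesis covers all smaller cases).
        assert n >= 2
        for r in range(2, n):
            if n % r == 0:
                return prime_factors(r) + prime_factors(n // r)
        return [n]   # no proper divisor: n itself is prime

    assert sorted(prime_factors(360)) == [2, 2, 2, 3, 3, 5]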

The strong principle of induction is also referred to as the principle of complete


induction, or course-of-values induction. The first example suggests that it is as plau-
sible intuitively as the ordinary induction principle, and in fact the two are equivalent.
As to whether they are true, the answer may seem a little surprising. Neither can be
proved using other standard properties of the natural numbers. (Neither can be dis-
proved, either!) This means, in effect, that in order to use the induction principle, we
must adopt it as an axiom. A well-known set of axioms for the natural numbers, the
Peano axioms, includes one similar to the induction principle.
Twice in Section 2.2 we had occasion to use the well-ordering principle for the
natural numbers, which says that every nonempty subset of N has a smallest element.
As obvious as this statement probably seems, it is also impossible to prove without
using induction or something comparable. In the next example, we show that it
follows from the strong principle of induction. (It can be shown to be equivalent.)

EXAMPLE 2.12 | The Well-Ordering Principle for the Natural Numbers
To prove: Every nonempty subset of N, the set of natural numbers, has a smallest element.
(What we are actually proving is that if the strong principle of mathematical induction is true,
then every nonempty subset of N has a smallest element.)
First we need to find a way to express the result in the form “For every n ≥ n₀, P(n).”
Every nonempty subset A of N contains a natural number, say n. If every subset of N
containing n has a smallest element, then A does. With this in mind, we let P(n) be the
statement “Every subset of N containing n has a smallest element.” We prove that P(n) is
true for every n ≥ 0. (See Exercise 2.7.)

Basis step. P(0) is the statement that every subset of N containing 0 has a smallest
element. This is true because 0 is the smallest natural number and therefore the smallest
element of the subset.
Induction hypothesis. k ≥ 0, and for every n with 0 ≤ n ≤ k, every subset of N
containing n has a smallest element. (Put more simply, k ≥ 0 and every subset of N
containing an integer less than or equal to k has a smallest element.)
Statement to be shown in induction step. Every subset of N containing k + 1 has a
smallest element.
Proof of induction step. Let A be any subset of N containing k + 1. We consider two
cases. If A contains no natural number less than k + 1, then k + 1 is the smallest
element of A. Otherwise, A contains some natural number n with n ≤ k. In this case, by
the induction hypothesis, A contains a smallest element.

The strong principle of mathematical induction is more appropriate here, since


when we come up with an n to which we want to apply the induction hypothesis, all
we know about n is that n < k. We do not know that n = k. It may not be obvious at
the beginning of an induction proof whether the strong induction principle is required
or whether you can get by with the original version. You can avoid worrying about
this by always using the strong version. It allows you to adopt a stronger induction
hypothesis, and so if an induction proof is possible at all, it will certainly be possible
with the strong version. In any case, you can put off the decision until you reach the
point where you have to prove P(k + 1). If you can do this with only the assumption
that P(k) is true, then the original principle of induction is sufficient. If you need
information about earlier values of n as well, the strong version is needed.
We will see more examples of how the strong principle of mathematical induction
is applied once we have discussed recursive definitions and the close relationship
between them and mathematical induction.

2.4 | RECURSIVE DEFINITIONS

2.4.1 Recursive Definitions of Functions with Domain N
The chances are that in a programming course you have seen a translation into some
high-level programming language of the following definition:

n! = 1               if n = 0
n! = n * (n − 1)!    if n > 0
This is one of the simplest examples of a recursive, or inductive, definition. It defines
the factorial function on the set of natural numbers, first by defining the value at 0,
and then by defining the value at any larger natural number in terms of its value at the
previous one. There is an obvious analogy here to the basis step and the induction step
in a proof by mathematical induction. The intuitive reason this is a valid definition is
the same as the intuitive reason the principle of induction should be true: If we think
of defining n! for all the values of n in order, beginning with n = 0, then for any
k ≥ 0, eventually we will have defined the value of k!, and at that point the definition
will tell us how to obtain (k + 1)!.
In this section, we will look at a number of examples of recursive definitions of
functions and examine more closely the relationship between recursive definitions
and proofs by induction. We begin with more examples of functions on the set of
natural numbers.
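Translated into Python (any language with recursion would do), the definition becomes a two-branch function whose branches are precisely the basis clause and the recursive clause:

    def factorial(n):
        # Direct transcription of the recursive definition: the first branch
        # is the basis (0! = 1), the second defines n! in terms of (n - 1)!.
        if n == 0:
            return 1
        return n * factorial(n - 1)

    assert [factorial(n) for n in range(6)] == [1, 1, 2, 6, 24, 120]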

EXAMPLE 2.13 | The Fibonacci Function

The Fibonacci function f is usually defined as follows:

f(0) = 1
f(1) = 1
for every n ≥ 1, f(n + 1) = f(n) + f(n − 1)

To evaluate f(4), for example, we can use the definition in either a top-down fashion:

f(4) = f(3) + f(2)
     = (f(2) + f(1)) + f(2)
     = ((f(1) + f(0)) + f(1)) + (f(1) + f(0))
     = ((1 + 1) + 1) + (1 + 1)
     = 5

or a bottom-up fashion:

f(0) = 1
f(1) = 1
f(2) = f(1) + f(0) = 1 + 1 = 2
f(3) = f(2) + f(1) = 2 + 1 = 3
f(4) = f(3) + f(2) = 3 + 2 = 5
It is possible to give a nonrecursive algebraic formula for the number f(n); see Exercise 2.53.
However, the recursive definition is the one that most people remember and prefer to use.
If the definition of the factorial function is reminiscent of the principle of mathematical
induction, then the definition of the Fibonacci function suggests the strong principle of induc-
tion. This is because the definition of f(n + 1) involves not only f(n) but f(n − 1). This
observation is useful in proving facts about f. For example, let us prove that

for every n ≥ 0, f(n) ≤ (5/3)ⁿ

Basis step. We must show that f(0) ≤ (5/3)⁰; this is true, since f(0) and (5/3)⁰ are
both 1.
Induction hypothesis. k ≥ 0, and for every n with 0 ≤ n ≤ k, f(n) ≤ (5/3)ⁿ.
Statement to show in induction step. f(k + 1) ≤ (5/3)ᵏ⁺¹.
Proof of induction step. Because of the way f is defined, we consider two cases in
order to make sure that our proof is valid for every value of k ≥ 0. If k = 0, then
f(k + 1) = f(1) = 1, by definition, and in this case the inequality is clearly true. If
k > 0, then we must use the recursive part of the definition, which is
f(k + 1) = f(k) + f(k − 1). Since both k and k − 1 are ≤ k, we may apply the
induction hypothesis to both terms, obtaining

f(k + 1) = f(k) + f(k − 1)
         ≤ (5/3)ᵏ + (5/3)ᵏ⁻¹
         = (5/3)ᵏ⁻¹(5/3 + 1)
         = (5/3)ᵏ⁻¹(8/3)
         = (5/3)ᵏ⁻¹(24/9)
         < (5/3)ᵏ⁻¹(25/9) = (5/3)ᵏ⁺¹
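The recursive definition transcribes directly into code, and the bound just proved can be spot-checked alongside it. A Python sketch (the memoization via lru_cache is our addition, to keep the top-down evaluation cheap):

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def fib(n):
        # f(0) = f(1) = 1; for n >= 1, f(n + 1) = f(n) + f(n - 1).
        return 1 if n < 2 else fib(n - 1) + fib(n - 2)

    assert [fib(n) for n in range(5)] == [1, 1, 2, 3, 5]
    # Spot-check the bound proved by strong induction: f(n) <= (5/3)^n.
    assert all(fib(n) <= (5 / 3) ** n for n in range(60))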

EXAMPLE 2.14 | The Union of n Sets

Suppose A₁, A₂, ... are subsets of some universal set U. For each n ≥ 0, we may define
⋃_{i=1}^{n} A_i as follows:

⋃_{i=1}^{0} A_i = ∅

for every n ≥ 0, ⋃_{i=1}^{n+1} A_i = (⋃_{i=1}^{n} A_i) ∪ A_{n+1}

For each n, ⋃_{i=1}^{n} A_i is a set, in particular a subset of U; therefore, it makes sense to view the
recursive definition as defining a function from the set of natural numbers to the set of subsets
of U. What we have effectively done in this definition is to extend the binary operation of
union so that it can be applied to n operands. (We discussed this possibility for associative
operations like union near the end of Section 1.1.) This procedure is familiar to you in other
settings, although you may not have encountered a formal recursive definition like this one
before. When you add n numbers in an expression like ∑_{i=1}^{n} a_i, for example, you are extending
the binary operation of addition to n operands. The notational device used to do this is the
summation sign: ∑ bears the same relation to the binary operation + that ⋃ does to the binary
operation ∪. A recursive definition of ∑ would follow the one above in every detail:

∑_{i=1}^{0} a_i = 0

for every n ≥ 0, ∑_{i=1}^{n+1} a_i = (∑_{i=1}^{n} a_i) + a_{n+1}
Similarly, we could define the intersection of n sets, the product of n numbers, the concatenation
of n strings, the concatenation of n languages, and so forth. (The only difficulty we would have
with the last two is that we have not introduced a notation for the concatenation operator—we
have written the concatenation of x and y as simply xy.) We are also free to use one of these
general definitions in a special case. For example, we might consider the concatenation of n
languages, all of which are the same language L:

L⁰ = {Λ}
for every n ≥ 0, Lⁿ⁺¹ = LⁿL

Of course, we already have nonrecursive definitions of many of these things. Our defini-
tion of ⋃_{i=1}^{n} A_i in Section 1.1 is

⋃_{i=1}^{n} A_i = {x | x ∈ A_i for at least one i with 1 ≤ i ≤ n}

The definition of Lⁿ in Section 1.5 is

Lⁿ = LL···L   (n factors in all)

and we also described it as the set of all strings that can be obtained by concatenating n
elements of L. It may not be obvious that we have gained anything by introducing the recursive
definition. The nonrecursive definition of the n-fold union is clear enough, and even with the
ellipses (...) it is not difficult to determine which strings are included in Lⁿ. However, the
recursive definition has the same advantage over the ellipses that we discussed in the first proof
in Example 2.7; rather than suggesting what the general step is in this n-fold concatenation, it
comes right out and says it. After all, when you construct an element of Lⁿ, you concatenate
two strings, not n strings at once. The recursive definition is more consistent with the binary
nature of concatenation, and more explicit about how the concatenation is done: An (n + 1)-
fold concatenation is obtained by concatenating an element of Lⁿ and an element of L. The
recursive definition has a dynamic, or algorithmic, quality that the other one lacks. Finally,
and probably most important, the recursive definition has a practical advantage. It gives us a
natural way of constructing proofs using mathematical induction.
Suppose we want to prove the generalized De Morgan law:

    for every n ≥ 0, (⋃ᵢ₌₁ⁿ Aᵢ)′ = ⋂ᵢ₌₁ⁿ Aᵢ′

In the induction step, we must show something about

    (⋃ᵢ₌₁ᵏ⁺¹ Aᵢ)′

Using the recursive definition, we begin by replacing this with

    ((⋃ᵢ₌₁ᵏ Aᵢ) ∪ Aₖ₊₁)′

at which point we can use the original De Morgan law to complete the proof, since we have
expressed the (k + 1)-fold union as a two-fold union.

This example illustrates again the close relationship between the principle of
mathematical induction and the idea of recursive definitions. Not only are the two
ideas similar in principle, they are almost inseparable in practice. Recursive defini-
tions are useful in constructing induction proofs, and induction is the natural proof
technique to use on objects defined recursively.
The relationship is so close, in fact, that in induction proofs we might use recursive
definitions without realizing it. In Example 2.7, we proved that

    1 + 2 + ⋯ + n = n(n + 1)/2
and the crucial observation in the induction step was that

    1 + 2 + ⋯ + (k + 1) = (1 + 2 + ⋯ + k) + (k + 1)
This is exactly the formula we would have obtained from the recursive definition of
the summation operator ∑ in Example 2.14. In other words, although we had not
formally adopted this definition at the time, the property of summation that we needed
for the induction argument was the one the definition provides.

2.4.2 Recursive Definitions of Sets


We can also define a single set recursively. Although such a definition may not involve
an integer n explicitly, the principle is similar. We specify certain objects that are in
the set to start with, and we describe one or more general methods for obtaining new
elements of the set from existing ones.

EXAMPLE 2.15   Recursive Definition of L*


Suppose L is a language over some alphabet Σ. We have previously defined L* as the union
of the sets Lⁿ for n ≥ 0. From our recursive definition of Lⁿ it follows that for any n, any
x ∈ Lⁿ, and any y ∈ L, the string xy is an element of Lⁿ⁺¹ and therefore of L*. Furthermore,
every element of L* can be obtained this way except Λ, which comes from L⁰. This suggests
the following more direct recursive definition of L*.

1. Λ ∈ L*.
2. For any x ∈ L* and any y ∈ L, xy ∈ L*.
3. No string is in L* unless it can be obtained by using rules 1 and 2.

To illustrate, let L = {a, ab}. According to rule 1, Λ ∈ L*. One application of rule 2 adds the
strings in L¹ = L, which are Λa = a and Λab = ab. Another application of rule 2 adds the
strings in L², which are aa, aab, aba, and abab. For any k ≥ 0, a string obtained
by concatenating k elements of L can be produced from the definition by using k applications
of rule 2.
An even simpler illustration is to let L be Σ, which is itself a language over Σ. Then a
string of length k in Σ*, which is a concatenation of k symbols belonging to Σ, can be produced
by k applications of rule 2.
This way of defining L* recursively is not the only way. Here is another possibility.

1. Λ ∈ L*.
2. For any x ∈ L, x ∈ L*.
3. For any two elements x and y of L*, xy ∈ L*.
4. No string is in L* unless it can be obtained by using rules 1, 2, and 3.

In this approach, rules 1 and 2 are both necessary in the basis part of the definition, since rule
1 by itself would provide no strings other than Λ to concatenate in the recursive part.
The first definition is a little closer to our original definition of L*, and perhaps a little
easier to work with, because there is a direct correspondence between the applications of rule
2 needed to generate an element of L* and the strings of L that are being concatenated. The
second definition allows more flexibility as to how to produce a string in the language. There
is a sense, however, in which both definitions capture the idea of all possible concatenations
of strings of L, and for this reason it may not be too difficult to convince yourself that both
definitions really do work—that the set being defined is L* in each case. Exercise 2.63 asks
you to consider the question in more detail.

EXAMPLE 2.16   Palindromes


Let Σ be any alphabet. The language pal of palindromes over Σ can be defined as follows:

1. Λ ∈ pal.
2. For any a ∈ Σ, a ∈ pal.
3. For any x ∈ pal and any a ∈ Σ, axa ∈ pal.
4. No string is in pal unless it can be obtained by using rules 1, 2, and 3.

The strings that can be obtained by using rules 1 and 3 exclusively are the elements of
pal of even length, and those obtained by using rules 2 and 3 are of odd length. A simple
nonrecursive definition of pal is that it is the set of strings that read the same backwards as
forwards. (See Exercise 2.60.)
In Example 2.14, we mentioned the algorithmic, or constructive, nature of the recursive
definition of Lⁿ. In the case of a language such as pal, this aspect of the recursive definition
can be useful both from the standpoint of generating elements of the language and from the
standpoint of recognizing elements of the language. The definition says, on the one hand, that
we can construct palindromes by starting with either A or a single symbol, and continuing to
concatenate a symbol onto both ends of the current string. On the other hand, it says that if we
wish to test a string to see if it’s a palindrome, we may first compare the leftmost and rightmost
symbols, and if they are equal, reduce the problem to testing the remaining substring.
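The recognition procedure described in the last sentence is easy to write down. Here is a brief sketch of mine (not from the text) that runs rule 3 backwards:

    # membership test for pal, following the recursive definition
    # (a sketch, not from the text)
    def is_pal(x):
        if len(x) <= 1:                  # rules 1 and 2: Λ and single symbols
            return True
        # rule 3 in reverse: x must be axa with the inner part in pal
        return x[0] == x[-1] and is_pal(x[1:-1])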

EXAMPLE 2.17   Fully Parenthesized Algebraic Expressions


Let Σ be the alphabet {i, (, ), +, −}. Below is a recursive definition of the language AE of fully
parenthesized algebraic expressions involving the binary operators + and − and the identifier
i. The term fully parenthesized means exactly one pair of parentheses for every operator.

1. i ∈ AE.
2. For any x, y ∈ AE, both (x + y) and (x − y) are elements of AE.
3. No string is in AE unless it can be obtained by using rules 1 and 2.

Some of the strings in AE are i, (i + i), (i − i), ((i + i) − i), and ((i − (i − i)) + i).
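Rule 2 also yields a simple, if inefficient, recognition procedure: a string other than i must consist of an outer pair of parentheses around two smaller expressions joined by an operator. The following is a sketch of mine, not the book's, using the ASCII characters + and - for the two operators:

    # membership test for AE: try every way of splitting the inside at an
    # operator (a sketch, not from the text; exponential in the worst case)
    def in_AE(x):
        if x == "i":
            return True                                   # rule 1
        if len(x) < 5 or x[0] != "(" or x[-1] != ")":
            return False
        inner = x[1:-1]
        return any(inner[k] in "+-"
                   and in_AE(inner[:k]) and in_AE(inner[k + 1:])
                   for k in range(1, len(inner) - 1))     # rule 2 in reverse

    # in_AE("((i+i)-i)") is True; in_AE(")(") is False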

EXAMPLE 2.18   Finite Subsets of the Natural Numbers

We define a set F of subsets of the natural numbers as follows:

1. ∅ ∈ F.
2. For any n ∈ N, {n} ∈ F.
3. For any A and B in F, A ∪ B ∈ F.
4. Nothing is in F unless it can be obtained by using rules 1, 2, and 3.

We can obtain any two-element subset of N by starting with two one-element sets and using
rule 3. Because we can then apply rule 3 to any two-element set A and any one-element set B,
we obtain all the three-element subsets of N. It is easy to show using mathematical induction
that for any natural number n, any n-element subset of N is an element of F; we may conclude
that F is the collection of all finite subsets of N.

Let us consider the last statement in each of the recursive definitions in this sec-
tion. In each case, the previous statements describe ways of producing new elements
of the set being defined. The last statement is intended to remove any ambiguity:
Unless an object can be shown using these previous rules to belong to the set, it does
not belong to the set.
We might choose to be even a little more explicit. In Example 2.17, we might
say, “No string is in AE unless it can be obtained by a finite number of applications
of rules 1 and 2.” Here this extra precision hardly seems necessary, as long as it is
understood that “string” means something of finite length; in Example 2.18 it is easier
to see that it might be appropriate. One might ask whether there are any infinite sets
in F. We would hope that the answer is “no” because we have already agreed that the
definition is a reasonable way to define the collection of finite subsets of N. On the
one hand, we could argue that the only way rule 3 could produce an infinite set would
be for one of the sets A or B to be infinite already. On the other hand, think about
using the definition to show that an infinite set C is not in F. This means showing that
C cannot be obtained by using rule 3. For an infinite set C to be obtained this way,
either A or B (both of which must be elements of F) would have to be infinite—but
how do we know that cannot happen? The definition is not really precise unless it
makes it clear that rules 1 to 3 can be used only a finite number of times in obtaining an
element of F. Remember also that we think of a recursive definition as a constructive,
or algorithmic, definition, and that in any actual construction we would be able to
apply any of the rules in the definition only a finite number of times.
Let us describe even more carefully the steps that would be involved in “a finite
number of applications” of rules 1 to 3 in Example 2.18. Take a finite subset A of N
that we might want to obtain from the definition of F, say A = {2, 3, 7, 11, 14}. There
are a number of ways we can use the definition to show that A ∈ F. One obvious
approach is to start with {2} and use rule 3 four times, adding one more element
each time, so that the four steps give us the subsets {2, 3}, {2, 3, 7}, {2, 3, 7, 11}, and
{2, 3, 7, 11, 14}. Each time, the new subset is obtained by applying rule 3 to two
sets, one a set we had obtained earlier by using rule 3, the other a one-element set
(one of the elements of F specified explicitly by rule 2). Another approach would
be to start with two one-element sets, say {2} and {7}, to add elements one at a time
to each so as to obtain the sets {2, 3} and {7, 11, 14}, and then to use rule 3 once
more to obtain their union. In both of these approaches, we can write a sequence
of sets representing the preliminary steps we take to obtain the one we want. We
might include in our sequence all the one-element sets we use, in addition to the two-,
three-, or four-element subsets we obtain along the way. In the first case, therefore,
the sequence might look like this:
    1. {2}    2. {3}    3. {2, 3}    4. {7}    5. {2, 3, 7}
    6. {11}   7. {2, 3, 7, 11}    8. {14}    9. {2, 3, 7, 11, 14}
and in the second case the sequence might be
    1. {2}    2. {3}    3. {2, 3}    4. {7}    5. {11}
    6. {7, 11}    7. {14}    8. {7, 11, 14}    9. {2, 3, 7, 11, 14}
In both cases, there is considerable flexibility as to the order of the terms. The
significant feature of both sequences is that every term is either one of the specific
sets mentioned in statements 1 and 2 of the definition, or it is obtained from two terms
appearing earlier in the sequence by using statement 3.
A precise way of expressing statement 4 in the definition would therefore be to
say:

    No set A is in F unless there is a positive integer n and a sequence A₁, A₂, ..., Aₙ, so
    that Aₙ = A, and for every i with 1 ≤ i ≤ n, Aᵢ is either ∅, or a one-element set, or
    Aⱼ ∪ Aₖ for some j, k < i.
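This precise form can be checked mechanically. Here is a small sketch of mine (not from the text) that verifies whether a given sequence of Python sets has the required shape:

    # every term must be the empty set, a one-element set, or the union of
    # two earlier terms (a sketch, not from the text)
    def valid_sequence(seq):
        for i, A in enumerate(seq):
            if len(A) <= 1:
                continue
            if not any(A == seq[j] | seq[k]
                       for j in range(i) for k in range(i)):
                return False
        return True

    # valid_sequence([{2}, {3}, {2, 3}, {7}, {2, 3, 7}]) is True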

A recursive definition of a set like F is not usually this explicit. Probably the
most common approach is to say something like our original statement 4; a less formal
approach would be to say “Nothing else is in F,” and sometimes the final statement
is skipped altogether.

EXAMPLE 2.19   An Induction Proof Involving a Language Defined Recursively


Suppose that the language L, a subset of {0, 1}*, is defined recursively as follows.

1. Λ ∈ L.
2. For any y ∈ L, both 0y and 0y1 are in L.
3. No string is in L unless it can be obtained from rules 1 and 2.

In order to determine what the strings in L are, we might try to use the definition to generate
the first few elements. We know Λ ∈ L. From rule 2 it follows that 0 and 01 are both in L.
Using rule 2 again, we obtain 00, 001, and 0011; one more application produces 000, 0001,
00011, and 000111. After studying these strings, we may be able to guess that the strings in L
are all those of the form 0ⁱ1ʲ, where i ≥ j ≥ 0. Let us prove that every string of this form is
in L.

To simplify the notation, let A = {0ⁱ1ʲ | i ≥ j ≥ 0}. The statement A ⊆ L, as it stands,
does not involve an integer. Just as we did in Example 2.8, however, we can introduce the
length of a string as an integer on which to base an induction proof.

To prove: A ⊆ L; i.e., for every n ≥ 0, every x ∈ A satisfying |x| = n is an element of L.

Basis step. We must show that every x in A with |x| = 0 is an element of L. This is true
because if |x| = 0, then x = Λ, and statement 1 in the definition of L tells us that Λ ∈ L.

Induction hypothesis. k ≥ 0, and every x in A with |x| = k is an element of L.

Statement to show in induction step. Every x in A with |x| = k + 1 is an element of L.

Proof of induction step. Suppose x ∈ A, and |x| = k + 1. Then x = 0ⁱ1ʲ, where
i ≥ j ≥ 0, and i + j = k + 1. We are trying to show that x ∈ L. According to the
definition, the only ways this can happen are for x to be Λ, for x to be 0y for some
y ∈ L, and for x to be 0y1 for some y ∈ L. The first case is impossible, since
|x| = k + 1 > 0. In either of the other cases, we can obtain the conclusion we want if
we know that the string y is in L. Presumably we will show this by using the induction
hypothesis. However, at this point we have a slight problem. If x = 0y1, then the string
y to which we want to apply the induction hypothesis is not of length k, but of length
k − 1. Fortunately, the solution to the problem is easy: Use the strong induction
principle instead, which allows us to use the stronger induction hypothesis. The
statement to be shown in the induction step remains the same.

Induction hypothesis (revised). k ≥ 0, and every x in A with |x| ≤ k is an element of L.

Proof of induction step (corrected version). Suppose x ∈ A and |x| = k + 1. Then
x = 0ⁱ1ʲ, where i ≥ j ≥ 0 and i + j = k + 1. We consider two cases. If i > j, then we can
write x = 0y for some y that still has at least as many 0’s as 1’s (i.e., for some
y ∈ A). In this case, since |y| = k, it follows from the induction hypothesis that y ∈ L;
therefore, since x = 0y, it follows from the first part of statement 2 in the
definition of L that x is also an element of L. In the second case, when i = j, we
know that there must be at least one 0 and one 1 in the string. Therefore, x = 0y1 for
some y. Furthermore, y ∈ A, because y = 0ⁱ⁻¹1ʲ⁻¹ and i = j. Since |y| < k, it follows
from the induction hypothesis that y ∈ L. Since x = 0y1, we can use the second part of
statement 2 in the definition of L to conclude that x ∈ L.

The two sets L and A are actually equal. We have now proved half of this
statement. The other half, the statement L C A, can also be proved by induction
on the length of the string (see Exercise 2.41). In the next section, however, we
consider another approach to an induction proof, which is more naturally related to
the recursive definition of L and is probably easier.
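Before moving on, note that the recursive definition of L also gives a recognition procedure: run rule 2 backwards, peeling a leading 0 (and possibly a trailing 1) off the string. A small sketch of mine, not the book's:

    # membership test for L, following the recursive definition backwards
    # (a sketch, not from the text)
    def in_L(x):
        if x == "":
            return True                    # rule 1
        if not x.startswith("0"):
            return False                   # every rule-2 string begins with 0
        if x.endswith("1") and in_L(x[1:-1]):
            return True                    # x = 0y1 for some y in L
        return in_L(x[1:])                 # x = 0y for some y in L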

2.5 | STRUCTURAL INDUCTION
We have already noticed the very close correspondence between recursive definitions
of functions on the set N and proofs by mathematical induction of properties of
those functions. The correspondence is exact, in the sense that when we want to
prove something about f(k + 1) in the induction step of a proof, we can consult the
definition of f(k + 1), the recursive part of the definition.
When we began to formulate recursive definitions of sets, there was usually no
integer involved explicitly (see Examples 2.15-2.19), and it may have appeared that
any correspondence between the recursive definition and induction proofs involving
elements of the set would be less direct (even though there is a general similarity
between the recursive definition and the induction principle). In this section, we
consider the correspondence more carefully. We can start by identifying an integer
that arises naturally from the recursive definition, which can be introduced for the
purpose of an induction proof. However, when we look more closely at the proof,
we will be able to dispense with the integer altogether, and what remains will be an
induction proof based on the structure of the definition itself.
The example we use to illustrate this principle is essentially a continuation of
Example 2.19.

EXAMPLE 2.20   Continuation of Example 2.19


We have the language L, defined recursively as follows:

1. Λ ∈ L.
2. For every x ∈ L, both 0x and 0x1 are in L.

The third statement in the definition of L says that every element of L is obtained by starting
with Λ and applying statement 2 a finite number of times (zero or more).
We also have the language A = {0ⁱ1ʲ | i ≥ j ≥ 0}. In Example 2.19, we proved that
A ⊆ L, using mathematical induction on the length of the string. Now we wish to prove the
opposite inclusion L ⊆ A. As we have already pointed out, there is no reason why using the
length of the string would not work in an induction proof in this direction too—a string in L
has a length, just as every other string does. However, for each element x of L, there is another
integer that is associated with x not just because x is a string, but specifically because x is an
element of L. This is the number of times rule 2 is used in order to obtain x from the definition.
We construct an induction proof, based on this number, that L ⊆ A.
To prove: For every n ≥ 0, every x ∈ L obtained by n applications of rule 2 is an element
of A.

Basis step. We must show that if x ∈ L and x is obtained without using rule 2 at all,
then x ∈ A. The only possibility for x in this case is Λ, and Λ ∈ A because Λ = 0⁰1⁰.

Induction hypothesis. k ≥ 0, and every string in L that can be obtained by k
applications of rule 2 is an element of A.

Statement to show in induction step. Any string in L that can be obtained by k + 1
applications of rule 2 is in A.

Proof of induction step. Let x be an element of L that is obtained by k + 1 applications
of rule 2. This means that either x = 0y or x = 0y1, where in either case y is a string in
L that can be obtained by using the rule k times. By the induction hypothesis, y ∈ A, so
that y = 0ⁱ1ʲ, with i ≥ j ≥ 0. Therefore, either x = 0ⁱ⁺¹1ʲ or x = 0ⁱ⁺¹1ʲ⁺¹, and in
either case x ∈ A.

As simple as this proof is, we can make it even simpler. There is a sense in which the
integers k and k + 1 are extraneous. In the induction step, we wish to show that the new element
x of L, the one obtained by applying rule 2 of the definition to some other element y of L, is
an element of A. It is true that y is obtained by applying the rule k times, and therefore x is
obtained by applying the rule k + 1 times. The only fact we need in the induction step, however,
is that y ∈ A, and for any element y of L that is in A, both 0y and 0y1 are also elements of
A. Once we verify this, k and k + 1 are needed only to make the proof fit the framework of a
standard induction proof.
Why not just leave them out? We still have a basis step, except that instead of thinking
of Λ as the string obtained from zero applications of rule 2, we just think of it as the string in
rule 1, the basis part of the definition of L. In the recursive part of the definition, we apply one
of two possible operations to an element y of L. What we will need to know about the string
y in the induction step is that it is in A, and therefore it is appropriate to designate this as our
induction hypothesis. The induction step is simply to show that for this string y, 0y and 0y1
(the two strings obtained from y by applying the operations in the definition) are both in A.
We can call this version of mathematical induction structural induction. Although there is
an underlying integer involved, just as in an ordinary induction proof, it is usually unnecessary
to mention it explicitly. Instead, the steps of the proof follow the structure of the recursive
definition directly. Below is our modified proof for this example.
To prove: L ⊆ A.

Basis step. We must show that Λ ∈ A. This is true, because Λ = 0⁰1⁰.

Induction hypothesis. The string y ∈ L is an element of A.

Statement to show in induction step. Both 0y and 0y1 are elements of A.

Proof of induction step. Since y ∈ A, y = 0ⁱ1ʲ, with i ≥ j ≥ 0. Therefore,
0y = 0ⁱ⁺¹1ʲ, and 0y1 = 0ⁱ⁺¹1ʲ⁺¹, and both strings are in A.

In the induction step, instead of talking about an arbitrary string x obtainable
by k + 1 applications of rule 2, the proof is more explicit in anticipating how it is
obtained: It will be either 0y or 0y1, where in either case y is a string obtainable by
k applications of rule 2. In the original proof, this property of y is needed in order to
be able to apply the induction hypothesis to y; in the structural induction proof, we
anticipate the property of y that follows from the induction hypothesis, and simply
take the induction hypothesis to be that y does satisfy this property.
Although we will not formulate an official Principle of Structural Induction,
the preceding example and the ones to follow should make it clear how to use this
technique. If we have a recursive definition of a set L, structural induction can be
used to show that every element of L has some property. In the previous example,
the “property” is that of belonging to the set A, and any property we are interested
in can be expressed this way if we wish (we can always replace the phrase “has the
property” by the phrase “belongs to the set of objects having the property”).

EXAMPLE 2.21   Another Property of Fully Parenthesized Algebraic Expressions


Let us return to the language AE defined in Example 2.17. AE, a subset of Σ* = {i, (, ), +, −}*,
is defined as follows:

1. i ∈ AE.
2. For any x and y in AE, (x + y) and (x − y) are in AE.
3. No other strings are in AE.

This time we try to show that every element of the set has the property of not containing the
substring )(. As in the previous example, we could set up an induction proof based on the
number of times rule 2 is used in obtaining a string from the definition. Notice that in this
approach we would want the strong principle of induction: If z is obtained by a total of k + 1
applications of rule 2, then either z = (x + y) or z = (x — y), where in either case both x
and y are obtained by k or fewer (not exactly k) applications of rule 2. However, in a proof by
structural induction, since the integer k is not present explicitly, these details are unnecessary.
What we really need to know about x and y is that they both have the desired property, and the
statement that they do is the appropriate induction hypothesis.
To prove: No string in AE contains the substring )(.

Basis step. The string i does not contain the substring )(. (This is obvious.)
Induction hypothesis. x and y are strings that do not contain the substring )(.
Statement to show in induction step. Neither (x + y) nor (x − y) contains the
substring )(.

Proof of induction step. In both the expressions (x + y) and (x − y), the symbol
preceding x is not ), the symbol following x is not (, the symbol preceding y is not ), and
the symbol following y is not (. Therefore, the only way )( could appear would be for it
to occur in x or y separately.
Note that for the sake of simplicity, we made the induction hypothesis weaker than
we really needed to (that is, we proved slightly more than was necessary). In order to
use structural induction, we must show that if x and y are any strings in AE not
containing )(, then neither (x + y) nor (x — y) contains )(. In our induction step, we
showed this not only for x and y in AE, but for any x and y. This simplification is often,
though not always, possible (see Exercise 2.69).

EXAMPLE 2.22   The Language of Strings with More a’s than b’s

Suppose that the language L ⊆ {a, b}* is defined as follows:

1. a ∈ L.
2. For any x ∈ L, ax ∈ L.
3. For any x and y in L, all the strings bxy, xby, and xyb are in L.
4. No other strings are in L.

Let us prove that every element of L has more a’s than b’s. Again we may use the structural
induction principle, and just as in the previous example, we can simplify the induction step by
proving something even stronger than we need. We will show that

1. a has more a’s than b’s.
2. For any x having more a’s than b’s, ax also does.
3. For any x and y having more a’s than b’s, each of the strings bxy, xby, and xyb also does.

If we were using ordinary induction on the number of applications of steps 2 or 3 in the
recursive definition of L, an appropriate induction hypothesis would be that any element of
L obtainable by k or fewer applications has more a’s than b’s. Since we can anticipate that
we will use this hypothesis either on a single string x to which we will apply step 2 or on two
strings x and y to which we will apply step 3, we formulate our induction hypothesis to take
care of either case.
To prove: Every element of L has more a’s than b’s.

Basis step. The string a has more a’s than b’s. (This is obvious.)
Induction hypothesis. x and y are strings containing more a’s than b’s.
Statement to show in induction step. Each of the strings ax, bxy, xby, and xyb has
more a’s than b’s.
Proof of induction step. The string ax clearly has more a’s than b’s, since x does.
Since both x and y have more a’s than b’s, xy has at least two more a’s than b’s, and
therefore any string formed by inserting one more b still has at least one more a than b.

In Exercise 2.64 you are asked to prove the converse, that every string in {a, b}*
having more a’s than b’s is in the language L.
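The rules also show how to generate elements of L mechanically. Here is a small sketch of mine (not from the text) that produces every string obtainable with at most k rule applications and spot-checks the property just proved:

    # strings of L obtainable by at most k applications of rules 2 and 3
    # (a sketch, not from the text)
    def generate_L(k):
        current = {"a"}                                      # rule 1
        for _ in range(k):
            new = {"a" + x for x in current}                 # rule 2
            new |= {s for x in current for y in current      # rule 3
                      for s in ("b" + x + y, x + "b" + y, x + y + "b")}
            current |= new
        return current

    # every generated string has more a's than b's:
    assert all(s.count("a") > s.count("b") for s in generate_L(2))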

EXAMPLE 2.23   The Transitive Closure of a Relation


For any relation R on a set S, the transitive closure of R (see Exercise 1.63) is the relation R′
on S defined as follows:

1. R ⊆ R′.
2. For any x, y, z ∈ S, if (x, y) ∈ R′ and (y, z) ∈ R′, then (x, z) ∈ R′.
3. No other pairs are in R′.

(It makes sense to summarize statements 1 to 3 by saying that R′ is the smallest transitive
relation containing R.) Let us show that if R₁ and R₂ are relations on S with R₁ ⊆ R₂, then
R₁′ ⊆ R₂′.
Structural induction is appropriate here since the statement we want to show says that
every pair in R₁′ satisfies the property of membership in R₂′.
The basis step is to show that every element of R₁ is an element of R₂′. This is true because
R₁ ⊆ R₂ ⊆ R₂′; the first inclusion is our assumption, and the second is just statement 1 in the
definition of R₂′. The induction hypothesis is that (x, y) and (y, z) are elements of R₁′ that are
in R₂′, and in the induction step we must show that (x, z) ∈ R₂′. Once again the argument is
simplified slightly by proving more than we need to. For any pairs (x, y) and (y, z) in R₂′,
whether they are in R₁′ or not, (x, z) ∈ R₂′, because this is exactly what statement 2 in the
definition of R₂′ says.
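Statements 1 to 3 also describe an algorithm: start with the pairs of R and keep applying statement 2 until no new pairs appear. A brief sketch of mine (not from the text), representing a relation as a Python set of ordered pairs:

    # compute the transitive closure R' by repeatedly applying statement 2
    # (a sketch, not from the text)
    def transitive_closure(R):
        closure = set(R)                           # statement 1
        while True:
            new = {(x, z) for (x, y1) in closure
                          for (y2, z) in closure if y1 == y2}
            if new <= closure:
                return closure                     # statement 2 adds nothing new
            closure |= new

    # transitive_closure({(1, 2), (2, 3)}) == {(1, 2), (2, 3), (1, 3)}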

A recursive definition of a set can also provide a useful way of defining a function
on the set. The function definition can follow the structure of the definition, in much
the same way that a proof using structural induction does. This idea is illustrated by
the next example.

EXAMPLE 2.24   Recursive Definitions of the Length and Reverse Functions


In Example 2.15 we saw a recursive definition of the set Σ*, for an alphabet Σ:

1. Λ ∈ Σ*.
2. For every x ∈ Σ* and every a ∈ Σ, xa ∈ Σ*.
3. No other elements are in Σ*.

Two useful functions on Σ* are the length function, for which we have already given a
nonrecursive definition, and the reverse function rev, which assigns to each string the string
obtained by reversing the order of the symbols. Let us give a recursive definition of each of
these. The length function can be defined as follows:

1. |Λ| = 0.
2. For any x ∈ Σ* and any a ∈ Σ, |xa| = |x| + 1.

For the reverse function we will often use the notation xʳ to stand for rev(x), the reverse of x.
Here is one way of defining the function recursively.

1. Λʳ = Λ.
2. For any x ∈ Σ* and any a ∈ Σ, (xa)ʳ = axʳ.

To convince ourselves that these are both valid definitions of functions on Σ*, we could
construct a proof using structural induction to show that every element of Σ* satisfies the
property of being assigned a unique value by the function.
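Both definitions transcribe directly into code. A minimal sketch of mine (not the book's), with Python strings standing in for elements of Σ*:

    # |Λ| = 0; |xa| = |x| + 1   (a sketch, not from the text)
    def length(x):
        return 0 if x == "" else length(x[:-1]) + 1

    # Λʳ = Λ; (xa)ʳ = a(xʳ)
    def rev(x):
        return "" if x == "" else x[-1] + rev(x[:-1])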
Let us now show, using structural induction, a useful property of the length function: For
every x and y in Σ*,

    |xy| = |x| + |y|

(The structural induction, like the recursive definition of the length function itself, follows the
recursive definition of Σ* above.) The formula seems obvious, of course, from the way we
normally think about the length function. The point is that we do not have to depend on our
intuitive understanding of length; the recursive definition is a practical one to use in discussing
properties of the function. For the proof we choose y as the string on which to base the induction
(see Exercise 2.34). That is, we interpret the statement as a statement about y—namely, that
for every x, |xy| = |x| + |y|. The basis step of the structural induction is to show that this
statement is true when y = Λ. It is, because for every x,

    |xΛ| = |x| = |x| + 0 = |x| + |Λ|



The induction hypothesis is that y is a string for which the statement holds, and in the induction
step we consider ya, for an arbitrary a ∈ Σ. We want to show that for any x, |x(ya)| =
|x| + |ya|:

    |x(ya)| = |(xy)a|          (because concatenation is associative)
            = |xy| + 1         (by the recursive definition of the length function)
            = (|x| + |y|) + 1  (by the induction hypothesis)
            = |x| + (|y| + 1)  (because addition is associative)
            = |x| + |ya|       (by the definition of the length function)

In this case, structural induction is not much different from ordinary induction on the length of
y, but a little simpler. The proof involves lengths of strings because we are proving a statement
about lengths of strings; at least we did not have to introduce them gratuitously in order to
provide a framework for an induction proof.

Finally, it is worth pointing out that the idea of structural induction is general
enough to include the ordinary induction principle as a special case. When we prove
that a statement P(n) is true for every n ≥ n₀, we are showing that every element of
the set S = {n | n ≥ n₀} satisfies the property P. The set S can be defined recursively
as follows:

1. n₀ ∈ S.
2. For every n ∈ S, n + 1 ∈ S.
3. No integer is in S unless it can be obtained from rules 1 and 2.

If we compare the induction step in an ordinary induction proof to that in a proof by
structural induction, we see that they are the same. In the first case, the induction
hypothesis is the statement that P(k) is true for some k ≥ n₀, and the induction step
is to show that P(k + 1) is true. In the second case, we assume that some element
n of S satisfies the property P, and show that the element obtained from n by rule 2
(i.e., n + 1) also satisfies P.

EXERCISES
2.1. Prove that the statements (p ∨ q) → r and (p → r) ∨ (q → r) are logically
     equivalent. (See Example 2.6.)
2.2. For each of Examples 2.5 and 2.6, how would you classify the given proof:
constructive, nonconstructive, or something in-between? Why?
2.3. Prove that if a and b are even integers, then ab is even.
2.4. Prove that for any positive integers i, j, and n, if i ∗ j = n, then either
     i ≤ √n or j ≤ √n. (See Example 2.2. The statement in that example may
     be more obviously useful since it tells you that for an integer n > 1, if none
     of the integers j in the range 2 ≤ j ≤ √n is a divisor of n, then n is prime.)

2.5. In the induction step in Example 2.8, starting with x = 0y1, we used y1 as
     the string of length k to which we applied the induction hypothesis. Redo the
     induction step, this time using 0y instead.
2.6. Prove the statement in Example 2.8 by using mathematical induction on the
number of 0’s in the string, rather than on the length of the string.
2.7. In Example 2.12, in order to show that every nonempty subset of N has a
     smallest element, we chose P(n) to be the statement: Every subset of N
     containing n has a smallest element. Consider this alternative choice for
     P(n): Every subset of N containing at least n elements has a smallest
     element. (We would want to try to prove that this statement P(n) is true for
     every n ≥ 1.) Why would this not be an appropriate choice?
In all the remaining exercises in this chapter, with the exception of 2.46 through
2.48, “prove” means “prove, using an appropriate version of mathematical induction.”

2.8. Prove that for every n ≥ 0,

         ∑ᵢ₌₁ⁿ i² = n(n + 1)(2n + 1)/6

2.9. Suppose that a₀, a₁, ... is a sequence of real numbers. Prove that for any
     n ≥ 1,

         ∑ᵢ₌₁ⁿ (aᵢ − aᵢ₋₁) = aₙ − a₀

2.10. Prove that for every n ≥ 1,

          7 + 13 + 19 + ⋯ + (6n + 1) = n(3n + 4)

2.11. Prove that for every n ≥ 0,

          ∑ᵢ₌₁ⁿ 1/(i(i + 1)) = n/(n + 1)
2.12. For natural numbers n and i satisfying 0 ≤ i ≤ n, let C(n, i) denote the
      number n!/(i!(n − i)!).
      a. Show that if 0 < i < n, then C(n, i) = C(n − 1, i − 1) + C(n − 1, i).
         (You don’t need mathematical induction for this.)
      b. Prove that for any n ≥ 0,

             ∑ᵢ₌₀ⁿ C(n, i) = 2ⁿ

2.13. Suppose r is a real number other than 1. Prove that for any n ≥ 0,

          ∑ᵢ₌₀ⁿ rⁱ = (rⁿ⁺¹ − 1)/(r − 1)

2.14. Prove that for any n ≥ 0,

          1 + ∑ᵢ₌₁ⁿ i × i! = (n + 1)!
2.15. Prove that for any n ≥ 4, n! > 2ⁿ.
2.16. Prove that if a₀, a₁, ... is a sequence of real numbers so that aₙ < aₙ₊₁ for
      every n ≥ 0, then for every m, n ≥ 0, if m < n, then aₘ < aₙ.
2.17. Suppose x is any real number greater than −1. Prove that for any n ≥ 0,
      (1 + x)ⁿ ≥ 1 + nx. (Be sure you say in your proof exactly how you use the
      assumption that x > −1.)
2.18. A fact about infinite series is that the series ∑ᵢ₌₁^∞ 1/i diverges (i.e., is
      infinite). Prove the following statement, which implies the result: For every
      n ≥ 1, there is an integer kₙ ≥ 1 so that ∑ᵢ₌₁^kₙ 1/i ≥ n. (Hint: the sum
      1/(k + 1) + 1/(k + 2) + ⋯ + 1/(2k) is at least how big?)
2.19. Prove that for every n ≥ 1,

          ∑ᵢ₌₁ⁿ i × 2ⁱ = (n − 1)2ⁿ⁺¹ + 2
2.20. Prove that for every n ≥ 2,

          1 + 1/√2 + 1/√3 + ⋯ + 1/√n > √n

2.21. Prove that for every n ≥ 0, n is either even or odd, but not both. (By
      definition, an integer n is even if there is an integer i so that n = 2 ∗ i, and n
      is odd if there is an integer i so that n = 2 ∗ i + 1.)
2.22. Prove that for any language L ⊆ {0, 1}*, if L² ⊆ L, then L* ⊆ L.
2.23. Suppose that Σ is an alphabet, and that f : Σ* → Σ* has the property that
      f(a) = a for every a ∈ Σ and f(xy) = f(x)f(y) for every x, y ∈ Σ*.
      Prove that for every x ∈ Σ*, f(x) = x.
2.24. Prove that for every n ≥ 0, n(n² + 5) is divisible by 6.
2.25. Suppose a and b are integers with 0 < a < b. Prove that for every n ≥ 1,
      bⁿ − aⁿ is divisible by b − a.
2.26. Prove that every positive integer is the product of a power of 2 and an odd
integer.
2.27. Suppose that A₁, A₂, ... are sets. Prove that for every n ≥ 1,

          (⋃ᵢ₌₁ⁿ Aᵢ)′ = ⋂ᵢ₌₁ⁿ Aᵢ′
2.28. Prove that for every n ≥ 1, the number of subsets of {1, 2, ..., n} is 2ⁿ.
2.29. Prove that for every n ≥ 1 and every m ≥ 1, the number of functions from
      {1, 2, ..., n} to {1, 2, ..., m} is mⁿ.

2.30. In calculus, a basic formula involving derivatives is the product formula,
      which says that if f and g are functions that have derivatives, then
      (d/dx)(f ∗ g) = f ∗ (dg/dx) + (df/dx) ∗ g. Using this formula and the fact
      that (d/dx)(x) = 1, prove that for any n ≥ 1, (d/dx)(xⁿ) = nxⁿ⁻¹.
2.31. The numbers aₙ, for n ≥ 0, are defined recursively as follows:

          a₀ = −2;  a₁ = −2;  for n ≥ 2, aₙ = 5aₙ₋₁ − 6aₙ₋₂

      Prove that for every n ≥ 0, aₙ = 2 ∗ 3ⁿ − 4 ∗ 2ⁿ.


2.32. The Fibonacci function was defined in Example 2.13 using the definition

          f₀ = 0;  f₁ = 1;  for n ≥ 2, fₙ = fₙ₋₁ + fₙ₋₂

      a. Suppose C is a positive real number satisfying C ≥ 8/13. Prove that for
         every n ≥ 0, fₙ ≤ C(13/8)ⁿ.
      b. Prove that for every n ≥ 0, ∑ᵢ₌₀ⁿ fᵢ² = fₙfₙ₊₁.
      c. Prove that for every n ≥ 0, ∑ᵢ₌₀ⁿ fᵢ = fₙ₊₂ − 1.
2.33. Suppose we define a real-valued function f on the natural numbers as
      follows:

          f(0) = 0;  for n > 0, f(n) = √(2 + f(n − 1))

      a. Prove that for every n ≥ 0, f(n) < 2.
      b. Prove that f is an increasing function; in other words, for every n ≥ 0,
         f(n + 1) > f(n).
2.34. In Example 2.24, a proof was given using structural induction based on the
      string y that for any strings x and y, |xy| = |x| + |y|. Can you prove the
      result using structural induction based on x? Why or why not?
2.35. Prove that for any string x, |xʳ| = |x|.
2.36. Prove that if x is any string in AE (see Example 2.17), then any prefix of x
contains at least as many left parentheses as right.
2.37. Suppose we modify the definition of AE to remove the restriction that the
      expressions be fully parenthesized. We call the new language GAE. One way
      to define GAE is as follows.
      (i) i ∈ GAE.
      (ii) For any x and y in GAE, both the strings x + y and x − y are in
           GAE.
      (iii) For any x ∈ GAE, the string (x) is in GAE.
      (iv) No other strings are in GAE.
      a. Prove that every string in AE is in GAE.
      b. Prove that every prefix of every string in GAE has at least as many left
         parentheses as right.
      c. Suppose that we define an integer-valued function N on the language
         GAE as follows. N(i) = 0; for any x and y in GAE, N assigns to both
         the strings x + y and x − y the larger of the two numbers N(x) and
         N(y); for any x ∈ GAE, N assigns the value N(x) + 1 to the string (x).

         Describe in words (nonrecursively) what the value N(x) means for a
         string x ∈ GAE.
2.38. Consider the following modified version of the strong induction principle, in
      which the basis step seems to have been eliminated.
          To prove that the statement P(n) is true for every n ≥ n₀, it is sufficient
          to show that for any k ≥ n₀, if P(n) is true for every n satisfying
          n₀ ≤ n < k, then P(k) is true.
      Assuming that the strong principle of mathematical induction is correct,
      prove that this modified version is correct. (Note that the basis step has not
      really been eliminated, only disguised.)

2.39. In each part below, a recursive definition is given of a subset of {a, b}*. Give
      a simple nonrecursive definition in each case. Assume that each definition
      includes an implicit last statement: “Nothing is in L unless it can be
      obtained by the previous statements.”
      a. a ∈ L; for any x ∈ L, xa and xb are in L.
      b. a ∈ L; for any x ∈ L, bx and xb are in L.
      c. a ∈ L; for any x ∈ L, ax and xb are in L.
      d. a ∈ L; for any x ∈ L, xb, xa, and bx are in L.
      e. a ∈ L; for any x ∈ L, xb, ax, and bx are in L.
      f. a ∈ L; for any x ∈ L, xb and xba are in L.
2.40. Give recursive definitions of each of the following sets.
      a. The set N of all natural numbers.
      b. The set S of all integers (positive and negative) divisible by 7.
      c. The set T of positive integers divisible by 2 or 7.
      d. The set U of all strings in {0, 1}* containing the substring 00.
      e. The set V of all strings of the form 0ⁱ1ʲ, where j ≤ i ≤ 2j.
      f. The set W of all strings of the form 0ⁱ1ʲ, where i > 2j.
2.41. Let L and A be the languages defined in Example 2.19. Prove that L ⊆ A by
      using induction on the length of the string.
2.42. Below are recursive definitions of languages L₁ and L₂, both subsets of
      {a, b}*. Prove that each is precisely the language L of all strings not
      containing the substring aa.
      a. Λ ∈ L₁; a ∈ L₁;
         For any x ∈ L₁, xb and xba are in L₁;
         Nothing else is in L₁.
      b. Λ ∈ L₂; a ∈ L₂;
         For any x ∈ L₂, bx and abx are in L₂;
         Nothing else is in L₂.
2.43. For each n ≥ 0, we define the strings aₙ and bₙ in {0, 1}* as follows:

          a₀ = 0;  b₀ = 1;  for n > 0, aₙ = aₙ₋₁bₙ₋₁;  bₙ = bₙ₋₁aₙ₋₁



      Prove that for every n ≥ 0, the following statements are true.
      a. The strings aₙ and bₙ are of the same length.
      b. The strings aₙ and bₙ differ in every position.
      c. The strings a₂ₙ and b₂ₙ are palindromes.
      d. The string aₙ contains neither the substring 000 nor the substring 111.
2.44. The “pigeonhole principle” says that if n + 1 objects are distributed among n
      pigeonholes, there must be at least one pigeonhole that ends up with more
      than one object. A more formal version of the statement is that if f : A → B
      and the sets A and B have n + 1 and n elements, respectively, then f cannot
      be one-to-one. Prove the second version of the statement.
2.45. The following argument cannot be correct, because the conclusion is false.
      Say exactly which statement in the argument is the first incorrect one, and
      why it is incorrect.
          To prove: all circles have the same diameter. More precisely, for any
      n ≥ 1, if S is any set of n circles, then all the elements of S have the same
      diameter. The basis step is to show that all the circles in a set of 1 circle have
      the same diameter, and this is obvious. The induction hypothesis is that
      k ≥ 1 and for any set S of k circles, all elements of S have the same diameter.
      We wish to show that for this k, and any set S of k + 1 circles, all the circles
      in S have the same diameter. Let S = {C₁, C₂, ..., Cₖ₊₁}. Consider the
      two sets T = {C₁, C₂, ..., Cₖ} and R = {C₁, ..., Cₖ₋₁, Cₖ₊₁}. T is
      simply S with the circle Cₖ₊₁ deleted, and R is S with the element Cₖ
      deleted. Since both T and R are sets of k circles, all the circles in T have
      the same diameter, and all the circles in R have the same diameter; these
      statements follow from the induction hypothesis. Now observe that Cₖ₋₁
      is an element of both T and R. If d is the diameter of this circle, then any
      circle in T has diameter d, and so does any circle in R. Therefore, all the
      circles in S have this same diameter.

MORE CHALLENGING PROBLEMS


2.46. Prove that if p and q are distinct primes, and the integer n is divisible by
      both p and q, then n is divisible by pq. You may use the following generally
      accepted fact from mathematics: If p is prime and m and n are positive
      integers so that mn is divisible by p, then either m is divisible by p or n is
      divisible by p.
2.47. Prove that if p and q are distinct primes, then pq is the smallest integer that
is divisible by both p and q.
2.48. Prove that if n is any positive integer that is not a perfect square, then √n is
      not rational. (See Example 2.3. You may need the generally accepted fact
      mentioned in Exercise 2.46.)

2.49. Prove that for every n ≥ 1,

          ∑ᵢ₌₁ⁿ √i > 2n√n/3
2.50. Prove that every integer greater than 17 is a nonnegative integer combination
      of 4 and 7. In other words, for every n > 17, there exist integers iₙ and jₙ,
      both ≥ 0, so that n = iₙ ∗ 4 + jₙ ∗ 7.
2.51. Prove that for every n ≥ 1, the number of subsets of {1, 2, ..., n} having an
      even number of elements is 2ⁿ⁻¹.
2.52. Prove that the ordinary principle of mathematical induction implies the
      strong principle of mathematical induction. In other words, show that if P(n)
      is a statement involving n that we wish to establish for every n ≥ N, and
      1. The ordinary principle of mathematical induction is true;
      2. P(N) is true;
      3. For every k ≥ N, (P(N) ∧ P(N + 1) ∧ ⋯ ∧ P(k)) → P(k + 1);
      then P(n) is true for every n ≥ N.
2.53. Suppose that f is the Fibonacci function (see Example 2.13).
      a. Prove that for every n ≥ 0, fₙ = c(aⁿ − bⁿ), where

             c = 1/√5,  a = (1 + √5)/2,  and  b = (1 − √5)/2

      b. Prove that for every n ≥ 0, fₙ₊₁² = fₙfₙ₊₂ + (−1)ⁿ.


2.54. Prove that every positive integer can be expressed uniquely as the sum of
      distinct powers of 2. (Another way to say this is that every positive integer
      has a unique binary representation. Note that there are really two things to
      show: first, that every positive integer can be expressed as the sum of
      distinct powers of 2; and second, that for every positive integer n, there
      cannot be two different sets of powers of 2, both of which sum to n.)
2.55. Prove that for any n ≥ 1, and any sequence a₁, a₂, ..., aₙ of positive real
      numbers, and any sequence b₁, b₂, ..., bₙ that is a permutation
      (rearrangement) of the aᵢ’s,

          ∑ᵢ₌₁ⁿ aᵢ/bᵢ ≥ n

      and the two expressions are equal if and only if bᵢ = aᵢ for every i.
2.56. Consider the following loop, written in pseudocode:

          while B do
              S

      The meaning of this is what you would expect. Test B; if it is true, execute S;
      test B again; if it is still true, execute S again; test B again; and so forth. In
      other words, continue executing S as long as the condition B remains true. A
      condition P is called an invariant of the loop if whenever P and B are both
      true and S is executed once, P is still true.
      a. Prove that if P is an invariant of the loop, and P is true before the first
         iteration of the loop (i.e., when B is tested the first time), then if the loop
         eventually terminates (i.e., after some number of iterations, B is false), P
         is still true.
      b. Suppose x and y are integer variables, and initially x ≥ 0 and y > 0.
         Consider the following program fragment:

             q = 0;
             r = x;
             while r >= y do
                 q = q + 1;
                 r = r - y

         (The loop condition B is r ≥ y, and the loop body S is the pair of
         assignment statements.) By considering the condition (r ≥ 0) ∧
         (x = q ∗ y + r), prove that when this loop terminates, the values of q
         and r will be the integer quotient and remainder, respectively, when x is
         divided by y; in other words, x = q ∗ y + r and 0 ≤ r < y.
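For readers who want to experiment, here is the fragment of part (b) transcribed into Python (a sketch of mine, not part of the exercise), with the invariant asserted on every iteration:

    # the loop of part (b), with the invariant P checked each time around
    # (a sketch, not from the text)
    def divide(x, y):
        q, r = 0, x
        assert r >= 0 and x == q * y + r           # P holds before the loop
        while r >= y:
            q, r = q + 1, r - y
            assert r >= 0 and x == q * y + r       # P is preserved by S
        return q, r                                # quotient and remainder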
2.57. Suppose f is a function defined on the set of positive integers and satisfying
      these two conditions:
      (i) f(1) = 1
      (ii) for n ≥ 1, f(2n) = f(2n + 1) = 2f(n)
      Prove that for every positive integer n, f(n) is the largest power of 2 less
      than or equal to n.
2.58. The total time T(n) required to execute a particular recursive sorting
      algorithm on an array of n elements is one second if n = 1, and otherwise no
      more than Cn + 2T(n/2) for some constant C independent of n. Prove that
      if n is any power of 2, say 2ᵏ, then

          T(n) ≤ n ∗ (Ck + 1) = n(C log₂ n + 1)

2.59. The function rev : Σ* → Σ* was defined recursively in Example 2.24.
      Using the recursive definition, prove the following facts about rev. (Recall
      that rev(x) is also written xʳ.)
      a. For any strings x and y in Σ*, (xy)ʳ = yʳxʳ.
      b. For any string x ∈ Σ*, (xʳ)ʳ = x.
      c. For any string x ∈ Σ* and any n ≥ 0, (xⁿ)ʳ = (xʳ)ⁿ.
2.60. On the one hand, we have a recursive definition of the set pal, given in
      Example 2.16. On the other hand, we have a recursive definition of xʳ, the
      reverse of the string x, in Example 2.24. Using these definitions, prove that
      pal = {x ∈ Σ* | xʳ = x}.

2.61. Prove that the language L defined by the recursive definition below is the set
      of all elements of {a, b}* not containing the substring aab.
          Λ ∈ L;
          For every x ∈ L, xa, bx, and abx are in L;
          Nothing else is in L.
2.62. In each part below, a recursive definition is given of a subset of {a, b}*. Give
      a simple nonrecursive definition in each case. Assume that each definition
      includes an implicit last statement: “Nothing is in L unless it can be
      obtained by the previous statements.”
      a. Λ, a, and aa are in L; for any x ∈ L, xb, xba, and xbaa are in L.
      b. Λ ∈ L; for any x ∈ L, ax, xb, and xba are in L.
2.63. Suppose L ⊆ {0, 1}*. Two languages L₁ and L₂ are defined recursively
      below. (These two definitions were both given in Example 2.15 as possible
      definitions of L*.)
      Definition of L₁:
      1. Λ ∈ L₁.
      2. For any x ∈ L₁ and any y ∈ L, xy ∈ L₁.
      3. No string is in L₁ unless it can be obtained by using rules 1 and 2.
      Definition of L₂:
      1. Λ ∈ L₂.
      2. For any x ∈ L, x ∈ L₂.
      3. For any two elements x and y of L₂, xy ∈ L₂.
      4. No string is in L₂ unless it can be obtained by using rules 1, 2, and 3.
      a. Prove that L₁ ⊆ L₂.
      b. Prove that for any two strings x and y in L₁, xy ∈ L₁.
      c. Prove that L₂ ⊆ L₁.
2.64. Let L be the language defined in Example 2.22. Prove that L contains every
element of {a, b}* having more a’s than b’s.
2.65. Λ ∈ L; for every x and y in L, axby and bxay are both in L; nothing else is
      in L. Prove that L is precisely the set of strings in {a, b}* with equal
      numbers of a’s and b’s.
2.66. Suppose S and T are both finite sets of strings, Λ ∉ T, and we have a
      function e : S → T. The function e can be thought of as an encoding of the
      elements of S, using the elements of T as code words. In this situation we
      can then encode S* by letting the code string for x₁x₂...xₙ be
      e∗(x₁x₂...xₙ) = e(x₁)e(x₂)...e(xₙ). The encoding e has the prefix
      property if there do not exist elements x₁ and x₂ of S so that x₁ ≠ x₂ and
      e(x₁) is a prefix of e(x₂). (If e has the prefix property, then in particular e is
      one-to-one.) Prove that if e has the prefix property, then every element of T*
      has at most one decoding—that is, the function e∗ is one-to-one.
2.67. (Adapted from the book by Paulos [1998]). A certain remote village contains
a large number of husband-wife couples. Exactly n of the husbands are
      unfaithful to their wives. Each wife is immediately aware of any other
      husband’s infidelity and knows that the same is true of the other wives;
however, she has no way of knowing whether her own husband has been
unfaithful. (No wife ever informs on any other woman’s husband.) The
village also has a very strict code of morality; each wife follows the code
rigidly and knows that the other wives do also. If any wife determines
conclusively that her husband has been unfaithful, she must kill him on the
same day she finds this out. At midnight each night, if anyone in the village
has been killed that day, a public announcement is made, so that everyone
then knows. Finally, all the wives in the village are expert reasoners, and
each wife is aware that all the other wives are expert reasoners.
One day, a guru, whose pronouncements all the wives trust absolutely,
visits the village, convenes a meeting of all the wives, and announces to
them that there is at least one unfaithful husband in the village. What
happens as a result? Prove your answer. (Hint: if n = 1, the wife of the
unfaithful husband already knows that no other husband is unfaithful. She
concludes that her husband is unfaithful and kills him that day.)
2.68. Suppose P(m, n) is a statement involving two natural numbers m and n, and
      suppose we can show these two statements: (i) P(0, 0) is true; (ii) For any
      natural numbers i and j, if P(i, j) is true, then P(i, j + 1) and P(i + 1, j)
      are true. Does it follow that P(m, n) is true for all natural numbers m and n?
      Give reasons for your answer.
2.69. Suppose S is a set of integers defined as follows: 0 ∈ S; for every x ∈ S,
      x + 5 ∈ S; no other elements are in S. In order to show using structural
      induction that every element of S satisfies some property P, it is enough to
      show that 0 satisfies P, and if x ∈ S and x satisfies P, then x + 5 also
      satisfies P. Give an example of a property P for which this is true but for
      which it is not true that x + 5 satisfies P for every integer x satisfying P. In
      other words, give an example in which the structural induction proof cannot
      be simplified as we did in Examples 2.21 and 2.23.
2.70. Suppose that U is a finite set that is closed under some binary operation ∘,
      and I is a subset of U. Suppose also that S is defined as follows:
      1. Every element of I is an element of S.
      2. For every x and y in S, x ∘ y ∈ S.
      3. Nothing else is in S.
Describe an algorithm that will determine, for some arbitrary element of U,
whether or not it is in S. (In particular, for each element of U, no matter
whether the answer is yes or no for that element, the algorithm must produce
the answer after a finite number of steps.)
Regular Languages
and Finite Automata

Interesting languages are likely to be infinite but must be describable in some
finite way. One method is to describe how the strings in the language can be
generated from simpler strings, using string operations, or how the language itself
can be generated from simpler languages using set operations. Another approach is
to specify an algorithmic procedure for recognizing whether a given string is in the
language.
The simplest languages we consider in this book are the regular languages,
which are those that can be generated from one-element languages by applying certain
standard operations a finite number of times. They are also precisely the ones that can
be recognized by devices called finite automata (FA), simple computing machines
with severely restricted memories. The second characterization provides some insight
into what it means for a language not to be regular, and it gives us a simple model of
computation we can generalize later, when we study more general languages.
In Part II, we consider these two as well as other characterizations of regular
languages. We obtain algorithms for translating one description of a language into
another description of a different type; we gain experience in using formal methods
to describe languages and trying to answer questions about them, in a simple setting
in which the models are relatively simple and the questions generally have answers;
and we examine ways in which regular languages themselves are useful in real-world
applications.

CHAPTER 3

Regular Languages
and Finite Automata

3.1 | REGULAR LANGUAGES AND REGULAR EXPRESSIONS
Nonnull strings over an alphabet Σ are created by concatenating simple strings, those
of length 1. Since concatenation can also be thought of as an operation on languages,
we may consider the languages obtained by concatenation from the simple languages
of the form {a}, where a ∈ Σ. If concatenation is the only operation we allow,
however, we can get only single strings or languages that contain single strings.
Adding the set operation of union permits languages with several elements, and if
we allow the Kleene * operation, which arises naturally from concatenation, we can
produce infinite languages as well.
To the simple languages of the form {a}, we add two more: the empty language
∅ and the language {Λ} whose only element is the null string.
A regular language over an alphabet Σ is one that can be obtained from these
basic languages using the operations of union, concatenation, and Kleene *. A regular
language can therefore be described by an explicit formula. It is common to sim-
plify the formula slightly, by leaving out the set brackets {} or replacing them with
parentheses and by replacing ∪ by +; the result is called a regular expression.
Here are several examples of regular languages over the alphabet {0, 1}, along
with the corresponding regular expressions.
Language                                  Corresponding Regular Expression
1. {Λ}                                    Λ
2. {0}                                    0
3. {001} (i.e., {0}{0}{1})                001
4. {0, 1} (i.e., {0} ∪ {1})               0 + 1
5. {0, 10} (i.e., {0} ∪ {10})             0 + 10


6. {1, Λ}{001}                            (1 + Λ)001
7. {110}*{0, 1}                           (110)*(0 + 1)
8. {1}*{10}                               1*10
9. {10, 111, 11010}*                      (10 + 111 + 11010)*
10. {0, 10}*({11}* ∪ {001, Λ})            (0 + 10)*((11)* + 001 + Λ)

We think of a regular expression as representing the “most typical string” in the


corresponding language. For example, 1*10 stands for a string that consists of the
substring 10 preceded by any number of 1’s.
The phrase we used above, “obtained from these basic languages using the oper-
ations of union, concatenation, and Kleene *,” should suggest a recursive definition
of the type we studied in Section 2.4.2. It will be helpful to complicate the definition
a little so that it defines not only the regular languages but also the regular expressions
corresponding to them.

Definition 3.1  Regular Languages and Regular Expressions over Σ

The set R of regular languages over Σ, and the corresponding regular
expressions, are defined as follows:

1. ∅ is an element of R, and the corresponding regular expression is ∅.
2. {Λ} is an element of R, and the corresponding regular expression is Λ.
3. For each a ∈ Σ, the language {a} is an element of R, and the
   corresponding regular expression is a.
4. If L₁ and L₂ are elements of R, and r₁ and r₂ are the corresponding
   regular expressions, then
   (a) L₁ ∪ L₂ is an element of R, and the corresponding regular
       expression is (r₁ + r₂);
   (b) L₁L₂ is an element of R, and the corresponding regular expression
       is (r₁r₂);
   (c) L₁* is an element of R, and the corresponding regular expression
       is (r₁*).

Only those languages that can be obtained by statements 1–4 are regular
languages over Σ.

The empty language is included in the definition primarily for the sake of con-
sistency. There will be a number of places where we will want to say things like
“To every something-or-other, there corresponds a regular language,” and without
the language ∅ we would need to make exceptions for trivial special cases.
Our definition of regular expressions is really a little more restrictive in several
respects than we need to be in practice. We use notation such as L² for languages,
and it is reasonable to use similar shortcuts in the case of regular expressions. Thus
we sometimes write (r²) to stand for the regular expression (rr), (r⁺) to stand for
the regular expression ((r*)r), and so forth. You should also note that the regular

expressions we get from the definition are fully parenthesized. We will usually relax
this requirement, using the same rules that apply to algebraic expressions: The Kleene
* operation has the highest precedence and + the lowest, with concatenation in be-
tween. This rule allows us to write a + b*c, for example, instead of (a + ((b*)c)).
Just as with algebraic expressions, however, there are times when parentheses are
necessary. The regular expression (a + b)* is a simple example, since the languages
corresponding to (a + b)* and a + b* are not the same.
Let us agree to identify two regular expressions if they correspond to the same
language. At the end of the last paragraph, for example, we could simply have said

(a + b)* ≠ a + b*

instead of saying that the two expressions correspond to different languages. With this
convention we can look at a few examples of rules for simplifying regular expressions
over {0, 1}:

1* + Λ = 1*
1*1* = 1*
0* + 1* = 1* + 0*
(0*1*)* = (0 + 1)*
(0 + 1)*01(0 + 1)* + 1*0* = (0 + 1)*
(All five are actually special cases of more general rules. For example, for any two
regular expressions r and s, (r*s*)* = (r + s)*.) These rules are really statements
about languages, which we could have considered in Chapter 1. The last one is
probably the least obvious. It says that the language of all strings of 0’s and 1’s (the
right side) can be expressed as the union of two languages, one containing all the
strings having the substring 01, the other containing all the others. (Saying that all
the 1’s precede all the 0’s is the same as saying that 01 is not a substring.)
Although there are times when it is helpful to simplify a regular expression as
much as possible, we will not attempt a systematic discussion of the algebra of regular
expressions. Instead, we consider a few more examples.

EXAMPLE 3.1  Strings of Even Length
Let L ⊆ {0, 1}* be the language of all strings of even length. (Since 0 is even, Λ ∈ L.) Is L
regular? If it is, what is a regular expression corresponding to it?
Any string of even length can be obtained by concatenating zero or more strings of length
2. Conversely, any such concatenation has even length. It follows that

L = {00, 01, 10, 11}*

so that one regular expression corresponding to L is (00 + 01 + 10 + 11)*. Another is


((0 + 1)(0 + 1))*.
88 PART 2 Regular Languages and Finite Automata

EXAMPLE 3.2  Strings with an Odd Number of 1's
Let L be the language of all strings of 0's and 1's containing an odd number of 1's. Any string
in L must contain at least one 1, and it must therefore start with a string of the form 0ⁱ10ʲ.
There is an even number (possibly zero) of additional 1's, each followed by zero or more 0's.
This means that the rest of the string is the concatenation of zero or more pieces of the general
form 10ᵐ10ⁿ. One regular expression describing L is therefore
0*10*(10*10*)*
A slightly different expression, which we might have obtained by stopping the initial substring
immediately after the 1, is
0*1(0*10*1)*0*
If we had begun by considering the last 1 in the string, rather than the first, we might have
ended up with

(0*10*1)*0*10*


A more complicated answer that is still correct is
0*(10*10*)*1(0*10*1)*0*
In this case the 1 that is emphasized is somewhere in the middle, with an even number of 1’s on
either side of it. There are still other ways we could describe a typical element of L, depending
on which aspect of the structure we wanted to emphasize, and there is not necessarily one that
is the simplest or the most natural. The important thing in all these examples is that the regular
expression must be general enough to describe every string in the language. One that does not
quite work, for example, is
(10*10*)*10*
since it does not allow for strings beginning with 0. We could correct this problem by inserting
0* at the beginning, to obtain

0*(10*10*)*10*
This is a way of showing the last 1 in the string explicitly, slightly different from the third
regular expression in this example.
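Expressions like these can also be checked mechanically. The following sketch (in Python; the code is ours, not the book's, and Python's regex notation writes our union + as |, although these two expressions happen to need only concatenation and *) compares two of the expressions above against the defining property of L on every short string:

    import re
    from itertools import product

    # Check two of the regular expressions above against the defining
    # property of L (an odd number of 1's) on all strings of length <= 7.
    candidates = [r"0*10*(10*10*)*", r"0*1(0*10*1)*0*"]

    for n in range(8):
        for bits in product("01", repeat=n):
            s = "".join(bits)
            in_L = s.count("1") % 2 == 1          # defining property of L
            for r in candidates:
                # fullmatch: the entire string must match the expression
                assert (re.fullmatch(r, s) is not None) == in_L, (r, s)
    print("both expressions agree with the definition of L")

A test of this kind cannot prove that an expression is correct, but it catches mistakes like the one discussed above, in which the expression fails to be general enough.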

EXAMPLE 3.3  Strings of Length 6 or Less
Let L be the set of all strings over {0, 1} of length 6 or less. A simple but inelegant regular
expression corresponding to L is
Λ + 0 + 1 + 00 + 01 + 10 + 11 + 000 + ··· + 111110 + 111111
A regular expression to describe the set of strings of length exactly 6 is

(0 + 1)(0 + 1)(0 + 1)(0 + 1)(0 + 1)(0 + 1)
or, in our extended notation, (0 + 1)⁶. To reduce the length, however, we may simply allow
some or all of the factors to be A. We may therefore describe L by the regular expression

(0 + 1 + Λ)⁶

EXAMPLE 3.4  Strings Ending in 1 and Not Containing 00
This time we let L be the language

L = {x ∈ {0, 1}* | x ends with 1 and does not contain the substring 00}

In order to find a regular expression for L, we try stating the defining property of strings in L in
other ways. Saying that a string does not contain the substring 00 is the same as saying that no
0 can be followed by 0, or in other words, every 0 either comes at the very end or is followed
immediately by 1. Since strings in L cannot have 0 at the end, every 0 must be followed by 1.
This means that copies of the strings 01 and 1 can account for the entire string and therefore
that every string in L corresponds to the regular expression (1 + 01)*. This regular expression
is a little too general, however, since it allows the null string. The definition says that strings in
L must end with 1, and this is stronger than saying they cannot end with 0. We cannot fix the
problem by adding a 1 at the end, to obtain (1 + 01)*1, because now our expression is not general
enough; it does not allow 01. Allowing this choice at the end, we obtain (1 + 01)*(1 + 01), or
(1 + 01)⁺.

EXAMPLE 3.5  The Language of C Identifiers
For this example a little more notation will be useful. Let us temporarily use l (for "letter") to
denote the regular expression

a + b + ··· + z + A + B + ··· + Z

and d (for "digit") to stand for

0 + 1 + 2 + ··· + 9

An identifier in the C programming language is any string of length 1 or more that contains
only letters, digits, and underscores (_) and begins with a letter or an underscore. Therefore, a
regular expression for the language of all C identifiers is

(l + _)(l + d + _)*
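Most practical regular-expression dialects provide character classes that play the roles of l and d, so the expression can be tested directly. A sketch in Python (ours, not the book's; it checks only the expression itself and ignores complications such as reserved words):

    import re

    # (l + _)(l + d + _)*, with l = a letter and d = a digit; the union (+)
    # of the text is folded into the character classes [...].
    c_identifier = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")

    for s in ["x", "_tmp1", "rate_of_pay", "2fast", "a-b"]:
        ok = c_identifier.fullmatch(s) is not None
        print(s, "is" if ok else "is not", "a C identifier")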

EXAMPLE 3.6  Real Literals in Pascal
Suppose we keep the abbreviations l and d as in the previous example and introduce the
additional abbreviations s (for "sign") and p (for "point"). The symbol s is shorthand for
Λ + a + m, where a is "plus" and m is "minus." Consider the regular expression

sd⁺(pd⁺ + pd⁺Esd⁺ + Esd⁺)

(Here E is not an abbreviation, but one of the symbols in the alphabet.) A typical string
corresponding to this regular expression has this form: first a sign (plus, minus, or neither);
one or more digits; then either a decimal point and one or more digits, which may or may
not be followed by an E, a sign, and one or more digits, or just the E, the sign, and one or more
digits. (If nothing else, you should be convinced by now that one regular expression is often
worth several lines of prose.) This is precisely the specification for a real "literal," or constant,
in the Pascal programming language. If the constant is in exponential format, no decimal point

is needed. If there is a decimal point, there must be at least one digit immediately preceding
and following it.
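The expression can again be tested mechanically. In the sketch below (ours, not the book's), s becomes an optional + or -, d⁺ becomes one or more digits, and the three alternatives inside the parentheses are written out exactly as above:

    import re

    sign = r"[+-]?"        # s: plus, minus, or neither
    digits = r"[0-9]+"     # d+: one or more digits
    # s d+ ( p d+  +  p d+ E s d+  +  E s d+ )
    real_literal = re.compile(
        sign + digits +
        rf"(\.{digits}|\.{digits}E{sign}{digits}|E{sign}{digits})"
    )

    for s in ["3.14", "-0.5E+10", "17E3", "42", "1.", ".5"]:
        print(s, bool(real_literal.fullmatch(s)))

The last three test strings are rejected, in agreement with the rules just stated: an integer constant has no point or exponent, and a decimal point must have a digit on each side.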

3.2 | THE MEMORY REQUIRED TO RECOGNIZE A LANGUAGE
When we discuss the problem of recognizing a language (deciding whether an arbi-
trary input string is in the language), we will be following two conventions for the
time being. First, we will restrict ourselves to a single pass through the input, from
left to right. Although this restriction is somewhat arbitrary, it allows us to consider
how much information must be “remembered” during the processing of the input, and
this turns out to be a useful criterion for classifying languages. Second, rather than
waiting until the end of the input string to reach a decision (and having to assume that
the end of the string is marked explicitly), we make a tentative decision after each
input symbol. This allows us to process a string the same way, whether it represents
the entire input or a prefix of a longer string. The processing produces a sequence of
tentative decisions, one for each prefix, and the final answer for the string is simply
the last of these.
The question we want to consider is how much information we need to remember
at each step, in order to guarantee that our sequence of decisions will always be correct.
The two extremes are that we remember everything (that is, exactly what substring we
have read) and that we remember nothing. Remembering nothing might be enough!
For example, if the language is empty, the algorithm that answers “no” at each step,
regardless of the input, is the correct one; if the language is all of Σ*, answering
“yes” at each step is correct. In both these trivial cases, since the answer we return
is always the same, we can continue to return the right answer without remembering
what input symbols we have read, or remembering that we have read one substring
rather than another.
In any situation other than these two trivial ones, however, the answers in the
sequence are not identical. There are two strings x and y for which the answers
are different. This means that the information we remember at the point when we
have received input string x must be different from what we remember when we
have received input string y, for otherwise we would have no way to distinguish the
two strings. Therefore, in at least one of these two situations we must remember
something.

EXAMPLE 3.7  Strings Ending with 0
Let L be the language {0, 1}*{0} of all strings in {0, 1}* that end with 0. Then for any nonnull
input string x, whether or not x ∈ L depends only on the last symbol. Another way to say this
is that there is no need to distinguish between one string ending with 0 and any other string
ending with 0, or between one string ending with 1 and any other string ending with 1. Any
two strings ending with the same symbol can be treated exactly the same way.

The only string not accounted for is Λ. However, there is no need to distinguish between
Λ and a string ending with 1: Neither is in the language, because neither ends with 0, and
once we get one more symbol we will not remember enough to distinguish the resulting strings
anyway, because they will both end with the same symbol. The conclusion is that there are
only two cases (either the string ends with 0 or it does not), and at each step we must remember
only which case we have currently.

EXAMPLE 3.8  Strings with Next-to-Last Symbol 0
Let L be the language of all strings in {0, 1}* with next-to-last symbol 0. Following the last
example, we can say that the decision we make for a string depends on its next-to-last symbol
and that we must remember at least that much information. Is that enough? For example, is it
necessary to distinguish between the strings 01 and 00, both of which have next-to-last symbol
0?
Any algorithm that does not distinguish between these two strings, and treats them exactly
the same, is also unable to distinguish the two strings obtained after one more input symbol.
Now it is clear that such an algorithm cannot work: If the next input is 0, for example, the two
resulting strings are 010 and 000, and only one of these is in the language. For the same reason,
any correct algorithm must also distinguish between 11 and 10 because their last symbols are
different.
We conclude that, for this language, it is apparently necessary to remember both of the last
two symbols. For strings of length at least 2, there are four separate cases. Just as in the
previous example, we can see that the three input strings of length less than 2 do not require
separate cases. Both the strings Λ and 1 can be treated exactly like 11, because for either string
at least two more input symbols will be required before the current string is in the language, and
at that point, the string we had before those two symbols is irrelevant. The string 0 represents
the same case as 10: Neither string is in the language, and once another input is received, both
current strings will have the same last two symbols. The four cases we must distinguish are
these:

a. The string is Λ or 1 or ends with 11.


b. The string is 0 or ends with 10.
c. The string ends with 00.
d. The string ends with 01.

EXAMPLE 3.9  Strings Ending with 11

This time, let L = {0, 1}*{11}, the language of all strings in {0, 1}* ending with 11. We can eas-
ily formulate an algorithm for recognizing L in which we remember only the last two symbols
of the input string. This time, in fact, we can get by with even a little less.
First, it is not sufficient to remember only whether the current string ends with 11. For
example, suppose the algorithm does not distinguish between a string ending in 01 and one
ending in 00, on the grounds that neither ends in 11. Then if the next input is 1, the algorithm will
not be able to distinguish between the two new strings, which end in 11 and 01, respectively.

This is not correct, since only one of these strings is in L. The algorithm must remember
enough now to distinguish between 01 and 00, so that it will be able if necessary to distinguish
between 011 and 001 one symbol later.
Two strings ending in 00 and 10, however, do not need to be distinguished. Neither string
is in L, and no matter what the next symbol is, the two resulting strings will have the same last
two symbols. For the same reason, the string 1 can be identified with any string ending in 01.
Finally, the two strings 0 and Λ can be identified with all the other strings ending in 0: In
all these cases, at least two more input symbols are required to produce an element of L, and
at that point it will be unnecessary to remember anything but the last two symbols.
Any algorithm recognizing L and following the rules we have adopted must distinguish
the following three cases, and it is sufficient for the algorithm to remember which of these the
current string represents:

a. The string does not end in 1. (Either it is Λ or it ends in 0.)


b. The string is 1 or ends in 01.
c. The string ends in 11.

EXAMPLE 3.10  Strings with an Even Number of 0's and an Odd Number of 1's

Consider the language of strings x in {0, 1}* for which n₀(x) is even and n₁(x) is odd. One
way to get by with remembering less than the entire current string would be to remember just
the numbers of 0’s and 1’s we have read, ignoring the way the symbols are arranged. For
example, it is not necessary to remember whether the current string is 011 or 101. However,
remembering this much information would still require us to consider an infinite number of
distinct cases, and an algorithm that remembers much less information can still work correctly.
There is no need to distinguish between the strings 011 and 0001111, for example: The current
answers are both “no,” and the answers will continue to be the same, no matter what input
symbols we get from now on. In the case of 011 and 001111, the current answers are also both
“no”; however, these two strings must be distinguished, since if the next input is 1, the answer
should be “no” in the first case but “yes” in the second.
The reason 011 and 0001111 can be treated the same is that both have an odd number of
0’s and an even number of 1’s. The reason 011 and 001111 must be distinguished is that one
has an odd number of 0’s, the other an even number. It is essential to remember the parity (i.e.,
even or odd) of both the number of 0’s and the number of 1’s, and this is also sufficient. Once
again, we have four distinct cases, and the only information about an input string that we must
remember is which of these cases it represents.
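The four cases amount to two bits of state, and the resulting algorithm is easy to write down. A sketch (ours, not the book's):

    # The entire memory is the pair of parities (n0 even?, n1 even?).
    def accepts(s):
        even0, even1 = True, True     # the case containing the null string
        for c in s:
            if c == "0":
                even0 = not even0
            else:
                even1 = not even1
        return even0 and not even1    # n0(x) even and n1(x) odd

    print(accepts("100"))   # True: two 0's (even) and one 1 (odd)
    print(accepts("011"))   # False: only one 0, an odd number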

EXAMPLE 3.11  A Recognition Algorithm for the Language in Example 3.4
As in Example 3.4, let

L = {x ∈ {0, 1}* | x ends in 1 and does not contain the substring 00}

Suppose that in the course of processing an input string, we have seen the string s so far. If s
already contains the substring 00, then that fact is all we need to remember; s is not in L, and
no matter what input we get from here on, the result will never be in L. Let us denote this case
by the letter N.

Consider next two other cases, in both of which 00 has not yet occurred: case 0, in which
the last symbol of s is 0, and case 1, in which the last symbol is 1. In the first case, if the
next input is 0 we have case N, and if the next input is 1 we have case 1. Starting in case
1, the inputs 0 and 1 take us to cases 0 and 1, respectively. These three cases account for all
strings except Λ. This string, however, must be distinguished from all the others. It would
not be correct to say that the null string corresponds to case N, because unlike that case there
are possible subsequent inputs that would give us a string in L. Λ does not correspond to case
0, because if the next input is 0 the answers should be different in the two cases; and it does
not correspond to case 1, because the current answers should already be different.
Once again we have managed to divide the set {0, 1}* into four types of strings so that in
order to recognize strings in L it is sufficient at each step to remember which of the four types
we have so far.

We can summarize Examples 3.7–3.11 by the schematic diagrams in Figure 3.1.


A diagram like this can be interpreted as a flowchart for an algorithm recognizing the
language. In each diagram, the circles correspond to the distinct cases the algorithm
is keeping track of, or the distinct types of strings in our classification. The two
circles in Figure 3.1a (corresponding to Example 3.7) represent strings that do not
end with 0 and strings that do, respectively. In Figures 3.1b and 3.1c, corresponding
to Examples 3.8 and 3.9, the circles represent the cases involving the last two symbols
that must be distinguished. In Figure 3.1d, the label used in each circle is a description
of the parities of the number of 0’s and the number of 1’s, respectively, in the current
string. We have already discussed the labeling scheme in Figure 3.1e.
In these diagrams, the short arrow not originating at one of the circles indicates
the starting point of the algorithm, the case that includes the null string Λ. The double
circles in each case designate cases in which the current string is actually an element
of L. This is the way the flowchart indicates the answer the algorithm returns for
each string.
The arrows originating at a circle tell us, for each possible next symbol, which
case results. As we have already described, this information is all the algorithm needs
to remember. In Figure 3.1e, for example, if at some point we are in case 0 (i.e., the
current substring ends in 0 and does not contain 00) and the next symbol is 1, the
arrow labeled 1 allows us to forget everything except the fact that the new substring
ends with 1 and does not contain 00.
The last sentence of the preceding paragraph is misleading in one sense. Although
in studying the algorithm it is helpful to think of case 1 as meaning “The substring we
have received so far ends in 1 and does not contain 00,” it is not necessary to think of
it this way at all. We could give the four cases arbitrary, meaningless labels, and as
long as we are able to keep track of which case we are currently in, we will be able to
execute the algorithm correctly. The procedure is purely mechanical and requires no
understanding of the significance of the cases. A computer program, or a machine,
could do it.
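To make this concrete, here is the machine of Figure 3.1e written as a small table-driven program (a sketch of ours, not something from the text). The four labels are opaque strings; the program consults only the table and has no understanding of what the cases mean:

    # Transition table for the four cases of Example 3.11.
    delta = {
        ("Λ", "0"): "0", ("Λ", "1"): "1",
        ("0", "0"): "N", ("0", "1"): "1",
        ("1", "0"): "0", ("1", "1"): "1",
        ("N", "0"): "N", ("N", "1"): "N",
    }
    accepting = {"1"}

    def accepts(s):
        state = "Λ"                 # the case that includes the null string
        for c in s:
            state = delta[(state, c)]
        return state in accepting

    print(accepts("0101011"))   # True: ends in 1 and never contains 00
    print(accepts("0100"))      # False: contains the substring 00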
This leads us to another possible interpretation of these diagrams, which is the
one we will adopt. We think of Figure 3.1e, for example, as specifying an abstract

Figure 3.1 |
(a) Strings ending in 0; (b) Strings with next-to-last symbol 0; (c) Strings ending
with 11; (d) Strings with n₀ even and n₁ odd; (e) A recognition algorithm for the
language in Example 3.4.

machine that would work as follows: The machine is at any time in one of four
possible states, which we have arbitrarily labeled Λ, 0, 1, and N. When it is activated
initially, it is in state Λ. The machine receives successive inputs of 0 or 1, and as a
result of being in a certain state and receiving a certain input, it moves to the state
specified by the corresponding arrow. Finally, certain states are accepting states (state
1 is the only one in this example). A string of 0’s and 1’s is in L if and only if the
state the machine is in as a result of processing that string is an accepting state.
It seems reasonable to refer to something of this sort as a “machine,” since
one can visualize an actual piece of hardware that works according to these rough
specifications. The specifications do not say exactly how the hardware works—
exactly how the input is transmitted to the machine, for example, or whether a "yes"
answer corresponds to a flashing light or a beep. For that matter, the “machine”
might exist only in software form, so that the strings are input data to a program. The

phrase abstract machine means that it is a specification, in some minimal sense, of


the capabilities the machine needs to have. The machine description does not say
what physical status the “states” and “inputs” have. The abstraction at the heart of
the machine is the set of states and the function that specifies, for each combination
of state and input symbol, the state the machine goes to next. The crucial property is
the finiteness of the set of states. This is significant because the size of the set puts an
absolute limit on the amount of information the machine can (or needs to) remember.
Although strings in the language can be arbitrarily long—and will be, unless the
language is finite—remembering a fixed amount of information, independent of the
size of the input, is sufficient. Being able to distinguish between these states (or
between strings that lead to these states) is the only form of memory the machine has.
The more states a machine of this type has, the more complicated a language it
will be able to recognize. However, the requirement that the set of states be finite is a
significant constraint, and we will be able to find many languages (see Theorem 3.3
for an example) that cannot be recognized by this type of machine, or this type of
algorithm. We will show in Chapter 4 that the languages that can be recognized this
way are precisely the regular languages. The conclusion, which may not have been
obvious from the discussion in Section 3.1, is that regular languages are fairly simple,
at least in principle, and there are many languages that are not regular.

3.3 | FINITE AUTOMATA


In Section 3.2, we were introduced to a simple type of language-recognizing machine.
Now we are ready for the official definition.

Definition 3.2  Definition of a Finite Automaton

A finite automaton, or finite-state machine (abbreviated FA), is a 5-tuple
M = (Q, Σ, q₀, A, δ), where

Q is a finite set (whose elements we will think of as states);
Σ is a finite alphabet of input symbols;
q₀ ∈ Q (the initial state);
A ⊆ Q (the set of accepting states);
δ is a function from Q × Σ to Q (the transition function).

If you have not run into definitions like this before, you might enjoy what the
mathematician R. P. Boas had to say about them, in an article in The American
Mathematical Monthly (88: 727–731, 1981) entitled "Can We Make Mathematics
Intelligible?":
There is a test for identifying some of the future professional mathematicians at an early
age. These are students who instantly comprehend a sentence beginning "Let X be an
ordered quintuple (a, T, π, σ, B), where ..." They are even more promising if they add,
"I never really understood it before."

Whether or not you “instantly comprehend” a definition of this type, you can
appreciate the practical advantages. Specifying a finite automaton requires that we
specify five things: two sets Q and Σ, an element q₀ and a subset A of Q, and a
function from Q × Σ to Q. Defining a finite automaton to be a 5-tuple may seem
strange at first, but it is simply efficient use of notation. It allows us to talk about the
five things at once as though we are talking about one "object"; it will allow us to say

Let M = (Q, Σ, q₀, A, δ) be an FA

instead of

Let M be an FA with state set Q, input alphabet Σ, initial state q₀, set of
accepting states A, and transition function δ.

EXAMPLE 3.12  Strings Ending with 10
Figure 3.2a gives a transition diagram for an FA with seven states recognizing L = {0, 1}*{10},
the language of all strings in {0, 1}* ending in 10. Until it gets more than two inputs, the FA
remembers exactly what it has received (there is a separate state for each possible input string
of length 2 or less). After that, it cycles back and forth among four states, “remembering” the
last two symbols it has received. Figure 3.2b describes this machine in tabular form, by giving
the values of the transition function for each state-input pair.
We would expect from Example 3.9 that this FA is more complicated than it needs to be.
In particular, the rows for the three states 1, 01, and 11 in the transition table are exactly the
same. If x, y, and z are strings causing the FA to be in the three states, then x, y, and z do not
need to be distinguished now, since none of the three states is an accepting state; and the three
strings that will result one input symbol later cannot be distinguished. The conclusion is that
the three states do not represent cases that need to be distinguished. We could merge them into
one and call it state B.
The rows for the three states 0, 00, and 10 are also identical. These three states cannot
all be merged into one, because 10 is an accepting state and the other two are not. The two
nonaccepting states can, and we call the new state A.
At this point, the number of states has been reduced to 4, and we have the transition table

q        δ(q, 0)    δ(q, 1)
Λ        A          B
A        A          B
B        10         B
10       A          B

We can go one step further, using the same reasoning as before. In this new FA, the rows
for states Λ and A are the same, and neither Λ nor A is an accepting state. We can therefore

Figure 3.2 |
A finite automaton recognizing {0, 1}*{10}: (a) Transition diagram;
(b) Transition table.

Figure 3.3 |
A simplified finite automaton recognizing {0, 1}*{10}: (a) Transition
diagram; (b) Transition table.

include Λ in the state A as well. The three resulting states are A, B, and 10, and it is easy to see
that the number cannot be reduced any more. The final result and the corresponding transition
table are pictured in Figure 3.3.
We might describe state A as representing "no progress toward 10," meaning that the
machine has received either no input at all, or an input string ending with 0 but not 10. B
stands for “halfway there,” or last symbol 1. As we observed after Example 3.11, however,
these descriptions of the states are not needed to specify the abstract machine completely.

The analysis that led to the simplification of the FA in this example can be turned into a
systematic procedure for minimizing the number of states in a given FA. See Section 5.2 for
more details.
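A transition table like the one in Figure 3.3b is all a program needs in order to run the machine. A sketch (ours), with the three states A, B, and 10 numbered 0, 1, and 2:

    A, B, TEN = 0, 1, 2            # TEN plays the role of the state "10"
    table = [
        # on 0, on 1
        [A,   B],                  # A:  no progress toward 10
        [TEN, B],                  # B:  last symbol 1 ("halfway there")
        [A,   B],                  # 10: last two symbols 10 (accepting)
    ]

    def accepts(s):
        state = A
        for c in s:
            state = table[state][int(c)]
        return state == TEN

    print(accepts("0010"))   # True: ends with 10
    print(accepts("0100"))   # False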

For an arbitrary FA M = (Q, Σ, q₀, A, δ), the expression δ(q, a) is a concise
way of writing "the state to which the FA goes, if it is in state q and receives input a."
The next step is to extend the notation so that we can describe equally concisely "the
state in which the FA ends up, if it begins in state q and receives the string x of input
symbols." Let us write this as δ*(q, x). The function δ*, then, is an extension of the
transition function δ from the set Q × Σ to the larger set Q × Σ*. How do we define
the function δ* precisely? The idea behind the function is simple enough. If x is the
string a₁a₂...aₙ, we want to obtain δ*(q, x) by first going to the state q₁ to which M
goes from state q on input a₁; then going to the state q₂ to which M goes from q₁ on
input a₂; ...; and finally, going to the state qₙ to which M goes from qₙ₋₁ on input
aₙ. Unfortunately, so far this does not sound either particularly precise or particularly
concise. Although we can replace many of the phrases by mathematical formulas (for
example, "the state to which M goes from state q on input a₁" is simply δ(q, a₁)),
we still have the problem of the ellipses. At this point, we might be reminded of the
discussion in Example 2.14.
The easiest way to define δ* precisely is to give a recursive definition. For a
particular state q, we are trying to define the expression δ*(q, x) for each string x in
Σ*. Using the recursive definition of Σ* (Example 2.15), we can proceed as follows:
Define δ*(q, Λ); then, assuming we know what δ*(q, y) is, define δ*(q, ya) for an
element a of Σ.
The "basis" part of the definition is not hard to figure out. We do not expect
the state of the FA to change as a result of getting the input string Λ, and we define
δ*(q, Λ) = q, for every q ∈ Q. Now, δ*(q, ya) is to be the state that results when
M begins in state q and receives first the input string y, then the single additional
symbol a. The state M is in after getting y is δ*(q, y); and from any state p, the state
to which M moves from p on the single input symbol a is δ(p, a). This means that
the recursive part of the definition should define δ*(q, ya) to be δ(δ*(q, y), a).

Definition 3.3  The Extended Transition Function δ*

Let M = (Q, Σ, q₀, A, δ) be an FA. We define the function δ* : Q × Σ* → Q
as follows:

1. For any q ∈ Q, δ*(q, Λ) = q.
2. For any q ∈ Q, y ∈ Σ*, and a ∈ Σ, δ*(q, ya) = δ(δ*(q, y), a).

It is important to understand that, in adopting the recursive definition of δ*, we


have not abandoned the intuitive idea with which we first approached the definition.

Figure 3.4 |
Transitions of M: δ(q, a) = q₁ and δ(q₁, b) = q₂.

The point is that the recursive definition is the best way to capture this intuitive idea
in a formal definition. Using this definition to calculate δ*(q, x) amounts to just what
you would expect, given what we wanted the function δ* to represent: processing the
symbols of x, one at a time, and seeing where the transition function δ takes us at
each step. Suppose for example that M contains the transitions shown in Figure 3.4.
Let us use Definition 3.3 to calculate δ*(q, abc):

δ*(q, abc) = δ(δ*(q, ab), c)
 = δ(δ(δ*(q, a), b), c)
 = δ(δ(δ(δ*(q, Λ), a), b), c)
 = δ(δ(δ(q, a), b), c)
 = δ(δ(q₁, b), c)
 = δ(q₂, c)

Note that in the calculation above, it was necessary to calculate δ*(q, a) by using the
recursive part of the definition, since the basis part involves δ*(q, Λ). Fortunately,
δ*(q, a) turned out to be δ(q, a); for strings of length 1 (i.e., elements of Σ), δ and δ*
can be used interchangeably. For a string x with |x| ≠ 1, however, writing δ(q, x) is
incorrect, because the pair (q, x) does not belong to the domain of δ.
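The recursive definition translates almost word for word into a program. A sketch (ours), with δ represented as a dictionary from (state, symbol) pairs to states; the transitions, and the name q3 for the state δ(q₂, c), are our rendering of the example above:

    def delta_star(delta, q, x):
        # delta*(q, x), computed exactly as in Definition 3.3
        if x == "":                     # basis: delta*(q, Lambda) = q
            return q
        y, a = x[:-1], x[-1]            # write x as ya, with a a symbol
        return delta[(delta_star(delta, q, y), a)]

    # The transitions of Figure 3.4; q3 is our name for delta(q2, c).
    delta = {("q", "a"): "q1", ("q1", "b"): "q2", ("q2", "c"): "q3"}
    print(delta_star(delta, "q", "abc"))    # prints q3

Unwinding the recursive calls made by delta_star reproduces exactly the chain of equalities displayed above.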
Other properties you would expect δ* to satisfy can be derived from our definition.
For example, a natural generalization of statement 2 of the definition is the formula

δ*(q, xy) = δ*(δ*(q, x), y)


which should be true for any q ∈ Q and any two strings x and y. The proof is by
mathematical induction, and the details are left to you in Exercise 3.22.
Now we can state more concisely what it means for an FA to accept a string and
what it means for an FA to accept a language.

Definition 3.4  Acceptance by an FA

Let M = (Q, Σ, q₀, A, δ) be an FA. A string x ∈ Σ* is accepted by M if
δ*(q₀, x) ∈ A. If x is not accepted, we say it is rejected by M. The language
accepted, or recognized, by M is the set

L(M) = {x ∈ Σ* | x is accepted by M}

If L is a language over Σ, L is accepted (or recognized) by M if and only if
L = L(M).

Figure 3.5 |
An FA that accepts every string in Σ*.

Notice what the last statement in the definition does not say. It does not say that
L is accepted by M if every string in L is accepted by M. If it did, we could use
the FA in Figure 3.5 to accept any language, no matter how complex. The power
of a machine does not lie in the number of strings it accepts, but in its ability to
discriminate—to accept some and reject others. In order to accept a language L, an
FA has to accept all the strings in L and reject all the strings in L’.
The terminology introduced in Definition 3.4 allows us to record officially the
following fact, which we have mentioned already but will not prove until Chapter 4.

Theorem 3.1
A language L ⊆ Σ* is regular if and only if there is a finite automaton that accepts L.

This theorem says on the one hand that if M is any FA, there is a regular expression
corresponding to the language L(M); and on the other hand, that given a regular
expression, there is an FA that accepts the corresponding language. The proofs in
Chapter 4 will actually give us ways of constructing both these things. Until then,
many examples are simple enough that we can get by without a formal algorithm.

EXAMPLE 3.13  Finding a Regular Expression Corresponding to an FA
Let us try to describe by a regular expression the language L accepted by the FA in Figure 3.6.
The state labeled A is both the initial state and an accepting state; this tells us that Λ ∈ L.
More generally, even-length strings of 0's (of which Λ is one) are in L. These are the only
strings x for which δ*(A, x) = A, because the only arrow to A from another state is the one
from state 0, and the only arrow to that state is the one from A. These strings correspond to
the regular expression (00)*.

Figure 3.6 |
A finite automaton accepting {00}*{11}*.

The state labeled D serves the same purpose in this example that N did in Example 3.11.
Once the FA reaches D, it stays in that state; a string x for which δ*(A, x) = D cannot be a
prefix of any element of L.
The easiest way to reach the other accepting state B from A is with the string 11. Once the
FA is in state B, any even-length string of 1’s returns it to B, and these are the only strings that
do this. Therefore, if x is a string that causes the FA to go from A to B without revisiting A, x
must be of the form 11(11)ᵏ for some k ≥ 0. The most general type of string that causes the FA
to reach state B is a string of this type preceded by one of the form (00)ʲ, and a corresponding
regular expression is (00)*11(11)*.
By combining the two cases (the strings x for which δ*(A, x) = A and those for which
δ*(A, x) = B), we conclude that the language L corresponds to the regular expression (00)* +
(00)*11(11)*, which can be simplified to

(00)*(11)*

EXAMPLE 3.14  Another Example of a Regular Expression Corresponding to an FA
Next we consider the FA M in Figure 3.7, with input alphabet {a, b}. One noteworthy feature
of this FA is the fact that every arrow labeled b takes the machine to state B. As a result, every
string ending in b causes M to be in state B; in other words, for any string x, δ*(A, xb) = B.
Therefore, δ*(A, xbaaa) = δ*(δ*(A, xb), aaa) = δ*(B, aaa) = E, which means that any
string ending in baaa is accepted by M. (The first of these equalities uses the formula in
Exercise 3.22.)
On the other hand, we can see from the diagram that the only way to get to state E is to
reach D first and then receive input a; the only way to reach D is to reach C and then receive
a; the only way to reach C is to reach B and then receive a; and the only way to reach B is
to receive input b, although this could happen in any state. Therefore, it is also true that any


Figure 3.7 |
A finite automaton M accepting
{a, b}*{baaa}.

string accepted by M must end in baaa. The language L(M) is the set of all strings ending in
baaa, and a regular expression corresponding to L(M) is (a + b)*baaa.
This is a roundabout way of arriving at a regular expression. It depends on our noticing
certain distinctive features of the FA, and it is not clear that the approach will be useful for
other machines. An approach that might seem more direct is to start at state A and try to
build a regular expression as we move toward E. Since δ(A, a) = A, we begin the regular
expression with a*. Since δ(A, b) = B and δ(B, b) = B, we might write a*bb* next. Now the
symbol a takes us to C, and we try a*bb*a. At this point it starts to get complicated, however,
because we can now go back to B, loop some more with input b, then return to C—and we
can repeat this any number of times. This might suggest a*bb*a(bb*a)*. As we get closer to
E, there are more loops, and loops within loops, to take into account, and the formulas quickly
become unwieldy. We might or might not be able to carry this to completion, and even if we
can, the resulting formula will be complicated. We emphasize again that we do not yet have
a systematic way to solve these problems, and there is no need to worry at this stage about
complicated examples.

EXAMPLE 3.15  Strings Containing Either ab or bba
In this example we consider the language L of all strings in {a, b}* that contain at least one of the
two substrings ab and bba (L corresponds to the regular expression (a + b)*(ab + bba)(a + b)*),
and try to construct a finite automaton accepting L.
We start with two observations. These two strings themselves should be accepted by our
FA; and if x is any string that is accepted, then any other string obtained by adding symbols
to the end of x should also be accepted. Figure 3.8a shows a first attempt at an FA (obviously
uncompleted) incorporating these features. (We might have started with two separate accepting
states, but the transitions from both would have been the same, and one is enough.)
In order to continue, we need transitions labeled a and b from each of the states p, r, and
s, and we may need additional states. It will help to think at this point about what each of the

Figure 3.8 |
Strings containing either ab or bba.

nonaccepting states is supposed to represent. First, being in a nonaccepting state means that
we have not yet received one of the two desired strings. Being in the initial state qo should
presumably mean that we have made no progress at all toward getting one of these two strings.
It seems as though any input symbol at all represents some progress, however: An a is at least
potentially the first symbol in the substring ab, and b might be the first symbol in bba. This
suggests that once we have at least one input symbol, we should never need to return to the
initial state.
It is not correct to say that p is the state the FA should be in if the last input symbol
received was an a, because there are arrows labeled a that go to state t. It should be possible,
however, to let p represent the state in which the last input symbol was a and we have not yet
seen either ab or bba. If we are already in this state and the next input symbol we receive is
a, then nothing has really changed; in other words, δ(p, a) should be p.
We can describe the states r and s similarly. If the last input symbol was b, and it was not
preceded by a b, and we have not yet arrived in the accepting state, we should be in state r.
We should be in state s if we have just received two consecutive b’s but have not yet reached
the accepting state. What should δ(r, a) be? The b that got us to state r is no longer doing us
any good, since it was not followed by b. In other words, it looked briefly as though we were
making progress toward getting the string bba, but now it appears that we are not. We do not
have to start over, however, because at least we have an a. We conclude that δ(r, a) = p. We
can also see that δ(s, b) = s: If we are in state s and get input b, we have not made any further
progress toward getting bba, but neither have we lost any ground.
At this point, we have managed to define the missing transitions in Figure 3.8a without
adding any more states, and thus we have an FA that accepts the language L. The transition
diagram is shown in Figure 3.8b.

EXAMPLE 3.16  Another Example of an FA Corresponding to a Regular Expression
We consider another regular expression,

r = (11 + 110)*0

and try to construct an FA accepting the corresponding language L. In the previous example,
our preliminary guess at the structure of the FA turned out to provide all the states we needed.
Here it will not be so straightforward. We will just proceed one symbol at a time, and for each
new transition try to determine whether it can go to a state that is already present or whether it
will require a new state.
The null string is not in L, which tells us that the initial state q₀ should not be an accepting
state. The string 0, however, is in L, so that from q₀ the input symbol 0 must take us to an
accepting state. The string 1 is not in L; furthermore, 1 must be distinguished from Λ, because
the subsequent input string 110 should take us to an accepting state in one case and not in the
other (i.e., 110 ∈ L and 1110 ∉ L). At this point, we have determined that we need at least the
states in Figure 3.9a.
The language L contains 0 but no other strings beginning with 0. It also contains no
strings beginning with 10. It is appropriate, therefore, to introduce a state s that represents all
the strings that fail for either reason to be a prefix of an element of L (Figure 3.9b). Once our
FA reaches the state s, which is not an accepting state, it never leaves this state.

Figure 3.9 |
Constructing a finite automaton accepting the language corresponding to (11 + 110)*0.

We must now consider the situation in which the FA is in state r and receives the input 1.
It should not stay in state r, because the strings 1 and 11 need to be distinguished (for example,
110 ∈ L, but 1110 ∉ L). It should not return to the initial state, because Λ and 11 need to be
distinguished. Therefore, we need a new state t. From t, the input 0 must lead to an accepting
state, since 110 ∈ L. This accepting state cannot be p, because 110 is a prefix of a longer string
in L and 0 is not. Let u be the new accepting state. If the FA receives a 0 in state u, then we
have the same situation as an initial 0: The string 1100 is in L but is not a prefix of any longer
string in L. So we may let δ(u, 0) = p. We have yet to define δ(t, 1) and δ(u, 1). States t and
u can be thought of as “the end of one of the strings 11 and 110.” (The reason u is accepting
is that 110 can also be viewed as 11 followed by 0.) In either case, if the next symbol is 1, we
should think of it as the first symbol in another occurrence of one of these two strings. This
means that it is appropriate to define δ(t, 1) = δ(u, 1) = r, and we arrive at the FA shown in
Figure 3.9c.
The procedure we have followed here may seem hit-or-miss. We continued to add states
as long as it was necessary, stopping only when all transitions from every state had been drawn

and went to states that were already present. Theorem 3.1 is the reason we can be sure that
the process will eventually stop. If we used the same approach for a language that was not
regular, we would never be able to stop: No matter how many states we created, defining the
transitions from those states would require yet more states. The step that is least obvious and
most laborious in our procedure is determining whether a given transition needs to go to a
new state, and if not, which existing state is the right one. The algorithm that we develop in
Chapter 4 uses a different approach that avoids this difficulty.

3.4 | DISTINGUISHING ONE STRING FROM ANOTHER
Using a finite automaton to recognize a language L depends on the fact that there are
groups of strings so that strings within the same group do not need to be distinguished
from each other by the machine. In other words, it is not necessary for the machine
to remember exactly which string within the group it has read so far; remembering
which group the string belongs to is enough. The number of distinct states the FA
needs in order to recognize a language is related to the number of distinct strings that
must be distinguished from each other. The following definition specifies precisely
the circumstances under which an FA recognizing L must distinguish between two
strings x and y, and the lemma that follows spells out explicitly how such an FA
accomplishes this. (It says simply that the FA is in different states as a result of
processing the two strings).

Definition 3.5  Distinguishing Strings with Respect to L

Let L be a language in Σ*, and let x be any string in Σ*. The set L/x is
defined by

L/x = {z ∈ Σ* | xz ∈ L}

Two strings x and y are said to be distinguishable with respect to L if
L/x ≠ L/y. Any string z that is in one of the two sets but not the other (i.e.,
for which xz ∈ L and yz ∉ L, or vice versa) is said to distinguish x and y
with respect to L. If L/x = L/y, x and y are indistinguishable with respect
to L.

In order to show that two strings x and y are distinguishable with respect to a
language L, it is sufficient to find one string z so that either xz ∈ L and yz ∉ L,
or xz ∉ L and yz ∈ L (in other words, so that z is in one of the two sets L/x and
L/y but not the other). For example, if L is the language in Example 3.12, the set of
all strings in {0, 1}* that end in 10, we observed that 00 and 01 are distinguishable
with respect to L, because we can choose z to be the string 0; that is, 000 ∉ L and

010 ∈ L. The two strings 0 and 00 are indistinguishable with respect to L, because
the two sets L/0 and L/00 are equal; each is just the set L itself. (The only way 0z
or 00z can end in 10 is for z to have this property.)
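When membership in L is easy to test, a distinguishing string z can be searched for by brute force. A sketch (ours; the function name and the length bound are our choices, and a failed search does not by itself prove indistinguishability):

    from itertools import product

    def distinguishing_string(in_L, x, y, max_len=5):
        # Look for z with xz in L and yz not in L, or vice versa.
        for n in range(max_len + 1):
            for bits in product("01", repeat=n):
                z = "".join(bits)
                if in_L(x + z) != in_L(y + z):
                    return z
        return None      # none found up to the bound; not a proof

    in_L = lambda s: s.endswith("10")    # strings ending in 10
    print(distinguishing_string(in_L, "00", "01"))   # prints 0
    print(distinguishing_string(in_L, "0", "00"))    # prints None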

Lemma 3.1  Suppose L ⊆ Σ* and M = (Q, Σ, q₀, A, δ) is an FA recognizing
L. If x and y are two strings in Σ* that are distinguishable with respect to L, then
δ*(q₀, x) ≠ δ*(q₀, y).

Proof  The assumption that x and y are distinguishable with respect to L means
that there is a string z in one of the two sets L/x and L/y but not the other. In other
words, one of the two strings xz and yz is in L and the other is not. Because we are
also assuming that M accepts L, it follows that one of the two states δ*(q₀, xz) and
δ*(q₀, yz) is an accepting state and the other is not. In particular, therefore,

δ*(q₀, xz) ≠ δ*(q₀, yz)

According to Exercise 3.22,

δ*(q₀, xz) = δ*(δ*(q₀, x), z)
δ*(q₀, yz) = δ*(δ*(q₀, y), z)

Because the left sides of these two equations are unequal, the right sides must be, and
therefore δ*(q₀, x) ≠ δ*(q₀, y).

Theorem 3.2
Suppose L ⊆ Σ*, and for some n ≥ 1 there are n strings in Σ*, any two of
which are distinguishable with respect to L. Then any FA recognizing L
must have at least n states.

We interpret Theorem 3.2 as putting a lower bound on the memory requirements


of any FA that is capable of recognizing L. To make this interpretation more concrete,
we might think of the states as being numbered from 1 through n and assume that
there is a register in the machine that stores the number of the current state. The
binary representation of the number n has approximately log₂(n) binary digits, and
thus the register needs to be able to hold approximately log₂(n) bits of information.

EXAMPLE 3.17  The Language Lₙ
Suppose n ≥ 1, and let

Lₙ = {x ∈ {0, 1}* | |x| ≥ n and the nth symbol from the right in x is 1}

There is a straightforward way to construct an FA recognizing Lₙ, by creating a distinct state
for every possible substring of length n or less, just as we did in Example 3.12. In this way
the FA will be able to remember the last n symbols of the current input string. For each i, the
number of strings of length exactly i is 2ⁱ. If we add these numbers for the values of i from 0 to
n, we obtain 2ⁿ⁺¹ − 1 (Exercise 2.13); therefore, this is the total number of states. Figure 3.10
illustrates this FA in the case n = 3.
The eight states representing strings of length 3 are at the right. Not all the transitions
from these states are shown, but the rule is simply

δ(abc, d) = bcd

The accepting states are the four for which the third symbol from the end is 1.

Figure 3.10 |
A finite automaton accepting L₃.

We might ask whether there is some simpler FA, perhaps using a completely different
approach, that would cut the number of states to a number more manageable than 2ⁿ⁺¹ − 1.
Although the number can be reduced just as in Example 3.12, we can see from Theorem 3.2
that any FA recognizing Lₙ must have at least 2ⁿ states. To do this, we show that any two
strings of length n (of which there are 2ⁿ in all) are distinguishable with respect to Lₙ.
Let x and y be two distinct strings of length n. They must differ in the ith symbol (from
the left), for some i with 1 ≤ i ≤ n. For the string z that we will use to distinguish these
two strings with respect to Lₙ, we can choose any string of length i − 1. Then xz and yz still
differ in the ith symbol, and now the ith position is precisely the nth from the right. In other
words, one of the strings xz and yz is in Lₙ and the other is not, which implies that L/x ≠ L/y.
Therefore, x and y are distinguishable with respect to Lₙ.
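A recognizer that attains this bound, apart from the extra states needed for strings shorter than n, keeps the last n symbols as its entire state. A sketch (ours), packing those symbols into an n-bit integer:

    def make_recognizer(n):
        # State: the last n symbols as an n-bit integer, plus a count that
        # saturates at n so that strings shorter than n are rejected. This
        # uses 2**n "long-string" states, matching the bound of Theorem 3.2.
        mask = (1 << n) - 1
        def accepts(s):
            state, seen = 0, 0
            for c in s:
                state = ((state << 1) | int(c)) & mask
                seen = min(seen + 1, n)
            return seen == n and (state >> (n - 1)) & 1 == 1
        return accepts

    accepts_L3 = make_recognizer(3)
    print(accepts_L3("0100"))   # True: third symbol from the right is 1
    print(accepts_L3("1000"))   # False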

Theorem 3.2 is potentially a way of showing that some languages cannot be


accepted by any FA. According to the theorem, if there is a large set of strings, any
two of which are distinguishable with respect to L, then any FA accepting L must
have a large number of states. What if there is a really large (i.e., infinite) set S of
strings with this property? Then no matter what n is, we can choose a subset of S
with n elements, which will be “pairwise distinguishable” (i.e., any two elements are
distinguishable). Therefore, no matter what n is, any FA accepting L must have at
least n states. If this is true, the only way an FA can recognize L at all is for it to have
an infinite set of states, which is exactly what a finite automaton is not allowed to
have. Such a language L cannot be recognized by any abstract machine, or algorithm,
of the type we have described, because no matter how much memory we provide, it
will not be enough.
The following theorem provides our first example of such a language. In the
proof, we show not only that there is an infinite “pairwise distinguishable” set of
strings, but an even stronger statement, that the set Σ* itself has this property: Any
two strings are distinguishable with respect to the language. This means that in
attempting to recognize the language, following the conventions we have adopted,
we cannot afford to forget anything. No matter what input string x we have received
so far, we must remember that it is x and not some other string. The second conclusion
of the theorem is a result of Theorem 3.1.

Theorem 3.3
The language pal of palindromes over {0, 1} cannot be accepted by any
finite automaton, and it is therefore not regular.

Proof
As we described above, we will show that for any two distinct strings x
and y in {0, 1}*, x and y are distinguishable with respect to pal. To show
this, we consider first the case when |x| = |y|, and we let z = xʳ. Then
xz = xxʳ, which is in pal, and yz is not. If |x| ≠ |y|, we may as well
assume that |x| < |y|, and we let y = y₁y₂, where |y₁| = |x|. Again,
we look for a string z so that xz ∈ pal and yz ∉ pal. Any z of the form ...

In Chapter 5 we will consider other nonregular languages and find other methods
for demonstrating that a language is nonregular. Definition 3.5 will also come up again
in Chapter 5; the indistinguishability relation can be used in an elegant description of
a “minimum-state” FA recognizing a given regular language.

3.5 | UNIONS, INTERSECTIONS, AND COMPLEMENTS
Suppose L₁ and L₂ are both regular languages over an alphabet Σ. There are FAs
M₁ and M₂ accepting L₁ and L₂, respectively (Theorem 3.1); on the other hand, the
languages L₁ ∪ L₂, L₁L₂, and L₁* are also regular (Definition 3.1) and can therefore
be accepted by FAs. It makes sense to ask whether there are natural ways to obtain
machines for these three languages from the two machines M₁ and M₂.
The language L₁ ∪ L₂ is different in one important respect from the other two:
Whether a string x belongs to L₁ ∪ L₂ depends only on whether x ∈ L₁ and whether
x ∈ L₂. As a result, not only is there a simple solution to the problem in this case, but
with only minor changes the same method also works for the two languages L₁ ∩ L₂
and L₁ − L₂. We will wait until Chapter 4 to consider the two remaining languages
L₁L₂ and L₁*.
If as we receive input symbols we execute two algorithms simultaneously, one
to determine whether the current string x is in L₁, the other to determine whether
x is in L₂, we will be able to say at each step whether x is in L₁ ∪ L₂. If M₁ =
(Q₁, Σ, q₁, A₁, δ₁) and M₂ = (Q₂, Σ, q₂, A₂, δ₂) are FAs recognizing L₁ and L₂,
respectively, a finite automaton M should be able to recognize L₁ ∪ L₂ if it can
remember at each step both the information that M₁ remembers and the information
M₂ remembers. Abstractly, this amounts to "remembering" the ordered pair (p, q),
where p and q are the current states of M₁ and M₂. Accordingly, we can construct
M by taking our set of states to be the set of all possible ordered pairs, Q₁ × Q₂. The
initial state will be the pair (q₁, q₂) of initial states. If M is in the state (p, q) (which
means that p and q are the current states of M₁ and M₂) and receives input symbol
a, it should move to the state (δ₁(p, a), δ₂(q, a)), since the two components of this
pair are the states to which the individual machines would move.
What we have done so far is independent of which of the three languages (L₁ ∪ L₂, L₁ ∩ L₂, and L₁ − L₂) we wish to accept. All that remains is to specify the set of accepting states so that the strings accepted are the ones we want. For the language L₁ ∪ L₂, for example, x should be accepted if it is in either L₁ or L₂; this means that the state (p, q) should be an accepting state if either p or q is an accepting state of its respective FA. For the languages L₁ ∩ L₂ and L₁ − L₂, the accepting states of the machine are defined similarly.

Theorem 3.4
Suppose M₁ = (Q₁, Σ, q₁, A₁, δ₁) and M₂ = (Q₂, Σ, q₂, A₂, δ₂) are FAs accepting L₁ and L₂, respectively. Let M = (Q, Σ, q₀, A, δ), where Q = Q₁ × Q₂, q₀ = (q₁, q₂), and

δ((p, q), a) = (δ₁(p, a), δ₂(q, a))   for every p ∈ Q₁, q ∈ Q₂, and a ∈ Σ

1. If A = {(p, q) | p ∈ A₁ or q ∈ A₂}, then M accepts the language L₁ ∪ L₂.
2. If A = {(p, q) | p ∈ A₁ and q ∈ A₂}, then M accepts the language L₁ ∩ L₂.
3. If A = {(p, q) | p ∈ A₁ and q ∉ A₂}, then M accepts the language L₁ − L₂.

Proof
It is straightforward to show by structural induction that for any x ∈ Σ* and any (p, q) ∈ Q,

δ*((p, q), x) = (δ₁*(p, x), δ₂*(q, x))

(Exercise 3.32). In the first case, then, x is accepted by M precisely when δ*(q₀, x) ∈ A; that is, when δ₁*(q₁, x) ∈ A₁ or δ₂*(q₂, x) ∈ A₂; and this happens if and only if x ∈ L₁ ∪ L₂. The arguments in the other two cases

are similar.
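To make the construction concrete, here is a short sketch in Python (ours, not from the text; the tuple-and-dictionary representation of an FA is an assumption). It builds the product machine of Theorem 3.4 and chooses the accepting states according to which of the three languages is wanted.

    # Sketch of the product construction of Theorem 3.4 (representation is ours:
    # an FA is (states, alphabet, initial state, accepting states, transition dict)).
    def product_fa(fa1, fa2, mode="union"):
        Q1, sigma, q1, A1, d1 = fa1
        Q2, _, q2, A2, d2 = fa2
        states = {(p, q) for p in Q1 for q in Q2}
        # delta((p, q), a) = (delta1(p, a), delta2(q, a))
        delta = {((p, q), a): (d1[p, a], d2[q, a])
                 for p in Q1 for q in Q2 for a in sigma}
        if mode == "union":
            accepting = {(p, q) for (p, q) in states if p in A1 or q in A2}
        elif mode == "intersection":
            accepting = {(p, q) for (p, q) in states if p in A1 and q in A2}
        else:  # difference: L1 - L2
            accepting = {(p, q) for (p, q) in states if p in A1 and q not in A2}
        return states, sigma, (q1, q2), accepting, delta

    def accepts(fa, x):
        _, _, q, A, delta = fa
        for a in x:
            q = delta[q, a]
        return q in A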

We may consider the special case in which L₁ is all of Σ*. L₁ − L₂ is then L₂′, the complement of L₂. The construction in the theorem can be used, where M₁ is the trivial FA with only the state q₁, which is an accepting state. However, this description of M is unnecessarily complicated. Except for the names of the states, it is the same as

M₂′ = (Q₂, Σ, q₂, Q₂ − A₂, δ₂)

which we obtain from M₂ by just reversing accepting and nonaccepting states.
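In the same hypothetical representation as above, the complement construction is a one-line change:

    def complement_fa(fa):
        # Same states and transitions; accepting and nonaccepting states reversed.
        Q, sigma, q0, A, delta = fa
        return Q, sigma, q0, Q - A, delta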



It often happens, as in the following example, that the FA we need for one of
these three languages is even simpler than the construction in Theorem 3.4 would
seem to indicate.

An FA Accepting L₁ − L₂ | EXAMPLE 3.18
Suppose L₁ and L₂ are the subsets

L₁ = {x | 00 is not a substring of x}
L₂ = {x | x ends with 01}

of {0, 1}*. The languages L₁ and L₂ are recognized by the FAs in Figure 3.11a.
The construction in the theorem, for any of the three cases, produces an FA with nine states. In order to draw the transition diagram, we begin with the initial state (A, P). Since δ₁(A, 0) = B and δ₂(P, 0) = Q, we have δ((A, P), 0) = (B, Q). Similarly, δ((A, P), 1) = (A, P). Next we calculate δ((B, Q), 0) and δ((B, Q), 1). As we continue this process, as

Figure 3.11 |
Constructing a finite automaton to accept L₁ − L₂.

soon as a new state is introduced, we calculate the transitions from this state. After a few steps we obtain the partial diagram in Figure 3.11b. We now have six states; each of them can be reached from (A, P) as a result of some input string, and every transition from one of these six goes to one of these six. We conclude that the other three states are not reachable from the initial state, and therefore that they can be left out of our FA (Exercise 3.29).
Suppose now that we want our FA to recognize the language L₁ − L₂. Then we designate as our accepting states those states (X, Y) from among the six for which X is either A or B and Y is not R. These are (A, P) and (B, Q), and the resulting FA is shown in Figure 3.11c.
In fact, we can simplify this FA even further. None of the states (C, P), (C, Q), or (C, R) is an accepting state, and once the machine enters one of these states, it remains in one of them. Therefore, we may replace all of them with a single state, obtaining the FA shown in Figure 3.11d.
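As a check on this example, the sketch below (ours) encodes one plausible pair of FAs for L₁ and L₂; the state names A, B, C and P, Q, R follow the example, but the individual transitions are our reconstruction from the two language descriptions. Exploring the product machine from (A, P) yields the six reachable states, and the accepting states for L₁ − L₂ come out to (A, P) and (B, Q), as in the text.

    # Transitions reconstructed from the language descriptions (an assumption).
    d1 = {("A", "0"): "B", ("A", "1"): "A",   # A: no 00 yet, last symbol not 0
          ("B", "0"): "C", ("B", "1"): "A",   # B: no 00 yet, last symbol 0
          ("C", "0"): "C", ("C", "1"): "C"}   # C: 00 has occurred (dead state)
    d2 = {("P", "0"): "Q", ("P", "1"): "P",   # P: ends with neither 0 nor 01
          ("Q", "0"): "Q", ("Q", "1"): "R",   # Q: ends with 0
          ("R", "0"): "Q", ("R", "1"): "P"}   # R: ends with 01

    # Breadth-first exploration of the product states reachable from (A, P).
    frontier, reachable = [("A", "P")], {("A", "P")}
    while frontier:
        p, q = frontier.pop()
        for a in "01":
            s = (d1[p, a], d2[q, a])
            if s not in reachable:
                reachable.add(s)
                frontier.append(s)

    accepting = {(p, q) for (p, q) in reachable if p in {"A", "B"} and q != "R"}
    print(sorted(reachable))   # six reachable states
    print(sorted(accepting))   # [('A', 'P'), ('B', 'Q')]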

EXERCISES
3.1. In each case, find a string of minimum length in {0, 1}* not in the language
corresponding to the given regular expression.
a. (01)*0*
b. (0* + 1*)(0*1)(0*1*)
c. 0*(001)*1*
d. (0 + 10)*1*
3.2. Consider the two regular expressions

r = 0* + 1*      s = 01* + 10* + 1*0 + (0*1)*

a. Find a string corresponding to r but not to s.
b. Find a string corresponding to s but not to r.
c. Find a string corresponding to both r and s.
d. Find a string in {0, 1}* corresponding to neither r nor s.
3.3. Let r and s be arbitrary regular expressions over the alphabet Σ. In each
case, find a simpler regular expression corresponding to the same language
as the given one.
a. (r + s + rs + sr)*
b. (r + s*)*
c. r(r*r + r*) + r*
d. (r + Λ)*
e. (r + s)*rs(r + s)* + s*r*
3.4. Prove the formula

(01)*0 = 0(10)*

3.5. Prove the formula

(aa*bb*)* = Λ + a(a + b)*b

3.6. In the definition of regular languages, Definition 3.1, statement 2 can be
omitted without changing the set of regular languages. Why?
3.7. The set of regular languages over Σ is the smallest set that contains all the
languages ∅, {Λ}, and {a} (for every a ∈ Σ) and is closed under the
operations of union, concatenation, and Kleene *. In each case below,
describe the smallest set of languages that contains all these "basic"
languages and is closed under the specified operations.
a. union
b. concatenation
c. Kleene *
d. union and concatenation
e. union and Kleene *
3.8. Find regular expressions corresponding to each of the languages defined
recursively below.
a. Λ ∈ L; if x ∈ L, then 001x and x11 are elements of L; nothing is in
L unless it can be obtained from these two statements.
b. 0 ∈ L; if x ∈ L, then 001x, x001, and x11 are elements of L;
nothing is in L unless it can be obtained from these two statements.
c. Λ ∈ L; 0 ∈ L; if x ∈ L, then 001x and 11x are in L; nothing is in L
unless it can be obtained from these three statements.
3.9. Find a regular expression corresponding to each of the following subsets of
{0, 1}*.
a. The language of all strings containing exactly two 0's.
b. The language of all strings containing at least two 0's.
c. The language of all strings that do not end with 01.
d. The language of all strings that begin or end with 00 or 11.
e. The language of all strings not containing the substring 00.
f. The language of all strings in which the number of 0's is even.
g. The language of all strings containing no more than one occurrence of
the string 00. (The string 000 should be viewed as containing two
occurrences of 00.)
h. The language of all strings in which every 0 is followed immediately by
11.
i. The language of all strings containing both 11 and 010 as substrings.
3.10. Describe as simply as possible the language corresponding to each of the
following regular expressions.
a. (0 + 0*1)*0*
b. (0 + 1*)*(Λ + 0 + 1)
c. (1 + 01)*(0 + 01)*
d. (0 + 1)*(0⁺1⁺0⁺ + 1⁺0⁺1⁺)(0 + 1)* (Give an answer of the form:
all strings containing both the substring ___ and the substring ___.)
3.11. Show that if L is a regular language, then the language Lⁿ is regular for
every n ≥ 0.
3.12. Show that every finite language is regular.
3.13. The function rev : Σ* → Σ* is defined in Example 2.24. For a language L,
let Lʳ denote the language {rev(x) | x ∈ L} = {xʳ | x ∈ L}.
a. If e is the regular expression (001 + 11010)*1010, and L₁ is the
corresponding language, give a regular expression corresponding to L₁ʳ.
b. Taking this example as a model, give a recursive definition of a function
rrev from the set of regular expressions over Σ to itself, so that for any
regular expression r over Σ, the language corresponding to rrev(r) is the
reverse of the language corresponding to r. Give a proof that your
function has this property.
c. Show that if L is a regular language, then Lʳ is regular.
3.14. In the C programming language, all the following expressions represent
valid numerical "literals":

3    41.16    +45.80    +0    -14.4    +1.4e6    01E-06    0.2E-20    -.4E-7    00e0

The letter e or E refers to an exponent, and if it appears, the number
following it is an integer. Based on these examples, write a regular
expression representing the language of numerical literals. You can use the
same shorthand as in Example 3.4: l for "letter," d for "digit," a for '+', m
for '−', and p for "point." Assume that there are no limits on the number of
consecutive digits in any part of the expression.
3.15. The star height of a regular expression r over Σ, denoted by sh(r), is
defined as follows:
(i) sh(∅) = 0.
(ii) sh(Λ) = 0.
(iii) sh(a) = 0 for every a ∈ Σ.
(iv) sh((rs)) = sh((r + s)) = max(sh(r), sh(s)).
(v) sh((r*)) = sh(r) + 1.
Find the star heights of the following regular expressions.
a. (a(a + a*aa) + aaa)*
b. (((a + a*aa)aa)* + aaaaaa*)*
3.16. For both the regular expressions in the previous exercise, find an equivalent
regular expression of star height 1.

3.17. For each of the FAs pictured in Figure 3.12, describe, either in words or by
writing a regular expression, the strings that cause the FA to be in each state.
3.18. Let x be a string in {0, 1}* of length n. Describe an FA that accepts the string
x and no other strings. How many states are required?

Figure 3.12 |

3.19. For each of the languages in Exercise 3.9, draw an FA recognizing the
language.
3.20. For each of the following regular expressions, draw an FA recognizing the
corresponding language.
a. (0 + 1)*0
b. (11 + 10)*
c. (0 + 1)*(1 + 00)(0 + 1)*
d. (111 + 100)*0
e. 0 + 10* + 01*0
f. (0 + 1)*(01 + 110)
3.21. Draw an FA that recognizes the language of all strings of 0’s and 1’s of
length at least 1 that, if they were interpreted as binary representations of
integers, would represent integers evenly divisible by 3. Your FA should
accept the string 0 but no other strings with leading 0’s.
3.22. Suppose M is the finite automaton (Q, Σ, q₀, A, δ).
a. Using mathematical induction or structural induction, show that for any
x and y in Σ*, and any q ∈ Q,

δ*(q, xy) = δ*(δ*(q, x), y)

Two reasonable approaches are to base the induction on x and to base it
on y. One, however, works better than the other.
b. Show that if for some state q, δ(q, a) = q for every a ∈ Σ, then
δ*(q, x) = q for every x ∈ Σ*.
c. Show that if for some state q and some string x, δ*(q, x) = q, then for
every n ≥ 0, δ*(q, xⁿ) = q.
3.23. If L is a language accepted by an FA M, then there is an FA accepting L with
more states than M (and therefore there is no limit to the number of states an
FA accepting L can have). Explain briefly why this is true.
3.24. Show by an example that for some regular language L, any FA recognizing
L must have more than one accepting state. Characterize those regular
languages for which this is true.
3.25. For the FA pictured in Figure 3.11d, show that there cannot be any other FA
with fewer states accepting the same language.
3.26. Let z be a fixed string of length n over the alphabet {0, 1}. What is the
smallest number of states an FA can have if it accepts the language
{0, 1}*{z}? Prove your answer.
3.27. Suppose L is a subset of {0, 1}*. Does an infinite set of distinguishable pairs
with respect to L imply an infinite set that is pairwise distinguishable with
respect to L? In particular, if x₀, x₁, … is a sequence of distinct strings in
{0, 1}* so that for any n ≥ 0, xₙ and xₙ₊₁ are distinguishable with respect to
L, does it follow that L is not regular? Either give a proof that it does follow,
or provide an example of a regular language L and a sequence of strings x₀,
x₁, … with this property.

3.28. Let L ⊆ {0, 1}* be an infinite language, and for each n ≥ 0, let
Lₙ = {x ∈ L | |x| = n}. Denote by s(n) the number of states an FA must
have in order to accept Lₙ. What is the smallest that s(n) can be if Lₙ ≠ ∅?
Give an example of an infinite language L so that for every n satisfying
Lₙ ≠ ∅, s(n) is this minimum number.
3.29. Suppose M = (Q, Σ, q₀, A, δ) is an FA. If p and q are elements of Q, we
say q is reachable from p if there is a string x ∈ Σ* so that δ*(p, x) = q.
Let M₁ be the FA obtained from M by deleting any states that are not
reachable from q₀ (and any transitions to or from such states). Show that M₁
and M recognize the same language. (Note: we might also try to simplify M
by eliminating the states from which no accepting state is reachable. The
result might be a pair (q, a) ∈ Q × Σ for which no transition is defined, so
that we do not have an FA. In Chapter 4 we will see that this simplification
can still be useful.)
3.30. Let M = (Q, Σ, q₀, A, δ) be an FA. Let M₁ = (Q, Σ, q₀, R, δ), where R is
the set of states in Q from which some element of A is reachable (see
Exercise 3.29). What is the relationship between the language recognized by
M₁ and the language recognized by M? Prove your answer.
3.31. Suppose M = (Q, Σ, q₀, A, δ) is an FA, and suppose that there is some
string z so that for every q ∈ Q, δ*(q, z) ∈ A. What conclusion can you
draw, and why?
3.32. (For this problem, refer to the proof of Theorem 3.4.) Show that for any
x ∈ Σ* and any (p, q) ∈ Q, δ*((p, q), x) = (δ₁*(p, x), δ₂*(q, x)).
3.33. Let M₁, M₂, and M₃ be the FAs pictured in Figure 3.13, recognizing
languages L₁, L₂, and L₃, respectively.

Figure 3.13 |

Draw FAs recognizing the following languages.

a. L₁ ∪ L₂
b. L₁ ∩ L₂
c. L₁ − L₂
d. L₁ ∩ L₃
e. L₂ − L₃

MORE CHALLENGING PROBLEMS


3.34. Prove that for any two regular expressions r and s over Σ, (r*s*)* =
(r + s)*.
3.35. Prove the formula

(00*1)*1 = 1 + 0(0 + 10)*11
3.36. Find a regular expression corresponding to each of the following subsets of
{0, 1}*.
a. The language of all strings not containing the substring 000.
b. The language of all strings that do not contain the substring 110.
c. The language of all strings containing both 101 and 010 as substrings.
d. The language of all strings in which both the number of 0’s and the
number of 1’s are even.
e. The language of all strings in which both the number of 0’s and the
number of 1’s are odd.
3.37. Suppose r is a regular expression over Σ. Show that if for each symbol in r
that is an element of Σ another regular expression over Σ is substituted, the
result is a regular expression over Σ.
3.38. Let c and d be regular expressions over Σ.
a. Show that the formula r = c + rd, involving the variable r, is true if the
regular expression cd* is substituted for r.
b. Show that if A is not in the language corresponding to d, then any
regular expression r satisfying r = c + rd corresponds to the same
language as cd*.
3.39. Describe precisely an algorithm that could be used to eliminate the symbol ∅
from any regular expression that does not correspond to the empty language.
3.40. Describe an algorithm that could be used to eliminate the symbol Λ from
any regular expression whose corresponding language does not contain the
null string.
3.41. The order of a regular language L is the smallest integer k for which
Lᵏ = Lᵏ⁺¹, if there is one, and ∞ otherwise.
a. Show that the order of L is finite if and only if there is an integer k so
that Lᵏ = L*, and that in this case the order of L is the smallest k so that
Lᵏ = L*.

b. What is the order of the regular language {Λ} ∪ {aa}{aaa}*?
c. What is the order of the regular language {a} ∪ {aa}{aaa}*?
d. What is the order of the language corresponding to the regular
expression (Λ + b*a)(b + ab*ab*a)*?
3.42. A generalized regular expression is defined the same way as an ordinary
regular expression, except that two additional operations, intersection and
complement, are allowed. So, for example, the generalized regular
expression abb∅′ ∩ (∅′aaa∅′)′ represents the set of all strings in {a, b}* that
start with abb and don't contain the substring aaa.
a. Show that the subset {aba}* of {a, b}* can be described by a generalized
regular expression with no occurrences of *.
b. Can the subset {aa}* be described this way? Give reasons for your
answer.
3.43. For each of the following regular expressions, draw an FA recognizing the
corresponding language.
a. (1 + 110)*0
b. (1 + 10 + 110)*0
c. 1(01 + 10)* + 0(01 + 10)*
d. 1(1 + 10)* + 10(0 + 01)*
e. (010 + 00)*(10)*
3.44. Let M = (Q, Σ, q₀, A, δ) be an FA. Below are other conceivable methods
of defining the extended transition function δ* (see Definition 3.3). In each
case, determine whether it is in fact a valid definition of a function on the set
Q × Σ*, and why. If it is, show using mathematical induction that it defines
the same function that Definition 3.3 does.
a. For any q ∈ Q, δ*(q, Λ) = q; for any y ∈ Σ*, a ∈ Σ, and q ∈ Q,
δ*(q, ya) = δ*(δ*(q, y), a).
b. For any q ∈ Q, δ*(q, Λ) = q; for any y ∈ Σ*, a ∈ Σ, and q ∈ Q,
δ*(q, ay) = δ*(δ(q, a), y).
c. For any q ∈ Q, δ*(q, Λ) = q; for any q ∈ Q and any a ∈ Σ,
δ*(q, a) = δ(q, a); for any q ∈ Q, and any x and y in Σ*,
δ*(q, xy) = δ*(δ*(q, x), y).
3.45. Let L be the set of even-length strings over {0, 1} whose first and second
halves are identical; in other words, L = {ww | w ∈ {0, 1}*}. Use the
technique of Theorem 3.3 to show that L is not regular: Show that for any
two strings x and y in {0, 1}*, x and y are distinguishable with respect to L.
3.46. Let L be the language {0ⁿ1ⁿ | n ≥ 0}.
a. Find two distinct strings x and y that are indistinguishable with respect
to L.
b. Show that L is not regular, by showing that there is an infinite set of
strings, any two of which are distinguishable with respect to L.

3.47. As in Example 3.17, let

Lₙ = {x ∈ {0, 1}* | |x| ≥ n and the nth symbol from the right in x is 1}

It is shown in that example that any FA recognizing Lₙ must have at least 2ⁿ
states. Draw an FA with four states that recognizes L₂. For any n ≥ 1,
describe how to construct an FA with 2ⁿ states that recognizes Lₙ.
3.48. Let n be a positive integer and L = {x ∈ {0, 1}* | |x| = n and n₀(x) =
n₁(x)}. What is the minimum number of states in any FA that recognizes L?
Give reasons for your answer.
3.49. Let n be a positive integer and L = {x ∈ {0, 1}* | |x| = n and n₀(x) <
n₁(x)}. What is the minimum number of states in any FA that recognizes L?
Give reasons for your answer.
3.50. Let n be a positive integer, and let L be the set of all strings in pal of length
2n. In other words,

L = {xxʳ | x ∈ {0, 1}ⁿ}

What is the minimum number of states in any FA that recognizes L? Give
reasons for your answer.
3.51. Languages such as that in Example 3.9 are regular for what seems like a
particularly simple reason: In order to test a string for membership, we need
to examine only the last few symbols. More precisely, there is an integer n
and a set S of strings of length n so that for any string x of length n or
greater, x is in the language if and only if x = yz for some z ∈ S. (In
Example 3.9, we may take n to be 2 and S to be the set {11}.) Show that any
language L having this property is regular.
3.52. Show that every finite language has the property in the previous problem.
3.53. Give an example of an infinite regular language (a subset of {0, 1}*) that
does not have the property in Exercise 3.51, and prove that it does not.
3.54. (This exercise is due to Hermann Stamm-Wilbrandt.) Consider the language

L = {x ∈ {0, 1, 2}* | x does not contain the substring 01}

Show that for every i ≥ 0, the number of strings of length i in L is
f(2i + 2), where f is the Fibonacci function (see Example 2.13). Hint:
draw an FA accepting L, with initial state q₀, and consider, for each state q in
your FA and each integer i, the set S(q, i) of all strings x of length i
satisfying δ*(q₀, x) = q.
3.55. Consider the two FAs in Figure 3.14. If you examine them closely you can
see that they are really identical, except that the states have different names:
State p corresponds to state A, q corresponds to B, and r corresponds to C.
Let us describe this correspondence by the relabeling function i; that is,
i(p) = A, i(q) = B, i(r) = C. What does it mean to say that under this
correspondence, the two FAs are "really identical"? It means several things:
First, the initial states correspond to each other; second, a state is an
accepting state if and only if the corresponding state is; and finally, the


Figure 3.14 |

transitions among the states of the first FA are the same as those among the
corresponding states of the other. For example, if δ₁ and δ₂ are the transition
functions,

δ₁(p, 0) = p and δ₂(i(p), 0) = i(p)
δ₁(p, 1) = q and δ₂(i(p), 1) = i(q)

These formulas can be rewritten

δ₂(i(p), 0) = i(δ₁(p, 0)) and δ₂(i(p), 1) = i(δ₁(p, 1))

and these and all the other relevant formulas can be summarized by the
general formula

δ₂(i(s), a) = i(δ₁(s, a)) for every state s and alphabet symbol a

In general, if M₁ = (Q₁, Σ, q₁, A₁, δ₁) and M₂ = (Q₂, Σ, q₂, A₂, δ₂) are
FAs and i : Q₁ → Q₂ is a bijection (i.e., one-to-one and onto), we say that i
is an isomorphism from M₁ to M₂ if these conditions are satisfied:
(i) i(q₁) = q₂
(ii) for any q ∈ Q₁, i(q) ∈ A₂ if and only if q ∈ A₁
(iii) for every q ∈ Q₁ and every a ∈ Σ, i(δ₁(q, a)) = δ₂(i(q), a)
and we say M₁ is isomorphic to M₂ if there is an isomorphism from M₁ to
M₂. This is simply a precise way of saying that M₁ and M₂ are "essentially
the same."
a. Show that the relation ≅ on the set of FAs over Σ, defined by M₁ ≅ M₂
if M₁ is isomorphic to M₂, is an equivalence relation.
b. Show that if i is an isomorphism from M₁ to M₂ (notation as above),
then for every q ∈ Q₁ and x ∈ Σ*,

i(δ₁*(q, x)) = δ₂*(i(q), x)

c. Show that two isomorphic FAs accept the same language.
d. How many one-state FAs over the alphabet {0, 1} are there, no two of
which are isomorphic?

e. How many pairwise nonisomorphic two-state FAs over {0, 1} are there,
in which both states are reachable from the initial state and at least one
state is accepting?
f. How many distinct languages are accepted by the FAs in the previous
part?
g. Show that the FAs described by these two transition tables are
isomorphic. The states are 1-6 in the first, A-F in the second; the initial
states are 1 and A, respectively; the accepting states in the first FA are 5
and 6, and D and E in the second.

h. Specify a reasonable algorithm for determining whether or not two given


FAs are isomorphic.
CHAPTER 4

Nondeterminism and Kleene's Theorem

4.1 | NONDETERMINISTIC FINITE AUTOMATA


One of our goals in this chapter is to prove that a language is regular if and only
if it can be accepted by a finite automaton (Theorem 3.1). However, examples like
Example 3.16 suggest that finding a finite automaton (FA) corresponding to a given
regular expression can be tedious and unintuitive if we rely only on the techniques we
have developed so far, which involve deciding at each step how much information
about the input string it is necessary to remember. An alternative approach is to
consider a formal device called a nondeterministic finite automaton (NFA), similar
to an FA but with the rules relaxed somewhat. Constructing one of these devices
to correspond to a given regular expression is often much simpler. Furthermore, it
will turn out that NFAs accept exactly the same languages as FAs, and that there is a
straightforward procedure for converting an NFA to an equivalent FA. As a result, not
only will it be easier in many examples to construct an FA by starting with an NFA,
but introducing these more general devices will also help when we get to our proof.

A Simpler Approach to Accepting {11, 110}*{0} | EXAMPLE 4.1


Figure 4.1a shows the FA constructed in Example 3.16, which recognizes the language L corresponding to the regular expression (11 + 110)*0. Now look at the diagram in Figure 4.1b. If we concentrate on the resemblance between this device and an FA, and ignore for the moment the ways in which it fails to satisfy the normal FA rules, we can see that it reflects much more clearly the structure of the regular expression.
For any string x in L, a path through the diagram corresponding to x can be described as follows. It starts at q₀, and for each occurrence of one of the strings 11 or 110 in the portion of x corresponding to (11 + 110)*, it takes the appropriate loop that returns to q₀; when the final 0 is encountered, the path moves to the accepting state q₄. In the other direction, we can also


Figure 4.1 |
A simpler approach to accepting {11, 110}*{0}.

see that any path starting at q₀ and ending at q₄ corresponds to a string matching the regular expression.
Let us now consider the ways in which the diagram in Figure 4.1b differs from that of an FA, and the way in which it should be interpreted if we are to view it as a language-accepting device. There are two apparently different problems. From some states there are not transitions for both input symbols (from state q₄ there are no transitions at all); and from one state, q₀, there is more than one arrow corresponding to the same input.
The way to interpret the first of these features is easy. The absence of an a-transition from state q means that from q there is no input string beginning with a that can result in the device's being in an accepting state. We could create a transition by introducing a "dead" state to which the device could go from q on input a, having the property that once the device gets to that state it can never leave it. However, leaving this transition out makes the picture simpler, and because it would never be executed during any sequence of moves leading to an accepting state, leaving it out does not hurt anything except that it violates the rules.
The second violation of the rules for FAs seems to be more serious. The diagram indicates two transitions from q₀ on input 1. It does not specify an unambiguous action for that state-input combination, and therefore apparently no longer represents a recognition algorithm or a language-recognizing machine.
As we will see shortly, there is an FA that operates in such a way that it simulates the
diagram correctly and accepts the right language. However, even as it stands, we can salvage
much of the intuitive idea of a machine from the diagram, if we are willing to give the machine
an element of nondeterminism. An ordinary finite automaton is deterministic: The moves it
makes while processing an input string are completely determined by the input symbols and the
state it starts in. To the extent that a device is nondeterministic, its behavior is unpredictable.
There may be situations (i.e., state-input combinations) in which it has a choice of possible
moves, and in these cases it selects a move in some unspecified way. One way to describe this
is to say that it guesses a move.
CHAPTER 4 Nondeterminism and Kleene’s Theorem 125

For an ordinary (deterministic) FA M, saying what it means for a string to be accepted


by M is easy: x is accepted if the sequence of moves determined by the input symbols of
x causes M to end up in an accepting state. If M is allowed to guess, we must think more
carefully about what acceptance means. (Should we say that x is accepted if some choice of
moves corresponding to the string x leads to an accepting state, or should we require that every
choice does?) Returning to our example, it is helpful to think about the regular expression
(11 + 110)*0 corresponding to the language we are trying to accept, and the problem of trying
to decide directly whether a string x (for example, 11110110) matches this regular expression.
As we read the input symbols of x, there are several ways we might try to match them to the
regular expression. Some approaches will not work, such as matching the prefix 1111 with
(11 + 110)*. If there is at least one way that does work, we conclude that x corresponds to the
regular expression. (If we make a wrong guess during this matching process, we might fail to
accept a string that is actually in the language; however, wrong guesses will never cause us to
accept a string that is not in the language.)
This analogy suggests the appropriate way of making sense of Figure 4.1b. Instead of asking whether the path corresponding to a string leads to an accepting state (as with an ordinary FA), we ask whether some path corresponding to the string does. In following the diagram, when we are in state q₀ and we receive input symbol 1, we guess which arrow labeled 1 is the appropriate one to follow. Guessing wrong might result in a "no" answer for a string that is actually in the language. This does not invalidate the approach, because for a string in the language there will be at least one sequence of guesses that leads to acceptance, and for a string not in the language no sequence of guesses will cause the string to be accepted.

We are almost ready for a formal definition of the abstract device illustrated by Figure 4.1b. The only change we will need to make to the definition of an FA involves the transition function. As we saw in Example 4.1, for a particular combination of state and input symbol, there may be no states specified to which the device should go, or there may be several. All we have to do in order to accommodate both these cases is to let the value of the transition function be a set of states: possibly the empty set, possibly one with several elements. In other words, our transition function δ will still be defined on Q × Σ but will now have values in 2^Q (the set of subsets of Q). Our interpretation will be that δ(q, a) represents the set of states that the device can legally be in, as a result of being in state q at the previous step and then processing input symbol a.

Definition 4.1 A Nondeterministic Finite Automaton

A nondeterministic finite automaton (NFA) is a 5-tuple M = (Q, Σ, q₀, A, δ), where Q and Σ are finite sets, q₀ ∈ Q, A ⊆ Q, and δ : Q × Σ → 2^Q.

Just as in the case of FAs, it is useful to extend the transition function δ from Q × Σ to the larger set Q × Σ*. For an NFA M, δ* should be defined so that δ*(p, x) is the set of states M can legally be in as a result of starting in state p and processing the symbols in the string x. The recursive formula for FAs was

δ*(p, xa) = δ(δ*(p, x), a)

where p is any state, x is any string in Σ*, and a is any single alphabet symbol. We will give a recursive definition here as well, but just as before, let us start gradually, with what may still seem like a more straightforward approach. As before, we want to say that the only state M can get to from p as a result of processing Λ is p; the only difference now is that we must write δ*(p, Λ) = {p}, rather than δ*(p, Λ) = p.
If x = a₁a₂⋯aₙ, then saying M can be in state q after processing x means that there are states p₀, p₁, p₂, …, pₙ so that p₀ = p, pₙ = q, and

M can get from p₀ (or p) to p₁ by processing a₁;
M can get from p₁ to p₂ by processing a₂;
⋮
M can get from pₙ₋₂ to pₙ₋₁ by processing aₙ₋₁;
M can get from pₙ₋₁ to pₙ (or q) by processing aₙ.

A simpler way to say "M can get from pᵢ₋₁ to pᵢ by processing aᵢ" (or "pᵢ is one of the states to which M can get from pᵢ₋₁ by processing aᵢ") is

pᵢ ∈ δ(pᵢ₋₁, aᵢ)

We may therefore define the function δ* as follows.

Definition 4.2a Nonrecursive Definition of δ* for an NFA

For p ∈ Q and x = a₁a₂⋯aₙ ∈ Σ*, δ*(p, x) is the set of all states q ∈ Q for which there is a sequence of states p = p₀, p₁, …, pₙ = q satisfying

pᵢ ∈ δ(pᵢ₋₁, aᵢ) for each i with 1 ≤ i ≤ n

Although the sequence of statements above was intended specifically to say that pₙ ∈ δ*(p₀, a₁a₂⋯aₙ), there is nothing to stop us from looking at intermediate points along the way and observing that for each i ≥ 1,

pᵢ ∈ δ*(p₀, a₁a₂⋯aᵢ)

In order to obtain a recursive description of δ*, it is helpful in particular to consider i = n − 1 and to let y denote the string a₁a₂⋯aₙ₋₁. Then we have

pₙ₋₁ ∈ δ*(p, y) and pₙ ∈ δ(pₙ₋₁, aₙ)

In other words, if q ∈ δ*(p, yaₙ), then there is a state r (namely, r = pₙ₋₁) in the set δ*(p, y) so that q ∈ δ(r, aₙ). It is also clear that this argument can be reversed: If q ∈ δ(r, aₙ) for some r ∈ δ*(p, y), then we may conclude from the definition that q ∈ δ*(p, yaₙ). The conclusion is the recursive formula we need:

δ*(p, yaₙ) = {q | q ∈ δ(r, aₙ) for some r ∈ δ*(p, y)}

or more concisely,

δ*(p, yaₙ) = ∪_{r ∈ δ*(p, y)} δ(r, aₙ)

Definition 4.2b Recursive Definition of δ* for an NFA

For any p ∈ Q, δ*(p, Λ) = {p}; for any p ∈ Q, y ∈ Σ*, and a ∈ Σ,

δ*(p, ya) = ∪_{r ∈ δ*(p, y)} δ(r, a)
We want the statement that a string x is accepted by an NFA M to mean that


there is a sequence of moves M can make, starting in its initial state and processing
the symbols of x, that will lead to an accepting state. In other words, M accepts x if
the set of states in which M can end up as a result of processing x contains at least
one accepting state.

Definition 4.3 Acceptance by an NFA

A string x ∈ Σ* is accepted by the NFA M = (Q, Σ, q₀, A, δ) if δ*(q₀, x) ∩ A ≠ ∅. The language recognized, or accepted, by M is the set L(M) of all strings accepted by M.

Using the Recursive Definition of δ* in an NFA | EXAMPLE 4.2

Let M = (Q, Σ, q₀, A, δ), where Q = {q₀, q₁, q₂, q₃}, Σ = {0, 1}, A = {q₃}, and δ is given by the following table.

q      δ(q, 0)    δ(q, 1)
q₀     {q₀}       {q₀, q₁}
q₁     {q₂}       {q₂}
q₂     {q₃}       {q₃}
q₃     ∅          ∅


Figure 4.2 |

Then M can be represented by the transition diagram in Figure 4.2. Let us try to determine L(M) by calculating δ*(q₀, x) for a few strings x of increasing length. First observe that from the nonrecursive definition of δ* it is almost obvious that δ and δ* agree for strings of length 1 (see also Exercise 4.3). We see from the table that δ*(q₀, 0) = {q₀} and δ*(q₀, 1) = {q₀, q₁};

δ*(q₀, 11) = ∪_{r ∈ δ*(q₀, 1)} δ(r, 1)    (by definition of δ*)
           = ∪_{r ∈ {q₀, q₁}} δ(r, 1)
           = δ(q₀, 1) ∪ δ(q₁, 1)
           = {q₀, q₁} ∪ {q₂}
           = {q₀, q₁, q₂}

δ*(q₀, 01) = ∪_{r ∈ δ*(q₀, 0)} δ(r, 1)
           = ∪_{r ∈ {q₀}} δ(r, 1)
           = δ(q₀, 1)
           = {q₀, q₁}

δ*(q₀, 111) = ∪_{r ∈ δ*(q₀, 11)} δ(r, 1)
            = δ(q₀, 1) ∪ δ(q₁, 1) ∪ δ(q₂, 1)
            = {q₀, q₁, q₂, q₃}

δ*(q₀, 011) = ∪_{r ∈ δ*(q₀, 01)} δ(r, 1)
            = δ(q₀, 1) ∪ δ(q₁, 1)
            = {q₀, q₁, q₂}

We observe that 111 is accepted by M and 011 is not. You can see if you study the diagram in Figure 4.2 that δ*(q₀, x) contains q₁ if and only if x ends with 1; that for any y with |y| = 2, δ*(q₀, xy) contains q₃ if and only if x ends with 1; and, therefore, that the language recognized by M is

L(M) = {0, 1}*{1}{0, 1}²



This is the language we called L₃ in Example 3.17, the set of strings with length at least 3 having a 1 in the third position from the end. By taking the diagram in Figure 4.2 as a model, you can easily construct for any n ≥ 1 an NFA with n + 1 states that recognizes Lₙ. Since we showed in Example 3.17 that any ordinary FA accepting Lₙ needs at least 2ⁿ states, it is now clear that an NFA recognizing a language may have considerably fewer states than the simplest FA recognizing the language.
The formulas above may help to convince you that the recursive definition of δ* is a workable tool for investigating the NFA, and in particular for testing a string for membership in the language L(M). Another approach that provides an effective way of visualizing the behavior of the NFA as it processes a string is to draw a computation tree for that string. This is just a tree diagram tracing the choices the machine has at each step, and the paths through the states corresponding to the possible sequences of moves. Figure 4.3 shows the tree for the NFA above and the input string 101101.
Starting in q₀, M can go to either q₀ or q₁ using the first symbol of the string. In either case, there is only one choice at the next step, with the symbol 0. This tree shows several types of paths: those that end prematurely (with fewer than six moves), because M arrives in a state from which there is no move corresponding to the next input symbol; those that contain six

Figure 4.3 |
A computation tree for the NFA in Figure 4.2, as it processes
101101.

moves and end in a nonaccepting state; and one path of length 6 that ends in the accepting state
and therefore shows the string to be accepted.
It is easy to read off from the diagram the sets δ*(q₀, y) for prefixes y of x. For example, δ*(q₀, 101) = {q₀, q₁, q₃}, because these three states are the ones that appear at that level of the tree. Deciding whether x is accepted is simply a matter of checking whether any accepting states appear in the tree at level |x| (assuming that the root of the tree, the initial state, is at level 0).

We now want to show that although it may be easier to construct an NFA accepting
a given language than to construct an FA, nondeterministic finite automata as a group
are no more powerful than FAs: Any language that can be accepted by an NFA can
also be recognized by a (possibly more complicated) FA.
We have discussed the fact that a (deterministic) finite automaton can be interpreted as an algorithm for recognizing a language. Although an NFA might not
represent an algorithm directly, there are certainly algorithms that can determine for
any string x whether or not there is a sequence of moves that corresponds to the
symbols of x and leads to an accepting state. Looking at the tree diagram in Figure 4.3, for example, suggests some sort of tree traversal algorithm, of which there
are several standard ones. A depth-first traversal corresponds to checking the possible
paths sequentially, following each path until it stops, and then seeing if it stops in an
accepting state after the correct number of steps. A breadth-first, or level-by-level,
traversal of the tree corresponds in some sense to testing the paths in parallel.
The question we are interested in, however, is not whether there is an algorithm for
recognizing the language, but whether there is one that corresponds to a (deterministic)
finite automaton. We will show that there is by looking carefully at the definition of
an NFA, which contains a potential mechanism for eliminating the nondeterminism
directly.
The definition of a finite automaton, either deterministic or nondeterministic,
involves the idea of a state. The nondeterminism present in an NFA appears whenever
there are state-input combinations for which there is not exactly one resulting state.
In a sense, however, the nondeterminism in an NFA is only apparent, and arises from
the notion of state that we start with. We will be able to transform the NFA into
an FA by redefining state so that for each combination of state and input symbol,
exactly one state results. The way to do this is already suggested by the definition
of the transition function of an NFA. Corresponding to a particular state-input pair,
we needed the function to have a single value, and the way we accomplished this
was to make the value a set. All we have to do now is carry this idea a little further
and consider a state in our FA to be a subset of Q, rather than a single element of Q.
(There is a partial precedent for this in the proof of Theorem 3.4, where we considered
states that were pairs of elements of Q.) Then corresponding to the “state” S and the
input symbol a (i.e., to the set of all the pairs (p, a) for which p € S), there is exactly
one “state” that results: the union of all the sets 8(p, a) for p € S. All of a sudden,
the nondeterminism has disappeared! Furthermore, the resulting machine simulates
the original device in an obvious way, provided that we define the initial and final
states correctly.
CHAPTER 4 Nondeterminism and Kleene’s Theorem 131

This technique is important enough to have acquired a name, the subset con-
struction: States in the FA are subsets of the state set of the NFA.

It is important to realize that the proof of the theorem provides an algorithm (the
subset construction) for removing the nondeterminism from an NFA. Let us illustrate
the algorithm by returning to the NFA in Example 4.2.

Applying the Subset Construction to Convert an NFA to an FA | EXAMPLE 4.3


The table in Example 4.2 shows the transition function of the NFA shown in Figure 4.2. The subset construction could produce an FA with as many as 16 states, since Q has 16 subsets. (In general, if M has n states, M₁ may have as many as 2ⁿ, the number of subsets of a set with n elements.) However, we can get by with fewer if we follow the same approach as in Example 3.18 and use only those states that are reachable from the initial state. The transition function in our FA will be δ₁, and we proceed to calculate some of its values. Each time a new state (i.e., a new subset) S appears in our calculation, we must subsequently include the calculation of δ₁(S, 0) and δ₁(S, 1):

δ₁({q₀}, 0) = {q₀}
δ₁({q₀}, 1) = {q₀, q₁}
δ₁({q₀, q₁}, 0) = δ({q₀}, 0) ∪ δ({q₁}, 0) = {q₀} ∪ {q₂} = {q₀, q₂}
δ₁({q₀, q₁}, 1) = δ({q₀}, 1) ∪ δ({q₁}, 1) = {q₀, q₁} ∪ {q₂} = {q₀, q₁, q₂}
δ₁({q₀, q₂}, 0) = δ({q₀}, 0) ∪ δ({q₂}, 0) = {q₀} ∪ {q₃} = {q₀, q₃}
δ₁({q₀, q₂}, 1) = {q₀, q₁} ∪ {q₃} = {q₀, q₁, q₃}
It turns out that in the course of the calculation, eight distinct states (i.e., sets) arise. We
knew already from the discussion in Example 3.17 that at least this many would be necessary.


Figure 4.4 |
The subset construction applied to the NFA in Figure 4.2.

Therefore, although we should not expect this to happen in general, the calculation in this case
produces the FA with the fewest possible states recognizing the desired language. It is shown
in Figure 4.4.

Another Example Illustrating the Subset Construction | EXAMPLE 4.4
We close this section by returning to the first NFA we looked at, the one shown in Figure 4.1b. In fact, the FA produced by our algorithm is the one in Figure 4.1a; let us see how we obtain it.
If we carry out the calculations analogous to those in the previous example, the distinct states of the FA that are necessary, in the order they appear, turn out to be {q₀}, {q₄}, {q₁, q₂}, ∅, {q₀, q₃}, and {q₀, q₄}. The resulting transition table follows:

S            δ₁(S, 0)     δ₁(S, 1)
{q₀}         {q₄}         {q₁, q₂}
{q₄}         ∅            ∅
{q₁, q₂}     ∅            {q₀, q₃}
∅            ∅            ∅
{q₀, q₃}     {q₀, q₄}     {q₁, q₂}
{q₀, q₄}     {q₄}         {q₁, q₂}

You can recognize this as the FA shown in Figure 4.1a by substituting p for {q₄}, r for {q₁, q₂}, s for ∅, t for {q₀, q₃}, and u for {q₀, q₄}. (Strictly speaking, you should also substitute q₀ for {q₀}.)

4.2 | NONDETERMINISTIC FINITE AUTOMATA WITH Λ-TRANSITIONS
One further modification of finite automata will be helpful in our upcoming proof
that regular languages are the same as those accepted by FAs. We introduce the new
devices by an example that suggests some of the ways they will be used in the proof.

How a Device More General Than an NFA Can Be Useful | EXAMPLE 4.5


Figures 4.5a and 4.5b show simple NFAs M₁ and M₂ accepting the two languages L₁ = {0}{1}* and L₂ = {0}*{1}, respectively, over the alphabet {0, 1}.
We consider two ways of combining these languages, using the operations of concatenation
and union (two of the three operations involved in the definition of regular languages), and in
each case we will try to incorporate the existing NFAs in a composite device M accepting the
language we are interested in. Ideally, the structure of M₁ and M₂ will be preserved more or
less intact in the resulting machine, and the way we combine the two NFAs will depend only
on the operation being used to combine the two languages.
First we try the language L₁L₂ corresponding to the regular expression 01*0*1. In this case, M should in some sense start out the way M₁ does and finish up the way M₂ does. The

Figure 4.5 |
(a) An NFA accepting {0}{1}*; (b) An NFA accepting {0}*{1}.

question is just how to connect the two. The string 0 takes M₁ from q₀ to q₁; since 0 is not itself an element of L₁L₂, we do not expect the state q₁ to be an accepting state in M. We might consider the possible ways to interpret an input symbol once we have reached the state q₁. At that point, a 0 (if it is part of a string that will be accepted) can only be part of the 0* term from the second language. This might suggest connecting q₁ to p₀ by an arrow labeled 0; this cannot be the only connection, because a string in L does not require a 0 at this point. The symbol 1 in state q₁ could be either part of the 1* term from the first language, which suggests an arrow from q₁ back to itself labeled 1, or the 1 corresponding to the second language, which suggests an arrow from q₁ to p₁ labeled 1. These three connecting arrows turn out to be enough, and the resulting NFA is shown in Figure 4.6a.
We introduced a procedure in Chapter 3 to take care of the union of two languages provided we have an FA for each one, but it is not hard to see that the method is not always satisfactory if we have NFAs instead (Exercise 4.12). One possible NFA to handle L₁ ∪ L₂ is shown in Figure 4.6b. This time it may not be obvious whether making q₀ the initial state, rather than p₀, is a natural choice or simply an arbitrary one. The label 1 on the connecting arrow from q₀ to p₁ is the 1 in the regular expression 0*1, and the 0 on the arrow from q₀ to p₀ is part of the 0* in the same regular expression. The NFA accepts the language corresponding to the regular expression 01* + 1 + 00*1, which can be simplified to 01* + 0*1.
For both L₁L₂ and L₁ ∪ L₂, we have found relatively simple composite NFAs, and it is possible in both cases to see how the two original diagrams are incorporated into the result. However, in both cases the structure of the original diagrams has been obscured somewhat in the process of combining them, because the extra arrows used to connect the two depend on the regular expressions being combined, not just on the combining operation. We do not yet seem to have a general method that will work for two arbitrary NFAs.
One way to solve this problem, it turns out, is to combine the two NFAs so as to create a nondeterministic device with even a little more freedom in guessing. In the case of concatenation, for example, we will allow the new device to guess while in state q₁ that it will receive no more inputs that are to be matched with 1*. In making this guess it commits itself to proceeding to the second part of the regular expression 01*0*1; it will be able to make this guess, however, without any reference to the actual structure of the second part, in particular without receiving any input (or, what amounts to the same thing, receiving only the null string Λ as input). The resulting diagram is shown in Figure 4.7a.
In the second case, we provide the NFA with an initial state that has nothing to do with either M₁ or M₂, and we allow it to guess before it has received any input whether it will be

Figure 4.6 |
(a) An NFA accepting {0}{1}*{0}*{1}; (b) An NFA accepting {0}{1}* ∪ {0}*{1}.

Figure 4.7 |
(a) An NFA-Λ accepting {0}{1}*{0}*{1}; (b) An NFA-Λ accepting {0}{1}* ∪ {0}*{1}.

looking for an input string in L₁ or one in L₂. Making this guess requires only Λ as input and allows the machine to begin operating exactly like M₁ or M₂, whichever is appropriate. The result is shown in Figure 4.7b.
To summarize, the devices in Figure 4.7 are more general than NFAs in that they are
allowed to make transitions, not only on input symbols from the alphabet, but also on null
inputs.

Definition 4.4 A Nondeterministic Finite Automaton with Λ-Transitions

A nondeterministic finite automaton with Λ-transitions (abbreviated NFA-Λ) is a 5-tuple M = (Q, Σ, q₀, A, δ), where Q and Σ are finite sets, q₀ ∈ Q, A ⊆ Q, and

δ : Q × (Σ ∪ {Λ}) → 2^Q

Figure 4.8 |

As before, we need to define an extended function δ* in order to give a precise definition of acceptance of a string by an NFA-Λ. The idea is still that δ*(q, x) will be the set of all states in which the NFA-Λ can legally end up as a result of starting in state q and processing the symbols in x. However, there is now a further complication, since "processing the symbols in x" allows for the possibility of Λ-transitions interspersed among ordinary transitions. Figure 4.8 illustrates this. We want to say that the string 01 is accepted, since 0ΛΛ1Λ = 01 and the path corresponding to these five inputs leads from q₀ to an accepting state.
Again we start with a nonrecursive definition, which is a straightforward if not
especially elegant adaptation of Definition 4.2a.

Definition 4.5a Nonrecursive Definition of δ* for an NFA-Λ

For any p ∈ Q and x ∈ Σ*, δ*(p, x) is the set of all states that the NFA-Λ can reach from p by some sequence of transitions corresponding to x, in which Λ-transitions may appear before, between, and after the transitions on the symbols of x.

For example, in Figure 4.8 there is a sequence of transitions corresponding to 01 by which the device moves from q₀ to f; there is a sequence of transitions corresponding to the string Λ by which it moves from r to t; and there is also a sequence of transitions corresponding to Λ by which it moves from any state to itself (namely, the empty sequence).
Coming up with a recursive definition of δ* is not as easy this time. The recursive part of the definition will still involve defining δ* for a string with one extra alphabet symbol a. However, if we denote by S the set of states that the device may be in before the a is processed, we obtain the new set by allowing all possible transitions from elements of S on the input a, as well as all subsequent Λ-transitions. This suggests that we also want to modify the basis part of the definition, so that the set δ*(q, Λ) contains not only q but any other states that the NFA-Λ can reach from q by using Λ-transitions. Both these modifications can be described in terms of the Λ-closure of a set S of states. This set, which we define recursively in Definition 4.6, is to be the set of all states that can be reached from elements of S by using Λ-transitions.

Definition 4.6 The Λ-Closure of a Set of States

Let M = (Q, Σ, q₀, A, δ) be an NFA-Λ, and let S ⊆ Q. The Λ-closure of S is the set Λ(S) defined as follows:
(i) every element of S is an element of Λ(S);
(ii) for any q ∈ Λ(S), every element of δ(q, Λ) is an element of Λ(S);
(iii) no other states are in Λ(S).

We know in advance that the set Λ(S) is finite. As a result, we can translate the recursive definition into an algorithm for calculating Λ(S) (Exercise 2.70).

Algorithm to Calculate Λ(S)  Start with T = S. Make a sequence of passes, in each pass considering every q ∈ T and adding to T all elements of δ(q, Λ) that are not already elements of T. Stop after any pass in which T does not change. The set Λ(S) is the final value of T.
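The algorithm translates directly into code. In the sketch below (ours), the transition function of an NFA-Λ is a dictionary, and Λ-transitions are stored under a marker key; these representational details are assumptions, not part of the text.

    LAMBDA = ""   # our marker for a null transition

    def l_closure(delta, S):
        # The algorithm above: add Lambda-successors until a pass changes nothing.
        T = set(S)
        changed = True
        while changed:
            changed = False
            for q in list(T):
                for r in delta.get((q, LAMBDA), set()):
                    if r not in T:
                        T.add(r)
                        changed = True
        return T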

The Λ-closure of a set is the extra ingredient we need to define the function δ* recursively. If δ*(q, y) is the set of all the states that can be reached from q using the symbols of y as well as Λ-transitions, then

∪_{r ∈ δ*(q, y)} δ(r, a)

is the set of states we can reach in one more step by using the symbol a, and the Λ-closure of this set includes any additional states that we can reach with subsequent Λ-transitions.

Definition 4.5b Recursive Definition of δ* for an NFA-Λ

Let M = (Q, Σ, q₀, A, δ) be an NFA-Λ. The extended transition function δ* : Q × Σ* → 2^Q is defined as follows:
(i) For any q ∈ Q, δ*(q, Λ) = Λ({q}).
(ii) For any q ∈ Q, y ∈ Σ*, and a ∈ Σ,

δ*(q, ya) = Λ(∪_{r ∈ δ*(q, y)} δ(r, a))

A string x ∈ Σ* is accepted by M if δ*(q₀, x) ∩ A ≠ ∅. The language recognized by M is the set L(M) of all strings accepted by M.

Applying the Definitions of Λ(S) and δ* | EXAMPLE 4.6
We consider the NFA-Λ shown in Figure 4.9, first as an illustration of how to apply the algorithm for computing Λ(S), and then to demonstrate that Definition 4.5b really does make it possible to calculate δ*(q, x).
Suppose we consider S = {s}. After one iteration, T is {s, w}; after two iterations it is {s, w, q₀}; after three iterations it is {s, w, q₀, p, t}; and in the next iteration it is unchanged. Λ({s}) is therefore {s, w, q₀, p, t}.
Let us calculate δ*(q₀, 010) using the recursive definition. This set is defined in terms of δ*(q₀, 01), which is defined in terms of δ*(q₀, 0), and so on. We therefore approach the calculation from the bottom up, calculating δ*(q₀, Λ), then δ*(q₀, 0), δ*(q₀, 01), and finally δ*(q₀, 010):

δ*(q₀, Λ) = Λ({q₀})
          = {q₀, p, t}

δ*(q₀, 0) = Λ(∪_{r ∈ δ*(q₀, Λ)} δ(r, 0))
          = Λ(δ(q₀, 0) ∪ δ(p, 0) ∪ δ(t, 0))
          = Λ(∅ ∪ {p} ∪ {u})
          = Λ({p, u})
          = {p, u}

δ*(q₀, 01) = Λ(∪_{r ∈ δ*(q₀, 0)} δ(r, 1))
           = Λ(δ(p, 1) ∪ δ(u, 1))
           = Λ({r})
           = {r}

δ*(q₀, 010) = Λ(∪_{r ∈ δ*(q₀, 01)} δ(r, 0))
            = Λ(δ(r, 0))
            = Λ({s})
            = {s, w, q₀, p, t}

Figure 4.9 |
The NFA-Λ for Example 4.6.

(The last equality is the observation we made near the beginning of this example.) Because δ*(q₀, 010) contains w, and because w is an element of A, 010 is accepted.
Looking at the figure, you might argue along this line instead: The string Λ010Λ is the same as 010; the picture shows the sequence

     Λ    0    1    0    Λ
q₀ → p → p → r → s → w

of transitions; therefore 010 is accepted. In an example as simple as this one, going through the detailed calculations is not necessary in order to decide whether a string is accepted. The point, however, is that with the recursive definitions of Λ(S) and δ*, we can proceed on a solid algorithmic basis and be confident that the calculations (which are indeed feasible) will produce the correct answer.
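Definition 4.5b is just as mechanical. The following sketch (ours) encodes only the transitions of Figure 4.9 that the computations above actually use; this is our partial reading of the figure, and in particular, attributing the 1-transition into r to the state p (as the path just shown suggests) rather than to u is an assumption.

    LAMBDA = ""   # marker for a null transition, as before

    # Our partial reading of Figure 4.9 (an assumption).
    delta = {("s", LAMBDA): {"w"}, ("w", LAMBDA): {"q0"}, ("q0", LAMBDA): {"p", "t"},
             ("p", "0"): {"p"}, ("t", "0"): {"u"},
             ("p", "1"): {"r"}, ("r", "0"): {"s"}}

    def l_closure(delta, S):
        T, changed = set(S), True
        while changed:
            changed = False
            for q in list(T):
                for r in delta.get((q, LAMBDA), set()):
                    if r not in T:
                        T.add(r)
                        changed = True
        return T

    def delta_star(delta, q, x):
        # Definition 4.5b: delta*(q, Lambda) = L({q});
        # delta*(q, ya) = L(union of delta(r, a) over r in delta*(q, y))
        S = l_closure(delta, {q})
        for a in x:
            S = l_closure(delta, set().union(*(delta.get((r, a), set()) for r in S)))
        return S

    print(delta_star(delta, "q0", "010"))   # {'s', 'w', 'q0', 'p', 't'}, as computed above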

In Section 4.1, we showed (Theorem 4.1) that NFAs are no more powerful than FAs with regard to the languages they can accept. In order to establish the same result for NFA-Λs, it is sufficient to show that any NFA-Λ can be replaced by an equivalent NFA; the notation we have developed now allows us to prove this.
Theorem 4.2
If the language L ⊆ Σ* is recognized by the NFA-Λ M = (Q, Σ, q₀, A, δ), then L is recognized by an NFA M₁ = (Q, Σ, q₀, A₁, δ₁) having the same set of states.

Proof
For every state p and every input symbol a ∈ Σ, let δ₁(p, a) = δ*(p, a), the set of states reachable from p using the symbol a together with any number of Λ-transitions. M₁ is still allowed to be nondeterministic; we have eliminated the Λ-transitions, without changing the states, by simply enlarging the sets of states reachable on ordinary input symbols. If Λ({q₀}) ∩ A = ∅, we let A₁ = A; in the other case, when Λ({q₀}) ∩ A ≠ ∅, we let A₁ = A ∪ {q₀}, so that this time Λ is accepted by both M and M₁. It can then be shown by structural induction that δ₁*(q₀, x) = δ*(q₀, x) for every x with |x| ≥ 1, and it follows that M₁ and M recognize the same language.
We think of NFAs as generalizations of FAs, and of NFA-Λs as generalizations of NFAs. It is technically not quite correct to say that every FA is an NFA, since the values of the transition function are states in one case and sets of states in the other; similarly, the domain of the transition function in an NFA is Q × Σ, and in an NFA-Λ it is Q × (Σ ∪ {Λ}). Practically speaking, we can ignore these technicalities. The following theorem formalizes this assertion and ties together the results of Theorems 4.1 and 4.2.

Theorem 4.3
For any language L ⊆ Σ*, the following three statements are equivalent: L can be accepted by an FA; L can be accepted by an NFA; L can be accepted by an NFA-Λ.

Just as in the case of Theorem 4.1, the proof of Theorem 4.2 provides us with an algorithm for eliminating Λ-transitions from an NFA-Λ. We illustrate the algorithm in two examples, in which we can also practice the algorithm for eliminating nondeterminism.

Converting an NFA-Λ to an NFA | EXAMPLE 4.7
Let M be the NFA-Λ pictured in Figure 4.10a, which accepts the language {0}*{01}*{0}*. We show in tabular form the values of the transition function δ, as well as the values δ*(q, 0) and δ*(q, 1) that give us the values of the transition function in the resulting NFA.

q    δ(q, 0)   δ(q, 1)   δ(q, Λ)   δ*(q, 0)        δ*(q, 1)
A    {A}       ∅         {B}       {A, B, C, D}    ∅
B    {C}       ∅         {D}       {C, D}          ∅
C    ∅         {B}       ∅         ∅               {B, D}
D    {D}       ∅         ∅         {D}             ∅


Figure 4.10 |
An NFA-Λ, an NFA, and an FA for {0}*{01}*{0}*.

For example, δ*(A, 0) is calculated using the formula

δ*(A, 0) = Λ(∪_{r ∈ Λ({A})} δ(r, 0))

In a more involved example we might feel more comfortable carrying out each step literally: calculating Λ({A}), finding δ(r, 0) for each r in this set, forming the union, and calculating the Λ-closure of the result. In this simple example, we can see that from A with input 0, M can stay in A, move to B (using a 0 followed by a Λ-transition), move to C (using a Λ-transition and then a 0), or move to D (using the 0 either immediately preceded or immediately followed by two Λ-transitions). The other entries in the last two columns of the table can be obtained similarly. Since M can move from the initial state to D using only Λ-transitions, A must be an accepting state in M₁, which is shown in Figure 4.10b.
Having the values δ*(q, 0) and δ*(q, 1) in tabular form is useful in arriving at an FA. For example (if we denote the transition function of the FA by δ₂), in order to compute δ₂({C, D}, 0) we simply form the union of the sets in the third and fourth rows of the δ*(q, 0) column of the table; the result is {D}. It turns out in this example that the sets of states that came up as we filled in the table for the NFA were all we needed, and the resulting FA is shown in Figure 4.10c.

Another Example of Converting an NFA-Λ to an NFA | EXAMPLE 4.8

For our last example we consider the NFA-Λ in Figure 4.11a, recognizing the language {0}*({01}*{1} ∪ {1}*{0}). Again we show the transition function in tabular form, as well as the transition function for the resulting NFA.

q    δ(q, 0)   δ(q, 1)   δ(q, Λ)   δ*(q, 0)             δ*(q, 1)
A    {A}       ∅         {B, D}    {A, B, C, D, E}      {D, E}
B    {C}       {E}       ∅         {C}                  {E}
C    ∅         {B}       ∅         ∅                    {B}
D    {E}       {D}       ∅         {E}                  {D}
E    ∅         ∅         ∅         ∅                    ∅

The NFA we end up with is shown in Figure 4.11b. Note that the initial state A is not an accepting state, since in the original device an accepting state cannot be reached with only Λ-transitions.
Unlike the previous example, when we start to eliminate the nondeterminism from our NFA, new sets of states are introduced in addition to the ones shown. For example,

δ₂({A, B, C, D, E}, 1) = {D, E} ∪ {E} ∪ {B} ∪ {D} ∪ ∅ = {B, D, E}
δ₂({B, D, E}, 0) = {C} ∪ {E} ∪ ∅ = {C, E}

The calculations are straightforward, and the result is shown in Figure 4.11c.

Figure 4.11 |
An NFA-Λ, an NFA, and an FA for {0}*({01}*{1} ∪ {1}*{0}).

4.3 | KLEENE'S THEOREM
Sections 4.1 and 4.2 of this chapter have provided the tools we need to prove Theo-
rem 3.1. For convenience we have stated the two parts, the “if” and the “only if,” as
separate results.

Theorem 4.4 Kleene's Theorem, Part 1
Any regular language can be accepted by a finite automaton.

Proof
Because of Theorem 4.3, it is sufficient to show that every regular language can be accepted by an NFA-Λ. The set of regular languages over the alphabet Σ is defined recursively (Definition 3.1), and so structural induction is the natural approach: we show that the basic languages ∅, {Λ}, and {a} (for a ∈ Σ) can be accepted by NFA-Λs, and that if L₁ and L₂ are accepted by the NFA-Λs M₁ = (Q₁, Σ, q₁, A₁, δ₁) and M₂ = (Q₂, Σ, q₂, A₂, δ₂), respectively, then L₁ ∪ L₂, L₁L₂, and L₁* can also be accepted by NFA-Λs. Pictures will be helpful here also; NFA-Λs for the three basic languages appear in Figure 4.12, and schematic diagrams illustrating the general idea in each of the three combining constructions are shown in Figure 4.13. We may assume, renaming states if necessary, that Q₁ ∩ Q₂ = ∅.
For the union, the new machine Mu begins with a Λ-transition that amounts to guessing whether the input is a string in L₁ or one in L₂, and thereafter makes exactly the same moves that M₁ or M₂ would. Formally, we define Mu = (Qu, Σ, qu, Au, δu), where qu is a new state, Qu = Q₁ ∪ Q₂ ∪ {qu}, Au = A₁ ∪ A₂,

δu(qu, Λ) = {q₁, q₂}    δu(qu, a) = ∅ for every a ∈ Σ

and for each q ∈ Q₁ ∪ Q₂ and a ∈ Σ ∪ {Λ}, δu(q, a) = δ₁(q, a) if q ∈ Q₁ and δu(q, a) = δ₂(q, a) if q ∈ Q₂.
For either value of i, if x ∈ Lᵢ, then Mu can accept x by moving to qᵢ on a Λ-transition and then executing the moves that would cause Mᵢ to accept x. On the other hand, if x is accepted by Mu, there is a sequence of transitions corresponding to x, starting at qu and ending at an element of A₁ or A₂. The first of these transitions must be a Λ-transition from qu to either q₁ or q₂, since there are no other transitions from qu. Thereafter, since Q₁ ∩ Q₂ = ∅, the remaining transitions are all transitions of the same machine Mᵢ, and so x ∈ L₁ ∪ L₂. Therefore Mu accepts L₁ ∪ L₂.
CHAPTER 4 Nondeterminism and Kleene’s Theorem 147

0 +O -O-O
(a) (b) ()
Figure 4.12 |
NFA-As for the three basic regular languages.

(d)

Figure 4.13 |
NFA- As for union, concatenation, and Kleene *.
148 PART 2 Regular Languages and Finite Automata

then process x2 the way M> would, so ‘thatxix is accepted. Conversely, —


if x is accepted by M,, there is a sequence of transitions corresponding _
to x that begins at g; and ends at an element of A>. One of them must — :
therefore be from an element of Q to an element of Q2, and according o.
the definition of 5, this can only be a A-transition from an element of A, to
qo. Because QM Q> = Y, all the previous transitions are between elem
ofos ae all therenee: ones are between elements of 2. I

ae LZ, dk» Ak, 9x). Let qx bea


==01 Ulan} and Ak = (ae).|
Once Hee

ot
5x(qe, A) = {qi} and 8; (qx,
a) = O fora € &.
For g € Q; anda € DU {A}, 54, a= 51(q. a) unless g€ A, and
a= A.
Forg€ Ai, &(q, A) =81(g, A |
Suppose x € L*. If x = A, then. y x isaccepted by Mx. Otherwise,
for some m = 1, x = X\X2---X%,, wherex, € L; for each i. M, can
move from gx to gi by a A-transition; for each i, My moves from q; to an-
element f; of A; by a sequence of transitions corresponding to x;; and for
each i, M; then moves from fj back to q, by a A-transition. It follows that
ee Soo = x is accepted by M,. On the other |

ca
; ‘eanbe dee y iy in the ion
x = (Ax A)(AxA)-- “(Atm A)
. where, for each i, there is a sequence of transitions co:4
qi to an element of A. Therefore, x € L .
Since we have constructed an NFA-A rec igL in each ofthe three
cases, the proof is complete. |

The constructions in the proof of Theorem 4.4 provide an algorithm for con-
structing an NFA-A corresponding to a given regular expression. The next example
illustrates its application, as well as the fact that there may be simplifications possible
along the way.
CHAPTER 4 Nondeterminism and Kleene’s Theorem 149

Applying the Algorithm in the Proof of Theorem 4.4 | EXAMPLE 4.9


4.9 |
Let r be the regular expression (00 + 1)*(10)*. We illustrate first the literal application of the
algorithm, ignoring possible shortcuts. The primitive (zero-operation) regular expressions ap-
pearing in r are shown in Figure 4.14a. The NFA-As corresponding to 00 and 10 are now
constructed using concatenation and are shown in Figure 4.14b. Next, we form the NFA-A
corresponding to (00 + 1), as in Figure 4.14c. Figures 4.14d and 4.14e illustrate the NFA-As
corresponding to (00 + 1)* and (10)*, respectively. Finally, the resulting NFA-A formed by
concatenation is shown in Figure 4.14 f.
It is probably clear in several places that there are states and A-transitions called for in
the general construction that are not necessary in this example. The six parts of Figure 4.15

Figure 4.14 |
Constructing an NFA-A for (00 + Dado
150 PART 2 Regular Languages and Finite Automata

(c) (d)

Figure 4.15 |
A simplified NFA-A for (00 + 1)*(10)*.

parallel those of Figure 4.14 and incorporate some obvious simplifications. One must be a
little careful with simplifications such as these, as Exercises 4.32 to 4.34 illustrate.
We know from Sections 4.1 and 4.2 of this chapter how to convert an NFA-A obtained
from Theorem 4.4 into an FA. Although we have not yet officially considered the question of
simplifying a given FA as much as possible, we will see how to do this in Chapter 5.

If we have NFA-As M, and M), the proof of Theorem 4.4 provides us with
algorithms for constructing new NFA-As to recognize the union, concatenation, and
Kleene * of the corresponding languages. The first two of these algorithms were
illustrated in Example 4.5. As a further example, we start with the FAs shown in
Figures 4.16a and 4.16b (which were shown in Examples 3.12 and 3.13 to accept
the languages {0, 1}*{10} and {00}*{11}*, respectively). We can apply the algorithms
for union and Kleene *, making one simplification in the second step, to obtain the
CHAPTER 4 Nondeterminism and Kleene’s Theorem 151

Figure 4.16 |
An NFA-A for ((0 + 1)*10 + (00)*(11)*)*.

NFA-A in Figure 4.16c recognizing the language

({O, 1}*{10} U {O0}*{11}*)*


152 PART 2 Regular Languages and Finite Automata
CHAPTER 4 Nondeterminism and Kleene’s Theorem 153

SR ey
yy GOS
qu

Figure 4.17 |
154 PART 2 Regular Languages and Finite Automata

The only property of L(p, q, 0) that is needed in the proof of Theorem 4.5 is that
it is finite. It is simple enough, however, to find an explicit formula. This formula
and the others used in the proof are given below and provide a summary of the steps
in the algorithm provided by the proof for finding a regular expression corresponding
to a given FA:
{a € X|d(p,a) = 4} ifp #4q
Lp, ¢3.0)=
{aeX|d(p,a)=p}U{A} ifp=q
Lip,¢g,k+1)=Lp,¢,k) OL ka 1, DLA +1R-+ 1k) ER 195k)
L(p,q) = L(p,q,n)
L=|)[email protected])
qeA

‘>a Ieee Applying the Algorithm in the Proof of Theorem 4.5


Let M = (Q, &, qo, A, 5) be the FA pictured in Figure 4.18. In carrying out the algorithm
above for finding a regular expression corresponding to M, we construct tables showing regular
expressions r(p, g, j) corresponding to the languages L(p, q, j) for 0 < j <2.

1(P,2,1) (7,3,

2D
1 a*(ba*)*b a*(ba*)*bb
2 at (ba*)* (a*b)* (atb)*b
at + a*(bat)t a*b(atb)* oN +a*b(atb)*b

Figure 4.18 |
CHAPTER 4 Nondeterminism and Kleene’s Theorem 155

Although many of these table entries can be obtained by inspection, the formula can
be used wherever necessary. In a number of cases, simplifications have been made in the
expressions produced by the formula.
For example,

m3; 1) =r, 3, 0)-+- 75.1, 0rd, 1, 0)*r1, 3, 0)


=
because r(1, 3, 0) = G, and the concatenation of any language with @ is @.

F(3,251)i=7 (3% 2,0). 7G, 1y.0)rGs, 1,,0)*7A, 2, 0)


=b+a(a+A)*b
= Ab+a*b
=r
Ta(aly 2) a7, (Lewlel) pater, (Chee Se) ra Dee al) ears (2yal earl)
=a*+a*b(atb)*at
=a* + a*(bat)*bat
za -t a (ba)
=a (ba:)*

Ova )b—I Oyo) 1-473 (Sr 29n) (2a) 172 al)


= a*h + a*b(atb)*(A +.a'tb) .
= a*b+a*b(atb)*
= a*b(atb)*
The two accepting states are 1 and 2, and so the regular expression r we are looking for
isr(1, 1,3) +r, 2, 3). We use the formula for both, making a few simplifications along the
way:
re tes) == 2) (led, 2)r (Gs 3, 2) 7G; 1s 2)
= a*(ba*)* + a*(ba*)*bb(A + a*b(atb)*b)*
(at + a*(ba*)*)
= a*(ba*)* + a*(ba*)*bb(a*(ba*)*bb)*
(at + a*(ba*)*)
= a*(ba*)* + (a*(bat)*bb)*
(at + a*(ba*)*)
FU 25) =i Gd, 2) 2) eras, Yr Oes,.2) £34252)
= a*(bat)*b + a*(ba*)*bb(a*b(at b)*b)*a*b(a* b)*
= a*(ba*)*b + a*(ba*)*bb(a*
(bat )*bb)*a*b(a*b)*
= a*(bat)*b + (a*(ba*)*bb)* a*(ba*)*b
= (a*(ba*)*bb)*a* (ba*)*b
reese 23)

If you wish, you can almost certainly find ways to simplify this further.
156 PART 2 Regular Languages and Finite Automata

EXERCISES
4.1. In the NFA pictured in Figure 4.19, calculate each of the following.
a-0-0' (DD)
b. 64*(1, bab)
c. 6d*(1, aabb)
d. 6*(1, aabbab)
e. 6*(1, aba)
4.2. An NFA with states 1-5 and input alphabet {a, b} has the following
transition table.

a. Draw a transition diagram.


b. Calculate 6*(1, ab).
c. Calculate 6*(1, abaab).
4.3. Let M = (Q, =, qo, A, 5) be an NFA. Show that for any g € Q and any
aeé x, 6*(q,a) =45G, a).
4.4, Suppose L C &* is a regular language. If every FA accepting L has at least
n States, then every NFA accepting L has at least _ states. (Fill in the
blank, and explain your answer.)
4.5. In Definition 4.2b, 5* is defined recursively in an NFA by first defining
6*(q, A) and then defining 5*(q, ya), where y € &* anda € D. Give an

Figure 4.19 |
CHAPTER 4 Nondeterminism and Kleene’s Theorem 157

acceptable recursive definition in which the recursive part of the definition


defines 5*(ay) instead.
4.6. Give an example of a regular language L containing A that cannot be
accepted by any NFA having only one accepting state, and show that your
answer is correct.
4.7. Can every regular language not containing A be accepted by an NFA having
only one accepting state? Prove your answer.
4.8. (Refer to Exercise 3.29). Let M = (Q, X, qo, A, 5) be an FA.
a. Show that if every state other than go from which no element of A can be
reached is deleted, then what remains is an NFA recognizing the same
language.
b. Show that if all states not reachable from go are deleted and all states
other than go from which no element of A can be reached are deleted,
what remains is an NFA recognizing the same language.
4.9. Let M = (Q, &, go, A, 5) be an NFA, let m be the maximum size of any of
the sets 6(q, a) forg € Q anda € ¥&, and let x be a string of length n over
the input alphabet.
a. What is the maximum number of distinct paths that there might be in the
computation tree corresponding to x?
b. In order to determine whether x is accepted by M, it is sufficient to
replace the complete computation tree by one that is perhaps smaller,
obtained by “pruning” the original one so that no level of the tree
contains more than |Q| nodes (and no level contains more nodes than
there are at that level of the original tree). Explain why this is possible,
and how it might be done.
4.10. In each part of Figure 4.20 is pictured an NFA. Using the subset construction,
draw an FA accepting the same language. Label the final picture so as to
make it clear how it was obtained from the subset construction.
4.11. After the proof of Theorem 3.4, we observed that if M = (Q, &, go, A, 4) is
an FA accepting L, then the FA M’ = (Q, 2, qo, Q — A, 5) accepts L’. Does
this still work if M is an NFA? If so, prove it. If not, explain why, and find a
counterexample.
4.12. As in the previous problem, we consider adapting Theorem 3.4 to the case of
NFAs. For i = 1 and 25 let M; = (O;, Di di» Aj, 6;), and let M = (Q; x Qo,
D, (91, 92), A, 6), where A is defined as in the theorem for each of the three
cases and 6 still needs to be defined. If M, and M2 are FAs, the appropriate
definition of 5 is to use the formula 5(p, q) = (61(p), 42(q)). If M; and M2
are NFAs, let us define 5((p, g), a) = 5\(p, a) x 52(g, a). (This says for
example that if from state p M, can reach either p; or p on input a, and
from state r M> can reach either r; or r2 on input a, then M can reach any of
the four states (71,71), (P1, 12), (P2, 11), (p2, 72) from (p, r) on input a.)
158 PART 2 Regular Languages and Finite Automata

Figure 4.20 |

Do the conclusions of the theorem still hold in this more general


situation? Answer in each of the three cases (union, intersection, difference)
dl

and give reasons for your answer.


4.13. In Figure 4.21 is a transition diagram of an NFA-A. For each string below,
say whether the NFA-A accepts it:
a. aba
b. abab
c. aaabbb
CHAPTER 4 Nondeterminism and Kleene’s Theorem 159

Figure 4.21 |

Figure 4.22 |

4.14. Find a regular expression corresponding to the language recognized by the


NFA-A pictured in Figure 4.21. You should be able to do it without applying
Kleene’s theorem: First find a regular expression describing the most
general way of reaching state 4 the first time, and then find a regular
expression describing the most general way, starting in state 4, of moving to
state 4 the next time.
4.15. For each of the NFA-As shown in Figure 4.22, find a regular expression
corresponding to the language it recognizes.
160 PART 2 Regular Languages and Finite Automata

4.16. A transition table is given for an NFA-A with seven states.

ge
1
2
3
4
5)
6
7

Find:
a. A({2, 3})
b. A({1})
c. A({3, 4})
d= "6" Gaba?)
€. 6 (1, ab)
f. 46*(1, ababa)
4.17. A transition table is given for another NFA-A with seven states.

q_
1
2
3
4
5
6
7

Calculate 6*(1, ba).


4.18. For each of these regular expressions over {0, 1}, draw an NFA-A
recognizing the corresponding language. (You should not need the
construction in Kleene’s theorem to do this.)
a. (0+ 1)*(011 + 01010)(0 + 1)*
b. (0+ 1)(01)*(011)*
c. 010* + 0(01 + 10)*11
4.19. Suppose M is an NFA-A accepting L C *. Describe how to modify M to
obtain an NFA-A recognizing rev(L) = {x” |x € iy
4.20. In each part of Figure 4.23, two NFA-As are illustrated. Decide whether
the
two accept the same language, and give reasons for your answer.
CHAPTER 4 Nondeterminism and Kleene’s Theorem 161

$8 Bad
(a)

;
>
() >/\> >8

Figure 4.23 |

4.21. Let M = (Q, &, qo, A, 5) be an FA, and let M; = (Q, D, qo, A, 5) be the
NFA-A defined in the proof of Theorem 4.3, in which 5,(g, A) = @ and
51(q,a) = {6(q, a)}, for every g € Q anda € &. Give a careful proof that
for every g € O andx € &*, dF (g, x) = {5*(q, x)}. Recall that the two
functions 4* and 6F are defined differently.
4.22. Let M, be the NFA-A obtained from the FA M as in the proof of Theorem
4.3. The transition function 6, of M, is defined so that 6,(g, A) = @ for
every state g. Would defining 6;(q¢, A) = {gq} also work? Give reasons for
your answer.
4.23. Let M = (Q, X, qo, A, 5) be an NFA-A. The proofs of Theorems 4.2 and
4.1 describe a two-step process for obtaining an FA M; = (Q), %, q1, Ai,
6,) that accepts the language L(M). Do it in one step, by defining Qj, q,
A, and 6, directly in terms of M.
4.24. Let M =(Q, , qo, A, 5) be an NFA-A. In the proof of Theorem 4.2, the
' NEFA M, might have more accepting states than M: The initial state go is
made an accepting state if A({go}) 0 A 4 YW. Explain why it is not necessary
to make all the states g for which A({g}) N A 4 @ accepting states in M;.
4.25. Suppose M = (Q, X, go, A, 5) is an NFA-A recognizing a language L. Let
M, be the NFA-A obtained from M by adding A-transitions from each
element of A to go. Describe (in terms of L) the language L(M;).
162 PART 2 Regular Languages and Finite Automata

4.26. Suppose M = (Q, &, qo, A, 5) is an NFA-A recognizing a language L.


a. Describe how to construct an NFA-A M, with no transitions to its initial
state so that M; also recognizes L.
b. Describe how to construct an NFA-A M) with exactly one accepting
state and no transitions from that state so that M> also recognizes L.
4.27. Suppose M is an NFA-A with exactly one accepting state q+ that recognizes
the language L C {0, 1}*. In order to find NFA-As recognizing the
languages {0}*Z and L{0}*, we might try adding 0-transitions from qo to
itself and from q+ to itself, respectively. Draw transition diagrams to show
that neither technique always works.
4.28. In each part of Figure 4.24 is pictured an NFA-A. Use the algorithm
illustrated in Example 4.7 to draw an NFA accepting the same language.
4.29. In each part of Figure 4.25 is pictured an NFA-A. Draw an FA accepting the
same language.
4.30. Give an example (i.e., draw a transition diagram) to illustrate the fact that in
the construction of M, in the proof of Theorem 4.4, the two sets Q; and Q»
must be disjoint.
4.31. Give an example to illustrate the fact that in the construction of M, in the
proof of Theorem 4.4, the two sets Q; and Q» must be disjoint.

Figure 4.24 |
CHAPTER 4 Nondeterminism and Kleene’s Theorem 163

Figure 4.25 |

4.32. In the construction of M,, in the proof of Theorem 4.4, consider this
alternative to the construction described: Instead of a new state g, and
A-transitions from it to q; and q2, make qj the initial state of the new
NFA-A, and create a A-transition from it to gz. Either prove that this works
in general, or give an example in which it fails.
4.33. In the construction of M, in the proof of Theorem 4.4, consider the
simplified case in which M, has only one accepting state. Suppose that we
eliminate the A-transition from the accepting state of M, to q2, and merge
these two states into one. Either show that this would always work in this
case, or give an example in which it fails. »

4.34. In the construction of M;, in the proof of Theorem 4.4, suppose that instead
of adding a new state q;, with A-transitions from it to g; and to it from each
accepting state of Q;, we make q, both the initial state and the accepting
state, and create A-transitions from each accepting state of M, to g,. Either
show that this works in general, or give an example in which it fails.
4.35. In each case below, find an NFA-A recognizing the language corresponding
to the regular expression, by applying literally the algorithm in the chapter.
Do not attempt to simplify the answer.
a. ((ab)*b + ab*)*
b. aa(ba)* + b*aba*
c. (ab+ (aab)*)(aa+a)
4.36. In Figures 4.26a and 4.26b are pictured FAs M, and M), recognizing
languages L, and L», respectively. Draw NFA-As recognizing each of the
following languages, using the constructions in this chapter:
L,L2
L,L\L2
Ly, UL,
Ly
LIULy
L2L*
S
oO.
See L1L2 U (L2L))*
164 PART 2 Regular Languages and Finite Automata

Figure 4.26 |

Figure 4.27 |

4.37. Draw NFAs accepting L;L2 and L211, where L; and L» are as in the
preceding problem. Do this by connecting the two given diagrams directly,
by arrows with appropriate labels.
4.38. Use the algorithm of Theorem 4.5 to find a regular expression corresponding
to each of the FAs shown in Figure 4.27. In each case, if the FA has n states,
construct tables showing L(p, q, j) for each j withO < j <n—1.

MORE CHALLENGING PROBLEMS


4.39. Which of the following, if any, would be a correct substitute for the second
part of Definition 4.5b? Give reasons for your answer.
CHAPTER 4 Nondeterminism and Kleene’s Theorem 165

as 6 (Gg, ay) =A. UJ 6*(r, y)


réd(qg,a)

b. 6*(q, ay) = LU A(6*(r, y))


réd(q,a)

RECS I STIS)
reA(d(qg,a))

dq ferayy— . |") “AC 9))


reA(d(q,a))

4.40. Let M =(Q, X, qo, A, 5) be an NFA-A. This exercise involves properties


of the A-closure of a set S$. Since A(S) is defined recursively
(Definition 4.6), structural induction can be used to show that every state in
A(S) satisfies some property—such as being a member of some other set.
a. Show that if § and T are subsets of Q for which S C T, then
NS) SACL).
Show that for any S C Q, A(A(S)) = A(S).
Show that if $, 7 C Q, then A(S UT) = A(S)U A(T).
Show that if S C Q, then A(S) = Unes A({p}).
coal
Vio
eas Draw a transition diagram to illustrate the fact that A(S T) and
A(S)M A(T) are not always the same. Which is always a subset of the
other?
f. Draw a transition diagram illustrating the fact that A(S’) and A(S)’ are
not always the same. Which is always a subset of the other? Under what
circumstances are they equal?
4.41. Let M =(Q, =, qo, A, 5) be an NFA-A. Aset S C Q is called A-closed if
ACS) = S:
a. Show that the union of two A-closed sets is A-closed.
b. Show that the intersection of two A-closed sets is A-closed.
c. Show that for any subset S of Q, A(S) is the smallest A-closed set of
which S is a subset.
4.42. a. Let M =(Q,%, qo, A, 5) be an NFA. Show that for every g € Q and
every x, y.eru",

AFG) lair)
réd*(q,x)

b. Prove the same formula, this time assuming that M is an NFA-A.


4.43. Suppose M = (Q, B, go, A, 5) is an NFA. We may consider a new NFA
M, = (Q, %, qo, A, 61) obtained by letting 5;(g, a) = 5(q, a) if
|5(q, a)| < 1 and, for every pair (q, a) for which |5(q, a)| > 1, choosing
-one state p € 5(q, a) arbitrarily and letting 5,(g, a) = {p}. Although M; is
166 PART 2 Regular Languages and Finite Automata

not necessarily an FA, it could easily be converted to an FA, because it never


has more than one choice of moves.
If S is the set of all pairs (¢, a) for which |5(g, a)| > 1, and c denotes
some specific sequence of choices, one choice for each element of S, then
we let M; denote the specific NFA that results. What is the relationship
between L(M) and the collection of languages L(M;) obtained by
considering all possible sequences c of choices? Be as precise as you can,
and give reasons for your answer.
4.44, Let M = (Q, %, go, A, 6) be an NFA-A recognizing a language L. Assume
that there are no transitions to go, that A has only one element, g f> and that
there are no transitions from q ,.
a. Let M, be obtained from M by adding A-transitions from go to every
state that is reachable from qo in M. (If p and q are states, g is reachable
from p if there is a string x € &* such that q € 6*(p, x).) Describe (in
terms of L) the language accepted by M.
b. Let M be obtained from M by adding A-transitions to q+ from every
state from which q+ is reachable in M. Describe in terms of L the
language accepted by M).
c. Let M3 be obtained from M by adding both the A-transitions in (a) and
those in (b). Describe the language accepted by M3.
d. Let M, be obtained from M by adding A-transitions from p to g
whenever q is reachable from p in M. Describe the language accepted
by M34.

4.45. In Example 4.5, we started with NFAs M, and M) and incorporated them
into a composite NFA M accepting L(M,) U L(M>) in sucha way that no
new states were required.
a. Show by considering the languages {0}* and {1}* that this is not always
possible. (Each of the two languages can obviously be accepted by a
one-state NFA; show that their union cannot be accepted by a two-state
NFA.)
b. Describe a reasonably general set of circumstances in which this is
possible. (Find a condition that might be satisfied by one or both of the
NFAs that would make it possible.)
4.46. Suppose &; and &» are alphabets, and the function f : pH DEST)
homomorphism; i.e., f (xy) = f (x) f(y) for every x, y € ay.
a. Show that f(A) = A.
b. Show that if L C DF is regular, then f(L) is regular. (f (L) is the set
{y © 2 | Vie. f (x) for some x €.L}.)
c. Show that if L C D3 is regular, then f~!(L) is regular. (f“(L) is the
set {x € Li | f(x) € L}.)
4.47. Suppose M = (Q, %, go, A, 5) is an NFA-A. For two (not necessarily
distinct) states p and g, we define the regular expression e( p,q) as follows:
e(p,q) =l+rj+ro+---+7,, where/ is either A (if 5(p, A) contains q)
CHAPTER 4 Nondeterminism and Kleene’s Theorem 167

or %, and the r;’s are all the elements a of © for which 5(p, a) contains q.
It’s possible for e(p, q) to be 9, if there are no transitions from p to g;
otherwise, e(p, q) represents the “most general” transition from p to q.
If we generalize this by allowing e(p, q) to be an arbitrary regular
expression over 1, we get what is called an expression graph. If p and q are
two states in an expression graph G, and x € &*, we say that x allows G to
move from p to q if there are states po, pi,..., Pm, With po = p and
Pm = 4, SO that x corresponds to the regular expression
€(Po, Pide(P1, P2) +: e(Pn—1, Pn). This allows us to say how G accepts a
string x (x allows G to move from the initial state to an accepting state), and
therefore to talk about the language accepted by G. It is easy to see that in
the special case where G is simply an NFA-A, the two definitions for the
language accepted by G coincide. (See Definition 4.5a.) It is also not hard to
convince yourself, using Theorem 4.4, that for any expression graph G, the
language accepted by G can be accepted by an NFA-A.
We can use the idea of an expression graph to obtain an alternate proof
of Theorem 4.5, as follows. Starting with an FA M accepting L, we may
easily convert it to an NFA-A M, accepting L, so that M, has no transitions
to its initial state go, exactly one accepting state q+ (which is different from
go), and no transitions from q+. The remainder of the proof is to specify a
reduction technique to reduce by one the number of states other than go and
qf, obtaining an equivalent expression graph at each step, until go and qf are
the only states remaining. The regular expression e(qo, q+) then describes
the language accepted. If p is the state to be eliminated, the reduction step
involves redefining e(q,r) for every pair of states g and r other than p.
Describe in more detail how this reduction can be done. Then apply this
technique to the FAs in Figure 4.27 to obtain regular expressions
corresponding to their languages.
C H AP T E R

Regular and Nonregular


Languages

5.1! A CRITERION FOR REGULARITY


Kleene’s theorem (Theorems 4.4 and 4.5) provides a useful characterization of regular
languages: A language is regular (describable by a regular expression) if and only if
it can be accepted by a finite automaton. In other words, a language can be generated
in a simple way, from simple primitive languages, if and only if it can be recognized
in a simple way, by a device with a finite number of states and no auxiliary memory.
There is a construction algorithm associated with each half of the theorem, so that if
we already have a regular expression, we can find an FA to accept the corresponding
language, and if we already have an FA, we can find a regular expression to describe
the language it accepts.
Suppose now that we have a language over the alphabet © specified in some way
that involves neither a regular expression nor an FA. How can we tell whether it is
regular? (What inherent property of a language identifies it as being regular?) And,
if we suspect that the language is regular, how can we find either a regular expression
describing it or an FA accepting it?
We have a partial answer to the first question already. According to Theorem 3.2,
if there are infinitely many strings that are “pairwise distinguishable” with respect to
L, then L cannot be regular. (To say it another way, if L is regular, then every set
that is pairwise distinguishable with respect to L is finite.) It is useful to reformulate
this condition slightly, using Definition 3.5. Recall that L/x denotes the set {y €
u* |xy € L}. If we let J, be the indistinguishability relation on D*, defined by

xI,y if and only if L/x = L/y


then J; is an equivalence relation on D* (Exercise 1.33b), and saying that two strings
are distinguishable with respect to L means that they are in different equivalence
classes of I;. The statement above can therefore be reformulated as follows: If L

168
CHAPTER 5 Regular and Nonregular Languages 169

is regular, then the set of equivalence classes for the relation Jzt is finite. In order to
obtain a characterization of regularity using this approach, we need to show that the
converse is also true, that if the set of equivalence classes is finite then L is regular.
Once we do this, we will have an answer to the first question above (how can
we tell whether L is regular?), in terms of the equivalence classes of the relation
I,. Furthermore, it turns out that if our language L is regular, identifying these
equivalence classes will also give us an answer to the second question (how can we
find an FA?), because there is a simple way to use the equivalence classes to construct
an FA accepting L. This FA is the most natural one to accept L, in the sense that it has
the fewest possible states; and an interesting by-product of the discussion will be a
method for taking any FA known to accept L and simplifying it as much as possible.
We wish to show that if the the set of equivalence classes of J; is finite, then
there is a finite automaton accepting L. The discussion may be easier to understand,
however, if we start with a language L known to be regular, and with an FA M =
(Q, X, qo, A, 5) recognizing L. If qg € Q, then adapting the notation introduced in
Section 4.3, we let

Lt, Gide b0.(dor) = Gg}:

We remember from Chapter | that talking about equivalence relations on a set


is essentially the same as talking about partitions of the set: An equivalence relation
determines a partition (in which the subsets are the equivalence classes), and a partition
determines an equivalence relation (in which being equivalent means belonging to
the same subset). At this point, there are two natural partitions of ©* that we might
consider: the one determined by the equivalence relation /;, and the one formed by
all the sets L, for g € Q. The relationship between them is given by Lemma 3.1. If x
and y are in the same L, (in other words, if 6*(go, x) = 6* (qo, y)), then L/x = L/y,
so that x and y are in the same equivalence class of J;. This means that each set L,
must be a subset of a single equivalence class, and therefore that every equivalence
class of J; is the union of one or more of the L,’s (Exercise 1.68). In particular,
there can be no fewer of the L,’s than there are equivalence classes of J;,. If the two
numbers are the same, then the two partitions are identical, each set Ly is precisely
one of the equivalence classes of J;, and M is an FA with the fewest possible states
recognizing L.
These observations suggest how to turn things around and begin from the other
end. If we have an FA accepting L, then under certain circumstances, the strings in
one of the equivalence classes of J; are precisely those that correspond to one of the
states of the FA. If we are not given an FA to start with, but we know the equivalence
classes of J, then we might try to construct an FA with exactly this property: Rather
than starting with a state g and considering the corresponding set L, of strings, this
time we have the set of strings and we hope to specify a state to which the set will
correspond. However, the point is that we do not have to find a state like this, we can
simply define one. A “‘state” is an abstraction anyway; why not go ahead and say that
a state is a set of strings—specifically, one of the equivalence classes of [,? If there
are only a finite number of these equivalence classes, then we have at least the first
ingredient of a finite automaton accepting L: a finite set of states.
170 PART 2 Regular Languages and Finite Automata

Once we commit ourselves to this abstraction, filling in the remaining details is


surprisingly easy. Because one of the strings that cause an FA to be in the initial state
is A, we choose for our initial state the equivalence class containing A. Because we
want the FA to accept L, we choose for our accepting states those equivalence classes
containing elements of L. And because in the recognition algorithm we change the
current string by concatenating one more input symbol, we compute the value of
our transition function by taking a string in our present state (equivalence class) and
concatenating it with the input symbol. The resulting string determines the new
equivalence class.
Before making our definition official, we need to look a little more closely at
this last step. “Taking a string in our present state and concatenating it with the input
symbol” means that if we start with an equivalence class g containing a string x, then
5(q, a) should be the equivalence class containing xa. Writing this symbolically, we
should have

é([x], a) = [xa]
where for any string z, [z] denotes the equivalence class containing z. As an assertion
about a string x, this is a perfectly straightforward formula, which may be true or false,
depending on how 4([x], a) is defined. If we want the formula to be the definition
of 6([x], a), we must consider a potential problem. We are trying to define 5(q, a),
where gq is an equivalence class (a set of strings). We have taken a string x in the set
q, which allows us to write gq = [x]. However, there is nothing special about x; the
set g could just as easily be written as [y] for any other string y € q. If our definition
is to make any sense, it must tell us what 5(q, a) is, whether we write g = [x] or
q = [y]. The formula gives us [xa] in one case and [ya] in the other; obviously,
unless [xa] = [ya], our definition is nonsense. Fortunately, the next lemma takes
care of the potential problem.

Lemma 5.1 /;, is right invariant with respect to concatenation. In other words, for
any x,y € X* and anya € &, ifx I; y, then xa I; ya. Equivalently, if [x] = [y],
then [xa] = [ya].

Proof Suppose x I; y anda € X. Then L/x = L/y, so that for any z’ € D*, xz’
and yz’ are either both in L or both not in L. Therefore, for any z € D*, xaz and yaz
are either both in L or both not in L (because we can apply the previous statement
with z’ = az), and we conclude that xal, ya. @

Theorem 5.1 .
Let L C &*, and let Q, be the se
on &*. (Each element of 0, therefo
set, then M; = (Q,, d, qo.‘AL; Sé
_ gi
4. =(¢¢
l, 01|g0L 40).
__ by the formula 6([x], a) = = [xa]. Fee e,M, has
any FA accepting LL :
CHAPTER 5 Regular and Nonregular Languages 171

. the formula 5(Lx, a:= bra] isa meaningful :


i om Oi x > to O;, and thus the 5-tuple M; has
yfan FA. In eet toa that M,
J ae L, we

ma
tny= fy
one) induction ono y. The base Gensis
fores x. This is
i eae sincee the left side

rluction op suppose thatee some yy,ob: 2 == bey]for


, anc consider or, ya) fora at .

= 5( * (Ce) a “(by sinh of


ea)7
= 8(by], ay (by,the induction iypthess |
lxya] “(by thedefinition of a
aoeitfollowsthatSao,se(UA), ne= [ehOur

ments
ShcA
1b- o nere iare the same. One action
rE L, then [x]JNL AG, sincex € (el. In the other direction, —
an element y of L, then x must bein L. Otherwise the ‘string
uishx andy withTespect |to L,and x and y could not both —

Corollary 5.1 L is a regular language if and only if the set of equivalence classes
of J, is finite.

Proof Theorem 5.1 tells us that if the set of equivalence classes is finite, there is
an FA accepting L; and Theorem 3.2 says that if the set is infinite, there can be no
such FA. @

Corollary 5.1 was proved by Myhill and Nerode, and it is often called the Myhill-
Nerode theorem.
It is interesting to observe that to some extent, the construction of M; in Theorem
5.1 makes sense even when the language L is not regular. /;, is an equivalence relation
for any language L, and we can consider the set Q; of equivalence classes. Neither
172 PART 2 Regular Languages and Finite Automata

S
O
©

O
@
Q
Figure 5.1 |

the definition of 5: Q; x & — Q, nor the proof that M; accepts L depends on the
assumption that Q, is a finite set. It appears that even in the most general case, we
have some sort of “device” that accepts L. If it is not a finite automaton, what is it?
Instead of inventing a name for something with an infinite number of states, let us
draw a (partial) picture of it in a simple case we have studied. Let L be the language
pal of all palindromes over {a,b}. As we observed in the proof of Theorem 3.3,
any two strings in {a, b}* are distinguishable with respect to L. Not only is the
set Q, infinite, but there are as many equivalence classes as there are strings; each
equivalence class contains exactly one string. Even in this most extreme case, there
is no difficulty in visualizing M;, as Figure 5.1 indicates.
The only problem, of course, is that there is no way to complete the picture, and
no way to implement M, as a physical machine. As we have seen in other ways, the
crucial aspect of an FA is precisely the finiteness of the set of states.

| EXAMPLES.1 | Applying Theorem 5.1 to {0, 1}*{10}


Consider the language discussed in Example 3.12,

L = {x € {0, 1}* |x ends with 10}


CHAPTER 5 Regular and Nonregular Languages 173

Figure 5.2 |
A minimum-state FA recognizing
{O, 1}*{10}.

and consider the three strings A, 1, and 10. We can easily verify that any two of these strings
are distinguishable with respect to L: The string A distinguishes A and 10, and also 1 and 10,
while the string 0 distinguishes A and 1. Therefore, the three equivalence classes [A], [1], and
[10] are distinct.
However, any string y is equivalent to (indistinguishable from) one of these strings. If y
ends in 10, then y is equivalent to 10; if y ends in 1, y is equivalent to 1; otherwise (if Ay,
y = 0, or y ends with 00), y is equivalent to A. Therefore, these three equivalence classes are
the only ones.
Let M; = (Q,, {0, 1}, [A], {[10]}, 5) be the FA we constructed in Theorem 5.1. Then

6([A],0)=[A] and 3$({A], 1) = [1]


since AO is equivalent to A and Al = 1. Similarly,

6({1],0) = [10] and 4$({1], 1) = [1]

since 7 is equivalent to 1. Finally,

6({10],0) =[A] and 6({10], 1) = [1]

since 100 is equivalent to A and 101 is equivalent to 1. It follows that the FA M, is the one
shown in Figure 5.2. Not surprisingly, this is the same FA we came up with in Example 3.12,
except for the names given to the states. (One reason it is not surprising is that the strings A,
1, and 10 were chosen to correspond to the three states in the previous FA!)

We used Theorem 3.2, which is the “only if” part of Theorem 5.1, to show that
the language of palindromes over {0, 1} is nonregular. We may use the same principle
to exhibit a number of other nonregular languages.

The Equivalence Classes of /, for L = {0717 |n > 0} | EXAMPLE 5.2 |


Let L = {0"1" | n > O}. The intuitive reason L is not regular is that in trying to recognize
elements of L, we must remember how many 0’s we have seen, so that when we start seeing
1’s we will be able to determine whether the number of 1’s is exactly the same. In order to
use Theorem 5.1 to show L is not regular, we must show that there are infinitely many distinct
equivalence classes of J;. In this example, at least, let us do even more than that and describe
the equivalence classes exactly.
174 PART 2 Regular Languages and Finite Automata

Some strings are not prefixes of any elements of L (examples include 1, 011, and 010),
and it is not hard to see that the set of all such strings is an equivalence class (Exercise 5.4).
The remaining strings are of three types: strings in L, strings of 0’s (0', for some i > 0), and
strings of the form 0'1/ with O < j <i.
The set L is itself an equivalence class of J,. This is because for any string x € L, A is
the only string that can follow x so as to produce an element of L.
Saying as we did above that ““we must remember how many 0’s we have seen” suggests
that two distinct strings of 0’s should be in different equivalence classes. This is true: Ifi 4 j,
the strings 0! and O/ are distinguished by the string 1', because 0'1' € L and 0/1' ¢ L. We
now know that [0'] 4 [0/]. To see exactly what these sets are, we note that for any string x
other than 0’, the string 01'+! distinguishes 0' and x (because 0'01't! € L and x01'*! ¢ L).
In other words, 0! is equivalent only to itself, and [0'] = {0'}.
Finally, consider the string 000011, for example. There is exactly one string z for which
000011z € L: the string z = 11. However, any other string x having the property that xz € L
if and only if z = 11 is equivalent to 000011, and these are the strings 0/*71/ for j > 0.
No string other than one of these can be equivalent to 000011, and we may conclude that the
equivalence class [000011] is the set {0/*?1/ | j > 0}. Similarly, for each k > 0, the set
{0/+*1/ | 7 > 0} is an equivalence class.
Let us summarize our conclusions. The set L and the set of all nonprefixes of elements
of L are two of the equivalence classes; for each i > 0, the set with the single element 0! is
an equivalence class; and for each k > 0, the infinite set {0/+*1/ | 7 > 0} is an equivalence
class. Since every string is in one of these equivalence classes, these are the only equivalence
classes.
As we expected, we have shown in particular that there are infinitely many distinct equiv-
alence classes, which allows us to conclude that L is not regular.

| EXAMPLE5.3__ Simple Algebraic Expressions


Let L be the set of all legal algebraic expressions involving the identifier a, the operator +, and
left and right parentheses. To show that the relation J, has infinitely many distinct equivalence
classes, we can ignore much of the structure of L. The only fact we use is that the string

is in L if and only if the numbers of left and right parentheses are the same. We may therefore
consider the set S = {(" | n > O}, in the same way that we considered the strings 0” in the
previous example. For 0 < m < n, the string a)” distinguishes ( and (", and so any two
elements of S are distinguishable with respect to L (i.e., in different equivalence classes). We
conclude from this that L is not regular. Exercise 5.35 asks you to describe the equivalence
classes of J; more precisely.

| EXAMPLES5.4 | The Set of Strings of the Form ww


For yet another example where the set S = {0” | n > 0} can be used to prove a language
nonregular, take L to be the language

{ww | w e {0, 1}*}


CHAPTER 5 Regular and Nonregular Languages 175

of all even-length strings of 0’s and 1’s whose first and second halves are identical. This time,
for a string z that distinguishes 0” and 0” when m # n, we choose z = 1"0"1". The string 0"z
is in L, and the string 0”z is not.

Exercise 5.20 asks you to get even a little more mileage out of the set {0” |n > 0}
or some variation of it. We close this section with one more example.

Another Nonregular Language from Theorem 5.1 | EXAMPLES.5 |


Let L = {0, 011, 011000, 0110001111, ...}. A string in L consists of groups of 0’s alternated
with groups of 1’s. It begins with a single 0, and each subsequent group of identical symbols
is one symbol longer than the previous group. Here we can show that the infinite set L itself is
pairwise distinguishable with respect to L, and therefore that L is not regular. Let x and y be
two distinct elements of L. Suppose x and y both end with groups of 0’s, for example, x with
O/ and y with OX. Then x1/t! € L, but yl/*! ¢ L. It is easy to see that similar arguments also
work in the other three cases.

5.2|MINIMAL FINITE AUTOMATA


Theorem 5.1 and Corollary 5.1 help us to understand a little better what makes a
language L regular. In a sense they provide an absolute answer to the question of
how much information we need to remember at each step of the recognition algorithm:
We can forget everything about the current string except which equivalence class of
I, it belongs to. If there are infinitely many of these equivalence classes, then this
is more information than any FA can remember, and L cannot be regular. If the set
of equivalence classes is finite, and if we can identify them, then we can use them to
construct the simplest possible FA accepting L.
It is not clear in general just how these equivalence classes are to be identified
or described precisely; in Example 5.1, where we did it by finding three pairwise
distinguishable strings, we had a three-state FA recognizing L to start with, so that
we obtained little or no new information. In this section we will show that as long
as we have some FA to start with, we can always find the simplest possible one. We
will develop an algorithm for taking an arbitrary FA and modifying it if necessary so
that the resulting machine has the fewest possible states (and the states correspond
exactly to the equivalence classes of J;).
Suppose we begin with the finite automaton M = (Q, ©, qo, A, 4). We consider
again the two partitions of &* that we described in Section 5.1, one in which the
subsets are the sets L, and one in which the subsets are the equivalence classes of I.
If the two partitions are the same, then we already have the answer we want, and M
is already a minimum-state FA. If not, the fact that the first partition is finer than the
second (one subset from the second partition might be the union of several from the
first) tells us that we do not need to abandon our FA and start over, looking for one
with fewer states; we just have to determine which sets L, we can combine to obtain
an equivalence class.
176 PART 2 Regular Languages and Finite Automata

Before we attack this problem, there is one obvious way in which we might be
able to reduce the number of states in M without affecting the L, partition at all.
This is to eliminate the states q for which L, = M. For such a q, there are no strings
x satisfying 6*(qo, x) = q; in other words, qg is unreachable from go. It is easy to
formulate a recursive definition of the set of reachable states of M and then use that to
obtain an algorithm that finds all reachable states. If all the others are eliminated, the
resulting FA still recognizes L (Exercise 3.29). For the remainder of this discussion,
therefore, we assume that all states of M are reachable from qo.
It may be helpful at this point to look again at Example 3.12. Figure 5.3a shows
the original FA we drew for this language; Figure 5.3b shows the minimum-state FA
we arrived at in Example 5.1; Figure 5.3c shows the partition corresponding to the
original FA, with seven subsets; and Figure 5.3d shows the three equivalence classes
of I,, which are the sets L, for the minimum-state FA. We obtain the simpler FA
from the first one by merging the three states 1, 2, and 4 into the state A, and by
merging the states 3, 5, and 7 into B. State 6 becomes state C. Once we have done
this, we can easily determine the new transitions. From any of the states 1, 2, and
4, the input symbol 0 takes us to one of those same states. Therefore, the transition
from A with input 0 must go to A. From 1, 2, or 4, the input 1 takes us to 3, 5,
or 7. Therefore, the transition from A with input 1 goes to B. The other cases are
similar.
In general, starting with a finite automaton M, we may describe the problem in
terms of identifying the pairs (p, q) of states for which L, and L, are subsets of the
same equivalence class. Let us write this condition as p = q. What we will actually
do is to solve the opposite problem: identify those pairs (p, q) for which p 4 q. The
first step is to express the statement p = q ina slightly different way.

Lemma 5.2 Suppose p,q € Q, and x and y are strings with x € L, and y € Lg
(in other words, 8" (Go, x) = p and 6*(qo, y) = q). Then these three statements are
all equivalent:
1 p=q.
2. L/x =L/y (ie., xI_y, or x and y are indistinguishable with respect to L).
3. For any z € &*, 5*(p,z) € A @ 8*(q, z) € A (ie., 5*(p, z) and 5*(q, z) are
either both in A or both not in A).

Proof To see that statements 2 and 3 are equivalent, we begin with the formulas

é*(p, Zz) = 5*(8* (qo, Gi; Zz) = 5* (qo, XZ)

5°(q, 2) = 8*(6* (qo, y), Z) = 8* (qo, yz)


Saying that L/x = L/y means that a string z is in one set if and only if it is in the
other, or that xz € L if and only if yz € L; since M accepts L, this is exactly the
same as statement 3.
Now if statement 1 is true, then L, and L, are both subsets of the same equiv-
alence class. This means that x and y are equivalent, which is statement 2. The
converse is also true, because we know that if L p and L, are not both subsets of the
CHAPTER 5 Regular and Nonregular Languages 177

(a) (d)

De BUN ogee Oe 2 oye te


L7= =*{11}

Lg=L=2*{10}

(c) (d)

Figure 5.3 |
Two FAs for {0, 1}*{10} and the corresponding partitions of {0, 1}*.

same equivalence class, then they are subsets of different equivalence classes, so that
statement 2 does not hold. @

Let us now consider how it can happen that p € g. According to the lemma,
this means that for some z, exactly one of the two states 6*(p, z) and 6*(q, z) isin A.
178 PART 2 Regular Languages and Finite Automata

The simplest way this can happen is with z = A, so that only one of the states p and
q isin A. Once we have one pair (p, q) with p # q, we consider the situation where
r,s € Q, and for some a € &, 5(r, a) = p and 6(s, a) = q. We may write

5*(r,az) = 8*(8"(r, a), z) = 8° (8(r, a), 2) = (Pp, 2)


and similarly, 6*(s, az) = 6*(q, z). Since p € q, then for some z, exactly one of the
states 6*(p, z) and 6*(q, z) is in A; therefore, exactly one of 5*(r, az) and 6*(s, az)
isin A, andr #s.
These observations suggest the following recursive definition of a set S, which
will turn out to be the set of all pairs (p,q) with p € q.

1. For any p and g for which exactly one of p and q is in A, (p, qg) isin S.
2. For any pair (p, g) € S, if (r, s) is a pair for pick 6(r, a) = p and 6(s,a) =q
for some a € &, then (r,s) is in S.
3. No other pairs are in S.
It is not difficult to see from the comments preceding the recursive definition that
for any pair (p,q) € S, p #q. On the other hand, it follows from Lemma 5.2 that
we can show S contains all such pairs by establishing the following statement: For
any string z € &*, every pair of states (p, q) for which only one of the states 6*(p, z)
and 6*(q, z) is in A is an element of S.
We do this by using structural induction on z. For the basis step, if only one of
6*(p, A) and 5*(q, A) is in A, then only one of the two states p and g is in A, and
(p,q) € S because of statement 1 of the definition.
Now suppose that for some z, all pairs (p, q) for which only one of 5*(p, z) and
6*(q, z) isin A are in S. Consider the string az, where a € &, and suppose that (r, s)
is a pair for which only one of 5*(r, az) and 5*(s, az) is in A. If we let p = 8(r, a)
and q = 4(s, a), then we have

d(FGz) = 8" (6(G.a),.z) = 0" Cp, Zz)


OMS OZ) = 0 O(5d), ZO (G5 2)

Our assumption on r and s is that only one of the states 5*(r, az) and 5*(s, az),
and therefore only one of the states 6*(p, z) and 6*(q, z), is in A. Our induction
hypothesis therefore implies that (p,q) € S, and it then follows from statement 2
in the recursive definition that (r,s) € S. (Note that in the recursive definition of
x* implicit in this structural induction, the recursive step of the definition involves
strings of the form az rather than za.)
Now it is a simple matter to convert this recursive definition into an algorithm to
identify all the pairs (p, q) for which p # q.

Algorithm 5.1 (For Identifying the Pairs (p, g) with p £ q) List all (unordered)
pairs of states (p,q) for which p # q. Make a sequence of passes through these
pairs. On the first pass, mark each pair of which exactly one element is in A. On
each subsequent pass, mark any pair (r, s) if there is ana € © for which 4(r, a) = p,
CHAPTER 5 Regular and Nonregular Languages 179

5(s,a) = q, and (p, q) is already marked. After a pass in which no new pairs are
marked, stop. The marked pairs (p, q) are precisely those for which p #q. @
When the algorithm terminates, any pair (p, q) that remains unmarked represents
two states in our FA that can be merged into one, since the corresponding sets of strings
are both subsets of the same equivalence class. In order to find the total number of
equivalence classes, or the minimum number of states, we can make one final pass
through the states of M; the first state of M to be considered corresponds to one
equivalence class; for each subsequent state g of M, q represents a new equivalence
class only if the pair (p, q) was marked by Algorithm 5.1 for every previous state p of
M. As we have seen in our example, once we have the states in the minimum-state FA,
determining the transitions is straightforward. We return once more to Example 5.1
to illustrate the algorithm.

Minimizing the FA in Figure 5.3a | EXAMPLES.6 |


We apply Algorithm 5.1 to the FA in Figure 5.3a. Figure 5.4a shows all unordered pairs (p, q)
with p # q. The pairs marked 1 are those of which exactly one element is in A; they are
marked on pass 1. The pairs marked 2 are those marked on the second pass. For example,
(2, 5) is one of these, since 5(2, 0) = 4, (5, 0) = 6, and the pair (4, 6) was marked on pass 1.
A third pass produces no new marked pairs. Suppose for example that (1, 2) is the first
pair to be tested on the third pass. We calculate 5(1, 0) = 2 and 6(2, 0) = 4, and (2, 4) is not
marked. Similarly, 5(1, 1) = 3 and 5(2, 1) = 5, and (3, 5) is not marked. It follows that (1, 2)
will not be marked on this pass. ;
If we now go through the seven states in numerical order, we see that there is an equivalence
class containing state 1; state 2 is in the same class, since the pair (1, 2) is unmarked; state 3 is
in a new equivalence class, since (1, 3) and (2, 3) are both marked; state 4 is in the same class
as 2, since (2, 4) is unmarked; 5 is in the same class as 3; 6 is in a new class; and 7 is in the
same class as 3. We conclude that the three equivalence classes of J, are p) = L; UL2 ULy,
P2= L3U Ls IEF, and P3= ee
We found the transitions earlier. The resulting FA is shown in Figure 5.4b. Again, it is
identical to the one obtained in Example 3.12 except for the names of the states.

Figure 5.4 |
Applying Algorithm 5.1 to the FA in Figure 5.3a.
180 PART 2 Regular Languages and Finite Automata

5.3 |THE PUMPING LEMMA


FOR REGULAR LANGUAGES
Every regular language can be accepted by a finite automaton, a recognizing device
with a finite set of states and no auxiliary memory. We can use the finiteness of
this set to derive another property shared by all regular languages. Showing that a
language does not have this property will then be another way, in addition to using
Corollary 5.1, of showing that the language is not regular. One reason this is useful is
that the method we come up with can be adapted for use with more general languages,
as we will see in Chapter 8.
Suppose M = (Q, =, qo, A, 5) is an FA recognizing a language L. The property
we are interested in has to do with paths through M that contain “loops.” An input
string x € L requiring M to enter some state twice corresponds to a path that starts
at qo, ends at some accepting state gy, and contains a loop. (See Figure 5.5.) Any
other path obtained from this one by changing the number of traversals of the loop
will then also correspond to an element of L, different from x in that it contains a
different number of occurrences of the substring corresponding to the loop. This
simple observation will lead to the property we want.
Suppose that the set Q has n elements. For any string x in L with length at least
n, if we write x = a\a2---a,y, then the sequence of n + 1 states

go = 5" (qo, A)
qi = 5* (qo, 41)
q2 = 8* (qo, 4142)

dn = 5* (Go, aja2--- an)

must contain some state at least twice, by the pigeonhole principle (Exercise 2.44).
This is where our loop comes from. Suppose g; = gi+p, where 0 <i <i+p <n.
Then

5° (qo, 4142 °-- aj) = Gi


8" (Gi, Gi414i42 °° itp) = Gi
5" (Gi, i+ pt+iGitp+2°**Any) =r EA
To simplify the notation, let
U = a\Q2°-- dj
U = Gj4+1442°°* Gi+p

Figure 5.5 |
CHAPTER 5 Regular and Nonregular Languages 181

W = Gi+pt1Git+p+2°°* any
(See Figure 5.5.) The string u is interpreted to be A if i = 0, and w is interpreted to
bey 1fi + p= n.
Since 6*(q;, v) = gi, we have 5*(q;, v”) = q; for every m > 0, and it follows
that 6*(qo, uv"w) = qf for every m > 0. Since p > 0 andi + p <n, we have
proved the following result.

This result is often referred to as the Pumping Lemma for Regular Languages,
since we can think of it as saying that for an arbitrary string in L, provided it is
sufficiently long, a portion of it can be “pumped up,” introducing additional copies
of the substring v, so as to obtain many more distinct elements of L.
The proof of the result was easy, but the result itself is complicated enough in its
logical structure that applying it correctly requires some care. Itmay be helpful first to
weaken it slightly by leaving out some information (where the integer n comes from).
Theorem 5.2a clarifies the essential feature and is sufficient for most applications.

In order to use the pumping lemma to show that a language L is not regular, we
must show that L fails to have the property described in the lemma. We do this by
assuming that the property is satisfied and deriving a contradiction.
The statement is of the form “There is ann so that for any x € L with|x| >n,..-
%

We assume, therefore, that we have such an n, although we do not know what it is.
We try to find a specific string x with |x| > n so that the statements involving x in the
theorem will lead to a contradiction. (The theorem says that under the assumption
that L is regular, any x € L with |x| > n satisfies certain conditions; therefore,
182 PART 2 Regular Languages and Finite Automata

our specific x satisfies these conditions; this leads to a contradiction; therefore, the
assumption leads to a contradiction; therefore, L is not regular.)
Remember, however, that we do not know what n is. In effect, therefore, we
must show that for any n, we can find an x € L with |x| > n so that the statements
about x in the theorem lead to a contradiction. It may be that we have to choose x
carefully in order to obtain a contradiction. We are free to pick any x we like, as long
as |x| => n—but since we do not know what n is, the choice of x must involve n.
Once we have chosen x, we are not free to choose the strings u, v, and w into
which the theorem says x can be decomposed. What we know is that there is some
way to write x as uvw so that equations (5.2)-(5.4) are true. Because we must
guarantee that a contradiction is produced, we must show once we have chosen x that
any choice of u, v, and w satisfying equations (5.1)—(5.4) produces a contradiction.
Let us use as our first illustration one of the languages that we already know is not
regular.

| EXAMPLES.7 | Application of the Pumping Lemma


Let L = {0ⁱ1ⁱ | i ≥ 0}. Suppose that L is regular, and let n be the integer in Theorem 5.2a.
We can now choose any x with |x| ≥ n; a reasonable choice is x = 0ⁿ1ⁿ. The theorem says
that x = uvw for some u, v, and w satisfying equations (5.2)–(5.4). No matter what u, v, and
w are, the fact that (5.2) is true implies that uv = 0ᵏ for some k, and it follows from (5.3) that
v = 0ʲ for some j > 0. Equation (5.4) says that uvᵐw ∈ L for every m ≥ 0. However, we
can obtain a contradiction by considering m = 2. The string uv²w contains j extra 0's in the
first part (uv²w = 0ⁿ⁺ʲ1ⁿ), and cannot be in L because j > 0. This contradiction allows us to
conclude that L cannot be regular.
Let us look a little more closely at the way we chose x in this example (which for this
language just means the way we chose |x|). In the statement of the pumping lemma, the only
condition x needs to satisfy is |x| ≥ n; with a little more effort, we can obtain a contradiction
by starting with x = 0ᵐ1ᵐ, for any m ≥ n/2. However, now we can no longer assert that
uv = 0ᵏ. There are two other possibilities to consider. In each case, however, looking at
the string uv²w is enough to obtain a contradiction. If v contains both 0's and 1's, then v = 0ⁱ1ʲ, so that
uv²w contains the substring 10 and is therefore not in L. If v contains only 1's, then v = 1ʲ,
and uv²w = 0ᵐ1ᵐ⁺ʲ, also not in L.
Again, the point is that when we use the pumping lemma to show L is nonregular, we are
free to choose x any way we wish, as long as |x| ≥ n and as long as it will allow us to derive a
contradiction. We try to choose x so that getting the contradiction is as simple as possible. Once
we have chosen x, we must be careful to show that a contradiction follows inevitably. Unless
we can get a contradiction in every conceivable case, we have not accomplished anything.
Another feature of this example is that it allows us to prove more than we originally set out
to. We started with the string x = 0ⁿ1ⁿ ∈ L and observed that for the strings u, v, and w, uv²w
fails to be an element, not only of L but of the larger language L₁ = {x ∈ {0, 1}* | n₀(x) =
n₁(x)}. Therefore, our proof also allows us to conclude that L₁ is not regular.
However, with this larger language it is worth looking one more time at the initial choice
of x, because specifying a length no longer determines the string, and not all strings of the
same length are equally suitable. First we observe that choosing x = 0^{n/2}1^{n/2}, which would

have worked for the language L (at least if n is even), no longer works for L₁. The reason
this string works for L is that even if v happens to contain both 0's and 1's, uv²w is not of the
form 0ⁱ1ⁱ. (The contradiction is obtained, not by looking at the numbers of 0's and 1's, but by
looking at the order of the symbols.) The reason it does not work for L₁ is that if v contains
equal numbers of 0's and 1's, then uvᵐw also has equal numbers of 0's and 1's, no matter what
m we use, and there is no contradiction.
If we had set out originally to show that L₁ was not regular, we might have chosen an x
in L₁ but not in L. An example of an inappropriate choice is the string x = (01)ⁿ. Although
this string is in L₁, and its length is at least n, look what happens when we try to produce a
contradiction. If x = uvw, we have these possibilities:

1. v = (01)ʲ for some j > 0
2. v = 1(01)ʲ for some j ≥ 0
3. v = 1(01)ʲ0 for some j ≥ 0
4. v = (01)ʲ0 for some j ≥ 0

Unfortunately, none of the conditions (5.2)-(5.4) gives us any more information about v,
except for some upper bounds on j. In cases 2 and 4 we can obtain a contradiction because
the string v that is being pumped has unequal numbers of 0’s and 1’s. In the other two cases,
however, there is no contradiction, because uvᵐw has equal numbers of 0's and 1's for any
m. We cannot guarantee that one of these cases does not occur, and therefore we are unable to
finish the proof using this choice of x.

EXAMPLE 5.8   Another Application of the Pumping Lemma
Consider the language

L = {0ⁱx | i ≥ 0, x ∈ {0, 1}* and |x| ≤ i}


Another description of L is that it is the set of all strings of 0's and 1's in which at least the
first half of the string consists of 0's. The proof that L is not regular starts the same way as in the previous
example. Assume that L is regular, and let n be the integer in Theorem 5.2a. We obviously
should not try to start with a string x of all 0's, because then no string obtained from x by
pumping could have any 1's, and there would be no chance of a contradiction. Suppose we try
x = 0ⁿ1ⁿ, just as in the previous example. Then if equations (5.1)–(5.4) hold, it follows as
before that v = 0ʲ for some j > 0. In this example the term pumping is a little misleading. We
cannot obtain a contradiction by looking at strings with additional copies of v, because initial
0's account for an even larger fraction of these strings than in x. However, equation (5.4) also
says that uv⁰w ∈ L. This does give us our contradiction, because uv⁰w = uw = 0ⁿ⁻ʲ1ⁿ ∉ L.
Therefore, L is not regular.

EXAMPLE 5.9   Application of the Pumping Lemma to pal


Let L be pal, the language of palindromes over {0, 1}. We know from Theorem 3.3 that L
is not regular, and now we can also use the pumping lemma to prove this. Suppose that L is
regular, and let n be the integer in the statement of the pumping lemma. We must choose x
to be a palindrome of length at least n that will produce a contradiction; let us try x = 0ⁿ10ⁿ.

Then just as in the two previous examples, if equations (5.1)–(5.4) are true, the string v is a
substring of the form 0ʲ (with j > 0) from the first part of x. We can obtain a contradiction
using either m = 0 or m > 1. In the first case, uvᵐw = 0ⁿ⁻ʲ10ⁿ, and in the second case, if
m = 2 for example, uvᵐw = 0ⁿ⁺ʲ10ⁿ. Neither of these is a palindrome, and it follows that L
cannot be regular.

It is often possible to get by with a weakened form of the pumping lemma. Here
are two versions that leave out many of the conclusions of Theorem 5.2a but are still
strong enough to show that certain languages are not regular.

Theorem 5.3 would be sufficient for Example 5.7, as you are asked to show in
Exercise 5.21, but it is not enough to take care of Examples 5.8 or 5.9.

Theorem 5.4 would not be enough to show that the language in Example 5.7 is
not regular. The next example shows a language for which it might be used.

EXAMPLE 5.10   An Application of Theorem 5.4


Let

L = {0ⁿ | n is prime} = {0², 0³, 0⁵, 0⁷, 0¹¹, ...}

According to Theorem 5.4, in order to show that L is not regular we just need to show that the
set of primes cannot contain an infinite arithmetic progression of the form {p + mq | m ≥ 0};
in other words, for any p > 0 and any q > 0, there is an integer m so that p + mq is not prime.

The phrase not prime means factorable into factors 2 or bigger. We could choose m = p,
which would give

p + mq = p + pq = p(1 + q)

except that we are not certain that p ≥ 2. Instead let m = p + 2q + 2. Then

p + mq = p + (p + 2q + 2)q
       = (p + 2q) + (p + 2q)q
       = (p + 2q)(1 + q)

and this is clearly not prime.
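The identity is easy to confirm numerically; the following one-off check (a hypothetical snippet, not part of the text) verifies it for small values of p and q:

# Check that p + mq = (p + 2q)(1 + q) when m = p + 2q + 2.
for p in range(1, 50):
    for q in range(1, 50):
        m = p + 2 * q + 2
        assert p + m * q == (p + 2 * q) * (1 + q)   # both factors are at least 2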
This example has a different flavor from the preceding ones and seems to have more to
do with arithmetic, or number theory, than with languages. Yet it illustrates the fact, which
will become even more obvious in the later parts of this book, that many statements about
computation can be formulated as statements about languages. What we have found in this
example is that a finite automaton is not a powerful enough device (it does not have enough
memory) to solve the problem of determining, for an arbitrary integer, whether it is prime.

Corollary 5.1, in the first part of this chapter, gives a condition involving a
language that is necessary and sufficient for the language to be regular. Theorem 5.2a
gives a necessary condition. One might hope that it is also sufficient. This result (the
converse of Theorem 5.2a) would imply that for any nonregular language L, the
pumping lemma could be used to prove L nonregular; constructing the proof would
just be a matter of making the right choice for x. The next example shows that this
is not correct: Showing that the conclusions of Theorem 5.2a hold (i.e., showing that
there is no choice of x that produces a contradiction) is not enough to show that the
language is regular.

EXAMPLE 5.11   The Pumping Lemma Cannot Show a Language Is Regular


Let

L = {aⁱbʲcʲ | i ≥ 1 and j ≥ 0} ∪ {bʲcᵏ | j, k ≥ 0}


Let us show first that the conclusions of Theorem 5.2a hold. Take n to be 1, and suppose that
x ∈ L and |x| ≥ n. There are two cases to consider. If x = aⁱbʲcʲ, where i > 0, then define

u = Λ   v = a   w = aⁱ⁻¹bʲcʲ

Any string of the form uvᵐw is still of the form aⁱbʲcʲ and is therefore an element of L (whether
or not the number of a's is 0). If x = bʲcᵏ, then again let u = Λ and let v be the first symbol
in x. It is still true that uvᵐw ∈ L for every m ≥ 0.
However, L is not regular, as you can show using Corollary 5.1. The details are almost
identical to those in Example 5.7 and are left to Exercise 5.22.

5.4| DECISION PROBLEMS


A finite automaton is a rudimentary computer. It receives input, and in response to
that input produces the output “yes” or “no,” in the sense that it does or does not
end up in an accepting state. The computational problems that a finite automaton
can solve are therefore limited to decision problems: problems that can be answered
yes or no, like “Given a string x of a’s and b’s, does x contain an occurrence of the
substring baa?” or “Given a regular expression r and a string x, does x belong to the
language corresponding to r?” A decision problem of this type consists of a set of
specific instances, or specific cases in which we want the answer. An instance of the
first problem is a string x of a’s and b’s, and the set of possible instances is the entire
set {a, b}*. An instance of the second is a pair (r, x), where r is a regular expression
and x is a string. In general, if the problem takes the form “Given x, is it true that
...?”, then an instance is a particular value of x.
There are other possible formulations of a finite automaton, in which the machine
operates essentially the same way but can produce more general outputs, perhaps in
the form of strings over the input alphabet. What makes the finite automaton only a
primitive model of computation is not that it is limited to solving decision problems,
but that it can handle only simple decision problems. An FA cannot remember more
than a fixed amount of information, and it is incapable of solving a decision problem
if some instances of the problem would require the machine to remember more than
this amount.
The generic decision problem that can be solved by a particular finite automaton
is the membership problem for the corresponding regular language L: Given a string
x, is x an element of L? An instance of this problem is a string x. We might step
up one level and formulate the membership problem for regular languages: Given
a finite automaton M and a string x, is x accepted by M? (Or, equivalently, given a
regular language specified by the finite automaton M, and a string x, is x an element
of the language?) Now an instance of the problem is a pair (M, x), where M is an FA
and x is a string. The problem has an easy solution—informally, it is simply to give
the string x to the FA M as input and see what happens! If M ends up in an accepting
state as a result of processing x, the answer is yes; otherwise the answer is no. The
reason this approach is acceptable as an algorithm is that M behaves deterministically
(that is, its specifications determine exactly what steps it will follow in processing x)
and is guaranteed to produce an answer after |x| steps.
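Expressed in code, the algorithm is nothing more than a loop. Here is a minimal sketch (an illustration only; the dict encoding of M is an assumption, not anything from the text):

def accepts(delta, q0, accepting, x):
    # Solve the membership problem for L(M): run M on x, one step per symbol.
    q = q0
    for c in x:
        q = delta[(q, c)]
    return q in accepting        # the answer is known after exactly |x| steps

# Hypothetical example: an FA accepting the strings over {a, b} that end in a.
delta = {('p', 'a'): 'q', ('p', 'b'): 'p', ('q', 'a'): 'q', ('q', 'b'): 'p'}
print(accepts(delta, 'p', {'q'}, 'abba'))    # True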
In addition to the membership problem, we can formulate a number of other
decision problems having to do with finite automata and regular languages, and some
of them we already have decision algorithms to answer. Here is a list that is not by any
means exhaustive. (The first problem on the list is one of the two mentioned above.)

1. Given a regular expression r and a string x, does x belong to the language


corresponding to r?
2. Given a finite automaton M, is there a string that it accepts? (Alternatively,
given an FA M, is L(M) = ∅?)
3. Given an FA M, is L(M) finite?

4. Given two finite automata M₁ and M₂, are there any strings that are accepted by
both?
5. Given two FAs M₁ and M₂, do they accept the same language? In other words,
is L(M₁) = L(M₂)?
6. Given two FAs M₁ and M₂, is L(M₁) a subset of L(M₂)?
7. Given two regular expressions r₁ and r₂, do they correspond to the same
language?
8. Given an FA M, is it a minimum-state FA accepting the language L(M)?
Problem 1 is a version of the membership problem for regular languages, except
that we start with a regular expression rather than a finite automaton. Because we
have an algorithm from Chapter 4 to take an arbitrary regular expression and produce
an FA accepting the corresponding language, we can reduce problem 1 to the version
of the membership problem previously mentioned.
Section 5.2 gives a decision algorithm for problem 8: Apply the minimization
algorithm to M, and see if the number of states is reduced. Of the remaining problems,
some are closely related to others. In fact, if we had an algorithm to solve problem
2, we could construct algorithms to solve problems 4 through 7. For problem 4, we
could first use the algorithm presented in Section 3.5 to construct a finite automaton
M recognizing L(M₁) ∩ L(M₂), and then apply to M the algorithm for problem
2. Problem 6 could be solved the same way, with L(M₁) ∩ L(M₂) replaced by
L(M₁) − L(M₂), because L(M₁) ⊆ L(M₂) if and only if L(M₁) − L(M₂) = ∅.
Problem 5 can be reduced to problem 6, since two sets are equal precisely when each
is a subset of the other. Finally, a solution to problem 6 would give us one to problem
7, because of our algorithm for finding a finite automaton corresponding to a given
regular expression.
Problems 2 and 3 remain. With regard to problem 2, one might ask how a finite
automaton could fail to accept any strings. A trivial way is for it to have no accepting
states. Even if M does have accepting states, however, it fails to accept anything
if none of its accepting states is reachable from the initial state. We can determine
whether this is true by calculating Tₖ, the set of states that can be reached from q₀ by
using strings of length k or less, as follows:

T₀ = {q₀}
Tₖ = Tₖ₋₁ ∪ {δ(q, a) | q ∈ Tₖ₋₁ and a ∈ Σ}   (for k > 0)

(Tₖ contains, in addition to the elements of Tₖ₋₁, the states that can be reached in one
step from the elements of Tₖ₋₁.)

Decision Algorithm for Problem 2 (Given an FA M, is L(M) = ∅?) Compute
the sets Tₖ for each k ≥ 0, until either Tₖ contains an accepting state or until k > 0
and Tₖ = Tₖ₋₁. In the first case L(M) ≠ ∅, and in the second case L(M) = ∅.
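In code, the algorithm is a straightforward fixed-point computation. The sketch below (with an assumed encoding, as before: sigma is the input alphabet and accepting is the set A of accepting states) computes T₀, T₁, ... until one of the two stopping conditions holds:

def language_is_empty(delta, sigma, q0, accepting):
    # Compute T_0, T_1, ... until an accepting state appears or the sets stabilize.
    T = {q0}                                              # T_0
    while True:
        if T & accepting:
            return False                                  # L(M) is not empty
        T_next = T | {delta[(q, a)] for q in T for a in sigma}
        if T_next == T:                                   # T_k = T_{k-1}
            return True                                   # L(M) is empty
        T = T_next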

If n is the number of states of M, then one of the two outcomes of the algorithm
must occur by the time Tₙ has been computed. This implies that the following
algorithm would also work.

Begin testing all input strings, in nondecreasing order of length, for acceptance by M. If
no strings of length n or less are accepted, then L(M) = ∅.

Note, however, that this approach is likely to be much less efficient. For example,
if we test the string 0101100 and later the string 01011000, all but the last step of the
second test is duplicated effort.
The idea of testing individual strings in order to decide whether an FA accepts
something is naturally tempting, but useless as an algorithm without some way to
stop if the individual tests continue to fail. Only the fact that we can stop after testing
strings of length n makes the approach feasible. Theorem 5.2, the original form of
the pumping lemma, is another way to see that this is possible. The pumping lemma
implies that if x is any string in L of length at least n, then there is a shorter string in
L (the one obtained by deleting the middle portion v). Therefore, it is impossible for
the shortest string in the language to have length n or greater.
Perhaps surprisingly, the pumping lemma allows us to use a similar approach
with problem 3. If the FA M has n states, and x is any string in L of length at least
n, then there is a string y in L that is shorter than x but not too much shorter: There
exist u, v, and w with 0 < |v| ≤ n so that x = uvw ∈ L and y = uw ∈ L, so
that the difference in length between x and y is at most n. Now consider strings in
L whose length is at least n. If there are any at all, then the pumping lemma implies
that L must be infinite (because there are infinitely many strings of the form uvᵐw);
in particular, if there is a string x ∈ L with n ≤ |x| < 2n, then L is infinite. On the
other hand, if there are strings in L of length at least n, it is impossible for the shortest
such string x to have length 2n or greater—because as we have seen, there would
then have to be a shorter string y ∈ L close enough in length to x so that |y| ≥ n.
Therefore, if L is infinite, there must be a string x ∈ L with n ≤ |x| < 2n. We have
therefore established that the following algorithm is a solution for problem 3.

Decision Algorithm for Problem 3 (Given an FA M, is L(M) finite?) Test input
strings beginning with those of length n (where n is the number of states of M),
in nondecreasing order of length. If there is a string x with n ≤ |x| < 2n that is
accepted, then L(M) is infinite; otherwise, L(M) is finite.
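Here is a corresponding sketch for problem 3 (again with the assumed encoding; itertools.product enumerates the strings of each length over the alphabet):

from itertools import product

def language_is_finite(delta, sigma, q0, accepting, n):
    # n is the number of states of M. By the argument above, L(M) is
    # infinite if and only if some x with n <= |x| < 2n is accepted.
    for length in range(n, 2 * n):
        for symbols in product(sigma, repeat=length):
            q = q0
            for c in symbols:
                q = delta[(q, c)]
            if q in accepting:
                return False                  # L(M) is infinite
    return True

Like the string-testing version of the algorithm for problem 2, this is correct in principle but examines exponentially many strings, so it would be of little help for machines with a hundred states.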

There are at least two reasons for discussing, and trying to solve, decision prob-
lems like the ones in our list. One is the obvious fact that solutions may be useful.
For a not entirely frivolous example, picture your hard-working instructor grading an
exam question that asks for an FA recognizing a specific language. He or she knows
a solution, but one student’s paper shows a different FA. The instructor must then try
to determine whether the two are equivalent, and this means answering an instance
of problem 5. If the answer to a specific instance of the problem is the primary con-
cern, then whether there is an efficient, or feasible, solution is at least as important as
whether there is a solution in principle. The solution sketched above for problem 5
involves solving problem 2, and the second version of the decision algorithm given
for problem 2 would not help much in the case of machines with a hundred states,
even if a computer program and a fast computer were available.

Aside from the question of finding efficient algorithms, however, there is another
reason for considering these decision problems, a reason that will assume greater
significance later in the book. It is simply that not all decision problems can be solved.
An example of an easy-to-state problem that cannot be solved by any decision
algorithm was formulated in the 1930s by the mathematician Alan Turing. He de-
scribed a type of abstract machine, now called a Turing machine, more general than
a finite automaton. These machines can recognize certain languages in the same way
that FAs can recognize regular languages, and Turing’s original unsolvable problem
is simply the membership problem for this more general class of languages: Given a
Turing machine M and a string x, does M accept x? (see Section 11.2). Turing ma-
chines are involved in this discussion in another way as well, because such a machine
turns out to be a general model of computation. This is what allows us to formulate
the idea of an algorithm precisely and to say exactly what “unsolvable” means.
Showing the existence of unsolvable problems—particularly ones that arise nat-
urally and are easy to state—was a significant development in the theory of computa-
tion. The conclusion is that there are definite theoretical limits on what it is possible
to compute. These limits have nothing to do with how smart we are, or how good at
designing software; and they are not simply practical limits having to do with effi-
ciency and the amount of time available, or physical considerations like the number
of atoms available for constructing memory devices. Rather, they are fundamental
limits inherent in the rules of logic and the nature of computation. We will be in-
vestigating these matters later in the book; for the moment, it is reassuring to find
that many of the natural problems involving regular languages do have algorithmic
solutions.

5.5 | REGULAR LANGUAGES AND COMPUTERS
Now that we have introduced the first class of languages we will be studying, we can
ask what the relationship is between these simple languages and the familiar ones
people use in computer science, programming languages such as C, Java, and Pascal.
The answer is almost obvious: Programming languages are not regular. In the C
language, for example, the string main(){ᵐ}ⁿ, consisting of main() followed by m left
braces and n right braces, is a valid program if and only if m = n, and this allows us to
prove easily that the set of valid programs is not regular, using either Corollary 5.1 or
the pumping lemma.
Although regular languages are not rich enough in structure to be programming
languages themselves, however, we have seen in Examples 3.5 and 3.6 some of
the ways they occur within programming languages. As a general rule, the tokens
of a programming language, which include identifiers, literals, operators, reserved
words, and punctuation, can be described by a regular expression. The first phase in
compiling a program written in a high-level programming language is lexical anal-
ysis: identifying and classifying the tokens into which the individual characters are
grouped. There are programs called lexical-analyzer generators. The input provided

to such a program is a set of regular expressions specifying the structure of tokens,


and the output produced by the program is a software version of an FA that can be
incorporated as a token-recognizing module in a compiler. One of the most widely
used of these is a program called lex, which is a tool provided in the Unix operating
system. Although lex can be used in many situations that require the processing of
structured input, it is used most often in conjunction with yacc, another Unix tool.
The lexical analyzer produced by lex creates a string of tokens; and the parser pro-
duced by yacc, on the basis of grammar rules provided as input, is able to determine
the syntactical structure of the token string. (yacc stands for yet another compiler
compiler.) Regular expressions come up in Unix in other ways as well. The Unix
text editor allows the user to specify a regular expression and searches for patterns
in the text that match it. Other commands such as grep (global regular expression
print) and egrep (extended global regular expression print) cause a specified file to
be searched for lines containing strings that match a specified regular expression.
If regular languages cannot be programming languages, it would seem that finite
automata are even less equipped to be computers. There are a number of obvious
differences, some more significant than others, having to do with memory, output ca-
pabilities, programmability, and so on. We have seen several examples of languages,
such as {0ⁿ1ⁿ | n ≥ 0}, that no FA can recognize but for which a recognition program
could be written by any programmer and run on just about any computer.
Well, yes and no; as obvious as this conclusion seems, it has to be qualified at
least a little. Any physical computer is a finite device; it has, for some integer n, n
total bits of internal memory and disk space. (It may be connected to a larger network,
but in that case we may think of the “computer” as the entire network and simply use
a larger value of n.) We can describe the complete state of the machine by specifying
the status of each bit of memory, each pixel on the screen, and so forth. The number
of states is huge but still finite, and in this sense our computer is in fact an FA, where
the inputs can be thought of as keystrokes, or perhaps bits in some external file being
read. (In particular, the computer cannot actually recognize the language {0ⁱ1ⁱ},
because there is an integer j so large that if the computer has read exactly j 0’s, it
will not be able to remember this.)
As a way of understanding a computer, however, this observation is not helpful;
there is hardly any practical difference between this finite number of possible states
and infinity. Finite automata are simple machines, but having to think about a com-
puter as an FA would complicate working with computers considerably. The situation
is similar with regard to languages. One might argue that programming languages
are effectively regular because the set of programs that one can physically write, or
enter as input to a computer, is finite. However, though finite languages are simpler
in some ways than infinite languages (in particular, they are always regular), restrict-
ing ourselves to finite languages would by no means simplify the discussion. Finite
languages can be called simple because there is no need to consider any underlying
structure—they are just sets of strings. However, with no underlying principle to
impose some logical organization (some complexity!), a large set becomes unwieldy
and complicated to deal with.

The advantage of a theoretical approach to computation is that we do not need to


get bogged down in issues like memory size. Obviously there are many languages,
including some regular ones, that no physical computer will ever be able to recognize;
however, it still makes sense to distinguish between the logical problems that arise in
recognizing some of these and the problems that arise in recognizing others. Finite
automata and computers are different in principle, and what we are studying is what
each is capable of in principle, not what a specific computer can do in practice.
(Most people would probably agree that in principle, a computer can recognize the
language {0ⁿ1ⁿ | n ≥ 0}.) As we progress further in the book, we will introduce
abstract models that resemble a computer more closely. There will never be a perfect
physical realization of any of them. Studying the conceptual model, however, is
still the best way to understand both the potential and the limitations of the physical
machines that approximate the model.

EXERCISES
5.1. For which languages L ⊆ {0, 1}* is there only one equivalence class with
respect to the relation I_L?
5.2. Let x be an arbitrary string in {0, 1}*, and let L = {x}. How many
equivalence classes are there for the relation I_L? Describe them.
5.3. Find a language L ⊆ {0, 1}* for which every equivalence class of I_L has
exactly one element.
5.4. Show that for any language L ⊆ Σ*, the set
S = {x ∈ Σ* | x is not a prefix of any element of L}
is one equivalence class of I_L, provided it is not empty.
5.5. Let L ⊆ Σ* be any language. Show that if [Λ] (the equivalence class of I_L
containing Λ) is not {Λ}, then it is infinite.
5.6. Show that if L ⊆ Σ* is a language, x ∈ Σ*, and [x] (the equivalence class of
I_L containing x) is finite, then x is a prefix of an element of L.
5.7. For a certain language L ⊆ {a, b}*, I_L has exactly four equivalence classes.
They are [Λ], [a], [ab], and [b]. It is also true that the three strings a, aa,
and abb are all equivalent, and that the two strings b and aba are equivalent.
Finally, ab ∈ L, but Λ and a are not in L, and b is not even a prefix of any
element of L. Draw an FA accepting L.
5.8. Suppose there is a 3-state FA accepting L ⊆ {a, b}*. Suppose Λ ∉ L, b ∉ L,
and ba ∈ L. Suppose also that a I_L b, Λ I_L bab, a I_L aaa, and b I_L bb. Draw an
FA accepting L.
5.9. Suppose there is a 3-state FA accepting L ⊆ {a, b}*. Suppose Λ ∈ L, b ∈ L,
ba ∈ L, and baba ∈ L, and that Λ I_L a and a I_L bb. Draw an FA accepting L.
5.10. Find all possible languages L ⊆ {a, b}* for which I_L has these three
equivalence classes: the set of all strings ending in b, the set of all strings
ending in ba, and the set of all strings ending in neither b nor ba.

5.11. Find all possible languages L ⊆ {a, b}* for which I_L has three equivalence
classes, corresponding to the regular expressions ((a + b)a*b)*,
((a + b)a*b)*aa*, and ((a + b)a*b)*ba*, respectively.
5.12. In Example 5.2, if the language is changed to {0ⁿ1ⁿ | n ≥ 0} (i.e., Λ is added
to the original language), are there any changes in the partition of {0, 1}*
corresponding to I_L? Explain.
5.13. Consider the language L = {x ∈ {0, 1}* | n₀(x) = n₁(x)} (where n₀(x) and
n₁(x) are the number of 0's and the number of 1's, respectively, in x).
a. Show that if n₀(x) − n₁(x) = n₀(y) − n₁(y), then x I_L y.
b. Show that if n₀(x) − n₁(x) ≠ n₀(y) − n₁(y), then x and y are
distinguishable with respect to L.
c. Describe all the equivalence classes of I_L.
5.14. Let M = (Q, Σ, q₀, A, δ) be an FA, and suppose that Q₁ is a subset of Q
such that δ(q, a) ∈ Q₁ for every q ∈ Q₁ and every a ∈ Σ.
a. Show that if Q₁ ∩ A = ∅, then for any p and q in Q₁, p ≡ q.
b. Show that if Q₁ ⊆ A, then for any p and q in Q₁, p ≡ q.
5.15. For a language L over Σ, and two strings x and y in Σ* that are
distinguishable with respect to L, let

d_{L,x,y} = min{|z| : z distinguishes x and y with respect to L}

a. For the language L = {x ∈ {0, 1}* | x ends in 010}, find the maximum
of the numbers d_{L,x,y} over all possible pairs of distinguishable strings x
and y.
b. If L is the language of balanced strings of parentheses, and |x| = m and
|y| = n, find an upper bound involving m and n on the numbers d_{L,x,y}.
5.16. For each of the FAs pictured in Figure 5.6, use the minimization algorithm
described in Algorithm 5.1 and illustrated in Example 5.6 to find a
minimum-state FA recognizing the same language. (It’s possible that the
given FA may already be minimal.)
5.17. Find a minimum-state FA recognizing the language corresponding to each of
these regular expressions.
a. (0*10 + 1*0)(01)*
b. (010)*1 + (1*0)*
5.18. Suppose that in applying Algorithm 5.1, we establish some fixed order in
which to process the pairs, and we follow the same order on each pass.
a. What is the maximum number of passes that might be required? Describe
an FA, and an ordering of the pairs, that would require this number.
b. Is there always a fixed order (depending on M) that would guarantee no
pairs are marked after the second pass, so that the algorithm terminates
after three passes?
5.19. For each of the NFA-Λs pictured in Figure 5.7, find a minimum-state FA
accepting the same language.
Figure 5.6
Figure 5.7

5.20. In each of the following cases, prove that L is nonregular by showing that
any two elements of the infinite set {0ⁿ | n ≥ 0} are distinguishable with
respect to L.
a. L = {0ⁿ10ⁿ | n ≥ 0}
b. L = {0ⁿ1ᵏ0ⁿ | n, k ≥ 0}
c. L = {0ⁱ1ʲ | j = i or j = 2i}

d. L = {0ⁱ1ʲ | j is a multiple of i}
e. L = {x ∈ {0, 1}* | n₀(x) < 2n₁(x)}
f. L = {x ∈ {0, 1}* | no prefix of x has more 1's than 0's}
5.21. Use Theorem 5.3 to show that {0ⁿ1ⁿ | n ≥ 0} is not regular.
5.22. Use Corollary 5.1 to show that the language in Example 5.11 is not regular.
5.23. In each part of Exercise 5.20, use the pumping lemma for regular languages
to show that the language is not regular.
5.24. Use the pumping lemma to show that each of these languages is not regular:
a. L = {ww | w ∈ {0, 1}*}
b. L = {xy | x, y ∈ {0, 1}* and y is either x or xʳ}
c. The language of algebraic expressions in Example 5.3.
5.25. Suppose L is a language over {0, 1}, and there is a fixed integer k so that for
every x ∈ Σ*, xz ∈ L for some string z with |z| ≤ k. Does it follow that L is
regular? Why or why not?
5.26. For each statement below, decide whether it is true or false. If it is true,
prove it. If not, give a counterexample. All parts refer to languages over the
alphabet {0, 1}.
a. If L₁ ⊆ L₂ and L₁ is not regular, then L₂ is not regular.
b. If L₁ ⊆ L₂ and L₂ is not regular, then L₁ is not regular.
c. If L₁ and L₂ are nonregular, then L₁ ∪ L₂ is nonregular.
d. If L₁ and L₂ are nonregular, then L₁ ∩ L₂ is nonregular.
e. If L is nonregular, then L′ is nonregular.
f. If L₁ is regular and L₂ is nonregular, then L₁ ∪ L₂ is nonregular.
g. If L₁ is regular, L₂ is nonregular, and L₁ ∩ L₂ is regular, then L₁ ∪ L₂ is
nonregular.
h. If L₁ is regular, L₂ is nonregular, and L₁ ∩ L₂ is nonregular, then
L₁ ∪ L₂ is nonregular.
i. If L₁, L₂, ... are all regular, then ⋃ᵢ Lᵢ is regular.
j. If L₁, L₂, L₃, ... are all nonregular and Lᵢ ⊆ Lᵢ₊₁ for each i, then
⋃ᵢ Lᵢ is nonregular.
5.27. A number of languages over {0, 1} are given in (a)–(h). In each case, decide
whether the language is regular or not, and prove that your answer is correct.
a. The set of all strings x beginning with a nonnull string of the form ww.
b. The set of all strings x containing some nonnull substring of the form
ww.
c. The set of odd-length strings over {0, 1} with middle symbol 0.
d. The set of even-length strings over {0, 1} with the two middle symbols
equal.
e. The set of strings over {0, 1} of the form xyx for some x with |x| ≥ 1.
f. The set of nonpalindromes.

g. The set of strings beginning with a palindrome of length at least 3.


h. The set of strings in which the number of 0’s is a perfect square.
5.28. Describe decision algorithms to answer each of these questions.
a. Given two FAs M₁ and M₂, are there any strings that are accepted by
neither?
b. Given a regular expression r and an FA M, are the corresponding
languages the same?
c. Given an FA M = (Q, Σ, q₀, A, δ) and a state q ∈ Q, is there an x with
|x| > 0 so that δ*(q, x) = q?
d. Given an NFA-Λ M and a string x, does M accept x?
e. Given two NFA-Λs, do they accept the same language?
f. Given an NFA-Λ M and a string x, is there more than one sequence of
transitions corresponding to x that causes M to accept x?
g. Given an FA M accepting a language L, and given two strings x and y,
are x and y distinguishable with respect to L?
h. Given an FA M accepting a language L, and a string x, is x a prefix of an
element of L?
i. Given an FA M accepting a language L, and a string x, is x a suffix of an
element of L?
j. Given an FA M accepting a language L, and a string x, is x a substring
of an element of L?
5.29. Find an example of a language L ⊆ {0, 1}* so that L* is not regular.
5.30. Find an example of a nonregular language L ⊆ {0, 1}* so that L* is regular.

MORE CHALLENGING PROBLEMS


5.31. Let L ⊆ Σ* be a language, and let L₁ be the set of prefixes of elements of L.
What is the relationship, if any, between the two partitions of Σ*
corresponding to the equivalence relations I_L and I_{L₁}, respectively? Explain.
5.32. a. List all the subsets A of {0, 1}* having the property that for some
language L ⊆ {0, 1}* for which I_L has exactly two equivalence classes,
[Λ] is A.
b. For each set A that is one of your answers to (a), how many distinct
languages L are there so that I_L has two equivalence classes and [Λ] is
A?
5.33. Let L = {ww | w ∈ {0, 1}*}. Describe all the equivalence classes of I_L.
5.34. Let L be the language of "balanced" strings of parentheses—that is, all
strings that are the strings of parentheses in legal algebraic expressions. For
example, Λ, ()(), and ((()())) are in L; (() and ())( are not. Describe all
the equivalence classes of I_L.

5.35. a. Let L be the language of all fully parenthesized algebraic expressions
involving the operator + and the identifier i. (L can be defined
recursively by saying i ∈ L, (x + y) ∈ L for every x and y in L, and
nothing else is in L.) Describe all the equivalence classes of I_L.
b. Answer the same question for the language L in Example 5.3, defined by
saying a ∈ L, x + y ∈ L for every x and y in L, and (x) ∈ L for every
x ∈ L.
5.36. For an arbitrary string x ∈ {0, 1}*, denote by x⁻ the string obtained by
replacing all 0's by 1's and vice versa. For example, Λ⁻ = Λ and
(011)⁻ = 100.
a. Define
L₁ = {xx⁻ | x ∈ {0, 1}*}
Determine the equivalence classes of I_{L₁}.
b. Define
L₂ = {xy | x ∈ {0, 1}* and y is either x or x⁻}
Determine the equivalence classes of I_{L₂}.
5.37. Let L = {x ∈ {0, 1}* | n₁(x) is a multiple of n₀(x)}. Determine the
equivalence classes of I_L.
5.38. Let L be a language over Σ. We know that I_L is a right invariant
equivalence relation (i.e., for any x and y in Σ* and any a ∈ Σ, if x I_L y,
then xa I_L ya). By the Myhill-Nerode theorem (Corollary 5.1), we know
that if the set of equivalence classes of I_L is finite, then L is regular, and in
this case L is the union of some (zero or more) of these finitely many
equivalence classes. Show that if R is any right invariant equivalence
relation such that the set of equivalence classes of R is finite and L is the
union of some of the equivalence classes of R, then L is regular.
5.39. If P is a partition of {0, 1}* (i.e., a collection of pairwise disjoint subsets
whose union is {0, 1}*), then there is an equivalence relation R on {0, 1}*
whose equivalence classes are precisely the subsets in P. Let us say that P
is right invariant if the resulting equivalence relation is.
a. Show that for a subset S of {0, 1}*, S is one of the subsets of some right
invariant partition of {0, 1}* (not necessarily a finite partition) if and only
if the following condition is satisfied: for any x, y ∈ S, and any
z ∈ {0, 1}*, xz and yz are either both in S or both not in S.
b. To what simpler condition does this one reduce in the case where S is a
finite set?
c. Show that if a finite set S satisfies this condition, then there is a finite
right invariant partition having S as one of its subsets.
d. For an arbitrary set S satisfying the condition in part (a), there may be no
finite right invariant partition having S as one of its subsets.
Characterize those sets S for which there is.

5.40. For two languages L₁ and L₂ over Σ, we define the quotient of L₁ and L₂ to
be the language
L₁/L₂ = {x | for some y ∈ L₂, xy ∈ L₁}
Show that if L₁ is regular and L₂ is any language, then L₁/L₂ is regular.
5.41. Suppose L is a language over Σ, and x₁, x₂, ..., xₙ are strings that are
pairwise distinguishable with respect to L; that is, for any i ≠ j, xᵢ and xⱼ
are distinguishable. How many distinct strings are necessary in order to
distinguish between the xᵢ's? In other words, what is the smallest number k
so that for some set {z₁, z₂, ..., zₖ}, any two distinct xᵢ's are distinguished,
relative to L, by some zⱼ? Prove your answer. (Here is a way of thinking
about the question that may make it easier. Think of the xᵢ's as points on a
piece of paper, and think of the zⱼ's as cans of paint, each zⱼ representing a
different primary color. Saying that a given z distinguishes xᵢ and xⱼ means that one
of those two points is colored with the corresponding primary color and the other isn't. We
allow a single point to have more than one primary color applied to it, and
we assume that two distinct combinations of primary colors produce
different resulting colors. Then the question is, how many different primary
colors are needed in order to color the points so that no two points end up the
same color?)
5.42. Suppose M = (Q, Σ, q₀, A, δ) is an FA accepting L. We know (Lemma 5.2)
that if p, q ∈ Q and p ≢ q, then there is a string z so that exactly one of the
two states δ*(p, z) and δ*(q, z) is in A. Find an integer n (depending only
on M) so that for any p and q with p ≢ q, there is such a z with |z| ≤ n.
5.43. Show that L is regular if and only if there is an integer n so that any two
strings distinguishable with respect to L can be distinguished by a string of
length ≤ n. (Use the two previous exercises.)
5.44. Suppose that M₁ = (Q₁, Σ, q₁, A₁, δ₁) and M₂ = (Q₂, Σ, q₂, A₂, δ₂) are
both FAs accepting the language L, and both have as few states as possible.
Show that M₁ and M₂ are isomorphic (see Exercise 3.55). Note that in both
cases, the sets L_q forming the partition of Σ* are precisely the equivalence
classes of I_L. This tells you how to come up with a bijection from Q₁ to Q₂.
What you must do next is to show that the other conditions of an
isomorphism are satisfied.
5.45. Use the preceding exercise to describe another decision algorithm to answer
the question “Given two FAs, do they accept the same language?”
5.46. Suppose L and L₁ are both languages over Σ, and M is an FA with alphabet
Σ. Let us say that M accepts L relative to L₁ if M accepts every string in
the set L ∩ L₁ and rejects every string in the set L₁ − L. Note that this is not
in general the same as saying that M accepts the language L ∩ L₁.
Now suppose L₁, L₂, ... are regular languages over Σ, Lᵢ ⊆ Lᵢ₊₁ for
each i, and ⋃Lᵢ = Σ*. For each i, let nᵢ be the minimum number of
states required to accept L relative to Lᵢ. If there is no FA accepting L
relative to Lᵢ, we say nᵢ is ∞.

a. Show that for each i, nᵢ ≤ nᵢ₊₁.
b. Show that if the sequence nᵢ is bounded (i.e., for some constant C,
nᵢ ≤ C for every i), then L is regular. (It follows in particular that if
there is some fixed FA that accepts L relative to Lᵢ for every i, then L is
regular.)
5.47. Prove the following generalization of the pumping lemma, which can
sometimes make it unnecessary to break the proof into cases. If L is a regular
language, then there is an integer n so that for any x ∈ L, and any way of
writing x as x = x₁x₂x₃ with |x₂| = n, there are strings u, v, and w so that

x₂ = uvw
|v| > 0
for any m ≥ 0, x₁uvᵐwx₃ ∈ L


5.48. Can you find a language L ⊆ {0, 1}* so that in order to prove L nonregular,
the pumping lemma is not sufficient but the statement in the preceding
problem is?
5.49. Describe decision algorithms to answer each of these questions.
a. Given a regular expression r, is there a simpler regular expression (i.e.,
one involving fewer operations) that is equivalent to r?
b. Given two FAs M₁ and M₂, is L(M₁) a subset of L(M₂)?
c. Given two FAs M₁ and M₂, is every element of L(M₁) a prefix of an
element of L(M₂)?
d. Given two FAs M₁ = (Q₁, Σ, q₁, A₁, δ₁) and M₂, and two states
p, q ∈ Q₁, is there a string x ∈ L(M₂) so that δ*(p, x) = q?
5.50. Below are a number of languages over {0, 1}. In each case, decide whether
the language is regular or not, and prove that your answer is correct.
a. The set of all strings x having some nonnull substring of the form www.
(You may assume the following fact: There are arbitrarily long strings in
{0, 1}* that do not contain any nonnull substring of the form www. In
fact, such strings can be obtained using the construction in
Exercise 2.43.)
b. The set of strings having the property that in every prefix, the number of
0’s and the number of 1’s differ by no more than 2.
c. The set of strings having the property that in some prefix, the number of
0’s is 3 more than the number of 1’s.
d. The set of strings in which the number of 0’s and the number of 1’s are
both divisible by 5.
e. The set of strings x for which there is an integer k > 1 (possibly
depending on x) so that the number of 0’s in x and the number of 1’s in x
are both divisible by k.

f. (Assuming L is a regular language), Max(L) = {x ∈ L | there is no
nonnull string y so that xy ∈ L}.
g. (Assuming L is a regular language), Min(L) = {x ∈ L | no prefix of x
other than x itself is in L}.
5.51. A set S of nonnegative integers is an arithmetic progression if for some
integers n and p,
S = {n + ip | i ≥ 0}
Let A be a subset of {0}*, and let S = {|x| | x ∈ A}.
a. Show that if S is an arithmetic progression, then A is regular.
b. Show that if A is regular, then S is the union of a finite number of
arithmetic progressions.
5.52. This exercise involves languages of the form
L = {x ∈ {a, b}* | n_a(x) = f(n_b(x))}
for some function f from the set of natural numbers to itself. Example 5.7
shows that if f is the function defined by f(n) = n, then L is nonregular. If
f is any constant function (e.g., f(n) = 4), L is regular. One might ask
whether L can still be regular when f is not restricted quite so severely.
a. Show that if L is regular, the function f must be bounded—that is, there
must be some integer B so that f(n) ≤ B for every n. (Suggestion:
suppose not, and apply the pumping lemma to strings of the form
a^{f(n)}bⁿ.)
b. Show that if f(n) = n mod 2, then L is regular.
c. n mod 2 is an eventually periodic function; that is, there are integers n₀
and p, with p > 0, so that for any n ≥ n₀, f(n) = f(n + p). Show that
if f is any eventually periodic function, L is regular.
d. Show that if L is regular, then f must be eventually periodic.
(Suggestion: as in part (a), find a class of strings to which you can apply
the pumping lemma.)
5.53. Find an example of a nonregular language L ⊆ {0, 1}* so that L² is regular.
5.54. Show that if L is any language over a one-symbol alphabet, then L* is
regular.
Context-Free Languages and
Pushdown Automata

A context-free grammar is a simple recursive method of specifying grammar rules
by which strings in a language can be generated. All the regular languages can
be generated this way, and there are also simple examples of context-free grammars
generating nonregular languages. Grammar rules of this type permit syntax of more
variety and sophistication than is possible with regular languages. To a large extent,
they are capable of specifying the syntax of high-level programming languages and
other formal languages.
A model of computation that corresponds to context-free languages, in the same
way that finite automata correspond to regular languages, can be obtained by starting
with the finite-state model and adding an auxiliary memory. Although the memory
will be potentially infinite, it is sufficient to impose upon it a very simple organization,
that of a stack. It is necessary, however, to retain the element of nondeterminism in
these pushdown automata; otherwise, not every context-free language can be accepted
this way. For any context-free grammar, there is a simple way to get a nondeterministic
pushdown automaton accepting the language so that a sequence of moves by which
a string is accepted simulates a derivation of the string in the grammar. For certain
classes of grammars, this feature can be retained even when the nondeterminism is
removed, so that the result is a parser for the grammar; we study this problem briefly.
The class of context-free languages is still not general enough to include all
interesting or useful formal languages. Techniques similar to those in Chapter 5 can
be used to exhibit simple non-context-free languages, and these techniques can also
be used to find algorithms for certain decision problems associated with context-free
languages.

C HAPTER

Context-Free Grammars

6.1 | EXAMPLES AND DEFINITIONS


Many of the languages we have considered, both regular and nonregular, can be
described by recursive definitions. In our first example, involving a very simple
regular language, a slight reformulation of the recursive definition leads us to the idea
of a context-free grammar. These and other more general grammars that we will later
study turn out to be powerful tools for describing and analyzing languages.

EXAMPLE 6.1   Using Grammar Rules to Describe a Language


Let us consider the language L = {a, b}* of all strings over the alphabet {a, b}. In Example 2.15
we considered a recursive definition of L equivalent to the following:

1. Λ ∈ L.
2. For any S ∈ L, Sa ∈ L.
3. For any S ∈ L, Sb ∈ L.
4. No other strings are in L.

We think of S here as a variable, representing an arbitrary element of L whose value is to be
obtained by some combination of rules 1 to 3. Rule 1, which we write S → Λ, indicates that
one way of giving S a value is to replace it by Λ. Rules 2 and 3 can be written S → Sa and
S → Sb. This means that S can also be replaced by Sa or Sb; in either case we must then
obtain the final value by continuing to use the rules to give a value to the new S.
The symbol → is used for each of the rules by which a variable is replaced by a string.
For two strings α and β, we will use the notation α ⇒ β to mean that β can be obtained
by applying one of these rules to a single variable in the string α. Using this notation in our
example, we can write

S ⇒ Sa ⇒ Sba ⇒ Sbba ⇒ Λbba = bba

to describe the sequence of steps (the application of rules 2, 3, 3, and 1) used to obtain, or


derive, the string bba. The derivation comes to an end at the point where we replace S by an
actual string of alphabet symbols (in this case Λ); each step before that is roughly analogous
to a recursive call, since we are replacing S by a string that still contains a variable.
We can simplify the notation even further by introducing the symbol | to mean “or” and
writing the first three rules as

S → Λ | Sa | Sb
(In our new notation, we dispense with writing rule 4, even though it is implicitly still in effect.)
We note for future reference that in an expression such as Sa | Sb, the two alternatives are Sa
and Sb, not a and S—in other words, the concatenation operation takes precedence over the |
operation.
In Example 2.15 we also considered this alternative definition of L:

1. Λ ∈ L.
2. a ∈ L.
3. b ∈ L.
4. For every x and y in L, xy ∈ L.
5. No other strings are in L.

Using our new notation, we would summarize the "grammar rules" by writing

S → Λ | a | b | SS

With this approach there is more than one way to obtain the string bba. Two derivations are
shown below:

S ⇒ SS ⇒ bS ⇒ bSS ⇒ bbS ⇒ bba
S ⇒ SS ⇒ Sa ⇒ SSa ⇒ bSa ⇒ bba

The five steps in the first line correspond to rules 4, 3, 4, 3, and 2, and those in the second line
to rules 4, 2, 4, 3, and 3.
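Derivations like these can be replayed mechanically. The following sketch in Python (illustrative only; representing a derivation as a list of chosen right-hand sides, with the leftmost S replaced at each step, is an assumption and not the book's notation) reproduces the first derivation of bba:

def replay_derivation(current, choices):
    # Replace the leftmost occurrence of the variable S with each chosen
    # right-hand side in turn, printing every intermediate string.
    print(current)
    for rhs in choices:
        i = current.index('S')
        current = current[:i] + rhs + current[i + 1:]
        print(current)
    return current

# Rules 4, 3, 4, 3, and 2 of the grammar S -> Lambda | a | b | SS,
# with '' standing for the null string Lambda.
replay_derivation('S', ['SS', 'b', 'SS', 'b', 'a'])
# Prints S, SS, bS, bSS, bbS, bba in turn.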

In both cases in Example 6.1, the formulas obtained from the recursive definition
can be interpreted as the grammar rules in a context-free grammar. Before we give the
official definition of such a grammar, we consider two more examples. In Example
6.2, although the grammar is perhaps even simpler than that in Example 6.1, the
corresponding language is one we know to be nonregular. Example 6.3 is perhaps
more typical in that it includes a grammar containing more than one variable.

EXAMPLE 6.2   The Language {aⁿbⁿ | n ≥ 0}


The grammar rules

S → aSb | Λ

are another way of describing the language L defined as follows:



1. Λ ∈ L.
2. For every x ∈ L, axb ∈ L.
3. Nothing else is in L.

The language L is easily seen to be the nonregular language {aⁿbⁿ | n ≥ 0}. The grammar
rules in the first part of Example 6.1, for the language {a, b}*, allow a’s and b’s to be added
independently of each other. Here, on the other hand, each time one symbol is added to one
end of the string by an application of the grammar rule S → aSb, the opposite symbol is
added simultaneously to the other end. As we have seen, this constraint is not one that can be
captured by any regular expression.

EXAMPLE 6.3   Palindromes
Let us consider both the language pal of palindromes over the alphabet {a, b} and its comple-
ment N, the set of all nonpalindromes over {a, b}. From Example 2.16, we have the following
recursive definition of pal:

1. Λ, a, b ∈ pal.
2. For any S ∈ pal, aSa and bSb are in pal.
3. No other strings are in pal.

We can therefore describe pal by the context-free grammar with the grammar rules

S → Λ | a | b | aSa | bSb
The language N also obeys rule 2: For any nonpalindrome x, both axa and bxb are nonpalin-
dromes. However, a recursive definition of N cannot be as simple as this definition of pal,
because there is no finite set of strings that can serve as basis elements in the definition. There
is no finite set N₀ so that every nonpalindrome can be obtained from an element of N₀ by
applications of rule 2 (Exercise 6.42). Consider a specific nonpalindrome, say

abbaaba

If we start at the ends and work our way in, trying to match the symbols at the beginning
with those at the end, the string looks like a palindrome for the first two steps. What makes
it a nonpalindrome is the central portion baa, which starts with one symbol and ends with the
opposite symbol; the string between those two can be anything. A string is a nonpalindrome if
and only if it has a central portion of this type. These are the “basic” nonpalindromes, and we
can therefore write the following definition of N:

1. For any A ∈ {a, b}*, aAb and bAa are in N.
2. For any S ∈ N, aSa and bSb are in N.
3. No other strings are in N.

In order to obtain a context-free grammar describing N, we can now simply introduce a second
variable A, representing an arbitrary element of {a, b}*, and incorporate the grammar rules for
this language from Example 6.1:
S → aAb | bAa | aSa | bSb
A → Λ | Aa | Ab

A derivation of the nonpalindrome abbaaba, for example, would look like

S ⇒ aSa ⇒ abSba ⇒ abbAaba ⇒ abbAaaba ⇒ abbaaba

It is common, and often necessary, to include several variables in a context-free grammar


describing a language L. There will still be one special variable that represents an arbitrary
string in L, and it is customary to denote this one by S (the start variable). The other variables
can then be thought of as representing typical strings in certain auxiliary languages involved
in the definition of L. (We can still interpret the grammar as a recursive definition of L, if we
extend our notion of recursion slightly to include the idea of mutual recursion: rather than one
object defined in terms of itself, several objects defined in terms of each other.)

Let us now give the general definition illustrated by these examples.

Definition 6.1   A Context-Free Grammar

A context-free grammar (CFG) is a 4-tuple G = (V, Σ, S, P), where V and Σ are
disjoint finite sets, the elements of V being variables and those of Σ being terminal
symbols; S ∈ V is the start variable; and P is a finite set of grammar rules, or
productions, each of the form A → α, where A ∈ V and α ∈ (V ∪ Σ)*.

Suppose G = (V, Σ, S, P) is a CFG. As in the first three examples, we will
reserve the symbol → for individual productions in P. We use the symbol ⇒ for
steps in a derivation such as those in Examples 6.1 and 6.3. Sometimes it is useful to
indicate explicitly that the derivation is with respect to the CFG G, and in this case
we write ⇒_G.

α ⇒_G β

means that the string β can be obtained from the string α by replacing some variable
that appears on the left side of a production in G by the corresponding right side, or
that

α = α₁Aα₂
β = α₁γα₂

and one of the productions in G is A → γ. (We can now understand better the term
context-free. If at some point in a derivation we have obtained a string α containing
the variable A, then we may continue by substituting γ for A, no matter what the
strings α₁ and α₂ are—that is, independent of the context.)
In this case, we will say that α derives β, or β is derived from α, in one step.
More generally, we write

α ⇒*_G β

(and shorten it to α ⇒* β if it is clear what grammar is involved) if α derives β in
zero or more steps; in other words, either α = β, or there exist an integer k ≥ 1 and
strings α₀, α₁, ..., αₖ, with α₀ = α and αₖ = β, so that αᵢ ⇒_G αᵢ₊₁ for every i with
0 ≤ i ≤ k − 1.

EXAMPLE 6.4   The Language of Algebraic Expressions
An important language in computer science is the language of legal algebraic expressions. For
simplicity we restrict ourselves here to the simple expressions that can be formed from the
four binary operators +, —, *, and /, left and right parentheses, and the single identifier a.
Some of the features omitted, therefore, are unary operators, any binary operators other than
these four, numerical literals such as 3.0 or √2, expressions involving functional notation, and
more general identifiers. Many of these features could be handled simply enough; for example,
the language of arbitrary identifiers can be “embedded” within this one by using a variable A
instead of the terminal symbol a and introducing productions that allow any identifier to be
derived from A. (See Examples 3.5 and 3.6.)
A recursive definition of our language is based on the observation that legal expressions can
be formed by joining two legal expressions using one of the four operators or by enclosing a legal
expression within parentheses, and that these two operations account for all legal expressions
except the single identifier a. The most straightforward way of getting a context-free grammar,
therefore, is probably to use the productions

S → S + S | S - S | S * S | S / S | (S) | a

The string a + (a * a)/a - a can be obtained from the derivation

S ⇒ S - S ⇒ S + S - S ⇒ a + S - S ⇒ a + S/S - S
⇒ a + (S)/S - S ⇒ a + (S * S)/S - S ⇒ a + (a * S)/S - S
⇒ a + (a * a)/S - S ⇒ a + (a * a)/a - S ⇒ a + (a * a)/a - a

It is easy to see that there are many other derivations as well. For example,

S ⇒ S/S ⇒ S + S/S ⇒ a + S/S ⇒ a + (S)/S
⇒ a + (S * S)/S ⇒ a + (a * S)/S ⇒ a + (a * a)/S
⇒ a + (a * a)/S - S ⇒ a + (a * a)/a - S ⇒ a + (a * a)/a - a

We would probably say that the first of these is more natural than the second. The first starts
with the production
S → S - S

and therefore indicates that we are interpreting the original expression as the difference of two
other expressions. This seems correct because the expression would normally be evaluated as
follows:

1. Evaluate a * a, and call its value A.


2. Evaluate A/a, and call its value B.
3. Evaluate a + B, and call its value C.
4. Evaluate C - a.

The expression “is” the difference of the subexpression with value C and the subexpression a.
The second derivation, by contrast, interprets the expression as a quotient. Although there is
nothing in the grammar to rule out this derivation, it does not reflect our view of the correct
structure of the expression.
One possible conclusion is that the context-free grammar we have given for the language
may not be the most appropriate. It does not incorporate in any way the standard conventions,
having to do with the precedence of operators and the left-to-right order of evaluation, that
we use in evaluating the expression. (Precedence of operators dictates that in the expression
a + b * c, the multiplication is performed before the addition; and the expression a - b + c
means (a - b) + c, not a - (b + c).) Moreover, rather than having to choose between two
derivations of a string, it is often desirable to select, if possible, a CFG in which a string can have
only one derivation (except for trivial differences between the order in which two variables
in some intermediate string are chosen for replacement). We will return to this question in
Section 6.4, when we discuss ambiguity in a CFG.

EXAMPLE 6.5 The Syntax of Programming Languages


The language in the previous example and languages like those in Examples 3.5 and 3.6 are
relatively simple ingredients of programming languages such as C and Pascal. To a large
extent, context-free grammars can be used to describe the overall syntax of such languages.
In C, one might try to formulate grammar rules to specify what constitutes a legal statement.
(As you might expect, a complete specification is very involved.) Two types of statements in
C are if statements and for statements; if we represent an arbitrary statement by the variable
(statement), the productions involving (statement) might look like

(statement) → ··· | (if-statement) | (for-statement) | ···

The syntax of these two types of statements can be described by the rules

(if-statement) → if ( (expression) ) (statement)


(for-statement) → for ( (expression) ; (expression) ; (expression) ) (statement)

where (expression) is another variable, whose productions would also be difficult to describe
completely.
Although in both cases the last term on the right side specifies a single statement, the logic
of a program often requires more than one. It is therefore necessary to have our definition of
(statement) allow a compound statement, which is simply a sequence of zero or more statements
enclosed within {}. We could easily write a definition for (compound-statement) that would
say this. A syntax diagram such as the one shown accomplishes the same thing.

[Syntax diagram: {, followed by zero or more occurrences of (statement), followed by }]

A path through the diagram begins with {, ends with }, and can traverse the loop zero or more
times.

EXAMPLE 6.6 Grammar Rules for English

The advantage of using high-level programming languages like C and Pascal is that they allow
us to write statements that look more like English. If we can use context-free grammars to
capture many of the rules of programming languages, what about English itself, which has its
own “grammar rules”?
English sentences that are sufficiently simple can be described by CFGs. A great many
sentences could be taken care of by the productions
(declarative sentence) — (subject phrase) (verb phrase) (object) |
(subject phrase) (verb phrase)
if we provided reasonable productions for each of the three variables on the right. Producing
a wide variety of reasonable, idiomatic English sentences with context-free grammars, even
context-free grammars of manageable size, is not hard; what is hard is doing this and at the
same time disallowing gibberish. Even harder is disallowing sentences that seem to obey
English syntax but that a native English speaker would probably never say, because they don’t
sound right.
Here is a simple example that might illustrate the point. Consider the productions
(declarative sentence) → (subject) (verb) (object)
(subject) → (proper noun)
(proper noun) → John | Jane
(verb) → reminded
(object) → (proper noun) | (reflexive pronoun)
(reflexive pronoun) → himself | herself

More than one sentence derivable from this grammar does not quite work: “John reminded her-
self” and “Jane reminded himself,” for example. These could be eliminated in a straightforward
way (at the cost of complicating the grammar) by introducing productions like
(declarative sentence) → (masculine noun) (verb) (masculine reflexive pronoun)

A slightly more subtle problem is "Jane reminded Jane." Normally we do not say this, unless
we have in mind two different people named Jane, but there is no obvious way to prohibit it
without also prohibiting “Jane reminded John.” (At least, there is no obvious way without
essentially using a different production for every sentence we want to end up with. This trivial
option is available here, since the language is finite.) To distinguish “Jane reminded John,”
which is a perfectly good English sentence, from “Jane reminded Jane” requires using context,
and this is exactly what a context-free grammar does not allow.

6.2 MORE EXAMPLES


In general, to show that a CFG generates a language, we must show two things: first,
that every string in the language can be derived from the grammar, and second, that
no other string can. In some of the examples in this section, at least one of these two
statements is less obvious.

EXAMPLE 6.7 A CFG for {x | n₀(x) = n₁(x)}


Consider the language

L = {x ∈ {0, 1}* | n₀(x) = n₁(x)}


where nᵢ(x) is the number of i's in the string x.
As in Examples 6.1—6.3, we can begin by thinking about a recursive definition of L, and
once we find one we can easily turn it into a context-free grammar.
Clearly, Λ ∈ L. Given a string x in L, we get a longer string in L by adding one 0 and
one 1. (Conversely, any nonnull string in L can be obtained this way.) One way to add the
symbols is to add one at each end, producing either 0x1 or 1x0. This suggests the productions

S → Λ | 0S1 | 1S0
Not every string in L can be obtained from these productions, because some elements of L begin
and end with the same symbol; the strings 0110, 10001101, and 0010111100 are examples.
If we look for ways of expressing each of these in terms of simpler elements of L, we might
notice that each is the concatenation of two nonnull elements of L (for example, the third string
is the concatenation of 001011 and 1100). This observation suggests the production S > SS.
It is reasonably clear that if G is the CFG containing the productions we have so far,

S → Λ | 0S1 | 1S0 | SS


then derivations in G produce only strings in L. We will prove the converse, that L ⊆ L(G).
It will be helpful to introduce the notation

d(x) = n₀(x) − n₁(x)


What we must show, therefore, is that for any string x with d(x) = 0, x ∈ L(G). The proof is
by mathematical induction on |x|.
In the basis step of the proof, we must show that if |x| = 0 and d(x) = 0 (of course, the
second hypothesis is redundant), then x ∈ L(G). This is true because one of the productions
in G is S → Λ.
Our induction hypothesis will be that k ≥ 0 and that for any y with |y| ≤ k and d(y) = 0,
y ∈ L(G). We must show that if |x| = k + 1 and d(x) = 0, then x ∈ L(G).
If x begins with 0 and ends with 1, then x = 0y1 for some string y satisfying d(y) = 0.
By the induction hypothesis, y ∈ L(G). Therefore, since S ⇒* y, we can derive x from S
by starting the derivation with the production S → 0S1 and continuing to derive y from the
second S. The case when x begins with 1 and ends with 0 is handled the same way, except
that the production S → 1S0 is used to begin the derivation.
The remaining case is the one in which x begins and ends with the same symbol. Since
d(x) = 0, x has length at least 2; suppose for example that x = 0y0 for some string y. We

would like to show that x has a derivation in G. Such a derivation would have to start with the
production S → SS; in order to show that there is such a derivation, we would like to show
that x = wz, where w and z are shorter strings that can both be derived from S. (It will then
follow that we can start the derivation with S → SS, then continue by deriving w from the first
S and z from the second.) Another way to express this condition is to say that x has a prefix w
so that 0 < |w| < |x| and d(w) = 0.
Let us consider d(w) for prefixes w of x. The shortest nonnull prefix is 0, and d(0) = 1;
the longest prefix shorter than x is 0y, and d(0y) = −1 (because the last symbol of x is 0,
and d(x) = 0). Furthermore, the d-value of a prefix changes by 1 each time an extra symbol
is added. It follows that there must be a prefix w, longer than 0 and shorter than 0y, with
d(w) = 0. This is what we wanted to prove. The case when x = 1y1 is almost the same, and
so the proof is concluded.
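The argument in this last paragraph is a discrete intermediate-value argument, and it is easy to animate in code. Here is a minimal Python sketch (the function name split_point is ours, not the book's) that scans the prefixes of x, tracking the value of d, and returns the decomposition x = wz guaranteed by the proof:

def split_point(x):
    """For a string x over {0,1} with d(x) == 0 that begins and ends
    with the same symbol, return (w, z) with x == w + z, 0 < |w| < |x|,
    and d(w) == 0, as guaranteed by the intermediate-value argument."""
    d = 0
    for i, c in enumerate(x[:-1], start=1):
        d += 1 if c == '0' else -1   # d of the prefix x[:i]
        if d == 0:
            return x[:i], x[i:]
    return None  # unreachable under the stated hypotheses

print(split_point("010010110110"))   # ('01', '0010110110'): both halves have d == 0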

EXAMPLE 6.8 Another CFG for {x | n₀(x) = n₁(x)}

Let us continue with the language L = {x ∈ {0, 1}* | n₀(x) = n₁(x)} of the last example;
this time we construct a CFG with three variables, based on a different approach to a recursive
definition of L.
One way to obtain an element of L is to add both symbols to a string already in L. Another
way, however, is to add a single symbol to a string that has one extra occurrence of the opposite
symbol. Moreover, every element of L can be obtained this way and in fact can be obtained by
adding this extra symbol at the beginning. Let us introduce the variables A and B, to represent
strings with an extra 0 and an extra 1, respectively, and let us denote these two languages by
L₀ and L₁:

L₀ = {x ∈ {0, 1}* | n₀(x) = n₁(x) + 1} = {x ∈ {0, 1}* | d(x) = 1}

L₁ = {x ∈ {0, 1}* | n₁(x) = n₀(x) + 1} = {x ∈ {0, 1}* | d(x) = −1}

where d is the function in Example 6.7 defined by d(x) = n₀(x) − n₁(x). Then it is easy to
formulate the productions we need starting with S:

S → 0B | 1A | Λ

It is also easy to find one production for each of the variables A and B. If a string in L₀ begins
with 0, or if a string in L₁ begins with 1, then the remainder is an element of L. Thus, it is
appropriate to add the productions

A → 0S    B → 1S

What remains are the strings in L₀ that start with 1 and the strings in L₁ that start with 0. In the
first case, if x = 1y and x ∈ L₀, then y has two more 0's than 1's. If it were true that y could
be written as the concatenation of two strings, each with one extra 0, then we could complete
the A-productions by adding A → 1AA, and we could handle B similarly.
In fact, the same technique we used in Example 6.7 will work here. If d(x) = 1 and
x = 1y, then Λ is a prefix of y with d(Λ) = 0, and y itself is a prefix of y with d(y) = 2.
Therefore, there is some intermediate prefix w of y with d(w) = 1, and y = wz where
w, z ∈ L₀.

This discussion should make it at least plausible that the context-free grammar with
productions

S → 0B | 1A | Λ
A → 0S | 1AA
B → 1S | 0BB
generates the language L. By taking the start variable to be A or B, we could just as easily
think of it as a CFG generating L₀ or L₁. It is possible without much difficulty to give an
induction proof; see Exercise 6.50.
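Short of the induction proof, the claim can at least be checked mechanically for short strings. The following Python sketch (representation and names ours) expands leftmost derivations by brute force, pruning any sentential form whose terminal symbols already exceed a length bound, and compares the result with the definition of L:

from itertools import product

# Productions of the grammar above; Λ is written as the empty string.
prods = {'S': ['0B', '1A', ''], 'A': ['0S', '1AA'], 'B': ['1S', '0BB']}

def derivable(max_len):
    """Terminal strings of length <= max_len derivable from S."""
    found, frontier = set(), {'S'}
    while frontier:
        nxt = set()
        for form in frontier:
            i = next((j for j, c in enumerate(form) if c in prods), None)
            if i is None:               # no variables left: a terminal string
                found.add(form)
                continue
            for rhs in prods[form[i]]:  # expand the leftmost variable
                new = form[:i] + rhs + form[i + 1:]
                if sum(c not in prods for c in new) <= max_len:
                    nxt.add(new)
        frontier = nxt
    return found

L = {''.join(p) for n in range(7) for p in product('01', repeat=n)
     if p.count('0') == p.count('1')}
assert derivable(6) == L   # the two sets agree for all strings of length up to 6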

The following theorem provides three simple ways of obtaining new CFLs from
languages that are known to be context-free.

Theorem 6.1
If L₁ and L₂ are context-free languages, then the languages L₁ ∪ L₂, L₁L₂, and L₁* are also context-free.

Sketch of Proof Suppose G₁ = (V₁, Σ, S₁, P₁) and G₂ = (V₂, Σ, S₂, P₂) are CFGs generating L₁ and L₂, where the variables have been relabeled if necessary so that V₁ ∩ V₂ = ∅. A CFG generating L₁ ∪ L₂ can be obtained by combining the variables and productions of the two grammars and adding a new start variable S with productions S → S₁ | S₂; for L₁L₂, the added production is S → S₁S₂ instead; and for L₁*, a new start variable S with productions S → S₁S | Λ, added to the variables and productions of G₁, suffices.

Note that it really is necessary in the first two parts of the proof to make sure that
V₁ ∩ V₂ = ∅. Consider CFGs having productions

S₁ → XA    X → c    A → a

and

S₂ → XB    X → d    B → b



respectively. If we applied the construction in the first part of the proof without
relabeling variables, the resulting grammar would allow the derivation

S ⇒ S₁ ⇒ XA ⇒ dA ⇒ da

even though da is not derivable from either of the two original grammars.
Corollary 6.1 Every regular language is a CFL.
Proof
According to Definition 3.1, regular languages over Σ are the languages obtained
from ∅, {Λ}, and {a} (a ∈ Σ) by using the operations of union, concatenation, and
Kleene *. Each of the primitive languages ∅, {Λ}, and {a} is a context-free language.
(In the first case we can use the trivial grammar with no productions, and in the
other two cases one production is sufficient.) The corollary therefore follows from
Theorem 6.1, using the principle of structural induction.

EXAMPLE 6.9 A CFG Equivalent to a Regular Expression


Let L be the language corresponding to the regular expression

(011 + 1)*(01)*

We can take a few obvious shortcuts in the algorithm provided by the proof of Theorem 6.1.
The productions

A → 011 | 1
generate the language {011, 1}. Following the third part of the theorem, we can use the
productions

B → AB | Λ
A → 011 | 1
with B as the start symbol to generate the language {011, 1}*. Similarly, we can use

C → DC | Λ
D → 01

to derive {01}* from the start symbol C. Finally, we generate the concatenation of the two
languages by adding the production S → BC. The final grammar has start symbol S, auxiliary
variables A, B, C, and D, and productions

S → BC
B → AB | Λ
A → 011 | 1
C → DC | Λ
D → 01

Starting with any regular expression, we can obtain an equivalent CFG using the
techniques illustrated in this example. In the next section we will see that any regular
language L can also be described by a CFG whose productions all have a very simple
form, and that such a CFG can be obtained easily from an FA accepting L.
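One way the algorithm might be coded is sketched below in Python; the tuple-based regex representation and all variable names are our own, not the book's. Each case adds exactly the productions used in the proof of Theorem 6.1, and a fresh start variable is generated each time so that the variable sets of the subgrammars never clash:

import itertools

_ids = itertools.count()
def fresh():
    return f"X{next(_ids)}"

def cfg(regex):
    """Return (start, productions) for a regex given as a nested tuple:
    ('sym', a), ('star', r), ('cat', r1, r2), or ('union', r1, r2).
    Productions are pairs (variable, right-side tuple); Λ is ()."""
    S = fresh()
    if regex[0] == 'sym':
        return S, {(S, (regex[1],))}
    if regex[0] == 'star':
        A, p = cfg(regex[1])
        return S, p | {(S, (A, S)), (S, ())}        # S -> A S | Λ
    A, p1 = cfg(regex[1])
    B, p2 = cfg(regex[2])
    if regex[0] == 'cat':
        return S, p1 | p2 | {(S, (A, B))}           # S -> A B
    return S, p1 | p2 | {(S, (A,)), (S, (B,))}      # S -> A | B

# The regular expression (011 + 1)*(01)* of this example:
r011 = ('cat', ('sym', '0'), ('cat', ('sym', '1'), ('sym', '1')))
r = ('cat', ('star', ('union', r011, ('sym', '1'))),
            ('star', ('cat', ('sym', '0'), ('sym', '1'))))
start, productions = cfg(r)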

EXAMPLE 6.10 A CFG for {x | n₀(x) ≠ n₁(x)}

Consider the language

L = {x ∈ {0, 1}* | n₀(x) ≠ n₁(x)}


Neither of the CFGs we found in Examples 6.7 and 6.8 for the complement of L is especially
helpful here. As we will see in Chapter 8, there is no general technique for finding a grammar
generating the complement of a given CFL—which may in some cases not be a context-free
language at all. However, we can express L as the union of the languages L₀ and L₁, where

L₀ = {x ∈ {0, 1}* | n₀(x) > n₁(x)}

L₁ = {x ∈ {0, 1}* | n₁(x) > n₀(x)}

and thus we concentrate on finding a CFG G₀ generating the language L₀. Clearly 0 ∈ L₀,
and for any x ∈ L₀, both x0 and 0x are in L₀. This suggests the productions

S → 0 | S0 | 0S
We also need to be able to add 1's to our strings. We cannot expect that adding a 1 to an
element of L₀ will always produce an element of L₀; however, if we have two strings in L₀,
concatenating them produces a string with at least two more 0's than 1's, and then adding a
single 1 will still yield an element of L₀. We could add it at the left, at the right, or between
the two. The corresponding productions are
S → 1SS | SS1 | S1S

It is not hard to see that any string derived by using the productions

S → 0 | S0 | 0S | 1SS | SS1 | S1S

is an element of L₀ (see Exercise 6.43). In the converse direction we can do even a little better:
if G₀ is the grammar with productions

S → 0 | 0S | 1SS | SS1 | S1S

every string in L₀ can be derived in G₀.


The proof is by induction on the length of the string. We consider the case that is probably
hardest and leave the others to the exercises. As in previous examples, let d(x) = n₀(x) − n₁(x).
The basis step, for a string in L₀ of length 1, is straightforward. Suppose that k ≥ 1 and that
any x for which |x| ≤ k and d(x) > 0 can be derived in G₀, and consider a string x for which
|x| = k + 1 and d(x) > 0.
We consider the case in which x = 0y0 for some string y. If x contains only 0's, it can
be derived from S using the productions S → 0 | 0S; we assume, therefore, that x contains at
least one 1. Our goal is to show that x has the form

x = w1z    for some w and z with d(w) > 0 and d(z) > 0

Once we have done this, the induction hypothesis will tell us that both w and z can be derived
from S, so that we will be able to derive x by starting with the production

S → S1S

To show that x has this form, suppose x contains n 1's, where n ≥ 1. For each i with 1 ≤ i ≤ n,
let wᵢ be the prefix of x up to but not including the ith 1, and zᵢ the suffix of x that follows this
1. In other words, for each i,

x = wᵢ1zᵢ

where the 1 is the ith 1 in x. If d(wₙ) > 0, then we may let w = wₙ and z = zₙ. The string
zₙ is 0^j for some j ≥ 1 because x ends with 0, and we have the result we want. Otherwise,
d(wₙ) ≤ 0. In this case we select the first i with d(wᵢ) ≤ 0, say i = m. Now since x
begins with 0, d(w₁) must be > 0, which implies that m ≥ 2. At this point, we can say that
d(wₘ₋₁) > 0 and d(wₘ) ≤ 0. Because wₘ has only one more 1 than wₘ₋₁, d(wₘ₋₁) can be no
more than 1. Therefore, d(wₘ₋₁) = 1. Since x = wₘ₋₁1zₘ₋₁ and d(x) > 0, it follows that
d(zₘ₋₁) > 0. This means that we get the result we want by letting w = wₘ₋₁ and z = zₘ₋₁.
The proof in this case is complete.
For the other two cases, the one in which x starts with 1 and the one in which x ends with
1, see Exercise 6.44.
Now it is easy enough to obtain a context-free grammar G generating L. We use S as our
start symbol, A as the start symbol of the grammar we have just derived generating L₀, and B
as the start symbol for the corresponding grammar generating L₁. The grammar G then has
the productions

S → A | B
A → 0 | 0A | 1AA | AA1 | A1A
B → 1 | 1B | 0BB | BB0 | B0B

EXAMPLE 6.11 Another Application of Theorem 6.1


Let L = {0^i 1^j 0^k | j > i + k}. We try expressing L as the concatenation of CFLs, although
what may seem at first like the obvious approach—writing L as a concatenation L₁L₂L₃, where
these three languages contain strings of 0's, strings of 1's, and strings of 0's, respectively—is
doomed to failure. L contains both 0^1 1^3 0^1 and 0^1 1^4 0^2, but if we allowed L₁ to contain 0^1, L₂
to contain 1^3, and L₃ to contain 0^2, then L₁L₂L₃ would also contain 0^1 1^3 0^2, and this string is
not an element of L.
Observe that

0^i 1^(i+k) 0^k = 0^i 1^i 1^k 0^k

The only difference between this and a string x in L is that x has at least one extra 1 in the
middle:

x = 0^i 1^i 1^m 1^k 0^k    (for some m > 0)

A correct formula for L is therefore L = L₁L₂L₃, where

L₁ = {0^i 1^i | i ≥ 0}
L₂ = {1^m | m > 0}
L₃ = {1^k 0^k | k ≥ 0}

The second part of Theorem 6.1, applied twice, reduces the problem to finding CFGs for these
three languages.
L₁ is essentially the language in Example 6.2, L₃ is the same with the symbols 0 and 1
reversed, and L₂ can be generated by the productions

B → 1B | 1

(The second production is B → 1, not B → Λ, since we want only nonnull strings.)
The final CFG G = (V, Σ, S, P) incorporating these pieces is shown below.

V = {S, A, B, C}    Σ = {0, 1}
P = { S → ABC
      A → 0A1 | Λ
      B → 1B | 1
      C → 1C0 | Λ }

A derivation of 0^1 1^4 0^2 = (01)(1)(1^2 0^2), for example, is

S ⇒ ABC ⇒ 0A1BC ⇒ 01BC ⇒ 011C ⇒ 0111C0 ⇒ 01111C00 ⇒ 0111100

6.3 REGULAR GRAMMARS


The proof of Theorem 6.1 provides an algorithm for constructing a CFG corresponding
to a given regular expression. In this section, we consider another way of obtaining a
CFG for a regular language L, this time starting with an FA accepting L. The resulting
grammar is distinctive in two ways. First, the productions all have a very simple form,
closely related to the moves of the FA; second, the construction is reversible, so that
a CFG of this simple type can be used to generate a corresponding FA.
We can see how to proceed by looking at an example, the FA in Figure 6.1. It
accepts the language L = {0, 1}*{10}, the set of all strings over {0, 1} that end in 10.
One element of L is x = 110001010. We trace its processing by the FA, as follows.

1           B
11          B
110         C
1100        A
11000       A
110001      B
1100010     C
11000101    B
110001010   C

Figure 6.1 | [Diagram of the FA, with states A, B, and C and input symbols 0 and 1]

If we list the lines of this table consecutively, separated by ⇒, we obtain

A ⇒ 1B ⇒ 11B ⇒ 110C ⇒ 1100A ⇒ 11000A ⇒ 110001B
⇒ 1100010C ⇒ 11000101B ⇒ 110001010C
This looks like a derivation in a grammar. The grammar can be obtained by
specifying the variables to be the states of the FA and starting with the productions

A → 1B
B → 1B
B → 0C
C → 0A
A → 0A
C → 1B
These include every production of the form

P → aQ
where the FA makes a transition from state P to state Q on input a. The start symbol is A,
the initial state of the FA. To complete
the derivation, we must remove the C from the last string. We do this by adding the
production B → 0, so that the last step in the derivation is actually

11000101B ⇒ 110001010
Note that the production we have added is of the form

P → a

where the FA has a transition on input a from P to an accepting state F.
Any FA leads to a grammar in exactly this way. In our example it is easy to see
that the language generated is exactly the one recognized by the FA. In general, we
must qualify the statement slightly because the rules we have described for obtaining
productions do not allow Λ-productions; however, it will still be true that the nonnull
strings accepted by the FA are precisely those generated by the resulting grammar.
A significant feature of any derivation in such a grammar is that until the last step
there is exactly one variable in the current string; we can think of it as the “state of
the derivation,” and in this sense the derivation simulates the processing of the string
by the FA.
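The construction is mechanical enough to code directly. The following Python sketch (names ours) builds the productions from a transition table; applied to the FA of Figure 6.1, it produces the six transition productions listed above together with B → 0:

def fa_to_grammar(delta, start, accepting):
    """Regular grammar from an FA: a production P -> aQ for each
    transition delta[(P, a)] == Q, plus P -> a whenever Q is accepting.
    The grammar generates exactly the nonnull strings the FA accepts."""
    prods = []
    for (P, a), Q in sorted(delta.items()):
        prods.append(f"{P} -> {a}{Q}")
        if Q in accepting:
            prods.append(f"{P} -> {a}")
    return start, prods

# The FA of Figure 6.1, accepting {0,1}*{10} (C is the accepting state):
delta = {('A', '0'): 'A', ('A', '1'): 'B',
         ('B', '0'): 'C', ('B', '1'): 'B',
         ('C', '0'): 'A', ('C', '1'): 'B'}
print(fa_to_grammar(delta, 'A', {'C'}))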

Definition 6.3 Regular Grammars

A context-free grammar G = (V, Σ, S, P) is regular if every production takes one
of the two forms B → aC or B → a, where B and C are variables and a is a
terminal.

If M = (Q, Σ, q₀, A, δ) is an FA accepting L, the construction illustrated above
produces the regular grammar G = (Q, Σ, q₀, P), where P contains the production
B → aC whenever δ(B, a) = C and the production B → a whenever δ(B, a) ∈ A.
If x = a₁a₂···aₙ is accepted by M, and n ≥ 1, then there is a sequence of states
q₁, q₂, ..., qₙ with δ(qᵢ₋₁, aᵢ) = qᵢ for each i and qₙ ∈ A, and the derivation

q₀ ⇒ a₁q₁ ⇒ a₁a₂q₂ ⇒ ··· ⇒ a₁a₂···aₙ₋₁qₙ₋₁ ⇒ a₁a₂···aₙ

shows that x is generated by G. Conversely, any derivation in G traces such a
sequence of states ending with a transition to an accepting state, so that every string
generated by G is accepted by M. The only extra ingredient, beyond a production
B → aC for each transition, is a production B → a for each transition from B on
input a to an accepting state.

Sometimes the term regular is applied to grammars that do not restrict the form
of the productions so severely. It can be shown (Exercise 6.12) that a language is
regular if and only if it can be generated, except possibly for the null string, by a
grammar in which all productions look like this:

B → xC
B → x

where B and C are variables and x is a nonnull string of terminals. Grammars of this
type are also called linear. The exercises discuss a few other variations as well.

6.4 DERIVATION TREES AND AMBIGUITY


In a natural language such as English, understanding a sentence begins with under-
standing its grammatical structure, which means knowing how it is derived from the
grammar rules for the language. Similarly, in a context-free grammar that specifies
the syntax of a programming language or the rules for constructing an algebraic ex-
pression, interpreting a string correctly requires finding a correct derivation of the
string in the grammar. A natural way of exhibiting the structure of a derivation is
to draw a derivation tree, or parse tree. At the root of the tree is the variable with
which the derivation begins. Interior nodes correspond to variables that appear in the
derivation, and the children of the node corresponding to A represent the symbols in

a string α for which the production A → α is used in the derivation. (In the case of
a production A → Λ, the node labeled A has the single child Λ.)
In the simplest case, when the tree is the derivation tree for a string x ∈ L(G)
and there are no "Λ-productions" (of the form A → Λ), the leaf nodes of the tree
correspond precisely to the symbols of x. If there are Λ-productions, they show up
in the tree, so that some of the leaf nodes correspond to Λ; of course, those nodes can
be ignored as one scans the leaf nodes to see the string being derived, because Λ's
can be interspersed arbitrarily among the terminals without changing the string. In
the most general case, we will also allow “derivations” that begin with some variable
other than the start symbol of the grammar, and the string being derived may still
contain some variables as well as terminal symbols.
In Example 6.4 we considered the CFG with productions

S → S + S | S - S | S * S | S / S | (S) | a


The derivation

S ⇒ S - S ⇒ S * S - S ⇒ a * S - S ⇒ a * a - S ⇒ a * a - a

has the derivation tree shown in Figure 6.2a. The derivation

S ⇒ S - S ⇒ S - S/S ⇒ ··· ⇒ a - a/a

has the derivation tree shown in Figure 6.2b. In general, any derivation of a string in
a CFL has a corresponding derivation tree (exactly one).
(There is a technical point here that is worth mentioning: With the two produc-
tions S → SS | a, the sequence of steps S ⇒ SS ⇒ SSS ⇒* aaa can be interpreted
two ways, because in the second step it could be either the first S or the second that
is replaced by SS. The two interpretations correspond to two different derivation
trees. For this reason, we say that specifying a derivation means giving not only the
sequence of strings but also the position in each string at which the next substitution
occurs. The steps S ⇒ SS ⇒ SSS already represent two different derivations.)

Figure 6.2 | Derivation trees for two algebraic expressions: (a) and (b).

Algebraic expressions such as the two shown in Figure 6.2 are often represented
by expression trees—binary trees in which terminal nodes correspond to identifiers
or constants and nonterminal nodes correspond to operators. The expression tree in
Figure 6.3 conveys the same information as the derivation tree in Figure 6.2a, except
that only the nodes representing terminal symbols are drawn.
Figure 6.3 | Expression tree corresponding to Figure 6.2a.

One step in a derivation is the replacement of a variable (to be precise, a particular
occurrence of a variable) by the string on the right side of a production. A derivation
is the entire sequence of such steps, and in a sequence the order of the steps is
significant. The derivations

S ⇒ S + S ⇒ a + S ⇒ a + a

and

S ⇒ S + S ⇒ S + a ⇒ a + a


are therefore different. However, they are different only in a trivial way: When the
current string is S + S, the S that is used in the next step is the leftmost S in the first
derivation and the rightmost in the second. A precise way to say that these derivations
are not significantly different is to say that their derivation trees are the same. A
derivation tree specifies completely which productions are used in the derivation, as
well as where the right side of each production fits in the string being derived. It does
not specify the “temporal” order in which the variables are used, and this order plays
no role in using the derivation to interpret the string’s structure. Two derivations that
correspond to the same derivation tree are essentially the same.
Another way to compare two derivations is to normalize each of them, by re-
quiring that each follow the same rule as to which variable to replace first whenever
there is a choice, and to compare the normalized versions. A derivation is a leftmost
derivation if the variable used in each step is always the leftmost variable of the ones
in the current string. If the two derivations being compared are both leftmost and
are still different, it seems reasonable to say that they are essentially, or significantly,
different.
In fact, these two criteria for "essentially the same" are equivalent. On the
one hand, leftmost derivations corresponding to different derivation trees are clearly
different, because as we have already observed, any derivation corresponds to only
one derivation tree. On the other hand, the derivation trees corresponding to two
different leftmost derivations are also different, and we can see this as follows.
Consider the first step at which the derivations differ; suppose that this step is

xAβ ⇒ xα₁β

in one derivation and

xAβ ⇒ xα₂β

in the other. Here x is a string of terminals, since the derivations are leftmost; A is
a variable; and α₁ ≠ α₂. The two derivation trees must both have a node labeled A,
and the respective portions of the two trees to the left of this node must be identical,
because the leftmost derivations have been the same up to this point. These two nodes
have different sets of children, however, and the trees cannot be the same.

We conclude that a string of terminals has more than one derivation tree if and
only if it has more than one leftmost derivation. Notice that in this discussion “left-
most” could just as easily be “rightmost”; the important thing is not what order is
followed, only that some clearly defined order be followed consistently, so that the
two normalized versions can be compared meaningfully.
As we have already noticed, a string can have two or more essentially different
derivations in the same CFG.

Definition 6.4 An Ambiguous CFG

A context-free grammar G is ambiguous if for at least one x ∈ L(G), x has more
than one derivation tree (or, equivalently, more than one leftmost derivation) in G.

It is not hard to see that the ambiguity defined here is closely related to the
ambiguity we encounter every day in written and spoken language. The reporter
who wrote the headline “Disabled Fly to See Carter,” which appeared during the
administration of the thirty-ninth U.S. President, probably had in mind a derivation
such as

S → (collective noun) (verb) ···

However, one that begins

S → (adjective) (noun) ···

might suggest a more intriguing or at least less predictable story. Understanding a


sentence or a newspaper headline requires picking the right grammatical derivation
for it.

EXAMPLE 6.12 Ambiguity in the CFG in Example 6.4


Let us return to the algebraic-expression CFG discussed in Example 6.4, with productions

S → S + S | S - S | S * S | S / S | (S) | a

In that example we considered two essentially different derivations of the string

a + (a * a)/a - a

and in fact the two derivations were both leftmost, which therefore demonstrates the ambiguity
of the grammar. This can also be demonstrated using only the productions S → S + S and
S → a; the string a + a + a has leftmost derivations

S ⇒ S + S ⇒ a + S ⇒ a + S + S ⇒ a + a + S ⇒ a + a + a

and

S ⇒ S + S ⇒ S + S + S ⇒ a + S + S ⇒ a + a + S ⇒ a + a + a

The corresponding derivation trees are shown in Figures 6.4a and 6.4b, respectively.

Figure 6.4 | Two derivation trees for a + a + a: (a) and (b).

Although the difference in the two interpretations of the string a + a + a is not quite as
dramatic as in Example 6.4 (the expression is viewed as the sum of two subexpressions in both
cases), the principle is the same. The expression is interpreted as a + (a + a) in one case, and
(a +a) +a in the other. The parentheses might be said to remove the ambiguity as to how the
expression is to be interpreted. We will examine this property of parentheses more carefully
in the next section, when we discuss an unambiguous CFG equivalent to this one.

It is easy to see by studying Example 6.12 that every CFG containing a production
of the general form A → AαA is ambiguous. However, there are more subtle ways
in which ambiguity occurs, and characterizing the ambiguous context-free grammars
in any nontrivial way turns out to be difficult or impossible (see Section 11.6).

EXAMPLE 6.13 The "Dangling Else"


A standard example of ambiguity in programming languages is the “dangling else” phe-
nomenon. Consider the productions

(statement) → if ( (expression) ) (statement) |
              if ( (expression) ) (statement) else (statement) |
              (otherstatement)

describing the if statement of Example 6.5 as well as the related if-else statement, both part of
the C language. Now consider the statement

if (expr1) if (expr2) f(); else g();

This can be derived in two ways from the grammar rules. In one, illustrated in Figure 6.5a,
the else goes with the first if, and in the other, illustrated in Figure 6.5b, it goes with the second.
A C compiler should interpret the statement the second way, but not as a result of the syntax
rules given; this is additional information with which the compiler must be furnished.
Just as in Example 6.12, parentheses or their equivalent could be used to remove the
ambiguity in the statement.

Figure 6.5 | Two interpretations of a "dangling else": (a) the else attached to the first if; (b) the else attached to the second if.

if (expr1) {if (expr2) f();} else g();


forces the first interpretation, whereas

if (expr1) {if (expr2) f(); else g();}


forces the second. In some other languages, the appropriate version of “parentheses” is BEGIN
...END.
It is possible, however, to find grammar rules equivalent to the given ones that incorporate
the correct interpretation into the syntax. Consider the formulas
(statement) → (st1) | (st2)
(st1) → if ( (expression) ) (st1) else (st1) | (otherstatement)
(st2) → if ( (expression) ) (statement) |
        if ( (expression) ) (st1) else (st2)

These generate the same strings as the original rules and can be shown to be unambigu-
ous. Although we will not present a proof of either fact, you can see the intuitive reason for
the second. The variable (st1) represents a statement in which every if is matched by a
corresponding else, while any statement derived from (st2) contains at least one unmatched
if. The only variable appearing before else in these formulas is (st1); since the else cannot
match any of the ifs in the statement derived from (st1), it must match the if that appeared
in the formula with the else.
It is interesting to compare both these sets of formulas with the corresponding ones in the
official grammar for the Modula-2 programming language:

(statement) → IF (expression) THEN (statementsequence) END |
              IF (expression) THEN (statementsequence)
              ELSE (statementsequence) END |
              (otherstatement)

These obviously resemble the rules for C in the first set above. However, the explicit END after
each sequence of one or more statements allows the straightforward grammar rule to avoid the
“dangling else” ambiguity. The Modula-2 statement corresponding most closely to the tree in
Figure 6.5a is

IF A1 THEN IF A2 THEN S1 END ELSE S2 END

while Figure 6.5b corresponds to

IF A1 THEN IF A2 THEN S1 ELSE S2 END END


a eee ee eee

6.5 AN UNAMBIGUOUS CFG FOR ALGEBRAIC EXPRESSIONS
Although it is possible to prove that some context-free languages are inherently am-
biguous, in the sense that they can be produced only by ambiguous grammars, ambi-
guity is normally a property of the grammar rather than the language. If a CFG is
ambiguous, it is often possible and usually desirable to find an equivalent unambiguous
CFG. In this section, we will solve this problem in the case of the algebraic-expression
grammar discussed in Example 6.4.
For the sake of simplicity we will use only the two operators + and * in our
discussion, so that G has productions

S → S + S | S * S | (S) | a

Once we obtain an unambiguous grammar equivalent to this one, it will be easy to
reinstate the other operators.
Our final grammar will not have either S → S + S or S → S * S, because either
production by itself is enough to produce ambiguity. We will also keep in mind the
possibility, mentioned in Example 6.4, of incorporating into the grammar the standard

rules of order and operator precedence: * should have higher precedence than +,
and a + a + a should "mean" (a + a) + a, not a + (a + a).
In trying to eliminate S → S + S and S → S * S, it is helpful to remember
Example 2.15, where we discussed possible recursive definitions of L*. Two possible
ways of obtaining new elements of L* are to concatenate two elements of L* and
to concatenate an element of L* with an element of L; we observed that the second
approach preserves the direct correspondence between one application of the recursive
rule and one of the "primitive" strings being concatenated. Here this idea suggests
that we replace S → S + S by either S → S + T or S → T + S, where the variable
T stands for a term, an expression that cannot itself be expressed as a sum. If we
remember that a + a + a = (a + a) + a, we would probably choose S → S + T
as more appropriate; in other words, an expression consists of (all but the last term)
plus the last term. Because an expression can also consist of a single term, we will
also need the production S → T. At this point, we have

S → S + T | T
We may now apply the same principle to the set of terms. Terms can be products;
however, rather than thinking of a term as a product of terms, we introduce factors,
which are terms that cannot be expressed as products. The corresponding productions
are

T → T * F | F
So far we have a hierarchy of levels. Expressions, the most general objects, are
sums of one or more terms, and terms are products of one or more factors. This hierar-
chy incorporates the precedence of multiplication over addition, and the productions
we have chosen also incorporate the fact that both the + and * operations associate
to the left.
It should now be easy to see where parenthesized expressions fit into the hierarchy.
(Although we might say (A) could be an expression or a term or a factor, we should
permit ourselves only one way of deriving it, and we must decide which is most
appropriate.) A parenthesized expression cannot be expressed directly as either a sum
or a product, and it therefore seems most appropriate to consider it a factor. To say
it another way, evaluation of a parenthetical expression should take precedence over
any operators outside the parentheses; therefore, (A) should be considered a factor,
because in our hierarchy factors are evaluated first. What is inside the parentheses
should be an expression, since it is not restricted at all.
The grammar that we end up with is G1 = (V, Σ, S, P), where V = {S, T, F}
and P contains the productions

S → S + T | T
T → T * F | F
F → (S) | a

We must now prove two things: first, that G1 is indeed equivalent to the original
grammar G, and second, that it is unambiguous. To avoid confusion, we relabel the
start symbol in G1 as S1.

In order to show that the grammar G1 is unambiguous, it will be helpful to


concentrate on the parentheses in a string, temporarily ignoring the other terminal
symbols. In a sense we want to convince ourselves that the grammar is unambiguous
insofar as it generates strings of parentheses; this will then allow us to demonstrate
the unambiguity of the entire grammar. In Exercise 5.34, we defined a string of
parentheses to be balanced if it is the string of parentheses appearing in some legal
algebraic expression. At this point, however, we must be a little more explicit.

Definition 6.5 Balanced Strings of Parentheses

A string x of left and right parentheses is balanced if it contains equal numbers of
left and right parentheses, and no prefix of x contains more right parentheses than
left. The mate of a left parenthesis in a balanced string is the first right parenthesis
following it such that the string consisting of the two parentheses and everything
between them is balanced.

You should spend a minute convincing yourself that every left parenthesis in a
balanced string has a mate (Exercise 6.30). The first observation we make is that the
string of parentheses in any string obtained from S1 in the grammar G1 is balanced.
Certainly it has equal numbers of left and right parentheses, since they are produced
in pairs. Moreover, for every right parenthesis produced by a derivation in G1, a left
parenthesis appearing before it is produced simultaneously, and so no prefix of the
string can have more right parentheses than left.
Secondly, observe that in any derivation in G1, the parentheses between and
including the pair produced by a single application of the production F → (S1) form
a balanced string. This is because the parentheses within the string derived from S1
do, and because enclosing a balanced string of parentheses within parentheses yields
a balanced string.
Now suppose that x ∈ L(G1), and (₀ is any left parenthesis in x. The statement
that there is only one leftmost derivation of x in G1 will follow if we can show that
G1 is unambiguous, and we will be able to do this very soon. For now, however, the
discussion above allows us to say that even if there are several leftmost derivations
of x, the right parenthesis produced at the same time as (₀ is the same for any of
them—it is simply the mate of (₀. To see this, let us consider a fixed derivation of x.
In this derivation, the step in which (₀ is produced also produces a right parenthesis,
which we call )₀. As we have seen in the previous paragraph, the parentheses in x
beginning with (₀ and ending with )₀ form a balanced string. This implies that the
mate of (₀ cannot appear after )₀, because of how "mate" is defined.
However, the mate of (₀ cannot appear before )₀ either. Suppose α is the string
of parentheses starting just after (₀ and ending with the mate of (₀. The string α
has an excess of right parentheses, because (₀α is balanced. Let β be the string of
parentheses strictly between (₀ and )₀. Then β is balanced. If the mate of (₀ appeared
before )₀, α would be a prefix of β, and this is impossible. Therefore, the mate of (₀
coincides with )₀.
The point of this discussion is that when we say that something is within parenthe-
ses, we can be sure that the parentheses it is within are the two parentheses produced
by the production F → (S1), no matter what derivation we have in mind. This is the
ingredient we need for our theorem.
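The mate of a left parenthesis is exactly what the familiar depth-counting scan computes. Here is a minimal Python sketch of that computation (the function name is ours); symbols other than parentheses are simply skipped:

def mate(x, i):
    """Index of the mate of the left parenthesis at position i of x:
    the first j > i such that x[i..j] has equally many '(' and ')'
    and no prefix of it has an excess of right parentheses."""
    depth = 0
    for j in range(i, len(x)):
        if x[j] == '(':
            depth += 1
        elif x[j] == ')':
            depth -= 1
            if depth == 0:
                return j
    return None  # no mate: the parenthesis is not part of a balanced string

print(mate("a+(a*(a+a))/(a)", 2))   # 10, the ')' closing the '(' at index 2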

The grammar G1 is unambiguous: every string in L(G1) has only one leftmost
derivation from S1. The proof will be by strong induction on |x|, and it will actually
be easier to prove something apparently stronger: for any x derivable from one of
the variables S1, T, or F, x has only one leftmost derivation from that variable.
For the basis step, we observe that a is the only string of length 1 derivable from
any of the three variables, and that in each case there is only one derivation.
In the induction step, we assume that k ≥ 1 and that for every y derivable
from S1, T, or F for which |y| ≤ k, y has only one leftmost derivation from
that variable. We wish to show the same result for a string x with |x| = k + 1.
Consider first the case in which x contains at least one + not within
parentheses. Since the only +'s in strings derivable from T or F are
within parentheses, x can be derived only from S1, and any derivation of
x must begin S1 ⇒ S1 + T, where this + is the last + in x that is not
within parentheses. Therefore, any leftmost derivation of x from S1 has the
form

S1 ⇒ S1 + T ⇒* y + T ⇒* y + z

where the last two steps represent leftmost derivations of y from S1 and of z
from T, respectively, and the + is still the last one not within parentheses.
The induction hypothesis tells us that y has only one leftmost derivation
from S1 and z has only one from T. Therefore, x has only one leftmost
derivation from S1.
Next consider the case in which x contains no + outside parentheses
but at least one * outside parentheses. This time x can be derived only from
S1 or T; any derivation from S1 must begin S1 ⇒ T ⇒ T * F, and any
derivation from T must begin T ⇒ T * F. In either case, the * must be
the last one in x that is not within parentheses. As in the first case, the
subsequent steps of any leftmost derivation must be

T * F ⇒* y * F ⇒* y * z

consisting first of a leftmost derivation of y from T and then of a leftmost
derivation of z from F. Again the induction hypothesis tells us that these
derivations of y and z are unique, so that x has only one leftmost derivation.

6.6 SIMPLIFIED FORMS AND NORMAL FORMS
Ambiguity is one undesirable property of a context-free grammar that we might
wish to eliminate. In this section we discuss some slightly more straightforward
ways of improving a grammar without changing the resulting language: first by
eliminating certain types of productions that may be awkward to work with, and then
by standardizing the productions so that they all have a certain “normal form.”
We begin by trying to eliminate "Λ-productions," those of the form A → Λ, and
"unit productions," in which one variable is simply replaced by another. To illustrate
how these improvements might be useful, suppose that a grammar contains neither
type of production, and consider a derivation containing the step

α ⇒ β

If there are no Λ-productions, then the string β must be at least as long as α; if there
are no unit productions, α and β can be of equal length only if this step consists of
replacing a variable by a single terminal. To say it another way, if l and t represent
the length of the current string and the number of terminals in the current string,
respectively, then the quantity l + t must increase at each step of the derivation. The
value of l + t is 1 for the string S and 2k for a string x of length k in the language.
We may conclude that a derivation of x can have no more than 2k − 1 steps. In
particular, we now have an algorithm for determining whether a given string x is in
the language generated by the grammar: If |x| = k, try all possible sequences of
2k − 1 productions, and see if any of them produces x. Although this is not usually
a practical algorithm, at least it illustrates the fact that information about the form of
productions can be used to derive conclusions about the resulting language.
In trying to eliminate Λ-productions from a grammar, we must begin with a
qualification. We obviously cannot eliminate all productions of this form if the string
Λ itself is in the language. This obstacle is only minor, however: We will be able to
show that for any context-free language L, L − {Λ} can be generated by a CFG with
no Λ-productions. A preliminary example will help us see how to proceed.

EXAMPLE 6.14 Eliminating Λ-productions from a CFG
Let G be the context-free grammar with productions

S → ABCBCDA
A → CD
B → Cb
C → a | Λ
D → bD | Λ
The first thing this example illustrates is probably obvious already: We cannot simply
throw away the Λ-productions without adding anything. In this case, if D → Λ is eliminated
then nothing can be derived, because the Λ-production is the only way to remove the variable
D from the current string.
Let us consider the production S → ABCBCDA, which we write temporarily as

S → ABC₁BC₂DA

The three variables C₁, C₂, and D on the right side all begin Λ-productions, and each can also
be used to derive a nonnull string. In a derivation we may replace none, any, or all of these
three by Λ. Without Λ-productions, we will need to allow for all these options by adding
productions of the form S → α, where α is a string obtained from ABC₁BC₂DA by deleting
some or all of {C₁, C₂, D}. In other words, we will need at least the productions

S → ABBC₂DA | ABC₁BDA | ABC₁BC₂A |
    ABBDA | ABBC₂A | ABC₁BA |
    ABBA

in addition to the one we started with, in order to make sure of obtaining all the strings that can
be obtained from the original grammar.
If we now consider the variable A, we see that these productions are still not enough.
Although A does not begin a Λ-production, the string Λ can be derived from A (as can other
nonnull strings). Starting with the production A → CD, we can leave out C or D, using the
same argument as before. We cannot leave out both, because we do not want the production
A → Λ in our final grammar. If we add subscripts to the occurrences of A, as we did to those
of C, so that the original production is

S → A₁BC₁BC₂DA₂

we need to add productions in which the right side is obtained by leaving out some subset of
{A₁, A₂, C₁, C₂, D}. There are 32 subsets, which means that from this original production we
obtain 31 others that will be added to our grammar.
The same reasoning applies to each of the original productions. If we can identify in
the production X → α all the variables occurring in α from which Λ can be derived, then
we can add all the productions X → α′, where α′ is obtained from α by deleting some of
these occurrences. In general this procedure might produce new Λ-productions—if so, they
are ignored—and it might produce productions of the form X → X, which also contribute
nothing to the grammar and can be omitted.

In this case our final context-free grammar has 40 productions, including the 32 S-
productions already mentioned and the ones that follow:

A → CD | C | D
B → Cb | b
C → a
D → bD | b

The procedure outlined in Example 6.14 is the one that we will show works in
general. In presenting it more systematically, we give first a recursive definition of a
nullable variable (one from which Λ can be derived), and then we give the algorithm
suggested by this definition for identifying such variables.

Definition 6.6 Nullable Variables

The set of nullable variables of a CFG G = (V, Σ, S, P) is defined as follows:

1. Any variable A for which P contains the production A → Λ is nullable.
2. If P contains the production A → α, where α is a string of nullable variables,
   then A is nullable.
3. No other variables in V are nullable.

Algorithm FindNull (Finding the nullable variables in a CFG (V, Σ, S, P))

N₀ = {A ∈ V | P contains the production A → Λ};
i = 0;
do
    i = i + 1;
    Nᵢ = Nᵢ₋₁ ∪ {A | P contains A → α for some α ∈ Nᵢ₋₁*}
while Nᵢ ≠ Nᵢ₋₁;
Nᵢ is the set of nullable variables.

You can easily convince yourself that the variables defined in Definition 6.6 are
the variables A for which A ⇒* Λ. Obtaining the algorithm FindNull from the
definition is straightforward, and a similar procedure can be used whenever we have
such a recursive definition (see Exercise 2.70). When we apply the algorithm in
Example 6.14, the set N₀ is {C, D}. The set N₁ also contains A, as a result of the
production A → CD. Since no other productions have right sides in {A, C, D}*,
these three are the only nullable variables in the grammar.
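In Python, FindNull might be rendered as follows (the representation is ours: a grammar is a list of (variable, right-side) pairs, with right sides as tuples of symbols and Λ as the empty tuple):

def find_nullable(prods):
    """The FindNull computation: start from variables with A -> Λ and
    close under productions whose right sides are all nullable."""
    nullable = {A for A, alpha in prods if alpha == ()}
    changed = True
    while changed:
        changed = False
        for A, alpha in prods:
            if A not in nullable and alpha and all(s in nullable for s in alpha):
                nullable.add(A)
                changed = True
    return nullable

# The grammar of Example 6.14:
G = [('S', tuple('ABCBCDA')), ('A', tuple('CD')), ('B', tuple('Cb')),
     ('C', ('a',)), ('C', ()), ('D', ('b', 'D')), ('D', ())]
print(find_nullable(G))   # {'A', 'C', 'D'} (in some order)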

Algorithm 6.1 (Finding an equivalent CFG with no Λ-productions) Given a CFG
G = (V, Σ, S, P), construct a CFG G1 = (V, Σ, S, P1) with no Λ-productions as
follows.

1. Initialize P1 to be P.
2. Find all nullable variables in V, using Algorithm FindNull.
3. For every production A → α in P, add to P1 every production that can be
   obtained from this one by deleting from α one or more of the occurrences of
   nullable variables in α.
4. Delete all Λ-productions from P1. Also delete any duplicates, as well as
   productions of the form A → A.
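A direct rendering of these four steps, reusing find_nullable from the sketch above and the same representation, might look like this:

from itertools import combinations

def remove_lambda_productions(prods):
    """Algorithm 6.1: for each production, delete every subset of the
    occurrences of nullable variables, then drop Λ-productions and
    productions of the form A -> A."""
    nullable = find_nullable(prods)
    P1 = set()
    for A, alpha in prods:
        occ = [i for i, s in enumerate(alpha) if s in nullable]
        for r in range(len(occ) + 1):          # r == 0 keeps the original
            for subset in combinations(occ, r):
                beta = tuple(s for i, s in enumerate(alpha)
                             if i not in subset)
                P1.add((A, beta))
    return {(A, b) for A, b in P1 if b != () and b != (A,)}

# With G as in the previous sketch, len(remove_lambda_productions(G)) == 40,
# matching the count in Example 6.14.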

Eliminating Λ-productions from a grammar is likely to increase the number of
productions substantially. We might ask whether any other undesirable properties are
introduced. One partial answer is that if the context-free grammar G is unambiguous,
then the grammar G1 produced by Algorithm 6.1 is also (Exercise 6.64).
The next method of modifying a context-free grammar, eliminating unit produc-
tions, is similar enough in principle to what we have just done that we omit many of
the details or leave them to the exercises. Just as it was necessary before to consider
all nullable variables as well as those that actually begin Λ-productions, here it is
necessary to consider all the pairs of variables A, B for which A ⇒* B as well as
the pairs for which there is actually a production A → B. In order to guarantee
that eliminating unit productions does not also eliminate strings in the language, we
make sure that whenever B → α is a nonunit production and A ⇒* B, we add the
production A → α.
In order to simplify the process of finding all such pairs A, B, we make the
simplifying assumption that we have already used Algorithm 6.1 if necessary, so that
the grammar has no Λ-productions. It follows in this case that one variable can be
derived from another only by a sequence of unit productions. For any variable A, we
may therefore formulate the following recursive definition of the set of "A-derivable"
variables (essentially, variables B other than A for which A ⇒* B), and the definition
can easily be adapted to obtain an algorithm.
1. If A → B is a production, B is A-derivable.
2. If C is A-derivable, C → B is a production, and B ≠ A, then B is A-derivable.
3. No other variables are A-derivable.

(Note that according to our definition, a variable A is A-derivable only if A → A is
actually a production.)

Algorithm 6.2 (Finding an equivalent CFG with no unit productions) Given a
context-free grammar G = (V, Σ, S, P) with no Λ-productions, construct a grammar
G1 = (V, Σ, S, P1) having no unit productions as follows.

1. Initialize P1 to be P.
2. For each A ∈ V, find the set of A-derivable variables.

3. For every pair (A, B) such that B is A-derivable, and every nonunit production
   B → α, add the production A → α to P1 if it is not already present in P1.
4. Delete all unit productions from P1.

The proof is omitted (Exercise 6.62). It is worth pointing out, again without proof, that
if the grammar G is unambiguous, then the grammar G1 obtained from the algorithm
is also.
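Algorithm 6.2 can be sketched in the same style (again our own representation; variables is the set V, and the grammar is assumed to have no Λ-productions):

def remove_unit_productions(prods, variables):
    """Algorithm 6.2: add A -> alpha for every nonunit production
    B -> alpha with B A-derivable, then delete all unit productions."""
    def unit(alpha):
        return len(alpha) == 1 and alpha[0] in variables

    def a_derivable(A):
        reach, frontier = set(), {A}
        while frontier:
            C = frontier.pop()
            for X, alpha in prods:
                if X == C and unit(alpha) and alpha[0] != A:
                    if alpha[0] not in reach:
                        reach.add(alpha[0])
                        frontier.add(alpha[0])
        return reach

    P1 = set(prods)
    for A in variables:
        for B in a_derivable(A):
            P1 |= {(A, alpha) for X, alpha in prods
                   if X == B and not unit(alpha)}
    return {(X, alpha) for X, alpha in P1 if not unit(alpha)}

V = {'S', 'T', 'F'}
G15 = {('S', ('S', '+', 'T')), ('S', ('T',)), ('T', ('T', '*', 'F')),
       ('T', ('F',)), ('F', ('(', 'S', ')')), ('F', ('a',))}
# remove_unit_productions(G15, V) yields the grammar of Example 6.15 below.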

EXAMPLE 6.15 Eliminating Unit Productions


Let G be the algebraic-expression grammar obtained in the previous section, with productions

S → S + T | T
T → T * F | F
F → (S) | a

The S-derivable variables are T and F, and F is T-derivable. In step 3 of Algorithm 6.2, the
productions S → T * F | (S) | a and T → (S) | a are added to P1. When unit productions
are deleted, we are left with

S → S + T | T * F | (S) | a
T → T * F | (S) | a
F → (S) | a

In addition to eliminating specific types of productions, such as Λ-productions
and unit productions, it may also be useful to impose restrictions upon the form of
the remaining productions. Several types of "normal forms" have been introduced;
we shall present one of them, the Chomsky normal form.

Definition 6.7 Chomsky Normal Form

A context-free grammar is in Chomsky normal form (CNF) if every production
takes one of the two forms

A → BC
A → a

where A, B, and C are variables and a is a terminal.



Transforming a grammar G = (V, Σ, S, P) into Chomsky normal form may
be done in three steps. The first is to apply Algorithms 6.1 and 6.2 to obtain a
CFG G1 = (V, Σ, S, P1) having neither Λ-productions nor unit productions, so that
L(G1) = L(G) − {Λ}. The second step is to obtain a grammar G2 = (V2, Σ, S, P2),
generating the same language as G1, so that every production in P2 is either of the
form

A → B₁B₂···Bₖ

where k ≥ 2 and each Bᵢ is a variable in V2, or of the form

A → a

for some a ∈ Σ.


The construction of G2 is very simple. Since P1 contains no A-productions or
unit productions, every production in P 1 that is not already of the form A — a looks
like A — a for some string a of length at least 2. For every terminal a appearing
in such a string @, we introduce a new variable X, and a new production X, — a,
and replace a by X, in all the productions where it appears (except those of the form
A—> a).
For example, if there were two productions A + aAb and B — ab, they would
be replaced by A > X,AX, and B > X,X,, and the productions X, — a and
Xp — b would be added. The only X,-production is X, — a, and so it is reasonably
clear that G2 is equivalent to G1.
The grammar G2 obtained this way now resembles a grammar in CNF, in the
sense that the right side of every production is either a single terminal or a string
of two or more variables. The last step is to replace each production having more
than two variables on the right by an equivalent set of productions, each one having
exactly two variables on the right. This process is described best by an example; the
production

A → BCDBCE

would be replaced by

A → BY₁
Y₁ → CY₂
Y₂ → DY₃
Y₃ → BY₄
Y₄ → CE
The new variables Y₁, Y₂, Y₃, Y₄ are specific to this production and would be used
nowhere else. Although this may seem wasteful, in terms of the number of variables,
at least there is no doubt that the combined effect of this set of five productions
is precisely equivalent to the original production. Adding these new variables and
productions therefore does not change the language generated.
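Steps 2 and 3 of the conversion are mechanical, and a Python sketch makes the bookkeeping explicit (our representation again; it assumes Λ-productions and unit productions are already gone, and that names of the form Xa and Yn do not clash with existing variables):

def cnf_steps_2_and_3(prods, variables):
    """Replace terminals inside long right sides by variables Xa, then
    split right sides of three or more variables into binary chains."""
    P2, counter = set(), 0
    for A, alpha in prods:
        if len(alpha) == 1:                 # already of the form A -> a
            P2.add((A, alpha))
            continue
        # step 2: replace each terminal a by Xa, adding Xa -> a
        new_alpha = []
        for s in alpha:
            if s in variables:
                new_alpha.append(s)
            else:
                P2.add(('X' + s, (s,)))
                new_alpha.append('X' + s)
        # step 3: A -> B1 B2 ... Bk becomes A -> B1 Y1, Y1 -> B2 Y2, ...
        left, rest = A, new_alpha
        while len(rest) > 2:
            counter += 1
            Y = f'Y{counter}'
            P2.add((left, (rest[0], Y)))
            left, rest = Y, rest[1:]
        P2.add((left, tuple(rest)))
    return P2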
If we are willing to let these informal arguments suffice, we have obtained the following result: for every context-free grammar G, there is a grammar G′ in Chomsky normal form with L(G′) = L(G) − {Λ}.

Converting a CFG to Chomsky Normal Form | EXAMPLE 6.16 |


Let G be the grammar with productions

S → AACD
A → aAb | Λ
C → aC | a
D → aDa | bDb | Λ

Let us go through each step of the conversion to CNF.

1. Eliminating Λ-productions. The nullable variables are A and D, and Algorithm 6.1 produces the grammar with productions

S → AACD | ACD | AAC | CD | AC | C
A → aAb | ab
C → aC | a
D → aDa | bDb | aa | bb

2. Eliminating unit productions. Here we may simply add the productions

S → aC | a

and delete S → C.
3. Restricting the right sides of productions to single terminals or strings of two or more variables. This step yields the productions

S → AACD | ACD | AAC | CD | AC | XaC | a
A → XaAXb | XaXb
C → XaC | a
D → XaDXa | XbDXb | XaXa | XbXb
Xa → a
Xb → b

4. The final step to CNF. There are six productions whose right sides are too long. Applying our algorithm produces the grammar with productions

S → AT1    T1 → AT2    T2 → CD
S → AU1    U1 → CD
S → AV1    V1 → AC

S → CD | AC | XaC | a
A → XaW1    W1 → AXb
A → XaXb
C → XaC | a
D → XaY1    Y1 → DXa
D → XbZ1    Z1 → DXb
D → XaXa | XbXb
Xa → a    Xb → b

EXERCISES
6.1. In each case, say what language is generated by the context-free grammar with the indicated productions.
a. S → aSa | bSb | Λ
b. S → aSa | bSb | a | b
c. S → aSb | bSa | Λ
d. S → aSa | bSb | aAb | bAa
   A → aAa | bAb | a | b | Λ
   (See Example 6.3.)
e. S → aS | bS | a
f. S → SS | bS | a
g. S → SaS | b
h.

Sartorin
Ea8'bs
6.2. Find a context-free grammar corresponding to the “syntax diagram” in
Figure 6.6.

Figure 6.6 |

6.3. A context-free grammar is sometimes specified in the form of BNF rules; the letters are an abbreviation for Backus-Naur Form. In these rules, the symbol ::= corresponds to the usual →, and {X} means zero or more occurrences of X. Find a context-free grammar corresponding to the BNF rules shown below. Uppercase letters denote variables, lowercase letters denote terminals.

Pra=epuay pal inlay)


F:=G\|vG|fG|pl{, I}
Goa ea
Paar
AL i}
| eee oaeb)
Fa Oe,
De Olt

6.4. In each case, find a CFG generating the given language.


a. The set of odd-length strings in {a, b}* with middle symbol a.
b. The set of even-length strings in {a, b}* with the two middle symbols
equal.
c. The set of odd-length strings in {a, b}* whose first, middle, and last
symbols are all the same.
6.5. In each case, the productions in a CFG are given. Prove that neither one generates the language L = {x ∈ {0, 1}* | n0(x) = n1(x)}.

a. S → S01S | S10S | Λ
b. S → 0S1 | 1S0 | 01S | 10S | S01 | S10 | Λ


6.6. Consider the CFG with productions
S → aSbScS | aScSbS | bSaScS | bScSaS | cSaSbS | cSbSaS | Λ
Does this generate the language {x ∈ {a, b, c}* | na(x) = nb(x) = nc(x)}? Prove your answer.
6.7. Find a context-free grammar generating the language of all regular expressions over an alphabet Σ:
a. If the definition of regular expression is interpreted strictly with regard to parentheses.
b. If the definition is interpreted so as to allow regular expressions that are not "fully parenthesized."
Be careful to distinguish between Λ-productions and productions whose right side is the symbol Λ appearing in a regular expression; use Λ in the second case.
6.8. This problem gives proposed alternative constructions for the CFGs Gu, Gc, and G* in Theorem 6.1. In each case, either prove that the construction works, or give an example of grammars for which it doesn't and say why it doesn't.
a. (For Gu) Vu = V1 ∪ V2; Su = S1; Pu = P1 ∪ P2 ∪ {S1 → S2}
b. (For Gc) Vc = V1 ∪ V2; Sc = S1; Pc = P1 ∪ P2 ∪ {S1 → S1S2}
c. (For G*) V = V1; S = S1; P = P1 ∪ {S1 → S1S1 | Λ}
6.9. Find context-free grammars generating each of these languages.
a. {a^i b^j c^k | i = j + k}
b. {a^i b^j c^k | j = i + k}
c. {a^i b^j c^k | j = i or j = k}
d. {a^i b^j c^k | i = j or i = k}
e. {a^i b^j c^k | i ≠ j or j ≠ k}
f. {a^i b^j | i = 2j}
g. {a^i b^j | i < 2j}
h. {a^i b^j | i < j < 2i}
6.10. Describe the language generated by each of these grammars.
a. The regular grammar with productions
S → aA | bC | b
A → aS | bB
B → aC | bA | a
C → aB | bS

b. The grammar with productions
S → bS | aA | Λ
A → aA | bB | b
B → bS
6.11. Show that for a language L ⊆ Σ* such that Λ ∉ L, the following statements are equivalent.
a. L is regular.
b. L can be generated by a grammar in which all productions are either of the form A → xB or of the form A → x (where A and B are variables and x ∈ Σ+).
c. L can be generated by a grammar in which all productions are either of the form A → Bx or of the form A → x (where A and B are variables and x ∈ Σ+).
6.12. Show that for any language L ⊆ Σ*, the following statements are equivalent.
a. L is regular.
b. L can be generated by a grammar in which all productions are either of the form A → xB or of the form A → x (where A and B are variables and x ∈ Σ*).
c. L can be generated by a grammar in which all productions are either of the form A → Bx or of the form A → x (where A and B are variables and x ∈ Σ*).
6.13. Given the FA shown in Figure 6.7, accepting the language L, find a regular grammar generating L − {Λ}.

Figure 6.7 |

6.14. Draw an NFA accepting the language generated by the grammar with productions
S → abA | bB | aba
A → b | aB | bA
B → aB | aA

6.15. Show that if the procedure described in the proof of Theorem 6.2 is applied to an NFA instead of an FA, the result is still a regular grammar generating the language accepted by the NFA.
6.16. Consider the following statement: For any language L ⊆ Σ*, L is regular if and only if L can be generated by some grammar in which every production takes one of the four forms B → a, B → Ca, B → aC, or B → Λ, where B and C are variables and a ∈ Σ. For both the "if" and the "only if" parts, give either a proof or a counterexample.
6.17. A context-free grammar is said to be self-embedding if there is some variable A and two nonnull strings of terminals α and β so that A ⇒* αAβ. Show that a language L is regular if and only if it can be generated by a grammar that is not self-embedding.
6.18. Each of the following grammars, though not regular, generates a regular language. In each case, find a regular grammar generating the language.
a. S → SSS | a | ab
b. S → AabB    A → aA | bA | Λ    B → Bab | Bb | ab | b
c. S → AAS | ab | aab    A → ab | ba | Λ
d. S → AB    A → aAa | bAb | a | b    B → aB | bB | Λ
e. S → AA | B    A → AAA | Ab | bA | a    B → bB | b
6.19. Refer to Example 6.4.
a. Draw the derivation tree corresponding to each of the two given derivations of a + (a * a)/a − a.
b. Write the rightmost derivation corresponding to each of the trees in (a).
c. How many distinct leftmost derivations of this string are there?
d. How many derivation trees are there for the string a + a + a + a + a?
e. How many derivation trees are there for the string (a + (a + a)) + (a + a)?
6.20. Give an example of a CFG and a string of variables and/or terminals derivable from the start symbol for which there is neither a leftmost derivation nor a rightmost derivation.
6.21. Consider the C statement
if (a > 2) if (a > 4) x = 2; else x = 3;
a. What is the resulting value of x if a = 3? If a = 1?
b. Same question as in (a), but this time assume that the statement is interpreted as in Figure 6.5a.
6.22. Show that the CFG with productions
S → a | Sa | bSS | SSb | SbS
is ambiguous.
6.23. Consider the context-free grammar with productions
S → AB    A → aA | Λ    B → ab | bB | Λ

Any derivation of a string in this grammar must begin with the production S → AB. Clearly, any string derivable from A has only one derivation from A, and likewise for B. Therefore, the grammar is unambiguous. True or false? Why? (Compare with the proof of Theorem 6.4.)
6.24. In each part of Exercise 6.1, decide whether the grammar is ambiguous or
not, and prove your answer.
6.25. For each of the CFGs in Examples 6.3, 6.9, and 6.11, determine whether or
not the grammar is ambiguous, and prove your answer.
6.26. In each case, show that the grammar is ambiguous, and find an equivalent unambiguous grammar.
a. S → SS | a | b
b.
SAB Arica aA | AO 1 ABS
BB (A
c. S → A | B    A → aAb | ab    B → abB | Λ
d. S → aSb | aaSb | Λ
e. S → aSb | abS | Λ
6.27. Find an unambiguous context-free grammar equivalent to the grammar with productions
S → aaaaS | aaaaaaaS | Λ
(See Exercise 2.50.)
6.28. The proof of Theorem 6.1 shows how to find a regular grammar generating
L, given a finite automaton accepting L.
a. Under what circumstances is the grammar obtained this way
unambiguous?
b. Describe how the grammar can be modified if necessary in order to make
it unambiguous.
6.29. Describe an algorithm for starting with a regular grammar and finding an
equivalent unambiguous grammar.
6.30. Show that every left parenthesis in a balanced string has a mate.
6.31. Show that if a is a left parenthesis in a balanced string, and b is its mate, then a is the last left parenthesis for which the string consisting of a and b and everything in between is balanced.
6.32. Find an unambiguous context-free grammar for the language of all algebraic expressions involving parentheses, the identifier a, and the four binary operators +, −, *, and /.
6.33. Show that the nullable variables defined by Definition 6.6 are precisely those variables A for which A ⇒* Λ.
6.34. In each case, find a context-free grammar with no Λ-productions that generates the same language, except possibly for Λ, as the given CFG.
a. S → ABA    A → aASb | a    B → bS

b. S → AB | ABC
A → BA | BC | Λ | a
B → AC | CB | Λ | b
C → BC | AB | Λ | c

6.35. In each case, given the context-free grammar G, find a CFG G′ with no Λ-productions and no unit productions that generates the language L(G) − {Λ}.
a. G has productions
S → ABA    A → aA | Λ    B → bB | Λ
b. G has productions
S → aSa | bSb | Λ    A → aBb | bBa    B → aB | bB | Λ
c. G has productions
S → A | B | C    A → aAa | B    B → bB | bb
C → aCaa | D    D → baD | abD | aa

6.36. A variable A in a context-free grammar G = (V, Σ, S, P) is live if A ⇒* x for some x ∈ Σ*. Give a recursive definition, and a corresponding algorithm, for finding all live variables in G.
6.37. A variable A in a context-free grammar G = (V, Σ, S, P) is reachable if S ⇒* αAβ for some α, β ∈ (Σ ∪ V)*. Give a recursive definition, and a corresponding algorithm, for finding all reachable variables in G.
6.38. A variable A in a context-free grammar G = (V, Σ, S, P) is useful if for some string x ∈ Σ*, there is a derivation of x that takes the form
S ⇒* αAβ ⇒* x
A variable that is not useful is useless. Clearly if a variable is either not live or not reachable (Exercises 6.36-6.37), then it is useless.
a. Give an example in which a variable is both live and reachable but still useless.
b. Let G be a CFG. Suppose G1 is obtained by eliminating all dead variables from G and eliminating all productions in which dead variables appear. Suppose G2 is then obtained from G1 by eliminating all variables unreachable in G1, as well as productions in which such variables appear. Show that G2 contains no useless variables, and L(G2) = L(G).
c. Show that if the two steps are done in the opposite order, the resulting grammar may still have useless variables.
d. In each case, given the context-free grammar G, find an equivalent CFG with no useless variables.

i. G has productions
S → ABC | BaB    A → aA | BaC | aaa
B → DbBb | a    C → CA | AC
ii. G has productions
S → AB | AC    A → aAb | bAa | a    B → bbA | aaB | AB
C → abCa | aDb    D → bD | aC

6.39. In each case, given the context-free grammar G, find a CFG G′ in Chomsky normal form generating L(G) − {Λ}.
a. G has productions S → SS | (S) | Λ
b. G has productions S → S(S) | Λ
c. G is the CFG in Exercise 6.35c
d. G has productions
S → AaA | CA | BaB    A → aaBa | CDA | aa | DC
B → bB | bAB | bb | aS    C → Ca | bC | D    D → bDD | Λ

6.40. If G is a context-free grammar in Chomsky normal form and x ∈ L(G) with |x| = k, how many steps are there in a derivation of x in G?

MORE CHALLENGING PROBLEMS


6.41. Describe the language generated by the CFG with productions
S → aS | aSbS | Λ
One way to understand this language is to replace a and b by left and right parentheses, respectively. However, the language can also be characterized by giving a property that every prefix of a string in the language must have.
6.42. Show that the language of all nonpalindromes over {a, b} (see Example 6.3) cannot be generated by any CFG in which S → aSa | bSb are the only productions with variables on the right side.
6.43. Show using mathematical induction that every string produced by the context-free grammar with productions
S → 0 | S0 | 0S | 1SS | SS1 | S1S
has more 0's than 1's.


6.44. Complete the proof in Example 6.10 that every string in {0, 1}* with more 0's than 1's can be generated by the CFG with productions S → 0 | 0S | 1SS | SS1 | S1S. (Take care of the two remaining cases.)
6.45. Let L be the language generated by the CFG with productions
S → aSb | ab | SS
Show using mathematical induction that no string in L begins with abb.



6.46. Describe the language generated by the CFG with productions
S → ST | Λ    T → aS | bT | b
Prove that your answer is correct.


6.47. Show that the context-free grammar with productions
S → bS | aT | Λ
T → aT | bU | Λ
U → aT | Λ
generates the language of all strings over the alphabet {a, b} that do not contain the substring abb. One approach is to use mathematical induction to prove two three-part statements. In both cases, each part starts with "For every n ≥ 0, if x is any string of length n,". In the first statement, the three parts end as follows: (i) if S ⇒* x, then x does not contain the substring abb; (ii) if T ⇒* x, then x does not contain the substring bb; (iii) if U ⇒* x, then x does not start with b and does not contain the substring bb. In the second statement, the three parts end with the converses of (i), (ii), and (iii). The reason for using two three-part statements, rather than six separate statements, is that in proving each of the two, the induction hypothesis will say something about all three types of strings: those derivable from S, those derivable from T, and those derivable from U.
6.48. What language over {0, 1} does the CFG with productions
S → 00S | 11S | S00 | S11 | 01S01 | 01S10 | 10S10 | 10S01 | Λ
generate? Prove your answer.
6.49. Complete the proof of Theorem 6.3, by taking care of the two remaining
cases in the first part of the proof.
6.50. Show using mathematical induction that the CFG with productions
S → 0B | 1A | Λ
A → 0S | 1AA
B → 1S | 0BB
generates the language L = {x ∈ {0, 1}* | n0(x) = n1(x)} (see Example 6.8). It would be appropriate to formulate two three-part statements, as in Exercise 6.47, this time involving the variables S, A, and B and the languages L, L0, and L1.
6.51. Prove that the CFG with productions S → 0S1S | 1S0S | Λ generates the language L = {x ∈ {0, 1}* | n0(x) = n1(x)}.
6.52. a. Describe the language generated by the CFG G with productions
S → SS | (S) | Λ
b. Show that the CFG G1 with productions
S1 → (S1)S1 | Λ

generates the same language. (One inclusion is easy. For the other one, it may be helpful to prove the following statements for a string x ∈ L(G) with |x| > 0. First, if there is no derivation of x beginning with the production S → (S), then there are strings y and z, both in L(G) and both shorter than x, for which x = yz. Second, if there are such strings y and z, and if there are no other such strings y′ and z′ with y′ shorter than y, then there is a derivation of y in G that starts with the production S → (S).)
6.53. Show that the CFG with productions
S → aSaSbS | aSbSaS | bSaSaS | Λ
generates the language {x ∈ {a, b}* | na(x) = 2nb(x)}.
6.54. Does the CFG with productions
S → aSaSb | aSbSa | bSaSaS | Λ
generate the language of the previous problem? Prove your answer.
6.55. Show that the following CFG generates the language {x ∈ {a, b}* | na(x) = 2nb(x)}.
S → SS | bTT | TbT | TTb | Λ    T → aS | SaS | Sa | a
6.56. For alphabets Σ1 and Σ2, a homomorphism from Σ1* to Σ2* is defined in Exercise 4.46. Show that if f : Σ1* → Σ2* is a homomorphism and L ⊆ Σ1* is a context-free language, then f(L) ⊆ Σ2* is also a CFL.
6.57. Show that the CFG with productions
S → S(S) | Λ
is unambiguous.
6.58. Find context-free grammars generating each of these languages.
a. {a^i b^j c^k | i ≠ j + k}
b. {a^i b^j c^k | j ≠ i + k}
6.59. Find context-free grammars generating each of these languages, and prove that your answers are correct.
a. {a^i b^j | i ≤ j ≤ 3i/2}
b. {a^i b^j | i/2 ≤ j ≤ 3i/2}
6.60. Let G be the context-free grammar with productions
S → aS | aSbS | c
and let G1 be the one with productions
S1 → T | U    T → aTbT | c    U → aS1 | aTbU
(G1 is a simplified version of the second grammar in Example 6.13.)
a. Show that G is ambiguous.
b. Show that G and G1 generate the same language.
c. Show that G1 is unambiguous.

6.61. Let x be a string of left and right parentheses. A complete pairing of x is a partition of the parentheses of x into pairs such that (i) each pair consists of one left parenthesis and one right parenthesis appearing somewhere after it; and (ii) the parentheses between those in a pair are themselves the union of pairs. Two parentheses in a pair are said to be mates with respect to that pairing.
a. Show that there is at most one complete pairing of a string of parentheses.
b. Show that a string of parentheses has a complete pairing if and only if it is a balanced string, according to Definition 6.5, and in this case the two definitions of mates coincide.
6.62. Give a proof of Theorem 6.6. Suggestion: in order to show that L(G) ⊆ L(G1), show that for every n ≥ 1, every string of variables and/or terminals that can be derived from S in G by an n-step leftmost derivation in which the last step is not a unit production can be derived from S in G1.
6.63. Show that if a context-free grammar is unambiguous, then the grammar obtained from it by Algorithm 6.1 is also.
6.64. Show that if a context-free grammar with no Λ-productions is unambiguous, then the one obtained from it by Algorithm 6.2 is also.
CHAPTER 7

Pushdown Automata

7.1 | INTRODUCTION BY WAY OF AN EXAMPLE
In this chapter we investigate how to extend our finite-state model of computation so
that we can recognize context-free languages. In our first example, we consider one
of the simplest nonregular context-free languages. Although the abstract machine we
describe is not obviously related to a CFG generating the language, we will see later
that a machine of the same general type can be used with any CFL, and that one can
be constructed very simply from a grammar.

An Abstract Machine to Accept Simple Palindromes | EXAMPLE 7.1 |


Let G be the context-free grammar having productions

S → aSa | bSb | c

G generates the language

L = {xcx^r | x ∈ {a, b}*}

The strings in L are odd-length palindromes over {a, b} (Example 6.3), except that the middle symbol is c. (We will consider ordinary palindromes shortly. For now, the "marker" in the middle makes it easier to recognize the string.)
It is not hard to design an algorithm for recognizing strings in L, using a single left-to-right pass. We will save the symbols in the first half of the string as we read them, so that once we encounter the c we can begin matching incoming symbols with symbols already read. In order for this to work, we must retrieve the symbols we have saved using the rule "last in, first out" (often abbreviated LIFO): The symbol used to match the next incoming symbol is the one most recently read, or saved. The data structure incorporating the LIFO rule is a stack, which


is usually implemented as a list in which one end is designated as the top. Items are always added ("pushed onto the stack") and deleted ("popped off the stack") at this end, and at any time, the only element of the stack that is immediately accessible is the one on top.
In trying to incorporate this algorithm in an abstract machine, it would be reasonable to say that the current "state" of the machine is determined in part by the current contents of the stack. However, this approach would require an infinite set of "states," because the stack needs to be able to hold arbitrarily long strings. It is convenient instead to continue using a finite set of states (although the machine is not a "finite-state machine" in the same way that an FA is, because the current state is not enough to specify the machine's status) and to think of the stack as a simple form of auxiliary memory. This means that a move of our machine will depend not only on the current state and input, but also on the symbol currently on top of the stack. Carrying out the move may change the stack as well as the state.
In this simple example, the set Q of states will contain only three elements, q0, q1, and q2. The state q0, the initial state, is sufficient for processing the first half of the string. In this state, each input symbol is pushed onto the stack, regardless of what is currently on top. The machine stays in q0 as long as it has not yet received the symbol c; when that happens, the machine moves to state q1, leaving the stack unchanged. State q1 is for processing the second half of the input string. Once the machine enters this state, the only string that can be accepted is the one whose second half (after the c) is the reverse of the string already read. In this state each input symbol is compared to the symbol currently on top of the stack. If they agree, that symbol is popped off the stack and both are discarded; otherwise, the machine will crash and the string will not be accepted. This phase of the processing ends when the stack is empty, provided the machine has not crashed. An empty stack means that every symbol in the first half of the string has been successfully matched with an identical input symbol in the second half, and at that point the machine enters the accepting state q2.
Now we consider how to describe precisely the abstract machine whose operations we
have sketched. Each move of the machine will be determined by three things:

1. The current state


2. The next input
3. The symbol on top of the stack

and will consist of two parts:

1. Changing states (or staying in the same state)


2. Replacing the top stack symbol by a string of zero or more symbols

Describing moves this way allows us to consider the two basic stack moves as special cases: Popping the top symbol off the stack means replacing it by Λ, and pushing Y onto the stack means replacing the top symbol X by YX (assuming that the left end of the string corresponds to the top). We could enforce the stack rules more strictly by requiring that a single move contain only one stack operation, either a push or a pop. However, replacing the stack symbol X by the string α can be accomplished by a sequence of basic moves (a pop, followed by a sequence of zero or more pushes), and allowing the more general move helps to keep the number of distinct moves as small as possible.
In the case of a finite automaton, our transition function took the form

δ : Q × Σ → Q

Here, if we allow the possibility that the stack alphabet Γ (the set of symbols that can appear on the stack) is different from the input alphabet Σ, it looks as though we want

δ : Q × Σ × Γ → Q × Γ*

For a state q, an input a, and a stack symbol X,

δ(q, a, X) = (p, α)

means that in state q, with X on top of the stack, we read the symbol a, move to state p, and replace X on the stack by the string α.
This approach raises a few questions. First, how do we describe a move if the stack is empty (δ(q, a, ?))? We avoid this problem by saying that initially there is a special start symbol Z0 on the stack, and the machine is not allowed to move when the stack is empty. Provided that Z0 is never removed from the stack and that no additional copies of Z0 are pushed onto the stack, saying that Z0 is on top means that the stack is effectively empty.
Second, how do we describe a move when the input is exhausted (δ(q, ?, X))? (Remember that in our example we want to move to q2 if the stack is empty when all the input has been read.) The solution we adopt here is to allow moves that use only Λ as input, corresponding to Λ-transitions in an NFA-Λ. This suggests that what we really want is

δ : Q × (Σ ∪ {Λ}) × Γ → Q × Γ*

Of course, once we have moves of the form δ(q, Λ, X), we can make them before all the input has been read; if the next input symbol is not read in a move, it is still there to be read subsequently.
We have already said that there may be situations when the machine will crash, that is, when no move is specified. In the case of a finite automaton, when this happened we decided to make δ(q, a) a subset of Q, rather than an element, so that it could have the value ∅. At the same time we allowed for the possibility that δ(q, a) might contain more than one element, so that the FA became nondeterministic. Here we do the same thing, except that since Q × Γ* is an infinite set we should say explicitly that δ(q, a, X) and δ(q, Λ, X) will always be finite. In our current example the nondeterminism is not necessary, but in many cases it is. Thus we are left with

δ : Q × (Σ ∪ {Λ}) × Γ → the set of finite subsets of Q × Γ*

Now we can give a precise description of our simple-palindrome recognizer. Q will be the set {q0, q1, q2}, q0 is the initial state, and q2 is the only accepting state. The input alphabet Σ is {a, b, c}, and the stack alphabet Γ is {a, b, Z0}. The transition function δ is given by Table 7.1. Remember that when we specify a string to be placed on the stack, the top of the stack corresponds to the left end of the string. This convention may seem odd at first, since if we were to push the symbols on one at a time we would have to do it right-to-left, or in reverse order. The point is that when we get around to processing the symbols on the stack, the order in which we encounter them is the same as the order in which they occurred in the string.
Moves 1 through 6 push the input symbols a and b onto the stack, moves 7 through 9 change state without affecting the stack, moves 10 and 11 match an input symbol with a stack symbol and discard both, and the last move is to accept provided there is nothing except Z0 on the stack.

Table 7.1 | Transition table for Example 7.1

Move number   State   Input   Stack symbol   Move(s)
1             q0      a       Z0             (q0, aZ0)
2             q0      b       Z0             (q0, bZ0)
3             q0      a       a              (q0, aa)
4             q0      b       a              (q0, ba)
5             q0      a       b              (q0, ab)
6             q0      b       b              (q0, bb)
7             q0      c       Z0             (q1, Z0)
8             q0      c       a              (q1, a)
9             q0      c       b              (q1, b)
10            q1      a       a              (q1, Λ)
11            q1      b       b              (q1, Λ)
12            q1      Λ       Z0             (q2, Z0)
(all other combinations)                     none

Let us trace the moves of the machine for three input strings: abcba, ab, and acaa.

(initially)   q0   abcba   Z0
1             q0   bcba    aZ0
4             q0   cba     baZ0
9             q1   ba      baZ0
11            q1   a       aZ0
10            q1   Λ       Z0
12            q2   Λ       Z0
(accept)

(initially)   q0   ab      Z0
1             q0   b       aZ0
4             q0   Λ       baZ0
(crash)

(initially)   q0   acaa    Z0
1             q0   caa     aZ0
8             q1   aa      aZ0
10            q1   a       Z0
12            q2   a       Z0

Note the last move on input string acaa. Although there is no move δ(q1, a, Z0), the machine can take the Λ-transition δ(q1, Λ, Z0) before running out of choices. We could say that the portion of the string read so far (i.e., aca) is accepted, since the machine is in an accepting state at this point; however, the entire input string is not accepted, because not all of it has been read.
Figure 7.1 shows a diagram corresponding to Example 7.1, modeled after (but more complicated than) a transition diagram for an FA. Each transition is labeled with an input (either an alphabet symbol or Λ), a stack symbol X, a slash (/), and a string α of stack symbols. The interpretation is that the transition may occur on the specified input and involves

Figure 7.1 | Transition diagram for the pushdown automaton (PDA) in Example 7.1.

replacing X on the stack by α. Even with the extra information required for labeling an arrow, a diagram of this type does not capture completely the PDA's behavior in the same way that a transition diagram for an FA does. With an FA, you can start at any point with just the diagram and the input symbols and trace the action of the machine by following the arrows. In Figure 7.1, however, you cannot follow the arrows without keeping track of the stack contents (possibly the entire contents) as you go. The number of possible combinations of state and stack contents is infinite, and it is therefore not possible to draw a "finite-state diagram" in the same sense as for an FA. In most cases we will describe pushdown automata in this chapter by transition tables similar to the one in Table 7.1, although it will occasionally also be useful to show a transition diagram.
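Since this PDA is deterministic, its operation is easy to simulate with an ordinary loop. The following Python sketch (not from the text) encodes Table 7.1; the single-character stack symbol Z standing for Z0, and the empty string standing for Λ, are encoding choices made here.

DELTA = {
    ("q0", "a", "Z"): ("q0", "aZ"), ("q0", "b", "Z"): ("q0", "bZ"),
    ("q0", "a", "a"): ("q0", "aa"), ("q0", "b", "a"): ("q0", "ba"),
    ("q0", "a", "b"): ("q0", "ab"), ("q0", "b", "b"): ("q0", "bb"),
    ("q0", "c", "Z"): ("q1", "Z"),  ("q0", "c", "a"): ("q1", "a"),
    ("q0", "c", "b"): ("q1", "b"),
    ("q1", "a", "a"): ("q1", ""),   ("q1", "b", "b"): ("q1", ""),
    ("q1", "", "Z"): ("q2", "Z"),   # move 12, the Λ-transition
}

def accepts(x):
    state, stack = "q0", "Z"        # left end of the stack string is the top
    while True:
        if x and (state, x[0], stack[0]) in DELTA:
            state, pushed = DELTA[(state, x[0], stack[0])]
            x, stack = x[1:], pushed + stack[1:]
        elif (state, "", stack[0]) in DELTA:
            state, pushed = DELTA[(state, "", stack[0])]
            stack = pushed + stack[1:]
        else:
            break                   # no move is possible: the PDA crashes
    return x == "" and state == "q2"

for w in ["abcba", "ab", "acaa"]:
    print(w, accepts(w))            # True, False, False

The three printed results agree with the traces above: abcba is accepted, and ab and acaa are not.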

7.2 | THE DEFINITION OF A PUSHDOWN AUTOMATON

Below is a precise definition of the type of abstract machine illustrated in Example 7.1. Remember that what is being defined is in general nondeterministic.

Definition 7.1 Definition of a PDA

A pushdown automaton (PDA) is a 7-tuple M = (Q, Σ, Γ, q0, Z0, A, δ), where Q is a finite set of states, Σ and Γ are finite sets (the input alphabet and the stack alphabet, respectively), q0 ∈ Q is the initial state, Z0 ∈ Γ is the initial stack symbol, A ⊆ Q is the set of accepting states, and δ : Q × (Σ ∪ {Λ}) × Γ → the set of finite subsets of Q × Γ* is the transition function.

The stack alphabet Γ and the initial stack symbol Z0 are what make it necessary to have a 7-tuple rather than a 5-tuple. Otherwise, the components of the tuple are the same as in the case of an FA, except that the transition function δ is more complicated.
We can trace the operation of a finite automaton by keeping track of the current state at each step. In order to trace the operation of a PDA M, we must also keep track of the stack contents. If we are interested in what the machine does with a specific input string, it is also helpful to monitor the portion of the string yet to be read. A configuration of the PDA M = (Q, Σ, Γ, q0, Z0, A, δ) is a triple

(q, x, α)

where q ∈ Q, x ∈ Σ*, and α ∈ Γ*. Saying that (q, x, α) is the current configuration of M means that q is the current state, x is the string of remaining unread input, and α is the current stack contents, where as usual it is the left end of α that corresponds to the top of the stack.
We write

(p, x, α) ⊢M (q, y, β)

to mean that one of the possible moves in the first configuration takes M to the second. This can happen in two ways, depending on whether the move consumes an input symbol or is a Λ-transition. In the first case, x = ay for some a ∈ Σ, and in the second case x = y; we can summarize both cases by saying x = ay for some a ∈ Σ ∪ {Λ}. In both cases, the string β of stack symbols is obtained from α by replacing the first symbol X by a string ξ (in other words, α = Xγ for some X ∈ Γ and some γ ∈ Γ*, and β = ξγ for some ξ ∈ Γ*), and

(q, ξ) ∈ δ(p, a, X)

More generally, we write

(p, x, α) ⊢*M (q, y, β)

if there is a sequence of zero or more moves that takes M from the first configuration to the second. As usual, if there is no possibility of confusion, we shorten ⊢M to ⊢ and ⊢*M to ⊢*. Using the new notation, we may define acceptance of a string by a PDA.

Definition 7.2 Acceptance by a PDA

If M = (Q, Σ, Γ, q0, Z0, A, δ) is a PDA and x ∈ Σ*, x is accepted by M if

(q0, x, Z0) ⊢*M (q, Λ, α)

for some α ∈ Γ* and some q ∈ A. A language L ⊆ Σ* is said to be accepted by M if L is precisely the set of strings accepted by M; in this case, we write L = L(M).

Note that whether or not a string is accepted depends only on the current state when the string has been processed, not on the stack contents. We use the phrase

accepting configuration to denote any configuration in which the state is an accepting state. This type of acceptance is sometimes called acceptance by final state. It will be convenient in Section 7.5 to look briefly at another type, acceptance by empty stack. In this approach, a string is said to be accepted if it allows the PDA to reach a configuration in which the stack is empty, regardless of whether the state is an accepting state. It is not hard to see (Section 7.5 and Exercises 7.41 and 7.42) that the two types of acceptance are equivalent, in the sense that if a language is accepted by some PDA using one mode of acceptance, there is another PDA using the other mode that also accepts the language.
It is worth emphasizing that when we say a string x is accepted by a PDA, we mean that there is a sequence of moves that cause the machine to reach an accepting configuration as a result of reading the symbols of x. Since a PDA can be nondeterministic, there may be many other possible sequences of moves that do not lead to an accepting configuration. Each time there is a choice of moves, we may view the PDA as making a guess as to which one to make. Acceptance means that if the PDA guesses right at each step, it can reach an accepting configuration. In our next example, we will see a little more clearly what it means to guess right at each step.

A PDA Accepting the Language of Palindromes | EXAMPLE 7.2 |


This example involves the language pal of palindromes over {a, b} (both even-length and odd-length), without the marker in the middle that provides the signal for the PDA to switch from the "pushing-input-onto-stack" state to the "comparing-input-symbol-to-stack-symbol" state. The general strategy for constructing a PDA to recognize this language sounds the same as in Example 7.1: Remember the symbols seen so far, by saving them on the stack, until we are ready to begin matching them with symbols in the second half of the string. This switch from one type of move to the other should happen when we reach the middle of the string, just as in Example 7.1. However, without a symbol marking the middle explicitly, the PDA has no way of knowing that the middle has arrived; it can only guess. Fortunately, there is no penalty for guessing wrong, as long as the guess does not allow a nonpalindrome to be accepted.
We think of the machine as making a sequence of "not yet" guesses as it reads input symbols and pushes them onto the stack. This phase can stop (with a "yes" guess) in two possible ways: The PDA may guess that the next input symbol is the one in the very middle of the string (and that the string is of odd length) and can therefore be discarded, since it need not be matched by anything; or it may guess that the input string read so far is the first half of the (even-length) string and that any subsequent input symbols should therefore be used to match stack symbols. In effect, if the string read so far is x, the first "yes" guess that might be made is that another input symbol s should be read and that the input string will be xsx^r. The second possible guess, which can be made without reading another symbol, is that the input string will be xx^r. In either case, from this point on the PDA is committed. It makes no more guesses, and attempts to process the remaining symbols as if they belong to the second half; it can accept no string other than the one it has guessed the input string to be.
This approach cannot cause a nonpalindrome to be accepted, because each time the PDA makes a "yes" guess, it is then unable to accept any string not of the form xsx^r or xx^r. On the other hand, the approach allows every palindrome to be accepted: Every palindrome looks like either xsx^r or xx^r, and in either case, there is a permissible sequence of choices that involves making the correct "yes" guess at just the right time to cause the string to be accepted. It is still possible, of course, that for an input string z that is a palindrome the PDA guesses "yes" at the wrong time or makes the wrong type of "yes" guess; it might end up accepting some palindrome other than z, or simply stop in a nonaccepting state. This does not mean that the PDA is incorrect, but only that the PDA did not choose the particular sequence of moves that would have led to acceptance of z.
The transition table for our PDA is shown in Table 7.2. The sets Q, Γ, and A are the same as in Example 7.1, and there are noticeable similarities between the two transition tables. The moves in the first six lines of Table 7.1 show up as possible moves in the corresponding lines of Table 7.2, and the last three lines of the two tables (which represent the processing of the second half of the string) are identical.
The fact that the first six lines of Table 7.2 show two possible moves tells us that there is genuine nondeterminism. The two choices in each of these lines are to guess "not yet," as in Table 7.1, and to guess that the input symbol is the middle symbol of the (odd-length) string. The input symbol is read in both cases; the first choice causes it to be pushed onto the stack, and the second choice causes it to be discarded.
However, there is also nondeterminism of a less obvious sort. Suppose for example that the PDA is in state q0, the top stack symbol is a, and the next input symbol is a, as in line 3. In addition to the two moves shown in line 3, there is a third choice shown in line 8: not to read the input symbol at all, but to execute a Λ-transition to state q1. This represents the other "yes" guess, the guess that as a result of reading the most recent symbol (now on top of the stack), we have reached the middle of the (even-length) string. This choice is made without even looking at the next input symbol. (Another approach would have been to read the a, use it to match the a on the stack, and move to q1, all on the same move; however, the moves shown in the table preserve the distinction between the state q0 in which all the guessing occurs and the state q1 in which all the comparison-making occurs.)
Note that the Λ-transition in line 8 is not in itself the source of nondeterminism. The move in line 12, for example, is the only possible move from state q1 if Z0 is the top stack symbol.
Table 7.2 | Transition table for Example 7.2

Move number   State   Input   Stack symbol   Move(s)
1             q0      a       Z0             (q0, aZ0), (q1, Z0)
2             q0      b       Z0             (q0, bZ0), (q1, Z0)
3             q0      a       a              (q0, aa), (q1, a)
4             q0      b       a              (q0, ba), (q1, a)
5             q0      a       b              (q0, ab), (q1, b)
6             q0      b       b              (q0, bb), (q1, b)
7             q0      Λ       Z0             (q1, Z0)
8             q0      Λ       a              (q1, a)
9             q0      Λ       b              (q1, b)
10            q1      a       a              (q1, Λ)
11            q1      b       b              (q1, Λ)
12            q1      Λ       Z0             (q2, Z0)
(all other combinations)                     none

Figure 7.2 | Computation tree for the PDA in Table 7.2, with input baab.

Line 8 represents nondeterminism because if the PDA is in state q0 and a is the top stack symbol, there is a choice between a move that reads an input symbol and one that does not. We will return to this point in Section 7.3.
Just as in Section 4.1, we can draw a computation tree for a PDA such as this one, showing the configuration at each step and the possible choices of moves at each step. Figure 7.2 shows such a tree for the string baab, which is a palindrome.
Each time there is a choice, the possible moves are shown left-to-right in the order they appear in Table 7.2. In particular, in each configuration along the left edge of Figure 7.2 except the last one, the PDA is in state q0 and there is at least one unread input symbol. At each of these points, the PDA can choose from three possible moves. Continuing down the left edge of the figure represents a "not yet" guess that reads an input and pushes it onto the stack. The other two possibilities are the two moves to state q1, one that reads an input symbol and one that does not.

The sequence of moves that leads to acceptance is

(q0, baab, Z0) ⊢ (q0, aab, bZ0)
             ⊢ (q0, ab, abZ0)
             ⊢ (q1, ab, abZ0)
             ⊢ (q1, b, bZ0)
             ⊢ (q1, Λ, Z0)
             ⊢ (q2, Λ, Z0) (accept)
This sequence of moves is the one in which the "yes" guess of the right type is made at exactly the right time. Paths that deviate from the vertical path too soon terminate before the PDA has finished reading the input; the machine either crashes or enters the accepting state q2 prematurely (so that the string accepted is a palindrome of length 0 or 1, not the one we have in mind). Paths that follow the vertical path too long cause the PDA either to crash or to run out of input symbols before getting a chance to empty the stack.
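Nondeterministic acceptance can be checked mechanically by exploring the entire computation tree rather than one path. The Python sketch below (not from the text) does a breadth-first search over configurations of the PDA in Table 7.2; as in the earlier sketch, Z stands for Z0 and the empty string stands for Λ.

from collections import deque

DELTA = {
    ("q0", "a", "Z"): {("q0", "aZ"), ("q1", "Z")},
    ("q0", "b", "Z"): {("q0", "bZ"), ("q1", "Z")},
    ("q0", "a", "a"): {("q0", "aa"), ("q1", "a")},
    ("q0", "b", "a"): {("q0", "ba"), ("q1", "a")},
    ("q0", "a", "b"): {("q0", "ab"), ("q1", "b")},
    ("q0", "b", "b"): {("q0", "bb"), ("q1", "b")},
    ("q0", "", "Z"): {("q1", "Z")},
    ("q0", "", "a"): {("q1", "a")},
    ("q0", "", "b"): {("q1", "b")},
    ("q1", "a", "a"): {("q1", "")},
    ("q1", "b", "b"): {("q1", "")},
    ("q1", "", "Z"): {("q2", "Z")},
}

def accepts(x):
    start = ("q0", x, "Z")              # a configuration (state, unread, stack)
    seen, queue = {start}, deque([start])
    while queue:
        q, rest, stack = queue.popleft()
        if rest == "" and q == "q2":
            return True                 # an accepting configuration was found
        succs = []
        for p, pushed in DELTA.get((q, "", stack[0]), ()):        # Λ-moves
            succs.append((p, rest, pushed + stack[1:]))
        if rest:
            for p, pushed in DELTA.get((q, rest[0], stack[0]), ()):
                succs.append((p, rest[1:], pushed + stack[1:]))
        for c in succs:
            if c not in seen:
                seen.add(c)
                queue.append(c)
    return False

print(accepts("baab"), accepts("baa"))   # True False

The search visits exactly the configurations in the computation tree of Figure 7.2, and returns True as soon as it reaches an accepting configuration.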

7.3 | DETERMINISTIC PUSHDOWN AUTOMATA

The PDA in Example 7.1 never has a choice of more than one move, and it is appropriate to call it a deterministic PDA. The one in Example 7.2 illustrates both types of nondeterminism: that in which there are two or more moves involving the same combination of state, stack symbol, and input, and that in which, for some combination of state and stack symbol, the machine has a choice of reading an input symbol or making a Λ-transition.

Definition 7.3 Definition of a deterministic PDA

A PDA M = (Q, Σ, Γ, q0, Z0, A, δ) is deterministic if it satisfies both of the following conditions.
1. For every q ∈ Q, a ∈ Σ ∪ {Λ}, and X ∈ Γ, the set δ(q, a, X) has at most one element.
2. For every q ∈ Q, a ∈ Σ, and X ∈ Γ, the sets δ(q, a, X) and δ(q, Λ, X) are not both nonempty.
A language L is a deterministic context-free language (DCFL) if there is a deterministic PDA (DPDA) accepting L.

Note that our definition does not require the transition function to be defined for every combination of state, input, and stack symbol; in a deterministic PDA, it is still possible for one of the sets δ(q, a, X) to be empty.

In this sense, our notion of determinism is a little less strict than in Chapter 4, where we called a finite automaton nondeterministic if there was a pair (q, a) for which δ(q, a) did not have exactly one element.
The last statement in the definition anticipates to some extent the results of the next two sections, which show that the languages that can be accepted by PDAs are precisely the context-free languages. The last statement also suggests another way in which CFLs are more complicated than regular languages. We did not define a "deterministic regular language" in Chapter 4, although we considered both NFAs and deterministic FAs. The reason is that for any NFA there is an FA recognizing the same language; any regular language can be accepted by a deterministic FA. Not every context-free language, however, can be accepted by a deterministic PDA. It probably seemed obvious in Example 7.2 that the standard approach to accepting the language of palindromes cannot work without nondeterminism; we will be able to show in Theorem 7.1 that no other PDA can do any better, and that the language of palindromes is not a DCFL.

A DPDA Accepting Balanced Strings of Brackets | EXAMPLE 7.3 |


Consider the language L of all balanced strings involving two types of brackets: {} and []. L is the language generated by the context-free grammar with productions

S → SS | [S] | {S} | Λ

(It is also possible to describe this type of "balanced" string using the approach of Definition 6.5; see Exercise 7.20.)
Our PDA will have two states: the initial state q0, which is also the accepting state (note that Λ is one element of L), and another state q1. Left brackets of either type are saved on the stack, and one is discarded whenever it is on top of the stack and a right bracket of the same type is encountered in the input. The feature of strings in L that makes this approach correct, and therefore makes a stack the appropriate data structure, is that when a right bracket in a balanced string is encountered, the left bracket it matches is the last left bracket of the same type that has appeared previously and has not already been matched. The signal that the string read so far is balanced is that the stack has no brackets on it (i.e., Z0 is the top symbol), and if this happens in state q1 the PDA will return to the accepting state q0 via a Λ-transition, leaving the stack unchanged. From this point, if there is more input, the machine proceeds as if from the beginning.
Table 7.3 shows a transition table for such a deterministic PDA. To make it easier to read, the parentheses with which we normally enclose a pair specifying a single move have been omitted.
The input string {[]}[], for example, results in the following sequence of moves.

(q0, {[]}[], Z0) ⊢ (q1, []}[], {Z0)
               ⊢ (q1, ]}[], [{Z0)
               ⊢ (q1, }[], {Z0)
               ⊢ (q1, [], Z0)

Table 7.3 | Transition table for Example 7.3

Move number   State   Input   Stack symbol   Move
1             q0      {       Z0             q1, {Z0
2             q0      [       Z0             q1, [Z0
3             q1      {       {              q1, {{
4             q1      [       {              q1, [{
5             q1      {       [              q1, {[
6             q1      [       [              q1, [[
7             q1      }       {              q1, Λ
8             q1      ]       [              q1, Λ
9             q1      Λ       Z0             q0, Z0
(all other combinations)                     none

⊢ (q0, [], Z0)
⊢ (q1, ], [Z0)
⊢ (q1, Λ, Z0)
⊢ (q0, Λ, Z0) (accept)
You may very well have seen stacks used in the way we are using them here, with languages closely related to the set of balanced strings of parentheses. If you have written or studied a computer program that reads an algebraic expression and processes it (to "process" an expression could mean to evaluate it, to convert it to postfix notation, to build an expression tree to store it, or simply to check that it obeys the syntax rules), the program almost certainly involved at least one stack. If the program did not use recursion, the stack was explicit, storing values of subexpressions, perhaps, or parentheses and operators; if the algorithm was recursive, there was a stack behind the scenes, since stacks are the data structures involved whenever recursive algorithms are implemented.
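For the special case of bracket checking, the explicit-stack program is only a few lines. The sketch below (not from the text) mirrors the DPDA of Table 7.3: left brackets are pushed, each right bracket must match the top of the stack, and acceptance corresponds to an empty stack at the end of the input.

MATCHING = {"}": "{", "]": "["}

def balanced(s):
    stack = []                        # the PDA's stack, minus Z0
    for ch in s:
        if ch in "{[":
            stack.append(ch)          # moves 1 through 6: save a left bracket
        elif ch in "}]":
            if not stack or stack.pop() != MATCHING[ch]:
                return False          # mismatch: the DPDA would crash here
        else:
            return False              # input symbol outside the alphabet
    return not stack                  # accept exactly when only Z0 remains

print(balanced("{[]}[]"), balanced("{[}]"))   # True False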

A DPDA to Accept Strings with More a's Than b's | EXAMPLE 7.4 |


For our last example of a DPDA, we consider

L = {x ∈ {a, b}* | na(x) > nb(x)}

The approach we use is similar in some ways to that in the previous example. There we saved left brackets on the stack so that they could eventually be matched by right brackets. Now we save excess symbols of either type, so that they can eventually be matched by symbols of the opposite type. The other obvious difference is that since the null string is not in L, the initial state is not accepting. With those differences in mind, it is easy to understand the DPDA described in Table 7.4. The only two states are the initial state q0 and the accepting state q1. At any point when the stack contains only Z0, the string read so far has equal numbers of a's and b's. It is almost correct to say that the machine is in the accepting state q1 precisely when there is at least one a on the stack. This is not quite correct, because input b in state q1 requires removing a from the stack and returning to q0, at least temporarily; this guarantees that if the a just removed from the stack was the only one on the stack, the input string read so far (which has equal numbers of a's and b's) is not accepted. The Λ-transition is the way the PDA returns to the accepting state once it determines that there are additional a's remaining on the stack.

Table 7.4 | Transition table for a DPDA to accept L

Move number   State   Input   Stack symbol   Move
1             q0      a       Z0             (q1, aZ0)
2             q0      b       Z0             (q0, bZ0)
3             q0      a       b              (q0, Λ)
4             q0      b       b              (q0, bb)
5             q1      a       a              (q1, aa)
6             q1      b       a              (q0, Λ)
7             q0      Λ       a              (q1, a)
(all other combinations)                     none

Table 7.5 | A DPDA with no Λ-transitions accepting L

Move number   State   Input   Stack symbol   Move
1             q0      a       Z0             (q1, Z0)
2             q0      b       Z0             (q0, bZ0)
3             q0      a       b              (q0, Λ)
4             q0      b       b              (q0, bb)
5             q1      a       Z0             (q1, aZ0)
6             q1      b       Z0             (q0, Z0)
7             q1      a       a              (q1, aa)
8             q1      b       a              (q1, Λ)
(all other combinations)                     none

If we can provide a way for the PDA to determine in advance whether an a on the stack is the only one, then we can eliminate the need to leave the accepting state when a is popped from the stack, and thereby eliminate Λ-transitions altogether. There are at least three natural ways we might manage this. One is to say that we will push a's onto the stack only when we have an excess of at least two, so that in state q1, top stack symbol Z0 means one extra a, and top stack symbol a means more than one. Another is to use a different stack symbol, say A, for the first extra a. A third is simply to introduce a new state specifically for the case in which there is exactly one extra a. The DPDA shown in Table 7.5 takes the first approach. As before, q1 is the accepting state. There is no move specified from q1 with stack symbol b or from q0 with stack symbol a, because neither of these situations will ever occur.
This PDA may be slightly easier to understand with the transition diagram shown in Figure 7.3.
We illustrate the operation of this machine on the input string abbabaa:

(q0, abbabaa, Z0) ⊢ (q1, bbabaa, Z0)
                ⊢ (q0, babaa, Z0)
                ⊢ (q0, abaa, bZ0)
                ⊢ (q0, baa, Z0)
                ⊢ (q0, aa, bZ0)
                ⊢ (q0, a, Z0)
                ⊢ (q1, Λ, Z0) (accept)
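Since the stack in Tables 7.4 and 7.5 only ever records which symbol is currently in excess and by how much, the same bookkeeping can be done with a single integer. A minimal sketch (not from the text), assuming the input is over {a, b}:

def more_as_than_bs(x):
    balance = 0                          # (number of a's) - (number of b's)
    for ch in x:
        balance += 1 if ch == "a" else -1
    return balance > 0

print(more_as_than_bs("abbabaa"))        # True, as in the trace above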

Figure 7.3 | The DPDA in Table 7.5.

We conclude this section by showing the result we promised at the beginning: Not every language that can be accepted by a PDA can be accepted by a deterministic PDA.

Theorem 7.1
The language pal of palindromes over {a, b} cannot be accepted by a deterministic PDA; in other words, pal is not a DCFL.

See Exercise 7.17 for some other examples of CFLs that are not DCFLs, and see Section 8.2 for some other methods of showing that languages are not DCFLs.

7.4 | A PDA CORRESPONDING TO A GIVEN CONTEXT-FREE GRAMMAR

Up to this point, the pushdown automata we have constructed have been based on simple symmetry properties of the strings in the languages being recognized, rather than on any features of context-free grammars generating the languages. As a result, it may not be obvious that every context-free language can be recognized by a PDA. However, that is what we will prove in this section.
Starting with a context-free grammar G, we want to build a PDA that can test an arbitrary string and determine whether it can be derived from G. The basic strategy is to simulate a derivation of the string in the given grammar. This will require guessing the steps of the derivation, and our PDA will be nondeterministic. (As we will see in Section 7.6, there are certain types of grammars for which we will be able to modify the PDA, keeping its essential features but eliminating the nondeterminism. The approach will be particularly useful in these cases, because for a string x in the language, following the moves of the machine on input x will not only allow us to confirm x's membership in the language, but also reveal a derivation of x in the grammar. Because of languages like pal, however, finding such a deterministic PDA is too much to expect in general.) As the simulation progresses, the machine will test the input string to make sure that it is still consistent with the derivation-in-progress. If the input string does in fact have a derivation from the grammar, and if the PDA's guesses are the ones that correctly simulate this derivation, the tests will confirm this and allow the machine to reach an accepting state.
There are at least two natural ways a PDA can simulate a derivation in the grammar. A step in the simulation corresponds to constructing a portion of the derivation tree, and the two approaches are called top-down and bottom-up because of the order in which these portions are constructed.
We will begin with the top-down approach. The PDA starts by pushing the start symbol S (at the top of the derivation tree) onto the stack, and each subsequent step in the simulated derivation is carried out by replacing a variable on the stack (at a certain node in the tree) by the right side of a production beginning with that variable (in other words, adding the children of that node to the tree). The stack holds the current string in the derivation, except that as terminal symbols appear at the left of the string they are matched with symbols in the input and discarded.
The two types of moves made by the PDA, after S is placed on the stack, are

1. Replace a variable A on top of the stack by the right side α of some production A → α. This is where the guessing comes in.
2. Pop a terminal symbol from the stack, provided it matches the next input symbol. Both symbols are then discarded.

At each step, the string of input symbols already read (which have been successfully matched with terminal symbols produced at the beginning of the string by the derivation), followed by the contents of the stack, exclusive of Z0, constitutes the current string in the derivation. When a variable appears on top of the stack, it is because terminal symbols preceding it in the current string have already been matched, and thus it is the leftmost variable in the current string. Therefore, the derivation being simulated is a leftmost derivation. If at some point there is no longer any part of the current string remaining on the stack, the attempted derivation must have been successful at producing the input string read so far, and the PDA can accept.
We are now ready to give a more precise description of this top-down PDA and to prove that the strings it accepts are precisely those generated by the grammar.

Definition 7.4 The top-down PDA corresponding to a CFG

Let G = (V, Σ, S, P) be a context-free grammar. The nondeterministic top-down PDA corresponding to G is NT(G) = (Q, Σ, Γ, q0, Z0, A, δ), where Q = {q0, q1, q2}, A = {q2}, Γ = V ∪ Σ ∪ {Z0}, and the transition function δ is defined as follows:

1. δ(q0, Λ, Z0) = {(q1, SZ0)}
2. For every A ∈ V, δ(q1, Λ, A) = {(q1, α) | A → α is a production in P}
3. For every a ∈ Σ, δ(q1, a, a) = {(q1, Λ)}
4. δ(q1, Λ, Z0) = {(q2, Z0)}

Theorem 7.2
If G is a context-free grammar, then the top-down PDA NT(G) accepts the language L(G).

A Top-down PDA for the Strings with More a's than b's | EXAMPLE 7.5 |

Consider the language

L = {x ∈ {a, b}* | na(x) > nb(x)}

for which we constructed a DPDA in Example 7.4. From Example 6.10, we know that one context-free grammar for L is the one with productions

S → a | aS | bSS | SSb | SbS

Following the construction in the proof of Theorem 7.2, we obtain the PDA M = (Q, Σ, Γ, q0, Z0, A, δ), where Q = {q0, q1, q2}, Σ = {a, b}, Γ = {S, a, b, Z0}, A = {q2}, and the transition function δ is defined by this table.

State   Input   Stack symbol   Move(s)
q0      Λ       Z0             (q1, SZ0)
q1      Λ       S              (q1, a), (q1, aS), (q1, bSS), (q1, SSb), (q1, SbS)
q1      a       a              (q1, Λ)
q1      b       b              (q1, Λ)
q1      Λ       Z0             (q2, Z0)
(all other combinations)       none

We consider the string x = abbaaa ∈ L and compare the moves made by M in accepting x with a leftmost derivation of x in the grammar. Each move in which a variable is replaced on the stack by a string corresponds to a step in a leftmost derivation of x, and that step is shown to the right of the move. Observe that at each step, the stack contains (in addition to Z0) the portion of the current string in the derivation that remains after removing the initial string of terminals read so far.

(q0, abbaaa, Z0)
⊢ (q1, abbaaa, SZ0)        S
⊢ (q1, abbaaa, SbSZ0)      ⇒ SbS
⊢ (q1, abbaaa, abSZ0)      ⇒ abS
⊢ (q1, bbaaa, bSZ0)
⊢ (q1, baaa, SZ0)
⊢ (q1, baaa, bSSZ0)        ⇒ abbSS
⊢ (q1, aaa, SSZ0)
⊢ (q1, aaa, aSZ0)          ⇒ abbaS
⊢ (q1, aa, SZ0)
⊢ (q1, aa, aSZ0)           ⇒ abbaaS
⊢ (q1, a, SZ0)
⊢ (q1, a, aZ0)             ⇒ abbaaa
⊢ (q1, Λ, Z0)
⊢ (q2, Λ, Z0)

You may wish to trace the other possible sequences of moves by which x can be accepted,
corresponding to other possible leftmost derivations of x in this CFG.
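The construction in Definition 7.4 is uniform enough to automate. The Python sketch below (not from the text) produces the transition table of the top-down PDA for any grammar; the dict-based grammar representation and the use of the empty string for Λ are conventions chosen here, and the two-character name Z0 is treated as a single stack symbol for display.

def top_down_pda(grammar, start="S"):
    # Return transitions as a dict (state, input, stack top) -> set of
    # (state, string to push); '' stands for Λ.
    delta = {("q0", "", "Z0"): {("q1", start + "Z0")}}
    # a variable on top of the stack may be replaced by any of its right sides
    for A, rhss in grammar.items():
        delta[("q1", "", A)] = {("q1", rhs) for rhs in rhss}
    # a terminal on top of the stack is matched against the next input symbol
    terminals = {s for rhss in grammar.values() for rhs in rhss
                 for s in rhs if s not in grammar}
    for a in terminals:
        delta[("q1", a, a)] = {("q1", "")}
    delta[("q1", "", "Z0")] = {("q2", "Z0")}    # accept when only Z0 is left
    return delta

# the grammar of Example 7.5: S -> a | aS | bSS | SSb | SbS
G = {"S": ["a", "aS", "bSS", "SSb", "SbS"]}
for key, moves in sorted(top_down_pda(G).items()):
    print(key, "->", sorted(moves))

For this grammar, the printed transitions are exactly the rows of the table in Example 7.5.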

The opposite approach to top-down is bottom-up. In this approach, there are opposite counterparts to both types of moves in the top-down PDA. Instead of replacing a variable A on the stack by the right side α of a production A → α (which effectively extends the tree downward), the PDA removes α from the stack and replaces it by (or "reduces it to") A, so that the tree is extended upward. In both approaches, the contents of the stack represent a portion of the current string in the derivation being simulated; instead of removing a terminal symbol from the beginning of this portion (which appeared on the stack as a result of applying a production), the PDA "shifts" a terminal symbol from the input to the end of this portion, in order to prepare for a reduction.
Note that because shifting input symbols onto the stack reverses their order, the string α that is to be reduced to A will appear on the stack in reverse; thus the PDA begins the reduction with the last symbol of α on top of the stack.
Note also that while the top-down approach requires only one move to apply a production A → α, the corresponding reduction in the bottom-up approach requires a sequence of moves, one for each symbol in the string α. We are interested primarily in the sequence as a whole, and with the natural correspondence between a production in the grammar and the sequence of moves that accomplishes the reduction.
The process terminates when the start symbol S, left on the stack by the last reduction, is popped off the stack and Z0 is the only thing left. The entire process simulates a derivation, in reverse order, of the input string. At each step, the current string in the derivation is formed by the contents of the stack (in reverse), followed by the string of unread input; because after each reduction the variable on top of the stack is the rightmost one in the current string, the derivation being simulated in reverse is a rightmost derivation.

A Bottom-up PDA for Simple Algebraic Expressions | EXAMPLE 7.6 |


Let us illustrate this approach using the grammar G, a simplified version of the one in Section 6.4, with productions

(1) S → S + T
(2) S → T
(3) T → T * a
(4) T → a

The reason for numbering the productions will be seen presently. Suppose the input string is a + a * a, which has the rightmost derivation

S ⇒ S + T
  ⇒ S + T * a
  ⇒ S + a * a
  ⇒ T + a * a
  ⇒ a + a * a

The corresponding steps or groups of steps executed by the bottom-up PDA as the string a + a * a is processed are shown in Table 7.6. Remember that at each point, the reverse of the string on the stack (omitting Z0), followed by the string of unread input, constitutes the current string in the derivation, and the reductions occur in the opposite order from the corresponding steps in the derivation. For example, since the last step in the derivation is to replace T by a, the first reduction replaces a on the stack by T.
In Table 7.7 we show the details of the nondeterministic bottom-up PDA that carries out
these moves. The shift moves allow the next input to be shifted onto the stack, regardless of
the current stack symbol. The sequence of moves in a reduction can begin when the top stack
symbol is the last symbol of a, for some string a in a production A > a. If |a| > 1, the
moves in the sequence proceed on the assumption that the symbols below the top one are in
fact the previous symbols of a; they remove these symbols, from back to front, and place A

Table 7.6  Processing of a + a * a by the bottom-up PDA corresponding to G

Move              Resulting stack (top at the left)
shift             a Z0
reduce T → a      T Z0
reduce S → T      S Z0
shift             + S Z0
shift             a + S Z0
reduce T → a      T + S Z0
shift             * T + S Z0
shift             a * T + S Z0
reduce T → T * a  T + S Z0
reduce S → S + T  S Z0
(pop S)           Z0
(accept)
Se

Table 7.7  The nondeterministic bottom-up PDA for G

State   Input   Stack symbol   Move(s)
q       σ       X              (q, σX)   (shift: σ is any input symbol, X any stack symbol)
q       Λ       a              (q, T), (q3,1, Λ)
q3,1    Λ       *              (q3,2, Λ)
q3,2    Λ       T              (q, T)
q       Λ       T              (q, S), (q1,1, Λ)
q1,1    Λ       +              (q1,2, Λ)
q1,2    Λ       S              (q, S)
q       Λ       S              (q1, Λ)
q1      Λ       Z0             (q2, Λ)
(all other combinations) none

on the stack. Once such a sequence is started, a set of states unique to this sequence is what
allows the PDA to remember how to complete the sequence. Suppose for example that we
want to reduce the string T * a to T. If we begin in some state q, with a on top of the stack,
the first step will be to remove a and enter a state that we might call q3,1. (Here is where we
use the numbering of the productions: The notation is supposed to suggest that the PDA has
completed one step of the reduction associated with production 3.) Starting in state q3,1, the
machine expects to see * on the stack. If it does, it removes it and enters state q3,2, from which
the only possible move is to remove T from the stack, replace it by T, and return to q. Of
course, all the moves of this sequence are Λ-transitions, affecting only the stack.
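Written out in configuration notation, with γ standing for the rest of the stack and x for the unread input, the three moves of this reduction are

(q, x, a*Tγ) ⊢ (q3,1, x, *Tγ) ⊢ (q3,2, x, Tγ) ⊢ (q, x, Tγ)

and x is untouched throughout.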
Apart from the special states used for reductions, the PDA stays in the state q during
almost all the processing. When S is on top of the stack, it pops S and moves to q1, from
which it enters the accepting state q2 if at that point the stack is empty except for Z0. The input
alphabet is {a, +, *} and the stack alphabet is {a, +, *, S, T, Z0}.
Note that in the shift moves, a number of combinations of input and stack symbol could
be omitted. For example, when a string in the language is processed, the symbol + will never
occur simultaneously as both the input symbol and stack symbol. It does no harm to include
these, however, since no string giving rise to these combinations will be reduced to S.
It can be shown without difficulty that this nondeterministic PDA accepts the language
generated by G. Moreover, for any CFG, a nondeterministic PDA can be constructed along
the same lines that accepts the corresponding language.

7.5  A CONTEXT-FREE GRAMMAR CORRESPONDING TO A GIVEN PDA
It will be useful in this section to keep in mind the nondeterministic top-down PDA
constructed in the previous section to simulate leftmost derivations in a given context-
free grammar. If at some point in a derivation the current string is xα, where x is a
string of terminals, then at some point in the simulation, the input string read so far
is x and the stack contains the string α. The stack alphabet of the PDA is Σ ∪ V.
The moves are defined so that variables are removed from the stack and replaced
by the right sides of productions, and terminals on the stack are used to match input
symbols. In this construction the states are almost incidental; after the initial move,
the PDA stays in the same state until it is ready to accept.
Now we consider the opposite problem, constructing a context-free grammar
that generates the language accepted by a given PDA. The argument is reasonably
complicated, but it will be simplified somewhat if we can assume that the PDA accepts
the language by empty stack (see Section 7.2). Our first job, therefore, is to convince
ourselves that this assumption can be made without loss of generality. We state the
result in Theorem 7.3 below and give a brief sketch of the proof, leaving the details
to the exercises.

Theorem 7.3
Suppose M = (Q, Σ, Γ, q0, Z0, A, δ) is a pushdown automaton accepting the
language L ⊆ Σ*. Then there is another PDA M1 = (Q1, Σ, Γ1, q1, Z1, A1, δ1)
accepting L by empty stack; in other words, L = Le(M1), where Le(M1) is the set
of strings x for which (q1, x, Z1) ⊢* (q, Λ, Λ) for some state q, and M1 empties its
stack precisely when M enters an accepting state.

Sketch of Proof  M1 will consist essentially of a copy of M, but with the ability to
empty its stack whenever M is in an accepting state. One danger is that M might
empty its stack on some input string without accepting it; M1 would therefore accept
by empty stack a string that M does not accept. To prevent this, we let M1 start by
placing Z0 on its stack on top of a new initial stack symbol Z1, so that the stack
cannot become empty except when it is appropriate to do so. The way
we allow M1 to empty its stack whenever M enters an accepting
state is to provide it with a Λ-transition from this state to a special
stack-emptying state, from which the machine just pops stack symbols one at a time
until the stack is empty.
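One concrete way to write these extra moves (with qe denoting the new stack-emptying state) is to add (qe, Λ) to δ1(q, Λ, X) for every accepting state q of M and every stack symbol X, and to define δ1(qe, Λ, X) = {(qe, Λ)} for every stack symbol X, in addition to a first move that places Z0 on top of Z1.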

Now we may return to the problem we are trying to solve. We have a PDA
M that accepts a language L by empty stack (let us represent this fact by writing
L = Le(M)), and we would like to construct a CFG generating L. It will be helpful
to try to preserve as much as possible the correspondence in Theorem 7.2 between
the operation of the PDA and the leftmost derivation being simulated. The current
string in the derivation will consist of two parts, the string of input symbols read by
the PDA so far and a remaining portion corresponding to the current stack contents.
In fact, we will define our CFG so that this remaining portion consists entirely of
variables, so as to highlight the correspondence between it and the stack contents: In
order to produce a string of terminals, we must eventually eliminate all the variables
from the current string, and in order for the input string to be accepted by the PDA
(by empty stack), all the symbols on the stack must eventually be popped off.
We consider first a very simple approach, which is too simple to work in general.
Take the variables in the grammar to be all possible stack symbols in the PDA,
renamed if necessary so that no input symbols are included; take the start symbol to
be Z0; ignore the states of the PDA completely; and for each PDA move that reads a
(either Λ or an element of Σ) and replaces A on the stack by B1 B2 ⋯ Bm, introduce
the production

A → a B1 B2 ⋯ Bm

This approach will give us the correspondence outlined above between the current
stack contents and the string of variables remaining in the current string being derived.
Moreover, it will allow the grammar to generate all strings accepted by the PDA. The
reason it is too simple is that by ignoring the states of the PDA we may be allowing
other strings to be derived as well. To see an example, we consider Example 7.1.
This PDA accepts the language {xcx^r | x ∈ {a, b}*}. The acceptance is by final state,
rather than by empty stack, but we can fix this, and eliminate the state q2 as well, by
changing move 12 to

δ(q1, Λ, Z0) = {(q1, Λ)}

instead of {(q2, Z0)}. We use A and B as stack symbols instead of a and b. The
moves of the PDA include these:

δ(q0, a, Z0) = {(q0, AZ0)}
δ(q0, c, A) = {(q1, A)}
δ(q1, a, A) = {(q1, Λ)}
δ(q1, Λ, Z0) = {(q1, Λ)}

Using the rule we have tentatively adopted, we obtain the corresponding productions

Z0 → aAZ0
A → cA
A → a
Z0 → Λ

The string aca has the leftmost derivation

Z0 ⇒ aAZ0 ⇒ acAZ0 ⇒ acaZ0 ⇒ aca

corresponding to the sequence of moves

(q0, aca, Z0) ⊢ (q0, ca, AZ0) ⊢ (q1, a, AZ0) ⊢ (q1, Λ, Z0) ⊢ (q1, Λ, Λ)

If we run the PDA on the input string aa instead, the initial move is

(q0, aa, Z0) ⊢ (q0, a, AZ0)

and at this point the machine crashes, because it is only in state q1 that it is allowed
to read a and remove A from the stack. However, our grammar also allows the
derivation

Z0 ⇒ aAZ0 ⇒ aaZ0 ⇒ aa
In order to eliminate this problem, we must modify our grammar so as to in-
corporate the states of the PDA. Rather than using the stack symbols themselves as
variables, we try things of the form

[p, A, q]

where p and q are states. For the variable [p, A, q] to be replaced by a (either Λ
or a terminal symbol), it must be the case that there is a PDA move that reads a,
pops A from the stack, and takes the machine from state p to state q. More general
productions involving the variable [p, A, q] are to be thought of as representing any
sequence of moves that takes the PDA from state p to state q and has the ultimate
effect of removing A from the stack.
If the variable [p, A, q] appears in the current string of a derivation, our goal is
to replace it by Λ or a terminal symbol. This will be possible if there is a move that
takes the PDA from p to q and pops A from the stack. Suppose instead, however,
that there is a move from p to some state p1 that reads a and replaces A on the
stack by B1 B2 ⋯ Bm. It is appropriate to introduce a into our current string at this
point, since we want the initial string of terminals to correspond to the input read so
far. But it is now also appropriate to think of our original goal as being modified,
as a result of all the new symbols that have been introduced on the stack. The most
direct way to eliminate these new symbols B1, …, Bm is as follows: to start in p1
and make a sequence of moves—ending up in some state p2, say—that results in B1
being removed from the stack; then to make some more moves that remove B2 and
in the process move from p2 to some other state p3; …; to move from pm−1 to some
pm and remove Bm−1; and finally, to move from pm to q and remove Bm. The actual
moves of the PDA may not accomplish these steps directly, but this is what we want
their ultimate effect to be. Because it does not matter what the states p2, p3, …, pm
are, we will allow any string of the form

a[p1, B1, p2][p2, B2, p3] ⋯ [pm, Bm, q]

to replace [p, A, q] in the current string. In other words, we will introduce the
productions

[p, A, q] → a[p1, B1, p2][p2, B2, p3] ⋯ [pm, Bm, q]

for all possible sequences of states p2, …, pm. Some such sequences will be dead
ends, in the sense that there will be no sequence of moves following this sequence
of states and having this ultimate effect. But no harm is done by introducing these
productions, because for any derivation in which one of these dead-end sequences
appears, there will be at least one variable that cannot be eliminated from the string,
and so the derivation will not produce a string of terminals. If we denote by S the
start symbol of the grammar, the productions that we need to begin are those of the
form

S → [q0, Z0, q]

where q0 is the initial state. When we accept strings by empty stack the final state is
irrelevant, and thus we include a production of this type for every possible state q.
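The bookkeeping in this construction is mechanical, and a short program may make it clearer. The following C++ sketch is our own illustration, not part of the construction itself (the Move structure and all names are hypothetical): given a single PDA move and the number of states, it prints every production that the move contributes to the grammar, writing Λ as "Lambda".

    #include <iostream>
    #include <string>
    #include <vector>
    using namespace std;

    // One PDA move: in state p, reading a (the empty string or a terminal),
    // pop A from the stack, push B1...Bm, and end the move in state p1.
    struct Move { int p; string a; char A; int p1; string Bs; };

    // Print all productions [p,A,q] -> a [p1,B1,p2][p2,B2,p3]...[pm,Bm,q],
    // for every choice of the states p2,...,pm and q.
    void emitAll(const Move& m, int numStates)
    {   int k = m.Bs.size();
        if (k == 0)                     // the move pops A outright:
        {   cout << "[" << m.p << "," << m.A << "," << m.p1 << "] -> "
                 << (m.a.empty() ? "Lambda" : m.a.c_str()) << "\n";
            return;
        }
        vector<int> idx(k, 0);          // idx[i] = state after B(i+1) is removed;
        while (true)                    // idx[k-1] plays the role of q
        {   cout << "[" << m.p << "," << m.A << "," << idx[k-1] << "] -> " << m.a;
            int left = m.p1;
            for (int i = 0; i < k; i++)
            {   cout << "[" << left << "," << m.Bs[i] << "," << idx[i] << "]";
                left = idx[i];
            }
            cout << "\n";
            int i = 0;                  // advance a base-numStates counter
            while (i < k && ++idx[i] == numStates) idx[i++] = 0;
            if (i == k) break;
        }
    }

    int main()
    {   // the move delta(q0, a, Z0) = {(q0, AZ0)} from the example above,
        // in a PDA whose two states are numbered 0 and 1:
        Move m = {0, "a", 'Z', 0, "AZ"};
        emitAll(m, 2);                  // prints the four resulting productions
        return 0;
    }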
We now present the proof that the CFG we have described generates the language
accepted by M.

Theorem 7.4
If M = (Q, Σ, Γ, q0, Z0, A, δ) is a PDA accepting the language L by empty stack,
then the context-free grammar G constructed from M as described above generates L.

Sketch of Proof  The heart of the argument is the more general statement that for any
states q, q′ ∈ Q, any stack symbol B ∈ Γ, and any string x ∈ Σ*, [q, B, q′] ⇒* x in
G if and only if (q, x, B) ⊢* (q′, Λ, Λ) in M. Each direction is proved by induction:
in one direction, on the number n of steps in a derivation of x from [q, B, q′]; in the
other, on the number n of moves by which M goes from the configuration (q, x, B)
to (q′, Λ, Λ). In the induction step, the first move, which replaces B by some string
B1 ⋯ Bm, corresponds to the first step of a derivation, which replaces [q, B, q′] using
one of the productions above; the induction hypothesis can then be applied to the
substrings x1, …, xm of input consumed while each Bi is removed from the stack.
This completes the induction in both directions, and applying the statement with
q = q0 and B = Z0, together with the productions S → [q0, Z0, q], shows that
L(G) = Le(M) = L.

EXAMPLE 7.7  Obtaining a CFG from a PDA Accepting Simple Palindromes

We return once more to the language L = {xcx^r | x ∈ {a, b}*} of Example 7.1, which we
used to introduce the construction in Theorem 7.4. In that discussion we used the PDA whose
transition table is shown below. (It is modified from the one in Example 7.1, both in using
uppercase letters for stack symbols and in accepting by empty stack.)

Move   State   Input   Stack symbol   Move(s)
1      q0      a       Z0             (q0, AZ0)
2      q0      b       Z0             (q0, BZ0)
3      q0      a       A              (q0, AA)
4      q0      b       A              (q0, BA)
5      q0      a       B              (q0, AB)
6      q0      b       B              (q0, BB)
7      q0      c       Z0             (q1, Z0)
8      q0      c       A              (q1, A)
9      q0      c       B              (q1, B)
10     q1      a       A              (q1, Λ)
11     q1      b       B              (q1, Λ)
12     q1      Λ       Z0             (q1, Λ)
(all other combinations) none

In the grammar G = (V, Σ, S, P) obtained from the construction in Theorem 7.4, V
contains S as well as every object of the form [p, X, q], where X is a stack symbol and p and
q can each be either q0 or q1. Productions of the following types are contained in P:

(0)  S → [q0, Z0, q]
(1)  [q0, Z0, q] → a[q0, A, p][p, Z0, q]
(2)  [q0, Z0, q] → b[q0, B, p][p, Z0, q]
(3)  [q0, A, q] → a[q0, A, p][p, A, q]
(4)  [q0, A, q] → b[q0, B, p][p, A, q]
(5)  [q0, B, q] → a[q0, A, p][p, B, q]
(6)  [q0, B, q] → b[q0, B, p][p, B, q]
(7)  [q0, Z0, q] → c[q1, Z0, q]
(8)  [q0, A, q] → c[q1, A, q]
(9)  [q0, B, q] → c[q1, B, q]
(10) [q1, A, q1] → a
(11) [q1, B, q1] → b
(12) [q1, Z0, q1] → Λ

Allowing all combinations of p and q gives 35 productions in all.


Consider the string bacab. The PDA accepts it by the sequence of moves

(q0, bacab, Z0) ⊢ (q0, acab, BZ0)
                ⊢ (q0, cab, ABZ0)
                ⊢ (q1, ab, ABZ0)
                ⊢ (q1, b, BZ0)
                ⊢ (q1, Λ, Z0)
                ⊢ (q1, Λ, Λ)
The corresponding leftmost derivation in the grammar is

S ⇒ [q0, Z0, q1]
  ⇒ b[q0, B, q1][q1, Z0, q1]
  ⇒ ba[q0, A, q1][q1, B, q1][q1, Z0, q1]
  ⇒ bac[q1, A, q1][q1, B, q1][q1, Z0, q1]
  ⇒ baca[q1, B, q1][q1, Z0, q1]
  ⇒ bacab[q1, Z0, q1]
  ⇒ bacab

From the sequence of PDA moves, it may look as though there are several choices of
leftmost derivations. For example, we might start with the production S → [q0, Z0, q0].
Remember, however, that [q0, Z0, q] represents a sequence of moves from q0 to q that has the
ultimate effect of removing Z0 from the stack. Since the PDA ends up in state q1, it is clear
that q should be q1. Similarly, it may seem as if the second step could be

[q0, Z0, q1] ⇒ b[q0, B, q0][q0, Z0, q1]

However, the sequence of PDA moves that starts in q0 and eliminates B from the stack ends
with the PDA in state q1, not q0. In fact, because every move to state q0 adds to the stack, the
variable [q0, B, q0] in this grammar is useless: No string of terminals can be derived from it.

7.6  PARSING
Suppose that G is a context-free grammar over an alphabet Σ. Being able to parse a
string x ∈ Σ* (to find a derivation of x in the grammar G, or to determine that there is
none) is often useful. Parsing a statement in a programming language, for example, is
necessary in order to classify it according to syntax; parsing an algebraic expression
is essentially what allows us to evaluate the expression. The problem of finding
efficient parsing algorithms has led to a great deal of research, and there are many
specialized techniques that depend on specific properties of the grammar.
In this section we return to the two natural ways presented in Section 7.4 of
obtaining a PDA to accept the language L(G). In both cases, the PDA not only
accepts a string x in L(G) but does it by simulating a derivation of x (in one case a
leftmost derivation, in the other case rightmost). Although the official output of a
PDA is just a yes-or-no answer, it is easy enough to enhance the machine slightly by
allowing it to record its moves, so that any sequence of moves leading to acceptance
causes a derivation to be displayed. However, neither construction by itself can be said
to produce a parsing algorithm, because both PDAs are inherently nondeterministic.
In each case, the simulation proceeds by guessing the next step in the derivation; if
the guess is correct, its correctness will be confirmed eventually by the PDA.
One approach to obtaining a parsing algorithm would be to consider all possible
sequences of guesses the PDA might make, in order to see whether one of them
leads to acceptance. Exercise 7.47 asks you to use a backtracking strategy for doing
this with a simple CFG. However, it is possible with both types of nondeterministic
PDAs to confront the nondeterminism more directly: rather than making an arbitrary
choice and then trying to confirm that it was the right one, trying instead to use all
the information available in order to select the choice that will be correct. In the
remainder of this section, we concentrate on two simple classes of grammars, for


which the next input symbol and the top stack symbol in the corresponding PDA
provide enough information at each step to determine the next move in the simulated
derivation. In more general grammars, the approach at least provides a starting point
for the development of an efficient parser.

7.6.1 Top-down parsing

EXAMPLE 7.8  A Top-down Parser for Balanced Strings of Parentheses
We consider the language of balanced strings of parentheses. For convenience, we modify it
slightly by adding a special endmarker $ to the end of each string. The new language will be
denoted L. If we use [ ] as our parentheses, the context-free grammar with productions

S → T$
T → [T]T | Λ

is an unambiguous CFG generating L. In the top-down PDA obtained from this grammar,
the only nondeterminism arises when the variable T is on the top of the stack, and we have a
choice of two moves using input Λ. If the next input symbol is [, then the correct move (or,
conceivably, sequence of moves) must produce a [ on top of the stack to match it. Replacing T
by [T]T will obviously do this; replacing T by Λ would have a chance of being correct only
if the symbol below T were either [ or T, and it is not hard to see that this never occurs. It
appears, therefore, that if T is on top of the stack, T should be replaced by [T]T if the next
input symbol is [ and by Λ if the next input is ] or $. The nondeterminism can be eliminated by
lookahead—using the next input symbol as well as the stack symbol to determine the move.
In the case when T is on the stack and the next input symbol is either ] or $, popping T
from the stack will lead to acceptance only if the symbol beneath it matches the input; thus the
PDA needs to remember the input symbol long enough to match it with the new stack symbol.
We can accomplish this by introducing the two states q] and q$, to which the PDA can move on
the respective input symbols when T is on top of the stack. In either of these states, the only
correct move is to pop the corresponding symbol from the stack and return to q1. For the sake
of consistency, we also use a state q[ for the case when T is on top of the stack and the next
input is [. Although in this case T is replaced on the stack by the longer string [T]T, the move
from q[ is also to pop the [ from the stack and return to q1. The alternative, which would be
slightly more efficient, would be to replace these two moves by a single one that leaves the
PDA in q1 and replaces T on the stack by T]T.
The transition table for the original nondeterministic PDA is shown in Table 7.8, and
Table 7.9 describes the deterministic PDA obtained by incorporating lookahead.
The sequence of moves by which the PDA accepts the string []$, and the corresponding
steps in the leftmost derivation of this string, are shown below.

(q0, []$, Z0)
⊢ (q1, []$, SZ0)        S
⊢ (q1, []$, T$Z0)       ⇒ T$
⊢ (q[, ]$, [T]T$Z0)     ⇒ [T]T$
⊢ (q1, ]$, T]T$Z0)

Table 7.8  A nondeterministic top-down PDA for balanced strings of parentheses

Move   State   Input   Stack symbol   Move(s)
1      q0      Λ       Z0             (q1, SZ0)
2      q1      Λ       S              (q1, T$)
3      q1      Λ       T              (q1, [T]T), (q1, Λ)
4      q1      [       [              (q1, Λ)
5      q1      ]       ]              (q1, Λ)
6      q1      $       $              (q1, Λ)
7      q1      Λ       Z0             (q2, Z0)
(all other combinations) none

Table 7.9  Using lookahead to eliminate the nondeterminism from Table 7.8

Move   State   Input   Stack symbol   Move(s)
1      q0      Λ       Z0             (q1, SZ0)
2      q1      Λ       S              (q1, T$)
3      q1      [       T              (q[, [T]T)
4      q[      Λ       [              (q1, Λ)
5      q1      ]       T              (q], Λ)
6      q]      Λ       ]              (q1, Λ)
7      q1      $       T              (q$, Λ)
8      q$      Λ       $              (q1, Λ)
9      q1      [       [              (q1, Λ)
10     q1      ]       ]              (q1, Λ)
11     q1      $       $              (q1, Λ)
12     q1      Λ       Z0             (q2, Z0)
(all other combinations) none

⊢ (q], $, ]T$Z0)        ⇒ []T$
⊢ (q1, $, T$Z0)
⊢ (q$, Λ, $Z0)          ⇒ []$
⊢ (q1, Λ, Z0)
⊢ (q2, Λ, Z0)
You can probably see that moves 9, 10, and 11 in the deterministic PDA, which were
retained from the nondeterministic machine, will never actually be used. We include them
because in a more general example moves of this type may still be necessary. Although we do
not give a proof that the deterministic PDA accepts the language, you can convince yourself
by tracing the moves for a few longer input strings.

In the top-down PDA obtained from a context-free grammar as in Definition 7.4,
looking ahead to the next input may not be enough to determine the next move.
Sometimes, however, straightforward modifications of the grammar are enough to
establish this property, as the next two examples indicate.

EXAMPLE 7.9  Eliminating Left Recursion in a CFG


Another unambiguous CFG for the language of Example 7.8 is the one with productions

S → T$
T → T[T] | Λ

The standard top-down PDA produced from this grammar is exactly the same as the one in
Example 7.8, except for the string replacing T on the stack in the first move of line 3. We
can see the potential problem by considering the input string [][][]$, which has the leftmost
derivation

S ⇒ T$ ⇒ T[T]$ ⇒ T[T][T]$ ⇒ T[T][T][T]$ ⇒ ⋯ ⇒ [][][]$

The correct sequence of moves for this input string therefore begins

(q0, [][][]$, Z0) ⊢ (q1, [][][]$, SZ0)
                  ⊢ (q1, [][][]$, T$Z0)
                  ⊢ (q1, [][][]$, T[T]$Z0)
                  ⊢ (q1, [][][]$, T[T][T]$Z0)
                  ⊢ (q1, [][][]$, T[T][T][T]$Z0)
In each of the last four configurations shown, the next input is [ and the top stack symbol is T,
but the correct sequences of moves beginning at these four points are all different. Since the
remaining input is exactly the same in all these configurations, looking ahead to the next input,
or even farther ahead, will not help; there is no way to choose the next move on the basis of
the input.
The problem arises because of the production T → T[T], which illustrates the phe-
nomenon of left recursion. Because the right side begins with T, the PDA must make a certain
number of identical moves before it does anything else, and looking ahead in the input provides
no help in deciding how many. In this case we can eliminate the left recursion by modifying
the grammar. Suppose in general that a grammar has the T-productions

T → Tα | β

where the string β does not begin with T. These allow all the strings βα^n, for n ≥ 0, to be
obtained from T. If these two productions are replaced by

T → βU        U → αU | Λ

the language is unchanged and the left recursion has been eliminated. In our example, with
α = [T] and β = Λ, we replace

T → T[T] | Λ

by

T → U        U → [T]U | Λ

and the resulting grammar allows us to construct a deterministic PDA much as in Example 7.8.
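To see concretely why the replacement changes nothing, note that the string βαα, derived in the original grammar by T ⇒ Tα ⇒ Tαα ⇒ βαα, is derived in the new one by T ⇒ βU ⇒ βαU ⇒ βααU ⇒ βαα; in general, both grammars derive from T exactly the strings βα^n, n ≥ 0.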

EXAMPLE 7.10  Factoring in a CFG


Consider the context-free grammar with productions

S → T$
T → [T]T | []T | [T] | []

This is the unambiguous grammar obtained from the one in Example 7.8 by removing Λ-
productions from the CFG with productions T → [T]T | Λ; the language is unchanged except
that it no longer contains the string $.
Although there is no left recursion in the CFG, we can tell immediately that knowing the
next input symbol will not be enough to choose the nondeterministic PDA's next move when T
is on top of the stack. The problem here is that the right sides of all four T-productions begin
with the same symbol. An appropriate remedy is to "factor" the right sides, as follows:

T → [U        U → T]T | T] | ]T | ]

More factoring is necessary because of the U-productions; in the ones whose right side begins
with T, we can factor out T]. We obtain the productions

S → T$        T → [U
U → T]W | ]W        W → T | Λ

We can simplify the grammar slightly by eliminating the variable T, and we obtain

S → [U$        U → [U]W | ]W        W → [U | Λ


The DPDA we obtain by incorporating lookahead is shown in Table 7.10.

Table 7.10  A deterministic top-down PDA for Example 7.10

Move   State   Input   Stack symbol   Move(s)
1      q0      Λ       Z0             (q1, SZ0)
2      q1      Λ       S              (q1, [U$)
3      q1      [       U              (q[, [U]W)
4      q[      Λ       [              (q1, Λ)
5      q1      ]       U              (q], ]W)
6      q]      Λ       ]              (q1, Λ)
7      q1      [       W              (q[, [U)
8      q1      ]       W              (q], Λ)
9      q1      $       W              (q$, Λ)
10     q$      Λ       $              (q1, Λ)
11     q1      [       [              (q1, Λ)
12     q1      ]       ]              (q1, Λ)
13     q1      $       $              (q1, Λ)
14     q1      Λ       Z0             (q2, Z0)
(all other combinations) none
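As a quick check, the moves by which this DPDA accepts the string []$ are

(q0, []$, Z0) ⊢ (q1, []$, SZ0)       (move 1)
              ⊢ (q1, []$, [U$Z0)     (move 2)
              ⊢ (q1, ]$, U$Z0)       (move 11)
              ⊢ (q], $, ]W$Z0)       (move 5)
              ⊢ (q1, $, W$Z0)        (move 6)
              ⊢ (q$, Λ, $Z0)         (move 9)
              ⊢ (q1, Λ, Z0)          (move 10)
              ⊢ (q2, Λ, Z0)          (move 14; accept)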

In Examples 7.9 and 7.10, we were able by a combination of factoring and
eliminating left recursion to transform the CFG into what is called an LL(1) grammar,

meaning that the nondeterministic top-down PDA produced from the grammar can be
turned into a deterministic top-down parser by looking ahead to the next symbol. A
grammar is LL(k) if looking ahead k symbols in the input is always enough to choose
the next move of the PDA. Such a grammar allows the construction of a deterministic
top-down parser, and there are systematic methods for determining whether a CFG
is LL(k) and for carrying out this construction (see the references).
For an LL(1) context-free grammar, a deterministic PDA is one way of formu-
lating the algorithm that decides the next step in the derivation of a string by looking
at the next input symbol. The method of recursive descent is another way. The name
refers to a collection of mutually recursive procedures corresponding to the variables
in the grammar.

EXAMPLE 7.11  A Recursive-descent Parser for the LL(1) Grammar in Example 7.10


The context-free grammar is the one with productions

S → [U$
U → ]W | [U]W
W → [U | Λ

We give a C++ version of a recursive-descent parser. The term recognizer is really more
accurate than parser, though it would not be difficult to add output statements to the program
that would allow one to reconstruct a derivation of the string being recognized.
The program involves functions s, u, and w, corresponding to the three variables. Calls
on these three functions correspond to substitutions for the respective variables during a
derivation—or to replacement of those variables on the stack in a PDA implementation. There
is a global variable curr_ch, whose value is assigned before any of the three functions is
called. If the current character is one of those that the function expects, it is matched, and
the input function is called to read and echo the next character. Otherwise, an error-handling
function is called and told what the character should have been; in this case, the program
terminates with an appropriate error message.
Note that the program's correctness depends on the grammar's being LL(1), since each
of the functions can select the correct action on the basis of the current input symbol.

#include <iostream>
#include <cstdlib>
using namespace std;

char curr_ch;                  // the current symbol

void s(), u(), w();            // recognize S, U, W, respectively
void match(char);              // compares curr_ch to the argument; aborts with
                               // error message if no match, otherwise returns.
void get_ch();                 // reads the next symbol into curr_ch, with echo.
void error(const char*);       // reports an error and aborts. String argument.
void error(char);              // Character argument.

int main()
{   get_ch(); s();
    cout << endl << "Parsing complete. "
         << "The above string is in the language." << endl;
    return 0;
}

void s()                       // recognizes [U$
{   match('['); u(); match('$'); }

void u()                       // recognizes ]W | [U]W
{   switch (curr_ch)
    {   case ']':              // production ]W
            match(']'); w(); break;
        case '[':              // production [U]W
            match('['); u(); match(']'); w(); break;
        default: error("[ or ]");
    }
}

void w()                       // recognizes [U | <Lambda>
{   if (curr_ch == '[') { match('['); u(); } }

void get_ch()                  // read and echo next nonblank symbol
{   if (cin >> curr_ch) cout << curr_ch;
    if (cin.eof() && curr_ch != '$')
    {   cout << " (End of Data)"; error("[ or ]"); }
}

void match(char this_ch)
{   if (curr_ch == this_ch) get_ch(); else error(this_ch); }

void error(const char* some_chars)
{   cout << "\n ERROR : Expecting one of " << some_chars << ".\n";
    exit(0);
}

void error(char a_char)
{   cout << "\n ERROR : Expecting " << a_char << ".\n";
    exit(0);
}
Here is a sample of the output produced, for the strings [][[][[]]]$, $, []], and [[],
respectively.

[][[][[]]]$
Parsing complete. The above string is in the language.

$
 ERROR : Expecting [.
[]]
 ERROR : Expecting $.
[[] (End of Data)
 ERROR : Expecting one of [ or ].

The program is less complete than it might be, in several respects. In the case of a string
not in the language, it reads and prints out only the symbols up to the first illegal one—that
is, up to the point where the DPDA would crash. In addition, it does not read past the $; if
the input string were []$], for example, the program would merely report that []$ is in the
language. Finally, the error messages may seem slightly questionable. The second error, for
example, is detected in the function s, after the return from the call on u. The production is
S → [U$; the function "expects" to see $ at this point, although not every symbol other than
$ would have triggered the error message. The symbol [ would be valid here and would have
resulted in a different sequence of function calls, so that the program would not have performed
the same test at this point in s.

7.6.2 Bottom-up parsing


Example 7.12 illustrates one of the simplest ways of obtaining a deterministic bottom-
up parser from a nondeterministic bottom-up PDA.

EXAMPLE 7.12  A Deterministic Bottom-up Parser for a CFG


We consider the context-free grammar G with productions

(0) S → S1$
(1) S1 → S1 + T
(2) S1 → T
(3) T → T * a
(4) T → a

The last four are essentially those in Example 7.6; the endmarker $ introduced in production
(0) will be useful here, as it was in the discussion of top-down parsing.
Table 7.11 shows the nondeterministic PDA in Example 7.6 with the additional reduction
corresponding to grammar rule (0). The other slight difference from the PDA in Example 7.6
is that because the start symbol S occurs only in production (0) in the grammar, the PDA can
move to the accepting state as soon as it sees S on the stack.
Nondeterminism is present in two ways. First, there may be a choice as to whether to shift
an input symbol onto the stack or to try to reduce a string on top of the stack. For example, if
T is the top stack symbol, the first choice is correct if it is the T in the right side of T * a, and
the second is correct if it is the T in one of the S1-productions. Second, there may be some

Table 7.11  A nondeterministic bottom-up parser for G

State   Input   Stack symbol   Move(s)
q       σ       X              (q, σX)   (shift: σ is any input symbol, X any stack symbol)
q       Λ       a              (q, T), (q3,1, Λ)
q3,1    Λ       *              (q3,2, Λ)
q3,2    Λ       T              (q, T)
q       Λ       T              (q, S1), (q1,1, Λ)
q1,1    Λ       +              (q1,2, Λ)
q1,2    Λ       S1             (q, S1)
q       Λ       $              (q0,1, Λ)
q0,1    Λ       S1             (q, S)
q       Λ       S              (q1, Λ)
(all other combinations) none

doubt as to which reduction is the correct one; for example, there are two productions whose
right sides end with a. Answering the second question is easy. When we pop a off the stack, if
we find * below it, we should attempt to reduce T * a to T, and otherwise, we should reduce
a to T. Either way, the correct reduction is the one that reduces the longest possible string.
Returning to the first question, suppose the top stack symbol is T, and consider the
possibilities for the next input. If it is +, we should soon have the string S1 + T, in reverse
order, on top of the stack, and so the correct move at this point is a reduction of either T or
S1 + T (depending on what is below T on the stack) to S1. If the next input is *, the eventual
reduction will be that of T * a to T, and since so far we have only the T, we should shift.
Finally, if it is $, we should reduce either T or S1 + T to S1, to allow the reduction
of S1$. In any case, we can make the decision on the basis of the next input symbol. What
is true for this example is that there are certain combinations of top stack symbol and input
symbol for which a reduction is always appropriate, and a shift is correct for all the other
combinations. The set of pairs for which a reduction is correct is an example of a precedence
relation. (It is a relation from Γ to Σ, in the sense of Section 1.4.) There are a number of types
of precedence grammars, for which precedence relations can be used to obtain a deterministic
shift-reduce parser. Our example, in which the decision to reduce can be made by examining
the top stack symbol and the next input, and in which a reduction always reduces the longest
possible string, is an example of a weak precedence grammar.
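For this particular grammar, the shift-or-reduce decision just described fits in a few lines of code. The C++ sketch below is our own illustration (the function name is hypothetical, and S1 is written as the character '1'): it returns true exactly for the stack-symbol/input pairs at which the parser should begin a reduction.

    #include <iostream>
    using namespace std;

    // Shift-or-reduce decision for the grammar G above; '1' stands for S1
    // and 'Z' for Z0.
    bool shouldReduce(char top, char next)
    {   if (top == 'a' || top == '$' || top == 'S')
            return true;                // always reduce (or accept)
        if (top == 'T')
            return next == '+' || next == '$';  // reduce T or S1+T to S1
        return false;                   // Z0, S1, +, * : always shift
    }

    int main()
    {   cout << shouldReduce('T', '*') << "\n";  // 0: shift, heading for T*a
        cout << shouldReduce('T', '+') << "\n";  // 1: begin a reduction
        return 0;
    }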
A deterministic PDA that acts as a shift-reduce parser for our grammar is shown in Ta-
ble 7.12. In order to compare it to the nondeterministic PDA, we make a few observations.
The stack symbols can be divided into three groups: (1) those whose appearance on top of the
stack requires a shift regardless of the next input (these are Z0, S1, +, and *); (2) those that
require a reduction or lead to acceptance (a, $, and S); and (3) one, T, for which the correct
choice can be made only by consulting the next input. In the DPDA, shifts in which the top
stack symbol is of the second type have been omitted, since they do not lead to acceptance of
any string and their presence would introduce nondeterminism. Shifts in which the top stack
symbol is of the first or third type are shown, labeled "shift moves." If the top stack symbol
is of the second type, the moves in the reduction are all Λ-transitions. If the PDA reads a
symbol and decides to reduce, the input symbol will eventually be shifted onto the stack, once
the reduction has been completed (the machine must remember the input symbol during the
reduction); the eventual shift is shown, not under "shift moves," but farther down in the table,
as part of the sequence of reducing moves.

Table 7.12  A deterministic bottom-up parser for G

Move number   State   Input   Stack symbol   Move(s)

Shift moves
1             q       σ       X              (q, σX)
              (σ is arbitrary; X is either Z0, S1, +, or *.)
2             q       σ       T              (q, σT)
              (σ is any input symbol other than + or $.)

Moves to reduce S1$ to S
3             q       Λ       $              (q$, Λ)
4             q$      Λ       S1             (q, S)

Moves to reduce T * a or a to T
5             q       Λ       a              (qa,1, Λ)
6             qa,1    Λ       *              (qa,2, Λ)
7             qa,2    Λ       T              (q, T)
8             qa,1    Λ       X              (q, TX)
              (X is any stack symbol other than *.)

Moves to reduce S1 + T or T to S1
9             q       σ       T              (qT,σ, Λ)
              (σ is either + or $.)
10            qT,σ    Λ       +              (qT,σ, Λ)
11            qT,σ    Λ       S1             (q, σS1)
12            qT,σ    Λ       X              (q, σS1X)
              (σ is either + or $; X is any stack symbol other than + or S1.)

Move to accept
13            q       Λ       S              (q1, Λ)
(all other combinations) none

We trace the moves of this PDA on the input string a + a * a$.

(q, a + a * a$, Z0) ⊢ (q, + a * a$, aZ0)        (move 1)
                    ⊢ (qa,1, + a * a$, Z0)       (move 5)
                    ⊢ (q, + a * a$, TZ0)         (move 8)
                    ⊢ (qT,+, a * a$, Z0)         (move 9)
                    ⊢ (q, a * a$, +S1Z0)         (move 12)
                    ⊢ (q, * a$, a + S1Z0)        (move 1)
                    ⊢ (qa,1, * a$, +S1Z0)        (move 5)
                    ⊢ (q, * a$, T + S1Z0)        (move 8)
                    ⊢ (q, a$, *T + S1Z0)         (move 2)
                    ⊢ (q, $, a*T + S1Z0)         (move 1)
                    ⊢ (qa,1, $, *T + S1Z0)       (move 5)
                    ⊢ (qa,2, $, T + S1Z0)        (move 6)
                    ⊢ (q, $, T + S1Z0)           (move 7)
                    ⊢ (qT,$, Λ, +S1Z0)           (move 9)
                    ⊢ (qT,$, Λ, S1Z0)            (move 10)
                    ⊢ (q, Λ, $S1Z0)              (move 11)
                    ⊢ (q$, Λ, S1Z0)              (move 3)
                    ⊢ (q, Λ, SZ0)                (move 4)
                    ⊢ (q1, Λ, Z0)                (move 13)
(accept)

EXERCISES
7.1. For the PDA in Example 7.1, trace the sequence of moves made for each of
the input strings bbcbb and baca.
7.2. For the PDA in Example 7.2, draw the computation tree showing all possible
sequences of moves for the two input strings aba and aabab.
7.3. For a string x ∈ {a, b}* with |x| = n, how many possible complete
sequences of moves can the PDA in Example 7.2 make, starting with input
string x? (By a "complete" sequence of moves, we mean a sequence of
moves starting in the initial configuration (q0, x, Z0) and terminating in a
configuration from which no move is possible.)
7.4. Modify the PDA described in Example 7.2 to accept each of the following
subsets of {a, b}*.
a. The language of even-length palindromes.
b. The language of odd-length palindromes.
7.5. Give transition tables for PDAs recognizing each of the following
languages.
a. The language of all nonpalindromes over {a, b}.
b. {a^n x | n ≥ 0, x ∈ {a, b}* and |x| ≤ n}.
c. {a^i b^j c^k | i, j, k ≥ 0 and j = i or j = k}.
d. {x ∈ {a, b, c}* | n_a(x) < n_b(x) or n_a(x) < n_c(x)}.

7.6. In both cases below, a transition table is given for a PDA with initial state q0
and accepting state q2. Describe in each case the language that is accepted.

Move number   State   Input   Stack symbol   Move(s)
1             q0      a       Z0             (q1, aZ0)
2             q0      b       Z0             (q1, bZ0)
3             q1      a       a              (q1, a), (q2, a)
4             q1      b       a              (q1, a)
5             q1      a       b              (q1, b)
6             q1      b       b              (q1, b), (q2, b)
(all other combinations) none

Move number   State   Input   Stack symbol   Move(s)
1             q0      a       Z0             (q0, XZ0)
2             q0      b       Z0             (q0, XZ0)
3             q0      a       X              (q0, XX)
4             q0      b       X              (q0, XX)
5             q0      c       X              (q1, X)
6             q0      c       Z0             (q1, Z0)
7             q1      a       X              (q1, Λ)
8             q1      b       X              (q1, Λ)
9             q1      Λ       Z0             (q2, Z0)
(all other combinations) none

7.7. Give a transition table for a PDA accepting the language in Example 7.1 and
having only two states, the nonaccepting state q0 and the accepting state q1.
(Use additional stack symbols.)
7.8. Show that every regular language can be accepted by a deterministic PDA M
with only two states in which there are no Λ-transitions and no symbols are
ever removed from the stack.
7.9. Show that if L is accepted by a PDA in which no symbols are ever removed
from the stack, then L is regular.
7.10. Suppose L ⊆ Σ* is accepted by a PDA M, and for some fixed k and every
x ∈ Σ*, no sequence of moves made by M on input x causes the stack to
have more than k elements. Show that L is regular.
7.11. Show that if L is accepted by a PDA, then L is accepted by a PDA that never
crashes (i.e., for which the stack never empties and no configuration is
reached from which there is no move defined).
7.12. Show that if L is accepted by a PDA, then L is accepted by a PDA in which
every move either pops something from the stack (i.e., removes a stack
symbol without putting anything else on the stack); or pushes a single
symbol onto the stack on top of the symbol that was previously on top; or
leaves the stack unchanged.

7.13. Give transition tables for deterministic PDAs recognizing each of the
following languages.
a. {x ∈ {a, b}* | n_a(x) = n_b(x)}
b. {x ∈ {a, b}* | n_a(x) ≠ n_b(x)}
c. {x ∈ {a, b}* | n_a(x) = 2n_b(x)}
d. {a^n b^(n+m) a^m | n, m ≥ 0}
7.14. Suppose M1 and M2 are PDAs accepting L1 and L2, respectively. Describe a
procedure for constructing a PDA accepting each of the following languages.
Note that in each case, nondeterminism will be necessary. Be sure to say
precisely how the stack of the new machine works; no relationship is
assumed between the stack alphabets of M1 and M2.
a. L1 ∪ L2
b. L1L2
c. L1*
7.15. Show that if there are strings x and y in the language L so that x is a prefix
of y and x ≠ y, then no DPDA can accept L by empty stack.
7.16. Show that if there is a DPDA accepting L, and $ is not one of the symbols in
the input alphabet, then there is a DPDA accepting the language L{$} by
empty stack.
7.17. Show that none of the following languages can be accepted by a DPDA.
(Determine exactly what property of the language pal is used in the proof of
Theorem 7.1, and show that these languages also have that property.)
a. The set of even-length palindromes over {a, b}
b. The set of odd-length palindromes over {a, b}
c. {xx~ | x ∈ {0, 1}*} (where x~ means the string obtained from x by
changing 0's to 1's and 1's to 0's)
d. {xy | x ∈ {0, 1}* and y is either x or x~}
7.18. A counter automaton is a PDA with just two stack symbols, A and Z0, for
which the string on the stack is always of the form A^n Z0 for some n ≥ 0. (In
other words, the only possible change in the stack contents is a change in the
number of A's on the stack.) For some context-free languages, such as
{0^i 1^i | i ≥ 0}, the obvious PDA to accept the language is in fact a counter
automaton. Construct a counter automaton to accept the given language in
each case below.
a. {x ∈ {0, 1}* | n_0(x) = n_1(x)}
b. {x ∈ {0, 1}* | n_0(x) < 2n_1(x)}
7.19. Suppose that M = (Q, Σ, Γ, q0, Z0, A, δ) is a deterministic PDA accepting
a language L. If x is a string in L, then by definition there is a sequence of
moves of M with input x in which all the symbols of x are read. It is
conceivable, however, that for some strings y ∉ L, no sequence of moves
causes M to read all of y. This could happen in two ways: M could either
crash by not being able to move, or it could enter a loop in which there were
infinitely many repeated Λ-transitions. Find an example of a DCFL
L ⊆ {a, b}*, a string y ∉ L, and a DPDA M accepting L for which M
crashes on y by not being able to move. (Say what L is and what y is, and
give a transition table for M.) Note that once you have such an M, it can
easily be modified so that y causes it to enter an infinite loop of Λ-transitions.
7.20. Give a definition of “balanced string” involving two types of brackets (such
as in Example 7.3) corresponding to Definition 6.5.
7.21. In each case below, you are given a CFG and a string x that it generates. For
the top-down PDA that is constructed from the grammar as in Definition 7.4,
trace a sequence of moves by which x is accepted, showing at each step the
state, the unread input, and the stack contents. Show at the same time the
corresponding leftmost derivation of x in the grammar. See Example 7.5 for
a guide.
a. The grammar has productions

S → S + T | T        T → T * F | F        F → (S) | a

and x = (a + a * a) * a.
b. The grammar has productions S → S + S | S * S | (S) | a, and
x = (a * a + a).
c. The grammar has productions S → (S)S | Λ, and x = ()(()()).
7.22. Let M be the PDA in Example 7.2, except that move number 12 is changed
to (q2, Λ), so that M does in fact accept by empty stack. Let x = ababa.
Find a sequence of moves of M by which x is accepted, and give the
corresponding leftmost derivation in the CFG obtained from M as in
Theorem 7.4.
7.23. Under what circumstances is the “nondeterministic” top-down PDA
described in Definition 7.4 actually deterministic? (For what kind of
language could this happen?)
7.24. In each case below, you are given a CFG and a string x that it generates. For
the nondeterministic bottom-up PDA that is constructed from the grammar
as in Example 7.6, trace a sequence of moves by which x is accepted,
showing at each step the state, the stack contents, and the unread input.
Show at the same time the corresponding rightmost derivation of x (in
reverse order) in the grammar. See Example 7.6 for a guide.
a. The grammar has productions S → S(S) | Λ, and x = ()().
b. The grammar has productions S → (S)S | Λ, and x = ()(()).
7.25. If the PDA in Theorem 7.4 is deterministic, what does this tell you about the
grammar that is obtained? Can the resulting grammar have this property
without the original PDA being deterministic?
7.26. Find the other useless variables in the CFG obtained in Example 7.7.
7.27. In each case, the grammar with the given productions satisfies the LL(1)
property. For each one, give a transition table for the deterministic PDA
obtained as in Example 7.8.

a. S → S1$        S1 → AS1 | Λ        A → aA | b
b. S → S1$        S1 → aA        A → aA | bA | Λ
c. S → S1$        S1 → aAB | bBA        A → bS1 | a        B → aS1 | b
7.28. In each case, the grammar with the given productions does not satisfy the
LL(1) property. Find an equivalent LL(1) grammar by factoring and
eliminating left recursion.
a. S → S1$        S1 → aaS1b | ab | bb
b. S → S1$        S1 → S1A | A        A → Aa | b
c. S → S1$        S1 → S1T | ab        T → aTbb | ab
d. S → S1$        S1 → aAb | aAA | aB | bbA
   A → aAb | ab        B → bBa | ba
7.29. Show that for the CFG in part (c) of the previous exercise, if the last
production were T → a instead of T → ab, the grammar obtained by
factoring and eliminating left recursion would not be LL(1). (Find a string
that doesn't work, and identify the point at which looking ahead one symbol
in the input isn't enough to decide what move the PDA should make.)
7.30. Consider the CFG with productions

S → S1$        S1 → S1 + T | T        T → T * F | F        F → (S1) | a

a. Write the CFG obtained from this one by eliminating left recursion.
b. Give a transition table for a DPDA that acts as a top-down parser for this
language.
7.31. Suppose that in a grammar having a variable T, the T-productions are

T → Tα_i  (1 ≤ i ≤ m)        T → β_j  (1 ≤ j ≤ n)

where none of the strings β_j begins with T. Find a set of productions with
which these can be replaced, so that the resulting grammar will be equivalent
to the original and will have no left recursion involving T.
7.32. Let G be the CFG with productions

S → S1$        S1 → (S1 + S1) | (S1 * S1) | a

so that L(G) is the language of all fully parenthesized algebraic expressions
involving the operators + and * and the identifier a. Give a transition table
for a deterministic bottom-up parser obtained from this grammar as in
Example 7.12.
7.33. Let G have productions

S → S1$        S1 → S1[S1] | [S1] | []

and let G1 have productions

S → S1$        S1 → [S1]S1 | [S1] | []

a. Give a transition table for a deterministic bottom-up parser obtained
from G.
b. Show that G1 is not a weak precedence grammar.

7.34. In the nondeterministic bottom-up parser given for the grammar in
Example 7.12, the implicit assumption in the transition table was that the
start symbol S did not appear on the right side of any production. Why is
there no loss of generality in making this assumption in general?
7.35. In the standard nondeterministic bottom-up parsing PDA for a grammar,
obtained as in Example 7.12, consider a configuration in which the right side
of a production is currently on top of the stack in reverse, and this string does
not appear in the right side of any other production. Why is it always correct
to reduce at this point?
7.36. a. Say exactly what the precedence relation is for the grammar in
Example 7.12. In other words, for which pairs (X, σ), where X is a
stack symbol and σ an input symbol, is it correct to reduce when X is on
top of the stack and σ is the next input?
b. Answer the same question for the larger grammar (also a weak
precedence grammar) with productions

S → S1$        S1 → S1 + T | S1 − T | T
T → T * F | T / F | F        F → (S1) | a
MORE CHALLENGING PROBLEMS


7.37. Give transition tables for PDAs recognizing each of the following languages.
a. {a^i b^j | i ≤ j ≤ 2i}
b. {x ∈ {a, b}* | n_a(x) ≤ n_b(x) ≤ 2n_a(x)}
7.38. Suppose L ⊆ Σ* is accepted by a PDA M, and for some fixed k and every
x ∈ Σ*, at least one choice of moves allows M to process x completely so
that the stack never contains more than k elements. Does it follow that L is
regular? Prove your answer.
7.39. Suppose L ⊆ Σ* is accepted by a PDA M, and for some fixed k and every
x ∈ L, at least one choice of moves allows M to accept x in such a way that
the stack never contains more than k elements. Does it follow that L is
regular? Prove your answer.
7.40. Show that if L is accepted by a DPDA, then there is a DPDA accepting the
language {x#y | x ∈ L and xy ∈ L}. (The symbol # is assumed not to occur
in any of the strings of L.)
7.41. Complete the proof of Theorem 7.3. Give a precise definition of the PDA
M1, and a proof that it accepts the same language as the original PDA M.
7.42. Prove the converse of Theorem 7.3: If there is a PDA M = (Q, Σ, Γ, q0,
Z0, A, δ) accepting L by empty stack (that is, x ∈ L if and only if
(q0, x, Z0) ⊢*_M (q, Λ, Λ) for some state q), then there is a PDA M1
accepting L by final state (i.e., the ordinary way).
7.43. Show that in the previous exercise, if M is a deterministic PDA, then M1 can
also be taken to be deterministic.
7.44. Show that if L is accepted by a PDA, then L is accepted by a PDA having at
most two states and no Λ-transitions.

7.45. Show that if L is accepted by a PDA, then L is accepted by a PDA in which
there are at most two stack symbols in addition to Z0.
7.46. Show that if M is a DPDA accepting a language L ⊆ Σ*, then there is a
DPDA M1 accepting L for which neither of the phenomena in Exercise 7.19
occurs—that is, for every x ∈ Σ*, (q0, x, Z0) ⊢*_{M1} (q, Λ, α) for some state q
and some string α of stack symbols.
7.47. Starting with the top-down nondeterministic PDA constructed as in
Definition 7.4, one might try to produce a deterministic parsing algorithm by
using a backtracking approach: specifying an order in which to try all the
moves possible in a given configuration, and trying sequences of moves in
order, backtracking whenever the machine crashes.
a. Describe such a backtracking algorithm in more detail for the grammar
in Example 7.8, and trace the algorithm on several strings, including
strings derivable from the grammar and strings that are not.
b. What possible problems may arise with such an approach for a general
grammar?
CHAPTER 8

Context-Free and Non-Context-Free Languages

8.1  THE PUMPING LEMMA FOR CONTEXT-FREE LANGUAGES
Neither the definition of context-free languages in terms of grammars nor the
pushdown-automaton characterization in Chapter 7 makes it immediately obvious
that there are formal languages that are not context-free. However, our brief look at
natural languages (Example 6.6) has suggested some of the limitations of CFGs. In
the first section of this chapter we formulate a principle, similar to the pumping lemma
for regular languages (Theorem 5.2a), which will allow us to identify a number of
non-context-free languages.
The earlier pumping lemma used the fact that a sufficiently long input string
causes a finite automaton to visit some state more than once. Any such string can
be written x = uvw, where v is a substring that causes the FA to start in a state and
return to that state; the result is that all the strings of the form uv^i w are also accepted
by the FA. Although we will get the comparable result for CFLs by using grammars
instead of automata, the way it arises is similar. Suppose a derivation in a context-free
grammar G involves a variable A more than once, in this way:

S ⇒* vAz ⇒* vwAyz ⇒* vwxyz

where v, w, x, y, z ∈ Σ*. Within this derivation, both the strings x and wAy are
derived from A. We may write

S ⇒* vAz ⇒* vwAyz ⇒* vw^2Ay^2z ⇒* vw^3Ay^3z ⇒* ⋯

and since x can be derived from each of these A's, we may conclude that all the
strings vxz, vwxyz, vw^2xy^2z, … are in L(G).
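For example, in the grammar with productions S → aSa | c, which generates {a^n c a^n | n ≥ 0}, the derivation S ⇒ aSa ⇒ aaSaa ⇒ aacaa involves S twice, with v = z = a, w = y = a, and x = c; the conclusion is that every string vw^m x y^m z = a^(m+1) c a^(m+1) is in the language, as indeed it is.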

In order to obtain our pumping lemma, we must show that this duplication of
variables occurs in the derivation of every sufficiently long string in L(G). It will
also be helpful to impose some restrictions on the strings v, w, x, y, and z, just as we
did on the strings u, v, and w in the simpler case.
The discussion will be a little easier if we can assume that the tree representing
a derivation is a binary tree, which means simply that no node has more than two
children. We can guarantee this by putting our grammar into Chomsky normal form
(Section 6.6). The resulting loss of the null string will not matter, because the result
we want involves only long strings.
Let us say that a path in a nonempty binary tree consists either of a single node
or of a node, one of its descendants, and all the nodes in between. We will say that the
length of a path is the number of nodes it contains, and the height of a binary tree is the
length of the longest path. In any derivation whose tree has a sufficiently long path,
some variable must reoccur. Lemma 8.1 shows that any binary tree will have a long
path if the number of leaf nodes is sufficiently large. (In the case we are interested
in, the binary tree is a derivation tree, and because there are no Λ-productions the
number of leaf nodes is simply the length of the string being derived.)

Lemma 8.1  For any h ≥ 1, a binary tree having more than 2^(h−1) leaf nodes must
have height greater than h.

Proof  We prove, by induction on h, the contrapositive statement: If the height is no
more than h, the number of leaf nodes is no greater than 2^(h−1). For the basis step, we
observe that a binary tree with height ≤ 1 has no more than one node and therefore
no more than one leaf node.
In the induction step, suppose that k ≥ 1 and that any binary tree of height ≤ k
has no more than 2^(k−1) leaf nodes. Now let T be a binary tree with height ≤ k + 1.
If T has no more than one node, the result is clear. Otherwise, the left and right
subtrees of T both have height ≤ k, and so by the induction hypothesis each has 2^(k−1)
or fewer leaf nodes. The number of leaf nodes in T is the sum of the numbers in the
two subtrees and is therefore no greater than 2^(k−1) + 2^(k−1) = 2^k. ∎
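For example, taking h = 3, any binary tree with five or more leaf nodes (more than 2^2 = 4) must contain a path with at least four nodes.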

Theorem 8.1  Let G = (V, Σ, S, P) be a context-free grammar in Chomsky normal
form, with a total of p variables. Any string u in L(G) with |u| > 2^p can be
written as u = vwxyz, for some strings v, w, x, y, and z satisfying |wy| > 0,
|wxy| ≤ 2^(p+1), and vw^m x y^m z ∈ L(G) for every m ≥ 0.

Sketch of Proof  Lemma 8.1 shows that any derivation tree for u has a path of length
at least p + 2 (it has more than 2^p leaf nodes, and we apply the lemma with h = p + 1).
Let us consider a path of maximum length and look at its last p + 2 nodes: the last one
is a leaf node, and the other p + 1 correspond to occurrences of variables, so some
variable A must occur at least twice among them. Taking wxy to be the string derived
from the higher of two such occurrences of A, and x the string derived from the lower
one, produces the decomposition u = vwxyz. Because the grammar is in Chomsky
normal form, |wy| > 0, and because the subtree rooted at the higher occurrence has
height at most p + 2, |wxy| ≤ 2^(p+1). Finally, the portion of the derivation between
the two occurrences of A can be repeated any number of times, or omitted, which
shows that vw^m x y^m z ∈ L(G) for every m ≥ 0.

Just as in the case of the earlier pumping lemma, it is helpful to restate the result
so as to emphasize the essential features.

Figure 8.1  A derivation tree with a repeated variable, giving the decomposition
u = (ab)(b)(ab)(b)(a) = v w x y z

Theorem 8.1a  The Pumping Lemma for Context-Free Languages
Let L be a CFL. Then there is an integer n so that for any u ∈ L with |u| ≥ n, there
are strings v, w, x, y, and z satisfying

u = vwxyz                                    (8.1)
|wxy| ≤ n                                    (8.2)
|wy| > 0                                     (8.3)
for every m ≥ 0, vw^m x y^m z ∈ L            (8.4)

Using the pumping lemma for context-free languages requires the same sorts of
precautions as in the case of the earlier pumping lemma for regular languages. In order
to show that L is not context-free, we assume it is and try to derive a contradiction.
Theorem 8.1a says only that there exists an integer n, nothing about its value; because
we can apply the theorem only to strings u with length ≥ n, the u we choose must be
defined in terms of n. Once we have chosen u, the theorem tells us only that there
exist strings v, w, x, y, and z satisfying the four properties; the only way to guarantee
a contradiction is to show that every choice of v, w, x, y, z satisfying the properties
leads to a contradiction.
According to Chapter 7, a finite-state machine with an auxiliary memory in the
form of a stack is enough to accept a CFL. An example of a language for which a
single stack is sufficient is the language of strings of the form (^i )^i (see Example 7.3),
which could just as easily have been called a^i b^i. The a's are saved on the stack so
that the number of a's can be compared to the number of b's that follow. By the
time all the b's have been matched with a's, the stack is empty, which means that the
machine has forgotten how many there were. This approach, therefore, would not
allow us to recognize strings of the form a^i b^i c^i. Rather than trying to show directly
that no other approach using a single stack can work, we choose this language for our
first proof using the pumping lemma.

EXAMPLE 8.1  The Pumping Lemma Applied to {a^i b^i c^i}

Let

L = {a^i b^i c^i | i ≥ 0}

Suppose for the sake of contradiction that L is context-free, and let n be the integer in Theorem
8.1a. An obvious choice for u with |u| ≥ n is u = a^n b^n c^n. Suppose v, w, x, y, and z are any
strings satisfying conditions (8.1)–(8.4). Since |wxy| ≤ n, the string wxy can contain at most
two distinct types of symbol (a's, b's, and c's), and since |wy| > 0, w and y together contain
at least one. The string vw^2xy^2z contains additional occurrences of the symbols in w and y;

therefore, it cannot contain equal numbers of all three symbols. On the other hand, according
to (8.4), vw^2xy^2z ∈ L. This is a contradiction, and our assumption that L is a CFL cannot be
correct.
Note that to get the contradiction, we started with u ∈ L and showed that vw^2xy^2z fails
to be an element, not only of L but also of the bigger language

L1 = {u ∈ {a, b, c}* | n_a(u) = n_b(u) = n_c(u)}

Therefore, our proof is also a proof that L1 is not a context-free language.

EXAMPLE 8.2  The Pumping Lemma Applied to {ss | s ∈ {a, b}*}

Let

L = {ss | s ∈ {a, b}*}

This language is similar in one obvious respect to the language of even-length palindromes,
which is a context-free language: In both cases, in order to recognize a string in the language,
a machine needs to remember the first half so as to be able to compare it to the second. For
palindromes, the last-in, first-out principle by which a stack operates is obviously appropriate.
In the case of L, however, when we encounter the first symbol in the second half of the string
(even assuming that we know when we encounter it), the symbol we need to compare it to is
the first in, not the last—in other words, it is buried at the bottom of the stack. Here again,
arguing that the obvious approach involving a PDA fails does not prove that a PDA cannot be
made to work. Instead, we apply the pumping lemma.
Suppose L is a CFL, and let n be the integer in Theorem 8.la. This time the choice of
u is not as obvious; we try u = a"b"a"b". Suppose that v, w, x, y, and z are any strings
satisfying (8.1)-(8.4). We must derive a contradiction from these facts, without making any
other assumptions about the five strings.
As in Example 8.1, condition (8.2) tells us that wxy can overlap at most two of the four
contiguous groups of symbols. We consider several cases.
First, suppose w or y contains at least one a from the first group of a's. Since |wxy| ≤ n, neither w nor y can contain any symbols from the second half of u. Consider m = 0 in condition (8.4). Omitting w and y causes at least one initial a to be omitted, and does not affect the second half. In other words,

vw⁰xy⁰z = a^i b^j a^n b^n

where i < n and 1 ≤ j ≤ n. The midpoint of this string is somewhere within the substring a^n, and therefore it is impossible for it to be of the form ss.
Next, suppose wxy contains no a's from the first group but that w or y contains at least one b from the first group of b's. Again we consider m = 0; this time we can say

vw⁰xy⁰z = a^n b^i a^j b^n

where i < n and 1 ≤ j ≤ n. The midpoint is somewhere in the substring b^i a^j, and as before, the string cannot be in L.
We can get by with two more cases: case 3, in which wxy contains no symbols from
the first half of u but at least one a, and case 4, in which w or y contains at least one b from

the second group of b’s. The arguments in these cases are similar to those in cases 2 and 1,
respectively. We leave the details to you.
Just as in Example 8.1, there are other languages for which essentially the same proof works. Two examples in this case are {a^i b^i a^i b^i | i ≥ 0} and {a^i b^j a^i b^j | i, j ≥ 0}. A similar proof shows that {scs | s ∈ {a, b}*} also fails to be context-free. Although the marker in the middle may appear to remove the need for nondeterminism, the basic problem that prevents a PDA from recognizing this language is still present.
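The same brute-force check, reusing pumping_refutes from the sketch after Example 8.1, confirms the case analysis for this language with the exponent m = 0 used in the proof (again with a small stand-in value of n; the names are ours):

    def in_ss(s):
        # membership test for {ss | s in {a, b}*}
        h = len(s) // 2
        return len(s) % 2 == 0 and s[:h] == s[h:]

    n = 3
    u = ("a" * n + "b" * n) * 2          # u = a^n b^n a^n b^n
    assert pumping_refutes(u, n, in_ss, m=0)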

In proofs of this type, there are several potential trouble spots. It may not be obvious what string u to choose. In Example 8.2, although a^n b^n a^n b^n is not the only choice that works, there are many that do not (Exercise 8.2).
Once u is chosen, deciding on cases to consider can be a problem. A straightforward way in Example 8.2 might have been to consider seven cases:

1. wy contains only a's from the first group;
2. wy contains a's from the first group and b's from the first group;
3. wy contains only b's from the first group;
...
7. wy contains only b's from the last group.


There is nothing wrong with this, except that all other things being equal, four cases is better than seven. If you find yourself saying "Cases 6, 8, and 10 are handled exactly like cases 1, 3, and 5," perhaps you should try to reduce the number of cases. (Try to rephrase the proof in Example 8.2 so that there are only two cases.) The important thing in any case is to make sure your cases cover all the possibilities and that you do actually obtain a contradiction in each case.
Finally, for a specific case, you must choose the value of m to use in order to obtain a contradiction. In the first case in Example 8.2, it was not essential to choose m = 0, but the string vw⁰xy⁰z is probably easier to describe exactly than the string vw²xy²z. In some situations, choosing m = 0 will not work but choosing m > 1 will, and in some other situations the opposite is true.

EXAMPLE 8.3  A Third Application of the Pumping Lemma

Let

L = {x ∈ {a, b, c}* | n_a(x) < n_b(x) and n_a(x) < n_c(x)}

The intuitive reason that no PDA can recognize L is similar to the reason in Example 8.1; a stack allows the machine to compare the number of a's to either the number of b's or the number of c's, but not both. Suppose L is a CFL, and let n be the integer in Theorem 8.1a. Let u = a^n b^{n+1} c^{n+1}. If (8.1)-(8.4) hold for strings v, w, x, y, and z, the string wxy can contain at most two distinct symbols. This time, two cases are sufficient.
If w or y contains at least one a, then wy cannot contain any c's. Therefore, vw²xy²z contains at least as many a's as c's and cannot be in L. If neither w nor y contains an a,
If neither w nor y contains an a,

then vw⁰xy⁰z still contains n a's; since wy contains one of the other two symbols, vw⁰xy⁰z contains fewer occurrences of that symbol than u does and therefore is not in L. We have obtained a contradiction and may conclude that L is not context-free.
The language {a^i b^j c^k | i < j and i < k} can be shown to be non-context-free by exactly the same argument.

EXAMPLE 8.4  The Set of Legal C Programs Is Not a CFL
The feature of the C programming language that we considered in Section 5.5, which prevents
the language from being regular, can be taken care of by context-free grammars. In Chapter 6
we saw other examples of the way in which CFGs can describe much of the syntax of such
high-level languages. CFGs cannot do this completely, however: There are some rules in these
languages that depend on context, and Theorem 8.1a allows us to show that the set L of all
legal C programs is not a context-free language. (Very little knowledge of C is required for
this example.)
A basic rule in C is that a variable must be declared before it is used. Checking that this
rule is obeyed is essentially the same as determining whether a certain string has the form xcx,
where x is the identifier and c is the string appearing between the declaration of x and its use.
As we observed in Example 8.2, the language {xcx | x ∈ {a, b}*} is not a CFL. Although we
are now using a larger alphabet, and c is no longer a single symbol, the basic problem is the
same, provided identifiers are allowed to be arbitrarily long. Let us try to use the pumping
lemma to show that L is not a CFL.
Assume that L is a CFL, and let n be the integer in Theorem 8.1a. We want to choose
a string u in L whose length is at least n, containing both a declaration of a variable and a
separate, subsequent reference to the variable. The following will (barely) qualify:

main() {int aa...a;aa...a;}

where both identifiers have n a’s. However, for a technical reason that will be mentioned in
a minute, we complicate the program by including two subsequent references to the variable
instead of one:

main() {int aa...a;aa...a;aa...a;}

Here it is assumed that all three identifiers are a^n. There is one blank in the program, after
int, and it is necessary as a separator. About all that can be said for this program is that it
will make it past the compiler, possibly with a warning. It declares an integer variable, then,
twice, it evaluates the expression consisting of that identifier (the value is probably garbage,
since the program has not initialized the variable), and does nothing with the value.
According to the pumping lemma, u = vwxyz, where (8.2)-(8.4) are satisfied. In particular, vw⁰xy⁰z is supposed to be a valid C program. However, this is impossible. If wy contains the blank or any of the symbols before it, then vxz still contains at least most of the first occurrence of the identifier, and without main() {int and the blank, it cannot be syntactically correct. We are left with the case in which wy is a substring of "a^n;a^n;a^n;}". If wy contains
either the final semicolon or bracket, the string vxz is also illegal. If it contains one of the
two intermediate semicolons, and possibly portions of one or both of the identifiers on either

side, then vxz has two identifiers, which are now not the same. Finally, if wy contains only a
portion of one of the identifiers and nothing else, then vxz still has a variable declaration and
two subsequent expressions consisting of an identifier, but the three identifiers are not all the
same. In either of these last two situations, the declaration-before-use principle is violated. We
conclude that vxz is not a legal C program, and therefore that L is not a context-free language.
(The argument would almost work with the shorter program having only two occurrences
of the identifier, but not quite. The case in which it fails is the case in which wy contains the
first semicolon and nothing else. Deleting it still leaves a valid program, consisting simply of
a declaration of a variable with a longer name. Adding multiple copies is also legal because
they are interpreted as harmless “empty” statements.)
There are other examples of syntax rules whose violation cannot be detected by a PDA.
We noted in Example 8.2 that {a^n b^m a^n b^m} is not a context-free language, and we can imagine
a situation in which being able to recognize a string of this type is essentially what is required.
Suppose that two functions f and g are defined, having n and m formal parameters, respectively,
and then calls on f and g are made. Then the numbers of parameters in the calls must agree
with the numbers in the respective definitions.

In the remainder of this section, we discuss a generalization of the pumping


lemma, a slightly weakened form of what is known as Ogden’s lemma. Although the
pumping lemma provides some information about the strings w and y that are pumped,
in the form of (8.2) and (8.3), it says very little about the location of these substrings
in the string u. Ogden's lemma makes it possible to designate certain positions of u
as “distinguished” and to guarantee that the pumped portions include at least some of
these distinguished positions. As a result, it is sometimes more convenient than the
pumping lemma, and occasionally it can be used when the pumping lemma fails.

Theorem 8.2
Suppose L is a context-free language. Then there is an integer n so that for any u ∈ L of length n or greater, and any choice of n or more positions of u designated as distinguished, there are strings v, w, x, y, and z satisfying

(8.5) u = vwxyz
(8.6) wxy contains n or fewer distinguished positions
(8.7) |wy| > 0
(8.8) wy contains at least one distinguished position
(8.9) for every m ≥ 0, vw^m xy^m z ∈ L

Lemma 8.2  If the binary tree consisting of a node on the path and its descendants has h or fewer branch points, its leaf nodes include no more than 2^h distinguished nodes.
Proof  The proof is by induction and is virtually identical to that of Lemma 8.1, except that the number of branch points is used instead of the height, and the number of distinguished leaf nodes is used instead of the total number of leaf nodes. The reason the statement involves 2^h rather than 2^{h−1} is that the bottom-most branch point has two distinguished descendants rather than one.

Proof of Theorem 8.2  The proof follows that of Theorem 8.1a, with Lemma 8.2 taking the place of Lemma 8.1. In the derivation tree for u, a branch point is a node both of whose subtrees contain distinguished leaf nodes; following a path from the root that always moves toward the side with more distinguished descendants produces enough branch points that some variable must occur twice among the lowest ones, and this repeated variable determines the strings v, w, x, y, and z. Property (8.8) is clearly true because the bottom-most branch point on the path has distinguished descendants on both sides; property (8.6) follows from Lemma 8.2; and the remaining properties follow as in the proof of Theorem 8.1a.

Note that the ordinary pumping lemma is identical to the special case of Theo-
rem 8.2 in which all the positions of u are distinguished.

EXAMPLE 8.5  Using Ogden's Lemma on {a^i b^i c^j | j ≠ i}

Let L = {a^i b^i c^j | i, j ≥ 0 and j ≠ i}. Suppose L is a context-free language, and let n be the integer in the statement of Theorem 8.2. One choice for the string u is

u = a^n b^n c^{n+n!}

(The reason for this choice will be clear shortly.) Let us also designate the first n positions of
u as the distinguished positions, and let us suppose that v, w, x, y, and z satisfy (8.5)-(8.9).
First, we can see that if either w or y contains two distinct symbols, then we can obtain a contradiction by considering the string vw²xy²z, which will no longer have the form a*b*c*. Second, we know that because wy contains at least one distinguished position, w or y consists of a's. It follows from these two observations that unless w consists of a's and y consists of the same number of b's, looking at vw²xy²z will give us a contradiction, because this string has different numbers of a's and b's. Suppose now that w = a^j and y = b^j. Let k = n!/j, which is still an integer, and let m = k + 1. Then the number of a's in vw^m xy^m z is

n + (m − 1)·j = n + k·j = n + n!

which is the same as the number of c’s. We have our contradiction, and therefore L cannot be
context-free.
With just the pumping lemma here, we would be in trouble: There would be no way to
rule out the possibility that w and y contained only c’s, and therefore no way to guarantee a
contradiction.
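The arithmetic that makes the choice u = a^n b^n c^{n+n!} work can be verified directly; the following spot-check (ours, not part of the text) confirms that whatever length j the pumped blocks have, the exponent m = n!/j + 1 produces exactly n + n! a's, matching the number of c's, so that the pumped string fails to be in L.

    from math import factorial

    n = 5
    for j in range(1, n + 1):        # any possible block length 1 <= j <= n
        k = factorial(n) // j        # n!/j is an integer because j <= n
        m = k + 1
        assert n + (m - 1) * j == n + factorial(n)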

EXAMPLE 8.6  Using Ogden's Lemma when the Pumping Lemma Fails

Let L = {a^p b^q c^r d^s | p = 0 or q = r = s}. It seems clear that L should not be a CFL, because {a^i b^i c^i} is not. The first thing we show in this example, however, is that L satisfies the properties of the pumping lemma; therefore, the pumping lemma will not help us to show that L is not context-free.
Suppose n is any positive integer, and u is any string in L with |u| ≥ n, say u = a^p b^q c^r d^s. We must show the existence of strings v, w, x, y, and z satisfying (8.1)-(8.4). We consider two cases. If p = 0, then there are no restrictions on the numbers of b's, c's, or d's, and the choices w = b, v = x = y = Λ work. If p > 0, then we know that q = r = s, and the choices w = a, v = x = y = Λ work.
Now we can use Theorem 8.2 to show that L is indeed not a CFL. Suppose L is a CFL, let n be the integer in the theorem, let u = ab^n c^n d^n, and designate all but the first position of u as distinguished. Suppose that v, w, x, y, and z satisfy (8.5)-(8.9). Then the string wy must contain one of the symbols b, c, or d and cannot contain all three. Therefore, vw²xy²z has one a and does not have equal numbers of b's, c's, and d's, which means that it cannot be in L.

8.2 | INTERSECTIONS AND COMPLEMENTS OF CONTEXT-FREE LANGUAGES

According to Theorem 6.1, the set of context-free languages is closed under the operations of union, concatenation, and Kleene *. For regular languages, we can add

the intersection and complement operations to the list. We can now show, however,
that for context-free languages this is not possible.

EXAMPLE 8.7  A CFL Whose Complement Is Not a CFL

The second part of the proof of Theorem 8.3 is a proof by contradiction and appears to be a
nonconstructive proof. If we examine it more closely, however, we can use it to find an exam-
ple. Let L1 and L2 be the languages defined in the first part of the proof. Then the language L1 ∩ L2 = (L1′ ∪ L2′)′ is not a CFL. Therefore, because the union of CFLs is a CFL, at least one of the three languages L1′, L2′, L1′ ∪ L2′ is a CFL whose complement is not a CFL. Let us try to determine which.
There are two ways a string can fail to be in L1. It can fail to be an element of R, the language {a}*{b}*{c}*, or it can be a string a^i b^j c^k for which i > j. In other words,

L1′ = R′ ∪ {a^i b^j c^k | i > j}

The language R′ is regular because R is, and therefore R′ is context-free. The second language involved in the union can be expressed as the concatenation

{a^i b^j c^k | i > j} = {a^m | m > 0}{a^j b^j | j ≥ 0}{c^k | k ≥ 0}

each factor of which is a CFL. Therefore, L1′ is a CFL. A similar argument shows that L2′ is also a CFL. We conclude that L1′ ∪ L2′, or

R′ ∪ {a^i b^j c^k | i > j or i > k}

is a CFL whose complement is not a CFL. (In fact, the second part alone is also an example; see Exercise 8.8.)

At this point it might be interesting to go back to Theorem 3.4, in which the intersection of two regular languages was shown to be regular, and see what goes wrong when we try to use the same construction for CFLs. We began with FAs M1 and M2 recognizing the two languages, and we constructed a composite machine M whose states were pairs (p, q) of states in M1 and M2, respectively. This allows M to keep track of both machines at once, or to simulate running the two machines in parallel. A string is accepted by M if it is accepted simultaneously by M1 and M2.
Suppose we have CFLs L1 and L2, and PDAs M1 = (Q1, Σ, Γ1, q1, Z1, A1, δ1) and M2 = (Q2, Σ, Γ2, q2, Z2, A2, δ2) accepting them. We can define states of a new machine M in the same way, by letting Q = Q1 × Q2. We might also construct the stack alphabet of M by letting Γ = Γ1 × Γ2, because this would let us use the top stack symbol of M to determine those of M1 and M2. When we try to define the moves of M, however, things become complicated, even aside from the question of nondeterminism. In the simple case where δ1(p, a, X) = {(p′, X′)} and δ2(q, a, Y) = {(q′, Y′)}, where X, X′ ∈ Γ1 and Y, Y′ ∈ Γ2, it is reasonable to let

δ((p, q), a, (X, Y)) = {((p′, q′), (X′, Y′))}

However, what if δ1(p, a, X) = {(p′, X′X)} and δ2(q, a, Y) = {(q′, Λ)}? Or, what if δ1(p, a, X) = {(p′, X)} and δ2(q, a, Y) = {(q′, YYY)}? There is no obvious way to have M keep track of both the states of M1 and M2 and the stacks of M1 and M2 and still operate like a PDA. Theorem 8.3 confirms that such a machine is indeed impossible in general.
We can salvage some positive results from this discussion. If M1 is a PDA and M2 is a PDA with no stack (in particular, a deterministic one—i.e., an FA) there is no obstacle to carrying out this construction. The stack of our new machine is simply the one associated with M1. The result is the following theorem.

Theorem 8.4
If L1 is a context-free language and L2 is a regular language, then L1 ∩ L2 is a context-free language.

Proof
Let M1 = (Q1, Σ, Γ1, q1, Z1, A1, δ1) be a PDA accepting L1, and let M2 = (Q2, Σ, q2, A2, δ2) be an FA recognizing L2. We define the PDA M = (Q1 × Q2, Σ, Γ1, (q1, q2), Z1, A1 × A2, δ), in which the first component of the state follows M1 and the second follows M2: on a Λ-transition of M1 the second component is left unchanged, and on a move that reads an input symbol a the second component changes from q to δ2(q, a). One can then show, by induction on the number of moves and by considering whether the last move in the sequence is a Λ-transition, that for every y ∈ Σ*,

((q1, q2), y, Z1) ⊢*_M ((p, q), Λ, α)

if and only if (q1, y, Z1) ⊢*_{M1} (p, Λ, α) and δ2*(q2, y) = q. Since the accepting states of M are the pairs in A1 × A2, M accepts precisely the strings in L1 ∩ L2.
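The transition construction in the proof is easy to render concretely. The following sketch (our rendering and data layout, not the book's) gives the moves of the product machine: M1's transitions are a dictionary from (state, input symbol or None, stack top) to a set of (new state, string to push), with None playing the role of Λ, and M2's transitions are an ordinary deterministic table.

    def product_moves(delta1, delta2, state_pair, sym, stack_top):
        # moves of the product machine from ((p, q), sym, stack_top);
        # sym is None when M1 makes a Lambda-transition
        p, q = state_pair
        moves = set()
        for new_p, push in delta1.get((p, sym, stack_top), set()):
            # M2 advances only when a real input symbol is consumed
            new_q = q if sym is None else delta2[(q, sym)]
            moves.add(((new_p, new_q), push))
        return moves

The accepting states of the product machine are the pairs in A1 × A2, and its stack behaves exactly as M1's does; this is precisely what is lost when M2 also needs a stack.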

It is also worthwhile, in the light of Theorems 8.3 and 8.4, to re-examine the proof that the complement of a regular language is regular. If the finite automaton M = (Q, Σ, q0, A, δ) recognizes the language L, then the finite automaton M′ = (Q, Σ, q0, Q − A, δ) recognizes Σ* − L. We are free to apply the same construction to the pushdown automaton M = (Q, Σ, Γ, q0, Z0, A, δ) and to consider the PDA M′ = (Q, Σ, Γ, q0, Z0, Q − A, δ). Theorem 8.3 says that even if M accepts the context-free language L, M′ does not necessarily accept Σ* − L. Why not?
The problem is nondeterminism. It may happen that for some x ∈ Σ*,

(q0, x, Z0) ⊢*_M (p, Λ, α)

for some state p ∈ A, and

(q0, x, Z0) ⊢*_M (q, Λ, β)

for some other state q ∉ A. This means that the string x is accepted by M′ as well as by M, since q is an accepting state in M′. In the case of finite automata, nondeterminism can be eliminated: Every NFA-Λ is equivalent to some FA. The corresponding result about PDAs is false (Theorem 7.1), and the set of CFLs is not closed under the complement operation.
We would expect that if M is a deterministic PDA (DPDA) recognizing L, then the machine M′ constructed as above would recognize Σ* − L. Unfortunately, this is still not quite correct. One reason is that there might be input strings that cause M to enter an infinite sequence of Λ-transitions and are never processed to completion. Any such string would be accepted neither by M nor by M′. However, the result

in Exercise 7.46 shows that this difficulty can be resolved and that a DPDA can be
constructed that recognizes Σ* − L. It follows that the complement of a deterministic
context-free language (DCFL) is a DCFL, and in particular that any context-free
language whose complement is not a CFL (Theorem 8.3 and Example 8.7) cannot be
a DCFL.

8.3 | DECISION PROBLEMS INVOLVING CONTEXT-FREE LANGUAGES
At the end of Chapter 5, we considered a number of decision problems involving
regular languages, beginning with the membership problem: Given an FA M and a
string x, does M accept x? We may formulate the same sorts of questions for context-
free languages. For some of the questions, essentially the same algorithms work; for
others, the inherent nondeterminism of PDAs requires us to find new algorithms; and
for some, no algorithm is possible, although proving this requires a more sophisticated
model of computation than we now have.
The basic membership problem for regular languages has a simple solution: To
decide whether an FA M accepts a string x, run M with input x and see what happens.
In the case of PDAs, a solution cannot be this simple, because nondeterminism cannot
always be eliminated. If we think of a PDA M as making a move once a second, and
choosing arbitrarily whenever it has a choice, then “what happens” may be different
from one time to the next. A sequence of moves that ends up with the input x being
rejected does not mean there is no other sequence leading to acceptance. From the
specifications for a PDA, we can very likely formulate some backtracking algorithm
that will be guaranteed to answer the question. Simpler to describe (though probably
inefficient to carry out) is the following approach, which depends on the fact that a
CFG without Λ-productions or unit productions requires at most 2n − 1 steps for the derivation of a string of length n.

Decision algorithm for the membership problem. (Given a pushdown automaton M and a string x, does M accept x?) Use the construction in the proof of Theorem 7.4 to find a CFG G generating the language recognized by M. If x = Λ, use Algorithm FindNull in Section 6.5 to determine whether the start symbol of G is nullable. Otherwise, eliminate Λ-productions and unit productions from G, using Algorithms 6.1 and 6.2. Examine all derivations with one step, all those with two steps, and so on, until either a derivation of x has been found or all derivations of length 2|x| − 1 have been examined. If no derivation of x has been found, M does not accept x.
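The Cocke-Younger-Kasami algorithm mentioned at the end of this section decides the same question in time proportional to |x|³ once the grammar is in Chomsky normal form. Here is a sketch (the dictionary representation of the grammar is our own):

    def cyk(x, term_rules, bin_rules, start):
        # term_rules: terminal a -> set of variables A with A -> a
        # bin_rules: pair (B, C) -> set of variables A with A -> BC
        n = len(x)
        if n == 0:
            return False    # the null string is handled separately, as above
        # table[i][l] = variables deriving the substring of length l+1 at position i
        table = [[set() for _ in range(n)] for _ in range(n)]
        for i, a in enumerate(x):
            table[i][0] = set(term_rules.get(a, set()))
        for length in range(2, n + 1):
            for i in range(n - length + 1):
                for split in range(1, length):
                    for B in table[i][split - 1]:
                        for C in table[i + split][length - split - 1]:
                            table[i][length - 1] |= bin_rules.get((B, C), set())
        return start in table[0][n - 1]

    # a CNF grammar for {a^i b^i | i >= 1}: S -> AB | AT, T -> SB, A -> a, B -> b
    term = {"a": {"A"}, "b": {"B"}}
    bins = {("A", "B"): {"S"}, ("A", "T"): {"S"}, ("S", "B"): {"T"}}
    assert cyk("aabb", term, bins, "S") and not cyk("abb", term, bins, "S")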
Note that since Theorems 7.2 and 7.4 let us go from a CFG to a PDA and vice
versa, we can formulate any of these decision problems in terms of either a CFG or
a PDA. We consider the decision problems corresponding to problems 2 and 3 in
Chapter 5, but stated in terms of CFGs:

1. Given a CFG G, does it generate any strings? (Is L(G) = ∅?)


2. Given a CFG G, is L(G) finite?

Theorem 8.1 provides us with a way of answering both questions. We transform


G into Chomsky normal form. Let G′ be the resulting grammar, p the number of variables in G′, and n = 2^{p+1}. If G′ generates any strings, it must generate one of length less than n. Otherwise, apply the pumping lemma to a string u of minimal length (≥ n) generated by G′; then u = vwxyz, for some strings v, w, x, y, and z with |wy| > 0 and vxz ∈ L(G′)—and this contradicts the minimality of u. Similarly, if L(G′) is infinite, there must be a string u ∈ L(G′) with n ≤ |u| < 2n; the proof
obtain the two decision algorithms that follow.

Decision algorithms for problems 1 and 2. (Given a CFG G, is L(G) = ∅? Is L(G) finite?) First, test whether Λ can be generated from G, using the algorithm for the membership problem. If it can, then L(G) ≠ ∅. In any case, let G′ be a Chomsky-normal-form grammar generating L(G) − {Λ}, and let n = 2^{p+1}, where p is the number of variables in G′. For increasing values of i beginning with 1, test strings of length i for membership in L(G′). If for no i < n is there a string of length i in L(G′), and Λ ∉ L(G), then L(G) = L(G′) = ∅. If for no i with n ≤ i < 2n is there a string of length i in L(G′), then L(G) is finite; if there is a string x ∈ L(G′) with n ≤ |x| < 2n, then L(G) is infinite.
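The emptiness question also has a more direct standard solution than the length-bounded search just described: iteratively mark the variables that generate at least one terminal string, and report that L(G) is empty exactly when the start symbol is never marked. A sketch (ours, with an invented grammar representation):

    def is_empty(productions, variables, start):
        # productions: list of (A, body) pairs, body a string of variables/terminals
        generating, changed = set(), True
        while changed:
            changed = False
            for A, body in productions:
                if A not in generating and all(
                    s in generating or s not in variables for s in body
                ):
                    generating.add(A)
                    changed = True
        return start not in generating

    # S generates strings (via S -> b), while U -> aU can never terminate
    prods = [("S", "AS"), ("S", "b"), ("A", "a"), ("U", "aU")]
    assert not is_empty(prods, {"S", "A", "U"}, "S")
    assert is_empty([("U", "aU")], {"U"}, "U")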
These algorithms are easy to describe, but obviously not so easy to carry out.
Fortunately, for problems such as the membership problem for which it is important
to find practical solutions, there are considerably more efficient algorithms [two well-
known ones are those by Cocke-Younger-Kasami, described in the paper by Younger
(Information and Control 10(2): 189-208, 1967) and Earley (Communications of the
ACM 13(2): 94-102, 1970)]. If we go any farther down the list of decision problems
in Chapter 5, formulated for CFGs or PDAs instead of regular expressions or FAs,
we encounter problems for which there is no possible decision algorithm, even an
extremely inefficient one. To take an example, given two CFGs, are there any strings
generated by both? The algorithm given in Chapter 5 to solve the corresponding
problem for FAs depends on the fact that the set of regular languages is closed under
the operations of intersection and complement. Because the set of CFLs is not closed
under these operations, we know at least that the earlier approach does not give us an
algorithm; we will see in Chapter 11 that the problem is actually “unsolvable.”

EXERCISES
8.1. In each case, show using the pumping lemma that the given language is not a CFL.
a! e=fa
bic ia
b. L = {x ∈ {a, b}* | n_a(x) = n_b(x)²}
CG ES ao aly = 0}
d. L = {x ∈ {a, b, c}* | n_a(x) = max{n_b(x), n_c(x)}}
e. L = {a^n b^m a^n b^{n+m} | m, n ≥ 0}
8.2. In the pumping-lemma proof in Example 8.2, give some examples of choices of strings u ∈ L with |u| ≥ n that would not work.

8.3. In the proof given in Example 8.2 using the pumping lemma, the contradiction was obtained in each case by considering the string vw⁰xy⁰z. Would it have been possible instead to use vw²xy²z in each case?
8.4. In Example 8.4, is it possible with Ogden’s lemma rather than the pumping
lemma to use the string u mentioned first, with only two occurrences of the
identifier?
8.5. Decide in each case whether the given language is a CFL, and prove your
answer.
ae oe {a"
bh? a bein: n>. 0}
b. L = {xayb | x, y ∈ {a, b}* and |x| = |y|}
Ct = ixvex |X G214,.0)"}
d. L = {xyx | x, y ∈ {a, b}* and |x| = 1}
e. L = {x ∈ {a, b}* | n_a(x) < n_b(x) < 2n_a(x)}
f. L = {x ∈ {a, b}* | n_a(x) = 10·n_b(x)}
g. L = the set of non-balanced strings of parentheses
8.6. State and prove theorems that generalize Theorems 5.3 and 5.4 to
context-free languages. Then give an example to illustrate each of the
following possibilities.
a. Theorem 8.1a can be used to show that the language is not a CFL, but the generalization of Theorem 5.3 cannot.
b. The generalization of Theorem 5.3 can be used to show the language is
not a CFL, but the generalization of Theorem 5.4 cannot.
c. The generalization of Theorem 5.4 can be used to show the language is
not a CFL.
8.7. Show that if L is a DCFL and R is regular, then L ∩ R is a DCFL.
8.8. In each case, show that the given language is a CFL but that its complement
is not. (It follows in particular that the given language is not a DCFL.)
a. {a^i b^j c^k | i > j or i > k}
b. {a^i b^j c^k | i ≠ j or i ≠ k}
c. {x ∈ {a, b}* | x is not ww for any w}
8.9. Use Ogden’s lemma to show that the languages below are not CFLs:
faibitkat | k # i}

b. {a^i b^j a^i b^j | j ≠ i}
c. {a^i b^j a^i | j ≠ i}
8.10. a. Show that if L is a CFL and F is finite, L − F is a CFL.
b. Show that if L is not a CFL and F is finite, then L − F is not a CFL.
c. Show that if L is not a CFL and F is finite, then L ∪ F is not a CFL.
8.11. For each part of the previous exercise, say whether the statement is true if
“finite” is replaced by “regular,” and give reasons.
8.12. For each part of Exercise 8.10, say whether the statement is true if “CFL” is
replaced by “DCFL,” and give reasons.

8.13. Give an example of a DPDA M accepting a language L for which the


language accepted by the machine obtained from M by reversing accepting
and nonaccepting states is not L′.

MORE CHALLENGING PROBLEMS


8.14. If L is a CFL, does it follow that rev(L) = {x^r | x ∈ L} is a CFL? Give
either a proof or a counterexample.
8.15. Decide in each case whether the given language is a CFL, and prove your
answer.
a. L = {x ∈ {a, b}* | n_a(x) is a multiple of n_b(x)}
b. Given a CFL L, the set of all prefixes of elements of L
c. Given a CFL L, the set of all suffixes of elements of L
d. Given a CFL L, the set of all substrings of elements of L
e. {x ∈ {a, b}* | |x| is even and the first half of x has more a's than the second}
f. {x ∈ {a, b, c}* | n_a(x), n_b(x), and n_c(x) have a common factor greater than 1}
8.16. Prove the following variation of Theorem 8.1a. If L is a CFL, then there is an integer n so that for any u ∈ L with |u| ≥ n, and any choice of u1, u2, and u3 satisfying u = u1u2u3 and |u2| > n, there are strings v, w, x, y, and z satisfying the following conditions:
(1) u = vwxyz
(2) wy ≠ Λ
(3) Either w or y is a nonempty substring of u2
(4) For every m ≥ 0, vw^m xy^m z ∈ L
Hint: Suppose # is a symbol not appearing in strings in L, and let L1 be the set of all strings that can be formed by inserting two occurrences of # into an element of L. Show that L1 is a CFL, and apply Ogden's lemma to L1. This result is taken from Floyd and Beigel (1994).
8.17. Show that the result in the previous problem can be used in each part of Exercise 8.9.
8.18. Show that the result in Exercise 8.16 can be used in both Examples 8.5 and 8.6 to show that the language is not context-free.
8.19. The class of DCFLs is closed under the operation of complement, as discussed in Section 8.2. Under which of the following other operations is this class of languages closed? Give reasons for your answers.
a. Union
b. Intersection
c. Concatenation
d. Kleene *
e. Difference

8.20. Use Exercise 7.40 and Exercise 8.7 to show that the following languages are
not DCFLs. This technique is used in Floyd and Beigel (1994), where the
language in Exercise 7.40 is referred to as Double-Duty(L).
a. pal, the language of palindromes over {0, 1} (Hint: Consider the regular
language corresponding to 0*1*0*#1*0*.)
b. {x ∈ {a, b}* | n_b(x) = n_a(x) or n_b(x) = 2n_a(x)}
c. {x ∈ {a, b}* | n_b(x) < n_a(x) or n_b(x) = 2n_a(x)}
8.21. (Refer to Exercise 7.40.) Consider the following argument to show that if L ⊆ Σ* is a CFL, then so is {x#y | x ∈ L and xy ∈ L}. (# is assumed to be a symbol not in Σ.)
Let M be a PDA accepting L, with state set Q. We construct a new PDA M1 whose state set contains two states q and q′ for every state q ∈ Q. M1 copies M up to the point where the input symbol # is encountered, but using the primed states rather than the original ones. Once this symbol is seen, if the current state is q′ for some q ∈ A (i.e., if M would have accepted the current string), then the machine switches over to the original states for the rest of the processing. Therefore, it enters an accepting state subsequently if and only if both the substring preceding the # and the entire string read so far, except for the #, would be accepted by M.
Explain why this argument is not correct.
8.22. Show that the result at the beginning of the previous exercise is false. (Find
a CFL L so that {x#y | x ∈ L and xy ∈ L} is not a CFL.)
8.23. Show that if L ⊆ {a}* is a CFL, then L is regular.
8.24. Consider the language L = {x ∈ {a, b}* | n_a(x) = f(n_b(x))}. Exercise 5.52 is to show that L is regular if and only if f is ultimately periodic; in other words, L is regular if and only if there is a positive integer p so that for each r with 0 ≤ r < p, f is eventually constant on the set S_{p,r} = {jp + r | j ≥ 0}. Show that L is a CFL if and only if there is a positive integer p so that for each r with 0 ≤ r < p, f is eventually linear on the set S_{p,r}. "Eventually linear" on S_{p,r} means that there are integers N, c, and d so that for every j ≥ N, f(jp + r) = cj + d. (Suggestion: for the "if" direction, show how to construct a PDA accepting L; for the converse, use the pumping lemma.)
8.25. Let

f(n) = 4n + 7 if n is even, 4n + 13 if n is odd

a. Show that the language L = {x ∈ {a, b}* | n_b(x) = f(n_a(x))} is a DCFL.
b. Show that if 4n + 13 is changed to 5n + 13, L is not a DCFL.
PART 4
Turing Machines and Their Languages

The abstract machines in Parts II and III can be viewed as embodying certain types of algorithms: those in which no more than a fixed amount of information can be remembered, and those in which information developed during the course
can be remembered, and those in which information developed during the course
of the algorithm can be retrieved only in accordance with a last-in-first-out rule. In
this part, we describe an abstract machine called a Turing machine that is widely
accepted as a general model of computation. Although the basic operations of such a
machine are comparable in their simplicity to those of our earlier machines, the new
machines can carry out a wide variety of computations. Besides accepting languages,
they can compute functions and, according to the Church-Turing thesis, carry out any
conceivable algorithmic procedure. In particular, the model can accommodate the
idea of a stored-program computer, so that we can have a single “universal” machine
execute any algorithm by giving it an input string that includes an encoding of the
algorithm.
We discuss a few slightly different formulations of Turing machines, which en-
hance the efficiency and convenience of the basic model but do not enlarge the set
of languages it can recognize or the set of functions it can compute. Then, after for-
malizing the distinction between accepting a language and recognizing a language
(a distinction that is necessary because of the possibility that a Turing machine may
loop forever on some inputs and never actually return an answer), we look more
closely at the languages that Turing machines can accept. We consider other ways
to characterize these languages, including grammars more general than context-free
grammars. Finally, we confront the fact that there are many languages too complex
to be accepted by any Turing machine (and therefore, by the Church-Turing thesis,
too complex to be accepted by any algorithmic procedure).

CHAPTER 9

Turing Machines

9.1 | DEFINITIONS AND EXAMPLES


The two models of computation we have studied so far involve severe restrictions on
either the amount of memory (an FA can remember only its current state) or the way
the memory is accessed (a PDA can access only the top stack symbol). Machines
implementing these models turn out to be significantly less powerful, at least in
principle, than the real computers we are familiar with.
In this chapter we study an abstract machine introduced by the English mathematician Alan Turing (Proceedings of the London Mathematical Society 2: 230-265, 1936) and for that reason now called a Turing machine. Although it may still seem
substantially different from a modern electronic computer (which did not exist when
Turing formulated the model), the differences have more to do with efficiency, and
how the computations are carried out, than with the types of computations possi-
ble. The work of Turing and his contemporaries provided much of the theoretical
foundation for the modern computer.
Turing began by considering a human computer (that is, a human who is solving
some problem algorithmically using a pencil and paper). He decided that without any
loss of generality, the computer could be assumed to operate under these three rules:
First, the only things written on the paper are symbols from some fixed finite set;
second, each step taken by the computer depends only on the symbol he is currently
examining and on his “state of mind” at the time; and third, although his state of mind
might change as a result of symbols he has seen or computations he has made, only
a finite number of distinct states of mind are possible.
Turing then set out to build an abstract machine that obeys these rules and can
duplicate what he took to be the primitive steps carried out by a human computer
during a computation:

1. Examining an individual symbol on the paper;


2. Erasing a symbol or replacing it by another;
3. Transferring attention from one part of the paper to another.


Some of these elements should seem familiar. A Turing machine will have a finite
alphabet of symbols (actually two alphabets, an input alphabet and a possibly larger
alphabet for use during the computation) and a finite set of states, corresponding to
the possible “states of mind” of the human computer. Instead of a sheet of paper,
Turing specified a linear “tape,” which has a left end and is potentially infinite to
the right. The tape is marked off into squares, each of which can hold one symbol
from the alphabet; if a square has no symbol on it, we say that it contains the blank
symbol. For convenience, we may think of the squares as being numbered, left-to-
right, starting with 0, although this numbering is not part of the official model and it
is not necessary to refer to the numbers in describing the operation of the machine.
We think of the reading and writing as being done by a tape head, which at any time
is centered on one square of the tape. In our version of a Turing machine—which is similar although not identical to the one proposed by Turing—a single move is determined by the current state and the current tape symbol, and consists of three parts:

1. Replacing the symbol in the current square by another, possibly different


symbol;
2. Moving the tape head one square to the right or left (except that if it is already
centered on the leftmost square, it cannot be moved to the left), or
leaving it
where it is;
3. Moving from the current state to another, possibly different state.
The tape serves as the input device (the input is simply the string,
assumed to
be finite, of nonblank symbols on the tape originally), the memory
available for use
during the computation, and the output device (the output is the
string of symbols
left on the tape at the end of the computation). The most
significant difference
between the Turing machine and the simpler machines we have studied is that in a Turing machine, processing a string is no longer restricted to a single left-to-right pass through the input. The tape head can move in both directions and erase or modify any symbol it encounters. The machine can examine part of the input, modify it, take time out to execute some computations in a different area of the tape, return to re-examine the input, repeat any of these actions, and perhaps stop the processing before it has looked at all the input.
For similar reasons, we can dispense with one duty previously performed by certain states—that of indicating provisional acceptance of the string read so far. In particular, we can get by with two final, or halting, states, beyond which the computation need not continue: a state ha that indicates acceptance and another state hr that indicates rejection. If the machine is intended simply to accept or reject the input string, then it can move to the appropriate halt state once it has enough information to make a decision. If it is supposed to carry out some other computation, the accepting state indicates that the computation has terminated normally; the state hr can be used to indicate a "crash," arising from some abnormal situation in which the machine cannot carry out its mission as expected. In any case, the computation stops if the Turing machine reaches either of the two halt states. However—and this will turn out to be very important—it is also possible for the computation not to stop, and for the Turing machine to continue making moves forever.

Definition 9.1  Turing Machines
A Turing machine (TM) is a 5-tuple T = (Q, Σ, Γ, q0, δ), where

Q is a finite set of states, not including the two halt states ha and hr;
Σ, the input alphabet, and Γ, the tape alphabet, are finite sets with Σ ⊆ Γ, neither containing the blank symbol Δ;
q0, the initial state, is an element of Q;
δ : Q × (Γ ∪ {Δ}) → (Q ∪ {ha, hr}) × (Γ ∪ {Δ}) × {R, L, S} is a partial function (that is, possibly undefined at certain points).

For elements q ∈ Q, r ∈ Q ∪ {ha, hr}, X, Y ∈ Γ ∪ {Δ}, and D ∈ {R, L, S}, we interpret the formula

δ(q, X) = (r, Y, D)
to mean that when T is in state q and the symbol on the current tape square is X, the machine replaces X by Y on that square, changes to state r, and either moves the tape head one square right, moves it one square left (if the tape head is not already on the leftmost square), or leaves it stationary, depending on whether D is R, L, or S, respectively. When r is either ha or hr in the formula, we say that T halts. Once it has halted, it cannot move further, since δ is not defined at any pair (ha, X) or (hr, X).
Finally, we permit the machine to crash by entering the reject state in case it tries to move the tape head off the left end of the tape. This is a way for the machine to halt that is not reflected by the transition function δ. If the tape head is currently on the leftmost square, the current state and tape symbol are q and a, respectively, and δ(q, a) = (r, b, L), we will say that the machine leaves the tape head where it is, replaces the a by b, and enters the state hr instead of r.
This terminology and these definitions are not completely standard. In our ap-
proach, a TM accepts a string by eventually entering the state ha after it starts with that input. Sometimes acceptance is defined to mean halting (in any halt state), and
the only other way the computation is allowed to terminate is by crashing because
there is no move possible. In either approach, what is significant is that an observer
can see that the TM has stopped its processing and why it has stopped.
Normally a TM begins with an input string x ∈ Σ* near the beginning of its tape and all other tape squares blank. We do not always insist on this, for reasons to be explained in Section 9.3; however, we do always assume that when a TM begins its operation, there are at most a finite number of nonblank symbols on the tape. It follows that at any stage of a TM's computation, this will still be true. To describe the status of a TM at some point, we must specify the current state, the complete contents of the tape (through the rightmost nonblank symbol), and the current position of the tape head. With this in mind, we represent a configuration of the TM by a pair

(q, xay)

where q ∈ Q, x and y are strings over Γ ∪ {Δ} (either or both possibly null), a is a symbol in Γ ∪ {Δ}, and the underlined symbol represents the tape head position.

The notation is interpreted to mean that the string xay appears on the tape, beginning
in square 0, that the tape head is on the square containing a, and that all squares to
the right of y are blank. For a nonnull string w, writing (q, xw) or (q, xwy) will mean that the tape head is positioned at the first symbol of w. If (q, xay) represents a configuration, then y may conceivably end in one or more blanks, and we would also say that (q, xayΔ) represents the same configuration; usually, however, when we write (q, xay) the string y will either be null or have a nonblank last symbol.
Just as in the case of PDAs, we can trace a sequence of moves by showing the
configuration at each step. We write

(q, xay) ⊢_T (r, zbw)

to mean that T passes from the configuration on the left to that on the right in one move, and

(q, xay) ⊢*_T (r, zbw)

to mean that T passes from the first configuration to the second in zero or more moves. For example, if T is currently in the configuration (q, aabaΔa) and δ(q, a) = (r, Δ, L), we would write

(q, aabaΔa) ⊢_T (r, aabΔΔa)

The notations ⊢_T and ⊢*_T are usually shortened to ⊢ and ⊢*, respectively, as long as there is no ambiguity.
Input is provided to a TM by having the input string on the tape initially, beginning in square 1, and positioning the tape head on square 0, which is blank. The initial configuration corresponding to input x is therefore the configuration

(q0, Δx)

Now we can say how a TM accepts a string.

Definition 9.2  Acceptance by a TM
If T = (Q, Σ, Γ, q0, δ) is a TM and x ∈ Σ*, x is accepted by T if, starting from the initial configuration corresponding to input x, T eventually enters the accepting halt state—that is, if

(q0, Δx) ⊢*_T (ha, y)

for some configuration (ha, y).

When a Turing machine processes an input string x, there are three possibilities. The machine can accept the string, by entering the state ha; it can explicitly reject x, by entering the state hr; or it can enter an infinite loop, so that it never halts but continues moving forever. In either of the first two cases, an observer sees the outcome and can tell whether or not the string is accepted. In the third case, however, although the string will not be accepted, the observer will never find this out—there is

no outcome, and he is left in suspense. As undesirable as this may seem, we will find
that it is sometimes inevitable. In the examples in this chapter, we can construct the
machine so that this problem does not arise, and every input string is either accepted
or explicitly rejected.
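The three possible outcomes—accept, reject explicitly, or run forever—are easy to experiment with. The following minimal simulator (ours, not part of the text) represents δ as a dictionary and the halt states as the strings "ha" and "hr"; because a TM may loop forever, it takes a step bound and returns None when no verdict has been reached within the bound.

    BLANK = " "   # stands for the blank symbol

    def run_tm(delta, q0, x, max_steps=10_000):
        tape = dict(enumerate(BLANK + x))  # square 0 blank; input starts in square 1
        pos, state = 0, q0
        for _ in range(max_steps):
            if state in ("ha", "hr"):
                return state, tape, pos
            move = delta.get((state, tape.get(pos, BLANK)))
            if move is None:
                return "hr", tape, pos     # no move defined: treat as a crash
            state, symbol, direction = move
            tape[pos] = symbol             # write first, as in the definition
            if direction == "R":
                pos += 1
            elif direction == "L":
                if pos == 0:
                    return "hr", tape, pos # moving off the left end rejects
                pos -= 1
        return None, tape, pos             # step bound exceeded: no verdict yet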
In most simple examples, it will be helpful once again to draw transition diagrams, similar to but more complicated than those for FAs. The move

δ(q, X) = (r, Y, D)

(where D is R, L, or S) will be represented by an arrow labeled X/Y, D, as in Figure 9.1.

Figure 9.1 | A single Turing machine move.

Our first example should make it clear, if it is not already, that Turing machines are at least as powerful as finite automata.

EXAMPLE 9.1  A TM Accepting {a, b}*{aba}{a, b}*


Consider the language

L = {a, b}*{aba}{a, b}* = {x ∈ {a, b}* | x contains the substring aba}

L is a regular language, and we can draw an FA recognizing L as in Figure 9.2a. It is not


surprising that constructing a Turing machine to accept L is also easy, and that in fact we can
do it so that the transition diagrams look much alike. The TM is illustrated in Figure 9.2b. Its
input and tape alphabets are both {a, b}. The initial state does not really correspond to a state
in the FA, because the TM does not see any input until it moves the tape head past the initial
blank.
Figure 9.2b shows explicitly the transitions to the reject state hr at each point where a blank (the one to the right of the input) is encountered before an occurrence of aba has been found. Figure 9.2c shows a simplified diagram, even more similar to the transition diagram for the FA, in which these transitions are omitted. It is often convenient to simplify a diagram

Figure 9.2 |
An FA and a TM to accept {a, b}*{aba}{a, b}*.

Figure 9.3 |
An FA and a TM to accept {a, b}*{aba}.

this way; whenever we do, the diagram is to be interpreted as moving to the state hr for each combination of state and tape symbol for which no move is shown explicitly. What a TM does with the tape head on a final move of this type is essentially arbitrary, since the computation is now over; we may as well assume in general that the tape head moves to the right, as in Figure 9.2b. (We will talk later about combining two or more TMs, so that a second one picks up where the first one stops. In this case, what the first machine does on its last move is not arbitrary; however, we will allow such a composite TM to carry out a two-phase computation only if the first phase halts normally in the accept state ha.)
Because this language is regular, the TM in Figure 9.2b or Figure 9.2c is able to process input strings the way a finite automaton is forced to, moving the tape head to the right at each step and never changing any tape symbols. Any regular language can be accepted by a TM that mimics an FA this way. As we would expect, this type of processing will not be sufficient to recognize a nonregular language.
Note also that as soon as the TM discovers aba on the tape, it enters the state ha and thereby accepts the entire input string, even though it may not have read all of it. Of course, some TMs must read all the input, even if the languages they accept are regular. For

L1 = {x ∈ {a, b}* | x ends with aba}

for example, an FA and a TM are shown in Figure 9.3. Since the TM moves the tape head to the right on each move, it cannot accept without reading the blank to the right of the last input symbol. As in Figure 9.2c, the transitions to the reject state are not shown explicitly.
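Using the simulator sketched above, one transition table consistent with the description of Figure 9.2b (our reconstruction; the figure itself is not reproduced here, and the state names are ours) is:

    aba_tm = {
        ("q0", BLANK): ("q1", BLANK, "R"),  # step over the leading blank
        ("q1", "a"): ("q2", "a", "R"),      # q2: the last symbol read was a
        ("q1", "b"): ("q1", "b", "R"),      # q1: no progress toward aba yet
        ("q1", BLANK): ("hr", BLANK, "R"),  # input exhausted without seeing aba
        ("q2", "a"): ("q2", "a", "R"),
        ("q2", "b"): ("q3", "b", "R"),      # q3: the last two symbols were ab
        ("q2", BLANK): ("hr", BLANK, "R"),
        ("q3", "a"): ("ha", "a", "R"),      # aba found: accept immediately
        ("q3", "b"): ("q1", "b", "R"),
        ("q3", BLANK): ("hr", BLANK, "R"),
    }
    assert run_tm(aba_tm, "q0", "bbabab")[0] == "ha"
    assert run_tm(aba_tm, "q0", "abbb")[0] == "hr"

The states q1, q2, and q3 play the same roles as the three states of the FA in Figure 9.2a.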

EXAMPLE 9.2  A TM Accepting pal

To see a little more of the power of Turing machines, let us construct a TM to accept the
language pal of palindromes over {a, b}. Later
in this chapter we will introduce the possibility
of nondeterminism in a TM, which would allow
us to build a machine simulating the PDA

in Example 7.2 directly. However, the flexibility of TMs allows us to select any algorithm,
without restricting ourselves to a specific data structure such as a stack. We can easily formulate
a deterministic approach by thinking of how a long string might be checked by hand. You might position your two forefingers at the two ends. As your eyes jump repeatedly back and forth
comparing the two end symbols, your fingers, which are the markers that tell your eyes how
far to go, gradually move toward the center. In order to translate this into a TM algorithm,
we can use blank squares for the markers at each end. Moving the markers toward the center
corresponds to erasing (i.e., changing to blanks) the symbols that have just been tested. The
tape head moves repeatedly back and forth, comparing the symbol at one end of the remaining
nonblank string to the symbol at the other end. The transition diagram is shown in Figure 9.4.
Again the tape alphabet is {a, b}, the same as the input alphabet. The machine takes the top
path each time it finds an a at the beginning and attempts to find a matching a at the end.
If it encounters a b in state q3, so that it is unable to match the a at the beginning, it enters the reject state hr. (As in Figure 9.2c, this transition is not shown.) Similarly, it rejects from state q6 if it is unable to match a b at the beginning.
We trace the moves made by the machine for three different input strings: a nonpalindrome, an even-length palindrome, and an odd-length palindrome.

(q0, Δabaa) ⊢ (q1, Δabaa) ⊢ (q2, ΔΔbaa) ⊢* (q2, ΔΔbaaΔ)
⊢ (q3, ΔΔbaa) ⊢ (q4, ΔΔba) ⊢* (q4, ΔΔba)
⊢ (q1, ΔΔba) ⊢ (q5, ΔΔΔa) ⊢ (q5, ΔΔΔaΔ)
⊢ (q6, ΔΔΔa) ⊢ (hr, ΔΔΔaΔ) (reject)

Figure 9.4 | A TM to accept palindromes over {a, b}. (The two Δ/Δ, R transitions into the accept state correspond to odd-length and even-length palindromes.)

(q0, Δaa) ⊢ (q1, Δaa) ⊢ (q2, ΔΔa) ⊢ (q2, ΔΔaΔ)
⊢ (q3, ΔΔa) ⊢ (q4, ΔΔ) ⊢ (q1, ΔΔΔ)
⊢ (ha, ΔΔΔΔ) (accept)

(q0, Δaba) ⊢ (q1, Δaba) ⊢ (q2, ΔΔba) ⊢* (q2, ΔΔbaΔ)
⊢ (q3, ΔΔba) ⊢ (q4, ΔΔb) ⊢ (q4, ΔΔb)
⊢ (q1, ΔΔb) ⊢ (q5, ΔΔΔΔ) ⊢ (q6, ΔΔΔ)
⊢ (ha, ΔΔΔΔ) (accept)
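A transition table consistent with the traces above (our reconstruction; Figure 9.4 itself is not reproduced here) can be run on the simulator sketched earlier. Combinations left out of the table, such as q3 reading a b, fall through to the reject state, exactly as in the simplified diagrams.

    pal_tm = {
        ("q0", BLANK): ("q1", BLANK, "R"),
        ("q1", BLANK): ("ha", BLANK, "S"),  # nothing left: even-length palindrome
        ("q1", "a"): ("q2", BLANK, "R"),    # erase a leading a, seek one at the end
        ("q1", "b"): ("q5", BLANK, "R"),    # erase a leading b, seek one at the end
        ("q2", "a"): ("q2", "a", "R"), ("q2", "b"): ("q2", "b", "R"),
        ("q2", BLANK): ("q3", BLANK, "L"),
        ("q3", "a"): ("q4", BLANK, "L"),    # matching a found: erase it
        ("q3", BLANK): ("ha", BLANK, "S"),  # the erased a was the middle symbol
        ("q5", "a"): ("q5", "a", "R"), ("q5", "b"): ("q5", "b", "R"),
        ("q5", BLANK): ("q6", BLANK, "L"),
        ("q6", "b"): ("q4", BLANK, "L"),    # matching b found: erase it
        ("q6", BLANK): ("ha", BLANK, "S"),  # the erased b was the middle symbol
        ("q4", "a"): ("q4", "a", "L"), ("q4", "b"): ("q4", "b", "L"),
        ("q4", BLANK): ("q1", BLANK, "R"),  # back at the left end; start over
    }
    for x, verdict in [("abaa", "hr"), ("aa", "ha"), ("aba", "ha")]:
        assert run_tm(pal_tm, "q0", x)[0] == verdict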

EXAMPLE 9.3  A TM Accepting {ss | s ∈ {a, b}*}

For our third example of a Turing machine as a language acceptor, we consider a language that we know from Example 8.2 not to be context-free. Let

L = {ss | s ∈ {a, b}*}

The idea behind the TM will be to separate the processing into two parts: first,
finding the
middle of the string, and making it easier for the TM to distinguish the symbols in
the second
half from those in the first half; second, comparing the two halves. We accomplish the
first task
by working our way in from both ends simultaneously, changing symbols to
their uppercase
versions as we go. This means that our tape alphabet will include A and B in
addition to the
input symbols a and b. Once we arrive at the middle—which will happen
only if the string is
of even length—we may change the symbols in the first half back to their
original form. The
second part of the processing is to start at the beginning again and, for
each lowercase symbol
in the first half, compare it to the corresponding uppercase symbol
in the second. We keep
track of our progress by changing lowercase symbols to uppercase and
erasing the matching
uppercase symbols.
There are two ways that an input string can be rejected. If its length
is odd, the TM will
discover this in the first phase. If the string has even length but
a symbol in the first half fails
to match the corresponding symbol in the second half, the TM
will reject the string during the
second phase.
The TM suggested by this discussion is shown in Figure 9.5. Again we trace it for three strings: two that illustrate both ways the TM can reject the input, and one that is in the language.
and one that is in the language.

(q0, Δaba) ⊢ (q1, Δaba) ⊢ (q2, ΔAba) ⊢* (q2, ΔAbaΔ)
⊢ (q3, ΔAba) ⊢ (q4, ΔAbA) ⊢ (q4, ΔAbA)
⊢ (q1, ΔAbA) ⊢ (q2, ΔABA) ⊢ (q3, ΔABA)
⊢ (hr, ΔABA) (reject)

(q0, Δabaa) ⊢ (q1, Δabaa) ⊢ (q2, ΔAbaa) ⊢* (q2, ΔAbaaΔ)
⊢ (q3, ΔAbaa) ⊢ (q4, ΔAbaA) ⊢* (q4, ΔAbaA)
⊢ (q1, ΔAbaA) ⊢ (q2, ΔABaA) ⊢ (q2, ΔABaA)
⊢ (q3, ΔABaA) ⊢ (q4, ΔABAA) ⊢ (q1, ΔABAA)
⊢ (q5, ΔABAA) ⊢* (q5, ΔabAA)
(first phase completed)

Figure 9.5 | A Turing machine to accept {ss | s ∈ {a, b}*}.

⊢ (q6, ΔabAA) ⊢ (q8, ΔAbAA) ⊢* (q8, ΔAbAA)
⊢ (q9, ΔAbΔA) ⊢* (q9, ΔAbΔA) ⊢ (q6, ΔAbΔA)
⊢ (q7, ΔABΔA) ⊢ (q7, ΔABΔA) ⊢ (hr, ΔABΔAΔ) (reject)

(q0, Δabab) ⊢* (same as the previous case, up to the third-from-last move)
⊢ (q6, ΔABΔB) ⊢ (q7, ΔABΔB) ⊢ (q7, ΔABΔB)
⊢ (q9, ΔABΔΔ) ⊢ (q9, ΔABΔ) ⊢ (q9, ΔABΔ)
⊢ (ha, ΔABΔ) (accept)

9.2 | COMPUTING A PARTIAL FUNCTION WITH A TURING MACHINE
Any computer program whose purpose is to produce a specified output string for
every legal input string can be thought of as computing a function from one set of
strings to another. Similarly, a Turing machine T with input alphabet Σ can compute a function f whose domain is a subset of Σ*. The idea is that for any string x in the domain of f, whenever T starts in the initial configuration corresponding to input x, T will eventually halt with the output string f(x) on the tape.
TMs in Section 9.1, which were used as language acceptors, did their jobs simply
by halting in the accepting state or failing to do so, depending on whether the input
string was in the language being accepted. The contents of the tape at the end of
the computation were not important. In the case of a TM computing a function f,
the emphasis is on the output produced for an input string in the domain of f. We
might say that for an input string not in the domain, the result of the computation
is irrelevant. However, we would like the TM to compute precisely the function f,
not some other function with a larger domain. Therefore, we will also specify
that
for an input string x not in the domain of f, the TM should not accept the input x.
It follows that in the process of computing the function, the TM also incidentally
accepts a language: the domain of the function.
It will be helpful in subsequent chapters to shift emphasis just slightly, and to talk about partial functions on Σ*, rather than functions on subsets of Σ*. This is largely a matter of convenience, and there is no real difference except in some of the terminology. A partial function f on Σ* may be undefined at certain points (the points not in the domain of f); if it happens that f is defined everywhere on Σ*, we often emphasize the fact by referring to f as a total function. In order for a Turing machine to compute f, it is appropriate for the values of f to be strings over the tape alphabet of the machine.
A TM can handle a function of several variables as well. If the input is to represent the k-tuple (x1, x2, ..., xk) ∈ (Σ*)^k, the only change required is to relax slightly the rule for the input to a TM, and to allow the initial tape to contain all k strings, separated by blanks.

Definition 9.3  A TM Computing a Function
Let T = (Q, Σ, Γ, q0, δ) be a Turing machine, and let f be a partial function on Σ* with values in Γ*. We say T computes f if for every x ∈ Σ* at which f is defined,

(q0, Δx) ⊢*_T (ha, Δf(x))

and no other x ∈ Σ* is accepted by T.
If f is a partial function on (Σ*)^k with values in Γ*, T computes f if for every k-tuple (x1, x2, ..., xk) at which f is defined,

(q0, Δx1Δx2Δ···Δxk) ⊢*_T (ha, Δf(x1, x2, ..., xk))

and no other input is accepted by T.

It is still not quite correct to say that a TM computes only one function. One reason is that two functions can look exactly alike except for having officially different codomains (see Section 1.3). Another reason is that a TM might be viewed as computing either a function of one variable or a function of more than one. For example, if T computes the function f : (Σ*)² → Γ*, then T also computes f1 : Σ* → Γ* defined by f1(x) = f(x, Λ). We can say, however, that for any specified k, and any C ⊆ Γ*, a given TM computes at most one function of k variables having codomain C.
Numerical functions of numerical arguments can also be computed by Turing machines, once we choose a way of representing the numbers by strings. We will restrict ourselves to natural numbers (nonnegative integers), and we generally use the "unary" representation, in which the integer n is represented by the string 1^n = 11...1.

Reversing a String EXAMPLE 9.4


We consider the reverse function

rev : {a, b}* → {a, b}*

The TM we construct in Figure 9.6 to compute the function will reverse the input string "in
place" by moving from the ends toward the middle, at each step swapping a symbol in the first
half with the matching one in the second half. In order to keep track of the progress made so far,
symbols will also be changed to uppercase. A pass that starts in state q1 with a lowercase symbol
on the left changes it to the corresponding uppercase symbol and remembers it (by going to
q2 in the case of an a and q4 in the case of a b) as it moves the tape head to the right. When the
TM arrives at q3 or q5, if there is a lowercase symbol on the right corresponding to the one on
the left, the TM sees it, remembers it (by going to either q6 or q7), and changes it to the symbol
it remembers from the left. The tape head is moved back to the left, and the first uppercase
symbol that is encountered is changed to the (uppercase) symbol that had been on the right.

Figure 9.6 |
Reversing a string.

For even-length strings, the last swap will return the TM to q1, and at that point the absence of
any more lowercase symbols sends the machine to q8 and the final phase of processing. In the
case of an odd-length string, the last pass will not be completed normally, because the machine
will discover at either q3 or q5 that there is no lowercase symbol to swap with the one on the
left (which therefore turns out to have been the middle symbol of the string). When the swaps
have been completed, all that remains is to move the tape head to the end of the string and
make one final pass back to the left, changing all the uppercase symbols back to lowercase.

We trace the TM in Figure 9.6 for the odd-length string abb and the even-length string
baba.

(q0, Δabb) ⊢ (q1, Δabb) ⊢ (q2, ΔAbb) ⊢ (q2, ΔAbb)
⊢ (q2, ΔAbbΔ) ⊢ (q3, ΔAbb) ⊢ (q7, ΔAbA)
⊢ (q7, ΔAbA) ⊢ (q1, ΔBbA) ⊢ (q4, ΔBBA)
⊢ (q5, ΔBBA) ⊢ (q5, ΔBBA) ⊢ (q8, ΔBBAΔ)
⊢ (q9, ΔBBA) ⊢ (q9, ΔBBa) ⊢ (q9, ΔBba)
⊢ (q9, Δbba) ⊢ (h_a, Δbba)
(q0, Δbaba) ⊢ (q1, Δbaba) ⊢ (q4, ΔBaba) ⊢ (q4, ΔBaba)
⊢ (q4, ΔBaba) ⊢ (q4, ΔBabaΔ) ⊢ (q5, ΔBaba)
⊢ (q6, ΔBabB) ⊢ (q6, ΔBabB) ⊢ (q6, ΔBabB)
⊢ (q1, ΔAabB) ⊢ (q2, ΔAAbB) ⊢ (q2, ΔAAbB)
⊢ (q3, ΔAAbB) ⊢ (q7, ΔAAAB) ⊢ (q1, ΔABAB)
⊢ (q8, ΔABAB) ⊢ (q8, ΔABABΔ) ⊢ (q9, ΔABAB)
⊢* (q9, Δabab) ⊢ (h_a, Δabab)

Figure 9.7 |
A Turing machine to compute n mod 2.

n mod 2 EXAMPLE 9.5
The numerical function that assigns to each natural number n the remainder when n is divided
by 2 can be computed by moving to the end of the input string, making a pass from right to
left in which the 1’s are counted and simultaneously erased, and either leaving a single 1 (if
the original number was odd) or leaving nothing. The TM that performs this computation is
shown in Figure 9.7.
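
The bookkeeping in examples like this one is easy to mirror in ordinary code. Below is a minimal sketch (in Python, with names of our own choosing) of a one-tape TM simulator together with one plausible transition table for the algorithm just described; the table follows the verbal description above and is not necessarily identical to the machine of Figure 9.7.

    # A minimal one-tape TM simulator. The tape is a dict from square
    # number to symbol, square 0 starts out blank, and the machine halts
    # in 'ha' (accept) or 'hr' (reject); a missing move means rejection.
    BLANK = ' '

    def run_tm(delta, tape_input, start='q0', max_steps=10_000):
        tape = dict(enumerate(BLANK + tape_input))
        state, head = start, 0
        while state not in ('ha', 'hr') and max_steps > 0:
            max_steps -= 1
            key = (state, tape.get(head, BLANK))
            if key not in delta:
                state = 'hr'
                break
            state, tape[head], move = delta[key]
            head += {'L': -1, 'R': 1, 'S': 0}[move]
        output = ''.join(tape.get(i, BLANK) for i in range(max(tape) + 1))
        return state, output

    # One plausible table for n mod 2 in unary (not necessarily the
    # machine of Figure 9.7): walk right past the 1's, then sweep back
    # left, erasing 1's and toggling parity; leave one 1 iff n is odd.
    MOD2 = {
        ('q0', BLANK):   ('q1', BLANK, 'R'),   # step off square 0
        ('q1', '1'):     ('q1', '1', 'R'),     # find the right end
        ('q1', BLANK):   ('even', BLANK, 'L'),
        ('even', '1'):   ('odd', BLANK, 'L'),  # erase a 1, toggle parity
        ('odd', '1'):    ('even', BLANK, 'L'),
        ('even', BLANK): ('ha', BLANK, 'S'),   # n even: empty output
        ('odd', BLANK):  ('q2', BLANK, 'R'),   # n odd: go write one 1
        ('q2', BLANK):   ('ha', '1', 'S'),     # output 1 in square 1
    }

    assert run_tm(MOD2, '1' * 7)[1].strip() == '1'   # 7 mod 2 = 1
    assert run_tm(MOD2, '1' * 4)[1].strip() == ''    # 4 mod 2 = 0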

The Characteristic Function of a Set EXAMPLE 9.6


For any language L ⊆ Σ*, the characteristic function of L is the function χ_L : Σ* → {0, 1}
defined by the formula

χ_L(x) = 1 if x ∈ L
χ_L(x) = 0 otherwise

Computing the function χ_L is therefore similar in one respect to accepting the language L (see
Section 9.1); instead of distinguishing between strings in L and strings not in L by accepting or
not accepting, the TM accepts every input, and distinguishes between the two types of strings
by ending up in the configuration (h_a, Δ1) in one case and the configuration (h_a, Δ0) in the
other.

If we have a TM T computing χ_L, we can easily obtain one that accepts L. All we have to
do is modify T so that when it leaves output 0, it enters the state h_r instead of h_a. Sometimes it
is possible to go the other way; a simple example is the language L of palindromes over {a, b}
(Example 9.2). A TM accepting L is shown in Figure 9.4, and a TM computing χ_L is shown
in Figure 9.8. It is obtained from the previous one by identifying the places in the transition diagram
where the TM might reject, and modifying the TM so that instead of entering the state h_r in
those situations, it continues in a way that ends up in state h_a with output 0. For any language
L accepted by a TM T that halts on every input string, another TM can be constructed from T
that computes χ_L, although the construction may be more complicated than in this example.
A TM of either type effectively allows an observer to decide whether a given input string is in
L; a "no" answer is produced in one case by the input being rejected and in the other case by
output 0.

As we saw in Section 9.1, however, a TM can accept a language L and still leave the
question of whether x ∈ L unanswered for some strings x, by looping forever on those inputs.

Figure 9.8 |
Computing χ_L for the set of palindromes.

(If we could somehow see that the TM was in an infinite loop, we would have the answer;
but if we were depending on the TM to tell us, we would wait forever.) In this case, a TM
computing the function χ_L would be better, because it would guarantee an answer for every
input. Unfortunately, it is no longer clear that such a machine can be obtained from T. We
will return to this question in Chapter 10.
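
In code the relationship between the two kinds of machines is equally direct. A small sketch, with pal as the language and illustrative function names of our own: a procedure computing χ_L answers every input, and the modification described above, treating output 0 as rejection, turns it into an acceptor.

    # chi computes the characteristic function of the palindrome
    # language: it produces an answer, 1 or 0, for every input string.
    def chi(x: str) -> int:
        return 1 if x == x[::-1] else 0

    # The conversion described in the text: a machine computing chi_L
    # becomes an acceptor for L once output 0 is treated as rejection.
    def accepts(x: str) -> bool:
        return chi(x) == 1

    assert accepts('abba') and not accepts('ab')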

9.3 | COMBINING TURING MACHINES


One of the purposes of this chapter is to suggest the power and generality of Turing
machines. As you can tell from the examples so far, much of the work that goes on
during a TM computation consists of routine, repetitive tasks such as moving the tape
head from one side of a string to the other or erasing a portion of the tape. If we were
required to describe every TM monolithically, showing all the low-level details, we
would quickly reach a practical limit on the complexity of the problems we could
solve. The natural way to construct a complicated TM (or any other complicated
algorithm or piece of software) is to build it from simpler, reusable components.

In the simplest case, we can construct a composite Turing machine by executing
first one TM and then another. If T1 and T2 are TMs, with disjoint sets of nonhalting
states and transition functions δ1 and δ2, respectively, we write T1T2 to denote this
composite TM. The set of states is the union of the two sets. T1T2 begins in the initial

state of T1 and executes the moves of T1 (using the function δ1) up to the point where
T1 would halt; for any move that would cause T1 to halt in the accepting state, T1T2
executes the same move except that it moves instead to the initial state of T2. At this
point the tape head is positioned at the square on which T1 halted. From this point
on, the moves of T1T2 are the moves of T2 (using the function δ2). If either T1 or T2
would reject during this process, T1T2 does also, and T1T2 accepts precisely if and
when T2 accepts.
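
The construction of T1T2 amounts to routine bookkeeping on the transition functions, which the following sketch makes concrete. The representation is our own convention, not notation from the text: a TM is a pair (initial state, dictionary of moves), with 'ha' and 'hr' as the halting states.

    # A sketch of the composite machine T1T2. delta maps (state, symbol)
    # to (new_state, written_symbol, direction). Nonhalting states of
    # the two machines are tagged 1 or 2 to keep the state sets disjoint.
    def compose(tm1, tm2):
        (q1, d1), (q2, d2) = tm1, tm2

        def tag(t, q):
            return q if q in ('ha', 'hr') else (t, q)

        delta = {}
        for (p, a), (q, b, move) in d1.items():
            # A move of T1 into the accepting state is redirected to
            # the initial state of T2; a rejecting move still rejects.
            target = tag(2, q2) if q == 'ha' else tag(1, q)
            delta[(tag(1, p), a)] = (target, b, move)
        for (p, a), (q, b, move) in d2.items():
            delta[(tag(2, p), a)] = (tag(2, q), b, move)
        return tag(1, q1), delta

A machine built this way can be run by any simulator that follows the same conventions, such as the run_tm sketch in Section 9.2 above.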
In order to use this composite machine in a larger context, in a manner similar to
a transition diagram but without showing the states explicitly, we might also write

T1 → T2

We can also make the composition conditional, depending on the current tape
symbol when T1 halts. We might write

T1 →a T2

(an arrow labeled with the tape symbol a) to stand for the composite machine T1T′T2,
where T′ is described by the diagram in Figure 9.9. This composite machine can be
described informally as follows: It executes the TM T1 (rejecting if T1 rejects, and
looping if T1 loops); if and when T1 accepts, it executes T2 if the current tape symbol
is a and rejects otherwise.

Figure 9.9 |
It is easier to understand at this point why we said in Section 9.1 that a TM is not
always assumed to start with the tape head on the leftmost square of the tape. When
a TM built to carry out some specific task is used as a component of a larger TM, it is
likely to be called in the middle of a computation, when the tape head is at the spot on
the tape where that task needs to be performed. It may be that the TM's only purpose
is to change in some other specific way the tape contents and/or head position so as
to create a beginning configuration appropriate for the component that follows.

Some TMs that would only halt in the rejecting state when viewed as self-contained
machines (e.g., the TM that executes the algorithm "move the tape head
one square to the left") can be used successfully in combination with others. On the
other hand, if a TM halts normally when run independently, then it will halt normally
when it is used as a component of a larger machine, provided that the tape has been
prepared properly before its use. For example, a TM T expecting to find an input
string z needs to begin in a configuration of the form (q, yΔz). As long as T halts
normally when processing input z in the ordinary way, the processing of z in this way
does not depend on y. (The reason is that if T halts normally when started in the
configuration (q, Δz), then in particular T will never attempt to move its tape head
to the left of the blank; therefore, starting in the configuration (q, yΔz), T will never
actually see any of the symbols of y.) The correct execution of T does, however,
depend on the tape being blank to the right of z, unless more is known about the space
required for the computation involving z.
In order to be able to describe composite TMs without having to describe every
primitive operation as a separate TM, it is sometimes useful to use a mixed notation in
which some but not all of the states of a TM are shown. For example, the diagram in
Figure 9.10a, which is an abbreviated version of the one in Figure 9.10, has a fairly
obvious meaning. If the current tape symbol is a, execute the TM T; if it is b, halt in
the accepting state; and if it is anything else, reject. In the first case, assuming T halts
normally, repeat the execution of T until T halts normally scanning some symbol
other than a; if at that point the tape symbol is b, halt normally, otherwise reject.
(The machine might also reject during one of the iterations of T; and it might loop
forever, either because one of the iterations of T does or because T halts normally
with current tape symbol a every time.)

Figure 9.10 |
Although giving a completely precise definition of an arbitrary combination of
TMs would be complicated, it is usually clear in specific examples what is involved.
There is one possible source of confusion, however, in the notation we are adopting.
Consider a TM T of the form T1 →a T2. If T1 halts normally scanning some symbol
not specified explicitly (i.e., other than a), T rejects. However, if T2 halts normally,
T does also—even though no tape symbols are specified explicitly. We could avoid
this seeming inconsistency by saying that if T1 halts normally scanning a symbol
other than a, T halts normally, except that T would then not be equivalent to the
composition T1T′T2 described above, and this seems undesirable. In our notation, if
at the end of one sub-TM's operation at least one way is specified for the composite
TM to continue, then any option that allows accepting at that point must be shown
explicitly, as in Figures 9.11a and 9.11b. (The second figure is a shortened form of
the first.)

Some basic TM building blocks, such as moving the head a specified number of
positions in one direction or the other, writing a specific symbol in the current square,
and searching to one direction or the other for a specified symbol, are straightforward
and do not need to be spelled out. We consider a few slightly more involved operations.


Figure 9.11 |


Figure 9.12 |
A Turing machine to copy strings.

Copying a String EXAMPLE 9.7


Let us construct a TM that creates a copy of its input string, to the right of the input but with
a blank separating the copy from the original. We must be careful to specify the final position
of the tape head as well; let us say that if the initial configuration is (q0, Δx), where x is a
string of nonblank symbols, then the final configuration should be (h_a, ΔxΔx). The TM will
examine the first symbol, copy it in the right place, examine the second, copy it, and so on. It
will keep track of its progress by changing the symbols it has copied to uppercase. We assume
for simplicity that the input alphabet is {a, b}; all that is needed in a more general situation is
a modified ("uppercase") version of each symbol in the input alphabet. When the copying is
complete, the uppercase symbols will be changed back to the original. The TM is shown in
Figure 9.12.
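
The same bookkeeping can be sketched in ordinary code (the function name is ours); this follows the mark-copy-unmark strategy just described rather than reproducing the individual moves of Figure 9.12.

    # Each symbol is uppercased as it is copied past the separating
    # blank, and the marks are removed once the copying is complete.
    def copy_string(x: str) -> str:
        tape = list(x) + [' '] + [' '] * len(x)   # room for the copy
        n = len(x)
        for i in range(n):
            sym = tape[i]
            tape[i] = sym.upper()        # mark the symbol just copied
            tape[n + 1 + i] = sym        # write it into the copy region
        for i in range(n):
            tape[i] = tape[i].lower()    # restore the original
        return ''.join(tape)

    assert copy_string('abba') == 'abba abba'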

Deleting a Symbol EXAMPLE 9.8


It is often useful to delete a symbol from a string. A Turing machine does this by changing the
tape contents from yaz to yz, where y ∈ (Σ ∪ {Δ})*, a ∈ Σ ∪ {Δ}, and z ∈ Σ*. (Remember
that yz means that the tape head is positioned on the first symbol of z, or on a blank if z is null.)
Again we assume that the input alphabet is {a, b}. The TM starts by replacing the symbol to be
deleted by a blank, so that it can be located easily later. It moves to the right end of the string
z and makes a single pass from right to left, moving symbols one square to the left as it goes,
until it hits the blank. The transition diagram is shown in Figure 9.13. The states labeled q_a
and q_b are what allow the machine to remember a symbol between the time it erases it and the
time it writes it in the next square to the left. Of course, before it writes each symbol, it reads
the symbol being written over, which determines the state it should go to next.


Figure 9.13 |
A Turing machine to delete a symbol.

Inserting a symbol a, or changing the tape contents from yz to yaz, would be done virtually
the same way, except that the single pass would go from left to right, and the move that starts
it off would write a instead of Δ. You are asked to complete this machine in Exercise 9.13.
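
The single pass is easy to render in code. The sketch below mirrors the remember-and-shift idea of Figure 9.13, with None standing in for the blank; the function name and representation are ours.

    # The symbol at position i is deleted by one right-to-left pass in
    # which each symbol is picked up and written one square to the
    # left, a blank entering at the right end.
    def delete_symbol(tape: str, i: int) -> str:
        cells = list(tape)
        carried = None              # the blank that fills the last square
        for j in range(len(cells) - 1, i - 1, -1):
            cells[j], carried = carried, cells[j]
        return ''.join(c for c in cells if c is not None)

    assert delete_symbol('abcde', 2) == 'abde'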

The Delete machine transforms yaz to yz. What if it is called when the tape
contents are not yaz, but yazΔw, where w is some arbitrary string of symbols?
At first it might seem that the TM ought to be designed so as to finish with yzΔw
on the tape. A closer look, however, shows that this is unreasonable. Unless we
know something about the computations that have gone on before, or unless the
rightmost nonblank symbol has been marked somehow so that we can recognize it,
we have no way of finding it. The instructions "move the tape head to the square
containing the rightmost nonblank symbol" cannot ordinarily be executed by a TM
(Exercise 9.12). In general, a Turing machine is designed according to specifications,
which say that if it starts in a certain configuration it should halt normally in some
other specified configuration. The specifications may say nothing about what the
result should be if the TM starts in some different configuration. The machine may
satisfy its specifications and yet behave unpredictably, perhaps halting in h_r or looping
forever, in these abnormal situations.

Another Way of Accepting pal EXAMPLE 9.9


Suppose that Copy is the TM in Figure 9.12 and Reverse is the one in Figure 9.6. Let Equal be a
TM that works as follows: When started with tape ΔxΔy, where x, y ∈ {a, b}*, Equal accepts
if and only if x = y. Then the composite TM shown in Figure 9.14 accepts the language of
palindromes over {a, b} by comparing the input string to its reverse and accepting if and only
if the two are equal.

Copy → Reverse → Equal

Figure 9.14 |
Another way of accepting pal.

9.4 | VARIATIONS OF TURING MACHINES: MULTITAPE TMs
The version of Turing machines introduced in Section 9.1 is not the only possible one.
There are a number of variations in which slightly different conventions are followed
with regard to starting configuration, permissible moves, protocols followed to accept
strings, and so forth. In addition, the basic TM model can be enhanced in several
natural ways. In this section we mention a few of the variations and investigate one
enhanced version, the multitape TM, in some detail.
A more user-friendly TM, such as one with additional tapes, can make it easier
to describe the implementation of an algorithm: The discussion can highlight the
individual data items stored on the various tapes, without getting bogged down in
the bookkeeping techniques that would be necessary if all the data were stored on
and retrieved from a single tape. Thus it will often be useful to have these enhanced
versions available in subsequent discussions. However, it will turn out that in spite
of the extra convenience, there is no change in the ultimate computing power. Seeing
the details needed to show this may help you to appreciate the power of a Turing
machine. Finally, the discussion in this section will provide a useful example of how
one type of computing machine can simulate another.
In order to compare two classes of abstract machines with regard to computing
power, we must start by saying what we mean by this term. At this point, we are not
considering speed, efficiency, or convenience; we are concerned only with whether
the two types of machines can solve the same problems and get the same answers. A
Turing machine of any type gives an “answer,” first by accepting or failing to accept,
and second by producing a particular output when it halts in the accepting state. This
means that if we want to show that machines of type B are at least as powerful as
those of type A, we need to show that for any machine T_A of type A, there is a machine
T_B of type B that accepts exactly the same input strings as T_A and produces exactly
the same output as T_A, whenever it halts in the accepting state.
First we mention briefly a few minor variations on the basic model, each of
them slightly more restrictive. One possibility is to require that in each move the
tape head move either to the right or to the left. In this version, the values of the
transition function δ are elements of (Q ∪ {h_a, h_r}) × (Γ ∪ {Δ}) × {L, R} instead of
(Q ∪ {h_a, h_r}) × (Γ ∪ {Δ}) × {L, R, S}. A second possibility is to say that a move can
include writing a tape symbol or moving the tape head but not both. In this case δ
would take values in (Q ∪ {h_a, h_r}) × (Γ ∪ {Δ} ∪ {L, R}), where L and R are assumed

not to be elements of Γ. In both cases it is easy to see that the restrictions do not
reduce the power of the machine. You are referred to the exercises for the details.
One identifiable difference between a Turing machine and a typical human com-
puter is that a TM has a one-dimensional tape with a left end, rather than sheets of
paper that might be laid out in both directions. One way to try to increase the power of
the machine, therefore, might be to remove one or both of these restrictions: to make
the “tape” two-dimensional, or to remove the left end and make the tape potentially
infinite in both directions. In either case we would start by specifying the rules under
which the machine would operate, and the conventions that would be followed with
regard to input and output. Again the conclusion is that the power of the machine
is not significantly changed by either addition, and again we leave the details to the
exercises.
Rather than modifying the tape, we might instead add extra tapes. We could
decide in that case whether to have a single tape head, which would be positioned at
the same square on all the tapes, or a head on each tape that could move independently
of the others. We choose the second option. An n-tape Turing machine will be
specifiable as before by a 5-tuple T = (Q, Σ, Γ, q0, δ). It will make a move on the
basis of its current state and the n-tuple of tape symbols currently being examined;
since the tape heads move independently, we describe the transition function as a
partial function

δ : Q × (Γ ∪ {Δ})ⁿ → (Q ∪ {h_a, h_r}) × (Γ ∪ {Δ})ⁿ × {R, L, S}ⁿ
The notion of configuration generalizes in a straightforward way: A configuration of
an n-tape TM is specified by an (n + 1)-tuple of the form

(q, x1a1y1, x2a2y2, ..., xnanyn)

with the same restrictions as before on the strings xi and yi. We take the initial
configuration corresponding to input string x to be

(q0, Δx, Δ, ..., Δ)
In other words, the first tape is the one used for the input. We will also say that the
output of an n-tape TM is the final contents of tape 1. Tapes 2 through n are used
for auxiliary working space, and when the TM halts their contents are ignored. In
particular, such a TM computes a function f if, whenever it begins with an input string
x in (or representing an element in) the domain of f, it halts in some configuration
(h_a, Δf(x), ...), where the contents of tapes 2 through n are arbitrary, and otherwise
it fails to accept.
It is obvious that for any n ≥ 2, n-tape TMs are at least as powerful as ordinary
1-tape TMs. To simulate an ordinary TM, a TM with n tapes simply acts as if tape 1
were its only one and leaves the others blank. We now show the converse.

Theorem 9.1
Let T1 = (Q1, Σ, Γ1, q1, δ1) be a two-tape TM. Then there is a one-tape TM
T2 = (Q2, Σ, Γ2, q2, δ2) satisfying the following two conditions: T2 accepts
exactly the same strings as T1, and whenever T1 halts in the accepting state,
T2 also halts in the accepting state with the same output.

At this point the actual simulation starts. T2 simulates each move

δ1(p, a1, a2) = (q, b1, b2, D1, D2)

of T1, where a1 and a2 are the symbols in the current squares of the respective
tapes. Because T2 has only one tape head, it must determine which move it
is to make at the next step by locating the primed symbols on the two tracks
of its tape; it then carries out the move by making the appropriate changes
to both its tracks, including the creation of new primed symbols to reflect
any changes in the positions of the tape heads of T1.

If the current square contains #, reject, since T1 would have tried to
move the tape head off tape 2. If not, and if the new square does not contain a
pair of symbols (because D2 = R and T2 has not previously examined positions
this far to the right), convert the symbol a there to the pair (a, Δ′); if the new
square does contain a pair, say (a, b), convert it to (a, b′). Move the tape head
back to the beginning.

Locate the pair (a′, c) again, as in step 1. Convert it to (b1, c) and move
the tape head in the direction D1 indicates.
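
The two-track representation at the heart of this proof is simple to make explicit. In the sketch below (our own rendering: a list of pairs, a trailing prime marking each head position, and # guarding square 0), both tapes and both head positions of T1 are recoverable from a single tape, which is all the simulation requires.

    # Pack two tapes, with their head positions, onto one tape of pairs.
    def two_track(tape1, head1, tape2, head2):
        width = max(len(tape1), len(tape2), head1 + 1, head2 + 1)

        def cell(tape, head, i):
            sym = tape[i] if i < len(tape) else ' '      # blank padding
            return sym + ("'" if i == head else '')      # prime = head here

        return ['#'] + [(cell(tape1, head1, i), cell(tape2, head2, i))
                        for i in range(width)]

    # Track 1 holds abc with its head on b; track 2 holds 10, head on 1.
    print(two_track('abc', 1, '10', 0))
    # ['#', ('a', "1'"), ("b'", '0'), ('c', ' ')]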

Corollary 9.1. Any language that is accepted by an n-tape TM can be accepted by
an ordinary TM, and any function that is computed by an n-tape TM can be computed
by an ordinary TM.

Proof. The proof is immediate from Theorem 9.1.

9.5 | NONDETERMINISTIC TURING MACHINES
Nondeterminism plays different roles in the two simpler models of computation we
studied earlier. It is convenient but not essential in the case of FAs, whereas the
language pal is an example of a context-free language that cannot be accepted by
any deterministic PDA. Turing machines have enough computing power that once
again nondeterminism fails to add any more. Any language that can be accepted by a
nondeterministic TM can be accepted by an ordinary one. The argument we present
to show this involves a simulation more complex than that in the previous section.
Nevertheless, the idea of the proof is straightforward, and the fact that the details get
complicated can be taken as evidence that TMs capable of implementing complex
algorithms can be constructed from the same kinds of routine operations we have
used previously.
A nondeterministic Turing machine (NTM) T = (Q, Σ, Γ, q0, δ) is defined exactly
the same way as an ordinary TM, except that values of the transition function δ
are subsets, rather than single elements, of the set (Q ∪ {h_a, h_r}) × (Γ ∪ {Δ}) × {R, L, S}.
We do not need to say that δ is a partial function, because now δ(q, a) is allowed to
take the value ∅.

The notation for a TM configuration is also unchanged. To say that

(p, xay) ⊢_T (q, wbz)

now means that beginning in the first configuration, there is at least one move that
will produce the second. Similarly,

(p, xay) ⊢*_T (q, wbz)

means that there is at least one sequence of zero or more moves that takes T from the
first configuration to the second. With this definition, we may still say that a string
x ∈ Σ* is accepted by T if for some a ∈ Γ ∪ {Δ} and some y, z ∈ (Γ ∪ {Δ})*,

(q0, Δx) ⊢*_T (h_a, yaz)


The idea of output will not be as useful in the nondeterministic case, because for a
given NTM there could conceivably be an infinite set of possible outputs. NTMs that
produce output, such as those in Exercise 9.29, will be used primarily as components
of larger machines. When we compare NTMs to ordinary TMs, we will restrict
ourselves to machines used as language acceptors.
Because every TM can be interpreted as a nondeterministic TM, it is obvious
that a language accepted by a TM can be accepted by an NTM. The converse is what
we need to show.

Theorem 9.2
For every nondeterministic TM T1 = (Q1, Σ, Γ1, q1, δ1), there is an ordinary
(deterministic) TM T2 accepting the same language: T2 accepts input x if and
only if there is some sequence of moves of T1 on input x that would cause it
to accept.

Proof The strategy for constructing T2 is to have it try, one after another, all
the possible sequences of moves of T1 on input x, accepting if it ever finds a
sequence that would cause T1 to accept. Let us assume, for the sake of
simplicity, that at each point T1 has at most two moves to choose from; the
proof we present in this case generalizes easily. We will use the tree shown in
Figure 9.15 to represent the sequences of moves T1 might make on input x.

Nodes in the tree represent configurations of T1. The root is the initial config-
uration corresponding to input x, and the children of any node N correspond
to the configurations T1 can reach in one step from N. The convention we have
adopted implies that every node has at most two children, and a leaf node
represents a halting configuration.

Figure 9.15 |
The computation tree for a nondeterministic TM

We can therefore think of T2's job as searching the tree for accepting
configurations. Because the tree might be infinite, a breadth-first approach is
appropriate: T2 will try all possible single moves, then all possible sequences
of two moves, then all possible sequences of three moves, and so on. The
machine we actually construct will be inefficient in that every sequence of
n + 1 moves will involve repeating a sequence of n moves tried previously.
Even if the tree is finite (which means that for some n, every possible se-
quence of n moves T1 can make on input x leads to a halt), T2 will still loop
forever if T1 never accepts: It will attempt to try longer and longer sequences
of moves, and the effect will be that it ends up repeating the same sequences
of moves, in the same order, over and over. However, if x ∈ L(T1), then for
some n there is a sequence of n moves that causes T1 to accept x, and
T2 will eventually get around to trying that sequence.

We will take advantage of Theorem 9.1 by giving T2 three tapes. The
first is used only to save the original input string, and its contents are never
changed. The second is used to keep track of the sequence of moves of T1
that T2 is currently attempting to execute. The third is the "working tape,"
corresponding to T1's tape, where T2 actually carries out the steps specified
by the current string on tape 2. Every time T2 begins trying a new sequence,
the third tape is erased and the input from tape 1 re-copied onto it.

A particular sequence of moves will be represented by a string of bi-
nary digits. The string 001, for example, represents the following sequence:
first, the move representing the first (i.e., 0th) of the two choices from the
initial configuration C0, which takes T1 to some configuration C1; next,
the first possible move from the configuration C1, which leads to some
configuration C2; next, the second possible move from C2.


Figure 9.16 |
Simulating a nondeterministic TM by a three-tape TM.

Because moves 0 and 1 may be the same, there may be several strings of digits that describe
the same sequences of moves. There may also be strings of digits that do not
correspond to sequences of moves, because the first few moves cause T1 to
halt. When T2 encounters a digit that does not correspond to an executable
move, it will abandon the string.
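
What happens to a single digit string can be sketched as follows, assuming the nondeterministic machine is presented as a function choices(config) that lists the configurations reachable in one move (this functional presentation is an assumption of the sketch, not a construction from the text).

    # Follow one string of binary digits through the computation tree.
    def try_sequence(choices, start_config, digits):
        config = start_config
        for d in digits:                 # each digit picks one branch
            moves = choices(config)
            if int(d) >= len(moves):
                return None              # inexecutable digit: abandon it
            config = moves[int(d)]
        return config

A driver then calls try_sequence on successive digit strings in the canonical ordering described next, accepting as soon as some returned configuration is an accepting one.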
We will use the canonical ordering of {0, 1}*, in which the strings are
arranged in the order

0, 1, 00, 01, 10, 11, 000, 001, ..., 111, 0000, ...

(Among strings of the same length, the ordering is numerical.) Given a string of digits
representing a sequence of moves, T2 generates the next string in this ordering
by interpreting the string as the binary representation of an integer and adding 1;
if the addition carries out of the leftmost digit, the next string is the string of
0's one symbol longer.
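
In code, the NextSequence operation is just a binary increment with one extra case (the function name is ours):

    # The successor of a digit string in canonical order.
    def next_sequence(s: str) -> str:
        digits = list(s)
        for i in range(len(digits) - 1, -1, -1):
            if digits[i] == '0':
                digits[i] = '1'
                return ''.join(digits)
            digits[i] = '0'          # a 1 becomes 0; the carry moves left
        return '0' * (len(s) + 1)    # all 1's: lengthen the string

    seq = '0'
    for expected in ['1', '00', '01', '10', '11', '000']:
        seq = next_sequence(seq)
        assert seq == expected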

Figure 9.16 shows the general structure of T2, which is composed
of five smaller TMs called InitializeTapes2&3, CopyInput, Execute,
EraseTape3, and NextSequence. InitializeTapes2&3 writes the symbol 0 in
square 1 of tape 2, the first sequence of moves to be tried. CopyInput copies
the original input string x from tape 1 onto tape 3, so that tape 3 has contents
#Δx. Execute (which we will discuss in more detail shortly) is the TM that
actually simulates the sequence of moves currently specified on tape 2. Its
crucial feature is that it finishes with a symbol s in the current square of tape 3,
and s = * if and only if the sequence of moves causes T1 to accept. In that
case T2 accepts, and otherwise it continues. EraseTape3 erases tape 3; the
length of the string on tape 2 limits how far to the right the rightmost nonblank
symbol can be. Finally, NextSequence is the component already described,
which updates the string of digits on tape 2 using the operation above.
Figure 9.17 gives a TM carrying out this operation on the second tape only,
ignoring tapes 1 and 3.




Figure 9.17 |
The one-tape version of NextSequence.


Figure 9.18 |
A typical small portion of Execute.

A Simple Example of Nondeterminism EXAMPLE 9.10


Consider the TM Double that works as follows. Using the Copy TM from Example 9.7,
modified for a one-symbol input alphabet, it makes a copy of the numerical input. It then
deletes the blank separating the original input from the copy and returns the tape head to square
0. Just as the name suggests, it doubles the value of the input.

Figure 9.19 |
An NTM to accept strings of length 2^i.

Now look at the nondeterministic TM T in Figure 9.19. T moves past the input string,
places a single 1 on the tape, separated from the input by a blank, and after positioning the tape
head on the blank, executes Double zero or more times before returning the tape head to square
0. Finally, it executes the TM Equal, which works as follows: Starting with tape contents
ΔxΔy, where x and y are strings of 1's, Equal accepts if and only if x = y (see Example 9.9).
The nondeterminism in T lies in the indeterminate number of times Double is executed.
When Equal is finally executed, the string following the input on the tape represents some
arbitrary power of 2. If the original input string happens to represent a power of 2, say 2^i, then
there is a sequence of choices T can make that will cause the input to be accepted—namely,
the sequence in which Double is executed exactly i times. On the other hand, if the input string
is not a power of 2, it will fail to be accepted, because in the last step it is compared to a string
that must be a power of 2. Our conclusion is that T accepts the language {1^n | n = 2^i for some i ≥ 0}.
We do not need nondeterminism in order to accept this language. (Nondeterminism is
never necessary, by Theorem 9.2.) It merely simplifies the description. One deterministic
way to test whether an integer is a power of 2 is to test the integer first to see if it is 1 and, if
not, perform a sequence of divisions by 2. If at any step before reaching 1 we get a nonzero
remainder, we answer no. If we finally obtain the quotient 1 without any of the divisions
producing a remainder, we answer yes. We would normally say, however, that multiplying
by 2 is easier than dividing by 2. An easier approach, therefore, might be to start with 1 and
perform a sequence of multiplications by 2. We could compare the result of each of these to the
original input, accepting if we eventually obtained a number equal to it, and either rejecting if
we eventually obtained a number larger than the input, or simply letting the iterations continue
forever.
The nondeterministic solution T in Figure 9.19 is closer to the second approach, except
that instead of comparing the input to each of the numbers 2^i, it guesses a value of i and tests
that value only. Removing the nondeterminism means replacing the guess by an iteration in
which all the values are tested; a deterministic TM that did this would simply be a more efficient
version of the TM constructed in the proof of Theorem 9.2, which tests all possible sequences
of moves of T.
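
The deterministic doubling procedure fits in a few lines of ordinary code; this sketch rejects as soon as the running value overshoots the input rather than letting the iterations continue forever.

    # Start from 1 and repeatedly double, then compare.
    def is_power_of_two(n: int) -> bool:
        if n < 1:
            return False
        value = 1
        while value < n:
            value *= 2             # the role played by Double in T
        return value == n          # the comparison made by Equal

    assert [n for n in range(1, 20) if is_power_of_two(n)] == [1, 2, 4, 8, 16]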

9.6 | UNIVERSAL TURING MACHINES


In our discussions so far, a Turing machine is created to execute a specific algorithm.
If we have a Turing machine for computing one function, then computing a different

function or doing some other calculation requires a different machine. Originally,


electronic computers were limited in a similar way, and changing the calculation to
be performed meant rewiring the machine.
A 1936 paper by Turing, however, anticipated the stored-program computers you
are familiar with. Although a modern computer is still “hard-wired,” it is completely
flexible in that the task it performs is to execute the instructions stored in its memory,
and these can represent any conceivable algorithm. Turing describes a "universal
computing machine" that works as follows. It is a TM T_u whose input consists
essentially of a program and a data set for the program to process. The program takes
the form of a string specifying some other (special-purpose) TM T1, and the data
set is a second string z interpreted as input to T1. T_u then simulates the processing
of z by T1. In this section we will describe one such universal Turing machine.
The first step is to formulate a notational system in which we can encode both
an arbitrary TM T1 and an input string z over an arbitrary alphabet as strings e(T1)
and e(z) over some fixed alphabet. The crucial aspect of the encoding is that it must
not destroy any information; given the strings e(T1) and e(z), we must be able to
reconstruct the Turing machine T1 and the string z. We will use the alphabet {0, 1},
although we must remember that the TM we are encoding may have a much larger
alphabet. We start by assigning positive integers to each state, each tape symbol, and
each of the three "directions" S, L, and R in the TM T1 we want to encode.
At this point, a slight technical problem arises. We want the encoding function
e to be one-to-one, so that a string of 0's and 1's encodes at most one TM. Consider
two TMs T1 and T2 that are identical except that the tape symbols of T1 are a and
b and those of T2 are a and c. If we really want to call these two TMs different,
then in order to guarantee that their encodings are different, we must make sure that
the integers assigned to b and c are different. To accommodate any TM and still
ensure that the encoding is one-to-one, we must somehow fix it so that no symbol in
any TM's alphabet receives the same number as any other symbol in any other TM's
alphabet. The easiest way to handle this problem is to fix once and for all the set of
symbols that can be used by TMs, and to number these symbols at the outset. This is
the reason for the following convention.

Convention. We assume from this point on that there are two fixed infinite sets
𝒬 = {q1, q2, ...} and 𝒮 = {a1, a2, ...} so that for any Turing machine T =
(Q, Σ, Γ, q0, δ), we have Q ⊆ 𝒬 and Γ ⊆ 𝒮.

It should be clear that this assumption about states is not a restriction at all,
because the names assigned to the states of a TM are irrelevant. Furthermore, as
long as S contains all the letters, digits, and other symbols we might want in our
input alphabets, the other assumption is equally harmless (no more restrictive, for
example, than limiting the character set on a computer to 256 characters). Once we
have a subscript attached to every possible state and tape symbol, we can represent a
state or a symbol by a string of 0’s of the appropriate length; 1’s are used as separators.

Definition 9.5 The Encoding Function e

First we associate to each tape symbol (including Δ), to each state (including
h_a and h_r), and to each of the three directions, a string of 0's. Let

s(Δ) = 0
s(a_i) = 0^(i+1) (for each a_i ∈ 𝒮)
s(h_a) = 0
s(h_r) = 00
s(q_i) = 0^(i+2) (for each q_i ∈ 𝒬)
s(S) = 0
s(L) = 00
s(R) = 000

Each move m of a TM, described by the formula

δ(p, a) = (q, b, D)

is encoded by the string

e(m) = s(p)1s(a)1s(q)1s(b)1s(D)1

and any TM T with initial state q is encoded by the string

e(T) = s(q)1e(m1)1e(m2)1···e(mn)1

where m1, m2, ..., mn are the distinct moves of T, arranged in some arbitrary
order. Finally, any string z = z1z2···zk, where each z_i ∈ 𝒮, is encoded by

e(z) = 1s(z1)1s(z2)1···s(zk)1

The 1 at the beginning of the string e(z) is included so that in a composite string
of the form e(T)e(z), there will be no doubt as to where e(T) stops. Notice that one
consequence is that the encoding s(a) of a single symbol a ∈ 𝒮 is different from the
encoding e(a) of the one-character string a.

Because the moves of a TM T can appear in the string e(T) in any order, there
will in general be many correct encodings of T. However, any string of 0's and 1's
can be the encoding of at most one TM.
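
Definition 9.5 is easy to transcribe into code, which gives a mechanical way of checking encodings. In the sketch below, states and tape symbols are passed around as their index numbers under the convention above; the helper names are ours, and the two assertions reproduce strings from Example 9.11, which follows.

    # s(q_i) = 0^(i+2); s(a_i) = 0^(i+1); fixed codes for the rest.
    def s_state(i): return '0' * (i + 2)
    def s_sym(i):   return '0' * (i + 1)
    S_BLANK, S_HA, S_HR = '0', '0', '00'          # s(Delta), s(ha), s(hr)
    S_DIR = {'S': '0', 'L': '00', 'R': '000'}

    def e_move(p, a, q, b, d):
        # e(m) = s(p)1 s(a)1 s(q)1 s(b)1 s(D)1
        return '1'.join([p, a, q, b, S_DIR[d]]) + '1'

    def e_tm(initial, moves):
        # e(T) = s(q0)1 e(m1)1 e(m2)1 ... e(mn)1
        return initial + '1' + ''.join(e_move(*m) + '1' for m in moves)

    def e_string(codes):
        # e(z) = 1 s(z1)1 s(z2)1 ... s(zk)1
        return '1' + ''.join(c + '1' for c in codes)

    # delta(q0, Delta) = (p, Delta, R), with q0 and p numbered 1 and 2:
    assert e_move(s_state(1), S_BLANK, s_state(2), S_BLANK, 'R') \
           == '00010100001010001'
    # The input string baa, with a and b numbered 1 and 2:
    assert e_string([s_sym(2), s_sym(1), s_sym(1)]) == '10001001001'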

The Encoding of a Simple TM EXAMPLE 9.11


Consider the TM illustrated in Figure 9.20, which transforms an input string of a's and b's by
changing the leftmost a, if there is one, to b. Let us assume for simplicity that the tape symbols
a and b are assigned the numbers 1 and 2, so that s(a) = 00 and s(b) = 000, and that the states
q0, p, and r are given the numbers 1, 2, and 3, respectively. If we take the six moves in the
order they appear, left-to-right, the first move δ(q0, Δ) = (p, Δ, R) is encoded by the string

0³10¹10⁴10¹10³1 = 00010100001010001




Figure 9.20 |

and the entire TM by the string

0001 000101000010100011 00001000100001000100011 0000100100000100010011


0000101000001010011 000001000100000100010011 000001010101011

Remember that the first part of the string, in this case 0001, is to identify the initial state of the
TM. The individual moves in the remainder of the string are separated by spaces for readability.

The input to the universal TM T_u will consist of a string of the form e(T)e(z),
where T is a TM and z is a string over T's input alphabet. In Example 9.11, if the
input string to T were baa, the corresponding input string to T_u would consist of the
string e(T) given in the example, followed by 10001001001. On any input string of
the form e(T)e(z), we want T_u to accept if and only if T accepts input z, and in this
case we want the output from T_u to be the encoded form of the output produced by T
on input z.

Now we are ready to construct T_u. It will be convenient to give it three tapes.
According to our convention for multitape TMs, the first tape will be both the input
and output tape. It will initially contain the input string e(T)e(z). The second tape
will be the working tape during the simulation of T, and the third tape will contain
the encoded form of the state T is currently in.

The first step of T_u is to move the string e(z) (except for the initial 1) from the
end of tape 1 to tape 2, beginning in square 3. Since T begins with its leftmost square
blank, T_u will write 01 (because 0 = s(Δ)) in squares 1 and 2 of tape 2; square 0
is left blank, and the tape head is positioned on square 1. The next step for T_u is to
copy the encoded form of T's initial state from the beginning of tape 1 onto tape 3,
beginning in square 1, and to delete it from tape 1.
After these initial steps, T_u is ready to begin simulating the action of T (encoded
on tape 1) on the input string (encoded on tape 2). As the simulation starts, the three
tape heads are all in square 1. The next move of T at any point is determined by T's
state (encoded on tape 3) and the current symbol on T's tape, whose encoding starts
in the current position on tape 2. In order to simulate this move, T_u must search tape
1 for the 5-tuple whose first two parts match this state-input combination. Abstractly
this is a straightforward pattern-matching operation, and a TM that carries it out is
shown in Figure 9.21. Since the search operation never changes any tape symbols,
we have simplified the labeling slightly, by writing

(a, b, c), (D1, D2, D3)




Figure 9.21 |
Finding the right move on tape 1.

what would normally be written as

(a, b, c)/(a, b, c), (D1, D2, D3)


Once the appropriate 5-tuple is found, the last three parts tell T_u how to simulate
the move. To illustrate, suppose that before the search, T_u's three tapes look like this:

Δ00010100001010001100001001000001000100110001...
Δ010010010001Δ...
Δ0000Δ...

The corresponding tape of T would be

ΔaabΔ...

assuming that the symbols numbered 1 and 2 are a and b, respectively, and T would
be in state 2 (the one with encoding 0000). After the search of tape 1, the tapes look
like this:

Δ00010100001010001100001001000001000100110001...
Δ010010010001Δ...
Δ0000Δ...

The 5-tuple on tape 1 specifies that T's current symbol should be changed to b,
the head should be moved left, and the state should be changed to state 3. These
operations can be carried out by T_u in a fairly straightforward way, and we omit the
details. The final result is

Δ00010100001010001100001001000001000100110001...
Δ0100100010001Δ...
Δ00000Δ...

and T_u is now ready to simulate the next move.
There are several ways this process might stop. T might halt abnormally, either
because there is no move specified or because the move calls for it to move its tape
head off the tape. In the first case, the search operation pictured in Figure 9.21 also
halts abnormally (although the move to h_r is not shown explicitly), because after the
last 5-tuple on tape 1 has been tried unsuccessfully, the second of the 1's at the end
takes the machine back to the initial state, and there is no move from that state with 1
on tape 1. We can easily arrange for T_u to reject in the second case as well. Finally,
T may accept. T_u detects this when it processes a 5-tuple on tape 1 whose third part
is a single 0. In this case, after T_u has changed tape 2 appropriately, it erases tape 1,
copies tape 2 onto tape 1, and accepts.
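
The whole scheme, decoding e(T)e(z) and then stepping through the moves of T, can also be sketched directly in code. The sketch below is not a transcription of T_u itself (a Python dictionary replaces the three tapes, and a step bound stands in for the possibility of looping), but the strings it decodes follow Definition 9.5 exactly; the hardcoded input is the TM of Example 9.11 run on baa.

    # decode() recovers the initial state, the move table, and the input
    # (all as lengths of the 0-runs); simulate() then runs the machine.
    def decode(encoded):
        toks = encoded.split('1')            # runs of 0's between the 1's
        init, i, delta = len(toks[0]), 1, {}
        while toks[i] != '':                 # an empty run ends the moves
            p, a, q, b, d = toks[i:i + 5]
            assert toks[i + 5] == ''         # each move ends with an extra 1
            delta[(len(p), len(a))] = (len(q), len(b),
                                       [None, 'S', 'L', 'R'][len(d)])
            i += 6
        z = [len(t) for t in toks[i + 1:-1]] # symbol numbers of the input
        return init, delta, z

    def simulate(encoded, max_steps=100_000):
        state, delta, z = decode(encoded)
        tape = {i + 1: c for i, c in enumerate(z)}   # square 0 is blank
        head = 0
        for _ in range(max_steps):
            if state == 1:                   # len(s(ha)) == 1: accept
                return 'accept', tape
            if state == 2:                   # len(s(hr)) == 2: reject
                return 'reject', tape
            key = (state, tape.get(head, 1))          # 1 encodes the blank
            if key not in delta or (delta[key][2] == 'L' and head == 0):
                return 'reject', tape        # no move, or head falls off
            state, tape[head], d = delta[key]
            head += {'S': 0, 'L': -1, 'R': 1}[d]
        return 'no answer', tape

    # e(T) for the TM of Example 9.11, followed by e(baa).
    enc = ('0001' '000101000010100011' '00001000100001000100011'
           '0000100100000100010011' '0000101000001010011'
           '000001000100000100010011' '000001010101011' '10001001001')
    verdict, tape = simulate(enc)
    assert verdict == 'accept'
    # Symbols 2 and 3 are a and b: the leftmost a of baa has become b.
    assert [tape[i] for i in (1, 2, 3)] == [3, 3, 2]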

9.7 | MODELS OF COMPUTATION AND THE CHURCH-TURING THESIS
A Turing machine is a model of computation more general than either a finite au-
tomaton or a pushdown automaton, and in this chapter we have seen examples of
computations that are feasible on a TM but not on the simpler machines. A TM is
not the only possible way of extending a PDA, and we might examine some other
approaches briefly.
Our first example of a non-CFL (Example 8.1) was the language L =
{a^n b^n c^n | n ≥ 1}. The pumping lemma for CFLs tells us that a finite automaton
with a single stack is not sufficient to recognize strings of this form. One stack is
sufficient to accept {a^n b^n | n ≥ 1}; it is not hard to see that two stacks are suffi-
cient for L. Of course, if {a^n b^n c^n d^n | n ≥ 1} turned out to require three stacks, and
{a^n b^n c^n d^n e^n | n ≥ 1} four, then this observation would not be useful. The interesting
thing is that two are enough to handle all these languages (Exercise 9.49).

In Example 8.2, we considered L = {ss | s ∈ {a, b}*}. If we ignore the apparent
need for nondeterminism by changing the language to {scs | s ∈ {a, b}*}, then we
might consider trying to accept this language using a finite automaton with a queue
instead of a stack. We could load the first half of the string on the queue, and the "first
in, first out" operation of the queue would allow us to compare the first and second
halves from left to right.

Although it is not at all obvious from these two simple examples, both approaches
lead to families of abstract machines with the same computing power as Turing ma-
chines. With appropriate conventions regarding input and output, both these models
can be considered as general models of computation and have been studied from this
point of view. They are investigated in more detail in Exercises 9.49–9.53.

Still, a Turing machine seems like a more natural approach to a general-purpose


computer, perhaps because of Turing's attempt to incorporate into TM moves the
primitive steps carried out by a human computer. Even a few examples, such as the TM
accepting {ss | s ∈ {a, b}*} in Example 9.3 or the one computing the reverse function
in Example 9.4, are enough to suggest that TMs have the basic features required
to carry out algorithms of arbitrary complexity. The point is not that recognizing
strings of the form ss is a particularly complex calculation, but that even algorithms
of much greater logical complexity depend ultimately on the same sorts of routine
manipulations that appear in these two examples. Designing algorithms to solve
problems can of course be difficult; implementing an algorithm on a TM is primarily
a matter of organizing the data storage areas, and choosing bookkeeping mechanisms
for keeping track of the progress of the algorithm. A simple model of computation
such as an FA puts severe restrictions on the type of algorithm that can be executed. A
TM allows one to design an algorithm without reference to the machine and to have
confidence that the result can be implemented.
To say that the Turing machine is a general model of computation is simply
to say that any algorithmic procedure that can be carried out at all (by a human, a
team of humans, or a computer) can be carried out by a TM. This statement was
first formulated by Alonzo Church, a logician, in the 1930s (American Journal of
Mathematics 58:345–363, 1936), and it is usually referred to as Church's thesis, or
the Church-Turing thesis. It is not a mathematically precise statement because we do
not have a precise definition of the term algorithmic procedure, and therefore it is not
something we can prove. Since the invention of the TM, however, enough evidence
has accumulated to cause the Church-Turing thesis to be generally accepted. Here is
an informal summary of some of the evidence.

1. The nature of the model makes it seem likely that all the steps that are crucial to
human computation can be carried out by a TM. Of course, there are
differences in the details of how they are carried out. A human normally works
with a two-dimensional sheet of paper, not a one-dimensional tape, and a
human is perhaps not restricted to transferring his or her attention to the
location immediately adjacent to the current one. However, although working
within the constraints imposed by a TM might make certain steps in a
computation awkward, it does not appear to limit the types of computation that
are possible. For example, if the two-dimensional aspect of the paper really
plays a significant role in a computation, the TM tape can be organized so as to
simulate two dimensions. This may mean that two locations contiguous on a
sheet of paper are not contiguous on the tape; the only consequence is that the
TM may require more moves to do what a human could do in one.
2. Various enhancements of the TM model have been suggested in order to make
the operation more like that of a human computer, or more convenient, or more
efficient. These include the enhancements mentioned in this chapter, such as
doubly infinite tapes, multiple tapes, and nondeterminism. In each case, it is
possible to show that the computing power of the machine is unchanged.
3. Other theoretical models of computation have been proposed. These include
machines such as those mentioned earlier in this section, machines that are

closer to modern computers in their operation, and various notational systems


(simple programming-type languages, grammars, and others) that can be used
to describe computations. Again, in every case, the model has been shown to be
equivalent to the Turing machine.
4. Since the introduction of the TM, no one has suggested any type of
computation that ought to be included in the category of “algorithmic
procedure” and cannot be implemented on a TM.
As we observed earlier, the Church-Turing thesis is not a statement for which a
precise proof is possible, because of the imprecision in the term “algorithmic pro-
cedure.” Once we adopt the thesis, however, we are effectively giving a precise
meaning to the term: An algorithm is a procedure that can be executed on a TM.
The advantage of having such a definition is that it provides a starting point for a
discussion of problems that can be solved algorithmically and problems (if any) that
cannot. This discussion begins in Chapter 10 and continues in Chapter 11.
Another way in which the Church-Turing thesis will be used in the rest of the
book is that when we want to describe a solution to a problem, we will often be
satisfied with a verbal description of the algorithm; translating it into a detailed TM
implementation may be tedious but is generally straightforward.

EXERCISES
9.1. Trace the TM in Figure 9.5 (the one accepting the language
{ss | s ∈ {a, b}*}) on the string aaba. Show the configuration at each step.
9.2. Below is a transition table for a TM.

q          δ(q, a)          δ(q, b)          δ(q, Δ)


do 7541 @i,7A; R) Be hae BR) Nees wa Genae)
Gif is (Gis@, Rtas A. (@4,@,R) 1.96 Be) Gerd, BR)
PtP ogi bsR). | gan) au» (a, R)\| ge VAs agi, by)
Gin Ae Gr tA Eel Gay Dy NGanB,BR), | Gan i Gi G7, ab)
Gyvniaos \(g3, 4, RB). \ Gas oie (@yn@elaeely Gras Oe Garb A)
D-~ (qs, A,B) A g6,,0;R) A. (Gp, A,L)

a. What is the final configuration if the input is ab?


b. What is the final configuration if the input is baa?
c. Describe what the TM does for an arbitrary input string in {a, b}*.
9.3. Let T = (Q, Σ, Γ, q0, δ) be a TM, and let s and t be the sizes of the sets Q
and Γ, respectively. How many distinct configurations of T could there
possibly be in which all tape squares past square n are blank and T’s tape
head is on or to the left of square n? (The tape squares are numbered
beginning with 0.)
9.4. The TM shown in Figure 9.2b (obtained from the FA in Figure 9.2a) accepts
a string as soon as it finds the substring aba. Draw another TM accepting the

same language that is more similar to the FA in that it accepts a string only
after it has read all the symbols of the string.
9.5. Figures 9.2 and 9.3 show two examples of converting an FA to a TM
accepting the same language. Describe precisely how this can be done for an
arbitrary FA.
9.6. Draw a transition diagram for a Turing machine accepting each of the
following languages.
a. {a^i b^j | i < j}
b. {a^n b^m c^n | n > 0}
c. {x ∈ {a, b, c}* | n_a(x) = n_b(x) = n_c(x)}
d. The language of balanced strings of parentheses
e. The language of all nonpalindromes over {a, b}
f. {www | w ∈ {a, b}*}
9.7. Describe the language (a subset of {1}*) accepted by the TM in Figure 9.22.
9.8. We do not define Λ-transitions for a TM. Why not? What features of a TM
make it unnecessary or inappropriate to talk about Λ-transitions?
9.9. Suppose T1 and T2 are TMs accepting languages L1 and L2 (both subsets of
Σ*), respectively. If we were following as closely as possible the method
used in the case of finite automata to accept the language L1L2, we might
form the composite TM T1T2. (See the construction of M in the proof of
Theorem 4.4.) Explain why this approach, or any obvious modification of it,
will not work.


Figure 9.22 |

9.10. Given TMs T1 = (Q1, Σ1, Γ1, q1, δ1) and T2 = (Q2, Σ2, Γ2, q2, δ2), with
Γ1 ⊆ Γ2, give a precise definition of the TM T1T2 = (Q, Σ, Γ, q0, δ). Say
precisely what Q, Σ, Γ, q0, and δ are.
9.11. Suppose T is a TM accepting a language L. Describe how you would modify
T to obtain another TM accepting L that never halts in the reject state h_r.
9.12. Suppose T is a TM that accepts every input. We would like to construct a
TM R_T so that for every input string x, R_T halts in the accepting state with
exactly the same tape contents as when T halts on input x, but with the tape
head positioned at the rightmost nonblank symbol on the tape. (One reason
this is useful is that we might want to use T in a larger composite machine,
but to erase the tape after T has halted.)
a. Show that there is no fixed TM T0 so that R_T = TT0 for every T. (In
other words, there is no TM capable of executing the instruction "move
the tape head to the rightmost nonblank tape symbol" in every possible
situation.)
b. Describe a general method for constructing R_T, given T.
9.13. Draw the Insert(σ) TM, which changes the tape contents from yz to yσz.
Here y ∈ (Σ ∪ {Δ})*, σ ∈ Σ ∪ {Δ}, and z ∈ Σ*. You may assume that
Σ = {a, b}.
9.14. Does every TM compute a partial function? Explain.
9.15. In each case, draw a TM that computes the indicated function. In the first
five parts, the function is from N to N. In each of these parts, assume that
the TM uses unary notation—that is, the natural number n is represented by
the string 1^n.
a. f(x) = x + 2
b. f(x) = 2x
c. f(x) = x²
d. f(x) = x/2 ("/" means integer division.)
e. f(x) = the smallest integer greater than or equal to log₂(x + 1) (i.e.,
f(0) = 0, f(1) = 1, f(2) = f(3) = 2, f(4) = ··· = f(7) = 3, and so
on.)
f. f : {a, b}* × {a, b}* → {0, 1} defined by f(x, y) = 1 if x = y,
f(x, y) = 0 otherwise.
g. f : {a, b}* × {a, b}* → {0, 1} defined by f(x, y) = 1 if x ≤ y,
f(x, y) = 0 otherwise. Here ≤ means with respect to "lexicographic,"
or alphabetical, order. For example, a ≤ aa, abab ≤ abb, etc.
h. f is the same as in the previous part, except that this time ≤ refers to
canonical order. That is, a shorter string precedes a longer one, and the
order of two strings of the same length is alphabetical.
i. f : {a, b}* → {a, b}* defined by f(x) = a^(n_a(x)) b^(n_b(x)) (i.e., f(x) has the
same symbols as x but with all the a's at the beginning).
9.16. The TM shown in Figure 9.23 computes a function from {a, b}* to {a, b}*.
For any string x € {a, b}*, describe the string f(x).

Figure 9.23 |

9.17. Suppose TMs T1 and T2 compute the functions f1 and f2 from N to N,
respectively. Describe how to construct a TM to compute each of these
functions.
a. f1 + f2
b. the minimum of f1 and f2
c. f1 ∘ f2
9.18. Draw a TM that takes as input a string of 0’s and 1’s, interprets it as the
binary representation of a nonnegative integer, and leaves as output the
unary representation of that integer (i.e., a string of that many 1’s).
9.19. Draw a TM that does the reverse of the previous problem: accepts a string of
n 1’s as input and leaves as output the binary representation of n.
9.20. In Figure 9.24 is a TM accepting the language {scs | s ∈ {a, b}*}. Modify it
so as to obtain a TM that computes the characteristic function of the same
language.
9.21. In Example 9.3, a TM is given that accepts the language {ss | s ∈ {a, b}*}.
Draw a TM with tape alphabet {a, b} that accepts this language.
9.22. In Section 9.4 we mentioned a variation of TMs in which the transition
function δ takes values in (Q ∪ {h_a, h_r}) × (Γ ∪ {Δ}) × {L, R}, so that the
tape head must move either to the left or to the right on each move. It is not
difficult to show that any ordinary TM can be simulated by one of these.
Explain how the move δ(p, a) = (q, b, S) could be simulated by such a
restricted TM.


Figure 9.24 |

9.23. An ordinary TM can also be simulated by one in which δ takes values in
(Q ∪ {h_a, h_r}) × (Γ ∪ {Δ} ∪ {L, R}), so that writing a symbol and moving
the tape head are not both allowed on the same move. Explain how the move
δ(p, a) = (q, b, R) of an ordinary TM could be simulated by one of these
restricted TMs.

Exercises 9.24–9.25 involve a Turing machine with a doubly infinite tape. The
tape squares on such a machine can be thought of as numbered left to right, as in an
ordinary TM, but now the numbers include all negative integers as well as nonnegative.
A configuration can still be described by a pair (q, xay). There is no assumption
about which square the string x begins in; in other words, two configurations that
are identical except for the square in which x begins are considered the same. For
this reason, we may adopt the same convention about the string x as about y: when
we specify a configuration as (q, xay), we may assume that x does not begin with a
blank.
9.24. Construct a TM with a doubly-infinite tape that does the following: If it
begins with the tape blank except for a single a somewhere on it, it halts in
the accepting state with the head scanning the square with the a.
9.25. Let T = (Q, Σ, Γ, q₀, δ) be a TM. Show that there is a TM
T₁ = (Q₁, Σ, Γ₁, q₁, δ₁) with a doubly-infinite tape, with Γ ⊆ Γ₁, satisfying
these two conditions:
a. For any x ∈ Σ*, T₁ accepts input x if and only if T does.
b. For any x ∈ Σ*, if (q₀, Δx) ⊢*_T (h_a, yaz), then (q₁, Δx) ⊢*_{T₁} (h_a, yaz).

9.26. In defining a multitape TM, another option is to specify a single tape head
that scans the same position on all tapes simultaneously. Show that a
machine of this type is equivalent to the multitape TM defined in Section 9.5.
9.27. Draw the portion of the transition diagram for the one-tape TM M₂
embodying the six steps shown in the proof of Theorem 9.1 corresponding to
the move δ₁(p, (a₁, a₂)) = (q, (b₁, b₂), (R, L)) of M₁.
9.28. Draw a transition diagram for a three-tape TM that works as follows:
starting in the configuration (q₀, Δx, Δy, Δ), where x and y are strings of
0's and 1's of the same length, it halts in the configuration (h_a, Δx, Δy, Δz),
where z is the string obtained by interpreting x and y as binary
representations and adding them.
9.29. What is the effect of the nondeterministic TM with input alphabet {0, 1}
whose transition table is shown below, assuming it starts with a blank tape?
(Assuming that it halts, where is the tape head when it halts, and what strings
might be on the tape?)

q        σ        δ(q, σ)
q₀       Δ        {(q₁, Δ, R)}
q₁       Δ        {(q₁, 0, R), (q₁, 1, R), (q₂, Δ, L)}
q₂       0        {(q₂, 0, L)}
q₂       1        {(q₂, 1, L)}
q₂       Δ        {(h_a, Δ, S)}

9.30. Call the NTM in the previous exercise G. Let Copy be the TM in
Example 9.4, which transforms Δx to ΔxΔx for an arbitrary string
x ∈ {0, 1}*. Finally, let Equal be a TM that works as follows: starting with
the tape ΔxΔy, it accepts if and only if x = y. Consider the NTM shown in
Figure 9.25. (It is nondeterministic because G is.) What language does it
accept?

Figure 9.25 | (a composite machine built from G, Copy, Delete, and Equal)

9.31. Using the idea in the previous exercise, draw a transition diagram for an
NTM that accepts the language {1^n | n = k^2 for some k ≥ 0}.
9.32. Using the same general technique, draw a transition diagram for an NTM
that accepts the language {1^n | n is a composite integer ≥ 4}.
9.33. Suppose L is accepted by a TM T. Describe how you could construct a
nondeterministic TM to accept each of the following languages.
a. The set of all prefixes of elements of L
b. The set of all suffixes of elements of L
c. The set of all substrings of elements of L
9.34. Figure 9.18b shows the portion of the Execute TM corresponding to the
portion of M₁ shown in Figure 9.18a. Consider the portion of M₁ shown in
Figure 9.26. Assume as before that the maximum number of choices at any
point in M₁ is 2, and that the moves shown are the only ones from state r.
Draw the corresponding portion of Execute.

Figure 9.26 |

9.35. Assuming the same encoding method discussed in Section 9.7, and assuming
that s(0) = 00 and s(1) = 000, draw the TM that is encoded by the string

0001 000101000010100011 000010010000100100011 00001000100001000100011
0000101000001010011 000001001000000100100011 0000010001000000010001001
0000001010000000010010011 000000010100000000100010011
0000000010010000000010010011 000000001000100000000100010011
000000001010101011

9.36. Draw the portion of the universal TM T_u that is responsible for changing the
tape symbol and moving the tape head after the search operation has
identified the correct 5-tuple on tape 1. For example, the configuration

Δ00010100001010001100001001000001000100110001...
Δ010010010001Δ...
Δ0000Δ...

would be transformed to

Δ00010100001010001100001001000001000100110001...
Δ0100100010001Δ...
Δ00000Δ...
9.37. Table 7.2 describes a PDA accepting the language pal. Draw a TM that
accepts this language by simulating the PDA. You can make the TM
nondeterministic, and you can use a second tape to represent the stack.
9.38. Suppose we define the canonical order of strings in {0, 1}* to be the order in
which a string precedes any longer string and the order of two equal-length
strings is numerical. For example, the strings 1, 01, 10, 000, 011, 100 are
listed here in canonical order. Describe informally how to construct a TM T
that enumerates the set of palindromes over {0, 1} in canonical order. In
other words, T loops forever, and for every positive integer n, there is some
point at which the initial portion of T's tape contains the string

ΔΔ0Δ1Δ00Δ11Δ000Δ · · · Δxₙ

where xₙ is the nth palindrome in canonical order, and this portion of the
tape is never subsequently changed.
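Before designing T, it may help to sketch the enumeration in ordinary code. The following Python generator (our illustration, not a Turing machine) lists the binary palindromes in exactly this canonical order, using the fact that a palindrome of length n is determined by its first (n + 1)/2 symbols:

    from itertools import count, product

    def palindromes():
        """Yield the palindromes over {0, 1}: shortest first,
        same-length palindromes in numerical order."""
        yield ""                        # the null string comes first
        for n in count(1):
            half = (n + 1) // 2         # a palindrome of length n is
            for bits in product("01", repeat=half):   # fixed by its first half
                left = "".join(bits)
                yield left + left[: n // 2][::-1]

    gen = palindromes()
    print([next(gen) for _ in range(9)])
    # ['', '0', '1', '00', '11', '000', '010', '101', '111']

A TM could follow the same plan, maintaining the current half on a work tape and appending each completed palindrome to the output portion of its tape.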

MORE CHALLENGING PROBLEMS


9.39. Suppose you are watching a TM processing an input string, and that at each
step you can see the configuration of the TM.
a. Suppose that for some n, the tape head does not move past square n
while you are watching. If the pattern continues, will you be able to
conclude at some point that the TM is in an infinite loop? If so, what is
the longest you might need to watch in order to draw this conclusion?
b. Suppose that in each move you observe, the tape head moves right. If
the pattern continues, will you be able to conclude at some point that the
TM is in an infinite loop? If so, what is the longest you might need to
watch in order to draw this conclusion?
9.40. In each of the following cases, show that the language accepted by the TM T
is regular.
a. There is an integer n so that no matter what the input string is, T never
moves its tape head to the right of square n.
b. For any n ≥ 0 and any input of length n, T begins by making n + 1
moves in which the tape head is moved right each time, and thereafter T
does not move the tape head to the left of square n + 1.
9.41. Suppose T is a TM. For each integer i ≥ 0, denote by nᵢ(T) the number of
the rightmost square to which T has moved its tape head within the first i
moves. (For example, if T moves its tape head right in the first five moves
and left in the next three, then nᵢ(T) = i for i ≤ 5 and nᵢ(T) = 5 for
6 ≤ i ≤ 8.) Suppose there is an integer k so that no matter what the input
string is, nᵢ(T) ≥ i − k for every i ≥ 0. Does it follow that L(T) is regular?
Give reasons for your answer.
9.42. Let T = (Q, Σ, Γ, q₀, δ) be a TM with a doubly-infinite tape (see the
comments preceding Exercise 9.24). Show that there is an ordinary TM
T₁ = (Q₁, Σ, Γ₁, q₁, δ₁), with Γ ⊆ Γ₁, satisfying these two conditions:
a. L(T₁) = L(T).
b. For any x ∈ Σ*, if (q₀, Δx) ⊢*_T (h_a, yaz), then (q₁, Δx) ⊢*_{T₁} (h_a, yaz).
The proof requires constructing an ordinary TM that can simulate the action
of a TM having a doubly-infinite tape. There are several ways you might do
this. One is to allow an ordinary tape to “look like” a folded doubly-infinite
tape, using a technique similar to that in the proof of Theorem 9.1. Another
would use even-numbered squares to represent squares indexed by

nonnegative numbers (i.e., the right half of the tape) and odd-numbered
squares to represent the remaining squares.
9.43. In Figure 9.27 is a transition diagram for a TM M with a doubly-infinite
tape. First, trace the moves it makes on the input string abb. Then, for the
ordinary TM M₁ that you constructed in the previous exercise to simulate M,
trace the moves that M₁ makes in simulating M on the same input.
9.44. Suppose M₁ is a two-tape TM, and M₂ is the ordinary TM constructed in
Theorem 9.1 to simulate M₁. If M₁ requires n moves to process an input
string x, give an upper bound on the number of moves M₂ requires in order
to simulate the processing of x. Note that the number of moves M₁ has made
places a limit on the position of its tape head. Try to make your upper bound
as sharp as possible.
9.45. Show that if there is a TM T computing the function f : N → N, then there
is another one, T′, whose tape alphabet is {1}. Suggestion: suppose T has
tape alphabet Γ = {a₁, a₂, ..., aₙ}. Encode Δ and each of the aᵢ's by a
string of 1's and Δ's of length n + 1 (for example, encode Δ by n + 1
blanks, and aᵢ by 1^i Δ^(n+1−i)). Have T′ simulate T, but using blocks of n + 1
tape squares instead of single squares.
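The suggested encoding is easy to prototype. A minimal Python sketch (ours; it represents the blank Δ by a space character and Γ by a list):

    def encode_symbol(sym, gamma):
        """Encode a tape symbol as a block of length len(gamma) + 1:
        the blank becomes all blanks, and a_i becomes 1^i followed
        by n + 1 - i blanks."""
        n = len(gamma)
        if sym == " ":                       # the blank
            return " " * (n + 1)
        i = gamma.index(sym) + 1             # a_i is 1-indexed
        return "1" * i + " " * (n + 1 - i)

    gamma = ["a", "b", "c"]                  # n = 3, so blocks have length 4
    print(repr(encode_symbol(" ", gamma)))   # '    '
    print(repr(encode_symbol("b", gamma)))   # '11  '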

Figure 9.27 |

9.46. Describe how you could construct a TM T₀ that would accept input strings of
0's and 1's and would determine whether the input was a string of the form
e(T) for some TM T. ("Determine" means compute the characteristic
function of the set of such encodings.)
9.47. Modify the construction in the proof of Theorem 9.2 so that if the NTM halts
on every possible sequence of moves, the TM constructed to simulate it halts
on every input.
9.48. Beginning with a nondeterministic Turing machine T₁, the proof of Theorem
9.2 shows how to construct an ordinary TM T₂ that accepts the same
language. Suppose |x| = n, T₁ never has more than two choices of moves,
and there is a sequence of m moves by which T₁ accepts x. Estimate as
precisely as possible the number of moves that might be required for T₂ to
accept x.
9.49. Formulate a precise definition of a two-stack automaton, which is like a
PDA, except that it is deterministic and a move takes into account the
symbols on top of both stacks and can replace either or both of them.
Describe informally how you might construct a machine of this type
accepting {a^i b^i c^i | i ≥ 0}. Do it in a way that could be generalized to
{a^i b^i c^i d^i | i ≥ 0}, {a^i b^i c^i d^i e^i | i ≥ 0}, etc.
9.50. Describe how a Turing machine can simulate a two-stack automaton;
specifically, show that any language that can be accepted by a two-stack
machine can be accepted by a TM.
9.51. A Post machine is similar to a PDA, but with the following differences. It is
deterministic; it has an auxiliary queue instead of a stack; and the input is
assumed to have been previously loaded onto the queue. For example, if the
input string is abb, then the symbol currently at the front of the queue is a.
Items can be added only to the rear of the queue, and deleted only from the
front. Assume that there is a marker Z₀ initially on the queue following the
input string (so that in the case of null input Z₀ is at the front). The machine
can be defined as a 7-tuple M = (Q, Σ, Γ, q₀, Z₀, A, δ), like a PDA. A
single move depends on the state and the symbol currently at the front of the
queue, and the move has three components: the resulting state, an indication
of whether or not to remove the current symbol from the front of the queue,
and what to add to the rear of the queue (a string, possibly null, of symbols
from the queue alphabet).
Construct a Post machine to accept the language {a^n b^n c^n | n ≥ 0}.
9.52. We can specify a configuration of a Post machine (see the previous exercise)
by specifying the state and the contents of the queue. If the original marker
Z₀ is currently in the queue, so that the string in the queue is of the form
αZ₀β, then the queue can be thought of as representing the tape of a Turing
machine, as follows. The marker Z₀ is thought of, not as an actual tape
symbol, but as marking the right end of the string on the tape; the string β is
at the beginning of the tape, followed by the string α; and the tape head is
currently centered on the first symbol of α (or, if α = Λ, on the first blank
square following the string β). In this way, the initial queue, which contains
the string wZ₀, represents the initial tape of the Turing machine with input
string w, except that the blank in square 0 is missing and the tape head scans
the first symbol of the input.
Using this representation, it is not difficult to see how most of the moves
of a Turing machine can be simulated by the Post machine. Here is an
illustration. Suppose that the queue contains the string abbZ₀ab, which we
take to represent the tape ababb. To simulate the Turing machine move that
replaces the a by c and moves to the right, we can do the following (see the
sketch after this list):
a. remove a from the front and add c to the rear, producing bbZ₀abc
b. add a marker, say $, to the rear, producing bbZ₀abc$
c. begin a loop that simply removes items from the front and adds them to
the rear, continuing until the marker $ appears at the front. At this point,
the queue contains $bbZ₀abc.
d. remove the marker, so that the final queue represents the tape abcbb
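Steps a through d amount to a rotation of the queue, which is easy to check in ordinary code. A minimal Python sketch (ours; Z stands for the marker Z₀ and $ for the temporary marker):

    from collections import deque

    def simulate_right_move(queue, new_symbol):
        """Simulate the TM move "replace the scanned symbol and move
        right" on a Post machine queue representing alpha Z0 beta."""
        queue.popleft()                  # a. remove the scanned symbol from the front...
        queue.append(new_symbol)         #    ...and add its replacement to the rear
        queue.append("$")                # b. add a temporary marker to the rear
        while queue[0] != "$":           # c. rotate front items to the rear
            queue.append(queue.popleft())
        queue.popleft()                  # d. remove the marker

    q = deque("abbZab")                  # represents the tape ababb
    simulate_right_move(q, "c")          # replace the scanned a by c, move right
    print("".join(q))                    # bbZabc, representing the tape abcbb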
The Turing machine move that is hardest to simulate is a move to the left.
Devise a way to do it. Then give an informal proof, based on the simulation
outlined in this discussion, that any language that can be accepted by a
Turing machine can be accepted by a Post machine.
9.53. Show how a two-stack automaton can simulate a Post machine, using the first
stack to represent the queue and using the second stack to help carry out the
various Post machine operations. The first step in the simulation is to load
the input string onto stack 1, using stack 2 first in order to get the symbols in
the right order. Give an informal argument that any language that can be
accepted by a Post machine can be accepted by a two-stack automaton. (The
conclusion from this exercise and Exercises 50 and 52 is that the three types
of machines—Turing machines, Post machines, and two-stack
automata—are equivalent with regard to the languages they can accept.)
Recursively Enumerable Languages

10.1 | RECURSIVELY ENUMERABLE AND RECURSIVE
In this chapter we study in more detail the languages that can be accepted by TMs,
and consider other ways to characterize them. We begin by recalling the distinc-
tion mentioned in Section 9.2 between accepting a language L and computing the
characteristic function of L.

Definition 10.1 Accepting a Language and Recognizing a Language

Let L ⊆ Σ* be a language. A Turing machine T with input alphabet Σ is
said to accept L if L(T) = L. T recognizes, or decides, L if T computes the
characteristic function χ_L : Σ* → {0, 1}. In other words, T recognizes L
if T halts in the state h_a for every input x in Σ*, producing output 1 if x ∈ L
and output 0 otherwise.
A language L is recursively enumerable if there is a TM that accepts L,
and recursive if there is a TM that recognizes L. (Sometimes these languages
are called Turing-acceptable and Turing-decidable, respectively.)

Theorem 10.1
Every recursive language is recursively enumerable.

Proof
As we observed in Section 9.2, if T is a Turing machine recognizing L, then
we can get a TM accepting L by modifying T so that, when it would
otherwise halt with output 0, it enters the reject state.
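The distinction between accepting and recognizing, and the content of Theorem 10.1, can be expressed in a few lines of ordinary code. A minimal Python sketch (ours), using the language of even-length strings purely as a stand-in:

    def recognize_even(x):
        """A recognizer: halts on every input, with output 1 or 0."""
        return 1 if len(x) % 2 == 0 else 0

    def accept_even(x):
        """An acceptor for the same language: guaranteed to halt only on
        members. This one loops forever on non-members."""
        while len(x) % 2 != 0:
            pass                          # a TM that never halts
        return True

    # Theorem 10.1 in one line: any recognizer yields an acceptor,
    # by treating output 0 as rejection.
    def acceptor_from(recognize):
        return lambda x: recognize(x) == 1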


We have already identified the potential problem with the converse of Theorem
10.1. If T is a Turing machine accepting L, there may be strings not in L for which
T loops forever and therefore never produces an answer. Later we will see that this
possibility cannot always be eliminated: There are recursively enumerable languages
that are not recursive. For now, we record the partial result that is naturally suggested.
It will be useful to generalize it slightly, to nondeterministic machines.

Theorem 10.2
If L is accepted by a nondeterministic TM T, and there is no input string on
which some sequence of moves of T can loop forever, then L is recursive.

Proof
The ordinary TM T′ constructed in the proof of Theorem 9.2 to simulate T
can be modified in two respects. First, if T′ finds a sequence of moves of T
that accepts, it creates the output 1 on tape 1 before it halts. Second, if no
sequence of moves of T accepts, T′ must eventually discover this and halt
with output 0. Recall that T′ decides which sequence of moves it should try
next by using a string of digits on tape 2, the ith digit representing the choice
T is to make at the ith move (there is a number k such that T never has more
than k choices). Each time a sequence of moves leads T to halt without
accepting, T′ copies to an unused portion of tape 1 the string of digits
currently on tape 2; in this way, tape 1 keeps a history of all the sequences
that are unsuccessful. (In the proof of Theorem 9.2, we assumed for the sake
of simplicity that all sequences were tried, even those that were extensions
of unsuccessful ones.) For each n, each time T′ finds that the last string of
length n (the one whose digits are all k − 1) represents an unsuccessful
sequence, it searches the strings on tape 1 to see whether all possible strings
of that length appear. If they do, every possible sequence of moves of T on
the given input halts without accepting within n moves, and T′ can halt with
output 0; and because no sequence of moves of T can loop forever, such an
n must eventually be reached whenever T fails to accept.

Both the union and intersection operations preserve the property of recursive
enumerability. The construction we use to prove this involves a TM capable of
simulating two other TMs simultaneously.

Theorem 10.3
If L₁ and L₂ are recursively enumerable languages over Σ, then L₁ ∪ L₂
and L₁ ∩ L₂ are also recursively enumerable.

Proof
Suppose T₁ = (Q₁, Σ, Γ₁, q₁, δ₁) and T₂ = (Q₂, Σ, Γ₂, q₂, δ₂) are TMs
accepting L₁ and L₂, respectively. We wish to construct TMs accepting
L₁ ∪ L₂ and L₁ ∩ L₂. In both cases, it is useful to use a two-tape machine.
We can model the construction after that of the FA in the proof of Theorem
3.4 and let the two tapes represent the tapes of T₁ and T₂, respectively.
We describe the solution for L₁ ∪ L₂. The two-tape machine T =
(Q, Σ, Γ, q₀, δ) begins by placing a copy of the input string, which is already
on tape 1, onto tape 2. It inserts the marker # at the beginning of both tapes,
in order to detect a crash resulting from T₁ or T₂ trying to move its tape head
off the tape. From this point on, the simultaneous simulation of T₁ on tape
1 and T₂ on tape 2 is carried out by allowing every possible move

δ((p₁, p₂), (a₁, a₂)) = ((q₁, q₂), (b₁, b₂), (D₁, D₂))

where for both values of i, δᵢ(pᵢ, aᵢ) = (qᵢ, bᵢ, Dᵢ). The possible outcomes
of the simulation are these:

1. Neither T₁ nor T₂ ever stops, in which case T never stops.
2. At least one of the two accepts, in which case T accepts.
3. One of the two rejects while the other continues, in which case T goes
on simulating the other machine by itself and halts in the same way it does.

The construction causes T to accept if and only if at least one of the
machines T₁ and T₂ accepts, and we conclude that T accepts the language
L₁ ∪ L₂.
We can handle L₁ ∩ L₂ the same way, except that this time T can reject
if either T₁ or T₂ rejects, and it can accept only when T₁ and T₂ have both
accepted.
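The lockstep simulation in this proof can be mimicked with coroutines. In the following Python sketch (ours), a machine is modeled as a generator that yields once per simulated move and finally returns True (accept) or False (reject); run_in_lockstep is our own name, not a standard function:

    def run_in_lockstep(m1, m2, mode="union"):
        """Advance two simulated machines one move at a time, accepting
        as soon as the union (or intersection) condition is met."""
        results = [None, None]            # None means "still running"
        machines = [m1, m2]
        while True:
            for i in (0, 1):
                if results[i] is None:
                    try:
                        next(machines[i])              # one simulated move
                    except StopIteration as halt:
                        results[i] = halt.value        # True or False
            if mode == "union" and True in results:
                return True
            if mode == "intersection" and results == [True, True]:
                return True
            if mode == "intersection" and False in results:
                return False
            if results == [False, False]:
                return False
            # otherwise keep simulating; the composite may loop forever,
            # exactly as the proof allows

    def halts_after(n, verdict):          # a toy "machine" for testing
        for _ in range(n):
            yield
        return verdict

    print(run_in_lockstep(halts_after(3, True), halts_after(99, False)))  # True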

The set of recursive languages is also closed under unions and intersections; we
leave the details to the exercises. For recursive languages, we can add the complement
operation to the list as well.

Theorem 10.4
If L is recursive, so is L′.

Proof
If T is a Turing machine recognizing L, we can make it recognize L′ by
interchanging the two outputs 1 and 0.

This simple proof cannot be adapted in any obvious way for recursively enumerable
languages. It does not immediately follow that the corresponding statement for
recursively enumerable languages is false (which it turns out to be); the next result
suggests, however, that it is less likely to be true.

Theorem 10.5
If L and its complement L′ are both recursively enumerable, then L is
recursive (and therefore L′ is also recursive, by Theorem 10.4).

10.2 | ENUMERATING A LANGUAGE
To enumerate a set means to list the elements one at a time, and to say that a set is
enumerable should perhaps mean that there is an algorithm for enumerating it. In fact
this idea does lead to an equivalent characterization of recursively enumerable lan-
guages, and with an appropriate modification it can also be used to describe recursive
languages.
We begin by saying precisely how a Turing machine enumerates a language L (or,
informally, “lists the elements of L”’). It is convenient to allow a multitape machine
with one tape that functions solely as the output tape.
Definition 10.2 A TM Enumerating a Language

Let T be a Turing machine with at least two tapes, of which tape 1 functions
solely as an output tape: once T writes a symbol on tape 1, that symbol is
never changed. T enumerates the language L ⊆ Σ* if T begins with all
tapes blank and the strings that appear on tape 1, separated by blanks, are
precisely the elements of L, each appearing exactly once.
If L is a finite language, the Turing machine in the definition can either halt
normally when all the elements of L appear on tape 1, or continue to make moves
without printing any other strings on tape 1. If L is infinite, T continues to move
forever.
Now we wish to show that a language is recursively enumerable (can be accepted
by a TM) if and only if it can be enumerated by some TM. The idea of the proof is
simple, although one direction turns out to be a little more subtle than it might first
appear.
On the one hand, if we have a machine T enumerating L, then given an input
string x, we can test x for membership in L by just waiting to see whether x ever
appears on T’s output tape. A TM 7; that carries out this strategy is guaranteed to
accept L, because the strings for which the test is successful are precisely those in L;
for all the others, T; loops forever, unless L is finite.
On the other hand, if T is a TM accepting L, then we consider all the strings
in &*, in some order such as the canonical order described in Section 9.5. In this
ordering, shorter strings precede longer ones, and strings of the same length are
ordered alphabetically (assuming some initial, arbitrary ordering on the symbols in
x). For each string x, we try to decide whether to include x in our enumeration by
using T to determine whether x € L. Here is the place where the argument needs -
to be a little more sophisticated: If it should happen that T loops forever on input x,
and if we are not careful, we will never get around to considering any string beyond
x. The construction in our official proof will be able to handle this problem.
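The careful strategy the proof relies on is time-limited simulation, sometimes called dovetailing, and it is worth seeing in miniature first. In the Python sketch below (ours), accepts_within(x, n) stands for an n-move bounded simulation of the accepting TM T; it is a hypothetical helper, replaced here by a trivial stand-in:

    from itertools import count, islice, product

    def canonical(sigma):
        """The strings over sigma in canonical order."""
        yield ""
        for n in count(1):
            for tup in product(sorted(sigma), repeat=n):
                yield "".join(tup)

    def enumerate_language(accepts_within):
        """Stage n: simulate the accepter for n moves on each of the
        first n strings in canonical order; list each new success."""
        emitted = set()
        for n in count(1):
            for x in islice(canonical("01"), n):
                if x not in emitted and accepts_within(x, n):
                    emitted.add(x)
                    yield x               # write x on the output tape

    # Stand-in for a TM: accepts even-length strings, "using" len(x) moves.
    demo = lambda x, n: len(x) % 2 == 0 and len(x) <= n
    gen = enumerate_language(demo)
    print([next(gen) for _ in range(5)])  # ['', '00', '01', '10', '11']

Nothing here guarantees that the strings appear in canonical order on the output tape; that observation is taken up again after the proof of Theorem 10.6.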

"Theorem 10.6
a language L Sg = isrecursively enumerable°C

t
d justeee the#. If thetwo. wipes ayieh,1,accepts. qois -
2, which by
7 will accept precisely the strings that are generated on tape
: eave aretheelements a L.
370 PART 4 Turing Machines and Their Languages

-— Thuses for ae the action of ae


T ae stringee hh le
to avoid theae ed above, 7 aes longer and lon

fe Ne

POSTE Taba i: “TA: Aral

ead by!
7 willeventuall be listec onttape1,
sand

In the second half of the proof of Theorem 10.6, you should notice that although
the strings in &* are generated in canonical order on tape 2, the strings in L will
not in general be listed in that order on tape 1. (For example, if T accepts a after
five moves and b after two, b would appear before a on tape 1.) With the stronger
assumption that T is actually recursive, however, the simple construction outlined
before the statement of the theorem can be carried out with no complications. On the
other hand, it is also easy to show that if there is a TM enumerating L in canonical
order, then L must be recursive. We state the result officially below, and leave the
proof to the exercises.
Theorem 10.7
A language L ⊆ Σ* is recursive if and only if there is a TM that enumerates
the strings of L in canonical order.

We can summarize Theorems 10.6 and 10.7 informally by saying that a lan-
guage is recursively enumerable if there is an algorithm for listing its elements, and
a language is recursive if there is an algorithm for listing its elements in canoni-
cal order. Some characterizations of recursively enumerable languages in terms of
Turing-computable functions will also be discussed in the exercises.

10.3 | MORE GENERAL GRAMMARS


We began our discussions of both regular languages and context-free languages by
describing ways of generating strings: context-free grammars for CFLs, and regular
expressions or regular grammars in the case of regular languages. We went on to find
corresponding models of computation, or abstract machines, for recognizing strings
in these languages.
We have discussed recursively enumerable languages so far in terms of machines
capable of accepting them (Turing machines), because the TM has the distinction of
being a general model of computation. We will see in this section, however, that a
grammar of a type more general than a CFG is exactly what we need to generate the
elements of a recursively enumerable language. We will also describe a slightly less
general type of grammar, corresponding to languages that fall between context-free
and recursively enumerable. The result will be a hierarchy of language types, ranging
from very special (regular) to very general (recursively enumerable), each with its
own type of grammar as well as its specific model of computation.
The “context-freeness” of a CFG lies in the fact that the left side of a production
is a single variable and the production can be applied whenever that variable appears
in the string, independent of the context. It is the context-freeness that allows us
to prove the pumping lemma for CFLs, since any sufficiently long derivation must
contain a "self-embedded" variable, a variable A for which S ⇒* vAz ⇒* vwAyz.
We can relax the rules of CFGs by allowing the left side of a production to be
more than a single variable. For example, we might use the production

αAβ → αγβ

if we wanted to allow the variable A to be replaced by the string γ, but only when
A is immediately preceded in the string by α and immediately followed by β.
Although productions of this type are general enough for our purposes, it is often
more convenient to allow productions of the even more general form

α → β

In other words, a production is now thought of as simply a substitution of one string
for another. The idea of strings replacing variables is retained in the sense that the
left side of a production must contain at least one variable.

Definition 10.3 Unrestricted Grammars

An unrestricted, or phrase-structure, grammar is a 4-tuple G = (V, Σ, S, P),
where V and Σ are disjoint sets of variables and terminals, respectively, S
is an element of V (the start variable), and P is a set of productions of the
form

α → β

where α, β ∈ (V ∪ Σ)* and α contains at least one variable.

Much of the notation developed for context-free grammars can be carried over
intact. In particular,

α ⇒* β

means that β can be derived from α in zero or more steps, and

L(G) = {x ∈ Σ* | S ⇒* x}
To illustrate the generality of these grammars, we consider the first two examples
of non-context-free languages in Chapter 8.

EXAMPLE 10.1 | An Unrestricted Grammar Generating {a^i b^i c^i | i ≥ 1}

Let

L = {a^i b^i c^i | i ≥ 1}
Our grammar will involve variables A, B, C, as well as two others to be explained shortly.
There will be three types of productions: those that produce strings with equal numbers of
A’s, B’s, and C’s, though not always in the order we want; those that allow the appropriate
changes in the order of A’s, B’s, and C’s; and finally, those that change all the variables to the
corresponding terminals, if the variables are in the right order.
Productions of the first two types are easy to find. The context-free productions

S → ABCS | ABC

generate all strings of the form (ABC)^n, and the productions

BA → AB     CA → AC     CB → BC

will allow the variables to realign themselves properly. For the third type, we cannot simply
add productions like A — a, because they might be used too soon, before the variables line

themselves up correctly. Instead we say that C can be replaced by c, but only if it is preceded
by c or b:

cC → cc     bC → bc

B can be replaced by b if it is preceded by b or a:

bB → bb     aB → ab

and A can be replaced by a if it is preceded by a:

aA → aa

Once we have an a at the beginning to start things off, these productions allow the string to
transform itself into lowercase, from left to right. But where does the first a come from? It is
still not correct to have A — a, even with our restrictions on b’s and c’s. This would allow
ABC to transform itself into abc wherever it occurs, and would therefore permit ABC ABC
to become abcabc. One solution is to use an extra variable F to stand for the left end of the
string. Then we can say that A can be replaced by a only when it is preceded by a or F:

aA → aa     FA → a

We introduce the variable F at the left end, using the production

S → FS₁

and modify the earlier productions so that they involve S₁ instead of S. The final grammar is
the one with productions

S → FS₁        S₁ → ABCS₁     S₁ → ABC
BA → AB        CA → AC        CB → BC
FA → a         aA → aa        aB → ab
bB → bb        bC → bc        cC → cc

The string aabbcc, for example, can be derived as follows. At each point, the underlined string
is the one that is replaced in the subsequent step.

S= FS; > FABCS, > FABCABC => FABACBC


=> FAABCBC => FAABBCC = aABBCC => aaBBCC
= aabBCC = aabbCC => aabbcC = aabbcc

It is easy to see that any string in L can be derived from this grammar. In the other
direction, any string of terminal symbols derived from S has equal numbers of a’s, b’s, and
c’s; the only question is whether illegal combinations such as ba or ca can occur. Notice
first that if S ⇒* α, then α cannot have a terminal symbol appearing after a variable, and so
α ∈ Σ*V*. Furthermore, any subsequent production leaves the string of terminals intact and
either rearranges the variables or replaces one more by a terminal. Suppose u ∈ L(G) and u
has an illegal combination of terminals, say ba. Then u = vbaw, and there is a derivation of u
that starts S ⇒* vbβ for some β ∈ V*. This is impossible, however, because no matter what
β is, it is then impossible to produce a as the next terminal.
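Because a derivation step is just a substring replacement, this grammar can be explored mechanically. Below is a minimal Python breadth-first search (our illustration; the cap max_len is what forces termination, and T stands for the variable S₁):

    from collections import deque

    def derivable(productions, start, target, max_len):
        """Search derivations breadth-first, pruning any sentential
        form longer than max_len."""
        seen, frontier = {start}, deque([start])
        while frontier:
            s = frontier.popleft()
            if s == target:
                return True
            for lhs, rhs in productions:
                i = s.find(lhs)
                while i != -1:            # try every occurrence of lhs
                    t = s[:i] + rhs + s[i + len(lhs):]
                    if len(t) <= max_len and t not in seen:
                        seen.add(t)
                        frontier.append(t)
                    i = s.find(lhs, i + 1)
        return False

    P = [("S", "FT"), ("T", "ABCT"), ("T", "ABC"),      # T plays the role of S1
         ("BA", "AB"), ("CA", "AC"), ("CB", "BC"),
         ("FA", "a"), ("aA", "aa"), ("aB", "ab"),
         ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]
    print(derivable(P, "S", "aabbcc", max_len=8))       # True

For a context-sensitive grammar the length cap would be harmless, since no sentential form could exceed the target's length; here it is a genuine restriction, one way of seeing why membership for unrestricted grammars is only semidecidable in general.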

In the derivation in this example, the movement of A’s to the left and C’s to
the right and the “propagation” of terminal symbols to the right in the last phase
might suggest the motion of a Turing machine’s tape head and the moves the machine
makes as it transforms its tape. This similarity is not a coincidence. The proof that
these grammars can generate arbitrary recursively enumerable languages will use the
ability of a grammar to mimic a TM and to carry out the same sorts of “computations.”
In any case, the idea of symbols migrating through the string is a useful technique
and can be used again in our next example.

EXAMPLE 10.2 | A Grammar Generating {ss | s ∈ {a, b}*}

Let

L = {ss | s ∈ {a, b}*}

A Turing machine might generate strings in L nondeterministically by producing the first


half arbitrarily and then making a copy immediately following. Our grammar will follow
this approach, except that each symbol in the first half will be copied immediately after it is
generated. Suppose we use a marker M to denote the middle of the string and that at some point
we have the string sMs. To produce a longer string of the same type, we may insert a symbol at
the beginning, then move just past the M to insert another copy of the same symbol. This will
be accomplished by inserting the two symbols at the beginning, then letting the second migrate
to the right until it passes the M. As in the first example we use a variable F to designate the
front of the string, and the first production in any derivation will be

S → FM

Each time a new symbol σ is added, the migrating duplicate symbol will be the variable that
is the uppercase version of σ. The productions

F → FaA | FbB

produce the symbols, and

Aa → aA     Ab → bA     Ba → aB     Bb → bB

allow the variables to migrate past the terminals in the first half. Eventually the migrating
variable hits M, at which point it deposits the corresponding terminal on the other side and
disappears, using the productions

AM → Ma     BM → Mb

To complete a derivation we need the productions

F → Λ     M → Λ

The string abbabb has the following derivation. As before, the underlined portion of each
string is the left side of the production used next.

S ⇒ FM ⇒ FbBM ⇒ FbMb ⇒ FbBbMb
⇒ FbbBMb ⇒ FbbMbb ⇒ FaAbbMbb ⇒ FabAbMbb
⇒ FabbAMbb ⇒ FabbMabb ⇒ abbMabb ⇒ abbabb

It is reasonably clear that any string in L can be generated by our grammar. We argue informally
that no other strings are generated, as follows. First, every string in L(G) has even length,
because the only productions that change the ultimate length of the string increase it by 2.
Second, when M is finally eliminated in the derivation, all the terminal symbols in the final
string are present, and half come before M. Those preceding M are the terminals in the
productions F — FaA and F — FbB, because the only other productions that create
terminals create them to the right of M. The farther to the left a terminal in the first half is, the
more recently it appeared in the derivation, because the relative order of terminals in the first
half never changes. Of any two variables created by these same two productions, however, the
one appearing earliest reaches M first because the two can never be transposed. Therefore, of
two terminals in the second half, the one to the left came from the variable appearing more
recently, and therefore the second half of the final string matches the first.
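The derivable search sketched after Example 10.1 applies to this grammar as well, with the Λ-productions encoded as replacement by the empty string (again our own illustration, reusing that function):

    P2 = [("S", "FM"), ("F", "FaA"), ("F", "FbB"),
          ("Aa", "aA"), ("Ab", "bA"), ("Ba", "aB"), ("Bb", "bB"),
          ("AM", "Ma"), ("BM", "Mb"),
          ("F", ""), ("M", "")]          # F -> null and M -> null
    print(derivable(P2, "S", "abbabb", max_len=9))      # True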

We are now ready to show that the languages generated by unrestricted grammars
are precisely those accepted by Turing machines. In one direction we can simply
construct a TM to simulate derivations in a grammar. In the other direction, we
will take advantage of some of the features of unrestricted grammars that we have
already observed, in order to construct grammars that can simulate Turing machine
computations.

Theorem 10.8
For any unrestricted grammar G = (V, Σ, S, P), there is a Turing machine
T = (Q, Σ, Γ, q₀, δ) with L(T) = L(G).

Proof
The TM we construct to accept L(G) will be the nondeterministic composite
machine
T = MovePastInput → Simulate → Equal

where the first component moves the tape head to the blank square following
the input string, the second simulates a derivation in G starting in this location
and leaves the resulting string on the tape, and the third compares this result
to the original input, accepting if and only if the two strings agree. If the
input string x is in L(G), the nondeterministic simulation can choose the
sequence of moves that simulates the derivation of x, and the result will
be that T accepts. The only way Simulate can leave a string of terminal
symbols on the tape is as a result of carrying out the steps of a derivation; if
x ∉ L(G), this component will either generate a string different from x or
fail to complete a derivation at all, and T will fail to accept.
: The Simulate TM simulates a derivation in much the same way that a
nondeterministic top-down PDA simulates a derivation in a CFG (see Section
7.4). In the case of the PDA, however, terminal symbols at the left of
the current string are removed from the stack, and the production used in

each step involves the leftmost variable. In the case of Simulate, a production
can involve a substring appearing anywhere within the current string. The
tape alphabet of Simulate contains the symbols of V ∪ Σ and possibly a few
others. Simulate begins by writing the start variable S in the square where
its tape head is positioned. It then enters a loop, which it may exit after any
number of iterations. At each iteration it chooses, nondeterministically, a
production α → β of G and an occurrence of the string α in the current
string, if one is present; it replaces that occurrence of α by β, shifting the
portion of the string to the right of it as necessary when |α| and |β| differ;
and it returns the tape head to the beginning of the string. When Simulate
chooses to exit the loop, the string it leaves on the tape is the current string
of the simulated derivation.

EXAMPLE 10.3 | The Simulate TM for a Simple Grammar


Consider the unrestricted grammar with productions

S → aBS | Λ
aB → Ba
Ba → aB
B → b

which generates the language of strings in {a, b}* with equal numbers of a's and b's. Figure 10.1
shows the Simulate TM discussed in the proof of Theorem 10.8. Note that in this example, the
only productions in which the left and right sides are of unequal length are the S-productions,
and S appears only at the right end of the string. In a more general example, applying a
production like S → aBS could be accomplished by using an Insert TM (Exercise 9.13)
twice, and S → Λ would require a deletion.
You should trace the moves of Simulate as it simulates the derivation of a string in the
language, say abba.

Figure 10.1 |
The Simulate TM for Example 10.3.

Theorem 10.9
For any Turing machine M = (Q, Σ, Γ, q₀, δ), there is an unrestricted
grammar G generating the language L(M) ⊆ Σ*.

Proof (sketch)
The grammar G is constructed so that a derivation of a string x ∈ Σ* begins
by generating two copies of x, in the form of a string of composite variables
(σ τ) with σ ∈ Σ ∪ {Δ} and τ ∈ Γ ∪ {Δ}. The first component of each pair
is left untouched, while the second components serve as a working copy of
the tape, on which the moves of M are simulated; a variable representing the
current state of M marks the position of the tape head. For each move
δ(p, a) = (q, b, D) of M, the grammar has productions that change the
working copy exactly the way the move changes the tape. (We may ignore
moves that halt in the rejecting state; the language is not changed by omitting
them.) If the simulated computation reaches the accepting state h_a, the
variable h_a migrates through the string, erasing the state information and
the second component of every pair, so that the string of terminals left at the
end is the original input x. The productions in the first group generate the
initial string, and those in the last group do the necessary erasing.
It is not hard to see that this grammar generates precisely the strings
accepted by M, although we do not attempt a rigorous proof. In the example
that follows, a sample derivation is included.

EXAMPLE 10.4 | Obtaining a Grammar From a TM
This example refers to Example 9.2 and the TM pictured in Figure 9.4, which accepts the lan-
guage of palindromes over {a, b}. Although there are 251 productions in the grammar, many
are unnecessary since they involve combinations (σ₁σ₂) that never occur. Rather than listing
them, we show a derivation of the string aba in this grammar. The corresponding sequence of
Turing machine moves simulated by the derivation is shown to the right. Since the TM moves
its head to the blank square to the right of the input string and no farther, the derivation begins
by producing a string with one copy of (AA) on the right. At each step in the derivation, the

underlined portion shows the left side of the production that will be used in the next step.

S => S(AA)
= I(AA)
= T(aa)(AA)
= T(bb)(aa)(AA)

= T(aa)(bb)(aa)(AA)
= qo(AA)(aa)(bb)(aa)(AA) (qo, Aaba)
= (AA)qi
(aa) (bb) (aa)(AA) F (gq, Aaba)
= (AA)(@A)q2(bb)(aa)(AA) F (q2, AAba)
= (AA)(aA)(bb)q2(aa)(AA) F (q2, AAba)
= (AA)(@A)(bb)(aa)qn(AA) F (q2, AAbaA)
= (AA)(aA)(bb)g3(aa)(AA) F (qs, AAba)
= (AA)(@A)g4(bb)
(aA) (AA) F (q4, AAD)
= (AA)qa(aA)(bb)
(aA) (AA) F (q4, AAD)
= (AA)(aA)qi
(bb) (aA)(AA) F (qi, AA)
= (AA)(aA)(bA)gs
(aA) (AA) F (gs, AAAA)
= (AA)(aA)go(bA)GA)(AA) F (qo, AAA)
= (AA)(@A)(bA)ha(aA)(AA) F (hg, AAAA)
= (AA)(GA)(bA)ha(aA)ha(AA)
=> (AA)(GA)ha(bA)ha(aA)ha(AA)
=> (AA)ha(aA)ha(bA)hg(aA)ha(AA)
= ha(AA)haG@A)ha(bA)ha(aA)ha(AA)
=> ha(aA)ha(bA)ha(aA)ha(AA)
=> ahg(bA)ha(aA)ha(AA)
= abh,(aA)h,(AA)
= abah,(AA)
= aba

10.4 | CONTEXT-SENSITIVE LANGUAGES AND THE CHOMSKY HIERARCHY
In this section we look briefly at context-sensitive grammars, which are more general
than CFGs and less so than unrestricted grammars. The corresponding models of
computation, called linear-bounded automata, lie between pushdown automata and
Turing machines.

Definition 10.4 Context-Sensitive Grammars

A context-sensitive grammar (CSG) is an unrestricted grammar in which
every production α → β satisfies |β| ≥ |α|. A context-sensitive language
(CSL) is a language that can be generated by a CSG.

A slightly different characterization of these languages makes it easier to understand
the phrase context-sensitive. A language is context-sensitive if and only if it
can be generated by a grammar in which every production has the form

αAβ → αXβ

where α, β, and X are strings of variables and/or terminals, with X not null, and A
is a variable (see Exercise 10.42). Such a production may allow A to be replaced by
X, depending on the context.

EXAMPLE 10.5 | A CSG for {a^i b^i c^i | i ≥ 1}

In Example 10.1 we presented a grammar for the language

L = {a^i b^i c^i | i ≥ 1}

that was not context-sensitive. By modifying it slightly, however, we can see that L is in fact
a CSL. Instead of using a separate variable F to indicate the front of the string, we can simply
distinguish between the first A in the string, written A₁, and the remaining A's. You can check
that the context-sensitive grammar with the productions below generates L.

S → A₁BCS₁ | A₁BC
S₁ → ABCS₁ | ABC
BA → AB     CA → AC     CB → BC
A₁ → a      aA → aa     aB → ab
bB → bb     bC → bc     cC → cc

We obtained the class of context-sensitive grammars by imposing a restriction


on the productions of an unrestricted grammar, and it seems natural to look for a
corresponding restriction to place on a Turing machine. We can anticipate the sort of
extra condition that might be appropriate by looking carefully at the Turing machine
constructed in Theorem 10.8 to accept the language generated by a given unrestricted
grammar G. The TM simulates a derivation in G, using the space on the tape to the
right of the input string, and the tape head never needs to move farther right than
one square past the right end of the current string in the derivation. If G is actually
context-sensitive, the string in a derivation is never longer than the string of terminals
being derived. Therefore, for any input string of length n, the tape head never needs

to move past square 2n (approximately) during the computation. In the definition


below, the extra restriction that is adopted appears to be even a little stronger.

Definition 10.5 Linear-Bounded Automata

A linear-bounded automaton (LBA) is a 5-tuple M = (Q, Σ, Γ, q₀, δ) that
is the same as a nondeterministic Turing machine except in the following
respect. There are two extra tape symbols ( and ), assumed not to be elements
of Γ. M begins in the configuration (q₀, (x)), with its tape head scanning
the symbol ( in square 0 and the symbol ) in the first square to the right of
the input string x; in moving subsequently, M is not permitted to replace the
symbols ( or ), or to move its head left from the square with ( or right from
the square with ).

Theorem 10.10
If L ⊆ Σ* is a context-sensitive language, there is a linear-bounded
automaton accepting L.

Proof
Suppose G = (V, Σ, S, P) is a CSG generating L. In the proof of Theorem
10.8, the Turing machine used the portion of the tape to the right of the
input for the simulated derivation. That option is not available to us here,
but a satisfactory alternative is to convert the portion of the tape between (
and ) into two tracks, so that the second provides the necessary space. For
this reason we let the tape alphabet of our machine M contain pairs (a, b),
where a, b ∈ Σ ∪ V ∪ {Δ}, in addition to elements of Σ. There may be
other symbols needed as well.
The first action taken by M will be to convert the tape configuration

(x₁x₂ · · · xₙ)

to

((x₁, Δ)(x₂, Δ) · · · (xₙ, Δ))

This step corresponds to the MovePastInput component of the TM in the
proof of Theorem 10.8. Next, M places S in the second track of square 1
and starts a loop exactly as before, except that the machine will reject if the
string produced in the second track during any iteration has length greater
than n. As before, M may exit the loop at any time. When it does, it rejects
if the second track does not match the input in the first track. Since G is
context-sensitive, a string appearing in the derivation of x ∈ L cannot be
longer than x, so that if the LBA begins with input x there is a sequence of
moves it can execute that will cause it to accept. If x ∉ L, on the other hand,
M will either reject or loop forever, since no simulated derivation will be
able to produce the string x.

As you can probably see from the proof of Theorem 10.10, the significant fea-
ture of an LBA is not that the tape head doesn’t move past the input string at all,
but that its motion is restricted to a portion of the tape bounded by some linear
function of the input length (this explains the significance of the phrase linear-
bounded). As long as this condition is satisfied, an argument using multiple tape tracks
can be used to find an equivalent machine satisfying the stricter condition in Def-
inition 10.5.
The strict converse of Theorem 10.10 does not hold, since the null string might be
accepted by an LBA but cannot belong to any context-sensitive language. However,
the obvious modification of the statement is true.

Theorem 10.11
If there is a linear-bounded automaton M = (Q, Σ, Γ, q₀, δ) accepting the
language L ⊆ Σ*, then there is a context-sensitive grammar generating
L − {Λ}.

Proof
We give only a sketch of the proof, which is similar to that of Theorem
10.9. As before, the grammar is constructed so that a derivation generates
two copies of a string, simulates the action of M on one, and eliminates
everything except the other from the string if and when the simulated moves
of M lead to acceptance.
The grammar differs from the previous one in that more variables are
needed and they are more complicated. Previously the variables included
only S, T, left and right parentheses, and one variable for each possible state
of M. However, with these variables, productions such as h_a(σ₁σ₂) → σ₁
would violate the context-sensitive condition. The way to salvage productions
of this form is simply to interpret strings of the form (σ₁σ₂) and q(σ₁σ₂)
as variables. This approach could have been used in the earlier proof as well,
and many of the productions would have been context-sensitive as a result.
The difference is that now we no longer need strings containing (ΔΔ) and
productions like h_a(ΔΔ) → Λ, because initially there are no blanks between
the tape markers of M.
In addition, we must pay attention to the tape markers ( and ); the grammar
therefore also has variables such as ((σ₁σ₂), (σ₁σ₂)), and ((σ₁σ₂)) (as well
as versions containing a state), to be used when the square represented is
adjacent to the leftmost marker, the rightmost marker, or both. In each case
the marker is carried as part of the variable, so that it can be absorbed and
erased along with everything else except the terminal string that is ultimately
derived.

It follows from the way we have defined the variables that the grammar
is context-sensitive. It is also possible to convince yourself that it generates
precisely the nonnull strings accepted by M.

The four classes of languages that we have now studied—regular, context-free,


context-sensitive, and recursively enumerable—are often referred to as the Chomsky
Hierarchy. Chomsky himself designated the four types as type 3, type 2, type 1, and
type 0, from most restrictive to most general. Each level of the hierarchy can be
characterized by a class of grammars, as well as by a certain type of abstract machine,
or model of computation. This hierarchy is summarized in Table 10.1.

Table 10.1 | The Chomsky hierarchy

Type   Languages (grammars)       Form of productions in grammar        Accepting device
3      Regular                    A → aB, A → a                         Finite
                                  (A, B ∈ V, a ∈ Σ)                     automaton
2      Context-free               A → α                                 Pushdown
                                  (A ∈ V, α ∈ (V ∪ Σ)*)                 automaton
1      Context-sensitive          α → β                                 Linear-bounded
                                  (α, β ∈ (V ∪ Σ)*, |β| ≥ |α|,          automaton
                                  α contains a variable)
0      Recursively enumerable     α → β                                 Turing machine
       (unrestricted or           (α, β ∈ (V ∪ Σ)*,
       phrase-structure)          α contains a variable)

The phrase type 0 grammar was actually applied to a grammar in which all
productions are of the form α → β, where α is a string of one or more variables; it is
easy to see, however, that any unrestricted grammar is equivalent to one of this type
(Exercise 10.19).
The characterizations of all these types of languages in terms of grammars make
it obvious that for 1 ≤ i ≤ 3, every language of type i is of type i − 1, except
that a context-free language is context-sensitive only if it does not contain the null
string. (For any context-free language, the set of nonnull strings in the language
is a context-sensitive language.) Theorem 10.12 shows that we can make an even
stronger statement about the case i = 1.

Theorem 10.12
Every context-sensitive language is recursive.

Proof
Suppose L is generated by the context-sensitive grammar G. We have already
seen in Theorem 10.10 that L can be accepted by a linear-bounded automaton
M, which is essentially a special type of nondeterministic Turing machine.
Because the tape head of M never leaves the region between the two markers,
only a finite number of configurations are possible for an input of a given
length; a sequence of moves longer than this number must repeat a
configuration and can be cut short without changing the outcome. It follows
that M can be modified so that no sequence of moves loops forever, and
Theorem 10.2 then guarantees that L is recursive.
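The finiteness at the heart of this proof can also be seen at the level of grammars: a context-sensitive production never shrinks the string, so a derivation of x never passes through a sentential form longer than x, and a bounded search settles membership. A minimal Python sketch (ours), using the grammar of Example 10.5 with D standing for A₁ and T for S₁:

    from collections import deque

    def csl_member(productions, start, x):
        """Decide membership for a CSG: every sentential form in a
        derivation of x has length at most len(x), so this search runs
        over a finite set and always terminates."""
        bound = max(len(x), 1)
        seen, frontier = {start}, deque([start])
        while frontier:
            s = frontier.popleft()
            if s == x:
                return True
            for lhs, rhs in productions:
                i = s.find(lhs)
                while i != -1:
                    t = s[:i] + rhs + s[i + len(lhs):]
                    if len(t) <= bound and t not in seen:
                        seen.add(t)
                        frontier.append(t)
                    i = s.find(lhs, i + 1)
        return False

    P = [("S", "DBCT"), ("S", "DBC"), ("T", "ABCT"), ("T", "ABC"),
         ("BA", "AB"), ("CA", "AC"), ("CB", "BC"),
         ("D", "a"), ("aA", "aa"), ("aB", "ab"),
         ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]
    print(csl_member(P, "S", "aabbcc"))   # True
    print(csl_member(P, "S", "aabbc"))    # False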

For each i with 1 ≤ i ≤ 3, the set of languages of type i is a subset of the set
of languages of type i − 1. For i either 2 or 3, we know the inclusion is proper:
There are context-sensitive languages that are not context-free (see Example 10.5)
and context-free languages that are not regular. The last inclusion is also proper for a
trivial reason, because {Λ} is an example of a recursively enumerable, non-context-
sensitive language. In order to show there are nontrivial examples, we recall from
Theorems 10.1 and 10.12 that

CS ⊆ R ⊆ RE

(where the three sets contain context-sensitive, recursive, and recursively enumerable
languages, respectively). It would therefore be sufficient to show either that RE − R ≠ ∅
or that there is a recursive language L for which L − {Λ} is not context-sensitive.
Both these statements are true. The proof of the first is postponed until Section 11.1,
because it depends on a type of argument that will be introduced in the next section.
The second statement is Exercise 11.33. The conclusion of either statement is that the
class of recursive languages (which does not show up explicitly in Table 10.1, because
there is no known characterization involving grammars) falls strictly between the two
bottom levels of the hierarchy.

In spite of the results mentioned in the preceding paragraph, just about any
language you can think of is context-sensitive. In particular, programming languages
are. The non-context-free aspects of the C language mentioned in Example 8.4, such
as variables having to be declared before they are used, can be accommodated by
context-sensitive grammars.
The four levels of the Chomsky hierarchy satisfy somewhat different closure
properties. The set of regular languages is closed under all the standard operations:
union, intersection, complement, concatenation, Kleene*, and so forth. The set of
context-free languages is not closed under intersection or complement. Once we show
that there are recursively enumerable languages that are not recursive, it will follow
from Theorem 10.5 that the set of recursively enumerable languages is not closed
under complement. Although it is not difficult to show that the class of context-
sensitive languages is closed under many of these operations (Exercise 10.44), the
case of complements remained an open question for some time. Szelepcsényi and
Immerman answered it independently in 1987 and 1988, by proving that if L can be
accepted by a linear-bounded automaton, then so can L’. Open questions concerning
context-sensitive languages remain: It is unknown, for example, whether or not every
CSL can be accepted by a deterministic LBA.

10.5 | NOT ALL LANGUAGES ARE RECURSIVELY ENUMERABLE
In Chapter 11 we will see an example of a language that cannot be accepted by any
Turing machine. Rather than waiting for the example, however, we give a noncon-
structive proof in this section that there must be such a language. There are several
reasons for doing this. First the main idea of the proof is simple, and interesting in
itself: The set of languages is much bigger than the set of Turing machines, and one
machine can accept only one language; therefore, there must be a language with no
machine to accept it. Second, we can draw an even stronger conclusion: Not only
is there a language that is not recursively enumerable, but most languages are not
recursively enumerable. (The set of languages that are not recursively enumerable
is bigger than the set of languages that are.) Third, the proof introduces a type of
“diagonal” argument that we will be able to use again when we actually look for an
example.
The set of all languages is infinite, and the set of all Turing machines is infinite.
Nevertheless, the first set is bigger than the second. Our first problem is to make
sense of this idea, and to find a precise way to compare the sizes of two infinite sets.
Let us start with finite sets. Most people would say that A = {a, b, c} and
B = {x, y, z} are the same size because they both have three elements. This approach
does not look promising for infinite sets, however—the whole point is that two sets
with an “infinite number” of elements may not have the same size. We would like
to be able to say that two sets have the same size without saying exactly what that
size is.
In any case, “counting” the elements of a set A can be interpreted as establishing
a one-to-one correspondence between the elements of the set {1, 2, . . . , n} and the

elements of A (and in the process determining the number n): “one,” “two,” ... is
short for "this is the element that will correspond to 1," "this is the element that will
correspond to 2,” and so forth. Rather than applying this process to A, then applying
it to B, then comparing the results, we could simply try to match up the elements of
A and B directly. In the case of {a, b, c} and {x, y, z} we can: There is a bijection
(a one-to-one, onto function) from A to B. An example is the function f defined by
f(a) =x, f(b) = y, and f(c) = z. Although A and B are different sets, we can
view them as the same except for the labels we use to describe the elements: We can
talk about a, b, and c, or about f(a), f(b), and f(c). (This is exactly what it means
to have a bijection from one to the other.) It seems appropriate, in particular, to say
that whenever we have such a function, A and B are the same size.
This criterion can be applied to infinite sets as well as to finite; therefore, we
adopt it as our definition.

Two sets are the same size if there is a bijection from one to the other.

Note that the relation “is the same size as” is an equivalence relation. In particular,
if A and B are the same size, then so are B and A; and if A and B are the same size,
and B and C are the same size, then so are A and C.
Even though we are trying to avoid talking about the “size” of a set as a quantity,
we want to be able to say informally that one set is bigger than another. For example,
{p, q, r, s, t} is bigger than {a, b, c}. This is because there is a one-to-one function
from {a, b, c} to {p, q, r, s, t} (for example, the function f for which f(a) = p,
f(b) = r, and f(c) = s), but no bijection. In other words, there is a subset of
{p, q, r, s, t} that is the same size as {a, b, c} (the subset {p, r, s}, for example), but
the entire set is not. Again, this characterization of “bigger than” extends to infinite
sets as well.

A set A is bigger than a set B if there is a bijection from a subset of A to B but no bijection
from A to B.

(See Exercise 10.51 for another characterization in terms of one-to-one functions.) In


the case of finite sets, of course, we are free to use our usual idea of size to compare
two sets: A is bigger than B if it has more elements, and A and B are the same size
if they have the same number of elements. For two infinite sets, however, we tend to
proceed in the opposite direction. Rather than looking at how big the sets are in order
to see whether there is a bijection from one to the other, we try to decide whether
there is a bijection in order to compare the sizes.
One important difference between finite sets and infinite sets can be very confus-
ing at first. If a finite set A is the same size as a proper subset of B, then B must be
bigger than A. In other words, if there is a bijection of A to a proper subset of B, then
there cannot be a bijection from A to B. In particular, if B contains all the elements
of A and some more as well, then B is bigger than A. As obvious as these statements
seem, they are not true for infinite sets, as illustrated in Figure 10.2. For example,
we cannot say that B = {0, 1, 2, 3,...} is bigger than A = {1, 2,3, ...}, as we have
defined "bigger," because the function f : B → A defined by f(n) = n + 1 is a
bijection (Figure 10.2a). We cannot say that the set B of all nonnegative integers is
bigger than the set A of nonnegative even integers (Figure 10.2b), because the func-

Figure 10.2 |

tion f defined by f(n) = 2n is a bijection from the first to the second. Finally, for
an even more dramatic example, the set B = R⁺ of all nonnegative real numbers is
not bigger than the interval A = [0, 1), because the formula f(x) = tan(πx/2) (whose
graph is shown in Figure 10.2c) defines a bijection from A to B. These results are
counterintuitive. There is clearly a sense in which the set of natural numbers is twice
as big as the set of nonnegative even integers, and R* is infinitely many times as big
as [0, 1). However, it appears that according to our definition, “twice as big,” or even
“infinitely many times as big,” does not imply “bigger” in the case of infinite sets.
The following observation may make these examples seem less surprising: Say-
ing that there is a bijection from a set S to a proper subset of itself is equivalent to
saying that S is infinite. One direction is easy, because there can be no bijection from
a finite set S to a set with fewer elements. For the other direction, see Exercise 10.25.
As you can see from the preceding paragraphs, it is necessary to think carefully
about infinite sets, and it is dangerous to rely on intuitively obvious statements that
you take for granted in the case of finite sets.
Not only are some infinite sets larger than others, but there are many different
“sizes” of infinite sets (Exercises 10.47 and 10.49). For our purposes, however, it is
enough to distinguish two kinds of infinite sets: those that are the same size as the
set N of natural numbers, and those that are bigger, which account for all the rest.

A set S is countably infinite if there is a bijection from N to S, and countable
if it is either finite or countably infinite. A set that is not countable is
uncountably infinite, or simply uncountable.

Saying that f : N → S is a bijection means that these three conditions hold:

1. For every natural number n, f(n) ∈ S.
2. For any two different numbers m and n, f(m) ≠ f(n).
3. Every element of S is f(n) for some natural number n.

Therefore, saying that S is countably infinite (the same size as N) means that the
elements of S can be listed (or “counted”) as f(0), f(1),..., so that every element
of S appears exactly once in the list. Saying that S is countable means the same thing,
except that the list may stop after a finite number of terms.
There are at least two ways in which one might misinterpret the phrase “can be
listed.” First, of course, it is never possible to finish counting or listing the elements
of an infinite set. Saying that S is countably infinite means that we can count elements
of S (“zero,” “one,” “two,” ...) in such a way that for any x ∈ S, x would be counted
(included in the correspondence being established) if we continued the count long
enough.
A second possible source of confusion is that when we say a set S is countably
infinite, we are saying only that “there exists” a bijection f. There may or may not
be an algorithm allowing us to compute f, or an algorithm telling us how to list
the elements of S. Whether or not a bijection exists has to do only with how many
elements are in the set; whether or not there is a computable bijection also depends
on what the elements are. In particular, every language L over a finite alphabet is
countable (Lemma 10.2 and Example 10.7); however, there is such an algorithm only
if L is recursively enumerable, and as we will see, not all languages are.
Figure 10.3 illustrates one way of thinking of a countable set. We may think of
each underline as a space big enough for one element of a set. The set is countable if
it can be made to “fit” into the indicated spaces.
If there is a bijection from N to A and also one from N to B, then there is one
from A to B. Therefore, any two countably infinite sets are the same size. Similarly,
as we noticed earlier, a set that is the same size as a countable set is also countable.
Not all uncountable sets are the same size, if there are in fact many different sizes
of infinite sets. However, an immediate consequence of the following fact is that any
uncountable set is bigger than any countable one.

Lemma 10.1 Every infinite set has a countably infinite subset.

Proof We will show that if S is infinite, there is a bijection f from N to a subset


of S. We will define f one integer at a time, as follows. Since S is infinite, there is
at least one element; choose one, and call it f(0). In general, suppose that for some
n ≥ 0, f(0), f(1), ..., f(n) are distinct elements of S. Since S is infinite, there is an
element of S that is not one of these; choose any such element, and call it f(n + 1).
Therefore, by the principle of mathematical induction, f(n) can be defined for every
n ≥ 0 so that the elements f(i) are all distinct. ∎
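When S is a set of natural numbers with a computable membership test, the inductive construction in the proof can be carried out mechanically. The following Python sketch is only an illustration (the lemma itself requires no algorithm, and the example set is our own): it produces f(0), f(1), ... by always choosing the smallest element of S not chosen before.

from itertools import count, islice

def countable_subset(member):
    # Yield distinct elements f(0), f(1), ... of the infinite set
    # S = {n : member(n) is True}, one new element at a time.
    for n in count(0):
        if member(n):
            yield n

S = countable_subset(lambda n: n % 3 == 0)   # example: multiples of 3
print(list(islice(S, 5)))                    # [0, 3, 6, 9, 12]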

One other simple fact about countable sets will be useful, and we record it as
Lemma 10.2.

Figure 10.3 |
Spaces to put the elements of a countable set.

Lemma 10.2 Every subset of a countable set is countable.


Proof See Exercise 10.24. ∎

An immediate example of a countably infinite set is the set N itself, and we have
already seen examples of countably infinite subsets of N. It is not hard to find many
more examples. The set
S = {0, 1/2, 3/2, 2, 5/2, ...}
is countably infinite; the way we have defined it is to list its elements. The set Z
of all integers, both nonnegative and negative, is countable, because we can list the
elements this way:
Z = {0, −1, 1, −2, 2, ...}

One way of thinking of Z is as the union of the two sets {0, 1, 2, ...} and {−1, −2,
−3, ...}. For any two countably infinite sets A = {a_0, a_1, a_2, ...} and B = {b_0, b_1,
b_2, ...}, we can list the elements of the union in the same way, {a_0, b_0, a_1, b_1, ...},
except that any x ∈ A ∩ B should be included only once in the list. The conclusion
is that the union of two countably infinite sets is countably infinite.
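The alternating list {a_0, b_0, a_1, b_1, ...} is easy to produce mechanically. Here is a minimal Python sketch (illustrative only; the generator names are ours), using Z as the example:

from itertools import count

def interleave(gen_a, gen_b):
    # List the union of two countably infinite sets, given as generators,
    # as a0, b0, a1, b1, ..., listing each element only once.
    seen = set()
    for a, b in zip(gen_a, gen_b):
        for x in (a, b):
            if x not in seen:
                seen.add(x)
                yield x

# Z as the union of {0, 1, 2, ...} and {-1, -2, -3, ...}:
z = interleave(count(0), count(-1, -1))
print([next(z) for _ in range(7)])   # [0, -1, 1, -2, 2, -3, 3]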
A more dramatic result, which provides a large class of examples, is the following, often
expressed informally by saying that a countable union of countable sets is countable.

Theorem 10.13
If S_0, S_1, S_2, ... are countable sets, then the union S = ⋃_{n=0}^∞ S_n is also countable.

Figure 10.4 | Listing the elements of ⋃_{n=0}^∞ S_n. The entry a_{m,n} in row m, column n of the array is the nth element of S_m; the path loops through the array one diagonal at a time.

Proof Write the elements of S_n as the nth row of an infinite array, as in Figure 10.4. Following the path shown there, which traverses the successive diagonals of the array, produces a list of the elements of S = ⋃_{n=0}^∞ S_n, provided that any element appearing more than once in the array is included only once in the list. ∎

Since each of the sets ⋃_{n=0}^k S_n is a subset of S, we can also conclude from Lemma
10.2 that a finite union of countable sets is countable.
Theorem 10.13 can be interpreted as saying that any uncountable set A must be
much, much bigger than any countable set B, because even the union of countably
many sets the same size as B is still countable (the same size as B) and therefore
smaller than A.

EXAMPLE 10.6  N × N Is Countable


Let S = N × N, the set of all ordered pairs of natural numbers. It follows easily from Theorem
10.13 that S is countable, since
    N × N = {(i, j) | i, j ≥ 0}
          = {(0, 0), (0, 1), (0, 2), ...}
          ∪ {(1, 0), (1, 1), (1, 2), ...}
          ∪ {(2, 0), (2, 1), (2, 2), ...}
          ∪ ...

We can also write
    N × N = ⋃_{m=0}^∞ ({m} × N)
and each of the sets {m} × N is countable, since the function f_m : N → {m} × N defined by
f_m(n) = (m, n) is a bijection. (Note that {m} × N is the set of elements in the mth row of Figure
10.4.) However, we can be more explicit about the bijection f from N to N × N illustrated
in Figure 10.4 and described in the proof of Theorem 10.13. Let us consider the inverse of the
bijection f, the function f^{-1} : N × N → N, and give the formula for f^{-1}(m, n)—in other
words, the formula that enumerates the pairs in N × N.
We refer again to the path shown in Figure 10.4. Let j ≥ 0; as the path loops through the
array the jth time, it hits all the pairs (m, n) for which m + n = j, and there are j + 1 such
pairs. Furthermore, for a specific pair (m, n) with m + n = j, there are m other pairs (p, q)
with p + q = j that get hit by the path before this one. Therefore, the total number of pairs
preceding (m, n) in the enumeration is
    1 + 2 + ··· + (m + n − 1) + (m + n) + m = (m + n)(m + n + 1)/2 + m

(see Example 2.7). This is just another way of saying that


    f^{-1}(m, n) = (m + n)(m + n + 1)/2 + m
The function f is often referred to as a pairing function and is useful in a number of counting
arguments.
The argument used in this example can be modified easily to show that for any two
countable sets S and T, S × T is also countable.
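Both f^{-1} and f itself are easily computed. The following Python sketch is an illustration (the names pair and unpair are ours); pair implements the formula for f^{-1} just derived, and unpair inverts it by locating the correct diagonal:

def pair(m, n):
    # f^{-1}(m, n): the position of (m, n) in the enumeration of N x N.
    # Diagonals 0, ..., m+n-1 contribute 1 + 2 + ... + (m+n) pairs,
    # and (m, n) is preceded by m pairs on its own diagonal.
    return (m + n) * (m + n + 1) // 2 + m

def unpair(k):
    # f(k): the kth pair in the enumeration. Find the diagonal j with
    # j(j+1)/2 <= k < (j+1)(j+2)/2, then read off the offset along it.
    j = 0
    while (j + 1) * (j + 2) // 2 <= k:
        j += 1
    m = k - j * (j + 1) // 2
    return (m, j - m)

print([unpair(k) for k in range(6)])
# [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0)]
print(pair(1, 1))   # 4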

EXAMPLE 10.7  Languages Are Countable Sets


For any finite alphabet Σ, the set Σ* of all strings over Σ is countable. To see this, we write
    Σ* = ⋃_{n=0}^∞ Σ^n
where Σ^n is the set of strings over Σ of length n. Since Σ^n is finite and therefore countable, it
follows from Theorem 10.13 that Σ* is also countable. In the simple case when Σ = {a, b},
one way of listing the elements of Σ* is to use the canonical order:
    {a, b}* = {Λ, a, b, aa, ab, ba, bb, aaa, aab, ...}

Finally, since a language L over Σ is a subset of Σ*, Lemma 10.2 implies that L is countable.
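Canonical order is also simple to generate. In this Python sketch (illustrative; the empty string plays the role of Λ), strings appear shorter ones first, and strings of the same length in alphabetical order:

from itertools import count, islice, product

def canonical_order(alphabet=("a", "b")):
    # Enumerate all strings over the alphabet in canonical order.
    for n in count(0):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

print(list(islice(canonical_order(), 9)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb', 'aaa', 'aab']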

EXAMPLE 10.8  The Set of Recursively Enumerable Languages Is Countable
Let 𝒯 be the set of all Turing machines, and RE the set of all recursively enumerable languages.
Here we are following our convention that all the states of a Turing machine are elements of
the fixed set Q and all the tape symbols are elements of the set S. In particular, the recursively
enumerable languages all involve alphabets that are subsets of S. It is possible to use Theorem
10.13 directly to show that 𝒯 is countable; see Exercise 10.29. Instead, however, we use the

encoding function e : 𝒯 → {0, 1}* described in Section 9.6. The only property of e that we
need here is that it is one-to-one and therefore a bijection from 𝒯 to some subset of {0, 1}*.
Since {0, 1}* is countable, any subset is, and therefore 𝒯 is.
Now it is simple to show that RE is countable as well. By definition, a recursively
enumerable language L can be accepted by some Turing machine. For each L, let t(L) be such
a TM. The result is a function t from RE to 𝒯, and since a Turing machine accepts precisely
one language, t is one-to-one. Since 𝒯 is countable, the same argument we used above shows
that RE is also.

Example 10.8 provides half of the result we are looking for. Now that we have
shown the set of recursively enumerable languages to be countable, proving that there
are uncountably many languages (i.e., that the set of languages is uncountable) will
show that there must be non-recursively-enumerable languages.
As our first example of an uncountable set, however, we consider the set R of real
numbers. (Theorem 10.14 actually says that a subset of R is uncountable, from which
it follows that R itself is.) The proof is due to the nineteenth-century mathematician
Georg Cantor. It is a famous example of a diagonal argument; although the logic is
similar to the proof we will give for the set of languages, this proof may be a little
easier to understand.

Theorem 10.14
The set of real numbers x satisfying 0 ≤ x < 1 is uncountable.

Proof
Every such x has an infinite decimal expansion x = 0.a_0a_1a_2 ..., and the
expansion is unique if we agree never to use one ending in an infinite string
of 9’s. Suppose for the sake of contradiction that the set is countable, so that
its elements can be listed as r_0, r_1, r_2, ..., where r_i = 0.a_{i,0}a_{i,1}a_{i,2} ....
Let x = 0.x_0x_1x_2 ..., where each digit x_i is chosen so that x_i ≠ a_{i,i} and
x_i ≠ 9. Then x is in the set, its expansion does not end in 9’s, and for every i
it differs from the chosen expansion of r_i in the ith digit; therefore x ≠ r_i for
every i, contradicting the assumption that every element appears in the list. ∎

Note that in the proof, although finding one number not in the list is enough to
obtain a contradiction, there are obviously many more. (The particular choice of x_i is
arbitrary, as long as it is neither a_{i,i} nor 9.) Saying that a set is uncountable means that
no matter what scheme you use to list elements, when you finish you will inevitably
find that you have left most of the elements out.

Theorem 10.15
The set 2^N of all subsets of N is uncountable.

Proof
Suppose for the sake of contradiction that 2^N is countable, so that its
elements can be listed as A_0, A_1, A_2, .... Now we simply let A be the
subset of N defined by
    A = {i ∈ N | i ∉ A_i}
For every i, A differs from A_i with respect to the number i, because i ∈ A
if and only if i ∉ A_i. Therefore, A is a subset of N that does not appear in
the list, and we have a contradiction. ∎
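The construction in the proof can be phrased computationally. In the Python sketch below (an illustration; the two-argument predicate representing the list of sets is our own device), member(i, j) tells whether j ∈ A_i, and the returned predicate describes the diagonal set A:

def diagonal(member):
    # member(i, j) is True exactly when j is in the set A_i.
    # The result describes A = {i : i not in A_i}, which differs
    # from each A_i with respect to the number i.
    return lambda i: not member(i, i)

# Example list of sets: A_i = the set of multiples of i + 1.
A = diagonal(lambda i, j: j % (i + 1) == 0)
print([n for n in range(8) if A(n)])   # [1, 2, 3, 4, 5, 6, 7]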

The statement we have shown in this theorem is that 2^N is uncountable, which
is another way of saying that there is no bijection from N to 2^N. We should note that
a more general statement is true: For any set S, the set 2^S is bigger than S (Exercise
10.47). This statement is what allows us to say, as we noted earlier, that there are
actually many different sizes of infinite sets.

Corollary 10.1 The set of languages over {0, 1} that are not recursively enumerable
is uncountable. In particular, there exists at least one such language.
Proof The corollary follows from Theorem 10.15, from the countability of the set
of recursively enumerable languages (Example 10.8), and from the fact that if S is
uncountable and S_1 ⊆ S is countable, S − S_1 is uncountable (Exercise 10.27). ∎

The proofs of Theorems 10.14 and 10.15 are nonconstructive, and the diagonal
argument in these proofs appears to be closely associated with proof by contradiction.
However, as we mentioned at the beginning of this section, we will construct an
example of a non-recursively-enumerable language by using a diagonal argument
that parallels these two very closely.
The results in Example 10.8 and Corollary 10.1 say that the set of languages
that are not recursively enumerable is much bigger than the set of languages that are.
Nevertheless, you might wonder how significant this conclusion is in the theory of
computation. The nonconstructive proof does not shed any light on what aspects of
a language might make it impossible for a TM to accept it. The same proof, in fact,

Figure 10.5 | The array used in the diagonal argument: the entry in row i, column j records whether j ∈ A_i.

shows that there must also be languages we cannot even describe precisely (because a
precise description can be represented by a string, and the set of strings is countable).
Maybe any language we might ever wish to study, or any language we can describe
precisely, can be accepted by a TM; if this is true, then Corollary 10.1 is more of a
curiosity than a negative result. At this stage in our discussion, such a possibility is
conceivable. In the next chapter, however, we will see that things do not turn out
this way.

EXERCISES
10.1. Show that if L_1 and L_2 are recursive languages, then L_1 ∪ L_2 and L_1 ∩ L_2
are also recursive.
10.2. Consider the following alternative approach to the proof of Theorem 10.2.
Given TMs T_1 and T_2 accepting L_1 and L_2, respectively, a one-tape
machine is constructed to simulate these two machines sequentially. The
tape Δx is transformed to Δx#Δx. T_1 is then simulated, using the second
copy of x as input and using the marker # to represent the end of the tape.
If and when T_1 stops, either by accepting or crashing, the tape is erased
except for the original input, and T_2 is simulated.
a. Can this approach be made to work in order to show that the union of
recursively enumerable languages is recursively enumerable? Why?
b. Can this approach be made to work in order to show that the intersection
of recursively enumerable languages is recursively enumerable? Why?
10.3. Is the following statement true or false? If L_1, L_2, ... are any recursively
enumerable subsets of Σ*, then ⋃_{i=1}^∞ L_i is recursively enumerable. Give
reasons for your answer.
10.4. Suppose L_1, L_2, ..., L_k form a partition of Σ*; in other words, their union
is Σ* and any two are disjoint. Show that if each L_i is recursively
enumerable, then each L_i is recursive.
10.5. Prove Theorem 10.7, which says that a language is recursive if and only if
there is a Turing machine enumerating it in canonical order.
10.6. Suppose L ⊆ Σ*. Show that L is recursively enumerable if and only if
there is a computable partial function from Σ* to Σ* that is defined
precisely at the points of L.
10.7. The proof of Theorem 10.2 involves a result sometimes known as the
“infinity lemma,” which can be formulated in terms of trees. (See the
discussion in the proof of Theorem 9.2 of the computation tree
corresponding to a nondeterministic TM.) Show that if every node in a tree
has at most a finite number of children, and there is no infinite path in the
tree, then the tree is finite, which means that there must be a longest path
from the root to a leaf node. (Here is the beginning of a proof. Suppose for
the sake of contradiction that there is no longest path. Then for any n, there
is a path from the root with more than n nodes. This implies that the root

node has infinitely many descendants. Since the root has only a finite
number of children, at least one of its children must also have infinitely
many descendants.)
10.8. Describe algorithms to enumerate these sets. (You do not need to discuss
the mechanics of constructing Turing machines to execute the algorithms.)
a. The set of all pairs (n, m) for which n and m are relatively prime
positive integers (relatively prime means having no common factor
bigger than 1)
b. The set of all strings over {0, 1} that contain a nonnull substring of the
form www
c. {n ∈ N | for some positive integers x, y, and z, x^n + y^n = z^n}
10.9. In Definition 10.2, the strings x_i appearing on the output tape of T are
required to be distinct. Show that if L can be enumerated in the weaker
sense, in which this requirement is dropped, then L is recursively
enumerable.
10.10. Show that if f : N → N is computable and strictly increasing, then the
range of f (or the set of strings representing elements of the range of f) is
recursive.
10.11. In each case, describe the language generated by the unrestricted grammar
with the given productions. The symbols a, b, and c are terminals, and all
other symbols are variables.
a.
S → LaR   L → LD | Λ
Da → aaD   DR → R   R → Λ

b.
S → LaR   L → LD | LT | Λ   Da → aaD   Ta → aaaT
DR → R   TR → R   R → Λ

c.
S → ABCS | ABC
AB → BA   AC → CA   BC → CB
BA → AB   CA → AC   CB → BC
A → a   B → b   C → c

d.
S → LA∗R   A → a
L → LI   IA → AI   I∗ → A∗IJ   IR → A∗R
JA → AJ   J∗ → ∗J   JR → AR
LA → EA   EA → AE   E∗ → E   ER → Λ

10.12. Consider the unrestricted grammar with the following productions.
S → TD_1D_2   T → ABCT | Λ
AB → BA   BA → AB   CA → AC   CB → BC
CD_1 → D_1C   CD_2 → D_2a   BD_1 → D_1b
A → a   D_1 → Λ   D_2 → Λ
a. Describe the language generated by this grammar.
b. Find a single production that could be substituted for BD_1 → D_1b so
that the resulting language would be
{xa^n | n ≥ 0, |x| = 2n, and n_a(x) = n_b(x) = n}

10.13. Find unrestricted grammars to generate each of the following languages.
a. {a^n b^n a^n b^n | n ≥ 0}
b. {a^n x b^n | n ≥ 0, x ∈ {a, b}*, |x| = n}
c. {ss | s ∈ {a, b}*}
d. {sss | s ∈ {a, b}*}
10.14. In Example 10.3, trace the moves of Simulate as it simulates the derivation
of the string abba. Show the state and tape contents at each step.
10.15. Suppose a nondeterministic TM is constructed as in the proof of Theorem
10.8 to accept L(G), where G is the grammar in Exercise 10.11(a). Draw
the Simulate portion of the TM.
10.16. In the grammar in Example 10.4, give a derivation for the string abba.
10.17. Find a context-sensitive grammar generating the language
{ss | s ∈ {a, b}*}.
10.18. Find CSGs equivalent to each of the grammars in Exercise 10.11.
10.19. Show that if L is any recursively enumerable language, then L can be
generated by a grammar in which the left side of every production is a
string of one or more variables.
10.20. Show by examples that the constructions in the proof of Theorem 6.1 do
not work to show that the class of recursively enumerable languages is
closed under concatenation and Kleene *, or that the class of CSLs is closed
under concatenation.
10.21. Show that if for some positive integer k, there is a nondeterministic TM
accepting L so that for any input x, the tape head never moves past square
k|x|, then L − {Λ} is a context-sensitive language.
10.22. Show that if G is an unrestricted grammar generating L, and there is an
integer k so that for any x ∈ L, every string appearing in a derivation of x
has length < k|x|, then L is recursive.
10.23. In the proof of Theorem 10.11, the CSG productions corresponding to an
LBA move of the form δ(p, a) = (q, b, R) are given. Give the productions
corresponding to the move δ(p, a) = (q, b, L) and those corresponding to
the move δ(p, a) = (q, b, S).

10.24. Show that any subset of a countable set is countable.


10.25. By definition, a set S is finite if for some natural number n, there is a
bijection from S to {i ∈ N | 1 ≤ i ≤ n}. An infinite set is one that is not
finite. Show that a set S is infinite if and only if there is a bijection from S
to some proper subset of S. (Lemma 10.1 might be helpful.)
10.26. Saying that a property is preserved under bijection means that if a set S has
the property and f : S → T is a bijection, then T also has the property.
Show that both countability and uncountability are preserved under
bijection.
10.27. Show that if S is uncountable and T is countable, then S − T is uncountable.
10.28. Let Q be the set of all rational numbers, or fractions, negative as well as
nonnegative. Show that Q is countable by describing explicitly a bijection
from N to Q.
10.29. In Example 10.8, the encoding function e was used to show that the set of
Turing machines is countable. Show the same thing without using e, by
applying Theorem 10.13 directly.
10.30. Let S be the set of all infinite sequences of 0’s and 1’s.
a. Describe a bijection from S to the set 2^N. It follows from Theorem
10.15, then, that S is uncountable.
b. Show directly, using a diagonal argument, that S is uncountable. Begin
by supposing that there is a list s_0, s_1, ... of the elements of S, and find
an s ∈ S that is not in the list. Convince yourself, using your solution to
part (a), that this proof is essentially the same as the proof given for
Theorem 10.15.
10.31. In each case, determine whether the given set is countable or uncountable.
Prove your answer.
a. The set of all three-element subsets of N
b. The set of all finite subsets of N
c. The set of all finite partitions of N (A finite partition of N is a set of
nonempty subsets A_1, A_2, ..., A_k, so that any two are disjoint and
⋃_{i=1}^k A_i = N.)
d. The set of all functions from N to {0, 1}
e. The set of all functions from {0, 1} to N
f. The set of all functions from N to N
g. The set of all nonincreasing functions from N to N
h. The set of all regular languages over {0, 1}
i. The set of all context-free languages over {0, 1}
10.32. We know that 2^N is uncountable. Give an example of a set S ⊆ 2^N so that
both S and 2^N − S are uncountable.
10.33. Show that the set of languages L over {0, 1} so that neither L nor L’ is
recursively enumerable is uncountable.

MORE CHALLENGING PROBLEMS


10.34. Suppose L is recursively enumerable but not recursive. Show that if T is a
TM accepting L, there must be infinitely many input strings for which T
loops forever.
10.35. Sketch a proof that the class of recursively enumerable languages is closed
under the operations of concatenation and Kleene *. (Use nondeterminism.)
10.36. Canonical order is a specific way of ordering the strings in Σ*, and its use
in Theorem 10.7 is somewhat arbitrary. By an ordering of Σ*, we mean
simply a bijection from the set of natural numbers to Σ*. For any such
bijection f, and any language L ⊆ Σ*, let us say that “L can be
enumerated in order f” means that the order of the enumeration is the same
as the order induced on L by f—in other words, there is a TM T
enumerating L, and if x_i is the ith string appearing on the output tape of T,
the sequence f^{-1}(x_i) of natural numbers is increasing.
For an arbitrary ordering f of Σ*, let E(f) be the statement “For any
L ⊆ Σ*, L is recursive if and only if it can be enumerated in order f.” For
exactly which types of orderings f is E(f) true? Prove your answer.
10.37. Let f : {0, 1}* → {0, 1}* be a partial function. Let g(f), the graph of f, be
the language {x#f(x) | x ∈ {0, 1}*}. Show that f can be computed by a
Turing machine if and only if the language g(f) is recursively enumerable.
10.38. Suppose L ⊆ Σ*. Show that L is recursively enumerable if and only if
there is a computable partial function from Σ* to Σ* whose range is L.
10.39. This exercise is taken from Dowling (1989). It has to do with an actual
computer, which is assumed to use some fixed operating system under
which all its programs run. A “program” can be thought of as a function
from strings to strings: it takes one string as input and produces another
string as output. On the other hand, a program written in a specific
language can be thought of as a string itself.
By definition, a program P spreads a virus on input x if running P
with input x causes the operating system to be altered. It is safe on input x
if this doesn’t happen, and it is safe if it is safe on every input string. A
virus tester is a program IsSafe that when given the input Px, where P is a
program and x is a string, produces the output “YES” if P is safe on input x
and “NO” otherwise. (We make the assumption that in a string of the form
Px, there is no ambiguity as to where the program P stops.)
Prove that if there is the actual possibility of a virus—that is, there is a
program and an input that would cause the operating system to be
altered—then there can be no virus tester that is both safe and correct. Hint:
suppose there is such a virus tester IsSafe. Then it is possible to write a
program D (for diagonal) that operates as follows when given a program P
as input. It evaluates IsSafe(PP); if the result is “NO,” it prints “XXX”, and
otherwise it alters the operating system. Now consider what D does on
input D.
402 PART 4 Turing Machines and Their Languages

10.40. Show that an infinite recursively enumerable set has an infinite recursive
subset.
10.41. Find unrestricted grammars to generate each of the following languages.
a. {x ∈ {a, b, c}* | n_a(x) < n_b(x) and n_a(x) < n_c(x)}
b. {x ∈ {a, b, c}* | n_a(x) = n_b(x) = 2n_c(x)}
c. {a^n | n = j(j + 1)/2 for some j ≥ 1} (Suggestion: if a string has j
groups of a’s, the ith group containing i a’s, then you can create j + 1
groups by adding an a to each of the j groups and adding a single extra
a at the beginning.)
10.42. Suppose G is a context-sensitive grammar. In other words, for every
production α → β of G, |β| ≥ |α|. Show that there is a grammar G′, with
L(G′) = L(G), in which every production is of the form
αAβ → αXβ
where A is a variable and α, β, and X are strings of variables and/or
terminals, with X not null.
10.43. A context-sensitive grammar is said to be in Kuroda normal form if each of
its productions takes one of the four forms A → a, A → B, A → BC, or
AB → CD, where a is a terminal and the uppercase letters are variables.
Show that every CSL can be generated by a grammar in Kuroda normal
form.
10.44. Use the LBA characterization of context-sensitive languages to show that
the class of CSLs is closed under union, intersection, and concatenation,
and that if L is a CSL, so is L⁺.
10.45. Suppose G_1 and G_2 are unrestricted grammars generating L_1 and L_2,
respectively.
a. By modifying G_1 and G_2 if necessary, find an unrestricted grammar
generating L_1L_2.
b. By modifying the procedure described in the proof of Theorem 6.1, find
an unrestricted grammar generating L_1*.
c. Adapt your answer to part (b) to show that if L_1 is a CSL, then L_1⁺ is
also.
10.46. In the proof of Theorem 10.15, we assumed that the elements of 2^N were
A_0, A_1, ..., and constructed a set A not in the list by letting
A = {i | i ∉ A_i}. Starting with the same list, find a different formula for a
set B not in the list.
10.47. The two parts of this exercise show that for any set S (not necessarily
countable), 2^S is larger than S. It follows that there are infinitely many
“orders of infinity.”
a. For any S, describe a simple bijection from S to a subset of 2^S.
b. Show that for any S, there is no bijection from S to 2^S. (You can copy
the proof of Theorem 10.15, as long as you avoid trying to list the
elements of S or making any reference to the countability of S.)

10.48. In each case, determine whether the given set is countable or uncountable.
Prove your answer.
a. The set of all real numbers that are roots of integer polynomials; in other
words, the set of real numbers x so that, for some nonnegative integer n
and some integers a_0, a_1, ..., a_n, x is a solution to the equation
    a_0 + a_1x + a_2x^2 + ··· + a_nx^n = 0
b. The set of all nondecreasing functions from N to N
c. The set of all functions from N to N whose range is finite
d. The set of all nondecreasing functions from N to N whose range is
finite (i.e., all “step” functions)
e. The set of all periodic functions from N to N (A function f : N → N
is periodic if, for some positive integer P_f, f(x + P_f) = f(x) for
every x.)
f. The set of all eventually periodic functions from N to N (A function
f : N → N is eventually periodic if, for some positive P_f and for
some N, f(x + P_f) = f(x) for every x > N.)
g. The set of all eventually constant functions from N to N (A function
f : N → N is eventually constant if, for some C and for some N,
f(x) = C for every x ≥ N.)
10.49. We have said that a set A is larger than a set B if there is a bijection from B
to a subset of A, but no bijection from B to A, and we have proceeded to
use this terminology much as we use the < relation on the set of numbers.
What we have not done is to show that this relation satisfies the same
essential properties that < does.
a. The Schröder-Bernstein Theorem asserts that if A and B are sets and
there are bijections f from A to a subset of B and g from B to a subset
of A, then there is a bijection from A to B. Prove this statement. Here
is a suggested approach. An ancestor of a ∈ A is a point b ∈ B so that
g(b) = a, or a point a_1 ∈ A so that g(f(a_1)) = a, or a point b_1 ∈ B so
that g(f(g(b_1))) = a, etc. In other words, an ancestor of a ∈ A is a
point x in A or B so that by starting at x and continuing to evaluate the
two functions alternately, we arrive at a. If g(f(g(b))) = a and b has
no ancestors, for example, we will say that a has the three ancestors
f(g(b)), g(b), and b. Note that we describe the number of ancestors as
three, even in the case that f(g(b)) = b or g(b) = a. In this way, A
can be partitioned into three sets A_0, A_1, and A_∞; A_0 is the set of
elements of A having an even (finite) number of ancestors, A_1 is the set
of elements having an odd number, and A_∞ is the set of points having
an infinite number of ancestors. Ancestors of elements of B are defined
the same way, and we can partition B similarly into B_0, B_1, and B_∞.
Show that there are bijections from A_0 to B_1, from A_1 to B_0, and from
A_∞ to B_∞.

b. Show that the “larger-than” relation on sets is transitive. In other


words, if there is a bijection from A to a subset of B but none from A to
B, and a bijection from B to a subset of C but none from B to C, then
there is a bijection from A to a subset of C but none from A to C.
c. Show that the larger-than relation is asymmetric. In other words, if A is
larger than B, then B cannot be larger than A.
d. Show that countable sets are the smallest infinite sets in both possible
senses: Not only are uncountable sets larger than countable sets, but no
infinite set can be smaller than a countable set.
10.50. Let I be the unit interval [0, 1], the set of real numbers between 0 and 1.
Let S = I × I, the unit square. Use the Schröder-Bernstein theorem (see
the previous exercise) to show that there is a bijection from I to S. One
way is to use infinite decimal expansions as in the proof of Theorem 10.14.
10.51. Show that A is bigger than B if and only if there is a one-to-one function
from B to A but none from A to B. (One way is easy; for the other,
Exercise 10.49 will be helpful.)
Unsolvable Problems and
Computable Functions

According to the Church-Turing thesis, any algorithm can be programmed on a
Turing machine. For this reason, any problem that cannot be solved by a Turing
machine can legitimately be called unsolvable.
Having used a diagonal argument in Part IV to establish the existence of languages
that no Turing machine can accept, we use a similar argument to produce an example,
which begins our study of unsolvable problems. In Chapter 11, we consider a general
method of reducing one language, or one decision problem, to another. The result
is a large class of unsolvable problems, including problems having to do with TMs
themselves, problems involving context-free languages, and others that can be stated
in general combinatorial terms.
After looking at problems that cannot be solved, we return to those that can and try
to characterize them independently of Turing machines. In Chapter 12, we prove that
the functions computable by Turing machines are precisely those that can be obtained
by beginning with certain initial functions and applying three operations: composition
and two new operations, primitive recursion and unbounded minimalization. Finally,
we mention a few other formulations of computability. We can see why they are
equivalent to the Turing-machine formulation, and we can view this equivalence as
further evidence that the Church-Turing thesis is correct.

CHAPTER 11

Unsolvable Problems

11.1 | A NONRECURSIVE LANGUAGE AND AN UNSOLVABLE PROBLEM
The set of recursively enumerable languages is countable and the set of non-recur-
sively-enumerable languages is uncountable (Example 10.8 and Corollary 10.1). If
we could somehow choose a language at random, it would almost certainly not be
recursively enumerable. Although this observation does not make it obvious how to
find a specific language L no TM can accept, we can find one by using the same type
of diagonal argument as in Chapter 10.
What will be the defining property of an element of L? In the diagonal argument
in Theorem 10.15, starting with sets A_i, we obtained a set A different from each A_i
in that
    i ∈ A if and only if i ∉ A_i
Here we want to show that for every Turing machine 7, our language L is different
from L(T), the language accepted by T. An analogous approach, therefore, is first
to associate a string x_T to each TM T (in the same way that a natural number i is
associated with a set A_i), and then to let
    x_T ∈ L if and only if x_T ∉ L(T)

In other words, force x_T to be accepted by any TM accepting L precisely if it is not
accepted by T.
One way to associate strings with TMs is simply to list all the strings in {0, 1}*
(x_0, x_1, ...), list all the TMs (T_0, T_1, ...), and let the string associated with T_i be x_i.
In our previous example the assumption that it was possible to list all the subsets of
N was made in order to obtain a contradiction. Here, we can list the TMs, because
the set of all of them is countable. We might then consider the set L of all strings
x_i for which T_i does not accept x_i. Although this version of the diagonal argument
works, there will be some advantage later on in associating a string with a TM T in a

less arbitrary way. A natural choice for x_T is e(T), the string that describes T in our
encoding scheme. This approach makes it unnecessary to use countability explicitly,
since we do not need any particular ordering of either the set of TMs or the set of
strings.

Definition 11.1 The Languages NSA and SA

    NSA = {e(T) | T is a TM and e(T) ∉ L(T)}
    SA = {e(T) | T is a TM and e(T) ∈ L(T)}

For any TM T, the string e(T) is an element of NSA if and only if it is not
accepted by T. For T to accept NSA it must be true that e(T) ∈ NSA if and
only if e(T) ∈ L(T); by the definition of NSA, however, e(T) ∈ NSA if and
only if e(T) ∉ L(T). Therefore, no Turing machine T can accept the language
NSA, and NSA is not recursively enumerable.

The two languages SA and NSA are almost the complements of each other. More
precisely, {0, 1}* is the union of the three disjoint sets SA, NSA, and E′, where E =
SA ∪ NSA, the set of all strings of the form e(T) for some TM T. The following
simple result will make it easy to draw the conclusions we want from this fact.
Lemma 11.1 The language E = {e(T) | T is a TM} is recursive.

Proof The encoding function e is described in Section 9.6. It is easy to check that
a string x of 0’s and 1’s is in E if and only if it satisfies these conditions:

1. x corresponds to the regular expression 0000*1((0*1)^5)*, so that the substring
following the first 1 can be viewed as a sequence of 5-tuples.
2. For two distinct 5-tuples y and z in the string x, the first two of the five parts of
y cannot be the same as the first two of the five parts of z. (A deterministic TM
cannot have two distinct moves for the same state-symbol combination.)
3. None of the 5-tuples in x can have first part 0 or 00. (A TM cannot move from a
halt state.)
4. The last part of each 5-tuple must be 0, 00, or 000 (representing one of the three
directions).

Any string satisfying these conditions represents a TM, whether or not it carries out
any meaningful computation. There is an algorithm to take an arbitrary element of
{0, 1}* and determine the truth or falsity of each condition, and it is not difficult to
implement such an algorithm on a Turing machine.

Of the three languages NSA, SA, and E, we now know that the first is not recur-
sively enumerable and therefore not recursive, and the third is recursive. It follows
from the formula NSA = SA′ ∩ E that SA cannot be recursive (Theorem 10.4 and Ex-
ercise 10.1). However, although the definitions of SA and NSA are obviously similar,
the first language does turn out to be recursively enumerable.

Theorem 11.2
The language SA is recursively enumerable but not recursive.

Proof
It remains only to show that SA is recursively enumerable. The intuitive
reason NSA is not recursively enumerable is that a TM T for which e(T) is
not in L(T) may fail to accept e(T) simply by looping forever, and there is no
general way to detect such a computation; for SA, on the other hand, every
string in the language is eventually accepted. A TM accepting SA can work
as follows: on input x, it first decides whether x = e(T) for some TM T,
which is possible by Lemma 11.1, and rejects if not; if x = e(T), it then
simulates T on the input e(T), accepting precisely if this computation
accepts. The strings accepted by this machine are exactly those of the form
e(T) for which T accepts e(T), in other words, the elements of SA.

The three languages SA, NSA, and E′ are closely related to the decision problem
Self-accepting: Given a TM T, does T accept the string e(T)?
The three sets contain the strings representing yes-instances, no-instances, and non-
instances, respectively, of the problem.
In order to solve a general decision problem P, we start the same way, by choosing
an encoding function e so that we can represent instances I by strings e(I) over some
alphabet Σ. Let us give the names Y(P) and N(P) to the sets of strings representing
yes-instances and no-instances of P. Then, if E(P) = Y(P) ∪ N(P), we have the
third set E′(P) of strings not representing instances, just as in our first example.
Any reasonable encoding function e must be one-to-one, so that a string can
represent at most one instance of P. It must be possible to decode a string e(I) and
recover the instance I. Finally, there should be an algorithm to decide whether a

given string in Σ* represents an instance of P; in other words (since “algorithm”
means TM), the language E(P) should be recursive.
If we start with the decision problem P, solving it means answering the question
of whether an arbitrary instance I is a yes-instance of P, and the model of computation
we use is the Turing machine. The question that a TM T may be able to answer directly
is whether a string x represents a yes-instance (this is the membership problem for
the language Y(P)). These two questions sound slightly different, and it sounds
as though the second may be somewhat harder: Before we can even start thinking
about whether an instance is a yes-instance, we must first decide whether a string
represents an instance at all. However, we do not generally distinguish between the
two problems. The extra work involved in the second simply reflects the fact that TMs
require input strings encoded over the input alphabet. Because of the conditions we
require our encoding function to satisfy, we will be able to answer the first question
if and only if we can answer the second.
With this discussion, we are ready to define a solvable, or decidable, decision
problem.

Definition 11.2 Solvable Decision Problems
A decision problem P is solvable, or decidable, if the language Y(P) of
strings representing yes-instances of P is recursive, and unsolvable, or
undecidable, otherwise.

Theorem 11.3
The decision problem Self-accepting is unsolvable.

Proof
The result follows immediately from Definition 11.2 and Theorem 11.2. ∎

As we have seen, the language SA happens to be recursively enumerable, whereas


NSA is not. However, neither language is recursive, and the two can be thought of
as representing essentially the same unsolvable problem. (For any problem P, just
as in our example, if either Y(P) or N(P) fails to be recursively enumerable, then
neither language can be recursive.) The strings in NSA represent no-instances of Self-
accepting, or yes-instances of the complementary problem: Given a TM T, does T
fail to accept e(T)? An algorithmic procedure attempting to answer either question
would eventually hit an instance (a yes-instance or a no-instance) it could not handle.
Finally, it is important to understand that a decision problem can be unsolvable
and still have many instances for which answers are easy to find. There are obviously
many TMs T for which we can decide whether T accepts e(T). What makes Self-

accepting unsolvable is that there is no single algorithm guaranteed to produce the


correct answer for every instance.

11.2 | REDUCING ONE PROBLEM TO ANOTHER: THE HALTING PROBLEM
Our first example of an unsolvable decision problem is the problem we have called
Self-accepting. We used a diagonal argument to show it is unsolvable, and the
characteristic circularity of this argument accounts for the convoluted nature of the
problem itself. Once we have one unsolvable problem, however, others will be easier
to find, including some that sound more natural and whose significance may be a little
more obvious. For the time being, the unsolvable problems we obtain will continue
to be problems about Turing machines; later in this chapter we will see some that are
more diverse.
If we can establish that one decision problem, P_1, can be reduced to another,
P_2, or that having a general solution to P_2 would guarantee a general solution to P_1,
then it is reasonable to say informally that P_1 is no harder than P_2. It should then
follow that if P_2 is solvable, P_1 is solvable (or equivalently, if P_1 is unsolvable, P_2 is
unsolvable).
Let us start with two examples involving finite automata. First, let P_1 be the
problem: Given a nondeterministic finite automaton M and a string x, does M accept
x? The presence of nondeterminism means that a solution algorithm cannot be as
straightforward as “run M on x and see what happens.” However, the subset con-
struction provides an algorithm (see the proof of Theorem 4.1) to take an NFA and
produce an equivalent FA. We may therefore reduce P_1 to the problem P_2, for which
the straightforward approach does work: Given an FA M and a string x, does M
accept x? Let us look more carefully at the steps involved when we solve P_1 this
way. We start with an arbitrary instance I of P_1: a pair (M, x), where M is an NFA
and x is a string that might or might not be accepted by M. We answer the question by
computing an instance F(I) of P_2, consisting of a pair (N, y), where N is an FA and
y is a string. In this case, N is the FA produced by the subset-construction algorithm,
accepting the same language as M, and y is simply x. Determining whether y is
accepted by N tells us whether x is accepted by M.
For the second example, let P_1 be the problem: Given two FAs M_1 and M_2, is
L(M_1) ⊆ L(M_2)? As we described in Section 5.4, we can reduce this problem to P_2:
Given an FA M, is L(M) = ∅? The reduction consists of starting with I = (M_1, M_2),
an instance of P_1, and computing an instance F(I) = M of P_2 by letting M be an FA
accepting L(M_1) − L(M_2) (see Theorem 3.4). Since L(M_1) ⊆ L(M_2) if and only if
L(M_1) − L(M_2) = ∅, F(I) is a yes-instance of P_2 if and only if I is a yes-instance
of P_1; in other words, the answers for the two instances are the same.
In both these examples, the two crucial aspects of the reduction are: first, that we
be able to carry out the computation of F(I), given I; second, that the answer to the
first question involving I be the same as the answer to the second question involving
F(I).

Definition 11.3 Reducing One Decision Problem to Another
If P_1 and P_2 are decision problems, we say P_1 is reducible to P_2 (written
P_1 ≤ P_2) if there is an algorithmic procedure that allows us, given an
arbitrary instance I of P_1, to find an instance F(I) of P_2 so that for every I,
the answers for the two instances I and F(I) are the same.

The situation in which it is easiest to say precisely what the phrase algorithmic
procedure means is that in which P_1 and P_2 are the membership problems for two
languages L_1 ⊆ Σ_1* and L_2 ⊆ Σ_2*, respectively. In this case an instance of P_1 is a
string x ∈ Σ_1* and an instance of P_2 is a string y ∈ Σ_2*. Finding a y for each x means
computing a function from Σ_1* to Σ_2*, and this can be done directly by a TM. It makes
sense in this case to talk about reducing the first language to the second.

Definition 11.4 Reducing One Language to Another
If L_1 ⊆ Σ_1* and L_2 ⊆ Σ_2*, we say L_1 is reducible to L_2 (written L_1 ≤ L_2)
if there is a computable function f : Σ_1* → Σ_2* so that for every x ∈ Σ_1*,
    x ∈ L_1 if and only if f(x) ∈ L_2
(This type of reducibility is sometimes called many-one reducibility.)

If L_1 ≤ L_2, being able to solve the membership problem for L_2 allows us to
solve the problem for L_1, as follows. If we have a string x ∈ Σ_1* and we want to
decide whether x ∈ L_1, we can answer the question indirectly by computing f(x)
and deciding whether that string is in L_2. The answers to the two questions are the
same, because x ∈ L_1 if and only if f(x) ∈ L_2.
In a more general situation, as we discussed in the previous section, we can
normally identify the decision problems P_1 and P_2 with the membership problems
for the corresponding languages Y(P_1) and Y(P_2), assuming that we have appropriate
encoding functions. This means in particular that the statement P_1 ≤ P_2 is equivalent
to the statement Y(P_1) ≤ Y(P_2) (see Exercises 11.23–11.25). In the proof of Theorem
11.5, we discuss the reduction used in the proof, both at the level of problem instances,
which is normally a little easier to think about, and at the level of languages, which
allows us to be a little more precise. After that we will normally stick to the one most
directly applicable.
The most obvious reason for thinking about reductions is that we might be able
to solve a problem P by reducing it to another problem Q that we already know
how to solve. However, it is important to separate the idea of reducing one problem
to another from the question of whether either of the problems can be solved. The
specific reason for discussing reductions in this chapter is to obtain more examples
of unsolvable problems.

Theorem 11.4
If L_1 ≤ L_2 and L_2 is recursive, then L_1 is also recursive. Similarly, if P_1 and
P_2 are decision problems, P_1 ≤ P_2, and P_2 is solvable, then P_1 is solvable;
equivalently, if P_1 is unsolvable, then P_2 is also.

Proof
Let f : Σ_1* → Σ_2* be the function used in the reduction of L_1 to L_2, let T_f
be a TM computing f, and let T_2 be a TM recognizing L_2. Consider
the composite TM T_1 = T_f T_2. On input x ∈ Σ_1*, T_1 first computes
f(x), then halts with output 1 or 0, depending on whether f(x) is in L_2 or
not. The assumption that f is a reduction of L_1 to L_2 means that the output
is 1 if x ∈ L_1 and 0 otherwise, and it follows that T_1 recognizes L_1.
The second statement of the theorem follows from our previous discus-
sion of the relationship between decision problems and the corresponding
languages.
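The composite machine T_1 = T_f T_2 in the proof is simply a composition of two procedures, which can be sketched in Python as follows (illustrative only; the reduction f and the decision procedure for L_2 are assumed to be given):

def solve_via_reduction(f, decide_L2):
    # Given a computable reduction f of L1 to L2 and a decision
    # procedure for membership in L2, the composition decides L1.
    return lambda x: decide_L2(f(x))

# Small illustration with decidable languages: L1 = strings of even
# length, reduced to L2 = strings ending in "0".
f = lambda x: x + "0" if len(x) % 2 == 0 else x + "1"
decide_L1 = solve_via_reduction(f, lambda y: y.endswith("0"))
print(decide_L1("ab"), decide_L1("abc"))   # True False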

If the problem of whether a TM T accepts the string e(T) is unsolvable, we should
not expect to be able to solve the more general membership problem for recursively
enumerable languages, which we abbreviate Accepts.
Accepts: Given a TM T and a string w, is w ∈ L(T)?
It is worth mentioning once more why the obvious approach to the problem (give
the input string w to the TM T, and see what happens) is not a solution: This approach
will produce an answer only if T halts, not if it loops forever.
An instance of Accepts consists of a pair (T, w), where T is a Turing machine
and w a string. With our encoding function e, we can represent such a pair by the
string e(T)e(w), and so we consider the language
    Acc = {e(T)e(w) | w ∈ L(T)}

Theorem 11.5
Accepts is unsolvable.

Proof
The intuitive idea of the proof is the observation we have already made: If
we could decide, for an arbitrary TM T and an arbitrary string w, whether T
accepted w, then we could decide whether an arbitrary TM T accepted
e(T), and we know that this is impossible. To make this precise, we will
reduce Self-accepting to Accepts and use the last statement in Theorem
11.4. In order to show that Self-accepting can be reduced to Accepts, we
must describe an algorithm for producing an instance F(I) of Accepts from
a given instance I of Self-accepting. I is a Turing machine T, and F(I) is

to be a pair (T_1, w), where T_1 is a TM and w is a string. We want T_1 to accept
w if and only if T accepts the string e(T). The obvious choice, therefore, is
to let T_1 = T and w = e(T).
At the level of languages, the statement to be shown is that the language
Acc is not recursive, and we use the language SA. By the first part of
Theorem 11.4, it is sufficient to show SA ≤ Acc, because we know from
Theorem 11.2 that SA is not recursive. Since SA and Acc are both languages
of 0’s and 1’s, the reduction is a computable function f : {0, 1}* → {0, 1}*
so that for every x ∈ {0, 1}*,
    x ∈ SA if and only if f(x) ∈ Acc
For a string x in E (that is, x = e(T) for some TM T), we let f(x) = xe(x);
for a string x not in E, we let f(x) be some fixed string that is not in Acc.
Since E is recursive and the encoding function is computable, f is
computable. If x = e(T), then f(x) = e(T)e(e(T)) is in Acc if and only if T
accepts e(T), that is, if and only if x ∈ SA; and if x is not in E, then x is in
neither SA nor the set of strings f maps into Acc.

In this proof, the argument involving instances T and (T, x) of the two problems
seems simpler and more straightforward than the one involving languages. However,
if you compare them carefully, you can see that the key steps are almost exactly the
same. The definition f(x) = xe(x) in the second argument is simply the string
version of the definition F(T) = (T, e(T)) in the first. The details that make the
second proof more complicated have to do first with deciding whether the input string
is an instance of the problem, and then with the necessary decoding and encoding of
strings.
The most well-known unsolvable problem, the halting problem, is closely related
to the membership problem for recursively enumerable languages. For a given TM
T and a given string w, instead of asking whether T accepts w, it asks whether T
halts (by accepting or rejecting) on input w. We abbreviate the problem Halts.
Halts: Given a TM 7 and a string w, does T halt on input w?
Just as before, we can consider the corresponding language
    Halt = {e(T)e(w) | T halts on input w}

Theorem 11.6
Halts is unsolvable.

Proof
We show that Accepts can be reduced to Halts. Given an instance (T, w) of
Accepts, the reduction is to obtain a TM T_1 so that T_1 halts on input w if and
only if T accepts w; the pair (T_1, w) is then an instance of Halts having the
same answer as the instance (T, w) of Accepts. In order to accomplish this,
it is sufficient to guarantee that T_1 accepts exactly the strings that T accepts,
and computes forever in every other case. We begin by changing any move
of the form δ(p, a) = (h_r, b, D) to δ(p, a) = (p, a, S); this change means
that if T ever enters the reject state, T_1 is instead stuck in this state and on this
square forever. The reason this change is not sufficient is that T may also
halt without accepting by crashing, that is, by attempting to move its tape
head off the left end of the tape. T_1 will begin by inserting a new symbol #
in square 0, moving everything else over one square, and then moving to the
state corresponding to the initial state of T with the tape head in square 1.
Aside from the moves required to do that, T_1 has the same moves as T
(except for the modification already described), as well as the additional
moves δ(q, #) = (q, #, S) for all possible states q. The effect is that if T ever
tries to move its tape head off the left end of the tape, T_1 will instead enter
an infinite loop with its tape head on square 0.
This algorithm for obtaining T_1 from T gives us the reduction we need,
and we conclude that Halts is unsolvable.
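The first modification in the proof (turning every rejecting move into an infinite loop) amounts to a simple transformation of the transition table, sketched in Python below; here a machine's moves are represented, purely for illustration, as a dict from (state, symbol) to (state, symbol, direction), with "hr" standing for the reject state:

def loop_instead_of_reject(delta, reject_state="hr"):
    # Replace every move into the reject state by a stationary
    # self-loop, so the modified machine can never halt by rejecting.
    new_delta = {}
    for (p, a), (q, b, d) in delta.items():
        if q == reject_state:
            new_delta[(p, a)] = (p, a, "S")   # stuck here forever
        else:
            new_delta[(p, a)] = (q, b, d)
    return new_delta

# Crashes still require the second modification described above:
# the marker # in square 0 and the moves delta(q, '#') = (q, '#', 'S').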

The fact that the decision problem Self-accepting is unsolvable is interesting


because it shows that easy-to-state problems can be unsolvable; the problem itself is
sufficiently contrived that the answer for any particular instance may not be especially
significant. The membership problem for recursively enumerable languages is more
general but may still sound a little esoteric. However, the chances are that you have
thought about something very similar to the halting problem yourself.
Almost anyone who has written a computer program involving for-loops or
while-loops has encountered the problem of infinite loops. Infinite loops written by
novice programmers often involve errors that are easy to spot (failure to increment
a variable inside a loop, missing or improper initializations, and so on). As you
become a more sophisticated programmer, you learn not only to avoid these basic
errors but also to recognize potential trouble spots inside loops that may cause them

not to terminate. It might seem as though, with a careful enough analysis, any infinite
loop can be detected.
Even without knowing anything about the halting problem, we can see that this
is unrealistic by considering an example from mathematics. Many famous, long-
standing open problems have to do with the existence or nonexistence of an integer
satisfying some specific property. Goldbach’s Conjecture, made in 1742, is that every
even integer 4 or greater can be expressed at least one way as the sum of two primes.
(For example, 18 = 5 + 13 and 100 = 41 + 59.) Although the statement has been
confirmed for values of n up to about 4 × 10^14, and most mathematicians assume the
conjecture is true, no one has proved it. (In 2000 the publishing company Faber and
Faber offered a prize of $1 million to anyone who could furnish a proof by March 15,
2002.) However, testing whether a specific integer is the sum of two primes is a very
simple calculation. Therefore, it is easy to write a computer program, or construct a
Turing machine, to execute the following algorithm:

n = 4
conjecture = true
while (conjecture)
{   if (n is not the sum of two primes)
        conjecture = false
    else
        n = n + 2
}
The program terminates if and only if there is an even integer 4 or greater that is
not the sum of two primes. Thus, in order to find out whether Goldbach’s conjecture
is true, all we would have to do is decide whether our program runs forever.
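An executable version of the algorithm is easy to write. The following Python sketch (with a naive primality test, purely for illustration) runs forever if and only if Goldbach's conjecture is true:

def is_prime(m):
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def sum_of_two_primes(n):
    # Can the even number n be written as p + q with p and q prime?
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n - 1))

n = 4
while sum_of_two_primes(n):
    n += 2
print(n, "is a counterexample to Goldbach's conjecture")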
In any case, whether we consider programs like this or programs you might write
in your computer science classes, the fact that the halting problem is unsolvable says
that there cannot be any general method to test a program and decide whether it
will terminate. This could be frustrating to mathematicians who are trying to prove
Goldbach’s conjecture. On the one hand, if they are unable to find a proof, they cannot
take much comfort from the unsolvability of the halting problem, because there may
be a simpler alternative method of deciding the conjecture (presumably a way that
uses facts about integers and primes, rather than simply facts about programs); on the
other hand, there may not!
Some problems related to the halting problem are discussed in the exercises. For
example, another question to which an answer would be useful is: Given a computer
program (or a Turing machine), are there any input values for which it would loop
forever? See Exercise 11.12.
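The diagonal argument behind the unsolvability of the halting problem can also be phrased in programming terms. In the sketch below, halts is a hypothetical function; no correct Python implementation of it can exist, and the function d shows why:

def halts(program, argument):
    # Hypothetical: would return True iff program(argument) terminates.
    raise NotImplementedError   # no correct implementation is possible

def d(p):
    # d(p) loops forever exactly when p(p) would halt.
    if halts(p, p):
        while True:
            pass
    return "done"

# Does d(d) halt? If halts(d, d) returned True, d(d) would loop forever;
# if it returned False, d(d) would return. Either answer is wrong.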

11.3 | OTHER UNSOLVABLE PROBLEMS INVOLVING TMS
We began this chapter with the problem of whether a TM T accepts the input string
e(T), which is a special case of Accepts. We begin this section by considering two
other useful special cases.

First, rather than considering only the string e(T), we might try to restrict the
problem the other way, by fixing a Turing machine T and allowing a solution algorithm
to depend on T. For some machines T there is such an algorithm, but for at least one
there is not. Consider the universal Turing machine T_u introduced in Section 9.6, and
the decision problem P_u: Given w, does T_u accept w? If we had a general solution
to this problem, then we could solve Accepts, by taking an arbitrary pair (T, x) and
deciding whether T_u accepted the string e(T)e(x). (In other words, we can reduce
Accepts to the problem P_u by assigning to an instance (T, x) of Accepts the instance
e(T)e(x) of P_u.) Therefore, P_u is unsolvable.
Let us consider another special case of Accepts obtained by restricting the string,
this time to the null string. We define Accepts-Λ to be the decision problem
Given a Turing machine T, is Λ ∈ L(T)?
(i.e., does T eventually reach the accepting state, if it begins with a blank tape?)

Accepts-Λ is also unsolvable: Accepts can be reduced to it by constructing, for an
instance (T, w) of Accepts, a TM T_w that begins by writing w on its tape and then
proceeds exactly like T, so that T_w accepts Λ if and only if T accepts w.
In general, we show that a problem is unsolvable by finding another unsolvable
problem P to reduce to it. The more candidates we have for P, the easier this process

is likely to be. We present several more examples of unsolvable problems involving


TMs.

Next, observe that a set is empty if and only if it is a subset of the empty
set; to take care of Subset, we may let T_1 be T and T_2 be the trivial TM that
accepts nothing, so that L(T_1) ⊆ L(T_2) if and only if L(T) = ∅. Finally, for
WritesSymbol, T becomes a TM T_1 that simulates T, using a substitute for
the designated symbol and writing the designated symbol itself only when it
accepts. With input Λ, the two machines perform exactly the same
computation until they accept, and T_1 writes the symbol if and only if T
accepts Λ. Therefore, we have the reduction we need.

11.4 | RICE'S THEOREM AND MORE UNSOLVABLE PROBLEMS
An important class of decision problems contains those of the form: Given a language,
does it have a certain property? Because a Turing machine is a basic way of specifying
a language, we will be interested in formulating the problem this way:
Given a Turing machine T, does L(T) have property R?

Several of the decision problems in the previous section are of this type. In the case
of Accepts-Λ, having property R means containing the null string. In the first two
problems listed in Theorem 11.8, the language has property R if it is nonempty, or if
it is all of Σ*, respectively.
There is a good reason for concentrating on this class of decision problems: For
just about any property R we choose, the resulting problem is unsolvable! “Just about

any” property means any nontrivial property of recursively enumerable languages—


in other words, any property satisfied by some, but not all, such languages. The
only cases in which the problem is solvable are those in which the solution is either
“Answer yes for every instance” or “Answer no for every instance.” This result,
known as Rice’s theorem, will provide us at one stroke with many more examples of
unsolvable problems.

Rice's Theorem
If R is any nontrivial property of recursively enumerable languages (a
property satisfied by at least one such language but not by all of them), then
the decision problem
    P_R: Given a TM T, does L(T) have property R?
is unsolvable.

Proof (sketch)
Suppose first that R is not satisfied by the empty language. Choose a TM T_0
accepting some language L_0 that satisfies R. For a given TM T, we can
construct a TM T_1 that, on input x, first simulates T on input Λ and, if that
computation accepts, goes on to simulate T_0 on x; then L(T_1) = L_0 if T
accepts Λ, and L(T_1) = ∅ otherwise, so that L(T_1) satisfies R if and only if
Λ ∈ L(T). In this case, then, Accepts-Λ can be reduced to P_R. If R is
satisfied by the empty set, we use an indirect argument to show that P_R is
unsolvable. We reduce Accepts-Λ to P_{R′}, where R′ is the complementary
property; P_{R′} is unsolvable because R′ is a nontrivial property not satisfied
by ∅, and we conclude that P_R is unsolvable because P_{R′} is, since a
solution to either problem would immediately provide a solution to the other.

Here is a list, somewhat arbitrary and certainly not complete, of decision problems
whose unsolvability follows immediately from Rice’s theorem. Some of them we
have already shown to be unsolvable, by directly reducing other unsolvable problems
to them.
AcceptsSomething: Given a TM T, is L(T) nonempty?
AcceptsTwo: Given a TM T, does T accept at least two strings?
AcceptsFinite: Given a TM T, is the language accepted by T finite?
AcceptsEverything: Given a TM T with input alphabet Σ, is L(T) = Σ*?
AcceptsRegular: Given a TM T, is the language accepted by T regular?
AcceptsRecursive: Given a TM T, is the language accepted by T recursive?

Many decision problems involving Turing machines do not fit the format required
for applying Rice's theorem directly. The problems Accepts and Halts do not, because
in both cases an instance is not a TM but a pair (T, x). The problem Subset in Theorem
11.8 involves more than one Turing machine, as does the problem

Equivalent: Given TMs T_1 and T_2, is L(T_1) = L(T_2)?

To convince ourselves that Equivalent is unsolvable, we might argue as follows. For
any specific recursively enumerable language L_2, such as L_2 = {Λ}, the problem

Accepts-L_2: Given a TM T, is L(T) = L_2?

is unsolvable because of Rice's theorem (the property of being L_2 is a nontrivial
language property). However, the problem Accepts-L_2 is reducible to Equivalent,
because if T_2 is a TM accepting L_2, then for any instance T of Accepts-L_2, the pair
(T, T_2) is an instance of Equivalent having the same answer. Therefore, Equivalent
is unsolvable.
Rice’s theorem also does not apply directly to decision problems involving the
operation of a Turing machine, as opposed to the language accepted by the machine.
Some such problems are solvable, and some are not. The problem

Given a TM T, does T make more than 100 moves on input Λ?

can obviously be solved: Being “given” a TM means in particular being given enough
information to trace the processing of a fixed string for a certain fixed number of
moves. An example of an unsolvable problem that involves the operation of a
TM and therefore cannot be immediately proved unsolvable using Rice’s theorem
is WritesSymbol, in Theorem 11.8. In view of that problem, it may seem surprising

that the following problem is solvable:

Given a TM T, does T ever write a nonblank symbol when started with input Λ?

See Exercise 11.15.
Finally, even for a problem of the right form (Given T, does L(T) satisfy property
R?), Rice’s theorem cannot be applied if the property R is trivial. Remember that
“trivial” is used here to describe a property that is possessed either by all the recursively
enumerable languages or by none of them. Deciding whether the property is trivial
may not be trivial. If the property is trivial, however, then the decision problem is
trivial in the sense that the answer is either yes for every instance or no for every
instance. An example of the first case is the problem: Given a TM T, can L(T) be
accepted by a TM that never halts after an odd number of moves? Here the answer
is always yes. We can modify any TM if necessary so that instead of halting after
an odd-numbered move, it makes an extra (unnecessary) move before halting. An
example of the second case is the problem: Given a TM T, is L(T) the language
NSA? (See Definition 11.1.) Here the answer is always no: No matter what T is, L(T)
cannot be NSA, because NSA is not recursively enumerable.

11.5 | POST'S CORRESPONDENCE PROBLEM


In this section we show that a combinatorial problem known as Post's correspondence
problem (PCP) is unsolvable. Although the details of the proof are rather involved,
the problem itself can be understood easily even by someone who knows nothing
about Turing machines, and using its unsolvability is one way of showing that a
number of decision problems involving context-free grammars are also unsolvable.
The problem was first formulated by Emil Post in the 1940s. An instance
of PCP is called a correspondence system and consists of a set of pairs (α_1, β_1),
(α_2, β_2), ..., (α_n, β_n), where the α_i's and β_i's are nonnull strings over an alphabet
Σ. The question we are interested in for an instance like this is whether there is a
sequence of one or more integers i_1, i_2, ..., i_k, each i_j satisfying 1 ≤ i_j ≤ n and the
i_j's not necessarily distinct, so that

α_{i_1} α_{i_2} ··· α_{i_k} = β_{i_1} β_{i_2} ··· β_{i_k}

The instance is a yes-instance if there is such a sequence, and we call the sequence a
solution sequence for the instance.
It is helpful in visualizing the problem to think of n distinct groups of dominoes,
each domino from the ith group having the string α_i on the top half and the string β_i on
the bottom half (see Figure 11.1(a)), and to imagine that there are an unlimited number
of identical dominoes in each group. Finding a solution sequence for this instance
means lining up one or more dominoes in a horizontal row, each one positioned
vertically, so that the string formed by their top halves matches the string formed by
their bottom halves (see Figure 11.1(b)). Duplicate dominoes can be used, and it is not
necessary to use all the distinct domino types.

Figure 11.1 | (a) The dominoes of a correspondence system; (b) dominoes lined up in a solution sequence.

A Simple Correspondence System | EXAMPLE 11.1


Consider the correspondence system described by this picture:

[picture of the dominoes of the correspondence system]


In any solution sequence for this instance of PCP, domino 1 must be used first, since it is the
only one in which the two strings begin with the same symbol. One solution sequence is the
following:

[picture of the dominoes lined up in a solution sequence]

and you can verify for yourself that there is also a solution sequence beginning

[picture of the first few dominoes of another solution sequence]

PCP has a feature shared by many of the unsolvable problems we have consid-
ered. There is a trivial way to arrive at the correct answer for any instance, if the
answer is yes: Just try all ways of lining up one domino, then all ways of lining
up two, and so forth. Of course, a mindless application of this approach is doomed
to failure in the case of a no-instance. Saying that PCP is unsolvable says on the
one hand that reasoning about this approach will not help (for any n, you can try
all sequences of n dominoes and still not be sure there is no sequence of n + 1 that
works), and on the other hand that no other approach is guaranteed to do better.
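The brute-force half of this observation is easy to make concrete. The following Python function (our illustration; the function name and the tiny example instance are not from the text) tries every sequence of dominoes up to a given length. It semidecides PCP: it eventually confirms any yes-instance whose shortest solution fits within the bound, but a failed search proves nothing about a no-instance.

from itertools import product

def pcp_search(pairs, max_len):
    """Try all domino sequences of length 1, 2, ..., max_len.

    pairs is a list of (alpha_i, beta_i) string pairs.  Returns a
    1-based solution sequence if one exists within the bound, else None.
    """
    n = len(pairs)
    for k in range(1, max_len + 1):
        for seq in product(range(n), repeat=k):
            top = "".join(pairs[i][0] for i in seq)
            bottom = "".join(pairs[i][1] for i in seq)
            if top == bottom:
                return [i + 1 for i in seq]
    return None

print(pcp_search([("ab", "a"), ("c", "bc")], 4))   # prints [1, 2]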
We show that PCP is unsolvable by introducing a slightly different problem,
showing that it can be reduced to PCP, and then showing that it is unsolvable by
showing that the membership problem Accepts can be reduced to it. An instance
of the Modified Post’s correspondence problem (MPCP) is exactly the same as an

instance of PCP, except that a solution sequence for the instance is required to begin
with domino 1. In other words, a solution sequence consists of a sequence of zero or
more integers i_2, i_3, ..., i_k so that
α_1 α_{i_2} ··· α_{i_k} = β_1 β_{i_2} ··· β_{i_k}

the same symbol. It is conceivable that some of the other i_j's are also n + 1,
but if i_m is the last i_j to equal n + 1, then i_1, i_2, ..., i_m is also a solution
sequence for the instance J. It is then easy to check that i_2, ..., i_{m−1} is a
solution sequence for the instance I of MPCP.
We have shown that I is a yes-instance of MPCP if and only if J is a
yes-instance of PCP, which implies that

MPCP ≤ PCP

"Theorem 11.14 |
MPCPis unsolvable.

Proof
We want to show that Accepts is reducible to MPCP. Let (T, w) be an arbitrary
instance of Accepts, so that T = (Q, Σ, Γ, q_0, δ) is a Turing machine
and w is a string over the input alphabet Σ. We wish to construct an instance
(α_1, β_1), (α_2, β_2), ..., (α_n, β_n) of MPCP (a modified correspondence system)
that has a solution sequence if and only if T accepts w.
It will be convenient to assume that T never halts in the reject state h_r.
Since there is an algorithm to convert any TM into an equivalent one that
enters an infinite loop whenever the original one would reject (see the proof
of Theorem 11.6), we may make this assumption without loss of generality.
Some additional notation and terminology will be helpful. For an instance
(α_1, β_1), (α_2, β_2), ..., (α_n, β_n) of MPCP, we will say that a partial solution
is a sequence i_2, i_3, ..., i_j so that α = α_1 α_{i_2} ··· α_{i_j} is a prefix of β =
β_1 β_{i_2} ··· β_{i_j}. A little less precisely, we might say that the string α obtained
this way is a partial solution, or that the two strings α and β represent a
partial solution. Secondly, we introduce temporarily a new notation for
representing TM configurations. For x, y ∈ (Γ ∪ {Δ})* with y not ending
in Δ, and q ∈ Q, we will write xqy to represent the configuration that we
normally denote by (q, xy), or by (q, xΔ) in the case where y = Λ. In
other words, the symbols in x are those preceding the tape head, which is
centered on the first symbol of y if y ≠ Λ and on Δ otherwise.
In order to simplify the notation, we assume from here on that w ≠ Λ.
This assumption will not play an essential part in the proof, and we will
indicate later how to take care of the case when w = Λ.
Here is a rough outline of the proof. The symbols involved in our
pairs will be those that appear in the configurations of T, together with
an additional symbol #. We want to specify pairs (α_i, β_i) in our modified
correspondence system so that for any j, if
q_0Δw, x_1q_1y_1, ..., x_jq_jy_j
are successive configurations through which T moves in processing the input
string w, starting with the initial configuration q_0Δw, a partial solution can

be obtained that looks like

α = #q_0Δw#x_1q_1y_1# ··· #x_{j−1}q_{j−1}y_{j−1}#
β = #q_0Δw#x_1q_1y_1# ··· #x_{j−1}q_{j−1}y_{j−1}#x_jq_jy_j#

so that the β-string of the partial solution is one configuration ahead of the
α-string. At each stage the way the partial solution can be extended is
determined, and the pairs (α_i, β_i) are such that every time the copying of a
configuration is completed, the configuration appended to β is the one that
follows it in the computation.

where z′ represents the configuration of T one move later. Moreover, the
string β′ shown is the only one that can play this role in a partial
solution. We establish the claim in the case when

z = a_1 ··· a_k q a_{k+1} ··· a_{k+m}

and δ(q, a_{k+1}) = (p, b, R). The other cases are similar. The pairs that allow
us to extend the partial solution are these: first, pairs
(a_i, a_i) of type 1; then, the pair (qa_{k+1}, bp) of type 2; next, any more
pairs (a_{k+2}, a_{k+2}), ..., (a_{k+m}, a_{k+m}) of type 1; and finally, the pair (#, #).
The partial solution produced is

α = y#a_1 ··· a_k q a_{k+1} ··· a_{k+m}#
β = y#a_1 ··· a_k q a_{k+1} ··· a_{k+m}#a_1 ··· a_k b p a_{k+2} ··· a_{k+m}#

Sure enough, the substring of β between the last two #'s is the machine's
configuration one move later, and the claim is established in this case.


andfe, |
Suppose, on the one hand, thatT accepts input w. This means <heudiore
is a sequence of consecutive configurations of T, beginning with (qo, Aw) —
and ending with an oe a ee iftheseeee are

: claimabove shows thater a onal sites


a= = H#Zo#-- Hzj- 4H

B= Hz - Hz 182j#

The accepting configuration z_j is of the form

u h_a v

where the strings u and v may be null. If at least one is nonnull, we may
extend the partial solution, using one pair of type 3 and others of type 1, to
obtain

α = #z_0# ··· #z_j#
β = #z_0# ··· #z_j#z_{j+1}#

where z_{j+1} still contains h_a but has at least one fewer symbol than z_j. In a
similar way, we can continue to extend the partial solution, so that the strings
between consecutive #'s decrease in length by either one or two symbols at
each step, until we have a partial solution of the form

α = X#   and   β = X#h_a#

Applying the pair (h_a##, #) now completes a solution sequence, and the
instance of MPCP is a yes-instance.
Conversely, suppose that T does not accept w; our assumption then means that
T loops forever on input w. For any partial solution, with

α = #z_0# ··· #z_{j−1}#
β = #z_0# ··· #z_{j−1}#z_j#

the string β is strictly longer than α, and every way of extending the partial
solution preserves this property; therefore no partial solution can ever be
completed to a solution.

The only feature of the proof that needs to change in the case w = Λ
is that the initial pair is (#, #q_0#) instead of (#, #q_0Δw#).

A Correspondence System for a Simple TM | EXAMPLE 11.2 |


Let T be the TM pictured in Figure 11.2, which accepts all strings in {a, b}* ending with b.
Let us examine the modified correspondence system constructed as in the proof of Theorem
11.11, for two strings w that are not accepted by T and one that is. The only difference is in
the initial pair.
The pairs of type 2 are these:

(q_0Δ, Δq_1)    (q_1a, aq_1)    (q_2Δ, Δq_1)
(q_0#, Δq_1#)   (q_1b, bq_1)    (q_2b, h_aΔ)
(aq_1#, q_2aΔ#)    (aq_1Δ, q_2aΔ)
(bq_1#, q_2bΔ#)    (bq_1Δ, q_2bΔ)
(Δq_1#, q_2ΔΔ#)    (Δq_1Δ, q_2ΔΔ)

Figure 11.2 | A TM accepting the strings in {a, b}* that end with b.

For the input string Λ, pair 1 is (#, #q_0#). The following partial solution is the longest
possible, and the only one (except for smaller portions of it) ending in #:

Clearly no solution sequence exists.


The input string a causes T to loop forever. In this case, pair 1 is (#, #q_0Δa#). In the
partial solution shown below, the last domino appears for the second time, and it is not hard to
see that longer partial solutions would simply involve repetitions of the portion following the
first occurrence.

[picture of the partial solution for input a]
Finally, for the input string b, which is accepted by T, pair 1 is (#, #q_0Δb#), and the
solution sequence is shown below.

[picture of the solution sequence for input b]
11.6 | UNSOLVABLE PROBLEMS INVOLVING CONTEXT-FREE LANGUAGES
For some decision problems involving context-free grammars and languages, there
are solution algorithms. The membership problem for CFLs (Given a context-free
grammar G and a string x, is x ∈ L(G)?) is solvable, and in Section 8.3 we were
also able to solve such problems as whether a given CFL is finite or infinite. In this
section, however, we will consider two techniques for obtaining unsolvability results
involving CFGs.
The first approach uses Post’s correspondence problem, discussed in the previous
section. We begin by describing a useful construction in which two CFGs are obtained
from an instance of PCP.
Suppose I is the correspondence system

(α_1, β_1), (α_2, β_2), ..., (α_n, β_n)

where the α_i's and β_i's are strings over Σ. Let

C = {c_1, c_2, ..., c_n}



where the c_i's are symbols not contained in Σ. The terminal symbols of both our
grammars will be the symbols of Σ ∪ C. Let G_α be the CFG with start symbol S_α
and the 2n productions

S_α → α_i S_α c_i | α_i c_i    (1 ≤ i ≤ n)

and let G_β be the one with start symbol S_β and productions

S_β → β_i S_β c_i | β_i c_i    (1 ≤ i ≤ n)

Then L(G_α) is the language of all strings of the form

α_{i_1} α_{i_2} ··· α_{i_k} c_{i_k} c_{i_{k−1}} ··· c_{i_1}    (k ≥ 1)

and L(G_β) is the same except that each α_{i_j} is replaced by β_{i_j}.
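The productions of G_α and G_β can be generated mechanically from the pairs of a correspondence system. The following Python sketch (our rendering; the representation of symbols and the function name are illustrative) returns each grammar as a list of (head, body) productions, with the marker symbols c_1, ..., c_n written 'c1', 'c2', ...:

def pcp_grammars(pairs):
    """Build the productions of G_alpha and G_beta from a list of
    (alpha_i, beta_i) string pairs, as in the construction above."""
    g_alpha, g_beta = [], []
    for i, (alpha, beta) in enumerate(pairs, start=1):
        c = "c%d" % i
        g_alpha.append(("S_A", [alpha, "S_A", c]))   # S_A -> alpha_i S_A c_i
        g_alpha.append(("S_A", [alpha, c]))          # S_A -> alpha_i c_i
        g_beta.append(("S_B", [beta, "S_B", c]))     # S_B -> beta_i S_B c_i
        g_beta.append(("S_B", [beta, c]))            # S_B -> beta_i c_i
    return g_alpha, g_beta

A string lies in both languages exactly when the instance has a solution sequence, which is the point of the two theorems that follow.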
There are a number of examples, two of which we describe below, of decision
problems involving CFGs to which PCP can be reduced using this construction.

"Theorem 11.12 eo i
- The problem CFG Nonemnpyintersedcogs Given
two
CFG G,and
_. 1{G)) n L(G) apes) is unsolvable. ; ee
irosh | 8 -
We can reduce PCP to CFGNonemptyinter, sectionasfollows. Fe
arbitrary instance I of PCP, consisting of the pairs Gi, B:)with le
construct theiinstance Ga,
oo -eb ee itcciee

<] = ist. HisCiCis7 ;

and, on
n the oui hand,

x =BiBn ‘Biel in@inct 7 Ci

Thenok Ciwe here as a control. There is a string x salting: both


_ these equations if and only ik = m,i, = jp forl = p = bend] isa
: yes-instance (of PCP. Therefore, / is a yes-instance of PCP if and only at”
4 (Gg, Ga) sa yes--instance of CRU Sonemptyintstser don Sincethee iS |
- unsolvable, so is CFGNonemptyIntersection. _ hr

Theorem 11.13
The problem IsAmbiguous: Given a CFG G, is it ambiguous? is unsolvable.

Proof
Again we start with an arbitrary instance I of PCP, with pairs (α_i, β_i) and
1 ≤ i ≤ n, and construct the CFGs G_α and G_β as above. We obtain an
instance G of IsAmbiguous by letting the variables be S, S_α, and S_β and the
productions those of G_α and G_β along with the two additional productions
S → S_α and S → S_β.
The language generated by G is precisely L(G_α) ∪ L(G_β). If there is a
solution sequence i_1, i_2, ..., i_k for I, then the string

α_{i_1} ··· α_{i_k} c_{i_k} ··· c_{i_1} = β_{i_1} ··· β_{i_k} c_{i_k} ··· c_{i_1}

is in L(G_α) ∩ L(G_β), and it therefore has two leftmost derivations in G, beginning
S ⇒ S_α and S ⇒ S_β, respectively; G is therefore ambiguous. On the other hand,
a string in L(G_α) has only one leftmost derivation from S_α, since the sequence
of indices can be read off from its trailing c-symbols, and similarly for S_β; thus
a string with two distinct leftmost derivations in G
must be leftmost derivable from both S_α and S_β. Such a string is
in L(G_α) ∩ L(G_β), and as in the proof of Theorem 11.12 it follows that I is a
yes-instance of PCP. Therefore, G is ambiguous if and only if I is a yes-instance,
and since PCP is unsolvable, so is IsAmbiguous.

Another somewhat more direct approach to decision problems involving context-


free languages is to develop a set of strings representing Turing machine computations
and to show that they can be described in terms of context-free grammars.
It will be helpful in this discussion to use the same notation for describing TM con-
figurations that we used in the previous section, in which the configuration (q, xay)
is written xqay. A complete computation of a TM can be described by a sequence
of successive configurations, starting with the initial configuration corresponding to
some input string and ending with an accepting configuration. For reasons that will
be apparent shortly, we represent alternate configurations in such a sequence by the
reverse of the string.
The set C_T of valid computations of T consists of the strings of the form
z_0#z_1^r#z_2#z_3^r# ··· #z_n#, in which each entry represents a configuration of T
(alternate entries appearing reversed), z_0 is the initial configuration corresponding
to some input string, each configuration after the first is obtained from the previous
one by a single move, and the last one is an accepting configuration.
Note that the sequences of moves represented by these “valid computations”


cause the TM to accept. The intuitive explanation for reversing every other entry in
a sequence like this is that strings of the form z#z’#, where z and z’ are successive
configurations, look too much like strings of the form ww to be obtainable from a
CFG. The string z#(z′)^r# looks more like ww^r, which is simply a palindrome.

Lemma 11.2 For a Turing machine T, the sets

L_1 = {z#(z′)^r# | z and z′ represent configurations of T for which z ⊢_T z′}

and

L_2 = {z^r#z′# | z and z′ represent configurations of T for which z ⊢_T z′}

are both context-free languages.

Proof We prove the result for L_1, and the proof for L_2 is similar. We can show that
L_1 is a CFL by describing how to construct a PDA M accepting it. M will have in
its alphabet both states and tape symbols of T (including Δ), and these two sets are
assumed not to overlap. A finite automaton is able to check that the input string is of
the form z#z′#, where z and z′ are in the set Γ*QΓ*, and this is part of what M does,
rejecting if the input string is illegal.
For the rest of the proof, we need to show that M can operate so that if the first
portion z of the input is a configuration of T, then the stack contents when z has been
processed are (z′)^r Z_0, where z ⊢_T z′. (This allows M to process the input remaining
after the first # by simply matching input symbols against stack symbols.)
We consider the case in which

z = xpay

where p is a state and a is a tape symbol of T, and the move of T in state p with tape
symbol a is

δ(p, a) = (q, b, L)

The other cases can be handled similarly.
In this case, if x = x_1c for some string x_1 and some tape symbol c, then T moves
from the configuration xpay = x_1cpay to the configuration x_1qcby. If x is null, T
rejects by trying to move its tape head left from square 0.
M can operate by pushing input symbols onto its stack until it sees a state of T;
at this point, the stack contains the string x^r Z_0. We may specify that M rejects if the
top stack symbol is Z_0 (that is, if x is null). Otherwise, the stack contains

c x_1^r Z_0

M has now read the state p, and the next input symbol is a. It pops the c, replaces it by
bcq, and continues to read symbols and push them onto the stack until it encounters
the first #. This means z has been processed completely. The stack contents are then

y^r bcq x_1^r Z_0 = (x_1qcby)^r Z_0

and the string x_1qcby is the configuration we want.

Theorem 11.14
For any Turing machine T, the set C_T of valid computations of T is the
intersection of two context-free languages, each of which can be constructed
from T.

The proof uses Lemma 11.2 together with the observation that the set Init of
initial configurations q_0Δx, where x is a nonnull string over the input alphabet
of T, is regular; for each consecutive pair of configurations in the string, one
of the two languages checks whether z#(z′)^r# ∈ L_1, and the other whether
z^r#z′# ∈ L_2, depending on the position of the pair.

The set C_T of valid computations of T is viewed here as a subset of (Γ ∪ {Δ, #} ∪
Q ∪ {h_a})*, where Γ and Q are the tape alphabet and state set, respectively, of T. For
some of the unsolvability results we are developing, it is helpful also to look at the
complement of this set, which is simpler in one respect than C_T itself.

Lemma 11.3 The set C_T′ is a context-free language.

Proof A string x over this alphabet fails to be in C_T if x does not end in #; otherwise,
if

x = z_0#z_1#z_2# ··· #z_n#

and no z_i contains #, x fails to be in C_T if and only if one or more of the following
conditions holds.
1. For some even i, z_i does not represent a TM configuration.
2. For some odd i, z_i^r does not represent a TM configuration.
3. z_0 does not represent an initial configuration.
4. Neither z_n nor z_n^r represents an accepting configuration.
5. For some even i, z_i and z_{i+1}^r represent configurations of T but the condition
z_i ⊢_T z_{i+1}^r fails.
6. For some odd i, z_i^r and z_{i+1} are configurations but the condition z_i^r ⊢_T z_{i+1} fails.

It is easy to see that each condition individually can be tested by a PDA; in some
cases an FA would suffice, and in the others we can use arguments similar to those in
the proof of Lemma 11.2 (for the last two conditions in particular, nondeterminism
can be used to select a particular value of i, and testing that the condition fails for
that i is no harder than testing that it holds). Therefore, C_T′ is the union of CFLs, and
so it is a CFL itself.

The underlying fact that allows us to apply these results is that the problem of
determining whether there are any valid computations for a given TM is unsolvable.
This is just another way of expressing the unsolvability of AcceptsSomething, which
is a corollary of Rice’s theorem. We list two immediate applications, and there is
further discussion in the exercises.

Second proof of Theorem 11.12. This time we prove that CFGNonemptyInter-
section is unsolvable by showing that AcceptsSomething is reducible to it. Given
a Turing machine T, an instance of AcceptsSomething, it follows from Theorem
11.14 that there are context-free grammars G_1 and G_2 so that L(G_1) ∩ L(G_2) is
the set C_T of valid computations of T. Moreover, you can easily convince yourself
that there is an algorithm to construct the grammars G_1 and G_2 from T. (This is
necessary, of course, in order to obtain the desired reduction.) Since T is a yes-
instance of AcceptsSomething if and only if the pair (G_1, G_2) is a yes-instance of
CFGNonemptyIntersection, the first problem is reducible to the second, and the
second is therefore unsolvable.

EXERCISES
11.1. Show that the relation ≤ on the set of languages (or on the set of decision
problems) is reflexive and transitive. Give an example to show that it is not
symmetric.
11.2. Let P_2 be the decision problem: Given a natural number n, is n evenly
divisible by 2? Consider the numerical function f defined by the formula
f(n) = 2n.
a. To what other decision problem P does f reduce P_2?
b. Find a numerical function g that reduces P to P_2. It should have the
same property that f does; namely, computing the function does not
explicitly require solving the problem that the function is supposed to
reduce.
11.3. Show that if L_1 and L_2 are languages over Σ, and L_2 is recursively
enumerable and L_1 ≤ L_2, then L_1 is recursively enumerable.
11.4. Show that if L ⊆ Σ* is neither empty nor all of Σ*, then any recursive
language over Σ can be reduced to L.
11.5. Fermat’s last theorem, until recently one of the most famous unproved
statements in mathematics, asserts that there are no integer solutions
(x, y, z, n) to the equation x^n + y^n = z^n satisfying x, y > 0 and n > 2.
Show how a solution to the halting problem would allow you to determine
the truth or falsity of the statement.
11.6. Show that every recursively enumerable language can be reduced to the
language Acc = {e(T)e(w) | T is a TM and T accepts input w}.
11.7. As discussed at the beginning of Section 11.3, there is at least one TM T so
that the decision problem Given w, does T accept w? is unsolvable. Show
that any TM accepting a nonrecursive language has this property.

11.8. Show that for any x ∈ Σ*, the problem Accepts can be reduced to the
problem: Given a TM T, does T accept x? (This shows that, just as
Accepts-Λ is unsolvable, so is Accepts-x, for any x.)
11.9. Construct a reduction from Accepts-Λ to the problem Accepts-{Λ}: Given
a TM T, is L(T) = {Λ}?
11.10. a. Given two sets A and B, find two sets C and D, defined in terms of A
and B, so that A = B if and only if C ⊆ D.
b. Show that the problem Equivalent can be reduced to the problem
Subset.
11.11. a. Given two sets A and B, find two sets C and D, defined in terms of A
and B, so that A ⊆ B if and only if C = D.
b. Show that the problem Subset can be reduced to the problem
Equivalent.
11.12. For each decision problem given, determine whether it is solvable or
unsolvable, and prove your answer.
a. Given a TM T, does it ever reach a state other than its initial state if it
starts with a blank tape?
b. Given a TM T and a nonhalting state q of T, does T ever enter state q
when it begins with a blank tape?
c. Given a TM T and a nonhalting state q of T, is there an input string x
that would cause T eventually to enter state q?
d. Given a TM T, does it accept the string Λ in an even number of moves?
e. Given a TM T, is there a string it accepts in an even number of moves?
f. Given a TM T and a string w, does T loop forever on input w?
g. Given a TM T, are there any input strings on which T loops forever?
h. Given a TM T and a string w, does T reject input w?
i. Given a TM T, are there any input strings rejected by T?
j. Given a TM T, does T halt within ten moves on every string?
k. Given a TM T, is there a string on which T halts within ten moves?
l. Given TMs T_1 and T_2, is L(T_1) ⊆ L(T_2) or L(T_2) ⊆ L(T_1)?

11.13. Let us make the informal assumption that Turing machines and computer
programs written in the C language are equally powerful, in the sense that
anything that can be programmed on one can be programmed on the other.
Give a convincing argument that both these decision problems are
unsolvable:
a. Given a C program and a statement s in the program and a specific set I
of input data, is s ever executed when the program is run on input I?
b. Given a C program and a statement s in the program, is there a set I of
input data so that s is executed when the program runs on input I?

11.14. Show that the following decision problems involving unrestricted


grammars are unsolvable.
a. Given a grammar G and a string w, does G generate w?
b. Given a grammar G, does it generate any strings?
c. Given a grammar G with terminal alphabet Σ, does it generate every
string in Σ*?
d. Given grammars G_1 and G_2, do they generate the same language?
11.15. Show that the decision problem WritesNonblank: Given a Turing machine
T, does it ever write a nonblank symbol on its tape when started with a
blank tape? is solvable, by providing a decision algorithm.
11.16. Here is a “proof” that the decision problem in the previous exercise is
unsolvable.

Given a TM T, construct a TM T_1 as follows: T_1 has the same tape alphabet as
T except that it has one additional symbol #. The states of T_1 are the same as
those of T. The transitions of T_1 are the same, except that for any transition of
T in which a nonblank symbol is written, the corresponding transition of T_1
writes # and halts. Therefore, starting with an empty tape, T writes a nonblank
symbol if and only if T_1 writes the symbol #. Since the problem Given T_1, does
it write the symbol # when started with an empty tape? is unsolvable (because
WritesSymbol is), WritesNonblank is unsolvable.

The conclusion reached here is false; explain precisely what is wrong with
the argument.
11.17. Refer to the correspondence system in Example 11.2, in the case where the
input string is ab. Find the solution sequence.
11.18. In each case below, either give a solution to the correspondence system or
show that none exists.

[pictures of two correspondence systems, involving strings such as 1010 and 101]

11.19. Show that the special case of PCP in which the alphabet has only two
symbols is still unsolvable.
11.20. Show that the special case of PCP in which the alphabet has only one
symbol is solvable.
11.21. Show that each of these decision problems for CFGs is unsolvable.
a. Given two CFGs G_1 and G_2, is L(G_1) = L(G_2)?
b. Given two CFGs G_1 and G_2, is L(G_1) ⊆ L(G_2)?
c. Given a CFG G and a regular language R, is L(G) = R?

11.22. In the second proof of Theorem 11.12, given at the end of Section 11.6,
describe in reasonable detail the steps of the algorithm which, starting with
a TM T, constructs CFGs G_1 and G_2 so that L(G_1) ∩ L(G_2) is the set of
valid computations of T.

MORE CHALLENGING PROBLEMS


11.23. Suppose P_1 and P_2 are decision problems, and Y(P_1) ⊆ Σ_1* and
Y(P_2) ⊆ Σ_2* are the corresponding languages (that is, the languages of
strings representing yes-instances of P_1 and P_2, respectively, with respect to
some reasonable encoding functions e_1 and e_2). Suppose the function f
defines a reduction from P_1 to P_2; in other words, for any instance I of P_1,
f(I) is an instance of P_2 having the same answer. Show that
Y(P_1) ≤ Y(P_2). Describe a function from Σ_1* to Σ_2* that gives a reduction.
11.24. Suppose P_1, P_2, Y(P_1), and Y(P_2) are as in the previous exercise. Suppose
also that there is at least one no-instance of P_2. Show that if there is a
function t : Σ_1* → Σ_2* reducing Y(P_1) to Y(P_2), then there is another
(computable) function t′ reducing Y(P_1) to Y(P_2) and having the property
that for every x ∈ Σ_1* that corresponds to an instance of P_1, t′(x)
corresponds to an instance of P_2.
11.25. Let P_1, P_2, Y(P_1), and Y(P_2) be as in Exercise 11.23. Suppose
t : Σ_1* → Σ_2* is a reduction of Y(P_1) to Y(P_2). According to Exercise
11.24, we may assume that for every string x in Σ_1* representing an instance
of P_1, t(x) represents an instance of P_2. Show that P_1 ≤ P_2. Describe a
function f that gives a reduction. (In other words, for an instance I of P_1,
say how to calculate an instance f(I) of P_2.)
11.26. This exercise presents an example of a language L so that neither L nor L′
is recursively enumerable. Let Acc and AE be the languages over {0, 1}
defined as follows.

Acc = {e(T)e(w) | T is a TM that accepts the input string w}

AE = {e(T) | T is a TM accepting every string in its input alphabet}

(Acc and AE are the sets of strings representing yes-instances of the
problems Accepts and AcceptsEverything, respectively.) Acc′ and AE′
denote the complements of these two languages.
a. Show that Acc ≤ AE.
b. Show that Acc′ ≤ AE′.
c. Show that AE′ is not recursively enumerable.
d. Show that Acc′ ≤ AE. (If x = e(T)e(z), let f(x) = e(S_{T,z}), where
S_{T,z} is a TM that works as follows. On input w, S_{T,z} simulates the
computation performed by T on input z for up to |w| moves. If this
computation would cause T to accept within |w| moves, S_{T,z} enters an
infinite loop; otherwise S_{T,z} accepts. Show that if f(x) is defined
appropriately for strings x not of the form e(T)e(z), then f defines a
reduction from Acc′ to AE.)
e. Show that AE is not recursively enumerable.
11.27. If AE is the language defined in the previous exercise, show that if L is any
language whose complement is not recursively enumerable, then L ≤ AE.
11.28. Find two unsolvable decision problems, neither of which can be reduced to
the other, and prove it.
11.29. In this problem TMs are assumed to have input alphabet {0, 1}. For a finite
set S ⊆ {0, 1}*, P_S denotes the decision problem: Given a TM T, is
S ⊆ L(T)?
a. Show that if x, y ∈ {0, 1}*, then P_{x} ≤ P_{y}.
b. Show that if x, y, z ∈ {0, 1}*, then P_{x} ≤ P_{y,z}.
c. Show that if x, y, z ∈ {0, 1}*, then P_{x,y} ≤ P_{z}.
d. Show that for any two finite subsets S and U of {0, 1}*, P_S ≤ P_U.
11.30. Repeat the previous problem, but this time letting P_S denote the problem:
Given a TM T, is L(T) = S?
11.31. For each decision problem given, determine whether it is solvable or
unsolvable, and prove your answer.
a. Given a TM T, does T eventually enter every one of its nonhalting
states if it begins with a blank tape?
b. Given a TM T, is there an input string that causes T to enter every one
of its nonhalting states?
11.32. Show that the problem CSLIsEmpty: given a linear-bounded automaton,
is the language it accepts empty? is unsolvable. Suggestion: use the fact
that Post’s correspondence problem is unsolvable, by starting with an
arbitrary correspondence system and constructing an LBA that accepts
precisely the strings α representing solutions to the correspondence system.
11.33. This exercise establishes the fact that there is a recursive language over
{a, b} that is not context-sensitive. (Note that the argument outlined below
uses a diagonal argument. At this point, a diagonal argument or something
comparable is the only technique known for constructing languages that are
not context-sensitive.)
a. Describe a way to enumerate explicitly the set of context-sensitive
grammars generating languages over {a, b}. You may make the
assumption that for some set A = {A_1, A_2, ...}, every such grammar
has start symbol A_1 and only variables that are elements of A.
b. If G_1, G_2, ... is the enumeration in part (a), and x_1, x_2, ... are the
nonnull elements of {a, b}* listed in canonical order, let
L = {x_i | x_i ∉ L(G_i)}. Show that L is recursive and not
context-sensitive.
11.34. Is the decision problem: Given a CFG G and a string x, is L(G) = {x}?
solvable or unsolvable? Give reasons for your answer.

11.35. Is the decision problem: Given a CFG G and a regular language R, is
L(G) ⊆ R? solvable or unsolvable? Give reasons for your answer.
11.36. Is the decision problem: Given a CFG G with terminal alphabet Σ, is
Σ* − L(G) finite? solvable or unsolvable? Give reasons for your answer.
11.37. Show that the problem: Given a CFG G with terminal alphabet Σ, is
L(G) ≠ Σ*? is unsolvable by directly reducing PCP to it. Suggestion: if
G_α and G_β are the CFGs constructed from an instance of PCP as in Section
11.6, show that there is an algorithm to construct a CFG generating
(L(G_α) ∩ L(G_β))′.
CHAPTER 12

Computable Functions

12.1| PRIMITIVE RECURSIVE FUNCTIONS


Not all functions, even those with precise definitions, can be computed by TMs.
The uncomputable functions we have seen so far are the characteristic functions of
nonrecursive languages. In this chapter we will concentrate on numerical functions
(functions of zero or more nonnegative integer variables, whose values are nonnegative
integers) and try to find a way to characterize the ones that can actually be computed.
The focus on numerical functions is not as restrictive as it might sound, because we
will soon develop a way to describe any function from strings to strings by encoding
both arguments and function values as numbers.
Recall that a partial function f from N to N is Turing-computable if there is a
Turing machine T so that, starting with input 1^n, T halts in the state h_a with output
1^{f(n)} if f is defined at n and fails to accept otherwise. We can easily adapt the non-
constructive argument from Section 10.5 to show that there are many uncomputable
functions. The set of Turing machines is countable; the set of partial functions from
N to N is uncountable; and a TM can compute at most one partial function from N
to N. Therefore, the set of uncomputable functions must be uncountable.
We can also provide explicit examples of uncomputable functions from N to N,
just as we provided examples of nonrecursive languages. Our first example, as you
might expect, involves Turing machines and a diagonal-like argument.

| EXAMPLE 12.1 | The Busy Beaver Function


Let us define b : N → N as follows. b(0) is 0. For n > 0, b(n) is obtained by considering
TMs having n nonhalting states and tape alphabet {0, 1}. We can assume, by relabeling the
states if necessary, that the set of nonhalting states in all these machines is {q_0, q_1, ..., q_{n−1}},
and for each n there are therefore only a finite number of TMs of this type. We restrict our
attention to those that halt on input 1^n, and we let b(n) be the largest number of 1's that any
of these machines leaves on the tape when it halts. (The number b(n) is therefore a measure

of how busy a TM of this type can be before it halts. It has also been suggested that the term
“busy beaver” might refer to the resemblance between 1’s on the tape and twigs arranged by
a beaver.)
Suppose, for the sake of contradiction, that b is computable. Then it is possible to find a
TM T_1 having tape alphabet {0, 1} that computes it (Exercise 9.45). Let T = T_1T_2, where T_2
is a TM also having tape alphabet {0, 1} that moves its tape head to the first square to the right
of its starting position in which there is either a 0 or a blank, writes a 1 there, and halts. Let
m be the number of states of T. By definition of the function b, no TM with m states and tape
alphabet {0, 1} can end up with more than b(m) 1's on the tape if it halts on input 1^m. However,
T is a machine of this type that halts with output 1^{b(m)+1}. This contradiction shows that b is
not computable.

The function b has been precisely defined but is not computable. A formula like

f(n) = (2^{n+2} − (3n + 1))^n

on the other hand, defines a function that is obviously computable. The difference
between these two functions is not so much that one is defined by words and the
other by a mathematical formula. Consider the function b_2, where b_2(n) is the largest
number of 1's that can be left on the tape of a TM with tape alphabet {0, 1} and two
nonhalting states, if it starts with input 1^n and eventually halts. This is also a definition
in words, superficially almost identical to that of b, and you can convince yourself
that b_2 is computable. The property b_2 and f have in common is that the definition
either is, or can be replaced by, a constructive one. The proof in the example above
is subtle but shows that b lacks this property.
Trying to decide whether a definition in words can be replaced by a constructive
one can be difficult. What we can do, however, is to formulate an appropriate notion
of “constructive,” so that the functions definable in this way will be precisely the
computable functions. The definition of the function f in the algebraic formula is
constructive in the sense that f is obtained by applying various arithmetic operations
to simpler elementary functions (the identity function and various constant functions).
We will take the same approach, except that the operations we use will be more general
than arithmetic operations like addition and multiplication.
First we give a precise definition of the elementary functions we are allowed to
start with. Notice that all the initial functions in the definition are total functions. In
this definition and the ones that follow, we adopt the convention of using lowercase
letters for integers and uppercase for vectors, or m-tuples.

Definition 12.1 Initial Functions


1. Constant functions: For each k ≥ 0 and each a ≥ 0, the constant function
C_a^k : N^k → N is defined by the formula

C_a^k(X) = a    for every X ∈ N^k

In the case k = 0 we may identify the function C_a^0 with the number a.

2. The successor function s : N → N is defined by the formula

s(x) = x + 1

3. Projection functions: For each k ≥ 1 and each i with 1 ≤ i ≤ k, the projection
function p_i^k : N^k → N is defined by the formula

p_i^k(x_1, x_2, ..., x_k) = x_i
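These initial functions are simple enough to transcribe directly into Python. The following sketch is our rendering (each function takes its arguments as separate nonnegative integers), and the helpers will also serve in later examples:

def constant(k, a):
    """The constant function C_a^k of k variables with value a."""
    def C(*xs):
        assert len(xs) == k
        return a
    return C

def successor(x):
    """The successor function s(x) = x + 1."""
    return x + 1

def projection(k, i):
    """The projection function p_i^k(x_1, ..., x_k) = x_i (1-based i)."""
    def p(*xs):
        assert len(xs) == k
        return xs[i - 1]
    return p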

Now we are ready to consider ways of combining functions to obtain new ones.
We start with composition, essentially as defined in Chapter 1, and another operation
involving a type of recursive definition.

Definition 12.2 Composition

Suppose f is a partial function from N^k to N, and for each i with 1 ≤ i ≤ k,
g_i is a partial function from N^m to N. The partial function obtained from
f and g_1, g_2, ..., g_k by composition is the partial function h from N^m to N
defined by the formula

h(X) = f(g_1(X), g_2(X), ..., g_k(X))

We have chosen here to restrict ourselves to functions whose values are single
integers, rather than k-tuples; otherwise, we could write h = f ∘ g, where g is the
function from N^m to N^k defined by g(X) = (g_1(X), ..., g_k(X)).
Notice that in this definition, in order for h(X) to be defined, it is necessary and
sufficient that each g_i(X) be defined and that f be defined at the point (g_1(X), ...,
g_k(X)). If all the functions f, g_1, ..., g_k are total, then h is total.
For a familiar example, let Add : N × N → N be the usual addition function
(Add(x, y) = x + y), and let f and g be partial functions from N to N. Then the
function Add(f, g) obtained from Add, f, and g by composition is normally written
f + g.
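In the same Python sketch, composition becomes a higher-order function:

def compose(f, *gs):
    """The composition h(X) = f(g_1(X), ..., g_k(X)) of Definition 12.2."""
    def h(*xs):
        return f(*(g(*xs) for g in gs))
    return h

# For instance, if add(x, y) = x + y, then compose(add, f, g) is the
# pointwise sum f + g of two one-variable functions f and g.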
The simplest way to define a function f from N to N recursively is to define
f(0) first, and then for any k ≥ 0 to define f(k + 1) in terms of f(k). A standard
example is the factorial function:

0! = 1    (k + 1)! = (k + 1) * k!
In the recursive step, the expression for f(k + 1) involves both k and f(k). We can
generalize this by substituting any expression of the form h(k, f(k)), where h is a
function of two variables. In order to use this approach for a function f of more than
one variable, we simply restrict the recursion to the last coordinate. In other words,
we start by saying what f(x_1, x_2, ..., x_n, 0) is, for any choice of (x_1, ..., x_n). This
means specifying a constant when n = 0 and a function of n variables in general.

Then in the recursive step, we say what f(x_1, x_2, ..., x_n, k + 1) is, in terms of
f(x_1, ..., x_n, k). Let the n-tuple (x_1, ..., x_n) be denoted by X. In the most general
case, f(X, k + 1) may depend on X and k directly, in addition to f(X, k), just as
(k + 1)! depended on k as well as on k!. Thus a reasonable way to formulate the
recursive step is to say that

f(X, k + 1) = h(X, k, f(X, k))

for some function h of n + 2 variables.

Definition 12.3 The Primitive Recursion Operation

Suppose g is a function of n variables and h is a function of n + 2 variables.
The function obtained from g and h by the operation of primitive recursion is
the function f : N^{n+1} → N defined by the formulas

f(X, 0) = g(X)
f(X, k + 1) = h(X, k, f(X, k))

for every X ∈ N^n and every k ≥ 0.

In the factorial example, n = 0, g is the number (or the function of zero variables)
C_1^0 = 1, and h(x, y) = (x + 1) * y.
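Continuing the Python sketch, the primitive recursion operation is a second higher-order function; the recursion is unwound into a loop:

def primitive_recursion(g, h):
    """The function f obtained from g (n variables) and h (n + 2 variables)
    by primitive recursion:
        f(X, 0)     = g(X)
        f(X, k + 1) = h(X, k, f(X, k))
    """
    def f(*args):
        *X, k = args
        value = g(*X)
        for i in range(k):
            value = h(*X, i, value)
        return value
    return f

# The factorial example: n = 0, g = 1, h(x, y) = (x + 1) * y.
factorial = primitive_recursion(lambda: 1, lambda x, y: (x + 1) * y)
assert factorial(5) == 120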
Here again, if the functions g and h are total functions, f is total. If either g
or h is not total, the situation is a little more complicated. If g(X) is undefined for
some X ∈ N^n, then f(X, 0) is undefined, f(X, 1) = h(X, 0, f(X, 0)) is undefined,
and in general f(X, k) is undefined for each k. For exactly the same reason, if
f(X, k) is undefined for some k, say k = k_0, then f(X, k) is undefined for every
k > k_0; equivalently, if f(X, k_1) is defined, then f(X, k) is defined for every k ≤ k_1.
These observations will be useful a little later in showing that a function obtained by
primitive recursion from computable functions is also computable.
At this point we have a class of initial functions, and we have two operations
with which to obtain new functions. Although other operations are necessary in order
to obtain all computable functions, it will be useful to formalize the set of functions
we can obtain with the tools we have developed.
Definition 12.4 Primitive Recursive Functions

The set of primitive recursive functions is the smallest set of functions that
contains all the initial functions and is closed under the operations of
composition and primitive recursion.

Just as in Chapter 2, we might characterize these functions a little more explicitly
(see the discussion after Example 2.18) by saying that primitive recursive functions are
those having primitive recursive derivations. A function f has a primitive recursive
derivation if there is a finite sequence of functions f_0, f_1, ..., f_j so that f_j = f and
each function f_i in the sequence is an initial function, or can be obtained from earlier
functions in the sequence by composition, or can be obtained from earlier functions
in the sequence by primitive recursion.

| EXAMPLE 12.2 | Addition and Multiplication


Let us show that the functions Add and Mult from N × N to N, defined by the formulas

Add(x, y) = x + y    Mult(x, y) = x * y
are both primitive recursive. We start by finding a primitive recursive derivation for Add. Since
Add is not an initial function, and there is no obvious way to obtain it by composition, we try
to obtain it from simpler functions using primitive recursion. If Add is obtained from g and h
by primitive recursion, g and h must be functions of one and three variables, respectively. The
equations are
Add(x,0) = g(x)

Add(x,k +1) = h(x, k, Add(x, k))

Add(x, 0) should be x, and thus we may take g to be the initial function p_1^1. In order to get
x + k + 1 (i.e., Add(x, k + 1)) from the three quantities x, k, and x + k, we can simply take
the successor of x + k. In other words, h(x, k, Add(x, k)) should be s(Add(x, k)). This means
that h(x_1, x_2, x_3) should be s(x_3), or s(p_3^3(x_1, x_2, x_3)). Therefore, a derivation for Add can be
obtained as follows:
obtained as follows:

f_0 = p_1^1    (an initial function)
f_1 = s    (an initial function)
f_2 = p_3^3    (an initial function)
f_3 = s(p_3^3)    (obtained from f_1 and f_2 by composition)
f_4 = Add    (obtained from f_0 and f_3 by primitive recursion)

This way of ordering the five functions is not the only correct one. Any ordering in which Add
is last and s and p_3^3 both precede s(p_3^3) would work just as well.
To obtain Mult, we try primitive recursion again. We have

Mult(x, 0) = 0
Mult(x, k + 1) = x * (k + 1)
  = Add(x * k, x)
  = Add(x, Mult(x, k))

Remember that we are attempting to write this in the form h(x, k, Mult(x, k)). Since x and
Mult(x, k) are the first and third coordinates of the 3-tuple (x, k, Mult(x, k)), we use the
function f = Add(p_1^3, p_3^3), obtained from Add, p_1^3, and p_3^3 by composition. The function Mult
is obtained from 0 (i.e., the initial function C_0^1) and f using the operation of primitive recursion.
Therefore, Mult is also primitive recursive.
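Using the helpers sketched earlier (constant, projection, successor, compose, primitive_recursion), these two derivations can be checked directly:

# Add: g = p_1^1, h(x1, x2, x3) = s(p_3^3(x1, x2, x3))
Add = primitive_recursion(
    projection(1, 1),
    compose(successor, projection(3, 3)),
)

# Mult: g = C_0^1, h = Add(p_1^3, p_3^3)
Mult = primitive_recursion(
    constant(1, 0),
    compose(Add, projection(3, 1), projection(3, 3)),
)

assert Add(4, 3) == 7 and Mult(4, 3) == 12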

This derivation of Mult, and many arguments involving primitive recursive func-
tions, can be simplified somewhat by using the following general result.

Theorem 12.1
Suppose f is a primitive recursive function of n variables. Then each of the
following functions is also primitive recursive:
1. any function obtained from f by adding extra variables on which the value
does not depend;
2. any function obtained from f by permuting its variables;
3. any function obtained from f by substituting constants for one or more of
its variables;
4. any function obtained from f by repeating a variable.

An Application of Theorem 12.1 | EXAMPLE 12.3

Let f be the function of two variables defined by

f(x, y) = y^x + x^x + x!

where we define 0^0 = 1 in order to make the function total. To show that f is primitive
recursive, we look first at the function f_1 defined by f_1(x, y) = x^y. We can write

f_1(x, 0) = 1
f_1(x, k + 1) = Mult(x, x^k)
  = Mult(x, f_1(x, k))

By considering the formula h(x, y, z) = Mult(x, z) and using part 1 of the theorem, we can
see that f_1 is primitive recursive. Since y^x = f_1(y, x), it follows from part 2 that the first term
in the formula for f is primitive recursive. The second and third terms are primitive recursive
functions of x because of parts 4 and 3 of the theorem, respectively, and therefore primitive re-
cursive functions of x and y as a result of part 1. Finally, since f(x, y) = Add(Add(y^x, x^x), x!),
we can use the fact that composition preserves primitive recursiveness to conclude that f is
primitive recursive.

| EXAMPLE 12.4 | The Predecessor Function and Proper Subtraction


The subtraction function, with the modification necessary to guarantee that its values are always
nonnegative, is primitive recursive. To show this, we begin with the function Pred (short for
predecessor), defined by

Pred(x) = 0    if x = 0
Pred(x) = x − 1    if x ≥ 1

The formulas

Pred(0) = 0

Pred(k + 1) = k

together with part 1 of Theorem 12.1 show that Pred can be derived from primitive recursive
functions using primitive recursion. If we define Sub by

Sub(x, y) = x − y    if x ≥ y
Sub(x, y) = 0    otherwise

then you can easily check the equations

Sub(x, 0) = x

Sub(x,k + 1) = Pred(Sub(x, k))

from which it follows that Sub is primitive recursive. This operation is often written x ∸ y and is
referred to as proper subtraction, or the monus operation.
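These two derivations can also be checked with the helpers sketched earlier; note that for Pred we have n = 0, so h is a function of two variables that ignores its second argument:

# Pred(0) = 0; Pred(k + 1) = k, so h(k, y) = k.
Pred = primitive_recursion(lambda: 0, lambda k, y: k)

# Sub(x, 0) = x; Sub(x, k + 1) = Pred(Sub(x, k))
Sub = primitive_recursion(
    projection(1, 1),
    compose(Pred, projection(3, 3)),
)

assert Pred(0) == 0 and Pred(7) == 6
assert Sub(5, 2) == 3 and Sub(2, 5) == 0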

Although we have not actually finished producing examples of primitive recursive


functions, we close this section by proving two results, which together show that
the set of primitive recursive functions is a proper subset of the set of computable
functions.

Theorem 12.2
Every primitive recursive function is a total computable function.

Proof
The way we have defined the set of primitive recursive functions makes
structural induction appropriate for proving things about them. We show
the following three statements: Every initial function is a total computable
function; any function obtained from total computable functions by compo-
sition is also a total computable function; and any function obtained from
total computable functions by primitive recursion is also a total computable
function.
We have previously observed, in fact, that initial functions are total
functions and that functions obtained from total functions by composition
or primitive recursion are total; thus we may concentrate on the conclusions
involving computability. It is almost obvious that all the initial functions are
computable, and we omit the details. For the sake of simplicity, we show
that if h : N^m → N is obtained from f : N^2 → N and g_1 and g_2, both
functions from N^m to N, by composition, and if these three functions are
computable, then h is. The argument is valid even if all the functions are
partial functions, and it extends in an obvious way to the more general case
in which f is a function of k variables.
Let T_f, T_1, and T_2 be TMs computing f, g_1, and g_2, respectively. We
will construct a TM T_h to compute h. To simplify notation, we denote by X
the m-tuple (x_1, x_2, ..., x_m) and by 1^X the string 1^{x_1}Δ1^{x_2}Δ ··· Δ1^{x_m}.
The TM T_h begins with tape contents Δ1^X, and it must use this input
twice, once to compute each of the g_i's. It does this by copying the input to
produce the tape

Δ1^X Δ1^X

executing T_1 to produce Δ1^X Δ1^{g_1(X)}, and then making another copy of the
input and executing T_2 to obtain

Δ1^X Δ1^{g_1(X)} Δ1^{g_2(X)}

At this point it deletes the original input and executes T_f on the string
Δ1^{g_1(X)}Δ1^{g_2(X)}, which produces the desired output.
For any choice of X, T_h fails to accept during the execution of T_i if g_i(X)
is undefined, and fails to accept during the execution of T_f if both g_i(X)'s
are defined but f(g_1(X), g_2(X)) is undefined. Therefore, T_h computes h.
For the final step of the proof, suppose that g : N^n → N and h :
N^{n+2} → N are computable and that f is obtained from g and h by primitive
recursion. We let T_g and T_h be TMs computing g and h, respectively, and
we construct a TM T_f to compute f.
The original tape of T_f looks like this:

Δ1^{x_1}Δ1^{x_2}Δ ··· Δ1^{x_n}Δ1^k

T_f first executes T_g to compute f(X, 0) = g(X); then, keeping count of the
number of steps performed, it repeatedly executes T_h, replacing the current
value f(X, i) on the tape by f(X, i + 1) = h(X, i, f(X, i)), until the counter
reaches k. At that point the tape contains 1^{f(X, k)}, which is the desired
output. If any of the intermediate values is undefined, T_f fails to accept, as
it should.

There are functions defined in more conventional ways that can be shown to be
total, computable, and not primitive recursive. One of the most well known is called
Ackermann’s function; its definition involves a sort of recursion, so that it is clearly
computable, but it can be shown to grow more rapidly than any primitive recursive
function. A readable discussion of this function can be found in the text by Hennie
(1977).

12.2| PRIMITIVE RECURSIVE PREDICATES


AND SOME BOUNDED OPERATIONS
Several of the functions considered in the last section, such as Pred and Sub, have
been defined by cases. Those two functions are simple enough that a direct primitive
recursive derivation is feasible. However, for an arbitrary function f defined by
f(X) = f_1(X)    if P_1(X) is true
f(X) = f_2(X)    if P_2(X) is true
  ···
f(X) = f_k(X)    if P_k(X) is true

it would be more convenient to have a general principle allowing us to draw conclu-
sions about f from properties of the functions f_i and the conditions P_i(X).
A "condition" P depending on the variable X ∈ N^n, so that P(X) is either true
or false, is called a predicate. More precisely, it is an n-place predicate, which is a
partial function from N^n to {true, false}. Closely associated with a predicate P is its
characteristic function χ_P : N^n → {0, 1}, defined by

χ_P(X) = 1 if P(X) is true, and χ_P(X) = 0 otherwise

Since χ_P is a numerical function, all the properties of functions that we have discussed
in Section 12.1 are applicable to it and, by association, to P. In particular, P is
computable if χ_P is, and P is primitive recursive if χ_P is. If the characteristic
function χ_L of a language L is computable, we can decide whether a given input is
in L. When we say that χ_P is computable, we are saying something similar: There
is an algorithm to determine whether a given X satisfies P or makes P(X) true.
Predicates take the values true and false, and therefore it makes sense to apply the
logical operators ∧ (AND), ∨ (OR), and ¬ (NOT) to them. For example, (P_1 ∧ P_2)(X)
is true if and only if both P_1(X) and P_2(X) are true. Not surprisingly, these operations
preserve the primitive recursive property; this is the content of Theorem 12.4: if P_1
and P_2 are primitive recursive n-place predicates, then P_1 ∧ P_2, P_1 ∨ P_2, and ¬P_1
are also primitive recursive.

| EXAMPLE 12.5 Relational Predicates


Among the simplest predicates are the relational predicates LT, EQ, GT, LE, GE, and NE. The
expression LT(x, y) is true if x < y and false otherwise, and the definitions of the other five
are similarly suggested by their names. These predicates are all primitive recursive. In order
to show this, we first introduce the function Sg of one variable defined by

Sg(0) = 0    Sg(k + 1) = 1

This function takes the value 0 if x = 0 and the value 1 otherwise, and its definition makes it
clear that it is primitive recursive. Now we may write

χ_LT(x, y) = Sg(y ∸ x)

since x < y if and only if y ∸ x > 0. This equation shows that χ_LT is obtained from primitive
recursive functions by composition and is therefore primitive recursive. The result for the
equality predicate follows from the formula

χ_EQ(x, y) = 1 ∸ (Sg(x ∸ y) + Sg(y ∸ x))

(Note that if x < y or x > y, then one of the terms x ∸ y and y ∸ x is nonzero, and the
expression in parentheses is nonzero, causing the final result to be 0. If x = y, both terms in
the parenthesized expression are 0, and the final result is 1.)

Although the other four relational predicates can be handled in the same way, it is easier
to use the formulas

LE = LT ∨ EQ
GT = ¬LE
GE = ¬LT
NE = ¬EQ

which together with Theorem 12.4 imply that all these predicates are primitive recursive.
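These characteristic functions are easy to test with the earlier Python sketch (Sub and primitive_recursion from the previous examples); the logical combinations here are taken with ordinary Python arithmetic rather than being derived primitively:

# Sg(0) = 0, Sg(k + 1) = 1
Sg = primitive_recursion(lambda: 0, lambda k, y: 1)

def chi_LT(x, y): return Sg(Sub(y, x))                        # x < y
def chi_EQ(x, y): return Sub(1, Sg(Sub(x, y)) + Sg(Sub(y, x)))
def chi_LE(x, y): return max(chi_LT(x, y), chi_EQ(x, y))      # LT OR EQ
def chi_GT(x, y): return 1 - chi_LE(x, y)                     # NOT LE

assert chi_LT(2, 5) == 1 and chi_EQ(3, 3) == 1 and chi_GT(4, 4) == 0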

If P is an n-place predicate and f_1, f_2, ..., f_n : N^k → N, we may form the
k-place predicate Q = P(f_1, ..., f_n), and it is clear that the characteristic function
χ_Q is obtained from χ_P and f_1, ..., f_n by composition. Therefore, if P is a primitive
recursive predicate and all the functions f_i are primitive recursive, then Q is primitive
recursive. Combining this general fact with Theorem 12.4, we see that arbitrarily
complicated predicates constructed using relational and logical operators, such as

((f_1 = 3f_2) ∧ (f_3 < f_4 + f_5)) ∨ ¬(P ∨ Q)

are primitive recursive as long as the basic constituents (in this case, the functions
f_1, ..., f_5 and the predicates P and Q) are primitive recursive.
Now we are in a better position to return to the idea of a function defined by cases
and to establish a sufficient condition for such a function to be primitive recursive.
Theorem 12.5
Suppose f_1, f_2, ..., f_k are primitive recursive functions from N^n to N,
P_1, P_2, ..., P_k are primitive recursive n-place predicates, and for every X ∈ N^n
exactly one of the conditions P_1(X), ..., P_k(X) is true. Then the function f
defined by cases, with f(X) = f_i(X) whenever P_i(X) is true, is primitive
recursive.

The Mod and Div Functions | EXAMPLE 12.6


For natural numbers x and y with y > 0, we denote by Div(x, y) and Mod(x, y) the integer
quotient and remainder, respectively, when x is divided by y. For example, Div(8,5) = 1,
Mod(8, 5) = 3, and Mod(12, 4) = 0. As they stand, these are not total functions on N × N,
because we do not allow division by 0; however, it will be useful to extend the definition and to
show that the results are actually primitive recursive. Let us say that for any x, Div(x, 0) = 0
and Mod(x, 0) = x. Then the usual formula

x = y * Div(x, y) + Mod(x, y)

still holds for every x and y, and

0 < Mod(x, y) < y


is true as long as y > 0.
We begin by showing that Mod is primitive recursive. The derivation involves recursion
in the first variable, and for this reason we let

R(x, y) = Mod(y, x)

According to part 2 of Theorem 12.1, the primitive recursiveness of Mod follows from that of
R. The following formulas can be verified easily.

R(x, 0) = Mod(0, x) = 0
R(x, k + 1) = Mod(k + 1, x)
  = R(x, k) + 1    if x ≠ 0 and R(x, k) + 1 < x
  = 0    if x ≠ 0 and R(x, k) + 1 = x
  = k + 1    if x = 0
For example,

R(5, 6 + 1) = Mod(7, 5) = Mod(6, 5) + 1

since 5 ≠ 0 and Mod(6, 5) + 1 = 1 + 1 < 5, and

R(5, 9 + 1) = Mod(10, 5) = 0

since 5 ≠ 0 and Mod(9, 5) + 1 = 4 + 1 = 5. The function h defined by

h(x_1, x_2, x_3) = x_3 + 1    if x_1 ≠ 0 and x_3 + 1 < x_1
h(x_1, x_2, x_3) = 0    if x_1 ≠ 0 and x_3 + 1 = x_1
h(x_1, x_2, x_3) = x_2 + 1    if x_1 = 0

is not a total function, since it is undefined if x_1 ≠ 0 and x_3 + 1 > x_1. However, the modification

h(x_1, x_2, x_3) = x_3 + 1    if x_1 ≠ 0 and x_3 + 1 < x_1
h(x_1, x_2, x_3) = 0    if x_1 ≠ 0 and x_3 + 1 ≥ x_1
h(x_1, x_2, x_3) = x_2 + 1    if x_1 = 0

works just as well. The function R is obtained by primitive recursion from C_0^1 and this modified
h, and Theorem 12.5 implies that h is primitive recursive. Therefore, so are R and Mod.

The function Div can now be handled in a similar way. If we define Q(x, y) to be
Div(y, x), then it is not hard to check that Q is obtained by primitive recursion from C_0^1 and
the primitive recursive function h_1 defined by

h_1(x_1, x_2, x_3) = x_3    if x_1 ≠ 0 and Mod(x_2, x_1) + 1 < x_1
h_1(x_1, x_2, x_3) = x_3 + 1    if x_1 ≠ 0 and Mod(x_2, x_1) + 1 = x_1
h_1(x_1, x_2, x_3) = 0    if x_1 = 0

(Note that for any choice of (x_1, x_2, x_3), precisely one of the predicates appearing in this
definition is true.)
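The recursion for Mod can be checked with a direct iterative transcription (ours; it follows the case analysis of the modified h literally rather than being a primitive recursive derivation). Div can be transcribed from h_1 in the same way.

def Mod(x, y):
    """Mod(x, y) via the recursion in the text, with Mod(x, 0) = x."""
    r = 0                        # R(y, 0) = Mod(0, y) = 0
    for k in range(x):           # compute R(y, k + 1) from R(y, k)
        if y != 0 and r + 1 < y:
            r = r + 1
        elif y != 0:             # r + 1 == y: wrap around to 0
            r = 0
        else:                    # y == 0: Mod(k + 1, 0) = k + 1
            r = k + 1
    return r

assert Mod(8, 5) == 3 and Mod(12, 4) == 0 and Mod(7, 0) == 7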

The operations that can be applied to predicates to produce new ones include not
only logical operations such as AND, but also universal and existential quantifiers.
For example, if Sq is the 2-place predicate defined by

Sa es Oe =o)
then it is reasonable to apply the existential quantifier (“there exists”) to the second
variable in order to obtain the 1-place predicate PerfectSquare, defined by
PerfectSquare(x) = (there exists y with y^2 = x)

The predicate Sq is primitive recursive. Does it follow that PerfectSquare is? The
answer is: No, it does not follow, but yes, this predicate is primitive recursive.
Placing a quantifier in front of a primitive recursive predicate does not always produce
a primitive recursive predicate, and placing a quantifier in front of a computable
predicate does not always produce something computable.
We can easily find an example to illustrate the second statement, by considering
an unsolvable problem from Chapter 11 that can be obtained this way. Given an
alphabet Σ, we can impose an ordering on it, which makes it possible to consider
the canonical order on Σ*. For a natural number x, denote by s_x the xth string with
respect to that ordering. Let T_u be the universal Turing machine of Section 9.6, and
let H be the 2-place predicate defined by

H(x, y) = (T_u halts after exactly y moves on input s_x)

H is clearly computable and is in fact primitive recursive. However, the 1-place


predicate
Halts(x) = (there exists y so that T_u halts after y moves on input s_x)

is not computable, because to compute it would mean solving the halting problem.
One difference between these two examples, which is enough to guarantee that
PerfectSquare is computable even though Halts is not, is that for a given x there is
a bound on the values of y that need to be tested in order to determine whether the
predicate "there exists y such that y^2 = x" is true. Since y^2 ≥ y, for example, any y
for which y^2 = x must satisfy y ≤ x. In particular, there is an algorithm to determine
whether PerfectSquare(x) is true: Try values of y in increasing order until a value is
found satisfying either y^2 = x or y^2 > x. The predicate Halts illustrates the fact that

if the simple trial-and-error algorithm that comes with such a bound is not available,
there may be no algorithm at all.
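A sketch of the bounded trial-and-error algorithm for PerfectSquare in Python; since any y with y^2 = x satisfies y ≤ x, the loop is guaranteed to terminate:

    def perfect_square(x):
        y = 0
        while y * y < x:              # y never exceeds x
            y += 1
        return y * y == x

    print([x for x in range(20) if perfect_square(x)])   # [0, 1, 4, 9, 16]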
This discussion suggests that if we start with any (n + 1)-place predicate P, we
may consider the new predicate E_P that results from applying the existential quantifier
to the last variable in a restricted way, by specifying a bounded range for this variable.
We can do the same thing with the universal quantifier (“for every”), and in both cases
this bounded quantification preserves the primitive recursive property.

Definition 12.5 Bounded Quantification

For an (n + 1)-place predicate P, the bounded existential quantification of P is the
(n + 1)-place predicate E_P defined by

E_P(X, k) = (there exists y with 0 ≤ y ≤ k such that P(X, y) is true)

and the bounded universal quantification of P is the (n + 1)-place predicate A_P
defined by

A_P(X, k) = (for every y with 0 ≤ y ≤ k, P(X, y) is true)

In order to simplify the proof of Theorem 12.6, it is useful to introduce two other
“bounded operations.” We start by considering a simple special case in which the
resulting function is a familiar one.
The factorial function is defined recursively in Chapter 2. Here we use the
definition
x! = ∏(i=1 to x) i

where, in general, ∏(i=j to k) p_i stands for the product p_j * p_(j+1) * ... * p_k if k ≥ j, and 1
if k < j. (In the second case, we think of it as the empty product; 1 is the appropriate
value, since we want the empty product multiplied by any other product to be that
other product.) We can generalize this definition by allowing the factors to be more
general than i—in particular, to involve other variables—and by allowing sums as
well as products.
Lemma 12.1 Let n ≥ 0, and suppose that g : N^(n+1) → N is primitive recursive.
Then the functions f1, f2 : N^(n+1) → N defined below are also primitive recursive:

f1(X, k) = Σ(i=0 to k) g(X, i)

f2(X, k) = ∏(i=0 to k) g(X, i)

for any X ∈ N^n and k ≥ 0. (The functions f1 and f2 are said to be obtained from g
by bounded sums and bounded products, respectively.)

Proof We give the proof for f1, and the other is almost identical. We may write

f1(X, 0) = g(X, 0)
f1(X, k + 1) = f1(X, k) + g(X, k + 1)

Therefore, f1 is obtained by primitive recursion from the two primitive recursive
functions g1 and h, where g1(X) = g(X, 0) and h(X, y, z) = z + g(X, y + 1). ∎
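A Python transcription of the proof, again with ordinary recursion standing in for primitive recursion; the sample g is made up:

    def bounded_sum(g, X, k):
        return g(*X, 0) if k == 0 else bounded_sum(g, X, k - 1) + g(*X, k)

    def bounded_product(g, X, k):
        return g(*X, 0) if k == 0 else bounded_product(g, X, k - 1) * g(*X, k)

    g = lambda x, i: x + i               # a sample primitive recursive g
    print(bounded_sum(g, (10,), 4))      # 10 + 11 + 12 + 13 + 14 = 60
    print(bounded_product(g, (1,), 3))   # 1 * 2 * 3 * 4 = 24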

Note the slight discrepancy between the definition of bounded product, in which
the product starts with i = 0, and the previous definition of x!. It is not difficult to
generalize the theorem slightly so as to allow the sum or product to begin with the
i = i0 term, for any fixed i0 (Exercise 12.33).

So far, the bounded versions of the operations in this section preserve the primi-
tive recursive property, whereas the unbounded versions do not even preserve com-
putability. In order to characterize the computable functions as those obtained by
starting with initial functions and applying certain operations, we need at least one
operation that preserves computability but not primitive recursiveness. This is be-
cause the initial functions are primitive recursive, and not all computable functions
are. The operation of minimalization turns out to have this feature. We introduce its
bounded version here and examine the general operation in the next section.

For an (n + 1)-place predicate P, and a given X ∈ N^n, we may consider the
smallest value of y for which P(X, y) is true. To turn this operation into a bounded
one, we specify a value of k and ask for the smallest value of y that is less than or
equal to k and satisfies P(X, y). There may be no such y (whether or not we bound
the possible choices by k); therefore, because we want the bounded version of our
function to be total, we introduce an appropriate default value for the function in this
case.

Definition 12.6 Bounded Minimalization

For an (n + 1)-place predicate P, the bounded minimalization of P is the function
m_P : N^(n+1) → N defined by

m_P(X, k) = μy[P(X, y)] = the smallest y with 0 ≤ y ≤ k for which P(X, y) is
            true, and k + 1 if there is no such y

The nth Prime Number


For n ≥ 0, let PrNo(n) be the nth prime number: PrNo(0) = 2, PrNo(1) = 3, PrNo(2) = 5,
and so on. Let us show that the function PrNo is primitive recursive.

First we observe that the 1-place predicate Prime, defined by

Prime(n) = (n ≥ 2) ∧ ¬(there exists y such that y ≥ 2 ∧ y ≤ n − 1 ∧ Mod(n, y) = 0)

is primitive recursive, and Prime(n) is true if and only if n is a prime.


For any k, PrNo(k + 1) is the smallest prime greater than PrNo(k). Therefore, if we can
just place a bound on the set of integers greater than PrNo(k) that may have to be tested in
order to find a prime, then we can use the bounded minimalization operator to obtain PrNo by
primitive recursion. The number-theoretic fact that makes this possible was proved in Example
2.5: For any positive integer m, there is a prime greater than m and no larger than m! + 1.
With this in mind, let

P(x, y) = (y > x ∧ Prime(y))

Then

PrNo(0) = 2
PrNo(k + 1) = m_P(PrNo(k), PrNo(k)! + 1)

We have shown that PrNo can be obtained by primitive recursion from the two functions C_2^0
and h, where

h(x, y) = m_P(y, y! + 1)

Therefore, PrNo is primitive recursive.
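A sketch of this derivation in Python, with a naive primality test standing in for the primitive recursive predicate Prime:

    import math

    def prime(n):
        return n >= 2 and all(n % y != 0 for y in range(2, n))

    def m_P(x, k):                    # bounded minimalization of P(x, y)
        for y in range(k + 1):
            if y > x and prime(y):    # P(x, y) = (y > x and Prime(y))
                return y
        return k + 1

    def PrNo(n):
        if n == 0:
            return 2
        p = PrNo(n - 1)
        return m_P(p, math.factorial(p) + 1)

    print([PrNo(i) for i in range(6)])   # [2, 3, 5, 7, 11, 13]

The factorial bound is astronomically generous, but the search stops at the first prime it finds, so the sketch runs quickly for small n; the bound exists only to make the minimalization bounded.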

12.3 | UNBOUNDED MINIMALIZATION AND μ-RECURSIVE FUNCTIONS

For a predicate P we have defined m_P(X, k), or μy[P(X, y)], to be the smallest
y in the range 0 ≤ y ≤ k for which P(X, y) is true, if there is one, and k + 1 if
there is not. The default value was specified in order to make the function m_P a total
function. Now we want to remove the constraints on the values of y in the definition
of the function. If there is at least one value of y for which P(X, y) holds, then we
can find the smallest one by examining the values of y in increasing order. If there is
not, we may not be able to determine there is not. Because we want the unbounded
version of the minimalization operation to preserve computability, it follows that we
should not specify a default value for the function—because in the case in which the
function should have this value, we might not be able to determine that it should! For
this reason, the operation will no longer be guaranteed to produce a total function,
even when applied to primitive recursive predicates. Therefore, we do not expect the
operation to preserve the primitive recursive property.

The fact that we want M_P to be a computable partial function for any computable
predicate P also has another consequence. Suppose, again, that the algorithm we are
relying on for computing M_P(X) is the simple-minded one of evaluating P(X, y) for
increasing values of y. Suppose also that for a particular y0, P(X, y0) is undefined.
Although there might be a value y1 > y0 for which P(X, y1) is true, we will never
get around to considering P(X, y1) if we get stuck in an infinite loop while trying
to evaluate P(X, y0). We can avoid this problem by stipulating that unbounded
minimalization be applied only to total predicates or total functions.
Unbounded minimalization is the last of the operations we need in order to char-
acterize the computable functions. Notice that in the definition below, this operator
is applied only to predicates defined by some numeric function being zero.

Definition 12.7 μ-Recursive Functions

The set M of μ-recursive functions is the smallest set of partial functions on the
natural numbers that contains the initial functions and is closed under the operations
of composition, primitive recursion, and unbounded minimalization applied to total
functions: for a total function f : N^(n+1) → N in M, M also includes the function
M_f defined by

M_f(X) = μy[f(X, y) = 0]
       = the smallest y for which f(X, y) = 0, undefined if there is no such y

Just as in the case of primitive recursive functions, a function is in the set M if


and only if it has a finite, step-by-step derivation, where at each step either a new initial
function is introduced or one of the three operations is applied to initial functions,

to functions obtained earlier in the derivation, or to both. As long as unbounded


minimalization is not used, the function obtained at each step in such a sequence is
primitive recursive. Once unbounded minimalization appears in the sequence, the
functions may cease to be primitive recursive or even total. Note that if f is obtained
by composition or primitive recursion, it is possible for f to be total even when not
all the functions from which it is obtained are total. Thus it is conceivable that in the
derivation of a μ-recursive function, unbounded minimalization could be used more
than once, even if its first use produces a nontotal function. However, in the proof
of Theorem 12.10 we show that any μ-recursive function actually has a derivation in
which unbounded minimalization is used only once.
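A sketch of the unbounded minimalization operator in Python, applied to a total function as the definition above requires; the example function is made up:

    def mu(f, *X):                    # smallest y with f(X, y) == 0, if any
        y = 0
        while f(*X, y) != 0:
            y += 1                    # may loop forever: the result is partial
        return y

    # f(x, y) = |x - y*y| is total, and mu computes the integer square root
    # of perfect squares; the call diverges when x is not a perfect square.
    f = lambda x, y: abs(x - y * y)
    print(mu(f, 49))                  # 7
    # mu(f, 50) would never return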

12.4 | GÖDEL NUMBERING


It is easy to formulate statements in the English language about the English language.
In Section 9.6 we have done something comparable in a formal language: We have
constructed strings of 0’s and 1’s that describe languages of strings of 0’s and 1’s,
in the sense that they specify Turing machines accepting languages. This seemingly
innocuous technique makes possible the diagonal argument, with its characteristic
circularity (Turing machines accepting or not accepting their own encodings), and
the diagonal argument leads to profound results about the limits of computation.
The logician Kurt Gödel used a similar idea in the 1930s (around the time of
the initial papers by Turing, Church, Post, et al.), developing an encoding scheme
to assign numbers to statements and formulas in an axiomatic system. As a result,
Gödel was able to describe logical relations between objects in the system by
numerical formulas expressing relations between numbers. His ingenious use of these
techniques allowed him to establish unexpected results concerning logical systems.
Gödel's incompleteness theorem says, roughly speaking, that any formal system com-
prehensive enough to include the laws of arithmetic must, if it is consistent, contain
true statements that cannot be proved within the system.
Although we will not be discussing Gödel's results directly, the idea of "Gödel
numbering" will be useful. The first step is simply to encode sequences of several
numbers as single numbers. One application will be to show that a more general
type of recursive definition than we have considered so far gives rise to primitive
recursive functions. A little later we will extend our "arithmetization" to objects such
as TMs. This will allow us to represent a sequence of calculations involving numbers
by a sequence of numbers, and it will be the principal ingredient in the proof that all
computable functions are μ-recursive.

There are a variety of Gödel-numbering schemes. Most depend on a familiar
fact about numbers: Every positive integer can be factored into primes, and this
factorization is unique except for differences in the order of the factors.

Gödel Numbers

For a finite sequence x0, x1, ..., xm of natural numbers, define the Gödel number of
the sequence to be

gn(x0, x1, ..., xm) = ∏(i=0 to m) PrNo(i)^(x_i)

The Gödel number of any sequence is greater than or equal to 1, and every integer
greater than or equal to 1 is the Gödel number of a sequence. The function gn is not
one-to-one; for example,

gn(0, 1, 2) = gn(0, 1, 2, 0, 0) = 2^0 * 3^1 * 5^2

However, if gn(x0, x1, ..., xm) = gn(y0, y1, ..., ym, ..., y(m+k)), then

∏(i=0 to m) PrNo(i)^(x_i) = ∏(i=0 to m) PrNo(i)^(y_i) * ∏(i=m+1 to m+k) PrNo(i)^(y_i)

and because a number can have only one prime factorization, we must have x_i = y_i
for 0 ≤ i ≤ m and y_(m+1) = ... = y_(m+k) = 0. Therefore, two sequences having the
same Gödel number are identical, except that they may end with a different number
of 0's. In particular, for any n ≥ 1, every positive integer is the Gödel number of at
most one sequence of n integers.

For any n, the Gödel numbering we have defined for sequences of length n
determines a function from N^n to N. We will be imprecise and use the name gn for
any of these functions. All of them are primitive recursive.
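A Python sketch of gn, with a simple trial-division version of PrNo; later sketches in this chapter reuse these two functions:

    def PrNo(i):                      # the ith prime: PrNo(0) = 2, PrNo(1) = 3, ...
        n, count = 1, -1
        while count < i:
            n += 1
            if all(n % d != 0 for d in range(2, n)):
                count += 1
        return n

    def gn(*xs):                      # Goedel number of a finite sequence
        g = 1
        for i, x in enumerate(xs):
            g *= PrNo(i) ** x
        return g

    print(gn(0, 1, 2))                # 2**0 * 3**1 * 5**2 = 75
    print(gn(0, 1, 2, 0, 0))          # also 75: trailing 0's do not matter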

If we start with a positive integer g and wish to decode g to find a sequence x0,
x1, ..., xn whose Gödel number is g, we may proceed by factoring g into primes.
For each i, x_i is the number of times PrNo(i) appears as a factor of g. For example,
the number 59895 has the factorization

59895 = 3^2 * 5^1 * 11^3 = 2^0 * 3^2 * 5^1 * 7^0 * 11^3

and is therefore the Gödel number of the sequence 0, 2, 1, 0, 3 (or any other sequence
obtained from this by adding extra 0's). The prime number 31 is the Gödel number
of the sequence 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, since 31 = PrNo(10). This type of calculation
will be needed often enough that we introduce a function just for this purpose.

The Power to Which a Prime is Raised in the Factorization of x | EXAMPLE 12.8

The function Exponent : N^2 → N is defined by letting Exponent(i, x) be the exponent
of PrNo(i) in the prime factorization of x, if x > 0, and 0 if x = 0. For example,
Exponent(4, 59895) = 3, as we have seen, since the fourth prime, 11, appears three times
as a factor of 59895. (Remember, 2 is the zeroth prime.)
Exponent is primitive recursive. For x > 0 and i ≥ 0, PrNo(i)^y divides x evenly if and
only if y ≤ Exponent(i, x). In other words, Exponent(i, x) + 1 is the smallest y for which
(PrNo(i))^y does not divide x evenly. This expression ("the smallest y for which ...") involves
a minimalization, and since we can easily find a specific k depending on x for which (PrNo(i))^k
is guaranteed not to divide x evenly (e.g., k = x), we can make it a bounded minimalization.
For x > 0,

Exponent(i, x) = μy[Mod(x, (PrNo(i))^y) > 0] − 1   (the minimalization bounded by x)

The primitive recursiveness of Exponent now follows.
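A Python sketch of Exponent, following the bounded-minimalization formula above and reusing the PrNo sketch:

    def Exponent(i, x):
        if x == 0:
            return 0
        p = PrNo(i)
        for y in range(x + 1):        # the bound k = x from the text
            if x % (p ** y) > 0:      # smallest y with Mod(x, p**y) > 0 ...
                return y - 1          # ... minus 1
        return x                      # unreachable, since p**x > x for x > 0

    print(Exponent(4, 59895))         # 3: 11**3 divides 59895, 11**4 does not
    print(Exponent(0, 59895))         # 0: 59895 is odd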

Many common recursive definitions do not obviously fit the strict pattern required
by the operation of primitive recursion. The standard definition of the Fibonacci
function, for example, involves the formula

f(n + 1) = f(n) + f(n − 1)

The right side apparently is not of the form h(n, f(n)) because it also depends on
f(n − 1). In a more general situation, f(n + 1) might depend on even more, con-
ceivably all, of the terms f(0), f(1), ..., f(n). This type of recursion is known as
course-of-values recursion, and it bears the same relation to ordinary primitive recur-
sion that the strong principle of mathematical induction does to the ordinary principle
(see Section 2.3).
One simple way to use Gödel numbers is to recast such recursive definitions to fit
the required form. Suppose f is defined recursively, so that f(n + 1) depends on some
or all of the numbers f(0), ..., f(n) (and possibly also directly on n). Intuitively,
what we need in order to describe f in terms of primitive recursion is another function
f1 for which

1. Knowing f1(n) would allow us to calculate f(n).
2. f1(n + 1) depends only on n and f1(n).

If we temporarily relax the requirement that f1(n) be a number, we might con-
sider the entire sequence f1(n) = (f(0), f(1), ..., f(n)). Then condition 1 is
satisfied, since f(n) is simply the last term of the sequence f1(n). In addition,
since f(n + 1) can be expressed in terms of n and f(0), ..., f(n), the entire se-
quence (f(0), ..., f(n), f(n + 1)) can be said to depend only on n and the se-
quence (f(0), ..., f(n)), so that f1 also satisfies condition 2. To make this intuitive
idea work, all we need to do is to use the Gödel numbers of the sequences, instead
of the sequences themselves: Instead of saying that f(n + 1) depends directly on
f(0), ..., f(n), we say that f(n + 1) depends on the single value gn(f(0), ..., f(n)).
The two versions are intuitively equivalent, since each of the numbers f(i) can be
derived from the single value gn(f(0), ..., f(n)).
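A Python sketch of this maneuver for the Fibonacci function (taking f(0) = f(1) = 1), reusing the gn, PrNo, and Exponent sketches:

    def f1(n):                        # Goedel number of (f(0), ..., f(n))
        if n == 0:
            return gn(1)              # encodes the one-term sequence f(0) = 1
        g = f1(n - 1)
        if n == 1:
            nxt = 1                   # f(1) = 1
        else:                         # f(n) = f(n-1) + f(n-2), read off from g
            nxt = Exponent(n - 1, g) + Exponent(n - 2, g)
        return g * PrNo(n) ** nxt

    def f(n):                         # f(n) is the last term encoded in f1(n)
        return Exponent(n, f1(n))

    print([f(n) for n in range(8)])   # [1, 1, 2, 3, 5, 8, 13, 21]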

Now we are ready to apply our Gödel numbering techniques to Turing machines.
A computable function f is computed by a sequence of steps. If we can manage to
represent these steps as operations on numbers, then we will have a way of building
the function f from more rudimentary functions. Because a TM move can be thought
of as a transformation of the machine from one configuration to another, all we need
do to describe the move numerically is represent the TM configuration by a number.
We begin by assigning a number to each state. The halt states h_a and h_r are
assigned the numbers 0 and 1, respectively. If Q is the set of nonhalting states, then
we let the elements of Q be q2, q3, ..., q_s, where q2 is always assumed to be the
initial state.
The natural number to use in describing the tape head position is the number of
the tape square the head is scanning. Finally, we assign the number 0 to the blank
symbol Δ (we will sometimes write 0 instead of Δ), and we assume that the nonblank
tape symbols are 1, 2, ..., t. This allows us to define the tape number of the TM at
any point to be the Gödel number of the sequence of symbols currently on the tape.
Note that because we are identifying Δ with 0, the tape number is the same no matter
how many trailing blanks we include in the sequence. The tape number of a blank
tape is 1.
Since the configuration of the TM is determined by the state, the tape head
position, and the current contents of the tape, we define the configuration number to
be the number

gn(q, P, tn)

where q is the number of the current state, P is the current head position, and tn is
the current tape number. The most important feature of the configuration number is
that from it we can reconstruct all the details of the configuration; we will be more
explicit about this in the next section.
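A Python sketch of the configuration number, reusing the gn sketch; the sample tape is hypothetical:

    def tape_number(symbols):         # symbols[i] = number of symbol on square i
        return gn(*symbols)

    def config_number(state, pos, symbols):
        return gn(state, pos, tape_number(symbols))

    # Initial configuration for input 11: state q2 (number 2), head on
    # square 0, tape contents (blank, 1, 1), i.e. tape number 3 * 5 = 15.
    print(config_number(2, 0, [0, 1, 1]))   # 2**2 * 3**0 * 5**15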

12.5 | ALL COMPUTABLE FUNCTIONS ARE μ-RECURSIVE

The main outline of the proof has been provided for us by the Gödel numbering
scheme presented in the last section and the resulting arithmetization of Turing ma-
chines. If f is computed by the Turing machine T, we will complete the proof by
defining the functions appearing in the formula

f = Result_T ∘ f_T ∘ InitConfig^(n)

and showing that they are μ-recursive and that the formula holds.
For any n-tuple X, InitConfig^(n)(X) will be the number of the initial TM con-
figuration corresponding to input X. This number does not depend on the TM we
use, because we have agreed to label the initial state of any TM as q2. The numeric
function f_T corresponds to the processing done by T. For an input X in the domain
of f, if n is the number representing the initial configuration of T corresponding
to input X, then f_T(n) represents the accepting configuration ultimately reached by
T; for integers n corresponding to other inputs X, f_T(n) is undefined. The function
Result_T has the property that if n is the number of an accepting configuration in which
the string representing output f(X) is on the tape, then Result_T(n) = f(X).
The function Result_T is one of several whose value at a number m depends on
whether m is the number of a configuration of T. The first step is therefore to examine
the 1-place predicate IsConfig_T defined by

IsConfig_T(n) = (n is a configuration number for T)

Lemma 12.2 IsConfig_T is a primitive recursive predicate.


Proof Let s_T be one more than the number of nonhalting states of T (recall that
they are numbered beginning with 2), and let ts_T be the number of nonblank tape
symbols of T. A number m is a configuration number for T if and only if

m = 2^q * 3^p * 5^tn

where q < s_T, p is arbitrary, and tn is the Gödel number of a sequence of natural
numbers, each one between 0 and ts_T.

The statement that m is of the general form 2^q * 3^p * 5^tn can be expressed by saying
that m ≥ 1 and for every i > 2, Exponent(i, m) = 0. An equivalent formulation is

(m ≥ 1) ∧ (for every i, i ≤ 2 ∨ Exponent(i, m) = 0)

For numbers m of this form, the conditions on q and tn are equivalent to the statement

(Exponent(0, m) < s_T) ∧ (Exponent(2, m) ≥ 1) ∧ (for every i, Exponent(i, tn) ≤ ts_T)

In order to show that the conjunction of these two predicates is primitive recursive,
it is sufficient to show that both of the universal quantifications can be replaced by
bounded universal quantifications. This is true because Exponent(i, n) = 0 when
i > n; the first occurrence of "for every i" can be replaced by "for every i ≤ m" and
the second by "for every i ≤ tn." ∎

Lemma 12.3 The function InitConfig^(n) : N^n → N is primitive recursive.


Proof Because the initial state of any TM is designated q2, and the tape head is
initially on square 0, we may write

InitConfig^(n)(x1, ..., xn) = gn(2, 0, t^(n)(x1, ..., xn))

where t^(n)(x1, ..., xn) is the tape number of the tape containing the input string
Δ1^(x1)Δ1^(x2)Δ···Δ1^(xn). It is therefore sufficient to show that the function t^(n) is primi-
tive recursive. The proof is by mathematical induction on n. The basis step, n = 0, is
clear, since t^(0) is constant in that case. Suppose that k ≥ 0 and that t^(k) is primitive
recursive. The number t^(k+1)(x1, ..., xk, x_(k+1)) = t^(k+1)(X, x_(k+1)) is the tape number
for the tape containing

Δ1^(x1)Δ···Δ1^(xk)Δ1^(x_(k+1))

Counting the symbols in the string Δ1^(x1)Δ···Δ1^(xk), we find that the string 1^(x_(k+1))
occupies tape squares k + Σ(i=1 to k) x_i + 1 through k + Σ(i=1 to k) x_i + x_(k+1). This means that
the additional factors in the tape number resulting from the 1's in this last block are
those of the form

PrNo(k + Σ(i=1 to k) x_i + j)   (1 ≤ j ≤ x_(k+1))

In other words, we may write

t^(k+1)(X, x_(k+1)) = t^(k)(X) * ∏(j=1 to x_(k+1)) PrNo(k + Σ(i=1 to k) x_i + j)

The first factor, viewed as a function of X, is primitive recursive according to the
induction hypothesis. Therefore, by Theorem 12.1, it is still primitive recursive when
viewed as a function of k + 1 variables. The second is of the form

∏(j=1 to x_(k+1)) g(X, j)

for a primitive recursive function g, and is therefore primitive recursive by Lemma
12.1. The result follows because the set of primitive recursive functions is closed
under multiplication.

As we discussed earlier, we want Result_T(n) to be f(X) if n represents the accept-
ing configuration with output f(X). This will be the case if, for any n representing
a configuration, we simply define Result_T(n) to be the number of the tape square
containing the last nonblank symbol on the tape in this configuration, or 0 if there
are no nonblank symbols. We may also let Result_T(n) be 0 if n does not represent a
configuration.

Lemma 12.4 The function Result_T : N → N is primitive recursive.


Proof Because the tape number for the configuration represented by n is
Exponent(2, n) and the prime factors of the tape number correspond to the squares
with nonblank symbols, we may write

Result_T(n) = HighestPrime(Exponent(2, n))   if IsConfig_T(n)
            = 0                              otherwise

where for any positive k, HighestPrime(k) is the number of the largest prime factor
of k, and HighestPrime(0) = 0 (for example, HighestPrime(k) = 7 for any k whose
largest prime factor is 19, because 19 is PrNo(7)). It is not hard to see that the function
HighestPrime is primitive recursive, and it follows that Result_T is also. ∎

The only remaining piece is f_T, the numerical function corresponding to the
processing done by T itself. At this point we make the simplifying assumption that
T never attempts to move its tape head left from square 0. This involves no loss
of generality because any TM is equivalent to one with this property. It will be
helpful next to introduce explicitly the functions that produce the current state, tape

head position, tape number, and tape symbol from the configuration number. The
respective formulas are

State(m) = Exponent(0, m)
Posn(m) = Exponent(1, m)
TapeNum(m) = Exponent(2, m)
Symbol(m) = Exponent(Posn(m), TapeNum(m))

for any m that is a configuration number for T, and 0 otherwise. Because IsConfig_T
is a primitive recursive predicate, all four functions are primitive recursive.
The main ingredient in the description of f_T is another function

Move_T : N → N

Roughly speaking, Move_T calculates the effect on the configuration number of a
single move of T. More precisely, if m is the number for a configuration of T in
which T can move, then Move_T(m) is the configuration number after the move, and
if m is the number of a halting configuration or any other configuration in which T
cannot move, Move_T(m) = 0.

Lemma 12.5 The function Move_T : N → N is primitive recursive.


Proof We may write

Move_T(m) = gn(NewState(m), NewPosn(m), NewTapeNum(m))   if m is a configuration number
          = 0                                            otherwise

The three functions NewState, NewPosn, and NewTapeNum all have the value 0 at
any point m that is not a configuration number. For a configuration number m,
NewState(m) is the resulting state if T can move from configuration m, and State(m)
otherwise; the other two functions are defined similarly. Thus, in order to show that
Move_T is primitive recursive, it is sufficient to show that these three New functions
are. In the argument it will help to have one more function, NewSymbol, defined
analogously.
So far, our description of NewState(m) has involved three cases. One case
corresponds to the primitive recursive predicate ¬IsConfig_T. The other two cases may
be divided into subcases, corresponding to the possible combinations of State(m) and
Symbol(m). Because these two functions are primitive recursive, so are the predicates
defining the subcases. In each subcase, the value of NewState(m) is either State(m) or the
value specified by the transition table for T. Therefore, since NewState is defined by
cases that involve only primitive recursive functions and predicates, it must also be
primitive recursive. The argument to show that NewSymbol is primitive recursive is
exactly the same.
The proof for NewPosn is almost the same. This function may also be de-
fined by cases, the same ones involved in the definition of NewState. In each case,
NewPosn(m) is either 0, if m is not a configuration number; Posn(m), if T cannot
move from configuration m, or if the move does not change the position of the tape

head; Posn(m) + 1, if the move shifts the head to the right; or Posn(m) − 1, if the
move shifts the head to the left. Therefore NewPosn is primitive recursive.
The definition of NewTapeNum can also be made using the same cases, with
slightly more complicated formulas. Suppose that Posn(m) = i, Symbol(m) = j, and
NewSymbol(m) = j′. The difference between TapeNum(m) and NewTapeNum(m)
is that the first number involves the factor PrNo(i)^j and the second has PrNo(i)^j′
instead; the exponents differ by j′ − j = NewSymbol(m) − Symbol(m). Thus in this
subcase, NewTapeNum(m) can be expressed as

TapeNum(m) * PrNo(Posn(m))^(NewSymbol(m) − Symbol(m))

if NewSymbol(m) ≥ Symbol(m), and

Div(TapeNum(m), PrNo(Posn(m))^(Symbol(m) − NewSymbol(m)))

otherwise. Since both formulas define primitive recursive functions, the function
NewTapeNum is primitive recursive. ∎
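A Python sketch of just this tape-number update, reusing the PrNo and Exponent sketches; Mod and Div are written with Python's built-in operators:

    def new_tape_num(tn, pos, new_sym):
        old_sym = Exponent(pos, tn)   # the symbol currently on square pos
        p = PrNo(pos)
        if new_sym >= old_sym:        # multiply in the extra factors
            return tn * p ** (new_sym - old_sym)
        return tn // p ** (old_sym - new_sym)   # or divide them out

    tn = 15                           # tape (blank, 1, 1), as in the earlier sketch
    print(new_tape_num(tn, 1, 2))     # write symbol 2 on square 1: 3**2 * 5 = 45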

Now that we have described the effect of one move of T on the configuration
number, we can generalize to a sequence of k moves. Consider the function Trace_T :
N^2 → N defined as follows:

Trace_T(m, 0) = m   if IsConfig_T(m)
              = 0   otherwise

Trace_T(m, k + 1) = Move_T(Trace_T(m, k))   if Move_T(Trace_T(m, k)) ≠ 0
                  = Trace_T(m, k)           otherwise

It is clear from Lemma 12.5 that Trace_T can be obtained by primitive recursion from
two primitive recursive functions and is therefore primitive recursive itself. Assuming
that m is a configuration number, we may describe Trace_T(m, k) as the number of the
configuration after k moves, if T starts in configuration m; or, if T is unable to make
as many as k moves from configuration m, as the number of the last configuration T
reaches starting from configuration m.
We need just one more auxiliary function before we can complete the proof of
Theorem 12.10. Let Accepting_T : N → N be defined by

Accepting_T(m) = 0   if IsConfig_T(m) ∧ Exponent(0, m) = 0
               = 1   otherwise

Accepting_T(m) is 0 if and only if m is the number of an accepting configuration for
T, and Accepting_T is clearly primitive recursive.

Now define f_T : N → N by

f_T(m) = Trace_T(m, MovesToAccept(m))

where MovesToAccept(m) = μk[Accepting_T(Trace_T(m, k)) = 0]. We may describe
the functions MovesToAccept and f_T as follows. If m is a configuration number for
T and T eventually accepts when starting from configuration m, then MovesToAccept(m)
is the number of moves from that point before T accepts, and f_T(m) is the number of
the accepting configuration T finally reaches. If X is in the domain of f, then
f_T(InitConfig^(n)(X)) is the number of the accepting configuration reached from the
initial one, and when Result_T is applied to this configuration number it produces f(X).
On the other hand, suppose that f(X) is undefined. Then T fails to accept input X,
and this means that f_T(InitConfig^(n)(X)) is undefined. The proof is complete. ∎

12.6 | NONNUMERIC FUNCTIONS AND OTHER APPROACHES TO COMPUTABILITY
The technique of Gödel numbering allows us to extend the definitions of primitive
recursive and μ-recursive to functions involving strings, and to obtain the correspond-
ing generalization of Theorems 12.8 and 12.10. The idea is that if f takes the string
x to the string f(x), we can describe f in terms of the related numerical function
p_f, which takes the Gödel number of x to that of f(x). Although we discuss only
functions of one variable in this section, the extension to functions of several variables
is straightforward (Exercise 12.36).

If Σ is an alphabet whose symbols have been numbered 1 through s, so that the string
x of length m + 1 has symbols with numbers n_0, n_1, ..., n_m, the Gödel number of x
is defined to be

gn(x) = ∏(i=0 to m) PrNo(i)^(n_i)

and the Gödel number of the null string Λ is defined to be 1.

The fact that none of the exponents in the formula for gn(x) can be 0 has two
consequences. First, since the factorization of a positive integer into primes is unique,
the function gn : Σ* → N is one-to-one. Second, numbers such as 3 = 2^0 * 3^1 or
10 = 2^1 * 3^0 * 5^1 cannot be the Gödel number of any string. Note also that for any
x ∈ Σ*, the highest power to which any prime can appear in the factorization of
gn(x) is the number of symbols in Σ.

Because gn : Σ* → N is not a bijection, it is not correct to speak of gn^(-1). It is
convenient, however, to define a function from N to Σ* that is a left inverse of gn,
as follows:

gn′(n) = x   if n = gn(x)
       = Λ   if n is not gn(x) for any string x

The default value Λ is chosen arbitrarily. Saying that gn′ is a left inverse of gn means
that for any x ∈ Σ*,

gn′(gn(x)) = x
Now suppose that f : Σ1* → Σ2* is a partial function, where Σ1 and Σ2 are
alphabets. We define the corresponding numerical function p_f : N → N by saying
that if n is the Gödel number of x, then p_f(n) is the Gödel number of f(x). Note that
the one-to-one property of gn is necessary for this definition to make sense. We can
also express p_f concisely in terms of the left inverses of the two Gödel-numbering
functions. If g1 : Σ1* → N and g2 : Σ2* → N are the functions that assign Gödel
numbers, and g1′ and g2′ are their respective left inverses, then the formula for p_f is

p_f(n) = g2(f(g1′(n)))

To understand this formula better, trace the right side of the formula using Figure
12.1. The formula says that beginning at the lower left of the diagram, p_f is computed
by following the arrows up, over, and down.

Since g1′ is a left inverse for g1, the formula

p_f(g1(x)) = g2(f(g1′(g1(x)))) = g2(f(x))

holds for any string x in Σ1*. This formula says that in the figure, both ways of getting
from the upper left to the lower right produce the same result, and this is just another
way of saying that the numerical function p_f mirrors the action of the string function
f: If f takes the string x to the string y, p_f takes the number gn(x) to the number
gn(y).

[Figure 12.1 | The string function f : Σ1* → Σ2* and the numerical function p_f : N → N, connected by the Gödel numberings g1, g2 and their left inverses g1′, g2′.]

In the same way, the formula

f(g1′(n)) = g2′(p_f(n))

holds; in Figure 12.1 this formula can be interpreted as saying that both paths from
the lower left to the upper right produce the same result. Applying this formula when
n = g1(x), we obtain

f(x) = g2′(p_f(g1(x)))

If p_f is computable, we can now construct a composite TM computing f from TMs
computing g1, p_f, and g2′, and it follows that f is computable.
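A Python sketch of the correspondence between f and p_f, using a small Gödel numbering of strings over {a, b} (a numbered 1, b numbered 2) and reusing the PrNo and Exponent sketches; the sample f is made up:

    def gn_str(s):                    # Goedel number of a string over {a, b}
        g = 1
        for i, c in enumerate(s):
            g *= PrNo(i) ** ('ab'.index(c) + 1)
        return g

    def gn_str_inv(n):                # left inverse; the null string by default
        s, i = '', 0
        while n > 1:
            e = Exponent(i, n)
            if e not in (1, 2):
                return ''             # n is not the Goedel number of a string
            n //= PrNo(i) ** e
            s += 'ab'[e - 1]
            i += 1
        return s

    f = lambda x: x + 'a'             # a sample string function
    p_f = lambda n: gn_str(f(gn_str_inv(n)))

    print(p_f(gn_str('ab')) == gn_str('aba'))   # True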

The theory of computability and recursive functions is a large and well-developed


subject, and this chapter is no more than a brief introduction. We close by mentioning
briefly a few other approaches.
CHAPTER 12 Computable Functions 473

Just as unrestricted grammars provide a way of generating languages, they can
be used to characterize computable functions. If G = (V, Σ, S, P) is a grammar, and
f is a partial function from Σ* to Σ*, G is said to compute f if there are variables
A, B, C, and D in the set V so that for any x and y in Σ*,

f(x) = y   if and only if   AxB ⇒* CyD

It can be shown, using arguments not unlike those in Theorems 10.8 and 10.9, that the
functions computable in this way are precisely those that can be computed by Turing
machines.
Computer programs written in high-level programming languages can be viewed
as computing functions from strings to strings. It is natural to consider the set of
functions that can be computed by such programs, say those written in C. If we
remove the physical limitations imposed by any particular implementation, so that
there is an unlimited amount of memory, no limit to the size of integers, and so on,
then “C-computable” and Turing-computable are the same.
Although high-level languages such as C have many features that facilitate writ-
ing programs, these features make no difference as far as which functions can be
computed. We might consider a drastically pared-down programming language,
which has variables whose values are natural numbers; statements of the form
X ← X + 1

and

X ← X − 1
which cause variables to be incremented and decremented; “conditional go-to” state-
ments of the form
if X ≠ 0 goto L
where L is an integer label in the program; statements of the form
read(X)
write(X)

and nothing else. Even with a language such as this, it is possible to compute all
Turing-computable functions. One approach to proving this would be to write a
program in this language to simulate an arbitrary TM. Doing so would involve some
sort of arithmetization of TMs similar to Gödel numbering: One integer variable
would represent the state, another the head position, a third the tape contents, and so
on. Another approach would be via Theorem 12.10: to show that the set of functions
computable using this language contains the initial functions and is closed under all
the operations permitted for μ-recursive functions.
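The flavor of such a language can be captured in a few lines of Python; the interpreter and the addition program below are invented for illustration (labels are line numbers, and the decrement does nothing at 0):

    def run(program, inputs):
        env, data, pc = {}, list(inputs), 0
        while pc < len(program):
            op = program[pc].split()
            if op[0] == 'inc':     env[op[1]] = env.get(op[1], 0) + 1
            elif op[0] == 'dec':   env[op[1]] = max(0, env.get(op[1], 0) - 1)
            elif op[0] == 'read':  env[op[1]] = data.pop(0)
            elif op[0] == 'write': print(env.get(op[1], 0))
            elif op[0] == 'ifnz' and env.get(op[1], 0) != 0:
                pc = int(op[2]); continue
            pc += 1

    # Add two numbers: while Y != 0, increment X and decrement Y.
    # ONE is set to 1 once, so 'ifnz ONE ...' acts as an unconditional jump.
    add = ['read X', 'read Y', 'inc ONE',
           'ifnz Y 5',                # 3: loop test
           'ifnz ONE 8',              # 4: exit the loop when Y = 0
           'inc X', 'dec Y',          # 5, 6: loop body
           'ifnz ONE 3',              # 7: back to the test
           'write X']                 # 8
    run(add, [3, 4])                  # prints 7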
Finally, this language, or even a less restricted programming language, can com-
pute only Turing-computable functions. This might be shown directly, by simulating
on a TM each feature of the language, and in this way building a TM to execute a
program in the language. It can also be shown with the help of Theorem 12.8. Just as
a TM configuration can be described by specifying a state, a tape head position, and a
string, a program configuration can be described by specifying a statement (the next
statement to be executed) and the current values of all variables. These parameters
can be assigned Gödel numbers, configuration numbers can be defined, and each
step in the execution of the program can be viewed as a transformation of the Gödel
number, just as in the proof of Theorem 12.10. As a result, any function computed
by such a program can be shown to be μ-recursive.
Other formalisms have been introduced to describe computable functions, and
other abstract machines can be shown to be equivalent to Turing machines in comput-
ing power. Up to this point, every attempt to formulate precisely the idea of “effective
computability” has produced the same set of functions, those that can be computed
by TMs. One reasonable conclusion to be drawn is that we are justified in treating
“effectively computable” as synonymous with “Turing-computable.”

EXERCISES
12.1. Let f : N → N be the function defined as follows: f(n) is the maximum
number of moves an n-state TM with tape alphabet {0, 1} can make if it
starts with input 1^n and eventually halts. Show that f is not computable.
12.2. Define f : N → N by letting f(n) be the maximum number of 1's that an
n-state TM with no more than n tape symbols can leave on the tape,
assuming that it starts with input 1^n and always halts. Show that f is not
computable.
12.3. Show that the uncomputability of the busy-beaver function (Example 12.1)
implies the unsolvability of the halting problem.
12.4. Suppose we define bb(n) to be the maximum number of 1’s that can be
printed by an n-state Turing machine with tape alphabet {0, 1}, assuming it
starts with a blank tape and eventually halts. Show that bb is not
computable.
12.5. Show that if f : N → N is a total function, then f is computable if and
only if the decision problem: Given natural numbers n and C, is
f(n) > C? is solvable.
12.6. Suppose that instead of including all constant functions in the set of initial
functions, C_0^0 were the only constant function included. Describe what the
set PR obtained by Definition 12.4 would be.
12.7. Suppose that in Definition 12.4, the operation of composition is allowed but
that of primitive recursion is not. What functions are obtained?
12.8. If g(x) =x and h(x, y, z) = z+ 2, what function is obtained from g and h
by primitive recursion?
12.9. Here is a primitive recursive derivation. fo = C?; fi = C$; fo is obtained
from fy and f; by primitive recursion; f; = p3; f4 is obtained from f> and
f3 by composition; fs = ce ; fo is obtained from fs and f,4 by primitive
recursion; f7 = p\; fg = p3; fo =; fio is obtained from fo and fg by
composition; f;; is obtained from f7 and fio by primitive recursion;
fiz = pis fiz is obtained from fg and fi2 by composition; f\4 is obtained

from fi, fi2, and f3 by composition; and f}5 is obtained from fs and
fig
by primitive recursion. Give simple formulas for fo, fo, fia, and fis.
12.10. Find two functions g and h so that the function f defined by f(x) = x^2 is
obtained from g and h by primitive recursion.
12.11. Give complete primitive recursive derivations for each of the following
functions.
a. f : N^2 → N defined by f(x, y) = 2x + 3y
b. f : N → N defined by f(n) = n!
c. f : N → N defined by f(n) = 2^n
d. f : N → N defined by f(n) = n^2 − 1
e. f : N^2 → N defined by f(x, y) = |x − y|
12.12. Show that for any n ≥ 1, the functions Add_n and Mult_n from N^n to N
defined by

Add_n(x1, x2, ..., xn) = x1 + x2 + ... + xn
Mult_n(x1, ..., xn) = x1 * x2 * ... * xn

respectively, are both primitive recursive.
12.13. Show that if f : N → N is primitive recursive, A ⊆ N is a finite set, and
g is a total function agreeing with f at every point not in A, then g is
primitive recursive.
12.14. Show that if f : N → N is an eventually periodic total function, then f is
primitive recursive. Eventually periodic means that for some n0 and some
p > 0, f(x + p) = f(x) for every x ≥ n0.
12.15. Show that each of the following functions is primitive recursive.
a. f : N^2 → N defined by f(x, y) = max{x, y}
b. f : N^2 → N defined by f(x, y) = min{x, y}
c. f : N → N defined by f(x) = ⌊√x⌋ (the largest natural number less
than or equal to √x)
d. f : N → N defined by f(x) = ⌊log_2(x + 1)⌋
12.16. Suppose P is a primitive recursive (k + 1)-place predicate, and f and g are
primitive recursive functions of one variable. Show that the predicates
A_{f,g}P and E_{f,g}P defined by

A_{f,g}P(X, k) = (for every i with f(k) ≤ i ≤ g(k), P(X, i))
E_{f,g}P(X, k) = (there exists i with f(k) ≤ i ≤ g(k) so that P(X, i))

are both primitive recursive.
12.17. Show that if g : N^2 → N is primitive recursive, then f : N → N defined
by

f(x) = ∏(i=0 to x) g(i, x)

is primitive recursive.



12.18. Show that the function HighestPrime, introduced in the proof of Lemma
12.4, is primitive recursive.
12.19. In addition to the bounded minimalization of a predicate, we might define
the bounded maximalization of a predicate P to be the function m'_P defined
by

m'_P(X, k) = max{y ≤ k | P(X, y) is true}   if this set is not empty
           = 0                              otherwise

a. Show m'_P is primitive recursive by finding two primitive recursive
functions from which it can be obtained by primitive recursion.
b. Show m'_P is primitive recursive by using bounded minimalization.
12.20. Give an example to show that the unbounded universal quantification of a
computable predicate need not be computable.
12.21. Show that the unbounded minimalization of any predicate can be written in
the form μy[f(X, y) = 0], for some function f.
12.22. True or false: if unbounded minimalization applied to a primitive recursive
predicate yields a total function, the function is primitive recursive.
Explain your answer.
12.23. The set of μ-recursive functions was defined to be the smallest set that
contains the initial functions and is closed under the operations of
composition, primitive recursion, and unbounded minimalization (applied
to total functions). In the definition, no explicit mention is made of the
bounded operators (universal and existential quantification, bounded
minimalization). Do bounded quantifications applied to μ-recursive
predicates always produce μ-recursive predicates? Does bounded
minimalization applied to μ-recursive predicates or functions always
produce μ-recursive functions? Explain.
12.24. Is the problem: Given a Turing machine T computing some partial function
f, is f a total function? solvable? Explain.
12.25. Consider the function f defined recursively as follows:
ORD; o forx= 0, fe)=14 fw)
Show that f is primitive recursive.
12.26. Suppose that f : N → N is a μ-recursive total function that is a bijection
from N to N. Show that its inverse f^(-1) is also μ-recursive.
12.27. Let Σ be an alphabet. Show that the 1-place predicate Isgn defined by
Isgn(x) = (x = gn(s) for some string s ∈ Σ*)
is primitive recursive.
12.28. a. Give reasonable definitions of primitive recursive and recursive for a
function f : Σ* → N, where Σ is an alphabet.
b. Using your definition, show that f : {a, b}* → N defined by
f(x) = |x| is primitive recursive.

MORE CHALLENGING PROBLEMS


12.29. Let b : N → N be the busy-beaver function discussed in Example 12.1.
Show that b is eventually larger than any computable function; in other
words, for any computable total function g : N → N, there is an integer k
so that b(n) > g(n) for every n ≥ k.
12.30. In the discussion after Example 12.1 we defined b_2(n) to be the largest
number of 1's that can be left on the tape of a 2-state TM with tape alphabet
{0, 1}, if it starts with input 1^n and eventually halts.
a. Give a convincing argument that b_2 is computable.
b. Is the function b_k (identical to b_2 except that "2-state" is replaced by
"k-state") computable for every k ≥ 2? Why or why not?
12.31. a. Show that the function f : N^2 → N defined by f(x, y) = (the
number of integer divisors of x less than or equal to y) is primitive
recursive. Use this to show that the 1-place predicate Prime (see
Example 12.7) is primitive recursive.
b. Show that the function f : N^3 → N defined by f(x, y, z) = (the
number of integers less than or equal to z that are divisors of both x and
y) is primitive recursive. Use this to show that the 2-place predicate P
defined by P(x, y) = (x and y are relatively prime) is primitive
recursive.
12.32. Show that both these functions from N to N are primitive recursive.
a. f(n) = the leftmost digit in the decimal representation of 2^n
b. f(n) = the nth digit of the infinite decimal expansion of
√2 = 1.414213... (i.e., f(0) = 1, f(1) = 4, and so on)
12.33. Show that if g : N^(n+1) → N is primitive recursive, and l, m : N → N are
both primitive recursive, then the functions f1 and f2 from N^(n+1) to N
defined by

f1(X, k) = ∏(i=l(k) to m(k)) g(X, i)        f2(X, k) = Σ(i=l(k) to m(k)) g(X, i)

are primitive recursive.
12.34. Suppose we copy the proof of Theorem 12.3, but using recursive
derivations instead, as follows. Consider the strings in Σ* in canonical
order; for each i, find the ith string representing a "μ-recursive derivation"
of a function of one variable, and let the function be called f_i; define f by
the formula f(i) = f_i(i) + 1. Then (apparently) we have exhibited an
algorithm for computing f, but f cannot be μ-recursive. Explain the
seeming contradiction.
12.35. In each case below, show that the function from {a, b}* to {a, b}* is
primitive recursive.
a. f is defined by f(x) = xa
b. f is defined by f(x) = ax
c. f is defined by f(x) = x^r (the reverse of x)
12.36. a. Give definitions of primitive recursive and recursive for a function
f : (Σ*)^n → Σ* of n string variables.
b. Using your definition, show that the concatenation function from (Σ*)^2
to Σ* is primitive recursive.
Introduction to
Computational Complexity

So far in our discussion of decision problems, we have considered only the quali-
tative question of whether the problem is solvable. In real life, the computational
resources available for solving problems are limited: There is only so much time and
space available. As a result, there are problems solvable in principle for which even
medium-sized instances are too hard in practice. In Part VI, we consider the idea of
trying to identify these intractable problems, by describing in some way the amounts
of computer time and memory needed in order to answer instances of a certain size.
In Chapter 13, we first introduce notation and terminology involving growth
rates that allow us to discuss in a meaningful way questions such as “How much
time?” In the rest of the chapter, we relate these quantitative issues to our specific
model of computation and then discuss some of the basic complexity classes: ways of
categorizing decision problems and languages according to their inherent complexity.
In trying to distinguish between tractable and intractable problems, a criterion
commonly used is polynomial-time solvability on a Turing machine. Many prob-
lems are tractable according to this criterion, and some can be shown not to be;
we can adapt the reduction technique of Chapter 11 in order to obtain examples of
both types. Perhaps even more interesting are problems whose status with respect
to this criterion is still up in the air—problems for which no one has found either a
polynomial-time solution or a proof that none exists. The notion of NP-completeness
is a way of approaching this topic. In the last two sections, we give Cook’s proof of the
NP-completeness of the satisfiability problem and several examples of combinatorial
problems that can be shown to be NP-complete by using polynomial-time reductions.
CHAPTER 13

Measuring and Classifying Complexity

13.1 | GROWTH RATES OF FUNCTIONS


The complexity of computational problems can be discussed by choosing a specific
abstract machine as a model of computation and considering how much time and/or
space machines of that type require for the solutions. In order to compare two prob-
lems it is necessary to look at instances of different sizes. Using the criterion of
runtime, for example, the most common approach is to compare the growth rates of
the two runtimes, each viewed as a function of the instance size. Before we introduce
definitions and notation for discussing growth rates of quantities such as runtime, we
consider an example in which four simple functions are contrasted.

The Growth Rates of Polynomial and Exponential Functions | EXAMPLE 13.1


We consider the functions p1(n) = 2n^2, p2(n) = n^2 + 3n + 7, p3(n) = n^3, and q(n) = 2^n.
Table 13.1 shows the values of these four functions for some selected values of n, and these few
values are enough to make certain trends apparent. For small values of n, p2(n) is significantly
larger than p1(n). By the time we reach n = 1000, however, the lower-order terms 3n + 7 in
p2 are relatively negligible, and the extra factor of 2 in the leading term of p1 accounts almost
entirely for the difference between the two. The polynomial p3 is of higher degree, and even
without the extra terms its value grows more rapidly than either of the first two.
A more striking trend is the growth of the exponential function q. Once n is larger than
about 10, there is no contest: If the function values represent nanoseconds of runtime (one
nanosecond is one billionth of a second), p3(1000) is one second, and q(1000) is more than
3 * 10^282 centuries.
The table illustrates the effect of different growth rates. The two functions p; and p2
have different sizes, primarily because of the factor of 2, but the same (quadratic) growth rate.
The growth rate of the cubic polynomial is larger, and all three are eventually dwarfed by the


Table 13.1 | Selected values of polynomial and exponential functions

n        p1(n) = 2n^2    p2(n) = n^2 + 3n + 7    p3(n) = n^3       q(n) = 2^n
3        18              25                      27                8
5        50              47                      125               32
10       200             137                     1000              1024
20       800             467                     8000              1048576
50       5000            2657                    125000            1.13 * 10^15
100      20000           10307                   1000000           1.27 * 10^30
1000     2000000         1003007                 1000000000        1.07 * 10^301

exponential function. Once we say how to talk precisely about growth rates, one of the most
useful distinctions will be that between polynomial growth rates and exponential growth rates.
Many of the features evident in this example will persist in general.

The simplest situation in which two functions f and g will be said to have
the same growth rate is when f is exactly proportional to g, or f = Cg for some
constant C. (The size of C is irrelevant, as long as it is independent of n.) Because
it is unusual for two runtimes to be exactly proportional, we generalize by allowing
one function to be approximately proportional to the other, which means that f < Cg
for some constant C and f > Dg for some other (positive) constant D. Again, the
sizes of C and D are not relevant. The first part of Definition 13.1 involves a single
inequality, so that we can talk about one growth rate being no greater than another; in
order to consider functions with equal growth rates we can simply use the statement
twice, the second time with the two functions reversed. The other way in which
the definition generalizes the simplest case is that it allows the inequality to fail at a
finite set of values of n, to take care of the case when functions are undefined or have
unrepresentative values at a few points.

Definition 13.1 Notation for Comparing Growth Rates

Let f and g be partial functions from N to N, each defined for all sufficiently
large n. We write

f = O(g)   if there is a constant C so that f(n) ≤ C g(n) for all sufficiently large n
f = Θ(g)   if f = O(g) and g = O(f)
f = o(g)   if for every positive constant C, f(n) ≤ C g(n) for all sufficiently large n



The statements f = O(g), f = Θ(g), and f = o(g) are read "f is big-oh of g,"
"f is big-theta of g," and "f is little-oh of g," respectively. All these statements can
be rephrased in terms of the ratio f(n)/g(n), provided that g(n) is eventually greater
than 0. Saying that f = o(g) means that the limit of this ratio as n approaches infinity
is 0; the statement f = O(g) means only that the ratio is bounded. If f = Θ(g), and
both functions are eventually nonzero, then both the ratios f/g and g/f are bounded,
which is the same as saying that the ratio f/g must stay between two fixed positive
values (or is "approximately constant"). If the statement f = O(g) fails, we write
f ≠ O(g), and similarly for the other two. Saying that f ≠ O(g) means that it is
impossible to find a constant C so that f(n) ≤ C g(n) for all sufficiently large n; in
other words, the ratio f(n)/g(n) is unbounded. This means that although the ratio
f(n)/g(n) may not be large for all large values of n, it is large for infinitely many
values of n.
A statement like f = O(g) describes a relationship between two functions. It
is not an equation, and it makes no sense, for example, to write O(f) = g. The
notation is fairly well-established, although a variation that is a little more precise is
to define a set O(g) and to write f € O(g) instead of f = O(g).
The statement f = O(g) conveys no information about the values of f(n) and
g(n) for any particular n. It says that in the long run (that is, for sufficiently large
values of n), f(n) is no larger than a function proportional to g. The constant of
proportionality may be very large, so that the actual value of f(n) may be much
larger than g(n). Nevertheless, we say in this case that the growth rate of f is no
larger than that of g. The terminology is most appropriate when the two functions f
and g are both nondecreasing, or at least eventually nondecreasing, and most of the
functions we are interested in will have this property. If f = Θ(g), we say f and
g have the same growth rate; as we have seen, this is a way of saying that the two
values are approximately proportional, or that the ratio is approximately constant, for
large values of n. If f = o(g), it is appropriate to say that the growth rate of f is
smaller than that of g, because according to the definition, no positive constant is small
enough to remain a constant of proportionality as n gets large. In this case, although
f (n) will eventually be smaller than g(n), in fact much smaller, the statement says
nothing about how large n must be before this happens.
We have introduced the idea of two functions having the same growth rate, or of
one having a smaller growth rate than another. This terminology is not misleading,
in the sense that these two relations (on the set of partial functions from N to N
defined for all sufficiently large n) satisfy at least most of the crucial properties we
associate with the corresponding relations on numbers. It is clear from the definitions
that if f = o(g), then f = O(g), so that “smaller than” implies “no larger than” in

reference to growth rates. We would also expect that if f = o(g), then g ≠ O(f)
(if the growth rate of f is smaller than that of g, then it cannot be true that the growth
rate of g is no larger than that of f), and this is easy to check. Theorem 13.1 contains
several other straightforward properties of these relations.

A polynomial function is either identically 0 or of the form

p(n) = a_k n^k + a_(k−1) n^(k−1) + ... + a_1 n + a_0

where a_k ≠ 0. In the latter case, k is the degree of the polynomial. It is easy to check
that if the leading coefficient a_k is positive, then p(n) > 0 for all sufficiently large n.
Usually we will be interested only in polynomials having this property, and we may

simply regard p(n) as undefined if it is negative. An exponential function is one of
the form

q(n) = a^n

for some fixed number a > 1. If the coefficients a_i of a polynomial or the base a of an
exponential function are not integers, we may obtain an integer function by ignoring
the fractional part; in both cases, the growth rate is not affected.
On the basis of Example 13.1, we would probably conjecture that any quadratic
polynomial has a smaller growth rate than any cubic, and that either one has a smaller
growth rate than an exponential function. The next theorem generalizes both these
statements.

Theorem 13.2
If p and q are polynomials and the degree of p is less than the degree of q, then
p = o(q). Furthermore, if p is any polynomial and a is any number greater than 1,
then p = o(a^n).

A tabulation such as the one in Example 13.1 would obviously look different
for polynomials of different degrees and for exponential functions with a different
base. At n = 1000, for example, (1.01)^n is still only about 20959. However, the
“exponential growth” is still there, as the second part of the theorem confirms; it just
takes a little longer for its effects to become obvious. When n = 10000, (1.01)^n is
greater than 10^43, whereas (10000)³ = 10¹², or one trillion.
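The numbers quoted here are easy to reproduce; the short tabulation below is ours, in the spirit of the one described:

    # compare a cubic with the exponential function (1.01)^n
    for n in [100, 1000, 10000]:
        print(n, n**3, round(1.01**n))
    # at n = 1000, 1.01**1000 is only about 20959; at n = 10000 it
    # exceeds 10**43, while 10000**3 is 10**12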

13.2 TIME AND SPACE COMPLEXITY OF A TURING MACHINE
The model of computation we have chosen is the Turing machine. When a TM answers
a specific instance of a decision problem, we can measure the time (the number of
moves) and the space (the number of tape squares) required by the computation.
The most obvious measure of the size of an instance is the length of the input string
that encodes it, and the most common approach is to consider the worst case: the
maximum time or space that might be required by any input string of that length.
With this in mind, we can now define the time and space complexity of an ordinary,
deterministic TM.

Definition 13.2  The Time and Space Complexity of a Turing Machine

If T is a Turing machine, the time complexity of T is the function t_T, where t_T(n) is the maximum number of moves T can make on any input string of length n, and the space complexity of T is the function s_T, where s_T(n) is the maximum number of tape squares T uses on any input string of length n.
EXAMPLE 13.2  The Time Complexity of a Simple TM
We consider the Turing machine T of Example 9.3, shown in Figure 13.1, which accepts the
language {ss | s ∈ {a, b}*}. We derive the formula for t_T(n) for an even integer n and leave
the other case as an exercise.
An input string of length 2k is processed in three phases: First, find the middle, changing
all the symbols to uppercase along the way; second, change the first half back to lower case,
while moving the tape head to the beginning; third, compare the two halves. In the first phase,
one move positions the tape head at the first input symbol, 2k + 1 moves are required to change
the first symbol and find the rightmost symbol, and 2k more moves are needed to change the
symbol and return the tape head to the leftmost lowercase symbol. Each subsequent pass deals
with two fewer lowercase symbols and therefore requires four fewer moves. The total number
of moves in the first phase is
1 + (4k + 1) + (4(k − 1) + 1) + · · · + (4(0) + 1) = 1 + 4(1 + 2 + · · · + k) + (k + 1) = 2k² + 3k + 2
(The result in Example 2.7 is used here.) Phase two requires k + 1 moves. For even-length
inputs, the total time for the first two phases depends only on the input length. In the third
phase, the input strings requiring the most moves are those in the language. This phase consists
of k passes, one for each symbol in the first half. In each pass, there are k moves to the right,
k to the left, and one more to the right, for a total of 2k + 1 moves. The third phase therefore
contains at most 2k² + k moves. Adding these three numbers and including the final move to
the accepting state, we obtain

4k² + 5k + 4 = n² + 5n/2 + 4

moves. Because odd-length input strings require fewer moves, we may conclude that

t_T(n) = O(n²)

Figure 13.1  A Turing machine to accept {ss | s ∈ {a, b}*}.
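The move counts derived above can be checked with a short computation. The following sketch is ours; it recomputes the totals from the pass structure described in the example rather than from a full simulation of T:

    def phase_one(k):
        # one initial move, then passes of 4i + 1 moves for i = k down to 0
        return 1 + sum(4 * i + 1 for i in range(k + 1))

    def total_moves(k):
        # phase one + phase two (k + 1 moves) + phase three (k passes of
        # 2k + 1 moves) + the final move to the accepting state
        return phase_one(k) + (k + 1) + k * (2 * k + 1) + 1

    for k in range(1, 6):
        n = 2 * k
        assert phase_one(k) == 2 * k**2 + 3 * k + 2
        assert total_moves(k) == n**2 + 5 * n // 2 + 4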

We might ask whether there is a TM accepting this language with a significantly smaller time
complexity. A complete answer to this question is a little complicated. It looks at first as though
the number of actual comparisons a TM must make in order to recognize a string in the language
should be simply proportional to the string’s length. The quadratic behavior of the machine in
Figure 13.1 is the result of the repeated back-and-forth motions of the tape head, which seem
to be necessary in phases one and three. There are ways to reduce the number of these motions
without changing the overall approach. For example, in the first phase we could make the TM
convert two symbols at the left end to uppercase, then two at the right, and so forth, and in the
final matching phase we could ask the TM to remember two symbols from the first half before
it sends its tape head to the second half to try to match them. These improvements ought to cut
the back-and-forth motion almost in half, at the expense of an increase in the number of states.
Furthermore, if going from one to two is good, going from one to a number bigger than two
ought to be even better. (If there were some way to get the TM to remember the entire first
half of the input in the third phase, we could eliminate the extraneous motions in that phase
altogether. There is no way to do this, unfortunately, because the number of states is finite.)
Any Turing machine that looks at all n input symbols must make at least n moves. For
this language, any TM following an approach similar to that in Figure 13.1 will probably need
at least twice that many. Our tentative conclusion from the preceding paragraph is that we may
be able to reduce the runtime beyond that bare minimum by an arbitrarily large factor. This
conclusion turns out to be correct, not only for this example but in general. By increasing the
number of states and/or tape symbols, one can produce a comparable “linear speed-up” of any
Turing machine, and there are similar results for space complexity. We may conclude that the
number of moves a TM makes or the number of tape squares it uses on a particular input string
is not by itself very meaningful; a more significant indicator is the growth rate of the function.
Reducing the runtime by a large constant factor still leaves us with a quadratic growth rate,
and it can be shown that the time complexity of any one-tape TM accepting this language is
quadratic or higher.
Another way to reduce the runtime would be to increase the number of tapes. In this
example, using a two-tape TM makes it possible to recognize the language in linear runtime
by avoiding back-and-forth motions altogether (Exercise 13.22). In more general examples,
this is too much to expect, and the growth rate of the runtime can be reduced in this way only
to about the square root of the original (Exercise 13.25).

As Example 13.2 has suggested, looking at the growth rate of a TM’s time
complexity tells us something about the efficiency of the algorithm embodied by the
machine. Of two TMs recognizing the same language (with the same number of tapes),
the one for which the growth rate of the time complexity is smaller will be preferable,
at least for sufficiently long input strings. We can now turn this statement around in
order to compare the complexity, or difficulty, of two languages. If recognizing L₁ can
be accomplished by a k-tape TM with time complexity f₁, and the time complexity
of any k-tape TM recognizing L₂ grows faster than f₁, it is reasonable to say that L₂
is more complex, or difficult to recognize, than L₁.
You might not expect that the time or space complexity of a nondeterministic
Turing machine would be a useful concept; after all, such a machine is allowed to
take shortcuts by making guesses. However, many of the most interesting decision
problems have solutions that can be described easily by using nondeterminism, and
we will see that adding this ingredient will be helpful in categorizing languages, or
decision problems, according to their complexity.

Definition 13.3  The Time and Space Complexity of a Nondeterministic Turing Machine

This definition is not as difficult to interpret as it might appear at first. Fortunately,
the complication caused by t_T or s_T being undefined at some points will not usually
arise. For the languages we will be interested in (or for the decision problems we
will be trying to solve), Turing machines are available that cannot loop forever. In
particular, we will usually be considering TMs T for which t_T ≤ f for some total
function f on the natural numbers, and this is understood to imply that t_T is also a
total function.
The number t_T(n) is supposed to measure the time that might be required for T
to determine whether a string x of length n is in the language L accepted by T. If
x ∈ L and T makes the right choice of moves on input x, then it reaches the accepting
state within t_T(n) moves. If |x| = n and x ∉ L, then there is a sequence of no more
than t_T(n) moves causing T to reject input x. Another way to say this is that for any
input x, if we trace all possible sequences of t_T(|x|) or fewer moves, then we have
the answer for the string x.
Another way to visualize the situation is to use the idea of parallel computation.
One might imagine several “parallel processors” carrying out simultaneously all the
possible computations T can execute on input x. We could answer the question for
x if we could monitor these independent computations for the first t_T(|x|) moves. In
practice this might not be feasible, because the number of processors required may
continue to grow with the number of steps in the computation.

EXAMPLE 13.3  The Time Complexity of a Simple NTM


Consider the Turing machine pictured in Figure 13.2, which accepts the language

L = {x ∈ {a, b}* | for some k ≥ 2 and some w with |w| ≥ 1, x = w^k}

A straightforward deterministic approach to recognizing this language would be to determine
whether the input string x is of the form w^k for a string w with |w| = 1, or for a string w with
|w| = 2, ..., or for a string w with length |x|/2. The machine in Figure 13.2 uses nondeterminism
instead. Let us temporarily ignore the first component, Place($), and concentrate on
the remaining parts, which also begin with the tape head on square 0. The machine moves
past the input string, generates an arbitrary string w on the tape, makes an identical copy of w,
makes an arbitrary number of additional copies, deletes all the blanks that separate the copies,
and compares the resulting string to the original input. The nondeterminism appears both in
the construction of w and in the choice of the number of copies made. If the input string is
of the form w^k, where |w| ≥ 1 and k ≥ 2, then the sequence of moves in which that string
is generated on the tape causes the TM to accept; because the string generated is of this form,
any input that is not causes the final comparison to fail and the TM to reject.
The purpose of Place($) is to prevent the possibility of infinite loops, which would oth-
erwise be possible in two places: in the generation of w or in the production of additional
copies of w. Place($) places the special marker in square 3n + 2, where n is the length of the
input. Thereafter, since no moves are specified for the tape symbol $, the machine crashes if
the tape head ever moves that far right on the tape. The number 3n + 2 is chosen to allow for
the extreme case in which |w| = 1 and n copies of w are generated, so that the copies and the
blanks separating them require a total of 2n tape squares.
Although calculating the exact nondeterministic time complexity of this machine is complicated,
finding a big-oh answer is not so hard. If the string w that is generated has length m,
and k copies are made, then in order for the input string of length n to be accepted, km must
be n. From this point on we can argue as follows. Moving past the input and generating w
takes time roughly proportional to n + m. Creating each additional copy of w requires time
proportional to m², and this occurs k − 1 times, so that this total is proportional to km² = mn.
The first deletion of a blank takes time proportional to m, the second requires time proportional
to 2m, ..., and the last requires time proportional to (k − 1)m; the total of these is therefore
proportional to k²m = kn. Finally, still assuming that the input and the string created nondeterministically
have the same length n, comparing them takes time proportional to n². It is
now clear that the overall time complexity is O(n²).

Figure 13.2

The space complexity of our nondeterministic Turing machine is easy to calculate. The
rightmost tape square visited is the one just to the right of the last copy of w that is created.
Again, we are only interested in the case when the input string is accepted, so that the total
amount of space used (including the space required for the input) is n + 2 + k(m + 1) =
2n + k + 2, where as before, k is the number of copies of w that are created. Because k is no
larger than n, we conclude that the TM has linear space complexity.

Example 13.3 illustrates in a simple way something that we will often see later.
To decide whether x is in this language, we must answer the question: Do there exist
a string w and an integer k so that x = w^k? A deterministic algorithm to answer
the question consists of testing every possible choice of w and k to see if any choice
works. Although nondeterminism can eliminate the “every possible choice” aspect
of the algorithm by guessing, the TM must still test deterministically the choice it
guesses. Many interesting decision problems have the same general form: Is there
some number (or some choice of numbers, or some path, ...) that works? The obvious
nondeterministic approach to such a problem is to choose values by guessing and then
test them to see if they work. If T is a nondeterministic TM answering the question
in this way, we can interpret the time complexity t_T informally by thinking of t_T(n)
as measuring the time required to test a possible solution. The assumption is that the
time required for guessing a solution to test is relatively small.
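Stripped of the Turing-machine details, the test that Example 13.3's NTM performs nondeterministically is easy to state as ordinary code. Here is a deterministic sketch of ours, which tries every candidate length |w| instead of guessing it:

    def is_repetition(x):
        # is x = w^k for some k >= 2 and some w with |w| >= 1?
        n = len(x)
        for m in range(1, n // 2 + 1):   # candidate |w|; the NTM guesses this
            if n % m == 0 and x == x[:m] * (n // m):
                return True
        return False

    assert is_repetition("abab") and not is_repetition("abaa")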

13.3 COMPLEXITY CLASSES
Now that we have defined the time and space complexity of a Turing machine, we
can begin to discuss the inherent complexity of a computational problem. Just as in
Chapter 11, we are interested in decision problems. An instance of such a problem can
be encoded as a string; we may then consider the language of encoded yes-instances;
and we can describe the complexity of the problem by considering the complexity
functions of machines that solve it.
The complexity classes in Definition 13.4 are classes of languages. In Chapter
11 we could avoid distinguishing between the decision problem and the language
corresponding to it, because the only issue was whether or not the problem was
solvable. Any encoding was all right, as long as it was possible, in principle, to
decode a string representing an instance. Now that we are concerned with how much
time or space is required to solve a problem, we must be a little more careful about the
encoding we use. On the one hand, we do not want the difficulty of solving a problem
to be dramatically increased because of the difficulty of encoding an instance or
decoding a string; on the other hand, we do not want the description of an algorithm’s
runtime in terms of the instance size to be distorted because the encoding results in
instance strings that are longer than necessary. (For example, suppose a problem
involves integer input. An algorithm that has linear runtime when the integers are
encoded in unary notation may turn out to be exponential when binary notation is
used instead.) For the most part, it will be sufficient to keep in mind that the encoding
we adopt should be “reasonable” in this sense.
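The parenthetical example is worth making concrete. In the sketch below (ours), a runtime proportional to the integer n is linear in the length of a unary encoding of n but exponential in the length of a binary encoding:

    for n in [16, 256, 4096, 65536]:
        unary_length = n                  # n symbols to write n in unary
        binary_length = n.bit_length()    # about log2(n) symbols in binary
        print(n, unary_length, binary_length)
    # a runtime of n steps is 2**(binary_length - 1) steps or more,
    # exponential as a function of the binary input length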

Definition 13.4  Basic Complexity Classes

If f is a partial function from ℕ to ℕ, Time(f) is the set of languages that can be recognized by a Turing machine T with t_T ≤ f, and Space(f) is the set of languages that can be recognized by a TM T with s_T ≤ f. NTime(f) and NSpace(f) are the corresponding sets defined in terms of nondeterministic Turing machines.

At this point we introduce a convention that occasionally will be useful in the


remainder of this chapter and the next. It is possible for a Turing machine to halt
before it has finished reading all its input, so that its time complexity function might
be less than n. More commonly, however, a TM reads all its input. In any case, the
rules we are following for having a TM recognize a language require that the machine
erase all its input and leave the answer, 0 or 1, in square 1 of the tape. Because it is
impossible to do this for an input string of length n in fewer than 2n + 2 moves, we
interpret Time(f ) to mean Time(g), where g(n) = max(f (n), 2n+2). For the sake of
consistency, we also follow this convention for Space(f) and for the corresponding
nondeterministic complexity classes. (In the case of space, a more common approach
is to consider a slightly different TM model, in which there is a read-only input tape
and one or more work tapes. If “space” is taken to mean the number of tape squares
used on the work tapes, it is possible to consider Space(f) for functions f satisfying
f(n) <n. We consider this briefly in the exercises.)
Apart from this convention, the classes in Definition 13.4 make sense for arbitrary
functions f. It is often useful to impose some additional restrictions, however, without
which the complexity classes sometimes exhibit surprising and unintuitive behavior.
Adopting the extra restrictions will not change things in any significant way when it
comes to discussing practical, real-world problems.

Step-counting Functions

A function f from ℕ to ℕ is a step-counting function if there is a Turing machine T so that, for every n and every input string of length n, T halts after exactly f(n) moves.

It is easy to see how such functions might be useful. If f is a step-counting
function, T is a TM halting in f(n) moves as in the definition, and T′ is any Turing
machine, then a composite TM T₁ can be constructed that executes T′, except that it
halts when T halts if that happens first. In other words, T can be used as a clock in
conjunction with other machines. In a similar way, if we want to constrain the space
resources of a TM T during its computation, we can use a step-counting function f
in order to mark a space of exactly f(n) squares to be used by T.
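The clock idea can be sketched in a few lines of code. In the sketch below, machine, step, and result are hypothetical names standing in for some representation of a Turing machine; the point is only the control structure:

    def run_with_clock(machine, x, f):
        # the step-counting TM acts as a clock halting in exactly f(n) moves
        for _ in range(f(len(x))):
            if machine.step():        # the clocked machine halted on its own
                return machine.result()
        return None                   # the clock ran out first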
It is obvious from the definition that if f is a step-counting function, then f can
be computed by a TM in such a way that the number of steps in the computation of
f(n) is essentially f(n). It is also true, though much less obvious, that a relaxed
form of this condition is still sufficient for a function to be a step-counting function.

Lemma 13.1  If f : ℕ → ℕ is a positive function, and if there is a constant C > 1
so that f(n) ≥ Cn for all but a finite number of integers n, then f is a step-counting
function if and only if f can be computed in time O(f) (i.e., there is a Turing machine
T computing f, and a constant K, so that for every n, T computes f(n) in no more
than Kf(n) steps).
The proof is omitted.
It is not hard to show using Lemma 13.1 that most familiar functions are, in fact,
step-counting functions. (See also Exercises 13.12 and 13.13.)
Having defined the basic complexity classes, we should remind ourselves that
there are “infinitely complex” languages—that is, languages not in Time(f) or
Space(f ) for any f because they cannot be recognized by any TM. However, solvable
problems can still be arbitrarily complex. Theorem 13.3 deals with time complexity,
and its proof uses a diagonal argument.

Theorem 13.3
For any step-counting function f, there is a recursive language L so that for any TM T recognizing L, t_T(n) > f(n) for at least one n.

In the proof, we need only the assumption that f is computable: a Turing machine can check whether w ∈ L by determining whether the machine encoded by w fails to accept w within f(|w|) moves, and the usual diagonal argument then shows that no TM with time complexity bounded by f can recognize L.

It is possible to show that if the function f is a step-counting function, then the
function Cn²(f(n))², or a function that differs from this one at only a finite number
of points, is also. This means that by applying the theorem repeatedly, we obtain a
sequence of more and more complex languages. This is a simple case of more general
“hierarchy” results, which specify conditions on functions f and g that are sufficient
to obtain languages in Time(g) — Time(f). There are similar “space hierarchy”
theorems, which show in particular that there are decision problems whose solutions
require Turing machines of arbitrarily great space complexity.
In the remainder of this section, we note some of the simple relationships among
the complexity classes in Definition 13.4, and mention several others with little or no
proof.

For a deterministic TM, the two definitions of time complexity coincide. The proof is
similar in each case: a TM that makes no more than f(n) moves on an input of length n
has a tape head that can visit no more than f(n) + 1 tape squares, so that in particular
Time(f) ⊆ Space(f) and NTime(f) ⊆ NSpace(f).

By combining Theorems 13.5 and 13.6, we see in particular that for a step-
counting function f, NTime(f) ⊆ Time(C^f) for some constant C. In fact, this
result does not require the assumption that f be step-counting, and it can easily be
obtained directly from the proof of Theorem 9.2. In that proof, we constructed a
Turing machine to try all possible finite sequences of moves of a given NTM. The
number of sequences of k or fewer moves is simply the number of nodes in the first k
levels of the “computation tree.” Because there is an upper bound on the number of
moves possible at each step, this number is bounded by c^k for some constant c, and
the result follows without difficulty.
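The c^k bound is easy to tabulate. A two-line sketch of ours counts the nodes in the first k levels of a computation tree with at most c choices per move:

    def tree_nodes(c, k):
        # 1 + c + c**2 + ... + c**k nodes in the first k levels
        return sum(c**i for i in range(k + 1))

    print([tree_nodes(2, k) for k in range(6)])   # [1, 3, 7, 15, 31, 63]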

This observation tells us that if we have a nondeterministic TM accepting L with


nondeterministic time complexity f, we can eliminate the nondeterminism at the
cost of an exponential growth in the time complexity. If this is really the best we
can do, then the presence of nondeterminism can make a dramatic difference in the
time required for a solution, and it is reasonable to expect that for many problems,
making a lucky guess is the only way of obtaining an answer within a reasonable
time. Whether this is actually the case will be discussed further in Chapter 14.

EXERCISES

13.1. Suppose f, g, h, k : ℕ → ℕ.
      a. Show that if f = O(h) and g = O(k), then f + g = O(h + k) and fg = O(hk).
      b. Show that if f = o(h) and g = o(k), then f + g = o(h + k) and fg = o(hk).
      c. Show that f + g = O(max(f, g)).
13.2. Let f₁(n) = 2n³, f₂(n) = n³ + 3n^{5/2}, f₃(n) = n³/log n, and

      f₄(n) = 2n³ if n is even, … if n is odd

      For each of the twelve combinations of i and j, determine whether
      fᵢ = O(fⱼ), and whether fᵢ = o(fⱼ), and give reasons.
13.3. Suppose f is a total function from ℕ to ℕ and that f = O(p) for some
      polynomial function p. Show that there are constants C and D so that
      f(n) ≤ Cp(n) + D for every n.
13.4. a. Show that each of the functions n!, n^n, and 2^{n²} has a growth rate greater
         than that of any exponential function.
      b. Of these three functions, which has the largest growth rate and which
         the smallest?
13.5. a. Show that each of the functions 2^{√n} and n^{log n} has a growth rate greater
         than that of any polynomial and less than that of any exponential
         function.
      b. Which of these two functions has the larger growth rate?
13.6. Classify the function (log n)^{log n} with respect to its growth rate (polynomial,
      exponential, in-between, etc.)
13.7. Give a proof of the second statement in Theorem 13.2 that does not use
      logarithms. One way to do it is to write n = n₀ + m, and to consider the
      formula

      (n₀ + m)^k / n₀^k = ((n₀ + 1)/n₀)^k ((n₀ + 2)/(n₀ + 1))^k · · · ((n₀ + m)/(n₀ + m − 1))^k

13.8. In Example 13.2, find a formula for t_T(n) when n is odd.



13.9. Find the time complexity function for each of these TMs:
a. The TM in Example 9.2 accepting the language of palindromes over
{0, 1}.
b. The Copy TM shown in Figure 9.12.
13.10. Show that if L can be recognized by a TM T with a doubly infinite tape,
       and t_T = f, then L can be recognized by an ordinary TM with time
       complexity O(f).
13.11. Show that for any solvable decision problem, there is a way to encode
instances of the problem so that the corresponding language can be
recognized by a TM with linear time complexity.
13.12. Show that if f and g are step-counting functions, then so are f + g, f·g,
       f∘g, and 2^f.
13.13. Show that any polynomial with positive integer coefficients and a nonzero
constant term is a step-counting function.
13.14. Show that the following decision problem is unsolvable: Given a Turing
machine T and a step-counting function f, is the language accepted by T
in Time(f)?
13.15. Is the problem, Given a Turing machine T, is t_T < 2n? solvable or
       unsolvable? Give reasons.
13.16. Suppose s is a step-counting function satisfying s(n) ≥ n. Let L be a
       language accepted by a (multitape) TM T, and suppose that the tape heads
       of T do not move past square s(n) on any of the tapes for an input string of
       length n. Show that L ∈ Space(s). (Note: the reason it is not completely
       obvious is that T may have infinite loops. Use the fact that if during a
       computation of T some configuration repeats, then T is in an infinite loop.)
13.17. If T is a TM recognizing L, and T reads every symbol in the input string,
       then t_T(n) ≥ 2n + 2. Show that any language that can be accepted by a
       TM T with t_T(n) = 2n + 2 is regular.
13.18. Suppose L₁, L₂ ⊆ Σ*, L₁ ∈ Time(f₁), and L₂ ∈ Time(f₂). Find functions
       g and h so that L₁ ∪ L₂ ∈ Time(g) and L₁ ∩ L₂ ∈ Time(h).
13.19. As we mentioned in Section 13.3, we might consider an alternate Turing
machine model, in which there is an input tape on which the tape head can
move in both directions but cannot write, and one or more work tapes, one
of which serves as an output tape. For a function f, denote by DSpace(f)
the set of languages that can be recognized by a Turing machine of this type
which uses no more than f (n) squares on any work tape for any input
string of length n. The only restriction we need to make on f is that
f(n) > 0 for every n. Show that both the language of palindromes over
{0, 1} and the language of balanced strings of parentheses are in
       DSpace(1 + ⌈log₂(n + 1)⌉). (⌈x⌉ means the smallest integer greater than or
       equal to x.)

MORE CHALLENGING PROBLEMS


13.20. If f and g are total, increasing functions from ℕ to ℕ, and f = O(g) and
       g ≠ O(f), does it follow that f = o(g)? Either give a proof or find
       functions that provide a counterexample.
13.21. If f and g are total, increasing functions from ℕ to ℕ, does it follow that
       one of the two statements f = O(g), g = O(f) must hold? Either give a
       proof or find functions that provide a counterexample.
13.22. Describe in at least some detail a two-tape TM accepting the language of
Example 13.2 and having linear time complexity.
13.23. Let f : ℕ → ℕ. Show that if L can be recognized by a TM T so that
       t_T(n) ≤ f(n) for all but finitely many n, then L ∈ Time(f). (Recall our
       convention that Time(f) means Time(max(f, 2n + 2)).)
13.24. Suppose that f is a function satisfying n = o(f), and L ∈ Time(f). Show
       that for any constant c > 0, L ∈ Time(cf).
13.25. Show that if L can be recognized by a multitape TM with time complexity
       f, then L can be recognized by a one-tape machine with time complexity
       O(f²).
13.26. According to Theorem 13.3, for any step-counting function f, there is a
       recursive language L so that the time complexity of any TM recognizing L
       must be greater than f for at least one n. Generalize this by showing that
       for any such f, there is a recursive language L so that for any TM T
       recognizing L, t_T(n) > f(n) for infinitely many values of n.
13.27. Show that for any total computable function f : ℕ → ℕ, there is a
       step-counting function g so that g(n) > f(n) for every n.
CHAPTER 14

Tractable and Intractable Problems

14.1 TRACTABLE AND POSSIBLY INTRACTABLE PROBLEMS: P AND NP
We may use the classification of languages described in the last chapter to compare
the complexity of two languages. For example, if L₁ ∈ Time(f) and L₂ ∉ Time(f),
then it is reasonable to say that L₂ is, at least in some ways, more complex than
L₁. Now we would like to identify those languages that are tractable, those we can
recognize within reasonable time and space constraints. In other words, we would
like to know which decision problems we can actually solve.
Although there is no clear line separating the hard problems from the easy ones,
one normally expects a tractable problem to be solvable in polynomial time. The most
common problems for which no polynomial-time algorithms are known seem to re-
quire exponential runtimes. As we have seen in Section 13.1, solving even moderately
sized worst-case instances of such problems is likely not to be feasible. With this in mind,
we define complexity classes containing the languages recognizable in polynomial
time and polynomial space, respectively, as well as corresponding nondeterministic
complexity classes.

Definition 14.1  The Classes P, NP, PSpace, and NPSpace

P is the set of languages L such that L ∈ Time(p) for some polynomial p, and PSpace is the set of languages L such that L ∈ Space(p) for some polynomial p. NP and NPSpace are the corresponding sets obtained by using the classes NTime(p) and NSpace(p) instead.


The sets P and PSpace include any language that can be recognized by a TM with
time complexity or space complexity, respectively, bounded by some polynomial.
We may speak informally of decision problems being in P or PSpace, provided
we keep in mind our earlier comments about reasonable encoding methods. Saying
that the tractable problems are precisely those in P cannot be completely correct. For
example, if two TMs solving the same problem had time complexities (1.000001)^n
and n¹⁰⁰⁰, respectively, one might prefer to use the first machine on a typical problem
instance, in spite of Theorem 13.2. However, in real life, “polynomial” is more likely
to mean n² or n³ than n¹⁰, and “exponential” normally turns out to be 2^n rather than
(1.000001)^n.
Another point in favor of the polynomial criterion is that it seems to be invariant
among the various models of computation. Changing the model can change the time
complexity, but by no more than a polynomial factor; roughly speaking, if a problem
can be solved in polynomial time on some computer, then it is in P.
It is obvious from Theorem 13.5 that P ⊆ PSpace and NP ⊆ NPSpace. The
following result, which follows easily from the theorems in Section 13.3, describes
some other simple relationships among these four sets. In particular, having defined
NPSpace, we can now forget about it.

Theorem 14.1
P ⊆ NP ⊆ PSpace = NPSpace

One would assume that allowing nondeterminism should allow dramatic im-
provements in the time complexity required to solve some problems. An NTM, as we
have seen in Section 13.2, has an apparently significant advantage over an ordinary
TM, because of its ability to guess. The quadratic time complexity of the NTM in
Example 13.3 reflected the fact that we can test a proposed solution in quadratic time;
it does not obviously follow that we can test all solutions within quadratic time. As we
observed in Section 13.3, replacing a nondeterministic TM by a deterministic one as
in the proof of Theorem 9.2 can cause the time complexity to increase exponentially;
in fact, this is true of all the general methods known for eliminating nondeterminism
from a TM. It is therefore not surprising that there are many languages, or decision
problems, in NP for which no deterministic polynomial-time algorithms are known.
In other words, there are languages in NP that are not known to be in P.
What is surprising is that there are no languages in NP that are known not to
be in P. Although the most reasonable guess is that P is a proper subset of NP, no
one has managed to prove this statement. It is not for lack of trying; the problems
known to be in NP include a great many interesting problems that have been studied
intensively for many years, for which researchers have tried without success to find
either a polynomial-time algorithm or a proof that none exists. A general rule of
thumb is that finding good lower bounds is harder than finding upper bounds. In
principle, it is easier to exhibit a solution to a problem than to show that the problem
has no efficient solutions. In any case, the P = NP question is one of the outstanding
open problems in theoretical computer science.
Whether the second inclusion in Theorem 14.1 is a strict inclusion is also an
open question. We can summarize both these questions by saying that the role of
nondeterminism in the description of complexity is not thoroughly understood.
In the last part of this section, we will study two problems in NP that are interesting
for different reasons. The first can easily be shown to be in NP; however, we will see
in the next section that it is, in a precise sense, a hardest problem in NP. The second
also turns out to be in NP, though not obviously so, since it does not seem to fit the
“guess a solution and test it in polynomial time” pattern.

EXAMPLE 14.1  The CNF-Satisfiability Problem
An instance of this problem is a logical expression, which contains variables xᵢ and the logical
connectives ∧, ∨, and ¬ (AND, OR, and NOT, respectively). We use the notation x̄ᵢ to denote
the negation ¬xᵢ, and the term literal to mean either an xᵢ or an x̄ᵢ. The expression is assumed
to be in conjunctive normal form (CNF), which means that it is a conjunction

C₁ ∧ C₂ ∧ · · · ∧ C_c

of subexpressions Cᵢ, each of which is a disjunction (i.e., formed with ∨'s) of literals. For
example,

(x₁ ∨ x₃ ∨ x̄₄) ∧ (x̄₁ ∨ x₃) ∧ (x̄₁ ∨ x₄ ∨ x̄₂) ∧ x̄₃ ∧ (x̄₂ ∨ x̄₄)

is a CNF expression with five conjuncts. In general, one variable might appear more than once
within a conjunct, and a conjunct itself might be duplicated. We do, however, impose the extra
requirement that for some v ≥ 1, the distinct variables be precisely x₁, x₂, ..., x_v, with none
left out.
You can verify easily that the expression above is satisfied—made true—by the truth
assignment

x₂ = true,   x₁ = x₃ = x₄ = false

The CNF-satisfiability problem (CNF-Sat for short) is this: Given an expression in conjunctive
normal form, is there a truth assignment that satisfies it?
We can encode instances of the CNF-satisfiability problem in a straightforward way,
omitting parentheses and ∨'s and using unary notation for variable subscripts. For example,
the expression

(x₁ ∨ x̄₂) ∧ (x̄₂ ∨ x₃ ∨ x̄₁) ∧ (x̄₄ ∨ x₂)

will be represented by the string

x1x̄11∧x̄11x111x̄1∧x̄1111x11



We define CNF-Satisfiable to be the language over Σ = {∧, x, x̄, 1} containing the encodings
of all yes-instances of CNF-Sat.
Is our encoding scheme a reasonable one? We might try to answer this by considering an
instance with k literals (not necessarily distinct), c conjuncts, and v distinct variables. If n is
the length of the string encoding this instance, then

n ≤ k(v + 1) + c ≤ k² + 2k

The first inequality depends on the fact that the strings x1ⁱ and x̄1ʲ all have length ≤ v + 1,
and the second is true because v and c are both no larger than k. This relationship between n
and k implies that any polynomial in n is bounded by a polynomial in k, and k seems like a
reasonable measure of the size of the problem instance. Therefore, if CNF-Satisfiable ∈ NP, it
makes sense to say that the decision problem CNF-Sat is in NP.
We can easily describe in general terms the steps a one-tape Turing machine T needs
to follow in order to accept CNF-Satisfiable. The first step is to verify that the input string
represents a valid CNF expression in which the variables are precisely x₁, x₂, ..., x_v for some
v. Assuming the string is valid, T attempts to satisfy the expression, keeping track as it
proceeds of which conjuncts have been satisfied so far and which variables within the unsatisfied
conjuncts have been assigned values. The iterative step consists of finding the first conjunct
not yet satisfied; choosing a literal within that conjunct that has not been assigned a value (this
is the only place where nondeterminism is used); giving the variable in that literal the value
that satisfies the conjunct; marking the conjunct as satisfied; and giving the same value to all
subsequent occurrences of that variable in unsatisfied conjuncts, marking any conjuncts that
are satisfied as a result. The loop terminates in one of two ways. Either all conjuncts are
eventually satisfied, or the literals in the first unsatisfied conjunct are all found to have been
falsified. In the first case T accepts, and in the second it rejects. If the expression is satisfiable,
and only in this case, the correct choice of moves causes T to guess a truth assignment that
works.
The TM T can be constructed so that, except for a few steps that take time proportional to
n, all its actions are minor variations of the following operation: Begin with a string of 1’s in the
input, delimited at both ends by some symbol other than 1, and locate some or all of the other
occurrences of this string that are similarly delimited. We leave it to you to convince yourself
that a single operation of this type can be done in polynomial time, and that the number of such
operations that must be performed is also no more than a polynomial. (The nondeterministic
time complexity of T is O(n³); see Exercise 14.2.) Our conclusion is that CNF-Satisfiable,
and therefore CNF-Sat, is in NP.
The number of distinct truth assignments to an expression with j distinct variables is 2^j.
Although this fact does not by itself imply that the decision problem is not in P, it tells us
that the brute-force approach of trying all solutions will not be helpful in attempting to find a
polynomial-time algorithm. We will return to CNF-Sat in the next section.
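The 2^j count above suggests what brute force looks like. Here is a minimal deterministic sketch of ours, with literals coded as signed integers (a convention we adopt for illustration, not one used in the text), that tries every assignment:

    from itertools import product

    def cnf_satisfiable(conjuncts, v):
        # conjuncts: lists of literals, +i for x_i and -i for its negation
        for phi in product([False, True], repeat=v):   # all 2**v assignments
            def literal_true(lit):
                value = phi[abs(lit) - 1]
                return value if lit > 0 else not value
            if all(any(literal_true(l) for l in c) for c in conjuncts):
                return True
        return False

    # the expression (x1 v ~x2) ^ (~x2 v x3 v ~x1) ^ (~x4 v x2):
    print(cnf_satisfiable([[1, -2], [-2, 3, -1], [-4, 2]], 4))   # True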

EXAMPLE 14.2  The Primality Problem


This is the familiar decision problem, Given a positive integer n, is n prime? Here the observations
in the second paragraph in Section 13.3 are relevant. A solution that is polynomial,
even linear, when unary notation is used for the integer n is exponential if binary notation is
used instead, because the number of binary digits needed to encode n is only about log n. Let
us agree that for this problem “polynomial-time solution” means polynomial, not in n, but in
log n, the length of the input string. In particular, therefore, the algorithm in which we actually
test all possible divisors up to √n is not helpful. Even if we could test each divisor in constant
time, the required time would be proportional to √n, which is not bounded by any polynomial
function of log n.
This problem also seems to illustrate the importance of whether a problem is posed positively
or negatively. The composite decision problem: Given an integer n > 1, is it composite
(i.e., nonprime)? has a simple nondeterministic solution—namely, guess a possible factorization
n = p * q and test by multiplying that it is correct—and therefore is in NP. The primality
problem is not obviously in NP, since there is no obvious way to “guess a solution.” At the
language level, the fact that a language is in NP does not immediately imply that its complement
is (Exercises 14.3 and 14.4).
To see that the primality problem is in NP, we need some facts from number theory. First
recall from Chapter 1 the congruence-mod-n relation ≡ₙ, defined by a ≡ₙ b if and only if
a − b is divisible by n. Of the two facts that follow, the first is due to Fermat, and we state it
without proof.

1. A positive integer n is prime if and only if there is a number x, with 1 ≤ x ≤ n − 1,
   satisfying x^{n−1} ≡ₙ 1, and for every m with 1 ≤ m < n − 1, x^m ≢ₙ 1.
2. If n is not prime, then for any x with 0 ≤ x ≤ n − 1 satisfying x^{n−1} ≡ₙ 1, we must also
   have x^{(n−1)/p} ≡ₙ 1 for some p that is a prime factor of n − 1.

We can check the second statement without too much trouble. If x^{n−1} ≡ₙ 1 and n is not prime,
then by statement 1, x^m ≡ₙ 1 for some m < n − 1. We observe that the smallest such m must
be a divisor of n − 1. The reason is that when we divide n − 1 by m, we get a quotient q and
a remainder r, so that

n − 1 = q * m + r   and   0 ≤ r < m

This means that x^{n−1} = x^{qm+r} = (x^m)^q * x^r, and because x^{n−1} and x^m are both congruent to
1 mod n, we must have (x^m)^q ≡ₙ 1 and therefore x^r ≡ₙ 1. It follows that the remainder r
must be 0, because r < m and by definition m is the smallest positive integer with x^m ≡ₙ 1.
Therefore, n − 1 is divisible by m.
Now, any proper divisor m of n − 1 is of the form (n − 1)/j, for some j > 1 that is a
product of (one or more) prime factors of n − 1. Therefore, some multiple of m, say a * m, is
(n − 1)/p for a single prime p. Because x^{a*m} = (x^m)^a ≡ₙ 1, statement 2 follows.
The significance of the first statement is that it gives us a way of expressing the primeness
of n that starts, “there is a number x so that ...,” and thus we have a potential nondeterministic
solution: Guess an x and test it. At first, testing x seems to require that we test all the numbers m
with 1 ≤ m < n − 1 to make sure that x^m ≢ₙ 1. If this is really necessary, the nondeterminism
is no help—we might as well go back to the usual test for primeness, trying divisors of n. The
significance of statement 2 is that in order to test x, we do not have to try all the m's, but only
those of the form (n − 1)/p for some prime factor p of n − 1. (According to statement 2, if n
is not a prime, some m of this form will satisfy x^m ≡ₙ 1.) How do we find the prime factors
of n − 1? We guess!

With this introduction, it should now be possible to see that the following recursive
nondeterministic procedure accepts the set of primes.

Is_Prime(n)
    if n = 2 return true
    else if n > 2 and n is even return false
    else
    {
        guess x with 1 ≤ x < n
        if x^{n−1} ≢ₙ 1 return false
        guess a factorization p₁, p₂, ..., p_k of n − 1
        for i = 1 to k
            if not Is_Prime(pᵢ) return false
        if p₁ * p₂ * · · · * p_k ≠ n − 1 return false
        for i = 1 to k
            if x^{(n−1)/pᵢ} ≡ₙ 1 return false
        return true
    }
A TM can simulate this recursion by using its tape to keep a stack of “activation records,” as
mentioned once before in the sketch of the proof of Theorem 13.6. In order to execute the
“return false” statement, however, it can simply halt in the reject state.
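To see the arithmetic of the algorithm in action, here is a deterministic sketch of ours: the nondeterministic guesses are replaced by exhaustive search over x and by an explicit factorization, so it runs in exponential rather than nondeterministic polynomial time, but it checks the same two conditions:

    def prime_factors(n):
        # naive factorization, standing in for the "guessed" p_1, ..., p_k
        factors, d = [], 2
        while d * d <= n:
            while n % d == 0:
                factors.append(d)
                n //= d
            d += 1
        if n > 1:
            factors.append(n)
        return factors

    def is_prime_lucas(n):
        if n == 2:
            return True
        if n < 2 or n % 2 == 0:
            return False
        ps = set(prime_factors(n - 1))
        # statement 1: some x has x^(n-1) = 1 and x^((n-1)/p) != 1 (mod n)
        return any(pow(x, n - 1, n) == 1 and
                   all(pow(x, (n - 1) // p, n) != 1 for p in ps)
                   for x in range(2, n))

    assert [m for m in range(2, 30) if is_prime_lucas(m)] == \
           [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]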
It is still necessary to show that this nondeterministic algorithm can be executed in poly-
nomial (nondeterministic) time. We present only the general idea, and leave most of the details
to the exercises. It is helpful to separate the time required to execute Is_Prime(n) into two
parts: the time required for the k recursive calls Is_Prime(pᵢ), and all the rest. It is not hard
to see that for some constant c and some integer d, the nondeterministic time for everything
but the recursive calls is bounded by c(log n)^d (remember that log n is the length of the input
string). If T(n) is the total nondeterministic time, then

T(n) ≤ c(log n)^d + T(p₁) + · · · + T(p_k)

This inequality can be used in an induction proof that T(n) ≤ C(log n)^{d+1}, where C is some
sufficiently large constant. In the induction step, if we know from our hypothesis that T(pᵢ) ≤
C(log pᵢ)^{d+1} for each i, then we only need to show that

c(log n)^d + C(log p₁)^{d+1} + · · · + C(log p_k)^{d+1} ≤ C(log n)^{d+1}

This follows from the inequality

c(log n)^d + C(log p₁)^{d+1} + · · · + C(log p_k)^{d+1} ≤ C(log p₁ + · · · + log p_k)^{d+1}

which is true if the constant C is chosen sufficiently large, because the sum on the right side is
simply log(p₁ * · · · * p_k) = log(n − 1).

14.2 POLYNOMIAL-TIME REDUCTIONS AND NP-COMPLETENESS
As we have seen, the classes Time( f) and Space( f) allow us to compare the complex-
ity, or difficulty, of two problems or two languages. Just as in Chapter 11, however,
it is useful to introduce a way of describing the relative complexity of L₁ and L₂,
without having to pin down the absolute complexity of either language.
For the type of reducibility introduced in that chapter, L₁ is reducible to L₂ if
there is a computable function f so that deciding whether x ∈ L₁ is equivalent to
deciding whether f(x) ∈ L₂. L₁ is “no harder than” L₂, in the sense that testing
membership in L₁ can be done in two steps, computing f and testing membership
in L₂. The phrase “no harder than” is reasonable because we distinguish only two
degrees of hardness, possible and impossible; if testing membership of a string in L₂
is possible, then testing a string that must first be obtained by computing f is still
possible, because f is computable.
Now, however, we are using a finer classification system, and we want our com-
parison to be quantitative as well as qualitative. An appropriate way to modify the
definition is to specify that the reducing function f should be computable in a rea-
sonable amount of time, where “reasonable” is taken to mean polynomial.

Definition 14.2  Polynomial-time Reducibility

If L₁ ⊆ Σ₁* and L₂ ⊆ Σ₂*, we say L₁ is polynomial-time reducible to L₂, written L₁ ≤ₚ L₂, if there is a function f : Σ₁* → Σ₂*, computable by a Turing machine with polynomial time complexity, so that for every x ∈ Σ₁*, x ∈ L₁ if and only if f(x) ∈ L₂.

The following properties of the relation ≤ₚ are not surprising and are consistent
with our understanding of what “no harder than” should mean in this context.

Theorem 14.2
1. ≤ₚ is transitive: if L₁ ≤ₚ L₂ and L₂ ≤ₚ L₃, then L₁ ≤ₚ L₃.
2. If L₂ ∈ P and L₁ ≤ₚ L₂, then L₁ ∈ P.
3. If L₂ ∈ NP and L₁ ≤ₚ L₂, then L₁ ∈ NP.

For statement 1, let f : Σ₁* → Σ₂* and g : Σ₂* → Σ₃* be the functions used in the two
reductions, respectively, computed by TMs T_f and T_g with polynomial bounds of the form

t_{T_f}(n) ≤ Cn^a + D   and   t_{T_g}(n) ≤ C′n^b + D′

Then g ∘ f reduces L₁ to L₃, and it can be computed in polynomial time because |f(x)| is
itself bounded by a polynomial in |x|.


If we continue to require “reasonable” encodings of problem instances, we can
extend Definition 14.2 and Theorem 14.2 to decision problems. We can therefore talk
about one decision problem being polynomial-time reducible to another, and we can
use these techniques to show that decision problems are in P or NP.
In our first example of a polynomial-time reduction, we show that the problem
CNF-Sat discussed in Example 14.1 can be reduced to a decision problem involving
undirected graphs. The point of the reduction is not to show that a language is in P
or NP (we know already that CNF-Sat is in NP, and there is no immediate prospect
of showing that the graph problem is in P). Rather, we interpret the result to say that
even though CNF-Sat appears to be a difficult problem in NP, the other problem is
no easier.

EXAMPLE 14.3  A Polynomial-time Reduction Involving CNF-Sat


We begin with some terminology. A graph is a pair G = (V, E), where V is a finite nonempty
set of vertices and E is a finite set of edges, or unordered pairs of vertices. (The term unordered
means that the pair (v₁, v₂) is considered to be the same as the pair (v₂, v₁).) The edge (v₁, v₂)
is said to join the vertices v₁ and v₂, or to have endpoints v₁ and v₂, and two vertices joined
by an edge are adjacent. A subgraph of G is a graph whose vertex set and edge set are subsets
of the respective sets of G. A complete graph, sometimes called a clique, is a graph in which
any two vertices are adjacent.
A schematic diagram of a graph G with seven vertices and ten edges is shown in Figure
14.1. G has a complete subgraph with four vertices (3, 5, 6, and 7), and obviously, therefore,
several complete subgraphs with fewer than four vertices. The complete subgraph problem is
this: Given a graph G and an integer k, does G have a complete subgraph with k vertices? It
is easy to see that the problem is in NP: a nondeterministic TM can take a string encoding an
instance (G, k), nondeterministically select k of the vertices, and then examine the edges to
see whether every pair is adjacent.
We let CompleteSub be the language corresponding to the complete subgraph problem;
we may assume that vertices are represented in unary notation, and that a graph is described by
a string of vertices, followed by a string of vertex-pairs, each pair representing an edge. So that
strings can be decoded uniquely, 0’s are inserted appropriately. We will show that CNF-Sat
is polynomial-time reducible to the complete subgraph problem, or that CNF-Satisfiable ≤ₚ
CompleteSub.
When we discussed reductions in Chapter 11, in the context of solvability, we gave
two definitions: Definition 11.3a, involving languages, and Definition 11.3, involving actual
problem instances. The reduction here is easier to understand using the second approach. It will
then be straightforward to confirm the polynomial-time aspect of the reduction by considering
the corresponding string function.
We must construct for each CNF expression x an instance f(x) of the complete subgraph
problem (that is, a graph G_x and an integer k_x), so that for any x, x is satisfiable if and only if
G_x has a complete subgraph with k_x vertices.
Let x be the expression

x = ⋀_{i=1}^{c} ⋁_{j=1}^{dᵢ} a_{i,j}

where each a_{i,j} is a literal. We want the vertices of G_x to correspond precisely to the occurrences
of the terms a_{i,j} in x; we let

V_x = {(i, j) | 1 ≤ i ≤ c and 1 ≤ j ≤ dᵢ}
Figure 14.1

The edges of G_x are now specified so that the vertex (i, j) is adjacent to (l, m) if and only if
the corresponding literals are in different conjuncts of x and there is a truth assignment to x
that makes them both true. The way to do this is to let

E_x = {((i, j), (l, m)) | i ≠ l and a_{i,j} ≠ ¬a_{l,m}}

Finally, we take the integer k_x to be c, the number of conjuncts in the expression x.
If x is satisfiable, then there is a truth assignment Φ so that for each i there is a literal a_{i,jᵢ}
that is given the value true by Φ. The vertices

(1, j₁), (2, j₂), ..., (c, j_c)

then determine a complete subgraph of G_x, because we have specified the edges of G_x so that
any two of these vertices are adjacent.
On the other hand, suppose there is a complete subgraph of G_x with k_x vertices. Because
none of the corresponding literals is the negation of another, there is a truth assignment that
makes them all true; and because these literals must be in distinct conjuncts, this assignment
makes at least one literal in each conjunct true. Therefore, x is satisfiable.
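The construction is short enough to express directly in code. The following sketch is ours, reusing the signed-integer coding of literals from the earlier CNF sketch:

    def cnf_to_complete_subgraph(conjuncts):
        # vertices correspond to occurrences of literals: pairs (i, j)
        vertices = [(i, j) for i, c in enumerate(conjuncts)
                           for j in range(len(c))]
        # (i, j) and (l, m) are adjacent when the conjuncts differ and the
        # literals are not negations of each other
        edges = {(u, w) for u in vertices for w in vertices
                 if u[0] != w[0]
                 and conjuncts[u[0]][u[1]] != -conjuncts[w[0]][w[1]]}
        return vertices, edges, len(conjuncts)   # k_x = number of conjuncts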
Now let us consider how long it takes, beginning with the string w representing x, to
construct the string representing (G_x, k_x). The vertices of the graph can be constructed in a
single scan of w. For a particular literal in a particular conjunct of x, a new edge is obtained
for each literal in another conjunct that is not the negation of the first one. Finding another
conjunct, identifying another literal within that conjunct, and comparing that literal to the
original one can each be done within polynomial time, and it follows that the overall time is
still polynomial.

The polynomial-time reducibility relation ≤ₚ is used, as in Example 14.3, to
measure the relative complexity of two languages. However, it also allows us to
describe a kind of absolute complexity: We can consider the idea of a hardest language
in NP. (Not the hardest, because it will turn out that many languages share this
distinction.)

Definition 14.3  NP-hard and NP-complete Languages

A language L is NP-hard if L₁ ≤ₚ L for every language L₁ ∈ NP. L is NP-complete if L ∈ NP and L is NP-hard.

Theorem 14.3
If L is NP-hard and L ∈ P, then P = NP. If L is NP-hard, L ≤ₚ L₁, and L₁ ∈ NP, then L₁ is NP-complete.

Just as before, Definition 14.3 and Theorem 14.3 can also be extended to decision
problems. An NP-complete problem is one for which the corresponding language is
NP-complete, and Theorem 14.3 provides a way of obtaining more NP-complete
problems—provided that we can find one to start with. It is not at all obvious that
we can. The set NP contains problems that are diverse and seemingly unrelated:
problems involving graphs, networks, sets and partitions, scheduling, number theory,
logic, and more. It is reasonable to expect that some of these problems will be more
complex than others, and perhaps even that some will be “hardest.” An NP-complete
problem, however, is not only hard but archetypal: Finding a good algorithm to solve
it guarantees that there will be comparable algorithms for every other problem in NP!
Exercise 14.14 describes a way to obtain an “artificial” NP-complete language. In
the next section we will see a remarkable result of Stephen Cook: The language CNF-
Satisfiable (or the decision problem CNF-Sat) is NP-complete. This fact, together
with the technique of polynomial-time reduction and Theorem 14.3, will allow us to
show that many interesting and widely studied problems are NP-complete.
Theorem 14.3 indicates both ways in which the idea of NP-completeness is sig-
nificant in complexity theory. On the one hand, if someone were ever to demonstrate
that some NP-complete problem could be solved by a polynomial-time algorithm,
then the P = NP question would be resolved; NP would disappear as a separate en-
tity, and researchers would redouble their efforts to find polynomial-time algorithms
for problems now known to be in NP, confident that they were not on a wild-goose
chase. On the other hand, as long as the question remains open (or if someone actually
succeeds in proving that P # NP), the difficulty of a problem P can be established
convincingly by showing that some other problem already known to be NP-hard can
be polynomial-time reduced to P.

14.3 COOK’S THEOREM
The idea of NP-completeness was introduced by Stephen Cook in 1971, and our
first example of an NP-complete problem is the CNF-satisfiability problem (Example
14.1), which he proved is NP-complete. The details of the proof are complicated,
and that is perhaps to be expected. Rather than using specific features of a decision
problem to construct a reduction, as we were able to do in Example 14.3, we must
now show there is a “generic” polynomial-time reduction from any problem in NP to
this one. Fortunately, once we have one NP-complete problem, obtaining others will
be considerably easier.

In the proof, as in Example 14.3, we will use the notation

⋀_{i=1}^{n} Aᵢ

to denote the conjunction

A₁ ∧ A₂ ∧ · · · ∧ Aₙ

and the same sort of shorthand with disjunction. As in Example 14.1, ā stands for the
negation of a.
In one section of the proof, the following result about Boolean formulas is useful.

Lemma 14.1  Let F be any Boolean expression involving the variables a₁, ..., a_t.
Then F is logically equivalent to an expression in conjunctive normal form: one of
the form

⋀_{i=1}^{k} ⋁_{j=1}^{lᵢ} b_{i,j}

where each b_{i,j} is either an a_m or an ā_m.

Proof  It will be easiest to show first that any such F is equivalent to an expression
in disjunctive normal form—that is, one of the form

⋁_{i=1}^{k} ⋀_{j=1}^{lᵢ} b_{i,j}

To do this, we introduce the following notation. For any assignment Φ of truth values
to the variables, and for any j from 1 to t, we let

a_{Φ,j} = aⱼ if the assignment Φ makes aⱼ true, and a_{Φ,j} = āⱼ otherwise

Then the assignment Φ makes each of the a_{Φ,j}'s true; and if the assignment Ψ
assigns a different truth value to aⱼ, a_{Φ,j} is false for the assignment Ψ. In other
words, the conjunction ⋀_{j=1}^{t} a_{Φ,j} is satisfied by the assignment Φ and not by any
other assignment. It follows that if S is the set of all assignments satisfying the
expression F, then the assignments that satisfy one of the disjuncts of the expression

F₁ = ⋁_{Φ∈S} ⋀_{j=1}^{t} a_{Φ,j}

are precisely those that satisfy F. Since F and F₁ are satisfied by exactly the same
assignments, they must be logically equivalent.
We can finish the proof by applying our preliminary result to the expression ¬F.
If ¬F is equivalent to

⋁ᵢ ⋀ⱼ b_{i,j}

then De Morgan's law implies that F is equivalent to

⋀ᵢ ⋁ⱼ b̄_{i,j}
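The construction in the proof translates directly into code. In this sketch of ours, the Boolean expression is represented as a Python function of a tuple of truth values, an assumption not made in the text; the DNF has one conjunct per satisfying assignment, and the CNF for F comes from De Morgan applied to a DNF for ¬F:

    from itertools import product

    def dnf(f, t):
        # one conjunct per satisfying assignment; literal j+1 means a_j,
        # -(j+1) means its negation
        return [[(j + 1) if phi[j] else -(j + 1) for j in range(t)]
                for phi in product([False, True], repeat=t) if f(phi)]

    def cnf(f, t):
        # negate each conjunct of a DNF for not-F (De Morgan's law)
        return [[-lit for lit in conj] for conj in dnf(lambda p: not f(p), t)]

    print(cnf(lambda p: p[0] or p[1], 2))   # [[1, 2]], i.e., a1 v a2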

Recall from Example 14.1 that CNF-Satisfiable is the language of encoded yes-instances
of the CNF-satisfiability problem. It is a language over Σ = {∧, x, x̄, 1}. The
theorem, due to Cook, is that CNF-Satisfiable is NP-complete.

14.4 SOME OTHER NP-COMPLETE PROBLEMS
Now that we know CNF-Sat is an NP-hard problem, we can find others by following
the model in Chapter 11. We show that a problem is unsolvable by reducing another
unsolvable problem to it; we show that a problem is NP-hard by showing that another
NP-hard problem is polynomial-time reducible to it (Theorem 14.3). We have already
done this once, in Example 14.3, and the following theorem records the conclusion:
the complete subgraph problem is NP-complete.

Soon we will look at two other decision problems involving undirected graphs.
Our next example is one of several possible variations on the CNF-satisfiability prob-
lem; see the book Computational Complexity (Addison-Wesley, Reading, MA, 1994)
by Papadimitriou for a discussion of some of the others. We denote by 3-Sat the
following decision problem: Given an expression in CNF in which every conjunct is
the disjunction of three or fewer literals, is there a truth assignment satisfying the ex-
pression? The language 3-Satisfiable will be the corresponding language of encoded
yes-instances, using the encoding method discussed in Example 14.1. There is an


obvious sense in which 3-Sat is no harder than CNF-Sat. On the other hand, it is not
significantly easier.

In addition to the complete subgraph problem, studied in Example 14.3, many
other important combinatorial problems can be formulated in terms of graphs. A little
more terminology will be helpful. A vertex cover for a graph G is a set C of vertices
so that any edge of G has an endpoint in C. For a positive integer k, we may think
of the integers 1, 2,..., k as distinct “colors,” and use them to color the vertices of
a graph. A k-coloring of G is an assignment to each vertex of one of the k colors so
that no two adjacent vertices are colored the same. In the graph G shown in Figure
14.1, the set {1, 3,5, 7} is a vertex cover for G, and it is easy to see that there is
no vertex cover having fewer than four vertices. Clearly, since there is a complete
subgraph with four vertices, G cannot be k-colored for any k < 4. Although the
absence of a complete subgraph with k + 1 vertices does not automatically imply that
the graph has a k-coloring, you can easily check that in this case there is a 4-coloring
of G.
The vertex cover problem is this: Given a graph G and an integer k, is there
a vertex cover for G with k vertices? The k-colorability problem is the problem:
Given G and k, is there a k-coloring of G? Both problems are in NP. In the second
case, for example, colors between 1 and k can be assigned nondeterministically to all
the vertices, and then the edges can be examined to determine whether each one has
different-colored endpoints. In order to show that both problems are NP-complete,
therefore, it is sufficient to show that they are both NP-hard.
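Here is a minimal sketch (ours, not the book's; the edge-list representation is an assumption) of the two certificate checks just described. Each runs in time polynomial in the size of the graph, which is the deterministic half of membership in NP: a nondeterministic machine guesses the set or the coloring, and a check like these accepts or rejects it.

```python
def is_vertex_cover(edges, cover):
    # Every edge must have at least one endpoint in the proposed cover.
    return all(u in cover or v in cover for (u, v) in edges)

def is_k_coloring(edges, coloring, k):
    # Every vertex gets a color in 1..k, and no edge joins same-colored vertices.
    return (all(1 <= c <= k for c in coloring.values())
            and all(coloring[u] != coloring[v] for (u, v) in edges))

# The graph of Figure 14.1 is not reproduced here; a small example instead:
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
print(is_vertex_cover(edges, {1, 3}))                     # True
print(is_k_coloring(edges, {1: 1, 2: 2, 3: 3, 4: 1}, 3))  # True
```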

the problems
Although we now have five examples of NP-complete problems,
list is growing
now known to be NP-complete number in the thousands, and the
referenc e for
constantly. The book by Garey and Johnson remains a very good
, grouped
a general discussion of the topic and contains a varied list of problems
according to category (graphs, sets, and so on).
NP-completeness is still a somewhat mysterious property. Some decision problems are in P, and others that seem similar turn out to be NP-complete (Exercises 14.15 and 14.22). In the absence of either an answer to the P = NP question or a definitive characterization of tractability, people generally take a pragmatic approach. Many real-life decision problems require some kind of solution. If a polynomial-time algorithm does not present itself, maybe the problem can be shown to be NP-complete, by choosing a problem from the large group available and constructing a reduction. In that case, it is probably not worthwhile spending a lot more time looking for a polynomial-time solution. The next-best thing might be to look for an algorithm that produces an approximate solution, or one that provides a solution for a restricted set of instances. Both approaches represent active areas of research.

EXERCISES
14.1. In studying the CNF-satisfiability problem, what is the reason for imposing the restriction that an instance must contain precisely the variables $x_1, \ldots, x_n$, with none left out?
14.2. The nondeterministic Turing machine we described that accepts CNF-Satisfiable repeats the following operation or minor variations of it: starting with a string of 1's in the input string, delimited at both ends by a symbol other than 1, and locating some or all of the other occurrences of this string that are similarly delimited. How long does an operation of this type take on a one-tape TM? Use your answer to argue that the TM accepting CNF-Satisfiable has time complexity $O(n^3)$.
14.3. a. Show that if $L \in \text{Time}(f)$, then $L' \in \text{Time}(f)$.
b. Show that if $L \in P$, then $L' \in P$, and if $L \in \text{PSpace}$, then $L' \in \text{PSpace}$.
c. Explain carefully why the fact that $L \in \text{NP}$ does not obviously imply that $L' \in \text{NP}$.
14.4. a. Let $L_1$ and $L_2$ be languages over $\Sigma_1$ and $\Sigma_2$, respectively. Show that if $L_1 \le_p L_2$, then $L_1' \le_p L_2'$.
b. Show that if there is an NP-complete language $L$ whose complement is in NP, then the complement of any language in NP is in NP.
14.5. Show that if $L_1, L_2 \subseteq \Sigma^*$, $L_1 \in P$, and $L_2$ is neither $\emptyset$ nor $\Sigma^*$, then $L_1 \le_p L_2$.

14.6. a. If every instance of problem $P_1$ is an instance of problem $P_2$, and if $P_2$ is hard, then $P_1$ is hard. True or false?
b. Show that 3-Sat $\le_p$ CNF-Sat, or, at the language level, 3-Satisfiable $\le_p$ CNF-Satisfiable.
c. Generalize the result in part (b) in some appropriate way.
14.7. In each case, find an equivalent expression that is in conjunctive normal form.
a. $a \to (b \wedge (c \to (d \vee e)))$
b. $\bigvee_{i} (a \wedge b_i)$
14.8. In the proof of Cook's theorem (Theorem 14.4), given a nondeterministic TM $T$ with input alphabet $\Sigma_1$, we constructed a function $g_1: \Sigma_1^* \to \text{CNF}$ so that for any $x \in \Sigma_1^*$, $x$ is accepted by $T$ if and only if $g_1(x)$ is satisfiable. The idea is that the expression $g_1(x)$ "says" that $x$ is accepted by $T$, in the sense that $g_1(x)$ is constructed from a number of atoms, each of which is associated with a statement about some detail of $T$'s configuration after a certain number of steps. Consider the following much simpler function $g_2: \Sigma_1^* \to \text{CNF}$. For any $x \in \Sigma_1^*$, $g_2(x)$ is a single atom, labeled $a_x$. For each $x$, we associate the atom $a_x$ with the statement "$T$ accepts $x$." There is an obvious similarity between the expressions $g_1(x)$ and $g_2(x)$; both depend on $x$, and both are associated with statements that say $x$ is accepted by $T$. Explain the essential difference, which is the reason Cook's theorem doesn't have a trivial one- or two-line proof.
14.9. Show that if $k \ge 4$, the $k$-satisfiability problem is NP-complete.
14.10. Find an unsatisfiable CNF expression involving three variables in which
each conjunct has exactly three literals and involves all three variables, so
that the number of conjuncts is as small as possible.
14.11. Show that both these decision problems are in P.
a. DNF-Satisfiability: Given a Boolean expression in disjunctive normal
form (the disjunction of clauses, each of which is a conjunction of
literals), is it satisfiable?
b. CNF-Tautology: Given a Boolean expression in CNF, is it a tautology
(i.e., satisfied by every possible truth assignment)?
14.12. Show that the general satisfiability problem (Given an arbitrary Boolean expression, not necessarily in conjunctive normal form, involving the variables $x_1, x_2, \ldots, x_n$, is it satisfiable?) is NP-complete.
14.13. Explain why it is appropriate to insist on binary notation when encoding
instances of the primality problem, but not necessary to do this when
encoding subscripts in instances of the satisfiability problem.
14.14. Consider the language $L$ of all strings $e(T)e(x)1^n$, where $T$ is a nondeterministic Turing machine, $n \ge 1$, and $T$ accepts $x$ by some sequence of no more than $n$ moves. Show that the language $L$ is NP-complete.

14.15. Show that the 2-colorability problem (Given a graph, is there a 2-coloring
of the vertices?) is in P.
14.16. Consider the following algorithm to solve the vertex cover problem. First, we generate all subsets of the vertices containing exactly $k$ vertices. There are $O(n^k)$ such subsets. Then we check whether any of them is a vertex cover. Why is this not a polynomial-time algorithm (and thus a proof that P = NP)?
14.17. Let $f$ be a function in $PF$, the set of functions from $\Sigma^*$ to $\Sigma^*$ computable in polynomial time. Let $A$ (a language in $\Sigma^*$) be in P. Show that $f^{-1}(A)$ is in P, where by definition $f^{-1}(A) = \{x \mid f(x) \in A\}$.
14.18. In an undirected graph $G$, with vertex set $V$ and edge set $E$, an independent set of vertices is a set $V_1 \subseteq V$ such that no two elements of $V_1$ are joined by an edge in $E$. Let IS be the decision problem: Given a graph $G$ and an integer $k$, is there an independent set of vertices with at least $k$ elements? Denote by VC and CSG the vertex cover problem and the complete subgraph problem, respectively. Construct a polynomial-time reduction from each of the three problems IS, VC, and CSG to each of the others. (Part of this problem has already been done, in the proof of Theorem 14.7. In the remaining parts, you are not to use the NP-completeness of any of these problems.)

MORE CHALLENGING PROBLEMS


14.19. In Example 14.2, the claim was made that the time required for carrying out the steps other than the recursive calls in the algorithm Is_Prime($n$) is $O((\log n)^d)$ for some integer $d$. Show that this is true, and find the smallest $d$ that works.
14.20. Complete the argument in Example 14.2, by showing the following: if $T(n)$ satisfies the inequality

$$T(n) \le c(\log n)^d + \sum_{i=1}^{k} T(p_i)$$

where $n - 1 = p_1 \cdots p_k$ is the prime factorization of $n - 1$, and $c$ and $d$ are positive constants, then $T(n) = O((\log n)^{d+1})$.
14.21. For languages $L_1, L_2 \subseteq \{0, 1\}^*$, let

$$L_1 \oplus L_2 = L_1\{0\} \cup L_2\{1\}$$

a. Show that $L_1 \le_p L_1 \oplus L_2$ and $L_2 \le_p L_1 \oplus L_2$.
b. Show that for any languages $L$, $L_1$, and $L_2$ over $\{0, 1\}$, with $L \ne \{0, 1\}^*$, if $L_1 \le_p L$ and $L_2 \le_p L$, then $L_1 \oplus L_2 \le_p L$.
14.22. Show that the 2-Satisfiability problem is in P.
14.23. Show that both P and NP are closed under the operations of union,
intersection, concatenation, and Kleene *.

14.24. A subexponential function from $\mathbb{N}$ to $\mathbb{N}$ is one that is $O(2^{n^c})$ for every positive real number $c$. Show that if there is an NP-complete language in Time($f$) for some subexponential function $f$, then NP $\subseteq$ Subexp, where

$$\text{Subexp} = \bigcup \{\text{Time}(f) \mid f \text{ is subexponential}\}$$


14.25. Show that the following decision problem is NP-complete: Given a graph $G$ in which every vertex has even degree, and an integer $k$, does $G$ have a vertex cover with $k$ vertices? (The degree of a vertex is the number of edges containing it.) Hint: given an arbitrary graph $G$, find a way to modify it by adding three vertices so that all the vertices of the new graph have even degree.
14.26. Give an alternate proof of the NP-completeness of the vertex cover problem, by directly reducing CNF-Sat to it. Below is a suggested way to proceed; show that it works.

Starting with a CNF expression $x = \bigwedge_{i=1}^{m} A_i$, where $A_i = \bigvee_{j=1}^{n_i} a_{i,j}$ and $x$ involves the $n$ variables $v_1, \ldots, v_n$, construct an instance $(G, k)$ of the vertex cover problem as follows: first, there are vertices $v_i$ and $v_i'$ for each variable $v_i$, and these two are connected by an edge; next, for each conjunct $A_i$ there is a complete subgraph $G_i$ with $n_i$ vertices, one for each literal in $A_i$; finally, for a variable $v_i$ and a literal $w$ in a conjunct $A_j$, there is an edge from the vertex in $G_j$ corresponding to $w$, either to $v_i$ (if $w$ is the literal $v_i$) or to $v_i'$ (if $w$ is the literal $\neg v_i$). Finally, let $k$ be the integer $n + \sum_{i=1}^{m} (n_i - 1)$.
14.27. Show that the following decision problem is NP-complete: Given a finite set $A$, a collection $C$ of subsets of $A$, and an integer $k$, is there a subset $A_1$ of $A$ having $k$ or fewer elements such that $A_1 \cap S \ne \emptyset$ for each $S$ in the collection $C$?
14.28. The exact cover problem is the following: given finite subsets $S_1, \ldots, S_k$ of a set $A$, with $\bigcup_{i=1}^{k} S_i = A$, is there a subset $J$ of $\{1, 2, \ldots, k\}$ so that for any two distinct elements $i$ and $j$ of $J$, $S_i \cap S_j = \emptyset$, and $\bigcup_{i \in J} S_i = A$?
Show that this problem is NP-complete by constructing a reduction from the $k$-colorability problem.
14.29. The sum-of-subsets problem is the following: Given a sequence $a_1, a_2, \ldots, a_n$ of integers, and an integer $M$, is there a subset $J$ of $\{1, 2, \ldots, n\}$ so that $\sum_{i \in J} a_i = M$?
Show that this problem is NP-complete by constructing a reduction from the exact cover problem.
14.30. The partition problem is the following: Given a sequence $a_1, a_2, \ldots, a_n$ of integers, is there a subset $J$ of $\{1, 2, \ldots, n\}$ so that $\sum_{i \in J} a_i = \sum_{i \notin J} a_i$?
Show that this problem is NP-complete by constructing a reduction from the sum-of-subsets problem.
14.31. The 0-1 knapsack problem is the following: Given two sequences $w_1, \ldots, w_n$ and $p_1, \ldots, p_n$ of nonnegative numbers, and two numbers $W$ and $P$, is there a subset $J$ of $\{1, 2, \ldots, n\}$ so that $\sum_{i \in J} w_i \le W$ and $\sum_{i \in J} p_i \ge P$? (The significance of the name is that $w_i$ and $p_i$ are viewed as the weight and the profit of the $i$th item, respectively; we have a "knapsack" that can hold no more than $W$ pounds, and the problem is asking whether it is possible to choose items to put into the knapsack, subject to that constraint, so that the total profit is at least $P$.)
Show that the 0-1 knapsack problem is NP-complete by constructing a reduction from the partition problem.
REFERENCES

There are a number of texts that discuss the topics in this book. Hopcroft, Motwani, and Ullman (2001) and Lewis and Papadimitriou (1998) are both recent editions of books that previously established themselves as standard references. Others that might serve as useful complements to this book include Sipser (1997), Sudkamp (1996), and Linz (2001), and more comprehensive books for further reading include Davis, Sigal, and Weyuker (1994) and Floyd and Beigel (1994).

The idea of a finite automaton appears in McCulloch and Pitts (1943) as a way of modeling neural nets. Kleene (1956) introduces regular expressions and proves their equivalence to FAs, and NFAs are investigated in Rabin and Scott (1959). Theorem 5.1 and Corollary 5.1 are due to Nerode (1958) and Myhill (1957). Algorithm 5.1 appears in Huffman (1954) and in Moore (1956). The Pumping Lemmas (Theorem 5.2a and Theorem 8.1a) were proved in Bar-Hillel, Perles, and Shamir (1961).

Context-free grammars were introduced in Chomsky (1956) and pushdown automata in Oettinger (1961); their equivalence is shown in Chomsky (1962), Evey (1963), and Schützenberger (1963). LL(1) grammars are introduced in Lewis and Stearns (1968) and LL(k) grammars in Rosenkrantz and Stearns (1970). Knuth (1965) characterizes the grammars corresponding to DCFLs.

Turing's 1936 paper introduces Turing machines and universal Turing machines and proves the unsolvability of the halting problem. Church (1936) contains the first explicit statement of the Church-Turing thesis. Hennie (1977) contains a good general introduction to Turing machines.

Chomsky's papers of 1956 and 1959 contain the proofs of Theorems 11.1 and 11.2, as well as the definition of the Chomsky hierarchy. The equivalence of context-sensitive grammars and linear-bounded automata is shown in Kuroda (1964).

Post's correspondence problem was discussed in Post (1946). A number of the original papers by Turing, Post, Kleene, Church, and Gödel having to do with computability and solvability are reprinted in Davis (1965). Kleene (1952), Davis (1958), and Rogers (1967) are further references on computability, recursive function theory, and related topics.

Rabin (1963) and Hartmanis and Stearns (1965) are two early papers dealing with computational complexity. The class P was introduced in Cobham (1964), and Cook (1971) contains the definition of NP-completeness and the proof that the satisfiability problem is NP-complete. Karp's 1972 paper exhibited a number of NP-complete problems and helped to establish the idea as a fundamental one in complexity theory. Garey and Johnson (1979) is the standard introductory reference on NP-completeness and contains a catalogue of the problems then known to be NP-complete. Recent references on computational complexity include Balcázar (1988 and 1990), Bovet and Crescenzi (1994), Papadimitriou (1994), and many of the chapters of van Leeuwen (1990).

BIBLIOGRAPHY

Balcázar JL, Díaz J, Gabarró J: Structural Complexity I. New York: Springer-Verlag, 1988.
Balcázar JL, Díaz J, Gabarró J: Structural Complexity II. New York: Springer-Verlag, 1990.
Bar-Hillel Y, Perles M, Shamir E: On Formal Properties of Simple Phrase Structure Grammars, Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 14: 143-172, 1961.
Boas RP: Can We Make Mathematics Intelligible? American Mathematical Monthly 88: 727-731, 1981.
Bovet DP, Crescenzi P: Introduction to the Theory of Complexity. Englewood Cliffs, NJ: Prentice Hall, 1994.
Carroll J, Long D: Theory of Finite Automata with an Introduction to Formal Languages. Englewood Cliffs, NJ: Prentice Hall, 1989.
Chomsky N: Three Models for the Description of Language, IRE Transactions on Information Theory 2: 113-124, 1956.
Chomsky N: On Certain Formal Properties of Grammars, Information and Control 2: 137-167, 1959.
Chomsky N: Context-free Grammars and Pushdown Storage, Quarterly Progress Report No. 65, Cambridge, MA: Massachusetts Institute of Technology Research Laboratory of Electronics, 1962, pp. 187-194.
Church A: An Unsolvable Problem of Elementary Number Theory, American Journal of Mathematics 58: 345-363, 1936.
Cobham A: The Intrinsic Computational Difficulty of Functions, Proceedings of the 1964 Congress for Logic, Mathematics, and Philosophy of Science, New York: North Holland, 1964, pp. 24-30.
Cook SA: The Complexity of Theorem Proving Procedures, Proceedings of the Third Annual ACM Symposium on the Theory of Computing, New York: Association for Computing Machinery, 1971, pp. 151-158.
Davis MD: Computability and Unsolvability. New York: McGraw-Hill, 1958.
Davis MD: The Undecidable. Hewlett, NY: Raven Press, 1965.
Davis MD, Sigal R, Weyuker EJ: Computability, Complexity, and Languages: Fundamentals of Theoretical Computer Science, 2nd ed. New York: Academic Press, 1994.
Dowling WF: There Are No Safe Virus Tests, American Mathematical Monthly 96: 835-836, 1989.
Earley J: An Efficient Context-free Parsing Algorithm, Communications of the ACM 13(2): 94-102, 1970.
Evey J: Application of Pushdown Store Machines, Proceedings, 1963 Fall Joint Computer Conference, Montvale, NJ: AFIPS Press, 1963, pp. 215-227.
Floyd RW, Beigel R: The Language of Machines: An Introduction to Computability and Formal Languages. New York: Freeman, 1994.
Garey MR, Johnson DS: Computers and Intractability: A Guide to the Theory of NP-Completeness. New York: Freeman, 1979.
Hartmanis J, Stearns RE: On the Computational Complexity of Algorithms, Transactions of the American Mathematical Society 117: 285-306, 1965.
Hennie FC: Introduction to Computability. Reading, MA: Addison-Wesley, 1977.
Hopcroft JE, Motwani R, Ullman JD: Introduction to Automata Theory, Languages, and Computation, 2nd ed. Reading, MA: Addison-Wesley, 2001.
Huffman DA: The Synthesis of Sequential Switching Circuits, Journal of the Franklin Institute 257: 161-190, 275-303, 1954.
Immerman N: Nondeterministic Space is Closed under Complementation, SIAM Journal on Computing 17: 935-938, 1988.
Karp RM: Reducibility Among Combinatorial Problems. In Complexity of Computer Computations. New York: Plenum Press, 1972, pp. 85-104.
Kleene SC: Introduction to Metamathematics. New York: Van Nostrand, 1952.
Kleene SC: Representation of Events in Nerve Nets and Finite Automata. In Shannon CE, McCarthy J (eds), Automata Studies. Princeton, NJ: Princeton University Press, 1956, pp. 3-42.
Knuth DE: On the Translation of Languages from Left to Right, Information and Control 8: 607-639, 1965.
Kuroda SY: Classes of Languages and Linear-Bounded Automata, Information and Control 7: 207-223, 1964.
Levine JR, Mason T, Brown D: lex & yacc, 2nd ed. Sebastopol, CA: O'Reilly & Associates, 1992.
Lewis HR, Papadimitriou CH: Elements of the Theory of Computation, 2nd ed. Englewood Cliffs, NJ: Prentice Hall, 1998.
Lewis PM II, Stearns RE: Syntax-directed Transduction, Journal of the ACM 15: 465-488, 1968.
Linz P: An Introduction to Formal Languages and Automata, 3rd ed. Sudbury, MA: Jones and Bartlett, 2001.
McCulloch WS, Pitts W: A Logical Calculus of the Ideas Immanent in Nervous Activity, Bulletin of Mathematical Biophysics 5: 115-133, 1943.
Moore EF: Gedanken Experiments on Sequential Machines. In Shannon CE, McCarthy J (eds), Automata Studies. Princeton, NJ: Princeton University Press, 1956, pp. 129-153.
Myhill J: Finite Automata and the Representation of Events, WADD TR-57-624, Wright Patterson Air Force Base, OH, 1957, pp. 112-137.
Nerode A: Linear Automaton Transformations, Proceedings of the American Mathematical Society 9: 541-544, 1958.
Oettinger AG: Automatic Syntactic Analysis and the Pushdown Store, Proceedings of Symposia in Applied Mathematics 12, Providence, RI: American Mathematical Society, 1961, pp. 104-109.
Ogden W: A Helpful Result for Proving Inherent Ambiguity, Mathematical Systems Theory 2: 191-194, 1968.
Papadimitriou CH: Computational Complexity. Reading, MA: Addison-Wesley, 1994.
Paulos JA: Once upon a Number: The Hidden Logic of Stories. New York: Basic Books, 1999.
Post EL: A Variant of a Recursively Unsolvable Problem, Bulletin of the American Mathematical Society 52: 264-268, 1946.
Rabin MO: Real-Time Computation, Israel Journal of Mathematics 1: 203-211, 1963.
Rabin MO, Scott D: Finite Automata and their Decision Problems, IBM Journal of Research and Development 3: 115-125, 1959.
Rogers H Jr: Theory of Recursive Functions and Effective Computability. New York: McGraw-Hill, 1967.
Rosenkrantz DJ, Stearns RE: Properties of Deterministic Top-down Grammars, Information and Control 17: 226-256, 1970.
Salomaa A: Jewels of Formal Language Theory. Rockville, MD: Computer Science Press, 1981.
Savitch WJ: Relationships between Nondeterministic and Deterministic Tape Complexities, Journal of Computer and System Sciences 4(2): 177-192, 1970.
Schützenberger MP: On Context-free Languages and Pushdown Automata, Information and Control 6: 246-264, 1963.
Sipser M: Introduction to the Theory of Computation. Boston, MA: PWS, 1997.
Sudkamp TA: Languages and Machines: An Introduction to the Theory of Computer Science, 2nd ed. Reading, MA: Addison-Wesley, 1996.
Szelepcsényi R: The Method of Forcing for Nondeterministic Automata, Bulletin of the EATCS 33: 96-100, 1987.
Turing AM: On Computable Numbers with an Application to the Entscheidungsproblem, Proceedings of the London Mathematical Society 2: 230-265, 1936.
van Leeuwen J (ed): Handbook of Theoretical Computer Science (Volume A, Algorithms and Complexity). Amsterdam: MIT Press/Elsevier, 1990.
Younger DH: Recognition and Parsing of Context-free Languages in Time n³, Information and Control 10(2): 189-208, 1967.
INDEX OF NOTATION

Σ (an alphabet)  29
Λ (the null string)  29
|x|  29
xy, xyz  30
L₁L₂  30
L*  31
|L|  38
n!  47, 58
⋃ᵢ₌₁ⁿ Aᵢ  60
pal  63
AE  63
rev, xʳ  71
GAE  75
+ (in a regular expression)  85
(r₁ + r₂), (r₁r₂), (r*)  86
FA  95
M = (Q, Σ, q₀, A, δ) (an FA)  95
δ* (in an FA)  98
L(M) (for an FA)  99
L/x  105
Lₙ  107
sh(r)  114
NFA  125
M = (Q, Σ, q₀, A, δ) (an NFA)  125
δ* (in an NFA)  126
L(M) (for an NFA)  127
NFA-Λ  135
M = (Q, Σ, q₀, A, δ) (an NFA-Λ)  135
δ* (in an NFA-Λ)  136
Λ(S)  137
L(M) (for an NFA-Λ)  137
M_u = (Q_u, Σ, q_u, A_u, δ_u)  146
M_c = (Q_c, Σ, q_c, A_c, δ_c)  147
M_k = (Q_k, Σ, q_k, A_k, δ_k)  148
L(p, q)  152
L(p, q, j)  152
r(p, q, j)  154
I_L  168
C(n, i)  183
lengths(L)  184
L₁/L₂  198
→, ⇒ (in a context-free grammar)  203
S → α | β  204
CFG  206
G = (V, Σ, S, P) (a CFG)  206
⇒_G, ⇒, ⇒*  206
CFL  207
L(G) (for a CFG)  207
G_U = (V_U, Σ, S_U, P_U)  212
G_C = (V_C, Σ, S_C, P_C)  213
G*  213
CNF (Chomsky normal form)  237
BNF  241
{ } (in a BNF rule)  241
(p, α) (PDA move)  253
PDA  255
M = (Q, Σ, Γ, q₀, Z₀, A, δ) (a PDA)  255
(q, x, α) (configuration in a PDA)  256
⊢, ⊢* (in a PDA)  256
L(M) (for a PDA)  256
DPDA  260
DCFL  260
[p, A, q]  275
$  281
LL(k)  285
T = (Q, Σ, Γ, q₀, δ) (a TM)  321
Δ (the blank symbol)  321
{R, L, S}  321
(q, xay) (a TM configuration)  321
⊢, ⊢* (in a TM)  322
L(T)  322
χ_L  331
T₁T₂, T₁ → T₂  333
Delete  336
Copy, Reverse, Equal  336
(Γ ∪ {Δ})ⁿ  338
{R, L, S}ⁿ  338
(q, x₁a₁y₁, ..., xₙaₙyₙ)  338
NTM  341
⊢, ⊢* (in an NTM)  342
L(T) (for an NTM)  342
T_u (a universal TM)  348
e(T), e(m), e(z)  349
Insert(a)  356
α → β (in an unrestricted grammar)  372
G = (V, Σ, S, P) (an unrestricted grammar)  372
⇒ (in an unrestricted grammar)  372
L(G) (for an unrestricted grammar)  372
CSG, CSL  381
LBA  382
CS, R, RE  386
[0, 1)  394
NSA, SA  408
Self-Accepting  409
E(P), Y(P), N(P)  409
≤ (with decision problems)  412
≤ (with languages)  412
Accepts  413
Acc  413
Halts  414
H  414
Accepts-Λ  417
Write(x)  417
AcceptsSomething  418
AcceptsEverything  418
Subset  418
WritesSymbol  418
AcceptsNothing  419
AcceptsTwo  421
AcceptsFinite  421
AcceptsRegular  421
AcceptsRecursive  421
Equivalent  421
Accepts-L₂  421
PCP  422
(α, β) (in a correspondence system)  422
MPCP  423
xqy (a TM configuration)  425
CFGNonemptyIntersection  431
IsAmbiguous  431
z₀#z₁#z₂# ... #zₙ₋₁#zₙ#  432
CFGGeneratesAll  436
WritesNonblank  438
CSLIsEmpty  440
Mult  446
Pred, Sub  448
LT, EQ, GT, LE, GE, NE  452
Sg  452
Div, Mod  454
R(x, y)  454
Q(x, y)  455
Sq, PerfectSquare  455
PrNo  458
Prime  459
μy[P(x, y)]  460
Exponent  463
Result_T  465
InitConfig, IsConfig_T  465
HighestPrime  467
Move_T  468
Trace_T  469
Accepting_T  469
MovesToAccept  470
f(n) = O(g(n)), f = O(g)  482
f = Θ(g)  482
f(n) = o(g(n)), f = o(g)  483
t_T, s_T (for a TM)  487
t_T, s_T (for an NTM)  490
Time(f), Space(f), NTime(f), NSpace(f)  493
P, PSpace, NP, NPSpace  500
CNF (conjunctive normal form)  502
CNF-Sat  502
CNF-Satisfiable  503
≤_p  506
a_{Φ,j}  511
3-Sat  517
3-Satisfiable  517
IS, VC, CSG  524
⊕  524
Subexp  525
INDEX

Approximately proportional, 482


A-Derivable, 236 Arithmetic progression, 200
Absorptive laws, 5 Associative laws, 6
Abstract machine, 32, 94 Axioms, 43
Acceptance
by a PDA, 256
by a TM, 322, 365
by an FA, 99 Backus-Naur form, 241
by an NFA, 127 Balanced strings of parentheses, 230
by an NFA-Λ, 137 Basis step, 51
by an NTM, 342 Beigel, R., 314
by empty stack, 257, 273 Biconditional, 12
by final state, 257 Bigger than, 388
Accepting a language by a TM, 365 Big-oh, 483
Accepting configuration Big-theta, 483
of a PDA, 257 Bijection, 19, 388
of a TM, 322 Bin, 26
Accepting states Binary operation, 22
of an FA, 94 Binary tree, 298
of an NFA, 125 height of, 298
Accepts, 413 leaf nodes in, 298
Accepts-Λ, 417
path in, 298
Accepts-L2, 421
Blank symbol, 321
BNF. See Backus-Naur form
AcceptsEverything, 418
Boas, R.P., 95
AcceptsFinite, 421
AcceptsNothing, 419 Bottom-up parsing, 287
AcceptsRecursive, 421
Bottom-up PDA corresponding to a CFG, 270
Bound variable, 15
AcceptsRegular, 421
Bounded existential quantification, 456
AcceptsSomething, 418
Bounded maximalization, 476
AcceptsTwo, 421
Bounded minimalization, 458
Activation records, 496
Bounded sums and products, 457
Add (addition function), primitive recursive derivation of, 446
Bounded universal quantification, 456
Addition, 446
Branch point, 305
Algebraic expressions, 174, 207
Breadth-first traversal, 130, 343
an unambiguous CFG for, 226
Busy beaver function, 442
as an example of a nonregular language, 174
fully parenthesized, 63
Algorithm for recognizing a language, 93
Algorithmic procedure, 353 C identifiers, 89
Alphabet, 29 C programming language, 114
Ambiguity C programs, 303
in a CFG, 223 C-computable, 473
in English, 223 Canonical order, 344
Ambiguous CFG, 223 Cantor, G., 394
Ancestor, 403 Cartesian product, 9


Cases, definition by, 453 Computation tree


Cases in a proof, 47 for a PDA, 259
CFG. See Context-free grammar for an NFA, 129
CFGGeneratesAll, 436 for an NTM, 342
CFGNonemptyIntersection, 431 Computers and FAs, 190
CFL. See Context-free languages Computing a function with a TM, 328
Characteristic function Concatenation
of a predicate, 451 of CFLs, 213
of a set or language, 331 of languages, 30
Chomsky, N., 385 of regular languages, an NFA-Λ to accept, 147
Chomsky hierarchy, 385 of strings, 30
Chomsky normal form, 237 unit of, 30
Transforming a CFG to, 239 Conditional, 11
Church, A., 353
Conditional composition of TMs, 333
Church’s Thesis. See Church-Turing thesis Conditional go-to statement, 473

Church-Turing thesis, 353 Configuration


of a PDA, 256
informal evidence for, 353
Closed under an operation, 22
of a TM, 321
Configuration number, 465
CNF. See Chomsky normal form; Conjunctive normal form
Congruence mod n, 24
CNF-Sat, 502
Conjunct, 502
CNF-Satisfiability problem, 502
Conjunction, 10
NP-completeness of, 512
Conjunctive normal form, 502
CNF-Satisfiable, 503
Constant functions, 443
CNF-Tautology, 523
Constructive definition, 443
Cocke-Younger-Kasami algorithm, 312
Constructive proof, 44
Codomain, 17
Context, 206, 209
Coloring the vertices of a graph, 519
Context-free grammar, 206
Combining TMs, 332
corresponding to a given PDA, 273
Commutative laws, 5 equivalent to a regular expression, 214
Complement language generated by a, 207
of a CFL, 307 Context-free languages, 207
of a CSL, 387 concatenation and * of, 213
of a DCFL, 310 deterministic. See Deterministic CFL
of a graph, 520 intersections and complements of, 307
of a language, 29 intersections with regular languages, 308
of a recursively enumerable language, 368 union of, 212
of a regular language, 110 Context-sensitive grammar, 381
of a set, 4 corresponding to a given LBA, 383
Complementary problem, 410 Context-sensitive language, 381
Complete graph, 508 complement of, 387
Complete induction, 57 recursive language that is not a, 440
Complete pairing, 250 recursiveness of, 385
Complete subgraph problem, 508 Contradiction, 13
NP-completeness of, 517 proof by, 45
Complexity classes, 493 Contrapositive, 14
Composite decision problem, 504 proof by, 44
Composite TM, 332 Converse, 12
Composition, 20, 444 Cook, S., 510
Compound statement, 208 Cook’s theorem, 512
Computable, 329 Copying a string, 335
by a grammar, 473 Correctness of programs, 54
Computable functions, μ-recursiveness of, 465 Correspondence system, 422
Computation of a TM, 432 Countable set, 389

Countable union of countable sets, 391 Dowling, W., 401


Countably infinite, 389 DPDA. See Deterministic PDA
Counter automaton, 292 DSpace(f), 498
Counting the elements of a set, 388 Dummy variables, 447
Course-of-values induction, 57
Course-of-values recursion, 463
Crash of a TM, 320
CSG. See Context-sensitive grammar Earley, J., 312
CSL. See Context-sensitive language Effectively computable, 474
CSLIsEmpty, 440 egrep, 190
Empty language, 86
D Empty set, 5
Empty stack, acceptance by, 257
Dangling else, 224
Encoding
DCFL. See Deterministic CFL
of a problem instance, 337, 341
Decidable problem, 410
of a string, 80
Deciding a language by a TM, 365
Decision algorithm, 186 of a TM, 348
of aTM move, 349
Decision problems, 186
instances of, 409 reasonable, 407, 416, 420
involving CFLs, 311 Encoding function, 349
involving regular languages, 186 English, grammar rules for, 209
unsolvable, 410 Enumerating a language, 368
Deleting a symbol, 335 in canonical order, 371
De Morgan laws, 6 Equivalence class, 26
Depth-first traversal, 130 of I, as a state in an FA, 169
Derivation Equivalence relation, 24
leftmost, 222 Equivalent, 421
of a string in a CFG, 204, 220 Erasing a TM tape, 356
simulation by a PDA, 266 Eventually periodic function, 200
of a string in an unrestricted grammar, 372 Exact cover problem, 525
simulation by a TM, 375 Existential quantifier, 15
Derivation tree, 220 Exponential function, growth rate of, 481
Deterministic, 124 Expression graph, 167
Deterministic CFL, 260 Expression tree, 222
complement of a, 310 Extended transition function (δ*)
Deterministic PDA, 260 of an FA, 98
Diagonal argument, 394, 395 of an NFA, 126
Difference nonrecursive definition of, 126
of regular languages, 109 recursive definition of, 127
of sets, 5 of an NFA-Λ, 136
Direct proof, 43 nonrecursive definition of, 136
Directed graph, 496 recursive definition of, 137
Disjoint, 5
Disjunction, 10 F
Disjunctive normal form, 511
FA. See Finite automaton
Distinct, 5
Distinguishable with respect to L, 105 Factorial function, 58, 444
Distinguished positions in a string, 304 Factoring in a CFG, 284
Distinguishing between two strings, 90 Factors in an algebraic expression, 227
Distributive laws, 5 False, 13
Div, 454 Fermat’s last theorem, 436
DNF-Satisfiability, 523 Fibonacci function, 59
Domain, 10 Final states in a TM, 320
of a function, 17 FindNull algorithm, 234
Dominoes, 422 Finer (with regard to partitions of a set), 42, 175

Finite automaton, 95 Hennie, F., 451


corresponding to a given NFA, 130 Hierarchy theorems, 495
language accepted by, 99 Homomorphism, 166
string accepted by, 99
Finite subsets of the natural numbers, 64
Finite-state machine, 95
Idempotent laws, 5
First in, first out, 352
Identifier in C, 89
Floyd, R., 314
If and only if, 12
For statements, 208
If statements, 208
Free variable, 10
Immerman, N., 387
Fully parenthesized algebraic expressions, 63
Indirect proof, 44
Function, 17
Indistinguishability relation, 109, 168
codomain of, 17
set of equivalence classes of, 169
domain of, 17
Indistinguishable with respect to L, 105
Fundamental theorem of arithmetic, 55
Induction. See Principle of mathematical induction
Induction hypothesis, 51
Induction on the length of the string, 53
Garey, M., 522 Induction step, 51
Generalized regular expression, 119 Inference, rules of, 43
Generally accepted facts, 43 Infinite decimal expansion, 394
Generating languages, 32 Infinite loop, 322, 415
Gédel, K., 461 Infinite sets, 388
Godel number, 462 characterization of, 389
of a sequence of natural numbers, 462 Infinity lemma, 397
of a string, 470 Inherently ambiguous, 226
Gédel numbering, 462 Initial configuration corresponding to input x, 322
of a TM, 465 Initial functions, 443
Gédel’s incompleteness theorem, 462 Initial state
Going through a state, 152 of a PDA, 255
Goldbach’s conjecture, 416
of a TM, 321
Grammar, 203
of an FA, 95
context-free. See Context-free grammar
Injection, 19
context-sensitive. See Context-sensitive grammar
Input alphabet of a TM, 321
linear, 220
Input symbols, 95
phrase-structure, 372
Input to a TM, 322
regular. See Regular grammar
Inputs to an FA, 95
unrestricted. See Unrestricted grammar
Inserting a symbol, 336
Grammar rules, 204, 206
Instance of a decision problem, 186
for English, 209
Intersection
Graph, 508
of a CFL and a regular language, 308
Graph of a partial function, 401
grep, 190
of CFLs, 307
Growth rates, 481 of regular languages, 109
Guessing in a PDA, 257 of sets, 4
Intractable problem, (PART INTRO)
Inverse function, 21
IsAmbiguous, 431
Halt, 321 Isomorphic, 121
Halting problem, 414 Isomorphism from one FA to another, 121
unsolvability of, 415
Halting states in a TM, 320
J
Halts, 414
Hardest problem in NP, 502 Johnson, D., 522

Logically equivalent, 13
k-colorability problem, 519 Logically implies, 13
NP-completeness of, 520 Lookahead, 281
k-coloring, 519 Loop in a path in an FA, 180
Kleene, S.C., 31 Loop invariant, 79
Kleene star, 31
Kleene’s Theorem
Part 1, 146 Many-one reducibility, 412
Part 2, 151 Mate of a left parenthesis, 230
Mathematical induction. See Principle of mathematical
induction
Λ, 29 Maximal pairwise inequivalent set, 42
Λ-closed, 165 Membership problem
Λ-closure, 136 for a CFL, 311
algorithm to calculate, 137 for a regular language L, 186
Λ-productions, 232 for regular languages, 186
algorithm to eliminate, 234 for the language Y(P), 410
Λ-transitions Membership table, 14
eliminating from an NFA-Λ, 140 Memory
in an NFA-Λ, 135 in an FA, 95
in a PDA, 253, 258 required to recognize a language, 90, 106
Language, 28 Migrating symbols, 374
accepted by a PDA, 256 Minimal counterexample principle, 54
accepted by a TM, 322 Minimalization, 457
accepted by an FA, 99 bounded, 458
accepted by an NFA, 127 unbounded, 460
context-free. See Context-free languages Minimization algorithm, 179
recognized by an FA, 99 Minimum-state FA for a regular language, 170
recursive, 365 Mod, 454
recursively enumerable, 365 Model of computation, 189, 352
regular. See Regular languages Modified correspondence system, 425
Last in, first out, 251 Modified Post’s Correspondence Problem, 423
LBA. See Linear-bounded automaton reduction to PCP, 424
Left inverse, 471 unsolvability of, 425
Left recursion, 283 Modula-2, 226
eliminating, 283 Modus ponens, 34
Leftmost derivation, 222 Monus operation, 448
simulated by a top-down PDA, 266 MPCP. See Modified Post’s Correspondence Problem
Leftmost square of a TM tape, 320 μ-recursive functions, 460
Length of a string, 29, 71 computability of, 461
lex, 190 Mult (multiplication function), primitive recursive
Lexical analysis, 189 derivation of, 446
LIFO. See Last in, first out Multiplication, 446
Linear grammar, 220 Multitape TM, 338
Linear speed-up, 489 configuration of, 338
Linear-bounded automaton, 382 simulation of by an ordinary TM, 339
corresponding to a given CSG, 382 Mutual recursion, 206
Listing the elements of a set, 390 Myhill-Nerode Theorem, 171
Literal, 89, 114, 502
Little-oh, 483 N
Live variable in a CFG, 246 Natural numbers, 4
LL(k) grammar, 285 Negating quantified statements, 17
Logical connective, 10 Negation, 10

NFA. See Nondeterministic finite automaton


NFA-Λ. See Nondeterministic finite automaton with Ogden's Lemma, 304
Λ-transitions One-to-one, 19
No-instances, 409 Only if, 11
Non-context-free languages, 297 Onto, 18
Non-context-sensitive language, 440 Operation on a set, 22
Non-instances, 409 Order of a regular language, 118
Non-recursively-enumerable languages, 396 Ordered n-tuples, 9
example of, 408 Ordered pairs, 9
Non-self-accepting, 408 Orders of infinity, 402
Nondeterminism Output of a TM, 320
eliminating from an NFA, 130
in a PDA, 253, 258
in a TM, 341
P, 500
in an FA, 125
P =NP problem, 502
Nondeterministic finite automaton, 125
significance of, 510
acceptance by, 127
Pairing function, 393
corresponding to a given NFA-A, 139
Pairwise disjoint, 5
transition function, 125
Pairwise distinguishable, 108
Nondeterministic finite automaton with Λ-transitions, 135
Pairwise inequivalent, 42
acceptance by, 127
Palindromes, 63
for a given regular expression, 148
a PDA accepting, 257
for the concatenation of two languages, 147
a TM to accept, 324
for the Kleene * of a language, 148
nonregularity of the language of, 108
for the three basic languages, 146
not accepted by any DPDA, 264
for the union of two languages, 146
Papadimitriou, C., 517
transition function, 125
Parallel computation, 490
Nondeterministic space complexity, 490
Parse tree, 220
Nondeterministic time complexity, 490
Parsing, 280
Nondeterministic TM, 341
bottom-up, 287
simulation by an ordinary TM, 342
top-down, 281
space complexity of a, 490
Partial function, 321, 328
time complexity of a, 490
computed by a TM, 328
Nonpalindromes, 205
Partial solution to a modified correspondence system, 425
Nonrecursive languages, 409
Partition, 26
Nonregular languages, 108
Partition problem, 525
Nonterminals, 206
Nontrivial property of recursively enumerable languages, 420 Path in a binary tree, 298
Paulos, J., 80
Normal forms for CFGs, 232
PCP. See Post’s Correspondence Problem
n-place predicate, 451
PDA. See Pushdown automaton
NP, 500
Peano axioms, 57
NP-complete, 509
Permuting the variables of a function, 447
NP-hard, 510
Phrase-structure grammar, 372
NPSpace, 500
NSA. See Non-self-accepting Pigeonhole principle, 77
Polynomial function, growth rate of, 485
NSpace(f), 493
Polynomial time, 500
NTime(f), 493
NTM. See Nondeterministic TM Polynomial-time reducible, 506
Null inputs to an NFA, 134 Post, E., 422
Null string, 29 Post machine, 363
Nullable variables in a CFG, 234 Post’s Correspondence Problem, 422
algorithm to find, 234 instance of, 422
Power set, 9
Numeric function, 329

Precedence of operations in a regular expression, 87 top-down, corresponding to a given CFG, 267


Precedence relation, 288 transition diagram for, 254
Predecessor function, 448
Predicate, 451
Q
characteristic function of, 451
Quantified statement, 15
n-place, 451
alternative notation for, 16
primitive recursive, 452
Queue, 352
relational, 452
Quotient of two languages, 198
Prefix, 30
Prefix property, 80
Preserved under bijection, 400
Primality problem, 503 Range of a function, 18
Prime, 16 Reachable states in an FA, 117, 176
Prime factorization, 55 Reachable variable in a CFG, 246
Primitive recursion, 445 Real literals in Pascal, 89
Primitive recursive derivation, 446 Reasonable encoding, 409
Primitive recursive functions, 445 Recognizing a language, 90
computable total functions that are not, 450 by a TM, 365
computability of, 449 Recognizing strings in a language, 31
Primitive recursive string functions, 472 Recursive definition
Principle of mathematical induction, 51 of a function, 58
Productions of a set, 62
in a CFG, 206 of the union of n sets, 60
in a CSG, 381
in a regular grammar, 219 Recursive functions, 460
in an unrestricted grammar, 372 Recursive language, 365
Programming languages, 189 complement of a, 367
syntax of, 208 Recursively enumerable languages, 365
Projection functions, 444 complements of, 368
Proof, 43 _ countability of the set of, 393
by cases, 47 that are not recursive, 409
by contradiction, 45 unions and intersections of, 367
by contrapositive, 44 unrestricted grammars generating, 377
by counterexample, 44 Reduce (in a PDA), 270
by induction, 51 Reducible, 412
constructive, 44 Reducing one language to another, 412
direct, 43 Reducing one problem to another, 412
indirect, 44 Reduction (in a PDA), 270
Proper subset, 38 Reflexive, 24
Proper subtraction, 448 Regular expression, 85, 86
Proposition, 9 corresponding to a regular language, 86
Propositional variables, 12 for a given FA, 154
PSpace, 500 Regular grammar, 219
Pumping Lemma Regular languages, 85, 86
for context-free languages, 300 characterization of, 171
for regular languages, 181 regular expressions corresponding to, 86
weaker forms of, 184 unions, intersections, and differences of, 109
Pushdown automaton, 255 Reject state of a TM, 321
acceptance by, 256 Rejection by an FA, 99
bottom-up, corresponding to a given CFG, 270 Relation
configuration of, 256 from A to B, 23
deterministic, 260 on a set, 23
generalizations of, 352 Repeating a variable, 447

Reverse function, 71 Subset, 4


Reversing a string, 329 Subset, 418
Rice’s theorem, 420 Subset construction, 131
Right invariant, 170 Substituting constants for variables, 447
Rightmost derivation, 223 Substring, 30
simulated by a bottom-up PDA, 270 Successor function, 444
Rightmost nonblank symbol, 336 Suffix, 30
Sum-of-subsets problem, 525
Surjection, 18
SA. See Self-accepting Symmetric, 24
Savitch’s theorem, 496 Symmetric closure, 42
Schröder-Bernstein theorem, 403 Symmetric difference, 7
Scope of a quantifier, 15 Syntax diagram, 208
Self-accepting, 408 Szelepcsényi, R., 387
Self-accepting, 409
Self-embedded variable, 371
Self-embedding, 244 Tape alphabet of a TM, 321
Set identities, 5 Tape contents, 321
Sets, 3 Tape head, 320
comparing the size of, 387 Tape in a TM, 320
identities involving, 5 Tape number, 465
Shift, 270 Tautology, 12
Simulating a multitape TM, 339 Terminal symbols, 206
Sizes of infinite sets, 389 Terms in an algebraic expression, 227
Smallest subset, 41 Testing a possible solution, 492
Smallest transitive relation containing R, 70 Time complexity
Solvable problem, 410 of a TM, 487
Space complexity of an NTM, 490
of a TM, 487 Time(f), 493
of an NTM, 490 TM. See Turing machine
Space(f), 493 Tokens in a programming language, 189
Stack, 251, 262 Top-down parsing, 281
of a PDA, 255 Top-down PDA corresponding to a CFG, 267
Stack height, 264 Total function, 328
Stack-emptying state, 273 Tracks of a TM tape, 339
Stamm-Willbrandt, H., 120 Tractable problems, 500
Star height, 114 Transition diagram
Start symbol for a PDA, 254
in a CFG, 206
in the stack of a PDA, 253 for an FA, 96
States, 94 for an NFA, 128
of a Turing machine, 320 Transition function
of an NFA, 130 of a PDA, 255
States of mind, 319 of a TM, 321
Step-counting function, 493 of an FA, 95
Stored-program computer, 348 of an NFA, 125
Strings, 29 Transition table
concatenation of, 30 for an FA, 96
length of, 29 for an NFA, 127
Strong principle of mathematical induction, 56 Transitive, 24
Structural induction, 68 Transitive closure, 42, 71
Subexponential function, 525 Tree traversal algorithm, 130
Subgraph, 508 Trivial property of recursively enumerable languages, 422

True, 13 Unit of concatenation


Truth table, 10 for languages, 30
Truth value, 9 for strings, 30
Turing, Alan, 189, 319 Unit of multiplication, 30
Turing-acceptable, 365 Unit productions, 232
Turing-computable, 329 algorithm to eliminate, 237
Turing-decidable, 365 Universal quantifier, 15
Turing machines, 189, 319, 321 Universal set, 4
acceptance by, 322 Universal TM, 348
composite, 332 input to a, 348
computing a partial function, 328 Unix operating system, 190
configuration of, 321 Unrestricted grammar, 372
corresponding to a given unrestricted grammar, 375 generating a given recursively enumerable language, 377
deciding a language, 365 simulating a derivation in, 375
encoding of a, 348 TM corresponding to a, 375
enumerating a language, 368 Unsolvable problems, 413
initial configuration corresponding to input x, 322 involving CFLs, 430
input to a, 322 Useful variable in a CFG, 246
language accepted by, 322 Useless variable in a CFG, 246
move, 320
n-tape, 338 V
nondeterministic, 341 Valid computations of a TM, 432
recognizing a language, 365 the complement of the set of, 435
rejecting a string, 322 the set of, 434
simulating a multitape TM, 339 Variables in a CFG, 206
space complexity of a, 487 Venn diagram, 6
time complexity of a, 487 Vertex cover of a graph, 519
transition diagram for, 323 Vertex cover problem, 519
universal, 348 NP-completeness of, 520
variations of, 337 Virus tester, 401
Two stacks, 352
Two-stack automaton, 363 W
Type i language, 385
Weak precedence grammar, 289
Well-ordering principle, 57
Within parentheses, 230
Unambiguous CFG for algebraic expressions, 226 Word, 29
Unary operation, 22 WritesNonblank, 438
Unary representation, 329 WritesSymbol, 418
Unbounded minimalization, 460
Uncomputable functions, 442 Y
Uncountable, 389 yacc, 190
Union Yes-instances, 409
of CFLs, 212
of recursively enumerable languages, 367 Z
of regular languages, 109
Zero-one knapsack problem, 525
of sets, 4