Minds, Brains, Computers
An Historical Introduction to the
Foundations of Cognitive Science
ROBERT M. HARNISH
For Csilla
Minds, Brains, Computers
Robert M. Harnish
BLACKWELL
Publishers
Copyright © Robert M. Harnish 2002
The right of Robert M. Harnish to be identified as author of this work has been
asserted in accordance with the Copyright, Designs and Patents Act 1988.
2 4 6 8 10 9 7 5 3 1
All rights reserved. Except for the quotation of short passages for the purposes of
criticism and review, no part of this publication may be reproduced, stored in a
retrieval system, or transmitted, in any form or by any means, electronic, mechanical,
photocopying, recording or otherwise, without the prior permission of the publisher.
Except in the United States of America, this book is sold subject to the condition
that it shall not, by way of trade or otherwise, be lent, resold, hired out, or otherwise
circulated without the publisher’s prior consent in any form of binding or cover
other than that in which it is published and without a similar condition including
this condition being imposed on the subsequent purchaser.
Harnish, Robert M.
Minds, brains, computers: an historical introduction to the foundations of cognitive
science/Robert M. Harnish.
p. cm.
Includes bibliographical references and index.
ISBN 0-631-21259-0 (alk. paper)—ISBN 0-631-21260-4 (pbk.: alk. paper)
1. Cognitive science—History. I. Title.
BF311.H339 2001
153'.09—dc21
00-052929
British Library Cataloguing in Publication Data
A CIP catalogue record for this book is available from the British Library.
Typeset in 10.5 on 12.5 pt Ehrhardt
by Best-set Typesetter Ltd, Hong Kong
Printed in Great Britain by T.J. International, Padstow, Cornwall
List of Figures x
Preface xv
Acknowledgments xvii
Introduction 15
1 Associationism 16
2.1 Introduction 37
2.2 The Rise of Behaviorism and Stimulus-Response Psychology 37
3 Biological Background 55
3.1 Introduction 55
3.2 Brain Ventricles vs. Brain Substance 55
3.3 Cortical Localization vs. Holism 60
3.4 Nerve Net Theory vs. the Neuron Doctrine 66
3.5 The First Half of the Twentieth Century 71
Study Questions 76
Suggested Reading 77
4 Neuro-Logical Background 79
4.1 Introduction 79
4.2 Neural Networks and the Logic of Propositions 80
4.3 Perceptrons 85
4.4 Linear Separability and XOR: McCulloch and Pitts Nets
and Perceptrons 90
4.5 Simple Detector Semantics 95
Study Questions 100
Suggested Reading 102
Introduction 105
6 Architecture(s) 124
7 Representation(s) 153
Introduction 275
C.1 Introduction 394
C.2 Functional View of Computers 395
C.3 Levels of Description View of Computers 398
C.4 Combined Functional-Descriptive View of Computers 403
C.5 Levels of Computation: Stabler 404
C.6 Digital and Connectionist Computers 407
C.7 Is Everything a Computer? 410
Study Questions 411
Suggested Reading 412
Bibliography 413
Index 434
List of Figures
4.3 An OR net 82
4.4 An AND net 82
4.5 M&P neurons 83
4.6 Organization of the original perceptron 87
4.7 Organization of a three-layer (by units) perceptron 88
4.8 A Rosenblatt neuron 88
4.9 A simple M&P OR neuron 91
4.10 Cartesian coordinates for OR 92
4.11 OR is linearly separable 92
4.12 Truth tables for XOR and its negation 92
4.13 XOR is not linearly separable 93
4.14 A “perceptron” that computes XOR 94
4.15 The depth problem and the spread problem 98
Introduction: what is cognitive science describes the broad and narrow conceptions
of the discipline, then offers a combined view.
Part I: historical background traces some of the contributions of philosophy,
psychology, and neuroscience to what would become cognitive science, up to
the point of the development of the digital computer about 1950.
Part II: the digital computational theory of mind looks at a particular
artificial intelligence demonstration project SHRDLU, then surveys some
digital architectures, and some standard knowledge representation formats.
With this computational material in place we look at the theory of mind
these developments suggested as well as some of the most important problems
with it.
Part III: the connectionist computational theory of mind follows the same
pattern for connectionist machines. It looks at a pair of connectionist computational
demonstration projects, Jets and Sharks and NETtalk. Then we survey
This book grew out of lecture notes as the result of many years of teaching,
and talking, cognitive science. The hundreds of students who endured earlier
incarnations of this material must be the first to be thanked. Without their
patience and good sense not only wouldn’t this work be as it is, it wouldn’t be.
I was also aided in these courses by a string of excellent teaching and research
assistants: Peter Graham, Laleh Quinn, Bongrae Seok, Jack Lyons, Scott
Hendricks, and Brad Thompson, who compiled the index. Thanks are due to
teachers who introduced me to cognitive science, though it was not called that
then, and subsequently influenced my views (in chronological order): Bert
Dreyfus and John Searle at Berkeley, Jerry Fodor and Ned Block (then) at MIT.
Integrating what I learned from them has proven to be a challenge. Various
colleagues here at the University of Arizona and elsewhere read portions of
this work and provided useful feedback including (alphabetically): Kent Bach,
Tom Bever, Ned Block, Andras Bocz, Dick Carter, Dave Chalmers, Ken
Forster, Merrill Garrett, Alvin Goldman, Bill Ittelson, Tom Larson, Shaughan
Lavine, Chris Maloney, Lynn Nadel, and Rich Zemel. Jason Barker helped
with some of the figures, and Agnes Pasztor secured the rights to the cover
print. The University of Arizona has proven to be extremely hospitable for the
pursuit of interdisciplinary work such as this, and I am grateful to Merrill
Garrett, director of cognitive science, for providing such an environment. I
want to thank Csaba Pleh, for an invitation to present this material at the University
of Budapest (ELTE), and the Rockefeller Foundation, for a month at
the Bellagio Study and Conference Center, which provided just the right
opportunity to give the manuscript a rewriting. Finally, my wife Csilla Pasztor
is beyond thanking - I dedicate this work to her.
A note regarding quotations: all are from the items mentioned in the bibliography
and suggested readings, but in the interest of readability, only some
(amusing, contentious, important, unbelievable) have been attributed.
How are these and neighboring disciplines related? One popular proposal is
the “cognitive hexagon” of the 1978 Sloan State of the Art Report (see the
Appendix to this chapter). On a second approximation, then, the broad construal
of cognitive science is that it is the scientific study of cognition as carried
out in accordance with the methodologies of these six disciplines. Of course, mutual
admiration does not always reign, and cooperation can break down, as in
Dennett’s delightful parody:
Why . . . , ask the people in Artificial Intelligence, do you waste your time
conferring with neuroscientists? They wave their hands about “information processing”
and worry about where it happens, and which neurotransmitters are
involved, and all those boring facts, but they haven’t a clue about the computational
requirements of higher cognitive functions. Why, ask the neuroscientists,
do you waste your time on the fantasies of Artificial Intelligence? They just
invent whatever machinery they want, and say unpardonably ignorant things
about the brain. The cognitive psychologists, meanwhile, are accused of concocting
models with neither biological plausibility nor proven computational
powers; the anthropologists wouldn’t know a model if they saw one, and philosophers,
as we all know, just take in each other’s laundry, warning about confusions
they themselves have created, in an arena bereft of both data and empirically
testable theories. (1995: 254-5).
Construed narrowly, cognitive science is not an area but a doctrine, and the doctrine
is basically that of the computational theory of mind (CTM) - the
mind/brain is a type of computer. Consider the 1978 Sloan Report on Cognitive
Science: “what the subdisciplines of cognitive science share ... is a
common research objective: to discover the representational and computational
capacities of the mind and their structural and functional representation in the
brain" (1978: 76). Some authors endorse the narrow as opposed to the broad
conception explicitly:
There certainly is some justice to this remark. Not every collection of research
programs forms a field. But is the underscored reason for denying that cognitive
science would form a field without the ideology compelling? Should we
agree? Maybe, maybe not. Remember, the alternative to doctrine was a subject
matter, not a “research program.” Furthermore, there is no single research
program that makes up the “field” of anthropology, or history, or philosophy,
or even linguistics - should we demand one of cognitive science?
Although the disciplines under the broad construal give us the relevant notion
of “science” (“science” is what these disciplines do), we still do not know
which portions of our mental life are cognitive. What is it for mental phenomena
to be “cognitive”? Like many other important concepts, “cognition” seems
to have clear instances, but it lacks a general definition. One way of specifying
what is cognitive is by analogy to the broad conception of cognitive science —
we just list the clear cases of cognitive phenomena and declare cognition to be
Computation broadly construed is just what computers do. This of course puts
all the weight of specifying the nature of computation on saying what computers
do (they do lots of things including give off heat), not to mention saying
what a computer is (is your digital watch a computer?). A popular idea is
that computers basically input, store and output “information” - they are
“information-processing” devices. But what is information processing and
what do such devices have in common (even if it is only a family resemblance)?
An influential narrow conception of computers is Marr’s “tri-level” hypothesis:
information-processing devices are algorithmic symbol manipulators, which
are describable at three importantly different levels, answering three importantly
different questions: what problem(s) does the system solve, what
algorithm(s) does it use, and how are these algorithms implemented in the
physical world (silicon, neural tissue, other)?
We see here in item 2 the narrow conception of cognitive science as the discipline
which is exploring a computational conception of cognition, and in item
4 the broad conception of cognitive science as the interdisciplinary study of
cognition. The relationship between the broad and the narrow construals of
cognitive science, which we will call the “working view of cognitive science,”
assumed by this book, is this: the domain of cognitive science is cognition, the
methodologies of cognitive science are the methodologies of the participating
disciplines (see the cognitive hexagon again), and the central assumption of the
field is that mental states and processes are computational. In this way we give
prominence to the computational notion while ensuring that if it turns out to
be false, cognitive science will not self-destruct.
Appendix
1978 Sloan Report (excerpt)
Study questions
How do the narrow conceptions of cognition and computation fit the narrow
conception of cognitive science?
Suggested reading
See for instance Pylyshyn (1983) and the commentaries, especially Newell (1983). And
for the nature of cognitive scientists, see the enlightening (and entertaining) Baumgartner
and Payr (1995).
There are numerous good introductions to cognitive science available based on the two
conceptions of cognitive science reviewed above. And just browsing through Wilson
and Keil (1999) can be a wonderful introduction for the curious. Dunlop and Fetzer
(1993) provide a useful, compact volume defining many of the most important notions
in cognitive science.
Broad conception
The first text in cognitive science to survey the various disciplines and their contributions
to cognitive science is Stillings et al. (1995). It is clearly written and multi-
authored, but it has the advantage of reading like a single-authored text. The multiple
volume set by Osherson et al. (1990, 1995) has individually written chapters on a variety
of topics in cognitive science by well-known authorities in the field.
Narrow conception
One of the first texts written based on the notion of the mind as a computational device
is Johnson-Laird (1988). A more recent text based on the same idea, but covering
material more similar to ours is von Eckhardt (1993). It contains an extensive discussion
of the nature of cognitive science in chapters 1 and 2 (as well as the notion of its
“working assumptions”). Likewise, Crane (1995) covers much of our non-historical
material in a very readable and lively manner - the nature of representation is the
focus of the book. As it is in Thagard (1996), who organizes his introduction around
various representational formats used in cognitive science. Dawson (1998) is
organized around Marr’s “tri-level hypothesis” (see text above).
The best-known history of cognitive science to date is the very readable Gardner
(1985). Flanagan (1991) also contains interesting historical chapters on some of the
people (such as William James) and movements (such as behaviorism) which we discuss,
and some (such as Freud, Gestalt theory) which we do not. The first part of Bara (1995)
also reviews some cognitive science history, as does part I of Bechtel and Graham
(1998), which is the best short history.
Anthologies
Related disciplines
Churchland (1986) for a survey of neuroscience and its connection to philosophy, and
Churchland and Sejnowski (1992) for an attempt to integrate neuroscience and cognitive
science. Gazzaniga (1995) is a monumental compendium of topics, issues, and
areas in cognitive neuroscience. Squire and Kosslyn (1998) is a shorter selection of
recent articles on major subareas of cognitive neuroscience. For the cognitive side of
anthropology one might look at D’Andrade (1989). Glimpses of the scope of current
cognitive science can be had by looking at a recent Proceedings of the Annual Conference
of the Cognitive Science Society.
Part I
Historical Background
Introduction
The computational theory of mind (CTM) emerged first in its “digital” form,
then in its “connectionist” form. The purpose of Part I is to survey some of
the historical antecedents to this emergence. Not all we will discuss contributed
directly (or even indirectly) to the CTM, but it is all relevant to cognitive
science, broadly construed. From these larger pictures we will try to extract a
sketch of what these fields contributed directly to the CTM. We will see that
the influences contributing to the shaping of cognitive science, and especially
the computational theory of mind, include the following. The connectionist
form of the CTM is a descendant of both perceptrons (chapter 4) and associationism
(chapter 1). After introducing connectionism we will return to associationism
(chapter 12). From associationism and James we have the idea of (i)
the conscious mind as an introspectable manipulator (association) of representations
(ideas), and (ii) two levels of explanation: the introspective subjective
psychological (software) and the objective neurological (hardware). From
behaviorism we moved away from just introspection and into laboratory experimentation
as practiced by current cognitive psychology. Information-processing
psychology gave us cognition as information processing and, more
specifically in Miller, Galanter, and Pribram, TOTE units for explaining
behavior that are structured and function like computer programs. From biology
we got most importantly the neuron doctrine: the idea that the nervous system,
and especially the brain, is composed of networks of discrete units
(neurons, axons, dendrites) joined together at synapses. The neuro-logical tradition,
and especially McCulloch and Pitts, argued that the brain is composed
of on/off units and circuits of such units that can be associated with the propositions
of logic - the brain is equivalent to a machine table of a Turing machine,
and if supplemented with unlimited memory is equivalent to a universal
Turing machine. Perceptrons demonstrated that computer hardware organized
on the gross anatomy of the brain could be trained to discriminate certain
categories of things in a broadly human way.
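To preview the neuro-logical idea taken up in chapter 4: a McCulloch and Pitts unit is an on/off device that fires just in case the weighted sum of its binary inputs reaches a threshold, and with suitable weights and thresholds a single unit behaves like a simple logical connective such as AND or OR. The following is only an illustrative sketch; the particular weights and thresholds are chosen here for convenience, not taken from McCulloch and Pitts.

def mp_unit(inputs, weights, threshold):
    # A McCulloch-Pitts style unit: fires (1) just in case the weighted
    # sum of its binary inputs reaches the threshold; otherwise stays off (0).
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

def AND(a, b):
    return mp_unit([a, b], [1, 1], threshold=2)

def OR(a, b):
    return mp_unit([a, b], [1, 1], threshold=1)

# The two units realize the corresponding truth tables.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND(a, b), OR(a, b))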
1
Associationism
Associationism is the view that the mind is organized, at least in part, by principles
of association. Associationists don’t say just what makes a principle
“associationist.” Rather, they are content to state specific principles and call
them “associationist” (the word gained currency with Locke, see below). But
the basic idea behind associationism seems to be this: items that “go together” in
experience will subsequently “go together" in thought. Typically, associationists are
empiricists — they hold that all knowledge comes from experience both in the
sense of being causally dependent on experience and in the sense of being jus¬
tified solely by reference to experience. However, this is about where agreement
ends, and each particular empiricist holds a doctrine slightly different
from the others.
These English psychologists - what do they really want? One always discovers
them . . . seeking the truly effective and directing agent . . . in just that
place where the intellectual pride of man would least desire to find it (in the
vis inertiae of habit, for example, or in forgetfulness, or in a blind and chance
mechanistic hooking together of ideas, or in something purely passive, automatic,
reflexive, molecular and thoroughly stupid) - what is it really that
always drives these psychologists in just this direction? Is it a secret, malicious,
vulgar, perhaps self-destructing instinct for belittling man?
(Nietzsche, 1887)
Basic tenets
Figure 1.2 Principles of association (from Marx and Hillix, 1963: 106, figure 8; reproduced
by permission of McGraw-Hill Companies)
Associationist processes
For associationists, there are also three major processes of association. One
kind of process involves which items follow one another in time, such as recalling
something from memory or the temporal order of thoughts. Another kind
involves compounding, such as taking simpler items and building more complex
ones. A final kind of process involves decomposition or taking complex items
and breaking them down into simpler ones:
With compounding there is a major difference between those who, like Locke,
use a kind of “mental mechanics,” and those, like J. S. Mill (see below), who
argue for a kind of “mental chemistry” as well.
peak with the British Empiricists (ca. 1700-1850) and although the most extensive
associationist theorizing probably occurs in James Mill’s (1829) The Analysis
of the Phenomena of the Human Mind, the most influential discussion of
associationism for philosophy was probably David Hume’s (1739) Treatise
of Human Nature. However, contemporary cognitive science seems to owe
more to Locke and James than to any of the other players in the associationist
tradition.
Ideas
For Locke, unlike Descartes (see chapter 3), there are no innate ideas: “Let us
then suppose the mind to be, as we say, white paper, void of all character,
without any ideas. How comes it to be furnished? ... I answer, in one word,
from experience. In that, all our knowledge is founded, and from that it ultimately
derives itself” (Essay, bk 2, ch. 1, para. 2). Mental contents (ideas)
are derived either through external experience, sensation, or from internal
experience, reflection, on the operations of the mind itself. Sensation and reflection
yield simple ideas upon which mental operations, such as recognizing
similarity and differences or abstracting, create complex ideas of substance,
relation, etc.
The world
Sensation gives us ideas of qualities of external things and there are two
important classes of qualities:
secondary qualities are not essential to their bearers, and are the powers
of objects (by configurations of primary qualities) to cause experiences
(such as color, sound, taste, smell) in perceiving minds.
(1) composition, (2) setting ideas next to each other without composition (relations
of ideas), and (3) abstraction (general ideas). The first relates especially
to association.
Locke calls composition the process where the mind: “puts together several of
those simple ideas it has received from sensation and reflection, and combines
them into complex ones” (Essay: ch. 11, sect. 6). And in reverse: “All our
complex ideas are ultimately resolvable into simple ideas, of which they are
compounded and originally made up” (Essay: ch. 22, sect. 4). Here we see a
kind of “mental mechanics” at work, where complex ideas are built out of
simpler ideas like a wall is built out of bricks and mortar. It may be that associative
principles such as similarity and contiguity are operative in composition,
but if so, they are merely two of many principles and by no means hold
sway over the process in general: “The mind . . . arbitrarily unites into complex
ideas such as it finds convenient; whilst others that have altogether as much
union in nature are left loose, and never combined into one idea, because they
have no need of one name. It is evident, then, that a mind, by its free choice,
gives a connection to a certain number of ideas, which in nature have no more
union with one another than others it leaves out” (Essay: bk III, ch. 5, sect. 6).
According to Locke, then, complex ideas need not always result from ideas
which arrive together, and complex ideas can be formed “arbitrarily” and by
“free will” - hardly associationist principles.
Succession of ideas
In the 4th edition of his Essay, Locke added a new chapter entitled “Of the
association of ideas,” thereby giving a name to a doctrine, which name turns
out to have been more influential than the original doctrine. His interest in the
association of ideas (he also used “connection” of ideas) seems restricted
almost completely to the pathological, that is, to mental breakdowns and he
never names or formulates explicit principles of association. Like Hobbes
before him and Hume after him he distinguishes two types of association of
ideas^ - those that have “a natural correspondence and connexion one with
another,” and “wholly owing to chance or custom; ideas that in themselves are
not at all of kin, come to be so united in some men’s minds, that ’tis very hard
to separate them, they always keep in company, and the one no sooner at any
time comes into the understanding but its associate appears with it.” Locke
says little here about the first category, but he goes on to make a number of
points about the second: (i) the mind makes these combinations either voluntarily
or by chance (hence people exposed to the same environment can be very
different psychologically); (ii) strength of the first impression or “future indulgence”
(positive reinforcement?) can so unite ideas “that they always afterwards
kept company together in the man’s mind as if they were but one idea”; (iii)
some antipathies “depend upon our original constitution, and are born with
us.” Locke’s main concern here is a concern with rectifying wrong associations
- pedagogical, not psychological, analysis. That will change dramatically in
the hands of probably the most distinguished and influential associationist,
William James.
Psychology is the science of mental life, both of its phenomena and of their
conditions.
(The Principles of Psychology)
For the purposes of cognitive science, James’s conception of our mental life or
“thinking” (James: “I use the word thinking for every form of consciousness
indiscriminately”) has the following central features:
Association
Background
Although James speaks occasionally, as the British empiricists did, of the mind
compounding idea parts into complex wholes, he was on the whole skeptical
of the doctrine of complex ideas. James also, paradoxically, claims explicitly
that “objects are associated, not ideas” and he goes on to say: “We shall avoid
confusion if we consistently speak as if association, so far as the word stands
for an effect, were between things thought of . . . not ideas, which are associated
in the mind. . . . And so far as association stands for a cause, it is between
processes in the brain” (Briefer Course: 5). This is not completely clear: what
exactly is an effect of what here? Maybe we should think of it in the way shown
in figure 1.3.
BP-2 — CAUSE —> I-2 === REPRESENTS ===> T-2
Brain processes are the basic bearers of association. One brain process
(BP-1) causes and becomes associated with another brain process (BP-2). But
brain processes cause ideas (I) which are about, or represent, objects, things (T)
in the world that we think about, and by this means these things come to be
associated - that is the effect of brain processes. Ideas, then, are the intermediary
between brain processes and things - ideas both are caused by brain
processes and represent these things.
Whatever exactly James meant by his remark, the real issue, he thinks, is
accounting for the time course of thought: how does the mind solve the problem
of what to think next? His general answer is that the sequencing of thoughts is
in accordance with principles of association, and he suggests a variety of such
principles including contiguity and similarity. But James never rests content
with mere descriptions of patterns of association. He regularly presses for explanations
at a deeper neural level. For instance, after formulating association by
contiguity he says: “Whatever we name the law, since it expresses merely a phenomenon
of mental habit, the most natural way of accounting for it is to conceive
it as a result of the laws of habit in the nervous system; in other words, it is to ascribe
it to a physiological cause” (Principles of Psychology: 561-2). “The psychological
law of association of objects thought of through their previous contiguity in
thought or experience would thus be an effect, within the mind, of the physical
fact that nerve currents propagate themselves easiest through those tracts of conduction
which have been already most in use” (Principles of Psychology: 563). And
true to this explanatory strategy he postulates a pair of important, and prescient,
neurological principles, the first for a pair of brain processes, the second
for multiple brain processes:
(figure: a connection between units, with graded (+) inputs, a variable weight (w), and graded (+) outputs)
(P2) The amount of activity at any given point in the brain-cortex is the sum
of tendencies of all other points of discharge into it, such tendencies
being proportionate:
1 to the number of times the excitement of each other point may have
accompanied that of the point in question;
2 to the intensity of such excitements; and
3 to the absence of any rival point functionally disconnected with the
first point, into which the discharges might be diverted.
(Briefer Course: 5)
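Read quantitatively, (P2) resembles the weighted summation later found in connectionist units: the activity arriving at a point is a sum over the other points, scaled by how often each has fired together with it, how intensely, and whether the discharge is being drained off to some rival point. The sketch below is only one illustrative reading of the principle; the point names and numbers are invented.

# An illustrative reading of James's (P2): the activity at a point is the sum,
# over all other points, of (frequency of joint excitement) x (intensity),
# discounted by any portion of the discharge diverted to rival points.
def activity_at(point, points, freq, intensity, diverted):
    total = 0.0
    for other in points:
        if other == point:
            continue
        total += freq[(other, point)] * intensity[other] * (1.0 - diverted[(other, point)])
    return total

points = ["A", "B", "Z"]
freq = {("A", "Z"): 5, ("B", "Z"): 2}          # times excited together with Z
intensity = {"A": 0.8, "B": 0.4}               # intensity of those excitements
diverted = {("A", "Z"): 0.0, ("B", "Z"): 0.0}  # no rival points draining discharge

print(activity_at("Z", points, freq, intensity, diverted))  # 5*0.8 + 2*0.4 = 4.8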
Total recall
This happens when there is unrestricted association between previous events
and later recall. In James’s example a dinner party is followed by a brisk walk:
Partial recall
Partial recall (see figure 1.6(c)) is the most common variety of association
and in these cases only some of the past experiences have associational consequences:
“In no revival of a past experience are all the items of our thought
equally operative in determining what the next thought shall be. Always
some ingredient is prepotent over the rest” (Briefer Course: 1). So the question
arises as to which ingredient is prepotent and why. James’s answer is that “the
prepotent items are those which appeal most to our INTEREST” (ibid.).
“Expressed in brain-terms, the law of interest will be: some one brain-process is
always prepotent above its concomitants in arousing action elsewhere” (ibid.). James
surveys four principles of “interest” for determining “revival in thought”:
(1) Habit By this James means an association will favor elements that are
most frequent in past experience: “Frequency is certainly one of the most
potent determinants of revival. If I abruptly utter the word swallow, the reader,
if by habit an ornithologist, will think of a bird, if a physiologist or medical
specialist in throat-diseases, he will think of deglutition” (Briefer Course: 8).
(2) Recency James gives the example of a book, which habitually reminds
him of the ideas it contains, but upon hearing of the suicide of the author, now
reminds him of death. He concludes: “Thoughts tend, then, to awaken their
most recent as well as their most habitual [frequent] associates” (Briefer Course: 8).
Figure 1.6 James’s figures for the succession of thought (from Briefer Course)
(a) James’s figure 57: Total recall
(b) James’s figure 58: Total recall
(c) James’s figure 59: Partial recall
(d) James’s figure 60: Focalized recall
(e) James’s figure 61: Recalling and means-ends reasoning
And as usual, James tries to account for the phenomena at a lower level:
“Excitement of peculiar tracts, or peculiar modes of general excitement in the
brain, leave a sort of tenderness or exalted sensibility behind them which
takes days to die away. As long as it lasts, those modes are liable to have their
activities awakened by causes which at other times might leave them in repose.
Hence recency in experience is a prime factor in determining revival in thought”
(Briefer Course: 8-9).
(3) Vividness This is the strength or degree of an impression that the
original experience carries and “Vividness in an original experience may also
have the same effect as habit or recency in bringing about likelihood of revival”
(Briefer Course: 9). For example: “If the word tooth now suddenly appears on
the page before the reader’s eye, there are fifty chances out of a hundred that,
if he gives it time to awaken any image, it will be an image of some operation
of dentistry in which he has been the sufferer. Daily he has touched his teeth
and masticated with them; this very morning he brushed them, chewed his
breakfast and picked them; but rarer and remoter associations arise more
promptly because they were so much more intense” (ibid.).
(4) Emotional congruity As for this, James writes: “A fourth factor in
tracing the course of reproduction [in thought] is congruity in emotional tone
between the reproduced idea and our mood. The same objects do not recall
the same associates when we are cheerful as when we are melancholy. Nothing,
in fact, is more striking than our utter inability to keep up trains of joyous
imagery when we are in depressed spirits. . . . And those of sanguine temperament,
when their spirits are high, find it impossible to give any permanence
to evil forebodings or to gloomy thoughts” (Briefer Course: 9).
James sums up these four factors: “Habit, recency, vividness, and emotional
congruity are, then, all reasons why one representation rather than another
should be awakened by the interesting portion of a departing thought. We may
say with truth that in the majority of cases the coming representation will have
been either habitual, recent, or vivid, and will be congruous” (Briefer Course: 9).
Notice that although James labels these associational principles (APs),
and gives us examples of them, he never explicitly formulates them. What
might such a principle look like? James never says, but if such principles
are supposed to control the time course of thought, they might look like
these:
(AP2) At any given time, the strongest principle of association is the operative
one.
James also does not distinguish the fourth principle (emotional congruity)
from the earlier three, yet it is possible that it is really quite different in that
it does not seem to associate any particular thought (B) with any other particular
thought (A). It says that a whole class of thoughts is more likely to be
called up than the rest - the class of thoughts that are similar in emotional
value.
untary sequences of thought is that the latter involves persistently active neural
processes while the former does not.
Voluntary thought is traditionally a stumbling block for associationist theories
since it would seem that here, if anywhere, rational, logical procedures
can occasionally prevail over associative links. James approaches the question
in two stages. First he tries to account for “recalling a thing forgotten” in associationist
terms. Then he tries to extend this account to problem solving. James
poses the issue of voluntary thought in terms of problems and their means of
solution: “But in the theoretic as well as in the practical life there are interests
of a more acute sort, taking the form of definite images of some achievement
which we desire to effect. The train of ideas arising under the influence of such
an interest constitutes usually the thought of the means by which the end shall
be attained. If the end by its simple presence does not instantaneously suggest
the means, the search for the latter becomes a problem, and the discovery of the
means forms a new sort of end ... an end, namely, which we intensely desire
. . . but of the nature of which ... we have no distinct imagination whatever”
(Briefer Course: 11). Thus problem solving is pictured as predominantly
means-end reasoning. James immediately extends this: “The same thing
occurs whenever we seek to recall something forgotten” (ibid.). “The desire
strains and presses in a direction which it feels to be right, but towards a point
which it is unable to see. In short, the absence of an item is a determinant of
our representations quite as positive as its presence can ever be” (ibid.). As
usual, James tries to redescribe this at the physiological level: “If we try to
explain in terms of brain-action how a thought which only potentially exists
can yet be effective, we seem driven to believe that the brain tract thereof must
actually be excited, but only in a minimal and subconscious way” (ibid.). James
thinks that both kinds of problem have a common structure: “Now the only
difference between the effort to recall things forgotten and the search after the
means to a given end is that the latter have not, whilst the former have, already
formed a part of our experience” (ibid.).
m, and n, and as all these processes are somehow connected with Z, their com¬
bined irradiations upon Z, represented by the centripetal arrows, succeed in
rousing Z also to full activity” (Briefer Course: 12).
theories for studying the nervous system (Golgi, Cajal, Sherrington), called
into question the desirability of (purely) psychological principles at all (we turn
to this in chapter 3). With the demise of introspectionist methodology came
the demise of the objects of introspection - ideas. The new elements of mind
were stimuli and response, and their neural substrata - not introspectable at
all. And as ideas were replaced by stimuli and responses, introspection was
replaced with laboratory experimentation. There was also the increased prominence
of reinforcements, reward, and conditioning - procedures rarely discussed
by the British empiricists.
Notes
1 As we will see, it is William James’s official position that it is things (out in the
world), not ideas, that are associated.
2 “Ideas” for Locke, unlike Hume later, cover all mental contents: ideas of sensation
and of reflection.
3 A shorter version occurs in chapter 16 of William James’s Psychology (Briefer
Course). Figure numbers are those of the Briefer Course.
Study questions
What is associationism?
What are the two major types of mental processes that associationist
principles are supposed to account for?
What basic principle governs association involving two brain processes active
together?
What are the labels James gives to the four principles of partial recall?
What kind of reasoning seems to pose a problem for James, and why?
Suggested reading
General: The single most complete survey of associationism is Warren (1921), which
is obviously a bit dated and which, curiously, does not discuss James. Boring (1929),
chapter 10, covers British empiricism, and chapter 12 covers the Mills and Bain.
Marx and Hillix (1963), chapter 6, covers both traditional associationism and early
behaviorism, which it treats as associations between stimuli and responses. For a more
contemporary perspective, see the introduction to Anderson and Bower (1974).
f’(rT mtirc on I^uke %cc (Cummins (1W^), chapter 4, and Mc(^ulloch (1995), chapter
2, for more on Locke on representation. For more on James see Flanagan (1991),
chapter 2, which contains an excellent discussion of James’s philosophy of mind and psychology
from a cognitive science perspective, and some of our general remarks follow
his. For some other empiricists we did not cover: on Hume, Wilson (1992) is a particularly
relevant study of Hume (see references therein). On Bain, Young (1970), chapter
3, contains a discussion of Bain from a contemporary point of view. A recent selection
of associationist writings can be found in Beakley and Ludlow (1992), part IV. Hunt
(1993), chapter 3, contains a readable brief survey of empiricist and rationalist psychological
doctrine, and chapter 6 contains a general discussion of James.
2
Behaviorism and Cognitivism
2.1 Introduction
So far we have briefly surveyed the rise and fall of associationism. The
dominant event between the heyday of associationism and the computational
theories of mind (digital and connectionist) was the rise and fall of behaviorism
and stimulus-response (S-R) theory, mostly in America, Britain, and
Australia (think of it as an English-speaking movement and you won’t be far
wrong). A number of other movements were afoot during this period: there
were Freud’s investigations into unconscious processes, the Gestalt investigation
into the internal organization of perception, ethological studies of natural
animal behavior, and Piaget’s work on children’s cognitive developmental
stages (think of these as mainly European, German-speaking initially, movements
and you won’t be far wrong). Our goal in this chapter will be to try to
understand what motivated behaviorism, what its basic doctrines are, and what
led to its demise. We want to see how criticisms of behaviorism set the foundation
for cognitivism and information-processing psychology, which eventually
led to the computational theory of mind.
On the American side of the Atlantic Ocean there was a great gap in research
on human complex cognitive processes from the time of William James almost
down to World War II. Although the gap was not complete, it is fair to say
I. P. Pavlov
J. B. Watson
The year 1913 marked the publication in the Psychological Review of Watson’s
(1878-1958) behaviorist manifesto “Psychology as the behaviorist views it.” In
1943 a group of eminent psychologists proclaimed this the most important
article ever published in the Psychological Review. Watson’s early work was
on animal behavior, where reports of introspection and consciousness play,
naturally, no role. This was congenial to Watson’s hostility to the conscious
introspection of the psychology of the time (e.g., James and Wundt), and to
traditional worries about the relation of mind to matter (the “mind-body
problem”): “The time honored relics of philosophical speculation need trouble
the student of behavior as little as they trouble the student of physics. The
Like Hartley and James before him, Watson hoped ultimately for physiological
explanations. From the behaviorist perspective, “the findings of psychology
become the functional correlates of structure and lend themselves to
explanation in physico-chemical terms” (1913: 177). Watson originally formulated
his theory in terms of “habits,” but in 1916 he embraced the conditioned
reflex method developed by Pavlov as the correct way of understanding habit
units. Some of Watson’s views regarding habits strike us as quaint (at best).
For instance, he claimed that “thinking” (it is not clear what a behaviorist is
referring to with this term) does not involve the brain, but rather consists in
“sensori-motor processes in the larynx.”
E. Thorndike
response, in this case escaping by stepping on a pedal, and over time it learns
that response. But if the response is not rewarded, it gradually disappears.
Learning is by trial and error, reward and punishment. Thorndike’s description
of his aim was to catch animals “using their minds,” but his methodology
has also been (uncharitably) described as “studying a response that could be
readily made accidently by a half-starved cat attempting to escape from the
cage in which it was confined” (Hilgard 1987: 190). Early critics of Thorndike,
such as Kohler, complained that animals should be studied in their natural
settings, and that they seemed not to reason in the laboratory because their
situations did not permit it. Thorndike proposed two laws which he thought
could explain all animal (and human) behavior: the so-called “law of effect”
and the “law of exercise”:
Law of effect Any act which in a given situation produces satisfaction becomes
associated with that situation, so that when the situation recurs the act is more
likely than before to recur also. Conversely, any act which in a given situation
produces discomfort becomes disassociated from the situation, so that when
the situation recurs the act is less likely than before to recur (positive and
negative reinforcement).
Law of exercise Any response to a situation will, all other things being equal,
be more strongly connected with the situation in proportion to the number of
times it has been connected with that situation, and to the average vigor and
duration of the connections.
It is not hard to see the influence of William James in the latter law. Later, in
Human Learning (1929), Thorndike extended and applied these principles to
human learning, which he formulated in terms of hierarchies of stimuli (S) and
responses (R). Every S-R link has a probability that S would elicit R; learning
amounts to increasing that probability and forgetting amounts to decreasing
it. Thorndike also developed a neurological theory of learning involving the
establishment of new connections at the synapse (made prominent in 1906 by
Sherrington). In virtue of this he called himself a “connectionist.”
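Thorndike’s later S-R formulation invites a simple computational gloss: each stimulus-response link carries a strength (a probability of elicitation) that reward nudges up and discomfort or disuse nudges down. The sketch below is just that gloss, with invented link names and learning rates; it is not Thorndike’s own mathematics.

# Illustration only: S-R link strengths, nudged toward 1 by reward
# (the positive half of the law of effect) and toward 0 by discomfort.
links = {("puzzle_box", "press_pedal"): 0.1,
         ("puzzle_box", "scratch_at_bars"): 0.1}

def reward(s, r, rate=0.2):
    links[(s, r)] += rate * (1.0 - links[(s, r)])

def punish(s, r, rate=0.2):
    links[(s, r)] -= rate * links[(s, r)]

# Trial and error: pressing the pedal opens the box; scratching does not.
for _ in range(10):
    reward("puzzle_box", "press_pedal")
    punish("puzzle_box", "scratch_at_bars")

print(links)  # the rewarded response now has much the higher probability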
B. F. Skinner
The most famous and influential behaviorist after Watson undoubtedly was
B. F. Skinner (1904—90). Skinner rejected all mental states and processes, and
even went further, along with the logical positivism of the period, to reject
“hypothetical” entities altogether. Early on (1931) he rejected even the reflex
The 1950s mark the end of the dominance of behaviorism in American psychology
(and related cognitive sciences) and the rise of an alternative cognitive,
information processing, paradigm. Four things were happening simultaneously.
First, logical positivism, with its unrealistic and restrictive conception of scientific
methodology and theory construction had been mostly abandoned by the
philosophical community, and this reaction was spreading to other disciplines.
Second, behaviorism was professionalizing itself out of existence. Hundreds
(thousands?) of articles were being written on problems of interest to no one
outside the field. As one psychologist wrote: “a strong case can be made for the
proposition that the importance of the psychological problems studied during
the last 15 years has decreased as a negatively accelerated function approaching
an asymptote of complete indifference” (Harlow, 1953). Third, serious criticisms
of behaviorism’s basic assumptions were launched during this period. The basic
point of the critiques we will review is the structured nature of behavior
(organization that cannot be explained by traditional behaviorism and S-R
psychology) and the contribution of the organism that produces it. Finally, an
alternative, less restrictive and more exciting research program stated in terms
of computation and information was emerging. We now turn to the third point,
and in the next section to the final point.
K. Lashley
the integration of timing and rhythm, where such associative chains cannot
explain the behavior: “Considerations of rhythmic activity and of spatial orientation
force the conclusion, I believe, that there exist in nervous organization,
elaborate systems of interrelated neurons capable of imposing certain
types of integration upon a large number of widely spaced effector elements”
(1951: 127). Lashley’s (like James’s before him) alternative conception is of a
nervous system that is always active, not passive as in traditional S-R theories,
that has its own principles of organization that it imposes on incoming sensory
material: “My principal thesis today will be that the input is never into a
quiescent or static system, but always into a system which is already actively
excited and organized. In the intact organism, behavior is the result of interaction
of this background of excitation with input from any designated
stimulus” (1951: 112).
Noam Chomsky
Skinner’s most trenchant critic was not another psychologist, nor a physiologist,
but the young linguist Noam Chomsky, whose review of Skinner’s book
Verbal Behavior in 1959 is credited with helping to bring down behaviorism as
a framework for human psychology, and to inaugurate the new cognitive
approach: “Chomsky’s review is perhaps the single most influential psychological
paper published since Watson’s behaviorist manifesto of 1913” (Leahey
1992: 418). Chomsky’s (excruciatingly) detailed critique focuses on, but is not
limited to, Skinner’s analysis of language. Chomsky is also anxious to cast
doubt on the adequacy of Skinner’s framework for both other forms of human
activity and some forms of animal behavior, especially those emphasized by
comparative ethologists. Chomsky closes his review by outlining his alternative
framework from the point of view of generative grammar.
Conclusion
information theory and (what would become) computer science: “In the long
run the most important approach to cognition grew out of mathematics and
electrical engineering and had little or nothing to do with psychology and its
problems” (Leahey, 1992: 397). The digital computational influence will
occupy Part II of this book, so we will here be concerned primarily with influential
work in the 1950s and 1960s that used or presupposed information-processing
notions.
Information
Figure 2.1 One of the first information-processing diagrams (from Broadbent, 1958: 299,
figure 7)
charts of this conception makes this clear (see figure 2.1). However, as we move
to memory and thought it is not at all clear how to apply the technical notion
of information, and so the informal notion surreptitiously began to replace it
until finally “information processing” became virtually a pun.
George Miller was an early enthusiast of information theory and its application
to psychology. His early book Language and Communication (1951) and
papers, especially “The magical number seven . . .” (1956), “drew attention to
limitations on human attention and memory and set the stage for the first wave
of research in information-processing psychology” (Leahey 1992: 406). As
Miller himself said: “Informational concepts have already proved valuable in
the study of discrimination and of language; they promise a great deal in the
study of learning and memory” (1951: 42). Miller begins his influential (1956)
article remarkably: “My problem is that I have been persecuted by an integer.
For seven years this number has followed me around, has intruded in my most
private data, and has assaulted me from the pages of our most public journals.”
In the body of his paper he rehearses about a dozen experiments, and his conclusion
is two-fold: “First, the span of absolute judgment and the span of
immediate memory impose severe limitations on the amount of information
that we are able to receive, process, and remember” (1967: 41). “Second, the
process of recoding is a very important one in human psychology and deserves
much more explicit attention than it has received” (1967: 42). Let’s take a look
at each of these in a little more detail.
Miller surveys four studies of absolute judgment. For example, when subjects
were asked to identify tones by assigning them numbers, researchers got the
result that: “When only two or three tones were used the listeners never confused
them. With four different tones confusions were quite rare, but with
five or more tones confusions were frequent. With fourteen different tones
the listeners made many mistakes” (1956: 18). Similar results are reported for
judgments of loudness, saltiness, and spatial location. Miller concludes: “There
seem to be some limitations built into us by learning or by the design of our
nervous systems, a limit that keeps our channel capacities in this general
range. On the basis of the present evidence it seems safe to say that we possess
a finite and rather small capacity for making unidimensional judgments and that
this capacity does not vary a great deal from one simple sensory attribute to
another” (1956: 25). And this capacity, he proposes, is about 7 plus or minus 2.
Recoding
Since our memory span is limited to such a small number of “chunks” of information,
one way of getting more into it is to make each chunk worth more bits
of information. This process is called “recoding.” Miller cites a study of
expanding memory of binary digits by recoding them into larger chunks. The
author was able, using the 5:1 recoding scheme, to repeat back 40 binary digits
without error: “The point is that recoding is an extremely powerful weapon
for increasing the amount of information that we can deal with” (1956: 40).
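The 5:1 scheme Miller cites can be made concrete: group the binary digits into blocks of five and remember each block as a single symbol, say its decimal value, so that 40 digits become 8 chunks, comfortably within the 7 plus or minus 2 span. The sketch below is only an illustration of such a recoding, not the mnemonic actually used in the study Miller describes.

def recode_binary(bits, block=5):
    # Recode a binary string into chunks of `block` digits, remembering
    # each chunk as a single decimal value.
    if len(bits) % block != 0:
        raise ValueError("length must be a multiple of the block size")
    return [int(bits[i:i + block], 2) for i in range(0, len(bits), block)]

def decode(chunks, block=5):
    # Recover the original binary string from the remembered chunks.
    return "".join(format(c, "0{}b".format(block)) for c in chunks)

digits = "1011000111" * 4            # 40 binary digits
chunks = recode_binary(digits)       # 8 chunks: [22, 7, 22, 7, 22, 7, 22, 7]
assert decode(chunks) == digits
print(chunks)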
Miller’s article had a significant influence on the field, coming out as it did
at the same time as Chomsky’s early critique of structuralist linguistics (as
well as his critique of behaviorism). It not only synthesized a wide variety of
experimental findings, but it did so by endowing the organism with endogenous
information-processing structure - also what Chomsky was proposing
in the realm of language.
We saw earlier that Chomsky (1959) complained that Skinner, and behaviorists
in general, ignored “knowledge of the internal structure of the organism,
the ways in which it processes input information and organizes its own
behavior.” Behaviorist and stimulus-response theorists proposed internal S-R
chains to “mediate” input stimuli and output response, but as we also saw,
Lashley (1951) and others complained that most interesting behavior needed
more structure than this. In particular, the organism needed to have goals, and
knowledge, plans, strategies, and tactics to achieve these goals. This requires
in part a hierarchical organization, not a linear one. Miller, Galanter, and
Pribram in Plans and the Structure of Behavior (1960) offered an influential
analysis of complex behavior in these terms.
The problem
The solution
MGP’s solution to the problem involves five leading ideas: the distinction
between molar and molecular levels of analysis, the image, the plan, execution,
and the TOTE unit.
The first step in the solution to the problem is to recognize the “hierarchical
organization of behavior.” For the purposes of explanation and understand¬
ing, most human behavior can and must be analyzed at different, hierarchically
(figure: a molar unit A + B analyzed into molecular constituents a + b and c + d + e)
Image (unfortunate label): this is all the knowledge that the organism has
about the world around it, including itself (1960: 17-18).
Plan: this is any hierarchical process in the organism that can control the
order in which a sequence of operations is performed (1960: 16).
TOTE units
Figure 2.2 A TOTE unit for hammering a nail (from Miller, Galanter, and Pribram, 1960:
36, figure 5; reproduced by permission of the authors)
respond until the incongruity vanishes, at which time the reflex is terminated”
(1960: 26).
Although the arrows in a TOTE unit (see below) might represent neural
impulses, they could also represent higher-level processes: “The reflex should
be recognized as only one of many possible actualizations of a TOTE pattern.
The next task is to generalize the TOTE unit so that it will be useful in a majority
— hopefully in all — of the behavioral descriptions we shall need to make”
(1960: 27). Two such processes MGP discuss in detail are the transmission of
information and the transfer of control - telling the system what to do next. In
a standard digital computer, this is done by the program, where control is
passed from instruction to instruction, telling the machine what to fetch, what
operation to perform, and where to store the result. According to MGP “the
TOTE unit... is an explanation of behavior in general” (1960: 29). To
account for complex hierarchical behavior they suggest that TOTE units be
nested inside one another. Plans are to be conceptualized as complex TOTE
hierarchies and behavior is explained in terms of the execution of the operations
contained in them. The “image” (knowledge of the world) provides
conditions which will be tested for when needed. An illustrative example of
such a complex TOTE unit is the one for hammering a nail flush to a surface
as shown in figure 2.2.
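The analogy between TOTE units and program control can be made explicit: a TOTE unit is essentially a test-operate loop, and nesting the units gives the hierarchy. The sketch below renders the nail-hammering example of figure 2.2 in that spirit; the state variables and the test and operate functions are invented stand-ins for MGP’s boxes.

def tote(test, operate):
    # A TOTE unit: Test, and while an incongruity remains, Operate and
    # Test again; Exit when the incongruity vanishes.
    while not test():
        operate()

# Invented world state for the hammering example of figure 2.2.
nail = {"height": 3}        # how far the nail still sticks up
hammer = {"raised": False}

def nail_flush():
    return nail["height"] <= 0

def strike():
    # Inner TOTE: raise the hammer, then let it fall on the nail.
    tote(lambda: hammer["raised"], lambda: hammer.update(raised=True))
    hammer["raised"] = False
    nail["height"] -= 1

# Outer TOTE: test whether the nail is flush; if not, operate (strike).
tote(nail_flush, strike)
print(nail["height"])  # 0: the incongruity has vanished and the unit exits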
Note
Study questions
From what two fields (outside psychology) did the “cognitive” tradition
emerge?
What, for Miller, Galanter, and Pribram, is the “image,” “plan,” and “TOTE
unit”?
Suggested reading
General
A very readable brief survey of psychology with special emphasis on cognition and
its contribution to cognitive science can be found in Gardner (1985), chapter 5. An
influential recent general history of psychology is Leahey (1992) - parts III and IV are
especially relevant to our present topic. Hilgard (1987) covers some of the same
material, focusing on American psychology, and with more biographical information.
Herrnstein and Boring (1966) has not been improved on as an amazing collection of
classics from the history of psychology (and philosophy and physiology). Pavlov (1927)
is his classic work.
Behaviorism
Other movements
For a discussion of other schools of the time see Flanagan (1991), chapter 7 on Freud,
and chapter 10 on the Gestalt movement.
Cognitivism
Besides the works cited in the text, and the relevant chapters from the general
histories mentioned above, see Gardner (1985), the latter parts of chapter 5, Flanagan
(1991), chapter 6, and Hunt (1993), chapter 16.
3
Biological Background
3.1 Introduction
At first the role of the brain itself in thought (broadly construed) and action
had to be established. Greek thought, for instance, was divided on the question
of the location of the mind (the “seat of the soul”). Some continued to
believe that the heart was primarily responsible for mental functions. For
example, Empedocles (490-430 BC), who is given the major credit for developing
the theory of the four basic elements (earth, air, fire, and water) held the
cardio-centric theory of mind. Aristotle (384-322 BC) also held the cardio-centric
theory, but with the further refinement that the function of the brain
was to serve as a radiator for the “heat and seething” of the heart. On the other
hand Anaxagoras (500-428 BC) proposed that the brain was the center of sensation
and thought. Democritus (460-370 BC), famous for his “atomic” theory
of matter, believed in a “triune” soul: one part was in the head, was responsible
for intellectual matters and was immortal. A second part was in the heart
and was associated with emotions. A third part was located in the gut, and was
associated with lust, greed, desire, and the “lower” passions. These last two
parts were not immortal.
Then the question of whether it was the brain mass or the holes in the
brain mass (ventricles) that supported thought had to be resolved. Among
those who assigned a significant role in thought to the brain, some located this
function in the large cavities (ventricles) and others in the surrounding
tissue. One of Galen’s (AD 130-200) most influential doctrines was the idea
that vital spirits, produced by the left ventricle of the heart, are carried to
the brain by the carotid arteries. These were then transformed into the
highest spirits in the rete mirabile (miraculous net) at the base of the brain.
Those spirits were stored in the brain’s ventricles. When needed they
passed through hollow nerves, to force muscles into action and mediate
sensation (he didn’t say exactly how). Interestingly, Galen broke the mind
into three components: imagination, cognition, and memory, which he sug¬
gested should be associated with brain substance (the “encephalon”), but
he seems not to have localized the function in that substance. In the fourth
and fifth centuries this changed to belief in ventricular localization. Nemesius
(390), Bishop of Emesa (Syria) localized perception in the two lateral
ventricles, cognition in the middle ventricle, and memory in the posterior
ventricle. In one form or another this doctrine continued for almost one thou¬
sand years.
Descartes
Rene Descartes (1596-1650) raised many if not most of the fundamental questions
in the foundations of cognitive science. He contracted tuberculosis from
his mother at birth (she died of it), was a sickly child and a relatively frail adult.
He was schooled by Jesuits in mathematics and philosophy (and they allowed
him to work in bed until noon, a practice he kept throughout his life). For a
while he socialized in Paris, enlisted in the army, then at age 32 he moved to
Holland where he wrote his most important works. He died of pneumonia at
the age of 54 after catching a cold trudging through the snow at 5:00 a.m. tutor¬
ing Queen Christina of Sweden in philosophy.
On the biological side, he was perhaps the first to describe the reflex arc: external
stimuli move the skin, which pulls on filaments within each nerve tube,
which opens “valves” allowing the flow of animal spirits stored in the ventri¬
cles to trigger the muscles (he didn’t say how). The flow of animal spirits also
powered digestion as well as some psychological functions, such as sensory
impressions, the passions (love, hate, desire, joy, sadness, wonder), and memory
(repeated experience makes certain pores in the brain larger and easier for the
animal spirits to flow through). Eventually, however, Descartes’s anatomy gave
way to Harvey’s as his physics gave way to Newton’s.
Voluntary actions were explained by acts of the will: “the will is free in its
nature, that it can never be constrained. . . . And the whole action of the soul
consists in this, that solely because it desires something, it causes a little gland
to which it is closely united to move in a way requisite to produce the effect
which relates to this desire” (Passions of the Soul). This is the pineal gland,
which is suspended between the anterior ventricles so as to influence and be
influenced by the animal spirits stored there. Thus, Descartes seems to have
held at least a modified ventricular theory (see figure 3.1).
Figure 3.1 Descartes’s theory (from Finger, 1994: 26, figure 2.16; reproduced by permis¬
sion of Oxford University Press)
For Descartes, mind was a distinct substance from body and the principles
covering the one do not extend to the other.^ He had at least three arguments
for this position. The most famous is his “cogito ergo sum” argument:
(1) I can’t doubt that I exist as a thinking thing (Cogito ergo sum: I think,
therefore I am).
(2) I can doubt that I exist as a body (I might be being fooled by an evil
demon).
(3) Therefore, my mind is not the same as my body.
This argument has been subject to much justified criticism and it is generally
considered to be defective; Bob Dylan might be the same person as Robert
Zimmerman, yet one could doubt that the one is rich without doubting that
the other is. Descartes also argued that mind and body were distinct on the
basis of the fact that:
(4) Bodies are unthinking and essentially extended in space, minds are essen¬
tially thinking things and not extended in space.
(5) Bodies are divisible and have parts, minds are not divisible and have no
parts.
(We leave evaluating these points as an exercise.) So minds are thinking, unex¬
tended, indivisible substance and their acts of will are free and undetermined.
On the other hand bodies are unthinking, extended, divisible substance, and
their behavior is mechanically determined. Descartes’s discussion has three
important corollaries for cognitive science, and for our subsequent discussion
in later chapters. First, the contents of our mind are completely conscious and
introspectable: “there can be nothing in the mind, in so far as it is a thinking
thing, of which it is not aware ... we cannot have any thought of which we
are not aware at the very moment it is in us.” Second, our introspective access
to these contents is “authoritative” - no one else has as much authority as to
what you are thinking as you do. Third, these contents are independent not
only of the body, but of the rest of the world around it - they are internal to
the mind.
Mind-body interaction
If body and mind are distinct substances, how are they related? We have just
seen that according to Descartes, the mind can wiggle the pineal gland and so
cause muscles to contract. Likewise, effects on the skin can cause the pineal
gland to move and so affect the mind. In short, we have mind-body causal
interaction (in chapter 8 we will survey problems with this theory).
Persons
Finally, Descartes at times identified the person with just the mind: “But what
am I? A thing which thinks. What is a thing which thinks? It is a thing which
doubts, understands, affirms, denies, wills, refuses, which also imagines and
feels” (Meditations). “This I (that is to say, my soul by which I am what I am)
is entirely and absolutely distinct from my body, and can exist without it”
(Meditations). This of course runs against the grain of contemporary cogni¬
tive science where the methods of natural science, such as biology and neuro¬
physiology, are being extended to cognition. On the other hand he sometimes
says more contemporary things: “I am not only lodged in my body as a pilot
is in a vessel, but that I am very closely united to it and so to speak intermin¬
gled with it that I seem to compose with it one whole” (Meditations). In his
last work Passions of the Soul (1649) he divided activities into three spheres: (1)
those that belong just to the mind (intellectual and volitional), (2) those that
belong just to the body (physiology), and (3) those that belong to a “union” of
the two (emotions and sensations).
It was Franz Joseph Gall (1758-1828) more than anyone else who put the
issue of cortical localization into play, though as we will see, the effect was
decidedly mixed. Gall was German by birth and began lecturing on the subject
in Vienna in 1796. He was joined by his pupil and future collaborator
Spurzheim in 1800. In 1802 he was ordered by the government (at the insis¬
tence of the Church, which objected to the “materialism” of his doctrine) to
cease lecturing. After a tour of Germany, they settled in Paris in 1807. Their
first major treatise appeared between 1810 and 1819 in four volumes
(Spurzheim collaborated on volumes 1 and 2) under a title which began:
Anatomy and Physiology of the Nervous System in General and the Brain in Par¬
ticular. . . . Gall later (1825) completed a six-volume study. On the Functions of
the Brain and the Functions of Each of its Parts, which was translated into
English ten years later. Although scorned by most scientists (one called it “that
sinkhole of human folly”), phrenology won Gall wide popularity - and a
handsome livelihood.
Before turning to Gall’s “phrenology” (the physical localization of mental
function by its outward manifestation in bumps on the skull: “phrenology” was
a term invented by a student and was not used by Gall), it is important to realize
that at least part of his influence was based on his medical skills: “everyone
agreed he was a brilliant brain anatomist,” and he made fundamental contri¬
butions to neuroscience, including his comparative work on brain size which
indicated that larger amounts of cortex are generally associated with more
intelligent organisms. “No one before Gall had shown so clearly that brain size
paralleled mental development” (Fancher, 1979: 45). These achievements have
been obscured by Gall’s dubious inference:
(1) The mind can be analyzed into a number of specific faculties or capacities
(Gall assumed these faculties were innate).
(2) Mental capacities have specific locations in the brain.
(3) Physical locations of specific mental capacities manifest themselves by a
greater mass of tissue.
Figure 3.2: (1) STRIKING BEHAVIOUR (talent, propensity, mania) implies (2) FACULTY
(innate instinct), which implies (3) CORTICAL ORGAN (activity varies with size), which
implies (4) CRANIAL PROMINENCE (size varies with underlying organ), with "causes"
links between each adjacent pair.
(4) This mass will distend the skull and enable those capacities to be read by
those knowledgeable in craniology (the measurement of skulls).
One might diagram Gall’s methodology as in figure 3.2. Gall observed 1 and
4, and went on to infer 2 and 3. Gall himself reported that he first came to the
idea behind phrenology when at age 9 he observed that classmates with bulging
eyes had good verbal memories. He hypothesized that a brain area responsible
for verbal memory was abnormally enlarged and pushed out the eyes. Gall
wrote: “I could not believe, that the union of the two circumstances which had
struck me on these different occasions, was solely the result of accident. Having
still more assured myself of this, I began to suspect that there must exist a con¬
nection between this conformation of the eyes, and the facility of learning by
heart” (1835, vol. I: 57-8). Gall does not say how he “still more assured”
himself of this correlation, perhaps by more anecdotal observations. In any
case, his mature methodology was not much better. For instance, he concluded
that destructiveness was located above the ear (for Gall the hemispheres were
duplicates of each other) because: (1) it is the widest part of the skull in car¬
nivores, (2) prominence here was found in a student “so fond of torturing
animals that he became a surgeon,” (3) this area was well developed in an
apothecary who later became an executioner. In addition to his anecdotal
style of testing his theory, Gall’s position suffered from having no principled
basis for selecting the mental faculties associated with various locations, and
different practitioners came up with different sets. In the end Gall identified
27 faculties, 19 of which also occur in animals (Gall got his list mainly from
the Scottish philosopher Thomas Reid, who proposed roughly 24 active powers
of the mind, and about six intellectual powers). Here are Gall’s faculties
(Corsi, 1991: 155):
1 instinct of reproduction
2 love of one’s offspring
3 attachment and friendship
4 defensive instinct of oneself and one’s property
Flourens
Figure 3.3 One of Spurzheim’s diagrams from 1825 (from Finger, 1994: 33, figure 3.2;
reproduced by permission of Oxford University Press)
to health and compare the behavior of the ablated animal with a non-ablated
control. He regularly failed to confirm phrenological predictions. Given this
failure he concluded that the cortex functioned holistically: “All sensations, all
perception, all volition occupy concurrently the same seat in these organs. The
faculty of sensation, perception, and volition is then essentially one faculty”
(Finger, 1994: 36). But this inference was not fully warranted; by taking slices
from these small brains he cut across anatomically and functionally distinct
regions of the brain. Time would suggest that Flourens had the right kind of
methodology, but the wrong theory, whereas Gall had the right kind of theory,
but the wrong methodology.
Broca
Figure 3.4 One of Spurzheim’s diagrams from 1834 (from Boring, 1951: 55, figure 1;
reproduced by permission of Prentice-Hall, Inc.)
Aphasia (inability to communicate with speech) was well known to follow certain kinds
of strokes (interruption of blood supply to the brain, often due to clotting).
Ironically, it was Gall’s account of one such case that was “the first specifically
noted correlation of a speech deficit with injury to the left frontal lobe of the
cortex.” Broca’s diagnosis of Monsieur Leborgne (“Tan”) in 1861 was the first
such localist account to be widely accepted. It is now regarded by many as “the
most important clinical paper in the history of cortical localization” (Finger,
1994: 38). The story of Tan is interesting. Aubertin (1825-93), a contempo¬
rary of Broca’s, was attracted earlier than Broca to localist doctrines regarding
speech, and after studying a particular patient for a long time, he challenged
the skeptical Society of Anthropology in Paris to perform an autopsy on the
patient: “if at autopsy the anterior lobes are found intact, then I renounce the
ideas which I have sustained.” Before this could take place however, Broca got
a 51-year-old patient who had become aphasic at age 30. All he was known to
utter regularly was something like “tan,” which became his nickname, and
occasionally, though rarely (once in Broca’s presence), he would say “Sacre
Nom de Dieu!” Broca asked Aubertin to conduct an examination, after which
Aubertin “unhesitatingly declared that [the patient] suffered from a lesion in
the left frontal lobe of the cortex.” Within a few days Tan died of gangrene;
Broca performed an autopsy and brought the brain to the next meeting of the
Society of Anthropology. It had an egg-sized lesion on the left side of the brain.
The center of the lesion (the presumed point of origin) coincided with the
lower part of the third convolution of the left frontal lobe.
In a few months, Broca had another case, an 84-year-old man with sudden
loss of speech, who upon autopsy was discovered to have a small lesion in
exactly the same spot as the center of Tan’s lesion. Broca gathered a number
of other cases as evidence to the same conclusion. In each case, Broca noted,
the lesion was to the same part of the left hemisphere of the brain - right hemi¬
sphere damage of the same region was not implicated in the loss. In his honor
this area is now called “Broca’s area,” and the syndrome is called “Broca’s
aphasia.” Here is a sample (from Akmajian et al., 517):
Wernicke
In the 1870s, Carl Wernicke (1848-1905) went further, showing that damage
in a portion of the temporal lobe led to a language disorder characterized by a
loss of comprehension rather than speech. The conception of mental func¬
tioning of the brain in play at the time of Wernicke was one where sensory
information of different kinds was projected onto the cortex at various points
on the sensory strip, and stored in surrounding tissue in the form of “images.”
Specific motor acts were also stored near the motor strip as “images.” The
remaining brain tissue was thought to be association areas which linked the
various sensory and motor centers. Wernicke was the first to propose explain¬
ing motor (“Broca’s”) aphasia by way of damage to the area storing “images”
of articulatory movements for speech. But he is most famous for the discov¬
ery and explanation of the opposite syndrome (now called “Wernicke’s
aphasia”) where the subject can speak perfectly fluently, though even familiar
words may be mispronounced, but whose comprehension is impaired. The
lesions causing this kind of aphasia are located in the temporal lobe near the
auditory area. Such patients hear the words, know they are being addressed,
and try to respond, but they just don’t understand what is being said. Here’s
a sample:
Figure 3.5 Brodmann’s areas of 1909 (from Finger, 1994: 42, figure 3.15; reproduced by
permission of Oxford University Press)
that the cell often gave rise to a tail-like appendage, but he thought the cell body
and appendage were each surrounded by a sheath; “thus Valentin missed the
critical point that the nerve fiber arises from the nerve cell.” In 1837 Purkyne
gave a talk in Prague where he described large ganglion cells in the cerebellar
cortex, now called “Purkyne cells.” Purkyne also noticed that a “tail-like ending
faces the outside and, by means of two processes mostly disappears into the gray
matter.” These “processes” ultimately came to be called “dendrites.” Still, cru¬
cially, he could not discern how these fibers were related to the cell bodies. In
1838 Theodor Schwann (1810-82) proposed the “cell theory”: the entire body,
inside and out, is made up of individual cells. Cell theory was accepted for every
part of the organism except the nervous system. The reason for this exception
was the existence of two difficulties:
1 Microscopes could not tell whether all nerve fibers arise directly from
nerve cells or whether some of them could exist independently.
2 It couldn’t be seen whether the long thin branching fibers have definite
terminations, or whether they ran together with neighboring thin
branches to form a continuous network.
The first view required the additional idea that neurons act at sites of contact
and leads to the “neuron doctrine.” The second view postulated that activity
spreads in a continuous fashion through the network of branches. In 1833
Christian Ehrenberg compared nerve fibers to the capillary vascular system;
the continuity of the arterial-venous system provided the model for the later
“nerve net” theory.^ At the same time researchers were trying to see if they
could reduce the huge variety of branching patterns. In the end they settled
on two, what we today call “axons” and “dendrites.” (Cajal’s “Law of Dynamic
Polarization” stated that every neuron is a unit that receives input in its den¬
drites and sends output in its axon(s) — but that comes later.) Deiters, in 1865,
and Koelliker, in 1867, proposed that axons were independent, though den¬
drites might form a net.
Based mostly on his illustrations of 1853, Koelliker seems to have been
the first to actually establish that nerve fibers arise from nerve cells. Deiters,
in 1865, made the first clear distinction between what we now call axons
and dendrites: “It is ironic that by his clear observations on dendrites and
on the single axon Deiters had placed himself on a direct path to the neuron
doctrine and indeed to modern times, but by the introduction of his second
set of fine fibers he at the same time contributed to the reticular [nerve net]
theory” (Shepherd, 1991: 47). New and more powerful techniques were needed
to settle this issue. The key advance was the introduction in 1870 of the Golgi
stain.
Golgi
It was Camillo Golgi (1843-1926) who in 1873 reported on his important silver
nitrate staining method, which he discovered in a kitchen at the hospital near
Milan where he worked. This process is difficult to apply reliably, but when it
worked it clearly stained cells black against a yellow background revealing all
relevant morphological features.
It was Golgi’s opinion, one that he thought was supported by the results of
his staining technique, that the axons formed a dense, fused network, much as
the circulatory system does (“anastomosis”): “In fact the method was inadequate
for establishing this; thinning of the terminal branches was mostly due to failure
of complete staining, and the branches were too thin in any case for the pur¬
ported anastomoses to be resolved clearly by the light microscope. This fateful
misinterpretation was to become his reticular theory of nervous organization”
(Shepherd, 1991: 91). Golgi also thought the role of dendrites was mainly nutri¬
tive. This network conception led him to oppose cerebral localization and to
endorse holism of brain function. In 1883 he wrote a long review article in which
he summarized his findings. Particularly relevant are these theses:
Eleventh: In all the strata of the gray substance of the central nervous organs,
there exists a fine and complicated diffuse nervous network . . . decomposing
into very slender filaments, and thus losing their proper individuality, pass on
to be gradually confounded in the network. . . . The network here described
is evidently destined to establish a bond of anatomical and functional union
between the cellular elements of extensive zones of the gray substance of the
centers.
Fifteenth: Another corollary from what precedes is that the concept of the
so-called localization of the cerebral functions, taken in a rigorous sense, . . .
cannot be said to be in any manner supported by the results of minute anatomi¬
cal researchers. (Shepherd, 1991: 99-100)
Amazingly, Golgi carried on his distinguished research career while also rector
of the University of Pavia, and a senator in Rome.
Cajal
Various anatomists of the time opposed this network theory, but it was Santi¬
ago Ramon y Cajal (1852-1934) who contributed most to its rejection. After
seeing a sample of Golgi stains in Madrid in 1887, he set out to improve the
method by cutting thicker sections and studying neurons prior to myelination.
In his first paper of 1888 Cajal stated he could find no evidence for either axons
or dendrites undergoing anastomosis, and so forming continuous nets. Subse¬
quent studies confirmed this judgment and in an 1889 review of this work he
argued that nerve cells were independent elements.
In 1889 Cajal traveled to Berlin to present his views at a conference. There
he met, among others, Wilhelm von Waldeyer (1836-1921), who was influ¬
enced by Cajal’s views, and who in 1891 wrote a favorable and influential
theoretical review of the literature. In this review he claimed that the “neuron”
(nerve cell) is the anatomical, physiological, metabolic, and genetic unit of the
nervous system. The word “neuron” was introduced in this way, as was the
neuron doctrine:
I The axis cylinders [axons] of all nerve fibers . . . have been shown to origi¬
nate directly from cells. There is no connection with a network of fibers or
any origin from such a network.
II All these nerve fibers terminate freely with “end arborizations” without a
network or anastomotic formation. (Shepherd, 1991: 181-2)
In 1894 Cajal was invited by the Royal Society of London to give the presti¬
gious Croonian Lectures (entitled “La fine structure des centres nerveux”)
which became the basis for his later 1909 two-volume work on the neural
structure of the nervous system. This was a landmark address and paper in
the history of neuroscience, synthesizing his earlier experimental work, as well
as the work of Golgi, Deiters, and Koelliker, and accompanied by slides of his
amazing drawings: “These are the images that, more than any others, have
implanted themselves in the minds of succeeding generations of scientists
concerning structures of different nerve cell types and their interconnections
in the nervous system” (ibid., 254). Cajal concluded that each nerve cell con¬
sists of three parts, each with a distinct function: the cell body and “proto¬
plasmic prolongation” (now “dendrite”) for reception of impulses, the “axis
cylinder” (now “axon”) for transmission, and the axis cylinder terminals for
distribution. This is basically the modern view. Still, neuron doctrine fol¬
lowers could not explain how neurons communicate with one another if they
did not fuse.
the arborescence is not continuous with, but merely in contact with, the sub¬
stance of the dendrite or cell body on which it impinges. Such a special con¬
nection of one nerve cell with another might be called a synapse.” Earlier, in
the 1870s, Emil du Bois-Reymond (1818-98) had hypothesized that excitatory
transmissions from nerves to effector cells could take place electrically or
chemically - the details of which awaited the technology of the twentieth
century.
As late as 1906 (when both Golgi and Cajal received the Nobel prize for
their work) Golgi attacked three central theses of the neuron doctrine: (1) the
neuron is an embryological unit, (2) the neuron is one cell, (3) the neuron is a
physiological unit. Golgi still clung to his view of axons as a fused network and
forming large, holistic networks. In his Nobel lecture he wrote: “When the
neuron theory made, by almost unanimous approval, its triumphant entrance
on the scientific scene, I found myself unable to follow the current of opinion,
because I was confronted by one concrete anatomical fact; this was the
existence of the formation which I have called the diffuse nerve network”
(Shepherd, 1991: 261). “The conclusion of this account of the neuron ques¬
tion, which has had to be rather an assembly of facts, brings me back to my
starting-point, namely that no arguments, on which Waldeyer supported the
theory of the individuality and independence of the neuron, will stand exam¬
ination” (ibid., 265). He was therefore opposed to doctrines of cortical local¬
ization as well.
Lashley
Figure 3.6 Lashley’s ablation results (from Fancher, 1979: 76: 2-3; reproduced by per¬
mission of W. W. Norton & Co., Inc.)
character of the functions involved. It probably holds only for the associa¬
tion areas and for functions more complex than simple sensory or motor
coordination.”
Hebb
Finally, one can see Donald Hebb’s (1949) work The Organization of Behavior
as a sort of synthesis of localist and holist tendencies. Hebb started from where
Lashley left off - attempting to find a way to account for what appeared to be
both distribution and localization in the way in which the brain represented
information. Hebb’s solution was influenced strongly by the work of a
neuroanatomist, Lorente de Nó, whose analysis of neural circuitry led him to
the view that there were “re-entrant” neural loops within the brain. From these
studies Hebb abstracted the notion that there could be circuits in the brain
Figure 3.7 Hebb: neural loops (from Hebb, 1972: 70, figure 23; figures 3.7-3.10 repro¬
duced by permission of W. B. Saunders Co.)
within which activity “reverberated,” and that these circuits could therefore
act as a simple closed loop (see figure 3.7).
On Hebb’s view, behavioral patterns are built up over time via connections
formed between particular cells (localist) into cell assemblies. But how did these
circuits come into being? Hebb proposed that there is a neural mechanism by
which this could happen - so-called “Hebb’s Postulate”: cells that fire together,
wire together. This slogan (which was not Hebb’s) covers two situations
each of which can be found in Hebb’s writings. First, there is the situation,
as Hebb put it, “When an axon of cell A is near enough to excite a cell B
and repeatedly or persistently takes part in firing it, some growth process or
metabolic change takes place in one or both cells such that A’s efficiency as one
of the cells firing B, is increased” (1972: 62). Second, there is the situation
where there is simultaneous firing of cells A and B, as in Hebb’s figure (see
figure 3.8).
This kind of simultaneous “co-activation” provides a model for association
at the neural level. Over time, cell assemblies could become locked in larger
phase sequences which recruit many cell assemblies (holistic), and underlie more
complex forms of behavior.
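Hebb states the postulate only verbally, but it is naturally read as a weight-update rule. The sketch below is a minimal illustration; the learning rate and the particular activity history are our own choices, not Hebb's.

```python
# Minimal sketch of Hebb's postulate as a weight-update rule: the connection
# from A to B grows whenever A and B are active together. The learning rate
# and the activity history are illustrative choices, not Hebb's.

def hebb_update(weight, pre, post, rate=0.25):
    """Strengthen the A->B weight in proportion to the co-activation of A and B."""
    return weight + rate * pre * post

w = 0.0
history = [(1, 1), (1, 1), (0, 1), (1, 0), (1, 1)]   # (A active?, B active?)
for a, b in history:
    w = hebb_update(w, a, b)
print(w)   # 0.75: only the three co-activations changed the weight
```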
By the end of the first half of the twentieth century a certain picture of the
neuron, the synapse, and neural firing had emerged in broad conformity with
the neuron doctrine. Here are the basic elements of that picture (following
Boring et al., 1939; Hebb, 1972).
Figure 3.8 Hebb’s postulate (from Hebb, 1972: 64, figure 29)
Figure 3.9 Synaptic knobs making contact with cell body (from Hebb, 1972: 68, figure 28)
There are a number of different types of neurons (we return to this in later
chapters). In each case the dendrites receive information, the axon sends out
information. There are generally many dendrites and one axon, though the
axon can branch at the end. The cell body can also receive information directly
(see figure 3.9).
The synapse is the point at which an axon makes contact with the dendrite
or cell body. The enlargement at the end of the axon is the synaptic knob. The
axon (and cell body) works in an all-or-none fashion. The axon is like “a trail
of gunpowder” - it uses up all its stored fuel at each point and so works without
decrement. The dendrite, on the other hand, works with decrement: “[it] is
more like a bow and arrow system in which a weak pull produces a weak effect.”
Over much of its length, the dendrite acts in a graded, not in an all-or-none
fashion.
The nerve impulse is the fundamental process of information transfer in the
nervous system. It is characterized by the following features:
1 The impulse is an electrical and chemical change that moves across the
neuron. The rate varies with the diameter of the fibre; from about 1
meter/second for small diameters to about 120 meters/second for large
diameters.
2 This disturbance can set off a similar disturbance across the synapse in
a second neuron, in a muscle (to contract) or in a gland (to secrete).
3 Neurons need a definite time to “recharge.” This is the absolute refrac¬
tory period lasting about 1 millisecond. Given the refractory period, a
neuron can fire at a maximum rate of about 1,000 per second.
4 Immediately after firing, nothing can make the neuron fire again. A little
later it can be fired by a strong stimulation. This is the relative refrac¬
tory period lasting for about a tenth of a second. Although a strong stimu¬
lation does not produce a bigger impulse, it can stimulate the neuron
more frequently, by catching it earlier in the refractory period — inten¬
sity of stimulation is translated into frequency of firing. A cell fired at
a rapid rate for a prolonged period begins to fatigue by building up
sodium ions on the inside of the cell, and it can take an hour or so to
recover from this.
5 The action potential is driven by positive sodium ions located on the
outside of the semipermeable membrane moving through it to the inside
and exchanging with positive potassium ions, which move out, creating a
negative charge on the outside of the membrane. This destabilizes the
adjacent region of the axon and the process takes place again, which
again destabilizes an adjacent region, etc. In this way the negative charge
moves down the axon. Immediately after destabilization the cell begins
to re-establish the original balance by pumping out the sodium ions from
the inside surface (see figure 3.10). This process takes about 1 mil¬
lisecond (0.5 msec in large fibers, 2 msecs in small fibers).
6 Cells can be inhibited from firing. This is the hyperpolarization vs. the
depolarization of cells. Polarization is when negative ions and positive
ions are on different sides of the membrane.
7 Cells can summate inputs from different sources. Summation can occur
on a sensory surface or at a synapse. Since the probability that a single
impulse will fire a neuron is low, summation increases the probability of
firing.
(A) polarization of resting neuron. (B) passage of one impulse (shaded region) along the axon;
showing that two or more impulses can occur at the same time in the neuron, since a second
one can be started in the “recovered” region as soon as the first has moved along the fiber
and the cell body has recharged itself. The process is known to be far more complex than
diagram A would suggest.
Figure 3.10 Polarization and passing of an impulse (from Hebb, 1972: 70, figure 29)
Study questions
What is “phrenology”?
What sort of methodology did Gall use for testing phrenological hypotheses?
What are some typical examples of local mental faculties from Gall or
Spurzheim?
What was the original evidence for the “localization” of brain function for
language (Broca, Wernicke)?
What was the controversy between nerve-net theory (Golgi) and the neuron
doctrine (Cajal)?
In what four different senses was the neuron the basic unit of the nervous
system?
What challenges have there been to the doctrine that the neuron is the
anatomical unit of the nervous system?
What three features challenged the doctrine that the neuron is the physiological
unit of the nervous system?
Notes
1 “Animal spirits” were taken to be a highly purified component of blood, which was
filtered out before reaching the brain.
2 In the metaphysics of the time, which came basically from the Greeks, especially
Aristotle, a substance is any kind of thing whose existence does not depend on the
existence of another kind of thing.
3 In some ways, contemporary “connectionist” cognitive modeling (see Part III)
can be seen as an intellectual descendant of this earlier nerve net theory, but at a
functional, rather than a structural level.
Suggested reading
History
The best single source for the history of the study of the nervous system is Finger
(1994). Corsi (1991) also contains good historical text and some of the best color plates
available. The rise of the neuron doctrine is explored in fascinating detail in Shepherd
(1991), which contains (pp. 239-53) the first half of Cajal’s influential Croonian
Lecture. This chapter relies heavily on these sources, though all detailed citations have
been suppressed to make this chapter readable. For more on Descartes’s psychology,
see Hatfield (1992) and references therein, and for more on Descartes’s internalism see
McCulloch (1995), chapter 1.
The history of cerebral localization is surveyed in Young (1970), see especially
chapter 1 on Gall and chapter 4 on Broca. Boring (1957), chapter 3, contains
interesting biographical and bibliographical information on both Gall and Spurzheim.
See Fodor (1983) for an interesting discussion of Gall’s important non-phrenological
contributions to the architecture of the mind. See also Corsi (1991), chapters 3 and 5.
Biology
For an authoritative introduction to the biology of the neuron and nervous system
see Shepherd (1994), especially chapters 1-9, as well as Kandel et al. (1995). A well-
illustrated introductory textbook covering many of the same topics is Beatty (1995).
For an authoritative and detailed look at the state of knowledge of the neuron at mid¬
century see Brink (1951). A highly readable overview of the contribution of neuro¬
science to cognitive science can be found in Gardner (1985), chapter 9. Churchland
(1986) briefly surveys the history of neuroscience, outlines the modern theory of
neurons and functional neuroanatomy, then plunges into its philosophical implications.
A good chapter-length survey of neuroscience can be found in Stillings et al. (1995),
chapter 7.
4
Neuro-Logical Background
4.1 Introduction
We saw earlier that in 1890 William James had proposed that the activity of a
given point in the brain, at a given time, is the sum of other points of activity
inputting to that point, and the strength of each input is proportional to:
1 the number of times each input accompanied the activity of the given
point;
2 the intensity of such activation;
3 the absence of another rival point where this activity could be diverted.
The idea behind (1) and (2) seems to be that the strength of the connections
between points could be increased by the frequency and intensity of co¬
activation (recall James’s associational principles of frequency and intensity in
partial recall). So each input has a strength or “weight” (again, see the “James
neuron” of chapter 1). Condition (3) seems redundant if the given point is
doing summation of its input, since diverted activity simply will not be counted
as input. However, we might play with the idea that “diversion” is a primitive
form of “inhibition” - which has veto power over the summation process. So
taken, we are on the road to our next type of unit, the so-called McCulloch
and Pitts neuron.
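To make the proposal concrete, here is a small illustrative sketch (our construction, not James's or M&P's) of such a unit: its activity is the weighted sum of its inputs, with each weight standing in for the accumulated frequency and intensity of past co-activation, and with "diversion" treated as an inhibitory input that vetoes the sum.

```python
# A toy "James neuron" (our reconstruction, not James's own formulation):
# activity is the weighted sum of co-active inputs, with each weight standing
# for the accumulated frequency and intensity of past co-activation; an active
# "diverting" (inhibitory) input simply vetoes the summation.

def james_unit(inputs, weights, diverted=False):
    """Return the unit's activity: 0 if vetoed by diversion, else the weighted sum."""
    if diverted:
        return 0.0
    return sum(w * x for w, x in zip(weights, inputs))

print(james_unit([1, 1, 0], [0.5, 0.25, 0.75]))                  # 0.75
print(james_unit([1, 1, 0], [0.5, 0.25, 0.75], diverted=True))   # 0.0
```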
The first generally cited study of the formal, computational properties of the
nervous system is the 1943 paper by McCulloch and Pitts (hereafter M&P).
The paper is rather condensed and at times obscure, even by industry
standards. We will rehearse their basic finding, sticking close to their own
discussion. M&P begin by reviewing certain “cardinal assumptions” of the
theoretical neurophysiology of their time. They summarize these for the
purposes of formalization in their calculus as follows (1943: 22):'
That is, each neuron can be assigned the proposition that conditions sufficient
for its firing have been met:
(PI') Every neuron can be assigned a proposition of the form: the conditions
for my activation are now met.
When the neuron fires, that proposition will be true; when it does not fire, that
proposition will be false. Hence the “all-or-none” feature of neurons is mapped
into the truth-values of a proposition. Furthermore:
And so the “switching circuitry” of the nervous system gets mapped by the
two-valued logic of propositions. M&P proceed to construct a formal system
for modeling neural activity, and proving certain theorems about such models.’
The system itself is quite opaque,* but their diagrammatic representation
of neural nets was influential, and has been incorporated into the literature
in one form or another.’ For example, in figure 4.2, the firing of neuron 2
represents the fact that 1 has fired. In figure 4.3, the fact that neuron 3 has
fired represents the fact that neuron 1 or 2 has fired. In figure 4.4, the firing
of neuron 3 represents the fact that neurons 1 and 2 both have fired.
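These three nets are easily emulated with simple threshold units. The sketch below is our own rendering of the idea rather than M&P's notation: a unit fires (outputs 1) exactly when the sum of its inputs reaches its threshold, and the thresholds are chosen so that the downstream neuron's firing at one time step represents what its input neurons did at the previous step.

```python
# Illustrative threshold units for the delay, OR, and AND nets of figures
# 4.2-4.4 (our rendering, not M&P's notation). A unit fires (1) just in case
# the sum of its inputs reaches its threshold; its firing at time t represents
# what its input neurons did at time t - 1.

def mp_unit(inputs, threshold):
    return 1 if sum(inputs) >= threshold else 0

def delay(n1):          # figure 4.2: neuron 2 fires iff neuron 1 fired
    return mp_unit([n1], threshold=1)

def or_net(n1, n2):     # figure 4.3: neuron 3 fires iff neuron 1 or 2 fired
    return mp_unit([n1, n2], threshold=1)

def and_net(n1, n2):    # figure 4.4: neuron 3 fires iff neurons 1 and 2 fired
    return mp_unit([n1, n2], threshold=2)

for n1, n2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(n1, n2, delay(n1), or_net(n1, n2), and_net(n1, n2))
```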
Figure 4.2 A delay net (from McCulloch and Pitts, 1943, figure 1a)
Figure 4.3 An OR net (from McCulloch and Pitts, 1943, figure 1b)
Figure 4.4 An AND net (from McCulloch and Pitts, 1943, figure 1c)
(Figure 4.5 tabulates the signals on input fibers a to d and the resulting signals on the
output fibers of several M&P neurons computing functions such as majority, "a and b
but neither c nor d," and NOT.)
Figure 4.5 M&P neurons (from Minsky, 1967: 35, figure 3.1-1)
The part of a Turing machine not dedicated to the tape (memory) is its finite
state control, so the theorem also establishes the equivalence of M&P nets
and finite state automata. We will return to Turing machines and their
computational power in chapter 6.
Psychological consequences
M&P conclude with some very general (and obscure) remarks on the episte¬
mology of nets. At one point, however, they remark: “To psychology, however
defined, specification of the net would contribute all that could be achieved
in that field - even if the analysis were pushed to ultimate psychic units or
‘psychons,’ for a psychon can be no less than the activity of a single neuron.
Since that activity is inherently propositional, all psychic events have an
intentional, or ‘semiotic,’ character. The ‘all-or-none’ law of these activities,
and the conformity of their relations to those of the logic of propositions,
insure that the relations of psychons are those of the two-valued logic of
propositions. Thus in psychology, introspective, behavioristic or physiological, the
fundamental relations are those of two-valued logic" (1943: 37-8; emphasis
added).
This proved to be a very influential doctrine, but is there any reason to
accept it? First, it is not clear why they infer, from the all-or-none activity of
neurons, that the fundamental relations in psychology are those of two-valued
logic. It is true that neurons can be assigned the propositions M&P assign to
them, and that the relations between them can be formalized by propositional
connectives, but it is certainly not necessary. For instance, for all we know
thoughts may correspond to patterns of neural activity, where the patterns are
statistically defined and are not Boolean functions of constituent elements.
Second, the propositions M&P associated with individual neurons and net¬
works of them do not give the right semantics for thought. The propositions
we typically think are about people, places and things - rarely about the suffi¬
cient conditions for neurons firing. So even if propositional logic models the
firing of neurons, it does not follow that it models the thought those firings
instantiate. Third, it is not clear from the text what relationship they see
between thinking (thoughts), the formal nets, neural nets, and the propositions
assigned to such nets. At times they write as if only neural activity instantiates
thought. At other times they write neutrally about the psychology of “nets.”
What is at issue here is the question of “multiple realizability” or “multiple
instantiation.” That is, does thought reside just in the (logical) organization of
the net, or does it matter what kind of material the net is constructed out of?
If the latter, then thought may be realized only in nervous systems (or causally
equivalent systems). If the former, then any matter could have thought pro¬
vided it were organized in a causally sufficient way. In particular, a silicon-
based machine could have thoughts if it could be designed to instantiate the
appropriate net. This may be the beginning of the influential doctrine that the
hardware does not, within limits, matter. (We return to this question shortly,
after Turing machines.)
4.3 Perceptrons
The perceptron created a sensation when it was first described. It was the first
precisely specified, computationally oriented neural network, and it made a
major impact on a number of areas simultaneously.
(Anderson and Rosenfeld, 1988: 89)
The study of perceptrons marks a kind of double turning point in the history
of connectionism, first in their relation to the “cybernetics” movement of the
time, and second in relation to the just developing digital computational
movement of the time. First, their introduction in the 1950s by Frank
Rosenblatt added some much-needed discipline to the chaotic “cybernetics”
movement. As Rosenblatt noted early on: “Those theorists . . . have generally
been less exact in their formulations and far from rigorous in their analyses,
so that it is frequently hard to assess whether or not the systems that they
describe could actually work in a realistic nervous system . . . the lack of an
analytic language comparable in proficiency to the Boolean algebra of the
network analysts has been one of the main obstacles. The contributions of
this group should perhaps be considered as suggestions of what to look for
and investigate” (1958: 389). Second, as we will see, perceptrons were the
subject of precise scrutiny and withering criticism by Minsky and Papert
(1969), and this work helped to turn the tide against the “neuro-logical”
approach of the 1950s and towards the digital computational approach of the
1960s and 1970s.
Rosenblatt
About ten years after McCulloch and Pitts published the results of their
studies, Frank Rosenblatt and his group began studying a device called the
“perceptron,” which has been described as “a McCulloch-Pitts network with
modifiable connections.”” Rosenblatt’s original introduction of the perceptron
(a “hypothetical nervous system or machine,” 1958: 387) was in opposition
to the “digital computer” view, according to which “storage of sensory infor¬
mation is in the form of coded representations” (1958: 386). Rosenblatt
grants that “the hypothesis is appealing in its simplicity and ready intelligi¬
bility” (ibid.), but it has led to: “a profusion of brain models which amount
simply to logical contrivances for performing particular algorithms” (ibid.).”
Rosenblatt continues: “The models which have been produced all fail in
some important respects (absence of equipotentiality, lack of neuroeconomy,
excessive specificity of connections and synchronization requirements, unre¬
alistic specificity of stimuli sufficient for cell firing, postulation of variables or
functional features with no known neurological correlates, etc.) to correspond
to a biological system” (1958: 388). This reads like a contemporary list of the
“lures of connectionism,” as we will see. According to Rosenblatt, no fine-
tuning of the computer model will correct these problems: “a difference in
principle is clearly indicated” (ibid.). What was needed was an alternative
to the “network analysts,” an alternative which would provide a language: “for
the mathematical analysis of events in systems where only the gross organiza¬
tion can be characterized, and the precise structure is unknown” (1958: 387-8).
To this end Rosenblatt lists the assumptions on which he modeled his
perceptron:
Figure 4.6 Organization of the original perceptron (from Rosenblatt, 1958: 389, figure 3)
Rosenblatt (1958) outlines two main perceptron architectures: the first with
three layers of connections and four layers of units, the other with two layers
of connections and three layers of units. He concentrates his attention on
the simpler one.'^ The four-layer (units) perceptron includes a Retina, an A-I
Projection Area, an A-II Association Area, and a set of Responses. The first
layer of connections is localized, the second and third are random (see figures
4.6 and 4.7).
There are five basic rules of organization of the perceptron (here somewhat
simplified):
(A diagram appears here showing stimuli feeding units through connections with
modifiable weights (w), marked excitatory (+) or inhibitory (-), and with a modifiable
threshold (θ).)
2 Impulses are transmitted to sets of cells (A-units) in A-I and A-II. (The
A-I cells may be omitted.)
The set of retinal points transmitting impulses to a particular A-unit
will be called the origin points of that A-unit. These origin points may
be either excitatory or inhibitory. If the sum of excitatory and inhibitory
impulses is equal to or greater than threshold, then the A-unit fires on
an all-or-nothing basis (see figure 4.8).*'*
3 Between the projection area and the association area connections
are assumed to be random.
4 The responses are cells that respond in much the same fashion as A-
units. The arrows indicate that up to A-II transmission is forward, but
between A-II and the responses there is feedback. Feedback can in prin¬
ciple be either excitatory to its own source-set, or inhibitory to the com¬
plement of its own source-set. The models investigated typically used
the second pattern. The responses of such a system are mutually exclu¬
sive in that if R-1 occurs it will inhibit R-2 and its source set, and vice
versa.
5 For learning to take place it must be possible to modify the A-units
or their connections in such a way that stimuli of one class will tend
to evoke a stronger impulse in the R-1 source set than in the R-2
source set.
Imagine we are training a perceptron to sort pictures of males (M) and pictures
of females (F). If we give it an F and it responds with an F, then ignore the
weights on the connected A-units and R-units. If we give it an M and it
responds with an M, do the same. If we give it an F and it responds with an M,
change the weights on the active connected A-units and R-units (lowered if the
response should have been -1, raised if the response should have been +1). If
we give it an M and it responds with an F, do the same (see Block, 1962: 144).
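That training regime can be sketched as a simple error-correction rule operating on the modifiable connections to a single response unit. The sketch below is a simplified illustration, not Rosenblatt's or Block's formulation: the 0/1 vectors standing in for A-unit activity, the labels +1 and -1 for the two response classes, and the learning rate are all our own choices.

```python
# Simplified sketch of perceptron training on a single response unit (our
# illustration, not Rosenblatt's or Block's formulation). Each sample is a 0/1
# vector of A-unit activity with a desired response of +1 (say, F) or -1 (say,
# M). Weights change only on errors: raised when the answer should have been
# +1, lowered when it should have been -1, as in the M/F example above.

def respond(weights, a_units, threshold=0.0):
    s = sum(w * a for w, a in zip(weights, a_units))
    return 1 if s >= threshold else -1

def train(samples, n_units, rate=1.0, passes=20):
    weights = [0.0] * n_units
    for _ in range(passes):
        for a_units, target in samples:
            if respond(weights, a_units) != target:          # wrong response
                for i, a in enumerate(a_units):
                    weights[i] += rate * target * a          # raise or lower
    return weights

samples = [([1, 0, 1], 1), ([0, 1, 1], -1), ([1, 1, 0], 1), ([0, 0, 1], -1)]
w = train(samples, 3)
print(w, [respond(w, a) for a, _ in samples])   # weights now sort all samples
```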
Conclusion
In the early 1980s connectionists found ways around some of the limitations
of simple perceptrons, and network studies now flourish — though Minsky and
Papert suspect that their original doubts about the limitations of perceptrons
carry over, with minor modifications, to current work (see the Epilogue:
The New Connectionism, to the 1988 edition of their 1969 book). We will
return to this question after an introduction to connectionist machines in
chapter 11.
The connective XOR and its negation illustrate a simple but important lesson
concerning the computational power of nets of neuron-like units organized as
(elementary) perceptrons.
Figure 4.9 A simple M&P OR neuron (from Anderson, 1995: 49, figure 2.9; reproduced
by permission of the MIT Press)
Linear separability
Imagine a unit with two inputs, P and Q, and a threshold such that if the sum
of the inputs is over threshold, then the output is 1, otherwise it is 0 (as shown
in figure 4.9). Suppose inputs are the output of some other units, so they too
are either 1 or 0. Let’s suppose that the threshold for the unit is such that it
gives a 0 if both P and Q are 0, but it gives a 1 otherwise (this is the rule for
OR):

P    Q    P OR Q
1    1    1
0    1    1
1    0    1
0    0    0

We can let each input line be a “dimension” and since there are here two
input lines we have two dimensions, and the possible inputs to our OR-unit
can be diagrammed in a two-dimensional plane. On this we plot the conditions
under which a certain truth function of x, y is off (0) or on (1), where
x = the first value and y = the second. We plot OR as in figure 4.10. In this
graph we can draw a straight line separating the off states (0) from the on states
(1) (figure 4.11). This shows that OR is linearly separable. It can be shown that
14 of the 16 elementary truth functions are also linearly separable - only XOR
and its negation (see below) are not.
XOR
The two truth functions that are not linearly separable are exclusive “or”
(XOR) and its negation (see figure 4.12):

P    Q    P XOR Q    NOT(P XOR Q)
1    1    0          1
1    0    1          0
0    1    1          0
0    0    0          1

That XOR and its negation are not linearly separable can be seen from its
graph (figure 4.13). It takes two lines to
separate the on states (1) from the off states (0). This shows that XOR is not
linearly separable. The same is obviously true for the negation of XOR, since
its values are represented simply by exchanging 0s for 1s in the graph.
This idea of linear separability can be generalized to more than two input
units (more than two dimensions). For instance, we might have a unit con-
nected to three input lines: P, Q, R. To diagram this we would need a three-
dimensional space, such as a cube. But a line would not divide it into two
regions, one on, one off. Rather, we need a plane to do this. With more than
three input lines most people’s spatial intuitions abandon them, but the idea is
the same. The surface that separates such higher-dimensional spaces is called
a “hyperplane.” More exactly, the equation for this “hyperplane” just is: the
points in this space where the sum of the product of synaptic weights times inputs
equals the threshold. If a hyperplane exists for a space of inputs, then that
category of inputs is linearly separable, and can in principle be learned.
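One way to see the point computationally is to search by brute force for weights and a threshold that make a single threshold unit compute a given two-input truth function; such weights exist for OR (and for the other 13 separable functions) but not for XOR or its negation. The sketch below is our own illustration; the small integer search range is a convenience that happens to suffice here.

```python
# Brute-force search for a single threshold unit computing a given two-input
# truth function: output 1 just in case w1*x + w2*y >= threshold. The small
# integer search range is an illustrative convenience; it suffices here.

from itertools import product

def find_unit(truth_fn):
    for w1, w2, theta in product(range(-3, 4), repeat=3):
        if all((w1 * x + w2 * y >= theta) == bool(truth_fn(x, y))
               for x, y in product([0, 1], repeat=2)):
            return (w1, w2, theta)
    return None                     # no single unit computes this function

print(find_unit(lambda x, y: x or y))      # (1, 1, 1): OR is linearly separable
print(find_unit(lambda x, y: x != y))      # None: XOR is not
print(find_unit(lambda x, y: x == y))      # None: nor is its negation
```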
Single McCulloch and Pitts units can compute 14 of the 16 possible truth func¬
tions of two inputs. The two they cannot compute are exclusive “or” (XOR)
and its negation. However, a McCulloch and Pitts net can compute XOR (and
its negation) by simply conjoining an AND unit to two input units such that
the AND unit turns the system off if and only if both of the input units
are on.
Figure 4.14 A “perceptron” that computes XOR (from Quinlan, 1991: 33, figure 1.11)
it? Certainly some networks that have been called “perceptrons” can, as in
figure 4.14 (where the threshold on ψ is 1). Note the weights that the network
uses: <1, 1, -2>. Could these be learned by the perceptron training proce¬
dure? And if they could, would XOR be an example of a linearly inseparable
function learnable by the perceptron training procedure? (And if not, is XOR
an example of a function a perceptron can compute, but not learn how to
compute?)
The answer is that yes, this perceptron could learn <1, 1, -2>, and so it
would be an example of a linearly inseparable function learnable by the per¬
ceptron training procedure. The trick is that we have made the first layer of
connections unmodifiable and we have tacitly set them uniformly by hand at 1
each: <1, 1, 1, 1>. These values have not been learned by the perceptron train¬
ing procedure, and when combined with the weights on the modifiable layer
of connections, they cannot be guaranteed to be learned by that procedure.
Recall that the perceptron convergence theorem states that if the function is
linearly separable, then with enough training a perceptron is guaranteed to
learn it. We now see that the converse has the form: if the data are not linearly
separable, then there is no guarantee that enough training will eventuate in the
perceptron learning them. The perceptron might hit on a solution, as with
XOR above, but there is no guarantee that it will.
Furthermore, this perceptron, though it may learn XOR with the set of
weights <1, 1, 1, 1> on the unmodifiable layer, will not be able to learn a
function requiring the first layer not to be weighted <1, 1, 1, 1>. That is, a
function requiring the first layer to take on some other values than uniform 1s
will not be representable by this perceptron, and not learnable via the training
procedure. If we were to go in and change these weights by hand so that the
perceptron could learn the correct weights on the modifiable layer, then there
would be some other function (perhaps XOR) that the perceptron could not
learn.
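The net just described can be checked directly. The sketch below is one possible wiring, consistent with the weights <1, 1, -2> and the threshold of 1 on ψ mentioned above (the exact layout of figure 4.14 may differ): a hidden AND unit sees both inputs and feeds the output unit with weight -2, alongside the two inputs themselves with weight 1 each.

```python
# Checking an XOR net of the kind described above (one possible wiring, using
# the weights <1, 1, -2> and an output threshold of 1): a hidden AND unit
# vetoes the output unit exactly when both inputs fire.

def step(total, threshold):
    return 1 if total >= threshold else 0

def xor_net(p, q):
    hidden_and = step(p + q, threshold=2)             # fires iff P and Q both fire
    return step(1 * p + 1 * q - 2 * hidden_and, threshold=1)

for p, q in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(p, q, xor_net(p, q))     # 0, 1, 1, 0: exclusive "or"
```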
The idea behind “simple detector semantics,” as we will call it, is that a neural
process (or more generally any “network”) is about what turns it on - the
neural activity indicates the presence of or detects the features that the neurons
are tuned to: the unit or set of units is when the assigned feature (an object,
property, or state of affairs in the environment) is present, the unit is off
otherwise. The slogan is:
To understand what the frog’s eye tells the frog’s brain it is important to
understand some characteristics of the frog.
The frog
The detectors
The frog’s eye-to-brain pathways contain four fiber groups which are concen¬
tric in their receptive fields. Moving from the center of the array outwards one
encounters:
1 The contrast detector which tells, in the smallest area of all, of the
presence of a sharp boundary, moving or still, with much or little
contrast.
2 The convexity detector which tells, in a somewhat larger area, whether or
not the object has a curved boundary, if it is darker than the background
and moving on it; it remembers the object when it has stopped, pro¬
viding the boundary lies totally within that area and is sharp; it shows
most activity if the enclosed object moves intermittently with respect
to a background. The memory of the object is abolished if a shadow
obscures the object for a moment.
3 The moving edge detector which tells whether or not there is a moving
boundary in a yet larger area within the field.
4 The dimming detector which tells how much dimming occurs in the
largest area, weighted by distance from the center and by how fast it
happens.
The convexity detector (fiber 2 above) turns out to have some useful features:
“Such a fiber [fiber 2] responds best when a dark object, smaller than a recep¬
tive field, enters that field, stops, and moves about intermittently thereafter.
The response is not affected if the lighting changes or if the background
(say a picture of grass and flowers) is moving, and is not there if only the back¬
ground, moving or still, is in the field. Could one better describe a system for
detecting an accessible bug?” (1959: 253-4; emphasis added). Lettvin et al. give
the semantics of this detector as follows: “What, then, does a particular fiber
in the optic nerve measure? We have considered it to be how much there is in
a stimulus of that quality which excites the fiber maximally, naming that quality”
(1959: 253; emphasis added). This last passage tells us that the authors take
the firing of the detector to “name” a particular stimulus quality; that is, the
firing of the detector represents the presence of that stimulus quality. The first
passage suggests that, in its natural environment at least, the stimulus quality
detected is had mostly by accessible bugs. Likewise, we might discover that
certain neurons in the cat’s visual cortex are on when exposed to a bar of light
forming a horizontal line, whereas other neurons are on when exposed to a bar
of light forming a vertical line or specific angles in between. We might say the
cat has oriented "edge detectors." Various other detectors have been reported
in the literature, though contrary evidence has emerged as well.
The firing of "fiber 2" is said to signal the presence of bugs because bugs turn it
on. Such a simple idea is bound to have problems, and this one is no exception.
Here we focus on problems arising specifically out of the underscored
idea.
In the course of causing fiber 2 to fire, photons have been emitted from the
sun, bounced off particles in the atmosphere, bounced off grass and trees,
bounced off a bug only to be absorbed in a frog's retina and transformed into
electrical impulses which give rise to other electrical impulses which eventually
cause fiber 2 to fire. Question: why say that it is (just) the bug that the
firing of fiber 2 detects, when there are all of these other causal components?
Why isn't fiber 2 detecting the retina, or the tree or the sun? What determines
the correct depth of the causal chain?
If we want to identify the relevant cause of the fiber's firing as a bug, what justifies
favoring that categorization? After all, the piece of matter reflecting the
light can also correctly be categorized in a whole variety (or "spread") of other
ways: winged thing, potential pest at picnics, potential victim of insect spray, small
dark thing, erratically moving blob 93 million miles from the sun, and so forth.
Why say that the firing picks its target out qua (as a) bug, and not qua (as)
any of these other candidates? These two problems can be illustrated as in
figure 4.15.
[Figure: a sample of water, and the class of things that have the same chemical structure]
Figure 4.15 The depth problem and the spread problem (from Braddon-Mitchell and
Jackson, 1996: 69, figure 3; reproduced by permission of the publisher). The depth
problem: why does "water" pick out a property of something at a certain point, at a
certain "depth," in the causal chain that ends up with our using the word "water"? The
spread problem: why does "water" pick out from the very many properties of water the
property that it does?
graph of the natural habitat of a frog from a frog’s-eye view, flowers and grass.
We can move this photograph through the receptive field of such a fiber, waving
it around at a 7-inch distance: there is no response. If we perch with a magnet
a fly-sized object 1 degree large on part of the picture seen by the receptive
field and move only the object, we get an excellent response. If the object is
fixed to the picture in about the same place and the whole moved about, then
there is none” (1959: 242-3). Notice what has happened. The frog story
originally had two components: first, there is our little semantic theory (SDS)
which says that the firing of fiber 2 represents what turns it on; second, there
is the description of that fiber as a “bug detector.” But then we just saw that
fiber 2 also responds to an erratically moving magnet (MM). Here is the
problem. Fiber 2 is described as a “bug detector,” but it also responds to MMs.
So by (SDS) it is also an MM detector. So what it detects is either a bug OR
a magnet - it is a bug-OR-magnet detector. However, if it is (really) a bug-OR-
magnet detector, then it correctly represents the MM as a bug-OR-MM. It does
not misrepresent the MM as a bug. This makes misrepresentation impossible.
However, misrepresentation is possible, so one of these two components must be false. In short, a
potential case of misrepresentation is converted by the theory (SDS) into the
correct representation of a disjunction, since the fiber fires when exposed to a
bug or a magnet. In fact, there is no reason in the story so far to call fiber 2 a
“bug detector” (that also misdetects magnets) rather than a “magnet detector”
(that also misdetects bugs).
What are we to do? We could (1) claim that Lettvin et al. misdescribed the
semantics of fiber 2 in calling it a bug detector - rather it is a small-dark-
erratically moving-blob detector - notice that the way it is first introduced
is as a “convexity” detector (which isn’t accurate either). In which case MMs
are not misrepresented at all and there is no problem. Or we could (2) modify
(SDS) by adding something about the “natural environment” of the frog:
The idea is that fiber 2 evolved to help the frog survive by locating food in an
environment replete with bugs (not MMs). In either case, however, we have to
modify the original story.
Notes
1 One “cardinal assumption” of then current neuroscience not reflected in this list
is that signal velocities can vary with diameter of the axon.
2 The “all-or-none” character of the action potential which dominated M&P’s
discussion has since (due to more sophisticated recording techniques) been seen
to be accompanied by graded activity spread out over many milliseconds. These
“cardinal assumptions” also did not include the role of neurotransmitters in this
activity.
3 In effect each neuron has an associated threshold which must be met or exceeded
in order to fire.
4 In effect, inputs all have the same strength or “weight.”
5 In effect, there is a refractory period during which a neuron cannot fire.
6 Note that the unit does not sum excitatory and inhibitory inputs, as do later formal
neurons in perceptrons and (most) connectionist networks.
7 The system was based on the work of Whitehead and Russell (1925) and Carnap
(1937).
8 For instance, Minsky (1967: 36): "The original McCulloch-Pitts paper is recommended
not so much for its notation as for its content, philosophical as well as
technical.”
9 For instance, McCulloch and Pitts’ diagrams were used by von Neumann in his
EDVAC report of 1945 - whereof more later.
10 In these diagrams two filled circles are required to excite a neuron; contrast
AND with OR. In other words, the threshold is implicitly 2.
11 See Cowan and Sharp (1988).
12 Here Rosenblatt mentions Minsky - among others. Perhaps reading of one’s work
as a “logical contrivance” motivated some of Minsky’s subsequent comments.
13 Here we are counting actual layers of connections, not just modifiable
layers.
14 A Rosenblatt neuron first sums its weighted inputs, then, if that sum is equal to
or above threshold, it outputs. If the sum does not equal or exceed threshold it is
quiet.
15 Or it may simply be significantly more active. We will mean by "on": on (vs. off)
or significantly more active; mutatis mutandis for "off."
16 See Hubel and Wiesel (1979).
17 See Gross et al. (1972) for “hand detectors” in macaques.
18 See Stone (1972).
19 See Fodor (1984, 1987). Cummins (1989, ch. 5) calls this the “misrepresentation
problem.” We follow Cummins here.
Study questions
What five assumptions about the nervous system did M&P make for their
formalization?
What two basic principles connect neural networks with formal systems?
Rosenblatt: perceptrons
What five assumptions about the nervous system were built into the
perceptron?
Original perceptron
How was the original perceptron organized: what were its layers of units, what
were the connections between layers of units?
How was the elementary (simple) perceptron organized: what were its layers
of units, what were the connections between layers of units?
What are three conclusions Rosenblatt draws from his experiments with
perceptrons?
Can a (simple) perceptron learn only functions that are linearly separable?
Can a (simple) perceptron compute and learn to compute a function that the
perceptron convergence theorem does not guarantee it can learn?
Simple detector semantics (what the frog's eye tells the frog's brain)
What four detectors were discovered in the frog; i.e. what did each do?
What is the structure of the receptive fields for the four detectors discovered
in the frog?
Why call it a "bug detector" if, as Lettvin et al. showed, a frog will attempt to
eat an appropriately moving magnet?
Suggested reading
General
The best collection of writings on this period is Anderson and Rosenfeld (1988), which
contains a wealth of articles on related matters as well as helpful introductions. A useful
overview discussion of McCulloch and Pitts networks, perceptrons, and XOR can be
found in chapter 1 of Quinlan (1991). This chapter has the additional advantage of
relating these topics to both associationism and connectionism. Another good short
survey discussion is Cowan and Sharp (1988). A concise recent survey of this period
can be found in McLeod et al. (1998). Anderson and Rosenfeld (1998) contains
fascinating interviews with some of the pioneers of neural modeling.
Minsky (1967), chapter 3, contains a clear formal discussion of McCulloch and Pitts
networks, and most of the surveys listed above discuss M&P networks.
Perceptrons
In addition to the surveys listed above there are numerous short discussions of perceptrons
in the literature. See, for instance, Wasserman (1989), chapter 2, and Caudill
and Butler (1993), chapter 3. Unfortunately, they are marred as a group by different
characterizations of the notion of a perceptron. It is useful to go back and look at
Rosenblatt's own discussion (1962). Block (1962), sections 1-8, provides a lucid introduction
to perceptrons as neurally inspired computational devices. A good general
discussion of perceptron learning can be found in Nilsson (1965/90).
Part II
The Digital Computational
Theory of Mind
[Diagram: RTM, CTM, DCTM, CCTM]
Our approach will be semi-historical. The RTM goes back at least to the British
empiricists Locke and Hume (see chapter 1), but the CTM seems to have
emerged only in the middle of this century with the invention of the modern
digital computer, even though general purpose programmable “calculating
machines” go back at least to Babbage in the middle of the nineteenth century.
The idea that digital computers can be programmed to think or display some
intelligence is usually credited to Alan Turing, though it is not as clear that he
also championed the idea that human thinking or intelligence is a case of digital
computation. The idea that human cognition might be a species of computation
is natural if one simultaneously holds that (1) minds mentally manipulate
mental representations, and (2) digital computers are algorithmic symbol
manipulators.
Our strategy in Part II is to first introduce the reader to some elements of
digital computation (which is relevant to cognitive science) without bringing
in cognitive modeling. If computers algorithmically manipulate symbols, then
we can ask after the manipulation aspect of computers, and the symbol aspect
of computers. This is what we do. We first (chapter 5) illustrate digital modeling
with the historically important demonstration project SHRDLU. We
then (chapter 6) survey some digital architectures that were influential for different
reasons: Turing machines, von Neumann machines, production systems,
and Pandemonium. We taxonomize these architectures along dimensions of
memory and control. Next (chapter 7) we survey the strengths and weaknesses
of three popular symbol schemes: logic (predicate calculus), semantic networks,
and frames/scripts. Then, with this digital computational material in
place, we turn to formulating (chapter 8) and criticizing (chapter 9) the concept
of mind inspired by this computational story.
5.1 Introduction
VP: Do you think that I can use the computer this morning?
ELIZA: Why do you ask?
VP: I might be able to make some additional telcomp sales.
ELIZA: Why aren't you sure?
VP: My prospective customers have never seen a demonstration of the system.
ELIZA: Does this mean very much to you?
VP: Of course it does.
ELIZA: You seem very positive about that.
VP: This is ridiculous.
ELIZA: What do you mean this is ridiculous?
VP: Is there any reason why I can’t use the computer now?
ELIZA: Tell me more about your prospective customers.
VP: Please dial me up on 491-1850.
Who says people don’t talk like ELIZA? Still, despite ELIZA’s seductive appeal
(one secretary, who knew ELIZA was a computer program, still asked others
Figure 5.1 “Pick up a big red block” (from Winograd, 1972: 8, figure 2; reproduced by
permission of Academic Press)
to leave the room while they chatted) there was nothing going on behind the
scene: no reasoning, no planning, no semantic (meaning) analysis, no conversational
strategies, no awareness of the environment, no knowledge, etc.
Terry Winograd’s 1971 MIT doctoral dissertation appeared to substantially
change all that. SHRDLU' was designed to “understand” (the scare-quotes
are Winograd’s) language in a limited domain: a micro-world of colored blocks,
colored pyramids, and a box - all on a table top. SHRDLU provides a syntactic
and semantic analysis of questions, statements, and commands; it reasons
about this blocks world and it plans movements and responses in accordance
with conversational strategies and motives.
The system answers “OK” when it carries out a command. In order to pick
up the red block, it had to clear it off by finding a space for the green one and
moving the green one away.
There are three pyramids in the scene, so the program does nothing.
3 find a block which is taller than the one you are holding and put it into
the box.
The system has to figure out what is meant by “one” and “it,” as well as using
deduction to decide which block it is holding.
OK.
The hand has to set down the red block to pick up the taller blue one.
THE BOX.
6 how many blocks are not in the box?
Negatives such as “not” and “no” are handled whether they appear in clauses
or noun groups (such as “no boxes”).
FOUR OF THEM.
7 is at least one of them narrower than the one which I told you to pick up?
The phrase "at least one of them" must be analyzed to refer back to the previous
question, and "the one which . . ." must also refer to blocks. Noun groups
can have complicated numbers such as “at least one,” “more than three,”
“exactly seven,” etc. The memory of past events is used to deduce which block
is “the one which I told you to pick up.” Notice the complicated syntactic
structure in which “the one” is actually the object of a clause embedded two
levels deeper.
The semantic rules make this a nonsensical question which therefore must
be false.
The question makes sense, but there is no example in the scene, so there is no
easy way to tell without being able to examine the robot’s programs.
14 I own blocks which are not red, but I don’t own anything which supports
a pyramid.
I UNDERSTAND.
Two new theorems are created for proving things about “owning.”
Logical connectives such as “and,” “or,” “either,” etc. are handled in both the
grammar and semantics. A command is executed even if it looks like a question.
The command is carried out by the following series of steps: it puts a
green cube on the large red block (note that it chooses the green cube with
nothing on it), then removes the small pyramid from the little red cube, so that
it can use that cube to complete the stack as specified. . . .
Ordinarily the system would be satisfied with calling this “a big green cube,”
but since we asked “which cube” it has to be more specific in naming it. The
use of substitute nouns (such as "one") and pronouns in answers makes the
discourse more natural.
“A small one” must contrast with the earlier phrase “a large block,” so this
means “a small block.”
OK.
21 put the littlest pyramid on top of it.
Words like “littlest” are not in the dictionary but are interpreted from the root
forms like “little.”
OK.
22 how many things are on top of green cubes?
I’M NOT SURE WHAT YOU MEAN BY “ON TOP OF” IN THE
PHRASE “ON TOP OF GREEN CUBES.”
DO YOU MEAN:
1 - DIRECTLY ON THE SURFACE
2 - ANYWHERE ON TOP OF?
When there are several interpretations and none is clearly better, the system
has to ask the user for more information. The paraphrases were included with
the dictionary definitions of “on,” and the request for more information occurs
automatically when the heuristics for resolving ambiguities aren’t sufficient.
A request like this can also occur when the meaning of a pronoun such as
“it” or “they” is unclear.
{22a I mean}2.
THREE OF THEM.
23 had you touched any pyramid before you put the green one on the
little cube?
YES, THE GREEN ONE.
The system deduces from its memory what event is being referred to, and can
relate other events to it in time. The analysis includes complex tenses of verb
groups, and there are special facilities for keeping track of the various forms
of irregular verbs like “have.”
A time is described by naming the main goal which was being worked on.
25 why?
TO GET RID OF IT
By keeping track of selected parts of the original subgoal tree, the system has
some understanding of its own motives.
“It” refers to the action mentioned in the last question, rather than the answer.
31 how many objects did you touch while you were doing it?
Time references can refer to a time span, using words like “while.”
FOUR OF THEM.
32 what did the red cube support before you started to clean it off?
THE GREEN PYRAMID.
Earlier in the dialogue, new information about “owning” was accepted at face
value. But the system knows that it has complete information about the basic
properties of objects, including location. It checks any statements about
location or other such properties to see whether it agrees or disagrees.
OK.
35 is there anything which is bigger than every pyramid but is not as wide as
the thing that supports it?
The system understands quantifiers such as “every,” “any,” “some,” etc., and
structures such as “bigger than,” “as wide as,” etc. Here “it” refers to a noun
group within the same sentence but two levels up in the parsing tree.
The system interrupts immediately when it sees a word it can’t figure out.
I UNDERSTAND.
38 are there any steeples now?
NO.
39 build one.
OK.
The description given in the definition is enough to deduce a plan for building
new objects as well as recognizing them. The word "one" means "a
steeple,” and the plan is carried out. First, one of the large green cubes is
cleared off by removing the red cube and two pyramids. Then the two green
cubes and the tall red pyramid are stacked up to fit the description.
Once an object has a name, the name is used to help identify it in responses.
44 thank you.
YOU’RE WELCOME!
[Figure components: MONITOR, MOVER, PLANNER, DATA]
Figure 5.2 The organization of the system (from Winograd, 1972: 5, figure 1; reproduced
by permission of Academic Press)
Syntax
Figure 5.3 Sentence parser (from Winograd, 1972: 83, figure 23)
we would get an analysis like that in figure 5.4. The syntax does not operate
automatically and completely; it can call a semantic routine during parsing, and
either syntax or semantics can call on the cognitive-deductive system to help
direct the parsing.
Semantics
[Figure: a parse tree with SENTENCE above DETERMINER and NOUN, above "the giraffe"]
Figure 5.4 Parse of a sentence (from Winograd, 1972: 91, figure 27; figures 5.4-5.6 reproduced
by permission of Academic Press)
Figure 5.5 Procedural description for the concept CLEARTOP (from Winograd, 1973:
169, figure 4.5)
Figure 5.6 Procedural representation of “a red cube which supports a pyramid” (from
Winograd, 1973: 172, figure 4.7)
Cognitive-deductive system
The database of SHRDLU consists of facts like those listed in figure 5.7.
These sorts of facts, plus the procedures, are integrated with a general deductive
system which tries to satisfy a given goal by setting up successive subgoals.
The currently active goal stack looks like the one shown in figure 5.8.
5.4 Limitations
Winograd (1973) ends his discussion with some cautionary notes. He sees
two basic ways in which SHRDLU might be inadequate as a model of
human natural language processing. First, there is the way the process is
directed: the syntax comes up with acceptable phrases and the semantics
(IS B1 BLOCK)
(IS B2 PYRAMID)
(AT B1 (LOCATION 100 100 0))
(SUPPORT B1 B2)
(CLEARTOP B2)
(MANIPULABLE B1)
(CONTAIN BOX1 B4)
(COLOR-OF B1 RED)
(SHAPE-OF B2 POINTED)
(IS BLUE COLOR)
(CAUSE EVENT27 EVENT29)
Figure 5.7 Typical data expressions (from Winograd, 1973: 168, box 4.1)
(GRASP B1)
(GET-RID-OF B2)
(PUTON B2 TABLE 1)
(PUT B2 (453 201 0))
(MOVEHAND (553 301 100))
Figure 5.8 Currently active goal stack (from Winograd, 1973: 171, box 4.2; reproduced by
permission of W. H. Freeman and Co.)
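A toy sketch in Python (not Winograd's PLANNER code; the plans below are our own simplifications) suggests how facts like those in figure 5.7 and goals like those in figure 5.8 interact: a goal is satisfied either by matching a stored fact or by achieving the subgoals of a plan.

facts = {("IS", "B1", "BLOCK"), ("SUPPORT", "B1", "B2"),
         ("CLEARTOP", "B2"), ("COLOR-OF", "B1", "RED")}

plans = {
    ("GRASP", "B1"): [("CLEARTOP", "B1"), ("IS", "B1", "BLOCK")],
    ("CLEARTOP", "B1"): [("GET-RID-OF", "B2")],   # B2 sits on B1, so move it away
    ("GET-RID-OF", "B2"): [],                     # directly achievable in this toy world
}

def achieve(goal, depth=0):
    # Print the growing goal stack, then satisfy the goal from the database
    # of facts or by recursively achieving its subgoals.
    print("  " * depth + str(goal))
    if goal in facts:
        return True
    subgoals = plans.get(goal)
    if subgoals is None:
        return False        # neither a stored fact nor something we have a plan for
    return all(achieve(sub, depth + 1) for sub in subgoals)

print(achieve(("GRASP", "B1")))   # True, after working through CLEARTOP and GET-RID-OF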
5 (a) I dropped a bottle of Coke on the table and it broke. (bottle or table)
(b) Where's the broom? I dropped a bottle of Coke on the table and it
broke. (preferred: bottle)
(c) Where's the glue? I dropped a bottle of Coke on the table and it broke.
(preferred: table)
In spite of all these limitations we should not lose sight of the fact that
SHRDLU was a tour de force of programming showing how to integrate syntactic
and semantic analysis with general knowledge. It showed that if the database
was narrow enough the program could be made deep enough to display
human-like interactions. This inspired expert systems research, such as
MYCIN (Shortliffe), that has produced serious contributions to the field of
knowledge engineering.
Notes
these letters, and typesetters often ‘correct’ a mistake by inserting them in a faulty
line so that the proofreaders will easily spot that a mistake has been made. Bad
proofreading may result in this deliberate gibberish being printed in the final text
- a fact made much of in MAD magazine. Being an ex-devotee of MAD, Winograd
picked this nonsense word as the name for his program." Hofstadter (1979:
628) repeats the same explanation. According to Waltz (1982: 120), “Winograd’s
program is called SHRDLU after the seventh through twelfth most frequent letters
in English” - an opinion repeated by Gardner (1985: 158).
2 From Winograd (1972). It is slightly modified in Winograd (1973).
Study questions
How did ELIZA manage to give the appearance of intelligent conversation for
short periods of time?
What five components does SHRDLU consist of? Say something about each.
What limitations did Winograd and the artificial intelligence (and cognitive-
science) community decide SHRDLU had?
Suggested reading
SHRDLU
The basic work for understanding SHRDLU is Winograd (1972). A shorter version
appears in Winograd (1973). Winograd (1977), lecture 2, places SHRDLU in perspective
by looking at computer systems for knowledge representation and natural
language understanding in general. Wilks (1977), lecture 3, does the same thing from
a different point of view. Winograd (1980) contains a critical discussion of SHRDLU,
and Winograd (1983) is an accessible and authoritative introduction to computer language
processing. Barr, Cohen, and Feigenbaum (1981), chapter 4, survey natural language
processing, as well as SHRDLU. Tennant (1981) surveys the field of natural
language processing and chapter 5 discusses SHRDLU. A more recent semi-popular
discussion of SHRDLU can be found in McTeal (1987). Kobes (1990) explores the
implications of SHRDLU’s virtual blocks world for central doctrines in current
cognitive science.
binary), just as a single algorithm can be coded into many programs from many
languages (Basic, Pascal, Lisp, Prolog). Or better, the relation between a
program and an algorithm is like the relation between a word (sound, shape)
and its meaning - different words can have the same meaning (cat, Katz, chat)
in different languages. Showing that a certain program is defective does not
show that the encoded algorithm is defective, since the problem might be with
the language being used to encode it.
We will also distinguish between two ways in which algorithms (or machines
running programs) can be said to be “equivalent.” The “weak equivalence” of
two algorithms is when they give the same output for each input. For instance,
suppose two multiplying algorithms are each given the numbers 32 and 17 as
input, and they both return 544 as output. If they returned the same output
for all inputs the algorithms would be said to be weakly equivalent. But these
two multiplying algorithms may have gone about their tasks in very different
ways. The first algorithm may have multiplied 32 and 17 by ADDING 32 to
itself 17 times - it could have been built out of just an adder and a counter.
This is called multiplication by successive addition. The second algorithm, on
the other hand, might have multiplied 32 and 17 more like we do:
  32
x 17
----
 224
 32
----
 544
First take 2 and multiply it by 7, then carry the 1 over to the 3, then multiply
the 3 by 7 and add the 1, giving 224. Then do the same with 32 and 1, giving 32.
Then add 224 and 32 as above giving the result 544. Notice that on this method
one multiplies 32 first by a part of 17, viz. 7, then one multiplies 32 by another
part of 17, viz. 1 (in the tens place), then one adds the two partial products together giving the
whole product. Appropriately enough, this is called the partial products method.
This method will give the same output as the successive addition method -
they will be weakly equivalent, but they get their answers in different ways, and
so they are not strongly equivalent. Two (or more) algorithms are strongly
equivalent if they not only produce the same outputs for the same inputs (i.e.
they are weakly equivalent), but they also do it in the same way, that is, they go
through the same intermediate steps. So, of course, (1) if we could show that
one machine goes through steps that another machine does not go through, that
would be fairly direct evidence against their strong equivalence. More indirectly,
(2) one might show that the "complexity profiles" for the two machines
are different. For instance, suppose we give the two machines a wide range of
numbers to multiply. As the numbers get bigger, one machine takes proportionally
longer time to multiply them, but the other machine takes a disproportionately
longer time to process them. Or we might compare memory used
or the errors made, and find them different in the two machines (see Pylyshyn,
1979). We will see that the distinction between weak and strong equivalence is
important in assessing the psychological plausibility of a computational model
of a cognitive capacity. We want the model not to just do what the cognitive
capacity does, but to do it the way the cognitive capacity does it. Typically, a psy¬
chological experiment is a test of strong equivalence of some cognitive capacity
and a model of it, using such measures as reaction times and error rates.
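The contrast can be made concrete with a small sketch in Python. Both routines below return 544 for 32 and 17, so they are weakly equivalent; but the first grinds through seventeen additions while the second forms two partial products, and counting those steps is a crude stand-in for comparing complexity profiles. The step counter is our own bookkeeping device, not a standard measure.

def successive_addition(m, n):
    # Multiply by adding m to an accumulator n times (just an adder and a counter).
    total, steps = 0, 0
    for _ in range(n):
        total += m
        steps += 1
    return total, steps

def partial_products(m, n):
    # Multiply m by each digit of n, shifted to its place value, and sum the results.
    total, steps = 0, 0
    for place, digit in enumerate(reversed(str(n))):
        total += m * int(digit) * 10 ** place
        steps += 1
    return total, steps

print(successive_addition(32, 17))   # (544, 17)
print(partial_products(32, 17))      # (544, 2)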
Europe by two years. After resigning from the Foreign Office in 1945 he
joined the National Physical Laboratory in London, and was leader in designing
the ACE (Automatic Computing Engine) project. In 1949 he accepted the
position of Assistant Director of “Madam,” the Manchester Automatic
Digital Machine, at Manchester University. In 1950 he published his most
famous non-technical paper “Computing Machinery and Intelligence.” In
that article he proposed what he called the “imitation game” - what is
now known as the “Turing test” for intelligence. In 1952 he was convicted of
homosexuality. He apparently committed suicide (he was conducting
experiments with potassium cyanide, so it could have been an accident) on
June 7, 1954.
Intuitively, TMs are abstract computing devices with two principal parts;
a tape and a read-write head (with a machine table and a scanner), as shown
in figure 6.1. This figure incorporates the following four central structural
notions concerning Turing machines: tapes, head, scanner, and machine table.
Structure
Head with scanner: The head moves back and forth relative to the tape,
reading or writing a symbol on the tape.
[Figure 6.1: the tape, and the head containing a machine table and a scanner]
Operation
Control: The movement of the head is determined by the machine table which
is sensitive to two things: the current internal state of the machine, and the
symbol it is presently scanning. The head then (1) writes a specific symbol on the
current square, and (2) moves left one square, right one square or remains where
it is. The machine then goes into its next state mentioned in its machine table.
Programming a TM: This consists of specifying its machine table and initial
tape configuration in such a way that it will start on this initial symbol and halt
on the halt symbol. The final tape configuration constitutes the output one
wants.
Machine table (state, scanned symbol, symbol printed, move, next state):
1  None  0  R  2
2  None     R  3
3  None  1  R  4
4  None     R  1
Figure 6.2 Turing’s first machine (adapted from Turing, 1936/7: 119)
State Operation
1 It prints a “0,” then moves one square to the right, and goes into
state 2.
2 It moves to the right one square and goes into state 3.
3 It prints a “I,” then moves one square to the right and goes into state 4.
4 It moves one square to the right and goes into state 1 (etc.).
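A short simulation in Python may help fix the idea. The table below transcribes the four states just described; the tape is a dictionary from square numbers to symbols, unwritten squares count as blank, and the step limit and the underscore for blanks are our own conveniences.

def run_turings_first_machine(steps=12):
    # state -> (symbol to print or None, move, next state), as described above.
    table = {1: ("0", "R", 2), 2: (None, "R", 3), 3: ("1", "R", 4), 4: (None, "R", 1)}
    tape, head, state = {}, 0, 1
    for _ in range(steps):
        symbol, move, next_state = table[state]
        if symbol is not None:
            tape[head] = symbol                    # write on the scanned square
        head += 1 if move == "R" else -1           # move the head
        state = next_state                         # go into the next state
    return "".join(tape.get(i, "_") for i in range(max(tape) + 1))

print(run_turings_first_machine())   # 0_1_0_1_0_1 : 0s and 1s on alternate squares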
Many theorems have been proven about TMs, but one of the most interesting
is Turing’s original (1936) theorem:
Turing’s theorem: There exist “universal TMs” (UTMs) that can mimic the
moves of any other TM.
Figure 6.4 Minsky's universal Turing machine (from Haugeland, 1985: 139, box 3; reproduced
by permission of the MIT Press)
For the record, the smallest known universal Turing machine is Minsky’s
(see figure 6.4). If we think of a specific TM as codifying or formalizing a
procedure for doing something that can be represented on its tape, then the
idea that a universal TM can mimic the behavior of any specific TM suggests
that a universal TM can codify any procedure that can be written on a TM
tape. And since TMs are automatic step-by-step procedures, maybe a UTM
explicitly codes the notion of an effective procedure, an algorithm. This has
led to the so-called “Church-Turing” thesis (put in our terms):
such widely different and (in the opinion of the author) equally natural definitions
of effective calculability turn out to be equivalent adds to the strength
of the reasons adduced below for believing that they constitute as general a
characterization of this notion as is consistent with the usual intuitive understanding
of it" (ibid.). Shortly thereafter, in his (1936/7) paper, Turing
showed that every Turing machine computable function is recursive and
every recursive function is Turing machine computable. So effective calculability
can be defined as Lambda-definability, which is equivalent to recursiveness
(Church) and recursiveness is equivalent to TM computability
(Turing), so effectiveness is TM computability (Church-Turing);
Church (1936)
Thesis: effective procedure (intuitive) = Lambda-definability
Theorem: Lambda-definability = recursiveness
Turing (1936/7)
Theorem: TM computable = recursive
Church-Turing
Theorem: Lambda-definability = recursiveness = TM computable
Thesis: effective procedure = TM computable
Thus, there is some inductive support for the idea that being computable by a
Turing machine is weakly equivalent to being an effective procedure. The
upshot of this thesis is that a universal TM can do what any effective procedure
can do. Insofar as our cognitive capacities can be seen as effective procedures,
a UTM can model the input-output pairings of such a capacity, and so
be weakly equivalent to it. This gave legitimacy, in the minds of some at the
time, to the notion that one could build a thinking computer by programming
it properly, and that one could understand human cognitive capacities in terms
of effective procedures implemented in the brain.
[Figure labels: Storage, Arithmetic]
Figure 6.5 von Neumann machine architecture (from Pohl and Shaw, 1981: 141,
figure 5.1; reproduced by permission of W. H. Freeman and Co.)
1.4 [Control] If the memory for orders is merely a storage organ there must
exist an organ which can automatically execute the orders stored in the memory.
We shall call this organ the Control.
1.5 [Arithmetic] Inasmuch as the device is to be a computing machine there
must be an arithmetic organ in it which can perform certain of the elementary
arithmetic operations. . . .
The operations that the machine will view as elementary are clearly those
which are wired into the machine. . . .
1.6 [Input-Output] Lastly there must exist devices, the input and output
organ, whereby the human operator and the machine can communicate with each
other. . . .
people refer to as von Neumann (though this name is sometimes used rather
loosely to mean ‘conventional’). This architecture — which has been universal
practically since the design of the Princeton I.A.S. computer — is a register
machine in which symbols are stored and retrieved by their numerical
‘addresses’, control is transferred sequentially through a program (except for
‘branching’ instructions), and operations on symbols are accomplished by
retrieving them from memory and placing them in a designated register,
applying a primitive command to them, then storing the resulting symbol back
in memory. Although there are variants on this pattern, the main idea of a
sequential process proceeding through a series of 'fetch', 'operate', and 'store' operations
has been dominant since digital computation began" (1984: 96-7; emphasis
added).
von Neumann machine cycle: we will summarize the von Neumann machine
cycle of operations as: fetch, operate, store. The fact that only one
instruction is executed at a time causes a restriction in the flow of information
through the machine. This restriction is sometimes called the “von Neumann
bottleneck.”
We have seen that the basic cycle of operations in a von Neumann machine
is: fetch an instruction, operate on (execute) it, and store the result, and a
program is a set of instructions in some programming language for carrying
out some algorithm. We want now to see how, at least in broad outline, programming
a von Neumann machine works - how do we get a physical machine
to do what we tell it? Or, to put it more dramatically, if a program is the
“mind” of a computer, and flip-flops, registers, and gates are its “body,” then
how do we solve the mind-body problem for computers?
We begin with a simple assembly language program to compare the contents
of two memory locations (register 1 and register 2) and say "OK" if they
are the same, “NO” if they are different. Here is the program (from Copeland
1993b, chapter 4). Descriptions of what each instruction does are in curly
braces.
2 Branch-on-0 6
{Check the match register and if it is 0, jump to line 6 of the program}
3 Output-as-character 1001111
{This is the ASCII code for the letter O}
4 Output-as-character 1001011
{This is the ASCII code for the letter K}
5 Branch 8
{Jump to line 8}
6 Output-as-character 1001110
{This is the ASCII code for N}
7 Output-as-character 1001111
{This is the ASCII code for the letter O}
8 Halt
Program description
Starting at line 1, the program checks the contents of registers 1 and 2, compares
them, and stores a 1 if they are the same, a 0 if they are different. At line
2 the program jumps to line 6 if the contents don’t match and prints first an
N, then an O, i.e. NO, then halts at line 8. If at line 2 the registers match, then
it goes on to print an O and a K, i.e. OK, then jumps at line 5 to line 8 and
halts. So the computer answers No if the numbers don’t match, OK if they
do, and halts after either answer.
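The control flow can be sketched in Python (an illustration in our own notation, not Copeland's): the comparison loads the match register, and a program counter then steps through the branch, output, and halt instructions, with the ASCII codes written in binary.

def run_program(register1, register2):
    output = []
    match = 1 if register1 == register2 else 0      # the comparison loads the match register
    pc = 2                                          # program counter: next line to execute
    while True:
        if pc == 2:                                 # Branch-on-0 6
            pc = 6 if match == 0 else 3
        elif pc == 3:                               # Output-as-character 1001111 ('O')
            output.append(chr(0b1001111)); pc = 4
        elif pc == 4:                               # Output-as-character 1001011 ('K')
            output.append(chr(0b1001011)); pc = 5
        elif pc == 5:                               # Branch 8
            pc = 8
        elif pc == 6:                               # Output-as-character 1001110 ('N')
            output.append(chr(0b1001110)); pc = 7
        elif pc == 7:                               # Output-as-character 1001111 ('O')
            output.append(chr(0b1001111)); pc = 8
        elif pc == 8:                               # Halt
            return "".join(output)

print(run_program(5, 5))   # OK
print(run_program(5, 7))   # NO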
Since our computer is a binary digital machine all these instructions will have
to be converted into binary code. This is done by the compiler, which translates
the instructions into binary code, including the references to registers and
program line numbers. A sample table of bit code for the commands in our
program might be:
Operation Bit-code
Compare 00000011
Branch-on-0 00000100
Branch 00000110
Output-as-character 00000101
Halt 00000001
The registers referred to in the instructions must also be given binary code in
order for the computer to find them.
Registers Bit-code
1 01
2 10
21 10101
25 11001
Each register is just a row of flip-flops, each of which can be in one of two
states: on or off. So ultimately the programmed computer is just a specific configuration
of flip-flops - some on, some off. To run the program one pushes a
button or the equivalent that starts the physical process going of copying the
first instruction into the instruction register. The machine is so designed
("hard-wired") that when the flip-flops in the instruction register are set, for
example, to 00000011, it executes a comparison and the contents of registers
1 and 2 are compared, yielding a 1 or a 0 in the match register. The next
instruction specified in the program is then copied into the instruction register
and the machine continues this fetch-operate-store cycle until it halts - all
in accordance with the laws of physics-cum-electrical engineering. There is no
ghost in the machine. The syntax of the program's instructions is compiled
into distinct physical arrays of flip-flops. That's how we can make the machine
do what we tell it.
This architecture is not subject to complaints against TMs: (1) vNMs allow
both direct (absolute, random) access and indirect (or relative) access in
memory, whereas a TM allows only indirect (relative) access; (2) the program
can be stored as data in memory; and (3) it has specialized computational organs
such as an arithmetic unit (TMs have no special-purpose circuits). Some call
the first two ideas "the primary architectural advance of the von Neumann
design" (Haugeland, 1985: 143). The von Neumann architecture allows one to
exploit “subroutines” to their fullest; that is, they can be called at any point in
a computation, and after they are executed the computation can proceed where
it left off as if the subroutine had been simply the prior instruction in the
program. The use of subroutines induces “modularity” on a computational
system, in the sense that once a subroutine works correctly, it can be inserted
into a program where needed and it will do its computation. It only need be represented
once in memory no matter how often it is called. If a program can be
assembled from numerous previously debugged subroutines, then it has a better
chance of functioning correctly. And if something goes wrong, subroutines
allow one to isolate the problem quicker: first find the defective subroutine, then
go in and fix it. Finally there is the issue of how the machine computes the functions
it computes. Different architectures have different "complexity profiles"
- the number of steps, time, or memory required to run different algorithms
differs; in the limiting case, different architectures cannot execute identical
algorithms directly: “For example, the number of basic steps required to look
up a string of symbols in a Turing machine increases as the square of the number
of strings stored. On the other hand, in what is called a ‘register architecture’ (an
architecture possessing what is usually called random access memory . . .) [a
von Neumann machine], the time complexity can, under certain conditions, be
made independent of the number of strings stored . . . something that is impossible
in a Turing machine, despite the fact that the Turing machine can be made
weakly equivalent to this algorithm. ... A register machine . . . makes various
additional algorithms possible, including binary search, in which the set of
remaining options is reduced by a fraction with each comparison, as in the game
‘Twenty Questions’ . . . These algorithms cannot be executed directly on a Turing
machine architecture” (Pylyshyn, 1984: 97-9; emphasis added).
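The point about complexity profiles can be illustrated with two ways of looking up a string in Python (our example, not Pylyshyn's): a sequential scan whose step count grows with the size of the table, and a binary search in the 'Twenty Questions' style whose step count grows only with its logarithm.

def sequential_lookup(strings, probe):
    # Scan every stored string until the probe is found.
    for steps, s in enumerate(strings, start=1):
        if s == probe:
            return steps
    return None

def binary_lookup(sorted_strings, probe):
    # Halve the set of remaining options with each comparison.
    low, high, steps = 0, len(sorted_strings), 0
    while low < high:
        steps += 1
        mid = (low + high) // 2
        if sorted_strings[mid] == probe:
            return steps
        if sorted_strings[mid] < probe:
            low = mid + 1
        else:
            high = mid
    return None

table = sorted("symbol%05d" % i for i in range(100000))
print(sequential_lookup(table, "symbol09999"))   # 10000 steps
print(binary_lookup(table, "symbol09999"))       # about 17 steps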
Production systems (PSs) of Newell and Simon (1972) and Newell (1973)
descend from the systems of the same name by E. Post (1943). Production
systems have been popular in psychology for modeling a variety of cognitive
capacities, and they have been popular in artificial intelligence in the construction
of expert systems.
Newell and Simon (1972) explicitly recommend PSs for modeling cognitive
phenomena left out of behaviorism’s focus on stimuli and responses (see
chapter 2), and neuroscientists’ focus, at that time, on neural hardware (see
chapter 3): "the information processing theories discussed in this book represent
a specific layer of explanation lying between behavior, on the one side, and
neurology on the other . . .” (1972: 876). Newell characterized PSs as follows:
“A production system is a scheme for specifying an information processing
system. It consists of a set of productions, each production consisting of a
condition and an action. It has also a collection of data structures: expressions
that encode the information upon which the production system works — on
which the actions operate and on which the conditions can be determined to
be true or false. A production system, starting with an initially given set of data
structures, operates as follows. That production whose condition is true of the
current data (assume there is only one) is executed, that is, the action is taken.
The result is to modify the current data structures. This leads in the next
instant to another (possibly the same) production being executed, leading to
still further modification. So it goes, action after action being taken to carry
out an entire program of processing, each evoked by its condition becoming
true of the momentary current collection of data structures. The entire process
halts either when no condition is true (hence nothing is evoked) or when an
action containing a stop operation occurs” (1973: 463). Newell diagrams overall
production system architecture as shown in figure 6.6.
PSs have three major components: a set of production rules, of the form: if
A (the condition), then do B (the action); a memory work space (sometimes
called the “context”); and a rule interpreter which applies the relevant rule to
the results in the work space. A typical cycle of operation for a PS involves:
[Figure: a production B → C applied to a work space containing B, yielding C over time, then halt]
PRODUCTIONS:
INTERPRETER:
1. Find all productions whose condition parts are TRUE and make them applicable.
2. If more than one production is applicable, then deactivate any production whose action adds a
duplicate symbol to the CL.
3. Execute the action of the lowest numbered (or only) applicable production. If no productions
are applicable, then quit.
4. Reset the applicability of all productions and return to step 1.
Figure 6.8 A sample production system (from Barr, Cohen, and Feigenbaum, 1981; 191)
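A minimal recognize-act loop in Python, in the spirit of the interpreter of figure 6.8, might look as follows (a sketch under our own simplifications: conditions and actions are ordinary functions over a list of symbols, and conflict resolution is simply "lowest number wins").

def run_production_system(productions, workspace, max_cycles=20):
    for _ in range(max_cycles):
        # Step 1: find the applicable productions, i.e. those whose condition is true.
        applicable = [i for i, (cond, _) in enumerate(productions) if cond(workspace)]
        if not applicable:
            break                                  # step 3: nothing applicable, so quit
        _, action = productions[applicable[0]]     # conflict resolution: lowest number fires
        action(workspace)                          # the action modifies the work space
    return workspace

# Two toy productions: if A is present replace it with B; if B is present replace it with C.
productions = [
    (lambda ws: "A" in ws, lambda ws: (ws.remove("A"), ws.append("B"))),
    (lambda ws: "B" in ws, lambda ws: (ws.remove("B"), ws.append("C"))),
]
print(run_production_system(productions, ["A"]))   # ['C']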
Applying the productions to what is in the work space (here called “On-
CL”) we get the following computation:
This example illustrates some of the programming difficulties one can get into
with PSs. In this example, as the annotations suggest, the productions must be
Production system architecture differs from vNMs in interesting ways that are
sometimes thought to be advantages. First, in contrast with TMs and von
Neumann machines, there is no separate, external control structure beyond the
rule interpreter. Conflict resolution aside, a production fires if and only if its
condition matches the contents of the work space. This introduces a certain
desirable parallelism into the system, but as we just saw, it also makes it difficult
for PSs to respond with a specific sequence of actions: the intended algorithmic
structure of the machine's rules may be difficult to enforce (we will
see analogs of this problem in many connectionist architectures). Second, information
is operated on in terms of its description in the work space, not in terms
of its absolute or relative address - it is “content” addressable rather than
“location” addressable. Think of the police going to an address (an anonymous
tip) and arresting whoever is there (location addressing), vs. having a finger¬
print of somebody and finding the person who matches it. It is true that
content addressing can be simulated on a von Neumann Machine by “hash
coding,” w here the location of the content, and so the content itself, is some
logical-arithmetic function of the entry argument or probe.^ However, these
techniques are available for only highly restricted forms of information. And
even for the information they work on, these methods require elaborate control
through use of "collision functions" to keep the hash code from yielding multiple,
or the wrong, addresses. This makes them quite brittle and graceless in
their degradation. PSs, on the other hand, can degrade more gracefully by
using the measurements in their match-subcycle. Third, some have claimed
that PSs have an advantage over vNMs in being highly modular: "Since production
systems are highly modular, there is a uniform way of extending them,
without making distributed alterations to the existing system” (Pylyshyn, 1979:
82). And Haugeland (1985: 162) goes even further, saying: “Production systems
promote a degree of ‘modularity’ unrivaled by other architectures. A module is
Architecture
Cognitive demons inspect the data or image; each demon shouts, and the decision
demon chooses the one who shouts the loudest. This is schematized in figure 6.9.
For instance, the data might consist of letters of the alphabet, and cognitive
demons 1-26 might each be experts at a single letter. Then, even though
an incoming d resembles a c and an l, it resembles a d more, so the d-demon
will shout the loudest: “in many instances a pattern is nearly equivalent to
some logical function of a set of features, each of which is individually
common to perhaps several patterns and whose absence is also common to
several other patterns" (Selfridge, 1959: 516). So Selfridge amends the idealized
Pandemonium to contain a layer of computational demons (see figure
6.10). The cognitive demons now compute weighted sums of the outputs of the
computational demons. The layer of computational demons is independent of the task and
must be reasonably selected (by evolution or the experimenter) in order for
learning to take place.
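A toy version of the amended architecture can be written in a few lines of Python (the features, weights, and letters here are invented for illustration; Selfridge's demons computed richer functions): computational demons each score one feature of the image, cognitive demons form weighted sums of those scores, and the decision demon picks the loudest shout.

def pandemonium(image, computational_demons, weights):
    shouts = [demon(image) for demon in computational_demons]      # feature scores
    loudness = {letter: sum(w * s for w, s in zip(ws, shouts))     # cognitive demons
                for letter, ws in weights.items()}
    return max(loudness, key=loudness.get)                         # decision demon

computational_demons = [
    lambda img: img.count("|"),    # a crude "vertical stroke" demon
    lambda img: img.count("o"),    # a crude "closed loop" demon
]
weights = {"l": (1.0, -0.5), "d": (0.8, 1.0), "o": (-0.5, 1.0)}
print(pandemonium("|o", computational_demons, weights))            # 'd' shouts loudest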
Learning
[Figure 6.10: data or image demons, computational demons, cognitive demons, and the decision demon]
unequivocal, that is that there is one and only one cognitive demon whose
output far outshines the rest” (1959: 523). Selfridge concludes briefly by
sketching a Pandemonium model for Morse code, or more precisely "to distinguish
dots and dashes in manually keyed Morse code" (1959: 524), but he
does not report any results.
Selfridge and Neisser (1960) report preliminary results of CENSUS, a
machine with Pandemonium architecture for recognizing 10 hand-printed
letters of the alphabet: A, E, I, L, M, N, O, R, S, T. The inputs were projected
on an “image” (or input store) of 32 x 32 (1,024) pixels. CENSUS computed
the probability of a letter given the image using 28 features. According to
Selfridge and Neisser, CENSUS made only about 10 percent fewer correct
identifications than human subjects. However, there are at least three major problems
in scaling up the machine: first, segmenting the letters from cursive
script; second, using learning to modify the probabilities; and third, using learning
to generate its own features, which presently are restricted to those the programmer
can think up. As they comment, in closing: "We can barely guess how
this restriction might be overcome. Until it is, ‘Artificial Intelligence’ will
remain tainted with artifice” (1960: 68). Until the arrival of connectionism in
the early 1980s almost 20 years later, nothing arrived to remove the taint.
[Table: comparison of the TM, VNM, PS, and Pandemonium architectures]
Notes
4 The numbers in the left column are register numbers (i.e., addresses in the computer)
in decimal for ease of reading.
5 For instance, the middle two bits of the entry argument might represent the address
of its content, or one might do an exclusive-or on the two halves of the string,
or one might break the input string up into equal length strings, then sum them
arithmetically.
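The exclusive-or scheme, for example, might be sketched in Python as follows (an illustration of the idea only; the keys, table size, and collision rule are made up): the address is computed from the content itself, and a collision rule is still needed when two keys fold to the same address.

def hash_address(key, table_size=16):
    bits = int.from_bytes(key.encode(), "big")              # the key's bit pattern
    half = bits.bit_length() // 2
    folded = (bits >> half) ^ (bits & ((1 << half) - 1))    # exclusive-or of the two halves
    return folded % table_size

table = [None] * 16
for key in ("B1", "B2", "BOX1"):
    address = hash_address(key)
    while table[address] is not None:                       # collision: try the next slot
        address = (address + 1) % len(table)
    table[address] = key
print([(i, k) for i, k in enumerate(table) if k is not None])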
Study questions
Introduction
What implication is there for cognitive science of putting the theorem and the
thesis together?
Rewrite the sample productions to avoid the problems mentioned in the text.
In what ways are PSs parallel and in what ways are they serial?
Pandemonium
What is the difference between indirect (relative) and direct (random) memory
access?
Suggested reading
General
Turing machines
See Hodges (1983) for more on Turing's life and career. A lengthy, thorough, and surprisingly
accessible discussion of TMs can be found in Penrose (1989), chapter 2.
Barwise and Etchemendy (1999) is an introduction to TMs with software to build and
test them. One of the first comprehensive studies of Turing machines is Davis (1958),
and his anthology (1965) contains the classics of the field of computability and decidability.
A classic introduction to Turing machines is Minsky (1967), especially chapters
6 and 7. An older survey of equivalences between Turing machines and other
systems can be found in Gross and Lentin (1970). A more recent survey can be found
in Odifreddi (1989), part I. Dawson (1998), chapter 2, is a good introduction to TMs
and their relevance to cognitive science.
Church—Turing thesis
See Copeland (1996b) for a brief but informative survey article. The 1987 issue of Notre
Dame Journal of Formal Logic 28(4) is devoted to the Church-Turing thesis. Copeland
(1997) discusses the question whether the class of functions computable by a machine
is identical to that computable by TMs (he argues they are not), and relates that issue
to common (mis)formulations of the Church-Turing thesis and more relevantly the
application of these misformulations to cognitive science. See Gandy (1988) for a
detailed, and fascinating, history of the origins of the Church-Turing thesis.
Von Neumann machine history has been explored in many places. Augarten (1984),
chapter 4, is a useful non-technical history of the stored program computer from
ENIAC to UNIVAC, and has a useful chronology of the landmarks of computation.
Slightly more technical discussions of specific machines such as ENIAC and the IAS
machine can be found in Metropolis et al. (1980), especially part IV. See Heims (1980),
chapters 2 and 14, and Aspray (1990), especially chapters 1 and 2, for more on the life
and career of von Neumann. Some general theory of register (von Neumann) machines
can be found in Minsky (1967), chapter 11, and in Clark and Cowell (1976), chapters
1-3. For more on hash coding see Pohl and Shaw (1981: 239ff).
Production systems
A good introduction to Post’s systems can be found in Minsky (1967), chapter 13.
Newell and Simon (1972, “Historical Addendum”) contains a brief history of PSs.
Barr, Cohen, and Feigenbaum (1981, III. C4) cover PSs. PS architectures have been
used in many expert systems (see Kurzweil, 1990), as well as to model specific psychological
capacities (see chapter 8).
Pandemonium
Besides the works mentioned in the text one can find scattered descriptions and uses
of Pandemonium in Dennett (1991).
7
Representation(s)
7.1 Introduction
To adequately account for human cognitive capacities we will have to see cognition
as involving the "manipulation" (creation, transformation, and deletion)
of representations, and as a consequence, models of cognitive capacities will have
to simulate these processes. This raises two questions:
The first question (Q1) is called the problem of representations [with an "s"], and
the second question (Q2) is called the problem of representation [without an "s"].
An answer to (Q1) would reveal the (i) structure and (ii) important features of
the major schemes of computer representation. An answer to (Q2) would have
to tell us (i) under what conditions something is a representation, that is
represents something, and (ii) what determines exactly what it represents.
Before turning to these two questions we will survey a few preliminary issues.
First, although we will be interested primarily in “mental” representation, it
behooves us not to lose sight of the fact that many different sorts of things
represent, and their modes of representation are correspondingly diverse:
percepts, memories, images, expressions from natural and artificial languages
(such as numerals, arithmetic and logical symbols, Cartesian coordinates),
expressions from special notational systems (such as music and dance), wiring
diagrams, blueprints, maps, drawings, photographs, holograms, statues, meters,
and gauges. There are even occasional ad hoc representations, such as a chipped
tree or small stack of rocks to mark a trail. Second, discussion is complicated
by an unstable terminology. We adopt the following terminological
conventions (though we will have to be lax, especially when reporting the views
of others, since not everyone speaks this way):
For example, in English (which is after all a representational system) the word
“Venus” refers to a particular planet, but no part of it refers to or means any¬
thing (so it is atomic). But we can also refer to the planet Venus with the phrase:
"the morning star" (or "the last star seen in the morning"). Here the component
expressions do have a meaning, and the meaning of the whole is a function
of the meaning of these parts and their grammatical arrangement (so it is
complex and compositional). But now consider the idiomatic expression “kick
the bucket” (= die). Here the parts have a meaning, but the meaning of the
whole is not determined by these parts plus their grammatical organization
(so it is complex but not compositional). To know this meaning you just have
to memorize it. Closer to home, suppose the register of a computer has the
following information in it:
1|0|0|1|0|1|1
Construed as ASCII code for the letter "K" it is non-compositional, since one
cannot determine what it means without looking up the ASCII code from a
table. But construed as a binary coded digital number (75), it is compositional,
since that can be figured out via the place-value system of binary numbers.
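The two construals are easy to check with a short illustration in Python: read compositionally as a binary numeral, the pattern's value falls out of the place-value rule; read as ASCII, the very same pattern means "K" only because a table pairs 75 with that letter.

bits = "1001011"
value = sum(int(b) * 2 ** place for place, b in enumerate(reversed(bits)))
print(value)        # 75: each bit contributes its place value
print(chr(value))   # 'K': the ASCII pairing must be looked up, not computed from the parts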
The predicate calculus (PC), also called “quantification theory,” has the virtue
of a long and distinguished history in logic, where its syntax, semantics, and
deductive power have been more thoroughly studied than any competing notation.
The sample PC presented here consists of seven types of expressions:
Predicates
F( ): ( ) is female
T( , ): ( ) is taller than ( )
Names
a: Agnes
b: Betty
Syntactic rule
A predicate plus the required number of names is a sentence:
Semantic rule
A sentence of the form:
predicate + name(s)
is true if and only if the thing(s) named has the property or relation predicated
of it. For example:
“F(a)” is true if and only if a has the property F i.e. if and only if Agnes has
the property of being female
“T(a,b)” is true if and only if a bears the relation T to b i.e. if and only if
Agnes bears the relation of being taller than to Betty
Connectives
_ & _: _ and _
_ v _: _ or _
_ → _: if _ then _
- _: not _
Syntactic rule
A connective plus the required number of sentences is a sentence:
F(a) & T(a,b): Agnes is female and Agnes is taller than Betty
F(a) v T(a,b): Agnes is female or Agnes is taller than Betty
F(a) → T(a,b): If Agnes is female then Agnes is taller than Betty
-F(a): Agnes is not female
-T(a,b): Agnes is not taller than Betty
Semantic rules
A sentence of the form:
is true if and only if the first sentence is true and the second sentence is true
(analogously for "v" and "→").
—Sentence
Variables
x, y
F(x): x is female
T(x,y): x is taller than y
T(x,b): x is taller than Betty
Quantifiers
Here the ellipses are to be filled in by anything that is well formed according
to the syntactic rules.
(Ex) (Ey) [Fx & Txy]: there exists an x and there exists a y such that x is
female and x is taller than y
(Ax) [Txb → Fx]: for every x, if x is taller than Betty, then x is female
A sentence of the form:
(Ex) + open sentence
is true if and only if there is something that has the property or relation predicated by the open sentence:
"(Ex) Fx" is true if and only if there is something, x, that is F, i.e. has the property of being female
A sentence of the form:
(Ax) + open sentence
is true if and only if for each object, that object has the property or bears the relation being predicated by the open sentence:
"(Ax) Fx" is true if and only if for each object, x, it is F, i.e. it has the property of being female
"(Ax) [Fx → Txb]" is true if and only if for every object, it has the property that if it is female, then it is taller than Betty
Inference rules
&-simplification: from a sentence of the form "X & Y" infer X, or infer Y (i.e., one can infer each).
Modus ponens: from sentences of the form "X → Y" and X infer a sentence of the form Y.
These structural rules specify the form of representations, and the rules of inference allow the system to make deductions. For instance, suppose the system has the premises F(a) and F(a) → T(a,b) and needs to get the representation T(a,b). Then the system can use the two premises and modus ponens to get the target.
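A minimal sketch of how such deductions might be mechanized, under the assumption (not from the text) that sentences are stored as nested tuples: the two inference rules are applied until nothing new follows.

# Hypothetical encoding: atomic sentences are tuples like ("F", "a"); complex
# sentences are ("&", p, q) or ("->", p, q).
def derive(sentences):
    kb = set(sentences)
    changed = True
    while changed:
        changed = False
        for s in list(kb):
            if s[0] == "&":                    # &-simplification: from X & Y infer X, infer Y
                new = [p for p in s[1:] if p not in kb]
            elif s[0] == "->" and s[1] in kb:  # modus ponens: from X -> Y and X infer Y
                new = [s[2]] if s[2] not in kb else []
            else:
                new = []
            if new:
                kb.update(new)
                changed = True
    return kb

premises = {("F", "a"),                            # Agnes is female
            ("->", ("F", "a"), ("T", "a", "b"))}   # if Agnes is female, she is taller than Betty
print(("T", "a", "b") in derive(premises))         # True: the target is derived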
Strengths
The main strengths of PC are that (1) it has an explicit semantics, and (2) its
formal mathematical properties are very well understood. Furthermore, (3) it
is a natural way of expressing some ideas, and (4) it is highly “modular” in the
sense that one can enter and delete particular statements independently of
others.
Weaknesses
One major weakness of PC representations for computer models of cognition
comes from the fact that pieces of information that typically go together “in
the world” are not stored together as any kind of unit. This makes it difficult
to retrieve relevant information quickly and efficiently when it is needed. Some
researchers therefore propose representational schemes that collect relevant
information together in various ways for efficient use. Two such schemes are
especially popular: semantic networks, and frames/scripts.
Figure 7.1 The robin network (from Barr, Cohen, and Feigenbaum, 1981: 184)
Semantic networks (SNs) were first developed as psychological models of memory (Quillian, 1966, 1968). They are in effect a kind of graph structure, consisting of nodes (circles, boxes, dots) connected by links (arcs, arrows, lines). Some nodes represent objects (so-called "individual nodes") and some nodes represent properties (so-called "generic nodes"). The links typically represent relations between these items. Fragments of a semantic network represent situations. Each of these has an analog in the PC: individual nodes correspond to names, generic nodes to predicates, and links to the predication of properties and relations.
Inference
Inference is captured not by deducing sentences from other sentences using inference rules, but by "spreading activation" in the network: for instance, as activation spreads out along the lines radiating from "robin," the system reflects such elementary inferences as "A robin is a bird" or "A bird has wings." The existence of "isa" links allows nodes to inherit properties from distant parts of the network. For instance, any network containing the fragment of the animal network illustrated in figure 7.2 will allow the lower-level nodes to inherit the features of higher-level nodes and so to infer, for example, that a robin is an animal.
Figure 7.2 A sample inheritance hierarchy (from Staugaard, 1987: 94, figure 3.14)
Figure 7.3 Question network (from Barr, Cohen, and Feigenbaum, 1981: 187)
Questions
Questions can be asked of a SN by giving it a fragment of a network with a piece missing; if the network can match the question fragment against the network, it will return the missing piece as the answer. The question "What does Clyde own?" can be represented as in figure 7.3. The SN will match a
Figure 7.4 Answer network (from Barr, Cohen, and Feigenbaum, 1981: 187)
portion of the network in figure 7.4, and return the answer “a nest.” So SNs
can process information by spreading activation and matching.
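Here is a rough sketch, with made-up node and relation names loosely based on figures 7.1-7.4, of how a semantic network might answer a question by matching a fragment with a missing piece, and how "isa" links let lower nodes inherit properties from higher ones.

# Links are (node, relation, node) triples; the particular triples are illustrative.
links = {("Clyde", "isa", "robin"), ("robin", "isa", "bird"),
         ("bird", "isa", "animal"), ("bird", "has", "wings"),
         ("Clyde", "owns", "a nest")}

def ask(subject, relation):
    # "Question network": a fragment with the object missing; matching fills it in.
    return [obj for (s, r, obj) in links if s == subject and r == relation]

def inherits(node, relation):
    # Follow "isa" links upward so lower-level nodes inherit higher-level properties.
    found, frontier = [], [node]
    while frontier:
        n = frontier.pop()
        found += ask(n, relation)
        frontier += ask(n, "isa")
    return found

print(ask("Clyde", "owns"))       # ['a nest']
print(inherits("Clyde", "has"))   # ['wings'] - inherited via robin -> bird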
Strengths
The main strength of SNs can be seen in figure 7.1. In SN notation (vs. PC
notation) the symbols “robin” and “bird” occur only once, and all the
relevant information about robins or birds is connected to that node by links.
This allows the system to collect information about robins, birds, etc., together in one place.
Weaknesses
The main weaknesses of SNs are, first, that they have a problem naturally representing such concepts as typicality/normalcy, disjunction, and negation. How would a network represent the fact that all robins are birds, but that just typical birds can fly, and just normal birds have wings? Something could be a bird and not fly and not have wings. How would a SN represent the fact that Clyde is a robin or a sparrow (and not both)? Or that Clyde is not a tiger? Second, unless controlled, activation can spread too far, turning up information only distantly related to the node from which it started.
One of the most popular and flexible types of chunked data structure
is Minsky’s “frames.” We quote Minsky’s introduction of the notion at
length because it contains many ideas, some not usually included in accounts
of this notion: “Here is the essence of the theory: When one encounters a
new situation (or makes a substantial change in one’s view of the present
problem), one selects from memory a structure called a frame. This is a remem¬
bered framework to be adapted to fit reality by changing details as neces¬
sary. A frame is a data-structure for representing a stereotyped situation, like
being in a certain kind of living room, or going to a child’s birthday party. Some
of this information is about how to use the frame. Some is about what one
can expect to happen next. Some is about what to do if these expectations
are not confirmed. We can think of a frame as a network of nodes and rela¬
tions. The top levels of a frame are fixed, and represent things that are always
true about the supposed situation. The lower levels have many terminals —
slots that must be filled by specific instances or data. Each terminal can spe¬
cify conditions its assignments must meet. (The assignments themselves are
usually smaller subframes.) Simple conditions are specified by markers that
might require a terminal assignment to be a person, an object of sufficient
value, or a pointer to a subframe of a certain type. More complex condi¬
tions can specify relations among the things assigned to several terminals.
Collections of related frames are linked together into frame systems. The
effects of important actions are mirrored by transformations between the frames
of a system. . . . Different frames of a system share the same terminals; this is
the critical point that makes it possible to coordinate information gathered
from different viewpoints. Much of the phenomenological power of the theory
hinges on the inclusion of expectations and other kinds of presumptions.
A frame’s terminals are normally already filled with ‘default’ assignments. Thus
a frame may contain a great many details whose supposition is not specifi¬
cally warranted by the situation. The frame systems are linked, in turn, by an
information retrieval network. When a proposed frame cannot be made to fit
reality - when we cannot find terminal assignments that suitably match its
terminal marker conditions — this network provides a replacement frame. . . .
Once a frame is proposed to represent a situation, a matching process tries to
assign values to each frame’s terminals, consistent with the markers at each
place. The matching process is partly controlled by information associated with
the frame (which includes information about how to deal with surprises) and
partly by knowledge about the system’s current goals” (see Haugeland, 1997:
111-37).
Figure 7.5 A generic frame (from Staugaard, 1987: 97, figure 3.16)
Let’s unpack some of the main ideas in this long quotation. A frame rep¬
resents stereotypical information about a (kind of) object, property, or situa¬
tion. (Although Minsky extends the notion of frames to sequences of events,
it is common practice to give them the special name of “scripts” and we will
follow this practice.) A frame collects together information that “goes together”
in the world in the form of slots in the frame (see figure 7.5).
Slots are either filled by experience, by default values, or by pointers to other frames (and scripts). We will illustrate these ideas, and more, with some concrete examples.
Room frames
For example, consider what we know about rooms. First, we know that a room
has a typical structure. In frame theory this means that a room frame will have
slots with pointers to various typical constituents (see figure 7.6). Second, we
know that there are different typical kinds of rooms, so another set of point¬
ers will direct the system to these (see figure 7.7).
Restaurant frames
Or consider what we know about typical restaurants. First we have the infor¬
mation that a restaurant is a specific kind of eating establishment. Then we
have, as slots, places where specific information fits into the frame. In the generic restaurant frame, for instance (see figure 7.8), the type needs to be fixed, the location, etc., and these slots can be filled by other frames. The more complete the filling in, the more complete the machine's "understanding."
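A minimal sketch of a frame as a data structure, with slot names loosely adapted from figure 7.8; the dictionaries and helper functions are illustrative assumptions, not Minsky's notation.

# A frame is a set of slots; unfilled slots fall back to defaults, and some
# slots point to other frames or scripts.
restaurant_frame = {
    "isa": "eating-establishment",
    "slots": {
        "food-style":     {"value": None, "default": "American"},
        "payment-form":   {"value": None, "default": "Cash"},
        "event-sequence": {"value": None, "default": "eat-at-restaurant-script"},
    },
}

def fill(frame, slot, value):
    frame["slots"][slot]["value"] = value          # slot filled by experience

def read(frame, slot):
    s = frame["slots"][slot]
    return s["value"] if s["value"] is not None else s["default"]

fill(restaurant_frame, "food-style", "Seafood")
print(read(restaurant_frame, "food-style"))        # Seafood (filled by experience)
print(read(restaurant_frame, "payment-form"))      # Cash (default assignment)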
Figure 7.6 Room frame (from Winston, 1977: 186, figure 7.5)
Scripts/action frames
Scripts (also called “action frames”) are stereotyped sequences of events which
either record stereotypical expectations regarding these events, or form the basis
of plans for directing actions. They can be called by frames, or they can be ini¬
tiated to solve some problem or satisfy some goal. We will illustrate two uses
of scripts: scripts called by frames to form the basis of an expectation regard¬
ing some event sequence, and those used to form the basis of a plan of action
to solve some problem or achieve some goal.
Figure 7.7 Frames for kinds of rooms (adapted from Staugaard, 1987: 100-1, figures 3.19,
3.21)
Eating-at-a-restaurant script
For the first use of a script, let's return to the restaurant frame just discussed. Notice that the "event-sequence" slot in the restaurant frame calls the "eating-at-a-restaurant" script. What does that look like? A sample is shown in figure 7.9.
Two broad kinds of information are contained in this script. First, there are
the “ingredients” of the script: the props, roles, point of view, and so forth.
These specify the participants or “actors” in the script and the situation in
which they perform. Second, there is the stereotypical temporal sequence of
events. In the script in the figure it would record, for instance, one’s expecta¬
tion to pay for one’s food after getting/eating it at a sit-down restaurant, but
one’s expectation to pay for it before getting/eating it at a fast food restaurant.
Move-disk script
The second use of scripts is to solve a problem or satisfy some goal. Such a
script (or action frame) might have the simplified form seen in figure 7.10.
Location:
range: an ADDRESS
if-needed: (Look at the MENU)
Name:
if-needed: (Look at the MENU)
Food-style:
range: (Burgers, Chinese, American, Seafood, French)
default: American
if-added: (Update alternatives of restaurant)
Times-of-operation:
range: a Time-of-day
default: open evenings except Mondays
Payment-form:
range: (Cash, Credit card, Check, Washing-dishes-script)
Event-sequence:
default: Eat-at-restaurant script
Alternatives:
range: all restaurants with same Food-style
if-needed: (Find all Restaurants with the same Food-style)
Figure 7.8 Restaurant frame (from Barr, Cohen, and Feigenbaum, 1981: 217-18)
Here we imagine an actor doing something to an object and this will involve
certain tasks which move the object from the source to a destination. For
instance, imagine the problem of getting a robot to move three disks from one
peg to another peg, stacked in the same order (see figure 7.11). The robot might
be instructed to begin by placing the top disk, A, on to the middle peg. The
script/action frame for this might look as in figure 7.12.
Further scripts/action frames would specify the further actions necessary
to solve the problem and achieve the goal, that is, continue moving the rest of
the disks (from top to bottom) to the middle peg, resulting in the stack C, B,
A. Then move the disks (from top to bottom) from the second peg to the third
peg, resulting in the stack A, B, C, thus solving the problem. Although this
EAT-AT-RESTAURANT Script
Props: (Restaurant, Money, Food, Menu, Tables, Chairs)
Roles: (Hungry-persons, Wait-persons, Chef-persons)
Point-of-view: Hungry-persons
Time-of-occurrence: (Times-of-operation of restaurant)
Place-of-occurrence: (Location of restaurant)
Event-sequence:
first: Enter-restaurant script
then: if (Wait-to-be-seated-sign or Reservations)
then Get-maitre-d’s-attention script
then: Please-be-seated script
then: Order-food script
then: Eat-food script unless (Long-wait) when
Exit-restaurant-angry Script
then: if (Food-quality was better than palatable)
then Compliments-to-the-chef script
then: Pay-for-it Script
finally: Leave-restaurant script
Figure 7.9 Eating-at-a-restaurant script (from Barr, Cohen, and Feigenbaum, 1981:
218-19)
Figure 7.10 Generic script/action frame (adapted from Staugaard, 1987: 103, figure 3.23)
example is artificially simple, it gives the idea of how scripts might be used to
direct event sequences, and not just record event sequence expectations.
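A sketch of what such a move-disk action frame might look like as a data structure the system can execute; the field names and task list are illustrative assumptions, loosely following figures 7.10 and 7.12.

# Action frame/script for the first move in the disk problem; the "executor"
# simply steps through the task list in order, directing the event sequence.
move_disk_script = {
    "actor": "robot",
    "object": "disk A",
    "source": "peg 1",
    "destination": "peg 2",
    "tasks": ["grasp disk A", "lift disk A off peg 1",
              "move arm over peg 2", "lower disk A onto peg 2", "release"],
}

def run(script):
    for step in script["tasks"]:
        print(f"{script['actor']}: {step}")

run(move_disk_script)   # directs an event sequence rather than merely recording one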
Strengths
The main strength of frames/scripts is their ability to group relevant infor¬
mation together for easy access, and so frames, scripts, schemata, etc., have
Figure 7.11 Initial disk state (from Staugaard, 1987: 105, figure 3.25)
Figure 7.12 Move disk script/action frame (from Staugaard, 1987: 104, figure 3.24)
Weaknesses
The main weaknesses of frames/scripts are that (1) they (like semantic networks) have no explicit semantics, and (2) (again like semantic networks) there is no general theory of their scope and limitations. Finally, (3) there is the problem of information that does not belong in any particular frame or script, but is a part of our general commonsense understanding, e.g. the idea of paying for goods or services, or the idea that unsupported disks fall when released.
All three representation schemes, then, have certain general virtues and vices,
which we can tabulate as in figure 7.13. We would like to have a representation
system with all the virtues and none of the vices of the above.
The second question we raised at the outset concerns the nature of representation. On the digital computational theory, how do representations represent? Or, in virtue of what do representations represent what they do? With regard to the three formalisms just surveyed, we can ask: how does "T(a,b)" (= Taller
(Agnes, Betty)) represent the fact that Agnes is taller than Betty? How
does the left portion of the semantic network represent the fact that Clyde is
a robin? How does the room frame represent a room (and not a car)? How does
the restaurant script represent the typical order of events in eating at a restau¬
rant? (and not visiting a dentist?) Well? The silence is deafening. It is a
striking fact about these latter notational systems that their representations
are not explicitly compositional, or as we will say, they have no explicit com¬
positional semantics. And for all systems we have surveyed, it is not a part of
the computational story to say what these representations are about. We know
what they are intended by their programmers to be about because of the
English-like words that appear in them. These words may help us, but inter¬
preting these words is not a part of the computer program. We might put it
like this: what do these representations represent to or for the computer rather
than us?
One possible story (Cummins, 1989, chapters 8 and 10) is that digital compu¬
tational representations represent by being isomorphic to what they are about.
According to Cummins’s way of developing this account, the kind of repre¬
sentation to be found in computers is distinctive enough to warrant a special
label “simulation representation” (“s-representation”).
An example
Representation
But where did the connection between button pressings, displays, and numbers
come from, if not the human interpreter? According to Cummins, the fact that
the above sequence of operations is isomorphic to plus (+) makes these repre¬
sentations of numbers. If the buttons were mislabeled and/or the display wires
crossed, we would have to discover what symbolized what, but we would do
that by finding which symbol-number correlation satisfied the plus operation.
As Cummins says at one point: “there is a . . . sense in which it [an adding
machine] represents numbers because it adds: we can speak of it as represent¬
ing numbers only because it simulates + under some interpretation” (1989: 93).
And in this sense a graph or equation represents sets of data, or a parabola
represents the trajectory of a projectile.
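A sketch of the adding-machine idea under invented symbols (not Cummins's own example as printed): the device's button-to-display table counts as s-representing numbers because, under a candidate interpretation, its transitions are isomorphic to +.

# The machine's "physics": pairs of button symbols are mapped to a display symbol.
button_table = {("I", "I"): "II", ("I", "II"): "III",
                ("II", "I"): "III", ("II", "II"): "IIII"}

# A candidate interpretation mapping symbols to numbers.
interp = {"I": 1, "II": 2, "III": 3, "IIII": 4}

# The machine simulates + just in case, under the interpretation, every
# transition satisfies interp(display) == interp(button1) + interp(button2).
simulates_plus = all(interp[out] == interp[b1] + interp[b2]
                     for (b1, b2), out in button_table.items())
print(simulates_plus)   # True: under this mapping the device simulates addition

Of course, other mappings (for instance, doubling every value) would fit the same table equally well, which is precisely the liberality worry raised in the next subsection.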
Problem
One question this proposal will have to face is how to get the representation to
be about the right things, since all sorts of objects can be made isomorphic with
representations. If a machine can be interpreted as multiplying, it can be inter¬
preted as adding. Indeed, there are an unlimited number of things that it can
be interpreted as representing, if isomorphism were sufficient. One proposal
for dealing with this problem is what might be called the “selection-by¬
purpose” approach. This is a two-step procedure whereby we first get all the
isomorphic interpretations, then we select the “right” one on the grounds that
it has something special to do with the purpose of the computation. But of
course this second step is not computational, and so takes us outside the
domain of computation. Thus, we are again in the situation that the compu¬
tational answer to question (Q2) - what is the nature of representation: how
do representations represent what they do? - is incomplete.
We earlier (chapter 4) posed three problems for the simple detector semantics
of the frog: the right cause (or depth) problem, the spread (or qua) problem,
and the misrepresentation (or disjunction) problem. Now that we have sur¬
veyed some standard high-level computer representation systems, we can raise
a fourth problem - the problem of accounting for logical relations between repre¬
sentations. (We could have raised this problem for the frog, but it seems point¬
less to speculate on logical relations between frog representations.) With the
DCTM we are on much stronger ground since logical relations are among the
most thoroughly studied and completely implemented relations in computer
science - and here is where the predicate calculus (PC) shines as a knowledge
representation scheme.
But there is an important wrinkle. As we saw in our exposition of PC, the
system is determined by two sorts of rules: syntactic rules for building and
inferring sentences, and semantic rules for assigning references and truth
values. Some logical relations, such as the notion of inference and proof, are
“syntactic.” Others, such as the notions of truth and entailment (P entails Q
if the truth of P guarantees the truth of Q), are “semantic.” And we have seen
that truth in PC depends on the notion of reference, and that is a notion that
links representations to the world, and hence is outside the official limits of
computational explanations. However, it has been known since the early 1930s
that for many systems of representation, if one representation P from that
system entails another representation Q from that system, then there is a proof
(an inference according to the rules) in that system from P to Q. That is, the
system has a kind of “completeness.” And when a system has this kind of com¬
pleteness, then the theory of proof (“proof theory”) can stand in for the theory
of truth relations (“model theory”) — internal proof relations mimic external
truth relations. Now, one thing digital computers are good at is tracking infer¬
ence relations between representations, and this has given some hope that com¬
putational representations can be said to have not only “syntactic” logical
relations, but “semantic” logical relations as well. We will see in the next
chapter that the ability of computers to use internal inference relations to track
external truth relations is seen by some as a powerful incentive for giving com¬
puter representations some sort of semantic content.
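A small sketch of the point that internal proof relations can track external truth relations: a brute-force truth-table check (the "semantic" side) agrees with what modus ponens licenses (the "syntactic" side). The helper functions and encoding are illustrative assumptions.

from itertools import product

def entails(premises, conclusion, atoms=("P", "Q")):
    # Semantic check: the conclusion is entailed iff it is true in every
    # truth assignment that makes all the premises true.
    for values in product([True, False], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(p(v) for p in premises) and not conclusion(v):
            return False
    return True

P = lambda v: v["P"]
Q = lambda v: v["Q"]
P_implies_Q = lambda v: (not v["P"]) or v["Q"]

# Modus ponens licenses the syntactic step from P and P -> Q to Q;
# the semantic (truth-table) check agrees.
print(entails([P, P_implies_Q], Q))   # True: the proof relation tracks the truth relation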
Study questions
Introduction
Represent the following in the PC:
Betty is female
Betty is taller than Agnes
Something is female
Everything is taller than Agnes or Agnes is not female
What is a frame?
What is a script?
What problem is there for all three representation systems regarding how they
represent what they do?
Suggested reading
Introduction
Barr, Cohen, and Feigenbaum (1981) offer a survey of the representation schemes dis¬
cussed here, as well as some not discussed. Rich (1984, chapter 7) discusses semantic
networks, frames, and scripts. Charniak and McDermott (1985), chapter 1, also survey
representations and their role in AI. Staugaard (1987), chapter 3, contains a very good
survey of all the representation systems discussed here from a robotics/AI point of
view. Partridge (1996) surveys the representational schemes covered here, as well as
general issues of knowledge representation in AI. Thagard (1996), chapter 1, relates representations to computation, then (chapters 2 and 4) discusses predicate logic and frames from a cognitive science point of view.
Jeffrey (1991) is a clear introduction to logic. Rich (1983) devotes chapter 5 to predi¬
cate calculus representation. Mylopoulos and Levesque (1984) contains material espe¬
cially relevant to PC. McDermott (1986) and Hewitt (1990) critically assess the role of
logic in AI.
Semantic networks
One of the first works to explore the psychological use of SNs in modeling memory
was Quillian (1966), parts of which appeared as Quillian (1968). The theoretical adequacy of semantic networks is explored in detail in Woods (1975). Knowledge level issues are explored in Brachman (1979), which also contains useful history and suggestions for overcoming inadequacies. Psychological applications can be found in
Norman and Rumelhart (1975).
Frames/scripts
The classic study of frames remains Minsky (1975), and it is still recom¬
mended reading. Winograd (1975) discusses frames in relation to general issues of
machine representation, as does Hayes (1980). For a more psychological approach to
frames and scripts see Schank and Abelson (1977), or the survey in Eysenck and Keane
(1990).
McDermott (1976) and Hayes (1979) highlight the problem of the labels in represen¬
tation doing too much work. An attempt to characterize computation as embedded in
the world can be found in McClamrock (1995).
Interpretational semantics
Cummins (1989), especially chapters 8 and 10, develops the interpretational ("s-representation") account of digital representation discussed in this chapter.
8.1 Introduction
The driving idea behind the computational model of cognition is the idea
that cognition centrally involves the notions of mentally manipulating (creating,
transforming, and deleting) mental representations of the world around us, and
that computers are automatic symbol manipulators par excellence. We went on to
identify two major aspects of digital computers - architecture and representa¬
tion. We divided architecture into its two main components, memory and
control, and taxonomized some famous architectures on the basis of these
notions. We also surveyed some influential representation schemes in digital
artificial intelligence. We turn back now to the cognitive side. Are there
any more specific reasons for taking the computer "analogy" seriously, that is, for thinking it is literally true - that cognition really is a species of (digital) computation? What is the nature of mental representation? How does
computation relate to consciousness? What kinds of cognitive architectures
are there?
These are some of the questions we must at least face, if not answer. Much
of current cognitive psychology and cognitive neuroscience is devoted to
investigating these questions. We will not go into the details here. What we are
interested in is the general computational framework within which so much of
“information processing” psychology is carried out. Since there are a variety
of possible computer architectures, it would be a mistake to suppose that the
computer model requires any particular cognitive organization. In the next few
sections we will ignore the organizational details of different kinds of com¬
puters and focus on the fact that they all automatically manipulate represen¬
tations according to general principles. We will then return to the question of
cognitive organization at the end.
The DCTM did not emerge ex nihilo; it can instructively be viewed as a special
case of a more general theory, the so-called Representational Theory of Mind
(RTM).
(RTM)
1 Cognitive states are relations to (mental) representations which have
content,
2 Cognitive processes are (mental) operations on these representations.
Hume
For Hume, mental representations are “ideas” and we “entertain” them when
we are in mental states such as believing, desiring, and intending. Mental
processes for Hume are sequences of associated ideas, and we explain behavior
by citing current relevant ideas. However, this does not tell us what their
“semantics” is - what they are “about” and how they are “about” what they are
about. In other words, Hume’s account of ideas and mental processes does not
tell us how, for example, a belief can have a truth value, and ideas in a belief can
determine a reference and be about something. Hume seems to have held that
ideas are like images and are about what they resemble or resemble most. But
resemblance has all sorts of problems as a theory of representation. First, it is
both not general enough and too general; second, it does not really account for
mental reference; and third it does not accommodate truth and falsity. It is
instructive to look at each of these in more detail. To see the first, notice that many of our ideas are not images that can resemble something (think of abstract notions like "justice"); they are more like concepts. And even when they are images (thinking of the Eiffel Tower by having an image of the Eiffel Tower),
they are sometimes too specific to be correct. One’s image of the Eiffel Tower
probably also represents it as having a certain general shape, orientation, rela¬
tive size, viewing perspective (from the side, not from the top), etc. Yet none of
these features is essential to thinking about the Eiffel Tower. On the other hand,
some images are excessively indeterminate (some people say “ambiguous”).
Wittgenstein’s influential example of this was a stick figure with one leg in front
of the other, drawn on a slope - is the figure walking up, or sliding down? Both
are captured exactly by the sketch. To see the second, notice that in general,
resemblance is not sufficient for representation. Your left hand resembles itself
(and your right hand), but it does not represent them. A political cartoon might
resemble an R. Nixon imitator more than the actual R. Nixon, but we don’t want
to say the cartoon is therefore about the imitator rather than Nixon. Finally, to
see the third, images themselves are not true or false - what is true or false is
something more “sentential” or “propositional” rather than something
“nominal.” We don’t say, for example, “the Eiffel Tower” is true/false; we say,
for example, "The Eiffel Tower was built by Gustave Eiffel" is true/false.
Propositional attitudes
Later in this chapter and the next we will see how important the class of
“propositional attitudes” is to the DCTM. For now we will just take a brief
look at them. The propositional attitudes (beliefs, desires, intentions, etc.) have
both common and distinctive characteristics and it will be useful to organize
our discussion in this way.
Common characteristic
(1) As we have just seen from Frege and Russell, these states are called
“propositional attitudes” in part because they can all be factored into two parts:
the propositional content, and a part that constitutes the “attitude proper”
towards that propositional content (what Searle, 1979, 1983, calls the “psy¬
chological mode”). Likewise, attributions of propositional attitudes typically
have the form: “xAp,” where “x” represents a person, “A” represents an atti¬
tude (believes, desires, intends), and “p” represents a content. We normally
report a propositional attitude with an attitude verb plus a proposition-expressing sentence:
Verb + Sentence
believe (that) there is life on Mars
Verb + Sentence
desire (that) there be life on Mars
want there to be life on Mars
intend that I go to Mars
Distinctive characteristics
(2) One important difference among the attitudes has to do with what
is called the “onus of match” (Austin) and “direction of fit” (Anscombe,
Searle). Although propositional attitudes share the feature that they fit the
world when they are satisfied, the way they do that can be different for differ¬
ent attitudes. For instance, beliefs represent an antecedently given world, and
if the world fits them they are true, but if it fails to fit them, the belief is false.
We will say that beliefs (and some other attitudes) have a mind-to-world direc¬
tion of fit. Desires and intentions, on the other hand, are like blueprints for the
way the world is to become - the world is to fit them, and when the world
becomes the way they represent it, they are satisfied. We say that desires and
intentions have a world-to-mind direction of fit. Notice that the attitude and the
direction of fit always go together: it is not the case that some beliefs have one direction of fit while other beliefs have the other. This is because the attitude determines the direction of fit. And since the elements of the proposition,
P, determine the way the world is represented to be or become, the proposition
determines the conditions of fit (no matter what the direction of fit may be) (see
figure 8.1).
(3) Another potential difference among the attitudes has to do with the
experiential character of the attitude, or the lack of such experiential charac¬
ter. For example, if you fear that there is an intruder in the house you proba¬
bly have a fairly specific feeling, one that you will not confuse with desiring that
there be an intruder in the house. But if you believe that there is life on Mars
(or an intruder in the house, for that matter), is there a specific "belief feeling" that accompanies it in the way fearing and desiring seem to be accompanied?
Furthermore, there is the question whether these “feelings” are essential to
holding the attitude or only an accompaniment. These are current topics of
debate in cognitive science, and we will return to some of them later.
The score
Hume’s theory (association of ideas + resemblance) has the advantage of an
account that accommodates psychological explanations, but it has no workable
account of representational content - images are (1) too special and restricted
a class of mental representations to serve as the model for all mental reference,
and (2) they give us no account of how propositional attitudes can be true or
false, fulfilled or unfulfilled, etc. Frege and Russell’s theory accommodates the
notion of representational content in general that can be true or false, etc., but
it gives no account of psychological explanation. We would like a theory that
both accounted for psychological explanation and representational content.
Before we elaborate and justify the full digital computational theory of mind
(DCTM), we will first sketch a bit of history and motivation for the compu¬
tational approach, then present the DCTM in its basic form.
A bit of history
The imitation game The new form of the problem can be described in terms of
a game which we call the “imitation game.” It is played with three people, a man
(A), a woman (B), and an interrogator (C) who may be of either sex. The inter¬
rogator stays in a room apart from the other two. The object of the game for the
interrogator is to determine which of the other two is the man and which is the
woman. He knows them by labels X and Y, and at the end of the game he says
either “X is A and Y is B” or “X is B and Y is A.” . . . The ideal arrangement
is to have a teleprinter communicating between the two rooms. . . . We now ask
the question, “What will happen when a machine takes the part of A in the
game?” Will the interrogator decide wrongly as often when the game is played
like this as he does when the game is played between a man and a woman? These
questions replace our original “Can machines think?”
The new problem has the advantage of drawing a fairly sharp line between the
physical and the intellectual capacities of a man.
Winning the game I believe that in about fifty years’ time it will be possible to
program computers, with a storage capacity of about 10⁹, to make them play the
imitation game so well that an average interrogator will not have more than 70
percent chance of making the right identification after five minutes of ques¬
tioning. The original question, “Can machines think?” I believe to be too mean¬
ingless to deserve discussion. Nevertheless I believe that at the end of the century
the use of words and general educated opinion will have altered so much
that one will be able to speak of machines thinking without expecting to be
contradicted.
Note that although Turing discusses the conditions under which one might
conclude that a digital computer can think, this is not by itself a claim that human
thinking is computation. That is, some digital computation might be “think¬
ing” (Turing often used scare quotes here), and human cognition might be
thinking, but human cognition might not be digital computation. It is hard to
find Turing actually claiming that human thinking is a species of digital com¬
putation. In his 1947 lecture (mentioned above), Turing makes the provoca¬
tive remark: “All of this suggests that the cortex of the infant is an unorganized
machine, which can be organized by suitable interfering training. The orga¬
nizing might result in the modification of the machine into a universal machine
or something like it” (1947: 120). The idea seems to be that the cortex starts
out as a kind of jumble of connected neurons (a connectionist machine?) and
by means of training/education it becomes organized into something approxi¬
mating a universal Turing machine. If we add the idea that it is in virtue of
the computational states of the cortex that the brain has the cognitive states
that it has, we would have an early version of the computational theory of
mind.
The field of artificial intelligence, and to some extent cognitive science, has
seen the “Turing test” as a test for thinking (intelligence, cognition, mentality,
etc.; these different terms are used indiscriminately). If passing the test
(winning the game) is sufficient for “thinking” and a computer can be pro¬
grammed to pass the test, then the essence of thinking must be computation.
And if thinking is just running mental algorithms (what else could it be?), then
the Church-Turing thesis tells us that these can be carried out by Turing
machines. And Turing’s theorem tells us that each of these machines can be
mimicked by a universal Turing machine. So, the reasoning went, thinking is
a species of computation on something equivalent to a universal Turing
machine, say a von Neumann machine or a production system. Thus, the job
of artificial intelligence, cognitive psychology, and neuroscience, is to discover
the actual algorithms of the mind. (We will say more about this line of rea¬
soning in the next chapter.)
cognition, only pain. Yet there are reasons for supposing that Putnam had simply picked pain as a popular example of a mental state, and that the argument could be generalized to mental states in general. For instance, when arguing
against the “brain state identity theory” he comments: “the brain state theo¬
rist is not just saying that pain is a brain state; he is saying that every psycho¬
logical state is a brain state.” Clearly Turing machine functionalism would
also have to be a theory of all psychological states, if it is to compete against
the brain state theory of all psychological states (note the generality): “if
the program . . . ever succeeds, then it will bring in its wake a . . . precise
definition of the notion ‘psychological state.’” And recall, the paper is not
entitled "The nature of pain states" (nor was it originally titled "Pain predicates"), but "The nature of mental states" - the generality is important. So it
was more likely Putnam, rather than Turing, who made the crucial move
from the “intelligence of computation” to the “computational theory of
intelligence.”
The Physical Symbol System Hypothesis. A physical symbol system has the
necessary and sufficient means for general intelligent action.
Note that this proposal endows a system with the capacity for “intelligent
action,” not cognition or mind directly, and one might argue that we are more
willing to call the behavior of mechanical systems “intelligent” than we are to
say they have minds - nobody seems inclined to say “smart bombs” have minds.
But undoubtedly it is tempting to suppose that a precondition for intelligence is cognition or mind, and that seems to be the way the field has treated the hypothesis.
With those historical remarks in place, let’s turn to the generic computational
theory of mind (CTM) as currently formulated (the formulation of the CTM
we use here is adopted from Fodor, 1987). The CTM account of cognition in
general, and mental states such as the propositional attitudes in particular, can
now be seen as a special case of RTM. It is an attempt to specify in more detail
the nature of the relations, operations, and representations involved.
We will take up the first two points now, the third in the next section.
Attitudes
The formality constraint tells us when we have two representations, but when do we have two attitudes? What, for instance, distinguishes a belief from a desire from an intention? Consider Mr. Bubblehead (see figure 8.2). Here we see a collection of representations stored under the headings "beliefs" and "desires."
What makes a representation a part of a belief, rather than a desire, is the way it
interacts with other representations — the way it is utilized and operated on in
theoretical and practical reasoning. If you want a drink and believe there is a coke
in the fridge, then other things being equal you will intend to go to the fridge, get
the coke, and drink it. It is customary to refer to the set of interactions between
representations that constitute belief as the “belief box,” the set of interactions
between representations that constitute desires the “desire box,” and so on for
the other propositional attitudes (see Mr. Bubblehead again). These “boxes” are
simply convenient ways of visualizing complicated computational relations.
These computational relations and operations are typically given to the machine
by programming it, though they are sometimes hardwired (we return to this
distinction in the Coda). For now we retain the neutral terminology.
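A toy sketch of the "box" picture, using the coke-in-the-fridge example from the text but an invented tuple notation: the same kind of representation counts as a belief or a desire only because of how it is operated on, here by a crude practical-reasoning rule.

# "Boxes" are just sets of representations plus the rules that operate on them.
belief_box = {("in", "coke", "fridge"),
              ("leads-to", ("go-to", "fridge"), ("have", "coke"))}
desire_box = {("have", "coke")}
intention_box = set()

# Toy practical reasoning: if the system desires G and believes that action A
# leads to G, then (other things being equal) it forms the intention to do A.
for goal in desire_box:
    for b in belief_box:
        if b[0] == "leads-to" and b[2] == goal:
            intention_box.add(b[1])

print(intention_box)   # {('go-to', 'fridge')}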
Representations also have meaning, or as we will say, they have “mental
content” in that they express some proposition and so are about something.
We can formulate the generic CTM as follows:
(CTM)
1 Cognitive states are computational relations to computational mental repre¬
sentations which have content.
2 Cognitive processes (changes in cognitive states) are computational opera¬
tions on computational mental representations which have content.
Figure 8.2 Mr. Bubblehead (adapted from Stich, 1983: 75; reproduced by permission of
the MIT Press)
In light of these, we can formulate the basic DCTM as the following digital
specification of the CTM:
(B-DCTM)
1 Cognitive states are computational relations to computational mental repre¬
sentations which have content.
2 Cognitive processes (changes in cognitive states) are computational opera¬
tions on computational mental representations which have content.
3 The computational architecture and representations (mentioned in 1 and
2) are digital.
Notice again that (CTM) analyzes cognitive states and processes into two parts:
a computational relation or operation, and a mental representation. It is a
general feature of digital machines that they compute in some kind of code. If
computation is a species of digital computation, then the mind manipulates
(creates, transforms, and deletes) mental representations, and these mental rep¬
resentations form, at bottom, a kind of machine code for the brain. The code
structure of this system has suggested to some that mental representations
form a language-like system. This has become known as the language of thought
(LOT) hypothesis (see Fodor, 1975). Incorporating this idea into (CTM) gives
us our first official version of the digital computational theory of mind:
(DCTM)
1 Cognitive states are computational relations to computational mental repre¬
sentations (in the language of thought) which have content.
2 Cognitive processes (changes in cognitive states) are computational opera¬
tions on computational mental representations (in the language of thought)
which have content.
3 The computational architecture and representations (mentioned in 1 and
2) are digital.
To the extent that there is evidence for a language of thought, there will be
support for one aspect of DCTM. What is the LOT and what reasons are there to accept it?
The basic idea behind the language of thought (LOT) hypothesis is that cog¬
nition involves a language-like representation system. Clearly, spoken natural
languages have some properties irrelevant to cognition, such as being spoken,
being used to communicate, being acquired, etc., so the language of thought
will not be like (spoken) natural language in these respects. But in what respects? Typically these:
(LOT)
1 The system has a basic “vocabulary” for representation, which can include
concepts, percepts, images etc.
2 The vocabulary items are combinable in accordance with general principles
- i.e. the system has a “syntax.”
3 The vocabulary items and structured combinations of them are about
something - they have a compositional “semantics.”
A (very) short story: Alice and Betty are leafing through their 1959 high school
yearbook from Hibbing, Minnesota. Getting to the end, Alice comments:
This motivation for the LOT derives from the fact that human thought
seems to be productive, in the sense that we can think an indefinitely large
series of distinct and novel thoughts, for example, 1 is a number, 2 is a number,
and so on. This ability seems to be limited not by anything intrinsic to the
thinking, but by extrinsic factors such as motivation and attention, longevity,
etc. Yet we are finite creatures with finite memory and operations, so how
can we explain this capacity to produce a potential infinity of distinct thoughts?
The LOT says: we have a finite representational system with a composi¬
tional structure, and the LOT allows old pieces to be put together in new
combinations.
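A sketch of productivity under illustrative assumptions: a finite vocabulary plus one recursive rule of combination yields an unbounded family of distinct thoughts of the kind the text mentions.

# Finite vocabulary, recursive syntax: indefinitely many distinct "thoughts".
def numeral(n):
    # "1", "the successor of 1", "the successor of the successor of 1", ...
    return "1" if n == 0 else "the successor of " + numeral(n - 1)

def thought(n):
    return numeral(n) + " is a number"

for i in range(3):
    print(thought(i))
# A finite system of parts and combination rules generates indefinitely many
# new thoughts, limited only by extrinsic factors such as memory and time.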
This motivation for the LOT hypothesis has the form: human (and some
animal) cognition has certain systematic properties that the LOT hypothesis
would explain, and other theories cannot. So by inference to the best explana¬
tion, we should conclude that the LOT hypothesis is true.
Thought
Fodor (1987) originally put the argument for the LOT based on the system-
aticity of thought like this: “LOT says that having a thought is being related to
a structured array of representations; and, presumably, to have the thought that
John loves Mary is ipso facto to have access to the same representations, and
the same representational structures, that you need to have the thought that
Mary loves John. So of course anybody who is in a position to have one of these
thoughts is ipso facto in a position to have the other. LOT explains the
systematicity of thought” (1987: 151).
Inference
The DCTM can be elaborated in such a way that it (1) offers the best avail¬
able solution to a traditional philosophical puzzle, the so-called “mind-body
problem,” and (2) provides a fruitful framework for empirical investigations
into cognition. This is something few if any other theories of mind can boast
and as such it deserves careful scrutiny.
The mind-body problem can be put thus: what is the relationship between
mental phenomena (states, events, and processes) and physical phenomena
(states, events, and processes)? One of the advantages of DCTM is that it
provides a framework for constructing successful theories in cognition, and it
does this, in part, because (1) it offers a solution to the mind-body problem
that is free from the objections of other theories, and (2) it makes psycho¬
logical explanation of behavior an acceptable kind of causal explanation. We will
work our way up to this by first examining the competing views and exposing
their weaknesses.
The competition
Dualistic interactionism
This is the view, made famous by Descartes (see chapter 3), that the world has
two different substances, the mental and the physical. Mental substance is conscious and temporal but not spatial (and so not divisible and not subject to the
laws of physics) whereas physical substance is both temporal and spatial (and
so divisible) and subject to the laws of physics. These substances interact
causally - mental events can cause physical events, and physical events can
cause mental events.
Pro
The interactive part goes well with our commonsense intuitions that physical
causes (drinking a lot) can have mental effects (drunkenness), and that mental
causes (deciding to wave goodbye) can have physical effects (waving goodbye).
Con
1 The dualistic part is incompatible with experimental psychology, which
includes many of the same methods as the physical sciences, and which
would not seem applicable to mental substance (see above).
2 The causal part is a mystery - how can something non-spatial and non¬
material cause something spatial and material to happen without violating
principles of conservation of energy, etc.?
Epiphenomenalism
This is the view that although physical phenomena can cause mental phe¬
nomena, mental phenomena cannot cause physical phenomena. On this view,
just as the blinking lights on (or heat from) a computer are a by-product of
its operation and do not contribute to its processing, so mental phenomena
such as thought are just a by-product of the working of the brain, and do not
contribute to the course of action.
Pro
1 This avoids the second objection to dualism - there are, for instance, no
violations of conservation of energy and momentum.
Con
1 It does not avoid the first objection to dualism.
2 It makes human thought, decisions, etc., irrelevant to the course of human
history, and this is a bit hard to accept.
Radical behaviorism
The view that mental phenomena just are certain kinds of behavioral disposi¬
tions — dispositions to react to certain stimuli with certain responses.
Pro
It “solves” the mind-body problem by dissolving it - there is no relationship
between the mental and the physical because there is no mental.
Con
1 Without overwhelming evidence it is too hard to believe that there
are no mental phenomena and no “interaction” between mind and body
at all.
2 Psychology has not turned out this way at all. Current cognitive psychol¬
ogy posits elaborately structured mental causes for behavior.
Logical behaviorism
This is the semantic view that every statement ascribing mental phenomena is equivalent to some behavioral hypothetical (if such-and-such were to happen, then so-and-so would result) statement, in much the same way as statements ascribing fragility to a glass are analyzed in terms of its breaking if dropped, etc.
Con
1 There is no reason to suppose that such behavioral hypotheticals can be
given for all mental phenomena - no one has ever given even one adequate
analysis.
2 It does not account for (mental) event causation: coming to feel thirsty and
noticing that there is a glass of water on the table caused her to reach out
for the glass. This is the fundamental sort of causation.
Physicalism
Token physicalism
This is the view that each token (particular) mental phenomenon is identical with
some token {particular) physical phenomenon^ but that mental and physical types
may not correspond. Suppose that for humans every actual pain is a C-fiber
stimulation, and that C-fibers are made ultimately of hydrocarbons. Suppose
that Martians are made of silicon and when they feel pain S-fibers are stimulated. These S-fiber stimulations would be a different type of physical phenomenon from C-fiber stimulations, yet both could be tokens of the same type of mental phenomenon (pain), which is just to say that mental and physical types need not correspond.
Type physicalism
This is the view that each type of mental phenomenon is identical to some type
of physical phenomenon. So, two systems can exhibit the same type of mental
phenomenon (e.g., pain) only if they exhibit the same type of physical phe¬
nomenon (e.g., C-fiber stimulation). Thus, since silicon is distinct from hydro¬
carbon, a silicon-based Martian and a hydrocarbon-based human could not
both be in pain, if pain is C-fiber stimulation and C-fibers are hydrocarbon.
Type physicalists are also token physicalists (how could one claim that all types of mental phenomena are identical to types of physical phenomena, but particular tokens of mental phenomena are not tokens of physical phenomena?).
Type physicalism is a highly reductive theory — it reduces mental phenomena
to physical phenomena, though unlike “eliminativism,” it does not deny that
there are mental phenomena. (Note: saying that Bob Dylan is Robert
Zimmerman does not eliminate Bob Dylan - there still is Bob Dylan.)
Supervenience
One popular way of getting these benefits is called “supervenience,” and when
combined with functionalism (see below) it is one of the most popular pictures
of the mind-body relationship at present - so we will take a bit of time de¬
veloping this idea. What does this mean? First some examples outside the
mind-body problem. All sorts of states supervene on other states. G. E. Moore
seems to have introduced the notion with an example from art. Consider a
painting, say the Mona Lisa. Now consider a molecule-for-molecule duplicate
of the Mona Lisa - twin Mona Lisa. Could the first be a beautiful painting,
but not the second? They are identical down to their paint molecules; so any¬
thing beautiful about the one would seem to be exactly as beautiful about the
other. So it looks like the answer is “no,” and we say that the beauty of the
painting supervenes on its physical properties (paint molecules). Or consider
baldness - could one person who is molecule-for-molecule (and so, hair-for-
hair) identical to his twin be bald without the twin being bald? Again, the
answer seems to be “no.” Molecular twins have identical hair distribution, they
are identically bald and we say that baldness supervenes on physical properties
(hair distribution). In general, then, we say: A-facts supervene on B-facts if and only if there can be no difference in A-facts without some difference in B-facts; that is, any situations that are identical in their B-facts are identical in their A-facts.
Now let's return to the more controversial domain of the mind-body problem. One question is whether the B-facts necessitate the A-facts as a matter of natural law or as a matter of logical necessity (natural vs. logical supervenience). These are different; it is a law of nature that nothing travels faster than
light (186,000 miles/sec.), but there is nothing logically contradictory about
saying something traveled at one more mile per second than that - perhaps the
universe could have had different physical laws.
Situations: We might want to say that the situations in which we test for sameness and difference of A- and B-facts are "local," are restricted to the individuals involved - think again of the Mona Lisa and the bald man. In this case, A-facts "supervene locally" on B-facts if any individual who duplicates the B-facts will duplicate the A-facts. Or we might want to include the whole context in which the individual is embedded - the whole world it is a member of. In
this case, A-facts “supervene globally” on B-facts if any world that duplicates
the B-facts will thereby duplicate the A-facts. The difference becomes impor¬
tant if the A-facts involve relations to the world the system finds itself in
(unlike our original examples of beauty and baldness). For instance, two organ¬
isms that are physically identical might have different survival or “fitness”
values because of differences in the environments they find themselves in, so
the Darwinian biological property of fitness would not supervene locally on
physical make-up. But it would supervene globally, because then we would have
to keep the whole world the same, including the environment the organism
finds itself in.
Functionalism
Pro functionalism
Functionalism has the advantages of dualism, behaviorism, and physicalism without their disadvantages. First, like dualism and unlike behaviorism, functionalism treats mental states as real inner states of the organism, not mere dispositions to behave.
The DCTM
We get the digital computational theory of mind (DCTM) from machine func¬
tionalism if we make the functional relations mentioned in machine function¬
alism be the computational relations mentioned in DCTM (and of course we
keep the representations in the LOT). In this way we can see that the DCTM is a special case of machine functionalism.
Frog
Regarding the frog, on the positive side, its visual system can represent (detect
and track) particular objects and situations in its environment, limited though this repertoire may be.
Digital computer
The digital computer, on the other hand, could naturally accommodate logical
relations between representations, but how could it naturally detect or track particular objects and situations in its environment? Recall (see chapter 7) that
the only proposal we had as to how digital representations represent what they
do - what connects the representation to what they represent - was interpreta-
tional semantics (IS) whereby isomorphism between representation and repre¬
sented underwrote the representation relation. But we saw that thinking about
interpretation as isomorphism is too liberal. Rather, it looks as though what the
machine is “thinking about” is determined by the programmer (broadly con¬
strued to include the community which interprets the input and output of the
machine). For instance, if the programmer loads a program telling the machine
to move a bishop to such-and-such a position we might take the machine to be
solving a chess problem. If the programmer loads a program telling the machine
to move a tank battalion to such-and-such a position we might take the machine
to be planning desert warfare. Conceivably, these could be isomorphic, formally
identical programs - only annotations to the program (which are intended for
us, not the machine) would be different. We have no computational answer to the question: what makes the computer, then, "think about" chess vs. war? We need some connection to the environment.
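A sketch of the chess-versus-warfare point, with invented function and variable names: two runs of a formally identical procedure whose only differences are the English-like labels and comments, which are for us rather than for the machine.

# Formally identical moves; only the labels (intended for us) differ.
def move(state, piece, square):
    state[piece] = square           # the machine just updates a symbol-to-symbol map
    return state

chess = move({}, "bishop", "c4")            # we read this as solving a chess problem
war   = move({}, "tank_battalion", "c4")    # we read this as planning desert warfare
print(chess, war)   # the computation itself does not distinguish the two readings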
On this view, the content of the belief is fixed or determined by all the rele¬
vant relations it participates in, its functional role. One of the strengths of this
sort of theory is that it attempts to combine the virtues of the frog (connec¬
tion with the environment) and the data structure (logical relations). To see
this let's return to our original example of thinking: that man is tall.
It would seem that if we are going to respect the formality constraint, then
once information is represented in the computer, it can only be acted upon in
virtue of its form, and this suggests strongly that content is determined by just
internal, conceptual role. If this is right, then DCTM is a narrower theory than
functionalism in the sense that functionalism allows that external functional
role can play a role in determining the content of a thought. On the other hand,
DCTM agrees that mental states and processes are computational relations and
operations over mental representations which have content. But it is not an
official part of DCTM how that external dimension of content is fixed, only
how the internal conceptual role is determined. Again, DCTM claims that
cognitive states and processes are computational relations to and operations
on representations (data structures). These structures have content - they
represent something. But (1) what they represent plays no direct role in these
computations, only the “form,” “syntax,” “structure” of the representations
does and (2) what they represent is not (always) explicable in computational
terms. In a sense, then, DCTM assumes or presupposes a (partially) non-computational account of content. Well, so be it. Let’s turn to the computational part: what is conceptual role and what determines it?
Conceptual role
The general idea behind conceptual role (CR) theory - let’s call it “generic”
CR theory — is this:
(G-CR)
The content of a representation is determined by the role the representa¬
tion plays in the conceptual system of which it is a part.
One gets different CR theories by spelling out the nature of the representations involved, the nature of the roles they play, and the nature of the conceptual system they are a part of. For instance, one early influential proposal
connected CR with conditional probability (see Field, 1977: 390):
(CP-CR)
Two representations have the same conceptual content if the conditional
probability of one representation, relative to additional pieces of information, is the same as the conditional probability of the other, relative to those
same pieces of information.
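As a rough illustration only, using a toy uniform probability model of my own devising rather than Field’s formalism, (CP-CR) can be read as a test one could actually run: two representations count as having the same conceptual content just in case they receive the same conditional probability relative to each piece of additional information.

    from itertools import product

    # Toy model (an assumption of this sketch): "worlds" are truth-value
    # assignments to three atomic sentences, all equally probable.
    ATOMS = ["A", "B", "C"]
    WORLDS = [dict(zip(ATOMS, vals)) for vals in product([True, False], repeat=3)]

    def prob(sentence, given):
        """P(sentence | given) under the uniform toy measure."""
        relevant = [w for w in WORLDS if given(w)]
        return sum(1 for w in relevant if sentence(w)) / len(relevant)

    def same_conceptual_content(r1, r2, evidence):
        """(CP-CR)-style test: same conditional probability relative to
        every piece of additional information in `evidence`."""
        return all(prob(r1, e) == prob(r2, e) for e in evidence)

    evidence = [lambda w: True, lambda w: w["C"], lambda w: not w["C"]]
    r1 = lambda w: w["A"] and w["B"]     # "A and B"
    r2 = lambda w: w["B"] and w["A"]     # "B and A"
    print(same_conceptual_content(r1, r2, evidence))                    # True
    print(same_conceptual_content(lambda w: w["A"], lambda w: w["C"],
                                  evidence))                            # False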
content, yet we seem to be able to change our beliefs about something without
changing the concept itself. Furthermore, such a definition of (sameness of)
content means that content cannot be compared between people, since it is
relative to the system that contains it. Finally, there is the problem of calculating these probabilities, whose number grows exponentially.
There are three options here: (1) concepts and thoughts could be given conceptual roles simultaneously and independently; or (2) conceptual roles of concepts are basic, thoughts are made out of concepts and so inherit their conceptual roles from their constituent concepts; or (3) the conceptual role of thoughts is basic, and the conceptual role of concepts is whatever they contribute to the conceptual role of all of the thoughts they are constituents of. Different theorists choose different options.
DCTM
Applying this idea to DCTM, recall that according to the LOT hypothesis, representations are either simple vocabulary items in the LOT, or they are complex representations constructed out of vocabulary items in the LOT. So the LOT version of option (1), where concepts and thoughts are equally basic, would be:
(LOT-TC)
The content of an item (“words” or “sentences”) in the LOT of a system
is determined (at least in part) by its relations to other items (“words” or
“sentences”) in the LOT of that system. [Thoughts and concepts are equally
basic.]
The LOT version of option (2), where concepts are basic, would be:
(LOT-C)
The content of sentences in the LOT (thoughts) is determined by the CR of
their constituent “words” (concepts) plus their internal structural relations; the content of words in the LOT (concepts) is determined by their relations
to other words (concepts) in the LOT of the system. [Concepts are basic.]
What sort of relations participate in conceptual role and so fix content? Conceptual role theorists are rarely specific, and when they are, they rarely say the same thing. One popular proposal connects CR with “inference” relations.
Since customarily inferential relations involve the notion of truth, they are
defined primarily over sentences (thoughts), and the conceptual role of words
(concepts) is derivative. In its generic form, the LOT version of option (3),
where thoughts are basic, would be:
(LOT-T)
The content of “words” in the LOT (concepts) is determined by their contribution to the “sentences” of the LOT (thoughts) they occur in. The content of “sentences” in the LOT (thoughts) is determined by their inferential role in the system. [Thoughts are basic.]
What sorts of “inferences”? Again, theorists differ (is this getting boring?). Inferences might include: (a) only (valid) deductive inferences,² (b) (valid) deductive inferences and (cogent) inductive inferences, and (c) the above two plus the “decisions” one makes.
An example of proposal (b) is due to Block (1986): “A crucial component of a sentence’s conceptual role is a matter of how it participates in inductive and deductive inferences” (1986: 628; emphasis added). An example of proposal (c) is also due to Block: “Conceptual role abstracts away from all causal relations except the ones that mediate inferences, inductive or deductive, decision making, and the like” (1986). However, if we follow the lead of digital representations surveyed earlier (see chapter 7), it will be valid deductive inferences that point the way to an inferential theory of content, because it is valid deductive inferences that reveal information that is a part of the concepts involved, rather than merely collateral, incidental information about whatever falls under the concept. Valid deductive inferences alone reveal what must be true if the world is the way “R” represents it to be. We will now incorporate the ideas that: representations are in the LOT, thoughts are basic, and valid deductive inferences characterize conceptual roles, by formulating the DCTM conceptual role theory of content as follows:
(DCTM-CR)
1 If “R” is a sentential representation (thought) in the LOT, then its
content is determined by the fact that “R” participates in the (valid)
inferential relations:
(P) From “R” one may infer . . . [“R” serves as a premise]
(C) From . . . one may infer “R” [“R” serves as a conclusion]
2 The specific inference relations associated with “R” give the specific
content of R.
3 If “R” is a subsentential representation (concept) in the LOT, then its
content is the contribution it makes to the content (inferences) of all the
thoughts it occurs in.
The specific inference relations (P), (C) give the specific content of R. The
proposal made by defenders of the DCTM is that at least some aspect of the
representational content, the computationally relevant aspect, is captured
by its inferential role. The usual examples of this approach come from the
(DCTM-CR:&)
The conceptual role of a representation “P & Q” is determined by the following (valid) deductive inferences:
From “P & Q” infer “P”
From “P & Q” infer “Q”
From “P” and from “Q” infer “P & Q”
And if content is given by conceptual role, so that the content of “&” (“and”) is determined by the inference relations it participates in, then the content of “&” (“and”) is determined by (DCTM-CR:&). Conceptual role
theorists, of course, must generalize this project to all representations with
content. That is, CR theorists must fill out schema (DCTM-CR) for every
representation in the system with content. Unfortunately, conceptual role
theorists have mostly remained content to repeat (DCTM-CR:&) ad nauseam,
and we should be suspicious of their reluctance (or inability) to provide a
representative range of sample analyses.
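To see what (DCTM-CR:&) amounts to mechanically, here is a minimal sketch in Python; the tuple encoding of LOT sentences and the function names are my own assumptions, not anything in the text. The “conceptual role” of “&” is exhausted by the introduction and elimination inferences the system is prepared to draw.

    # LOT "sentences" encoded as nested tuples, e.g. ("&", "P", "Q"); this
    # encoding is an assumption of the sketch, not part of the DCTM itself.

    def eliminate_and(sentence):
        """From "P & Q" infer "P"; from "P & Q" infer "Q"."""
        if isinstance(sentence, tuple) and sentence[0] == "&":
            _, p, q = sentence
            return [p, q]
        return []

    def introduce_and(p, q):
        """From "P" and from "Q" infer "P & Q"."""
        return ("&", p, q)

    premises = [("&", "P", "Q")]
    derived = []
    for s in premises:
        derived.extend(eliminate_and(s))        # ["P", "Q"]
    derived.append(introduce_and("P", "Q"))     # ("&", "P", "Q")
    print(derived)

Generalizing the schema to every representation with content would require a comparably explicit rule set for each one, which is just the demand pressed in the text above.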
We have already, from time to time, invoked the notion of “consciousness” and
it is time to foreground this concept and try to assess its relation to the DCTM.
To begin with, though, the word “conscious(ness)” has a number of related
uses, some of which are not relevant to our present purposes:
And there may be more. In this chapter we will be concerned primarily with
meta-consciousness, and in the next chapter with consciousness-of and
phenomenal consciousness.
Consciousness as meta-cognition
One idea is that consciousness involves having thoughts about one’s mental states.
Typically, we are in such meta-states when we can report our lower-level states.
It is cases where this is absent that highlight the phenomena. In everyday life
we may lack this kind of consciousness when we go on “automatic pilot” while
performing routine tasks, such as driving the same route to work. We do not realize, at the time, that we are performing the task - we suddenly realize we are at work but cannot remember how we got there. There are also analogous
examples from science.
Split brain
Figure 8.3 Visual lateralization (from Gazzaniga and LeDoux, 1978: 4, figure 2; figures 8.3-8.5 reproduced by permission of Kluwer Academic/Plenum Publishers). Because of the distribution of fibers in the optic system, information presented to each eye is projected almost equally to both hemispheres; to ensure that information is presented to only one hemisphere, the subject must fixate a point. As a consequence, information projected to the right visual field goes only to the left hemisphere, and vice versa.
Figure 8.4 The basic testing arrangement used for the examination of lateralized visual and stereognostic functions (from Gazzaniga and LeDoux, 1978: 5, figure 3)
since it is the left hemisphere that normally possesses natural language and
speech mechanisms. Thus, for example, if a word (such as ‘spoon’) was flashed
in the left visual field, which is exclusively projected to the right hemisphere
. . . the subject, when asked, would say, ‘I did not see anything,’ but then subsequently would be able, with the left hand, to retrieve the correct object from
a series of objects placed out of view . . . Furthermore, if the experimenter
asked, ‘What do you have in your hand?’, the subject would typically say ‘I
don’t know.’ Here again, the talking hemisphere did not know. It did not see
the picture, nor did it have access to the . . . touch information from the left
hand, which is also exclusively projected to the right hemisphere. Yet clearly,
the right half-brain knew the answer, because it reacted appropriately to the
correct stimulus” (Gazzaniga and LeDoux, 1978: 3-5). In a famous case, subjects shown sexually explicit pictures to their right hemispheres and nothing to their left hemisphere, when asked what they saw, said “nothing,” but they blushed and giggled. And there are interesting examples of complicated
confabulations. The situation is portrayed in figure 8.5.
Here is the report: “When a snow scene was presented to the right hemisphere and a chicken claw was presented to the left, P. S. [the patient] quickly
and dutifully responded correctly by choosing a picture of a chicken from a
series of four cards with his right hand and a picture of a shovel from a series
of four cards with his left hand. The subject was then asked, ‘What did you
see?’, ‘I saw a claw and I picked the chicken, and you have to clean out the
chicken shed with a shovel.’ In trial after trial, we saw this kind of response.
The left hemisphere could easily and accurately identify why it had picked the answer, and then subsequently, and without batting an eye, it would incorporate the right hemisphere’s response into the framework. While we knew exactly why the right hemisphere had made its choice, the left hemisphere could
merely guess. Yet, the left did not offer its suggestion in a guessing vein but
rather as a statement of fact as to why that card had been picked” (Gazzaniga
and LeDoux, 1978: 148-9). It would seem that the left hemisphere has meta-consciousness, whereas the right hemisphere has just, on these occasions at least, consciousness-of.
Figure 8.5 The chicken-snow experiment (from Gazzaniga and LeDoux, 1978: 149, figure 42)
Dichotic listening
Subjects were asked to paraphrase the right ear sentences and any subject who
reported hearing anything intelligible in the left ear had their data thrown out.
The remaining subjects showed a definite preference to paraphrase the ambiguous sentence in the direction of the appropriate context sentence. This is best
We will call a cognitive architecture unitary if it treats all cognition (as opposed
to sensory input and behavioral output) as homogeneous, as based on a single
set of principles. Typically, unitary models of cognition represent minds as
constructed out of two kinds of components. At the edges there are input
(sensory) data and output (motor) responses. Most importantly, at the center
is the central processing unit, and “all higher level cognitive functions can be
explained by one set of principles” (Anderson, 1983: 2). This is shown diagrammatically in figure 8.6.
When the “cognitive” perspective replaced behaviorism in psychology and the cognitive sciences during the 1960s, it brought with it a conception of mental functioning as mental computation. The most pervasive example of computational devices at the time was the standard stored-program, register-architecture von Neumann machine. Inadvertently, a particular computational
architecture became identified, in many minds, with the cognitive architecture of the mind.
Figure 8.6 A unitary cognitive architecture: sensory input; central processing, with control (fetch-execute-store), working memory, and central memory; output
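Since the stored-program von Neumann control cycle (fetch-execute-store) figures so centrally here, a minimal sketch may help; the three-instruction machine below is invented purely for illustration and is not offered as a model of any cognitive architecture.

    # A tiny, invented stored-program machine: instructions and data share
    # one memory, and a control loop repeatedly fetches, executes, stores.

    memory = [
        ("LOAD", 5),       # put the literal 5 into the accumulator
        ("ADD", 7),        # add the literal 7 to the accumulator
        ("STORE", 9),      # store the accumulator into memory cell 9
        ("HALT", None),
        0, 0, 0, 0, 0, 0,  # data cells 4-9
    ]

    acc = 0                # accumulator register
    pc = 0                 # program counter

    while True:
        op, arg = memory[pc]       # fetch
        pc += 1
        if op == "LOAD":           # execute
            acc = arg
        elif op == "ADD":
            acc += arg
        elif op == "STORE":        # store
            memory[arg] = acc
        elif op == "HALT":
            break

    print(memory[9])               # 12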
Recently another cognitive architecture has been proposed to account for the
relationship between incoming stimuli and the central cognitive systems, a
theory which by the 1990s had become virtual orthodoxy: “modularity” theory
(see figure 8.8).
In traditional unitary architectures sensory inputs contrast only with a
central processor, but in modular architectures there are “input systems” (ISs) which contrast with two other types of components: sensory transducers³ and central systems (CSs).
Figure 8.8 (diagram legend): beliefs (and other representational states); one type of arrow indicates causation, the other representation
Appendix
Modularity: Gall vs. Fodor
We have briefly reviewed two “modular” views of the mind - Gall’s in chapter
3, and Fodor’s in this one. How do they compare?
competence in that faculty was proportional to brain mass dedicated to it. Given that the skull fits the brain “‘as a glove fits a hand,’ Phrenology followed as the night follows the day” (Fodor, 1983: 23).
4 Because of the last point, Gall’s method was to find correlations between
psychological traits and tissue mass (via bumps on the skull) across people, and
for each faculty there is one location in the brain where it is located for everybody. Fodor’s methodology, on the other hand, is general science (we see to
what degree features 1-9 are present for a given cognitive capacity), and there
is no presumption that the hardware is located in the same place for all people.
We can summarize some of the important differences as follows:
Notes
1 This is not to claim that physical causation is free from mystery, only that DCTM
allows us to have one mystery rather than two.
2 Recall that valid deductive inferences from, e.g., P to Q have the property that if P is true, Q must be true too - i.e. validity preserves truth.
3 These serve to convert outside energy into a format usable by the mind/brain, and
we will ignore them in what follows.
Study questions
What are the two main theses of the representational theory of mind?
How is the CTM related to the RTM (i.e. how do we get the CTM from the
RTM)?
What are the three basic features of the language of thought (LOT)?
What is the argument from the systematicity of thought for the LOT?
What is the argument from the systematicity of inference for the LOT?
State and assess the identity theory (physicalism, central state materialism)
as an answer to the M-BP (hint: there are two versions of this, type and
token).
What is (Turing) machine functionalism and what advantages does it have over
(plain) functionalism?
What two ways are there of taking conceptual role and which way does the
DCTM favor?
What would be the conceptual role of, e.g., “and” vs. “or”?
What is meta-consciousness?
What are the three unitary cognitive architectures based on the three machine
architectures?
Suggested reading
General
There are a number of recent survey books, articles, and chapters on the DCTM. Block
(1990) is perhaps the most authoritative article-length presentation; see also Block and
Segal (1998). Pylyshyn (1984) is much more for specialists and Pylyshyn (1989) summarizes some of this book. Crane (1995) is a very readable general discussion. Glymour
(1992), chapter 13, von Eckardt (1993), chapter 3, Kim (1996), chapter 4, Jacob (1997)
chapter 5, Rey (1997), chapter 8, Cooney (2000), part V, all introduce and discuss some
version of the DCTM under a variety of different labels.
Fodor (1975) initially formulated the LOT hypothesis and surveyed much of its
empirical support and many of its consequences for cognitive science. Fodor (1987)
and Fodor and Pylyshyn (1988) discuss productivity and systematicity arguments for
LOT. A highly readable survey of issues surrounding LOT can be found in Maloney
(1989).
The Fodor (1981a) article is a readable introduction to the mind-body problem in the
context of cognitive science. There are numerous good recent textbook discussions of
the mind-body problem and the major theories of mind that attempt to answer it. See,
for instance, Churchland (1988), chapters 2-5, Kim (1996), chapters 1-5, Braddon-
Mitchell and Jackson (1996), Rey (1997), Goldberg and Pessin (1997), chapter 2, and
Armstrong (1999). For more on supervenience see the authoritative short article by
Kim (1994), then look at Chalmers (1996b), chapter 2, and Kim (1993). Kim (1996),
chapter 4, is a good introduction to (Turing) machine functionalism.
A more complete list of reading follows the next part of our discussion of consciousness in chapter 9. However, an excellent general survey of topics in consciousness can be found in Güzeldere (1997). Dennett (1991), chapter 3, and Chalmers (1995b), (1996b), chapter 1, are excellent introductions to the philosophical challenges of consciousness. Meta-consciousness, under the title of “higher-order” and “internal monitoring” theories of consciousness, is developed in some detail by Rosenthal (1986,
1997) and Lycan (1990). It is critically discussed by Dretske (1993, 1995: ch. 4). Block
(1995) elaborates the distinction between “access” consciousness and “phenomenal”
consciousness.
Unitary
Block and Fodor (1972), especially sections II and III, argue against Turing machine
architectures for cognition, and for a more general computational conception. Newell
and Simon (1972) - see also Newell 1973 - is the classic study of production system
architectures for cognition. See Marr (1977) for critical aspects of production systems.
More recent cognitive studies based on production systems can be found in Anderson’s
ACT system (1983), elaborated in (1993), and Klahr et al. (1987). See Laird et al. (1987)
for an early exposition of the PS-inspired Soar architecture, and Newell (1990)
for a book-length development. Newell et al. (1989) compare Anderson’s ACT and
224 The Digital Computational Theory of Mind
Soar architectures. See Lehman et al. (1998) for an introduction to Soar as a cognitive
architecture.
Modular
Fodor (1983) sets out and defends a modular architecture; Fodor (1985) is a summary of that work (with commentary and replies by Fodor); and Fodor (1986, 1989) further
elaborates on the modularity theme. Harnish and Farmer (1984), Bever (1992), and
Segal (1996) distinguish various forms of modularity. Bever (1992), Harnish (1995),
and Atlas (1997) are critical of various aspects of Fodor’s modularity doctrine. Garfield
(1987) contains a collection of articles for and against modularity and Hirschfeld and
Gelman (1994) explore the idea that central systems exhibit domain specificity, an
important feature of input systems, as does Karmiloff-Smith (1992) - see Fodor (1998)
for discussion.
9
Criticisms of the Digital
Computational Theory of Mind
We have seen that the DCTM can be stated fairly precisely, it constitutes a
coherent answer to the venerable mind-body problem, and it has been an
influential and fruitful framework for empirical investigation into cognitive
capacities. What more could one want? Well, for starters, is it true? A popular
conception of whether or not a system “can think” is the famous “Turing
test.” Reread the passage from chapter 8. These passages have some notable
features. First, although the word “intelligence” occurs in the title of his
paper, Turing carries on the discussion in terms of “thinking,” and this
raises questions concerning the relationship between such notions as:
intelligence, thinking, cognition, mental, etc., which Turing does not explore.
Notice that we are inclined to say that a machine literally adds or plays
chess (maybe even intelligently?), but we are not so happy with saying
that such a machine therefore thinks, cogitates, or has a mental life. Second,
notice how complicated the imitation game is, with different genders and
assistants trying to help or fool the interrogator (we left this part out of the
quotation). Why not just put a variety of people in front of a “teleprinter,”
and give them some reasonable amount of time to decide which of two communicators is a computer? (Although Turing appears to sanction a simplified version of the game in section 6, reply 4, where the interrogator
is supposed to decide if a sonnet comes from a person or a machine.)
Since this more simplified set-up has come down to us as the “Turing test” we
will use Turing’s label “imitation game” for his more complicated set-up.
Third, notice how the interrogator’s questions are directed at discerning
which is the man and which the woman, not which is the computer and
which the human. Fourth, notice that Turing does not explicitly say what the
interrogator is entitled to conclude when the computer wins the imitation
game: It thinks? It’s good at imitating thinking? Or something else? What can
The Position
1 Intentionality in human beings (and animals) is a product of causal
features of the brain . . . certain brain processes are sufficient for
intentionality.
2 The main argument of this paper is directed at establishing the follow¬
ing claim: Instantiating a computer program is never by itself a sufficient
condition of intentionality.
In the body of the paper, however, the target Searle sets up is “strong AI” vs.
“weak AI”:
Note that here we have “cognitive state,” not “intentionality.” And when
Searle constructs his example, it is actually a counterexample to a fairly
specific proposal: the claim that a machine that runs Schank’s story comprehension program: “1. can literally be said to understand the story and
provide the answers to questions, and 2. that what the machine and its
program do explains the human ability to understand the story and
answer questions about it” (1980: 417). The central theses of this part of the
paper are:
A methodological principle
The counterexample
Searle imagines himself locked in a room and given batches of Chinese writing.
Since Searle knows no Chinese he does not know that there are really four different groups of sentences here. One is the story, the second is a script for interpreting the story, third are some questions about the story, and the fourth is a
group of answers to the questions. Finally, Searle is given a set of rules in
English, the “program,” for correlating the answers with the questions. It is
unclear if (or how) the inhabitant of the room is to apply the script to the
stories in answering the questions. If not, the Chinese room does not parallel
Schank’s program. So we will assume it does. By comparison, Searle is given
the same things in English, and from the outside there is no significant difference in his performance; his answers in Chinese are as reliable as his answers
in English, as judged by native speakers. Searle goes on to conclude: “In the
Chinese case unlike the English case, I produce the answers by manipulating
uninterpreted formal symbols. As far as the Chinese is concerned, I simply
behave like a computer; I perform computational operations on formally specified elements. For the purposes of the Chinese, I am simply an instantiation of the computer program. Now the claims made by strong AI are that the programmed computer understands the stories and that the program in some
sense explains human understanding” (1980: 418).
1 As regards the first claim, it seems to me quite obvious in the example that
I do not understand a word of the Chinese stories. . . .
2 As regards the second claim, that the program explains human understanding, we can see that the computer and its program do not provide sufficient conditions of understanding since the computer and the program are functioning, and there is no understanding. But does it even provide a necessary condition or a significant contribution to understanding? . . . not the slightest reason has been given to suppose that they are necessary conditions or even that they make a significant contribution to understanding. (1980: 418)
What would it take to give a machine those properties that make us capable of
having intentional states? At one point Searle answers: “Only something that
has the same causal powers as brains could have intentionality” (1980: 423).
According to Searle, no purely formal model will ever be sufficient for intentionality because the formal properties themselves have no causal power except
when instantiated in a machine to produce the next stage of the program. The
causal powers Searle thinks are required for intentional phenomena go far
beyond just moving the machine into its next state. But Searle is careful to
acknowledge that some other physical or chemical process could produce these
intentional effects - it’s an empirical question. So really, contrary to the above
quote, Searle is only requiring that to duplicate the intentional capacities of
the brain (perception, action, understanding, learning, etc.), one need only
duplicate causally relevant powers sufficient for those effects: “If you can exactly
duplicate the causes you could duplicate the effects” (1980: 422). Since brains
have all sorts of causal powers not directly relevant to intentional phenomena,
such as life-sustaining activity (which can be replaced if damaged by life-support machinery), one need not duplicate the whole causal potential of a
human brain to duplicate human intentionality. The empirical question is:
exactly what powers of the brain are causally relevant to intentionality? Searle
does not know; nobody at present knows.
Searle ends his article with a discussion of the question: why have researchers thought that when it comes to cognition, but not, say, meteorology, simulation is duplication? - a computer simulation of digestion will not make a pizza disappear. His answer is that first, they have taken too seriously the analogy: the mind is to the brain as a program is to hardware (1980: 423). The analogy breaks down at the two points we have already rehearsed: it leaves out the causal powers of the brain sufficient for intentional states and processes
(and not just sufficient for state transition); and it leaves out the intentional
content of mental phenomena. Second, researchers have been seduced by the
notion of “information processing.” Defenders of strong AI argue that since
a computer does information processing and humans do information processing, humans and computers share an information-processing level of description - the algorithmic level. But, says Searle, “In the sense in which people ‘process information’ . . . the programmed computer does not do ‘information processing.’ Rather, what it does is manipulate formal symbols . . . its . . . symbols don’t have any meaning as far as the computer is concerned” (1980: 423). Third, the widespread acceptance of the Turing test has left a residual behaviorism in strong AI, i.e. that passing the Turing test is sufficient in order to have a machine that thinks. But the Chinese room purports to be a counterexample to that. Fourth, there is a residual dualism in strong AI in that: “The
project is to reproduce and explain the mental by designing programs, but
unless the mind is not only conceptually but empirically independent of the
brain you couldn’t carry out the project. . . . This form of dualism . . . insists
. . . that what is specifically mental about the mind has no intrinsic connection
with actual properties of the brain” (1980: 423-4).
We now want to see how the Chinese room argument holds up against the DCTM as we have formulated it. We will organize our evaluation along the two main dimensions of the DCTM: the computational relation that indicates the type of cognitive state or process (e.g. belief, reasoning) and the representational content of that state or process.
Systems reply
Chinese, and a fortiori neither does the system, because there isn’t anything in
the system which isn’t in him. If he doesn’t understand, then there is no way the system could understand because the system is just a part of him” (1980; emphasis
added).
But there are problems with this reply. First, it does not follow that because Searle does not understand Chinese no part of him does. Maybe he is in the awkward position of split-brain patients in having a “part of” them able to do things, and actually do things, that they, the patients, are unaware of doing except by observing their own behavior. Such patients even typically deny being able to perceive and understand things that their disassociated hemisphere can perceive and understand. This illustrates that general principles of inference from whole to part, or from part to whole, are fallacious: water
molecules are not wet and are not a liquid, but the whole (glass of water) of
which they are a part is. A crowd can be numerous, but none of the persons
in it are. If Searle does not think the “system” understands Chinese, it has to
be for some other reason than this. Second, Searle remarks how different the
Chinese “subsystem” is from the English “subsystem,” saying: “The English subsystem knows that ‘hamburgers’ refers to hamburgers, the Chinese subsystem knows only that ‘squiggle-squiggle’ is followed by ‘squoggle-squoggle’ ” (1980). But, we may ask, how can he help himself to this knowledge without the assumption that what the subsystem knows must be accessible to him introspectively, rather than behaviorally? Is Searle assuming here, as the earlier (P of I) suggests, that mental states must in principle be available to consciousness? Third, since Searle’s Chinese room occupant is a person, and this person
has all the above information represented in him and still “neither the biology
nor the program, alone or together, are sufficient” (Rey, 1986: 173), it would
seem that the only thing left that could have gone wrong is the way the information is “programmed.” Perhaps memorizing the program is not necessarily “appropriately programming” the machine. It is more or less true that von Neumann machines are programmed by putting instructions and data structures in memory (and so “memorizing” them), but as Searle characterizes strong AI, it is not required that the program be running on a von Neumann machine - though undoubtedly these were the types of machines practitioners of strong AI in fact had in mind. We will see later that what counts as
programming a “connectionist” machine is quite different.
Robot reply
Searle may have shown that algorithmic relations are not sufficient to induce
understanding or intentionality — in the sense of being about actual things in
More recently Searle (1991) has restated his Chinese room argument explicitly in the following form:
He then extends this argument with the following “axiom” and draws three
more “conclusions” from them:
The Churchlands (1991) reply with their “luminous room” analogy: suppose
that someone tries to test the hypothesis that light is electromagnetic radiation
by shaking a magnet in a dark room (shaking a magnet to make light is like
running a program to understand Chinese). This person does not see any light,
so they conclude that the hypothesis is false:
Just as this would be bad physics, so following Searle’s advice on the Chinese
room would be bad artificial intelligence. According to the Churchlands, axiom
3 begs the question; it doesn’t show anything about the nature of light - we
need a research program for light to do that. This carries over to Searle’s
Chinese room argument: the luminous room argument has the same form as
the Chinese room argument, the luminous room argument begs the question
at axiom 3, so the Chinese room argument begs the question at axiom 3.
Searle replies to the Churchlands’ “luminous room” objection like this: the
analogy breaks down: light is caused by electromagnetic radiation, but symbols
themselves have no relevant causal properties. And they have no intrinsic
semantics - they can be interpreted as Chinese, chess, the stock market, etc.
He poses this as a dilemma for the Churchlands: either syntax is formal (i.e.
has to do with form, shape, or structure) or it is not formal. If it is just formal,
then it has no causal powers and the analogy breaks down. If it is not just formal
then something else must be doing the causal work, i.e. the hardware - the
physics does the work, not the program.
But strong AI claims that the program is sufficient for a mind, so this is not
strong AI. It is easy to see that the strength of Searle’s reply depends on the
strength of the disanalogy, and that in turn depends on the strength of Searle’s
reasons for thinking the “syntax” has no causal powers. Chalmers (1996a: 327)
parodies this aspect of the argument:
We turn to this issue shortly, and leave the argument here officially as a
standoff.
The score
The systems reply raises the question of the role of conscious experience in
cognition and thought, and the robot reply raises the question of the seman¬
tics (intentionality) of cognition. We need to explore both of these issues
further, first consciousness, then content.
Consciousness-of
Phenomenal consciousness
One idea is that what we left out of our earlier discussion of consciousness,
and what is crucial to interesting consciousness, is the look, feel, taste, smell,
sound, etc., of things, or as we will say, the phenomenology, qualitative charac¬
ter, or experiential aspects of these states, the what it is like to be in those states.
Some authors emphasize this notion: “we can say that a mental state is con-
Different
Nagel (1974) reminds us how different the sensory system of some other
species, a bat for example, is from ours. The bat has specialized circuits in its
brain for sending out high-pitched shrieks and then echolocating objects that
reflect those waves. Using this technology, the bat is able to make a whole
variety of precise discriminations regarding, say, size, texture, and motion -
crucial for hunting. It is not too misleading to say the bat “sees” with its ears.
Nagel then invites us to ask ourselves: what would it be like to be a bat and
detect (see? hear?) a mouse via echolocation? It is very difficult - it is probably a totally different and perhaps unimaginable experience for us. Like us, the bat picks up information about mice (and perhaps is even aware of its awareness). What is so different (or what we cannot imagine) is the experiential character that goes with being a bat - bat qualia.
Missing: color
You might have a hypothetical person who could not see red, but who under¬
stood the physical theory of color and could apprehend the proposition “This
has the color with the greatest wave-length, ” but who would not be able to
understand the proposition “This is red” as understood by the normal un¬
educated person.
(Russell, 1918: 55)
Or consider Jackson’s (1986) case of Mary, the color vision expert, who knows
all there is to know scientifically about color vision, but is herself color blind
(or raised in a black and white environment). She then undergoes an operation
that allows her to see color (or she’s removed from the black and white environment). Jackson contends that she now knows something about color, that
she didn’t know before - its qualitative character. Or, put more metaphysically
and not epistemologically, there are now instances of something that were
missing before - experiences by Mary of color. Something has been added to
the world.
Missing: blindsight
Full-blown consciousness
(3) that awareness (with that sensory qualitative character) will itself be the
object of awareness — it will be aware of its awareness. This meta-awareness can
then be the basis of reporting one’s conscious states and controlling behavior
in general.
In another typical kind of case, however, one is in a non-sensory mental state
- such as solving a mathematical problem or deciding where to go to graduate
school. Still, in these cases too there seems to be something distinctive that it
is like to be in these states - there is a cognitive kind of qualitative character:
cognitive qualia. So here are two typical consciousness scenarios:
Recall (see chapter 8) that propositional attitudes were divided into a representational content (proposition) and an attitude. Some have suggested that the
contents of cognitive states are always in a natural language, or pictorial image
code. If this is right, then thinking these contents would be like having a
memory of a sensory quale. And on the side of the attitude, there does seem
to be something it is like to reason, decide, desire, and even to believe.
In sum, we can say that full-blown consciousness involves being in a mental
state with either sensory or cognitive qualia (what it is like to have that experi¬
ence) and one is aware of being in a state with that qualitative character.
Second, to get the computer to be aware of its own internal states it seems
that we need only program it appropriately. It is qualitative (sensory and cognitive) consciousness that is the problem. The conclusion that philosophers
such as Nagel and Jackson drew from their thought experiments on bats and
color-blind vision specialists is that experiential consciousness, the qualitative,
phenomenal character, is not a physical feature of the world, and therefore that
physicalism (materialism) is incomplete. We have left that metaphysical question open. What these considerations do point to for cognitive science, however,
is what Levine (1983) calls “the explanatory gap,” and what Chalmers (1996b)
calls “the hard problem.”
The idea here is that there is an explanatory gap between what we can say about
the nervous system in physiological (physical) terms, and how we can explain
conscious phenomena. It would seem that all the nerve firings and hormone
secretions in the world will not add up to the visual experience of red, the taste
of pineapple, or the smell of a rose. There is a gap in our ability to connect
specific brain activity with specific conscious experiences, and unless we can
bridge this gap, we cannot explain the experiences using just physiological
notions.
Summary
The first position seems too strong. Consider so-called “dispositional beliefs,”
beliefs one has in the sense of being disposed to accept or express them upon
having them come to mind. For instance, consider the belief that zebras do not
wear raincoats in the wild (Dennett). Didn’t you believe this even before reading this sentence called it to consciousness? Or consider “standing beliefs”: we don’t
quit believing what we believe when we are asleep (think of your beliefs about
yourself such as your name, phone number, etc.), yet we are not then conscious
of them. If so, again, these cognitive states need not be conscious to exist. And,
as we have already seen, the phenomena of “automatic pilot,” blindsight, split
brain, and dichotic listening suggest cognition without consciousness.
The second position is the official position in much of AI, cognitive science,
and perhaps cognitive psychology. It has been challenged recently by Searle
(1990b, 1992: ch. 7). He calls (3) “the connection principle,” and he reasons
for it like this:
The setup
a way they represent the world as being (see chapter 8 again). A person
may believe that the star in the sky over there is Venus without believing
it is the Morning Star, or without believing that it is the Evening Star,
even though these are the same star. A person may want to drink some
water without wanting to drink some H2O, even though the water is H2O.
2 There are only two ways in which these aspects can be characterized if
they are not characterized in terms of introspectively available conscious
aspectual shapes (ways of representing): behavior and neurophysiology.
3 These “aspectual features” cannot be exhaustively or completely characterized solely in terms of behavioral, or neurophysiological notions. None of these types of notion is sufficient to give an exhaustive account of aspectual shape - of what makes something water vs. H2O, or what makes something the Morning Star vs. the Evening Star (here Searle refers approvingly to the Nagel (1974) and Jackson (1986) articles discussed earlier).
The argument
We can pose two questions at this point concerning the argument. First, at step 2 Searle seems to have left out of consideration the option we played up in chapter 7, the data structures used in computer representations of the world. They are neither just “physiological” nor are they behavioral, and they are, on Searle’s own grounds, not experiential - so how can the argument ignore them?
Second, and relatedly, at step 3 Searle uses the “qualia” results of Nagel and
Jackson to argue against the possibility of neurological or behavioral analyses of
aspectual shapes - the way the world is represented to be. But many theoreticians
believe that not all “aspectual shapes,” or ways of representing something, have a distinctive qualitative character. They contend that believing that something is H2O may have no distinctive qualitative aspect, or that believing that Venus is
the Morning Star may have no distinctive qualitative aspect. One could associate each of these with the taste of eggplant and it would not matter one whit as to what they represent. This issue is related to the first in that according to the DCTM, the missing data structures are just what would be used to represent
non-qualitative aspects of thought. So Searle owes us an argument against this
possibility. He may think he has one and we will return to this issue.
We saw in the previous chapter that the “official” position of the DCTM
regarding content is some sort of “conceptual role” (CR) theory. Generic
conceptual role theory, as we saw, says that:
(G-CR)
The content of a representation is determined by the role the representa¬
tion plays in the conceptual system of which it is a part.
One gets different CR theories by spelling out the nature of the representations involved, the nature of the roles they play, and the nature of the conceptual system they are a part of. The version we settled on earlier identified CR with the deductive inferential role, which was then combined with the language of thought (LOT) hypothesis. And the fact that inferences are defined
first and basically over thoughts (“sentences” in the LOT) means that the CR
of concepts (“words” and “phrases” in the LOT) is given in terms of their
contribution to the CR of the thoughts they occur in. Our resulting theory of
content for the DCTM was:
(DCTM-CR)
1 If “R” is a sentential representation (thought) in the LOT, then its content
is determined by the fact that “R” participates in the (valid) inferential
relations:
(P) From “R” one may infer . . . [“R” serves as a premise]
(C) From . . . one may infer “R” [“R” serves as a conclusion]
2 The specific inference relations (P), (C) associated with “R” give the
specific content of R.
3 If “R” is a subsentential representation (concept) in the LOT, then its
content is the contribution it makes to the content (inferences) of all the
sentences it occurs in.
These are both quite different from our sample inference: from “P and Q” infer “P,” in that “animal” is not a constituent of “cat” in the way “P” is a constituent of “P and Q.” Furthermore, the truth-rule for “and” guarantees the validity (truth preservingness) of the inference from “and.” What guarantees the inference (CA)? Is it just a fact about the concept cat, or is it (merely) a truth about cats? Contrast (CA) with (CM): the problem is justifying letting (CA) contribute to determining the content of “cat,” but not letting (CM) so
contribute. Or put it this way, idealist philosophers think that there is no
physical matter and that everything is mental; panpsychists think that there are
physical things, but that everything has a mind (and solipsists think they themselves are the only “things” that exist!). When one of these theoreticians thinks: there goes a cat or there is a good Merlot, do their thoughts have the same content as ours (supposing we are not idealists, panpsychists or solipsists)? The panpsychists seem to make inferences we would not make (that’s a good Merlot, so there is a mind), and the idealist does not make inferences we would make (there goes a cat, so there goes a physical object). Do these differences in inference constitute a difference in content or not? This is called the problem of “analyticity” - the problem of justifying the inclusion of a particular inference in the “analysis” of the content.
(2) Relativism If (DCTM-CR) is right, then content is relative to the
system of representation (LOT). This raises the question how we can compare
contents across people. How could different people agree or disagree about a
given thought? Wouldn’t they have to have the same inferential relations, and
how would that be possible? Well, if the LOT was the same for all people, then it might be possible. But how could this be if people develop their psychologies from such different experiences? One answer (see Fodor, 1975) is that the LOT is innate, and so shared by all people. That helps to get us out of the
present problem, but it does put a heavy burden on genetics.
(3) Holism What is to keep the inferences from spreading to the whole
system, so that the content of two people’s thoughts would be the same only
if they shared every other thought? This would be so unlikely that we might not
have any psychological principles to state at the level of content.
The issue of truth raises the issue of reference, of aboutness, because a thought
is true only if the relevant parts of the world actually are the way that thought
represents them to be. And it is the aboutness or reference relations that determine these relevant parts. Are these relations to things in the world essential
to thought contents, or are they outside thought contents?
Adapting Putnam’s (1975b) terminology we will call a thought narrow if it
does not presuppose the existence of anything outside the thought itself, and
we will call a thought wide if it does. To take an uncontroversial example, at
the level of the attitudes proper, believing that bachelors are unhappy is narrow
because nothing is presupposed about the truth or reference of: bachelors are
unhappy. One can believe it whether or not it is true. But one cannot know (or
realize, recognize, etc.) that bachelors are unhappy unless it is true that they
are unhappy - these attitudes are wide.
DCTM
How about thought contents? Is there any reason for going “outside the
head” for such contents - is there any reason to suppose that thought contents
can be wide, or as we will also say, is there any reason to suppose that thought
contents can be external to the system itself? Putnam (1975b, 1981) offers a
series of interesting examples which are supposed to convince us that this is
true, that at least some contents are wide and external.
- on Twin Earth. . . . One of the peculiarities of Twin Earth is that the liquid called ‘water’ is not H2O but a different liquid whose chemical formula is very long and complicated. I shall abbreviate this chemical formula simply as XYZ. I shall suppose that XYZ is indistinguishable from water at normal temperatures and pressures ... on Twin Earth the word ‘water’ means XYZ ... on Earth the word ‘water’ means H2O ... in the sense in which it is used on Twin Earth . . . what we call ‘water’ simply isn’t water; while . . . what the Twin Earthians call ‘water’ simply isn’t water” (1975b: 228-9).
Although Putnam here conducts his thought experiment with the word(s) “water,” the point extends to thoughts: you on Earth can and do think often of water (H2O), whereas twin-you does not; twin-you thinks often of XYZ (and even calls it “water”). But as we use the word “water” (and we are using our word) the twin is not thinking of water (H2O).
Conclusion
“Even a large and complex system of representations, both verbal and visual, still does not have an intrinsic, built-in, magical connection with what it represents - a connection independent of how it was caused and what dispositions of the speaker or thinker are” (ibid., 5). “One cannot refer to certain kinds of things, e.g. trees, if one has no causal interaction at all with them, or with things in terms of which they can be described” (1981: 17).
Insofar as our judgments comport with Putnam’s, the contents of such representations must be, at least in part, wide or external. And notice that natural kind concepts, such as water, are particularly resistant to conceptual role analysis, because such inferences have little to do with the central aspect of their
“meaning” - their reference. As Putnam (1988) notes, the Greeks had a very
different conception of water, yet “water” is a perfectly good translation of
hydor because both “water” and hydor refer to the same stuff. And until Dalton
(and modern atomic chemistry) in the Renaissance, it was believed that the
property of liquidity was due to the presence of water - thus alcohol and
mercury flowed because of the presence of water. Clearly the concept of water
played a very different role, yet what we think about when we think “water”
thoughts is the same. And that is because conceptual role plays virtually no
role in determining aboutness.
Suppose these conclusions regarding content are correct - suppose that conceptual role does not exhaust mental representational content, and that the
mental syntax: satisfies the “formality constraint” and does not involve
semantic properties.
narrow content: semantic properties that are “in the head.”
wide content: semantic properties that involve relations to the world (truth
conditions).
(A) The laws of the mind that cognitive science endorses just are the computational principles used by computational mechanisms that apply to these representations,
then we get three styles of cognitive theory, only the first two of which potentially satisfy the formality constraint on computation:
syntactic cognitive theory: the laws of the mind are sensitive only to non-
semantic (formal, syntactic) properties of representations.
narrow cognitive theory: the laws of the mind are sensitive to narrow (“in the
head”) semantic properties of representations.
wide cognitive theory: the laws of the mind are sensitive to wide semantic
properties of representations.
Anyone who accepts the CTM (with its “formality constraint”) must subscribe
to one of the first two approaches if they also subscribe to assumption [A].
This means that according to (DCTM), mental states and processes are not
sensitive to wide content - wide content is divorced from computation, and
this can lead to potential problems.
At the very end of his Chinese room paper, Searle comments: “Of course the
brain is a digital computer. Since everything is a digital computer, brains are
too” (1980: 424). This begins the second stage of the attack on the DCTM in
general (cognition is a kind of computation), and “cognitivism” in particular
- the view that in some non-trivial sense, the brain is a digital computer. These
are distinct theses, since the brain might be a digital computer, but thinking
might not be computation (Searle’s view). Searle (1992: ch. 9) begins by posing
“such absolutely fundamental questions as, what exactly is a digital computer? What exactly is a symbol? What exactly is an algorithm? What exactly is a computational process? Under what physical conditions exactly are two systems
implementing the same program?” (1992: 205). Since “there is no universal
agreement on the fundamental questions” (1992: 205), Searle goes back to
Turing and gives a brief informal description of a Turing machine. He then
adds “that is the standard definition of computation” (1992: 206). However,
there is no “standard definition of a computer” — which is why Searle was
unable to report one. There are standard definitions of a Turing machine, a
Turing machine computation, and Turing machine computable functions.
There are also many specific definitions relating Turing machines and other
types of computers, definitions of what it is for each of these to perform a
computation, and proofs regarding their computational powers (see Suggested
reading for chapter 6). For instance, a machine with an architecture different
from a Turing machine might still be shown to be weakly equivalent to a Turing
machine in the sense that it will compute every function a Turing machine will
compute. But there is no generally accepted definition of a “computer” or of
a “computation” that can be used in cognitive science.
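For concreteness, here is a minimal sketch of a Turing machine in the textbook sense - a tape, a read/write head, and a finite state table. The particular machine (one that flips the bits on its tape) and its encoding are my own choices for illustration, not a definition drawn from Searle or Turing.

    def run_turing_machine(tape, table, state="start", blank="_"):
        """Run a one-tape Turing machine given by `table` until it halts."""
        tape, head = list(tape), 0
        while state != "halt":
            symbol = tape[head] if head < len(tape) else blank
            new_symbol, move, state = table[(state, symbol)]    # consult the state table
            if head < len(tape):
                tape[head] = new_symbol                         # write
            else:
                tape.append(new_symbol)
            head += 1 if move == "R" else -1                    # move the head
        return "".join(tape).rstrip(blank)

    # State table for a machine that flips each bit and halts on the blank.
    FLIP = {
        ("start", "0"): ("1", "R", "start"),
        ("start", "1"): ("0", "R", "start"),
        ("start", "_"): ("_", "R", "halt"),
    }

    print(run_turing_machine("10110", FLIP))   # prints 01001

The point of the sketch is only that "Turing machine" and "Turing machine computation" admit of this kind of precise characterization; what remains contested is a general definition of "computer" suitable for cognitive science.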
It is not clear how or why Searle thinks the class of computers is defined syntactically in terms of the assignment of 0s and 1s (one of Babbage’s machines computed in decimal, as did the original ENIAC). Since 0s and 1s are simply a convenient coding device for a wide variety of symbols (letters, numerals, graphics) we should modify Searle’s point to the assignment of symbols: the class of computers is defined syntactically in terms of the assignment of symbols. Still, a computer requires more than symbols - it requires at
least memory, control, and certain capacities to process - a computer, after all,
computes. It is not just an algorithm - a computer will run a program which
expresses an algorithm. So, following our earlier discussion, we will assume these
are all included in Searle’s notion of an actual computer: a computer is at least
a device which has a memory, control, and manipulates symbols.
Searle turns next to the issue of what one will find if one opens up a particular physical computer: “If you open up your home computer, you are most unlikely to find any 0s and 1s or even a tape” (1992: 206). Furthermore, there is a variety of stuff the computer might be made of: cogs and levers, hydraulics,
silicon, neurons, cats and mice and cheese, pigeons, etc. In sum, according to
cognitivism: “We could make a system that does just what the brain does out
of pretty much anything” (1992: 207). This is so-called “multiple realizability,” and according to cognitivism, just as a carburetor could be made out of
brass or steel, a given program can be run on a wide variety of such hardware.
Searle demurs: “But there is a difference: The class of carburetors and thermostats are defined in terms of the production of certain physical effects. That is why, for instance, nobody says you can make carburetors out of pigeons. But the class of computers is defined syntactically in terms of the assignment of 0s and 1s. The multiple realizability is a consequence not of the fact that the same physical effect can be achieved in different physical substances, but that the relevant properties are purely syntactical. The physics is irrelevant except in so far as it admits of the assignment of 0s and 1s [i.e. symbols] and of state transitions between them” (1992: 207).
Presumably the “relevant properties” referred to in this passage are those
properties for being a (digital) computer. So it will be crucial that Searle be
able to maintain this asymmetry between carburetors and computers - that
multiple realizability for computers is not, as it is for carburetors, just a con¬
sequence of the principle that if you can duplicate the relevant causes, you can
duplicate the relevant effects. We will maintain that, properly understood,
computers are just like carburetors in this respect.
Before going on we must distinguish the computer qua design from the com¬
puter qua physical object that conforms to the design - the computer type from
the computer token. Given a computer design (type), something, X, will be a
token of it if and only if X instantiates that design. It will be a part of the
design, however, that its physics moves it from (symbolic) state to (symbolic)
state. The “syntax” is just a place holder at the design level for physical prop¬
erties and relations at the instantiation level. For convenience let’s call Searle’s
conception of computers and computation quoted above the “syntactic defini¬
tion” of computers and computation. So far it is central to this conception
that “the class of computers is defined syntactically in terms of the assignment of 0s and 1s.”
Searle draws out two further consequences of the syntactic definition of com¬
puters and computation:
(1) Everything would be a digital computer because any object whatever could have syntactical ascriptions made of it. You could describe anything in terms of 0s and 1s (1992: 207-8).
More specifically (i) for any object there is some description of that object such
that under that description the object is a digital computer (1992: 208) (every¬
thing is a computer), (ii) For any object there is some program such that the
object is implementing the program — the wall behind my back is now imple¬
menting WordStar because there is some pattern of molecule movements iso¬
morphic with the formal structure of WordStar (1992: 208-9) (every object has
some program running on it).
(2) Syntax is observer-relative The ascription of syntactical properties is
always relative to an agent or observer who treats certain physical phenomena
as syntactical. (1992: 208)
According to Searle these points have some serious consequences for cognitivism: “This could be disastrous because we wanted to know if there was not some sense in which brains were intrinsically digital computers: is there a fact of
the matter about brains that would make them digital computers? It does not
answer that question to be told, yes, brains are digital computers because every¬
thing is a digital computer” (1992: 208). “Proponents do not see universal real¬
izability as a problem because they do not see it as a consequence of the deeper
point that “syntax” is not the name of a physical feature, like mass or gravity.
Syntax is essentially an observer relative notion” (1990b: 27; 1992: 209; empha¬
sis added). “The same point can be stated without the notion of “syntax.” A
physical state of a system is a computational state relative to the assignment to
that state of some computational role, function, or interpretation. The same
problem arises without 0s and 1s because notions such as computation, algorithm, and program do not name intrinsic physical features of systems. Computational states are not discovered within the physics, they are assigned to the physics” (1990b: 27; 1992: 201; emphasis added). Clearly Searle views these
observations as counting against cognitivism, but it is not clear exactly what
the argument is supposed to be. We will try to state the argument(s) more
precisely.
Argument 1
(a) So brains are not intrinsically digital computers.
(b) So brains are not digital computers.
Notice the difference between the two versions of the conclusion. By this
argument, a brain is not intrinsically a computer because being a computer is
observer-relative. But of course then neither is my/your PC intrinsically a computer. Since my/your PC is a computer, we want to know: (Q1) why is being intrinsically a computer so important - why isn’t it enough for a brain just to be a computer? (Q2) What exactly is it to be intrinsically a computer?
The conclusion only needs to be made explicit to realize that something has
gone wrong - what? Here are some possibilities.
(1) (Al) Searle may have overestimated the ease with which a physical
object can be described as running (a segment of) a program (steps 1 and 2).
Just because the object can be put in a 1-1 correspondence with a time slice
of a program run, doesn’t entail that it is running that program. Running
a program requires that numerous counterfactuals be true, counterfactuals
of the form: if the program had been given such-and-such, it would have
computed so-and-so; that is, if it had been given a 2 and a 4 it would have
printed “No” or if it had been given “Control F6” it would have printed in
boldface.
(2) (A2, A3) Searle may have also overestimated the closeness of the
connection between computing and computers. One might give Searle his
rather liberal notion of computation, but claim that to be a computer requires
more. As we said earlier, it may require an architecture, memory, and control.
It may be a part of our conception of a computer that it be capable of com¬
puting, not that it be computing.
(3) It is open to the DCTM to claim that the mind/brain is a
particular kind of computer, not just “a computer.” Consider the claim
that the mind/brain is a von Neumann machine. Showing that one can
map a segment of a WordStar run onto a time slice of the molecular move¬
ments of the wall is not sufficient to show that the wall is a von Neumann
machine.
(4) Searle’s position is that there is no fact of the matter that a PC is a von
Neumann machine. Why? Because it can be described as another kind of
machine. So what - a computer could be described as a door stop, a knife as a
paper weight. The challenge (not the refutation) Searle offers DCTM is making
sense of the idea that some descriptions, in particular, some computational
descriptions, of systems are better than others. What could “better” amount
to? Among other things these descriptions would be more accurate, allow better
explanations and predictions of the machine’s behavior. Intuitively, describing
a PC as a von Neumann machine running WordStar, to continue Searle’s
example, is more accurate and allows for better explanations and predictions
of its behavior than describing a wall as running WordStar. The challenge is
to say exactly why.
(5) Semantics: remember, there is at least one serious difference between
the DCTM and strong AI - the DCTM gave cognitive states semantics,
representational powers. Strong AI is a thesis about mentality and pro¬
grams. On Searle’s view, a program is formal, syntactic, non-semantic, non-
representational. So even if Searle could show that anything could be viewed
Argument 2
Is the claim that the brain is a digital computer a piece of “natural science”?
Well, it is certainly supposed to be an empirical factual claim, but not all em¬
pirical factual claims are part of natural science. However, it is not clear that
Searle needs this extra step, since presumably if the syntax is not “intrinsic”
(it is assigned, observer-relative, etc.) then it is not an empirical factual claim,
but rather a decision.
Argument 1
The problem is the inference from the second to the third step. It is not a
question of whether the program itself has causal powers, but whether the
programmed computer has causal powers. We saw from the compiler story
given earlier in chapter 6 that the programmed computer does have dis¬
tinctive causal powers. Because of this, Searle’s argument breaks down at
step 3. Searle needs the conclusion that a computer’s program states are
realized in but not caused by the programming. But this is not true. And if
step 3 is not true, then there is no asymmetry between programmed com¬
puters and brains.
Argument 2
We were given no argument for the claim that brains have intrinsic intention¬
ality in the Chinese room discussion, nor in this one. But our discussion of
wide content makes it clear that some argument is called for - recall Putnam’s
earlier discussion of “magical” theories of reference.
The machine — not just the hardware, but the programmed living machine —
is the organism we study.
(Newell and Simon, 1976)
a program into bit code in a von Neumann machine represent the configura¬
tion of flip-flops in the system, and since this is typically a difference in voltage,
they represent a specific, physical fact about the programmed machine. And
since this “syntax” is physical structure with causal consequences, we do not just
“assign” it - physical structure is not “observer-relative.” It is, in a perfectly
normal sense, as intrinsic to the programmed machine as any structure is intrin¬
sic to matter. (Since Searle gives mass and gravitation as examples of “intrin¬
sic” physical properties both of which are relational in contemporary physics,
he must mean by “intrinsic” not observer-relative or observer-dependent.)
It was a part of the doctrine of “strong AI” that programming is sufficient for
cognition, mentality, intelligence, etc., and that the only thing that mattered
about the hardware was that it had enough causal structure to implement the
program. Furthermore, it was characteristic of this work that it was done
on computers with von Neumann architecture - stored-program, register
machines. It might be argued that this is not essential, and that the style of
programming required for mind could require a radically different kind of
architecture.
But one can take a slightly different approach to this. One can argue that so
far the only minds we have are biologically based, and in fact are rather similar
in that biological basis. So if one is interested in how we work, how our
minds/brains work, and not just in how some possible mind might work, one
might feel it appropriate to evaluate DCTM as a model of us on hardware
grounds as well.
Not very much that is informative can be said in favor of DCTM hardware as
a model of the brain. For instance:
But the same could be said for a microwave oven or an airbag. If all that can
be said is that there is a level of generality at which DCTM hardware and
brains fall together, but this level of generality also captures microwave ovens
and airbags, we have not said anything interesting.
Components
Figure 9.1 Some different types of neurons (from Anderson, 1995: 9, figure 1.2; reproduced by permission of the MIT Press)
We have seen that the adequacy of the DCTM has been challenged on a number of fronts, the most serious perhaps being its silence on issues of phenomenal consciousness and wide content. In light of this we now have to face the
question: what exactly does the DCTM apply to?
Some states and processes we intuitively think of as “mental” are not naturally
thought of as digital computational. Here is an initial, and provisional, survey:
If these considerations are correct, then there are intuitively “mental” phe¬
nomena that are not (digital) computational and we will need to restrict the
domain of application of the DCTM to only a portion of our mental life. We
can organize many of the items on this list by noting that mental phenomena
in general can be divided into states and processes and further can be divided
into subcategories depending first on the nature of the phenomena, and second
on its history, as Fodor and Pylyshyn noted above. First, mental states and
processes can themselves be divided into two subcategories: experiential phe¬
nomena, and non-experiential phenomena. Experiential phenomena are char¬
acterized by their conscious, introspectively accessible experiential qualities,
and being in one of these states or processes entails having certain sorts of
sensory experiences. Non-experiential phenomena, on the other hand, have no
essential qualitative character. Of course, it could be that many mental states
Mental phenomena: experiential vs. non-experiential (by nature); cognitive vs. non-cognitive (by history)
Notes
1 The text suggests a related “multiple realizability argument for no syntax intrinsic
to the physics”;
Study questions
What did Turing think one could conclude if a machine wins the imitation
game?
What is the main objection to the Turing test? What do you think about it?
Against strong AI: the Chinese room and the luminous room
What does Searle intend to show with his Chinese room argument?
Consciousness
What is consciousness-of?
What reasons are there for thinking that consciousness is not necessary for
cognition?
Content
What is the Churchill ant trace example for wide content? Discuss.
What is the problem with the DCTM if content is at least partially wide?
Discuss.
How might it get the DCTM out of the problem of wide content?
Against cognitivism
How does Searle illustrate these worries (hint: the frog’s eye, a word
processor)?
What are some similarities between digital hardware and neural hardware (aka
the brain)?
What are some differences between digital hardware and neural hardware (aka the brain) structurally?
What are some differences between digital hardware and neural hardware (aka
the brain) functionally: organization and processing?
Give three examples of states or processes that we may call “mental,” but
which may not fall under the DCTM.
What feature of mental states did we conclude might make them candidates
for inclusion in the domain of the DCTM?
What feature of mental processes did we conclude might make them candidates for inclusion in the domain of the DCTM?
Suggested reading
Turing test
The Turing test is discussed in Hofstadter (1981), and in some detail in Dennett (1985),
Block (1990), French (1990), and Copeland (1993b), chapter 3. Feigenbaum and
Feldman (1963) is the classic original collection of readings relevant to the topic of
computers and intelligence, and Luger (1995) is a recent collection. On “massive adapt¬
ability” as an indication of intelligence see Copeland (1993b), chapter 4.6.
Searle’s Chinese room argument has been extensively discussed. A good place to start
is with the commentaries on the original article in Searle (1980) and Searle’s replies. A
discussion that relates the Chinese room to cognitive architecture is Chalmers (1992).
Both the Chinese room and the luminous room are discussed in Copeland (1993b),
chapters 6 and 10.7, and in Rey (1997), chapter 10.2. Chalmers (1996b), chapter 9.3,
argues for a version of strong Al, and chapter 9.4 discusses the Chinese room.
The number of works on consciousness has mushroomed in the last 15 years. There
are now many books and articles on the subject, as well as chapters of texts and mono¬
graphs (see chapter 8 again). For a concise, authoritative overview of issues surround¬
ing the nature of consciousness and its major theories see Block (1994). For a substantial
overview see Giizeldere (1997). Goldman (1993b) relates consciousness to broader
issues in cognitive science, and Dennett (1991), chapter 12, generally disparages qualia,
and the thought experiments that motivate it; Lormand (1994) replies.
For books (anthologies) from a scientific perspective see Marcel and Bisiach (1988),
Hameroff et al. (1996), and Cohen and Schooler (1996). For a philosophical perspec¬
tive see Block et al. (1997) and Shear (1997).
For books (monographs) from a philosophical perspective one might begin with Searle (1992), Flanagan (1992), Chalmers (1996b), and Seager (1999). For a more scientific
perspective one could start with Edelman (1989) or Crick (1994), then at Searle (1997)
for discussion.
Some chapters of current texts and monographs include Maloney (1989), chapter 7,
Kim (1996), chapter 7, Braddon-Mitchell and Jackson (1996), chapter 8, and Rey
(1997), chapter 11.
The explanatory gap is introduced in Levine (1983) and the hard problem is introduced in Chalmers (1996b), and both are further discussed in selections in Block
et al. (1997).
Searle’s connection principle was first introduced in Searle (1990b). It is repeated and set in a more general context in Searle (1992), chapter 7. Discussions of it can be found
in the commentaries to the (1990) article, in Davies (1995) and Rey (1997), chapter 9.6.
Goldman (1993a) discusses the important and neglected issue of the qualitative feel
(what we called “cognitive qualia”) of propositional attitudes.
Good introductory surveys of problems with conceptual role theory can be found in
Cummins (1989), chapter 9, Lepore (1994) and Cummins (1996), chapter 4. A more
specific and advanced discussion can be found in Fodor and Lepore (1991 and 1992).
Wide psychological states were introduced in Putnam (1975b) and have been exten¬
sively discussed in the philosophy of language - see Pessin and Goldberg (1996). The
issue of computation and wide content was first raised in Stich (1978) and Fodor (1980a),
and discussed at length in Stich (1983, 1991), Baker (1987), chapter 3, Devitt (1989,
1991, 1996: ch. 5), Fodor (1991), and Fodor (1994). Two-factor theories (as well as wide
and narrow content) are discussed in various texts, see especially Kim (1996), chapter
8, Braddon-Mitchell and Jackson (1996), chapter 12, and Rey (1997), chapter 9. More
advanced articles pro and con two-factor theories can be found in McGinn (1982),
Block (1986), and Lepore and Loewer (1986). Horst (1996) is a general critique of
CTM which focuses on the conventionality of symbols.
Cognitivism
Searle (1992), chapter 9, is essentially the same as Searle (1990b). The arguments are
discussed in Chalmers (1995a) and (1996b: ch. 9), Copeland (1996a), and Harnish
(1996).
For some discussion of this issue see Crick and Asanuma (1986), Graubard (1988),
especially essays by Cowan and Sharp, Schwartz, and by Reeke and Edelman. Wasser-
man (1989), appendix A, contains some brief remarks. See Kent (1981) for more on
the “co-evolution” of brain and digital hardware, as well as comparison of the two.
See Chalmers (1996b), chapter 1, for a discussion of the difference between what he
calls “phenomenal” states, and “psychological” states, which is similar to our “experi¬
ential” and “cognitive.”
Part III
The Connectionist
Computational Theory
of Mind
Introduction
The principal subject of Part III is the connectionist computational theory of mind
(CCTM). We view this theory as one of two special cases of the more generic
computational theory of mind (CTM). (The other special case, the digital com¬
putational theory of mind (DCTM) was investigated in Part II.) And, as we
noted earlier, the computational theory of mind is itself profitably viewed as a
special case of the older representational theory of mind (RTM):
RTM
CTM
[DCTM] CCTM
Our strategy in Part III is to first introduce the reader to some elements of
connectionist computation by way of two historically important demonstra¬
tion projects: Jets and Sharks, and NETtalk (chapter 10). If computers algo¬
rithmically manipulate symbols, then we can ask after the manipulation aspect
of computers, and the symbol aspect of computers. These networks illustrate
two different styles of connectionist modeling: the first uses an interactive acti¬
vation and competition architecture (manipulation), and local representations
(symbols). The second uses a three-layer feed-forward architecture (manipu¬
lation), and distributed representations (symbols). We then (chapter 11) look
more closely at the theory behind connectionist models by looking at their
basic building blocks (nodes and connections), how they are programmed, how
they compute, and how they learn. Next we formulate a generic version of con-
nectionism that spans various architectures and representational schemes.
Finally, we taxonomize these architectures along dimensions of memory,
control, and representation. Then, with this connectionist computational
material in place, we turn to formulating (chapter 12), and criticizing (chapter
13) the concept of mind inspired by this connectionist computational story.
10
Sample Connectionist Networks
10.1 Introduction
Here we look briefly at two very different connectionist networks. The first,
Jets and Sharks, is a simple and intuitive example of localist representations in
an interactive activation and competition network. The second, NETtalk, a three-
layer feed-forward network, is more complex and has gained some notoriety. It
illustrates distributed representations, performs a human task, and uses the back
propagation learning algorithm.
McClelland introduces the “Jets and Sharks” network to account for the
possibility of representing and storing general information in a system
without general rules. The system stores specific information in terms
of “exemplars” of objects and properties, and computes with activation
that can spread between exemplars and representations of their properties
in a way reminiscent of semantic networks discussed earlier (see chapter
7). These properties can reinforce each other when supported by a large
number of instances, or they can compete, as when they are mutually
exclusive. The purpose of the illustrative model is to show, among
other things, that the activation and competition mechanism can: (1) retrieve
the specific characteristics of particular exemplars, (2) extract central
tendencies of classes of objects from stored knowledge of particular
exemplars, (3) fill in plausible default values. Let’s look first at the
structure and operation of the network, then at some of these performance
features.
Figure 10.1 Jets and Sharks (from McClelland, 1981, reproduced in Rumelhart and
McClelland, 1986b, vol. 1: 28; reproduced by permission of the MIT Press)
The network
Typicality
For example, one could ask the network for information regarding Jets. In this
case the network was probed with “Jet,” activation was clamped onto the Jet
node, and after 200 cycles of operation, the property nodes stabilized at the
following values:
Probe Jet
Age
—1920s: 0.663
Education
—Junior High: 0.663
Marital status
—Single: 0.663
Occupation
—Pusher: 0.334
—Burglar: 0.334
—Bookie: 0.334
These are the age, education, marital status, and occupations typical of a Jet,
though no Jet has all of these properties. We might say that the network
extracted the prototypical Jet.^
Default values
The network can also fill in default values for missing information. The
network was lesioned between the “Lance” node and the “burglar” node, and
then given the “Lance” probe. When activation spread after about 400 cycles
the network stabilized with the following values, indicating that it filled in
“burglar”:
Name
—Lance 0.799
Gang
—Jets: 0.710
Age
—1920s: 0.667
Education
—Junior High: 0.704
Marital status
—Married: 0.552
—Divorced: 0.347
Occupation
—Burglar: 0.641
In this case the network used the occupation of those similar to Lance to
guess his occupation. However, by increasing the instance-node to instance-
node inhibition (from 0.03 to 0.05), the activation of nodes representing
other individuals is suppressed and so their properties are not activated and
the system will not return default occupation information on Lance. This
mechanism of changing inhibition might correspond to asking oneself: “What exactly do I know about Lance?” versus the question “What is most likely true of Lance?”
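The text does not give McClelland’s update equations here, but the flavor of this kind of processing can be conveyed with a small sketch. The update rule below is the one standardly used in interactive activation and competition models; the particular parameter values (maximum, minimum, resting activation, decay) and the toy net input are illustrative assumptions, not values taken from the text.

```python
# A minimal sketch of one interactive activation and competition (IAC) update,
# in the general style of the Jets and Sharks model. Parameter values here are
# illustrative assumptions, not values from the text.
MAX_A, MIN_A, REST, DECAY = 1.0, -0.2, -0.1, 0.1

def iac_update(activation, net_input):
    """Return a unit's new activation after one cycle of the standard IAC rule."""
    if net_input > 0:
        delta = (MAX_A - activation) * net_input - DECAY * (activation - REST)
    else:
        delta = (activation - MIN_A) * net_input - DECAY * (activation - REST)
    return max(MIN_A, min(MAX_A, activation + delta))

# Example: a property node excited by a clamped "Jet" node and weakly inhibited
# by a competing node in its cloud (connection strengths are assumptions).
a = REST
for cycle in range(200):                      # the text probes for 200 cycles
    net = 0.1 * 1.0 + (-0.05) * 0.3           # excitation minus inhibition
    a = iac_update(a, net)
print(round(a, 3))                            # the node settles at an equilibrium value
```

Raising the inhibitory strengths, as in the Lance example above, simply makes the negative contribution to the net input larger, so competing nodes are suppressed more strongly.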
Summary
Figure 10.2 NETtalk network architecture (from Sejnowski and Rosenberg, 1987: 147,
figure 1; figures 10.2-10.7 reproduced by permission of Complex Systems Publications)
10.3 NETtalk
Static features
Architecture
Units
1 309 units in total
2 3 layers:
Input layer: 203 units, 7 groups of 29 units each
Hidden layer: 80 units
Output layer: 26 units
Connections
1 Each layer is connected just to the next layer.
2 Feed-forward: activation (and inhibition) begins at the input units
and flows forward to the hidden units and on to the output
units.
Representation
Input units
Each group of 29 units has:
1 one unit for each letter of the alphabet (26 units)
2 units for punctuation and word boundary (3 units)
Output units
Each of the 26 units encodes one of:
1 21 articulatory features
2 5 stress and syllable boundary markers
Hidden units
These need to be analyzed - we will return to them in the last section of this
chapter.
We see that input and output representations can be viewed in one way as local,
in another way as distributed. The input level is local in that each unit is local
— each input unit represents a letter of the alphabet or an item of punctuation.
The output level is local in that each unit is local - each unit represents an
articulatory feature, syllable boundary, or level of stress. On the other hand,
the input level is distributed with respect to the representation of words - since
it takes more than one unit to represent words of more than one letter. Like¬
wise, the output level is distributed with respect to speech sounds or
“phonemes” - since it generally takes three articulatory features to make up
one speech sound (phoneme). Thus the description of a representation as
“local” or “distributed” is relative to what is being represented. It is custom¬
ary to say a network has “distributed” representations if at least some of its
representations are distributed. It is customary to say that a network is “local”
if all of its representations are local.
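As a concrete illustration of the input scheme just described, here is a minimal sketch of how a seven-letter window might be laid out on the 7 x 29 = 203 input units. Only the group structure (29 units per position: 26 letters plus 3 punctuation and boundary units) comes from the text; the ordering of units and the particular boundary symbols are assumptions for illustration.

```python
import string

# One group of 29 units per window position: 26 letters plus 3 symbols for
# punctuation and word boundaries (the particular symbols are assumptions).
SYMBOLS = list(string.ascii_lowercase) + ['-', '.', ',']

def encode_window(window):
    """Return a 7 x 29 = 203-element binary input vector for a 7-character window."""
    assert len(window) == 7
    vector = []
    for ch in window:
        group = [0] * len(SYMBOLS)
        group[SYMBOLS.index(ch)] = 1          # local coding within each group
        vector.extend(group)
    return vector

# The centre letter is the one to be pronounced: here 'c' in "- a - c a t -".
v = encode_window(['-', 'a', '-', 'c', 'a', 't', '-'])
print(len(v), sum(v))                         # 203 units, exactly 7 of them active
```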
Figure 10.3 Output function (from Sejnowski and Rosenberg, 1987: 148, figure 2)
Dynamic features
Computation
Programming
Units
1 The system was programmed with a sigmoid activation rule, which has
a gross profile similar to a neuron (see figure 10.3).
2 Thresholds were variable (we ignore thresholds in what follows).
Connections
1 Connections were assigned random initial weights between -0.5
and +0.5.
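A minimal sketch of these two programming choices, assuming the standard logistic form of the sigmoid (the exact output function and the treatment of thresholds in NETtalk may differ in detail):

```python
import math
import random

def sigmoid(net):
    """A smooth, neuron-like output function squashing net input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))

def init_weights(n_from, n_to):
    """Random initial weights between -0.5 and +0.5, as described in the text."""
    return [[random.uniform(-0.5, 0.5) for _ in range(n_from)] for _ in range(n_to)]

weights = init_weights(203, 80)   # e.g. input layer to an 80-unit hidden layer
print(sigmoid(0.0))               # 0.5: mid-range output for zero net input
```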
Learning/training procedures
The distinction between vowels and consonants was made early. Then
came word boundaries, and after 10 passes through the corpus the speech was
intelligible.
Figure 10.4 Learning curves for stress and phonemes (from Sejnowski and Rosenberg,
1987: 153, figure 4)
Figure 10.5 Performance (a) degradation and (b) relearning (from Sejnowski and
Rosenberg, 1987: 154, figure 5)
Figure 10.6 Consequences of changing hidden units (from Sejnowski and Rosenberg,
1987: 155, figure 6)
2 Generalization/transfer:
(i) Start the network of 120 hidden units on 1,000 words at 98
percent, and give it 20,012 words:
77 percent correct average first pass,
85 percent correct end of first pass,
90 percent correct after 5 passes.
(ii) Try two hidden layers of 80 units each:
87 percent correct on average for first pass,
97 percent correct after 55 passes.
The overall performance of the 120-unit single-layer network, and
the 80-unit double-layer network was comparable.
3 Hidden unit analysis: levels of activation in the hidden layer were ex¬
amined for each letter of each word after the 80-unit machine reached
95 percent on the dictionary task:
(i) An average of 20 percent of the units (16 units) active per input,
so it is neither a purely “local” nor a “holographic” system.
(ii) Hierarchical cluster analysis (HCA: which items are similar, which
groups of items are similar, which groups of groups are similar,
etc.) reveals a complete separation of vowels and consonants, and
further subdivisions as well (see figure 10.7). This same procedure
of HCA was used for three networks starting with different
Figure 10.7 Hierarchical clustering of hidden units (from Sejnowski and Rosenberg, 1987:
158, figure 8)
It is tempting to say that these hidden units learned to detect vowels and
consonants (as well as some finer categories). Some (Clark, 1993, chapter 4)
have argued that because NETtalk cannot use this representation to, for
example, categorize sounds as consonants vs. vowels, it should not be thought
of as having the concept of consonants and vowels. One of the virtues of hidden
units is that the representational repertoire of the machine is not limited to the
categories the programmer thinks are in the stimuli, and it is free to extract
whatever regularities it can find. This means that some hidden units may
have an obvious interpretation in “our” categorization scheme, but others
may not.
Notes
1 This is only a fragment of the total network of 68 units and it is not completely
accurate (see below).
2 Though the simulation of the network on a digital computer is discrete, the
slices are small enough to approximate continuity for the purposes of
modeling.
3 The figure misrepresents Lance as single, hence Lance would be a “typical” Jet,
but in the full network Lance is married.
Study questions
How are units which are contained in the same group or “cloud” related to
one another?
NETtalk (NT)
In what respects are input and output levels of NT “local” and “distributed”?
What was discovered about representation in NT’s hidden layer - what was
coded there?
Suggested reading
Jets and Sharks can be explored on a home computer with the help of McClelland and
Rumelhart (1988). It is covered in some detail in Bechtel and Abrahamsen (1991),
chapter 2, and Clark (1989), chapter 5.3. NETtalk is discussed briefly in many cogni¬
tive science texts and in Clark (1993). A longer discussion of its interpretation can be
found in Verschure (1992).
11
Connectionism: Basic Notions and Variations
11.1 Introduction
We turn now to a more systematic review of the basic notions used in con-
nectionist machines. We will survey some of the variety of different machines
that can be constructed using these notions, as well as some of what they have
in common. It is possible to see such models as the most recent development
in a line of thinking dating back many centuries. We already have introduced:
classical associationism (chapter 1), Pavlov and conditioned reflexes (chapter
2), Lashley and Hebb (chapter 3), perceptrons (chapter 4), and Pandemonium
(chapter 6). Some even see the move to connectionism as a kind of Kuhnian
paradigm shift out of the digital, serial, symbolic, and into the analog,
parallel, subsymbolic (see Schneider, 1987).
Ideally, these concepts should all be presented simultaneously, since they ulti¬
mately depend on each other for a complete understanding. This is impossi¬
ble, so we will develop these topics in a particular order, but keep in mind that
only after a full network has been presented and trained will it be clear how
everything fits together.
Units either get their activation from the environment (“input units”), and pass
it on to other units, or they get their activation from other units and pass it on
to the environment (“output units”), or they get it from other units and pass
it on to other units (“hidden units”). “Units” or “nodes” may be compared to
cell bodies of idealized neurons. Units will be represented by circles, with a
line leading into them (their inputs), or a line leading out of them (their
outputs), or both.
Each unit is in a certain state of activation, which can be positive, zero, or
negative. Each unit also collects activation from those units that connect into
it (the “fan in”), and a unit can pass on activation to each unit that connects
out of it (the “fan out”). The collected input is called the “net” input activation, and the simplest method of calculating net input is just to sum the separate inputs of the fan in units. This net activation can then be combined with
the previous activation level of the unit to give the (new) current activation level
of the unit. In the simplest case we just let the new activation level be the net
activation level. The unit can then pass on activation to the units in its fan out.
This is the output activation. So there are three processes that can take place
in a unit: net activation, current activation, and output activation (see figures
11.2 and 11.3).
Figure 11.3 Connections and unit (from Hinton, 1992: 146, figure 10.2; reproduced by
permission)
The whole process, then, involves receiving weighted input along a connec¬
tion, summing those weighted inputs to form Net, and passing Net through
to the output.
To program a network is to state its activation rules, its thresholds (if it has them),
and to give weights to each of its connections. This can be done by the pro¬
grammer, or the network can be trained to have a certain set of weights. When
the machine is configured in the second way we say it has learned its weights
rather than having been programmed with those weights. But once learned, a
set of weights can thereafter be used to program another network. We return
to learning shortly.
Example
Unit C gets inputs from two units: unit A passes activation 0.7 along a connection with strength 0.3. Unit B passes activation 1.0 along a connection with strength 0.5 (see figure 11.4). What is the net input to unit C? Using formula (N) we notice that first we must multiply the output of each unit (A, B) times the weight of its connection to C. For A this is 0.7 x 0.3 = 0.21. For B this is 1 x 0.5 = 0.5. Finally, we sum these: 0.21 + 0.5 = 0.71. This is the net activation of unit C.
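The net input rule (N) used in this example is easy to state as code; a minimal sketch:

```python
def net_input(activations, weights):
    """Rule (N): sum each incoming activation times its connection weight."""
    return sum(a * w for a, w in zip(activations, weights))

# The worked example: unit C receives 0.7 along a connection of strength 0.3
# from A, and 1.0 along a connection of strength 0.5 from B.
print(round(net_input([0.7, 1.0], [0.3, 0.5]), 2))   # 0.71
```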
Figure 11.5 Simple network 1
Simple network
Consider the simple network in figure 11.5. This network has four units (A,
B, C, D); each input unit is connected by excitation to each output unit. But
to make it compute we need to program it by giving it activation passing rule(s)
and weights (?), give it some input (?), and calculate the output (???). We now
program this network by (1) assigning weights to each connection, and (2) spec¬
ifying principles for computing the activation that each unit receives and passes
on to the next step.
Connection weights
Activation
We let the net activation be the sum of the weighted inputs, as given by (N),
we let the current activation equal the net activation, and we let the output
equal the current activation. So in this simple system, the net activation
as given by (N) is just passed through each unit. (This is not always so -
remember NETtalk.) So the programmed network looks like figure 11.6.
Computation
We now give the network some input and calculate the output. Input: let both
input units receive an activation of 1 - as indicated in the figure in parenthe¬
ses. Unit A receives activation of just 1, its net activation is therefore just 1 (see
(N) again) and so it passes on activation of just 1. The same for unit B. Unit
C receives activation from both A and B, so the net activation of C will, according to (N), be the sum of: the activation of each unit feeding into it times the
weight of each connection. From unit A, that will be 1 x 0.1 =0.1, from B,
that will be 1 X 0.2 = 0.2. So, by (N) we sum these two products and the result
for unit C is 0.1 + 0.2 = 0.3. That is:
So the net activation level of unit C is 0.3. Since it outputs its net activation
level, its output is 0.3. Unit D also receives activation from both units A and
B, so its output is calculated in exactly the same way:
So the net activation level of unit D is 0.7. Since it outputs its net activation
level, its output is 0.7. Thus, the simple network 1, as programmed, outputs
the values in parentheses at units C and D, given the input in parentheses at
units A and B.
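The same calculation for the whole of simple network 1, as a sketch. The weights into C (0.1 from A, 0.2 from B) are the ones used above; the individual weights into D are not worked through in the text, so 0.3 and 0.4 are assumed here simply because they sum to the 0.7 the text reports.

```python
def net_input(activations, weights):
    return sum(a * w for a, w in zip(activations, weights))

w_into_C = [0.1, 0.2]     # weights from A and B into C, as in the text
w_into_D = [0.3, 0.4]     # assumed: only their sum (0.7) is fixed by the text

inputs = [1, 1]           # units A and B each receive, and pass on, activation 1
out_C = net_input(inputs, w_into_C)
out_D = net_input(inputs, w_into_D)
print(round(out_C, 1), round(out_D, 1))   # 0.3 0.7
```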
Pattern associator
Structure
Consider the simple network of eight units shown in figure 11.7 (four
input and four output), arranged in two layers with each unit in each layer
connected to each unit in the other layer (Rumelhart and McClelland 1986a:
33-40).
Programming
As before, we let each unit pass on as output its net input (N). Each con¬
nection will be given a weight, some will be excitatory (+) and some will be
inhibitory (-). These assignments are hard to represent when networks get
complicated, so we will now introduce the more perspicuous matrix style.
These two diagramming techniques are equivalent and you can in principle go
back and forth from one to the other (see figure 11.8).
Each question mark represents a weight, and we get a different network
for every distinct assignment we make. To start with, we will work with
the network shown in figure 11.9, with the following weights, which we will
imagine associates rose appearances (activation on the input units) with rose
smells (activation on the output units).
We will imagine that a flower’s appearance can be broken down into four
general components, say: A, the shape of each petal, B, the overall configura¬
tion of petals, C, the color of the petals, and D, the stem. We will imagine the
same thing is true of its smell (though here the story will have to be given by
science or the perfume industry). With this scheme in place we can let a spe¬
cific activation value on a node represent a specific component. For instance,
activating the A node with “1” might indicate a rose petal shape, and “-1” on
node B might indicate a rose configuration of petals, and “—1” on node C might
indicate a rose color, and so forth for the rest of the input and output nodes.
Computation
Given an activation value for each input unit we can compute the activation
value for each output unit. For instance, we can give the input nodes
(A, B, C, D) the following series of values (called an “input vector”), written (1, -1, -1, 1):
A: 1
B: -1
C: -1
D: 1
The activation value of each output unit would, by (N) given before, be the
sum of each of its four weighted inputs. For instance, output unit E:
Connection Weight
A-E: 1 x -0.25 = -0.25
B-E: -1 x 0.25 = -0.25
C-E: -1 x 0.25 = -0.25
D-E: 1 x -0.25 = -0.25
Sum = -1.00
So the net activation of unit E is -1, and since its output is the same as its net
activation, its output is also —1 (as indicated in the rose network in parenthe¬
ses). We can now say that the rose network yields as output for unit E, with
input vector (1, -1, —1, 1), the output value -1. Similar calculations for output
units F, G, H yield the values:
E: -1
F: -1
G: 1
H: 1
The output vector for the rose network is: (-1, —1, 1, 1).
Exercise 1 Verify these values for F, G, H using (N) as we just did for E.
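Exercise 1 can also be checked mechanically with a sketch of the rose network. Only the weights into E are worked through in the text; the rows for F, G, and H below are assumptions chosen on the same pattern, and they reproduce the output vector (-1, -1, 1, 1) as well as the degraded-input behavior discussed next.

```python
def net_input(activations, weights):
    return sum(a * w for a, w in zip(activations, weights))

# The E row is given in the text; the F, G, and H rows are assumptions chosen
# on the same pattern (the full weight matrix is not worked through unit by unit).
rose_weights = {
    'E': [-0.25,  0.25,  0.25, -0.25],
    'F': [-0.25,  0.25,  0.25, -0.25],
    'G': [ 0.25, -0.25, -0.25,  0.25],
    'H': [ 0.25, -0.25, -0.25,  0.25],
}

def associate(weights, input_vector):
    """Transform an input vector into an output vector, unit by unit."""
    return [net_input(input_vector, weights[unit]) for unit in 'EFGH']

print(associate(rose_weights, [1, -1, -1, 1]))   # [-1.0, -1.0, 1.0, 1.0]
print(associate(rose_weights, [1, -1, 0, 1]))    # degraded (colorless) rose input
```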
The rose network computes the output vector (—1, -1, 1, 1) from the input
vector (1, -1, -1, 1) by transforming an input vector into the output vector,
and this transformation amounts to multiplying the input vector by the weights
on the connections (by the weight matrix illustrated in the rose network). This
Defective input
Connection Weight
A-E: 1 x -0.25 = -0.25
B-E: -1 x 0.25 = -0.25
C-E: 0 x 0.25 = 0
D-E: 1 x -0.25 = -0.25
Sum = -0.75
So the activation value of output unit E for the new degraded input vector
(1, -1, 0, 1) (i.e. colorless rose appearance) is —0.75. Calculations for the
remaining output units F, G, H go just as this one did, and yield the values:
E: -0.75
F: -0.75
G: 0.75
H: 0.75
Thus, the rose network transforms (or associates) the degraded input vector
(1, -1, 0, 1) to the output vector (-0.75, -0.75, 0.75, 0.75), and a comparison
between results of normal versus degraded input:
Figure 11.10 Goat network
reveals that although the numbers are different the pattern of signs is the same.
As Rumelhart and McClelland (1986a: 35) put it: “the . . . pattern produced
in response will have the activation of all the . . . units in the right direction
[the right signs]; however, they will be somewhat weaker than they would be,
had a complete . . . pattern been shown.”
Superimposed networks
Exercise 3 Verify that for the input vector (-1, 1, -1, 1) the goat network associates (computes) the output vector (-1, 1, 1, -1). Hint: note that the 0.25s always add up to 1, so check the signs.
Now, can we construct a single “rose and goat” network that will deliver the rose
smell just for rose appearances, and goat smells just for goat appearances? (You
know we would not be doing this if the answer were not “yes”.) The basic fact we will exploit is that two weight matrices, such as those embedded in the rose network and the goat network, can be added together, and that once they are added together, the resulting matrix will produce the right output (no goaty-smelling roses). How do we add matrices? Just add their corresponding parts:
a b       w x       a+w  b+x
c d   +   y z   =   c+y  d+z
In the case of the rose network and the goat network we add the two matrices
together and the resulting matrix and network is shown in figure 11.11 (see
Bechtel and Abrahamsen, 1991: 49-50).
Computation
We now need to verify that the rose and goat network will not make a goat
smell like a rose (nor vice versa). We give the network the goat appearance
vector (—1, 1, —1, 1) and see what we get as output. As usual we will compute
the first output unit E in detail:
Connection Weight
A-E: -1 x 0 = 0
B-E: 1 x 0 = 0
C-E: -1 x 0.5 = -0.5
D-E: 1 x -0.5 = -0.5
Sum = -1
So far, so good, but both goat smell vectors and rose smell vectors begin with
-1 (i.e. output unit E has -1 as output value for both vectors), so to make sure
we are on the right track we should compute an output unit that is distinctive
to goat smells. As we saw, for roses, F = -1, but for goats F = 1; what do we
get from the rose and goat network for the goat appearance vector?
Connection Weight
A-F: -1 x -0.5 = 0.5
B-F: 1 x 0.5 = 0.5
C-F: -1 x 0 = 0
D-F: 1 x 0 = 0
Sum = 1
So the value of output unit F, for the goat appearance input vector is 1, which
is correct.
Exercise 4 Verify that the rose and goat network gives just goat smells for
goat appearances, and rose smells for rose appearances.
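Exercise 4 can likewise be checked mechanically. The goat matrix below is an assumption chosen so that the goat appearance vector (-1, 1, -1, 1) yields the goat smell vector (-1, 1, 1, -1) of Exercise 3; adding it to the rose matrix gives a combined matrix whose E and F rows match the values computed in the text.

```python
def associate(weights, vec):
    return [sum(a * w for a, w in zip(vec, weights[u])) for u in 'EFGH']

rose_weights = {                     # as in the earlier rose-network sketch
    'E': [-0.25,  0.25,  0.25, -0.25], 'F': [-0.25,  0.25,  0.25, -0.25],
    'G': [ 0.25, -0.25, -0.25,  0.25], 'H': [ 0.25, -0.25, -0.25,  0.25],
}
goat_weights = {                     # assumed full goat matrix (see above)
    'E': [ 0.25, -0.25,  0.25, -0.25], 'F': [-0.25,  0.25, -0.25,  0.25],
    'G': [-0.25,  0.25, -0.25,  0.25], 'H': [ 0.25, -0.25,  0.25, -0.25],
}

def add_matrices(m1, m2):
    """Matrix addition: add the corresponding weights."""
    return {u: [a + b for a, b in zip(m1[u], m2[u])] for u in m1}

rose_and_goat = add_matrices(rose_weights, goat_weights)
print(rose_and_goat['E'])                          # [0.0, 0.0, 0.5, -0.5], as in the text
print(associate(rose_and_goat, [-1, 1, -1, 1]))    # goat smell: [-1, 1, 1, -1]
print(associate(rose_and_goat, [1, -1, -1, 1]))    # rose smell: [-1, -1, 1, 1]
```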
Recurrent networks
Figure 11.13 Elman’s (augmented) recurrent network architecture (from Elman, 1990b:
349, figure 1; reproduced by permission of the MIT Press)
Recurrent networks are trained with the same algorithms (usually back-
propagation, as with NETtalk) as feedforward three-layer networks. Elman
(1990a) demonstrates how recurrent networks can find sequential regularities
in a variety of domains.
Letter prediction
For example, Elman constructed a recurrent network composed of 6 input
units, 20 hidden units, 6 output units, and 6 context units. He then constructed
a list of letters based on the following principles:
b -> ba
d -> dii
g -> guuu
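A sketch of how a training sequence can be generated from these rules: consonants are chosen at random and each is expanded into its fixed consonant-plus-vowel pattern, so that the vowels (but not the consonants) are predictable from what precedes them. The random choice of consonants and the length of the string here are arbitrary illustrative assumptions.

```python
import random

RULES = {'b': 'ba', 'd': 'dii', 'g': 'guuu'}   # the three expansion rules above

consonants = [random.choice('bdg') for _ in range(10)]
sequence = ''.join(RULES[c] for c in consonants)
print(sequence)   # e.g. 'badiiguuuba...' - each vowel run is fully determined
```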
Letter-in-word prediction
Elman also modified the string of consonants and vowels to form 15 words
(with no breaks between them). Some 200 sentences, each consisting of 4 to 6
of these words, were constructed and strung together to form a string of 1,270
words, with each word broken down into letters yielding a stream of 4,963
letters. A recurrent network of 5 input units, 20 hidden and context units,
and 5 output units was trained on 10 complete presentations of this sequence,
which was too few for the network to memorize the whole sequence. Never¬
theless, the errors it made indicated it had begun to recognize repeating
sequences of letters in word-like groups. The network would make high errors
at the beginning of a word (it would not know what word was coming next),
but once it recognized the beginning of the word it would get better and better
at predicting what word it was that began with that letter (see figure 11.14).
Finally, Elman extended the network to recognizing the parts of speech of
words in simple sentences, and eventually to words in embedded clauses in
complex sentences.
The error for a network trained on phoneme prediction. Error is high at the beginning of a word and
decreases as the word is processed.
Figure 11.14 Error in letter-in-word prediction task (from Elman, 1990a; 194, figure 6;
reproduced by permission of the author)
can affect is the values of the connection weights. With that said, we should
note that some weight changes can be equivalent to architectural changes. For
instance, if a weight is set at 0, that is functionally equivalent to not having a
connection.
learn that when one member of the pair is presented it is supposed to produce
the other.
Unsupervised learning
Supervised learning
The architecture of the competitive learning mechanism (layers of input units and inhibitory clusters). Competitive learning takes place in a context of sets of hierarchically layered units. Units are represented in the diagram as dots. Units may be active or inactive. Active units are represented by filled dots, inactive ones by open dots. In general, a unit in a given layer can receive inputs from all of the units in the next lower layer and can project outputs to all of the units in the next higher layer. Connections between layers are excitatory and connections within layers are inhibitory. Each layer consists of a set of clusters of mutually inhibitory units. The units within a cluster inhibit one another in such a way that only one unit per cluster may be active. We think of the configuration of active units on any given layer as representing the input pattern for the next higher level. There can be an arbitrary number of such layers. A given cluster contains a fixed number of units, but different clusters can have different numbers of units.
Hebbian learning
As we noted in chapter 3, Hebb (1949) proposed the influential idea that when
two connected neurons are active together, the strength of the connection
between them increases.' This remark raises two questions: First, when this
idea is applied to learning or training in a network, is it “supervised” learn¬
ing or not? Second, how much would this strength increase, and what would
it depend on? Turning to the first question, when two arbitrary units are
involved, as Hebb described it, the adjustment in “weight” is made without
“supervision.” But in the case of a simple associator the second (B) “cell”
would be the output node, and by adjusting connections to it, one would be in
effect “supervising” it, because the weight change would depend on its output.
So, for the networks we will investigate here, we will regard Hebbian learning
as a species of supervised learning. Turning now to the second question, there
are a variety of proposals for making Hebb’s idea precise, and we will use the
following one (see Bechtel and Abrahamsen, 1991: 48ff, 72ff): weight change
is equal to the activation of the first unit times the activation of the second
unit times a learning rate (how fast the system will change its weights). In
simple associative networks, such as we have been considering, the first
unit will be an input unit, and the second unit will be an output unit. Often
the learning rate is taken to be a fraction with the number of input units in the
denominator, and 1 in the numerator. So our working Hebbian learning rule
is this:
(H)
(1) Find the product of: an input unit, a connected output unit, and a
learning rate (1/number of input units).
(2) Add this to the previous connection weight.
(3) The result is the new connection weight.
Consider, for instance, a simple network (figure 11.16). We want to train this
network, using rule (H), to distinguish one pair of input vectors from another
pair of input vectors by responding to the first with a 1, to the second with a
0. To fix ideas, we will assume the first pair of vectors codes female faces, the
second pair codes male faces:
Can we train the “face network” to distinguish these two groups of faces?
To train the network is to adjust the connection strengths in such a way
that given a female face vector the network will respond with a 1, and given
a male face vector the network will respond with a 0. For each vector there
are four connection weights to consider and according to (H) we must
train each connection by following these two steps: (a) first, find the product
of input activation (input vector), desired output activation (either 1 for female
or 0 for male), and the learning rate (1/number of input units = 1/4 = 0.25).
Then (b) add this to the previous connection strength - this will be the new
weight.
Vector I
Let’s start our training on vector 1, the first female vector, by training the first
connection A-E. Remember, the categorization we want it to learn in this case
is vector 1: (1, —1, —1, 1), output = 1 (see figure 11.17).
(H applied)
(i) Find the product of: input unit A, a connected output unit E, learn¬
ing rate (1/4). Input unit A gets activation 1 from vector 1, the con¬
nected output unit E gets target activation 1, and the learning rate is
0.25. So the product is: 1 x 1 x 0.25 = 0.25.
(ii) Add this to the previous connection weight 0.
(iii) The result is the new connection weight 0 + 0.25 = 0.25.
So the result of training the face network on vector 1 is to assign to the first connection the weight 0.25. If we repeat this style of calculation for the remaining connections we get:
A-E: 0.25
B-E: -0.25
C-E: -0.25
D-E: 0.25
So our face network is now trained on the first vector (see figure 11.18).
Vector 2
We now train the face network in the same way on vector 2. The only differ¬
ence is that at step (b) there will now be a previous connection strength, the
weights resulting from learning vector 1. Remember, the categorization we
want the network to learn now is vector 2: (1, 1, 1, 1). Notice that they are all
the same. Notice also that the existing weights in the network are all the same
(0.25) except for sign. So we should expect that the results of training on vector
2 will be differences of adding or subtracting 0.25, and this is right. Let’s do
the first weight to see this.
(H applied)
(i) Find the product of: input unit A, a connected output unit E, learn¬
ing rate (1/4). Input unit A gets activation 1 from vector 2, the con¬
nected output unit E gets target activation 1, and the learning rate is
0.25. So the product is: 1 x 1 x 0.25 = 0.25.
(ii) Add this to the previous connection weight 0.25.
(iii) The result is the new connection weight 0.25 + 0.25 = 0.5.
So the result of training the face network on vector 2 is to assign the first connection the weight 0.5. We repeat this style of calculation for the remaining connections and get:
A-E: 0.5
B-E: 0
C-E: 0
D-E: 0.5
So our face network is now trained on the second vector (see figure 11.19).
Vectors 3 and 4
We now want to train the face network on vectors 3 and 4, the male faces. This
turns out to be surprisingly easy. Notice that the target activation of the output
unit is 0 in both cases. This means that there will be no change with these
vectors, because in (H) we multiplied the input and learning rate by the output
and that is 0, and anything multiplied by 0 is 0. So the increase is 0 and the
previous weight remains the same - the face network trained for the second
vector is our final face network. This completes the first training cycle; we made
one pass through each of the input-output pairs.
Computation
We now want to see if the trained network will do what it was trained to do - will it recognize a female and male vector?
Vector 2
First let’s see what it outputs when we give it the second female vector, vector
2: (1, 1, 1, 1). Using (N) we get: sum (input x weight): 1 x 0.5 + 1 X 0 + 1 x
0 + 1 X 0.5 = 1. Which is correct. The face network, after one training cycle,
correctly categorized vector 2 as 1 — female.
Vector 4
Next let’s see what it outputs when we give it the second male vector, vector
4: (—1, —1, 1, 1). Using (N) we get: sum (input x weight): —1 x 0.5 + -1x0 +
1 X 0 + 1 X 0.5 = 0. Which is correct. The face network, after one training
cycle, correctly categorized vector 4 as 0 - male.
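The whole training cycle just walked through can be compressed into a short sketch of rule (H). The vectors are the ones given above (the other male vector is not listed above, but male vectors leave the weights unchanged anyway, since their target output is 0); the sketch reproduces the trained weights and the two test computations.

```python
LEARNING_RATE = 0.25                      # 1 / number of input units

def hebb_train(pairs, n_inputs=4):
    """One training cycle of rule (H) over (input vector, target output) pairs."""
    weights = [0.0] * n_inputs
    for inputs, target in pairs:
        for i, a in enumerate(inputs):
            weights[i] += a * target * LEARNING_RATE
    return weights

def output(weights, inputs):
    """Rule (N): the output unit simply passes on its summed weighted input."""
    return sum(a * w for a, w in zip(inputs, weights))

pairs = [([1, -1, -1, 1], 1),   # vector 1 (female, target 1)
         ([1, 1, 1, 1], 1),     # vector 2 (female, target 1)
         ([-1, -1, 1, 1], 0)]   # vector 4 (male, target 0): no weight change
weights = hebb_train(pairs)
print(weights)                            # [0.5, 0.0, 0.0, 0.5]
print(output(weights, [1, 1, 1, 1]))      # 1.0 - female
print(output(weights, [-1, -1, 1, 1]))    # 0.0 - male
```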
Technical digression What are the limitations of the Hebb rule for
learning vectors in an associative network? A linear associator trained by the
Hebb rule will learn only orthogonal vectors without interference. As we
noted in an earlier digression, two vectors are orthogonal when their inner
product equals 0; that is, if we multiply the vectors and sum these products, that sum will be 0. If you try to teach a linear associator a new non-
orthogonal vector, its performance will degrade - it will do worse on ones
it used to perform correctly on.
Delta learning
With Hebbian learning, weights are set as a function of input activation and
target output activation (plus learning rate, but that is constant for a network).
There is no feedback from the discrepancy in performance between what the
network is doing and what it should be doing. There is no opportunity for error
to have an instructive effect. Here we take error to be the difference between
the target output activation and the actual output activation:
(D)
(1) find the error (E).
(2) find the product of: the input activation x error x the learning rate.
(3) add this to the previous weight.
(4) The result is the new weight.
We will apply this learning procedure to the face network. We will use the following vectors with the following output values:
Vector 1: (1, -1, 1, -1), output = 1
Vector 2: (1, 1, 1, 1), output = 1
Vector 3: (1, 1, 1, -1), output = -1
Vector 4: (1, -1, -1, 1), output = -1
Trial 1 (Vector 1)
As before, we now calculate the change in value of the first weight in the network, the connection A-E, when giving the network vector 1 to learn to associate with 1. Applying (D) to this connection, we get:
(D applied)
(i) Find the error (E). This is the difference between what the output unit should produce and what it does produce. It should produce a 1.
What does it produce? For that we need to use (N) again. The output
is the sum of the products of input activation times weights. But
weights are all 0 to start with. So the products are all 0, so the sum is
0 - it produces 0. So the error is the difference between 1 and 0,
i.e. 1.
(ii) Find the product of: the input activation, the error, and the learning
rate. The input activation is 1, the error is 1 and the learning rate is
0.25. So the product of these is 1 x 1 x 0.25 = 0.25. So the result is
0 + 0.25 = 0.25.
(iii) Add this to the previous weight. The previous weight was 0, so the
result is 0.25.
(iv) The result, 0.25, is the new weight.
So we know that the new weight on the first connection, A-E, is 0.25. We now need to do the same thing for the remaining connections. As can be seen, all of the numbers are the same except for sign (B-E and D-E are negative). So the resulting weights for the first trial, vector 1, are:
A-E: 0.25
B-E: -0.25
C-E: 0.25
D-E: -0.25
Trial 2 (Vector 2)
We apply (D) to the second vector (1, 1, 1, 1) in the same way, starting with
the first connection A-E:
(D Applied)
(i) Find the error. According to (E) this is the difference between what the output unit should produce and what it does produce. It should produce a 1. What does it produce? For that we need to use (N) again: the output is the sum of the products of input activation times weights: 1 x 0.25 + 1 x -0.25 + 1 x 0.25 + 1 x -0.25 = 0.25 + -0.25 + 0.25 + -0.25 = 0. So the actual output is 0. So the error is 1 - 0, i.e. again it is 1.
(ii) Find the product of: the input activation, the error, and the learning
rate. The input activation is 1, the error is 1 and the learning rate
is 0.25. So the product of these is 1 x 1 x 0.25 = 0.25. So the result
is 0 + 0.25 = 0.25.
(iii) Add this to the previous weight. The previous weight was 0.25, so the
result is 0.5.
(iv) The result, 0.5, is the new weight.
A-E: 0.5
B-E: 0
C-E: 0.5
D-E: 0
Trial 3 (Vector 3)
The result of performing the same calculations on the third vector yields the
new weights (remember that the target output value is -1 in this case):
A-E: 0
B-E: -0.5
C-E: 0
D-E: 0.5
Trial 4 (Vector 4)
The result of performing the same calculations on the fourth vector yields the
new weights (remember that the target output value is -1 in this case):
Cycle I Weights
A-E: -0.5
B-E: 0
C-E: 0.5
D-E: 0
This completes the first cycle of training (one cycle of training is one training
trial on each input-output pair), and we have trained the network for each of
vectors 1-4. One could continue to train the network for more cycles. For
instance, after 20 cycles the network has the weights (shown in figure 11.20;
see also Haberland, 1994: 185):
Computation
We can verify that this network will give the right outputs for each of the four
vectors. Consider, for instance, vector 3: (1, 1, 1, -1). This should give the
output -1. Does it?
(N applied)
Sum(1 x -1, 1 x -1, 1 x 2, -1 x 1) = -1 + -1 + 2 + -1 = -1.
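The whole worked example can be reproduced with a short script. This is a sketch under stated assumptions: the learning rate is 0.25 and the targets are 1 for vectors 1 and 2 and -1 for vectors 3 and 4, as in the text; vectors 2 and 3 are the ones given above, while vectors 1 and 4 are reconstructed here from the worked weight changes (the original table of vectors is not reproduced in this copy) and should be checked against the text.

    # Delta-rule training of the single-output face network (units A-D feeding E).
    patterns = [
        ((1, -1, 1, -1), 1),   # vector 1 (reconstructed), target 1
        ((1, 1, 1, 1), 1),     # vector 2, target 1
        ((1, 1, 1, -1), -1),   # vector 3, target -1
        ((1, -1, -1, 1), -1),  # vector 4 (reconstructed), target -1
    ]
    rate = 0.25
    weights = [0.0, 0.0, 0.0, 0.0]            # A-E, B-E, C-E, D-E

    def output(ws, vector):
        # (N): the output is the sum of input activations times weights
        return sum(w * x for w, x in zip(ws, vector))

    for cycle in range(20):
        for vector, target in patterns:
            error = target - output(weights, vector)      # (D) step 1
            for i, x in enumerate(vector):
                weights[i] += x * error * rate            # (D) steps 2-4
        if cycle == 0:
            print("after cycle 1:", weights)              # [-0.5, 0.0, 0.5, 0.0]

    print("after 20 cycles:", [round(w, 2) for w in weights])
    print("outputs:", [round(output(weights, v), 2) for v, _ in patterns])

The cycle-1 weights printed here match the cycle 1 weights computed by hand above, and after 20 cycles the outputs should sit close to their target values.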
11.4 Representation(s)
The first question (Q1) was called the problem of representations [with an “s”],
and the second question (Q2) was called the problem of representation [without
an “s”]. An answer to (Q1) would reveal the (1) structure and (2) important
features of the major schemes of connectionist representation. An answer to
(Q2) would have to tell us (a) under what conditions something is a represen¬
tation, i.e. represents something, and (b) what determines exactly what it
represents.
Some terminology
Local representations
The basic idea behind local representations is that a specific unit is dedicated
to a specific “concept.” We saw this in NETtalk’s input units, where a
single unit was dedicated to a single letter. Such schemes are intuitive and
simple; they are explicit and easy to understand. But they have grave
defects as cognitive models, and are unrealistic as directly implemented in
neural hardware.
Distributed representations
Suppose we agree that despite their intuitive appeal, local representations are
neurologically implausible and computationally inadequate, and that we should
recruit more units into conceptual representations. An example of this was the
output units of NETtalk where the representation of phonemes was distrib¬
uted over many distinctive feature units and the same distinctive feature units
participated in representing many phonemes.
[Figure residue (figures 11.22-11.24): coordinate grids plotting dots 2 and 3 against x units (a, b, c, d) and y units (e, f, g, h), showing the ghost positions 2' and 3' that arise when the x and y values are coded separately, and the conjunctive grid that assigns a unit to each (x, y) pair.]
same as 3 (or look at values on the diagonal). The occurrence of such “crosstalk”
is called the binding problem: how to bind the positions together as in
figure 11.22, so as to avoid the ghosts (2', 3') in figure 11.23.
One solution is to assign a unit to each pair of possible combinations of x and
y, as in figure 11.24. Thus dot 2 is at the node (b, h) whereas its ghost, 2',
is at node (d, f). Mutatis mutandis for dot 3 and its ghost, 3'. This would
solve the binding problem for this simple case, but it quickly becomes
computationally expensive.
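To see how quickly conjunctive coding gets expensive, here is a small Python sketch; the 4 x 4 grid of position values mirrors figure 11.24, and the larger grid at the end is just for comparison.

    # Conjunctive ("binding") units: one unit for every (x, y) combination.
    x_values = ["a", "b", "c", "d"]
    y_values = ["e", "f", "g", "h"]

    conjunctive_units = [(x, y) for x in x_values for y in y_values]
    print(len(conjunctive_units))        # 16 units for a 4 x 4 grid

    # Dot 2 at node (b, h) and its ghost 2' at node (d, f) now activate
    # different units, so the crosstalk disappears - but a 100 x 100 grid
    # would already need 10,000 conjunctive units.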
If we operationally define the accuracy of a representational system as:
the number of different encodings that are generated as a dot moves a stan¬
dard distance across the plane - then with conjunctive coding the accuracy
of getting a specific point is proportional to the square root of the number
of points (see Rumelhart and McClelland, 1986a, 90). We can do much
better.
Microfeatures
Type 2: An example of this might be the hidden layer of NETtalk where acti¬
vation patterns represent vowels and consonants, but individual nodes have no
interpretation.
[Figure 11.25 content - each panel lists the same nine microfeature units; a filled circle (•) marks an active unit, an open circle (o) an inactive one.

Panel 1 (Units / Microfeatures): • upright container; • hot liquid; o glass contacting wood; • porcelain curved surface; • burnt odor; • brown liquid contacting porcelain; o oblong silver object; • finger-sized handle; • brown liquid with curved sides and bottom.

Panel 2 (Units / Microfeatures): • upright container; o hot liquid; o glass contacting wood; • porcelain curved surface; o burnt odor; o brown liquid contacting porcelain; o oblong silver object; • finger-sized handle; • brown liquid with curved sides and bottom.

Panel 3 - Representation of coffee (Units / Microfeatures): o upright container; • hot liquid; o glass contacting wood; o porcelain curved surface; • burnt odor; • brown liquid contacting porcelain; o oblong silver object; o finger-sized handle; • brown liquid with curved sides and bottom.]

Figure 11.25 Coffee vectors (from Smolensky, 1991b: 208-9, figures 2-4; reproduced by
permission of Kluwer Academic/Plenum Publishers)
[Figure 11.26 content: a chart relating what is represented (microfeature, concept, proposition) to what does the representing (a single unit, a hidden-unit activation pattern, or nothing), with types (1), (2), and (4) marked in the cells.]

Figure 11.26 Types of connectionist representations (from Ramsey, 1992: 261, figure
8.1; reproduced by permission of Oxford University Press)
mechanisms in the first place. The most commonly mentioned advantages are:
content addressability, pattern completion, (spontaneous) generalization, fault
tolerance, graceful degradation, and improved relearning. These are not inde¬
pendent notions. “Content addressability,” as we saw in chapter 6, contrasts
with location addressability - the idea is that we store material in memory
according to its representational content, not in some (arbitrary) address. We
can access this remembered information with probes that are similar (in
content) to the desired information. “Pattern completion” refers to the ability
of systems with distributed representations to correctly recognize partial
inputs. “Spontaneous generalization” refers to the ability of systems to raise
the activation of nodes that are related to the target nodes. In this way the more
specific feature “generalizes” to more general ones. “Fault tolerance” refers to
the property of such systems to ignore false or misleading input information
and still come up with the right answer, or the best fit. “Graceful degradation”
refers to the fact that the system can be damaged in various ways and it will
not just crash; rather its overall performance degrades along various general
dimensions, such as speed and/or accuracy. Finally, “improved relearning”
refers to the fact that if such a system is damaged, for example by injecting a
high amount of noise into it, then retrained, it can relearn the set much faster
than it did originally, and it will even improve its performance on items omitted
from the retraining — i.e. the retraining tends to “generalize.”
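Some of these properties can be glimpsed even in the tiny face network trained earlier. A hedged Python sketch: the weights are the ones used in the computation with vector 3 above (read off the 20-cycle network), and the degraded input is invented for illustration.

    weights = [-1, -1, 2, 1]          # A-E, B-E, C-E, D-E after 20 cycles of training

    def output(ws, vector):
        return sum(w * x for w, x in zip(ws, vector))

    intact = (1, 1, 1, -1)            # vector 3, target -1
    degraded = (0.5, 1, 1, -1)        # unit A's activation partly lost

    print(output(weights, intact))    # -1
    print(output(weights, degraded))  # -0.5: weaker, but still on the negative
                                      # side, so the degraded input is tolerated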
Ten questions
I Static features
Architecture
Units
1 How many units are there?
Connections
2 What is the pattern (geometry, “spatial” layout) of connections?
3 Which connections are excitatory, which are inhibitory (if any)?
4 What is the direction of the flow of activation?
Representation
5 What do individual units at each layer represent, if anything?
6 What do groups of units at each layer represent, if anything?
// Dynamic features
Computation
7 How does the network compute its output from its input?
Programming
Units
8 What are the “activation passing” rules (including thresholds and
decay, if any)?
Connections
9 What are the weights on the connections?
Learning/training
10 What is the procedure by which weights, and/or thresholds, are
changed?
Notes
Study questions
Given a network and an input vector, be able to use (N) to calculate an output
vector.
Think about how a single network can carry more than one association of
patterns.
Given a network, an input vector, and an output, be able to calculate the weight
assignment using the Hebb rule (H).
Given a network, an input vector and an output, be able to calculate the weight
assignment using the Delta rule (D).
Representations
Generic connectionism
Suggested reading
General
Basic notions
An early important article on connectionism (which made the label stick) was Feldman
and Ballard (1982). Probably the most influential general discussion of connectionist
(also called “parallel distributed”) models to date is Rumelhart and McClelland (1986),
vol. 1. Chapter 2, “The general framework for parallel distributed processing,” is a good
introduction to basic notions. Rumelhart (1989) is an excellent summary article of most
of this material. Other good surveys of basic connectionism can be found in
Wasserman (1989), chapter 1, Clark (1989), chapter 5, Churchland (1990), Quinlan
(1991), chapter 2, Bechtel and Abrahamsen (1991), chapters 1-3, Haberland (1994),
chapter 6, Bechtel (1994), Stillings et al. (1995), chapter 2.10, and Elman et al. (1996),
chapter 2. Ballard (1997) is more technical, but it gives very useful details. For more
on recurrent networks and language see Elman (1992).
Some mathematics
Rumelhart and McClelland (1986a) chapter 9: An introduction to linear algebra in
parallel distributed processing, is a comprehensive introduction to just that. Shorter
discussions can be found in Wasserman (1989), appendix B: Vector and matrix opera¬
tions, and in Levine (1991), appendix 2: Difference and differential equations in neural
networks, and Caudill and Butler (1993), volume 1, appendix D.
Representations
The first and most influential general discussion of distributed representations can be
found in Rumelhart and McClelland (1986a), chapter 3: Distributed representations.
See Nadel et al. (1986) and Feldman (1989) for discussion of the neurobiology of rep¬
resentation. A more philosophical introduction to connectionist representations can be
found in Sterelny (1990), chapter 8. An extensive philosophical discussion can be found
in Cussins (1990). Smolensky (1990) is a brief but authoritative overview. A critical
appraisal of connectionist representation can be found in van Gelder (1991), Goschke
and Koppelberg (1991), and Ramsey (1992).
Anthologies
Some representative anthologies on connectionism include Nadel et al. (1989), Morris
(1989), Horgan and Tienson (1991), Ramsey, Stich, and Rumelhart (1991), Dinsmore
(1992), Davis (1992), Clark and Lutz (1995), and MacDonald and MacDonald (1995).
12
12.1 Introduction
(CTM)
1 Cognitive states are computational relations to mental representations which
have content.
2 Cognitive processes (changes in cognitive states) are computational
operations on these mental representations.
(B-CCTM)
1 Cognitive states are computational relations to mental representations which
have content.
2 Cognitive processes (changes in cognitive states) are computational opera¬
tions on these mental representations.
3 The computational architecture and representations (mentioned in 1 and
2) must be connectionist.
(CCTM)
1 Cognitive states are computational relations to computational mental
representations (in the language of thought) which have content.
2 Cognitive processes (changes in cognitive states) are computational
operations on these computational mental representations (in the language of
thought).
3 The computational architecture and representations (mentioned in 1 and
2) are connectionist.
We will now explore some of the motivations for, and distinctive features of,
the CCTM.
Motivations for pursuing CCTM models come from two major quarters, sim¬
ilarities between network and human performance, and similarities between
networks and human brains.
Human performance
Recall from our earlier discussion of Jets and Sharks, and NETtalk, that each
model exhibited some human-like performance characteristics:
NETtalk
The brain
The reasoning here is at odds with the DCTM assumption that one can study
and understand the mind (software) completely independently of the brain
(hardware). CCTM is more inclined to see mental phenomena as closely related
to neural phenomena, and although connectionist models are not neural
models, they mimic (or can be made to mimic) gross structural and functional
features of the brain. So the CCTM strategy is to argue that insofar as DCTM
hardware fails to reflect the gross structure and functioning of the brain, it
is to be considered less seriously as a potential model of our brain-based
cognition. And insofar as one shares this assumption, one will be moved by the
preceding considerations.
As we will see in chapter 13, these functional virtues are not uncontroversial.
[Figure 12.1 content: a fixed encoding network, a pattern associator with modifiable connections, and a decoding/binding network, mapping a Wickelfeature representation of the root form onto a Wickelfeature representation of the past tense.]

Figure 12.1 Past tense network (from Rumelhart and McClelland, 1986, vol. 2: 222,
figure 1; reproduced by permission of the MIT Press)
Levels of analysis
Conceptual
The first level is the “conceptual” (or “symbolic”) level. At this level, cognitive
structures are analyzed into familiar conceptual units, such as are captured by
words or data structures we encountered in SHRDLU (see chapter 5): “put”
“pyramid” “in” “box.” Such symbol structures have traditional semantic inter¬
pretation and are operated on by symbol manipulation techniques in virtue of
their form or shape.
Subconceptual
The second level is the subconceptual (or subsymbolic) level. This consists of
entities that are the constituents of conceptual-level descriptions. Subconcepts
(also called microfeatures) correspond to units (nodes) and concepts correspond
to groups, patterns, or vectors of nodes. Since cognitive categories are realized
in connectionist systems as patterns or vectors of activation, there are two sorts
of constituency relations: there can be pattern-subpattern relations (subset),
and there can be pattern-node relations (membership). The first is a
concept-concept relation, the second is a concept-microfeature relation.
These are both “part-whole” relations, but they are different, as illustrated in
figure 12.2 (we ignore for now the question: what is the principled difference
between a concept and a subconcept/microfeature?).
Neural
The third level is the neural level. This consists of the structure and operation
of the nervous system as studied by neuroscience.
[Figure 12.2 content: the two part-whole relations illustrated with coffee microfeature units (hot liquid, burnt odor, brown liquid contacting porcelain, brown liquid with curved sides and bottom, upright container, porcelain curved surface).]
The first kind of knowledge is conscious rule application.² Here explicit rules
in some “linguistic” system (natural languages, programming languages,
logical or mathematical notation) are applied to a concept-level task domain
such as science or the law. Typically, such knowledge is the shared result of
social and institutional practices, but this is not essential to its distinctive cog¬
nitive character. Such conscious rule application is typical of novice learners
who might repeat to themselves such things as: turn, step, swing (tennis), or:
“i” before “e” except after “c” (spelling).
Intuitive knowledge
We can now raise the question of where to situate connectionist models with
respect to these two distinctions. Let’s turn to the first distinction. According
to Smolensky, the proper treatment of connectionism places connectionist models
between neurological models and traditional symbolic models - it involves
subsymbolic modeling (the numbers in parentheses are Smolensky’s 1988a
originals):
(11)
The fundamental level of the subsymbolic paradigm, the subcon-
ceptual level, lies between the neural and conceptual levels.
The conceptual level, with its conscious rule interpreter, is the natural
domain of the symbolic paradigm. This paradigm is concerned with “cultural”
knowledge, such as science or the law, formulated in natural and scientific
languages with explicit rules of inference applying to them. The theory of
effective procedures, Turing machines, and programs for von Neumann
computers provide models of how people process such knowledge, and execute
such instructions:
(3)
(a) Rules formulated in language can provide an effective formalization of
cultural knowledge.
(b) Conscious rule application can be modeled as the sequential inter¬
pretation of such rules by a virtual machine called the conscious rule
interpreter.
(c) These rules are formulated in terms of the concepts consciously used to
describe the task domain — they are formulated at the conceptual level.
(1988a: 4-5)
The symbolic paradigm holds that:
(4a,b)
(a) The programs running on the intuitive processor consist of linguistically
formalized rules that are sequentially interpreted.
(b) The programs running on the intuitive processor are composed
of elements, that is, symbols, referring to essentially the same concepts
as the ones used to consciously conceptualize the task domain.
(1988a: 5)
For instance, if we look at the examples from SHRDLU (see chapter 5) we see
that data and instructions are formulated in commonsense terms such as
PYRAMID and MOVE. These together comprise:
(4)
The unconscious rule interpretation hypothesis: The programs running on
the intuitive processor have a syntax and semantics comparable to those
running on the conscious rule interpreter. (1988a: 5)
(24)
(a) Discrete memory locations, in which items are stored without mutual
interaction.
(b) Discrete memory storage and retrieval operations, in which an
entire item is stored or retrieved in a single atomic (primitive)
operation.
(c) Discrete learning operations, in which new rules become available for use
in an all-or-none fashion.
(d) Discrete inference operations, in which conclusions become available for
use in an all-or-none fashion.
(e) Discrete categories, to which items either belong or do not belong.
(f) Discrete production rules, with conditions that are either satisfied or not
satisfied, actions that either execute or do not execute.
In the symbolic paradigm the above levels of cognition are analogized to levels
of computer systems and, as with computer systems, it is not a part of the sym¬
bolic paradigm to say exactly how the symbolic level is implemented at the
neural level (see figure 12.3). However, Smolensky rejects these claims for at
least the following reasons:
Figure 12.3 Neural and mental structures in the symbolic paradigm (from Haugeland,
1997: 235, figure 9.1; reproduced by permission of the MIT Press)
(5)
(a) Actual AI systems built on hypothesis (4) seem too brittle, too
inflexible, to model true human expertise.
(b) The process of articulating expert knowledge in rules seems impractical
for many important domains (e.g., common sense).
(c) Hypothesis (4) has contributed essentially no insight into how knowledge
is represented in the brain. (1988: 5)
(7)
The intuitive processor has a certain kind of connectionist architecture
(which abstractly models a few of the most general features of neural
networks). (1988a: 6)
(8a)
The connectionist dynamical system hypothesis: The state of the intuitive
processor at any moment is precisely defined by a vector of numerical
values (one for each unit). The dynamics of the intuitive processor are
governed by a differential equation. The numerical parameters in this
equation constitute the processor’s program or knowledge. In learning
systems, these parameters change according to another differential
equation. (1988a: 6)
(8b)
The subconceptual unit hypothesis: The entities in the intuitive processor
with the semantics of conscious concepts of the task domain are complex
patterns of activity over many units. Each unit participates in many such
patterns. (1988: 6)
(8c)
The subconceptual level hypothesis: Complete, formal and precise
descriptions of the intuitive processor are generally tractable not at
the conceptual level, but only at the subconceptual level.
(1988a: 6-7)
(8)
The subsymbolic hypothesis: The intuitive processor is a subconceptual con¬
nectionist dynamical system that does not admit a complete, formal, and
precise conceptual-level description. (1988a: 7)
(25)
(a) Knowledge in subsymbolic computation is formalized as a large set of soft
[statistical] constraints.
(b) Inference with soft constraints is a fundamentally parallel process.
(c) Inference with soft constraints is fundamentally non-monotonic.³
(d) Certain subsymbolic systems can be identified as using statistical inference.
In sum, in the symbolic paradigm, constraints are discrete and hard, inference
is logical and serial. In the subsymbolic paradigm constraints are continuous and
soft, inference is statistical and parallel.
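To make hypothesis (8a) slightly more concrete, here is a hedged Python sketch of a connectionist dynamical system: the state is a vector of numerical unit values, its evolution is governed by a differential equation (integrated here with small Euler steps), and the numerical parameters, the weights, constitute the system's knowledge. The particular equation, weights, and sizes are illustrative inventions, not Smolensky's.

    import math

    # State: one numerical value per unit.  Illustrative dynamics:
    #   da_i/dt = -a_i + sum_j W[i][j] * tanh(a_j)
    W = [[0.0, 2.0],
         [2.0, 0.0]]                  # the weights are the "program"
    a = [1.0, 0.0]                    # the state of the intuitive processor
    dt = 0.01

    for _ in range(1000):
        da = [-a[i] + sum(W[i][j] * math.tanh(a[j]) for j in range(2))
              for i in range(2)]
        a = [a[i] + dt * da[i] for i in range(2)]

    print([round(x, 3) for x in a])   # the stable pattern the state settles into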
Smolensky rejects the idea that connectionist models are neural models:
(6)
The neural architecture hypothesis: The intuitive processor for a par¬
ticular task uses the same architecture that the brain uses for that task.
(1988a: 5)
The reason for this is the loose correspondence between properties of the cere¬
bral cortex and connectionist systems (see figure 12.4). Note that among the
“negative” correspondences are some which are due to the model not being
directly implemented in hardware, and others involve choices of features that
could just as well have been made more neurologically faithful. There are
further discrepancies Smolensky does not list, such as problems with negative
weights, and problems with the biological mechanisms of back-propagation.
We return to this in chapter 13.
Figure 12.4 Neural vs. subsymbolic levels (from Smolensky, 1988a: 9, table 1; reproduced
by permission of Cambridge University Press)
Hybrid theory
One popular suggestion is that perhaps a “hybrid theory” - one which used
the symbolic paradigm for conscious rule application and the subsymbolic par¬
adigm for intuitive processing - could be faithful to both. But as Smolensky
(1988a, section 6) notes, the proposal has many problems. For instance: (1) how
would the two theories communicate? (2) How would the hybrid system evolve
with experience — from conscious rule application to intuition? (3) How would
the hybrid system elucidate the fallibility of actual human rule application?
(4) And how would the hybrid system get us closer to understanding how
conscious rule application is achieved neurally?
PTC
In the light of these problems, and the failure of the symbolic paradigm to
implement the subsymbolic paradigm, Smolensky opts for the opposite, and
construes the subsymbolic as basic, and the symbolic as approximate and ulti¬
mately derivative. The PTC solution is to say that the higher level of descrip¬
tion in terms of natural language or traditional “symbolic” representational
schemes is only approximate — it only approximately describes our actual
cognitive activity; in just the way that classical macro-physics only approxi¬
mately describes what micro-physics exactly describes. This is illustrated in
figure 12.5.
Figure 12.5 Neural and mental structures in the subsymbolic paradigm (from Haugeland,
1997: 237, figure 9.2; reproduced by permission of the MIT Press)
(16)
The competence to represent and process linguistic structures in a native
language is a competence of the human intuitive processor; the sub-
symbolic paradigm assumes that this competence can be modeled in
a subconceptual connectionist dynamical system. By combining such
linguistic competence with the memory capabilities of connectionist
systems, sequential rule interpretation can be implemented. (1988a: 12)
(Ql) What is the relation between the symbolic and the subsymbolic levels
and paradigms?
(Q2) What is the relation between the symbolic paradigm and the neural
level?
(Q3) What is the relation between conscious rule application, the symbolic,
and the subsymbolic levels and paradigms?
(Q4) What is the relation between intuitive processing, the symbolic, and the
subsymbolic levels and paradigms?
But pursuing these would take us further than we can presently go.
Consciousness
(17)
Consciousness: The contents of consciousness reflect only the large-scale
structure of activity patterns: subpatterns of activity that are extended
over spatially large regions of the network and that are stable for relatively
long periods of time. (1988a: 17)
Of course this is pretty vague both spatially and temporally, and a necessary
condition alone is not completely satisfactory. We want to know what properties
make these large-scale, stable patterns into something conscious — remember
the “hard problem” of consciousness (see chapter 9).
Cognition
(19)
A cognitive system: A necessary condition for a dynamical system to be
cognitive is that, under a wide variety of environmental conditions, it
maintains a large number of goal conditions. The greater the repertoire
of goals and variety of tolerable environmental conditions, the greater the
cognitive capacity of the system. (1988a: 15)
Content
How do states of a subsymbolic system get their content: their meanings and
truth conditions? According to PTC:
(22)
Subsymbolic semantics: A cognitive system adopts various internal states
in various environmental conditions. To the extent that the cognitive
system meets its goal conditions in various environmental conditions,
its internal states are veridical representations of the correspond¬
ing environmental states, with respect to the given goal conditions.
(1988a: 15)
Recalling our earlier discussion of what the frog’s eye tells the frog’s brain, we can formulate a Simple
Connectionist Detector Semantics:
(SCDS)
Units, and sets of units, represent what activates them (or activates them
sufficiently).
rules that define the system. The entities that are semantically interpretable
are also the entities governed by the formal laws that define the system. In the
subsymbolic paradigm, this is no longer true. The semantically interpretable
entities are patterns of activation over large numbers of units in the system,
whereas the entities manipulated by formal rules are the individual activations
of cells in the network. The rules take the form of activation-passing rules,
which are essentially different in character from symbol-manipulation rules”
(1989: 54). As we have already noted, in a (distributed) connectionist system,
concept-level interpretation is assigned at the level of patterns or vectors
of activation, whereas the transition principles are stated at the node and
connection levels (activation-passing rules, connection weights). But in tradi¬
tional digital machines, the level of conceptual interpretation and the level of
transition to the next state are the same - the program level. SHRDLU,
for instance, puts a pyramid in the box because it executes the command:
PUT(IN(BOX, PYRAMID)). Here we see the object of semantic interpreta¬
tion (think of BOX, PYRAMID, PUT) causing the machine to carry out
the action and move into the next program state. Adding this dimension
gives us an expanded taxonomy that includes connectionist architectures (see
figure 12.6).
These assignments are rough, but there seems to be something to the idea
that going from TMs to connectionist machines involves a fragmentation
of control, memory, and representation. As we move from left to right in the
figure, global properties emerge from the interaction of local elements
(more democracy, less autocracy), and other important properties such as
fault tolerance and graceful degradation can be seen to piggyback on lower-
level features - especially distributed control and representation. As Dennett
puts it, in a slightly different context: “Notice what has happened in the
progression from the von Neumann architecture to such . . . architectures as
production systems and (at a finer grain level) connectionist systems. There
has been what might be called a shift in the balance of power. Fixed, pre¬
designed programs, running along railroad tracks with a few branch points
depending on the data, have been replaced by flexible - indeed volatile -
systems whose subsequent behavior is much more a function of complex inter¬
actions between what the system is currently encountering and what it has
encountered in the past” (1991: 269). We will return to these features in the
next chapter.
Appendix
Connectionism and Turing’s Unorganized Machines
These were characterized by Turing using figure 12.7. In this simple machine
there are five units, connected to just two other units, as indicated in the table
on the left and the diagram on the right. Each unit is either on or off. The
activation-passing rule is this: multiply the inputs and subtract the product from 1, i.e. the
new value = 1 - (input 1 x input 2). There is a central clock which synchro-
nizes all activation-passing. The result of giving each unit a 0 or a 1 is shown
in the bottom chart. For instance, on the first condition, unit 1 = 1, because its
input units (#2, #3) have the values: #2 = 1, #3 = 0. Applying the activation-
r    i(r)  j(r)
1     3     2
2     3     5
3     4     5
4     3     4
5     2     5

A sequence of five possible consecutive conditions for the whole machine is:

unit 1:  1 1 0 0 1 0
unit 2:  1 1 1 0 1 0
unit 3:  0 1 1 1 1 1
unit 4:  0 1 0 1 0 1
unit 5:  1 0 1 0 1 0
Figure 12.7 A-type unorganized machine (from Turing, 1948/69: 10; Ince, 1992: 114)
Figure 12.8 B-type circuit (from Turing, 1948/69: 11; Ince, 1992: 115)
passing rule we get: 1 x 0 = 0, 1 - 0 = 1. So unit 1 should get the value 1, which
it does. Regarding A-type machines and the brain Turing commented: “A-type
unorganized machines are of interest as being about the simplest model of a
nervous system with a random arrangement of neurons” (1948/69: 8; Ince,
1992: 120). Here we see Turing clearly envisioning these as potential abstract
neural models.
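The behaviour of this little machine is easy to reproduce. A minimal Python sketch using the connection table and the first condition above, with the rule new value = 1 - (input 1 x input 2):

    # A-type unorganized machine: unit r reads units i(r) and j(r).
    inputs = {1: (3, 2), 2: (3, 5), 3: (4, 5), 4: (3, 4), 5: (2, 5)}
    state = {1: 1, 2: 1, 3: 0, 4: 0, 5: 1}      # the first condition in the table

    for step in range(5):
        # every unit updates at once, on the tick of the central clock
        state = {r: 1 - state[i] * state[j] for r, (i, j) in inputs.items()}
        print([state[r] for r in sorted(state)])

    # The printed rows reproduce the remaining columns of the condition table:
    # [1, 1, 1, 1, 0], [0, 1, 1, 0, 1], [0, 0, 1, 1, 0], [1, 1, 1, 0, 1], [0, 0, 1, 1, 0]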
B-type machines
These were simply A-type machines where each connection, each arrowed line
in the diagram, is replaced with the circuit shown in figure 12.8. The result of
this replacement is shown schematically in figure 12.9.
Figure 12.9 Schematic B-type machine (from Turing, 1948/69: 15; Ince, 1992: 119)
P-type machines
P-type machines were the basis for Turing’s experiments on learning. Turing
clearly thought that human learning was in part based on pleasure and pun¬
ishment (hence “P-type systems”):⁴ “The training of the human child depends
largely on a system of rewards and punishments, and this suggests that it ought
to be possible to carry through the organizing with only two interfering inputs,
one for ‘pleasure’ . . . and the other for ‘pain’ or “punishment.” One can devise
a large number of ‘pleasure-pain’ systems” (1948/69: 17; Ince, 1992: 121).
The general character of such “P-type” systems is as follows: “The P-type
machine may be regarded as an LCM [logical computing machine, i.e. Turing
machine] without a tape, and whose description is largely incomplete. When a
configuration is reached, for which the action is undetermined, a random
choice for the missing data is made and the appropriate entry is made in the
description, tentatively, and is applied. When a pain stimulus occurs all tenta¬
tive entries are canceled, and when a pleasure stimulus occurs they are all made
permanent.”
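Read as a procedure, Turing's description suggests something like the following hedged Python sketch. The environment, the action set, and the data structures are invented for illustration; Turing's own P-type machines are defined over an (incomplete) LCM table rather than a Python dictionary.

    import random

    # A P-type-style learner: its table starts out largely undetermined.  When it
    # hits a configuration with no entry, it makes a random choice, records it
    # tentatively, and waits for "pleasure" or "pain" from outside.
    table = {}        # permanent entries: configuration -> action
    tentative = {}    # entries made on a random guess, not yet confirmed

    def act(config, actions=("left", "right")):
        if config in table:
            return table[config]
        if config not in tentative:
            tentative[config] = random.choice(actions)   # missing data: choose at random
        return tentative[config]

    def pleasure():
        table.update(tentative)      # tentative entries are all made permanent
        tentative.clear()

    def pain():
        tentative.clear()            # tentative entries are all cancelled

    # Toy training: reward the machine whenever it happens to choose "right".
    for _ in range(20):
        pleasure() if act(0) == "right" else pain()
    print(table)                     # almost surely {0: 'right'}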
Notes
1 Thagard (1986), and Smolensky (1988a), thesis (li), propose that connectionism
might offer a new perspective on the mind-body problem. But since connectionist
machines abstract from substance (they could be realized in neurons or silicon), it
is difficult to see how connectionism per se could offer any help on the mind-body
problem.
2 Or, as Smolensky often terms it, conscious rule “interpretation.” This is standard
in computer science, but outside it gives the impression that “interpreting” the rule
is like “interpreting” the law.
3 “Monotonic” means constantly increasing. Traditional deductive inferences are
monotonic because the set of conclusions drawn from given premises always grows,
never shrinks. But in non-monotonic reasoning, conclusions can be subtracted, given
additional evidence.
4 Let’s not forget that 1948 was during the heyday of behaviorism, and his paper of
1950 introduced the “Turing Test,” with its clearly behaviorist leanings.
Study questions
What three levels at which the mind-brain can be described does Smolensky
distinguish?
How does PTC conceive of the relation between the subconceptual and the
neural levels?
How does PTC conceive of the relation between the subconceptual and
the conceptual levels (for both conscious rule application and intuitive
processing)?
In what two ways might a connectionist network get its semantics - its
representational capacities?
What is the distinction between being semantically effectual (SE) and being
semantically ineffectual (SI)?
How exactly does the difference between being SE and SI distinguish digital
from connectionist machines?
In what sense is there a spectrum along which we can place computers from
Turing machines to connectionist machines?
Suggested reading
See Valentine (1989) and Walker (1990) for survey discussions of the history of con¬
nectionism in relation to earlier developments in psychology. In addition to Bechtel
(1985), scattered discussions of associationism and connectionism can be found in
Rumelhart and McClelland (1986a), and in Ramsey (1992).
Interpreting connectionism
For a survey of issues surrounding emergence see Beckermann et al. (1992). For more
discussion of cognition as adaptability to the environment see Copeland (1993b),
chapter 3, especially 3.6. See Ramsey (1992) for more on SCDS.
Taxonomizing architectures
See van Gelder (1997), section 5, for a more general taxonomy, one that is different
from ours.
13.1 Introduction
In this chapter we review a number of issues that pose prima facie problems
for the CCTM, as well as some connectionist responses. These should be taken
as provisional, since connectionism is still a young theory, with much poten¬
tial for development.
It was obvious from the beginning of connectionism that CCTM models were
indirectly related to real brains (see again our discussion of Smolensky’s PTC
in chapter 12). Still, it might be useful to note some further differences, if only
of emphasis and selection, that might help guide us in making inferences from
CCTM models to brains.
Crick and Asanuma (1986: 367-71), for instance, note the gross similarity
between some connectionist units and neurons: “they have multiple inputs,
some sort of summation rule, a threshold rule, and a single output which is
usually distributed to several other units.” But they add as a note of caution:
“If the properties of real neurons present useful gadgets to neural modelers,
they should not be mixed together in combinations that never occur together.”
One common proposal is that it is groups of (real) neurons that correspond to
a unit, but they note, “this might be acceptable to neuroscientists if it were
carefully stated how this group might be built out of more or less real neurons,
but this is seldom if ever done.” They go on to claim that brains do not always
behave like connectionist models by listing a number of “devices loved by
[CCTM] theorists which, if interpreted literally, are not justified by the
available experimental evidence”:
We should also recall (see chapter 9) the wide variety of types of neurons found
in the brain, whereas typically a given CCTM model contains only one type,
and add this to the above list:
5 CCTM models usually contain one sort of unit, whereas brains contain
many sorts of neurons.
Chemistry
Geometry
Learning
Scale
There are also, of course, differences of scale, and although these are in prin¬
ciple surmountable, the differences are staggering. For instance, Churchland
estimates that there are 10^11 nonsensory neurons in the brain, and each neuron
makes an average of 10^3 synaptic connections (1989, chapter 9).
Activation vectors: Supposing there are about one thousand brain subsystems,
that gives each subsystem 10^8 dimensions (units) to work with - one vector of
10^8 units can code an entire book. How many vectors can be constructed from
10^8 units? If each unit can take on (a modest estimate) 10 different values, then
there are 10^100,000,000 distinct activation vectors to work with. To appreciate this
number, Stephen Hawking (1988) estimates that “there are something like
[10^80] particles in the region of the universe we can detect.” And remember,
this was the number of activation vectors for only one of the thousand postulated
subsystems.
Weights and connections: If each neuron averages 10^3 connections, then each
subsystem contains 10^11 connections, and these weights can be set in any one of
roughly 10^100,000,000,000 ways (again assuming 10 possible values per connection).
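The arithmetic behind these estimates is easy to check. A Python sketch using Churchland's rounded figures as reported above; the assumption of 10 possible values per connection simply mirrors the 10-values-per-unit assumption and is ours.

    import math

    neurons = 10**11                 # Churchland's estimate of nonsensory neurons
    per_neuron = 10**3               # synaptic connections per neuron
    subsystems = 10**3               # postulated brain subsystems

    units = neurons // subsystems    # 10**8 units per subsystem
    values_per_unit = 10             # the "modest estimate" of values per unit

    # Work with exponents (logs base 10) so the numbers stay printable.
    log_vectors = units * math.log10(values_per_unit)
    print(f"activation vectors per subsystem: 10^{int(log_vectors):,}")   # 10^100,000,000
    print("particles in the observable universe: about 10^80 (Hawking)")

    connections = units * per_neuron                 # 10**11 connections per subsystem
    log_weights = connections * math.log10(values_per_unit)
    print(f"weight settings per subsystem: 10^{int(log_weights):,}")      # 10^100,000,000,000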
This means that the brain commands enormous potential for making many fine
distinctions in representational content. It should be noted that one could of
course design a network for the purpose of mimicking the brain, but it is more
interesting if such a design is the byproduct of trying to get the cognition right
because that would suggest that brain architecture is not accidentally related
to cognitive architecture.
Fodor and Pylyshyn (1988) (hereafter FP) cite about a dozen popular reasons
for favoring connectionist (CCTM) architectures over what they call “classical”
architectures or “conventional models” (DCTM), most of which we have
just surveyed (1988: 51):
Eleven lures
1 Rapidity of cognitive processes in relation to neural speeds: the
“hundred step” constraint.
2 Difficulty of achieving large-capacity pattern recognition and content-
based retrieval in conventional architectures.
3 Conventional computer models are committed to a different etiology
for “rule-governed” behavior and “exceptional” behavior.
4 DCTM’s lack of progress in dealing with processes that are nonverbal
or intuitive.
5 Acute sensitivity of conventional architectures to damage and
noise.
6 Storage in conventional architectures is passive.
7 Conventional rule-based systems depict cognition as “all-or-none.”
8 Continuous variation in degree of applicability of different principles
in CCTM models.
9 Nondeterminism of human behavior is better modeled by CCTM.
10 Conventional models fail to display graceful degradation.
11 Conventional models are dictated by current technical features of com¬
puters and take little or no account of the facts of neuroscience.
Fodor and Pylyshyn want to show that these usual reasons given for preferring
connectionism are invalid, and that they all suffer from one or other of the
following defects:
Fodor's fork
1 These reasons are directed at properties not essential to classical cognitive
architectures, or
2 these reasons are directed at the implementation or neural level, not the
cognitive level.
FP group the usual reasons for favoring connectionist architectures under five
categories, and they reply to each category in turn.
There are two targets embedded in this section. The first target of objection is
the “100-step” rule of Feldman and Ballard (1982). As Feldman (1989: 1) for-
mulates the rule: “The human brain is an information processing system, but
one that is quite different from conventional computers. The basic computing
elements operate in the millisecond range and are about a million times slower
than current electronic devices. Since reaction times for a wide range of tasks
are a few hundred milliseconds, the system must solve hard recognition prob¬
lems in about a hundred computational steps. The same time constraints
suggest that only simple signals can be sent from one neuron to another.” FP
read the 100-step rule as something like the following argument in enthymematic
form (the premises restate Feldman’s observations):
(P1) The basic neural computing elements operate in the millisecond range.
(P2) Reaction times for a wide range of cognitive tasks are a few hundred milliseconds.
(C) So, algorithmic analyses of these tasks should contain only hundreds of
steps.
FP’s reply to this argument is: “In the form that it takes in typical Connec¬
tionist discussions, this issue is irrelevant to the adequacy of Classical cogni¬
tive architecture. The ‘100 step constraint,’ for example, is clearly directed at
the implementation level. All it rules out is the (absurd) hypothesis that cog¬
nitive architectures are implemented in the brain in the same way as they are
implemented on electronic computers” (1988: 54—5). But a CCTM partisan
could say that if hardware is the mechanism of causation and if the hardware
is constrained to operate in a few milliseconds, then the physical states that
realize mental states can succeed each other at no faster rate than the sequence
of realized computational states allows. The proprietary physical description
of the nervous system has it firing in tens of milliseconds, so the relevant neural
states cannot causally succeed each other at a faster rate. Thus, the instantia-
tions of computations cannot succeed each other at a faster rate either. So com¬
putations are limited to 100 serial steps. Claiming that this is “implementation”
does not change this fact.
The second issue involves the conclusion that “an argument for a network
of parallel computers is not in and of itself either an argument against a clas¬
sical architecture or an argument for a connectionist architecture” (1988: 56).
This is because: “Although most algorithms that run on the VAX are serial, at
the implementation level such computers are ‘massively parallel’; they quite
literally involve simultaneous electrical activity throughout almost the entire
device” (1988: 55). “Classical architecture in no way excludes parallel execu-
tion of multiple symbolic processes . . . see . . . Hillis (1985)” (1988: 55—6). But
a CCTM partisan could make two responses. First, the sense in which a VAX
is “massively parallel” is not the same as the sense in which a connectionist
machine is; the electrical activity spreading through a VAX is not like the
spread of activation in a connectionist network in that it does not contribute
directly to defining the entities that get semantic interpretation. Second, the
messages being passed by Hillis’ Connection machine are much more complex
than connectionist machines, which usually pass degrees of activation between
0 and 1.
(DIST)
R is a distributed representation if and only if
(i) R is instantiated in many units, and
(ii) each unit that instantiates R instantiates numerous other representations
at the same time.
In such a model, because of (DIST ii) above, zapping units that store one piece of infor-
mation will also zap units that participate in storing other pieces of information.
FP begin by noting that “One can have a Classical rule system in which the
decision concerning which rule will fire resides in the functional architecture
and depends on continuously varying magnitudes. Indeed, this is typically how
it is done in practical ‘expert systems’ which, for example, use a Bayesian mech¬
anism in their production-system rule interpreter. The soft or stochastic nature
of rule-based processes arises from the interaction of deterministic rules with
real-valued properties of the implementation, or with noisy inputs or noisy
information transmission” (1988: 54). One glaring difference between these
and connectionist systems is that these systems read and follow Bayesian prob¬
ability equations. But reading and following Bayesian probability equations
does not make the system “depend on a continuously varying magnitude” in
the same way as connectionist systems, which use continuously varying acti¬
vation values and weights. FP also suggest that the current problems classical
psychological models have “with graceful degradation may be a special case of
their general unintelligence: they may simply not be smart enough to know
what to do when a limited stock of methods fails to apply” (1988: 54). But this
does not seem to be the way connectionist networks handle defective input;
these systems also get spontaneous completion and generalization as automatic
consequences of the same architectural features that cause them to gracefully
degrade.
FP’s next point is: “There is reason to be skeptical about whether the sorts of
properties listed above [biological facts about neurons and neural activities] are
reflected in any more-or-less direct way in the structure of the system that
carries out reasoning. . . . The point is that the structure of ‘higher levels’ of
a system are rarely isomorphic, or even similar, to the structure of ‘lower levels’
of a system . . . assumptions about the structure of the brain have been adopted
in an all-too-direct manner as hypotheses about cognitive architecture” (1988:
58-60). It is not clear what the force of these observations is supposed to be.
The main point of “brain-style” modeling is simply that other things being equal
we should prefer a theory that is closer to one we can implement in neurons
than one we cannot.
FP conclude their paper by saying that “many of the arguments for con-
nectionism are best construed as claiming that cognitive architecture is
implemented in a certain kind of network (of abstract ‘units’). Understood this
way, these arguments are neutral on the question of what the cognitive
architecture is” (1988: 60-1).
and one having to do with the computational power of parallel and serial
machines.
Searle (1980) had argued that the Chinese room shows that computation is not
constitutive of, nor sufficient for, cognition, mind, intentionality, etc. So if the
brainlike character of connectionist models is irrelevant to their computational
properties, then connectionist programs (as well as digital programs) are not
constitutive of, nor sufficient for, cognition, mind, intentionality, etc. As Searle
put it: “Imagine I have a Chinese gym: a hall containing many monolingual
English speaking men. These men would carry out the same operations as the
nodes and synapses in connectionist architectures ... no one in the gym speaks
a word of Chinese, and there is no way for the system as a whole to learn the
meanings of any Chinese words. Yet with the appropriate adjustments, the
system could give the correct answers to Chinese questions” (1990b: 28). “You
can’t get semantically loaded thought contents from formal computations
alone, whether they are done in serial or in parallel; that is why the Chinese
room argument refutes strong AI in any form” (1990b: 28). Searle does not
actually describe the simulation in the Chinese gym in the way that he describes
the simulation in the Chinese room. Nodes are people, but what are the acti¬
vation levels, activation-passing rules, connections, connection strengths, what
is vector multiplication, and how is input converted into output? It is not clear
what in the gym would correspond to these features of connectionist models,
and since these are crucial, the analogy is extremely weak, unlike the situation
with the Chinese room and digital machines.
But let’s suppose we have a plausible model of a connectionist machine.
What is the argument? Perhaps this:
(1) M models P.
(2) M doesn't have property F.
(3) *So, P doesn't have property F.
But as Searle himself has often insisted, arguments of this form are invalid -
consider a computational simulation of lactation: the simulation does not have the
property in question (it does not produce milk), yet it hardly follows that the
cow does not.
Hence, Searle can't conclude that because the Chinese gym doesn't speak/understand
Chinese, the connectionist model doesn't speak/understand Chinese.
(B) Any function that can be computed on a parallel machine can also be
computed on a serial machine.
Suppose (A) and (B) were true without qualification - what would the argu¬
ments show about (C)? Nothing, because strong AI was the claim that an
“appropriately programmed computer” has cognition, and all (A) and (B) show
at most is the weak equivalence of a serial and a parallel machine. But machines
running different programs can compute the same functions. So the argument
would not establish its conclusion even if the assumptions were true without
qualification. But they are not true without qualification.
These approximate:
Regarding (A): Some connectionist models are discrete and these can be simulated by being
run on a serial digital machine, but most are continuous, and what run on a
serial digital machine are discrete approximations of them (see Jets and Sharks
again, chapter 10). Connectionists assume that important psychological prop¬
erties (such as how and what the network learns) of the original continuous
model will be preserved by the digital approximation. The important point is
that psychologically relevant properties of connectionist models being simu¬
lated on serial digital machines are properties at the level of the virtual con¬
nectionist model, not properties of the underlying serial digital machine. One
cannot infer from properties of the one to properties of the other - as far as
the model is concerned, there is nothing lower.
Regarding (B): What Searle seems to have in mind here is the computational
equivalence of a collection of Turing machines running in parallel and a single
(serial) Turing machine. It has been proven that for any computation that can
be carried out by TMs running in parallel there is a single TM that can
carry out that computation. But connectionist machines are not TMs running
in parallel, so nothing follows from this.
Conclusion
anyway. But two things should be kept in mind. First, many connectionists
(see Smolensky, 1988a; Rumelhart, 1989) do not view connectionism as a
theory of implementation of standard programs. So if cognitive archi¬
tecture matters to cognition, connectionist cognitive architecture might make
a difference. Second, connectionists rarely propose to duplicate cognition
by simulating or modeling it. Although connectionists such as Smolensky
(1988a) claim that their models are more abstract than neurons, and so the
models might be realizable in other material, they are free to agree with
Searle that only certain kinds of stuff will duplicate cognition. This is an
important difference between classical and connectionist theories: connection¬
ists model cognitive activity as abstracted from the details of neural activity,
and it is natural for them to insist that real brains (or their equivalent) are
necessary for there to be cognitive activity. It is less natural for classical
theorists to take this stand because their theories are no more related to real
neural structures than a microwave oven is. It might be objected that this is
equivalent to denying that connectionists need believe in strong AI, and so
Searle’s argument against it would be irrelevant to them. Quite so, but it points
to an alternative position between Searle’s and strong AI, that substance +
program = cognition, or more figuratively, the substance provides the ‘stuff’
of cognition and the program provides the ‘form’ - shaping it into various
thought structures.
So far, in our discussion of the CTM (that includes both the DCTM and the
CCTM) we have been assuming a number of theses:
(R) There really are the so-called propositional attitudes (Realism with
respect to the attitudes).
(C) Central features of the attitudes are given to us in our commonsense folk
psychology.
common sense picks out water as that clear tasteless liquid, etc., so that science
can go on and tell us that the clear tasteless liquid is H2O). But the stance
comprising these three theses together is neither completely stable nor uncontroversial.
For instance, one could be a realist about the attitudes (R) and think folk
psychology picks them out (C), but deny that it is the job of cognitive science
to explain them (perhaps they are no more a natural scientific kind of object
than armchairs are):
(J) The Job of cognitive science is to explain cognition, and the attitudes will
not make an appearance in this account.
Or one could be realistic about the attitudes (R), believe it is the job of cogni¬
tive science to account for them (N), but think that there could or will be a
conflict between (C) and (N), and (N) will or should win:
(S) The central features of the attitudes must be discovered by Science, not
given to us in our commonsense folk psychology.
Or more radically, one could be an eliminativist and deny (R), deny that there
are propositional attitudes at all - like witches and vital forces, they are figments
of a false commonsense folk theory:
(E) There really are no propositional attitudes.
Our purpose in raising these issues is not to resolve them, but to set the stage
for an interesting on-going dispute concerning the implications of the CCTM
with regard to realism vs. eliminativism of the attitudes.
It has seemed to some that our ordinary notion of propositional attitudes
endows each of them with certain features, such as being constituted by
concepts, being semantically evaluable as true or false (fulfilled or not ful¬
filled, etc.), being discrete, and being causally effective. Digital machines
seem eminently well suited to house such states, and so their existence
seems to support the DCTM over the CCTM. But matters may even be
worse. Some contend that the CCTM cannot accommodate commonsense folk
propositional attitudes, and hence we are in the position that if connectionism
were true, then such attitudes would not exist - eliminativism. Connectionists,
if this is right, must either show that the attitudes do not really have all
these features, or that connectionist models can accommodate these features
after all.
Ramsey et al. (1991) note that the causal role attitudes play is particularly
rich and satisfies what, adapting Clark (1993: 194), is called the “equipotency
condition”:
They illustrate this with a pair of examples. To take one, suppose our ambi¬
tion is to explain why Alice went to her office on a particular occasion. Suppose
that she wants to read her e-mail and she wants to talk to her research
assistant, and she believes she can do both of these (only) in her office; “com-
monsense psychology assumes that Alice’s going to her office might have been
caused by either one of the belief-desire pairs, or by both, and that deter¬
mining which of these options obtains is an empirical matter” (Ramsey
et al., 1991: 99). For example, Alice might decide that on this occasion
she only has time to check her e-mail, and so not talk to her RA, even though
she wants to. This is easily modeled in the DCTM by having the program
access one data structure but not another, thus satisfying the equipotency
condition.
Now, the question arises of whether the CCTM can reconstruct the folk notion
of a propositional attitude in the way that the DCTM apparently can. Some
authors, such as Davies (1991), see the commitment to conceptually and causally
structured representations forming a “language of thought” as a part of commonsense
understanding of the attitudes, and go on to conclude that due to
the context-sensitivity of connectionist symbols: “Networks do not exhibit
syntax and causal systematicity of process; the commonsense scheme is com¬
mitted to syntax and causal systematicity of process; therefore connectionism
is opposed to the commonsense scheme” (1991: 251). Other authors, such as
Ramsey et al. (1991), emphasize the propositional level, but also argue that the
CCTM cannot reconstruct the folk notion of a propositional attitude. We will
focus on this latter argument. To make their case they start with a small
network “A” (see figure 13.1). They encoded the propositions shown in figure
13.2 on the input layer and used back-propagation to train the network to
differentiate the true from the false ones on the output node.
Training terminated when the network consistently output greater than
0.9 for the true propositions and less than 0.1 for the false ones. The resulting
weights are shown in figure 13.3.
Figure 13.1 The structure of network “A” (from Ramsey et al., 1991: 107, figure 4)
Figure 13.2 Table of propositions and coding (from Ramsey et al., 1991: 107, figure 3)
Figure 13.3 Weights in network “A” (from Ramsey et al., 1991: 108, figure 6)
Here is what Ramsey et al. (1991) say: “In the connectionist network . . . there
is no distinct state or part of the network that serves to represent any particular
proposition. The information encoded in network A is stored holistically and
distributed throughout the network. . . . It simply makes no sense to ask . . .”
Figure 13.4 Weights in network “B” (from Ramsey et al., 1991: 110, figure 9; figures
13.1-13.4 are reprinted by permission of Ridgeview Publishing Co.)
Here is what Ramsey et al. (1991) say: “The moral here is that. . . there are
indefinitely many connectionist networks that represent the information that
dogs have fur just as well as network A does. . . . From the point of view of the
connectionist model builder, the class of networks that might model a cognitive agent who believes that dogs have fur is not a genuine kind at all, but simply
a chaotically disjunctive set. Commonsense psychology treats the class
of people who believe that dogs have fur as a psychologically natural kind;
connectionist psychology does not” (ibid.: 111).
Eliminativism
(1) If connectionism is true, then folk psychological attitudes are not propo-
sitionally modular (see HA, NKA above).
(2) Folk psychological attitudes, if they exist, are propositionally modular.
(3) So, if connectionism is true, then: [E] There really are no propositional
attitudes (repeated).
But it seems obvious to many researchers that there are propositional attitudes
(beliefs, desires, intentions, etc.), so the above argument appears to be a
reductio ad absurdum of connectionism:
As Ramsey et al. put it: “In these models there is nothing with which propo¬
sitional attitudes of commonsense psychology can plausibly be identified”
(ibid.: 116). "If connectionist hypotheses of the sort we shall sketch turn out
to be right, so too will be eliminativism about propositional attitudes” (ibid.:
94). There are of course a number of ways friends of connectionism and/or
folk psychology could respond to this discussion. The original argument has
three steps, and replies can be directed at any of these steps.
This reply tries to show that connectionist models, contra Ramsey et al., do
have states with the last three central features of attitudes, i.e. that proposi¬
tional modularity is preserved in connectionist models, though it is not obvious
to casual inspection at the units and weights level of description. Ramsey et al.
(1991) anticipate three ways in which connectionists might argue that this is
the case:
Ramsey et al. reply Particular patterns of activation are brief and transitory
phases in the time course of a network, but beliefs (and other propositional
attitudes) are relatively stable and long-lasting cognitive states, so beliefs (and
other propositional attitudes) are not patterns of activation.
Recall that we earlier distinguished occurrent beliefs from standing beliefs
and we identified occurrent beliefs with patterns of activation. This objection
argues that standing beliefs are not patterns of activation as well. But it does
not show that occurrent beliefs are not patterns of activation.
This avoids the endurance problem since such dispositions can be long-term,
and a disposition can exist even if the activity does not, just as we can believe
something even when not thinking about it at the moment.
Ramsey et al. reply Dispositions are not the right sort of enduring states to be beliefs (and other propositional attitudes); they are not discrete, causally
active states, as required. In particular, recall the equipotency condition
rehearsed earlier, where Alice might have gone to her office to see her assis¬
tant, but actually went on this occasion to check her e-mail: “It is hard to see
how anything like these distinctions can be captured by the dispositional
account in question. ... In a distributed connectionist system like network A,
however, the dispositional state that produces one activation pattern is func¬
tionally inseparable from the dispositional state that produces another”
(Ramsey et al. 1991, 115-16). One thing to note about this argument is that it
rests on the inability to see how to map folk psychology on to connectionist
networks. It is not a direct argument against the proposal, and so it is closer to
the next and final objection.
Ramsey et al. reply This, of course, might be true, but we must be given some
reason to suppose so, or it just begs the question.
Clark’s (1990) strategy is to show that analogs of beliefs (and other propo¬
sitional attitudes) exist in connectionist networks, but at higher levels of
description. Clark argues that "distributed, subsymbolic, superpositional connectionist models are actually more structured than Ramsey et al. think, and hence visibly compatible with the requirements of propositional modularity."
This reply tries to show that the inference to non-existence does not follow, that lacking the property of "propositional modularity" is not sufficient for saying that propositional attitudes do not exist in the system. We clearly want to draw a distinction between being mistaken about the features of something and showing that the thing doesn't exist. We saw in chapter 3 that some famous Greeks thought the brain is a radiator for the blood and that we think with our heart, and some famous medievals viewed thinking as taking place in the hollow spaces in the brain, yet we assume that they were wrong about the brain (and the heart), not that brains and hearts (as they conceived them) don't exist. This could be because either:
1 the principles that allow one to infer the non-existence of the attitudes from the falseness of the description of propositional modularity are defective (wrong principles);
2 not all three features are actually true of propositional attitudes (commonsense is wrong);
3 not all three features constitute the commonsense conception of propositional attitudes (commonsense is misdescribed).
1. Wrong principles
This is the step that Stich and Warfield (1995) challenge, thereby rejecting the
Ramsey et al. conclusion to eliminativism. They see the problem as this: there
is no plausible way to bridge the gap between the fact that connectionism makes
folk psychology mistaken about the attitudes, to the conclusion that connectionism shows the non-existence of the attitudes. They propose and reject two
principles to underwrite the inference to eliminativism. First, there is:
(DTR)
The Description Theory of Reference. Theories contain terms that refer to
entities in virtue of satisfying some (perhaps weighted) set of descriptions
associated with each term.
Reply. Stich and Warfield reply that the dominant view in the philosophy
of language now is that the description theory is false, and that its rival, the
“historical-causal chain” theory “will prove to be the correct account of the
reference of most theoretical terms” (1995: 406). On this rival account, terms
get their reference in virtue of historical-causal chains leading back to the enti¬
ties they refer to (though mental terms are rarely discussed by the authors
cited), and it is not necessary for speakers or thinkers to know a correct descrip¬
tion of the thing referred to (maybe everything one knows about, say, Julius
Caesar is wrong). If this is the way propositional attitude terms work, then dis¬
covering that there is no state which can be described in terms of modular
propositional attitudes will not show that the system does not have proposi¬
tional attitudes — just that propositional attitudes do not have those described
properties, and that we were mistaken. However, it is still controversial that
“most theoretical terms” work this way, since some at least seem to be intro¬
duced by description. Moreover, the work on theoretical terms has mostly been
restricted to physics and biology - there is no guarantee psychological (mental)
terms will be as well behaved. Finally, it is not clear we want to assimilate folk
psychological notions to theoretical terms in science.
(CP)
Constitutive Properties Some properties are constitutive of, or essential for,
being the thing it is. If nothing in the system has those properties, then the
thing those properties constitute does not exist in that system.
For instance, having 79 protons (having the atomic number 79) is one of the
constitutive properties of being gold, and if we find something with 92 protons,
we know it is uranium, not gold in disguise. Likewise, if no state of a connectionist network is constituted by the three properties of being a modular propositional attitude, then the system does not contain commonsense propositional attitudes.
Reply. Stich and Warfield reply, first, that it is very difficult to make a case
that certain properties rather than others are constitutive of something — just
thinking they are is not enough. Second, Quine and others have impressively
questioned the whole idea that there are constitutive properties, or in the lin¬
guistic mode, that there are properties that are “analytically” connected to
referring terms — properties that the referents must have for the term to refer
to them. The weight of tradition is against such an idea and “if there are
philosophers who choose to follow this path, we wish them well. But we don’t
propose to hold our breath until they succeed” (1990: 409). But we might add
that the presence or absence of essential features is controversial, and not much
of this work has been carried out in the domain of psychological (mental)
notions.
(DFD)
“if the posits of the new theory strike us as deeply and fundamentally differ¬
ent from those of the old theory . . . then it will be plausible to conclude that
the theory change has been a radical one, and that an eliminativist conclu¬
sion is in order.” (ibid.: 96; emphasis added)
This does seem reasonable on the face of it, so it looks like connectionist networks contain no "modular" propositional attitudes. What should we conclude concerning eliminativism? Could connectionist models lack "modular" states and still have propositional attitudes? We still have two responses to consider.
2. Commonsense is wrong
Ramsey et al. (1991) contend that all three features of propositional modular¬
ity are part of our folk notion of the attitudes, and Smolensky (1995), for one,
finds only the first two in the models he investigates. Stich and Warfield
comment that if Smolensky’s future work does not uncover the third feature,
causal role, then his constructions “will not count as an adequate response to
Ramsey et al.’s challenge”. But if one were to deny that causal role is either (a)
part of the folk notion of the attitudes, or (b) a part of the attitudes (simpliciter),
then the Ramsey et al. challenge would have to be modified.
The score
We have rehearsed a dispute regarding the status of propositional attitudes in
connectionist models. The arguments are supposed to show that connection-
ism is false because it entails that there are no propositional attitudes as
commonly understood (when it seems that there are). We have seen that
each argument has its strengths and weaknesses. It seems presently to be a
draw - neither side has established its case. The crucial issue seems to be the
distinctive causal vs. explanatory role of the attitudes, and that issue is
not yet resolved.
(SCDS)
Units, and sets of units, represent what activates them (or activates them sufficiently).
[Figure layers: word level, letter level, feature level]
Figure 13.5 Letter recognition network (from McClelland and Rumelhart, 1981: 380, figure 3; reproduced by permission of the American Psychological Association)
Right cause
the relevant stimuli. McClelland and Rumelhart note two things: first, "each feature is detected with some probability p . . . [which will] . . . vary with the visual quality of the display" (1981: 381). Second, "we identify nodes according to the units [in the stimuli] they detect" (1981: 379).
There are no real output nodes in this model. Rather, the output is read off
of the results of temporally integrating the pattern of activation on all nodes
in accordance with a particular rule. This gives the response strength: “each cycle
for which output is requested, the program computes the probability that the
correct alternative is read out and the probability that the incorrect alternative
is read out, based on the response strength” (1981: 407).
What about letter and word nodes - the "hidden" units? These are activated as a function of the activation of each of the nodes connected to them. Consider the letter "T"; it is activated by the simultaneous activation of a center vertical edge detector and an upper horizontal edge detector, focused on the same area of the "retina."^ So what activates the "T" node is the joint (temporal and spatial) activation of the vertical and horizontal edge detectors. The same goes for word nodes, but in spades. In this case the node is activated by four letter-position node pairs such as: "T" in the first position, "A" in the second position, "K" in the third position, "E" in the fourth position, for "TAKE."
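A greatly simplified sketch of that feature-to-letter-to-word flow (the feature inventory and the all-or-none activation rule here are invented; the actual interactive activation model uses graded activations, inhibition, and top-down connections):

# Invented feature sets: a letter node fires on the joint presence of its features.
LETTER_FEATURES = {
    "T": {"centre_vertical", "upper_horizontal"},
    "A": {"left_diagonal", "right_diagonal", "middle_horizontal"},
}
WORD_SPELLINGS = {"TAKE": ["T", "A", "K", "E"]}

def letter_nodes(features_at_position):
    """Letter nodes activated by the features detected at one retinal position."""
    return {L for L, needed in LETTER_FEATURES.items() if needed <= features_at_position}

def word_nodes(letters_by_position):
    """A word node is activated by the right letter node in each of its positions."""
    return {w for w, spelling in WORD_SPELLINGS.items()
            if all(spelling[i] in letters_by_position.get(i, set())
                   for i in range(len(spelling)))}

print(letter_nodes({"centre_vertical", "upper_horizontal"}))   # {'T'}
print(word_nodes({0: {"T"}, 1: {"A"}, 2: {"K"}, 3: {"E"}}))    # {'TAKE'}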
By (SCDS) these nodes are all about what activates them, and what activates them is earlier nodes. But what we want the theory to say, what is correct, is that letter nodes detect letters in the world, and word detectors detect words in the world. What is being detected are letters and words out there, not earlier nodes in the flow of information. The theory in which the model is embedded must find a way of handling the transitivity of "activates" - it must find the right cause of the node's activation. This is the right-cause problem for connectionist systems.
Similarly for Rumelhart and McClelland's (1986b) past tense learner, except that here the A units represent word roots and the B units represent past tense forms, both in their Wickelfeature representations. Fixed decoding networks convert these from and to phonological representations. The intended interpretation of the representations is clearly auditory (Rumelhart and McClelland do not say how the network was stimulated, but most likely it was from a keyboard). The components of each Wickelfeature are complex, and Rumelhart and McClelland engineered them to some extent for computational expediency. However, the basic point is that the features they are made from mark 10 points on four "sound" dimensions: (1) interrupted consonants vs. continuous consonants vs. vowels; (2) stops vs. nasals, fricatives vs. sonorants, high vs. low; (3) front, middle, back; (4) voiced vs. unvoiced, long vs. short (see figure 13.6).
Figure 13.6 Features in Wickelfeatures (from Rumelhart and McClelland, 1986a, vol. 2: 235, table 5; reproduced by permission of the MIT Press)
The question is: what are nodes which are dedicated to picking up such features actually picking up? "Voiced" has to do with the vibration of the vocal cords, but "front," "middle," and "back" have to do with the place of articulation in the mouth. Are these nodes shape-of-the-mouth detectors? Or should the causal chain go back only to the feature of the sound (if there is one) that these configurations of the mouth produce?^
No cause
As we noted earlier, the limiting case of the right-cause problem is the “no¬
cause” problem - how can we represent things which do not cause tokenings
of representations? These include logical notions, such as XOR, abstract enti¬
ties such as numbers, sets, and so forth, future entities such as our great-great
grandchildren, and perhaps even social roles such as being the mayor of New
York. Consider the XOR network of linear threshold units represented in figure 13.7. This model has a hidden ("internal") unit without which XOR could not be computed, but there is no obvious interpretation to be assigned to it at all, let alone a detector semantics. Its meaning is more a function of the role it plays in the network than of what turns it on.
Figure 13.7 An XOR network (from Rumelhart, 1989: 151; reproduced by permission of
the MIT Press)
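For concreteness, here is one standard arrangement of linear threshold units that computes XOR with a single hidden unit (the particular weights and thresholds below are illustrative and need not match those of figure 13.7):

def threshold_unit(net_input, threshold):
    """Linear threshold unit: all-or-none output."""
    return 1 if net_input >= threshold else 0

def xor_net(x1, x2):
    # Hidden ("internal") unit: fires only when both inputs are on (an AND detector).
    h = threshold_unit(1.0 * x1 + 1.0 * x2, threshold=1.5)
    # Output unit: excited by each input, strongly inhibited by the hidden unit.
    return threshold_unit(1.0 * x1 + 1.0 * x2 - 2.0 * h, threshold=0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))    # prints the XOR truth table

In this sketch the hidden unit happens to look like a conjunction detector, but its contribution to the computation is inhibitory; what it does for XOR is fixed by its role in the net, not merely by what turns it on.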
The next problem, as we saw in our discussion of what the frog’s eye tells the
frog’s brain (chapter 4), was made famous by Fodor as the disjunction problem.
For connectionist systems it might go like this. Suppose we define a “T” detec¬
tor as a node that is on just when it is exposed to the letter “T.”^ However, like
all devices it occasionally gets things wrong - it occasionally fires when it gets,
say, an “I.” But now our T detector is functioning like an I detector, which is
to say it is overall a T-or-I detector. Thus it would seem that misrepresenta¬
tion is impossible; a potential case of misrepresentation is converted by the
theory into the veridical representation of a disjunction. Not at all what we
wanted.
Conclusion
Where does this leave us? Let’s call a semantics for a system derivative if it
requires another system, which already has a semantics, to give the first system
its semantics. Let's call the semantics of a system non-derivative if no other
semantical system is required. Digital computers typically have derivative
CCTM models share some of the problems with the DCTM, including the
problems of consciousness and qualia, but they also currently have various spe¬
cific liabilities including the following:
Notes
1 Recall that models do not model every aspect of what they model - a model of the
solar system might be made out of brass.
2 Though changing one might affect others - learning that Reno, Nevada, is west of
Los Angeles, California, might change one’s beliefs about the geography of Cali¬
fornia and Nevada.
3 They actually call it “a connectionist model of memory” and usually refer to its
contents as “information” not “beliefs.” However, if it is not a model of connec¬
tionist belief, it is irrelevant to the issue of the attitudes.
4 If they were not “focused” on the same area of the “retina,” then the simultane¬
ous presentation of “ | ” and “—” would activate the “T” detector even though
separately they do not constitute a “T.”
5 Answering these questions involves hard, and to some extent unresolved problems
in acoustic and articulatory phonetics. Detector semantics is partly an empirical
issue - that’s part of its attraction.
6 The phrase “just when” includes the idea of causation - this is a causal theory.
7 What a thing actually detects doesn't exhaust its representational potential - we must count what it would detect in the right (that's the hard part) counterfactual circumstances.
8 A qualification may be required for thinking in a public language where what the
words are about is fixed by the linguistic community, or experts in the linguistic
community.
Study questions
The brain
What are some major differences between typical connectionist models and the
brain? [hint: neurons vs. units, chemistry, geometry, learning, scale]
Lures of connectionism
Were there any “lures” of connectionism not covered by Fodor and Pylyshyn’s
five groups? What are some examples?
What are some of the main problem areas for connectionist models?
Propositional attitudes
What are the two major strategies for replying to these arguments?
Detector semantics
What are some problems that the CCTM shares with the DCTM?
Suggested reading
For basics on neurons see selected readings for chapter 3, and Arbib (1995). Kosslyn
(1983), chapter 2, is a semi-popular introduction to connectionist computation and the
brain. For more on neurons and connectionist networks see Crick and Asanuma (1986),
Schwartz (1988), Churchland (1989; 1990, section 1.4), Copeland (1993b), chapter 10.5,
McLeod et al. (1998), chapter 13, and Dawson (1998), chapter 7.IV. For a recent discus¬
sion of biologically realistic network models of brain functioning see Rolls and Treves
(1998), and for a recent discussion of connectionist neuroscience see Hanson (1999).
Lures of connectionism
Chinese gym
Churchland and Churchland (1991) reply in passing to the Chinese gym argument, but
see Copeland (1993a; 1993b, chapter 10.6) and Dennett (1991) for a more sustained
discussion. Both Smolensky (1988a) and Franklin and Garzon (1991) speculate that "neural networks may be strictly more powerful than Turing machines, and hence capable of solving problems a Turing machine cannot." We return to this issue in the
next chapter.
Propositional attitudes
Our presentation of the CCTM and eliminativism follows the discussion in Ramsey,
Stich, and Garon (1991), Clark (1990; 1993, chapter 10), and Stich and Warfield (1995).
See Smolensky (1995) for another, more technically demanding, response to Ramsey,
Stich, and Garon (1991) and propositional modularity. Our discussion is indebted to
Braddon-Mitchell and Jackson (1996), chapters 3 and 14, and Rey (1997), chapter 7.
The locus classicus for eliminativism in folk psychology is Churchland (1981). For a
short introduction to issues in folk psychology see von Eckhardt (1994) and Church-
land (1994) and the many references therein. For recent work see Greenwood (1991),
Christensen and Turner (1993), Stich (1996) chapter 3, and Carruthers and Smith
(1996).
Detector semantics
See Fodor (1984, 1987, 1990) for more on the “disjunction” problem. See Dretske
(1986a) and references therein for more on representation-as. See Ramsey (1992) for
discussion of SCDS.
For an early survey of issues in the CCTM see Rumelhart and McClelland (1986a),
chapters 1 and 26. General discussion of problems and prospects of connectionism can
be found in Rumelhart and McClelland (1986a), chapter 4; Minsky and Papert (1969), the epilogue to the 1988 reprinting; Rosenberg (1990a,b); and Quinlan (1991), chapter 6. Some early empirical criticisms of connectionism are Massaro (1988) and Ratcliff
(1990).
Coda: Computation for
Cognitive Science, or What IS
a Computer, Anyway?
C.l Introduction
(CTM)
(a) Cognitive states are computational relations to mental representations which
have content.
(b) Cognitive processes (changes in cognitive states) are computational
operations on mental representations which have content.
One place to start is with already-existing definitions, and there are two
major sources of definitions of computers and computation: dictionaries and
textbooks.
Dictionary
There are many problems with these definitions: the first suffers the above-
mentioned problem of circularity, and the second rules out most or all people
as capable of computation. Just think of non-calculating persons, or people
who can’t carry out repetitious and highly complex mathematical operations
at any, let alone high, speeds.
Textbook
Problems with this definition include the fact that it depends on prior defini¬
tions of programs and machines, and in fact defines a computation in their
terms (and it makes a computer carrying out a computation equivalent to a
UTM).
Figure C.1 Structure of a paradigmatic symbol system (from Newell, 1980: 45, figure 2;
figures C.1-C.3 reproduced by permission of Cognitive Science)
von Eckhardt
A more general but still somewhat commonsensical definition has been offered
by von Eckhardt: “A computer is a device capable of automatically inputting,
storing, manipulating, and outputting information in virtue of inputting,
storing, manipulating, and outputting representations of that information.
These information processes occur in accordance with a finite set of rules that
are effective and that are, in some sense, in the machine itself" (1993: 114). This
is a vast improvement over the first definitions in that it recognizes that sophis¬
ticated number crunching is not essential to computation. It also recognizes the
role of procedures and representations. However, it uses notions like “storing”
and “manipulating” information that need explication and could easily, like the
Figure C.2 Operators of the symbol system (from Newell, 1980: 46, figure 3)
Newell definition, become too specific for computers in general. Also, the char¬
acterization needs to make it clear that the “rules” need not be explicitly stored
in the memory of the machine as a program. We return to this issue in section
C.4. In sum, as a characterization of the kind of device that might be like us, it
is probably close to being an accurate reflection of the standard digital compu¬
tational model of the mind. But it is too close; we are currently after something
even more general, not just what is a computer - that it may resemble us - but what is a computer simpliciter?
If it is not a program:
Then the result is the expression itself.
If it is a program:
Interpret the symbol of each role for that role;
Then execute the operator on its inputs;
Then the result of the operation is the result.
If it is a new expression:
Then interpret it for the same role.
Figure C.3 The control operations of the symbol system (from Newell, 1980: 48, figure 4)
Fodor
S1 → S2 → ... → Sn
F1
And to each of these subsequent representational states we assign a formula:
S1 → S2 → ... → Sn
F1 → F2 → ... → Fn
"P"    "P → Q"    . . .    "Q"
In this case the sequence of states constitutes a "proof" of "Q" from "P" and "if P then Q". This proposal says nothing about the state transitions of the machine other than that they respect the desired semantic relations between formulae, i.e., if "P" and "P → Q" are true, then "Q" must be true. Fodor also doesn't say why it is a part of the notion of a computer that its output should always preserve some desired semantic relation between formulae - especially when he illustrates this with the notion of a "proof," which is not a semantic notion at all, but a syntactic one (e.g. a string of formulae is a proof in an axiomatic system if it begins with an axiom and each subsequent formula is either an axiom or is the result of applying an inference rule to an axiom or a previously derived formula).
Marr
light of its beliefs" (ibid.: 17). This corresponds to Marr's level of computational theory.
Pylyshyn
This idea, that computers are devices with three proprietary levels of descrip¬
tion, has been canonized by Pylyshyn (1989: 57) as the “classical view,”
although he modifies the levels a bit. According to Pylyshyn, computers (and
minds, if they are computers) must have at least the following three distinct
levels of organization:
1 The semantic level (or knowledge level) At this level we explain why
appropriately programmed computers do certain things by saying what
they know, and what their goals are by showing that these are connected
in certain meaningful or even rational ways.
2 The symbol level The semantic content of knowledge and goals is
assumed to be encoded by symbolic expressions. Such structured
expressions have parts, each of which also encodes some semantic
content. The codes and their structures, as well as the regularities by
which they are manipulated, are another level of organization of the
system.
3 The physical level For the entire system to run, it has to be realized
in some physical form. The structure and the principles by which the
physical object functions corresponds to the physical level.
This captures to some extent Fodor’s concern that computers are symbol
systems whose physical description and semantic description go hand in hand
(levels 2 and 3). And it captures Marr’s concern that information processors
have a level of description having to do with the goals of the system and its
rationality. But the exact character of the second level is obscure. What, for
instance, are “the regularities by which they [structured expressions] are
manipulated”? Is this Marr’s algorithmic level?
Fodor, Marr, and Pylyshyn
Pulling the three proposals together, along with our complaints about them,
yields a revised “levels of description” notion of a computer. We want it to
have at least three levels of description: a physical level (hardware), a structural
(formal syntactic) level (algorithmic), and a semantic level. Optionally, we also
want to be able to evaluate the device from the point of view of its goals and
the rationality of its strategies for achieving these;
Some of the above worries about each species of definition of a computer can perhaps be met by combining the strong points of each into a single characterization:
Notice that our characterization of a computer has the virtue that it does not
require or forbid it to be made out of any particular kind of stuff, but only
requires that the stuff be causally sufficient to move the machine from state to
state. Our characterization also does not require the machine to have any par¬
ticular architecture, or to have any particular operations on representations. It
is completely general up to its physicality. And if we understood what non¬
physical causation might be like, we could reformulate condition (a) in even
more general terms:
2(a') causal description under which it moves from state to state in accor¬
dance with causal laws.
These levels form an “isa” hierarchy (see chapter 7) from level 3 to level 1 in
the sense that any system that falls under level 3 falls under level 2, and any
system that falls under level 2 falls under level 1, but not necessarily conversely.
Now let’s turn to a more precise characterization of these levels:
Level 1
Level 2
Level 3
A system S uses program P to govern its execution of P (to compute a function F) if and only if (i) there is a 1-1 program realization function which maps the instructions of P on to states of S; (ii) there is a set of control states such that S computes Fj because it is in a control state which determines that Fj is computed. Were S in a different control state, S might have computed a different Fi.
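A toy illustration of clause (ii) only (the "modes" and functions below are invented, not drawn from Stabler): the same realized system can compute different functions depending on which control state it is in.

# Invented example: the control state selects which function the system computes.
PROGRAMS = {
    "add_mode":  lambda x, y: x + y,    # one function the system can compute
    "mult_mode": lambda x, y: x * y,    # another
}

class System:
    def __init__(self, control_state):
        self.control_state = control_state   # determines which function is computed
    def run(self, x, y):
        return PROGRAMS[self.control_state](x, y)

print(System("add_mode").run(3, 4))     # 7
print(System("mult_mode").run(3, 4))    # 12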
though, insofar as it suggests that the ‘software’ is something other than ‘hard¬
ware,’ something other than a physical part or feature of the system. These
suggestions of the colloquial terms are incorrect. The present account makes
clear the commitment to an actual physical and causally efficacious realization
of any program used by any physical system” (1983: 393).
the organization of nodes and connections is structural, and the stuff out of which it is made, which determines its activation-passing properties and its connection strength properties (e.g. the chemistry of a synapse or the diameter of a neuron), is physical. We can summarize these observations as follows:
Functional
Descriptive
symbolic: vectors
structural: activation-passing rules, connection strengths, vector
multiplication
physical: nodes and connections
Given this similarity, it is not surprising that a digital machine can be seen
as just a limiting case, a very special and “unnatural” configuration, of a
connectionist machine: where the connections are limited to about four per
unit, where every unit passes on just two values of activation: 0, 1, etc.
That is, we build a system out of McCulloch and Pitts "neurons," and we build McCulloch and Pitts "neurons" out of connectionist nodes by not letting them do anything continuous and by regimenting their connections. For instance:
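A minimal sketch of the idea (the weights, thresholds, and the generic unit below are invented for illustration): a generic connectionist unit passes on a continuous activation value, and restricting it to binary, all-or-none output with fixed connections yields a McCulloch and Pitts "neuron," from which AND, OR, and NOT gates (and hence digital circuitry) can be built.

import math

def connectionist_unit(inputs, weights, bias=0.0):
    """A generic unit: passes on a continuous (sigmoid) activation value."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-net))

def mp_neuron(inputs, weights, threshold):
    """The same unit 'regimented': binary inputs, all-or-none binary output."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= threshold else 0

# Logic gates built from McCulloch and Pitts "neurons":
AND = lambda a, b: mp_neuron([a, b], [1, 1], threshold=2)
OR  = lambda a, b: mp_neuron([a, b], [1, 1], threshold=1)
NOT = lambda a:    mp_neuron([a], [-1], threshold=0)

print(round(connectionist_unit([1, 1], [1, 1]), 2))   # ~0.88: graded, not all-or-none
print(AND(1, 1), OR(0, 1), NOT(1))                    # 1 1 0: all-or-none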
Given the Church-Turing thesis (see chapter 6) that any computable function is Turing machine computable, this shows that "any computable function can be computed via a suitable neural network" (1991: 127).
Finally, Siegelmann and Sontag (1992) proved that “one may simulate all. . .
Turing machines by nets [recurrent networks] using only first-order (i.e. linear)
connections and rational weights” (1992: 440). They go on to note that “a
consequence of Theorem 1 is the existence of a universal processor net,
which upon receiving an encoded description of a recursively computable
partial function (in terms of a Turing Machine) and an input string, would
do what the encoded Turing Machine would have done on the input string”
(1992: 448). Siegelmann and Sontag go on to estimate that given Minsky’s
universal Turing machine (see chapter 6) with 4 letters and 7 control states,
there exists “a universal net with 1,058 processors. . . . Furthermore, it is
quite possible that with some care in the construction one may be able to
drastically reduce this estimate” (1992: 449). Kilian and Siegelmann (1993)
generalized the result of Siegelmann and Sontag from networks using a spe-
Notes
1 This definition is typical, and is repeated, for instance, in Clark and Cowell
(1976).
2 We also noted, and rejected, definitions of computers such as Newell’s in terms of
being equivalent to universal Turing machines.
3 Notice that “manipulating” symbols does not entail operations on them, nor does
it entail manipulating them with “rules” stored in the computer’s memory. It only
involves creating, changing or destroying symbols, and connectionist processes do
all three of these.
Study questions
Suggested reading
The material reviewed in this chapter is from the non-technical literature on com¬
putation. There is also a technical literature on computation from the perspective
of mathematics, logic, and computer science, which is beyond the scope of this
discussion.
Computation
See Newell and Simon (1972), and Newell and Simon (1976) for more on physical
symbol systems. Von Eckhardt (1993), chapter 3: The computational assumption,
and Glymour (1992), chapter 12: The computable, are good chapter-length non-
mathematical introductions to the topic, as is Copeland (1996a). At a more technical
level, Minsky (1967) and Davis (1958) are classic texts, and Boolos and Jeffrey (1989)
relate computation to logic. There are many good contemporary texts on computation.
One that has been used in cognitive science (see for instance Stabler, 1983) is Clark and
Cowell (1976). Odifreddi (1989), chapter 1, is an excellent but lengthy (over 100 pages)
survey of computability and recursiveness. See Hornik et al. (1989) for a proof that
multilayer feedforward networks can approximate any (measurable) function as closely
as one wishes, and Franklin and Garzon (1991) for a demonstration that “any com¬
putable function can be computed via a suitable neural net.”
Is everything a computer?
Putnam (1988) and Searle (1990b, 1992, chapter 9) are the contemporary focus for the
idea that the standard definition of a computer makes everything a computer. See Goel
(1992), Chalmers (1996a), Chalmers (1996b), chapter 9, Copeland (1996a) and Harnish
(1996) for further discussion.
Bibliography
Akmajian, A., Demers, R., Farmer, A., and Harnish, R. (1995), Linguistics: An Intro¬
duction to Language and Communication, 4th edn, Cambridge, MA: MIT Press.
Anderson, J. (1983), The Architecture of Cognition, Cambridge, MA: Harvard Univer¬
sity Press.
Anderson, J. (1993), Rules of the Mind, Hillsdale, NJ: Lawrence Erlbaum.
Anderson, J. (1995), Introduction to Neural Networks, Cambridge, MA: MIT Press.
Anderson, J., and Bower, G. (1974), Human Associative Memory, New York:
Hemisphere.
Anderson, J., and Rosenfeld, E. (eds) (1988), Neurocomputing, Cambridge, MA: MIT
Press.
Anderson, J., and Rosenfeld, E. (eds) (1998), Talking Nets: An Oral History of Neural
Networks, Cambridge, MA: MIT Press.
Angell, J. R. (1911), Usages of the terms mind, consciousness and soul. Psychological
Bulletin, 8: 46-7.
Arbib, M. (ed.) (1995), Handbook of Brain Theory, Cambridge, MA: Bradford/MIT
Press.
Aristotle, On memory and reminiscence. In R. McKeon (ed.) (1941), The Collected
Works of Aristotle, New York: Random House.
Armstrong, D. M. (1999), The Mind-Body Problem: An Opinionated Introduction,
Boulder, CO: Westview Press.
Aspry, W. (1990), John von Neumann and the Origins of Modern Computing, Cambridge,
MA: MIT Press.
Aspry, W., and Burks, A. (eds) (1987), Papers of John von Neumann on Computers and
Computer Theory, Cambridge, MA: MIT Press.
Atlas, J. (1997), On the modularity of sentence processing: semantical generality and
the language of thought. In J. Nuyts and E. Pederson (eds) (1997), Language and
Conceptualization, Cambridge: Cambridge University Press.
Augarten, S. (1984), Bit By Bit: An Illustrated History of Computers, New York: Ticknor
and Fields.
Baars, B. (1986), The Cognitive Revolution in Psychology, New York: Guilford Press.
Bain, A. (1855), The Senses and the Intellect, 4th edn. New York: D. Appleton (1902).
Bain, A. (1859), The Emotions and the Will, 4th edn. New York: D. Appleton.
Baker, L. Rudder (1987), Saving Belief, Princeton, NJ: Princeton University Press.
Ballard, D. (1997), An Introduction to Neural Computation, Cambridge, MA:
Bradford/MIT Press.
Bara, B. (1995), Cognitive Science: A Developmental Approach to the Simulation of the
Mind, Hillsdale, NJ: Lawrence Erlbaum.
Barr, A., Cohen, P, and Feigenbaum, E. (eds) (1981), The Handbook of Artificial
Intelligence, vols 1-3, Los Altos, CA: William Kaufmann.
Barsalou, L. (1992), Cognitive Psychology: An Overview for Cognitive Scientists,
Hillsdale, NJ: Lawrence Erlbaum.
Barwise, J., and Etchemendy, J. (1999), Turing’s World 3.0: An Introduction to Com¬
putability Theory, Cambridge: Cambridge University Press.
Baumgartner, P, and Payr, S. (eds) (1995), Speaking Minds: Interviews with Twenty
Eminent Cognitive Scientists, Princeton, NJ: Princeton University Press.
Beakley, B., and Ludlow, P. (eds) (1992), The Philosophy of Mind: Classical Problems,
Contemporary Issues, Cambridge, MA: Bradford/MIT Press.
Beatty, J. (1995), Principles of Behavioral Neuroscience, Dubuque: Brown and
Benchmark.
Bechtel, W. (1985), Contemporary connectionism: are the new parallel distributed
processing models cognitive or associationist? Behaviorism, 13(1): 53-61.
Bechtel, W. (1994), Connectionism. In S. Guttenplan (ed.), A Companion to the
Philosophy of Mind, Cambridge, MA: Blackwell.
Bechtel, W., and Abrahamsen, A. (1991), Connectionism and the Mind, Cambridge, MA:
Blackwell.
Bechtel, W., and Graham, G. (eds) (1998), A Companion to Cognitive Science, Cam¬
bridge, MA: Blackwell.
Beckermann, A., Flohr, H., and Kim, J. (eds) (1992), Emergence or Reduction? Essays
on the Prospects of Nonreductive Physicalism, New York: Walter de Gruyter.
Bever, T. (1992), The logical and extrinsic sources of modularity. In M. Gunnar and
M. Maratsos (eds) (1992), Modularity and Constraints in Language and Cognition,
Hillsdale, NJ: Lawrence Erlbaum.
Block, H. (1962), The Perceptron: a model for brain functioning. I, Review of Modern
Physics 34: 123-35. Reprinted in Anderson and Rosenfeld (1988).
Block, N. (1978), Troubles with functionalism. In C. W. Savage (ed.) (1978), Percep¬
tion and Cognition: Issues in the Foundations of Psychology, Minneapolis: University
of Minnesota Press.
Block, N. (ed.) (1981), Readings in the Philosophy of Psychology, vol. 1, Cambridge, MA:
Harvard University Press.
Block, N. (1983), Mental pictures and cognitive science. Philosophical Review, 90:
499-541.
Block, N. (1986), Advertisement for a semantics for psychology. In French et al.
(1986).
Block, N. (1990), The computer model of the mind. In D. Osherson and E. Smith (eds)
(1990), An Invitation to Cognitive Science, vol. 3: Thinking, Cambridge, MA:
Bradford/MIT Press.
Burks, A., Goldstine, H., and von Neumann, J. (1946), Preliminary discussion of
the logical design of an electronic computing instrument. In Aspry and Burks
(1987).
Cajal, R. y (1909), New Ideas on the Structure of the Nervous System in Man and
Vertebrates, Cambridge, MA: Bradford/MIT Press.
Carnap, R. (1937), The Logical Syntax of Language, London: Routledge and Kegan
Paul.
Carpenter, B., and Doran, R. (1977), The other Turing machine. Computer Journal,
20(3): 269-79.
Carpenter, B., and Doran, R. (eds) (1986), A. M. Turing’s ACE Report of 1946 and
Other Papers, Cambridge, MA: MIT Press.
Carruthers, P., and Smith, P. (eds) (1996), Theories of Theories of Mind, Cambridge:
Cambridge University Press.
Caudill, M., and Butler, C. (1993), Understanding Neural Networks, vols 1 and 2,
Cambridge, MA: Bradford/MIT Press.
Chalmers, D. (1992), Subsymbolic computation and the Chinese room. In Dinsmore
(1992).
Chalmers, D. (1995a), On implementing a computation. Minds and Machines, 4:
391-402.
Chalmers, D. (1995b), The puzzle of conscious experience. Scientific American, 273:
80-6.
Chalmers, D. (1995c), Facing up to the problem of consciousness. Journal of
Consciousness Studies, 2(3): 200-19. Reprinted in Cooney (2000).
Chalmers, D. (1996a), Does a rock implement every finite-state automaton? Synthese,
108: 309-33.
Chalmers, D. (1996b), The Conscious Mind, Oxford: Oxford University Press.
Chater, N., and Oaksford, M. (1990), Autonomy, implementation and cognitive archi¬
tecture: a reply to Fodor and Pylyshyn, Cognition, 34: 93-107.
Cherniak, E., and McDermott, D. (1985), Introduction to Artificial Intelligence, Reading,
MA: Addison Wesley.
Chrisley, R. (1995), Why everything doesn't realize every computation. Minds and
Machines, 4: 403-20.
Chomsky, N. (1959), Review of Skinner, Verbal Behavior, Language, 35: 26-58.
Christensen, S., and Turner, D. (eds) (1993), Folk Psychology and the Philosophy of
Mind, Hillsdale, NJ: Lawrence Erlbaum.
Church, A. (1936), An unsolvable problem of elementary number theory, American
Journal of Mathematics, 58: 345-63. Reprinted in Davis (1965).
Churchland, P. M. (1981), Eliminative materialism and propositional attitudes, Journal
of Philosophy, 78: 67-90.
Churchland, P. M. (1988), Matter and Consciousness, rev. edn, Cambridge, MA:
Bradford/MIT Press.
Churchland, P. M. (1989), On the nature of explanation: a PDP approach. In
P. M. Churchland (1989), A Neurocomputational Perspective, Cambridge, MA:
Bradford/MIT Press. Reprinted with some changes in Haugeland (1997).
Cowan, J., and Sharp, D. (1988), Neural nets and artificial intelligence. Reprinted in
Graubard (1988).
Crane, T. (1995), The Mechanical Mind, New York: Penguin.
Crevier, D. (1993), AI: The Tumultuous History of the Search for Artificial Intelligence,
New York: Basic Books.
Crick, F. (1994), The Astonishing Hypothesis: The Scientific Search for the Soul, New
York: Touchstone/Simon and Schuster.
Crick, F., and Asanuma, C. (1986), Certain aspects of the anatomy and physiology
of the cerebral cortex. In J. McClelland and D. Rumelhart (eds) (1986), Parallel
Distributed Processing, vol. 2, Cambridge, MA: Bradford/MIT Press.
Cummins, R. (1986), Inexplicit information. In Brand and Harnish (1986).
Cummins, R. (1989), Meaning and Mental Representation, Cambridge, MA: Brad¬
ford/MIT Press.
Cummins, R. (1996), Representations, Targets, and Attitudes, Cambridge, MA: Brad¬
ford/MIT Press.
Cummins, R., and Cummins, D. (eds) (1999), Minds, Brains and Computers: The Foun¬
dations of Cognitive Science, Oxford: Blackwell.
Cussins, A. (1990), The connectionist construction of concepts. In Boden (1990).
D’Andrade, R. (1989), Cultural cognition. In Posner (1989).
Davies, M. (1991), Concepts, connectionism and the language of thought. In Ramsey,
Stich, and Rumelhart (1991).
Davies, M. (1995), Reply: consciousness and the varieties of aboutness. In C. Mac¬
Donald and G. MacDonald (eds) (1995), Philosophy of Psychology: Debates on
Psychological Explanation, Cambridge, MA: Blackwell.
Davis, M. (1958), Computability and Unsolvability, New York: McGraw-Hill.
Davis, M. (ed.) (1965), The Undecidable, Hewlett, NY: Raven Press.
Davis, S. (ed.) (1992), Connectionism: Theory and Practice, Oxford: Oxford University
Press.
Dawson, M. (1998), Understanding Cognitive Science, Oxford: Blackwell.
Dennett, D. (1978a), Skinner skinned. In Brainstorms, Montgomery, VT: Bradford
Books.
Dennett, D. (1978b), Towards a cognitive theory of consciousness. In Brainstorms,
Montgomery, VT: Bradford Books.
Dennett, D. (1985), Can machines think? In M. Shafto (ed.) (1985), How We Know,
New York: Harper and Row.
Dennett, D. (1987a), The Intentional Stance, Cambridge, MA: Bradford/MIT Press.
Dennett, D. (1987b), Fast thinking. In The Intentional Stance, Cambridge, MA: Bradford/MIT Press.
Dennett, D. (1991), Consciousness Explained, London: Penguin.
Dennett, D. (1995), Darwin’s Dangerous Idea, London: Penguin.
Descartes, R. (1641), Meditations. In E. Haldane and G. Ross (1931), Philosophical
Works of Descartes, vol. 1, Cambridge: Cambridge University Press. Reprinted by
Dover Publications (1955).
Descartes, R. (1649), Passions of the Soul. In E. Haldane and G. Ross (1931), Philo¬
sophical Works of Descartes, vol. 1, Cambridge: Cambridge University Press.
Reprinted by Dover Publications (1955).
Devitt, M. (1989). A narrow representational theory of mind. In S. Silvers (ed.) (1989),
Re Representations, Dordrecht: Kluwer.
Devitt, M. (1991), Why Fodor can’t have it both ways. In Loewer and Rey (1991).
Devitt, M. (1996), Coming to Our Senses, Cambridge: Cambridge University Press.
Dinsmore, J. (ed.) (1992), The Symbolic and Connectionist Paradigms, Hillsdale, NJ:
Lawrence Erlbaum.
Dretske, F. (1981), Knowledge and the Flow of Information, Cambridge, MA: Bradford/MIT Press.
Dretske, F. (1986a), Aspects of representation. In Brand and Harnish (1986).
Dretske, F. (1986b), Misrepresentation. In R. Bogdan (ed.) (1986), Belief: Form,
Content and Function, Oxford: Oxford University Press.
Dretske, F. (1993), Conscious experience. Mind, 102: 263-83.
Dretske, F. (1995), Naturalizing the Mind, Cambridge, MA: Bradford/MIT Press.
Dreyfus, H. (1972/9), What Computers Can't Do, rev. edn, New York: Harper and Row.
Dreyfus, H., and Dreyfus, S. (1988), Making a mind vs. modeling the brain: artificial
intelligence back at a branchpoint. Reprinted in Graubard (1988).
Dunlop, C., and Fetzer, J. (1993), Glossary of Cognitive Science, New York: Paragon House.
Ebbinghaus, H. (1885), Memory: A Contribution to Experimental Psychology, Columbia
Teacher’s College [1913].
Edelman, G. (1989), The Remembered Present: A Biological Theory of Consciousness, New
York: Basic Books.
Elman, J. (1990a), Finding structure in time. Cognitive Science, 14: 213-52.
Elman, J. (1990b), Representation and structure in connectionist models. In G. Altman
(ed.) (1990), Cognitive Models of Speech Processing, Cambridge, MA: Bradford/MIT
Press.
Elman, J. (1992), Grammatical structure and distributed representations. In Davis
(1992).
Elman, J. (1993), Learning and development in neural networks: the importance of
starting small. Cognition, 48(1): 71-99.
Elman, J. et al. (1996), Rethinking Innateness, Cambridge, MA: Bradford/MIT
Press.
Eysenck, M., and Keane, M. (1990), Cognitive Psychology: A Student’s Handbook,
Hillsdale, NJ: Lawrence Erlbaum.
Fancher, R. E. (1979), Pioneers of Psychology, New York: W. W. Norton.
Feigenbaum, E., and Feldman, E. (eds) (1963), Computers and Thought, New York:
McGraw-Hill. Reissued (1995) Cambridge, MA: Bradford/MIT Press.
Feldman, J. (1989), Neural representation of conceptual knowledge. In Nadel et al.
(1989).
Feldman, J., and Ballard, D. (1982), Connectionist models and their properties. Cogni¬
tive Science, 6: 205-54.
Field, H. (1977), Logic, meaning and conceptual role. Journal of Philosophy, July:
379-409.
Fikes, R., and Nilsson, N. (1971), STRIPS: a new approach to the application of
theorem proving to problem solving. Artificial Intelligence, 2: 189-208.
Findler, N. (ed.) (1979), Associative Networks, New York: Academic Press.
Finger, S. (1994), Origins of Neuroscience: A History of Explorations into Brain Func¬
tion, Oxford: Oxford University Press.
Flanagan, O. (1991), The Science of the Mind, 2nd edn, Cambridge, MA: Bradford/MIT Press.
Flanagan, O. (1992), Consciousness Reconsidered, Cambridge, MA: Bradford/MIT
Press.
Fodor, J. (1975), The Language of Thought, Cambridge, MA: Harvard University Press.
Fodor, J. (1980a), Methodological solipsism considered as a research strategy in cog¬
nitive science. The Behavioral and Brain Sciences, 3(1): 63-73. Reprinted in Fodor
(1981b).
Fodor, J. (1980b), Commentary on Searle, The Behavioral and Brain Sciences, 3: 431-2.
Fodor, J. (1981a), The mind-body problem. Scientific American, 244(1): 114-23.
Fodor, J. (1981b), Representations, Cambridge, MA: Bradford/MIT Press.
Fodor, J. (1983), Modularity of Mind, Cambridge, MA: Bradford/MIT Press.
Fodor, J. (1984), Semantics, Wisconsin style, Synthese, 59: 231-50. Reprinted in Fodor
(1990).
Fodor, J. (1985), Precis of The Modularity of Mind, The Behavioral and Brain Sciences,
8: 1-6. Commentary on Fodor (1985), The Behavioral and Brain Sciences, 8: 7-42.
Reprinted in Fodor (1990).
Fodor, J. (1986), The Modularity of Mind, and Modularity of Mind: Fodor’s Response.
In Z. Pylyshyn and W. Demopolous (eds) (1986), Meaning and Cognitive Structure,
Norwood, NJ: Ablex.
Fodor, J. (1987), Psychosemantics, Cambridge, MA: Bradford/MIT Press.
Fodor, J. (1989), Why should the mind be modular? In G. Alexander (ed.), Reflections
on Chomsky, Oxford: Blackwell. Reprinted in Fodor (1990).
Fodor, J. (1990), A Theory of Content, Cambridge, MA: Bradford/MIT Press.
Fodor, J. (1991), Replies. In Loewer and Rey (1991).
Fodor, J. (1994), The Elm and the Expert, Cambridge, MA: Bradford/MIT Press.
Fodor, J. (1998), In Critical Condition, Cambridge, MA: Bradford/MIT Press.
Fodor, J., and Lepore, E. (1991), Why meaning (probably) isn’t conceptual role. Mind
and Language, 4. Reprinted in S. Stich and T. Warfield (eds) (1994), Mental Repre¬
sentation: A Reader, Oxford: Blackwell.
Fodor, J., and Lepore, E. (1992), Holism: A Shopper’s Guide, Cambridge, MA:
Blackwell.
Fodor, J., and Pylyshyn, Z. (1988), Connectionism and cognitive architecture: a criti¬
cal analysis. In S. Pinker and J. Mehler (eds). Connections and Symbols, Cambridge,
MA: Bradford/MIT Press.
Forster, K. (1978), Accessing the mental lexicon. In E. Walker (ed.) (1978), Explorations in the Biology of Language, Cambridge, MA: Bradford/MIT Press.
Franklin, S., and Garzon, M. (1991), Neural computability. In O. Omidvar (ed.) (1991),
Progress on Neural Networks, vol. 1, Norwood, NJ: Ablex.
Frege, G. (1979), Begriffsschrift. Trans. T. Bynum (1979) as Concept Script, Oxford:
Oxford University Press.
Frege, G. (1892), On sense and reference. Reprinted in Harnish (1994).
Frege, G. (1918), The thought: a logical inquiry, Mind 65: 289-311. Reprinted (with
correction) in Harnish (1994).
French, P. et al. (eds) (1986), Midwest Studies in Philosophy, X, Minneapolis: Univer¬
sity of Minnesota Press.
French, R. (1990), Subcognition and the Turing test. Mind, 99: 53-65.
Gall, F. (1835), On the Functions of the Brain and of Each of Its Parts, 6 volumes,
Boston, MA: Marsh, Capen, and Lyon.
Gandy, R. (1988), The confluence of ideas in 1936. In R. Herken (ed.) (1988), The Uni¬
versal Turing Machine: A Half Century Survey, Oxford: Oxford University Press.
Gardner, H. (1985), The Mind’s New Science, New York: Basic Books.
Garfield, J. (ed.) (1987), Modularity in Knowledge Representation and Natural Language
Understanding, Cambridge, MA: Bradford/MIT Press.
Garfield, J. (ed.) (1990), Foundations of Cognitive Science, New York: Paragon House.
Gazzaniga, M. (ed.) (1995), The Cognitive Neurosciences, Cambridge, MA: Brad¬
ford/MIT Press.
Gazzaniga, M., and LeDoux, J. (1978), The Integrated Mind, New York: Plenum Press.
Geschwind, N. (1979), Specialization of the human brain. In The Brain, San Francisco:
Freeman.
Glymour, C. (1992), Thinking Things Through, Cambridge, MA: Bradford/MIT Press.
Godel, K. (1931), [trans. from the original German as] “On formally undecidable
propositions of Principia Mathematica and related systems." In Davis (1965).
Goel, V. (1992), Are computational explanations vacuous? Proceedings of the 14th Annual Conference of the Cognitive Science Society, Hillsdale, NJ: Lawrence
Erlbaum.
Goldberg, S., and Pessin, A. (1997), Gray Matters: An Introduction to the Philosophy of
Mind, Armonk, NY: M. E. Sharpe.
Goldman, A. I. (1993a), The psychology of folk psychology. The Behavioral and Brain
Sciences, 16(1): 15-28.
Goldman, A. I. (1993b), Consciousness, folk psychology, and cognitive science. Con¬
sciousness and Cognition, 2: 364—82.
Goldman, A. I. (ed.) (1993c), Readings in Philosophy and Cognitive Science, Cambridge,
MA: Bradford/MIT Press.
Goldman, A. I. (1993d), Philosophical Applications of Cognitive Science, Boulder, CO:
Westview Press.
Goschke, T, and Koppelberg, S. (1991), The concept of representation and the rep¬
resentation of concepts in connectionist models. In Ramsey, Stich, and Rumelhart
(1991).
Graubard, S. (ed.) (1988), The Artificial Intelligence Debate: False Starts, Real Founda¬
tions, Cambridge, MA: MIT Press.
Herrnstein, R., and Boring, E. (eds) (1966), A Source Book in the History of Psychology, Cambridge, MA: Harvard University Press.
Hewitt, C. (1971), Procedural embedding of knowledge in PLANNER. In Proceedings
of the Second Joint Conference on Artificial Intelligence, pp. 167-82, London: British
Computer Society.
Hewitt, C. (1990), The challenge of open systems. In D. Partridge and Y. Wilks (eds)
(1990), The Foundations of AI: A Sourcebook, Cambridge: Cambridge University
Press.
Hilgard, E. (1987), Psychology in America, New York: Harcourt Brace Jovanovich.
Hillis, D. (1985), The Connection Machine, Cambridge, MA: Bradford/MIT
Press.
Hinton, G. (1992), How neural networks learn from experience. Scientific American,
September.
Hirschfeld, L., and Gelman, S. (eds) (1994), Mapping the Mind: Domain Specificity in
Cognition and Culture, Cambridge: Cambridge University Press.
Hirst, W. (ed.) (1988), The Making of Cognitive Science: Essays in Honor of George A.
Miller, Cambridge: Cambridge University Press.
Hobbes, T. (1651), Leviathan. Reprinted Indianapolis: Bobbs-Merrill (1958).
Hodges, A. (1983), Turing: The Enigma, New York: Simon and Schuster.
Hofstadter, D. (1979), Godel, Escher, Bach, New York: Basic Books.
Hofstadter, D. (1981), The Turing test: a coffee house conversation. In D. Hofstadter
and D. Dennett (eds) (1981), The Mind's I, New York: Basic Books.
Hopcroft, J., and Ullman, J. (1969), Formal Languages and Their Relation to Automata,
New York: Addison Wesley.
Horgan, T. (1994), Computation and mental representation. In S. Stich and T. Warfield
(eds) (1994), Mental Representation, Cambridge, MA: Blackwell.
Horgan, T, and Tienson, J. (eds) (1991), Connectionism and the Philosophy of Mind,
Dordrecht: Kluwer Academic.
Hornik, K. et al. (1989), Multilayer feedforward networks are universal approximators.
Neural Networks, 2: 359-66.
Horst, S. (1996), Symbols, Computation and Intentionality; A Critique of the Computa¬
tional Theory of Mind, Berkeley: University of California Press.
Hubel, D., and Wiesel, T. (1979), Brain mechanisms of vision. In The Brain, San
Francisco: Freeman.
Hume, D. (1739), A Treatise of Human Nature, Oxford: Oxford University Press (1880).
Hunt, M. (1993), The Story of Psychology, New York: Doubleday.
Ince, D. (ed.) (1992), Mechanical Intelligence: Collected Works of A. M. Turing,
Amsterdam: North-Holland.
Jackson, F. (1986), What Mary didn't know, Journal of Philosophy, 83: 291-5.
Jacob, P. (1997), What Minds Can Do, Cambridge: Cambridge University Press.
James, W. (1890), The Principles of Psychology, New York: Dover.
James, W. (1892), Psychology (Briefer Course), New York: Holt.
Jeffrey, R. (1991), Formal Logic: Its Scope and Limits, 3rd edn. New York: McGraw-
Hill.
Levine, D. (1991), Introduction to Neural Cognitive Modeling, Hillsdale, NJ: Lawrence
Erlbaum.
Levine, J. (1983), Materialism and qualia: the explanatory gap. Pacific Philosophical
Quarterly, 64: 354-61.
Lindsay, P., and Norman, D. (1972), Human Information Processing: An Introduction to
Psychology, New York; Academic Press.
Lisker, L., and Abramson, A. (1964), A cross-language study of voicing of initial stops:
acoustical measurements. Word, 20: 384-422.
Loewer, B., and Rey, G. (eds) (1991), Meaning in Mind, Oxford: Blackwell.
Lormand, E. (1994), Qualia! (now playing at a theater near you). Philosophical Topics,
22(1 and 2): 127-56.
Luger, G. (ed.) (1995), Computation and Intelligence, Cambridge, MA: Bradford/MIT
Press.
Lycan, W. (ed.) (1990), Mind and Cognition, Oxford: Blackwell.
Lycan, W. (1997), Consciousness as internal monitoring. In Block et al. (1997).
McClamrock, R. (1995), Existential Cognition: Computational Minds in the World,
Chicago: University of Chicago Press.
McClelland, J. (1981), Retrieving general and specific information from stored knowledge of specifics. Proceedings of the Third International Conference of the Cognitive Science Society, Berkeley, 1981.
McClelland, J., and Rumelhart, D. (1981), An interactive activation model of context
effects in letter perception: Part I. An account of basic findings. Psychological Review,
88(5): 375-407.
McClelland, J., and Rumelhart, D. (1988), Explorations in Parallel Distributed Processing, Cambridge, MA: Bradford/MIT Press.
McCorduck, P. (1979), Machines Who Think, San Francisco: W. H. Freeman.
McCulloch, G. (1995), The Mind and Its World, London: Routledge.
McCulloch, W. (1965), Embodiments of Mind, Cambridge, MA: MIT Press.
McCulloch, W., and Pitts, W. (1943), A logical calculus of the ideas immanent
in nervous activity. Reprinted in W. McCulloch (1965), and in Anderson and
Rosenfeld (1998).
McDermott, D. (1976), Artificial intelligence meets natural stupidity, SIGART
Newsletter, 57. Reprinted in Haugeland (1981).
McDermott, D. (1986), A critique of pure reason. Research Report YALEU/CSD/RR
no. 480.
MacDonald, C., and MacDonald, G. (eds) (1995), Connectionism: Debates on Psychological Explanation, vol. 2, Oxford: Blackwell.
McGinn, C. (1982), The structure of content. In A. Woodfield (ed.) (1982), Thought
and Object, Oxford: Oxford University Press.
McGinn, C. (1991), The Problem of Consciousness, Cambridge, MA: Blackwell.
Machlup, F., and Mansfield, U. (eds) (1983), The Study of Information: Interdisciplinary Messages, New York: Wiley.
McLaughlin, B. (1993), The connectionism/classicism battle to win souls. Philosophical Studies, 71: 163-90.
Nadel, L., et al. (1986), The neurobiology of mental representation. In Brand and Harnish (1986).
Nadel, L., Cooper, L., Culicover, P., and Harnish, R. (eds) (1989), Neural Connections, Mental Computation, Cambridge, MA: Bradford/MIT Press.
Nagel, T. (1974), What is it like to be a bat? Philosophical Review, 83: 435-50.
Nagel, T. (1993), What is the mind-body problem? In Experimental and Theoretical Studies in Consciousness, New York: Wiley.
Neisser, U. (1967), Cognitive Psychology, New York: Appleton-Century-Crofts.
von Neumann, J. (1945), First draft of a report on the EDVAC. In Aspray and Burks
(1987).
Newell, A. (1973), Production systems: models of control structures. In W. Chase (ed.), Visual Information Processing, New York: Academic Press.
Newell, A. (1980), Physical symbol systems. Cognitive Science, 4(2): 135-83. Reprinted in D. Norman (ed.) (1981), Perspectives on Cognitive Science, Norwood, NJ: Ablex.
Newell, A. (1983), Reflections on the structure of an interdiscipline. In Machlup and
Mansfield (1983).
Newell, A. (1990), Unified Theories of Cognition, Cambridge, MA: Harvard University
Press.
Newell, A., Rosenbloom, P., and Laird, J. (1989), Symbolic structures for cognition. In
Posner (1989).
Newell, A., and Simon, H. (1972), Human Problem Solving, Englewood Cliffs, NJ: Prentice-Hall.
Newell, A., and Simon, H. (1976), Computer science as an empirical inquiry. Communications of the ACM, 19(3): 113-26. Reprinted in Haugeland (1997).
Nietzsche, F. (1887), On the Genealogy of Morals, trans. (1967), New York: Vintage Books.
Nilsson, N. (1965), Learning Machines, New York: McGraw-Hill. Reissued with an
introduction by T. Sejnowski as The Mathematical Foundations of Learning Machines,
San Mateo, CA: Morgan Kaufman (1990).
Norman, D. (1981), What is cognitive science? In D. Norman (ed.) (1981), Perspectives
on Cognitive Science, Norwood, NJ: Ablex.
Norman, D., and Rumelhart, D. (1975), Explorations in Cognition, San Francisco: Freeman.
Odifreddi, P. (1989), Classical Recursion Theory, Amsterdam: North-Holland.
O’Keefe, J., and Nadel, L. (1978), The Hippocampus as a Cognitive Map, Oxford: Oxford
University Press.
Osherson, D., et al. (eds) (1990, 1995), An Invitation to Cognitive Science, vols 1-3; (1998) vol. 4, Cambridge, MA: Bradford/MIT Press, 1st, 2nd edns.
Papert, S. (1988), One AI or many? Reprinted in Graubard (1988).
Partridge, D. (1996), Representation of knowledge. In M. Boden (ed.) (1996),
Artificial Intelligence, New York: Academic Press.
Pavlov, I. (1927), Conditioned Reflexes, Oxford: Oxford University Press. Reprinted by Dover Books.
Pavlov, I. (1928), Lectures on Conditioned Reflexes (trans. W. Horsley Gantt), New York:
Liveright.
Penrose, R. (1989), The Emperor’s New Mind, New York: Penguin Books.
Pessin, A., and Goldberg, S. (eds) (1996), The Twin Earth Chronicles, New York: M. E. Sharpe.
Plunkett, K., and Elman, J. (1997), Exercises in Rethinking Innateness, Cambridge, MA: Bradford/MIT Press.
Pohl, I., and Shaw, A. (1981), The Nature of Computation, Rockville, MD: Computer Science Press.
Posner, M. (ed.) (1989), Foundations of Cognitive Science, Cambridge, MA:
Bradford/MIT Press.
Post, E. (1943), Formal reductions of the general combinatorial problem, American
Journal of Mathematics, 65: 197-268.
Putnam, H. (1960), Minds and machines. In S. Hook (ed.) (1960), Dimensions of Mind,
New York: New York University Press. Reprinted in Putnam (1975a).
Putnam, H. (1967), The nature of mental states. Reprinted in Putnam (1975a).
Originally titled “Psychological Predicates” and published in W. Capitan and D.
Merrill (eds) (1967), Art, Mind, and Religion, Pittsburgh: University of Pittsburgh
Press.
Putnam, H. (1975a), Philosophical Papers, vol. 2, Cambridge: Cambridge University
Press.
Putnam, H. (1975b), The meaning of meaning. Reprinted in Putnam (1975a), and in
Harnish (1994).
Putnam, H. (1981), Brains in a vat. In Reason, Truth and History, Cambridge:
Cambridge University Press.
Putnam, H. (1988), Representation and Reality, Cambridge, MA: Bradford/MIT Press.
Pylyshyn, Z. (1979), Complexity and the study of artificial and human intelligence.
Reprinted in Haugeland (1981).
Pylyshyn, Z. (1983), Information science: its roots and relations as viewed from the
perspective of cognitive science. In Machlup and Mansfield (1983).
Pylyshyn, Z. (1984), Computation and Cognition, Cambridge, MA: Bradford/MIT Press.
Pylyshyn, Z. (1989), Computing in cognitive science. In Posner (1989).
Quillian, M. (1966), Semantic Memory, Report AFCRL-66-189, Bolt Beranek and Newman, Cambridge, MA.
Quillian, M. (1968), Semantic memory. In M. Minsky (ed.) (1968), Semantic Information Processing, Cambridge, MA: MIT Press. Reprinted in Collins and Smith (1988).
Quinlan, P. (1991), Connectionism and Psychology, New York: Harvester-Wheatsheaf.
Ramsey, W. (1992), Connectionism and the philosophy of mental representation. In S.
Davis (ed.) (1992), Connectionism: Theory and Practice, Oxford: Oxford University
Press.
Ramsey, W., Stich, S., and Garon, J. (1991), Connectionism, eliminativism, and the future of folk psychology. In Ramsey, Stich, and Rumelhart (1991). Also in Greenwood (1991). Reprinted in MacDonald and MacDonald (1995), and Stich (1996).
Ramsey, W., Stich, S., and Rumelhart, D. (eds) (1991), Philosophy and Connectionist
Theory, Hillsdale, NJ: Lawrence Erlbaum.
Ratcliff, R. (1990), Connectionist models of recognition memory: constraints imposed
by learning and forgetting functions. Psychological Review, 97(2): 285-308.
Rey, G. (1986), What's really going on in Searle's Chinese room? Philosophical Studies, 50: 169-85.
Rey, G. (1997), Contemporary Philosophy of Mind, Oxford: Blackwell.
Rich, E. (1983), Artificial Intelligence, New York: McGraw-Hill.
Rolls, E., and Treves, A. (1998), Neural Networks and Brain Function, Oxford: Oxford
University Press.
Rosenberg, J. (1990a), Connectionism and cognition, Acta Analytica, 6: 33-46.
Reprinted in Haugeland (1997).
Rosenberg, J. (1990b), Treating connectionism properly: reflections on Smolensky, Psychological Research, 52: 163-74.
Rosenblatt, F. (1958), The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65: 386-408. Reprinted in J. Anderson and E. Rosenfeld (eds) (1988).
Rosenblatt, F. (1962), Principles of Neurodynamics, Washington, DC: Spartan
Books.
Rosenthal, D. (1986), Two concepts of consciousness. Philosophical Studies, 49: 329-59.
Rosenthal, D. (1997), A theory of consciousness. In Block et al. (1997).
Rumelhart, D. (1989), The architecture of mind: a connectionist approach. In Posner
(1989). Reprinted in Haugeland (1997).
Rumelhart, D., and McClelland, J. (eds) (1986a), Parallel Distributed Processing, vols 1
and 2, Cambridge, MA: Bradford/MIT Press.
Rumelhart, D., and McClelland, J. (1986b), On learning the past tenses of English
verbs. In Rumelhart and McClelland (1986a), vol. 2, ch. 17.
Rumelhart, D., and Zipser, D. (1986), Feature discovery by competitive learning. In
Rumelhart and McClelland (1986a), vol. 1, ch. 5.
Russell, B. (1918), The philosophy of logical atomism. The Monist. Reprinted (1985)
La Salle, IL: Open Court.
Scarborough, D., and Sternberg, S. (eds) (1998), An Invitation to Cognitive Science, vol.
4, Cambridge, MA: Bradford/MIT Press.
Schank, R., and Abelson, R. (1977), Scripts, Plans, Goals and Understanding, Hillsdale,
NJ: Lawrence Erlbaum. Chs 1-3 reprinted in Collins and Smith (1988).
Schneider, W. (1987), Connectionism: is it a paradigm shift for psychology? Behavior Research Methods, Instruments, and Computers, 19: 73-83.
Schwartz, J. (1988), The new connectionism: developing relations between neuroscience and artificial intelligence. In Graubard (1988).
Scott, D. (1967), Some definitional suggestions for automata theory. Journal of Computer and System Sciences, 1: 187-212.
Seager, W. (1999), Theories of Consciousness, New York: Routledge.
Searle, J. (1969), Speech Acts, Cambridge: Cambridge University Press.
Searle, J. (1979), What is an intentional state? Mind, January: 74-92.
Searle, J. (1980), Minds, brains and programs. Behavioral and Brain Sciences, 3: 417-24.
Reprinted in Haugeland (1997).
Searle, J. (1983), Intentionality, Cambridge: Cambridge University Press.
Searle, J. (1990a), Consciousness, explanatory inversion, and cognitive science. Behavioral and Brain Sciences, 13(4): 585-642.
Searle, J. (1990b), Is the brain a digital computer? Proceedings of the American Philosophical Association, 64(3): 21-37. Incorporated into Searle (1992), ch. 9.
Searle, J. (1991), Is the brain's mind a computer program? Scientific American, 262(1): 26-31.
Searle, J. (1992), The Rediscovery of the Mind, Cambridge, MA: Bradford/MIT Press.
Searle, J. (1997), The Mystery of Consciousness, New York: New York Review.
Sechenov, I. (1863), Reflexes of the Brain. Excerpted in R. Herrnstein and E. Boring (eds) (1965), A Source Book in the History of Psychology, Cambridge, MA: Harvard University Press.
Segal, G. (1996), The modularity of theory of mind. In P. Carruthers and P. Smith
(eds) (1996), Theories of Mind, Cambridge: Cambridge University Press.
Seidenberg, M., et al. (1982), Automatic access to the meanings of ambiguous words
in context. Cognitive Psychology, 14: 489-537.
Sejnowski, T., and Rosenberg, C. (1987), Parallel networks that learn to pronounce English text. Complex Systems, 1: 145-68.
Selfridge, O. (1959), Pandemonium: a paradigm for learning. In D. Blake et al. (eds),
Proceedings of the Symposium on the Mechanization of Thought Processes, National
Physical Laboratory, London: HMSO.
Selfridge, O., and Neisser, U. (1960), Pattern recognition by machine. Scientific
American, August: 60-8.
Shannon, C., and Weaver, W. (1949), The Mathematical Theory of Communication,
Urbana: University of Illinois Press.
Shear, J. (ed.) (1997), Explaining Consciousness: The Hard Problem, Cambridge, MA:
Bradford/MIT Press.
Shepherd, G. (1991), Foundations of the Neuron Doctrine, Oxford: Oxford University
Press.
Shepherd, G. (1994), Neurobiology, 3rd edn, Oxford: Oxford University Press.
Sherrington, C. (1906), The Integrative Action of the Nervous System, reprinted (1973),
New York: Arno Press.
Shortliffe, E. H. (1976), Computer-Based Medical Consultations: MYCIN, New York: North-Holland.
Siegelmann, H., and Sontag, E. (1992), On the computational power of neural nets. Proceedings of the Fifth ACM Workshop on Computational Learning Theory, 440-449.
Skinner, B. F. (1938), The Behavior of Organisms, Englewood Cliffs, NJ: Prentice-Hall.
Skinner, B. F. (1957), Verbal Behavior, Englewood Cliffs, NJ: Prentice-Hall.
Skinner, B. F. (1976), About Behaviorism, New York: Knopf.
Smith, L. (1986), Behaviorism and Logical Positivism, Stanford, CA: Stanford University Press.
interactive activation and competition, 275, 276
interpretational semantics, 109, 172, 176, 203, 215, 111
introspection, 15, 24, 33, 34, 38, 39, 229
intuitive knowledge, 339
intuitive processing, 339, 344, 346, 348, 357
intuitive processor, 321
Ittelson, B., xvii
Jackson, F., 10, 97, 223, 237, 238, 240, 243, 270, 271, 393
Jacob, P., 222
James, W., 10, 15, 19, 20, 23-36, 39, 40-2, 44, 77, 80, 326n, 335, 336
James neuron, 26, 80
Jeffrey, R., 151, 176, 177, 412
Jets and Sharks, 275-7, 287-8, 332, 337, 350, 370
Johnson-Laird, P., 9
Kandel, E., 78
Karmiloff-Smith, A., 224
Keane, M., 176
Keil, F., 9
Kilian, J., 409
Kim, J., 10, 222, 223, 270, 271, 358
Klahr, D., 223
Kobes, B., 122
Koelliker, 68, 70
Kohler, W., 41
Koppelberg, S., 329
Kosslyn, S., 11, 233, 392
Kuhn, T., 289
Kurzweil, R., 152
Lackner, J., 211
Laird, J., 223
Lambda-calculus, 130
Lambda-definability, 130, 131
language of thought, 186, 190-3, 201, 206-7, 220, 222, 244-5, 331-2, 374, 399
Larson, T., xvii
Lashley, K., 43, 44, 45, 49, 52, 53, 71, 72, 77, 289, 335, 421
Lavine, S., xvii
law of effect, 41, 52
law of exercise, 41, 52
Leahey, T., 44, 46, 47, 53
learning rate, 283, 308, 309, 311, 312, 313, 314
LeDoux, J., 209, 211
Leeuwenhoek, 66
Lehman, J., 224
Lentin, A., 151
Lepore, E., 10, 223, 271
Lettvin, J., 72, 79, 95, 96, 99, 102
levels of computation, 404-7
Levesque, 176
Levine, D., 328, 329
Levine, J., 240, 270
linear separability, 90, 91, 94, 101, 316-17
  and the Delta rule, 316
linguistics, 1, 2, 3, 4, 7, 8, 48
Lisker, L., 229
LISP, 117, 131
local representations, 276, 279, 281, 318, 319, 323, 327, 351
  problems with, 319
location addressability, 324
Locke, John, 16, 17, 19, 20, 21, 22, 23, 34, 36, 105, 222
Loewer, B., 271
logic, 15, 20, 79, 81, 84, 100, 106, 151, 155, 157, 158, 175, 176, 177, 259, 263, 373, 400, 412
logical behaviorism, 195-6
Logical Computing Machine, 355
logical positivism, 41
Lormand, E., 270
Ludlow, P., 36
Luger, G., 270
luminous room, 234-5, 266-7, 270
Lutz, R., 329
Lycan, W., 10, 223
Lyons, J., xvii
machine functionalism, 185-6, 201, 202, 204, 221, 223
machine table, 15, 127, 128
Maloney, J., xvii, 222, 270
Marcel, A., 270
Mark I Perceptron, 89
Marr, D., 5, 10, 223, 321, 400, 401, 402, 411
Marr’s “tri-level” hypothesis, 5, 400-1
Marx, M., 35, 53
mass action, 71, 72
Massaro, D., 393
materialism see physicalism
Maturana, H., 79, 95
McClamrock, R., 176
McClelland, J., 276, 277, 288, 295, 299, 306, 319, 321, 328, 329, 334, 335, 350, 358, 384, 385, 386, 392, 393
McCorduck, P., 123
McCulloch, G., 36, 77, 222
McCulloch, W., 15, 79, 80, 82, 86, 90, 93, 95, 99n, 100, 101, 102, 408
McCulloch and Pitts neuron, 81, 82
McDermott, D., 175, 176
MacDonald, C., 329
MacDonald, G., 329
McGinn, C., 240, 271
McLaughlin, B., 337, 366, 367, 392
McLeod, P., 102, 328, 392
McTeal, M., 122
means-end reasoning, 31-2
memory
  indirect vs. direct (or random), 148
  location addressable vs. content addressable, 148
meta-cognition, 236
meta-consciousness, 209, 211, 223, 236
Metropolis, 152
microfeatures, 321-2, 338
Mill, James, 18
Mill, J. S., 18, 19, 20
Miller, G., 2, 15, 47, 48, 49, 51, 53
Mills, S., 35, 358
mind-body problem, 3, 39, 40, 135, 185, 193-8, 202, 223, 225, 265, 331
Minsky, M., 79, 82, 85, 90, 99n, 102, 123, 130, 151, 152, 164, 165, 176, 250, 393, 409, 412
misrepresentation problem see “disjunction problem”
modes of presentation, 191-3; see also aspectual shapes
modularity, 138, 143-4, 160, 186, 215-14, 376, 378, 382
  hard vs. soft, 217, 218
  internal vs. external, 217
modus ponens, 159, 160
Moody, T., 10
Morris, R., 329
moving edge detector, 96
Mr. Bubblehead, 188, 200, 220
Müller-Lyer, 216
multidimensional scaling, 322
multiple realizability, 85, 185, 220, 252, 269
MYCIN, 121
Mylopoulos, J., 176
Nadel, L., xvii, 329
Nagel, T., 237, 240, 241, 243, 423
narrow cognitive theory, 249
narrow content, 186, 249, 250, 271, 427
“natural kinds” argument, 377, 378, 391
natural language processing, 107, 119, 121-2
Neisser, U., 46, 52, 147
nerve impulse, 74-5
nerve-net theory, 66-9
NETtalk, 275-6, 280-3, 286-8, 293, 303, 318-19, 322, 332-3, 335, 336, 350, 380, 387, 410
neural architecture hypothesis, 344
neural level, 25, 33, 50, 73, 321, 338-9, 341, 344, 348-9, 362
Neural Networks see connectionist networks
"There are two problems that perennially plague courses in cognitive science:
students from one discipline lack an adequate background in the other
disciplines crucial to the subject, and, even within their own discipline,
students often don't possess the historical perspective necessary to understand
how contemporary problems arose and why they are important. Harnish's
rich and well-informed book is designed to solve both of these problems,
and it succeeds admirably."
Stephen Stich, Rutgers University
Tracing the history of central concepts from the nineteenth century to the present, this
study surveys the significant contributions of philosophy, psychology, neuroscience, and
computer science. The volume also investigates the theory of mind from two
contrasting approaches: the digital computer vs. neural network models.
Authoritative and comprehensive, this is the ideal text for introductory courses in cognitive
science as well as an excellent supplementary text for courses in philosophy of mind.
Cover illustration: Victor Vasarely, Etude en Bleu, 1930. Janus Pannonius Museum,
Pécs, Hungary / © ADAGP, Paris and DACS, London 2001.