
Psychological Review, 1988, Vol. 95, No. 4, 492-527
Copyright 1988 by the American Psychological Association, Inc. 0033-295X/88/$00.75

Toward an Instance Theory of Automatization

Gordon D. Logan
University of Illinois

This article presents a theory in which automatization is construed as the acquisition of a domain-specific knowledge base, formed of separate representations, instances, of each exposure to the task.
Processing is considered automatic if it relies on retrieval of stored instances, which will occur only
after practice in a consistent environment. Practice is important because it increases the amount
retrieved and the speed of retrieval; consistency is important because it ensures that the retrieved
instances will be useful. The theory accounts quantitatively for the power-function speed-up and
predicts a power-function reduction in the standard deviation that is constrained to have the same
exponent as the power function for the speed-up. The theory accounts for qualitative properties as
well, explaining how some may disappear and others appear with practice. More generally, it provides
an alternative to the modal view of automaticity, arguing that novice performance is limited by a
lack of knowledge rather than a scarcity of resources. The focus on learning avoids many problems
with the modal view that stem from its focus on resource limitations.

Automaticity is an important phenomenon in everyday mental life. Most of us recognize that we perform routine activities quickly and effortlessly, with little thought and conscious awareness—in short, automatically (James, 1890). As a result, we often perform those activities on "automatic pilot" and turn our minds to other things. For example, we can drive to dinner while conversing in depth with a visiting scholar, or we can make coffee while planning dessert. However, these benefits may be offset by costs. The automatic pilot can lead us astray, causing errors and sometimes catastrophes (Reason & Mycielska, 1982). If the conversation is deep enough, we may find ourselves and the scholar arriving at the office rather than the restaurant, or we may discover that we aren't sure whether we put two or three scoops of coffee into the pot.

Automaticity is also an important phenomenon in skill acquisition (e.g., Bryan & Harter, 1899). Skills are thought to consist largely of collections of automatic processes and procedures (e.g., Chase & Simon, 1973; Logan, 1985b). For example, skilled typewriting involves automatic recognition of words, translation of words into keystrokes, and execution of keystrokes (Salthouse, 1986). Moreover, the rate of automatization is thought to place important limits on the rate of skill acquisition: LaBerge and Samuels (1974) claimed that beginning readers may not be able to learn to read for meaning until they have learned to identify words and letters automatically.

Over the last decade, considerable progress has been made in understanding the nature of automaticity and the conditions under which it may be acquired (for reviews, see Kahneman & Treisman, 1984; LaBerge, 1981; Logan, 1985b; Schneider, Dumais, & Shiffrin, 1984). There is evidence that automatic processing differs qualitatively from nonautomatic processing in several respects: Automatic processing is fast (Neely, 1977; Posner & Snyder, 1975), effortless (Logan, 1978, 1979; Schneider & Shiffrin, 1977), autonomous (Logan, 1980; Posner & Snyder, 1975; Shiffrin & Schneider, 1977; Zbrodoff & Logan, 1986), stereotypic (McLeod, McLaughlin, & Nimmo-Smith, 1985; Naveh-Benjamin & Jonides, 1984), and unavailable to conscious awareness (Carr, McCauley, Sperber, & Parmelee, 1982; Marcel, 1983). There is also evidence that automaticity is acquired only in consistent task environments, as when stimuli are mapped consistently onto the same responses throughout practice. Most of the properties of automaticity develop through practice in such environments (Logan, 1978, 1979; Schneider & Fisk, 1982; Schneider & Shiffrin, 1977; Shiffrin & Schneider, 1977).
This research was supported by National Science Foundation Grant BNS 8510365. It was reported, in part, at the annual meeting of the Psychonomic Society, November 1987, in Seattle, Washington.

Several people contributed to the work reported here through helpful discussions and comments on various drafts of the article. I would like to thank Jerry Busemeyer, Tom Carr, Dick Foster, Rob Kail, Larry Maloney, Doug Medin, Brian Ross, Eliot Smith, and Jane Zbrodoff for their criticisms and suggestions. I am grateful to Walter Schneider and an anonymous reviewer for helpful comments on the article. I would also like to thank Randy Tonks and Leslie Kane for testing the subjects.

Correspondence concerning this article should be addressed to Gordon D. Logan, Department of Psychology, University of Illinois, 603 East Daniel Street, Champaign, Illinois 61820.

Automaticity is commonly viewed as a special topic in the study of attention. The modal view links automaticity with a single-capacity model of attention, such as Kahneman's (1973). It considers automatic processing to occur without attention (e.g., Hasher & Zacks, 1979; Logan, 1979, 1980; Posner & Snyder, 1975; Shiffrin & Schneider, 1977), and it interprets the acquisition of automaticity as the gradual withdrawal of attention (e.g., LaBerge & Samuels, 1974; Logan, 1978; Shiffrin & Schneider, 1977). The modal view has considerable power, accounting for most of the properties of automaticity: Automatic processing is fast and effortless because it is not subject to attentional limitations. It is autonomous, obligatory, or uncontrollable because attentional control is exerted by allocating capacity;


a process that does not require capacity cannot be controlled by allocating capacity. Finally, it is unavailable to consciousness because attention is the mechanism of consciousness and only those things that are attended are available to consciousness (e.g., Posner & Snyder, 1975).

However, there are serious problems with the modal view. Some investigators questioned the evidence that automatic processing is free of attentional limitations (e.g., Cheng, 1985; Ryan, 1983). Others found evidence of attentional limitations in tasks that are thought to be performed automatically (e.g., Hoffman, Nelson, & Houck, 1983; Kahneman & Chajczyk, 1983; Paap & Ogden, 1981; Regan, 1981). The single-capacity view of attention, in which the modal view is articulated, has been seriously challenged by multiple-resource theories, which argue that many resources other than attention may limit performance (e.g., Navon & Gopher, 1979; Wickens, 1984).1 Others argued that performance may not be limited by any resources, attentional or otherwise (e.g., Allport, 1980; Navon, 1984; Neisser, 1976). Moreover, there is growing dissatisfaction with the idea that automatization reflects the gradual withdrawal of attention (e.g., Hirst, Spelke, Reaves, Caharack, & Neisser, 1980; Kolers, 1975; Spelke, Hirst, & Neisser, 1976). Critics argue that the idea is empty unless the learning mechanism can be specified.

The purpose of this article is to propose a theory of automaticity that describes the nature of automatic processing and says how it may be acquired without invoking the single-capacity theory of attention or the idea of resource limitations. The theory is first described generally, then a specific version of the theory is developed to account for the speed-up and reduction in variability that accompany automatization. The theory is then fitted to data from two different tasks—lexical decision and alphabet arithmetic—and experiments that test the learning assumptions of the theory are reported. Finally, the qualitative properties of automaticity are discussed in detail, implications of the theory are developed and discussed, and the theory is contrasted with existing theories of skill acquisition and automatization.

Automaticity as Memory Retrieval

The theory relates automaticity to memorial aspects of attention rather than resource limitations. It construes automaticity as a memory phenomenon, governed by the theoretical and empirical principles that govern memory. Automaticity is memory retrieval: Performance is automatic when it is based on single-step direct-access retrieval of past solutions from memory. The theory assumes that novices begin with a general algorithm that is sufficient to perform the task. As they gain experience, they learn specific solutions to specific problems, which they retrieve when they encounter the same problems again. Then, they can respond with the solution retrieved from memory or the one computed by the algorithm. At some point, they may gain enough experience to respond with a solution from memory on every trial and abandon the algorithm entirely. At that point, their performance is automatic.2 Automatization reflects a transition from algorithm-based performance to memory-based performance.

The idea behind the theory is well illustrated in children's acquisition of simple arithmetic. Initially, children learn to add single-digit numbers by counting (i.e., incrementing a counter by one for each unit of each addend), a slow and laborious process, but one that guarantees correct answers, if applied properly. With experience, however, children learn by rote the sums of all pairs of single digits, and rely on memory retrieval rather than counting (Ashcraft, 1982; Siegler, 1987; Zbrodoff, 1979). Once memory becomes sufficiently reliable, they rely on memory entirely, reformulating more complex problems so that they can be solved by memory retrieval.

Main Assumptions

The theory makes three main assumptions: First, it assumes that encoding into memory is an obligatory, unavoidable consequence of attention. Attending to a stimulus is sufficient to commit it to memory. It may be remembered well or poorly, depending on the conditions of attention, but it will be encoded. Second, the theory assumes that retrieval from memory is an obligatory, unavoidable consequence of attention. Attending to a stimulus is sufficient to retrieve from memory whatever has been associated with it in the past. Retrieval may not always be successful, but it occurs nevertheless. Encoding and retrieval are linked through attention; the same act of attention that causes encoding also causes retrieval. Third, the theory assumes that each encounter with a stimulus is encoded, stored, and retrieved separately. This makes the theory an instance theory and relates it to existing theories of episodic memory (Hintzman, 1976; Jacoby & Brooks, 1984), semantic memory (Landauer, 1975), categorization (Jacoby & Brooks, 1984; Medin & Schaffer, 1978), judgment (Kahneman & Miller, 1986), and problem solving (Ross, 1984).

These assumptions imply a learning mechanism—the accumulation of separate episodic traces with experience—that produces a gradual transition from algorithmic processing to memory-based processing. They also suggest a perspective on theoretical issues that is fundamentally different from the modal perspective, which was derived from assumptions about resource limitations. But are the assumptions valid? Possibly. Each one receives some support.

The assumption of obligatory encoding is supported by studies of incidental learning and comparisons of incidental and intentional learning. The evidence overwhelmingly indicates that people can learn a lot without intending to; incidental learning is usually closer to intentional learning than to chance. The intention to learn seems to have little effect beyond focusing attention on the items to be learned (Hyde & Jenkins, 1969; Mandler, 1967). However, the assumption of obligatory encoding does not imply that all items will be encoded equally well. Attention to an item may be sufficient to encode it into memory, but the quality of the encoding will depend on the quality and quantity of attention. As the levels-of-processing literature has shown, subjects remember the same items better when they attend to their semantic features rather than their physical features (Craik & Tulving, 1975). Dual-task studies show that subjects remember less under dual-task conditions than under single-task conditions (Naveh-Benjamin & Jonides, 1984; Nissen & Bullemer, 1987).3

The assumption of obligatory retrieval is supported by studies of Stroop and priming effects, in which attention to an item activates associations in memory that facilitate performance in some situations and interfere with it in others (for a review, see Logan, 1980). The most convincing evidence comes from studies of episodic priming that show facilitation from newly learned associates (McKoon & Ratcliff, 1980; Ratcliff & McKoon, 1978, 1981). The assumption of obligatory retrieval does not imply that retrieval will always be successful or that it will be easy. Many factors affect retrieval time (Ratcliff, 1978), including practice on the task (Pirolli & Anderson, 1985). The prevailing conditions in studies of automaticity are generally good for retrieval: The same items have been presented many times and so should be easy to retrieve. The algorithm, if used in parallel with retrieval, will screen out any slow or difficult retrievals by finishing first and providing a solution to the task.

The assumption of an instance representation for learning contrasts with the modal view. Many theories assume a strength representation (e.g., LaBerge & Samuels, 1974; MacKay, 1982; Schneider, 1985), and others include strength as one of several learning mechanisms (e.g., Anderson, 1982). In instance theories, memory becomes stronger because each experience lays down a separate trace that may be recruited at the time of retrieval; in strength theories, memory becomes stronger by strengthening a connection between a generic representation of a stimulus and a generic representation of its interpretation or its response.

Instance theories have been pitted against strength theories in studies of memory and studies of categorization. In memory, strength is not enough; the evidence is consistent with pure instance theories or strength theories supplemented by instances (for a review, see Hintzman, 1976). In categorization, abstraction is the analog of strength. Separate exposures are combined into a single generic, prototypic representation, which is compared with incoming stimuli. The evidence suggests that prototypes by themselves are not enough; instances are important in categorization (for a review, see Medin & Smith, 1984). The success of instance theories in these domains suggests that they may succeed as well in explaining automatization. Experiment 5 pits the instance theory against certain strength theories.

The instance representation also implies that automatization is item-based rather than process-based. It implies that automatization involves learning specific responses to specific stimuli. The underlying processes need not change at all—subjects are still capable of using the algorithm at any point in practice (e.g., adults can still add by counting), and memory retrieval may operate in the same way regardless of the amount of information to be retrieved. Automaticity is specific to the stimuli and the situation experienced during training. Transfer to novel stimuli and situations should be poor. By contrast, the modal view suggests that automatization is process-based, making the underlying process more efficient, reducing the amount of resources required or the number of steps to be executed (e.g., Anderson, 1982; Kolers, 1975; LaBerge & Samuels, 1974; Logan, 1978). Such process-based learning should transfer just as well to novel situations with untrained stimuli as it does to familiar situations with trained stimuli.

There is abundant evidence for the specificity of automatic processing in the literature on consistent versus varied mapping. Practice improves performance on the stimuli and mapping rules that were experienced during training but not on other stimuli or even other rules for mapping the same stimuli onto the same responses (for a review, see Shiffrin & Dumais, 1981). The experiments presented later in the article provide further evidence.

The theory differs from process-based views of automatization in that it assumes that a task is performed differently when it is automatic than when it is not; automatic performance is based on memory retrieval, whereas nonautomatic performance is based on an algorithm. This assumption may account for many of the qualitative properties that distinguish automatic and nonautomatic performance. The properties of the algorithm may be different from the properties of memory retrieval; variables that affect the algorithm may be different from the variables that affect memory retrieval. In particular, variables that affect performance early in practice, when it is dominated by the algorithm, may not affect performance later in practice, when it is dominated by memory retrieval. Thus, dual-task interference and information-load effects may diminish with practice because they reflect difficulties involved in using the initial algorithm that do not arise in memory retrieval.

1. The concept of automaticity has not been articulated well in multiple-resource theories (Navon & Gopher, 1979; Wickens, 1984). For the most part, investigators claim that automatic processes use resources efficiently, reiterating the main assumption underlying the modal, single-capacity view. But they don't specify which resources are used more efficiently. The discussion focuses on a single resource, leaving it for the reader to decide why that particular resource should be used more efficiently or whether the other resources come to be used more efficiently as well (also see Allport, 1980; Logan, 1985b).

2. Strictly speaking, the instance theory considers performance to be automatic when it is based on memory retrieval whether that occurs on the 10th trial or the 10,000th. Early in practice, before subjects rely on memory entirely, performance may be automatic on some trials (i.e., those on which memory provides a solution) but not on others (i.e., those on which the algorithm computes a solution).

3. The assumption of obligatory encoding is similar to Hasher and Zacks's (1979, 1984) notion of automatic encoding of certain stimulus attributes (e.g., frequency of presentation, location). However, Hasher and Zacks assumed that encoding is not influenced by manipulations of attention, intention, or strategy, whereas the instance theory assumes only that it is obligatory. Hasher and Zacks's position was challenged recently by evidence that encoding can be influenced by orienting tasks and dual-task conditions. Greene (1984) and Fisk and Schneider (1984) showed that subjects remembered the frequency of stimuli presented during semantic orienting tasks better than the frequency of stimuli presented during orienting tasks that focused on physical or structural features. Naveh-Benjamin and Jonides (1986) showed that subjects remembered the frequency of stimuli presented under single-task conditions better than the frequency of stimuli presented under dual-task conditions (also see Naveh-Benjamin, 1987). The instance theory can accommodate these findings easily. It assumes that attention to an item will have some impact on memory; it does not assume that all conditions of attention produce the same impact.

This theme is developed in detail in a subsequent section of the article.

The assumption that automatic and nonautomatic processing are different does not imply that they have opposite characteristics, as many current treatments of automaticity imply. Automatic processing may be well defined (having the properties of memory retrieval), but nonautomatic processing may not be. The set of algorithms that are possible in the human cognitive system is probably unbounded, and it seems highly unlikely that any single property or set of properties will be common to all algorithms, or even to most of them. Thus, the present theory does not endorse the strategy of defining automaticity by listing dichotomous properties (e.g., serial vs. parallel; effortful vs. effortless) that distinguish it from another specific kind of processing (e.g., attentional, Logan, 1980; controlled, Shiffrin & Schneider, 1977; effortful, Hasher & Zacks, 1979; strategic, Posner & Snyder, 1975; and conscious, Posner & Klein, 1973).

Quantitative Properties of Automaticity

The theory is primarily intended to account for the major quantitative properties of automatization, the speed-up in processing and reduction in variability that result from practice. The speed-up is the least controversial of the properties of automaticity. It is observed in nearly every task that is subject to practice effects, from cigar rolling to proving geometry theorems (for a review, see Newell & Rosenbloom, 1981). In each case, the speed-up follows a regular function, characterized by substantial gains early in practice that diminish with further experience. More formally, the speed-up follows a power function,

RT = a + bN^-c,

where RT is the time required to do the task, N is the number of practice trials, and a, b, and c are constants. a represents the asymptote, which is the limit of learning determined perhaps by the minimum time required to perceive the stimuli and emit a response; b is the difference between initial performance and asymptotic performance, which is the amount to be learned; and c is the rate of learning. The values of these parameters vary between tasks, but virtually all practice effects follow a power function.4

The power-function speed-up has been accepted as a nearly universal description of skill acquisition to such an extent that it is treated as a law, a benchmark prediction that theories of skill acquisition must make to be serious contenders (see, e.g., Anderson, 1982; Crossman, 1959; MacKay, 1982; Newell & Rosenbloom, 1981).5 If they cannot account for the power law, they can be rejected immediately. The instance theory predicts a power-function speed-up.

The reduction in variability that accompanies automatization is not well understood, largely because most theories neglect it. The literature shows that variability decreases with practice (e.g., McLeod, McLaughlin, & Nimmo-Smith, 1985; Naveh-Benjamin & Jonides, 1984), but the form of the function has not been specified; there is nothing akin to the power law. The instance theory predicts that the standard deviation will decrease as a power function of practice. Moreover, it predicts a strong constraint between the power function for the mean and the one for the standard deviation: they must have the same exponent, c.

The predictions for the power law follow naturally from the main assumptions of the instance theory—obligatory encoding, obligatory retrieval, and instance representation. The predictions are developed mathematically in Appendix A. The remainder of this section provides an informal account.

The theory assumes that each encounter with a stimulus is encoded, stored, and retrieved separately. Each encounter with a stimulus is assumed to be represented as a processing episode, which consists of the goal the subject was trying to attain, the stimulus encountered in pursuit of the goal, the interpretation given to the stimulus with respect to the goal, and the response made to the stimulus. When the stimulus is encountered again in the context of the same goal, some proportion of the processing episodes it participated in are retrieved. The subject can then choose to respond on the basis of the retrieved information, if it is coherent and consistent with the goals of the current task, or to run off the relevant algorithm and compute an interpretation and a response.

The simplest way to model the choice process is in terms of a race between memory and the algorithm—whichever finishes first controls the response. Over practice, memory comes to dominate the algorithm because more and more instances enter the race, and the more instances there are, the more likely it is that at least one of them will win the race. The power-function speed-up and reduction in variability are consequences of the race.

Memory Retrieval and the Power Law for Means and Standard Deviations

The memory process is itself a race. Each stored episode races against the others, and the subject can respond on the basis of memory as soon as the first episode is retrieved. The race can be modeled by assuming that each episode has the same distribution of finishing times. Thus, the finishing time for a retrieval process involving N episodes will be the minimum of N samples from the same distribution, which is a well-studied problem in the statistics of extremes (e.g., Gumbel, 1958). Intuition suggests that the minimum will decrease as N increases, but the question is, will it decrease as a power function of N?

4. Power functions are linear when plotted in logarithmic coordinates. Thus,

log(RT - a) = log(b) - c log(N).

The power-function speed-up is sometimes called the log-log linear law of learning (Newell & Rosenbloom, 1981).

5. Mazur and Hastie (1978) argued that learning curves were hyperbolic rather than exponential and reanalyzed a great deal of data to demonstrate their point. However, (a) they analyzed accuracy data rather than reaction times—and the theories of the power-function speed-up do not necessarily make predictions about accuracy—and (b) the hyperbolic function is a power function with an exponent of -1, so it is a special case of the power law. Consequently, Mazur and Hastie's arguments and analyses do not contradict the power law.
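As a concrete check on the power function RT = a + bN^-c and its log-log linear form, log(RT - a) = log(b) - c log(N), the sketch below (with made-up parameter values, not fitted values from the article) shows the diminishing gains from successive doublings of practice and recovers the exponent c from the slope between two practice levels.

```python
import math

# Illustrative parameters for RT = a + b * N**(-c): a is the asymptote,
# b the amount to be learned, c the learning rate (values are made up).
a, b, c = 400.0, 600.0, 0.7

def rt(n):
    """Power-function speed-up: time to do the task after n practice trials."""
    return a + b * n ** (-c)

# Substantial early gains that diminish with further experience:
gains = [rt(n) - rt(2 * n) for n in (1, 2, 4, 8, 16)]
print([round(g, 1) for g in gains])  # each doubling of practice buys less

# Log-log linear form: log(RT - a) = log(b) - c * log(N),
# so the slope between any two practice levels recovers -c.
slope = (math.log(rt(32) - a) - math.log(rt(2) - a)) / (math.log(32) - math.log(2))
print(round(slope, 3))  # -0.7
```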

It would be difficult to prove mathematically that the minimum of N samples from every conceivable distribution decreases as a power function of N, but it is possible to prove it for a broad class of initial distributions (all positive-valued distributions). That proof is presented in Appendix A. The power-function speed-up is a consequence of two counteracting factors: On the one hand, there are more opportunities to observe an extreme value as sample size increases, so the expected value of the minimum will decrease. But, on the other hand, the more extreme the value, the lower the likelihood of sampling a value that is even more extreme, so the reduction in the minimum that results from increasing sample size by m will decrease as sample size increases. The first factor produces the speed-up; the second factor produces the negative acceleration that is characteristic of power functions.

Intuition also suggests that variability will decrease as N increases: The losers of the race restrict the range that the winner can occupy. The more losers, the more severe the restriction, and thus, the smaller the variability. Moreover, the same factors that limit the reduction in the mean limit the reduction in the range that the minimum can occupy, so the reduction in variability should be negatively accelerated like the reduction in the mean. But does it follow a power function? And if so, is the exponent the same as the one for the mean?

The proofs in Appendix A show that the entire distribution of minima decreases as a power function of sample size, not just the mean of the distribution. This implies a power-function reduction in the standard deviation as well as the mean. Because the mean and standard deviation are both functions of the same distribution, the exponent of the power function for the mean will equal the exponent of the power function for the standard deviation.

These predictions are unique to the instance theory. No other theory of skill acquisition or automaticity predicts a power-function reduction in the standard deviation and constrains its exponent to equal the exponent for the reduction in the mean.

The Power Law and the Race Between the Algorithm and Memory Retrieval

According to the instance theory, automatization reflects a transition from performance based on an initial algorithm to performance based on memory retrieval. The transition may be explained as a race between the algorithm and the retrieval process, governed by the statistical principles described in the preceding section and in Appendix A. In effect, the algorithm races against the fastest instance retrieved from memory. It is bound to lose as training progresses because its finishing time (distribution) stays the same while the finishing time for the retrieval process decreases. At some point, performance will depend on memory entirely, either as a consequence of statistical properties of the race or because of a strategic decision to trust memory and abandon the algorithm.

Does the transition from the algorithm to memory retrieval compromise the power-law predictions derived in the preceding section and in Appendix A? Strictly speaking, it must. The proofs assume independent samples from n identical distributions, and the distribution for algorithm finishing times is likely to be different from the distribution of retrieval times. But in practice, the deviations from the predicted power law may be small. It is hard to make general analytical predictions because the algorithm and memory distributions may differ in many ways. They may have the same functional form but different parameters (the exponential case is analyzed in Appendix A) or they may have different forms.

Any distortion that does occur will be limited to the initial part of the learning curve. Once performance depends on memory entirely it will be governed by the power law. Before that—during the transition from the algorithm to memory retrieval—the proofs no longer guarantee a power law.

I explored the effects of various transitions on the power-law predictions through Monte Carlo simulation, using truncated normal distributions for the algorithm and the memory process. Earlier simulations showed that the means and standard deviations of the minimum of n samples from a truncated normal decreased as a power function of n. The current simulations addressed whether a race against another truncated normal with different parameters would distort the power-function fits. The algorithm was represented by nine different distributions, factorially combining three means (350, 400, and 450 ms) and three standard deviations (80, 120, and 160 ms). The memory process was represented by two distributions with different means (400 and 500 ms) and the same standard deviation (100 ms). These parameters represent a reasonably wide range of variation, including cases in which memory is faster and less variable than the algorithm, as well as cases in which it is slower and more variable. This is important because the outcome of the race will depend on the mean and standard deviation of the parent distributions. Other things equal, the distribution with the faster mean will win the race more often. Also, the distribution with the larger standard deviation will win more often because extreme values are more likely the larger the standard deviation.

The effects of 1 to 32 presentations were simulated. The simulations assumed that the algorithm was used on every trial (i.e., the "subject" never chose to abandon it in favor of memory) and that each prior episode was retrieved on every trial. Thus, for a trial on which a stimulus appeared for the nth time, reaction time was set equal to the minimum of n samples, one from the distribution representing the algorithm and n - 1 from the distribution representing the memory process. There were 240 simulated trials for each number of presentations (1-32), which approximates the number of observations per data point in the experiments reported in subsequent sections of the article.

The simulations provided three types of data: mean reaction times, standard deviation of reaction times, and the proportion of trials on which the algorithm won the race. Power functions were fitted to the means and standard deviations simultaneously (using STEPIT; Chandler, 1965), such that the exponent was constrained to be the same for means and standard deviations as the instance theory predicts. If the race with the algorithm distorts the relation between means and standard deviations, the constrained power functions will not fit well.

Means and standard deviations. The simulated mean reaction times appear in Figure 1, and the standard deviations appear in Figure 2. The points represent the simulated data and
INSTANCE THEORY OF AUTOMATIZATION 497

Figure 1. Reaction times from simulations of a race between an algorithm and a memory retrieval process as a function of the number of presentations of an item. (Points represent the simulated data; lines, fitted power functions. Power functions are constrained to have exponents equal to those of power functions fitted to the standard deviations, which are plotted in Figure 2. Each panel portrays three algorithms with different means, 350, 400, and 450 from the bottom function to the top, and the same standard deviation: 80 in the top two panels, 120 in the middle two, and 160 in the bottom two. They race against a memory process with a constant mean, 400 in the left-hand panels and 500 in the right, and a constant standard deviation of 100 in all panels.)

the lines represent fitted power functions, constrained to have the same exponent for means and standard deviations. The exponents appear in Table 1.

Two points are important to note: First, the means and standard deviations both decreased as the number of presentations increased, and the trend was well fit by the constrained power

Figure 2. Standard deviations from the simulated race between an algorithm and a memory retrieval process. (Points represent the simulated data; lines, fitted power functions. Power functions are constrained to have exponents equal to those of power functions fitted to the means, which are plotted in Figure 1. Each panel portrays three algorithms with different standard deviations, 80, 120, and 160 from the bottom function to the top, and the same mean: 350 in the top panels, 400 in the middle, and 450 in the bottom. They race against a memory process with a constant mean, 400 in the left panels and 500 in the right, and a constant standard deviation of 100.)

functions (r² ranged from .992 to 1.000, with a median of .998; root-mean-squared deviations between predicted and observed values ranged from 2.38 ms to 5.93 ms, with a median of 3.81 ms). Thus, the race does not appear to compromise the power law; the instance theory can predict power functions even when memory retrieval must race against a faster or slower algorithm.

Second, the race distorts the form of the power function; the exponents from the constrained fits are systematically different from the fits to the memory process by itself. The exponents from the race increase in absolute magnitude as the algorithm mean increases and as the algorithm standard deviation increases.

Table 1
Exponents of Power Functions Fitted to the Simulated Data in Figures 1 and 2, Which Represent Races Between Various Memory Processes and Algorithms

                 Memory = 400, 100:     Memory = 500, 100:
                 Algorithm SD           Algorithm SD
Algorithm M      80     120    160      80     120    160
350              0.156  0.340  0.367    0.185  0.193  0.248
400              0.251  0.301  0.465    0.227  0.252  0.251
450              0.368  0.479  0.550    0.230  0.316  0.452

Note. The exponent for the 400, 100 memory process is 0.302; the exponent for the 500, 100 memory process is 0.309.

The simulated data illustrate the effects of "qualitative" differences between automatic and nonautomatic performance. Each panel has three different versions of the algorithm racing against a single version of the memory retrieval process, and in each case, initial differences due to the algorithm disappear after a few presentations. Averaged over all six panels, the difference between the 450-ms algorithm and the 350-ms algorithm decreased from 97 ms on the first presentation to 15 ms on the 32nd presentation. If the algorithm were a memory search process and the different conditions corresponded to memory set sizes of 1 and 5, the initial slope of 24 ms/item (i.e., 97/4) would approach zero (i.e., 15/4) after 32 presentations.

Another kind of "qualitative difference" can be seen by comparing the two panels in each row. In this case, differences that were not there initially emerge with practice. The conditions of the algorithm are the same in the right and the left panels, but the memory retrieval process is much slower in the right panel. Averaged over the three pairs of panels, the difference between right and left was -1 ms on the first presentation and 84 ms on the 32nd presentation.

The standard deviations show "qualitative differences" similar to those observed for the means. Initial differences due to the algorithm were substantially reduced as the number of presentations increased and memory retrieval came to dominate performance. Averaging over conditions, the initial difference between the 160-ms algorithm and the 80-ms algorithm decreased to 22 ms by the 32nd presentation.

P(algorithm first). Figure 3 presents a different perspective on the outcome of the race: the probabilities that the algorithm finished first. The probabilities were affected by the mean and the variance of the algorithm, increasing as the algorithm became faster and more variable. They were also affected by the mean of the memory process, decreasing as the memory process became faster.

The memory process came to dominate the algorithm relatively quickly. Averaged over all conditions, the algorithm won the race only 21% of the time after 16 presentations and 16% of the time after 32 presentations. The speed of memory retrieval had a large effect on the outcome: The 400-ms retrieval process won 90% of the time after 16 presentations and 93% of the time after 32 presentations, whereas the 500-ms retrieval process won only 68% of the time after 16 presentations and 76% of the time after 32 presentations.

The dominance of the memory process after such a small number of trials is important because it suggests that memory retrieval will eventually win the race regardless of the speed and variability of the algorithm. Possibly, memory retrieval will come to win the race even if the algorithm becomes faster with practice. Thus, memory retrieval may provide a back-up mechanism for automatization and skill acquisition even when skill and automaticity are acquired through other mechanisms.

Memorability and the Rate of Learning

The speed with which the memory process comes to dominate the algorithm has important implications for studies of automaticity: It suggests that automatization can occur very quickly (also see Logan, 1988; Naveh-Benjamin & Jonides, 1984; E. Smith & Lerner, 1986). This means that it is feasible to study automatization in a single session, as was done in some of the experiments reported in subsequent sections of this article.

An important question raised by the instance theory is why automatization takes so long in other experiments. In search studies, for example, several thousand trials spanning 10 to 20 sessions are often necessary to produce automaticity (e.g., Shiffrin & Schneider, 1977). The discrepancy may arise for at least three reasons: First, the criterion for automaticity is different in the previous studies. In Shiffrin and Schneider's search studies, automatization was not considered to be complete until the slope of the search function reached zero (but see Cheng, 1985; Ryan, 1983). From the present perspective, automatization may never be complete, in that each additional instance will have some effect on memory, even if its effect does not appear in the primary performance data (also see Logan, 1985b). The present perspective also suggests there may be a shift in the direction of automaticity after only a few trials, and this shift may be a more important phenomenon to study than the zero slope addressed by Shiffrin, Schneider, and others (also see Logan, 1979, 1985b).

Second, differences in the apparent rate of automatization may be artifacts of the way the data are plotted. Instance theory argues that means and standard deviations should be plotted as a function of the number of trials per stimulus because each trial potentially adds a new instance to memory. But data are usually plotted against sessions, disregarding the number of trials per stimulus. Consequently, a task with more stimuli per session (and thus fewer trials per stimulus) will appear to be

Figure 3. Probability that the algorithm finished first in the simulated race between an algorithm and a memory retrieval process. (Each panel portrays three algorithms with different means, 350, 400, and 450, and the same standard deviation: 80 in the top two panels, 120 in the middle two, and 160 in the bottom two. They race against a memory process with a constant mean, 400 in the left-hand panels and 500 in the right, and a constant standard deviation of 100 in all panels.)
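The probabilities plotted in Figure 3 need not be simulated: the algorithm finishes first when its single sample beats all n - 1 memory samples, so P(algorithm first) is the integral of the algorithm's finishing-time density times the memory survivor function raised to the power n - 1. A numerical sketch, again assuming normal distributions for illustration:

```python
import math

def p_algorithm_first(alg_mean, alg_sd, mem_mean, mem_sd, n,
                      lo=-2000.0, hi=3000.0, steps=20000):
    """P(algorithm beats the minimum of n - 1 memory samples),
    by midpoint integration over the algorithm's density."""
    if n == 1:
        return 1.0
    dt = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * dt
        z = (t - alg_mean) / alg_sd
        f_alg = math.exp(-0.5 * z * z) / (alg_sd * math.sqrt(2 * math.pi))
        # P(one memory sample exceeds t): the normal survivor function
        surv = 0.5 * math.erfc((t - mem_mean) / (mem_sd * math.sqrt(2)))
        total += f_alg * surv ** (n - 1) * dt
    return total

for n in (1, 2, 16, 32):
    print(n, round(p_algorithm_first(450, 80, 500, 100, n), 3))
```

For the 450-ms, 80-ms algorithm racing the 500, 100 memory process, this gives about .65 at n = 2 and falls steadily as n grows, matching the shape of the curves in Figure 3.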

learned more slowly than a task with fewer stimuli per session (and thus more trials per stimulus), even if the rate of learning per stimulus is equal.

Third, differences in the rate of automatization may reflect the memorability of the stimuli. According to instance theory, stimuli that are easy to remember will show evidence of automaticity relatively quickly, whereas stimuli that are hard to remember will take a long time to show evidence of automaticity. It may be that the letter arrays studied by Shiffrin, Schneider, and others are hard to remember. By contrast, the simulations so far have assumed that each and every encounter with a stimulus is encoded and retrieved.

The effect of memorability can be modeled by slowing down the retrieval time or by varying the probability that a stimulus will be encoded and retrieved. Most likely, memorability has both effects (Ratcliff, 1978), but it is interesting to consider them separately. The effects of slowing down retrieval can be seen in Table 1: The learning rate, as measured by the power-function exponent, was slower for the 500-ms memory process than for the 400-ms memory process when they both raced against the same algorithm.

The effect of varying retrieval probability is to slow the rate of learning by reducing the effective number of traces in the race. Reaction times and standard deviations will still decrease as a power function of n, but with a smaller exponent, reflecting a slower rate of learning.⁶

⁶ If the original power function has an exponent of c and some proportion p of the n episodes are stored and retrieved, then when performance is plotted against n, the data will follow a power function with an exponent k < c. That is,

n^k = (pn)^c.

Taking logs of both sides yields

k log n = c(log p + log n),

and solving for k yields

k = c[(log p + log n)/log n].

k must be less than c because (log p + log n)/log n is less than one (log p is negative for p < 1).

These analyses suggest that there may be no discrepancy between the rate of automatization predicted by instance theory and the rate observed in typical studies of automaticity. It may be possible to observe automatization in a single session, as was suggested earlier, as long as the number of stimuli is small and the stimuli themselves are easy to remember (also see Logan, 1988; Naveh-Benjamin & Jonides, 1984; Smith & Lerner, 1986).

Conclusions

The theoretical analyses in this section showed that the quantitative properties of automaticity can be accounted for by an instance theory that assumes that subjects store and retrieve representations of each individual encounter with a stimulus. According to the instance theory, automatization reflects a shift from reliance on a general algorithm to reliance on memory for past solutions. Thus, automatization reflects the development of a domain-specific knowledge base; nonautomatic performance is limited by a lack of knowledge rather than by the scarcity of resources.

The power-function speed-up is a statistical consequence of the main assumptions (obligatory encoding, obligatory retrieval, and instance representation), and the theory makes new predictions about the reduction in variability. Standard deviations should decrease as power functions of the number of trials, and the exponents should be the same as the exponents for the means.

What is interesting about the speed-up is that none of the underlying processes change over practice. The algorithm stays the same, and so does memory retrieval. Moreover, stimuli are encoded in exactly the same way at every point in practice; each trial results in the encoding and storage of a processing episode. All that changes is the knowledge base that is available to the subject.

It remains to be shown that the instance theory can account for experimental data; that is the purpose of the next section.

Fitting Theory to Data

One of the basic premises of the instance theory is that nonautomatic processes need not have anything in common, except that they are replaced by memory retrieval as practice progresses. In keeping with that premise, the instance theory was fitted to data from two very different tasks, lexical decision and alphabet arithmetic. The lexical decision task is fast, relatively effortless, and possibly parallel, whereas the alphabet arithmetic task is slow, very effortful, and clearly serial. According to the instance theory, after sufficient practice, both tasks should be performed in the same way: by retrieving past solutions from memory. This prediction was tested indirectly, by examining the fit of power functions to the means and standard deviations of reaction times in both tasks and by looking for evidence of item-specific learning in both tasks. The theoretical analysis in the preceding section makes the strong prediction that both means and standard deviations should decrease as power functions of the number of trials with the same exponent, c. The theoretical analysis also predicts that learning should be item-based; subjects learn specific responses to specific stimuli, and what they learn should not transfer well to different stimuli. These predictions are tested in Experiments 1-4. Experiment 5 pits the instance theory against certain strength theories.

Experiment 1

Experiment 1 involved a lexical decision task. Subjects were presented with strings of four letters, and their task was to indicate as quickly as possible whether or not the letter string was an English word. The experiment was intended to resemble typical studies of the development of automaticity, in which subjects are exposed to the same items repeatedly throughout practice. Subjects made lexical decisions on the same set of 10 words and 10 nonwords until each word and nonword was presented 16 times. In this paradigm, the average lag between successive presentations is held constant over repetitions, but the degree of nonspecific practice on the task increases with the number of repetitions. To control for nonspecific practice, subjects performed another 16 blocks of lexical decisions, but 10 new words and 10 new nonwords were used in each block. Further details of the procedure and the results of analyses of variance (ANOVAs) on the mean reaction times and standard deviations are presented in Appendix B.
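Footnote 6's relation between storage probability and the measured exponent is simple enough to check numerically. A sketch, with illustrative values of c and p; nothing here is fitted to data:

```python
import math

def measured_exponent(c, p, n):
    """Exponent k satisfying n**k == (p * n)**c: the slope recovered when
    only a proportion p of the n episodes is stored and retrieved."""
    return c * (math.log(p) + math.log(n)) / math.log(n)

c, p = 0.75, 0.5   # hypothetical true exponent and storage probability
for n in (2, 8, 32):
    k = measured_exponent(c, p, n)
    assert k < c   # storing fewer traces always flattens the curve
    print(n, round(k, 3))
```

Note that k approaches c as n grows, so the flattening produced by unreliable encoding is most visible early in practice.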

Figure 4. Reaction times to old and new words and nonwords as a function of the number of presentations in the lexical decision task of Experiment 1.

The mean reaction times for new and repeated items are presented in Figure 4. There was some evidence of a general practice effect, which reduced reaction times for new items slightly over blocks. However, the specific practice effect was much stronger: Reaction times to repeated items decreased substantially over blocks, both absolutely and relative to the new-item controls. These effects were apparent for nonwords as well as words.

The means and standard deviations for repeated items appear in Figure 5, the means in the top panel and the standard deviations in the bottom. The points represent the observed data; the lines represent power functions fitted to the data under the constraint that the mean and standard deviation should have the same exponent. Words and nonwords were fitted separately. The estimated parameters for the power functions and measures of goodness of fit (r² and root-mean-squared deviation, or rmsd) appear in Table 2. Table 2 also contains parameters and goodness-of-fit measures for power functions fitted to means and standard deviations separately, to give some idea of the effect of constraining the exponent.

The data were well fit by power functions. Moreover, the constrained fit was almost as good as the unconstrained fit; rmsds were within 2 ms.

The contrast between repeated words and new words suggests that the repetition effect may be instance- or item-based, as the instance theory predicts. Process-based learning predicts no difference, yet a difference was observed. The repetition effect for nonwords is harder to interpret. Nonword reaction times could have decreased because subjects responded to them by remembering what they did when they last saw them, or because subjects responded to them by default (i.e., by failing to find evidence that they were words), and the default response became faster as the word decisions became faster. The present experiment does not allow us to decide between these alternatives. That was a major reason for investigating other schedules of repetition.

Figure 5. Reaction times (top panel) and standard deviations (bottom panel) as a function of the number of presentations in the lexical decision task of Experiment 1. (Points represent the data; lines represent the best-fitting power function. Power functions for means and standard deviations were constrained to have the same exponent.)

Experiment 2

Experiment 2 was intended to resemble typical studies of repetition effects in implicit and explicit memory: Subjects performed several blocks of lexical decision trials. Within each block of trials, some words and nonwords are presented once, some twice, some 4 times, some 6 times, some 8 times, and some 10 times. In this paradigm, the average lag between successive repetitions is confounded with the number of repetitions (being shorter the greater the number of repetitions), but the degree of nonspecific practice is relatively constant over number of repetitions. Also, old nonwords are mixed together with new and old words, so there should be no benefit for nonwords if subjects respond to nonwords by default.

The mean reaction times are plotted in the top panel of Figure 6, and the standard deviations are plotted in the bottom

Table 2
Parameter Estimates From Constrained and Separate Fits of Power Functions to Means and Standard Deviations of Reaction Times in the Lexical Decision Tasks of Experiments 1-3

             Experiment 1:    Experiment 2:    Experiment 3:    Experiment 3:
             Learning         Repetition       Mean lag 12      Mean lag 24
Parameter    Word   Nonword   Word   Nonword   Word   Nonword   Word   Nonword

Constrained fits
ART          485    482       478    518       501    541       518    574
BRT          101    216       106    143       87     141       95     115
C            0.749  0.563     1.033  1.078     1.622  0.702     0.816  0.880
ASD          81     46        84     95        88     89        86     99
BSD          43     157       47     76        54     84        64     75
r²           0.996  0.994     0.998  0.998     0.996  0.996     0.996  0.996
rmsd         8.9    12.2      5.6    5.6       9.5    10.2      8.6    9.6

Separate fits
ART          482    409       479    511       500    540       573    573
BRT          103    280       105    148       88     142       115    115
CRT          0.698  0.327     1.061  0.940     1.571  0.687     1.069  0.859
ASD          85     75        82     102       88     90        61     100
BSD          41     140       48     72        54     84        85     75
CSD          1.063  0.970     0.919  1.601     1.748  0.724     0.387  0.928
r²           0.996  0.996     0.998  0.998     0.996  0.998     0.998  0.996
rmsd         8.9    10.9      5.6    5.1       9.5    10.2      8.3    9.6

Note. ART = asymptote for mean reaction time; BRT = multiplicative constant for mean reaction time; ASD = asymptote for standard deviation; BSD = multiplicative constant for standard deviation; C = exponent fitted to means and standard deviations simultaneously; CRT = exponent fitted to means separately; CSD = exponent fitted to standard deviations separately; r² = squared correlation between observed and predicted values; rmsd = root-mean-squared deviation from prediction.

panel. The points represent the observed data, and the lines represent fitted power functions constrained such that means and standard deviations have the same exponent. As in Experiment 1, the constrained power-function fit was excellent for both the means and the standard deviations. The parameters of the constrained power functions and measures of goodness of fit appear in Table 2. Table 2 also contains parameters and measures of goodness of fit for power functions fitted to the means and standard deviations separately. Again, the constrained fit was nearly identical to the separate fit.

These results confirm the conclusion from Experiment 1: that the repetition effect is specific to individual stimuli, because repeated and new stimuli were mixed randomly in each block. Thus, subjects could not have adjusted speed-accuracy criteria, and so forth, in anticipation of repeated stimuli, as they could have in the previous experiment. Significantly, this conclusion applies to the nonwords as well as the words: Subjects responded to repeated nonwords faster than they responded to new words, so they could not have sped up their reaction times to repeated nonwords by default responding. Thus, the data suggest that subjects remembered their previous encounters with specific nonwords and with specific words, as instance theory predicts. It remains possible, however, that the benefit from repeated presentations could be an artifact of the shorter lag between successive repetitions for stimuli that are repeated more often. Experiment 3 was intended to address that issue.

Experiment 3

In Experiment 3, the number of repetitions was manipulated by varying the number of blocks in which a word or nonword appeared. Some words and nonwords were presented in only 1 block, others in 2 consecutive blocks, and others in 4, 8, and 16 consecutive blocks. Thus, the lag between successive presentations was held constant, like the first experiment, but old and new items were mixed randomly, like the second experiment. In addition, the mean lag between successive presentations was varied between subjects to see whether the confounding of lag with repetitions in Experiment 2 was likely to have affected the results. One half of the subjects had a mean lag of 12 items between successive presentations, and the other half had a mean lag of 24 items (see Appendix B for further details of the procedure).

The data from the mean lag 12 group are presented in Figure 7; the data from the mean lag 24 group are presented in Figure 8. In each case, the top panel contains the mean reaction times and the bottom panel contains the standard deviations. The points represent the observed data, and the lines represent constrained power functions fitted to the means and standard deviations. Mean lag had no significant effects on performance, neither main effects nor interactions (see Appendix B). For both lag conditions, reaction times and standard deviations decreased with repetition, as the instance theory predicts. The constrained power functions fitted the data very well, almost as well as the

Figure 6. Reaction times (top panel) and standard deviations (bottom panel) as a function of the number of presentations in the lexical decision task of Experiment 2. (Points represent the data; lines represent the best-fitting power function. Power functions for means and standard deviations were constrained to have the same exponent.)

separate fits (see Table 2 for parameter values and measures of goodness of fit).

These results indicate that the benefit from repetition is specific to individual stimuli because the repeated stimuli that showed benefit were mixed randomly with new stimuli. As in Experiment 2, this was true for nonwords as well as words, which suggests that subjects remembered individual encounters with specific nonwords as well as words. Unlike the previous experiment, there was no confound between lag and the number of presentations, so the increased benefit with multiple repetitions was not an artifact of lag. Indeed, the null effect of mean lag suggests that lag is not an important variable when varied within the limits of Experiments 1-3.

Figure 7. Reaction times (top panel) and standard deviations (bottom panel) as a function of the number of presentations for the mean-lag = 12 condition of the lexical decision task in Experiment 3. (Points represent the data; lines represent the best-fitting power function. Power functions for means and standard deviations were constrained to have the same exponent.)

Discussion of Experiments 1-3

Experiments 1-3 support several conclusions: First, they demonstrate a constraint between the means and standard deviations of reaction times that was predicted by the instance theory. The means and standard deviations both decreased as power functions of practice, and the functions had the same exponent. This was confirmed in each experiment. It was true for words and for nonwords, and it was true for three different schedules of repetition. The constraint between the means and the standard deviations amounts to predicting co-occurrence of different properties of automaticity, which has been a contentious issue in the recent literature (see Kahneman & Chajczyk, 1983; Paap & Ogden, 1981; Regan, 1981; vs. Logan, 1985b; Jonides, Naveh-Benjamin, & Palmer, 1985); it will be discussed in detail in a subsequent section.

Second, Experiments 1-3 indicate that the repetition effect is specific to individual stimuli, as the instance theory predicts. This was evident in the contrast between repeated items and new-item controls in Experiment 1 and in the contrast between

MEAN LAG 24: POWER FUNCTION FIT


nonwords; if words showed benefit relative to controls, non-
words should show a cost (e.g., Balota & Chumbley, 1984).

Experiment 4: Alphabet Arithmetic

Experiment 4 examined practice effects in an alphabet arith-


metic task. Subjects were asked to verify equations of the form
A + 2 = C, B + 3 = E, C+4 = G, and £> + 5 = 7. Subjects
typically reported that they performed the task by counting
through the alphabet one letter at a time until the number of
counts equaled the digit addend, and then comparing the cur-
rent letter with the presented answer. For example, E + 5 = K
would involve counting five steps through the alphabet (F, G,
H, I, J), comparing the J with the given answer, K, and respond-
6 8 10 12 ing false. Consistent with subjects' reports, the time to verify
NUM6ER OF PRESENTATIONS
A Words X Nonwords alphabet arithmetic equations increases linearly with the digit
addend (i.e., with the number of counts). The slope of the func-
tion is typically 400-500 ms per count, with an intercept of
MEAN LAG 24: POWER FUNCTION FIT
1,000 ms (Logan & Klapp, 1988).
This experiment was intended to provide a test of the instance
theory that was very different from the lexical decision tasks
reported earlier. Reaction times in alphabet arithmetic are
nearly an order of magnitude longer than reaction times in lexi-
cal decision, and the practice effects extend over several ses-
sions. It was also intended to mimic children's acquisition of
addition, which involves a transition from a serial counting al-
gorithm to memory retrieval (e.g., Ashcraft, 1982; Siegler,
1987; Zbrodoff, 1979). Presumably, with enough practice, adult
subjects would learn to perform the alphabet arithmetic task by
memory retrieval instead of counting.
Figure 8. Reaction times (top panel) and standard deviations (bottom panel) as a function of the number of presentations for the mean-lag = 24 condition of the lexical decision task in Experiment 3. (Points represent the data; lines represent the best-fitting power function. Power functions for means and standard deviations were constrained to have the same exponent.)

Subjects were trained on 10 letters from one half of the alphabet (either A through J, or K through T). Each letter appeared with four different addends (2, 3, 4, and 5) in a true equation and in a false equation. False equations were true plus one (e.g., A + 2 = D) or true minus one (e.g., A + 2 = B); the kind of false equation varied between subjects. Thus, each subject experienced a total of 80 different problems during training (10 letters × 4 digit addends × true vs. false). Each problem was presented 72 times; 6 times per session for 12 sessions. There were 480 trials per session. Further details of the procedure can be found in Logan and Klapp (1988).
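The design just described can be reproduced with a short script. This is a sketch only: the helper names are mine, and it builds the "true plus one" version of the false equations.

```python
# Sketch of the training-set construction described above
# (letters and addends from the text; helper names are hypothetical).
from string import ascii_uppercase

letters = ascii_uppercase[:10]                 # one half of the alphabet, A-J
addends = (2, 3, 4, 5)

def answer_letter(letter, addend, offset=0):
    return ascii_uppercase[ascii_uppercase.index(letter) + addend + offset]

problems = []
for letter in letters:
    for addend in addends:
        problems.append((letter, addend, answer_letter(letter, addend), True))
        # false equations were "true plus one" for these hypothetical subjects
        problems.append((letter, addend, answer_letter(letter, addend, 1), False))

print(len(problems))       # 80 problems: 10 letters x 4 addends x true/false
print(6 * len(problems))   # 480 trials per session, 6 presentations of each
```

The counts confirm the arithmetic in the text: 80 distinct problems, each shown 6 times per session, gives the 480 trials per session reported.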
repeated and new items in Experiments 2 and 3. This specificity of learning rules out process-based theories that explain automatization and skill acquisition in terms of the development of general procedures that deal with new stimuli as effectively as old ones (e.g., Anderson, 1982; Rabbitt, 1981). Such theories may apply in other domains, but they are not appropriate in the present situation.

Third, Experiments 1-3 demonstrate that nonwords as well as words can benefit from repetition. This suggests that the repetition effects are based on memory for specific episodes rather than adjustments to semantic memory (e.g., Feustel, Shiffrin, & Salasoo, 1983; Jacoby & Brooks, 1984; Salasoo, Shiffrin, & Feustel, 1985). It also suggests that repetition effects may derive from the acquisition of associative information rather than item information. If repetition affects item-specific familiarity, subjects would not be able to discriminate words from repeated nonwords.

The means and standard deviations of the reaction times are plotted in Figures 9 and 10. Figure 9 contains data from true responses, and Figure 10 contains data from false responses. The data are plotted in logarithmic coordinates so that the power functions (solid lines) fitted to the data (points) will appear as straight lines. The instance theory predicts that the line for mean reaction times should be parallel to the line for the standard deviations. The slope of the line is the exponent of the power function in linear coordinates, and the theory predicts that means and standard deviations will have the same exponent. The functions fitted to the data in Figures 9 and 10 were constrained to have the same exponent for the mean and standard deviation, so the lines are parallel. The question is whether the points depart systematically from the parallel lines. Parameters of the fitted power functions and measures of goodness of fit are presented in Table 3. In evaluating the fits in the figures, it is important to note that
506 GORDON D. LOGAN

Figure 9. Means and standard deviations of reaction times to verify true alphabet arithmetic equations as a function of the number of presentations and the magnitude of the digit addend. (Points represent observed data; lines represent fitted power functions constrained to have the same exponent for means and standard deviations, parallel lines in log-log plots.)

the log scale introduces systematic distortions. Differences are exaggerated at low values of the variable and compressed at high values. Consequently, low values will appear to fit less well than high values. The standard deviations, which range from 162 ms to 1,408 ms, will appear to fit less well than the means, which range from 816 ms to 4,285 ms. Objectively, the means were fitted better than the standard deviations, but the difference was not as large as it appears in the figures.

INSTANCE THEORY OF AUTOMATIZATION 507

Figure 10. Means and standard deviations of reaction times to verify false alphabet arithmetic equations as a function of the number of presentations and the magnitude of the digit addend. (Points represent observed data; lines represent fitted power functions constrained to have the same exponent for means and standard deviations, parallel lines in log-log plots.)

The power functions fit the data reasonably well for addends of 2, 3, and 4. The functions for means and standard deviations appear parallel, as they should in log-log plots if they both decrease as power functions with the same exponent. Indeed, the constrained fits were as good as separate fits for all addends (see Table 3). Thus, the predictions of the instance theory are confirmed in the arithmetic task as well as in the previous lexical decision tasks.

The fits were not perfect, however. The power functions for addends of 2, 3, and 4 tended to overestimate the last few presentations. The deviation was much worse for addend = 5 equations, where a discontinuity appeared in the learning curves at about 24 presentations; learning seems faster after the discontinuity than before. This discontinuity, which is inconsistent with the instance theory, cannot be accounted for easily by any current theory of skill acquisition. Most theories of skill acquisition predict a power-function learning curve, which is continuous. Thus, the evidence against the instance theory is also evidence against its competitors.

The discontinuity reflects a strategy shift reported by many of the subjects. Several subjects reported deliberately learning the 5s because they were the most difficult problems. Typically, subjects developed mnemonics for the 5s, which allowed them to respond on the basis of memory. For example, when I tried the experiment as a pilot subject, I used psychologists' names as mnemonics: I remembered A + 5 = F as A. F. Sanders, E + 5 = J as E. J. Gibson, and G + 5 = L as Gordon Logan. These mnemonics provided anchor

Table 3
Parameter Estimates From Constrained and Separate Fits of Power Functions to Means and Standard Deviations of Reaction Times in the Alphabet Arithmetic Task of Experiment 4

                  True equations: Addend          False equations: Addend
Parameter       2       3       4       5       2       3       4       5

Constrained fits
ART           384     195       0       0     357     362      29       0
BRT          2052    2770    3643    4816    2371    3110    4065    5037
C           0.317   0.255   0.238   0.327   0.327   0.324   0.282   0.319
ASD             0       0       0      75       0       0       0      10
BSD           828     893    1144    1346     903    1158    1249    1513
r2          0.968   0.960   0.908   0.867   0.960   0.960   0.949   0.895
rmsd         59.4    82.0   148.9   206.3    72.3    87.4   116.1   193.9

Separate fits
ART           133      30       5       0       0      98      97       0
BRT          2246    2892    3679    4834    2630    3307    3976    5028
CRT         0.251   0.228   0.241   0.328   0.245   0.274   0.269   0.319
ASD            30     106     112      76      76       0       0       0
BSD           889     817    1099    1334     889    1088    1274    1470
CSD         0.383   0.332   0.303   0.321   0.416   0.547   0.290   0.301
r2          0.974   0.960   0.908   0.869   0.962   0.964   0.949   0.896
rmsd         54.9    80.9   149.2   206.3    69.9    84.7   115.3   193.5

Note. ART = asymptote for mean reaction time; BRT = multiplicative constant for mean reaction time; ASD = asymptote for standard deviation; BSD = multiplicative constant for standard deviation; C = exponent fitted to means and standard deviations simultaneously; CRT = exponent fitted to means separately; CSD = exponent fitted to standard deviations separately; r2 = squared correlation between observed and predicted values; rmsd = root mean squared deviation from prediction.
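The constrained fits summarized in Table 3 can be illustrated with a simple routine that forces one shared exponent on both curves. This is a sketch under my own assumptions (a grid search over the exponent with ordinary least squares for the other parameters, run on noise-free synthetic data), not the article's actual fitting procedure.

```python
# Sketch of a constrained power-function fit: means and standard
# deviations share one exponent c in y = a + b * n**(-c).
# (Illustrative only; not the article's fitting code.)
import statistics

def _ols_sse(xs, ys):
    """Sum of squared errors after fitting y = a + b*x by least squares."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys))

def fit_constrained(n, mean_rt, sd_rt):
    """Grid-search the shared exponent c; at each candidate c, fit each
    curve's asymptote and multiplicative constant by linear least squares."""
    best_err, best_c = float("inf"), None
    for step in range(5, 101):
        c = step / 100                     # exponents 0.05 through 1.00
        xs = [v ** -c for v in n]
        err = _ols_sse(xs, mean_rt) + _ols_sse(xs, sd_rt)
        if err < best_err:
            best_err, best_c = err, c
    return best_c

# Recover a known shared exponent from noise-free synthetic curves
n = range(1, 73)                                  # 72 presentations
mean_rt = [200 + 2500 * v ** -0.30 for v in n]    # mean reaction time, ms
sd_rt = [900 * v ** -0.30 for v in n]             # standard deviation, ms
print(fit_constrained(n, mean_rt, sd_rt))         # 0.3
```

Because both synthetic curves were generated with the same exponent, the constrained search recovers it exactly; with real data, the question is whether such a shared-exponent fit does as well as separate fits, as Table 3 reports.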

points for subjects, which made it easier for them to learn the 4s.

One might model the addend = 5 data with two learning curves, one reflecting the inefficient mnemonic strategy from Trials 1-24 and another reflecting the more efficient strategy from Trials 25-72. I fitted separate power functions to the data before and after the 24th trial. As before, means and standard deviations were fitted simultaneously with the constraint that they have the same exponent. The fits were much better than the previous ones. For true responses, rmsd decreased from 206 ms to 90 ms; for false responses, it decreased from 194 ms to 88 ms.

Thus, the addend = 5 data may be inconsistent with a strict interpretation of the instance theory, in which neither the algorithm nor the memory process changes over practice, but they are consistent with a more general view of instance theory, which interprets automaticity as a transition from an algorithm to a memory process. There seem to have been two transitions in the addend = 5 condition, one to an inefficient memory process and another to a more efficient one, reminiscent of the plateaus described by Bryan and Harter (1899).

One final piece of evidence deserves comment. Logan and Klapp (1988) reported data from subsequent sessions of the experiment that bear on the instance theory. In particular, they ran a 13th session in which subjects were tested on the untrained half of the alphabet (e.g., subjects trained on A-J were tested on K-T, and vice versa). Reaction times increased substantially, compared with the previous session, and the slope of the function relating reaction time to the magnitude of the addend, which had been close to zero on the previous session, increased to about 400 ms/count. This suggests that subjects learned specific responses to specific stimuli during training, which is consistent with instance- or item-based learning and inconsistent with process-based learning.

To summarize, the predictions of the instance theory were supported in the alphabet arithmetic task as they had been in the lexical decision task of Experiments 1-3. Specifically, the training data showed that the means and standard deviations of reaction times decreased as power functions of practice with the same exponent (appearing parallel in log-log plots) and the transfer data showed that learning was item-based rather than process-based.

Evidence of Separate Instances

The fits in the previous experiments provide evidence that is consistent with the instance theory, but they do not uniquely support it. Most theories of skill acquisition predict a power-function reduction in the mean (e.g., Anderson, 1982; Crossman, 1959; MacKay, 1982; Newell & Rosenbloom, 1981), so the fits to the means are not unique. The fits to the standard deviations, constrained to have the same exponent as the means, were predicted by the instance theory and no other. But the fits do not disconfirm predictions of the other theories; the other theories simply made no prediction. Furthermore, the evidence that automaticity is specific to the stimuli experienced during training may rule out process-based theories of automatization, but it does not uniquely support instance theory. Strength theories, which assume that practice strengthens connections between generic stimuli and generic responses, can also predict item-based learning (e.g., Anderson, 1982; MacKay, 1982; Schneider, 1985).

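The quantitative core of the instance-theory account, in which the response is controlled by whichever finishes first, the algorithm or any one of the n stored instances, can be illustrated with a toy race simulation. The lognormal finishing-time distributions and all parameter values below are my own illustrative assumptions, not taken from the article.

```python
# Toy simulation of the instance-theory race (illustrative assumptions:
# lognormal finishing times; parameter values are arbitrary, not fitted).
import random
import statistics

random.seed(1)

def trial_rt(n_instances):
    """One trial: the algorithm races n independent instance retrievals."""
    algorithm = random.lognormvariate(7.6, 0.4)      # roughly 2,000 ms typical
    retrievals = [random.lognormvariate(7.2, 0.4)    # roughly 1,340 ms typical
                  for _ in range(n_instances)]
    return min([algorithm] + retrievals)   # first finisher drives the response

means, sds = [], []
for n in (1, 4, 16, 64):
    rts = [trial_rt(n) for _ in range(2000)]
    means.append(statistics.fmean(rts))
    sds.append(statistics.stdev(rts))

# Both the mean and the spread shrink as the number of traces grows
print([round(m) for m in means])
print([round(s) for s in sds])
```

Taking the minimum over a growing number of traces shrinks both the mean and the standard deviation of the winning time, which is the mechanism behind the parallel power-function reductions tested in the fits above.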
The instance theory can be distinguished from strength theories by demonstrating the existence of separate memory representations of each encounter with a stimulus (see, e.g., Hintzman, 1976). According to the instance theory, each prior episode is represented in memory, whereas strength theories represent past history by a single strength measure, casting aside the details of individual episodes. The difference between these representational assumptions can be seen clearly in a varied-mapping design, in which subjects give different responses or different interpretations on different exposures to the same stimulus. Varied mapping typically produces little or no evidence of learning (e.g., Schneider & Shiffrin, 1977).

According to the instance theory, each exposure would result in a separate trace. At retrieval time, there are two possibilities: All of the separate traces could be retrieved, but they would yield inconsistent or incoherent information about how the current stimulus should be interpreted, so the subject should ignore them and respond on the basis of the algorithm. Alternatively, the current instructional set and the current stimulus could act as a compound retrieval cue and retrieve only those episodes that were encoded in the context of the current instructions. In the first case, there would be no evidence of learning; in the second case, there would be less learning than would be observed in consistent-mapping control conditions.

By contrast, in some versions of strength theory (e.g., Schneider, 1985), the different interpretations in varied mapping cancel one another out, resulting in no net gain in strength, and thus, no evidence of learning. Subjects would have no choice but to rely on the algorithm. Other versions of strength theory might be constructed in which strength accrued separately to each interpretation. As with instance theory, the retrieved interpretations would conflict with each other, so the subjects should ignore memory and rely on the algorithm.

The instance theory can be distinguished from the first version of strength theory by transferring subjects to a frequency-judgment task, just as instance theories of memory were distinguished from strength theories by frequency judgment tasks (e.g., Hintzman, 1976). Instance theory predicts that subjects trained under varied interpretation conditions should be just as sensitive to the frequency with which individual stimuli were presented as subjects trained under consistent interpretation conditions, because both groups of subjects would encode representations of each encounter with a stimulus. By contrast, strength theory would predict that subjects trained under varied interpretation conditions would be less sensitive to frequency information than subjects trained under consistent interpretation conditions because there is no separate episodic trace representing each encounter with the stimulus.

In other words, instance theory predicts a dissociation between repetition effects and frequency judgments as measures of memory after varied-interpretation training, whereas certain strength theories predict no such dissociation.

Experiment 5

To test the dissociation, four groups of subjects were trained in the paradigm from Experiment 3 (i.e., some stimuli were presented in 1 block, others in 2, 4, 8, or 16 consecutive blocks), and then transferred to a frequency judgment task. Two groups were trained under consistent interpretation conditions, and two groups were trained under varied interpretation conditions. To manipulate the consistency of interpretation, subjects were shown three kinds of stimuli: words, pronouncible nonwords, and unpronouncible nonwords. Subjects could interpret these stimuli under lexical decision instructions, distinguishing between words and nonwords, or under pronunciation decision instructions, distinguishing between pronouncible and unpronouncible letter strings. Logan (1988) showed that alternating between these interpretations over successive presentations impaired learning, relative to consistent-interpretation controls.

One consistent-interpretation group performed lexical decisions on each training trial, and the other performed pronunciation decisions. The two varied-interpretation groups alternated between lexical decisions and pronunciation decisions over successive presentations of the stimuli. One group began with lexical decisions, and the other began with pronunciation decisions. In the transfer task, all four groups saw new stimuli as well as the stimuli they were trained on, and were asked to estimate the frequency with which each stimulus had appeared during training. Further details of the method can be found in Appendix C.

Training results. The training data are presented in Figure 11 as benefit scores. Benefit was calculated by subtracting reaction time for second and subsequent presentations of a stimulus from reaction time for the first presentation of a stimulus, in order to remove differences due to the initial algorithm and facilitate comparisons of learning effects between conditions. Benefit scores for consistent-interpretation lexical decision subjects and for varied-interpretation subjects who began with lexical decision are presented in the top panel of Figure 11. Benefit scores for consistent-interpretation pronunciation subjects and for varied-interpretation subjects who began with pronunciation decisions are presented in the bottom panel of Figure 11. The error rates are presented in Appendix C.

The lexical decision subjects showed more benefit for consistent interpretations than for varied interpretations. The difference was largest for pronouncible nonwords, intermediate for words, and smallest for unpronouncible nonwords. The pronunciation subjects showed less clear-cut results: Consistent interpretation produced an advantage over varied interpretation only for pronouncible nonwords, and even that difference diminished after 10 or 11 presentations. These conclusions were confirmed by ANOVA, reported in Appendix C.

The results of the lexical decision group are sufficient for the present purpose, which is to determine whether there is a dissociation between repetition effects and frequency judgments as measures of memory following training with varied interpretation. If there is a dissociation, then frequency judgments following lexical decisions ought to be just as accurate for varied-interpretation subjects as for consistent-interpretation subjects; if there is no dissociation, then frequency judgments should be less accurate for varied-interpretation subjects than for consistent-interpretation subjects.

Figure 11. Benefit scores as a function of the number of presentations for consistent (solid lines) and varied (broken lines) interpretations of words (boxes), pronouncible nonwords (triangles), and unpronouncible nonwords (stars) in the lexical decision task (top panel) and pronunciation task (bottom panel) of Experiment 5.

Transfer results. The average frequency estimates for each group are presented in Figure 12. The left-hand panels present the data from consistent- and varied-interpretation lexical decision subjects; the right-hand panels present the data from consistent- and varied-interpretation pronunciation subjects. The top panels present frequency estimates for words, the middle panels present estimates for pronouncible nonwords, and the bottom panels present estimates for unpronouncible nonwords. Frequency estimates from subjects trained with varied interpretations were very close to the estimates from subjects trained with consistent interpretations. If anything, they were a little more accurate. Thus, in the lexical decision subjects, at least, there was a dissociation between frequency estimates and repetition effects as measures of memory. These conclusions were confirmed by ANOVAs, reported in detail in Appendix C.

Conclusions

The results from the lexical decision subjects showed a dissociation between repetition effects and frequency judgments as measures of memory: Consistency of stimulus-to-interpretation mapping had strong effects on lexical decision performance, but had negligible effects on frequency judgments. This dissociation was predicted by the instance theory, on the assumption that subjects encode separate representations of each encounter with each stimulus. It would not be predicted in a straightforward manner by strength theories in which strengthening one interpretation weakens the other, resulting in no net gain in strength (e.g., Schneider, 1985).

It is possible, however, to have a theory in which multiple exposures to a stimulus are represented both as a strength value and as separate episodic traces. That sort of theory could account for the present results, with the strength values underlying the repetition effects and the separate traces underlying the frequency judgments. However, the theoretical development of the instance theory showed that separate traces can produce repetition effects, so it would not be clear whether the strength values or the separate traces were responsible for the repetition effects. A more parsimonious interpretation would be that there are only separate traces in memory, which produce both the repetition effect and the frequency judgments. The onus would be on the strength theorists to show that strength values did in fact underlie repetition effects.

General Discussion

The experiments tested the instance theory in three ways: First, they tested the power-law predictions and found that, as predicted, means and standard deviations of reaction times decreased as power functions of the number of trials with the same exponent (Experiments 1-4). Second, the experiments determined whether learning was item-based, as the instance theory predicts, or process-based, as the modal view predicts. The data supported the instance theory: Subjects learned specific responses to specific stimuli and did not transfer well to new stimuli (Experiments 1-4) or to new approaches to old stimuli (Experiment 5). Third, Experiment 5 tested the instance theory against certain strength theories, asking whether subjects retain separate representations of each stimulus even if they don't use them to support performance. The data showed evidence of separate representations, as the instance theory predicts.

The success of the theory in these three tests suggests that it should be taken seriously in studies of skill acquisition and automaticity. The remainder of this section discusses implications of the instance theory for current issues and controversies.

Instance Theory and the Properties of Automaticity

The instance theory provides a new perspective on many of the qualitative properties that distinguish automatic and nonautomatic processing. In some cases, the properties derive from the assumptions about representation and retrieval from memory. In other cases, the differences occur because automatic and nonautomatic processing are based on different processes. Factors that affect the initial algorithm need not affect the memory retrieval process, and vice versa. In the remainder of this section, these principles are applied to several of the properties of automaticity.
Figure 12. Estimated frequencies of occurrence as a function of actual frequencies of occurrence for consistent (solid lines) and inconsistent (broken lines) interpretations of words (top panels), pronouncible nonwords (middle panels), and unpronouncible nonwords (bottom panels) after training on the lexical decision task (left panels) and the pronunciation task (right panels) of Experiment 5.

Autonomy ing comes from Stroop and priming studies, in which an irrele-
vant stimulus influences the processing of a relevant stimulus
Phenomenal experience and experimental data suggest that (but see Zbrodoff & Logan, 1986). For example, subjects are
automatic processing is autonomous, in that it can begin and slower to name colors when the colors form words that repre-
run on to completion without intention (for reviews, see Kah- sent irrelevant color names (BLUE in red ink; Stroop, 1935),
neman & Treisman, 1984; Logan, 1985b; Zbrodoff & Logan, and subjects make lexical decisions faster if the target word
1986). Instance theory accounts for the autonomy by assuming (e.g., DOCTOR) is preceded by a related word (e.g., NURSE)
that memory retrieval is obligatory; attention to a stimulus is than if it is preceded by an unrelated word (e.g., BUTTER;
sufficient to cause the retrieval of all of the information associ- Meyer & Schvaneveldt, 1971). The modal interpretation of
ated with the stimulus, whether or not the subject intends to such effects is that the irrelevant stimulus activates its memory
retrieve it. representation and that activation speeds or slows responses to
The major evidence for the autonomy of automatic process- related stimuli—in other words, the irrelevant stimulus re-
512 GORDON D. LOGAN

trieves response-related information from memory. The Stroop and priming studies are interesting because they focus
modal interpretation also assumes that activation is not only on a different sense of automaticity than the practice studies
obligatory but also is independent of intention and attention; that were addressed by the instance theory. Stroop and priming
it occurs whether or not the subject attends to the stimulus or studies are concerned with the activation of memory represen-
intends to process it (e.g., Logan, 1980; Neely, 1977; Posner & tations, whereas practice studies are concerned with the use of
Snyder, 1975). activated memory representations in the performance of tasks.
Two recent findings are difficult to reconcile with the modal Stroop and priming studies consider a stimulus to be processed
view. First, Francolini and Egeth (1980) and Kahneman and automatically if it retrieves anything from memory, whereas
Henik (1981) showed that Stroop interference was stronger practice studies consider a stimulus to be processed automati-
when subjects attended to the irrelevant stimulus. Kahneman cally only if it retrieves something useful from memory. The
and Henik showed subjects two words (e.g., MOST and BLUE), different senses are apparent in interpreting control conditions
one of which was colored (e.g., red) and one of which was black. in the different paradigms: In Stroop tasks, neutral (noncolor)
The task was to name the color of the colored word (in this case, words are assumed to activate their memory representations
red). The irrelevant color word (BLUE) interfered with color automatically even though the activation has no effect on color
naming much more when it was colored, and hence was at- naming. But in practice studies, varied-mapping control stim-
tended, than when it was black, and hence unattended. Franco- uli are not considered to be processed automatically, even
lini and Egeth (1980) found similar results in a numerical though they may retrieve information from memory (cf. Exper-
Stroop task in which subjects counted red letters or digits and iment 5).
ignored black ones; irrelevant digits interfered more when they It is tempting to think of Stroop-type automaticity as one
were red, hence attended, than when they were black, and hence component of the automaticity addressed in practice studies be-
ignored. cause memory retrieval is the first step in performing a task
Second, M. Smith (1979; M. Smith, Theodor, & Franklin, automatically. One could imagine a progression from one type

1983) and Henik, Friedrich, and Kellogg (1983) showed that of automaticity to the other, as a small number of stimulus ex-
posures may cause enough memory activation to produce
Stroop and priming effects can be completely eliminated by
Stroop and priming effects but not enough to provide a reliable
manipulating the way in which the subject processes the irrele-
basis for performing the task. However, Stroop and priming
vant stimulus. M. Smith and her colleagues showed that prim-
effects do not provide a pure measure of memory retrieval.
ing effects could be eliminated if subjects treated the priming
Memory retrieval affects performance by interacting with a
stimulus as a letter string and not as a word (i.e., by searching
subsequent decision process, just as it does in practice studies
for a letter within it vs. making a lexical decision). Henik et al.
(e.g., Logan, 1980). The intentional and attentional effects de-
showed that Stroop effects could be eliminated in the same way.
scribed earlier suggest that the appearance of Stroop and prim-
Both sets of results are difficult to deal with in the modal
ing effects depends on the relation between retrieval cues and
interpretation of Stroop and priming effects because an irrele-
decision processes, just as practice effects do. Thus, one need
vant stimulus is supposed to activate its memory representation
not expect Stroop and priming effects to appear sooner than
whether or not the subject attends to it or intends to process it.
practice effects. It may be possible to relate the two senses of
Instance theory can account for the attention effects (i.e., Fran-
automaticity theoretically, but that does not mean that their
colini & Egeth, 1980; Kahneman & Henik, 1981) with its as-
empirical manifestations will be related in a straightforward
sumption that attention is sufficient to cause associated infor-
manner.
mation to be retrieved from memory. Attention may not be nec-
essary, but it is sufficient, which means that (a) information at
the focus of attention may be more strongly activated than in-
Control
formation outside the focus, and therefore may provide stronger Phenomenal experience suggests that automatic processing
retrieval cues, and (b) information at the focus will be more is closely controlled. Behavior on "automatic pilot" is usually
likely to activate its memory representation than information coherent and goal-directed (Reason & Myceilska, 1982); skilled
outside the focus. practitioners are better than novices even though they perform
Instance theory would interpret the intention effects (i.e., automatically (Logan, 1985b). Experimental data also suggest
Henik et al., 1983; M. Smith, 1979; M. Smith et al., 1983) as that automatic processing is closely controlled. Skilled activities
retrieval effects: The instruction to process the irrelevant stimu- such as speaking and typing can be inhibited quickly in re-
lus in a different way leads subjects to use different retrieval sponse to an error or a signal to stop, often within a syllable or
cues in conjunction with the stimulus, to retrieve the required a keystroke after the error or stop signal (Ladefoged, Silverstein,
information from memory. The effects on subsequent process- &Papcun, 1973;Levelt, 1983; Long, 1976; Logan, 1982; Rab-
ing will depend on what was retrieved and how it is related to bitt, 1978). Thoughts may be harder to inhibit than overt ac-
the subsequent task. If subjects make a lexical decision about tions (Logan, 1983, 1985a), but even they can be controlled by
the prime, then information about the meaning of the prime deliberately thinking of other things (Wenger, Schneider, Carter,
will be retrieved and available to influence a subsequent lexical & White, 1987).
decision. But if subjects search for a letter in the prime, infor- By contrast, the modal view is that automatic processing is
mation about letters will be retrieved, which may not affect a not well controlled. Shiffrin and Schneider (e.g., 1977) explic-
subsequent lexical decision. itly distinguish between automatic and controlled processing,
INSTANCE THEORY OF AUTOMATIZATION 513

and the idea is implicit in other approaches (e.g., LaBerge & Samuels, 1974; Posner & Snyder, 1975). The main motivation for the modal view on control is the evidence that automatic processes are autonomous. But autonomy does not imply lack of control. A process is autonomous if it can begin and run on to completion without intention (Zbrodoff & Logan, 1986); that does not mean it cannot be initiated and guided to completion intentionally.

According to the instance theory, automatic processes are used intentionally. Automaticity exploits the autonomy of the retrieval process, harnessing it so that it provides information that is relevant to the person's goals. Thus, automatic processes are controlled. The retrieval process can be controlled by manipulating retrieval cues or stimulus input, or both, and the subsequent decision process can be inhibited before it results in an overt response. Automatic processing may be a little harder to control than algorithm-based processing, but only because it tends to be faster and allows less time for an act of control to take effect. It may be controlled differently, but it is controlled nonetheless (for related views, see Logan, 1985b; Neumann, 1984).

Effortlessness

Phenomenal experience and experimental data also suggest that automatic processing is effortless (see Jonides, 1981; Logan, 1978, 1979; Posner & Snyder, 1975; Shiffrin & Schneider, 1977). Typical treatments of automaticity assume that processing capacity or cognitive resources are severely limited and that automatization is a way around the capacity limitations. By contrast, the instance theory does not assume any capacity limitations; novice performance is limited by a lack of knowledge rather than by scarcity of processing capacity or resources. Subjects perform slowly at first because they have no other way to perform the task. As their knowledge base builds up with practice, they have the choice of responding on the basis of the algorithm or on the basis of memory retrieval. Presumably, they do not switch to memory retrieval until it is faster than the algorithm and at least as reliable.

The major experimental evidence for effortless automatic processing comes from dual-task experiments that show that practiced subjects suffer less interference from a concurrent task than do unpracticed subjects (e.g., Bahrick & Shelley, 1958; Logan, 1978, 1979). The usual interpretation is that the practiced task requires fewer resources than the unpracticed task. That may be the case, but instance theory would suggest that the reduction in dual-task interference occurs because of the shift from the algorithm to memory retrieval: Because memory retrieval and the algorithm are different processes, tasks that interfere with the algorithm will not necessarily interfere with memory retrieval. The reduction in interference may occur only because experimenters choose concurrent tasks that interfere with the initial algorithm; if the concurrent task does not interfere with the algorithm, another concurrent task is chosen. So the reduction in interference may be an artifact of the experimenter's choice of procedures rather than a general reduction in resource demands. Indeed, instance theory suggests that it may be possible to find an increase in dual-task interference with practice, in cases in which the concurrent task interferes with memory retrieval but not with the initial algorithm. Thus, instance theory predicts a shift in dual-task interference rather than a reduction in dual-task interference (also see Logan, 1985b).

Alternatively, instance theory would predict a reduction in dual-task interference with automatization because automatization provides subjects with more ways to do the task. Initially, they have only one way to perform the task—following the instructed algorithm—so a concurrent task that interferes with the algorithm must affect their performance. After automatization, however, they can use the algorithm or rely on memory retrieval. If the concurrent task interferes with the algorithm, they can use memory retrieval; if it interferes with memory retrieval, they can use the algorithm. In either case, their performance need not suffer. The general point is that automatic performance can be more flexible than nonautomatic performance, providing the subject with ways to avoid dual-task interference (also see MacKay, 1982).

Another major line of evidence that automatic processing is effortless comes from search studies that manipulate the number of items in the memory set, the number of items in the display, or both. Initially, reaction time increases linearly with the number of memory set or display items, but after considerable practice, the slope approaches zero (for a review, see Schneider & Shiffrin, 1977). The initial linear increase is interpreted as evidence that unpracticed search is effortful—search is assumed to be serial in order to minimize the momentary load on capacity—and the zero slope at the end of practice is interpreted as evidence that search has become effortless or capacity free. There are many criticisms of this interpretation (e.g., Cheng, 1985; Ryan, 1983), but the basic findings can be handled easily by instance theory: Initially, performance depends on a search algorithm in which subjects try to find the probe item in the memory set or the memory item in the display. Several different algorithms could be used for the task, most of which would produce the linear increase in reaction time with memory set size or display size (see Townsend & Ashby, 1983). After practice, subjects retrieve the appropriate response directly from memory, without searching, when given a probe or a multi-item display as a retrieval cue (cf. Ryan, 1983; Schneider, 1985). This scheme provides a natural account of memory search; whether it can work for visual search is not immediately clear (i.e., visual search may train an automatic attention response, which is not part of the instance theory; Shiffrin & Schneider, 1977). The principle here is the same as in dual-task studies: Factors that affect the algorithm, such as display size or memory set size, do not necessarily affect memory retrieval.

Instance theory accounts for the phenomenal experience of effortlessness during automatic performance by suggesting that memory retrieval is often easier than the algorithm. Indeed, subjects would not be expected to switch from the algorithm to memory retrieval until memory retrieval was quick and effortless.

Unconsciousness

The evidence that automatic processes are unconscious is primarily phenomenal—we cannot easily introspect on the
514 GORDON D. LOGAN

things we do automatically. Traditional views have some difficulty dealing with this aspect of automaticity because consciousness is not an easy concept to express in the information-processing language that dominates current theorizing (e.g., Carr et al., 1982; Marcel, 1983; Posner & Snyder, 1975). I suspect that the attribution of unconsciousness to automatic processing stems from the belief that automatic processing is processing without thinking. In everyday language, for example, we say that we solved a problem automatically when the solution springs to mind without much thought or when it is the first thing that occurs to us. Instance theory can capture this intuition by specifying precisely what it means to think about a solution (i.e., to compute a solution by applying an algorithm) and what it means to find a solution without thinking (i.e., to retrieve a solution from memory), and both of them can be expressed easily in information-processing language.

Another reason for believing that automatic processing may be unconscious is that algorithms may involve a series of steps or stages on the way to a solution, each of which may be introspected upon, whereas memory retrieval is a single-step process (e.g., the resonance metaphor of Hintzman, 1986, and Ratcliff, 1978, or the holographic retrieval process described by Eich, 1982, and Murdock, 1982, 1983). Thus, automatic processes may not be available to the "mind's eye" long enough to provide conscious evidence of their inner workings.

Poor Memory

Phenomenal experience suggests that automatic processing results in poor memory for what was processed. When distracted, we may start up from a stop light, shift gears, and attain cruising speed on "automatic pilot," and then find we have no memory of having done so (also see Reason, 1984). There have not been many experimental tests of memory for things processed automatically, but the few that exist support the poor-memory hypothesis.

For example, Kolers (1975) investigated acquisition of skill at reading spatially transformed text (rof elpmaxe) and found that memory for transformed texts was better than memory for normal texts. He attributed the difference to the automaticity of normal reading. He showed as well that memory for transformed text declined as subjects acquired skill. Thus, his results suggested that subjects can remember stimuli they process automatically, although not as well as stimuli they processed algorithmically.

Fisk and Schneider (1984) had subjects search for exemplars of target categories in a series of words presented one after the other. One experiment examined memory early in practice, during controlled processing; the other examined memory following automatic processing. The results were clear: Subjects remembered much more accurately in Experiment 1 than in Experiment 2. In fact, there was little evidence of memory for some stimuli (novel distractors) in Experiment 2, which led Fisk and Schneider (1984) to conclude that automatic processing left no traces in memory.7 However, automaticity was confounded with dual-task conditions: Experiment 1 used only single-task conditions, and Experiment 2 used only dual-task conditions (i.e., the category search task performed concurrently with a difficult digit search task). Thus, either automaticity or the dual-task conditions (or both) could have impaired memory in Experiment 2. One cannot tell from their experiments which was the important factor. The literature suggests that dual-task conditions may have been responsible: Recent evidence indicates that dual-task conditions at encoding severely impair memory (Naveh-Benjamin & Jonides, 1986). Nissen and Bullemer (1987) presented data suggesting that subjects cannot remember stimuli processed in dual-task conditions. And Kolers's (1975) data suggested that subjects can remember stimuli processed automatically. They may not remember well, but they do remember.

7 The poor memory for novel distractors may be a result of shallow processing and incongruity rather than automaticity. The targets were very familiar and the distractors were new, so a simple familiarity judgment would allow accurate performance. Subjects may not interpret the distractors beyond deciding they are unfamiliar, and that would not produce good memory. Also, memory is typically worse for no items than for yes items, possibly because no items are not congruent (Craik & Tulving, 1975).

Typically, poor memory for stimuli processed automatically is interpreted as an encoding deficit—encoding is either cursory or not done at all. By contrast, the instance theory interprets it as a retrieval effect. The theory assumes that events are encoded in the same way on each trial, whether it be the 10th or the 10,000th. In each case, a separate trace is created to represent the event. However, the traces may have a different impact on retrieval, depending on how many other traces there are in memory and depending on the retrieval task. The impact of trace i + 1 relative to trace i will decrease as i increases, following a kind of Weber function. Adding one trace to zero makes more of a difference than adding one trace to 10 or one trace to 1,000.

The nature of the retrieval task is also important. Some retrieval tasks, like recognition and recall, require subjects to identify one trace out of many. Other retrieval tasks, like the kind used in studies of automaticity and categorization (e.g., Hintzman, 1986; Medin & Schaffer, 1978), allow subjects to respond to the whole set of retrieved traces without focusing on any one in particular. In the former, competitive retrieval tasks, the other traces can interfere with retrieval of the one desired trace, for example, by adding noise to the decision process. The task is like finding a particular tree in a forest; the more dense the forest, the harder it is to find. Performance should get worse as the task becomes automatic and more traces are added to memory. By contrast, in the latter, cooperative retrieval tasks, the different traces serve the same purpose, working together for a common goal. The task is like finding a forest; the more trees there are, the easier it is to find. Performance should get better as the task becomes more automatic and more traces are added to memory.

The distinction between competitive and cooperative retrieval tasks is well illustrated in the frequency paradox in recognition versus lexical decision:8 Low-frequency words are recognized better than high-frequency words, but low-frequency words produce slower lexical decisions than high-frequency

8 I would like to thank Tom Carr for providing this example.
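The cooperative/competitive contrast, and the Weber-like decline in the impact of each added trace, can be sketched numerically. The toy model below is my illustration, not part of Logan's formal development: it assumes exponentially distributed retrieval times, so the expected finishing time of the fastest of n traces is the single-trace mean divided by n, and it assumes a competitive task must sample one specific trace out of n equally likely alternatives.

```python
def cooperative_rt(n_traces, trace_mean=800.0):
    # Fastest-of-n race: with exponential retrieval times, the expected
    # finishing time of the quickest of n traces is trace_mean / n.
    return trace_mean / n_traces

def competitive_accuracy(n_traces):
    # One particular trace must be picked out of n equally likely ones.
    return 1.0 / n_traces

# Each added trace helps the cooperative task less and less
# (a Weber-like diminishing impact)...
gain_early = cooperative_rt(1) - cooperative_rt(2)    # 400.0 ms
gain_late = cooperative_rt(10) - cooperative_rt(11)   # roughly 7.3 ms

# ...but steadily hurts the competitive task:
acc_low_frequency = competitive_accuracy(10)    # few stored exposures
acc_high_frequency = competitive_accuracy(100)  # many stored exposures
```

On these assumptions the frequency paradox falls out directly: the many traces of a high-frequency word speed the cooperative lexical decision (a shorter fastest-of-n time) while hurting recognition of any one particular exposure.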
words. This paradox occurs because the tasks require different retrieval processes: Recognition requires subjects to discriminate one particular exposure to a word from all other exposures to the word. High-frequency words have more exposures than low-frequency words, so a single exposure is harder to discriminate. By contrast, lexical decision requires subjects to discriminate any exposure to the word from no exposure. High-frequency words have more exposures, so more of them are likely to be retrieved, making the discrimination easier.

Instance Theory and Issues in Automaticity

The instance theory provides new perspectives on the diagnosis of automaticity and the relation between automaticity and categorization. The perspectives are described in this section.

Diagnosing Automaticity

Much of the interest in automaticity comes from studies that attempt to determine whether specific processes, such as letter encoding (Posner & Boies, 1971), lexical access (Becker, 1976), and abstraction of superordinate categories (Barsalou & Ross, 1986), are automatic. The processes are often acquired through experience outside the laboratory, so the assessment of automaticity is an absolute judgment; the process is either automatic or not automatic. Researchers often assess the properties of automaticity and consider a process to be automatic if it possesses some or all of the properties.

However, the diagnosis of automaticity is fraught with problems: Different researchers use different lists of properties and disagree on the necessity and sufficiency of the properties they list. For example, Posner and Snyder (1975) list only 3 properties, whereas Hasher and Zacks (1979) list 5, and Schneider, Dumais, and Shiffrin (1984) list 12. Hasher and Zacks (1979) have argued that all properties must be present in a truly automatic process, but others are less restrictive. To make matters worse, some researchers have questioned the agreement between properties, showing, for example, that an obligatory process may be effortful (Kahneman & Chajczyk, 1983; Paap & Ogden, 1981; Regan, 1981).

Automaticity is also difficult to diagnose absolutely from the perspective of instance theory, because instance theory does not assume that any of the properties are necessary or sufficient. Automaticity is defined in terms of the underlying process—automatic performance is based on memory retrieval—and not in terms of necessary and sufficient properties. As described earlier, many of the properties may be characteristic of automaticity, but no property or set of properties defines it. To determine whether a process is automatic, one must determine whether it is based on memory retrieval, and that is difficult to do because there are no accepted criteria for deciding whether something is based on memory retrieval.9

Instance theory suggests that automaticity is both absolute and relative. It is absolute in that performance may sometimes be based entirely on memory retrieval and sometimes entirely on the algorithm. It is relative in that performance may be based on memory retrieval on some proportion of the trials. It may be possible to estimate the proportion without knowing which trials were memory-based and which were algorithm-based. Automaticity is also relative in that memory strengthens progressively throughout practice, and it is appropriate to say that performance is more automatic after 10,000 trials than after 1,000 trials, even if both performances are entirely memory-based. Each trial has the same impact on memory regardless of the number of trials that went before it. Thus, instance theory suggests there are no limits on the degree of automaticity that may be attained; automaticity may never be complete.

It is easier to judge automaticity relatively than absolutely: The more automatic the performance, the faster it should be, the less effortful, the more autonomous, and so on (also see Logan, 1985b). Assessments of relative automaticity can be made most confidently when two performances of the same task are compared, preferably at two different levels of practice. One would expect practice to make a task more automatic, and the more practiced task is likely to be more automatic. It is more difficult to assess the relative automaticity of two different tasks. For example, one task may appear less effortful than another because its algorithm is easier, not because its performance is based more on memory retrieval. However, many of these problems may be minimized and sometimes avoided by a careful task analysis (e.g., Jonides et al., 1985).

Automaticity and Categorization

The instance theory of automaticity bears a strong resemblance to instance theories of categorization (e.g., Hintzman, 1986; Jacoby & Brooks, 1984; Medin & Schaffer, 1978). Instance theories of categorization argue that subjects decide the category membership of a test stimulus by comparing it with the instances stored in memory and assigning it to the category containing the most similar instances. These theories are similar to the instance theory of automaticity in that both assume that (a) performance depends on retrieval of specific instances from memory and (b) the retrieval process is cooperative, comparing all of the available traces with the test stimulus. The theories are also similar in that they focus on learning; both automaticity and categorization are acquired abilities.

Studies of automatization differ from studies of category learning in that there is an initial algorithm that allows subjects to perform accurately until memory retrieval can take over. In category learning, there is no algorithm to "bootstrap" performance. Performance is inaccurate until the category is well learned. Studies of category learning are also different in that the stimuli presented to subjects are usually very similar to each other and subjects are expected to exploit the similarities in forming categories and in generalizing to new instances. Studies of automatization, by contrast, often use easily discriminable stimuli (e.g., words or letters) and rarely test for generalization to new instances (but see Salasoo et al., 1985; Schneider & Fisk, 1984). Finally, studies of category learning usually involve sub-

9 It may be possible to diagnose instance-based automaticity by showing better transfer to trained instances than to untrained ones. That requires knowing something about the subject's history with the task and materials, which is often not easy in practice.
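The claim that the proportion of memory-based trials can be estimated, and that it grows with practice without ever reaching 1, can be illustrated with a small race calculation. This sketch is mine, not Logan's: it assumes the algorithm races against the fastest of n stored traces, with exponentially distributed finishing times (which gives the winner's identity a closed-form probability), and the mean latencies are arbitrary.

```python
def p_memory_based(n_traces, algo_mean=1000.0, trace_mean=800.0):
    # Probability that the fastest of n_traces retrievals beats the
    # algorithm when all finishing times are exponential:
    # P(memory wins) = rate_memory / (rate_memory + rate_algorithm).
    if n_traces == 0:
        return 0.0  # a novice can only run the algorithm
    rate_algo = 1.0 / algo_mean
    rate_memory = n_traces / trace_mean
    return rate_memory / (rate_memory + rate_algo)

# The proportion of memory-based trials rises with practice but
# never quite reaches 1.0: automaticity may never be complete.
proportions = [p_memory_based(n) for n in (0, 1, 10, 100, 10_000)]
```

This mirrors the relative sense of automaticity in the text: the same retrieval process makes performance "more automatic" at 10,000 traces than at 1,000, and the estimate requires no knowledge of which individual trials were memory-based.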
stantially less practice than do studies of automatization—one a target item. There is a cost associated with filtering out the
session versus 10 or 20. distractors (Kahneman, Treisman, & Burkell, 1983), and sub-
None of these differences seem very fundamental. Category jects may learn to overcome that cost. I suspect that the auto-
learning studies could use orienting tasks to allow accurate ini- matic attention response may not be important in tasks that
tial performance, and studies of automatization could vary do not require spatial filtering, such as the lexical decision and
stimulus similarity and test for generalization and transfer. Cat- alphabet arithmetic tasks studied earlier. For those tasks, cate-
egory learning studies could investigate practice effects over the gorization (and hence, retrieval of instances) may be enough.
range used in studies of automatization, and studies of automa- Karlin and Bower (1976) and Jones and Anderson (1982) pre-
tization could focus on the effects of initial practice (e.g., Exper- sented data suggesting that categorization may be sufficient for
iments 1-3). The differences could be viewed as guidelines for automatization in memory search.
future research to bring the different areas closer together. As a Categorization has many properties that are important in au-
result, automaticity might come to be viewed more as a general tomaticity. In particular, categorizing an item makes available
cognitive phenomenon and less as a special topic in the psychol- a host of facts associated with category membership, permitting
ogy of attention. inferences that go beyond the given perceptual information
The current version of the instance theory of automaticity (Murphy & Medin, 1985). When this happens quickly and
would need to be changed in order to deal with the effects of effortlessly by a single act of memory retrieval, it captures the
stimulus similarity. The current version assumes that each repe- essence of automaticity.
tition of the same stimulus produces exactly the same trace and Perhaps the main motivation for distinguishing between au-
that each different stimulus produces an entirely different trace. tomaticity and categorization is the fear that categorization is
There is no cross talk between nonidentical traces to produce somehow more fundamental; if automaticity is like categoriza-
the effects seen in the studies of category learning. To account tion, then it is epiphenomenal (e.g., Cheng, 1985). I believe this
for those effects, I would have to make more detailed assump- fear is ungrounded. Automaticity and categorization are both
tions about how the trace is represented and how the retrieval fundamental constructs, reflecting different aspects of the same
process works (cf. Hintzman, 1986). That is an important di- learning process, namely, the storage and retrieval of instances.
rection for future work, but it is beyond the scope of this article.
The close relations between categorization and automatiza- Process-Based Versus Instance-Based Learning
tion envisioned by instance theory contrast sharply with the
views of Shiffrin and Schneider (1977; also see Schneider & The instance theory assumes that there is no change in the initial
Shiffrin, 1985), who argued that automatization was more than algorithm or in the memory process as practice progresses. All
categorization. The main evidence for their claim is a visual that changes with practice is the data base on which the memory
search experiment in which subjects were trained to detect let- process operates. This assumption makes the theory easy to ana-
ters from one set in arrays made from another set of letters. lyze and simulate, but it is unlikely to be true in general. The mem-
After several sessions of practice, during which performance ory process may change through forgetting or through changes in
improved dramatically, Shiffrin and Schneider switched sets: attention (e.g., the addend = J condition in Experiment 4). Or the
Subjects now had to detect the former distractor letters in arrays algorithm may change through process-based learning. There is
made from former targets. If automaticity depended only on some evidence of process-based learning in the literature (Pirolli &
categorization, there should be good transfer because the dis- Anderson, 1985; E. Smith & Lerner, 1986) and even in the present
crimination between targets and distractors is essentially the experiments. For example, the new-item control condition in Ex-
same. However, transfer was abysmal. Performance was as bad periment 1 showed some improvement with practice even though
as it was on the initial practice session and took the same num- none of the stimuli were repeated.
ber of training sessions to reach pretransfer levels. Possibly, what appears to be process-based learning may ac-
Shiffrin and Schneider (1977) argued that automatization tually reflect a different sort of instance-based learning. Sub-
affected the targets' ability to attract attention. Well-practiced jects may be reminded of a better way to approach the task by
targets elicit an automatic attention response, which pulls atten- retrieving an example of an approach that was successful on
tion away from its current focus. In the transfer task, the former a similar problem in the past (see Ross, 1984). Thus, process
targets would pull attention away from the current targets (for- changes may reflect a discrete shift to a different strategy rather
mer distractors), causing failures of detection. Extensive post- than a gradual evolution or refinement of a single process. Al-
transfer practice would be necessary to build up the automatic ternatively, subjects may parse their experience into instances
in a different way than the experimenter imagines. In a category
attention responses to the current targets. Other studies pro-
judgment task, for example, a subject who is asked to decide
vided further evidence for automatic attention responses, so
whether a trout is a fish may encode the trial as another instance
Shiffrin and Schneider concluded that there was more to auto-
offish rather than the first instance of trout (cf. Barsalou & Ross,
maticity than categorization; automaticity involved categoriza-
1986), and show learning that depends on the number of pre-
tion and the automatic attention response.
sentations offish rather than trout.10 It may be difficult to sepa-
I do not deny the automatic attention response or its impor-
tance in some cases of automaticity. However, it is not the only
10
mechanism of automaticity; I argue that memory for instances 1 would like to thank Eliot Smith for suggesting this hedge. It may
is another. The automatic attention response may be specific to account for two aspects of Schneider and Fisk's (1984) data that seem
visual search tasks, in which the subject must focus spatially on troublesome for instance theory: (a) the null effect of number of in-
INSTANCE THEORY OF AUTOMATIZATION 517

rate these variations of instance-based learning from true pro- are several different methods for performing a task and subjects
cess-based learning, but the issue is important and the results know or have available all of the different methods when they
may be worth the effort. Whatever the outcome, we would learn first begin the task. There is no provision for learning new meth-
something important about the strategies people use in acquir- ods or improving old ones as practice progresses. In this respect,
ing and utilizing knowledge. it differs from the instance theory, which assumes that subjects
Retrieval of instances may play an important role in process- acquire new methods (i.e., memory for past solutions) that
based learning. Even when subjects learn an algorithm, the con- strengthened over practice (i.e., by accumulating similar in-
ditions necessary for instance-based learning are satisfied— stances). Grossman's theory may account well for the acquisi-
subjects may have no choice but to encode and retrieve specific tion of motor skills, like cigar rolling or typing, but it would not
instances. One could imagine interactions between instance- account for the lexical decision and alphabet arithmetic tasks
based and process-based learning in which retrieval of instances described earlier, which involve developing and strengthening
helps eliminate ineffective variations of the process (they would new methods.
be less likely to finish before memory retrieval) and execution
of the process helps eliminate ineffective memory retrievals. Newell and Rosenbloom

Relations to Other Theories of Skill Newell and Rosenbloom (1981; Rosenbloom & Newell, 1986)
Acquisition and Automaticity proposed a theory based on the idea of chunking. They argued
that subjects acquire responses to stimulus patterns or chunks,
Several existing theories predict a power-function speed-up which they can execute the next time the pattern occurs. They
from assumptions that differ substantially from the instance assumed that subjects learned patterns at several different lev-
theory. Many of the theories addressed different tasks and pro- els, some encompassing single elements, some encompassing
cesses than the instance theory addresses, and few of them deal several elements, and some encompassing the whole stimulus.
with automaticity directly. In this section, I compare those theo- They argued that the smaller patterns recur more frequently
ries with instance theory, but I do not intend to argue that in- than the larger patterns (e.g., moving a single pawn vs. opening
stance theory is more correct or more accurate than the other a chess game), so subjects will have more opportunities to mani-
theories. Instead, I view the theories as addressing different situ- fest their learning the smaller the pattern. This principle ac-
ations, and some theories may fit some situations better than counts for the power-function speed-up: Subjects benefit from
others; it seems likely to me that humans can learn in more than having learned the smaller patterns early in practice because
one way. Choice among theories is something like choice among they recur often. Later on, they will have learned most of the
statistical tests. One considers the assumptions of the statistical smaller patterns and will benefit no more from subsequent oc-
model and determines whether they can be satisfied in a given currences. Subjects will tend to benefit from larger patterns later
situation. If so, one uses the test; if not, one chooses another test. Analysis of variance, for example, is not wrong or inaccurate because it cannot deal with data from Bernoulli trials; it is simply inappropriate. When two different tests can be used on the same data, one can ask which is more powerful and more accurate, as in comparisons between parametric and nonparametric statistics; but first, one must be sure that the assumptions fit the situation.

in practice because they recur less often and because there are more of them to be learned. Thus, early learning will be rapid, as the smaller patterns are acquired and utilized, and later learning will be relatively slow, as the larger patterns are gradually learned and gradually utilized.

Newell and Rosenbloom's theory differs from the instance theory in that it assumes that there is no strengthening of the response to an individual chunk once it is learned. Their theory applies best to situations in which the stimuli are highly patterned, allowing chunks to be formed at many different levels. It would not apply well to the lexical decision and alphabet arithmetic experiments reported earlier because the tasks could not be performed by breaking the stimuli down into chunks and responding to chunks at different levels. None of the component letters predicted which response to make; subjects had to respond to the stimuli as wholes.

Crossman

Crossman (1959) proposed a theory of skill acquisition in which subjects sampled methods for performing a task until they found the fastest one. He assumed there was a pool of possible methods and that subjects would select one at random for each performance of the task. Afterward, they compared the speed of the method they selected with their average speed over several trials, and if the method was faster, they increased the probability of selecting it for the next trial. In the long run, the fastest method would have the highest probability of being selected. The theory predicts a power-function speed-up because it is easier to find a faster method early in practice than later on.

Crossman's (1959) theory applies to situations in which there
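Crossman's selection mechanism lends itself to a small simulation. The sketch below is not from the article: the method times, the running-average comparison, and the multiplicative weight update are illustrative assumptions chosen to mirror the verbal description (sample a method at random, compare it with average speed, and make faster-than-average methods more likely to be selected).

```python
import random

def crossman_sim(method_speeds, trials=500, boost=1.1, seed=1):
    """Sample a method at random on each trial; if it beats the running
    average time, raise its selection weight so it is chosen more often
    (a sketch of Crossman's method-selection mechanism)."""
    rng = random.Random(seed)
    weights = [1.0] * len(method_speeds)
    avg = None
    times = []
    for _ in range(trials):
        i = rng.choices(range(len(method_speeds)), weights)[0]
        t = method_speeds[i]
        avg = t if avg is None else 0.99 * avg + 0.01 * t
        if t < avg:
            weights[i] *= boost  # faster than average: prefer it next time
        times.append(t)
    return times

times = crossman_sim([300, 400, 500, 700])  # ms per method (illustrative)
print(sum(times[:50]) / 50, sum(times[-50:]) / 50)
```

Because a faster-than-average method is easy to find early in practice and hard to find late, the mean time per trial drops quickly at first and only slowly thereafter, the qualitative shape of the power-function speed-up.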
MacKay

MacKay (1982) proposed a theory in which learning occurs by strengthening connections between nodes in a network that describes the representation underlying perception and action. His theory produces the power-function speed-up in two ways: First, the connections are strengthened in proportion to the difference between the current strength and the maximum possible strength. The proportion is constant over learning, which means that changes in strength will be greater early in learning than they will be later on. Second, MacKay assumed that the representation was hierarchical and that the higher nodes were smaller in number and farther from their maximum strength than the lower nodes. Thus, early learning would be dominated by large changes in the strength of a few higher nodes, and later learning would be dominated by small changes in the strength of all nodes.

MacKay's (1982) theory differs primarily from the instance theory in that it is a strength theory. It assumes that practice strengthens connections between generic stimuli and generic responses, whereas the instance theory accounts for strengthening by encoding and retrieving separate representations of each encounter with a stimulus. MacKay intended his theory to apply to behavior that is more complex than the behavior to which the instance theory has been applied (i.e., generating sentences vs. performing lexical decisions), but that is not a major difference. It should be possible to decide between his theory and mine empirically, with experiments like the present Experiment 5 (also see Hintzman, 1976).
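MacKay's first mechanism, strengthening in proportion to the distance remaining to maximum strength, can be written in a few lines. The learning rate and maximum in this sketch are illustrative assumptions, not values from MacKay (1982); the point is only the shape of the rule: gains shrink geometrically as strength approaches its ceiling.

```python
def strengthen(current, maximum=1.0, rate=0.2):
    """One trial of MacKay-style learning: strength closes a fixed
    proportion of the remaining distance to its maximum."""
    return current + rate * (maximum - current)

strength, gains = 0.0, []
for _ in range(10):
    new = strengthen(strength)
    gains.append(new - strength)
    strength = new

print([round(g, 3) for g in gains])  # first gain is largest; gains shrink each trial
```

With a hierarchy, the same rule applied to a few far-from-ceiling higher nodes and many near-ceiling lower nodes yields MacKay's second source of rapid early learning.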
Anderson

Anderson (1982, 1987) proposed a theory of skill acquisition that has several different learning mechanisms. Some involved translating verbal instructions into procedures for performing a task, others involved generalizing and differentiating existing procedures, and one involved simple strengthening. The most important mechanisms in the present context are composition and strengthening.

Composition involves collapsing a series of steps into one by combining adjacent procedures. The amount of practice necessary to reduce a complex procedure to a single step depends on the number of steps and on the probability of combining adjacent steps. The more steps and the lower the probability of combining, the longer it will take. Composition reduces the number of steps by a constant proportion on each iteration of the procedure, producing rapid learning early in practice and slower learning later on.

Strengthening involves increasing the speed with which productions are executed. It operates mainly on composed productions, increasing the strength on each exposure by a constant proportion. Strengthening and composition work together to produce the power-function speed-up. The other learning mechanisms may contribute something to the speed-up, but composition and strengthening are the major contributors.

Anderson's (1982, 1987) theory differs from the instance theory in being much more detailed and embedded in a very powerful theory of general cognition. Some of its learning mechanisms are stimulus-specific, like those of instance theory, but others are more general, providing process-based learning. Thus, Anderson's theory would apply better than the instance theory to situations in which people learn general procedures rather than specific responses to specific stimuli. Anderson's composition process will work only when the structure of the task allows adjacent steps to be collapsed. It is unlikely to work in the lexical decision experiments reported above, in which one single-step process was replaced by another. Nor is it likely to work in alphabet arithmetic or arithmetic in general. Subjects did not learn to count by twos, then fours, and so on; instead, they switched from counting by ones to remembering.
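Composition's effect on step count can be sketched as a constant-proportion reduction per iteration. The sketch below treats the reduction deterministically, whereas Anderson's mechanism combines adjacent steps with some probability; the starting step count and combination probability are illustrative assumptions.

```python
def steps_after_practice(n_steps, p_combine, iterations):
    """Step count when each iteration of the procedure combines
    adjacent steps, cutting the count by a constant proportion
    (a deterministic sketch of composition), with a floor of one step."""
    counts = [float(n_steps)]
    for _ in range(iterations):
        counts.append(max(1.0, counts[-1] * (1.0 - p_combine)))
    return counts

counts = steps_after_practice(100, 0.3, 12)
print([round(c, 1) for c in counts])  # fast reduction early, slow later
```

The geometric reduction is what gives rapid improvement early in practice and diminishing returns later, the same qualitative signature as the power law.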
Schneider

Walter Schneider has been developing a theory of automatization for several years (Schneider, 1985; Schneider & Detweiler, 1987). Schneider's theory involves two kinds of learning, priority learning, which attracts attention to display positions that are likely to contain targets, and association learning, which connects stimuli directly to responses. The mechanism underlying both kinds of learning is proportional strengthening; after each successful trial, priority and associative strength are both increased by an amount proportional to the difference between their current strength and the maximum possible strength. The power-function speed-up is a consequence of proportional strengthening.

Schneider's theory differs from the instance theory in specific details. Schneider was concerned with the microstructure of skill acquisition and made what he considered to be physiologically plausible assumptions about the underlying representations and the processes that operate on them. So far, he has addressed automatization only in tasks that combine visual and memory search, developing a detailed model of initial performance on those tasks. It is not obvious how his theory would deal with other tasks at the same level of detail (e.g., lexical decision and alphabet arithmetic).

Schneider also interpreted the properties of automaticity differently than I do, having taken a more conventional position. For example, he characterized nonautomatic processing (which he called controlled processing) as "slow, generally serial, effortful, capacity-limited, [and] subject-controlled" (Schneider, 1985, p. 476), whereas I argue that there may be no characteristics common to all or even to most instances of nonautomatic processing. He assumed that controlled processing is necessary for learning; there is no further learning once processing is automatic. By contrast, I assume that learning occurs on each trial, whether processing is automatic or not. It may be harder to find evidence of learning once processing is automatic, for reasons I described earlier, but each trial continues to lay down a separate trace.

Finally, Schneider's theory differs from the instance theory in assuming two learning mechanisms instead of one. His association-learning mechanism is similar to the learning mechanism in the instance theory, but there is nothing in the instance theory corresponding to his priority learning mechanism. I believe priority learning is important primarily in visual search, in which targets must be discriminated from simultaneously presented distractors. Association learning (or instance learning) should be sufficient to account for situations that do not require such discrimination (e.g., memory search, lexical decision, and alphabet arithmetic).

Despite these differences, Schneider's theory is similar to the instance theory in assuming that automatization reflects a transition from algorithm-based processing ("controlled" processing) to memory-based processing. The language may be very different, but the underlying idea is basically the same. The most fundamental differences lie in the assumptions about strengthening; his is a strength theory, and mine is not. One could distinguish between the theories empirically, with experiments that distinguish strength theories from instance theories (e.g., Experiment 5; also see Hintzman, 1976).
Concluding Remarks

The purpose of this article was to present an alternative to the modal view of automatization as the gradual withdrawal of attention. The instance theory accounts for many of the facts addressed by the modal view without assuming any resource limitations, attentional or otherwise. Novice performance is limited by a lack of knowledge rather than by a lack of resources. The theory accounts for the power-function speed-up that is perhaps the most stable and least controversial property of automatization. In doing so, it predicted a power-function reduction in the standard deviation and a constraint between the mean and standard deviation (same exponent), which was confirmed in each experiment.

An important feature of the theory as developed so far is that it accounts for the facts of automaticity by assuming that only the knowledge base changes with practice. This assumption may strike many readers as implausible, but it accounts for a surprising amount of variance in a number of learning tasks. The theory may be viewed as a null hypothesis against which to evaluate competing theories that assume changes in the underlying processes with practice. The main point of the fits to experimental data is that there may not be much variance left for competing theories to explain.

References

Allport, D. A. (1980). Attention and performance. In G. Claxton (Ed.), Cognitive psychology (pp. 112-153). London: Routledge & Kegan Paul.

Anderson, J. R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369-406.

Anderson, J. R. (1987). Skill acquisition: Compilation of weak-method problem solutions. Psychological Review, 94, 192-210.

Ashcraft, M. H. (1982). The development of mental arithmetic: A chronometric approach. Developmental Review, 2, 213-236.

Bahrick, H. P., & Shelley, C. (1958). Time-sharing as an index of automatization. Journal of Experimental Psychology, 56, 288-293.

Balota, D. A., & Chumbley, J. I. (1984). Are lexical decisions a good measure of lexical access? The role of word frequency in the neglected decision stage. Journal of Experimental Psychology: Human Perception and Performance, 10, 340-357.

Barsalou, L. W., & Ross, B. H. (1986). The roles of automatic and strategic processing in sensitivity to superordinate and property frequency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12, 116-134.

Becker, C. A. (1976). Allocation of attention during visual word recognition. Journal of Experimental Psychology: Human Perception and Performance, 2, 556-566.

Bryan, W. L., & Harter, N. (1899). Studies of the telegraphic language: The acquisition of a hierarchy of habits. Psychological Review, 6, 345-375.

Carr, T. H., McCauley, C., Sperber, R. D., & Parmalee, C. M. (1982). Words, pictures, and priming: On semantic activation, conscious identification, and the automaticity of information processing. Journal of Experimental Psychology: Human Perception and Performance, 8, 757-777.

Chandler, P. J. (1965). Subroutine STEPIT: An algorithm that finds the values of the parameters which minimize a given continuous function [Computer program]. Bloomington: Indiana University, Quantum Chemistry Program Exchange.

Chase, W. G., & Simon, H. A. (1973). Perception in chess. Cognitive Psychology, 4, 55-81.

Cheng, P. W. (1985). Restructuring versus automaticity: Alternative accounts of skill acquisition. Psychological Review, 92, 414-423.

Craik, F. I. M., & Tulving, E. (1975). Depth of processing and retention of words in episodic memory. Journal of Experimental Psychology: General, 104, 268-294.

Crossman, E. R. F. W. (1959). A theory of the acquisition of speed-skill. Ergonomics, 2, 153-166.

Eich, J. M. (1982). A composite holographic associative recall model. Psychological Review, 89, 627-661.

Feustel, T. C., Shiffrin, R. M., & Salasoo, A. (1983). Episodic and lexical contributions to the repetition effect in word identification. Journal of Experimental Psychology: General, 112, 309-346.

Fisher, R. A., & Tippett, L. H. C. (1928). Limiting forms of the frequency distribution of the largest and smallest member of a sample. Proceedings of the Cambridge Philosophical Society, 24, 180-190.

Fisk, A. D., & Schneider, W. (1984). Memory as a function of attention, level of processing, and automatization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 181-197.

Francolini, C. M., & Egeth, H. (1980). On the nonautomaticity of "automatic" activation: Evidence of selective seeing. Perception and Psychophysics, 27, 331-342.

Fréchet, M. (1927). Sur la loi de probabilité de l'écart maximum [On the law of probability of the maximum deviation]. Annales de la Société Polonaise de Mathématique, Cracovie, 6, 93-116.

Greene, R. L. (1984). Incidental learning of event frequency. Memory and Cognition, 12, 90-95.

Gumbel, E. J. (1958). Statistics of extremes. New York: Columbia University Press.

Hasher, L., & Zacks, R. T. (1979). Automatic and effortful processes in memory. Journal of Experimental Psychology: General, 108, 356-388.

Hasher, L., & Zacks, R. T. (1984). Automatic processing of fundamental information: The case of frequency of occurrence. American Psychologist, 39, 1372-1388.

Henik, A., Friedrich, F. J., & Kellogg, W. A. (1983). The dependence of semantic relatedness effects on prime processing. Memory and Cognition, 11, 366-373.

Hintzman, D. L. (1976). Repetition and memory. In G. H. Bower (Ed.), The psychology of learning and motivation (pp. 47-91). New York: Academic Press.

Hintzman, D. L. (1986). "Schema abstraction" in a multiple-trace memory model. Psychological Review, 93, 411-428.

Hirst, W., Spelke, E. S., Reaves, C. C., Caharack, G., & Neisser, U. (1980). Dividing attention without alternation or automaticity. Journal of Experimental Psychology: General, 109, 98-117.

Hoffman, J. E., Nelson, B., & Houck, M. R. (1983). The role of attentional resources in automatic detection. Cognitive Psychology, 15, 379-410.

Hyde, T. S., & Jenkins, J. J. (1969). Differential effects of incidental tasks on the organization and recall of highly associated words. Journal of Experimental Psychology, 82, 472-481.

Jacoby, L. L., & Brooks, L. R. (1984). Nonanalytic cognition: Memory, perception, and concept learning. In G. H. Bower (Ed.), The psychology of learning and motivation (pp. 1-47). New York: Academic Press.

James, W. (1890). Principles of psychology. New York: Holt.

Jones, W. P., & Anderson, J. R. (1982). Semantic categorization and high-speed scanning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 8, 237-242.

Jonides, J. (1981). Voluntary versus automatic control over the mind's eye movement. In J. Long & A. D. Baddeley (Eds.), Attention and performance IX (pp. 187-203). Hillsdale, NJ: Erlbaum.

Jonides, J., Naveh-Benjamin, M., & Palmer, J. (1985). Assessing automaticity. Acta Psychologica, 60, 157-171.

Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice-Hall.

Kahneman, D., & Chajczyk, D. (1983). Tests of the automaticity of reading: Dilution of Stroop effects by color-irrelevant stimuli. Journal of Experimental Psychology: Human Perception and Performance, 9, 497-509.

Kahneman, D., & Henik, A. (1981). Perceptual organization and attention. In M. Kubovy & J. Pomerantz (Eds.), Perceptual organization (pp. 181-211). Hillsdale, NJ: Erlbaum.

Kahneman, D., & Miller, D. T. (1986). Norm theory: Comparing reality to its alternatives. Psychological Review, 93, 136-153.

Kahneman, D., & Treisman, A. M. (1984). Changing views of attention and automaticity. In R. Parasuraman & R. Davies (Eds.), Varieties of attention (pp. 29-61). New York: Academic Press.

Kahneman, D., Treisman, A., & Burkell, J. (1983). The cost of visual filtering. Journal of Experimental Psychology: Human Perception and Performance, 9, 510-522.

Karlin, M. B., & Bower, G. H. (1976). Semantic category effects in visual word search. Perception and Psychophysics, 19, 417-424.

Kolers, P. A. (1975). Memorial consequences of automatized encoding. Journal of Experimental Psychology: Human Learning and Memory, 1, 689-701.

Kucera, H., & Francis, W. N. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.

LaBerge, D. (1981). Automatic information processing: A review. In J. Long & A. D. Baddeley (Eds.), Attention and performance IX (pp. 173-186). Hillsdale, NJ: Erlbaum.

LaBerge, D., & Samuels, S. J. (1974). Toward a theory of automatic information processing in reading. Cognitive Psychology, 6, 293-323.

Ladefoged, P., Silverstein, R., & Papcun, G. (1973). Interruptibility of speech. Journal of the Acoustical Society of America, 54, 1105-1108.

Landauer, T. K. (1975). Memory without organization: Properties of a model with random storage and undirected retrieval. Cognitive Psychology, 7, 495-531.

Levelt, W. J. M. (1983). Monitoring and self-repair in speech. Cognition, 14, 41-104.

Logan, G. D. (1978). Attention in character classification: Evidence for the automaticity of component stages. Journal of Experimental Psychology: General, 107, 32-63.

Logan, G. D. (1979). On the use of a concurrent memory load to measure attention and automaticity. Journal of Experimental Psychology: Human Perception and Performance, 5, 189-207.

Logan, G. D. (1980). Attention and automaticity in Stroop and priming tasks: Theory and data. Cognitive Psychology, 12, 523-553.

Logan, G. D. (1982). On the ability to inhibit complex movements: A stop-signal study of typewriting. Journal of Experimental Psychology: Human Perception and Performance, 8, 778-792.

Logan, G. D. (1983). On the ability to inhibit simple thoughts and actions: 1. Stop-signal studies of decision and memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 585-606.

Logan, G. D. (1985a). On the ability to inhibit simple thoughts and actions: 2. Stop-signal studies of repetition priming. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 675-691.

Logan, G. D. (1985b). Skill and automaticity: Relations, implications and future directions. Canadian Journal of Psychology, 39, 367-386.

Logan, G. D. (1988). Repetition priming and automaticity: Common underlying mechanisms. Manuscript submitted for publication.

Logan, G. D., & Klapp, S. T. (1988). Automatizing alphabet arithmetic: Evidence for an instance theory of automatization. Manuscript submitted for publication.

Long, J. (1976). Visual feedback and skilled keying: Differential effects of masking the printed copy and the keyboard. Ergonomics, 19, 93-110.

MacKay, D. G. (1982). The problem of flexibility, fluency, and speed-accuracy trade-off in skilled behavior. Psychological Review, 89, 483-506.

Mandler, G. (1967). Organization and memory. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation (Vol. 1, pp. 327-372). New York: Academic Press.

Marcel, A. J. (1983). Conscious and unconscious perception: An approach to the relations between phenomenal experience and perceptual processes. Cognitive Psychology, 15, 238-300.

Mazur, J. E., & Hastie, R. (1978). Learning as accumulation: A reexamination of the learning curve. Psychological Bulletin, 85, 1256-1274.

McKoon, G., & Ratcliff, R. (1980). Priming in item recognition: The organization of propositions in memory for text. Journal of Verbal Learning and Verbal Behavior, 19, 369-386.

McLeod, P., McLaughlin, C., & Nimmo-Smith, I. (1985). Information encapsulation and automaticity: Evidence from the visual control of finely timed actions. In M. I. Posner & O. S. Marin (Eds.), Attention and performance XI (pp. 391-406). Hillsdale, NJ: Erlbaum.

Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207-238.

Medin, D. L., & Smith, E. E. (1984). Concepts and concept formation. Annual Review of Psychology, 35, 113-138.

Meyer, D. E., & Schvaneveldt, R. W. (1971). Facilitation in recognizing pairs of words: Evidence of a dependence between retrieval operations. Journal of Experimental Psychology, 90, 227-234.

Murdock, B. B. (1974). Human memory: Theory and data. Hillsdale, NJ: Erlbaum.

Murdock, B. B. (1982). A theory for the storage and retrieval of item and associative information. Psychological Review, 89, 609-626.

Murdock, B. B. (1983). A distributed memory model for serial-order information. Psychological Review, 90, 316-338.

Murphy, G. L., & Medin, D. L. (1985). The role of theories in conceptual coherence. Psychological Review, 92, 289-316.

Naveh-Benjamin, M. (1987). Coding of spatial location information: An automatic process? Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 595-605.

Naveh-Benjamin, M., & Jonides, J. (1984). Maintenance rehearsal: A two-component analysis. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 369-385.

Naveh-Benjamin, M., & Jonides, J. (1986). On the automaticity of frequency coding: Effects of competing task load, encoding strategy, and intention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12, 378-386.

Navon, D. (1984). Resources—A theoretical soup stone? Psychological Review, 91, 216-234.

Navon, D., & Gopher, D. (1979). On the economy of the human-processing system. Psychological Review, 86, 214-255.

Neely, J. H. (1977). Semantic priming and retrieval from lexical memory: Roles of inhibitionless spreading activation and limited-capacity attention. Journal of Experimental Psychology: General, 106, 226-254.

Neisser, U. (1976). Cognition and reality. San Francisco: Freeman.

Neumann, O. (1984). Automatic processing: A review of recent findings and a plea for an old theory. In W. Prinz & A. F. Sanders (Eds.), Cognition and motor processes (pp. 255-293). Berlin: Springer-Verlag.

Newell, A., & Rosenbloom, P. S. (1981). Mechanisms of skill acquisition and the law of practice. In J. R. Anderson (Ed.), Cognitive skills and their acquisition (pp. 1-55). Hillsdale, NJ: Erlbaum.

Nissen, M. J., & Bullemer, P. (1987). Attentional requirements of learning. Cognitive Psychology, 19, 1-32.

Paap, K. R., & Ogden, W. C. (1981). Letter encoding is an obligatory but capacity-demanding operation. Journal of Experimental Psychology: Human Perception and Performance, 7, 518-527.

Pirolli, P. L., & Anderson, J. R. (1985). The role of practice in fact retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 136-153.

Posner, M. I., & Boies, S. J. (1971). Components of attention. Psychological Review, 78, 391-408.

Posner, M. I., & Klein, R. (1973). On the functions of consciousness. In S. Kornblum (Ed.), Attention and performance IV (pp. 21-35). New York: Academic Press.

Posner, M. I., & Snyder, C. R. R. (1975). Attention and cognitive control. In R. L. Solso (Ed.), Information processing and cognition: The Loyola symposium (pp. 55-85). Hillsdale, NJ: Erlbaum.

Rabbitt, P. M. A. (1978). Detection of errors by skilled typists. Ergonomics, 21, 945-958.

Rabbitt, P. M. A. (1981). Sequential reactions. In D. H. Holding (Ed.), Human skills (pp. 153-175). New York: Wiley.

Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59-108.

Ratcliff, R., & McKoon, G. (1978). Priming in item recognition: Evidence for the propositional structure of sentences. Journal of Verbal Learning and Verbal Behavior, 17, 403-417.

Ratcliff, R., & McKoon, G. (1981). Automatic and strategic components of priming in recognition. Journal of Verbal Learning and Verbal Behavior, 20, 204-215.

Reason, J. T. (1984). Lapses of attention in everyday life. In R. Parasuraman & R. Davies (Eds.), Varieties of attention (pp. 515-549). New York: Academic Press.

Reason, J. T., & Mycielska, K. (1982). Absent minded: The psychology of mental lapses and everyday errors. Englewood Cliffs, NJ: Prentice-Hall.

Regan, J. E. (1981). Automaticity and learning: Effects of familiarity on naming letters. Journal of Experimental Psychology: Human Perception and Performance, 7, 180-195.

Rosenbloom, P. S., & Newell, A. (1986). The chunking of goal hierarchies: A generalized model of practice. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), Machine learning: An artificial intelligence approach (Vol. 2, pp. 247-288). Los Altos, CA: Morgan Kaufmann.

Ross, B. H. (1984). Remindings and their effects in learning a cognitive skill. Cognitive Psychology, 16, 371-416.

Ryan, C. (1983). Reassessing the automaticity-control distinction: Item recognition as a paradigm case. Psychological Review, 90, 171-178.

Salasoo, A., Shiffrin, R. M., & Feustel, T. C. (1985). Building permanent memory codes: Codification and repetition effects in word identification. Journal of Experimental Psychology: General, 114, 50-77.

Salthouse, T. A. (1986). Perceptual, cognitive, and motoric aspects of transcription typing. Psychological Bulletin, 99, 303-319.

Schneider, W. (1985). Toward a model of attention and the development of automatic processing. In M. I. Posner & O. S. Marin (Eds.), Attention and performance XI (pp. 475-492). Hillsdale, NJ: Erlbaum.

Schneider, W., & Detweiler, M. (1987). A connectionist/control architecture for working memory. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 21, pp. 53-119). New York: Academic Press.

Schneider, W., Dumais, S. T., & Shiffrin, R. M. (1984). Automatic and control processing and attention. In R. Parasuraman & R. Davies (Eds.), Varieties of attention (pp. 1-27). New York: Academic Press.

Schneider, W., & Fisk, A. D. (1982). Degree of consistent training: Improvements in search performance and automatic process development. Perception and Psychophysics, 31, 160-168.

Schneider, W., & Fisk, A. D. (1984). Automatic category search and its transfer. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 1-15.

Schneider, W., & Shiffrin, R. M. (1977). Controlled and automatic human information processing: I. Detection, search, and attention. Psychological Review, 84, 1-66.

Schneider, W., & Shiffrin, R. M. (1985). Categorization (restructuring) and automatization: Two separable factors. Psychological Review, 92, 424-428.

Shiffrin, R. M., & Dumais, S. T. (1981). The development of automatism. In J. R. Anderson (Ed.), Cognitive skills and their acquisition (pp. 111-140). Hillsdale, NJ: Erlbaum.

Shiffrin, R. M., & Schneider, W. (1977). Controlled and automatic human information processing: II. Perceptual learning, automatic attending, and a general theory. Psychological Review, 84, 127-190.

Siegler, R. S. (1987). The perils of averaging data over strategies: An example from children's addition. Journal of Experimental Psychology: General, 116, 250-264.

Smith, E. R., & Lerner, M. (1986). Development of automatism of social judgments. Journal of Personality and Social Psychology, 50, 246-259.

Smith, M. C. (1979). Contextual facilitation in a letter search task depends on how the prime is processed. Journal of Experimental Psychology: Human Perception and Performance, 5, 239-251.

Smith, M. C., Theodor, L., & Franklin, P. E. (1983). The relationship between contextual facilitation and depth of processing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 697-712.

Spelke, E., Hirst, W., & Neisser, U. (1976). Skills of divided attention. Cognition, 4, 215-230.

Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18, 643-662.

Townsend, J. T., & Ashby, F. G. (1983). Stochastic modeling of elementary psychological processes. Cambridge, England: Cambridge University Press.

Wegner, D. M., Schneider, D. J., Carter, S., & White, T. (1987). Paradoxical effects of thought suppression. Journal of Personality and Social Psychology, 53, 5-13.

Wickens, C. D. (1984). Processing resources in attention. In R. Parasuraman & R. Davies (Eds.), Varieties of attention (pp. 63-102). New York: Academic Press.

Zbrodoff, N. J. (1979). Development of counting and remembering as strategies for performing simple arithmetic in elementary school children. Unpublished master's thesis, University of Toronto, Toronto.

Zbrodoff, N. J., & Logan, G. D. (1986). On the autonomy of mental processes: A case study of arithmetic. Journal of Experimental Psychology: General, 115, 118-130.

Appendix A

Formal Analysis of the Power-Function Speed-Up

The instance theory assumes that (a) each encounter with a stimulus is encoded into memory, (b) all prior encounters with a stimulus are retrieved when the stimulus is encountered again, (c) each encounter is stored and retrieved independently, and (d) the subject can respond as soon as the first trace is retrieved from memory. Assuming further that the distribution of retrieval times is the same for each of the n traces, reaction times can be modeled as the minimum of n independent samples from the same distribution. The purpose of Appendix A is to show that reaction times will decrease as a power function of sample size, which leads to the prediction that means and standard deviations decrease as power functions of practice with the same exponent. Also addressed is the race between the algorithm and memory retrieval, examining the effect of the algorithm on the power-function speed-up and the number of trials required for memory to dominate the algorithm.

The distribution of minima from an arbitrary distribution can be written as a function of the initial distribution. The cumulative distribution function is

F_n(t) = 1 - [1 - F(t)]^n, (A1)

and the probability density function is

f_n(t) = n[1 - F(t)]^(n-1) f(t) (A2)

(Gumbel, 1958, p. 76). However, it is difficult to derive general predictions for changes in the mean and standard deviation with sample size, n, that would be true for every initial distribution. Instead, we derived specific predictions for two initial distributions, the exponential and the Weibull, and we derived general predictions for the class of positive-valued distributions by working backward from the asymptotic distribution of minima.
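Equation A1 can be checked numerically against simulated minima. In the sketch below, the parent distribution, its rate, the criterion t, and the sample counts are illustrative choices, not values from the experiments.

```python
import math
import random

def empirical_min_cdf(n, t, w=1.0, reps=20000, seed=7):
    """Proportion of simulated minima (each the smallest of n
    exponential samples with rate w) that fall at or below t."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(reps)
        if min(rng.expovariate(w) for _ in range(n)) <= t
    )
    return hits / reps

n, t, w = 5, 0.2, 1.0
parent_cdf = 1 - math.exp(-w * t)      # F(t) for the parent distribution
predicted = 1 - (1 - parent_cdf) ** n  # Equation A1
print(round(predicted, 3), round(empirical_min_cdf(n, t, w), 3))
```

The empirical proportion of minima below t should agree with the Equation A1 value up to sampling error, for any parent distribution one substitutes.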
The exponential was chosen for the initial derivation because it is easy to work with; the entire model can be expressed with exponential distributions. However, predictions from the exponential model deviate systematically from empirical data, so aspects of the model were explored in a particular generalization of the exponential, the Weibull distribution. The Weibull was chosen for three reasons: First, with appropriate parameterization, it resembles reaction-time distributions; second, it avoids the deviation from the data found with the exponential model; and third, as discussed in the final section, it turns out to be an important asymptotic distribution of minima from positive-valued distributions.

The Exponential Distribution

The exponential distribution is commonly used in models of memory retrieval because of its empirical success and analytical tractability (Murdock, 1974; Ratcliff, 1978). Its distribution function is

F(t) = 1 - exp[-wt],

and its density function is

f(t) = w exp[-wt].

From Equation A1, the distribution function for the minimum of n samples from the same exponential distribution is

F_n(t) = 1 - exp[-nwt],

and from Equation A2, the density function is

f_n(t) = n exp[-(n - 1)wt] w exp[-wt] = nw exp[-nwt].

Thus, the distribution of minima from an exponential distribution is itself an exponential distribution, with a rate constant n times larger than the rate constant for the initial distribution.

The mean of the distribution decreases as a power function of n with an exponent of -1:

T_n = (nw)^(-1) = (n^(-1))(w^(-1)). (A3)

Since the standard deviation of an exponential distribution equals the mean, Equation A3 implies that the standard deviation also decreases as a power function of n with the same exponent, -1. This was the prediction tested in Experiments 1-4. The exponential model makes a stronger prediction, that the mean equals the standard deviation. Later, the results are generalized to other distributions so that the exponent need not equal -1, and the mean need not equal the standard deviation.

The exponential distribution permits an easy analysis of the race between the algorithm and memory retrieval. In general, the distribution of the minima from two distributions, f_a(t) and f_m(t), is

f(t) = f_a(t)[1 - F_m(t)] + f_m(t)[1 - F_a(t)].

If the distribution of finishing times for the algorithm, f_a(t), and the memory process, f_m(t), are exponential, then

f(t) = w_a exp[-w_a t] exp[-nw_m t] + nw_m exp[-nw_m t] exp[-w_a t]
     = (w_a + nw_m) exp[-(w_a + nw_m)t].

The parameters of the resulting distribution are the sums of the parameters of the parent distributions. The mean (and the standard deviation) of the resulting distribution,

T_n = (w_a + nw_m)^(-1), (A4)

decreases as n increases. If the mean for the algorithm is the same as the mean for memory, then the mean and the standard deviation of the resulting distribution decrease as a power function of n + 1 with an exponent of -1. To the extent that the algorithm mean differs from the memory mean, this relation will be distorted. However, the distortion will decrease as n increases and memory wins the race more often. At some point, memory will win virtually all the time.

The probability that the algorithm will win the race can be derived easily if the underlying distributions are exponential. If so, memory retrieval and the algorithm can be viewed as simultaneous Poisson processes with rates w_a and nw_m, respectively. Then the probability that the algorithm finishes first is

P(algorithm first) = w_a/(w_a + nw_m),

which decreases rapidly as n increases. If the mean for the algorithm equals the mean for the memory process, then the probability that the algorithm finishes first equals 1/(n + 1); it decreases as a power function of n + 1 with an exponent of -1.

In summary, the instance theory can be modeled as a race between two exponential distributions, one representing finishing times for the algorithm and one representing finishing times from a memory process with n independent traces. That model predicts (a) a power-function reduction in mean reaction time, (b) a power-function reduction in the standard deviation of reaction times, and (c) a common exponent for means and standard deviations. These predictions are not compromised much by the race with the algorithm, provided that the mean for the algorithm is reasonably close to the mean for memory retrieval.
= (nw)exp[-Hw?]. The exponential distribution imposes severe restrictions on the pre-
INSTANCE THEORY OF AUTOMATIZATION 523

dictions, namely, that the exponent of the power function equals -1 and The distribution of minima from a Weibull distribution is itself a
that the mean equals the standard deviation. These restrictions create Weibull distribution. As sample size increases, the distribution of min-
problems for the model. Most exponents from empirical power func- ima contracts as a power function of sample size with an exponent of
tions are less than 1 in absolute magnitude (Newell & Rosenbloom, - 1/c. Thus, the mean and the standard deviation and all of the quan tiles
1981), and the reduction in the mean rarely equals the reduction in of the distribution should decrease as power functions of n with the
the standard deviation (see Experiments 1-4). How can the exponential same exponent, - 1/c. Experiments 1-4 tested the equality of the expo-
model deal with these problems? nents for means and standard deviations.
One possibility is to relax (and make more realistic) the assumption The Weibull imposes less severe restrictions on the predicted power
that each and every encounter with a stimulus is stored and retrieved. functions than does the exponential. The exponent must be the same
If instead, each encounter was stored and retrieved with probability p, for the mean and standard deviation, but it is not fixed at any particular
the observed power function would be value. Since c is 1 or larger in most applications, the exponents of power
functions, — 1/c, should be I or less, as is commonly observed (Newell
& Rosenbloom, 1981). Moreover, the mean of a Weibull is not equal to
Taking logs of both sides yields the standard deviation.
So far, a power-function speed-up and reduction in standard deviation
fclogn = -(logp + logn). has been predicted from two specific initial distributions. The next sec-
tion attempts to generalize these results to the class of positive-valued
Solving for k yields
distributions, of which the exponential and Weibull are members.
k = -(logp + log«)/logn,
Stability Postulate
which is less than 1 if p is less than 1 (since logs of proportions are
negative), and it decreases as p decreases (since logs of proportions in- The power law can be generalized further by working backward from
crease in absolute magnitude as the proportion decreases); it predicts the asymptotic distribution of the minimum. Most readers will be famil-
slower learning, the lower the probability of storage and retrieval, which iar with the normal distribution as the asymptotic distribution of sums
is reasonable. or averages; by the central limit theorem, the distribution of sums or
Thus, the exponential model, supplemented by the (reasonable) as- averages from an arbitrary initial distribution will converge on the nor-
sumption of imperfect storage and retrieval, predicts power functions mal distribution as sample size increases, regardless of the form of the
with exponents less than 1 in absolute magnitude, which is commonly initial distribution. As it turns out, there are three distributions that are
observed (Newell & Rosenbloom, 1981). However, the exponential asymptotic in that sense for minima. The distribution of minima from
model still predicts an identical reduction in the mean and standard an arbitrary initial distribution will converge on one of the three asymp-
deviation, which is not generally observed (see, e.g., Experiments 1-4). totic distributions as sample size increases (Gumbel, 1958). Which of
the three it converges on depends on rather general properties of the
initial distribution (whether the extremes are limited or unlimited).
Weibull Distribution
Only one asymptotic distribution—the third—is relevant to reaction
The Weibull distribution is a generalization of the exponential distri- time data because it applies to random variables with only positive val-
bution. Its distribution function is ues. Interestingly, the third asymptotic distribution is the Weibull.
Proving that the asymptotic distribution of minima follows the power
F(t) = 1 - expHt/af], law is important because it implies that minima from any positive-val-
ued distribution will eventually conform to the power law as sample size
and its probability density function is
increases. At some point, the initial distribution will converge on the
/(«) = (cAOO/ar 'expH'W asymptote and what is true of the asymptotic distribution will be true
of samples from the initial distribution. Before the distribution con-
Iff = 1, the Weibull reduces to an exponential distribution with rate verges, the power law may be approximately correct.
parameter I/a. The Weibull is a flexible distribution. The parameter c The power-law predictions derive from the stability postulate, which
determines its shape, ranging from exponential when c = 1 to normal was used initially by Frechet (1927) and Fisher and Tippett (1928) to
looking when c = 5. Intermediate values produce density functions that derive the three asymptotic distributions (Gumbel, 1958). The follow-
resemble reaction time distributions— truncated on the left, with a long ing proof is a condensed version of one given by Gumbel (pp. 157-
tail on the right Thus, the Weibull may provide a reasonable approxi- 162). It deals with the maximum rather than the minimum to simplify
mation for retrieval times from a variety of processes. calculation, but the results apply to the minimum as well as the maxi-
From Equation A 1 , the distribution function for the minimum of n mum; what is true oft for the maximum is true of—/ for the minimum.
samples from the same Weibull distribution is According to the stability postulate, a distribution is stable with re-
spect to its maximum (or minimum) if the distribution of maxima
FM = ( - {exp[-(I/an}"
(minima) sampled from it retains the same form as the initial distribu-
tion as sample size increases, changing only in its scale or in translation
of its origin. As shown earlier, the exponential and Weibull are stable in
= 1 - exp[-«n"W]
this sense. Analogous to Equation A2, the distribution function for the
maximum of n independent samples from the same distribution is

Thus, FM = F"(t).

f, = »-"'f The stability postulate states that the probability that the largest value
is / or less after n samples is equal to the probability of a linear function
and oft derived from the initial distribution. That is,

a, = n '"<T. (AS).
524 GORDON D. LOGAN

where aa and bn are both functions of n, the sample size. For the third
asymptotic distribution, (is negative with a maximum of zero (for the Consequently,
minimum, r is positive with a minimum of zero). The additive constant,
bn, drops out, and Equation A5 reduces to fit)" =

f, = iTT,
F"(t) = (A6) and

Equation A6 implies that raising F(t) to the mth power is the same as
multiplying tbyan; that is, For the third asymptotic distribution, c is positive. Thus, the scale of
the distribution contracts as a power function of n, the sample size,
F m"(t) = F(amant). whether we are dealing with the maximum (in which case, 1 is negative
and bounded above at zero) or the minimum (in which case, t is positive
But from Equation A6, and bounded below at zero). This means that all quantiles of the distri-
bution of minima should decrease as a power function of n, and all the
power functions should have the same exponent, -c. In particular, the
mean and the standard deviation should both decrease as power func-
This implies the functional equation: tions of n with the same exponent, —c.
Gumbel (1958) then derived the form of the three asymptotic distri-
"m = ama,. butions, showing that the third is a Weibull distribution. But that is
beyond the requirements of this article. The important point for now is
The solution to Equation A7 is a power function, that an arbitrary distribution that is bounded on the left (i.e., has only
positive values) will converge on the Weibull distribution as sample size
«„ = if, increases. There may be departures from the power law for small sample
sizes, before the distribution of minima reaches asymptote, but after-
because ward the distribution will contract following the power law.
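The predictions of the exponential race model are easy to check numerically. The sketch below is illustrative only (the helper `simulate_race` and the rate values w_a = w_m = 1 are assumptions for the demonstration, not part of the original article): it races one exponential algorithm-finishing time against the minimum of n exponential retrieval times and compares the simulated mean and the algorithm's win probability with Equation A4 and with w_a/(w_a + n w_m).

```python
import random

def simulate_race(w_a, w_m, n, trials=100_000, seed=1):
    """Race one exponential algorithm finishing time (rate w_a)
    against the minimum of n exponential retrieval times (rate w_m each)."""
    rng = random.Random(seed)
    total = 0.0
    algorithm_wins = 0
    for _ in range(trials):
        alg = rng.expovariate(w_a)
        mem = min(rng.expovariate(w_m) for _ in range(n))
        total += min(alg, mem)          # the response is whichever finishes first
        algorithm_wins += alg < mem
    return total / trials, algorithm_wins / trials

w_a = w_m = 1.0  # equal algorithm and memory rates (the equal-means case)
for n in (1, 2, 4, 8):
    mean_rt, p_alg = simulate_race(w_a, w_m, n)
    print(n,
          round(mean_rt, 3), round(1 / (w_a + n * w_m), 3),  # Equation A4
          round(p_alg, 3), round(w_a / (w_a + n * w_m), 3))  # P(algorithm first)
```

With equal rates, both the mean and the algorithm's win probability should track 1/(n + 1), the power function with exponent -1 described above.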

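The Weibull contraction result, that the mean, standard deviation, and every quantile of the minima distribution shrink by the common factor n^(-1/c), can also be checked by simulation. This sketch is not from the article; the helper name `weibull_minima` and the parameter choices (a = 1, c = 2) are arbitrary. It uses Python's `random.weibullvariate`, whose first argument is the scale and whose second is the shape c.

```python
import random
import statistics

def weibull_minima(a, c, n, trials=100_000, seed=2):
    """Draw `trials` minima, each the smallest of n samples
    from a Weibull with scale a and shape c."""
    rng = random.Random(seed)
    return [min(rng.weibullvariate(a, c) for _ in range(n))
            for _ in range(trials)]

a, c = 1.0, 2.0
base = weibull_minima(a, c, 1)  # n = 1: the initial distribution itself
m1, s1 = statistics.fmean(base), statistics.pstdev(base)
for n in (2, 4, 8):
    mins = weibull_minima(a, c, n)
    ratio_mean = statistics.fmean(mins) / m1
    ratio_sd = statistics.pstdev(mins) / s1
    # Both ratios should match the predicted contraction n**(-1/c).
    print(n, round(ratio_mean, 3), round(ratio_sd, 3), round(n ** (-1 / c), 3))
```

Because the minima distribution is again a Weibull that differs only in scale, the mean and standard deviation shrink by the same factor, which is the common-exponent prediction tested in Experiments 1-4.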
Appendix B

Method and Results From Experiments 1-3

The details of the method and results from the lexical decision experiments are reported in this section.

Experiment 1

Method

Subjects. In all, 24 subjects were tested. Some subjects came from introductory psychology courses and received course credit for participating; some were volunteers and were paid $4 for participating.

Apparatus and stimuli. The stimuli were four-letter words and nonwords. The words were common nouns selected from the Kucera and Francis (1967) frequency norms. The words ranged in absolute frequency from 7 to 923 per million, with a mean of 56.05. The nonwords were constructed by replacing one letter of each word. In most cases, the nonwords were pronounceable.

The stimuli were displayed on a point-plot CRT (Tektronix Model 604, equipped with P31 phosphor), controlled by a PDP 11/03 computer. They were displayed as uppercase letters, formed by illuminating about 20 points in a 5 × 7 matrix. They were displayed at the center of the screen. Viewed at a distance of 60 cm (maintained by a headrest), each word and nonword subtended 2.67° × .57° of visual angle.

Each trial began with a fixation point exposed in the center of the screen and a 900-Hz warning tone. The tone and fixation point were presented for 500 ms, followed immediately by the word or nonword for that trial, which was also presented for 500 ms. After the word or nonword was extinguished, a 1,500-ms intertrial interval began.

Subjects responded by pressing the two outermost telegraph keys in a panel of eight mounted on a moveable board that sat between the headrest and the CRT.

Procedure. Subjects were told that on each trial they would see a word or a nonword and that their task was to indicate whether a word or a nonword had appeared. They were told to respond as quickly as possible without making more than 10% errors. One half of the subjects pressed the right-most key of the panel of eight to indicate that they saw a word and pressed the left-most key to indicate that they saw a nonword, whereas the other half did the opposite. Subjects were not told anything about the schedule of stimulus repetitions or even that some stimuli would be repeated.

There were two main conditions: the experimental condition, in which the same set of words and nonwords was exposed repeatedly, and the control condition, in which new words and nonwords were shown on each trial. Each condition involved blocks of 20 trials, 10 with words and 10 with nonwords. In the experimental condition, a set of 10 four-letter words and 10 four-letter nonwords was chosen randomly from the population of 340 stimuli. Each word and nonword was shown once per block in a different random order in each block, for a total of 16 blocks. Thus, the lag between successive repetitions varied from 1 to 40, with a mean of 20.

In the control condition, a different set of 10 four-letter words and 10 four-letter nonwords was chosen randomly without replacement from the set of 340 stimuli for each of the 16 blocks. Each control stimulus appeared in only one block and never appeared in the experimental blocks.

A different random sampling of stimuli was prepared for each subject, and the order of trials within blocks was randomized separately for each subject.

Results

An ANOVA on the reaction-time data revealed the following effects: The main effect of number of presentations was significant, F(15, 345) = 7.52, p < .01, MSe = 6,809.85, as was the main effect of control versus experimental conditions, F(1, 23) = 31.15, p < .01, MSe = 52,687.35, and the main effect of word versus nonword, F(1, 23) = 43.11, p < .01, MSe = 31,748.23. Presentations and conditions interacted significantly, F(15, 345) = 2.92, p < .01, MSe = 7,186.63, reflecting the extra benefit from repeating specific stimuli in the experimental condition. Presentations and word versus nonword interacted significantly, F(15, 345) = 4.23, p < .01, MSe = 2,177.17, reflecting the greater effect of repetition with nonwords than with words. In addition, there were significant interactions between conditions and word versus nonword, F(1, 23) = 8.59, p < .01, MSe = 7,904.85, and between presentations, conditions, and word versus nonword, F(15, 345) = 1.84, p < .05, MSe = 2,306.41.

An ANOVA on the standard deviations revealed a significant main effect of the number of presentations, F(15, 345) = 3.44, p < .01, MSe = 5,011.76, and a significant main effect of control versus experimental conditions, F(1, 23) = 29.30, p < .01, MSe = 19,509.57. In addition, there were significant interactions between presentations and conditions, F(15, 345) = 2.32, p < .01, MSe = 5,407.23, and between presentations and word versus nonword, F(15, 345) = 2.46, p < .01, MSe = 3,656.31.

The error rates are presented in Table B1. Several subjects had error rates of zero in many cells of the design, so no statistical analysis was attempted. Note, however, that error rates tended to be lower for repeated items than for new items and lower for words than for nonwords.

Table B1
Error Rates From Lexical Decision Task in Experiment 1

Presentation  Old word  Old nonword  New word  New nonword
 1             .05       .11          .08       .06
 2             .05       .07          .07       .06
 3             .02       .07          .08       .04
 4             .02       .07          .10       .09
 5             .03       .05          .04       .05
 6             .02       .05          .04       .05
 7             .02       .05          .05       .08
 8             .04       .04          .09       .07
 9             .03       .06          .07       .08
10             .03       .02          .10       .05
11             .04       .06          .10       .05
12             .03       .04          .08       .05
13             .04       .05          .06       .04
14             .04       .05          .05       .03
15             .03       .03          .07       .04
16             .03       .04          .08       .05

Experiment 2

Method

Subjects. In all, 26 subjects were recruited from the population sampled in Experiment 1.

Apparatus and stimuli. Apparatus and stimuli were the same as in Experiment 1.

Procedure. The procedure was the same as in Experiment 1, with the following exceptions: In each block of the experiment, one word and one nonword were presented once, one was presented twice, one 4 times, one 6 times, one 8 times, and one 10 times, for a total of 62 trials. The lag between successive repetitions varied randomly. The mean and range of lags decreased as the number of repetitions increased.

A different sample of words and nonwords was selected for each block, for a total of 10 blocks. A different random sample of stimuli was used for each subject, and the order of trials within blocks was randomized separately for each subject.

Subjects were instructed as in Experiment 1.

Results

An ANOVA on the reaction times revealed a main effect of number of presentations, F(9, 225) = 65.41, p < .01, MSe = 961.51; a main effect of word versus nonword, F(1, 25) = 60.77, p < .01, MSe = 5,276.15; and an interaction between presentations and word versus nonword, F(9, 225) = 5.28, p < .01, MSe = 650.99.

An ANOVA on the standard deviations revealed a main effect of number of presentations, F(9, 225) = 12.37, p < .01, MSe = 1,279.93, a main effect of word versus nonword, F(1, 25) = 14.65, p < .01, MSe = 2,955.69, and a significant interaction between them, F(9, 225) = 2.11, p < .05, MSe = 1,118.63.

The error rates are presented in Table B2. Again, no statistical analyses were attempted, but the error rates tended to decrease with repetition and tended to be lower for words than for nonwords.

Table B2
Error Rates for the Lexical Decision Task in Experiment 2

Presentation  Word  Nonword
 1             .06    .05
 2             .02    .04
 3             .02    .03
 4             .01    .03
 5             .02    .03
 6             .02    .02
 7             .01    .02
 8             .02    .03
 9             .00    .03
10             .01    .02

Table B3
Error Rates From Lexical Decision Task in Experiment 3

                  Lag = 12         Lag = 24
Presentation   Word  Nonword    Word  Nonword
 1              .06    .07       .06    .04
 2              .02    .04       .02    .03
 3              .03    .05       .02    .04
 4              .03    .04       .02    .04
 5              .01    .03       .01    .04
 6              .02    .04       .04    .04
 7              .03    .03       .03    .03
 8              .04    .05       .02    .04
 9              .01    .02       .02    .02
10              .02    .05       .02    .02
11              .02    .05       .02    .02
12              .01    .02       .03    .02
13              .04    .05       .03    .01
14              .02    .02       .02    .02
15              .02    .07       .03    .03
16              .01    .02       .04    .01

Experiment 3

Method

Subjects. In all, 32 subjects from the population sampled in the previous experiments served in Experiment 3. A total of 16 subjects served in the mean-lag-12 condition and 16 served in the mean-lag-24 condition.

Apparatus and stimuli. Apparatus and stimuli were the same as in the previous experiments.

Procedure. The procedure was the same as in the preceding experiments with the following exceptions: Each word and nonword appeared once in each block of trials. Some words and nonwords appeared in only 1 block, others appeared in 2 successive blocks, and others appeared in 4, 8, and 16 successive blocks. The mean lag and the range of lags were the same for each number of repetitions. The mean lag was varied between subjects by manipulating the number of stimuli from each repetition condition presented in a single block of trials. One half of the subjects (16) had a mean lag of 12; only one stimulus from each repetition condition occurred in each block, except for the 16-repetition condition, in which two words and two nonwords occurred in each block. Each subject experienced four sets of 16 blocks of 12 trials. The other half of the subjects (16) had a mean lag of 24; two stimuli from each repetition condition were presented each block, except for the 16-repetition condition, in which four words and four nonwords were presented each block. Each subject experienced two sets of 16 blocks of 24 trials.

Each subject received a different random sample of the 340 stimuli, and the order of stimuli within blocks was randomized separately for each subject.

Results

An ANOVA on the reaction times revealed a significant main effect of repetition, F(15, 450) = 11.70, p < .01, MSe = 2,333.58; of word versus nonword, F(1, 30) = 78.29, p < .01, MSe = 13,943.47; and of the interaction between them, F(15, 450) = 2.44, p < .01, MSe = 1,086.01. There was no significant effect of mean lag, F(1, 30) < 1, MSe = 218,538.63, and no interaction between lag and number of presentations, F(15, 450) < 1, MSe = 2,333.58, or between lag and word versus nonword, F(1, 30) = 1.27, MSe = 13,943.47.

An ANOVA on the standard deviations revealed a significant main effect of repetition, F(15, 450) = 5.47, p < .01, MSe = 3,003.22, and a significant effect of word versus nonword, F(1, 30) = 11.46, p < .01, MSe = 5,307.10. No other effects were significant.

The error rates appear in Table B3. No statistical analyses were attempted. Error rates tended to decrease with repetition and to be lower for words than for nonwords.

Appendix C

Details of Method and Results for Experiment 5


Method

Subjects. A total of 48 subjects from an introductory psychology class served as subjects to fulfill course requirements.

Apparatus and stimuli. The apparatus was the same as that used in Experiments 1-3. The stimuli were five-letter words, pronounceable nonwords, and unpronounceable nonwords. The words were nouns selected from the Kucera-Francis (1967) norms to match exactly the distribution of log frequencies of the four-letter words used in Experiments 1-3. The average absolute frequency was 75.27 per million, with a range of 8 to 787. There were 340 words in total. The nonwords were made by substituting letters in the words, making a total of 340 pronounceable and 340 unpronounceable nonwords. Pronounceability was determined by consensus of three native speakers of English.

Procedure. In the training phase, subjects saw words, pronounceable nonwords, and unpronounceable nonwords. Some of them were presented in only 1 block, others in 2 consecutive blocks, others in 4 consecutive blocks, others in 8 consecutive blocks, and others in 16 consecutive blocks. Altogether, there were 16 stimuli of each type presented once and 8 stimuli of each type presented 2, 4, 8, and 16 times. Each block involved a total of 48 trials, and altogether, there were 16 training blocks.

Subjects in the consistent interpretation groups (n = 12 per group) made lexical decisions or pronunciation decisions throughout the training phase. Subjects in the varied interpretation groups alternated between lexical decisions and pronunciation decisions each block throughout training, one half beginning with lexical decisions and one half beginning with pronunciation decisions. Because of the way the blocks were structured, subjects interpreted each stimulus in one way on odd-numbered presentations and the other way on even-numbered presentations, regardless of the total number of presentations each stimulus received.

In the transfer phase, subjects saw the stimuli they were presented with in the training phase randomly intermixed with 16 new words, 16 new pronounceable nonwords, and 16 new unpronounceable nonwords. The stimuli were presented one at a time in the center of the CRT for 500 ms, and subjects gave verbal estimates of the frequency with which they were presented in the training phase. The experimenter sat in the room with the subject and typed each frequency estimate into the computer. Subjects were told that some stimuli were new and some had been presented 16 times, so their frequency estimates should range between 0 and 16.

Results

Training phase. ANOVAs were performed on the benefit scores. Two separate analyses were performed, one for lexical decision subjects (i.e., the consistent lexical decision group and the varied interpretation group that began with lexical decisions) and one for the pronunciation decision subjects (i.e., the consistent pronunciation group and the varied interpretation group that began with pronunciation decisions). Each analysis involved consistent versus varied interpretation as a between-subjects factor, and stimulus type (word vs. pronounceable nonword vs. unpronounceable nonword) and number of presentations (3, 5, 7, 9, 11, 13, and 15) as within-subjects factors. The number-of-presentations factor included only the odd-numbered presentations, for which the same decision was made in each group.

For lexical decision subjects, the main effect of consistent versus varied interpretation was significant, F(1, 22) = 10.79, p < .01, MSe = 44,573.09, as was the main effect of stimulus type, F(2, 44) = 5.71, p < .01, MSe = 11,068.67, and the main effect of number of presentations, F(6, 132) = 4.28, p < .01, MSe = 3,297.74. There were significant interactions between presentations and stimulus type, F(12, 264) = 1.80, p < .05, MSe = 1,590.09, and between presentations, stimulus type, and consistent versus varied interpretation, F(12, 264) = 1.85, p < .05, MSe = 1,590.09.

Table C1
Error Rates for Lexical Decision and Pronunciation Tasks in Experiment 5

                        Lexical decision                    Pronunciation
                 Consistent        Varied           Consistent        Varied
Presentation   WORD PRNW UPNW   WORD PRNW UPNW   WORD PRNW UPNW   WORD PRNW UPNW
 1             .11  .15  .02    .04  .12  .00    .10  .16  .02    .03  .12  .02
 2             .06  .16  .01                     .06  .18  .01
 3             .04  .16  .01    .00  .20  .00    .04  .17  .01    .00  .09  .02
 4             .04  .14  .02                     .05  .15  .02
 5             .04  .15  .02    .00  .15  .00    .03  .16  .03    .01  .06  .03
 6             .04  .13  .02                     .03  .15  .02
 7             .03  .13  .01    .02  .18  .01    .05  .15  .01    .01  .07  .04
 8             .04  .13  .00                     .04  .14  .00
 9             .01  .16  .01    .01  .04  .05    .01  .18  .01    .00  .07  .04
10             .02  .18  .00                     .02  .19  .00
11             .08  .16  .00    .01  .02  .03    .06  .18  .00    .03  .07  .03
12             .03  .16  .00                     .02  .18  .01
13             .05  .16  .00    .00  .18  .03    .03  .18  .00    .01  .05  .04
14             .04  .16  .01                     .03  .18  .01
15             .01  .15  .00    .05  .19  .00    .01  .14  .00    .01  .07  .04
16             .05  .12  .00                     .05  .09  .00

Note. PRNW = pronounceable nonword; UPNW = unpronounceable nonword.

For pronunciation subjects, the main effect of consistent versus varied tion between presentations and stimulus type, /*i(10, 220) = 15.93, p <
interpretation was not significant, F(l, 22) < 1, MSf = 61,977.54, but .Ol,MS,= 1.418.
the main effect of stimulus type was significant, F\2, 44) = 13.91, p < For pronunciation subjects, there were no significant effects of consis-
.01, MS, = 18,579.01. No other effects were significant. tent versus varied interpretation; neither main effect, F(\, 22) < 1,
The error rates are presented in Table C1. No statistical analyses were MS, = 26.433, nor interactions. There were significant effects of presen-
performed on the error rates. tations, F(5, 110) = 220.20, p < .01, MS, = 2.817; Presentations X
Transfer phase. ANOVAS were performed on the frequency estimates. StimulusType,^10,220)= 14.62,p<.01, MS,= 1.189; and Group X
As before, one analysis was performed on the lexical decision groups Presentations X Stimulus Type, F(10, 220) = 3.25, p < .01, MS,=
and one on the pronunciation decision groups. For lexical decision sub- 1.189.
jects, there were no effects of consistent versus varied interpretation;
neither the main effect, F\l, 22) = 2.99,p < .10, MS, = 41.608, nor the
Received February 3, 1987
interactions were significant. However, there were significant effects of
number of presentations, F(5, 110) - 109.65, p < .01, MS, = 4.696; Revision received November 10, 1987
stimulus type, f\2,44) = 10.96, p < .01, MS, = 5.861; and the interac- Accepted December 16, 1987
