Roediger 2011
3 authors:
Megan A. Sumeracki
Rhode Island College
All content following this page was uploaded by Adam L Putnam on 17 January 2018.
Contents
1. Introduction
1.1 Direct and indirect effects of testing
2. Benefit 1: The Testing Effect: Retrieval Aids Later Retention
3. Benefit 2: Testing Identifies Gaps in Knowledge
4. Benefit 3: Testing Causes Students to Learn More from the Next Study Episode
5. Benefit 4: Testing Produces Better Organization of Knowledge
6. Benefit 5: Testing Improves Transfer of Knowledge to New Contexts
7. Benefit 6: Testing can Facilitate Retrieval of Material That was not Tested
8. Benefit 7: Testing Improves Metacognitive Monitoring
9. Benefit 8: Testing Prevents Interference from Prior Material when Learning New Material
10. Benefit 9: Testing Provides Feedback to Instructors
11. Benefit 10: Frequent Testing Encourages Students to Study
12. Possible Negative Consequences of Testing
13. Conclusion
References
Abstract
Testing in school is usually done for purposes of assessment, to assign
students grades (from tests in classrooms) or rank them in terms of abilities
(in standardized tests). Yet tests can serve other purposes in educational
settings that greatly improve performance; this chapter reviews 10 other
benefits of testing. Retrieval practice occurring during tests can greatly
enhance retention of the retrieved information (relative to no testing or
even to restudying). Furthermore, besides its durability, such repeated
retrieval produces knowledge that can be retrieved flexibly and transferred
to other situations. On open-ended assessments (such as essay tests),
retrieval practice required by tests can help students organize information
and form a coherent knowledge base. Retrieval of some information on a
test can also lead to easier retrieval of related information, at least on
2 Henry L. Roediger et al.
delayed tests. Besides these direct effects of testing, there are also indirect
effects that are quite positive. If students are quizzed frequently, they tend
to study more and with more regularity. Quizzes also permit students to
discover gaps in their knowledge and focus study efforts on difficult material;
furthermore, when students study after taking a test, they learn more
from the study episode than if they had not taken the test. Quizzing also
enables better metacognitive monitoring for both students and teachers
because it provides feedback as to how well learning is progressing. Greater
learning would occur in educational settings if students used self-testing as
a study strategy and were quizzed more frequently in class.
1. INTRODUCTION
Benefits of testing? Surely, to most educators, this statement represents
an oxymoron. Testing in schools is usually thought to serve only the
purpose of evaluating students and assigning them grades. Those are
important reasons for tests, but not what we have in mind. Most teachers
view tests (and other forms of assessment, such as homework, essays, and
papers) as necessary evils. Yes, students study and learn more when given
assignments and tests, but they are an ordeal for both the student (who
must complete them) and the teacher (who must construct and grade
them). Quizzes and tests are given frequently in elementary schools,
often at the rate of several or more a week, but testing decreases in
frequency the higher a student rises in the educational system. By the
time students are in college, they may be given only a midterm exam and
a final exam in many introductory level courses. Of course, standardized
tests are also given to students to assess their relative performance compared
to other students in their country and assign them a percentile
ranking. However, for purposes of this chapter, we focus on the testing
that occurs in the classroom as part of the course or self-testing that
students may use themselves as a study strategy (although surveys show
that this practice is not widespread).
Why might testing improve performance? One key benefit is the
active retrieval that occurs during tests. William James (1890, p. 646)
wrote:
while they listened to a story, with instructions that they would later be
asked to recall the names of the pictures. The pictures were integrated into
the story so that when an object was named in the story, the picture
appeared on the screen. Subjects were told that paying attention to the
story would help them retain the pictures (which was true). After hearing
the story and seeing the pictures, subjects were given free recall tests in
which they were given a blank sheet of paper and had to recall as many of
the names of the 60 pictures as possible.
After hearing the story, one group of subjects was told that they could
leave and return a week later for a test. A second group was given a single
test that lasted 7 min and then they were excused. The third group was
given three successive 7-min tests after the learning phase; that is, they
recalled the pictures once, were given a new blank sheet and recalled as
many items as possible a second time, and then repeated the process a third
time. The group that recalled pictures once recalled about 32 pictures and
the group that recalled them three times recalled 32, 35, and 36 pictures
(i.e., performance increased across tests, a phenomenon called hypermnesia;
Erdelyi & Becker, 1974).
For present purposes, the data of most interest are those on the final
retention test 1 week later when the students returned to the lab for more
testing. Students in all three groups had heard the story and seen the
pictures once, so the only difference among the three groups was how
many tests they had taken just after studying the materials (0, 1, or 3). How
did this manipulation affect recall? The data to answer this question are
shown in Figure 1, where it can be seen that those who had not been
tested recalled 17.4 pictures, those who had been tested once recalled 23.3
pictures, and those who had previously been tested three times recalled
31.8 pictures. Thus, taking three tests improved recall by more than 80% a
week later relative to the condition with no tests.
Another way to consider the data is by comparing the scores on the
immediate test just after study to those a week later. Recall that on the first
test after study, subjects produced about 32 items. We can assume that
those subjects who were not tested immediately after study could have
recalled 32 had they been tested, yet a week later they could recall only 17,
showing 45% forgetting. However, the group that was tested three times
immediately was still able to recall 32 items a week after study; thus,
giving three tests essentially eliminated forgetting after a week. This
outcome shows the power of testing.
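The arithmetic behind these comparisons can be checked with a minimal sketch. It uses only the values reported above for Wheeler and Roediger (1992); the function and variable names are illustrative assumptions, and the immediate-recall baseline of 32 for the untested group is the same assumption the text makes.

```python
# Values reported in the text for Wheeler and Roediger (1992).
IMMEDIATE_RECALL = 32.0  # approximate recall on the first immediate test
# Number of initial tests -> pictures recalled on the final test 1 week later.
FINAL_RECALL = {0: 17.4, 1: 23.3, 3: 31.8}


def relative_gain(tested: float, baseline: float) -> float:
    """Percentage improvement of `tested` over `baseline`."""
    return (tested - baseline) / baseline * 100


def percent_forgotten(immediate: float, delayed: float) -> float:
    """Percentage of immediately recallable items lost by the delayed test."""
    return (immediate - delayed) / immediate * 100


# Gain of three initial tests over no initial test, 1 week later.
gain = relative_gain(FINAL_RECALL[3], FINAL_RECALL[0])
print(f"Gain from three tests: {gain:.1f}%")

# Forgetting over the week, assuming every group could have recalled
# about 32 pictures immediately after study.
for n_tests, recalled in FINAL_RECALL.items():
    lost = percent_forgotten(IMMEDIATE_RECALL, recalled)
    print(f"{n_tests} initial test(s): {lost:.1f}% forgotten")
```

Under these assumptions the gain from three tests comes to about 83%, and forgetting falls from about 46% with no initial test to under 1% with three.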
Yet a critic might complain that the Wheeler and Roediger (1992)
results could be due to an artifact. Perhaps, the critic would maintain, the
outcome in Figure 1 has nothing to do with testing per se. Rather, all
‘‘testing’’ did was to permit selective restudy of information. The group
that did not take a test did not restudy any material, whereas the group that
took the single test restudied 32 of the 60 pictures, and the group with
[Figure 1: bar chart of number of pictures recalled (y-axis) against number of initial tests (x-axis): 0 tests, 17.4; 1 test, 23.3; 3 tests, 31.8.]
Figure 1 The number of pictures recalled on a final recall test after a 1-week delay,
adapted from Table 1 of Wheeler and Roediger (1992). The number of initial tests
strongly influenced final test recall. On the first immediate recall test, subjects recalled,
on average, 32.25 pictures. The results indicate that taking three immediate recall tests
will effectively eliminate forgetting over a 1-week period.
three tests restudied 32, then 35, and finally 36 pictures (mostly studying
the same items each time). Perhaps it was merely this process of restudying
that led to good performance a week later. After all, it is hardly a
surprise to find that the more often a person studies material, the better
they remember it. Thompson, Wenger, and Bartling (1978) voiced this
interpretation of testing research. In a similar vein, Slamecka and Katsaiti
(1988) argued that repeated testing may create overlearning on a certain
subset of items and that such overlearning is somehow responsible for the
effect.
These criticisms of the testing effect are often voiced, but dozens of
studies have laid them to rest by including a ‘‘restudy’’ control group in
addition to a testing group. That is, in the comparison condition, students
restudy the set of material for the same amount of time that others are
engaged in taking a test. When this procedure is followed, the testing
group is at a disadvantage in terms of restudy of information compared to
the restudy group. The reason is that in the testing condition subjects only
have the opportunity to restudy the amount of information they can recall
(about 53%, that is, 32/60 × 100, in the Wheeler and Roediger study),
whereas in the restudy condition subjects usually receive the entire set of
material again (100%). Thus, if the testing effect were due to restudying,
Ten Benefits of Testing and Their Applications to Educational Practice 7
using such a restudy control should make the testing effects disappear or
even reverse. However, this does not happen, at least on delayed tests.
Consider an experiment by Roediger and Karpicke (2006a). They
used relatively complex prose passages on such topics as ‘‘sea otters’’ that
were full of facts. The test given was free recall; subjects were asked to
recall as much as they could from the passage when given its name and the
protocols were scored in terms of the number of idea units recalled from
the passage. In one condition, subjects studied the passage once and were
tested on it three times; on each test, they recalled about 70% of the
material. Another group studied the passage three times and was tested
once (recalling 77%). Finally, a third group studied the passage four times,
so subjects had the greatest study exposure to the material (reading the
passage four times) in this condition. Thus, subjects in the three conditions
were exposed in one form or another to the material four times via
various numbers of study and test events. We can label the conditions
STTT, SSST, and SSSS, where S stands for study of the passage and T
stands for its testing.
The data of critical interest were those that occurred on a final criterion
test, which was given 5 min or 1 week after the learning session. As
can be seen in the left-hand side of Figure 2, when the final test was given
shortly after the initial four study/test periods, recall was correlated with
the number of study episodes: the SSSS condition led to better performance
than the SSST condition, which in turn was better than the STTT
condition. As students have known for generations, cramming does work
if a test occurs immediately after studying. However, for subjects given the
final test a week later, exactly the opposite ordering of performance
emerged: the more students had been tested during the learning session,
the better their performance was. This outcome occurred despite the fact that
subjects who had repeatedly studied the material had received much more
exposure to it. Once again, receiving tests greatly slowed down forgetting
(see also Karpicke, 2009; Karpicke & Roediger, 2008; Wheeler, Ewers, &
Buonanno, 2003). Another point to take from Figure 2 is that a testing
effect is more likely to emerge at longer delays after study. On a test given
soon after studying, repeated studying can lead to performance greater
than that with testing.
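The design and the reversal in ordering described above can be sketched by encoding each condition label and counting its study and test periods. This is an illustration only: the names are hypothetical, and the code reproduces just the ordinal pattern reported in the text, not the recall proportions themselves.

```python
from collections import Counter

# Condition labels from the text: S = study period, T = test period.
CONDITIONS = ["STTT", "SSST", "SSSS"]


def summarize(condition: str) -> dict:
    """Count the study and test periods in a label such as 'SSST'."""
    counts = Counter(condition)
    unexpected = set(counts) - {"S", "T"}
    if unexpected:
        raise ValueError(f"unknown period codes: {sorted(unexpected)}")
    return {"study": counts["S"], "test": counts["T"], "total": len(condition)}


def ranked(conditions: list, key: str) -> list:
    """Order conditions from most to fewest periods of the given kind."""
    return sorted(conditions, key=lambda c: summarize(c)[key], reverse=True)


summaries = {c: summarize(c) for c in CONDITIONS}

# Reported ordering on the 5-min test: more study periods, better recall.
immediate_order = ranked(CONDITIONS, "study")
# Reported ordering on the 1-week test: more test periods, better recall.
delayed_order = ranked(CONDITIONS, "test")

print(summaries)
print("5 min :", " > ".join(immediate_order))
print("1 week:", " > ".join(delayed_order))
```

Every condition contains four periods in total, so the two orderings differ only in how those periods are split between studying and testing.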
We could add dozens more experiments to this section on the basic
testing effect (e.g., Carpenter & DeLosh, 2005, 2006; Cull, 2000; Pyc &
Rawson, 2007), but we will desist. Many experiments will be reviewed
later that have the same kind of design and establish conditions in which
testing memory produces a mnemonic boost relative to a restudy control
condition (as in Roediger & Karpicke, 2006a) or relative to a condition
with no further exposure (as in Wheeler & Roediger, 1992). However,
even in the latter case, we can rest assured that the testing effect is mostly
due to causes other than restudying the material.
[Figure 2: bar chart of the proportion of idea units correctly recalled (y-axis, 0.0 to 1.0) at each retention interval (x-axis: 5 min, 1 week) for the SSSS, SSST, and STTT conditions.]
Figure 2 Mean proportion of idea units recalled on the final test taken 5 min or 1 week
after the initial learning session. During learning, subjects studied prose passages and
then completed a varying number of study (S) and test (T) periods. Error bars
represent standard errors of the mean (estimated from Figure 2 of Roediger and
Karpicke (2006a)).
Adapted from Experiment 2 of Roediger and Karpicke (2006a).
missed than those that were correctly retrieved (see Son & Kornell,
2008).
Kornell and Bjork (2007) provided evidence from a laboratory experiment
that students are typically unaware that learning can occur during
testing. In one experiment, students learned a set of Indonesian–English
vocabulary words by repeated trials. They had the option of studying the
pairs or being tested on them (with feedback) on each occasion and could
switch between the two modes at any point. Most students began in the
study mode, although nearly everyone changed to the test mode after the
first two trials. Kornell and Bjork interpreted this outcome as indicating
that students wanted to achieve a basic level of knowledge before testing
themselves. In addition, Kornell and Bjork reported the results of a
survey in which students were asked whether they quizzed themselves
while studying (using a quiz at the end of a chapter, a practice quiz,
flashcards, or something else); 68% of respondents replied that they
quizzed themselves “to figure out how well I have learned the information
I’m studying” (Kornell & Bjork, 2007, p. 222). Only 18% of respondents
recognized that testing actually facilitated further learning.
In another survey on study habits, Karpicke, Butler, and Roediger
(2009) asked college students to list their most commonly used study
habits (rather than asking directly if they used testing, as in the Kornell
and Bjork (2007) survey). When the question was framed in this open-ended
manner, only 11% of students listed retrieval practice as a study
technique they used, suggesting that students may be generally unaware of
the direct or indirect benefits of testing. On a forced response question,
students had to choose between studying and testing in a hypothetical
situation of preparing for a test. Only 18% of students chose to self-test
and more than half of those explained that they chose to self-test to
identify what they did or did not know to guide further study. Thus, these
two findings are in broad agreement with the Kornell and Bjork (2007)
findings.
In further surveys, McCabe (2011) found that college students’ knowledge
of effective study strategies is quite poor without specific instruction.
She provided students with educational scenarios and asked them to select
study strategies that would be effective. She based her strategies on findings
from cognitive psychology studies, including such principles as dual coding
and retrieval practice. McCabe found that students were generally unaware
of the effectiveness of the strategies. If this is the case with college students,
one can only assume that high school students and others in lower grades
would, at best, show the same outcome.
Testing one’s memory allows one to evaluate whether the information
is really learned and accessible. Karpicke et al. (2009) suggested that
one of the reasons students reread materials rather than testing themselves
is that rereading leads to increased feelings of fluency of the material—it
seems so familiar as they reread it they assume they must know it. Also, in
contrast to self-testing, restudying is easy. In short, students may lack
metacognitive awareness of the direct benefits of testing, while at the
same time understanding that self-testing can be useful as a guide to future
studying. Testing helps students learn because it helps them understand
what facts they might not know, so they can allocate future study time
accordingly.
[Figure 3: proportion recalled (y-axis, 0.0 to 1.0) by test period (x-axis, 0 to 6) for the STSTST, SSSTST, and SSSSST conditions.]
[Figure 4: proportion recall (y-axis, 0 to 1) for repeated studying versus repeated testing.]
this material was related to the questions that had been answered on the
initial test. In the restudy condition, students read the answers but did not
receive a test. After 24 h, all the students returned to complete a final test
covering the entire passage. Results of the final test revealed that retention
of the nontested information was superior when students had taken a test
relative to conditions in which they restudied the material or in which
they had no further exposure after study. Chan et al. concluded that
testing not only improves retention for information covered within a test,
but also improves retention for nontested information, at least when that
information is related to the tested information.
In contrast, other researchers have found that retrieving some information
may actually lead to forgetting of other information, a finding
termed retrieval-induced forgetting (e.g., Anderson, Bjork, & Bjork,
1994). In a typical retrieval-induced forgetting experiment, subjects first
study words in categories and then take an initial test. For some categories,
half of the items are repeatedly retrieved during the initial test; for other
categories, none of the items are retrieved during the initial test. The
general finding is that the unpracticed items from the categories cued for
retrieval practice are impaired on a later retention test relative to items
from the nontested categories.
Retrieval-induced facilitation and retrieval-induced forgetting are
obviously contradictory findings. Consequently, Chan (2009) sought to
differentiate between conditions causing facilitation and conditions causing
forgetting in these paradigms. In two experiments, he demonstrated
the importance of integration of the materials and the delay of the test for
the retrieval-induced facilitation and retrieval-induced forgetting effects.
In his first experiment, subjects studied two prose passages; each passage
was presented one sentence at a time on the computer. During study,
some subjects were given the sentences in a coherent order and were told
to integrate the information (the high integration condition). For another
group of subjects, the sentences within each paragraph were scrambled to
disrupt integration of information during study (the low integration
condition). Similar to the Chan et al. (2006) experiments, an initial test
occurred immediately after studying one of the passages, and subjects
completed the same test twice in a row. Subjects completed the final test
covering material from both passages 20 min or 24 h after the completion
of the initial learning phase.
Figure 5 depicts performance on the final test. Results reveal both a
retrieval-induced facilitation effect (see the fourth pair of bars in Figure 5)
and a retrieval-induced forgetting effect (see the first pair of bars in
Figure 5) within the same experiment. This outcome demonstrates the
importance of both integration of materials and delay of the final test for
these effects. When subjects were instructed to integrate the information
during study (i.e., the high integration condition) and the test was delayed
[Figure 5: bar chart of probability of correct recall (y-axis, 0.3 to 0.8) for control and nontested-related items under low and high integration, at 20-min and 24-h retention intervals.]
Figure 5 Performance on the final test for questions drawn from the passage that
was not tested initially (control items) and questions drawn from the tested passage that
were not present on the initial test (nontested related items). During the initial learning
session, subjects studied two passages either in a coherent order with integration
instructions (high integration) or in a randomized order (low integration). Subjects
completed an initial test for one of the passages. The final test was completed 20 min
or 24 h after the initial learning session. Error bars represent standard errors.
Adapted from Experiment 1 of Chan (2009).
[Figure 6: two panels showing the number of words recalled from list 5 (y-axis): correct recall and intrusions on the initial test, and correct recall on the final test.]
Figure 6 Mean number of words recalled from list 5 on the initial test and the final
test when interrelated word lists (top panel) and unrelated word lists (bottom panel)
were used. Error bars represent standard errors of the mean (estimated from
Figures 1 and 2 of Szpunar et al. (2008)). Subjects learned five successive lists of
words; between lists, some subjects completed a free recall test while other
subjects completed a filler task (math problems). All subjects were tested after list 5
and were given a final cumulative free recall test.
just beginning) can fail to realize the state of knowledge of their students
and pitch their presentations at too high a level. (Most readers can think of
their first calculus or statistics course in this regard.) The general idea is
that once we know something and understand it well, it is hard to imagine
what it was like not to know it. For example, Newton (1990) conducted
a study in which students sat across from each other separated by a
screen. Each was given a list of 25 common tunes that most Americans
know (Happy Birthday to You, the Star Spangled Banner, etc.). One
student (the sender) was picked to tap out the tune with his or her
knuckles on the table and give an estimate of the likelihood that the
other student could name the tune. The other student (the receiver)
tried to decipher the tune and name it. This is a classic situation
similar to a teacher and student where one person knows the information
(tune, in this case) and is trying to communicate it to the other
person who does not know it. When the senders judged how well
they did in communicating the tune to the other student, they
thought they succeeded about 50% of the time. However, the students
on the receiving end of the taps could recognize the tune only 3% of
the time! When the sender was tapping out Happy Birthday, she was
hearing all that music in her mind’s ear and tapping in time to it.
What the receiver heard, however, was a series of erratic taps. This tale
is an allegory of an expert in a subject matter trying to teach it to a
novice, especially the first time. Again, it is hard to know what it is
like not to know something you know well.
One promising technology may help overcome the instructor’s curse
of knowledge. Student response (clicker) systems that permit teachers
to quiz students’ understanding during lectures may provide
assessment on the fly. Teachers can give 2–3-item quizzes in the
middle of a lecture to assess understanding of a difficult point; if many
students fail to answer correctly, the instructor can go back and try to
present the information in a different way. As smart phones increase in use
and become more standardized, they may be adapted in classrooms for the
same purpose. These technologies provide immediate feedback to both
students and instructors about students’ understanding.
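The decision an instructor makes after an in-lecture quiz item can be sketched as a simple rule. The text says only that the instructor revisits a point "if many students fail to answer correctly", so the 0.7 cutoff, the function names, and the sample responses below are all hypothetical, chosen purely for illustration.

```python
from typing import Sequence

# Hypothetical cutoff: the source gives no number, only "if many students
# fail to answer correctly". 0.7 is an illustrative assumption.
RETEACH_THRESHOLD = 0.7


def share_correct(responses: Sequence[str], correct_answer: str) -> float:
    """Proportion of collected responses that match the correct answer."""
    if not responses:
        raise ValueError("no responses collected")
    return sum(r == correct_answer for r in responses) / len(responses)


def needs_reteaching(
    responses: Sequence[str],
    correct_answer: str,
    threshold: float = RETEACH_THRESHOLD,
) -> bool:
    """True when too few students answered correctly to move on."""
    return share_correct(responses, correct_answer) < threshold


# Made-up responses to one multiple-choice item whose correct answer is "B".
clicker_responses = ["B", "B", "A", "C", "B", "D", "A", "B"]
print(share_correct(clicker_responses, "B"))     # 0.5
print(needs_reteaching(clicker_responses, "B"))  # True
```

In practice the cutoff would depend on the difficulty of the item and the instructor's goals; the sketch only makes explicit that the quiz result feeds a decision about whether to present the material again.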
A more formal approach that utilizes testing to understand the current
state of individual students is referred to as formative assessment (Black &
Wiliam, 1998a, 1998b; for a brief review of formative assessment from a
cognitive psychology perspective, see Roediger and Karpicke (2006b)).
Formative assessment not only helps teachers better understand what their
students know, but also aims to improve the metacognitive judgments of
the students’ own knowledge. Students will be better able to assess their
current knowledge state and their goal knowledge state, as well as understand
what steps they need to take to close that gap if they are given proper
that retrieval practice produced better retention later than did concept
mapping, a widely used study technique. We expect that when other such
studies are conducted, they may show that some quizzing is as beneficial
as, or more beneficial than, an equal amount of time spent on lecturing
(just as testing is better than restudying). In addition, as discussed above,
having classroom quizzes may keep motivation up and provide the indirect
benefit of having students study more. At any rate, we do not think
this criticism holds water, but future research may change our opinion.
Second, critics sometimes argue that retrieval practice through testing
produces ‘‘rote’’ learning of a superficial sort, as if the student can parrot
back the information but not really understand it or know it in a deep
fashion. Learning is said to become ‘‘inert’’ or ‘‘encapsulated’’ in little
factoid bubbles. Perhaps this criticism is justified in some cases, but we
think that good programs of quizzing with feedback usually prevent this
problem. We reviewed evidence previously showing that retrieval (via
testing) can lead to deep knowledge that can be used flexibly and transferred
to other contexts (e.g., Butler, 2010). Again, the burden is on the
critics to show that testing leads to problems rather than simply asserting
that these problems might exist. The next two criticisms are based on data
and must be taken more seriously.
Third, many studies have documented a phenomenon variously called
output interference (Tulving & Arbuckle, 1966), the inhibitory effects of
recall (Roediger, 1974, 1978), or retrieval-induced forgetting
(Anderson et al., 1994). The basic phenomenon is that while the act of
retrieval may boost recall of the retrieved information (the testing effect),
it can actually harm recall of nontested information. We discussed this
point in Section 7. Thus, in educational settings, the fear is that if students
repeatedly retrieve some information, they may actually cause themselves
to forget other information.
There is now a vast literature on these topics (see Bäuml (2008) for a
review). Although the various phenomena encapsulated under the rubric
of retrieval-induced forgetting are highly reliable, as we discussed in
Section 7, the implications for educational practice may not be great.
For one thing, the phenomenon is often short lived, so if a delay is
interposed between retrieval practice and testing, the inhibition dissipates
or even evaporates altogether (MacLeod & Macrae, 2001). In addition,
most experiments on retrieval-induced forgetting have used word lists. As
noted in Section 7, when well-integrated materials such as prose passages
are used, the inhibition effect can disappear (Anderson & McCulloch,
1999) or even reverse altogether, leading to retrieval-induced facilitation
(Chan et al., 2006). As discussed previously, Chan and his collaborators
(see also Chan, 2009, 2010) showed that testing can sometimes enhance
recall of material related to the tested material. Thus, although much
research remains to be done, the various phenomena showing that testing
13. CONCLUSION
We have reviewed 10 reasons why increased testing in educational
settings is beneficial to learning and memory, as a self-study strategy for
students or as a classroom tactic. The benefits can be indirect (students
study more and attend more fully if they expect a test), but we have
emphasized the direct effects of testing. Retrieval practice from testing
provides a potent boost to future retention. Retrieval practice provides a
relatively straightforward method of enhancing learning and retention in
educational settings. We end with our 10 benefits of testing in summary
form:
Benefit 1: The testing effect: Retrieval aids later retention.
Benefit 2: Testing identifies gaps in knowledge.
Benefit 3: Testing causes students to learn more from the next learning
episode.
Benefit 4: Testing produces better organization of knowledge.
Benefit 5: Testing improves transfer of knowledge to new contexts.
Benefit 6: Testing can facilitate retrieval of information that was not tested.
REFERENCES
Abbott, E. E. (1909). On the analysis of the factors of recall in the learning process.
Psychological Monographs, 11, 159–177.
Amlund, J. T., Kardash, C. A., & Kulhavy, R. W. (1986). Repetitive reading and recall of
expository text. Reading Research Quarterly, 21, 49–58.
Anderson, M. C., Bjork, R. A., & Bjork, E. L. (1994). Remembering can cause forgetting:
Retrieval dynamics in long-term memory. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 20, 1063–1087.
Anderson, M. C., & McCulloch, K. C. (1999). Integration as a general boundary condition
on retrieval-induced forgetting. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 25, 608–629.
Bacon, F. T. (1979). Credibility of repeated statements: Memory for trivia. Journal of
Experimental Psychology: Human Learning and Memory, 5, 241–252.
Barnett, S. M., & Ceci, S. J. (2002). When and where do we apply what we learn? A
taxonomy for far transfer. Psychological Bulletin, 128, 612–637.
Bäuml, K. H. (2008). Inhibitory processes. In H. L. Roediger (Ed.), Cognitive psychology of
memory (pp. 195–217). Vol. 2 of Learning and Memory: A comprehensive reference, 4 vols
(J. Byrne, Ed.). Oxford: Elsevier.
Begg, I., Armour, V., & Kerr, T. (1985). On believing what we remember. Canadian
Journal of Behavioral Science, 17, 199–214.
Bjork, R. A., & Bjork, E. L. (1992). A new theory of disuse and an old theory of stimulus
fluctuation. In A. Healy, S. Kosslyn, & R. Shiffrin (Eds.), From learning processes to
cognitive processes: Essays in honor of William K. Estes (Vol. 2, pp. 35–67). Hillsdale,
NJ: Erlbaum.
Black, P., & Wiliam, D. (1998a). Assessment and classroom learning. Assessment in
Education: Principles, Policy & Practice, 5, 7–74.
Black, P., & Wiliam, D. (1998b). Inside the black box: Raising standards through classroom
assessment. Phi Delta Kappan, 80, 139–147.
Brown, A. S., Schilling, H. E. H., & Hockensmith, M. L. (1999). The negative suggestion
effect: Pondering incorrect alternatives may be hazardous to your knowledge. Journal of
Educational Psychology, 91, 756–764.
Butler, A. C. (2010). Repeated testing produces superior transfer of learning relative to
repeated studying. Journal of Experimental Psychology: Learning, Memory, and Cognition,
36, 1118–1133.
Butler, A. C., Karpicke, J. D., & Roediger, H. L. (2007). The effect of type and timing of
feedback on learning from multiple-choice tests. Journal of Experimental Psychology:
Applied, 13, 273–281.
Butler, A. C., Karpicke, J. D., & Roediger, H. L. (2008). Correcting a metacognitive
error: Feedback increases retention of low-confidence correct responses. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 34, 918–928.
Butler, A. C., & Roediger, H. L. (2008). Feedback enhances the positive effects and
reduces the negative effects of multiple-choice testing. Memory & Cognition, 36,
604–616.
Carpenter, S. K., & DeLosh, E. L. (2005). Application of the testing and spacing effects to
name learning. Applied Cognitive Psychology, 19, 619–636.
Carpenter, S. K., & DeLosh, E. L. (2006). Impoverished cue support enhances subsequent
retention: Support for the elaborative retrieval explanation of the testing effect.
Memory & Cognition, 34, 268–276.
Carpenter, S. K., Pashler, H., & Vul, E. (2006). What types of learning are enhanced by a
cued recall test? Psychonomic Bulletin & Review, 13, 826–830.
Carrier, M., & Pashler, H. (1992). The influence of retrieval on retention. Memory &
Cognition, 20, 633–642.
Chan, J. C. K. (2009). When does retrieval induce forgetting and when does it induce
facilitation? Implications for retrieval inhibition, testing effect, and text processing.
Journal of Memory and Language, 61, 153–170.
Chan, J. C. K. (2010). Long-term effects of testing on the recall of nontested materials.
Memory, 18, 49–57.
Chan, J. C. K., McDermott, K. B., & Roediger, H. L. (2006). Retrieval-induced facilitation:
Initially nontested material can benefit from prior testing of related material.
Journal of Experimental Psychology: General, 135, 553–571.
Congleton, A., & Rajaram, S. (2010, November). Examining the immediate and delayed
aspects of the testing effect. Paper presented at the meeting of the Psychonomic Society, St.
Louis, MO.
Cranney, J., Ahn, M., McKinnon, R., Morris, S., & Watts, K. (2009). The testing effect,
collaborative learning, and retrieval-induced facilitation in a classroom setting.
European Journal of Cognitive Psychology, 21, 919–940.
Crowder, R. G. (1976). Principles of learning and memory. Hillsdale, NJ: Erlbaum.
Cull, W. L. (2000). Untangling the benefits of multiple study opportunities and repeated
testing for cued recall. Applied Cognitive Psychology, 14, 215–235.
Detterman, D. K. (1993). The case for the prosecution: Transfer as an epiphenomenon. In
D. K. Detterman & R. J. Sternberg (Eds.), Transfer on trial: Intelligence, cognition, and
instruction (pp. 1–24). Westport, CT: Ablex Publishing.
Dunlosky, J., & Hertzog, C. (1998). Training programs to improve learning in later
adulthood: Helping older adults educate themselves. In D. J. Hacker, J. Dunlosky,
& A. C. Graesser (Eds.), Metacognition in educational theory and practice (pp. 249–275).
Mahwah, NJ: Erlbaum.
Ebbinghaus, H. (1885). Über das Gedächtnis. Leipzig: Duncker & Humblot.
Erdelyi, M. H., & Becker, J. (1974). Hypermnesia for pictures: Incremental memory for
pictures but not words in multiple recall trials. Cognitive Psychology, 6, 159–171.
Fazio, L. K., Agarwal, P. K., Marsh, E. J., & Roediger, H. L. (2010). Memorial consequences
of multiple-choice testing on immediate and delayed tests. Memory &
Cognition, 38, 407–418.
Finn, B., & Metcalfe, J. (2007). The role of memory for past test in the under-confidence
with practice effect. Journal of Experimental Psychology: Learning, Memory, and Cognition,
33, 238–244.
Finn, B., & Metcalfe, J. (2008). Judgments of learning are influenced by memory for past
test. Journal of Memory and Language, 58, 19–34.
Gates, A. I. (1917). Recitation as a factor in memorizing. Archives of Psychology, 6(40).
Gick, M. L., & Holyoak, K. J. (1980). Analogical problem solving. Cognitive Psychology, 12,
306–355.
Glover, J. A. (1989). The “testing” phenomenon: Not gone but nearly forgotten. Journal of
Educational Psychology, 81, 392–399.
Hasher, L., Goldstein, D., & Toppino, T. (1977). Frequency and the conference of
referential validity. Journal of Verbal Learning and Verbal Behavior, 16, 107–112.
Izawa, C. (1966). Reinforcement-test sequences in paired-associate learning. Psychological
Reports, 18, 879–919.
Izawa, C. (1968). Function of test trials in paired-associate learning. Journal of Experimental
Psychology, 75, 194–209.
Izawa, C. (1970). Optimal potentiating effects and forgetting-prevention effects of tests in
paired-associate learning. Journal of Experimental Psychology, 83, 340–344.
Jacoby, L. L., Wahlheim, C. N., & Coane, J. H. (2010). Test-enhanced learning of natural
concepts: Effects on recognition memory, classification, and metacognition. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 36, 1441–1451.
James, W. (1890). The principles of psychology. New York: Holt.
Johnson, C. I., & Mayer, R. E. (2009). A testing effect with multimedia learning. Journal of
Educational Psychology, 101, 621–629.
Jones, H. E. (1923). The effects of examination on the performance of learning. Archives of
Psychology, 10, 1–70.
Karpicke, J. D. (2009). Metacognitive control and strategy selection: Deciding to practice
retrieval during learning. Journal of Experimental Psychology: General, 138, 469–486.
Karpicke, J. D., & Blunt, J. R. (2011). Retrieval practice produces more learning than
elaborative studying with concept mapping. Science, 331(6018), 772–775.
Karpicke, J. D., Butler, A. C., & Roediger, H. L. (2009). Metacognitive strategies in
student learning: Do students practise retrieval when they study on their own? Memory,
17, 471–479.
Karpicke, J. D., & Roediger, H. L. (2007). Repeated retrieval during learning is the key to
long-term retention. Journal of Memory and Language, 57, 151–162.
Karpicke, J. D., & Roediger, H. L. (2008). The critical importance of retrieval for
learning. Science, 319, 966–968.
Kelly, C. M. (1999). Subjective experience as a basis for “objective” judgments: Effects of
past experience on judgments of difficulty. In D. Gopher & A. Koriat (Eds.), Attention
and Performance XVII, 515–536.
Koriat, A., Scheffer, L., & Ma’ayan, H. (2002). Comparing objective and subjective
learning curves: Judgments of learning exhibit increased underconfidence with practice.
Journal of Experimental Psychology: General, 131, 147–162.
Kornell, N., & Bjork, R. A. (2007). The promise and perils of self-regulated study.
Psychonomic Bulletin & Review, 14, 219–224.
Kornell, N., & Metcalfe, J. (2006). Study efficacy and the region of proximal learning
framework. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32,
609–622.
Kornell, N., & Son, L. K. (2009). Learners’ choices and beliefs about self-testing. Memory,
17, 493–501.
Leeming, F. C. (2002). The exam-a-day procedure improves performance in psychology
classes. Teaching of Psychology, 29, 210–212.
Lyle, K. B., & Crawford, N. A. (2011). Retrieving essential material at the end of lectures
improves performance on statistics exams. Teaching of Psychology, 38, 94–97.
MacLeod, M. D., & Macrae, C. (2001). Gone but not forgotten: The transient nature of
retrieval-induced forgetting. Psychological Science, 12, 148–152.
Marsh, E. J., Agarwal, P. K., & Roediger, H. L. (2009). Memorial consequences of
answering SAT II questions. Journal of Experimental Psychology: Applied, 15, 1–11.
Marsh, E. J., Roediger, H. L., Bjork, R. A., & Bjork, E. L. (2007). The memorial
consequences of multiple-choice testing. Psychonomic Bulletin & Review, 14, 194–199.
Masson, M. E., & McDaniel, M. A. (1981). The role of organizational processes in long-term
retention. Journal of Experimental Psychology: Human Learning and Memory, 7,
100–110.
Mawhinney, V. T., Bostow, D. E., Laws, D. R., Blumenfeld, G. J., & Hopkins, B. L. (1971).
A comparison of students studying-behavior produced by daily, weekly, and three-week
testing schedules. Journal of Applied Behavior Analysis, 4, 257–264.
McCabe, J. (2011). Metacognitive awareness of learning strategies in undergraduates.
Memory & Cognition, 39, 462–476.
McDermott, K. B., & Arnold, K. M. (2010, November). Test taking facilitates future
learning. Paper presented at the meeting of the Psychonomic Society, St. Louis, MO.
Metcalfe, J. (2002). Is study time allocated selectively to a region of proximal learning?
Journal of Experimental Psychology: General, 131, 349–363.
Metcalfe, J., & Finn, B. (2008). Evidence that judgments of learning are causally related to
study choice. Psychonomic Bulletin & Review, 15, 174–179.
Michael, J. (1991). A behavioral perspective on college teaching. The Behavior Analyst, 14,
229–239.
Nelson, T. O., Dunlosky, J., Graf, A., & Narens, L. (1994). Utilization of metacognitive
judgments in the allocation of study during multitrial learning. Psychological Science, 5,
207–213.
Nelson, T. O., & Leonesio, R. J. (1988). Allocation of self-paced study time and the
“labor-in-vain effect.” Journal of Experimental Psychology: Learning, Memory, and
Cognition, 14, 676–686.
Newton, L. (1990). Overconfidence in the communication of intent: Heard and unheard melodies.
Unpublished doctoral dissertation. Stanford, CA: Stanford University.
Pashler, H., Cepeda, N. J., Wixted, J. T., & Rohrer, D. (2005). When does feedback
facilitate learning of words? Journal of Experimental Psychology: Learning, Memory, and
Cognition, 31, 3–8.
Pyc, M. A., & Rawson, K. A. (2007). Examining the efficiency of schedules of distributed
retrieval practice. Memory & Cognition, 35, 1917–1927.
Pyc, M. A., & Rawson, K. A. (2010). Why testing improves memory: Mediator effectiveness
hypothesis. Science, 330, 335.
Remmers, H. H., & Remmers, E. M. (1926). The negative suggestion effect on true–false
examination questions. Journal of Educational Psychology, 17, 52–56.
Roediger, H. L. (1974). Inhibiting effects of recall. Memory & Cognition, 2, 261–269.
Roediger, H. L. (1978). Recall as a self-limiting process. Memory & Cognition, 6, 54–63.
Roediger, H. L., & Karpicke, J. D. (2006a). Test-enhanced learning: Taking memory tests
improves long-term retention. Psychological Science, 17, 249–255.
Roediger, H. L., & Karpicke, J. D. (2006b). The power of testing memory: Basic research
and implications for educational practice. Perspectives on Psychological Science, 1,
181–210.
Roediger, H. L., & Marsh, E. J. (2005). The positive and negative consequences of
multiple-choice testing. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 31, 1155–1159.
Roenker, D. L., Thompson, C. P., & Brown, S. C. (1971). Comparison of measures for the
estimation of clustering in free recall. Psychological Bulletin, 76, 45–48.
Rohrer, D., Taylor, K., & Sholar, B. (2010). Tests enhance the transfer of learning. Journal
of Experimental Psychology: Learning, Memory, and Cognition, 36, 233–239.
Shaughnessy, J. J., & Zechmeister, E. B. (1992). Memory-monitoring accuracy as influenced
by the distribution of retrieval practice. Bulletin of the Psychonomic Society, 30,
125–128.
Slamecka, N. J., & Katsaiti, L. T. (1988). Normal forgetting of verbal lists as a function of
prior testing. Journal of Verbal Learning and Verbal Behavior, 10, 400–408.
Son, L. K., & Kornell, N. (2008). Research on the allocation of study time: Key studies
from 1890 to the present (and beyond). In J. Dunlosky & R. A. Bjork (Eds.), A handbook
of memory and metamemory (pp. 333–351). Hillsdale, NJ: Psychology Press.
Spitzer, H. F. (1939). Studies in retention. Journal of Educational Psychology, 30, 641–656.
Szpunar, K. K., McDermott, K. B., & Roediger, H. L. (2007). Expectation of a final
cumulative test enhances long-term retention. Memory & Cognition, 35, 1007–1013.
Szpunar, K. K., McDermott, K. B., & Roediger, H. L. (2008). Testing during study
insulates against the buildup of proactive interference. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 34, 1392–1399.
Thomas, A. K., & McDaniel, M. A. (2007). Metacomprehension for educationally
relevant materials: Dramatic effects of encoding-retrieval interactions. Psychonomic
Bulletin & Review, 14, 212–218.
Thompson, C. P., Wenger, S. K., & Bartling, C. A. (1978). How recall facilitates subsequent
recall: A reappraisal. Journal of Experimental Psychology: Human Learning and
Memory, 4, 210–221.
Toppino, T. C., & Brochin, H. A. (1989). Learning from tests: The case of true–false
examinations. Journal of Educational Research, 83, 119–124.
Toppino, T. C., & Luipersbeck, S. M. (1993). Generality of the negative suggestion effect
in objective tests. Journal of Educational Research, 86, 357–362.
Tulving, E. (1962). Subjective organization in free recall of “unrelated” words.
Psychological Review, 69, 344–354.
Tulving, E. (1967). The effects of presentation and recall of material in free-recall learning.
Journal of Verbal Learning and Verbal Behavior, 6, 175–184.
Tulving, E., & Arbuckle, T. (1966). Input and output interference in short-term associative
memory. Journal of Experimental Psychology, 72, 145–150.
Underwood, B. J. (1957). Interference and forgetting. Psychological Review, 64, 49–60.
Wheeler, M. A., Ewers, M., & Buonanno, J. F. (2003). Different rates of forgetting
following study versus test trials. Memory, 11, 571–580.
Wheeler, M. A., & Roediger, H. L. (1992). Disparate effects of repeated testing:
Reconciling Ballard’s (1913) and Bartlett’s (1932) results. Psychological Science, 3,
240–245.
Zaromb, F. M. (2010). Organizational processes contribute to the testing effect in free
recall (Unpublished doctoral dissertation). Washington University in St. Louis, St.
Louis, MO.
Zaromb, F. M., & Roediger, H. L. (2010). The testing effect in free recall is associated with
enhanced organizational processes. Memory & Cognition, 38, 995–1008.