Research Article
Test-Enhanced Learning
Taking Memory Tests Improves Long-Term Retention
Henry L. Roediger, III, and Jeffrey D. Karpicke
Washington University in St. Louis
ABSTRACT: Taking
Volume 17, Number 3
the future than if they had not been tested. This phenomenon,
called the testing effect, has been studied sporadically over a
long period of time (e.g., Gates, 1917), but is not well known
outside cognitive psychology.
Most experiments on the testing effect have been conducted in
the verbal learning tradition using word lists (e.g., Hogan &
Kintsch, 1971; Izawa, 1967; McDaniel & Masson, 1985;
Thompson, Wenger, & Bartling, 1978; Tulving, 1967; Wheeler,
Ewers, & Buonanno, 2003) or picture lists (Wheeler & Roediger,
1992) as materials. There have been a few experiments using
materials found in educational contexts, beginning with Spitzer
(1939; see too Glover, 1989, and McDaniel & Fisher, 1991).
However, the title of Glover's article from 17 years ago still sums
up the current state of affairs: "The testing phenomenon: Not
gone but nearly forgotten."
Our aim in the two experiments reported here was to investigate
the testing effect under educationally relevant conditions, using
prose materials and free-recall tests without feedback (somewhat
akin to essay tests used in education). Most previous research has
used tests involving recognition (like multiple-choice tests) or
cued recall (like short-answer tests). A second purpose of our
experiments was to determine whether testing facilitates learning
beyond the benefits of restudying the material. In some testing-effect experiments, a study-test condition is compared with a
study-only condition on a delayed retention test. When the subjects in the former condition outperform those in the latter on a
final test, one can wonder whether the testing effect is simply due
to study-test subjects being reexposed to the material during the
test. It is no surprise that students will learn more with two
presentations of material rather than one (although some of the
word-list experiments cited earlier overcame this problem; see too
Carrier & Pashler, 1992; Cull, 2000). To evaluate this restudying
explanation of the testing effect, we had students in our control
conditions restudy the entire set of material, which should, if
anything, bias performance results in favor of this condition,
because students who take free-recall tests (without feedback)
can only reexperience whatever material they can recall.
Students in our experiments studied short prose passages
covering general scientific topics. In Experiment 1, they either
Method
Subjects
One hundred twenty Washington University undergraduates,
ages 18 to 24, participated in partial fulfillment of course requirements.
Materials
Two prose passages were selected from the reading comprehension section of a test-preparation book for the Test of English
as a Foreign Language (TOEFL; Rogers, 2001). Each passage
covered a single topic ("The Sun" and "Sea Otters"), and each
was divided into 30 idea units for scoring purposes. The passages were 256 and 275 words in length, respectively.
Design
A 2 × 3 mixed-factorial design was used. Learning condition
(restudy vs. test) was manipulated within subjects, and delay of
the final test (5 min, 2 days, or 1 week) was manipulated between
subjects. The order of learning conditions (restudy or test) and
the order of passages ("The Sun" or "Sea Otters") were counterbalanced across subjects.
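The design described above can be sketched as a simple assignment routine. This is only an illustration, not the authors' procedure: the function name `build_assignments`, the round-robin assignment, and the assumption of equal cell sizes (10 subjects per cell, 40 per delay group) are hypothetical, although equal groups of 40 would be consistent with the 120 subjects and three delays reported.

```python
from itertools import product

# Between-subjects factor: delay of the final test.
DELAYS = ["5 min", "2 days", "1 week"]
# Counterbalanced orders (within-subjects factor and materials).
CONDITION_ORDERS = [("restudy", "test"), ("test", "restudy")]
PASSAGE_ORDERS = [("The Sun", "Sea Otters"), ("Sea Otters", "The Sun")]


def build_assignments(n_subjects=120):
    """Cycle subjects through every delay x condition-order x passage-order cell.

    Round-robin assignment is an assumption made for illustration only.
    """
    cells = list(product(DELAYS, CONDITION_ORDERS, PASSAGE_ORDERS))  # 3 * 2 * 2 = 12 cells
    assignments = []
    for subject_index in range(n_subjects):
        delay, condition_order, passage_order = cells[subject_index % len(cells)]
        assignments.append({
            "subject": subject_index + 1,
            "delay": delay,
            "condition_order": condition_order,
            "passage_order": passage_order,
        })
    return assignments


assignments = build_assignments()
# Count how many subjects land in each delay group.
per_delay = {d: sum(a["delay"] == d for a in assignments) for d in DELAYS}
print(per_delay)  # {'5 min': 40, '2 days': 40, '1 week': 40}
```

Every subject contributes data to both learning conditions, while each subject is tested at only one delay, which is what makes the design mixed.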
Procedure
Subjects were tested during two sessions, in small groups (4 or
fewer). They were told that Phase 1 consisted of four 7-min
periods and that during any given period they would be asked to
study one passage for the first time, restudy one of the passages,
or take a recall test over one of the passages. During each study
period, subjects read one passage for 7 min. During the test
period, subjects were given a test sheet with the title of the to-be-recalled passage printed at the top and were asked to write down
as much of the material from the passage as they could remember, without concern for exact wording or correct order.
Subjects solved multiplication problems for 2 min between
periods and for 5 min after the final period in Phase 1.
Fig. 1. Mean proportion of idea units recalled on the final test after a 5-min, 2-day, or 1-week retention interval as a function of learning condition
(additional studying vs. initial testing) in Experiment 1. Error bars represent standard errors of the means.
twice recalled more than subjects who had studied once and
taken a recall test. However, this pattern of results was reversed
on the delayed tests 2 days and 1 week later. On these tests of
long-term retention, subjects who had taken an initial test recalled more than subjects who had only studied the passages.
The results were submitted to a 2 × 3 analysis of variance
(ANOVA), with learning condition (restudying or testing) and
retention interval (5 min, 2 days, or 1 week) as independent
variables. This analysis revealed a main effect of testing versus
restudying, $F(1, 117) = 36.39$, $\eta_p^2 = .24$, which indicated that,
overall, initial testing produced better final recall than additional studying. Also, the analysis revealed a main effect of retention interval, $F(2, 117) = 50.34$, $\eta_p^2 = .46$, which indicated
that forgetting occurred as the retention interval grew longer.
However, these main effects were qualified by a significant
Learning Condition × Retention Interval interaction, $F(2, 117) = 32.10$, $\eta_p^2 = .35$, indicating that restudying produced better
performance on the 5-min test, but testing produced better
performance on the 2-day and 1-week tests.
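The partial eta squared values reported with each F ratio can be recovered from the F value and its degrees of freedom using the standard identity: partial eta squared = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the three reported effects; the function and variable names are illustrative and not taken from the article.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error) as reported for Experiment 1.
reported = {
    "learning condition": (36.39, 1, 117),
    "retention interval": (50.34, 2, 117),
    "interaction": (32.10, 2, 117),
}

effect_sizes = {
    name: round(partial_eta_squared(*stats), 2) for name, stats in reported.items()
}
print(effect_sizes)
# {'learning condition': 0.24, 'retention interval': 0.46, 'interaction': 0.35}
```

The recomputed values agree with the effect sizes given in the text to two decimal places.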
Post hoc analyses confirmed that on the 5-min retention tests,
restudying produced better recall than testing (81% vs. 75%),
$t(39) = 3.22$, $d = 0.52$. However, the opposite pattern of results
was observed on the delayed retention tests. After 2 days, the
initially tested group recalled more than the additional-study
group (68% vs. 54%), $t(39) = 6.97$, $d = 0.95$. The benefits of
initial testing were also observed after 1 week: The tested group
recalled 56% of the material, whereas the restudy group recalled
only 42%, $t(39) = 6.41$, $d = 0.83$. Figure 1 depicts another
interesting finding: The initially tested group recalled as much
on the 1-week retention test as the additional-study group did
after only 2 days (the initially tested group actually recalled
slightly more). This surprising result indicates that taking an
initial recall test prevented forgetting of information for an additional 5 days relative to repeated study.
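The crossover pattern can be made concrete by tabulating the reported group means. The following is a minimal sketch that uses only the percentages quoted above; the dictionary layout and variable names are illustrative assumptions, and the differences are simple subtractions rather than the article's inferential tests.

```python
# Mean final recall (percentage of idea units) reported for Experiment 1.
final_recall = {
    "5 min": {"restudy": 81, "test": 75},
    "2 days": {"restudy": 54, "test": 68},
    "1 week": {"restudy": 42, "test": 56},
}

# Positive values mean the initially tested condition recalled more.
testing_advantage = {
    interval: scores["test"] - scores["restudy"]
    for interval, scores in final_recall.items()
}
print(testing_advantage)  # {'5 min': -6, '2 days': 14, '1 week': 14}

# A crossover means the sign of the advantage flips between short and long delays.
crossover = testing_advantage["5 min"] < 0 < testing_advantage["1 week"]

# The tested condition after 1 week versus the restudy condition after 2 days.
tested_week_vs_restudy_2days = (
    final_recall["1 week"]["test"] - final_recall["2 days"]["restudy"]
)
print(crossover, tested_week_vs_restudy_2days)  # True 2
```

The final comparison corresponds to the observation that the tested group after 1 week recalled slightly more than the restudy group after 2 days.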
Experiment 1 demonstrated that after an initial study episode,
additional studying or testing had different effects on immediate
and delayed final tests: Relative to testing, additional studying
aided performance on immediate retention tests; in contrast,
prior testing improved performance on delayed tests. The
crossover interaction observed in Figure 1 is all the more impressive considering that no feedback was given on the tests.
The testing effect on delayed retention tests is not simply due to
reexposure to studied material during tests, but rather is due to
some other process that has positive effects on retention. We
consider candidate processes in the General Discussion.
EXPERIMENT 2
In Experiment 2, we investigated the effects of repeated studying and repeated testing on retention, in part to replicate and
extend the results of Experiment 1, but more to ask about effects
of repeated testing. We were interested in the effects of repeated
testing because most testing-effect experiments compare per-
Condition   Period 1   Period 2   Period 3   Period 4   Sum
SSSS        3.4        3.5        3.6        3.7        14.2
SSST        3.2        3.5        3.6                   10.3
STTT        3.4                                         3.4
Note. Condition labels indicate the order of study (S) and test (T) periods.
TABLE 2
Mean Ratings on the Questionnaire Given After the Initial Learning Session in Experiment 2

Condition   Interesting   Readable   Remember
SSSS        3.8           2.5        4.8
SSST        4.1           2.5        4.2
STTT        4.6           2.8        4.0

Note. Condition labels indicate the order of study (S) and test (T) periods.
Subjects rated how interesting the passage was (1 = very boring, 7 = very
interesting), how readable the passage was (1 = very easy to read, 7 = very
difficult to read), and how well they believed they would remember the passage
in 1 week (1 = not very well, 7 = very well).
Final Tests
The critical data are the mean proportions of idea units recalled
on the final tests 5 min or 1 week later, displayed in Figure 2. The
pattern of final test scores replicates the pattern of results found
in Experiment 1. On the 5-min test, recall was correlated with
repeated studying: The SSSS group recalled more than the SSST
group (83% vs. 78%), who in turn recalled more than the STTT
group (71%). However, on the 1-week test, recall was correlated
with the number of tests given earlier: The STTT group recalled
more than the SSST group (61% vs. 56%), who in turn recalled
more than the SSSS group (40%).
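One way to read these means is to compare how much recall dropped between the 5-min and 1-week tests in each condition. The sketch below is a derived illustration, not an analysis taken from this section: it uses only the six percentages quoted above, the variable names are hypothetical, and the drop is a difference between group means rather than a within-subject change.

```python
# (5-min recall %, 1-week recall %) reported for each learning condition.
recall_pct = {
    "SSSS": (83, 40),
    "SSST": (78, 56),
    "STTT": (71, 61),
}

forgetting = {}
for condition, (immediate, delayed) in recall_pct.items():
    points_lost = immediate - delayed
    forgetting[condition] = {
        "points_lost": points_lost,
        # Share of the 5-min recall level that was no longer recalled at 1 week.
        "proportion_lost": round(points_lost / immediate, 2),
    }

for condition, stats in forgetting.items():
    print(condition, stats)
# SSSS {'points_lost': 43, 'proportion_lost': 0.52}
# SSST {'points_lost': 22, 'proportion_lost': 0.28}
# STTT {'points_lost': 10, 'proportion_lost': 0.14}
```

Under this reading, the drop in mean recall shrinks as the number of initial tests increases, which matches the ordering of the 1-week results.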
Fig. 2. Mean proportion of idea units recalled on the final test after a 5-min or 1-week retention interval as a function of learning condition (SSSS,
SSST, or STTT) in Experiment 2. The labels for the learning conditions
indicate the order of study (S) and test (T) periods. Error bars represent
standard errors of the means.
Carrier, M., & Pashler, H. (1992). The influence of retrieval on retention. Memory & Cognition, 20, 633–642.
Cull, W.L. (2000). Untangling the benefits of multiple study opportunities and repeated testing for cued recall. Applied Cognitive Psychology, 14, 215–235.
Gates, A.I. (1917). Recitation as a factor in memorizing. Archives of Psychology, 6(40).
Glover, J.A. (1989). The testing phenomenon: Not gone but nearly forgotten. Journal of Educational Psychology, 81, 392–399.
Hogan, R.M., & Kintsch, W. (1971). Differential effects of study and test trials on long-term recognition and recall. Journal of Verbal Learning and Verbal Behavior, 10, 562–567.
Izawa, C. (1967). Function of test trials in paired-associate learning. Journal of Experimental Psychology, 75, 194–209.
Koriat, A., Bjork, R.A., Sheffer, L., & Bar, S.K. (2004). Predicting one's own forgetting: The role of experience-based and theory-based processes. Journal of Experimental Psychology: General, 133, 643–656.
Landauer, T.K., & Bjork, R.A. (1978). Optimum rehearsal patterns and name learning. In M.M. Gruneberg, P.E. Morris, & R.N. Sykes (Eds.), Practical aspects of memory (pp. 625–632). London: Academic Press.
Leeming, F.C. (2002). The exam-a-day procedure improves performance in psychology classes. Teaching of Psychology, 29, 210–212.
Loftus, G. (1985). Evaluating forgetting curves. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 397–406.
Logan, J.M., & Balota, D.A. (2005). Spaced and expanded retrieval effects in younger and older adults. Manuscript submitted for publication.
McDaniel, M.A., & Fisher, R.P. (1991). Tests and test feedback as learning sources. Contemporary Educational Psychology, 16, 192–201.
McDaniel, M.A., Kowitz, M.D., & Dunay, P.K. (1989). Altering memory through recall: The effects of cue-guided retrieval processing. Memory & Cognition, 17, 423–434.
McDaniel, M.A., & Masson, M.E.J. (1985). Altering memory representations through retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 371–385.
McDermott, K.B., Kang, S., & Roediger, H.L., III. (2005, January). Test format and its modulation of the testing effect. Paper presented at the biennial meeting of the Society for Applied Research in Memory and Cognition, Wellington, New Zealand.
Morris, C.D., Bransford, J.D., & Franks, J.J. (1977). Levels of processing versus transfer-appropriate processing. Journal of Verbal Learning and Verbal Behavior, 16, 519–533.
Pashler, H., Cepeda, N.J., Wixted, J.T., & Rohrer, D. (2005). When does feedback facilitate learning of words? Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 3–8.
Pashler, H., Zarow, G., & Triplett, B. (2003). Is temporal spacing of tests helpful even when it inflates error rates? Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 1051–1057.
Peterson, L.R., Wampler, R., Kirkpatrick, M., & Saltzman, D. (1963). Effect of spacing of presentations on retention of paired-associates over short intervals. Journal of Experimental Psychology, 66, 206–209.
Rawson, K.A., & Kintsch, W. (2005). Rereading effects depend on time of test. Journal of Educational Psychology, 97, 70–80.
Roediger, H.L., III. (1990). Implicit memory: Retention without remembering. American Psychologist, 45, 1043–1056.
Roediger, H.L., III, Gallo, D.A., & Geraci, L. (2002). Processing approaches to cognition: The impetus from the levels-of-processing framework. Memory, 10, 319–332.
Roediger, H.L., III, & Karpicke, J.D. (2006). The power of testing memory: Implications for educational practice. Unpublished manuscript, Washington University in St. Louis.
Roediger, H.L., III, & Thorpe, L.A. (1978). The role of recall time in producing hypermnesia. Memory & Cognition, 6, 296–305.
Rogers, B. (2001). TOEFL CBT Success. Princeton, NJ: Peterson's.
Spitzer, H.F. (1939). Studies in retention. Journal of Educational Psychology, 30, 641–656.
Thompson, C.P., Wenger, S.K., & Bartling, C.A. (1978). How recall facilitates subsequent recall: A reappraisal. Journal of Experimental Psychology: Human Learning and Memory, 4, 210–221.
Tulving, E. (1967). The effects of presentation and recall of material in free-recall learning. Journal of Verbal Learning and Verbal Behavior, 6, 175–184.
Wheeler, M.A., Ewers, M., & Buonanno, J. (2003). Different rates of forgetting following study versus test trials. Memory, 11, 571–580.
Wheeler, M.A., & Roediger, H.L., III. (1992). Disparate effects of repeated testing: Reconciling Ballard's (1913) and Bartlett's (1932) results. Psychological Science, 3, 240–245.